Weekly Shaarli

All links of one week in a single page.

Week 52 (December 25, 2023)

Use find and grep to list files containing a string and sort the results by mtime

This is an extension of the very common find ... -exec grep ... {} \; construct I use almost daily to find which files contain a particular text string.

Now, let's say you're looking for files in all your Ansible roles containing the string ppa:, because you want to create a new role using a suitable existing role as a template. In this case, I think recency of modification is an excellent proxy for suitability.

Thus, the challenge: can we tack on something to the find ... grep construct such that the output shows matching files in order of most recently modified?

taha@asks2:~
$ cd /media/bay/taha/projects/ansible && find . -not -path '*/legacy/*' -type f -name "*.yml" -exec grep -il "ppa:" {} \; -printf "%T+ %p\n" | grep -v "^\\./.*" | sort && cd $OLDPWD
2021-06-09+22:27:20.6889730070 ./roles/public/php-versions/tasks/setup-Debian.yml
2022-01-09+07:39:51.1836426130 ./roles/public/java-openjdk/tasks/ppa.yml
2022-02-23+15:33:13.2318546100 ./roles/dev/deluge/tasks/install.yml
2022-03-31+04:42:16.5644336650 ./roles/dev/editor-notepadqq/tasks/main.yml
2022-05-09+15:21:01.7974094830 ./roles/dev/qownnotes/tasks/main.yml
2022-06-25+02:46:13.0580097480 ./playbooks/workstation/roles/boot-grub/tasks/main.yml
2022-06-25+02:46:18.2940273650 ./roles/dev/libreoffice/tasks/main.yml
2022-07-03+00:16:44.5563131800 ./roles/dev/shutter/tasks/main.yml
2022-07-04+20:35:15.6128270470 ./roles/dev/magnus/tasks/main.yml
2022-07-14+21:09:01.6502524570 ./roles/dev/flatpak-remote/tasks/main.yml
2022-07-14+22:05:43.3325667290 ./roles/public/firejail/tasks/install.yml
2022-07-16+13:42:04.3756138100 ./roles/public/variety/tasks/main.yml
2022-07-16+22:05:52.0552035060 ./roles/public/browser-chromium/tasks/install.yml
2022-07-21+00:14:51.1137716920 ./roles/public/foliate-ebookreader/tasks/ppa.yml
2022-07-21+03:07:42.2876610030 ./roles/public/graphics-driver-nvidia/tasks/install.yml
2022-07-21+06:05:27.4514643180 ./roles/public/foliate-ebookreader/tasks/flatpak.yml
2022-07-23+18:32:46.8466638700 ./roles/dev/handbrake/tasks/main.yml
2022-08-12+00:09:51.6575729520 ./roles/dev/x2goclient/tasks/main.yml
2022-08-19+15:28:49.5605481840 ./roles/dev/x2goserver/tasks/main.yml
2022-11-04+11:35:14.6208169990 ./roles/public/python3/tasks/python-ppa.yml
2022-11-19+03:16:16.7832183030 ./roles/public/browser-firefox/tasks/main.yml
2022-12-24+23:04:01.2033026010 ./roles/public/R/tasks/dependencies.yml
2022-12-31+19:38:32.9105553030 ./roles/public/digikam/tasks/install-ppa.yml
2023-01-01+01:39:08.2045090970 ./roles/public/digikam/tasks/install-appimage.yml
2023-01-14+00:44:34.8526187360 ./roles/public/java-openjdk/defaults/main.yml
2023-01-26+14:10:18.9247087870 ./roles/public/ansible/tasks/main.yml
2023-01-26+16:14:59.9903243110 ./roles/public/sioyek-pdf/defaults/main.yml
2023-05-12+12:01:30.4705549280 ./roles/public/mpv/tasks/install.yml
2023-05-13+00:47:41.1561557100 ./roles/public/nextcloud-desktop/tasks/main.yml
2023-08-27+18:19:44.8291334420 ./roles/public/digikam/defaults/main.yml

Eureka!

Explainer

  • the initial cd <path-parent> ensures that the paths displayed by find don't contain the <path-parent> part (to avoid cluttering the output), and the final cd $OLDPWD takes the shell back to the directory it started in, so the bash prompt doesn't stay at <path-parent>.
  • unless you want to exclude some path from the search, there is obviously no need for -not -path '*/<some-path>/*'.
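  • grep -i enables case-insensitive matching, and -l (that's the letter l, for list) makes grep print only the filename instead of each matching line. This is crucial for this hack to work: we want grep to produce as little output as possible. In fact, if I could have figured out a way to silence grep altogether I would have, but I couldn't.
  • -printf "%T+ %p\n" prints the file's mtime followed by its path, as a separate line. Because -exec ... \; only evaluates as true when grep exits with status 0 (i.e. it found a match), -printf runs only for matching files. And since %T+ is a zero-padded YYYY-MM-DD+HH:MM:SS timestamp, a plain lexical sort puts these lines in chronological order. Thanks angus@Unix.SE.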
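A possible tweak to the first point: running the cd inside a ( ... ) subshell confines the directory change to that subshell, so the interactive shell never leaves its current directory and the trailing cd $OLDPWD becomes unnecessary. The sketch below assumes bash with GNU find, grep and sort, and runs the same pipeline against a small throwaway tree instead of the real Ansible project; the directory layout, file contents and the search_by_mtime function name are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch, assuming bash plus GNU find/grep/sort. The cd runs inside a
# ( ... ) subshell, so the calling shell's working directory never changes
# and no "cd $OLDPWD" is needed afterwards.

# Hypothetical throwaway tree standing in for the real Ansible project.
demo_root="$(mktemp -d)"
trap 'rm -rf "$demo_root"' EXIT
mkdir -p "$demo_root/roles/a/tasks" "$demo_root/roles/b/tasks" "$demo_root/legacy/c"
printf 'repo: ppa:example/one\n' > "$demo_root/roles/a/tasks/main.yml"
printf 'apt: name=vim\n'         > "$demo_root/roles/b/tasks/main.yml"
printf 'repo: PPA:example/old\n' > "$demo_root/legacy/c/main.yml"

# Same pipeline as in the post; only the cd handling differs.
# search_by_mtime is a hypothetical helper name.
search_by_mtime() {
    ( cd "$1" && find . -not -path '*/legacy/*' -type f -name "*.yml" \
        -exec grep -il "ppa:" {} \; -printf "%T+ %p\n" \
        | grep -v "^\\./.*" | sort )
}

before="$PWD"
search_by_mtime "$demo_root"
# The subshell's cd did not leak out: this prints the original directory.
echo "working directory afterwards: $PWD"
```

In day-to-day use this simply means wrapping the original one-liner in parentheses and dropping the final && cd $OLDPWD.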

At this point, an example of the unfinished product is in order. Before sorting, and before the final grep -v, the output looks like this (excerpt):

taha@asks2:~
$ cd /media/bay/taha/projects/ansible && find . -not -path '*/legacy/*' -type f -name "*.yml" -exec grep -il "ppa:" {} \; -printf "%T+ %p\n" && cd $OLDPWD
./roles/public/java-openjdk/defaults/main.yml
2023-01-14+00:44:34.8526187360 ./roles/public/java-openjdk/defaults/main.yml
./roles/public/java-openjdk/tasks/ppa.yml
2022-01-09+07:39:51.1836426130 ./roles/public/java-openjdk/tasks/ppa.yml
./roles/public/browser-firefox/tasks/main.yml
2022-11-19+03:16:16.7832183030 ./roles/public/browser-firefox/tasks/main.yml
./roles/public/R/tasks/dependencies.yml
2022-12-24+23:04:01.2033026010 ./roles/public/R/tasks/dependencies.yml
./roles/public/browser-chromium/tasks/install.yml
2022-07-16+22:05:52.0552035060 ./roles/public/browser-chromium/tasks/install.yml
./roles/dev/x2goclient/tasks/main.yml
2022-08-12+00:09:51.6575729520 ./roles/dev/x2goclient/tasks/main.yml

with each matching file listed twice: grep's output on its own line, followed by the time-stamped line from -printf. Like I said, it would have been better if we could somehow silence the grep output at this point. If you know a way, feel free to let me know!
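On the question of silencing grep: one option, assuming GNU grep, is grep -q, which prints nothing and reports only through its exit status (0 when at least one line matches). Since find's -exec ... \; is itself a test that succeeds only when the command exits with status 0, -printf still fires only for matching files, and because grep no longer prints anything, the grep -v step can be dropped. The sketch below demonstrates this on a hypothetical demo tree with fixed mtimes; the paths, contents, dates and the quiet_search function name are all made up.

```shell
#!/usr/bin/env bash
# Sketch assuming bash plus GNU find, grep, sort and touch.
# grep -q prints nothing; its exit status (0 = match found) is what
# find's -exec test uses to decide whether -printf runs.

# Hypothetical throwaway tree; names, contents and dates are made up.
demo_root="$(mktemp -d)"
trap 'rm -rf "$demo_root"' EXIT
mkdir -p "$demo_root/roles/old/tasks" "$demo_root/roles/new/tasks" \
         "$demo_root/roles/none/tasks"
printf 'repo: ppa:example/old\n' > "$demo_root/roles/old/tasks/main.yml"
printf 'repo: PPA:example/new\n' > "$demo_root/roles/new/tasks/main.yml"
printf 'apt: name=vim\n'         > "$demo_root/roles/none/tasks/main.yml"

# Fixed mtimes (GNU touch -d) so the sorted order is predictable.
touch -d '2022-01-09 07:39:51' "$demo_root/roles/old/tasks/main.yml"
touch -d '2023-08-27 18:19:44' "$demo_root/roles/new/tasks/main.yml"

# quiet_search is a hypothetical helper name.
quiet_search() {
    # The subshell just keeps the demo's cd from affecting the caller.
    ( cd "$1" && find . -type f -name "*.yml" \
        -exec grep -qi "ppa:" {} \; -printf "%T+ %p\n" | sort )
}

# Prints the 2022 match first, then the 2023 one. The file without
# "ppa:" is never printed, and no bare "./..." lines appear.
quiet_search "$demo_root"
```

Applied to the original project, the one-liner would become cd /media/bay/taha/projects/ansible && find . -not -path '*/legacy/*' -type f -name "*.yml" -exec grep -qi "ppa:" {} \; -printf "%T+ %p\n" | sort && cd $OLDPWD, with no grep -v needed.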

As expected, sorting resulted in the non-time-stamped lines dangling about like some unwanted appendage:

taha@asks2:~
$ cd /media/bay/taha/projects/ansible && find . -not -path '*/legacy/*' -type f -name "*.yml" -exec grep -il "ppa:" {} \; -printf "%T+ %p\n" | sort && cd $OLDPWD
2022-01-09+07:39:51.1836426130 ./roles/public/java-openjdk/tasks/ppa.yml
2022-07-16+22:05:52.0552035060 ./roles/public/browser-chromium/tasks/install.yml
2022-08-12+00:09:51.6575729520 ./roles/dev/x2goclient/tasks/main.yml
2022-11-19+03:16:16.7832183030 ./roles/public/browser-firefox/tasks/main.yml
2022-12-24+23:04:01.2033026010 ./roles/public/R/tasks/dependencies.yml
2023-01-14+00:44:34.8526187360 ./roles/public/java-openjdk/defaults/main.yml
./roles/public/java-openjdk/tasks/ppa.yml
./roles/public/browser-chromium/tasks/install.yml
./roles/dev/x2goclient/tasks/main.yml
./roles/public/browser-firefox/tasks/main.yml
./roles/public/R/tasks/dependencies.yml
./roles/public/java-openjdk/defaults/main.yml

At this point I was out of ideas, so grep -v ... it was, and we end up with the one-liner shown above. It's an ugly hack, but hey, it works :-)

Bash 5.1 on Ubuntu 22.04.3, with GNU find 4.8.0, GNU grep 3.7, and GNU sort 8.32.

Sweden's energy system (Energimyndigheten)

The energy system is always in balance. This means that the energy supplied is always equal to the energy used, including losses.

Via Energimyndigheten.