GREP
grep stands for global regular expression print.
To my understanding, it is just filter-then-print.
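To make "filter-then-print" concrete, here is a rough sketch of the idea in plain shell. The function name filter_then_print is made up for this illustration, and it only handles fixed strings, not regular expressions, so it is closer to grep -F than to grep itself.

```shell
# Hypothetical helper: read stdin line by line, keep the lines that
# contain the needle (the filter), and write them to stdout (the print).
filter_then_print() {
  needle=$1
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      *"$needle"*) printf '%s\n' "$line" ;;
    esac
  done
}

# Only the line containing PUT survives the filter.
printf 'GET /list\nPUT /list\nPOST /cart\n' | filter_then_print PUT
```

The real grep does the same job much faster and understands regular expressions, but the mental model is the same loop.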
1. Preparation
First, let's get some data and play with it.
Raw Data
Run the following command to create some test data.
cat <<'LOGS' > tut-access.log
161.138.187.117 - - [05/Jan/2021:23:05:01 -0500] "PUT /app/main/posts HTTP/1.0" 200 4973 "http://smith.com/" "Mozilla/5.0 (Windows NT 6.1; yi-US; rv:1.9.2.20) Gecko/2011-09-10 13:36:12 Firefox/3.6.13"
83.191.216.184 - - [05/Jan/2021:23:05:39 -0500] "POST /apps/cart.jsp?appID=3885 HTTP/1.0" 200 5031 "http://www.bryant.com/terms/" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_12_4) AppleWebKit/534.0 (KHTML, like Gecko) Chrome/60.0.831.0 Safari/534.0"
164.42.246.104 - - [05/Jan/2021:23:09:56 -0500] "PUT /wp-content HTTP/1.0" 200 4976 "https://flynn-cruz.com/home/" "Mozilla/5.0 (Android 2.1; Mobile; rv:45.0) Gecko/45.0 Firefox/45.0"
28.219.159.236 - - [05/Jan/2021:23:11:43 -0500] "PUT /list HTTP/1.0" 200 5010 "http://www.wright.com/home/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.1 (KHTML, like Gecko) Chrome/42.0.863.0 Safari/536.1"
189.143.182.79 - - [05/Jan/2021:23:14:06 -0500] "GET /apps/cart.jsp?appID=9015 HTTP/1.0" 200 5037 "http://www.davis-moreno.com/app/tag/home.asp" "Mozilla/5.0 (Windows CE; ti-ET; rv:1.9.1.20) Gecko/2013-12-25 17:20:54 Firefox/3.8"
28.105.221.183 - - [05/Jan/2021:23:17:22 -0500] "GET /wp-admin HTTP/1.0" 200 4947 "http://www.smith-phelps.biz/homepage/" "Mozilla/5.0 (Windows; U; Windows 95) AppleWebKit/535.24.6 (KHTML, like Gecko) Version/5.0.5 Safari/535.24.6"
86.74.0.138 - - [05/Jan/2021:23:19:53 -0500] "GET /app/main/posts HTTP/1.0" 200 5036 "http://turner-brown.com/blog/search/" "Mozilla/5.0 (Windows NT 4.0; hr-HR; rv:1.9.2.20) Gecko/2014-09-22 21:20:35 Firefox/3.6.4"
154.106.166.221 - - [05/Jan/2021:23:20:24 -0500] "GET /app/main/posts HTTP/1.0" 200 4942 "https://www.white.com/wp-content/faq/" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/531.2 (KHTML, like Gecko) Chrome/30.0.872.0 Safari/531.2"
160.75.38.89 - - [05/Jan/2021:23:25:07 -0500] "GET /list HTTP/1.0" 200 5043 "http://gordon.biz/search/main/tags/privacy.asp" "Mozilla/5.0 (Windows NT 5.1; fil-PH; rv:1.9.0.20) Gecko/2019-03-26 20:49:08 Firefox/3.8"
90.1.228.91 - - [05/Jan/2021:23:25:57 -0500] "GET /wp-admin HTTP/1.0" 200 4982 "http://brady.com/tag/home/" "Mozilla/5.0 (X11; Linux i686; rv:1.9.6.20) Gecko/2012-05-30 12:45:08 Firefox/3.8"
LOGS
2. Basics
Let's start with a very simple task to get a feel for what grep can do.
All the PUT Requests
grep PUT tut-access.log
161.138.187.117 - - [05/Jan/2021:23:05:01 -0500] "PUT /app/main/posts HTTP/1.0" 200 4973 "http://smith.com/" "Mozilla/5.0 (Windows NT 6.1; yi-US; rv:1.9.2.20) Gecko/2011-09-10 13:36:12 Firefox/3.6.13"
164.42.246.104 - - [05/Jan/2021:23:09:56 -0500] "PUT /wp-content HTTP/1.0" 200 4976 "https://flynn-cruz.com/home/" "Mozilla/5.0 (Android 2.1; Mobile; rv:45.0) Gecko/45.0 Firefox/45.0"
28.219.159.236 - - [05/Jan/2021:23:11:43 -0500] "PUT /list HTTP/1.0" 200 5010 "http://www.wright.com/home/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.1 (KHTML, like Gecko) Chrome/42.0.863.0 Safari/536.1"
Here, grep prints every line that contains the string PUT.
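Note that the match can sit anywhere on the line, not only in the request method. The sketch below uses two made-up log lines (they are not part of tut-access.log) to show the difference between a loose pattern and one that includes the surrounding quote and space.

```shell
# Two made-up log lines: the second is a GET whose path happens to contain "PUT".
sample=$(mktemp)
cat > "$sample" <<'EOF'
10.0.0.1 - - [05/Jan/2021:23:05:01 -0500] "PUT /list HTTP/1.0" 200 5010
10.0.0.2 - - [05/Jan/2021:23:06:01 -0500] "GET /docs/PUT-vs-POST HTTP/1.0" 200 4999
EOF

# Plain substring match: both lines contain "PUT" somewhere.
loose=$(grep PUT "$sample")

# Adding the opening quote and the trailing space pins the match to the method field.
strict=$(grep '"PUT ' "$sample")

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
rm -f "$sample"
```

On the tutorial data both patterns return the same three lines, so the simpler grep PUT is enough there.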
3. A Bit More
grep has a lot of parameters; some of them are quite convenient and frequently used. Below, we will try them out one by one.
Ignore Case
Sometimes you want to match a string regardless of whether its letters are upper case or lower case.
grep -i li tut-access.log
like, list and Linux are all valid matches.
However, ignoring case may not behave as expected in languages other than English.
83.191.216.184 - - [05/Jan/2021:23:05:39 -0500] "POST /apps/cart.jsp?appID=3885 HTTP/1.0" 200 5031 "http://www.bryant.com/terms/" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_12_4) AppleWebKit/534.0 (KHTML, like Gecko) Chrome/60.0.831.0 Safari/534.0"
28.219.159.236 - - [05/Jan/2021:23:11:43 -0500] "PUT /list HTTP/1.0" 200 5010 "http://www.wright.com/home/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.1 (KHTML, like Gecko) Chrome/42.0.863.0 Safari/536.1"
28.105.221.183 - - [05/Jan/2021:23:17:22 -0500] "GET /wp-admin HTTP/1.0" 200 4947 "http://www.smith-phelps.biz/homepage/" "Mozilla/5.0 (Windows; U; Windows 95) AppleWebKit/535.24.6 (KHTML, like Gecko) Version/5.0.5 Safari/535.24.6"
154.106.166.221 - - [05/Jan/2021:23:20:24 -0500] "GET /app/main/posts HTTP/1.0" 200 4942 "https://www.white.com/wp-content/faq/" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/531.2 (KHTML, like Gecko) Chrome/30.0.872.0 Safari/531.2"
160.75.38.89 - - [05/Jan/2021:23:25:07 -0500] "GET /list HTTP/1.0" 200 5043 "http://gordon.biz/search/main/tags/privacy.asp" "Mozilla/5.0 (Windows NT 5.1; fil-PH; rv:1.9.0.20) Gecko/2019-03-26 20:49:08 Firefox/3.8"
90.1.228.91 - - [05/Jan/2021:23:25:57 -0500] "GET /wp-admin HTTP/1.0" 200 4982 "http://brady.com/tag/home/" "Mozilla/5.0 (X11; Linux i686; rv:1.9.6.20) Gecko/2012-05-30 12:45:08 Firefox/3.8"
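To see exactly what -i changes, it can help to run the same pattern with and without it on a tiny input. The four input lines below are made up for this illustration.

```shell
# Without -i, "li" only matches lower-case "li".
sensitive=$(printf 'list\nLinux\nlike Gecko\nMozilla\n' | grep li)

# With -i, "Li" in "Linux" matches as well.
insensitive=$(printf 'list\nLinux\nlike Gecko\nMozilla\n' | grep -i li)

printf 'case-sensitive:\n%s\n\ncase-insensitive:\n%s\n' "$sensitive" "$insensitive"
```

Mozilla is not matched in either run: it contains "il", not "li".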
Match Only Whole Word
Often you want to match the string only as a whole word, not as part of another word.
grep -w Linux tut-access.log
28.219.159.236 - - [05/Jan/2021:23:11:43 -0500] "PUT /list HTTP/1.0" 200 5010 "http://www.wright.com/home/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.1 (KHTML, like Gecko) Chrome/42.0.863.0 Safari/536.1"
90.1.228.91 - - [05/Jan/2021:23:25:57 -0500] "GET /wp-admin HTTP/1.0" 200 4982 "http://brady.com/tag/home/" "Mozilla/5.0 (X11; Linux i686; rv:1.9.6.20) Gecko/2012-05-30 12:45:08 Firefox/3.8"
grep -w Lin tut-access.log
No line is matched, because Lin only ever appears as part of the longer word Linux.
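What counts as a "word" deserves a quick check: for grep -w, letters, digits and the underscore are word characters, and anything else (spaces, slashes, semicolons) ends a word. A small sketch with made-up input lines:

```shell
input='Linux x86_64
Linux_x86
GNU/Linux
LinuxMint'

# Matches where Linux is delimited by a space, a slash, or the line ends.
# "Linux_x86" and "LinuxMint" are skipped: "_" and "M" are word characters.
whole=$(printf '%s\n' "$input" | grep -w Linux)

# No line has "Lin" standing alone; grep exits with status 1 when nothing
# matches, so "|| true" keeps the script going.
partial=$(printf '%s\n' "$input" | grep -w Lin || true)

printf 'whole word Linux:\n%s\n' "$whole"
printf 'whole word Lin: [%s]\n' "$partial"
```

This is also why Linux in "(X11; Linux x86_64)" matches in the log: it is surrounded by a space on each side.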
Invert Match
Invert match simply excludes the lines that contain a match.
grep -v Windows tut-access.log
83.191.216.184 - - [05/Jan/2021:23:05:39 -0500] "POST /apps/cart.jsp?appID=3885 HTTP/1.0" 200 5031 "http://www.bryant.com/terms/" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_12_4) AppleWebKit/534.0 (KHTML, like Gecko) Chrome/60.0.831.0 Safari/534.0"
164.42.246.104 - - [05/Jan/2021:23:09:56 -0500] "PUT /wp-content HTTP/1.0" 200 4976 "https://flynn-cruz.com/home/" "Mozilla/5.0 (Android 2.1; Mobile; rv:45.0) Gecko/45.0 Firefox/45.0"
28.219.159.236 - - [05/Jan/2021:23:11:43 -0500] "PUT /list HTTP/1.0" 200 5010 "http://www.wright.com/home/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.1 (KHTML, like Gecko) Chrome/42.0.863.0 Safari/536.1"
90.1.228.91 - - [05/Jan/2021:23:25:57 -0500] "GET /wp-admin HTTP/1.0" 200 4982 "http://brady.com/tag/home/" "Mozilla/5.0 (X11; Linux i686; rv:1.9.6.20) Gecko/2012-05-30 12:45:08 Firefox/3.8"
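One way to think about -v is that it returns the complement: every input line lands either in the normal result or in the inverted one, never in both. A minimal sketch with made-up user-agent fragments:

```shell
input='Windows NT 6.1
X11; Linux x86_64
Macintosh; Intel Mac OS X
Windows CE'

# Lines that contain the pattern, and lines that do not.
dropped=$(printf '%s\n' "$input" | grep Windows)
kept=$(printf '%s\n' "$input" | grep -v Windows)

printf 'with Windows:\n%s\n\nwithout Windows:\n%s\n' "$dropped" "$kept"
```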
Combinations
We can combine the parameters above to achieve more.
grep -v -w http tut-access.log
I know it looks silly not to just grep https, but at least it shows what a combination can do, even if it doesn't really achieve anything more here.
164.42.246.104 - - [05/Jan/2021:23:09:56 -0500] "PUT /wp-content HTTP/1.0" 200 4976 "https://flynn-cruz.com/home/" "Mozilla/5.0 (Android 2.1; Mobile; rv:45.0) Gecko/45.0 Firefox/45.0"
154.106.166.221 - - [05/Jan/2021:23:20:24 -0500] "GET /app/main/posts HTTP/1.0" 200 4942 "https://www.white.com/wp-content/faq/" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/531.2 (KHTML, like Gecko) Chrome/30.0.872.0 Safari/531.2"
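On the tutorial data, grep -v -w http and grep https happen to print the same lines, but they are not the same filter in general. The sketch below uses four made-up lines to show where they differ, and also that short options can be bundled as -vw.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
GET /a "https://a.example/"
GET /b "http://b.example/"
GET /c "https://c.example/" via "http://d.example/"
GET /d "-"
EOF

# Drop every line that contains "http" as a whole word
# ("http:" counts, "https" does not).
separate=$(grep -v -w http "$sample")

# Same options, bundled into one argument.
bundled=$(grep -vw http "$sample")

# Keep every line that contains "https" anywhere.
https_only=$(grep https "$sample")

printf -- '-v -w http:\n%s\n\nhttps:\n%s\n' "$separate" "$https_only"
rm -f "$sample"
```

The inverted whole-word search keeps /a and /d (no plain http anywhere), while grep https keeps /a and /c (https somewhere). Which one you want depends on whether a line with no URL at all should count.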