AWK Basics
`awk` is a great tool, especially once you know how to use it!
Let's warm up with some very simple examples.
1. Preparation
Raw Data
We need a file to work with. Run the following command to create it:
cat <<LOGS > test.log
2020-01-02 12:14:22 ERROR app1 I have a bad dream
2020-05-28 10:42:32 WARNING app1 I have a tea
2020-09-21 09:56:12 INFO app1 I eat an apple
2020-11-30 19:39:23 ERROR app1 I eat a bad banana
LOGS
2. Print
Let's print the date in each line:
awk '{print $1}' test.log
2020-01-02
2020-05-28
2020-09-21
2020-11-30
By default, awk splits each line on whitespace (runs of spaces and tabs), and `$1` means the first field.
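A quick way to see what the default separator does. This sketch uses made-up input from `printf` rather than test.log: runs of spaces and tabs count as a single separator, and leading blanks are skipped.

```shell
# Three spaces and a tab between fields: they act as separators, not as empty fields
printf 'a   b\tc\n' | awk '{print NF; print $2}'
# → 3
# → b

# Leading blanks are ignored, so $1 is "a", not an empty string
printf '   a b\n' | awk '{print $1}'
# → a
```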
Tips
`$0` means the whole row of data. Try it out!
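A small sketch contrasting `$0` with `$1`. One row of test.log is piped in with `echo` so the snippet runs on its own:

```shell
echo "2020-05-28 10:42:32 WARNING app1 I have a tea" | awk '{
  print "whole row:", $0     # $0 is the entire input line
  print "first field:", $1   # $1 is just the first field
}'
# → whole row: 2020-05-28 10:42:32 WARNING app1 I have a tea
# → first field: 2020-05-28
```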
Now let's try `printf`, which prints using a format string:
printf
awk '{printf "%s", $1}' test.log
You may notice that all the values collapse onto one line, because `printf` does not add a newline by itself:
2020-01-022020-05-282020-09-212020-11-30
Let's get the multi-line result back:
New Line at the End
awk '{printf "%s\n", $1}' test.log
2020-01-02
2020-05-28
2020-09-21
2020-11-30
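The difference comes down to what each statement writes after its arguments: `print` appends `ORS` (a newline by default), while `printf` writes only what its format says. A side-by-side sketch using two made-up rows instead of test.log:

```shell
input='2020-01-02 x
2020-05-28 y'

# printf: no newline unless the format contains one, so the values run together
printf '%s\n' "$input" | awk '{printf "%s", $1}'
echo    # end the line so the next output starts cleanly
# → 2020-01-022020-05-28

# print appends ORS, so these two commands produce the same output
printf '%s\n' "$input" | awk '{print $1}'
printf '%s\n' "$input" | awk '{printf "%s\n", $1}'
```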
You can also set the width of each field using `printf`:
Column Width and Alignment
awk '{printf "%-12s%s\n", $1,$2}' test.log
2020-01-02  12:14:22
2020-05-28  10:42:32
2020-09-21  09:56:12
2020-11-30  19:39:23
What does "%-12s" mean?
`%-12s` prints the value as a string padded to at least 12 characters, aligned to the left.
`%12s` pads the same way but aligns to the right.
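Brackets around each conversion make the padding visible. This sketch also shows that the width is a minimum: a value longer than the width (here `%-3s`) is printed in full rather than cut off.

```shell
# ERROR is 5 characters, so %-12s and %12s each add 7 spaces; %-3s adds none
echo ERROR | awk '{printf "[%-12s][%12s][%-3s]\n", $1, $1, $1}'
# → [ERROR       ][       ERROR][ERROR]
```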
3. FILENAME, NF, NR, FS, OFS, RS, ORS
There are lots of useful built-in variables in `awk`; let's explore some of them here. I just don't want to feed you too much at once in case it scares you (:P).
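| Variable | Description |
|---|---|
| FILENAME | Name of the current input file |
| NF | Number of fields in the current row |
| NR | Number of the current row, starting at 1 and counted across all input files |
| FS | Field separator (default: a single space, which splits on runs of spaces and tabs) |
| OFS | Output field separator (default: a single space) |
| RS | Row (record) separator (default: newline) |
| ORS | Output row (record) separator (default: newline) |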
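`RS` and `ORS` are not used in the examples that follow, so here is a tiny sketch of each with made-up input: `ORS` changes what `print` writes after every row, and `RS` changes where awk splits the input into rows.

```shell
# ORS: print ends each row with "," instead of a newline; END adds one final newline
printf 'ERROR\nWARNING\nINFO\n' | awk 'BEGIN { ORS = "," } { print $1 } END { printf "\n" }'
# → ERROR,WARNING,INFO,

# RS: split the input into rows at "," instead of at newlines
printf 'a,b,c' | awk 'BEGIN { RS = "," } { print NR, $0 }'
# → 1 a
# → 2 b
# → 3 c
```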
Let's try printing the filename, the number of fields, the row number and the date, with the output fields separated by a tab.
FILENAME, NF, NR, and $1
awk '{print FILENAME,NF,NR,$1}' OFS="\t" test.log
test.log 9 1 2020-01-02
test.log 8 2 2020-05-28
test.log 8 3 2020-09-21
test.log 9 4 2020-11-30
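`NR` is also handy in a pattern. The sketch below recreates test.log (same contents as in Preparation) and prints the same four columns for the second row only. It also shows `-v OFS='\t'`, which sets the variable before the program starts; the `OFS="\t"` operand used in the command above instead takes effect when awk reaches it in the argument list, that is, just before reading test.log.

```shell
# Recreate the sample file so this snippet runs on its own
cat <<LOGS > test.log
2020-01-02 12:14:22 ERROR app1 I have a bad dream
2020-05-28 10:42:32 WARNING app1 I have a tea
2020-09-21 09:56:12 INFO app1 I eat an apple
2020-11-30 19:39:23 ERROR app1 I eat a bad banana
LOGS

# The pattern NR == 2 limits the action to the second row
awk 'NR == 2 {print FILENAME, NF, NR, $1}' OFS="\t" test.log

# Same output, with OFS set through -v instead
awk -v OFS='\t' 'NR == 2 {print FILENAME, NF, NR, $1}' test.log
```

Both commands print `test.log`, `8`, `2` and `2020-05-28`, separated by tabs.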
Let's set `FS` (the field separator) to `-` and print the first and second fields.
Use `-` as the Field Separator
There are several ways to set `FS`; try any of the following:
awk 'BEGIN{FS="-"}{print $1, $2}' test.log
awk -F- '{print $1, $2}' test.log
awk -F - '{print $1, $2}' test.log
awk --field-separator=- '{print $1, $2}' test.log   # long option: GNU awk (gawk) only
As you will notice, with a different field separator the first and second fields are now different:
2020 01
2020 05
2020 09
2020 11
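With `-F-`, spaces are no longer separators at all, so everything after the second `-` ends up in the third field. A one-row sketch using the first line of test.log:

```shell
echo "2020-01-02 12:14:22 ERROR app1 I have a bad dream" | awk -F- '{print NF; print $3}'
# → 3
# → 02 12:14:22 ERROR app1 I have a bad dream
```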
4. Get the Log Contents
As the last task in this section, let's try something more complicated. It may not seem easy at first, but by the end of the tutorial it will look as easy as a primary school math problem.
You may have noticed that the format of our log file is not perfect. Printing only the message contents is not that straightforward: there are many fields, and the number of fields differs from row to row.
How do we do it? Let's make use of `NF`, `OFS`, `ORS` and a `for` loop.
Print the Log Contents
For each field from the fifth to the last, this command:
- prints the value of the field
- then prints the output field separator (`OFS`) if it is not the last field, or the output row separator (`ORS`) otherwise
awk '{for (i=5; i<=NF; i++) printf "%s%s", $i, (i<NF ? OFS : ORS)}' test.log
Here is the result:
I have a bad dream
I have a tea
I eat an apple
I eat a bad banana
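The same result can also be produced by building the message in a variable and printing it once, so that `print` supplies the final `ORS` instead of the ternary. This is an alternative sketch, not the command above; `msg` is an arbitrary variable name, and two sample rows are piped in so it runs on its own.

```shell
printf '%s\n' \
  '2020-01-02 12:14:22 ERROR app1 I have a bad dream' \
  '2020-05-28 10:42:32 WARNING app1 I have a tea' |
awk '{
  msg = $5                                      # start with the fifth field
  for (i = 6; i <= NF; i++) msg = msg OFS $i    # append each later field, joined by OFS
  print msg                                     # print adds ORS at the end
}'
# → I have a bad dream
# → I have a tea
```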
5. Summary
I think that is plenty for a basic session.
We have tried out:
- `print`, `printf`
- `FS`, `OFS`
- a slightly more complicated task combining `NF`, `OFS`, `ORS` and a `for` loop
We will tackle harder tasks later in the tutorial.