AWK Basics

awk is a great tool, especially when you know how to use it!

Let's warm up with some very simple examples.

1. Preparation

Raw Data

We need a file to work with. Run the following command to create it.

cat <<'LOGS' > test.log
2020-01-02 12:14:22 ERROR app1 I have a bad dream
2020-05-28 10:42:32 WARNING app1 I have a tea
2020-09-21 09:56:12 INFO app1 I eat an apple
2020-11-30 19:39:23 ERROR app1 I eat a bad banana
LOGS

2. Print

Let's print the date in each line:

print

awk '{print $1}' test.log

Here, awk uses its default field separator, whitespace (any run of spaces or tabs), and $1 means the first field.

Tips

$0 means the whole row (the complete, unsplit line). Try it out!
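As a small sketch of field selection, the commands below pipe one sample line into awk instead of reading test.log. The variable names line, picked, and whole are only for illustration.

```shell
line='2020-01-02 12:14:22 ERROR app1 I have a bad dream'

# $1 is the date and $3 the log level; the comma in print puts a space between them.
picked=$(echo "$line" | awk '{print $1, $3}')
echo "$picked"

# $0 is the whole, unsplit line.
whole=$(echo "$line" | awk '{print $0}')
echo "$whole"
```

The first command prints 2020-01-02 ERROR, and the second prints the line unchanged.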

Now let's try printf, which prints according to a format string:

printf

awk '{printf "%s", $1}' test.log

You will notice that all the values run together on one line, because printf does not add a newline by itself:

2020-01-022020-05-282020-09-212020-11-30

Let's get the multi-line result back:

New Line at the End

awk '{printf "%s\n", $1}' test.log

You can also set the width of the fields using printf:

Column Width and Alignment

awk '{printf "%-12s%s\n", $1,$2}' test.log

2020-01-02  12:14:22
2020-05-28  10:42:32
2020-09-21  09:56:12
2020-11-30  19:39:23
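The effect of the width and of the minus sign is easiest to see with both variants side by side. In this sketch the square brackets are added only to make the padding visible, and the sample line is piped in rather than read from test.log.

```shell
line='2020-01-02 12:14:22 ERROR app1 I have a bad dream'

# %-12s pads on the right (left-aligned); %12s pads on the left (right-aligned).
aligned=$(echo "$line" | awk '{printf "[%-12s][%12s]\n", $1, $2}')
echo "$aligned"
```

This prints [2020-01-02  ][    12:14:22]: the 10-character date gets two trailing spaces, and the 8-character time gets four leading spaces.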

What does "%-12s" mean?

"%-12s" prints the value as a string padded to at least 12 characters, aligned to the left.

"%12s" prints the same content, but aligned to the right.

3. FILENAME, NF, NR, FS, OFS, RS, ORS

There are lots of useful built-in variables in awk. Let's explore some of them here. I don't want to feed you too much at once in case it scares you (:P).

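Variable	Description
FILENAME	Name of the current input file
NF	Number of fields in the current row
NR	Number of the current row (counting from 1)
FS	Input field separator (default: a single space, which splits on runs of spaces and tabs)
OFS	Output field separator (default: space)
RS	Input row separator (default: newline)
ORS	Output row separator (default: newline)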

Let's print the filename, the number of fields, the row number, and the date, with the output fields separated by tabs.

FILENAME, NF, NR, and $1

awk '{print FILENAME,NF,NR,$1}' OFS="\t" test.log

test.log        9       1       2020-01-02
test.log        8       2       2020-05-28
test.log        8       3       2020-09-21
test.log        9       4       2020-11-30
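The table above also lists RS and ORS, which the example does not exercise. As a rough sketch, using made-up inline input instead of test.log (the names joined and rows are illustrative):

```shell
# ORS replaces the newline that print normally writes after each row.
joined=$(printf '%s\n' '2020-01-02 ERROR' '2020-05-28 WARNING' |
    awk 'BEGIN{ORS=" | "} {print $1}')
echo "$joined"

# RS changes what ends an input row: here rows are separated by ";" instead of newlines.
rows=$(printf 'a;b;c' | awk 'BEGIN{RS=";"} {print NR, $0}')
echo "$rows"
```

Note that ORS is written after every row, including the last one, so the first result ends with a trailing separator.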

Let's set FS (the field separator) to - and print the first and second fields.

Use - as the Field Separator

There are several ways to set FS; try any of the ones below. Note that the long --field-separator option is specific to GNU awk (gawk).

awk 'BEGIN{FS="-"}{print $1, $2}' test.log

awk -F- '{print $1, $2}' test.log

awk -F - '{print $1, $2}' test.log

awk --field-separator=- '{print $1, $2}' test.log

As you can see, changing the field separator changes what the first and second fields contain.
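To make the difference concrete, here is a small sketch that splits one sample line on - (piped in rather than read from test.log; the variable names are illustrative):

```shell
line='2020-01-02 12:14:22 ERROR app1 I have a bad dream'

# With - as the separator, $1 is the year and $2 is the month.
split=$(echo "$line" | awk -F- '{print $1, $2}')
echo "$split"

# Everything after the second - ends up in $3, spaces included.
rest=$(echo "$line" | awk -F- '{print $3}')
echo "$rest"
```

The first command prints 2020 01. The second prints 02 12:14:22 ERROR app1 I have a bad dream, because spaces no longer split fields once FS is -.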

4. Get the Log Contents

As the last task in this section, let's try something more complicated. It may not seem easy at first, but by the end of the whole tutorial it will look as easy as a primary school math problem.

You may have noticed that the format of our log file is not ideal. Printing only the message contents is not straightforward: there are many fields, and the number of fields differs from row to row.

How do we do it? Let's make use of NF, OFS, ORS, and a for loop.

Print the Log Contents

What this command does, for each field from the fifth to the last:

print the value of the field
if it is not the last field, print the output field separator (OFS); otherwise print the output row separator (ORS)

awk '{for (i=5; i<=NF; i++) printf "%s%s", $i, (i<NF ? OFS : ORS)}' test.log

Here is the result:

I have a bad dream
I have a tea
I eat an apple
I eat a bad banana
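If the one-liner feels dense, the same logic can be spelled out with an explicit if/else. This is only an expanded sketch of the command above, run on a single sample line piped in through echo; the variable names are illustrative.

```shell
line='2020-05-28 10:42:32 WARNING app1 I have a tea'

message=$(echo "$line" | awk '{
    # Fields 1 to 4 are the date, time, level, and app name; the message starts at field 5.
    for (i = 5; i <= NF; i++) {
        if (i < NF) {
            printf "%s%s", $i, OFS   # more fields follow: add a field separator
        } else {
            printf "%s%s", $i, ORS   # last field: end the row
        }
    }
}')
echo "$message"
```

This prints I have a tea. A row with fewer than five fields produces no output at all, since the loop body never runs.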

5. Summary

I think that is enough for a basics session.

We have tried out:

print, printf
FS, OFS
a slightly more complicated task combining NF, OFS, ORS, and a for loop.

We will tackle some harder tasks later in the tutorial.