
AWK Basics

awk is a great tool, especially when you know how to use it!

Let's warm up with some very simple examples.

1. Preparation

Raw Data

We need a file to work with. Run the following command to create it:

cat <<LOGS > test.log
2020-01-02 12:14:22 ERROR app1 I have a bad dream
2020-05-28 10:42:32 WARNING app1 I have a tea
2020-09-21 09:56:12 INFO app1 I eat an apple
2020-11-30 19:39:23 ERROR app1 I eat a bad banana
LOGS

2. Print

Let's print the date in each line:

print

awk '{print $1}' test.log
2020-01-02
2020-05-28
2020-09-21
2020-11-30

Here, awk uses whitespace (runs of spaces and tabs) as the default field separator, and $1 means the first field.

Tips

$0 means the whole row of data. Try it out!
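
As a quick illustration of the tip above, here is a minimal sketch that prints the whole row and then a couple of individual fields. The variable name line is only an assumption for this example, so that the commands do not depend on test.log.

```shell
# Hypothetical one-line sample, so the commands do not depend on test.log.
line='2020-01-02 12:14:22 ERROR app1 I have a bad dream'

# $0 is the whole row, printed unchanged.
printf '%s\n' "$line" | awk '{print $0}'

# Individual fields can be printed in any order.
printf '%s\n' "$line" | awk '{print $3, $1}'
# ERROR 2020-01-02
```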

Now let's try printf, which prints its arguments according to a format string:

printf

awk '{printf "%s", $1}' test.log

You will find that all the values collapse onto one line, because printf does not add a newline by itself:

2020-01-022020-05-282020-09-212020-11-30

Let's get the multi-line result back:

New Line at the End

awk '{printf "%s\n", $1}' test.log
2020-01-02
2020-05-28
2020-09-21
2020-11-30

You can also set the width of the fields using printf:

Column Width and Alignment

awk '{printf "%-12s%s\n", $1,$2}' test.log
2020-01-02  12:14:22
2020-05-28  10:42:32
2020-09-21  09:56:12
2020-11-30  19:39:23
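
The padding is easier to see with visible delimiters. The following sketch prints the same date in a left-aligned and a right-aligned 12-character column; the square brackets and the variable name line are added here only for illustration.

```shell
# Hypothetical one-line sample, so the command does not depend on test.log.
line='2020-01-02 12:14:22 ERROR app1 I have a bad dream'

# %-12s pads on the right (left-aligned); %12s pads on the left (right-aligned).
printf '%s\n' "$line" | awk '{printf "[%-12s][%12s]\n", $1, $1}'
# [2020-01-02  ][  2020-01-02]
```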

What does "%-12s" mean?

"%-12s" prints the value as a string padded to at least 12 characters and aligned to the left.

"%12s" prints the same content but aligned to the right.

3. FILENAME, NF, NR, FS, OFS, RS, ORS

awk has many useful built-in variables; let's explore some of them here. I don't want to feed you too many at once in case it scares you (:P).

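Variable   Description
FILENAME   Name of the current input file
NF         Number of fields in the current row
NR         Number of rows read so far (the current row number)
FS         Field separator (default: a single space, which splits on runs of spaces and tabs)
OFS        Output field separator (default: space)
RS         Row (record) separator (default: newline)
ORS        Output row (record) separator (default: newline)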
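
ORS is the one variable in this list that is easy to overlook, so here is a minimal sketch of it. It joins the row number and log level of each row with a semicolon instead of a newline. The variable name logs and the choice of ";" are assumptions made for this example only.

```shell
# Two hypothetical sample rows, so the command does not depend on test.log.
logs='2020-01-02 12:14:22 ERROR app1 I have a bad dream
2020-05-28 10:42:32 WARNING app1 I have a tea'

# print ends every row with ORS, so all rows land on one line.
# The END block adds the final newline that ORS no longer provides.
printf '%s\n' "$logs" | awk 'BEGIN { ORS = ";" } { print NR, $3 } END { printf "\n" }'
# 1 ERROR;2 WARNING;
```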

Let's print the filename, the number of fields, the row number, and the date, with the fields separated by tabs.

FILENAME, NF, NR, and $1

awk '{print FILENAME,NF,NR,$1}' OFS="\t" test.log
test.log        9       1       2020-01-02
test.log        8       2       2020-05-28
test.log        8       3       2020-09-21
test.log        9       4       2020-11-30

Let's set FS (the field separator) to - and print the first and second fields.

Use - as the Field Separator

There are several ways to set FS, and any of the following works (the long --field-separator option is specific to GNU awk):

awk 'BEGIN{FS="-"}{print $1, $2}' test.log
awk -F- '{print $1, $2}' test.log
awk -F - '{print $1, $2}' test.log
awk --field-separator=- '{print $1, $2}' test.log

Because the field separator has changed, the first and second fields are now different:

2020 01
2020 05
2020 09
2020 11
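
Input and output separators can also be set together with the standard -v option. The sketch below splits on - and joins the printed fields with /; the variable name line and the choice of / are assumptions for illustration only.

```shell
# Hypothetical one-line sample, so the command does not depend on test.log.
line='2020-01-02 12:14:22 ERROR app1 I have a bad dream'

# -v assigns a variable before the program starts:
# FS controls how the input is split, OFS how print joins its arguments.
printf '%s\n' "$line" | awk -v FS=- -v OFS=/ '{print $1, $2}'
# 2020/01
```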

4. Get the Log Contents

As the last task in this section, let's try something more complicated. It may not seem easy at first, but by the end of the whole tutorial it will look as easy as a primary school math problem.

You may have noticed that the format of our log file is not ideal. Printing only the message is not straightforward: there are many fields, and the number of fields differs from row to row.

How do we do it? Let's make use of NF, OFS, ORS, and a for loop.

Print the Log Contents

For each field from the fifth to the last, this command does the following:

  • print the value of the field
  • if it is not the last field, print the output field separator (OFS); otherwise print the output row separator (ORS)

awk '{for (i=5; i<=NF; i++) printf "%s%s", $i, (i<NF ? OFS : ORS)}' test.log

Here is the result:

I have a bad dream
I have a tea
I eat an apple
I eat a bad banana
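
The same loop may be easier to read when spread over several lines. This is only a sketch of the one-liner above with comments added; the variable names line and sep are introduced here for illustration.

```shell
# Hypothetical one-line sample, so the command does not depend on test.log.
line='2020-11-30 19:39:23 ERROR app1 I eat a bad banana'

printf '%s\n' "$line" | awk '{
    # Fields 1 to 4 are date, time, level and app name, so start at 5.
    for (i = 5; i <= NF; i++) {
        # Between fields use OFS; after the last field use ORS.
        sep = (i < NF) ? OFS : ORS
        printf "%s%s", $i, sep
    }
}'
# I eat a bad banana
```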

5. Summary

I think that is enough for a basic session.

We have tried out:

  • print, printf
  • FS, OFS
  • a slightly more complicated task combining NF, OFS, ORS, and a for loop

We will tackle some harder tasks later in the tutorial.