
Weasley's Wizard Wheezes Orders: Harry Potter

Here are some order records to work with:

Name                 Item                Price(Galleon)    Quantity     Date
Harry Potter         Invisibility Cloak  1000              1            2020-01-01
Tom Riddle           Diary Book          1                 2            2020-01-01
Albus Dumbledore     Chocolate Frogs     2                 2            2020-01-02
Gellert Grindelwald  Elder Wand          1000              1            2020-01-01
Lord Voldemort       Cauldron            5                 1            2020-01-02
Lord Voldemort       Wormtail            1.99              1            2020-01-01
Lord Voldemort       Nagini              10.99             1            2020-01-02
Lord Voldemort       Jetpack             0.99              100          2020-01-01
Lord Voldemort       Horcrux             2                 6            2020-01-01
Albus Dumbledore     Butterbeer          1                 2            2020-01-01
Harry Potter         Resurrection Store  1000              1            2020-01-02
Lord Voldemort       Love Portions       5                 100          2020-01-01
Harry Potter         Portable Swamps     3                 5            2020-01-02
Albus Dumbledore     Socks               1.5               2            2020-01-01

1. Preparation

Raw Data

Run the following code to generate the raw data.

cat <<ORDERS > order.txt
Name                 Item                Price(Galleon)    Quantity     Date
Harry Potter         Invisibility Cloak  1000              1            2020-01-01
Tom Riddle           Diary Book          1                 2            2020-01-01
Albus Dumbledore     Chocolate Frogs     2                 2            2020-01-02
Gellert Grindelwald  Elder Wand          1000              1            2020-01-01
Lord Voldemort       Cauldron            5                 1            2020-01-02
Lord Voldemort       Wormtail            1.99              1            2020-01-01
Lord Voldemort       Nagini              10.99             1            2020-01-02
Lord Voldemort       Jetpack             0.99              100          2020-01-01
Lord Voldemort       Horcrux             100               6            2020-01-01
Albus Dumbledore     Butterbeer          1                 2            2020-01-01
Harry Potter         Resurrection Store  1000              1            2020-01-02
Lord Voldemort       Love Portions       5                 100          2020-01-01
Harry Potter         Portable Swamps     3                 5            2020-01-02
Albus Dumbledore     Socks               1.5               2            2020-01-01
ORDERS

2. Fixed Width Fields

As you may have noticed, every field in this data has a fixed width.

Ubuntu

If you are using Ubuntu/Debian, the default awk may be mawk. FIELDWIDTHS is a gawk extension, so the examples below may not work properly with mawk.

You can install gawk and, optionally, make it the default awk alternative:

sudo apt-get install gawk
sudo update-alternatives --config awk

gawk provides a variable, FIELDWIDTHS, in which you specify the width of each field.

Fixed Width

gawk 'BEGIN  { FIELDWIDTHS = "21 20 18 13 10" } {print $1, $2}' order.txt
Name                  Item
Harry Potter          Invisibility Cloak
Tom Riddle            Diary Book
Albus Dumbledore      Chocolate Frogs
Gellert Grindelwald   Elder Wand
Lord Voldemort        Cauldron
Lord Voldemort        Wormtail
Lord Voldemort        Nagini
Lord Voldemort        Jetpack
Lord Voldemort        Horcrux
Albus Dumbledore      Butterbeer
Harry Potter          Resurrection Store
Lord Voldemort        Love Portions
Harry Potter          Portable Swamps
Albus Dumbledore      Socks
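
If installing gawk is not an option, the same columns can be sliced with substr(), which is part of POSIX awk. The following is only a sketch: the offsets 1 and 22 are derived from the widths "21 20" used above, the sample line is rebuilt with printf so its padding is exact, and trim() is a hypothetical helper that is not part of the original scripts.

```shell
# Build one record with the same column widths as order.txt (21 20 18 13 10).
line=$(printf '%-21s%-20s%-18s%-13s%s' 'Harry Potter' 'Invisibility Cloak' 1000 1 2020-01-01)

fields=$(printf '%s\n' "$line" | awk '
# trim() is a hypothetical helper: it strips leading and trailing blanks.
function trim(s) {
    sub(/^ +/, "", s)
    sub(/ +$/, "", s)
    return s
}
{
    name = trim(substr($0, 1, 21))    # columns 1-21
    item = trim(substr($0, 22, 20))   # columns 22-41
    print name "|" item
}')

printf '%s\n' "$fields"
```

Unlike FIELDWIDTHS, this approach leaves $1, $2, and so on split on whitespace, so the script has to use the sliced variables instead of the numbered fields.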

More on FIELDWIDTHS (for gawk version >= 4.2)

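Sometimes you may want to skip several characters before a field.

With gawk 4.2 or later, you can prefix a width with a skip count and a colon. Here 5:10 skips 5 characters and then reads up to 10, so $5 holds only the month and day of the date:

gawk 'BEGIN  { FIELDWIDTHS = "21 20 18 13 5:10" } {print $1, $5}' order.txt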

You may also have trailing data of unknown length.

With gawk 4.2 or later, a final * makes the last field take the rest of the record:

gawk 'BEGIN  { FIELDWIDTHS = "21 20 18 *" } {print $4}' order.txt

3. Variables and Functions

3.1 Variable

awk uses -v for variable assignment.

Variables

awk -v var1=value1 -v var2=value2 '{ print var1, var2 }' some-data.txt
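
One detail worth seeing in action is when the assignment takes effect. The sketch below uses a made-up variable who and contrasts -v with an assignment written as a plain operand after the program text. It assumes only standard awk behavior, so any awk should do.

```shell
# -v assignments are made before BEGIN runs, so BEGIN can already read them.
greeting=$(awk -v who="Harry" 'BEGIN { print "Hello, " who }')
printf '%s\n' "$greeting"

# An assignment given as an operand (no -v) is processed only when awk reaches
# it while walking the file arguments, so BEGIN still sees an empty value.
late=$(awk 'BEGIN { print "[" who "]" }' who="Harry")
printf '%s\n' "$late"
```

This is why the scripts in this tutorial pass their parameters with -v: the values are available everywhere, including the BEGIN block.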

3.2 Function

The structure of an awk function is straightforward. Below is an example.

Function

# some.awk
function foo(bar) {
    print bar
}
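
Functions can also return a value and keep their own local variables. The sketch below is an illustration only: repeat() is a hypothetical function, and listing out and i after the real parameters is the usual awk convention for declaring locals, since awk has no other way to do so.

```shell
result=$(awk '
# repeat(ch, n): return ch repeated n times.
# Parameters listed after n (out, i) are never passed by the caller;
# awk treats them as local variables.
function repeat(ch, n,    out, i) {
    out = ""
    for (i = 0; i < n; i++)
        out = out ch
    return out
}
BEGIN {
    i = 99                      # global i, untouched by repeat()
    print repeat("-", 10), i
}')

printf '%s\n' "$result"
```

Without the extra parameter, the loop counter would be a global variable and could clobber an i used elsewhere in the program.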

4. Summary by Name

Save the following code as name-filter.awk.

name-filter.awk

# Print a horizontal rule of n dashes. The extra parameter i is a local variable.
function print_line(n,    i) {
    for (i = 0; i < n; i++) {
        printf "%s", "-"
    }
    printf "\n"
}
BEGIN {
    FIELDWIDTHS = "21 20 18 13 10"
    print_line(82)
}
{
    if (NR == 1) {
        print $0
        print_line(82)
    }
    if (NR > 1 && $1 ~ name) {
        print $0
        sum += $4 * $3
    }
}
END {
    print_line(82)
    printf "%-21s%21s\n", "Name", "Total Amount(Galleon)"
    printf "%-21s%21s\n", name, sum
    print_line(42)
}

The script contains one function, print_line. It is deliberately simple and is only there to illustrate the concept.

You may also have noticed that the variable name is never initialized in the script. We pass it in with -v when we run the command. Because the script tests $1 ~ name, the value is treated as a regular expression, so "Harry" matches "Harry Potter".

Harry Potter's Orders

gawk -v name="Harry" -f name-filter.awk order.txt
----------------------------------------------------------------------------------
Name                 Item                Price(Galleon)    Quantity     Date
----------------------------------------------------------------------------------
Harry Potter         Invisibility Cloak  1000              1            2020-01-01
Harry Potter         Resurrection Store  1000              1            2020-01-02
Harry Potter         Portable Swamps     3                 5            2020-01-02
----------------------------------------------------------------------------------
Name                 Total Amount(Galleon)
Harry                                 2015
------------------------------------------
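
Since name is matched as a regular expression, a short value can select more rows than intended. The sketch below contrasts a regex match with an exact comparison on a made-up three-line input. It compares against $0 because these sample lines carry no padding; with FIELDWIDTHS, $1 includes trailing spaces and would need trimming before an equality test.

```shell
names='Harry Potter
Tom Riddle
Albus Dumbledore'

# Regex match, as in name-filter.awk: "d" occurs in two of the three names.
regex_hits=$(printf '%s\n' "$names" |
    awk -v name="d" '$0 ~ name { n++ } END { print n + 0 }')

# Exact comparison: only the line that equals the whole name is counted.
exact_hits=$(printf '%s\n' "$names" |
    awk -v name="Tom Riddle" '$0 == name { n++ } END { print n + 0 }')

printf 'regex: %s, exact: %s\n' "$regex_hits" "$exact_hits"
```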

5. Summary by Name and Date

Save the following code as name-date-filter.awk.

name-date-filter.awk

# Print a horizontal rule of n dashes. The extra parameter i is a local variable.
function print_line(n,    i) {
    for (i = 0; i < n; i++) {
        printf "%s", "-"
    }
    printf "\n"
}
BEGIN {
    FIELDWIDTHS = "21 20 18 13 10"
    print_line(82)
}
{
    if (NR == 1) {
        print $0
        print_line(82)
    }
    if (NR > 1 && $1 ~ name) {
        if (startDate != "" && $5 < startDate)
            next
        if (endDate != "" && $5 > endDate)
            next
        print $0
        sum += $4 * $3
    }
}
END {
    print_line(82)
    printf "%-21s%21s\n", "Name", "Total Amount(Galleon)"
    printf "%-21s%21s\n", name, sum
    print_line(42)
}

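This script adds two optional variables, startDate and endDate, which are also passed in with -v. Rows dated before startDate or after endDate are skipped. Leave either one unset to drop that bound.

The dates are compared as strings. This works because they are written as zero-padded YYYY-MM-DD, which sorts in chronological order. If name is left unset, it matches every row.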

Harry Potter's Orders

gawk -v name="Harry" -v startDate="2020-01-02" -v endDate="2020-01-03" -f name-date-filter.awk order.txt
----------------------------------------------------------------------------------
Name                 Item                Price(Galleon)    Quantity     Date
----------------------------------------------------------------------------------
Harry Potter         Resurrection Store  1000              1            2020-01-02
Harry Potter         Portable Swamps     3                 5            2020-01-02
----------------------------------------------------------------------------------
Name                 Total Amount(Galleon)
Harry                                 1015
------------------------------------------

All Orders in the Date Range

gawk -v startDate="2020-01-02" -v endDate="2020-01-03" -f name-date-filter.awk order.txt
----------------------------------------------------------------------------------
Name                 Item                Price(Galleon)    Quantity     Date
----------------------------------------------------------------------------------
Albus Dumbledore     Chocolate Frogs     2                 2            2020-01-02
Lord Voldemort       Cauldron            5                 1            2020-01-02
Lord Voldemort       Nagini              10.99             1            2020-01-02
Harry Potter         Resurrection Store  1000              1            2020-01-02
Harry Potter         Portable Swamps     3                 5            2020-01-02
----------------------------------------------------------------------------------
Name                 Total Amount(Galleon)
                                1034.99
------------------------------------------
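
The date filter in name-date-filter.awk relies on plain string comparison. A minimal sketch of the same test on its own, assuming the dates are always zero-padded YYYY-MM-DD (the four sample dates are made up for illustration):

```shell
in_range=$(printf '%s\n' 2019-12-31 2020-01-01 2020-01-02 2020-01-03 |
    awk -v startDate="2020-01-01" -v endDate="2020-01-02" '
    # Zero-padded YYYY-MM-DD strings sort the same way as the dates they name,
    # so string comparison is enough. An empty bound means "no limit".
    (startDate == "" || $1 >= startDate) && (endDate == "" || $1 <= endDate)
    ')

printf '%s\n' "$in_range"
```

A format such as 1/2/2020 would not sort this way and would need to be converted before comparing.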

In the next tutorial, we will run these programs against 5 million records!