Skip to content

Weasley's Wizard Wheezes Orders Continued

Let's continue with our last tutorial, but this time, we have 5 million data in 7 years range.

1. Preparation

Test Data

The test data for this section can be find at our Underneathall GitHub, weasleys-wizard-wheezes-order.zip.

It contains 5,000,000 Weasleys' Wizard Wheezes orders in year 2014 to 2020. Download the data and unzip it.

2. Name and Date Filter

Continue with our previous program:

name-date-filter.awk

function print_line(n) {
    for (i=0; i<n; i++){
        printf "%s", "-"
    }
    printf "\n"
}
BEGIN {
    FIELDWIDTHS = "21 20 18 13 10"
    print_line(82)
}
{
    if (NR == 1) {
        print $0
        print_line(82)
    }
    if (NR > 1 && $1 ~ name) {
        if (startDate != "" && $5 < startDate)
            next
        if (endDate != "" && $5 > endDate)
            next
        print $0
        sum += $4 * $3
    }
}
END {
    print_line(82)
    printf "%-21s%21s\n", "Name", "Total Amount(Galleon)"
    printf "%-21s%21s\n", name, sum
    print_line(42)
}

5m Orders

gawk -v name="Harry" \
    -v startDate="2020-01-02" \
    -v endDate="2020-01-02" \
    -f name-date-filter.awk \
    weasleys-wizard-wheezes-order.txt
----------------------------------------------------------------------------------
Name                 Item                Price(Galleon)    Quantity     Date
----------------------------------------------------------------------------------
Harry Potter         Deflagration Deluxe 3.99              1            2020-01-02
Harry Potter         Skiving Snackbox    2.99              2            2020-01-02
Harry Potter         Fainting Fancy      4.99              8            2020-01-02
Harry Potter         Darkness Powder     4.99              4            2020-01-02
Harry Potter         Hats, Headless      6.99              9            2020-01-02
...
...
Harry Potter         Nosebleed Nougat    0.99              4            2020-01-02
Harry Potter         Pygmy Puffs         1.99              4            2020-01-02
Harry Potter         wands, trick        5.99              10           2020-01-02
Harry Potter         Fainting Fancy      4.99              5            2020-01-02
----------------------------------------------------------------------------------
Name                 Total Amount(Galleon)
Harry                               607.39
------------------------------------------

3. Item Count

Sometimes, it would be useful to know how many items are ordered

items-count-filter.awk

function print_line(n) {
    for (i=0; i<n; i++){
        printf "%s", "-"
    }
    printf "\n"
}
BEGIN {
    FIELDWIDTHS = "21 20 18 13 10"
    print_line(82)
}
{
    if (NR == 1) {
        print $0
        print_line(82)
    }
    if (NR > 1 && $1 ~ name) {
        if (startDate != "" && $5 < startDate)
            next
        if (endDate != "" && $5 > endDate)
            next
        print $0
        sum += $4 * $3
        items += $4
    }
}
END {
    print_line(82)
    printf "%-21s%8s%25s\n", "Name", "Items", "Total Amount(Galleon)"
    printf "%-21s%8d%25.2f\n", name, items, sum
    print_line(54)
}

Items Count

gawk -v name="Harry" \
    -v startDate="2020-01-02" \
    -v endDate="2020-01-02" \
    -f items-count-filter.awk \
    weasleys-wizard-wheezes-order.txt
----------------------------------------------------------------------------------
Name                 Item                Price(Galleon)    Quantity     Date
----------------------------------------------------------------------------------
Harry Potter         Deflagration Deluxe 3.99              1            2020-01-02
Harry Potter         Skiving Snackbox    2.99              2            2020-01-02
Harry Potter         Fainting Fancy      4.99              8            2020-01-02
Harry Potter         Darkness Powder     4.99              4            2020-01-02
Harry Potter         Hats, Headless      6.99              9            2020-01-02
...
...
Harry Potter         Nosebleed Nougat    0.99              4            2020-01-02
Harry Potter         Pygmy Puffs         1.99              4            2020-01-02
Harry Potter         wands, trick        5.99              10           2020-01-02
Harry Potter         Fainting Fancy      4.99              5            2020-01-02
----------------------------------------------------------------------------------
Name                    Items    Total Amount(Galleon)
Harry                     161                   607.39
------------------------------------------------------

4. Product Filter

Save the code below to weasleys-wizard-wheezes-filter.awk

weasleys-wizard-wheezes-filter.awk

function print_line(n) {
    for (i=0; i<n; i++){
        printf "%s", "-"
    }
    printf "\n"
}
BEGIN {
    FIELDWIDTHS = "21 20 18 13 10"
    print_line(82)
}
{
    if (NR == 1) {
        print $0
        print_line(82)
    }
    if (NR > 1 && $1 ~ name && $2 ~ product) {
        if (startDate != "" && $5 < startDate)
            next
        if (endDate != "" && $5 > endDate)
            next
        print $0
        sum += $4 * $3
        items += $4
    }
}
END {
    print_line(82)
    printf "%-21s%8s%25s\n", "Name", "Items", "Total Amount(Galleon)"
    printf "%-21s%8d%25.2f\n", name, items, sum
    print_line(54)
}

Let's give it a run!

Harry's Swamp Purchase

gawk -v name="Harry" \
    -v product="Swamp" \
    -v startDate="2020-01-01" \
    -v endDate="2020-01-01" \
    -f weasleys-wizard-wheezes-filter.awk \
    weasleys-wizard-wheezes-order.txt
----------------------------------------------------------------------------------
Name                 Item                Price(Galleon)    Quantity     Date
----------------------------------------------------------------------------------
Harry Potter         Portable Swamp      1.99              3            2020-01-01
Harry Potter         Portable Swamp      1.99              3            2020-01-01
----------------------------------------------------------------------------------
Name                    Items    Total Amount(Galleon)
Harry                       6                    11.94
------------------------------------------------------

5. Summary

You might have noticed, we have printed out the order details for every row, which will have duplicate items. However, sometimes we just want a summary of all the goods bought with the quantities summed up.

Let's tweak a bit of our existing program and save to weasleys-wizard-wheezes-summary.awk

weasleys-wizard-wheezes-summary.awk

function print_line(n) {
    for (i=0; i<n; i++){
        printf "%s", "-"
    }
    printf "\n"
}
BEGIN {
    FIELDWIDTHS = "21 20 18 13 10"
    print_line(71)
}
{
    if (NR > 1 && $1 ~ name && $2 ~ product) {
        if (startDate != "" && $5 < startDate)
            next
        if (endDate != "" && $5 > endDate)
            next
        sum += $4 * $3
        items += $4
        quantity[$2] += $4
        !price[$2] && price[$2] = $3
    }
}
END {
    printf "%-20s%18s%13s%20s\n", "Item", "Price(Galleon)", "Quantity", "Amount(Galleon)"
    print_line(71)
    for (item in quantity){
        printf "%-20s%18.2f%13d%20.2f\n", item, price[item], quantity[item], price[item] * quantity[item]
    }
    print_line(71)
    printf "%-21s%8s%25s\n", "Name", "Items", "Total Amount(Galleon)"
    printf "%-21s%8d%25.2f\n", name, items, sum
    print_line(54)
}

Weasley's Wizard Wheezes Order Summery

gawk -v name="Harry Potter" \
    -v startDate="2020-01-01" \
    -v endDate="2020-01-15" \
    -f weasleys-wizard-wheezes-summary.awk \
    weasleys-wizard-wheezes-order.txt
-----------------------------------------------------------------------
Item                    Price(Galleon)     Quantity     Amount(Galleon)
-----------------------------------------------------------------------
Fever Fudge                       6.99          138              964.62
Muggle magic tricks               3.99           97              387.03
Pygmy Puffs                       1.99          100              199.00
Hats, Headless                    6.99           45              314.55
Hats, Shield                      6.99          121              845.79
Extendable Ears                   6.99           81              566.19
Ministry.                         1.99           89              177.11
Quills                            1.99           54              107.46
Ton-Tongue Toffees                2.99           99              296.01
U-No-Poo                          2.99          156              466.44
Deflagration Deluxe               3.99           99              395.01
Portable Swamp                    1.99          106              210.94
wands, trick                      5.99           85              509.15
Skiving Snackbox                  2.99           90              269.10
Basic Blaze Box                   0.99           73               72.27
Darkness Powder                   4.99           93              464.07
Fainting Fancy                    4.99           95              474.05
Daydream Charms                   0.99           87               86.13
wands, fake                       3.99           64              255.36
Puking Pastilles                  0.99          102              100.98
Punching telescope                1.99          119              236.81
Nosebleed Nougat                  0.99           62               61.38
Hangman, Reusable                 3.99           82              327.18
Wildfire Whiz-Bangs               2.99           73              218.27
Canary Creams                     6.99           63              440.37
Love potions                      6.99           93              650.07
-----------------------------------------------------------------------
Name                    Items    Total Amount(Galleon)
Harry Potter             2366                  9095.34
------------------------------------------------------