Weasley's Wizard Wheezes Orders Continued¶
Let's continue with our last tutorial, but this time, we have 5 million data in 7 years range.
1. Preparation¶
Test Data
The test data for this section can be find at our Underneathall GitHub, weasleys-wizard-wheezes-order.zip
.
It contains 5,000,000 Weasleys' Wizard Wheezes
orders in year 2014
to 2020
. Download the data and unzip
it.
2. Name and Date Filter¶
Continue with our previous program:
name-date-filter.awk
function print_line(n) {
for (i=0; i<n; i++){
printf "%s", "-"
}
printf "\n"
}
BEGIN {
FIELDWIDTHS = "21 20 18 13 10"
print_line(82)
}
{
if (NR == 1) {
print $0
print_line(82)
}
if (NR > 1 && $1 ~ name) {
if (startDate != "" && $5 < startDate)
next
if (endDate != "" && $5 > endDate)
next
print $0
sum += $4 * $3
}
}
END {
print_line(82)
printf "%-21s%21s\n", "Name", "Total Amount(Galleon)"
printf "%-21s%21s\n", name, sum
print_line(42)
}
5m Orders
gawk -v name="Harry" \
-v startDate="2020-01-02" \
-v endDate="2020-01-02" \
-f name-date-filter.awk \
weasleys-wizard-wheezes-order.txt
----------------------------------------------------------------------------------
Name Item Price(Galleon) Quantity Date
----------------------------------------------------------------------------------
Harry Potter Deflagration Deluxe 3.99 1 2020-01-02
Harry Potter Skiving Snackbox 2.99 2 2020-01-02
Harry Potter Fainting Fancy 4.99 8 2020-01-02
Harry Potter Darkness Powder 4.99 4 2020-01-02
Harry Potter Hats, Headless 6.99 9 2020-01-02
...
...
Harry Potter Nosebleed Nougat 0.99 4 2020-01-02
Harry Potter Pygmy Puffs 1.99 4 2020-01-02
Harry Potter wands, trick 5.99 10 2020-01-02
Harry Potter Fainting Fancy 4.99 5 2020-01-02
----------------------------------------------------------------------------------
Name Total Amount(Galleon)
Harry 607.39
------------------------------------------
3. Item Count¶
Sometimes, it would be useful to know how many items are ordered
items-count-filter.awk
function print_line(n) {
for (i=0; i<n; i++){
printf "%s", "-"
}
printf "\n"
}
BEGIN {
FIELDWIDTHS = "21 20 18 13 10"
print_line(82)
}
{
if (NR == 1) {
print $0
print_line(82)
}
if (NR > 1 && $1 ~ name) {
if (startDate != "" && $5 < startDate)
next
if (endDate != "" && $5 > endDate)
next
print $0
sum += $4 * $3
items += $4
}
}
END {
print_line(82)
printf "%-21s%8s%25s\n", "Name", "Items", "Total Amount(Galleon)"
printf "%-21s%8d%25.2f\n", name, items, sum
print_line(54)
}
Items Count
gawk -v name="Harry" \
-v startDate="2020-01-02" \
-v endDate="2020-01-02" \
-f items-count-filter.awk \
weasleys-wizard-wheezes-order.txt
----------------------------------------------------------------------------------
Name Item Price(Galleon) Quantity Date
----------------------------------------------------------------------------------
Harry Potter Deflagration Deluxe 3.99 1 2020-01-02
Harry Potter Skiving Snackbox 2.99 2 2020-01-02
Harry Potter Fainting Fancy 4.99 8 2020-01-02
Harry Potter Darkness Powder 4.99 4 2020-01-02
Harry Potter Hats, Headless 6.99 9 2020-01-02
...
...
Harry Potter Nosebleed Nougat 0.99 4 2020-01-02
Harry Potter Pygmy Puffs 1.99 4 2020-01-02
Harry Potter wands, trick 5.99 10 2020-01-02
Harry Potter Fainting Fancy 4.99 5 2020-01-02
----------------------------------------------------------------------------------
Name Items Total Amount(Galleon)
Harry 161 607.39
------------------------------------------------------
4. Product Filter¶
Save the code below to weasleys-wizard-wheezes-filter.awk
weasleys-wizard-wheezes-filter.awk
function print_line(n) {
for (i=0; i<n; i++){
printf "%s", "-"
}
printf "\n"
}
BEGIN {
FIELDWIDTHS = "21 20 18 13 10"
print_line(82)
}
{
if (NR == 1) {
print $0
print_line(82)
}
if (NR > 1 && $1 ~ name && $2 ~ product) {
if (startDate != "" && $5 < startDate)
next
if (endDate != "" && $5 > endDate)
next
print $0
sum += $4 * $3
items += $4
}
}
END {
print_line(82)
printf "%-21s%8s%25s\n", "Name", "Items", "Total Amount(Galleon)"
printf "%-21s%8d%25.2f\n", name, items, sum
print_line(54)
}
Let's give it a run!
Harry's Swamp Purchase
gawk -v name="Harry" \
-v product="Swamp" \
-v startDate="2020-01-01" \
-v endDate="2020-01-01" \
-f weasleys-wizard-wheezes-filter.awk \
weasleys-wizard-wheezes-order.txt
----------------------------------------------------------------------------------
Name Item Price(Galleon) Quantity Date
----------------------------------------------------------------------------------
Harry Potter Portable Swamp 1.99 3 2020-01-01
Harry Potter Portable Swamp 1.99 3 2020-01-01
----------------------------------------------------------------------------------
Name Items Total Amount(Galleon)
Harry 6 11.94
------------------------------------------------------
5. Summary¶
You might have noticed, we have printed out the order details for every row, which will have duplicate items. However, sometimes we just want a summary of all the goods bought with the quantities summed up.
Let's tweak a bit of our existing program and save to weasleys-wizard-wheezes-summary.awk
weasleys-wizard-wheezes-summary.awk
function print_line(n) {
for (i=0; i<n; i++){
printf "%s", "-"
}
printf "\n"
}
BEGIN {
FIELDWIDTHS = "21 20 18 13 10"
print_line(71)
}
{
if (NR > 1 && $1 ~ name && $2 ~ product) {
if (startDate != "" && $5 < startDate)
next
if (endDate != "" && $5 > endDate)
next
sum += $4 * $3
items += $4
quantity[$2] += $4
!price[$2] && price[$2] = $3
}
}
END {
printf "%-20s%18s%13s%20s\n", "Item", "Price(Galleon)", "Quantity", "Amount(Galleon)"
print_line(71)
for (item in quantity){
printf "%-20s%18.2f%13d%20.2f\n", item, price[item], quantity[item], price[item] * quantity[item]
}
print_line(71)
printf "%-21s%8s%25s\n", "Name", "Items", "Total Amount(Galleon)"
printf "%-21s%8d%25.2f\n", name, items, sum
print_line(54)
}
Weasley's Wizard Wheezes Order Summery
gawk -v name="Harry Potter" \
-v startDate="2020-01-01" \
-v endDate="2020-01-15" \
-f weasleys-wizard-wheezes-summary.awk \
weasleys-wizard-wheezes-order.txt
-----------------------------------------------------------------------
Item Price(Galleon) Quantity Amount(Galleon)
-----------------------------------------------------------------------
Fever Fudge 6.99 138 964.62
Muggle magic tricks 3.99 97 387.03
Pygmy Puffs 1.99 100 199.00
Hats, Headless 6.99 45 314.55
Hats, Shield 6.99 121 845.79
Extendable Ears 6.99 81 566.19
Ministry. 1.99 89 177.11
Quills 1.99 54 107.46
Ton-Tongue Toffees 2.99 99 296.01
U-No-Poo 2.99 156 466.44
Deflagration Deluxe 3.99 99 395.01
Portable Swamp 1.99 106 210.94
wands, trick 5.99 85 509.15
Skiving Snackbox 2.99 90 269.10
Basic Blaze Box 0.99 73 72.27
Darkness Powder 4.99 93 464.07
Fainting Fancy 4.99 95 474.05
Daydream Charms 0.99 87 86.13
wands, fake 3.99 64 255.36
Puking Pastilles 0.99 102 100.98
Punching telescope 1.99 119 236.81
Nosebleed Nougat 0.99 62 61.38
Hangman, Reusable 3.99 82 327.18
Wildfire Whiz-Bangs 2.99 73 218.27
Canary Creams 6.99 63 440.37
Love potions 6.99 93 650.07
-----------------------------------------------------------------------
Name Items Total Amount(Galleon)
Harry Potter 2366 9095.34
------------------------------------------------------