Weasley's Wizard Wheezes Orders: Harry Potter¶
Here are some order records to work with:
Name | Item | Price(Galleon) | Quantity | Date |
---|---|---|---|---|
Harry Potter | Invisibility Cloak | 1000 | 1 | 2020-01-01 |
Tom Riddle | Diary Book | 1 | 2 | 2020-01-01 |
Albus Dumbledore | Chocolate Frogs | 2 | 2 | 2020-01-02 |
Gellert Grindelwald | Elder Wand | 1000 | 1 | 2020-01-01 |
Lord Voldemort | Cauldron | 5 | 1 | 2020-01-02 |
Lord Voldemort | Wormtail | 1.99 | 1 | 2020-01-01 |
Lord Voldemort | Nagini | 10.99 | 1 | 2020-01-02 |
Lord Voldemort | Jetpack | 0.99 | 100 | 2020-01-01 |
Lord Voldemort | Horcrux | 2 | 6 | 2020-01-01 |
Albus Dumbledore | Butterbeer | 1 | 2 | 2020-01-01 |
Harry Potter | Resurrection Store | 1000 | 1 | 2020-01-02 |
Lord Voldemort | Love Portions | 5 | 100 | 2020-01-01 |
Harry Potter | Portable Swamps | 3 | 5 | 2020-01-02 |
Albus Dumbledore | Socks | 1.5 | 2 | 2020-01-01 |
1. Preparation¶
Raw Data
Run the following code to generate the raw data.
cat <<ORDERS > order.txt
Name                 Item                Price(Galleon)    Quantity     Date
Harry Potter         Invisibility Cloak  1000              1            2020-01-01
Tom Riddle           Diary Book          1                 2            2020-01-01
Albus Dumbledore     Chocolate Frogs     2                 2            2020-01-02
Gellert Grindelwald  Elder Wand          1000              1            2020-01-01
Lord Voldemort       Cauldron            5                 1            2020-01-02
Lord Voldemort       Wormtail            1.99              1            2020-01-01
Lord Voldemort       Nagini              10.99             1            2020-01-02
Lord Voldemort       Jetpack             0.99              100          2020-01-01
Lord Voldemort       Horcrux             100               6            2020-01-01
Albus Dumbledore     Butterbeer          1                 2            2020-01-01
Harry Potter         Resurrection Store  1000              1            2020-01-02
Lord Voldemort       Love Portions       5                 100          2020-01-01
Harry Potter         Portable Swamps     3                 5            2020-01-02
Albus Dumbledore     Socks               1.5               2            2020-01-01
ORDERS
2. Fixed Width Fields¶
As you might have noticed, every field in this data has a fixed width.
Ubuntu
If you are using Ubuntu/Debian, the default awk may be mawk, which does not support the FIELDWIDTHS variable used below.
You can install gawk and, optionally, switch the awk alternative to gawk:
sudo apt-get install gawk
sudo update-alternatives --config awk
gawk provides a variable, FIELDWIDTHS, in which you can specify each field's width.
Fixed Width
gawk 'BEGIN { FIELDWIDTHS = "21 20 18 13 10" } {print $1, $2}' order.txt
Name                  Item
Harry Potter          Invisibility Cloak
Tom Riddle            Diary Book
Albus Dumbledore      Chocolate Frogs
Gellert Grindelwald   Elder Wand
Lord Voldemort        Cauldron
Lord Voldemort        Wormtail
Lord Voldemort        Nagini
Lord Voldemort        Jetpack
Lord Voldemort        Horcrux
Albus Dumbledore      Butterbeer
Harry Potter          Resurrection Store
Lord Voldemort        Love Portions
Harry Potter          Portable Swamps
Albus Dumbledore      Socks
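If gawk is not available, the same split can be approximated in plain POSIX awk with substr() instead of FIELDWIDTHS. This is only a sketch and not part of the tutorial's scripts: the two sample rows are generated on the fly, the variable name fallback_output is made up for illustration, and the offsets assume the widths 21 and 20 used above.

```shell
# Feed two fixed-width rows (widths 21 and 20) straight into awk.
fallback_output=$(
    {
        printf '%-21s%-20s\n' "Harry Potter" "Invisibility Cloak"
        printf '%-21s%-20s\n' "Tom Riddle" "Diary Book"
    } | awk '{
        # substr(s, start, length) is 1-based: columns 1-21, then 22-41.
        name = substr($0, 1, 21)
        item = substr($0, 22, 20)
        sub(/ +$/, "", name)   # strip the trailing padding
        sub(/ +$/, "", item)
        print name "|" item
    }'
)
echo "$fallback_output"
```

The "|" separator is there only to make the field boundaries visible once the padding has been stripped.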
More on FIELDWIDTHS (for gawk version >= 4.2)
Sometimes, you may want to skip several characters before a field.
If your gawk version is 4.2 or later, you can write a width as skip:width. For example, 5:10 skips 5 characters and then reads a 10-character field:
gawk 'BEGIN { FIELDWIDTHS = "21 20 18 13 5:10" } {print $1, $2}' order.txt
Sometimes you may also have trailing data of unknown length.
If your gawk version is 4.2 or later, you can use * as the last width to capture the rest of the record:
gawk 'BEGIN { FIELDWIDTHS = "21 20 18 *" } {print $4}' order.txt
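To see what the skip and * forms actually return, here is a small sketch run against a single record in the same 82-column layout. The helper has_gawk_42 and the variable names are made up for this example, and the widths "5:5" are chosen here (they are not from the commands above) so that the skipped part is exactly the "2020-" prefix of the date.

```shell
# Hypothetical helper: succeeds only if gawk exists and reports version 4.2 or later.
has_gawk_42() {
    command -v gawk >/dev/null 2>&1 &&
        gawk 'BEGIN { exit !(PROCINFO["version"] >= "4.2") }'
}

# One 82-column record in the layout used above (21 20 18 13 10).
record=$(printf '%-21s%-20s%-18s%-13s%-10s' \
    "Harry Potter" "Invisibility Cloak" "1000" "1" "2020-01-01")

if has_gawk_42; then
    # "5:5" skips the 5 characters "2020-" and then reads the 5 characters after them.
    month_day=$(printf '%s\n' "$record" |
        gawk 'BEGIN { FIELDWIDTHS = "21 20 18 13 5:5" } { print $5 }')
    # "*" takes everything after the first 59 columns (quantity and date here).
    rest=$(printf '%s\n' "$record" |
        gawk 'BEGIN { FIELDWIDTHS = "21 20 18 *" } { print $4 }')
    echo "month-day: $month_day"
    echo "rest: $rest"
fi
```

On older gawk versions or other awk implementations the block prints nothing, which is why the version check comes first.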
3. Variables and Functions¶
3.1 Variable¶
awk uses -v for variable assignment.
Variables
awk -v var1=value1 -v var2=value2 '{ print var1, var2 }' some-data.txt
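A quick way to check what -v does is to print the variables from a BEGIN block, which needs no input file. This is a minimal sketch; the values "Harry" and 2 and the shell variable greeting are placeholders chosen for the example.

```shell
# -v assignments are made before BEGIN runs, so the values are already visible there.
greeting=$(awk -v var1="Harry" -v var2=2 'BEGIN { print var1, var2 * 10 }')
echo "$greeting"   # Harry 20
```

Note that var2 arrives as text but behaves as a number in arithmetic, because it looks like one.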
3.2 Function¶
The structure of an awk function is straightforward. Below is an example.
Function
# some.awk
function foo(bar) {
    print bar
}
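A function can also return a value, and awk has a common convention for local variables: declare them as extra parameters that the caller never passes. The sketch below illustrates both. The function repeat and the shell variable repeated are invented for this example and are not used elsewhere in the tutorial.

```shell
repeated=$(awk '
# repeat() returns str concatenated n times.
# The extra parameters (out, i) act as local variables; callers leave them out.
function repeat(str, n,    out, i) {
    out = ""
    for (i = 0; i < n; i++) {
        out = out str
    }
    return out
}
BEGIN {
    i = 99                      # a global i, left untouched by repeat()
    print repeat("-", 5), i
}')
echo "$repeated"   # ----- 99
```

Without the extra parameter, the loop counter inside the function would be a global and would overwrite the caller's i.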
4. Summary by Name¶
Save the following code as name-filter.awk.
name-filter.awk
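# Print a horizontal rule of n dashes; i is declared as a local variable.
function print_line(n,    i) {
    for (i = 0; i < n; i++) {
        printf "%s", "-"
    }
    printf "\n"
}
BEGIN {
    # Column widths: Name, Item, Price(Galleon), Quantity, Date (82 in total)
    FIELDWIDTHS = "21 20 18 13 10"
    print_line(82)
}
{
    # Echo the header row
    if (NR == 1) {
        print $0
        print_line(82)
    }
    # name is passed in with -v and matched against $1 as a regular expression
    if (NR > 1 && $1 ~ name) {
        print $0
        sum += $4 * $3    # quantity * price
    }
}
END {
    print_line(82)
    printf "%-21s%21s\n", "Name", "Total Amount(Galleon)"
    printf "%-21s%21s\n", name, sum
    print_line(42)
}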
The script defines one function, print_line. It is deliberately simple and is there just to illustrate the concept.
You may also have noticed that the variable name is never initialized in the script. We pass it in with -v when we run the command.
Harry Potter's Orders
gawk -v name="Harry" -f name-filter.awk order.txt
----------------------------------------------------------------------------------
Name                 Item                Price(Galleon)    Quantity     Date
----------------------------------------------------------------------------------
Harry Potter         Invisibility Cloak  1000              1            2020-01-01
Harry Potter         Resurrection Store  1000              1            2020-01-02
Harry Potter         Portable Swamps     3                 5            2020-01-02
----------------------------------------------------------------------------------
Name                 Total Amount(Galleon)
Harry                                 2015
------------------------------------------
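The filter works with name="Harry" even though the column holds "Harry Potter", because $1 ~ name is a regular-expression match that succeeds anywhere inside the field. The sketch below contrasts it with an equality test. The three input names (including "Harry Houdini") and the counter names are made up for illustration.

```shell
match_counts=$(printf '%s\n' "Harry Potter" "Tom Riddle" "Harry Houdini" |
    awk -v name="Harry" '
        $0 ~ name  { regex_hits++ }    # regular-expression match anywhere in the line
        $0 == name { exact_hits++ }    # whole-string equality
        END { print regex_hits + 0, exact_hits + 0 }')
echo "$match_counts"   # 2 0
```

With FIELDWIDTHS an equality test is even stricter than it looks, since $1 keeps its trailing padding, so the regular-expression match is the more forgiving choice here.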
5. Summary by Name and Date¶
Save the following code as name-date-filter.awk.
name-date-filter.awk
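# Print a horizontal rule of n dashes; i is declared as a local variable.
function print_line(n,    i) {
    for (i = 0; i < n; i++) {
        printf "%s", "-"
    }
    printf "\n"
}
BEGIN {
    # Column widths: Name, Item, Price(Galleon), Quantity, Date (82 in total)
    FIELDWIDTHS = "21 20 18 13 10"
    print_line(82)
}
{
    # Echo the header row
    if (NR == 1) {
        print $0
        print_line(82)
    }
    # name, startDate and endDate are passed in with -v
    if (NR > 1 && $1 ~ name) {
        # Dates are YYYY-MM-DD strings, so string comparison orders them correctly
        if (startDate != "" && $5 < startDate)
            next
        if (endDate != "" && $5 > endDate)
            next
        print $0
        sum += $4 * $3    # quantity * price
    }
}
END {
    print_line(82)
    printf "%-21s%21s\n", "Name", "Total Amount(Galleon)"
    printf "%-21s%21s\n", name, sum
    print_line(42)
}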
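This script reuses print_line and the name variable from name-filter.awk, and adds two optional variables, startDate and endDate. A row is skipped when its date is earlier than startDate or later than endDate, and leaving either variable unset disables that bound.
As before, none of these variables is initialized in the script. We pass them in with -v when we run the commands, and leaving name unset matches every customer.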
Harry Potter's Orders
gawk -v name="Harry" -v startDate="2020-01-02" -v endDate="2020-01-03" -f name-date-filter.awk order.txt
----------------------------------------------------------------------------------
Name                 Item                Price(Galleon)    Quantity     Date
----------------------------------------------------------------------------------
Harry Potter         Resurrection Store  1000              1            2020-01-02
Harry Potter         Portable Swamps     3                 5            2020-01-02
----------------------------------------------------------------------------------
Name                 Total Amount(Galleon)
Harry                                 1015
------------------------------------------
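The date checks rely on plain string comparison, which happens to give the right order because zero-padded YYYY-MM-DD dates sort the same way alphabetically and chronologically. Here is a minimal sketch of the same two checks on a made-up list of dates; the variable in_range and the sample dates are assumptions for this example.

```shell
in_range=$(printf '%s\n' 2020-01-01 2020-01-02 2020-01-03 2020-01-10 |
    awk -v startDate="2020-01-02" -v endDate="2020-01-03" '
        startDate != "" && $1 < startDate { next }   # before the range
        endDate   != "" && $1 > endDate   { next }   # after the range
        { print }')
echo "$in_range"
```

Only 2020-01-02 and 2020-01-03 are printed. The trick would not hold for dates written without zero padding, such as 2020-1-10.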
Everyone's Orders
gawk -v startDate="2020-01-02" -v endDate="2020-01-03" -f name-date-filter.awk order.txt
----------------------------------------------------------------------------------
Name                 Item                Price(Galleon)    Quantity     Date
----------------------------------------------------------------------------------
Albus Dumbledore     Chocolate Frogs     2                 2            2020-01-02
Lord Voldemort       Cauldron            5                 1            2020-01-02
Lord Voldemort       Nagini              10.99             1            2020-01-02
Harry Potter         Resurrection Store  1000              1            2020-01-02
Harry Potter         Portable Swamps     3                 5            2020-01-02
----------------------------------------------------------------------------------
Name                 Total Amount(Galleon)
                                   1034.99
------------------------------------------
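The last run lists every customer because name was never set: an unset awk variable is an empty string, and an empty regular expression matches any text. A tiny sketch of that behavior, with two made-up input lines and an illustrative variable all_rows:

```shell
# name is deliberately not passed with -v, so it is the empty string.
all_rows=$(printf '%s\n' "Harry Potter" "Tom Riddle" |
    awk '$0 ~ name { n++ } END { print n }')
echo "$all_rows"   # 2
```

This is also why the Name column in the summary line above is blank.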
In the next tutorial, we will run these programs against a data set of 5 million records!