Student Scores
We have a list of scores from the Ancient Chinese Literature exam
that the students sat last term. Let's see how they did.
Judging from the results, age appears to be an advantage in this exam.
Name | Gender | Nationality | Birth Year | Score |
---|---|---|---|---|
William Shakespeare | Male | English | 1564 | 90 |
Jane Austen | Female | English | 1775 | 87 |
Alexandre Dumas | Male | French | 1802 | 58 |
Mark Twain | Male | American | 1835 | 79 |
Charles Dickens | Male | English | 1812 | 83 |
Franz Kafka | Male | German | 1883 | 74 |
J.R.R. Tolkien | Male | English | 1892 | 47 |
Ernest Hemingway | Male | American | 1899 | 66 |
1. Preparation
Raw Data
Run the command below to create our test data:
cat <<SCORES > scores.txt
William Shakespeare Male English 1564 90
Jane Austen Female English 1775 87
Alexandre Dumas Male French 1802 58
Mark Twain Male American 1835 79
Charles Dickens Male English 1812 83
Franz Kafka Male German 1883 74
J.R.R. Tolkien Male English 1892 47
Ernest Hemingway Male American 1899 66
SCORES
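Before filtering, it can help to confirm that the file has the shape the later commands rely on: six whitespace-separated fields per line, which only holds because every name here is exactly two words. The sketch below is one way to check this with awk's built-in NR (record number) and NF (field count) variables. It uses a two-line temporary sample (the `sample` variable is hypothetical) so that it runs on its own; in practice you would point it at scores.txt.

```shell
# Recreate a two-line sample so this snippet is self-contained;
# in the tutorial you would run awk against scores.txt instead.
sample=$(mktemp)
cat > "$sample" <<'EOF'
William Shakespeare Male English 1564 90
Jane Austen Female English 1775 87
EOF

# NR is the current line number, NF the number of fields on that line.
# Report any line that does not have exactly six fields, then a total.
report=$(awk 'NF != 6 {print "line " NR " has " NF " fields"}
              END {print NR " records checked"}' "$sample")
echo "$report"
rm -f "$sample"
```

If every line is well formed, only the total is printed.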
2. Who failed the exam?
Apparently, some students scored lower than 60. Let's find out exactly who.
Who Failed
This command prints the first and last name of every student whose last field is less than 60.
awk '$NF<60 {print $1,$2}' scores.txt
Alexandre Dumas
J.R.R. Tolkien
Command Structure
As you may have figured out, a simple filtering awk
command is composed of two main parts:
awk 'MATCHING_PATTERN {ACTIONS}' scores.txt
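Either part may be omitted. As a small illustration (the two-line `data` sample is made up from the rows above so the snippet runs on its own): a pattern with no action prints the whole matching line, and an action with no pattern runs on every line.

```shell
data='Alexandre Dumas Male French 1802 58
Mark Twain Male American 1835 79'

# Pattern only: the default action prints the entire matching line.
pattern_only=$(printf '%s\n' "$data" | awk '$NF<60')

# Action only: with no pattern, the action runs for every line.
action_only=$(printf '%s\n' "$data" | awk '{print $2}')

echo "$pattern_only"
echo "$action_only"
```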
3. Gender Filter
Let's look at the male students' scores.
We will format the output with printf:
- First column (First Name): width 10, left-aligned
- Second column (Last Name): width 12, left-aligned
- Third column (Gender): width 8, left-aligned
- Last column (Score): width 8, right-aligned
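To make the column widths visible, the sketch below wraps each field in square brackets (the brackets are added only for illustration and are not part of the tutorial's format string). A leading `-` in `%-10s` pads on the right, so the text is left-aligned; `%8d` without it pads on the left.

```shell
# Print one record with each column bracketed to show its padded width.
line=$(echo 'Jane Austen Female English 1775 87' |
  awk '{printf "[%-10s][%-12s][%-8s][%8d]\n", $1,$2,$3,$NF}')
echo "$line"
```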
Male
The command selects every student whose third field is Male.
awk '$3=="Male" {printf "%-10s%-12s%-8s%8d\n", $1,$2,$3,$NF}' scores.txt
William Shakespeare Male 90
Alexandre Dumas Male 58
Mark Twain Male 79
Charles Dickens Male 83
Franz Kafka Male 74
J.R.R. Tolkien Male 47
Ernest Hemingway Male 66
We can sort the result by score, highest first, using sort:
Male Sorted
awk '$3=="Male" {printf "%-10s%-12s%-8s%8d\n", $1,$2,$3,$NF}' scores.txt \
| sort -rnk4
William Shakespeare Male 90
Charles Dickens Male 83
Mark Twain Male 79
Franz Kafka Male 74
Ernest Hemingway Male 66
Alexandre Dumas Male 58
J.R.R. Tolkien Male 47
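In `sort -rnk4`, `-k4` selects the fourth whitespace-separated field as the start of the sort key, `-n` compares numerically, and `-r` reverses the order. A slightly more explicit spelling, shown here as an alternative sketch rather than a correction, limits the key to field 4 only and attaches the modifiers to that key. The three input lines are a made-up subset of the output above.

```shell
# -k4,4nr: key starts and ends at field 4, numeric, reversed.
sorted=$(printf '%s\n' \
  'Mark Twain Male 79' \
  'William Shakespeare Male 90' \
  'J.R.R. Tolkien Male 47' |
  sort -k4,4nr)
echo "$sorted"
```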
4. More Filters
How are the students born in the 1800s doing?
1800s
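# Numeric comparison on the birth year
awk '$5<1900 && $5>=1800 {printf "%-10s%-12s%8d%8d\n", $1,$2,$5,$NF}' scores.txt
# Equivalent regex match; anchored, and written without {2} for awks that lack interval expressions
awk '$5 ~ /^18[0-9][0-9]$/ {printf "%-10s%-12s%8d%8d\n", $1,$2,$5,$NF}' scores.txt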
Alexandre Dumas 1802 58
Mark Twain 1835 79
Charles Dickens 1812 83
Franz Kafka 1883 74
J.R.R. Tolkien 1892 47
Ernest Hemingway 1899 66
Using Regex
It is quite convenient to use a regex as the matching pattern. We will explore
regex usage further in a later tutorial that deals with logs.
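One detail worth knowing when a regex is the pattern: `~` matches anywhere inside the field unless the expression is anchored with `^` and `$`. The sketch below uses a made-up list of values (not from scores.txt) to show the difference.

```shell
years='1802
21845
1899
1900'

# Unanchored: also matches 21845, because it contains "1845".
loose=$(printf '%s\n' "$years" | awk '$1 ~ /18[0-9][0-9]/')

# Anchored: the whole field must be 18 followed by two digits.
strict=$(printf '%s\n' "$years" | awk '$1 ~ /^18[0-9][0-9]$/')

echo "loose:";  echo "$loose"
echo "strict:"; echo "$strict"
```

With four-digit birth years the two forms select the same rows, so the distinction only matters for messier data.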
Now, who scored more than 80 and was born after 1800?
Young and Excellent
The command keeps the records whose last field is greater than 80 and whose fifth field is greater than 1800.
awk '$NF>80 && $5>1800 {printf "%-10s%-12s%8d%8d\n", $1,$2,$5,$NF}' scores.txt
As the only student who scored more than 80 and was born after 1800, Mr Dickens must be a very hard-working student!
Charles Dickens 1812 83
How are English students doing in this exam?
English Students' Results
awk '$4=="English" {printf "%-10s%-12s%-10s%8d\n", $1,$2,$4,$NF}' scores.txt
William Shakespeare English 90
Jane Austen English 87
Charles Dickens English 83
J.R.R. Tolkien English 47
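To reuse the same filter for another nationality without editing the awk program, the value can be passed in with `-v`. This is a minimal sketch: the shell variable `nationality` and the awk variable `nat` are names chosen for illustration, and the two input lines are copied from the data above so the snippet runs on its own.

```shell
nationality='English'   # hypothetical shell variable; change as needed

# -v nat=... defines an awk variable before the program starts.
result=$(printf '%s\n' \
  'Jane Austen Female English 1775 87' \
  'Mark Twain Male American 1835 79' |
  awk -v nat="$nationality" \
    '$4==nat {printf "%-10s%-12s%-10s%8d\n", $1,$2,$4,$NF}')
echo "$result"
```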
5. Age
Today is 2020-12-30. How old are our students?
Age
awk '{printf "%-10s%-12s%8d\n", $1,$2,2020-$5}' scores.txt
Blimey, they are old.
William Shakespeare 456
Jane Austen 245
Alexandre Dumas 218
Mark Twain 185
Charles Dickens 208
Franz Kafka 137
J.R.R. Tolkien 128
Ernest Hemingway 121
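The year 2020 is hard-coded in the command above. As a sketch of one way to avoid that, the current year can be taken from `date` and passed to awk with `-v`. The variable names `current_year` and `year` are illustrative, and the result is approximate because the data holds only birth years, not full dates.

```shell
current_year=$(date +%Y)

# year-$5 is the current year minus the birth year in field 5.
ages=$(echo 'Ernest Hemingway Male American 1899 66' |
  awk -v year="$current_year" '{printf "%-10s%-12s%8d\n", $1,$2,year-$5}')
echo "$ages"
```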