Student Scores - Continued

Let's continue with our students' exam results. This time, we will write an awk program!

1. Preparation

Raw Data

Run the command below to create our test data:

cat <<SCORES > scores.txt
William Shakespeare Male English 1564 90
Jane Austen Female English 1775 87
Alexandre Dumas Male French 1802 58
Mark Twain Male American 1835 79
Charles Dickens Male English 1812 83
Franz Kafka Male German 1883 74
J.R.R. Tolkien Male English 1892 47
Ernest Hemingway Male American 1899 66
SCORES
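
The programs in this section pick columns by position, so they assume every row has exactly six whitespace-separated fields. A minimal sketch of a sanity check follows. The second sample row is hypothetical (it is not part of scores.txt) and is deliberately malformed, and the shell variable name `bad` is only for illustration.

```shell
# Report every row that does not have exactly six fields.
# Row 1 is copied from scores.txt; row 2 is a made-up row with a three-word name.
bad=$(printf '%s\n' \
    'Jane Austen Female English 1775 87' \
    'Gabriel Garcia Marquez Male Colombian 1927 80' |
    awk 'NF != 6 { printf "line %d has %d fields\n", NR, NF }')
echo "$bad"
```

Running the same awk expression against scores.txt itself should print nothing, because all eight rows have six fields.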

2. Using a Script

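An awk program is mainly composed of three blocks: BEGIN, COMMANDS, and END. BEGIN and END are awk keywords; COMMANDS is simply our name for the unlabelled main block.

Name      Description
BEGIN     Initialize: runs once before any input is read
COMMANDS  Process rows: runs once for each input line
END       Finalize: runs once after the last line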

First, let us try a Hello World program in awk.

This program will:

  1. Print Hello World! at initialization.
  2. For each line, print Hello, followed by the student's name.
  3. Print Good Luck! at finalization.

Hello World

Save the following code in a file named helloworld.awk, run the command that follows it, and check the result against the output printed after the command.

BEGIN {
    printf "Hello World!\n"
}
{
    printf "Hello, %s %s!\n", $1,$2
}
END {
    printf "Good Luck!\n"
}

awk -f helloworld.awk scores.txt
Hello World!
Hello, William Shakespeare!
Hello, Jane Austen!
Hello, Alexandre Dumas!
Hello, Mark Twain!
Hello, Charles Dickens!
Hello, Franz Kafka!
Hello, J.R.R. Tolkien!
Hello, Ernest Hemingway!
Good Luck!
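
The program above works because awk splits each input line into whitespace-separated fields. As a minimal sketch (the sample row is copied from scores.txt, and the shell variable name `fields` is only for illustration), this shows how `$1`, `$2`, `NF`, and `$NF` relate:

```shell
# Feed one sample row to awk and print selected fields.
# NF is the number of fields, so $NF is the last field (here, the score).
fields=$(printf 'Jane Austen Female English 1775 87\n' |
    awk '{ printf "%s %s | fields=%d | last=%s\n", $1, $2, NF, $NF }')
echo "$fields"
```

This prints `Jane Austen | fields=6 | last=87`. The later programs use `$NF` in the same way to read the score column.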

3. Statistics

3.1 Count

Let's try a simple program: how many male and female students are there?

Gender Count

Save the code into a file named gender-count.awk.

BEGIN {
    female = 0
    male = 0
    mystery = 0
    printf "Gender    Count   \n"
}
{
    if ($3 == "Male")
        male += 1
    else if ($3 == "Female")
        female += 1
    else
        mystery += 1
}
END {
    printf "%-10s%-8d\n", "Female",female
    printf "%-10s%-8d\n", "Male",male
    printf "%-10s%-8d\n", "Mystery",mystery
}

awk -f gender-count.awk scores.txt
Gender    Count
Female    1
Male      7
Mystery   0
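
The same count can also be written with awk's pattern-action rules instead of an if/else chain. The following is a sketch rather than a replacement for the script above. It uses three rows copied from scores.txt, leaves out the Mystery bucket for brevity, and the shell variable name `counts` is only for illustration.

```shell
# Each rule runs only for rows that match its pattern.
# Uninitialized awk variables count as 0, so no BEGIN block is needed here.
counts=$(awk '
    $3 == "Male"   { male++ }      # rows whose 3rd field is Male
    $3 == "Female" { female++ }    # rows whose 3rd field is Female
    END { printf "Female=%d Male=%d\n", female, male }
' <<'DATA'
William Shakespeare Male English 1564 90
Jane Austen Female English 1775 87
Mark Twain Male American 1835 79
DATA
)
echo "$counts"
```

For these three rows the result is `Female=1 Male=2`.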

What about a count by nationality?

Nationality Count

Save the code into a file named nationality-count.awk.

{
    nationality[$4] += 1
}
END {
    printf "%-12s%6s\n", "Nationality","Counts"
    for (n in nationality)
        printf "%-12s%6d\n", n, nationality[n]
}

awk -f nationality-count.awk scores.txt
Nationality Counts
German           1
American         2
French           1
English          4
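
Note that awk does not define the order in which `for (n in nationality)` visits the array, so the rows may appear in a different order on your machine. One possible way to get a stable order, sketched here with the shell variable name `report` chosen only for illustration, is to pipe the report body through `sort`:

```shell
# Count rows per nationality (4th field), then sort the rows by name so the
# order no longer depends on awk's array traversal order.
report=$(awk '
    { nationality[$4] += 1 }
    END {
        for (n in nationality)
            printf "%-12s%6d\n", n, nationality[n]
    }
' <<'DATA' | sort
William Shakespeare Male English 1564 90
Jane Austen Female English 1775 87
Alexandre Dumas Male French 1802 58
Mark Twain Male American 1835 79
Charles Dickens Male English 1812 83
Franz Kafka Male German 1883 74
J.R.R. Tolkien Male English 1892 47
Ernest Hemingway Male American 1899 66
DATA
)
echo "$report"
```

The header line is left out of this sketch so that `sort` only sees data rows; the rows then come out as American, English, French, German.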

3.2 Average

Now let's calculate some average scores.

First, let's calculate the average score by gender.

Average by Gender

Save the code into a file named gender-average.awk. The score is read as $NF, the last field of each row.

BEGIN {
    female = 0; male = 0; mystery = 0
    female_score = 0; male_score = 0; mystery_score = 0
    printf "Gender    Avg Scores \n"
    printf "---------------------\n"
}
{
    if ($3 == "Male") {
        male += 1
        male_score += $NF
    }
    else if ($3 == "Female"){
        female += 1
        female_score += $NF
    }
    else {
        mystery += 1
        mystery_score += $NF
    }
}
END {
    printf "%-10s%-12.2f\n", "Female",(female == 0 ? 0:female_score / female)
    printf "%-10s%-12.2f\n", "Male",(male == 0 ? 0:male_score / male)
    printf "%-10s%-12.2f\n", "Mystery",(mystery == 0 ? 0:mystery_score / mystery)
    printf "---------------------\n"
    printf "%-10s%-12.2f\n", "Total",(female_score + male_score + mystery_score) / NR
}

awk -f gender-average.awk scores.txt
Gender    Avg Scores
---------------------
Female    87.00
Male      71.00
Mystery   0.00
---------------------
Total     73.00
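
Dividing by a zero count is a fatal error in awk, so an average is only safe to compute when the group is non-empty. A minimal sketch of one way to handle this is a small user-defined function. The name `avg` is hypothetical, and the two sample rows (both Male) are copied from scores.txt so that the Female group is empty on purpose.

```shell
# avg() is a hypothetical helper: it returns 0 for an empty group instead of
# letting awk abort with a division-by-zero error.
averages=$(awk '
    function avg(sum, n) { return (n == 0) ? 0 : sum / n }
    $3 == "Male"   { male++;   male_score   += $NF }
    $3 == "Female" { female++; female_score += $NF }
    END {
        printf "Female %.2f\n", avg(female_score, female)
        printf "Male %.2f\n",   avg(male_score, male)
    }
' <<'DATA'
William Shakespeare Male English 1564 90
Mark Twain Male American 1835 79
DATA
)
echo "$averages"
```

With this input the Female line shows 0.00 and the Male line shows 84.50, which is (90 + 79) / 2.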

Now, let's take a look at the performance by home country.

Nationality Average

Save the code into a file named nationality-average.awk.

{
    nationality[$4] += $NF
    nationality_count[$4] += 1
}
END {
    printf "%-12s%10s\n", "Nationality","Avg Score"
    for (n in nationality)
        printf "%-12s%10.2f\n", n, nationality[n] / nationality_count[n]
}

awk -f nationality-average.awk scores.txt
Nationality  Avg Score
German           74.00
American         72.50
French           58.00
English          76.75

Let's calculate the average score of students born before 1800 and of students born in or after 1800.

Average by Time

Save the code into a file named time-average.awk.

BEGIN {
    before_1800 = 0; after_1800 = 0
    before_1800_score = 0; after_1800_score = 0
    printf "Period      Avg Scores \n"
    printf "-------------------------\n"
}
{
    if ($5 >= 1800) {
        after_1800 += 1
        after_1800_score += $NF
    }
    else {
        before_1800 += 1
        before_1800_score += $NF
    }
}
END {
    printf "%-12s%12.2f\n", "Before 1800",before_1800_score / before_1800
    printf "%-12s%12.2f\n", "After 1800",after_1800_score / after_1800
    printf "-------------------------\n"
    printf "%-12s%12.2f\n", "Total",(before_1800_score + after_1800_score) / NR
}

awk -f time-average.awk scores.txt
Period      Avg Scores
-------------------------
Before 1800        88.50
After 1800         67.83
-------------------------
Total              73.00
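
The two approaches used in this section can be combined: derive a group label from the row, then accumulate per-label sums and counts in arrays as in nationality-average.awk. This is a sketch under that assumption, not part of the original scripts. The variable names `period`, `total`, `count`, and `by_period` are chosen only for illustration.

```shell
# Label each row by birth year (5th field), then average the score ($NF)
# per label. As in time-average.awk, the year 1800 itself counts as "After 1800".
by_period=$(awk '
    {
        period = ($5 >= 1800) ? "After 1800" : "Before 1800"
        total[period] += $NF
        count[period] += 1
    }
    END {
        for (p in total)
            printf "%s: %.2f\n", p, total[p] / count[p]
    }
' <<'DATA' | sort
William Shakespeare Male English 1564 90
Jane Austen Female English 1775 87
Alexandre Dumas Male French 1802 58
Mark Twain Male American 1835 79
Charles Dickens Male English 1812 83
Franz Kafka Male German 1883 74
J.R.R. Tolkien Male English 1892 47
Ernest Hemingway Male American 1899 66
DATA
)
echo "$by_period"
```

Because a label only enters the arrays once a row has been seen for it, the division here never meets an empty group. Adding a third period would mean changing only the line that computes `period`.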