awk Programming
The name awk comes from the initials of its designers: Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan. The original version of awk was written in 1977 at AT&T Bell Laboratories. In 1985 a new version made the programming language more powerful, introducing user-defined functions, multiple input streams, and computed regular expressions. This new version became generally available with Unix System V Release 3.1.
awk is a programming language designed to search files for lines that match patterns and to perform actions on those lines.
AWK Commands
$ awk 'length($1) > 5 {print}' mywords
$ awk 'length($1) > 5' mywords
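The two commands above should behave identically: when a pattern has no action, awk applies its default action, which is to print the record. The sketch below illustrates this with a small made-up word list standing in for mywords (the words and the variable names are assumptions, not part of the original file).

```shell
# Hypothetical sample data standing in for the mywords file.
words=$(printf '%s\n' brown cloud tree sky bookworm falcon)

# Explicit action: print records whose first field is longer than 5 characters.
long_explicit=$(printf '%s\n' "$words" | awk 'length($1) > 5 {print}')

# Omitted action: awk falls back to its default action, which is {print}.
long_default=$(printf '%s\n' "$words" | awk 'length($1) > 5')

printf '%s\n' "$long_default"
```

With this sample, only bookworm and falcon are longer than five characters, so both forms print those two words.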
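$ awk '$1 ~ /^[bc]/ {print $1}' mywords
The above script prints all the words that begin with the character b or c. The regular expression is placed between two slash characters. Characters inside the brackets are listed without separators, because a comma there would itself be matched.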
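A bracket expression lists its characters without separators, and a comma inside the brackets is matched literally. The following is a minimal sketch of the difference, using made-up sample words (one of which deliberately starts with a comma) rather than the real mywords file.

```shell
# Hypothetical sample data; the second entry starts with a literal comma.
words=$(printf '%s\n' brown ,comma cloud tree)

# [b,c] lists three characters: b, comma, and c.
with_comma=$(printf '%s\n' "$words" | awk '$1 ~ /^[b,c]/ {print $1}')

# [bc] lists only b and c.
without_comma=$(printf '%s\n' "$words" | awk '$1 ~ /^[bc]/ {print $1}')

printf 'with comma:\n%s\nwithout comma:\n%s\n' "$with_comma" "$without_comma"
```

The first pattern also selects the entry beginning with a comma, while the second selects only brown and cloud.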
$ awk 'NR % 2 == 0 {print}' mywords
NR is a built-in variable that holds the number of records read so far, which is the number of the record currently being processed. The above program prints every second record of the mywords file: a record is printed when NR modulo 2 equals zero, that is, when its number is even.
$ awk '{print NR, $0}' mywords
Here NR prints the record number and the $0 variable refers to the whole record.
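To make the role of NR concrete, here is a small sketch that numbers the records and then keeps only the even-numbered ones. The four sample words and the variable names are assumptions chosen for illustration.

```shell
# Hypothetical sample data standing in for the mywords file.
words=$(printf '%s\n' brown cloud tree sky)

# Number every record: NR is the count of records read so far.
numbered=$(printf '%s\n' "$words" | awk '{print NR, $0}')

# Keep only even-numbered records (2nd, 4th, ...).
even=$(printf '%s\n' "$words" | awk 'NR % 2 == 0')

printf '%s\n' "$numbered" "$even"
```

For this input, the numbered listing runs from "1 brown" to "4 sky", and the even filter keeps cloud and sky.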
$ awk '{print substr($0, 4)}' code.c
The substr() function returns a substring of the given string. We apply the function to each line, skipping the first three characters. In other words, we print each record from the fourth character to its end.
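As a rough illustration of how substr() counts characters, the sketch below applies it to a single made-up line of C (an assumption standing in for code.c). It also shows the optional third argument of substr(), which limits the length of the result; the original command does not use it.

```shell
# Hypothetical input line standing in for a line of code.c.
line='int main(void) {'

# Two-argument form: everything from the 4th character onward.
tail_part=$(printf '%s\n' "$line" | awk '{print substr($0, 4)}')

# Three-argument form: 4 characters starting at the 5th character.
word=$(printf '%s\n' "$line" | awk '{print substr($0, 5, 4)}')

printf '[%s]\n[%s]\n' "$tail_part" "$word"
```

Positions start at 1, so the two-argument call drops "int" and keeps the space that follows it, while the three-argument call extracts "main".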
The Match Function
match() is a built-in string manipulation function. It tests whether the given string contains a match for a regular expression. The first parameter is the string, the second is the regex pattern. It returns the position at which the match starts, or 0 if there is no match, so it can be used directly as a pattern.
$ awk 'match($0, /^[cb]/)' mywords
The program prints those lines that begin with c or b. The regular expression is placed between two slash characters.
$ awk 'match($0, /i/) {print $0 " has i character at " RSTART}' mywords
The match() function sets the RSTART variable to the index at which the match starts. The program prints those words that contain the ‘i’ character, together with the position of its first occurrence.
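Besides RSTART, match() also sets RLENGTH, the length of the matched text. The sketch below goes slightly beyond the example above by printing both for the first run of vowels in each word. The sample words and the vowel pattern are assumptions chosen for illustration.

```shell
# Hypothetical sample words; "sky" has no character from [aeiou].
words=$(printf '%s\n' sky cloud falcon white)

# Print each word with the start and length of its first run of vowels.
# Words without a match are skipped because match() returns 0 for them.
vowel_runs=$(printf '%s\n' "$words" |
    awk 'match($0, /[aeiou]+/) {print $0, RSTART, RLENGTH}')

printf '%s\n' "$vowel_runs"
```

Here cloud is reported with start 3 and length 2 (the run "ou"), and sky does not appear in the output at all.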
$ awk -F: '{print $1, $7}' /etc/passwd | head -7
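The -F option sets the field separator. With -F: each line is split on colons, so in a passwd-style file $1 is the login name and $7 is the login shell. The following sketch uses two made-up entries instead of the real /etc/passwd so that the result does not depend on the system. The entries and the variable names are assumptions.

```shell
# Hypothetical passwd-style entries (seven colon-separated fields each).
passwd_sample=$(printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin')

# Split on ":" and print the first and seventh fields.
name_shell=$(printf '%s\n' "$passwd_sample" | awk -F: '{print $1, $7}')

printf '%s\n' "$name_shell"
```

The comma in the print statement separates the two fields with a single space in the output.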
$ echo "Jane 17#Tom 23#Mark 34" | awk 'BEGIN {RS="#"} {print $1, "is", $2, "years old"}'
Jane is 17 years old
Tom is 23 years old
Mark is 34 years old
RS is the input record separator, by default a newline. In the example, the records are separated by the # character, so setting RS to # makes awk split the input at each # instead of at each newline. awk can also receive input from other commands, such as echo, through a pipe.
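RS can be combined with FS, the input field separator, when both records and fields use custom delimiters. The following is a minimal sketch under assumed data: records separated by semicolons and fields by commas. The input string and the variable name are hypothetical.

```shell
# Hypothetical input: records end at ";" and fields are split on ",".
# printf is used without a trailing newline so the last field stays clean.
people=$(printf 'Jane,17;Tom,23;Mark,34' |
    awk 'BEGIN {RS=";"; FS=","} {print NR ": " $1 " is " $2}')

printf '%s\n' "$people"
```

Both separators are set in the BEGIN block so that they take effect before the first record is read.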