Awk Command in Linux

 



 

Awk is a scripting language used for manipulating data and generating reports. Awk programs require no compilation, and the language lets you use variables, numeric functions, string functions, and logical operators.


Awk is a utility that enables a programmer to write tiny but effective programs in the form of statements that define text patterns to search for in each line of a document and the action to take when a match is found within a line. Awk is mostly used for pattern scanning and processing. It searches one or more files for lines that match the specified patterns and then performs the associated actions.


The name awk comes from the surname initials of its developers: Aho, Weinberger, and Kernighan.


WHAT CAN WE DO WITH AWK?  


1. AWK Operations:  
(a) Scans a file line by line  
(b) Splits each input line into fields  
(c) Compares input line/fields to pattern  
(d) Performs action(s) on matched lines  


2. Useful For:  
(a) Transform data files  
(b) Produce formatted reports  


3. Programming Constructs:  
(a) Format output lines  
(b) Arithmetic and string operations  
(c) Conditionals and loops 


How Does the AWK Command Work? 

The awk command's main purpose is to make information retrieval and text manipulation easy to perform in Linux. The command scans a set of input lines in order and searches for lines matching the patterns specified by the user.

For each pattern, users can specify an action to perform on each line that matches the specified pattern. Thus, using awk, users can easily process complex log files and output a readable report. 
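
As a minimal sketch of this scan, match, and act cycle, the commands below build a small sample log (the file contents are made up for illustration) and print a short report of the lines that contain ERROR:

```shell
# Hypothetical sample log; the contents are illustrative only.
log=$(mktemp)
printf '%s\n' \
  '2024-01-01 INFO service started' \
  '2024-01-01 ERROR disk full' \
  '2024-01-02 ERROR network down' > "$log"

# The pattern /ERROR/ selects lines; the action prints the date and the message.
report=$(awk '/ERROR/ {print $1 ": " $3, $4}' "$log")
printf '%s\n' "$report"
rm -f "$log"
```

Lines that do not match the pattern, such as the INFO line, produce no output.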


AWK Statements 

The command provides basic control flow statements (if-else, while, for, break) and also allows users to group statements using braces {}.

  • if-else 

The if-else statement works by evaluating the condition specified in the parentheses and, if the condition is true, the statement following the if statement is executed. The else part is optional. 

For example: 

awk -F ',' '{if($2==$3){print $1","$2","$3} else {print "No Duplicates"}}' answers.txt 

Using the if-else statement in awk. 


The output shows the lines in which the second and third fields are identical and prints "No Duplicates" for every line in which they differ.
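
To try the if-else command without an existing answers.txt, here is a small sketch that assumes a three-column, comma-separated file (the sample rows are invented for illustration):

```shell
# Hypothetical sample data: id,first answer,second answer
answers=$(mktemp)
printf '%s\n' '1,a,a' '2,b,c' '3,d,d' > "$answers"

# Print the line when fields 2 and 3 are equal, otherwise print a notice.
result=$(awk -F ',' '{if ($2 == $3) {print $1 "," $2 "," $3} else {print "No Duplicates"}}' "$answers")
printf '%s\n' "$result"
rm -f "$answers"
```

With this sample, the first and third rows are printed as they are, and the second row produces "No Duplicates".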

  • while 

The while statement repeatedly executes a target statement as long as the specified condition is true, just like the while loop in the C programming language. If the condition is true, the body of the loop is executed. Once the condition is false, awk continues with the statement that follows the loop.

For example, the following statement instructs awk to print all input fields one per line. The counter starts at 1 because $0 refers to the whole line, not to a field:

awk '{i=1; while(i<=NF) { print i ":" $i; i++ }}' employees.txt 

Using the while statement in awk. 
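
A small sketch of the while loop on a one-line sample (the record is made up). The counter starts at 1 so that the loop visits $1 through $NF and skips $0, which holds the whole line:

```shell
# Hypothetical one-line record with three fields.
result=$(printf '%s\n' 'Alice clerk 30' |
  awk '{i = 1; while (i <= NF) {print i ": " $i; i++}}')
printf '%s\n' "$result"
```

Each field is printed on its own line, prefixed with its position in the record.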

  • for 

The for statement also works like that of C, allowing users to create a loop that needs to execute a specific number of times. 

For example: 

awk 'BEGIN{for(i=1; i<=10; i++) print "The square of", i, "is", i*i;}' 

An example of the for statement in awk. 

The statement above increases the value of i by one until it reaches ten and calculates the square of i each time. 

  • break 

The break statement immediately exits from an enclosing while or for loop. To skip the rest of the loop body and begin the next iteration, use the continue statement.

The next statement instructs awk to skip to the next record and begin scanning for patterns from the top. The exit statement instructs awk to stop reading input; any END action still runs before the program terminates.

Following is an example of the break statement: 

awk 'BEGIN{x=1; while(1) {print "Example"; if ( x==5 ) break; x++; }}' 

Using the break statement in awk. 

The command above breaks the loop after 5 iterations. 
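
The related continue, next, and exit statements can be sketched the same way. The input lines below are invented for illustration:

```shell
# continue: skip even numbers but keep looping.
odds=$(awk 'BEGIN {for (i = 1; i <= 5; i++) {if (i % 2 == 0) continue; print i}}')
printf '%s\n' "$odds"

# next: skip comment lines; exit: stop reading at the "stop" marker.
names=$(printf '%s\n' '# header' 'alpha 1' 'beta 2' 'stop' 'gamma 3' |
  awk '/^#/ {next} $1 == "stop" {exit} {print $1}')
printf '%s\n' "$names"
```

The first command prints only the odd numbers, and the second prints the names that appear before the stop marker.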


AWK Patterns 

Inserting a pattern in front of an action in awk acts as a selector. The selector determines whether to perform an action or not. The following expressions can serve as patterns: 

  • Regular expressions. 

  • Arithmetic relational expressions. 

  • String-valued expressions. 

  • Arbitrary Boolean combinations of the expressions above. 

The following sections explain the above-mentioned expressions and how to use them.

 

Regular Expression Patterns 

Regular expression patterns are the simplest form of pattern: a string of characters enclosed in slashes. The string can be a sequence of letters, numbers, or a combination of both, and it matches wherever it occurs in a line, even as part of a larger word.

In the following example, the program outputs all the lines whose first field starts with "A". The ^ anchor ties the match to the beginning of the field:

awk '$1 ~ /^A/ {print $0}' employees.txt 

An example of a regular expression pattern. 
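
To see the difference between an anchored and an unanchored regular expression, consider this sketch on made-up sample data:

```shell
# Hypothetical employee list: name and role.
emp=$(mktemp)
printf '%s\n' 'Alice clerk' 'Bob manager' 'Anna analyst' 'carl Admin' > "$emp"

# Anchored to the start of the first field.
anchored=$(awk '$1 ~ /^A/ {print $0}' "$emp")
# Unanchored: matches an "A" anywhere in the line.
anywhere=$(awk '/A/ {print $1}' "$emp")

printf '%s\n' "$anchored" "$anywhere"
rm -f "$emp"
```

The unanchored pattern also selects the "carl Admin" line, because the "A" in "Admin" is enough for a match.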


Relational Expression Patterns 

Another type of awk pattern is the relational expression pattern. Relational expression patterns use any of the following relational operators: <, <=, ==, !=, >=, and >.

Following is an example of a relational expression in awk, here evaluated by an if statement inside a BEGIN block:

awk 'BEGIN { a = 10; b = 10; if (a == b) print "a == b" }' 

An example of a relational expression pattern.
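
A relational expression can also stand in front of an action as a selector. The sketch below assumes a two-column file of names and salaries (sample values invented):

```shell
# Hypothetical data: name and salary.
pay=$(mktemp)
printf '%s\n' 'Alice 3000' 'Bob 4500' 'Carol 5200' > "$pay"

# The action runs only for records whose second field is at least 4500.
result=$(awk '$2 >= 4500 {print $1}' "$pay")
printf '%s\n' "$result"
rm -f "$pay"
```

Because the second field looks like a number, awk compares it numerically rather than as a string.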


 

Range Patterns 

A range pattern consists of two patterns separated by a comma. Awk performs the specified action for each line from the first line that matches pattern one through the next line that matches pattern two, both included.

For example: 

awk '/clerk/, /manager/ {print $1, $2}' employees.txt

Example of a range pattern.

The pattern above instructs awk to print the first two fields of every line from a line containing "clerk" through the next line containing "manager".
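
A quick way to check how a range behaves is to run it on a short sample. The employee rows here are made up for illustration:

```shell
# Hypothetical employee list: name and role.
emp=$(mktemp)
printf '%s\n' 'Alice intern' 'Bob clerk' 'Carol analyst' 'Dave manager' 'Eve director' > "$emp"

# The range opens at the first "clerk" line and closes at the next "manager" line.
result=$(awk '/clerk/, /manager/ {print $1, $2}' "$emp")
printf '%s\n' "$result"
rm -f "$emp"
```

The "Carol analyst" line is printed even though it matches neither pattern, because it lies inside the range. The lines before "clerk" and after "manager" are skipped.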


Special Expression Patterns 

Special expression patterns include BEGIN and END which denote program initialization and end. The BEGIN pattern matches the beginning of the input, before the first record is processed. The END pattern matches the end of the input, after the last record has been processed. 

For example, you can instruct awk to display a message at the beginning and at the end of the process: 

awk 'BEGIN { print "List of debtors:" }; {print $1, $2}; END {print "End of the debtor list"}' debtors.txt 

An example of a special expression pattern in awk. 
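
BEGIN and END are also a natural place for setup and summary work. In this sketch, which assumes a debtor file with a name and an amount per line (sample values invented), the END action prints a running total:

```shell
# Hypothetical debtor list: name and amount owed.
debtors=$(mktemp)
printf '%s\n' 'Alice 100' 'Bob 250' > "$debtors"

# BEGIN prints a header, the main action accumulates, END prints the total.
result=$(awk 'BEGIN {print "List of debtors:"}
              {total += $2; print $1, $2}
              END {print "Total owed:", total}' "$debtors")
printf '%s\n' "$result"
rm -f "$debtors"
```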


Combining Patterns 

The awk command allows users to combine two or more patterns using logical operators. The combined patterns can be any Boolean combination of patterns. The logical operators for combining patterns are: 

  • || (or) 

  • && (and) 

  • ! (not) 

For example: 

awk '$3 > 10 && $4 < 20 {print $1, $2}' employees.txt  

Example of a combining pattern in awk. 

The output prints the first and second fields of those records whose third field is greater than ten and the fourth field is less than 20. 
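
The other two operators work the same way. The following sketch uses an invented four-column sample (name, role, and two numeric fields) to show &&, ||, and ! side by side:

```shell
# Hypothetical data: name, role, and two numeric columns.
emp=$(mktemp)
printf '%s\n' 'Alice clerk 12 15' 'Bob manager 8 25' 'Carol analyst 15 30' > "$emp"

both=$(awk '$3 > 10 && $4 < 20 {print $1, $2}' "$emp")      # and
either=$(awk '$2 == "clerk" || $4 >= 30 {print $1}' "$emp") # or
negated=$(awk '!($3 > 10 && $4 < 20) {print $1}' "$emp")    # not

printf '%s\n' "$both" "$either" "$negated"
rm -f "$emp"
```

The negated pattern selects exactly the records that the first pattern rejects.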


AWK Variables 

The awk command splits each input line into separate parts called fields and makes them available through built-in field variables:

  • $0. Used to specify the whole line. 

  • $1. Specifies the first field. 

  • $2. Specifies the second field. 

  • etc. 

Other available built-in awk variables are: 

  • NR. Counts the number of input records (usually lines). The awk command performs the pattern/action statements once for each record in a file. 

For example: 

awk '{print NR,$0}' employees.txt 

Example of the NR variable in awk. 

The command displays the line number in the output. 


  • NF. Holds the number of fields in the current input record. Because NF is the index of the last field, $NF refers to the last field of each record.

For example: 

awk '{print $NF}' employees.txt 

Example of the NF variable in awk. 
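
The difference between NF and $NF is easiest to see on records of different lengths. A minimal sketch with made-up input:

```shell
# Two hypothetical records with three and two fields.
result=$(printf '%s\n' 'Alice clerk 30' 'Bob manager' |
  awk '{print NR, NF, $NF}')
printf '%s\n' "$result"
```

Each output line shows the record number, the field count, and the contents of the last field.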


  • FS. Contains the character used to divide fields on the input line. The default separator is a single space, which awk treats as any run of spaces and tabs, but you can use FS to reassign the separator to another character (typically in BEGIN).

For example, you can make the /etc/passwd file (user list) more readable by changing the separator from a colon (:) to a dash (-). Assigning a field to itself ($1=$1) makes awk rebuild the line with the output separator:

awk 'BEGIN {FS=":"; OFS="-"} {$1=$1; print $0}' /etc/passwd 

An example of the FS variable in awk. 
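
One detail worth checking is when the output separator takes effect: awk rebuilds $0 with OFS only after a field is modified. The sketch below uses a single passwd-style line as sample input instead of the real /etc/passwd:

```shell
# A sample passwd-style record (illustrative, not read from the system).
sample='root:x:0:0:root:/root:/bin/bash'

# Without touching a field, $0 is printed exactly as it was read.
unchanged=$(printf '%s\n' "$sample" | awk 'BEGIN {FS=":"; OFS="-"} {print $0}')
# Assigning $1 to itself forces awk to rebuild $0 using OFS.
rebuilt=$(printf '%s\n' "$sample" | awk 'BEGIN {FS=":"; OFS="-"} {$1 = $1; print $0}')

printf '%s\n' "$unchanged" "$rebuilt"
```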

  • RS. Stores the current record separator character. By default, each input line is one record, which makes a newline the default record separator. Changing RS is useful when the records in the input are separated by another character, such as a comma.

For example: 

awk 'BEGIN {FS="-"; RS=","; OFS=" owes Rs. "} {print $1,$2}' debtors.txt 

An example of the RS variable in awk. 
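
To try the RS command without a debtors.txt file, this sketch feeds it an invented comma-separated string in which each record has the form name-amount:

```shell
# Hypothetical input: records separated by commas, fields by dashes.
# No trailing newline is written, so it cannot end up inside the last field.
result=$(printf '%s' 'Alice-100,Bob-250,Carol-75' |
  awk 'BEGIN {FS="-"; RS=","; OFS=" owes Rs. "} {print $1, $2}')
printf '%s\n' "$result"
```

Each comma-separated chunk is treated as one record, so the single input line yields three output lines.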


  • OFS. Stores the output field separator, which separates the fields when printed. The default separator is a blank space. Whenever a print statement has several arguments separated with commas, the OFS value is printed between each argument.

For example: 

awk 'BEGIN {OFS=" works as "} {print $1,$3}' employees.txt 

An example of the OFS variable in awk. 
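
The comma in the print statement is what triggers OFS. A short sketch on a made-up record shows the difference between comma-separated arguments and plain concatenation:

```shell
# Hypothetical record: first name, last name, role.
line='Alice Smith clerk'

# Arguments separated by commas are joined with OFS.
with_ofs=$(printf '%s\n' "$line" | awk 'BEGIN {OFS=" works as "} {print $1, $3}')
# Without the comma, the two fields are concatenated and OFS is not used.
concatenated=$(printf '%s\n' "$line" | awk 'BEGIN {OFS=" works as "} {print $1 $3}')

printf '%s\n' "$with_ofs" "$concatenated"
```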


AWK Actions 

The awk tool follows rules containing pattern-action pairs. Actions consist of statements enclosed in curly braces {} which contain expressions, control statements, compound statements, input and output statements, and deletion statements. Those statements are described in the sections above. 

Create an awk script using the following syntax: 

awk '{action}'  

For example: 

awk '{print "How to use the awk command"}' 

An example of an awk action. 

This simple command has no input file, so awk reads from standard input and prints the specified string once for each line you type. Terminate the program using Ctrl+D. 


How to Use the AWK Command - Examples 

Apart from manipulating data and producing formatted outputs, awk has other uses as it is a scripting language and not only a text processing command. This section explains alternative use cases for awk. 

  • Calculations. The awk command allows you to perform arithmetic calculations. For example: 

df | awk '/\/dev\/loop/ {print $1"\t"$2 + $3}' 

An example of using awk for performing arithmetic calculations. 

In this example, we pipe the output of the df command into awk and use the information in the report to add up the total size and the used space of each mounted filesystem whose name contains /dev/loop. 

The produced report shows, for each /dev/loop filesystem, the sum of columns two and three of the df output. 
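
Calculations can also span several lines by accumulating a value and printing it in an END action. Since real df output differs from machine to machine, this sketch uses an invented df-style listing:

```shell
# Made-up df-style report; real column values will differ on your system.
report=$(mktemp)
printf '%s\n' \
  'Filesystem 1K-blocks Used Available Use% Mounted on' \
  '/dev/loop0 100 100 0 100% /snap/a' \
  '/dev/sda1 5000 2000 3000 40% /' \
  '/dev/loop1 200 200 0 100% /snap/b' > "$report"

# Add up column three (Used) for the loop devices only.
result=$(awk '/\/dev\/loop/ {sum += $3} END {print "Used by loop devices:", sum}' "$report")
printf '%s\n' "$result"
rm -f "$report"
```

On a real system, the same awk program can read from df directly by replacing the sample file with a pipe.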

  • Filtering. The awk command allows you to filter the output by limiting the length of the lines. For example: 

awk 'length($0) > 8' /etc/shells 

An example of using awk to filter a command output. 


In this example, we ran the /etc/shells system file through awk and filtered the output to contain only the lines containing more than 8 characters. 

  • Monitoring. Check if a certain process is running in Linux by piping the output of the ps command into awk. For example: 

ps -ef | awk '{ if($NF == "clipboard") print $0}' 

An example of using awk to check running processes. 


The output lists every running process whose last field equals the specified string. 

  • Counting. You can use awk to count the number of characters in a line and get the number printed in the result. For example: 

awk '{ print "The number of characters in line", NR, "=", length($0) }' employees.txt 

An example of using awk to count character number in each line of a file. 


Conclusion 

After reading this blog post, you should know what the awk command is and how to use it effectively for various use cases.

Awk is more than a text processing command. It is a full scripting language with many uses, and it is valuable knowledge for every Linux user. 

 

