Why is string processing used?
String processing is commonly used to filter and extract specific parts of the output of commands such as ifconfig, ip a, etc.
-
head Command
The head command displays the first few lines of a file. By default, it shows the first 10 lines.
head 1.txt # Displays the first 10 lines of 1.txt
head -n 5 1.txt # Displays the first 5 lines of 1.txt
head -n 3 1.txt # Displays the first 3 lines of 1.txt
head -c 50 1.txt # Displays the first 50 bytes of 1.txt
You can also display the first few lines from multiple files:
head 1.txt messages /var/log/centos.rep # Displays first 10 lines from all three files
tail Command
The tail command displays the last few lines of a file. By default, it shows the last 10 lines.
tail 1.txt messages passwd centos.rep # Displays the last 10 lines of each file
tail -f /var/log/messages # Displays the last 10 lines of the log file and keeps printing new entries as they are appended (-f needs a file, not a directory)
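Because head and tail read standard input when no file is given, they can be chained to cut a slice out of the middle of a stream. The sketch below is only an illustration: it uses seq to generate 20 numbered lines (an assumption standing in for a real file) and keeps lines 11 to 15.

```shell
# Generate 20 numbered lines as stand-in input (hypothetical data, not a real log).
# head keeps the first 15 lines, then tail keeps the last 5 of those: lines 11 to 15.
middle=$(seq 1 20 | head -n 15 | tail -n 5)
echo "$middle"
```

The same pattern works on a file by giving its name to head instead of piping from seq.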
-
wc Command
The wc command counts the lines, words, and bytes in a file.
wc 1.txt # Output: lines words bytes
wc -l 1.txt # Counts the number of lines
wc -w 1.txt # Counts the number of words
wc -c 1.txt # Counts the number of bytes
-
Counting for Multiple Files
wc * # Counts lines, words, and bytes for all files in the current directory
ls -1 | wc -l # Lists one entry per line and counts the lines, giving the number of entries in the directory
-
Basic Sorting
sort test1.txt # Sorts lines alphabetically
-
Additional Sort Options
-
Sort in numeric order (for numbers)
sort -n test1.txt # Sorts numbers in ascending numeric order (use -h instead for human-readable sizes such as 2K or 1G)
-
Sort in reverse order
sort -r test1.txt # Sorts lines in reverse order
-
Sort randomly
sort -R test1.txt # Sorts lines randomly
-
Remove duplicates
sort test1.txt | uniq -c # Collapses duplicate lines into one and prefixes each line with its count
-
-
Sorting by Specific Columns
sort -k 2 test1.txt # Sorts by the second column
-
Handling Delimited Files
sort -t ',' -k 2 test1.txt # Sorts by the second column of a comma-separated file
-
Sorting by Month
sort -M test1.txt # Sorts by month names
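The difference between plain and numeric sorting, and the common sort | uniq -c counting idiom, can be sketched on a few inline values. The sample data below is made up for illustration; LC_ALL=C is set only so that the plain sort compares bytes the same way on every system.

```shell
# Hypothetical sample data: three numbers, one per line.
numbers=$(printf '10\n9\n100\n')

# Plain sort compares character by character, so "100" comes before "9".
lexical=$(printf '%s\n' "$numbers" | LC_ALL=C sort | tr '\n' ' ')

# sort -n compares by numeric value.
numeric=$(printf '%s\n' "$numbers" | sort -n | tr '\n' ' ')

# Frequency table: sort groups duplicates, uniq -c counts them,
# and sort -rn puts the most frequent line first.
most_common=$(printf 'b\na\nb\nc\nb\na\n' | sort | uniq -c | sort -rn | head -n 1 | awk '{print $1, $2}')

echo "lexical: $lexical"
echo "numeric: $numeric"
echo "most common: $most_common"
```

The final awk call only trims the padding that uniq -c puts in front of each count.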
- Monitor the output of a command periodically
watch -n 5 date # Displays the current date every 5 seconds
watch -n 1 date # Displays the current date every second
-
Basic Search
grep searchword filename # Finds occurrences of 'searchword' in the file
grep "root" /etc/passwd # Finds 'root' in the /etc/passwd file
-
Search in Multiple Files
grep root /etc/passwd /etc/shadow # Finds 'root' in both files
-
Ignore Case (Case-insensitive search)
grep -i root messages # Finds both 'root' and 'Root'
-
Search for Multiple Words
grep -E "(session|root|mounting)" /var/log/messages # Finds lines containing 'session', 'root', or 'mounting'
-
Search for Words in the Same Line
grep "session" /var/log/messages | grep root # Finds lines containing both 'session' and 'root'
-
Exclude a Word (Using -v)
grep "session" /var/log/messages | grep -v root # Finds lines with 'session' but not 'root'
-
Search for a Word at the Beginning or End of a Line
- Word at the beginning of a line:
grep '^root' /var/log/messages # Finds lines that start with 'root'
- Word at the end of a line:
grep 'root$' /var/log/messages # Finds lines that end with 'root' (a pattern like 'root.$' would require one more character after 'root')
-
Search by Date
grep "^2sep" /var/log/messages | grep root # Finds lines that start with '2sep' and also contain 'root'; adjust the prefix to the timestamp format of your log
-
Find Single Character After a Word
grep 'roo.' filename # Finds 'roo' followed by any single character
-
Find Multiple Characters After a Word
grep 'roo..' filename # Finds 'roo' followed by two characters
-
Find Empty Lines
grep '^$' filename # Finds empty lines
-
Exclude Lines Starting with # (e.g., Comments)
grep -v '^#' sshd_config # Prints every line except those starting with #
-
Find Alphanumeric Characters
grep "[[:alnum:]]" filename # Finds lines that contain at least one alphanumeric character
-
Search for a Pattern from Another File
grep -f domain.txt url.txt # Searches for domains from domain.txt in url.txt
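The anchors and options above can be tried safely on a throwaway file. The following is a minimal sketch with invented sample content (it is not taken from a real /etc/passwd or log); grep -c is used so that each pattern reports how many lines it matched.

```shell
# Hypothetical sample file standing in for a config or log file.
sample=$(mktemp)
printf '%s\n' \
    '# comment line' \
    'root:x:0:0:root:/root:/bin/bash' \
    '' \
    'user1:x:1000:1000::/home/user1:/bin/bash' \
    'the last word is root' > "$sample"

starts_with_root=$(grep -c '^root' "$sample")   # lines beginning with root
ends_with_root=$(grep -c 'root$' "$sample")     # lines ending with root
empty_lines=$(grep -c '^$' "$sample")           # empty lines
# Drop comments and empty lines in one pass with an extended regex.
content_lines=$(grep -Ev '^(#|$)' "$sample" | wc -l | tr -d ' ')

echo "$starts_with_root $ends_with_root $empty_lines $content_lines"
rm -f "$sample"
```

Combining -E with -v, as in the last pattern, is a common way to view only the active lines of a configuration file.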
-
Extract Columns from CSV File
cut -d ',' -f 2 my-csv.csv # Extracts the second column (e.g., names)
cut -d ',' -f 1,3 my-csv.csv # Extracts columns 1 and 3
cut -d ',' -f 1-3 my-csv.csv # Extracts columns 1 to 3
-
Replace Commas with Spaces
cat my-csv.csv | tr ',' ' ' # Replaces every comma with a space
-
Find Specific IP Address
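ifconfig | grep 'inet ' | cut -d ' ' -f 9 # Extracts the IPv4 address when it is the 9th space-separated field; the index depends on how many leading spaces your ifconfig prints, so adjust -f as needed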
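Counting single spaces with cut is fragile, because every leading space creates an empty field. A possibly more robust variant is to let awk split on runs of whitespace. The sketch below runs on a hard-coded sample that only imitates ifconfig output (the interface names and addresses are assumptions), so it does not depend on the local network setup.

```shell
# Hypothetical ifconfig-style output; real output varies between systems.
sample_output='eth0: mtu 1500
        inet 192.168.1.40  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::1  prefixlen 64
lo: mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0'

# awk splits on runs of whitespace, so the address is field 2
# no matter how many spaces the line is indented by.
addresses=$(printf '%s\n' "$sample_output" | awk '$1 == "inet" {print $2}')
echo "$addresses"
```

Comparing the first field with "inet" exactly also skips the inet6 lines.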
-
Combine Files Column-wise
paste name.txt surname.txt # Joins columns from two files (tab-separated by default)
paste -d ' ' name.txt surname.txt # Joins with space delimiter
-
Save Output to a New File
paste -d ' ' name.txt surname.txt > fullname.txt # Saves the output in fullname.txt
AWK: An Advanced Text Processing Tool
AWK is a powerful tool for processing and analyzing text. Unlike cut, sort, grep, and cat, which are used mainly for text extraction and viewing, AWK can also transform fields and lines as it reads them. It writes the result to standard output and does not change the input file itself. AWK lets you perform operations on file contents based on patterns and conditions.
An AWK command typically follows this syntax:
awk 'pattern {action}' filename
- Pattern: The condition that determines when the action is performed.
- Action: The task that AWK performs if the pattern matches.
For example:
awk '{print $0}' filename
This command prints every line of the file because $0 refers to the entire line.
AWK allows you to specify a delimiter for field separation. By default, fields are separated by runs of spaces or tabs, but you can set a custom separator using the -F option.
For instance:
awk -F: '{print $1}' filename
Here, -F: sets the colon (:) as the field separator. $1 refers to the first field of each line.
To print multiple fields, you can list them individually:
awk -F: '{print $1, $2, $3}' filename
You cannot use a range like $1-$3 directly, because AWK reads that expression as a subtraction. Instead, specify each field individually (e.g., $1, $2, $3).
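When many consecutive fields are needed, listing them by hand gets tedious; one common workaround is a for loop over the field numbers. The sketch below assumes a made-up passwd-style line as input and prints fields 1 to 3.

```shell
# Hypothetical passwd-style line used as input.
line='root:x:0:0:root:/root:/bin/bash'

# Loop over fields 1..3; print a space between fields and a newline after the last one.
first_three=$(printf '%s\n' "$line" | awk -F: '{
    for (i = 1; i <= 3; i++)
        printf "%s%s", $i, (i == 3 ? "\n" : " ")
}')
echo "$first_three"
```

Changing the loop bounds selects a different range of fields.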
AWK also lets you filter content based on patterns. For example, to extract the second field of each line from the ifconfig command output:
ifconfig | awk '{print $2}'
You can also apply conditions to select lines that match a certain pattern. For example, to print the second field of lines containing the string inet:
ifconfig | awk '/inet /{print $2}'
AWK allows you to add custom text at specific points in the command execution using the BEGIN and END blocks. These blocks define actions that happen before processing the data (BEGIN) and after processing is complete (END). They are useful for formatting output or adding headers/footers to the results.
If you want to print custom text before displaying the actual data, you can use the BEGIN block:
command → ifconfig | awk 'BEGIN{print "== IP Address =="} /inet /{print $2}'
Output:
== IP Address ==
192.168.1.40
127.0.0.1
In this command:
- The BEGIN block prints "== IP Address ==" before any data is processed.
- The /inet /{print $2} part prints the second field (the IP address) of the lines containing inet.
You can also use the END block to add custom text after the data processing is done. For example:
command → ifconfig | awk 'BEGIN{print "== IP Address =="} /inet /{print $2} END{print "====="}'
Output:
== IP Address ==
192.168.1.40
127.0.0.1
=====
In this command:
- The BEGIN block prints the header "== IP Address ==".
- The /inet /{print $2} part prints the IP addresses.
- The END block prints "=====" after the data has been processed.
You can use AWK with the echo command to add a custom message at the beginning and end of the output:
command → echo "one two three four" | awk 'BEGIN {print "=== Start ==="} {print $0} END {print "-- Stop --"}'
Output:
=== Start ===
one two three four
-- Stop --
In this command:
- The BEGIN block prints "=== Start ===".
- The {print $0} part prints the entire input line ($0 refers to the whole line).
- The END block prints "-- Stop --" after the input has been processed.
The BEGIN and END blocks in AWK are useful for:
- Printing a header before the main output (BEGIN block).
- Adding a footer or final message after the data processing (END block).
In AWK, you can include custom delimiters when printing output. This allows you to control the format and how the fields are separated in the printed result. You can add separators like spaces, commas, slashes, or any other character to customize the output. This is helpful for organizing the printed fields or making the output more readable.
Here are a few examples:
You can print two fields with a dash between them:
command → echo "one two three four" | awk 'BEGIN {print "=== start ==="} {print $1, "-", $2} END {print "-- stop --"}'
Output:
=== start ===
one - two
-- stop --
- In this example:
- The BEGIN block prints "=== start ===".
- The {print $1, "-", $2} part prints the first and second fields with a dash (-) between them. Each comma in print inserts the output field separator, which is a single space by default.
- The END block prints "-- stop --" after the data.
You can print two fields with a slash between them:
command → echo "one two three four" | awk 'BEGIN {print "=== start ==="} {print $1, "/", $2} END {print "-- stop --"}'
Output:
=== start ===
one / two
-- stop --
- Here, the fields are separated by a slash (/).
You can add a newline between two fields, but note that /n (forward slash) is not an escape sequence and is printed literally:
command → echo "one two three four" | awk 'BEGIN {print "=== start ==="} {print $1, "/n", $2} END {print "-- stop --"}'
Output:
=== start ===
one /n two
-- stop --
- In this case, /n does not produce a new line. If you want a true newline, use \n (backslash) in the print statement:
command → echo "one two three four" | awk 'BEGIN {print "=== start ==="} {print $1 "\n" $2} END {print "-- stop --"}'
Output:
=== start ===
one
two
-- stop --
- Here, \n correctly creates a new line between the fields. The strings are concatenated without commas so that no extra spaces are added around the newline.
You can print fields with an underscore between them:
command → echo "one two three four" | awk 'BEGIN {print "=== start ==="} {print $1, "_", $2} END {print "-- stop --"}'
Output:
=== start ===
one _ two
-- stop --
- In this example, the fields are separated by an underscore (_).
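Instead of writing the separator between every pair of fields, the separator can be set once through AWK's built-in OFS (output field separator) variable. The following is a small sketch on the same sample input as above; the variable names dash and slash are only illustrative.

```shell
# OFS is awk's output field separator: print puts it between comma-separated arguments.
dash=$(echo "one two three four" | awk 'BEGIN {OFS=" - "} {print $1, $2}')

# OFS can also be set from the command line with -v.
slash=$(echo "one two three four" | awk -v OFS="/" '{print $1, $2, $3}')

echo "$dash"
echo "$slash"
```

This keeps the print statement short when many fields share the same separator.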
AWK is a powerful tool for text processing, allowing you to extract and manipulate specific fields from text files or commands. Below are some useful examples that show how you can work with AWK to process text and fields in different ways.
To print specific fields from an input, you can use AWK to reference fields by number.
-
Print the first field:
command → echo "armour infosec" | awk '{print $1}'
Output:
armour
- This prints the first field, which is "armour".
-
Print the second field:
command → echo "armour infosec" | awk '{print $2}'
Output:
infosec
- This prints the second field, which is "infosec".
You can update a field's value before printing it using AWK.
- Update the first field:
command → echo "armour infosec" | awk '{$1="ARMOUR"; print $1}'
Output:
ARMOUR
- Here, the first field is updated to "ARMOUR" and printed.
You can filter specific values by using conditional statements in AWK.
-
Print the second field where the first field matches "inet" (e.g., from ifconfig output):
command → ifconfig | awk '$1=="inet" {print $2}'
Output:
192.168.1.40
127.0.0.1
- This prints the second field (IP addresses) where the first field is "inet".
-
Print the second field of lines where the first field does not match "inet":
command → ifconfig | awk '$1!="inet" {print $2}'
Output:
(Second field of every line whose first field is not "inet")
AWK allows you to specify field separators to process files with different delimiters. For example, you can use a colon (:) separator to work with /etc/passwd.
-
Print lines where the first field is "root":
command → cat /etc/passwd | awk -F: '$1=="root" {print $0}'
Output:
root:x:0:0:root:/root:/bin/bash
- This command prints lines where the first field is "root".
-
Print lines where the first field is not "root":
command → cat /etc/passwd | awk -F: '$1!="root" {print $0}'
Output:
(Lines where the first field is not "root")
AWK can also handle numeric comparisons for fields.
-
Print lines where the third field is 0 (e.g., from /etc/passwd):
command → cat /etc/passwd | awk -F: '$3==0 {print $0}'
Output:
root:x:0:0:root:/root:/bin/bash
-
Print lines where the third field is greater than or equal to "1000":
command → cat /etc/passwd | awk -F: '$3>=1000 {print $0}'
Output:
(Lines with UID >= 1000)
-
Print lines where the third field is greater than "0":
command → cat /etc/passwd | awk -F: '$3>0 {print $0}'
Output:
(Lines with UID > 0)
AWK can process files line by line and can output data in a specific format using BEGIN and END blocks. Longer programs can be stored in a file and loaded with the -f option.
- Using BEGIN and END to print a header and footer with a file:
command → vim test.txt
BEGIN { print "passwd file" }
{ print $1, "home at", $6 }
END { print "END passwd file" }
command → awk -F: -f test.txt /etc/passwd
Output:
passwd file
root home at /root
user1 home at /home/user1
END passwd file
AWK supports arithmetic operations such as addition, subtraction, multiplication, and division.
- Print the MAC address from ifconfig using AWK (select lines whose first field is "ether"):
command → ifconfig | awk '$1=="ether" {print $2}'
Output:
(MAC address)
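As an illustration of the arithmetic support mentioned above, the sketch below sums a numeric column and computes its average in an END block. The passwd-style input is invented for the example; NR is AWK's built-in count of records read so far.

```shell
# Hypothetical passwd-style data with numeric UIDs in field 3.
data='root:x:0:0:root:/root:/bin/bash
user1:x:1000:1000::/home/user1:/bin/bash
user2:x:1001:1001::/home/user2:/bin/bash'

# Add up field 3 on every line, then report the total and the average in END.
summary=$(printf '%s\n' "$data" | awk -F: '{sum += $3} END {print sum, sum / NR}')
echo "$summary"
```

Here the total is 2001 over three lines, so the average is 667.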
AWK automatically counts the number of fields in each line and stores it in the special variable NF.
-
Print the number of fields in a line:
command → echo "one two three four" | awk '{print NF}'
Output:
4
- This prints the number of fields (4 in this case).
-
Print the last field:
command → echo "one two three four" | awk '{print $NF}'
Output:
four
- $NF refers to the last field.
-
Print the second-last field:
command → echo "one two three four" | awk '{print $(NF-1)}'
Output:
three
-
Print both the last and second-last fields:
command → echo "one two three four" | awk '{print $NF, $(NF-1)}'
Output:
four three
You can use NF to analyze the /etc/passwd file or other files. Because /etc/passwd is colon-separated, pass -F: so that the fields are split on colons rather than on whitespace.
-
Print the number of fields in each line of /etc/passwd:
command → cat /etc/passwd | awk -F: '{print NF}'
-
Print the last field of each line in /etc/passwd:
command → cat /etc/passwd | awk -F: '{print $NF}'
-
Print both the last and second-last fields in /etc/passwd:
command → cat /etc/passwd | awk -F: '{print $NF, $(NF-1)}'
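The value of NF depends entirely on the field separator in effect. The sketch below shows this on one invented passwd-style line (the user name and paths are assumptions), first with the default whitespace splitting and then with -F:.

```shell
# Hypothetical passwd-style line; the comment field (field 5) contains a space.
line='user1:x:1000:1000:Test User:/home/user1:/bin/bash'

# Default separator: awk splits on whitespace, so it sees only 2 fields.
by_space=$(printf '%s\n' "$line" | awk '{print NF}')

# With -F: the line splits into the 7 colon-separated passwd fields.
by_colon=$(printf '%s\n' "$line" | awk -F: '{print NF}')
login_shell=$(printf '%s\n' "$line" | awk -F: '{print $NF}')

echo "$by_space $by_colon $login_shell"
```

With the colon separator, $NF is the login shell, which is the last field of a passwd entry.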
AWK is an extremely flexible tool for text processing. It allows you to:
- Access specific fields.
- Update or filter data based on conditions.
- Use mathematical operations.
- Process files with specific delimiters.
- Count fields with NF and manipulate data accordingly.
sed OPTIONS [SCRIPT] [INPUT_FILE]
SCRIPT:
[addr]X[options]
X is a single-letter sed command.
[addr] can be a single line number, a regular expression, or a range of lines. If [addr] is specified, the command X is executed only on the matched lines.
Additional [options] are used by some sed commands.
sed '20,25d' input.txt > output.txt
This example deletes lines 20 to 25 of the input.
20,25 is an address range.
d is the delete command.
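The address-plus-command form can be tried on generated input. This minimal sketch uses seq to produce six numbered lines (a stand-in for input.txt) and contrasts deleting a range with printing only that range.

```shell
# seq prints the numbers 1 to 6, one per line (stand-in for input.txt).
# 2,4d deletes lines 2 through 4; every other line is printed unchanged.
kept=$(seq 1 6 | sed '2,4d' | tr '\n' ' ')

# With -n, nothing is printed unless p asks for it: only lines 2 through 4 appear.
printed=$(seq 1 6 | sed -n '2,4p' | tr '\n' ' ')

echo "kept: $kept"
echo "printed: $printed"
```

The two commands are complements of each other: one removes the addressed lines, the other keeps only them.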
SED commands
a text append text after a line
d delete the pattern space
i text insert text before a line
p print the pattern space
q [ exit-code ] (quit) Exit sed without processing any more commands or input
s/regexp/replacement/[flags] (substitute) Match the regular expression against the content of the pattern space. If found, replace the matched string with replacement.
Command Line options
-n disable automatic printing; sed only produces output when explicitly told to via the p command.
-e script add the script to the commands to be executed
-r use extended regular expressions rather than basic regular expressions.
sed '1,2p' /etc/passwd
This prints lines 1 and 2 twice and the rest of the file once
sed -n '1,2p' /etc/passwd
This prints only lines 1 and 2; the rest of the file is not printed
sed -n '/^$/p' /etc/passwd
Print only empty lines
sed -n '/^$/!p' /etc/passwd
Print only non-empty lines
sed -n '1,$p' /etc/passwd
Print all lines
sed -n '$p' /etc/passwd
This will only print last line
sed -n '5,8!p' /etc/passwd
Do not print lines 5 to 8
sed '5,$d' /etc/passwd
Do not print lines 5 to last / print lines 1 to 4
sed -n '/root/p' /etc/passwd
Print the lines matching root in the given file
sed -n '/root/,+3p' /etc/passwd
Print each line where root is matched together with the 3 lines that follow it
sed 's/sbin/SBIN/' /etc/passwd
Search for sbin and replace it with SBIN (this replaces only the first match on each line)
sed -n 's/root/ROOT/gp' /etc/passwd
g means global: every match on a line is replaced, not just the first. With -n and the p flag, only the lines where a substitution happened are printed.
sed -n 's/root/ROOT/1p' /etc/passwd
Match root on each line of the given file and replace only the first root with ROOT
sed -n 's/root/ROOT/2p' /etc/passwd
Match root on each line of the given file and replace only the second root with ROOT
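The effect of the substitution flags is easiest to see on a single line that contains the pattern several times. The input string below is invented for the comparison.

```shell
# One line with three occurrences of the pattern.
text='root root root'

first=$(echo "$text" | sed 's/root/ROOT/')    # no flag: first match only
all=$(echo "$text" | sed 's/root/ROOT/g')     # g: every match on the line
second=$(echo "$text" | sed 's/root/ROOT/2')  # 2: only the second match

echo "$first"
echo "$all"
echo "$second"
```

A numeric flag selects one occurrence per line, counted from the left.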
echo "Welcome To Presidential Suite" | sed 's/\(\b[A-Z]\)/\(\1\)/g'
Output will be :
(W)elcome (T)o (P)residential (S)uite
sed '4 s/sbin/SBIN/' file_name.txt
This replaces sbin with SBIN on line number 4
sed G file_name.txt
Adds an empty line after every line in the file.
sed 'G;G;G' file_name.txt
This will add 3 empty lines after every line in the file.
sed '/arnold/a ARNOLD User' file_name.txt
This adds ARNOLD User on a new line after every line containing arnold in the given file
sed '2a ARNOLD User' file_name.txt
This adds ARNOLD User after the second line of the given file
sed '5a ARNOLD User' file_name.txt
This adds ARNOLD User after the fifth line of the given file
sed '5!a ARNOLD User' file_name.txt
This adds ARNOLD User after every line except the fifth line of the given file
sed '1,5a ARNOLD User' file_name.txt
This adds ARNOLD User after each of lines 1 to 5 of the given file
sed '1i ARNOLD User' file_name.txt
This adds ARNOLD User before the first line of the given file
sed '5!i ARNOLD User' file_name.txt
This adds ARNOLD User before every line except the fifth line of the given file
sed '1i--------------------' filename.txt
This inserts a line of dashes before line 1
sed '1,$i--------------------' filename.txt
This inserts a line of dashes before every line
sed '/arnold/d' file_name.txt
d is for delete
Lines containing arnold are not printed in the output
sed '/\tarnold/d' file_name.txt
Do not print lines where arnold is preceded by a TAB character
sed '/[[:space:]]arnold/d' filename.txt
This will NOT print lines where arnold is preceded by a whitespace character such as SPACE or TAB
sed '/[0-9]/d' filename.txt
Do NOT print lines that contain a digit in the given range
sed -n '/[[:digit:]]/p' filename.txt
This prints only the lines that contain digits
sed '/[[:digit:]]/d' filename.txt
This will NOT print lines that contain digits
sed '/[a-zA-Z]/d' filename.txt
Delete all lines containing a letter, uppercase or lowercase
sed '/[A-Z]/d' filename.txt
Delete all lines containing an uppercase letter (A to Z)
sed '/\//d' filename.txt
Delete all lines containing a slash (/)
sed '/\thr/d' filename.txt
Delete all lines containing hr preceded by a TAB character
sed '/arnold/c ARNOLD User' filename.txt
This replaces every line that contains arnold with the line ARNOLD User (c changes the whole line, not just the matched word)
sed '1c ARNOLD' filename.txt
This replaces line 1 with ARNOLD
sed '1,5c ARNOLD' filename.txt
This replaces lines 1 to 5 with ARNOLD (GNU sed replaces the whole range with a single ARNOLD line)
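The difference between the c command and the s command is worth checking on a line that contains more than the matched word. The sketch below assumes GNU sed, which accepts the one-line form of c, and uses made-up input lines.

```shell
# Assumes GNU sed: the one-line form "c text" is a GNU extension.
input=$(printf 'alice\narnold smith\nbob\n')

# c replaces the entire matching line with the given text.
changed=$(printf '%s\n' "$input" | sed '/arnold/c ARNOLD User' | tr '\n' ' ')

# s replaces only the matched part and keeps the rest of the line.
substituted=$(printf '%s\n' "$input" | sed 's/arnold/ARNOLD User/' | tr '\n' ' ')

echo "c: $changed"
echo "s: $substituted"
```

With c the word smith disappears along with the rest of the line, while with s it is preserved.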
sed '/arnold/q' filename.txt
This quits sed as soon as a line containing arnold has been found and printed
The exit status of a command that ran successfully is 0. To make sed exit with a status code other than 0, pass the code to q as shown below
sed '/arnold/q2' filename.txt
echo $?
The status code printed will be 2 if arnold was found
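A custom exit code is mainly useful in scripts, where the status can be captured and tested. The following sketch assumes GNU sed (the exit-code argument to q is a GNU extension) and a temporary file with invented content.

```shell
# Assumes GNU sed: the exit-code argument to q is a GNU extension.
sample=$(mktemp)
printf 'alice\narnold\nbob\n' > "$sample"

status=0
# Quit with exit status 2 at the first line containing arnold;
# the lines read so far (including the matching one) are still printed.
output=$(sed '/arnold/q2' "$sample") || status=$?

echo "printed: $(echo "$output" | tr '\n' ' ')"
echo "status: $status"
rm -f "$sample"
```

If no line matches, sed reads to the end of the file and the status stays 0.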
sed '1e date' /etc/passwd
This runs the date command and prints its output before the first line of the /etc/passwd output
sed '$e date' /etc/passwd
This runs the date command and prints its output before the last line of the /etc/passwd output
sed '1e echo -n "Date: "; date' filename.txt
This prints Date: followed by the output of the date command before the first line of the filename.txt output
sed '1,3e id' filename.txt
This prints the output of the id command before each of lines 1 to 3 of the filename.txt output
echo "one five three" | sed 's/five/two/'
This searches for five and replaces it with two
sed 's/arnold/ARNOLD/' filename.txt
This searches for arnold and replaces it with ARNOLD in the given file
echo "Arnold user UID 1000" | sed 's/[[:digit:]]\+/***/'
This replaces the run of digits with three stars ( *** ) ; Output will be :
Arnold user UID ***
echo "Arnold user UID 1000" | sed 's/[[:digit:]]/*****/'
This replaces only the first digit with five stars ( ***** ) ; Output will be :
Arnold user UID *****000
echo "Arnold user UID 1000" | sed 's/[0-9]\+/*****/'
Using a bracket range instead of a character class; Output will be :
Arnold user UID *****
sed 's/[0-9]\+/*****/g' filename.txt
This replaces every run of digits in the file with five stars
sed 's/[[:digit:]]\+/***/g' filename.txt
This replaces every run of digits in the given file with three stars
sed 's/arnold/Arnold & & & &/' filename.txt
This outputs Arnold arnold arnold arnold arnold wherever arnold is found in the given file. Each & in the replacement stands for the matched text, so the number of repetitions depends on the number of & symbols provided.
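A short sketch of how & behaves in the replacement, using an invented input string. It also shows the escaped form for the case where a literal ampersand is wanted.

```shell
# & in the replacement is the text matched by the pattern.
wrapped=$(echo "user arnold" | sed 's/arnold/[&]/')

# Escape it as \& to insert a literal ampersand instead.
literal=$(echo "user arnold" | sed 's/arnold/A\&B/')

echo "$wrapped"
echo "$literal"
```

Using & avoids retyping the pattern when the match only needs to be wrapped or repeated.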
sed -n -e '/arnold/p' -e '/root/p' filename.txt
This prints the lines matching arnold or root (a line matching both is printed twice)
sed -e '/arnold/a +++++++++++++++++++++' -e '/arnold/i ---------------------' filename.txt
This command appends +++++++++++++++++++++ after and inserts --------------------- before every line where the word arnold is found in the given file.
sed -i -e '/arnold/a +++++++++++++++++++++' -e '/arnold/i ---------------------' filename.txt
Using -i saves the changes in the file instead of printing them
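Since -i overwrites the file, it can be safer to keep a backup by attaching a suffix to the option. The sketch below works on a temporary file created with mktemp (so no real file is touched) and assumes a sed that supports -i with an attached suffix, as GNU sed does.

```shell
# Work on a temporary file so no real file is modified (the name is generated by mktemp).
work=$(mktemp)
printf 'alice\narnold\nbob\n' > "$work"

# -i.bak edits the file in place and keeps the original content as "$work.bak".
sed -i.bak 's/arnold/ARNOLD/' "$work"

edited=$(cat "$work" | tr '\n' ' ')
backup=$(cat "$work.bak" | tr '\n' ' ')

echo "edited: $edited"
echo "backup: $backup"
rm -f "$work" "$work.bak"
```

If the result is not what was intended, the .bak file still holds the original content.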