STRING PROCESSING IN LINUX

Why is string processing used? String processing is commonly used to filter and extract specific parts of output from commands such as ifconfig, ip a, etc.

HEAD

head Command
The head command is used to display the first few lines of a file. By default, it shows the first 10 lines.

head 1.txt   # Displays the first 10 lines of 1.txt
head -n 5 1.txt   # Displays the first 5 lines of 1.txt
head -n 3 1.txt   # Displays the first 3 lines of 1.txt
head -c 50 1.txt   # Displays the first 50 bytes of 1.txt

You can also display the first few lines from multiple files:

head 1.txt messages /var/log/centos.rep   # Displays first 10 lines from all three files
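
The options above can be tried safely on a throwaway file. This is a minimal sketch: the sample file is generated with seq, and the variable names (sample, first_three, first_bytes) are placeholders, not part of the original notes.

```shell
# Build a 20-line sample file (one number per line) in a temporary location.
sample=$(mktemp)
seq 1 20 > "$sample"

# First 3 lines of the file.
first_three=$(head -n 3 "$sample")
echo "$first_three"

# First 5 bytes: "1", newline, "2", newline, "3".
first_bytes=$(head -c 5 "$sample")
echo "$first_bytes"

rm -f "$sample"
```

Both commands print the numbers 1 to 3 here, because the first five bytes of the file happen to end exactly after the third number.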

TAIL

tail Command
The tail command is used to display the last few lines of a file. It shows the last 10 lines by default.

tail 1.txt messages passwd centos.rep   # Displays the last 10 lines of each file
tail -f /var/log/messages   # Displays the last 10 lines of the log file and keeps printing new entries as they are appended
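
As a small sketch of how tail combines with head, the following uses a generated 20-line file; the variable names are illustrative only.

```shell
# Build a 20-line sample file (one number per line).
sample=$(mktemp)
seq 1 20 > "$sample"

# Last 3 lines of the file.
last_three=$(tail -n 3 "$sample")
echo "$last_three"

# Lines 4 to 6: take the first 6 lines, then keep the last 3 of those.
middle=$(head -n 6 "$sample" | tail -n 3)
echo "$middle"

# Everything from line 18 onward (note the plus sign).
from_18=$(tail -n +18 "$sample")
echo "$from_18"

rm -f "$sample"
```

The head and tail pipeline is a common way to pull a range of lines out of the middle of a file.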

WC

wc Command
The wc command counts the lines, words, and bytes in a file.

wc 1.txt   # Output: lines words bytes
wc -l 1.txt   # Counts the number of lines
wc -w 1.txt   # Counts the number of words
wc -c 1.txt   # Counts the number of bytes

Counting for Multiple Files

wc *   # Counts lines, words, and bytes for all files
ls -1 | wc -l   # Lists one entry per line and counts the lines, giving the number of files in the directory
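
A short sketch of these counts on known input may help. The directory and file names below are made up for the example; reading from standard input is used so that wc prints only the number.

```shell
# Create a scratch directory with one two-line file and two empty files.
workdir=$(mktemp -d)
printf 'one two\nthree\n' > "$workdir/a.txt"
touch "$workdir/b.txt" "$workdir/c.txt"

# Reading from standard input makes wc print the count without the file name.
line_count=$(wc -l < "$workdir/a.txt")
word_count=$(wc -w < "$workdir/a.txt")
byte_count=$(wc -c < "$workdir/a.txt")

# Count directory entries by piping a one-per-line listing into wc -l.
file_count=$(ls -1 "$workdir" | wc -l)

echo "lines=$line_count words=$word_count bytes=$byte_count files=$file_count"
rm -rf "$workdir"
```

The sample file holds 2 lines, 3 words, and 14 bytes, and the directory holds 3 files.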

SORT

Basic Sorting

sort test1.txt   # Sorts lines alphabetically

Additional Sort Options

Sort in numeric order (for numbers)

sort -n test1.txt   # Sorts numbers in ascending order (-h is the variant for human-readable sizes such as 2K or 1G)

Sort in reverse order

sort -r test1.txt   # Sorts lines in reverse order

Sort randomly

sort -R test1.txt   # Sorts lines randomly

Remove duplicates

sort test1.txt | uniq -c   # Collapses duplicate lines and prefixes each with its count

Sorting by Specific Columns

sort -k 2 test1.txt   # Sorts by the second column

Handling Delimited Files

sort -t ',' -k 2 test1.txt   # Sorts by the second column of a comma-separated file

Sorting by Month

sort -M test1.txt   # Sorts by month names
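
The difference between the default and numeric orderings is easiest to see on a few values. This is an illustrative sketch with inline sample data; the variable names are placeholders.

```shell
# Default order compares character by character, so "10" and "100" sort before "9".
lexical=$(printf '10\n9\n100\n' | sort)
echo "$lexical"

# -n compares by numeric value instead.
numeric=$(printf '10\n9\n100\n' | sort -n)
echo "$numeric"

# Sort comma-separated records by the numeric value in the second column only.
by_age=$(printf 'bob,30\nalice,4\ncarol,25\n' | sort -t ',' -k 2,2n)
echo "$by_age"

# -u keeps a single copy of each line without a separate uniq step.
unique=$(printf 'b\na\nb\n' | sort -u)
echo "$unique"
```

Writing the key as 2,2n limits the comparison to the second column and treats it as a number; a plain -k 2 compares from the second column to the end of the line as text.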

WATCH COMMAND

Monitor the output of a command periodically

watch -n 5 date   # Displays the current date every 5 seconds
watch -n 1 date   # Displays the current date every second

GREP

Basic Search

grep searchword filename   # Finds occurrences of 'searchword' in the file
grep "root" /etc/passwd   # Finds 'root' in the /etc/passwd file

Search in Multiple Files

grep root /etc/passwd /etc/shadow   # Finds 'root' in both files

Ignore Case (Case-insensitive search)

grep -i root messages   # Finds both 'root' and 'Root'

Search for Multiple Words

grep -E "(session|root|mounting)" /var/log/messages   # Finds 'session', 'root', or 'mounting'

Search for Words in the Same Line

grep "session" /var/log/messages | grep root   # Finds lines containing both 'session' and 'root'

Exclude a Word (Using -v)

grep "session" /var/log/messages | grep -v root   # Finds lines with 'session' but not 'root'

Search for a Word at the Beginning or End of a Line

Word at the beginning of a line:

grep "^root" /var/log/messages   # Finds lines where 'root' appears at the start

Word at the end of a line:

grep "root$" /var/log/messages   # Finds lines where 'root' appears at the end

Search by Date

grep "^2sep" /var/log/messages | grep root   # Finds lines that start with '2sep' and contain 'root'; adjust the pattern to the date format your log uses

ADVANCED GREP

Find Single Character After a Word

grep 'roo.' filename   # Finds 'roo' followed by any single character

Find Multiple Characters After a Word

grep 'roo..' filename   # Finds 'roo' followed by two characters

Find Empty Lines

grep '^$' filename   # Finds empty lines

Exclude Lines Starting with # (e.g., Comments)

grep -v '^#' sshd_config   # Excludes lines starting with #

Find Alphanumeric Characters

grep "[[:alnum:]]" filename   # Finds lines that contain at least one alphanumeric character

Search for a Pattern from Another File

grep -f domain.txt url.txt   # Searches url.txt for every pattern listed in domain.txt (one pattern per line)
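
The patterns above can be checked against a small, known log. The sketch below writes a made-up five-line log to a temporary file; the log contents and variable names are assumptions for illustration, and -c (count matching lines) is used to keep the output short.

```shell
# Write a tiny sample log; the last line is intentionally empty.
log=$(mktemp)
cat > "$log" <<'EOF'
Sep  2 10:00:01 host sshd: session opened for user root
Sep  2 10:05:12 host sshd: session closed for user alice
Sep  3 11:00:00 host kernel: mounting filesystem
# a comment line

EOF

# Lines with 'session' but without 'root'.
session_not_root=$(grep "session" "$log" | grep -v root)
echo "$session_not_root"

# Lines that end with 'root'.
ends_with_root=$(grep 'root$' "$log")
echo "$ends_with_root"

# Number of empty lines.
empty_lines=$(grep -c '^$' "$log")
echo "$empty_lines"

# Entries from 2 September that mention root (syslog pads the day with spaces).
sep2_root=$(grep -E '^Sep +2 ' "$log" | grep -c root)
echo "$sep2_root"

rm -f "$log"
```

Anchoring the date pattern with ^ keeps it from matching the same text elsewhere in a message.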

CUT COMMAND

Extract Columns from CSV File

cut -d ',' -f 2 my-csv.csv   # Extracts the second column (e.g., names)
cut -d ',' -f 1,3 my-csv.csv   # Extracts columns 1 and 3
cut -d ',' -f 1-3 my-csv.csv   # Extracts columns 1 to 3

Replace Commas with Spaces

cat my-csv.csv | tr ',' ' '   # Replaces commas with spaces

Find Specific IP Address

ifconfig | grep 'inet ' | cut -d ' ' -f 9   # Extracts the IPv4 address; the field number depends on how many leading spaces ifconfig prints, so adjust -f to your output
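
A brief sketch of cut on predictable input follows. The CSV contents and variable names are invented for the example. The last command shows why field numbers shift when a line starts with spaces: every single space counts as a delimiter.

```shell
# Create a small CSV file with a header row.
csv=$(mktemp)
printf 'id,name,city\n1,alice,Pune\n2,bob,Delhi\n' > "$csv"

# Second column only.
names=$(cut -d ',' -f 2 "$csv")
echo "$names"

# Columns 1 and 3.
id_city=$(cut -d ',' -f 1,3 "$csv")
echo "$id_city"

# Two leading spaces produce two empty fields, so the address is field 4, not 2.
addr=$(printf '  inet 10.0.0.5\n' | cut -d ' ' -f 4)
echo "$addr"

rm -f "$csv"
```

When the amount of leading whitespace varies, awk is usually a better fit than cut, because awk treats a run of spaces as one separator.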

PASTE

Combine Files Column-wise

paste name.txt surname.txt   # Joins columns from two files
paste -d ' ' name.txt surname.txt   # Joins with space delimiter

Save Output to a New File

paste -d ' ' name.txt surname.txt > fullname.txt   # Saves the output in fullname.txt
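
For a quick check of what paste produces, this sketch builds two small input files in a scratch directory. The names in the files are sample data, and the -s example is an addition not covered in the notes above.

```shell
# Two input files with one value per line.
workdir=$(mktemp -d)
printf 'Ada\nAlan\n' > "$workdir/name.txt"
printf 'Lovelace\nTuring\n' > "$workdir/surname.txt"

# Join the files line by line with a space and save the result.
paste -d ' ' "$workdir/name.txt" "$workdir/surname.txt" > "$workdir/fullname.txt"
fullname=$(cat "$workdir/fullname.txt")
echo "$fullname"

# -s joins all lines of a single file into one line instead.
serial=$(paste -s -d ',' "$workdir/name.txt")
echo "$serial"

rm -rf "$workdir"
```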

AWK: An Advanced Text Processing Tool

AWK is a powerful tool used for processing and analyzing text. Unlike cut, sort, grep, and cat, which mainly extract and display text, AWK is a small programming language: it can rewrite fields, evaluate conditions, and perform calculations on each line as it reads the input. AWK writes its results to standard output and does not change the input file itself.

Basic Structure of an AWK Command:

An AWK command typically follows this syntax:

awk 'pattern {action}' filename

Pattern: The condition that determines when the action is performed.
Action: The task that AWK performs if the pattern matches.

For example:

awk '{print $0}' filename

This command will print all lines of the file because $0 refers to the entire line.

Using a Custom Delimiter:

AWK splits each line into fields. By default, any run of spaces or tabs separates fields, but you can set a custom separator using the -F option.

For instance:

awk -F: '{print $1}' filename

Here, -F: sets the colon : as the field separator. $1 refers to the first field of each line.

Specifying Multiple Fields:

To print multiple fields, you can list them individually:

awk -F: '{print $1, $2, $3}' filename

You cannot use a range like $1-$3 directly. Instead, specify each field individually (e.g., $1, $2, $3).

Filtering with AWK:

AWK also lets you filter content based on patterns. For example, to extract the second field of each line from the ifconfig command output:

ifconfig | awk '{print $2}'

You can also apply conditions to select lines that match a certain pattern. For example, to print the second field of lines containing the string inet:

ifconfig | awk '/inet /{print $2}'
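
Because ifconfig output differs between systems, the same pattern and action structure is easier to verify on fixed input. The sketch below uses a made-up passwd-style file; the entries and variable names are assumptions for the example.

```shell
# A small colon-separated sample in the style of /etc/passwd.
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
user1:x:1000:1000:User One:/home/user1:/bin/bash
EOF

# No pattern: the action runs on every line.
all_users=$(awk -F: '{print $1}' "$sample")
echo "$all_users"

# Pattern plus action: only lines ending in "bash" are processed.
bash_users=$(awk -F: '/bash$/ {print $1, $6}' "$sample")
echo "$bash_users"

rm -f "$sample"
```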

AWK with BEGIN and END Blocks

AWK allows you to add custom text at specific points in the command execution using the BEGIN and END blocks. These blocks allow you to define actions that happen before processing the data (BEGIN) and after processing is complete (END). These are useful for formatting output or adding headers/footers to the results.

Example 1: Printing a Header and Data with BEGIN

If you want to print custom text before displaying the actual data, you can use the BEGIN block:

command → ifconfig | awk 'BEGIN{print "== IP Address =="} /inet /{print $2}'

Output:

== IP Address ==
192.168.1.40
127.0.0.1

In this command:

The BEGIN block prints "== IP Address ==" before any data is processed.
The /inet /{print $2} part prints the second field (the IP address) of the lines containing inet.

Example 2: Adding a Footer with END

You can also use the END block to add custom text after the data processing is done. For example:

command → ifconfig | awk 'BEGIN{print "== IP Address =="} /inet /{print $2} END{print "====="}'

Output:

== IP Address ==
192.168.1.40
127.0.0.1
=====

In this command:

The BEGIN block prints the header "== IP Address ==".
The /inet /{print $2} part prints the IP addresses.
The END block prints "=====" after the data has been processed.

Example 3: Custom Text with `echo` Command

You can use AWK with the echo command to add a custom message at the beginning and end of the output:

command → echo "one two three four" | awk 'BEGIN {print "=== Start ==="} {print $0} END {print "-- Stop --"}'

Output:

=== Start ===
one two three four
-- Stop --

In this command:

The BEGIN block prints "=== Start ===".
The {print $0} part prints the entire input line ($0 refers to the whole line).
The END block prints "-- Stop --" after the input has been processed.

The BEGIN and END blocks in AWK are useful for:

Printing a header before the main output (BEGIN block).
Adding a footer or final message after the data processing (END block).
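
Beyond fixed headers and footers, the END block can report something collected while the lines were read. This is a minimal sketch with inline sample data; the count variable and the report name are illustrative.

```shell
# BEGIN prints a header, the main block numbers each line and counts it,
# and END prints the total once all input has been read.
report=$(printf 'alpha\nbeta\ngamma\n' |
  awk 'BEGIN {print "== Items =="} {count++; print NR ": " $0} END {print "Total: " count}')
echo "$report"
```

NR is AWK's built-in record counter, so it already holds the line number inside the main block.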

DELIMITERS

In AWK, you can include custom delimiters when printing output. This allows you to control the format and how the fields are separated in the printed result. You can add separators like spaces, commas, slashes, or any other character to customize the output. This is helpful for organizing the printed fields or making the output more readable.

Here are a few examples:

Example 1: Using a Dash (" - ") Between Fields

You can print two fields with a dash between them:

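command → echo "one two three four" | awk 'BEGIN {print "=== start ==="} {print $1 " - " $2} END {print "-- stop --"}'

Output:

=== start ===
one - two
-- stop --

In this example:
- The BEGIN block prints "=== start ===".
- The {print $1 " - " $2} part joins the first field, the string " - ", and the second field. Writing the items side by side concatenates them; separating them with commas instead would add extra spaces around the dash, because each comma inserts the output field separator (a space by default).
- The END block prints "-- stop --" after the data.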

Example 2: Using a Slash (" / ") Between Fields

You can print two fields with a slash between them:

command → echo "one two three four" | awk 'BEGIN {print "=== start ==="} {print $1 " / " $2} END {print "-- stop --"}'

Output:

=== start ===
one / two
-- stop --

Here, the fields are separated by a slash (/).

Example 3: Using a Newline (" /n ") Between Fields

A newline is written as \n (backslash), not /n (forward slash). With a forward slash, AWK prints the characters literally:

command → echo "one two three four" | awk 'BEGIN {print "=== start ==="} {print $1 " /n " $2} END {print "-- stop --"}'

Output:

=== start ===
one /n two
-- stop --

In this case, /n does not produce a new line. For a true newline, use \n in the print statement:

command → echo "one two three four" | awk 'BEGIN {print "=== start ==="} {print $1 "\n" $2} END {print "-- stop --"}'

Output:

=== start ===
one
two
-- stop --

Here, \n correctly creates a new line between the fields.

Example 4: Using an Underscore (" _ ") Between Fields

You can print fields with an underscore between them:

command → echo "one two three four" | awk 'BEGIN {print "=== start ==="} {print $1 " _ " $2} END {print "-- stop --"}'

Output:

=== start ===
one _ two
-- stop --

In this example, the fields are separated by an underscore (_).
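
The ways of joining fields in a print statement can be compared side by side. This sketch is illustrative; OFS and printf are standard AWK features, while the variable names are placeholders.

```shell
# A comma in print inserts the output field separator (OFS, one space by default).
with_comma=$(echo "one two three four" | awk '{print $1, $2}')
echo "$with_comma"

# Items written side by side are concatenated with nothing in between.
with_concat=$(echo "one two three four" | awk '{print $1 "-" $2}')
echo "$with_concat"

# Setting OFS changes what every comma in print expands to.
with_ofs=$(echo "one two three four" | awk 'BEGIN {OFS=" / "} {print $1, $2, $3}')
echo "$with_ofs"

# printf gives full control over the layout; it adds no newline on its own.
with_printf=$(echo "one two three four" | awk '{printf "%s_%s\n", $1, $2}')
echo "$with_printf"
```

Setting OFS once in a BEGIN block is usually tidier than repeating the same separator string between every pair of fields.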

WORKING with AWK

AWK is a powerful tool for text processing, allowing you to extract and manipulate specific fields from text files or commands. Below are some useful examples that show how you can work with AWK to process text and fields in different ways.

1. Accessing Specific Fields

To print specific fields from an input, you can use AWK to reference fields by number.

Print the first field:
```
command → echo "armour infosec" | awk '{print $1}'
```
Output:
```
armour
```
- This prints the first field, which is "armour".

Print the second field:
```
command → echo "armour infosec" | awk '{print $2}'
```
Output:
```
infosec
```
- This prints the second field, which is "infosec".

2. Updating Values in Fields

You can update a field's value before printing it using AWK.

Update the first field:
```
command → echo "armour infosec" | awk '{$1="ARMOUR"; print $1}'
```
Output:
```
ARMOUR
```
- Here, the first field is updated to "ARMOUR" and printed.

3. Matching Specific Values

You can filter specific values by using conditional statements in AWK.

Print the second field where the first field matches "inet" (e.g., from ifconfig output):
```
command → ifconfig | awk '$1=="inet" {print $2}'
```
Output:
```
192.168.1.40
127.0.0.1
```
- This prints the second field (IP addresses) where the first field is "inet".

Print the second field of lines where the first field is not "inet":

command → ifconfig | awk '$1!="inet" {print $2}'

Output:

(The second field of every line whose first field is not "inet")

4. Using Field Separators

AWK allows you to specify field separators to process files with different delimiters. For example, you can use a colon (:) separator to work with /etc/passwd.

Print lines where the first field is "root":
```
command → cat /etc/passwd | awk -F: '$1=="root" {print $0}'
```
Output:
```
root:x:0:0:root:/root:/bin/bash
```
- This command prints lines where the first field is "root".

Print lines where the first field is not "root":

command → cat /etc/passwd | awk -F: '$1!="root" {print $0}'

Output:

(Lines where the first field is not "root")

5. Working with Numbers and Fields

AWK can also handle numeric comparisons for fields.

Print lines where the third field is "0" (e.g., from /etc/passwd):

command → cat /etc/passwd | awk -F: '$3==0 {print $0}'

Output:

root:x:0:0:root:/root:/bin/bash

Print lines where the third field is greater than or equal to "1000":

command → cat /etc/passwd | awk -F: '$3>=1000 {print $0}'

Output:

(Lines with UID >= 1000)

Print lines where the third field is greater than "0":

command → cat /etc/passwd | awk -F: '$3>0 {print $0}'

Output:

(Lines with UID > 0)

6. Using AWK with Files

AWK can process files line by line and can output data in a specific format using BEGIN and END blocks.

Using BEGIN and END to print a header and footer with a file:

command → vim test.txt
BEGIN { print "passwd file" }
{ print $1, "home at", $6 }
END { print "END passwd file" }

command → awk -F: -f test.txt /etc/passwd

Output:

passwd file
root home at /root
user1 home at /home/user1
END passwd file

7. Arithmetic Operations with AWK

AWK supports arithmetic operations such as addition, subtraction, multiplication, and division.

Print the MAC address from ifconfig using AWK (find lines that start with "ether"):
```
command → ifconfig | awk '$1=="ether" {print $2}'
```
Output:
```
(MAC address)
```
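
To show the arithmetic operators themselves, here is a short sketch on inline sample data. The fruit counts and variable names are made up for illustration.

```shell
# Sum the second column while reading, then print the total and the average.
stats=$(printf 'apples 3\npears 5\nplums 4\n' |
  awk '{sum += $2} END {print "sum=" sum, "avg=" (sum / NR)}')
echo "$stats"

# Per-line arithmetic: product, difference, and remainder of the two fields.
per_line=$(printf '2 3\n10 4\n' | awk '{print $1 * $2, $1 - $2, $1 % $2}')
echo "$per_line"
```

Variables such as sum start out empty and behave as 0 in arithmetic, so no explicit initialisation is needed.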

8. Working with Field Count (`NF`)

AWK automatically counts the number of fields in each line with the special variable NF.

Print the number of fields in a line:
```
command → echo "one two three four" | awk '{print NF}'
```
Output:
```
4
```
- This prints the number of fields (4 in this case).

Print the last field:

command → echo "one two three four" | awk '{print $NF}'

Output:

four

$NF refers to the last field.

Print the second-last field:

command → echo "one two three four" | awk '{print $(NF-1)}'

Output:

three

Print both the last and second-last fields:

command → echo "one two three four" | awk '{print $NF, $(NF-1)}'

Output:

four three

9. Working with `/etc/passwd` File Fields

You can use NF to analyze the /etc/passwd file or other files.

Print the number of fields in each line of /etc/passwd (the -F: option is needed because the fields are colon-separated):
```
command → cat /etc/passwd | awk -F: '{print NF}'
```

Print the last field of each line in /etc/passwd (the login shell):

command → cat /etc/passwd | awk -F: '{print $NF}'

Print both the last and second-last fields in /etc/passwd (the login shell and the home directory):

command → cat /etc/passwd | awk -F: '{print $NF, $(NF-1)}'
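
The effect of the field separator on NF can be seen with a two-line sample in the /etc/passwd format. The entries below are invented for the example, and the variable names are placeholders.

```shell
# Two sample entries; the second has a space in its comment field.
sample=$(mktemp)
printf 'root:x:0:0:root:/root:/bin/bash\nuser1:x:1000:1000:User One:/home/user1:/bin/zsh\n' > "$sample"

# With -F: every entry has 7 fields.
field_counts=$(awk -F: '{print NF}' "$sample")
echo "$field_counts"

# Last field (login shell) and second-last field (home directory).
home_and_shell=$(awk -F: '{print $(NF-1), $NF}' "$sample")
echo "$home_and_shell"

# Without -F: the lines are split on whitespace, so the counts are 1 and 2.
default_counts=$(awk '{print NF}' "$sample")
echo "$default_counts"

rm -f "$sample"
```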

Summary:

AWK is an extremely flexible tool for text processing. It allows you to:

Access specific fields.
Update or filter data based on conditions.
Use mathematical operations.
Process files with specific delimiters.
Count fields with NF and manipulate data accordingly.

SED

sed [OPTIONS] SCRIPT [INPUT_FILE]

SCRIPT:

[addr]X[options]

X is a single-letter sed command.

[addr] can be a single line number, a regular expression, or a range of lines. If [addr] is specified, the command X is executed only on the matched lines.

Additional [options] are used by some sed commands.

sed '20,25d' input.txt > output.txt

This example deletes lines 20 to 25 of the input and writes the remaining lines to output.txt.

20,25 is an address range.

d is the delete command.

SED commands

a text   Append text after a line.

d   Delete the pattern space (the current line).

i text   Insert text before a line.

p   Print the pattern space.

q [exit-code]   (quit) Exit sed without processing any more commands or input.

s/regexp/replacement/[flags]   (substitute) Match the regular expression against the content of the pattern space. If it matches, replace the matched string with replacement.

Command Line Options

-n disable automatic printing; sed only produces output when explicitly told to via the p command.

-e script add the commands in script to the set of commands to run.

-r (or -E) use extended regular expressions rather than basic regular expressions.

sed '1,2p' /etc/passwd

This prints lines 1 and 2 twice and every other line once, because sed prints each line automatically and p prints the matched lines again.

sed -n '1,2p' /etc/passwd

This prints only lines 1 and 2; the rest of the file is not printed.

sed -n '/^$/p' /etc/passwd

Print only empty lines

sed -n '/^$/!p' /etc/passwd

Do not print empty lines

sed -n '1,$p' /etc/passwd

Print all lines

sed -n '$p' /etc/passwd

This will only print last line

sed -n '5,8!p' /etc/passwd

Do not print lines 5 to 8

sed '5,$d' /etc/passwd

Do not print lines 5 to last / print lines 1 to 4
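
The line-address commands above can be verified on a file whose contents are known in advance. This sketch uses a generated 10-line file; the variable names are illustrative.

```shell
# Build a 10-line sample file (one number per line).
sample=$(mktemp)
seq 1 10 > "$sample"

# Only lines 1 and 2.
first_two=$(sed -n '1,2p' "$sample")
echo "$first_two"

# Only the last line.
last_line=$(sed -n '$p' "$sample")
echo "$last_line"

# Everything except lines 5 to 8.
without_5_to_8=$(sed -n '5,8!p' "$sample")
echo "$without_5_to_8"

# Delete from line 5 to the end, leaving lines 1 to 4.
first_four=$(sed '5,$d' "$sample")
echo "$first_four"

rm -f "$sample"
```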

Pattern Matching / Pattern Searching

sed -n '/root/p' /etc/passwd

Matching root in given file

sed -n '/root/,+3p' /etc/passwd

Print each line that matches root together with the 3 lines that follow it

Find and replace

sed 's/sbin/SBIN/' /etc/passwd

Searches for sbin and replaces it with SBIN (this replaces only the first match on each line)

sed -n 's/root/ROOT/gp' /etc/passwd

g means global: every match on a line is replaced, not only the first. With -n and the p flag, only the lines where a replacement was made are printed.

sed -n 's/root/ROOT/1p' /etc/passwd

Matches root on each line and replaces only the first occurrence with ROOT

sed -n 's/root/ROOT/2p' /etc/passwd

Matches root on each line and replaces only the second occurrence with ROOT

echo "Welcome To Presidential Suite" | sed 's/\(\b[A-Z]\)/(\1)/g'

Output will be : (W)elcome (T)o (P)residential (S)uite

sed '4 s/sbin/SBIN/' file_name.txt

This replaces sbin with SBIN on line 4 only

sed G file_name.txt

Adds an empty line after every line in the file.

sed 'G;G;G' file_name.txt

This will add 3 empty lines after every line in the file.

Append

sed '/arnold/a ARNOLD User' file_name.txt

This adds the line ARNOLD User after every line that contains arnold

sed '2a ARNOLD User' file_name.txt

This adds ARNOLD User after the second line of the given file

sed '5a ARNOLD User' file_name.txt

This adds ARNOLD User after the fifth line of the given file

sed '5!a ARNOLD User' file_name.txt

This adds ARNOLD User after every line except the fifth line of the given file

sed '1,5a ARNOLD User' file_name.txt

This adds ARNOLD User after each of lines 1 to 5 of the given file

Prepend

sed '1i ARNOLD User' file_name.txt

This adds ARNOLD User before the first line of the given file

sed '5!i ARNOLD User' file_name.txt

This adds ARNOLD User before every line except the fifth line of the given file

sed '1i--------------------' filename.txt

This inserts a separator line of dashes before line 1

sed '1,$i--------------------' filename.txt

This inserts a separator line of dashes before every line

sed '/arnold/d' file_name.txt

d is for delete. This removes every line containing arnold from the output

sed '/\tarnold/d' file_name.txt

This removes lines in which arnold is preceded by a tab character

Using character classes

sed '/[[:space:]]arnold/d' filename.txt

This deletes lines in which arnold is preceded by a whitespace character such as a space or a tab.

sed '/[0-9]/d' filename.txt

Deletes lines that contain any digit in the given range.

sed -n '/[[:digit:]]/p' filename.txt

Prints only the lines that contain digits (without -n, every line would be printed and the matching lines would appear twice)

sed '/[[:digit:]]/d' filename.txt

Deletes the lines that contain digits

sed '/[a-zA-Z]/d' filename.txt

Deletes all lines containing any letter, uppercase or lowercase

sed '/[A-Z]/d' filename.txt

Deletes all lines containing an uppercase letter

sed '/\//d' filename.txt

Deletes all lines containing a slash (/)

sed '/\thr/d' filename.txt

Deletes all lines containing hr preceded by a tab character

Change or replace

sed '/arnold/c ARNOLD User' filename.txt

This replaces every line that contains arnold with the line ARNOLD User

sed '1c ARNOLD' filename.txt

This replaces the whole of line 1 with ARNOLD

sed '1,5c ARNOLD' filename.txt

This replaces lines 1 to 5 with ARNOLD (GNU sed replaces the whole range with a single line)
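
The append, insert, and change commands all work on whole lines, which a three-line sample makes visible. This sketch assumes GNU sed, where the text may follow the command on the same line; the sample input and variable names are illustrative.

```shell
# Three-line sample input.
input=$(printf 'alpha\narnold\nomega\n')

# a: add a new line after each matching line.
appended=$(echo "$input" | sed '/arnold/a ARNOLD User')
echo "$appended"

# i: add a new line before line 1.
inserted=$(echo "$input" | sed '1i --------')
echo "$inserted"

# c: replace the whole matching line, not only the matched word.
changed=$(echo "$input" | sed '/arnold/c ARNOLD User')
echo "$changed"
```

To change only the matched word and keep the rest of the line, use the s command instead of c.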

Quit or exit file

sed '/arnold/q' filename.txt

This prints the file up to and including the first line that contains arnold, then quits

Exit status code

A command that runs successfully exits with status code 0. To make sed exit with a different status code, pass the code to the q command:

sed '/arnold/q2' filename.txt
echo $?

If arnold is found, the status code printed will be 2

Perform the shell command

sed '1e date' /etc/passwd

This runs the date command and prints its output before the first line of /etc/passwd (e is a GNU sed command)

sed '$e date' /etc/passwd

This runs the date command and prints its output before the last line of /etc/passwd

sed '1e echo -n "Date: "; date' filename.txt

This prints Date: followed by the output of the date command before the first line of filename.txt

sed '1,3e id' filename.txt

This prints the output of the id command before each of lines 1 to 3 of filename.txt

Substitute

echo "one five three" | sed 's/five/two/'

This will search for five and replace it with two

sed 's/arnold/ARNOLD/' filename.txt

This replaces the first arnold on each line of the given file with ARNOLD

echo "Arnold user UID 1000" | sed 's/[[:digit:]]\+/***/'

This replaces the run of digits with three stars ( *** ) ; Output will be : Arnold user UID ***

echo "Arnold user UID 1000" | sed 's/[[:digit:]]/*****/'

This replaces only the first digit with five stars ( ***** ) ; Output will be : Arnold user UID *****000

echo "Arnold user UID 1000" | sed 's/[0-9]\+/*****/'

Using a bracket range instead of a character class; Output will be : Arnold user UID *****

sed 's/[0-9]\+/*****/g' filename.txt

This replaces every run of digits in the file with five stars

sed 's/[[:digit:]]\+/***/g' filename.txt

This replaces every run of digits in the given file with three stars

sed 's/arnold/Arnold & & & &/' filename.txt

In the replacement, & stands for the matched text, so the first arnold on each line becomes Arnold arnold arnold arnold arnold (one copy per & symbol)
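
The substitution flags are easier to compare when they run on the same input. This is a small sketch with inline sample strings; the variable names are placeholders, and [0-9][0-9]* is used as a portable way of writing "one or more digits".

```shell
# No flag: only the first match on the line is replaced.
first_only=$(echo "root and root again" | sed 's/root/ROOT/')
echo "$first_only"

# g flag: every match on the line is replaced.
all_matches=$(echo "root and root again" | sed 's/root/ROOT/g')
echo "$all_matches"

# Numeric flag: only the second match is replaced.
second_only=$(echo "root and root again" | sed 's/root/ROOT/2')
echo "$second_only"

# Replace a run of digits with three stars.
masked=$(echo "Arnold user UID 1000" | sed 's/[0-9][0-9]*/***/')
echo "$masked"

# & in the replacement stands for the text that was matched.
wrapped=$(echo "arnold" | sed 's/arnold/[&]/')
echo "$wrapped"
```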

Multiple sed commands [ -e ]

sed -n -e '/arnold/p' -e '/root/p' filename.txt

This prints the lines that match arnold or root; a line that matches both is printed twice.

sed -e '/arnold/a +++++++++++++++++++++' -e '/arnold/i ---------------------' filename.txt

This command appends +++++++++++++++++++++ after and prepends --------------------- before every line in which the word arnold is found in the given file.

Saving the changes [ -i ]

sed -i -e '/arnold/a +++++++++++++++++++++' -e '/arnold/i ---------------------' filename.txt

Using -i writes the changes back to the file instead of printing them
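
Since -i overwrites the file, it can be safer to keep a copy of the original. The sketch below attaches a suffix to -i so that sed saves a backup first; the .bak suffix and the temporary file are choices made for this example.

```shell
# Create a scratch file to edit.
target=$(mktemp)
printf 'alpha\narnold\nomega\n' > "$target"

# Edit in place; the untouched original is saved as "$target.bak".
sed -i.bak 's/arnold/ARNOLD/' "$target"

edited=$(cat "$target")
original=$(cat "$target.bak")
echo "$edited"
echo "$original"

rm -f "$target" "$target.bak"
```

Running the command without -i first is a simple way to preview the result before changing the file.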
