Skip to content

Latest commit

 

History

History
1144 lines (819 loc) · 26.2 KB

String-Processing.md

File metadata and controls

1144 lines (819 loc) · 26.2 KB

STRING PROCESSING IN LINUX

Why is string processing used? String processing is commonly used to filter and extract specific parts of output from commands such as ifconfig, ip a, etc.


HEAD

  1. head Command
    The head command is used to display the first few lines of a file. By default, it shows the first 10 lines.

    head 1.txt   # Displays the first 10 lines of 1.txt
    head -n 5 1.txt   # Displays the first 5 lines of 1.txt
    head -n 3 1.txt   # Displays the first 3 lines of 1.txt
    head -c 50 1.txt   # Displays the first 50 bytes of 1.txt

    You can also display the first few lines from multiple files:

    head 1.txt messages /var/log/centos.rep   # Displays first 10 lines from all three files

TAIL

  1. tail Command
    The tail command is used to display the last few lines of a file. It shows the last 10 lines by default.
    tail 1.txt messages passwd centos.rep   # Displays the last 10 lines of each file
    tail -f /var/log   # Displays the last 10 lines of log files and updates when new entries appear



WC

  1. wc Command
    The wc command counts the lines, words, and bytes in a file.

    wc 1.txt   # Output: lines words bytes
    wc -l 1.txt   # Counts the number of lines
    wc -w 1.txt   # Counts the number of words
    wc -c 1.txt   # Counts the number of bytes
  2. Counting for Multiple Files

    wc *   # Counts lines, words, and bytes for all files
    ls -1   # Lists files in the directory
    wc -l   # Counts the number of files



SORT

  1. Basic Sorting

    sort test1.txt   # Sorts lines alphabetically
  2. Additional Sort Options

    • Sort in numeric order (for numbers)

      sort -h test1.txt   # Sorts numbers in ascending order
    • Sort in reverse order

      sort -r test1.txt   # Sorts lines in reverse order
    • Sort randomly

      sort -R test1.txt   # Sorts lines randomly
    • Remove duplicates

      sort test1.txt | uniq -c   # Removes duplicates and shows their count
  3. Sorting by Specific Columns

    sort -k 2 test1.txt   # Sorts by the second column
  4. Handling Delimited Files

    sort -t ',' -k 2 test1.txt   # Sorts by the second column of a comma-separated file
  5. Sorting by Month

    sort -M test1.txt   # Sorts by month names



WATCH COMMAND

  • Monitor the output of a command periodically
    watch -n 5 date   # Displays the current date every 5 seconds
    watch -n 1 date   # Displays the current date every second



GREP

  1. Basic Search

    grep searchword filename   # Finds occurrences of 'searchword' in the file
    grep "root" /etc/passwd   # Finds 'root' in the /etc/passwd file
  2. Search in Multiple Files

    grep root /etc/passwd /etc/shadow   # Finds 'root' in both files
  3. Ignore Case (Case-insensitive search)

    grep -i root messages   # Finds both 'root' and 'Root'
  4. Search for Multiple Words

    grep -E "(session|root|mounting)" var/log/messages   # Finds 'session', 'root', or 'mounting'
  5. Search for Words in the Same Line

    grep "session" var/log/messages | grep root   # Finds lines containing both 'session' and 'root'
  6. Exclude a Word (Using -v)

    grep "session" var/log/messages | grep -v root   # Finds lines with 'session' but not 'root'
  7. Search for a Word at the Beginning or End of a Line

    • Word at the beginning of a line:
      grep "^root" var/log/messages   # Finds lines where 'root' appears at the start
    • Word at the end of a line:
      grep "root.$" var/log/messages   # Finds lines where 'root' appears at the end
  8. Search by Date

    grep "^2sep" var/log/messages | grep root   # Finds logs for 'root' on 2nd September

ADVANCED GREP

  1. Find Single Character After a Word

    grep 'roo.' filename   # Finds 'roo' followed by any single character
  2. Find Multiple Characters After a Word

    grep 'roo..' filename   # Finds 'roo' followed by two characters
  3. Find Empty Lines

    grep '^$' filename   # Finds empty lines
  4. Exclude Lines Starting with # (e.g., Comments)

    grep -v '^#' ssd_config   # Excludes lines starting with #
  5. Find Alphanumeric Characters

    grep "[[:alnum:]]" filename   # Finds all alphanumeric characters
  6. Search for a Pattern from Another File

    grep -f domain.txt url.txt   # Searches for domains from domain.txt in url.txt



CUT COMMAND

  1. Extract Columns from CSV File

    cut -d ',' -f 2 my-csv.csv   # Extracts the second column (e.g., names)
    cut -d ',' -f 1,3 my-csv.csv   # Extracts columns 1 and 3
    cut -d ',' -f 1-3 my-csv.csv   # Extracts columns 1 to 3
  2. Replace Commas with Spaces

    cut my-csv.csv | tr ',' ' '   # Replaces commas with spaces
  3. Find Specific IP Address

    ifconfig | grep 'inet' | cut -d ' ' -f 9   # Extracts the IPv4 address



PASTE

  1. Combine Files Column-wise

    paste name.txt surname.txt   # Joins columns from two files
    paste -d ' ' name.txt surname.txt   # Joins with space delimiter
  2. Save Output to a New File

    paste -d ' ' name.txt surname.txt > fullname.txt   # Saves the output in fullname.txt



AWK

: An Advanced Text Processing Tool

AWK is a powerful tool used for processing and analyzing text. It can modify file content, unlike commands such as cut, sort, grep, and cat, which are used mainly for text extraction and viewing. AWK allows you to perform operations on file contents based on patterns and conditions.

Basic Structure of an AWK Command:

An AWK command typically follows this syntax:

awk 'pattern {action}' filename
  • Pattern: The condition that determines when the action is performed.
  • Action: The task that AWK performs if the pattern matches.

For example:

awk '{print $0}' filename

This command will print all lines of the file because $0 refers to the entire line.

Using a Custom Delimiter:

AWK allows you to specify a delimiter for field separation in a file. The default delimiter is a space or tab, but you can set a custom one using the -F option.

For instance:

awk -F: '{print $1}' filename

Here, -F: sets the colon : as the field separator. $1 refers to the first field of each line.

Specifying Multiple Fields:

To print multiple fields, you can list them individually:

awk -F: '{print $1, $2, $3}' filename

You cannot use a range like $1-$3 directly. Instead, specify each field individually (e.g., $1, $2, $3).

Filtering with AWK:

AWK also lets you filter content based on patterns. For example, to extract the second field of each line from the ifconfig command output:

ifconfig | awk '{print $2}'

You can also apply conditions to select lines that match a certain pattern. For example, to print the second field of lines containing the string inet:

ifconfig | awk '/inet /{print $2}'

AWK with BEGIN and END Blocks

AWK allows you to add custom text at specific points in the command execution using the BEGIN and END blocks. These blocks allow you to define actions that happen before processing the data (BEGIN) and after processing is complete (END). These are useful for formatting output or adding headers/footers to the results.

Example 1: Printing a Header and Data with BEGIN

If you want to print custom text before displaying the actual data, you can use the BEGIN block:

command = ifconfig | awk 'BEGIN{print "== IP Address =="} /inet /{print $2}'

Output:

== IP Address ==
192.168.1.40
127.0.0.1

In this command:

  • The BEGIN block prints "== IP Address ==" before any data is processed.
  • The /inet /{print $2} part prints the second field (the IP address) of the lines containing inet.

Example 2: Adding a Footer with END

You can also use the END block to add custom text after the data processing is done. For example:

command = ifconfig | awk 'BEGIN{print "== IP Address =="} /inet /{print $2} END{print "====="}'

Output:

== IP Address ==
192.168.1.40
127.0.0.1
=====

In this command:

  • The BEGIN block prints the header "== IP Address =="".
  • The /inet /{print $2} part prints the IP addresses.
  • The END block prints "=====" after the data has been processed.

Example 3: Custom Text with echo Command

You can use AWK with the echo command to add a custom message at the beginning and end of the output:

command = echo "one two three four" | awk 'BEGIN {print "=== Start ==="} {print $0} END {print "-- Stop --"}'

Output:

=== Start ===
one two three four
-- Stop --

In this command:

  • The BEGIN block prints "=== Start ===".
  • The {print $0} part prints the entire input line ($0 refers to the whole line).
  • The END block prints "-- Stop --" after the input has been processed.

The BEGIN and END blocks in AWK are useful for:

  • Printing a header before the main output (BEGIN block).

  • Adding a footer or final message after the data processing (END block).


DELIMITERS

In AWK, you can include custom delimiters when printing output. This allows you to control the format and how the fields are separated in the printed result. You can add separators like spaces, commas, slashes, or any other character to customize the output. This is helpful for organizing the printed fields or making the output more readable.

Here are a few examples:

Example 1: Using a Dash (" - ") Between Fields

You can print two fields with a dash between them:

command → echo "one two three four" | awk 'BEGIN {print "=== start ==="} {print $1, " - ", $2} END {print "-- stop --"}'

Output:

=== start ===
one - two
-- stop --
  • In this example:
    • The BEGIN block prints "=== start ===".
    • The {print $1, " - ", $2} part prints the first and second fields separated by a dash (-).
    • The END block prints "-- stop --" after the data.

Example 2: Using a Slash (" / ") Between Fields

You can print two fields with a slash between them:

command → echo "one two three four" | awk 'BEGIN {print "=== start ==="} {print $1, " / ", $2} END {print "-- stop --"}'

Output:

=== start ===
one / two
-- stop --
  • Here, the fields are separated by a slash (/).

Example 3: Using a Newline (" /n ") Between Fields

You can add a newline between two fields by using /n to break the output into separate lines:

command → echo "one two three four" | awk 'BEGIN {print "=== start ==="} {print $1, " /n ", $2} END {print "-- stop --"}'

Output:

=== start ===
one /n two
-- stop --
  • In this case, /n will not actually produce a new line. If you want a true newline, just use \n in the print statement:
command → echo "one two three four" | awk 'BEGIN {print "=== start ==="} {print $1, "\n", $2} END {print "-- stop --"}'

Output:

=== start ===
one 
two
-- stop --
  • Here, \n correctly creates a new line between the fields.

Example 4: Using an Underscore (" _ ") Between Fields

You can print fields with an underscore between them:

command → echo "one two three four" | awk 'BEGIN {print "=== start ==="} {print $1, " _ ", $2} END {print "-- stop --"}'

Output:

=== start ===
one _ two
-- stop --
  • In this example, the fields are separated by an underscore (_).


WORKING with AWK

AWK is a powerful tool for text processing, allowing you to extract and manipulate specific fields from text files or commands. Below are some useful examples that show how you can work with AWK to process text and fields in different ways.

1. Accessing Specific Fields

To print specific fields from an input, you can use AWK to reference fields by number.

  • Print the first field:

    command → echo "armour infosec" | awk '{print $1}'
    

    Output:

    armour
    
    • This prints the first field, which is "armour".
  • Print the second field:

    command → echo "armour infosec" | awk '{print $2}'
    

    Output:

    infosec
    
    • This prints the second field, which is "infosec".

2. Updating Values in Fields

You can update a field's value before printing it using AWK.

  • Update the first field:
    command → echo "armour infosec" | awk '{$1="ARMOUR"; print $1}'
    
    Output:
    ARMOUR
    
    • Here, the first field is updated to "ARMOUR" and printed.

3. Matching Specific Values

You can filter specific values by using conditional statements in AWK.

  • Print the second field where the first field matches "inet" (e.g., from ifconfig output):

    command → ifconfig | awk '$1=="inet" {print $2}'
    

    Output:

    192.168.1.40
    127.0.0.1
    
    • This prints the second field (IP addresses) where the first field is "inet".
  • Print lines where the first field does not match "inet":

    command → ifconfig | awk '$1!="inet" {print $2}'
    

    Output:

    (Lines that don't have "inet")
    

4. Using Field Separators

AWK allows you to specify field separators to process files with different delimiters. For example, you can use a colon (:) separator to work with /etc/passwd.

  • Print lines where the first field is "root":

    command → cat /etc/passwd | awk -F: '$1=="root" {print $0}'
    

    Output:

    root:x:0:0:root:/root:/bin/bash
    
    • This command prints lines where the first field is "root".
  • Print lines where the first field is not "root":

    command → cat /etc/passwd | awk -F: '$1!="root" {print $0}'
    

    Output:

    (Lines where the first field is not "root")
    

5. Working with Numbers and Fields

AWK can also handle numeric comparisons for fields.

  • Print lines where the third field is "0" (e.g., from /etc/passwd):

    command → cat /etc/passwd | awk -F: '$3==0 {print $0}'
    

    Output:

    root:x:0:0:root:/root:/bin/bash
    
  • Print lines where the third field is greater than or equal to "1000":

    command → cat /etc/passwd | awk -F: '$3>=1000 {print $0}'
    

    Output:

    (Lines with UID >= 1000)
    
  • Print lines where the third field is greater than "0":

    command → cat /etc/passwd | awk -F: '$3>0 {print $0}'
    

    Output:

    (Lines with UID > 0)
    

6. Using AWK with Files

AWK can process files line by line and can output data in a specific format using BEGIN and END blocks.

  • Using BEGIN and END to print a header and footer with a file:
    command → vim test.txt
    BEGIN { print "passwd file" }
    { print $1, "home at", $6 }
    END { print "END passwd file" }
    
    command → awk -F: -f test.txt /etc/passwd
    
    Output:
    passwd file
    root home at /root
    user1 home at /home/user1
    END passwd file
    

7. Arithmetic Operations with AWK

AWK supports arithmetic operations such as addition, subtraction, multiplication, and division.

  • Print the MAC address from ifconfig using AWK (find lines that start with "ether"):
    command → ifconfig | awk '$1=="ether" {print $2}'
    
    Output:
    (MAC address)
    

8. Working with Field Count (NF)

AWK automatically counts the number of fields in each line with the special variable NF.

  • Print the number of fields in a line:

    command → echo "one two three four" | awk '{print NF}'
    

    Output:

    4
    
    • This prints the number of fields (4 in this case).
  • Print the last field:

    command → echo "one two three four" | awk '{print $NF}'
    

    Output:

    four
    
    • $NF refers to the last field.
  • Print the second-last field:

    command → echo "one two three four" | awk '{print $(NF-1)}'
    

    Output:

    three
    
  • Print both the last and second-last fields:

    command → echo "one two three four" | awk '{print $NF, $(NF-1)}'
    

    Output:

    four three
    

9. Working with /etc/passwd File Fields

You can use NF to analyze the /etc/passwd file or other files.

  • Print the number of fields in each line of /etc/passwd:

    command → cat /etc/passwd | awk '{print NF}'
    
  • Print the last field of each line in /etc/passwd:

    command → cat /etc/passwd | awk '{print $NF}'
    
  • Print both the last and second-last fields in /etc/passwd:

    command → cat /etc/passwd | awk '{print $NF, $(NF-1)}'
    

Summary:

AWK is an extremely flexible tool for text processing. It allows you to:

  • Access specific fields.
  • Update or filter data based on conditions.
  • Use mathematical operations.
  • Process files with specific delimiters.
  • Count fields with NF and manipulate data accordingly.




SED

sed OPTIONS [SCRIPT] [INPUT_FILE]

SCRIPT:

[addr]X[options

X is a single-letter sed command

[addr] can be a single line number, a regular expression, or a range of lines. If [addr] is specified, the command X will be expression only on the matched lines

Additional [ options ] are used for some sed command

sed ‘20,25d’ input.txt > output.txt

The following example deletes lines 20 to 25 in the input.

20,25 is an address range

d is the delete commans

SED commands

a text append text adter a line

d delete the pattern

i text insert text before a line

p print the pattern space

q [ exit-code ] (quit) Exit sed without processing any more commands to input

s/regexp/replacement/[flags] (subsstitute) Match the regular-expansion against the content of the pattern space. If found, replace matched string with replacement.

Command Line operations

-n disable automatic printing; sed only produces output when explicitly told to via the p command.

-e script add script

-r use extended regular expression rather than basic regular expressions.


sed ‘1.2p’ /etc/passwd

This will print line 1 and 2, two times and the rest of the file as same as it is

sed -n ‘1.2p’ /etc/passwd

This command will only print line 1 and 2, the rest of the file will not be printed

sed -n ‘/^$/p’ /etc/passwd

Print only empty lines

sed -n ‘/^$/!p’ /etc/passwd

Do not print empty lines

sed -n '1,$p' /etc/passwd

Print all lines

sed -n '$p' /etc/passwd

This will only print last line

sed -n '5,8!p' /etc/passwd

Do not print lines 5 to 8

sed '5,$d' /etc/passwd

Do not print lines 5 to last / print lines 1 to 4


Pattern Matching / Pattern Searching

sed -n '/root/p' /etc/passwd

Matching root in given file

sed -n '/root/,+3p' /etc/passwd

Print lines where root is matched and print next 3 lines where root is matched


Find and replace

sed -n 's/sbin/SBIN' /etc/passwd

Searching sbin and replacing it with SBIN ( This will only replace first match in a single lines )

sed -n 's/root/ROOT/gp' /etc/passwd

g Means global, every single match of the file will be replaced with the word we are replacing

sed -n 's/root/ROOT/1p' /etc/passwd

Match root in line in given file and replace only first root withROOT

sed -n 's/root/ROOT/2p' /etc/passwd

Match root in line in given file and replace only second root withROOT

echo “Welcome To Presidential Suite” | sed ‘s/\(\b[A-Z]\)/\(\1\)/g’

Output will be : (W)elcome (T)o (P)residential (S)uite

sed ‘4 s/sbin/SBIN/’ file_name.txt

This will replace sbin to SBIN in line number 4

sed G file_name.txt

Adds an empty line after every line in the file.

sed 'G;G;G' file_name.txt

This will add 3 empty lines after every line in the file.


Append

sed ‘/arnold/a ARNOLD User’ file_name.txt

This will add ARNOLD User after arnold in the given file

sed ‘2a ARNOLD User’ file_name.txt

This will add ARNOLD User after the second line of the given file

sed ‘5a ARNOLD User’ file_name.txt

This will add ARNOLD User after the fifth line of the given file

sed ‘5!a ARNOLD User’ file_name.txt

This will add ARNOLD User after every line except fifth line of the given file

sed ‘1,5a ARNOLD User’ file_name.txt

This will add ARNOLD User after every line between 1 to 5th line of the given file


prepend

sed ‘1i ARNOLD User’ file_name.txt

This will add ARNOLD User before the first line of the given file

sed ‘5!i ARNOLD User’ file_name.txt

This will add ARNOLD User before every line except fifth line of the given file

sed ‘1i--------------------’ filename.txt

This will make a line after line 1

sed ‘1,$i--------------------’ filename.txt

This will make a line after every line


sed ‘/arnold/d’ file_name.txt

d is for delete This will not print arnold in the file output

sed ‘/\tarnold/d’ file_name .txt

Do not print arnold before which tab key is used

Using character classes

sed ‘/[[:space:]]arnold/d’ filename.txt

This will NOT print lines where arnold appears after a ENTER (at the start of a new line) SPACE and TAB key.

sed ‘[0-9]/d’ filename.txt

Do NOT print output line which has given range of numbers in it.

sed ‘/[[:digit:]]/p’ filename.txt

This will only print output which has digits

sed ‘/[[:digit:]]/d’ filename.txt

This will NOT print output which has digits

sed ‘/[a-zA-Z]/d’ filename.txt

Delete all lines containing alphabets A to Z capital to small

sed ‘/[A-Z]/d’ filename.txt

Delete all lines containing alphabets A to Z capital or Uppercase

sed ‘/\//d’ filename.txt

Delete all lines containing Slash (/)

sed ‘/\thr/d’ filename.txt

Delete all lines containinghr after TAB key


Change or replace

sed ‘/arnold/c ARNOLD User’ filename.txt

This will change or replace arnold to ARNOLD User

sed '1c ARNOLD' filename.txt

This will change or replace arnold to ARNOLD Only in line 1

sed '1,5c ARNOLD' filename.txt

This will change or replace arnold to ARNOLD Only in line range 1 to 5


Quit or exit file

sed ‘/arnold/q’ filename.txt

This will Quit the file when arnold is found / printed


Exit status code

The status code of a successfully ran command is 0 , To make any status code of a successful command rather than 0 Use command below

sed ‘/arnold/q2’ filename.txt
echo $?

The output of status code of command will be 2


Perform the shell command

sed ‘1e date’ /etc/passwd

This will generate output of date command at first line of /etc/passwd file output

sed ‘$e date’ /etc/passwd

This will generate output of date command at last line of /etc/passwd file output

sed ‘1e echo -n "Date: "; date’ filename.txt

This will echo date then output of date command at first line of filename.txt file output

sed ‘1,3e id’ filename.txt

This will generate output of id command at first to third line of filename.txt file output

Substitute

echo “one five three” | sed ‘s/five/two/’

This will search five then replace it with two

sed ‘s/arnold/ARNOLD/’ filename.txt

This will search arnold and replace it with ARNOLD in the given file

echo “Arnold user UID 1000” | sed ‘s/[[:digit:]]\+/***/’

This will replace the digits with three stars ( *** ) ; Output will be : Arnold user UID ***

echo “Arnold user UID 1000” | sed ‘s/[[:digit:]]/*****/’

This will replace only first digit with five stars ( ***** ) ; Output will be : Arnold user UID *****000

echo “Arnold user UID 1000” | sed ‘s/[0-9]\+/*****/’

Using Pattern instead of Character class;Output will be : Arnold user UID *****

sed ‘s/[0-9]\+/*****/g’ filename.txt

This will replace all the digits of the file with four stars

sed ‘s/[[:digit:]]\+/***/g’ filename.txt

This will replace all of the digits of the given file with three stars

sed ‘s/arnold/Arnold/ & & & &/’ filename.txt

This will repeat Arnold arnold arnold arnold arnold when arnold is found in the given file ( Depends on the number of & symbol we provided )


Multiple sed commands [ -e ]

sed -ne ‘/arnold/p’ -ne ‘/root/p’ filename.txt
sed -e ‘/arnold/a "+++++++++++++++++++++" ’ -e ‘/arnold/i "---------------------" ’ filename.txt

This command will append +++++++++++++++++++++ and prepend --------------------- at the same time when the arnold word is found in the given file.

Saving the changes [ -i ]

sed -i -e ‘/arnold/a "+++++++++++++++++++++" ’ -e ‘/arnold/i "---------------------" ’ filename.txt

Using -i will save the changes in the file