Course Outline (Part 10)

In Part 10 of the Linux Bash Course, we cover advanced text processing and system execution techniques. You will learn regular expression syntax for pattern matching, process sandboxing using subshells, input/output redirection with custom file descriptors, and how to write tools to parse CSV logs and JSON APIs.

For details on pattern matching expansion rules, see the GNU Bash Reference Manual on Pattern Matching.


Chapter 28: Regular Expressions Deep Dive

28.1 Literal Characters

Literal characters in regular expressions match themselves exactly. For instance, the regex bash matches the sequence “bash” in any input string.

  • Syntax:
    grep "literal" filename
  • Example Command:
    echo "Welcome to free tech learning" | grep -o "tech"
  • Expected Output:
    tech
  • Flag & Command Breakdown:
    • -o: Prints only the matching part of the line, rather than the entire line.

28.2 The Dot (Any Character)

The dot (.) is a wildcard metacharacter that matches any single character except a newline.

  • Syntax:
    # Matches any 3-character pattern starting with 'b' and ending with 't'
    grep "b.t"
  • Example Command:
    echo -e "bat\nboot\nbit\nbet" | grep -E "^b.t$"
  • Expected Output:
    bat
    bit
    bet
  • Flag & Command Breakdown:
    • -E: Uses Extended Regular Expressions (ERE) syntax.
    • ^ and $: Anchors matching the exact beginning and end of the line.

28.3 Character Classes ([abc], [a-z])

A character class (or character set) matches any single character enclosed within the square brackets. You can specify ranges of characters using a hyphen (e.g., [a-z] for lowercase letters, [0-9] for digits).

  • Syntax:
    grep "[allowed_characters]"
  • Example Command:
    echo -e "track\ntrick\ntruck" | grep -E "tr[au]ck"
  • Expected Output:
    track
    truck
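
The example above lists individual characters; ranges behave the same way. The following is a small sketch with made-up sample IDs that keeps only strings made of one letter from A to C followed by two digits. LC_ALL=C is set here on the assumption that byte-ordered ranges are wanted, since some locales sort upper and lower case together.

```shell
# Sample IDs are made up for illustration.
# LC_ALL=C makes [A-C] mean exactly the bytes A, B and C.
matches=$(printf '%s\n' "A12" "D45" "b07" "C9x" | LC_ALL=C grep -E '^[A-C][0-9][0-9]$')
echo "$matches"
```

Only A12 passes: D45 and b07 fail the first class, and C9x fails the last one.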

28.4 Negated Classes ([^abc])

Placing a caret (^) as the first character inside square brackets negates the class, causing it to match any single character except those listed in the brackets.

  • Syntax:
    grep "[^excluded_characters]"
  • Example Command:
    echo -e "tack\ntick\ntock" | grep -E "t[^i]ck"
  • Expected Output:
    tack
    tock

28.5 Anchors: Start (^) and End ($)

Anchors do not match physical characters. Instead, they lock the match to specific positions in the line:

  • ^: Forces the match to start at the absolute beginning of a line.

  • $: Forces the match to end at the absolute end of a line.

  • Syntax:

    grep "^pattern$"
  • Example Command:

    echo -e "error log\nsystem error\nerror" | grep "^error$"
  • Expected Output:

    error

28.6 Quantifiers: *, +, ?, {n,m}

Quantifiers specify how many times the preceding character or group must match:

  • *: Matches 0 or more times.

  • +: Matches 1 or more times (Extended Regex).

  • ?: Matches 0 or 1 time (Extended Regex).

  • {n,m}: Matches between n and m times.

  • Example Command:

    echo -e "color\ncolour\ncolouur" | grep -E "colou?r"
  • Expected Output:

    color
    colour
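
The example above only exercises ?. As a rough sketch of the other quantifiers, the commands below use made-up sample strings to show {n} counts and the difference between + and *.

```shell
# Keep only lines of the form NNN-NNNN (3 digits, a hyphen, 4 digits).
valid=$(printf '%s\n' "555-1234" "55-1234" "555-12345" "5555-123" | grep -E '^[0-9]{3}-[0-9]{4}$')
echo "$valid"

# + requires at least one "b"; * also accepts none.
plus_count=$(printf '%s\n' "ac" "abc" "abbc" | grep -cE '^ab+c$')
star_count=$(printf '%s\n' "ac" "abc" "abbc" | grep -cE '^ab*c$')
echo "ab+c matched $plus_count lines, ab*c matched $star_count lines"
```

The anchors matter here: without ^ and $, the pattern [0-9]{3}-[0-9]{4} would also match inside "555-12345".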

28.7 Capturing Groups ((...))

Parentheses group parts of a regular expression together. This allows you to apply quantifiers to the entire group or extract specific sub-matches.

  • Bash Context: Inside conditional double brackets [[ ... ]] using the =~ regex operator, captured groups are automatically saved to the BASH_REMATCH array variable.

  • Example Script:

    #!/bin/bash
    TEXT="Server ID: SRV-8492"
    # Match and capture the ID (uppercase letters, digits, and hyphens)
    if [[ "$TEXT" =~ ID:\ ([A-Z0-9-]+) ]]; then
        echo "Full Match: ${BASH_REMATCH[0]}"
        echo "Captured ID: ${BASH_REMATCH[1]}"
    fi
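
When a pattern has several groups, each one lands in the next BASH_REMATCH index. The sketch below assumes a hypothetical log line format (date, then an uppercase level) and keeps the pattern in a variable, which avoids most quoting surprises inside [[ ]].

```shell
#!/bin/bash
# Hypothetical log line used only for illustration.
line="2024-05-17 ERROR disk usage at 91%"
# Storing the pattern in a variable keeps shell quoting out of the regex.
pattern='^([0-9]{4})-([0-9]{2})-([0-9]{2}) ([A-Z]+)'

if [[ "$line" =~ $pattern ]]; then
    year="${BASH_REMATCH[1]}"
    month="${BASH_REMATCH[2]}"
    day="${BASH_REMATCH[3]}"
    level="${BASH_REMATCH[4]}"
    echo "Date parts: $year / $month / $day, level: $level"
fi
```

Note that $pattern is left unquoted on the right of =~; quoting it would make Bash treat it as a literal string.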

28.8 Alternation: | (OR)

The pipe symbol (|) matches either the pattern on its left or the pattern on its right.

  • Syntax:
    grep -E "pattern1|pattern2"
  • Example Command:
    echo -e "info log\nwarning log\ndebug log" | grep -E "info|warning"
  • Expected Output:
    info log
    warning log

28.9 Escaping Metacharacters

To match a metacharacter (like ., *, ?, [, \) literally, escape it by prefixing it with a backslash (\).

  • Syntax:
    grep "\metacharacter"
  • Example Command:
    echo "Search results for 127.0.0.1" | grep -o "127\.0\.0\.1"
  • Expected Output:
    127.0.0.1
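
To see why the escaping matters, the sketch below counts matching lines in two made-up sample lines, one of which is not an IP address at all. It also shows grep -F, which treats the whole pattern as a fixed string so no escaping is needed.

```shell
# Sample lines are made up; the second one only looks similar to an IP address.
sample=$(printf '%s\n' "host 127.0.0.1" "build 127a0b0c1")

unescaped=$(grep -c '127.0.0.1' <<< "$sample")    # each dot matches any character
escaped=$(grep -c '127\.0\.0\.1' <<< "$sample")   # each dot matches only a literal dot
fixed=$(grep -cF '127.0.0.1' <<< "$sample")       # -F: the pattern is a literal string
echo "unescaped=$unescaped escaped=$escaped fixed=$fixed"
```

The unescaped pattern matches both lines, while the escaped and fixed-string forms match only the real address.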

28.10 PCRE (Perl-Compatible Regex) in Bash

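Perl-Compatible Regular Expressions (PCRE) support advanced features like lookaheads, lookbehinds, and non-greedy quantifiers. Bash’s own =~ operator uses POSIX Extended Regular Expressions and does not support these features, but GNU grep accepts PCRE patterns via the -P flag.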

  • Syntax:
    grep -P "pcre_pattern"
  • Example Command:
    # Extract only text inside brackets using lookaround assertions
    echo "Data [SecretKey123] parsed" | grep -oP '(?<=\[)[a-zA-Z0-9]+(?=\])'
  • Expected Output:
    SecretKey123
  • Flag & Command Breakdown:
    • -P: Interprets the pattern as a Perl-Compatible Regular Expression (PCRE).
    • (?<=\[): Positive lookbehind. Asserts that the match must be preceded by [.
    • (?=\]): Positive lookahead. Asserts that the match must be followed by ].
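
On systems whose grep lacks -P, a similar result can often be reached without lookarounds. The sketch below is one possible alternative, not a PCRE feature: it matches the brackets as part of an ordinary extended regex inside [[ ]] and keeps only the capture group.

```shell
#!/bin/bash
text="Data [SecretKey123] parsed"
# The brackets are part of the match; only the parenthesized group is kept.
pattern='\[([a-zA-Z0-9]+)\]'

if [[ "$text" =~ $pattern ]]; then
    inner="${BASH_REMATCH[1]}"
    echo "$inner"
fi
```

This prints SecretKey123, the same text the lookaround version extracts.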

Chapter 29: Subshells & Command Substitution

29.1 Creating a Subshell with ( )

Wrapping commands in parentheses ( ) runs them inside a subshell process. The subshell is a duplicate copy of the parent shell process. Variables defined or modified inside a subshell do not affect the parent shell environment.

  • Syntax:
    ( command1; command2 )
  • Example Command:
    APP_STATUS="Active"
    ( APP_STATUS="Stopped"; echo "Subshell: $APP_STATUS" )
    echo "Parent: $APP_STATUS"
  • Expected Output:
    Subshell: Stopped
    Parent: Active
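
The same isolation applies to the working directory, which makes subshells handy for a temporary cd. A minimal sketch, using / only because it exists on every system:

```shell
start_dir=$PWD
# The directory change is confined to the parentheses.
( cd / && echo "Inside subshell: $PWD" )
echo "Back in parent: $PWD"
```

The parent shell never leaves its starting directory, so no cd back is needed afterwards.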

29.2 Command Substitution: `cmd` vs $(cmd)

Command substitution runs a command and replaces it with its standard output.

  • Legacy: Backticks `command`. They are hard to nest because each inner backtick must be escaped with a backslash.

  • Modern: $(command). This is the preferred syntax because it is easy to nest and handles inner quoting cleanly.

  • Syntax:

    variable=$(command)
  • Example Command:

    FILES_COUNT=$(ls | wc -l)
    echo "Files in folder: $FILES_COUNT"
  • Expected Output (for a folder containing 5 entries):

    Files in folder: 5
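
To illustrate the nesting difference, the sketch below extracts the parent directory name from a hypothetical path using both syntaxes. No file needs to exist, because dirname and basename only manipulate the string.

```shell
# Hypothetical path used only for illustration.
log_path="/var/log/myapp/current.log"

# Inner $(...) runs first; its quotes do not clash with the outer quotes.
parent_dir=$(basename "$(dirname "$log_path")")
echo "Parent directory: $parent_dir"

# Backtick equivalent: the inner backticks have to be escaped.
parent_dir_legacy=`basename \`dirname "$log_path"\``
echo "Legacy result: $parent_dir_legacy"
```

Both forms print myapp, but the $(...) version stays readable as nesting gets deeper.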

29.3 Process Substitution: <(cmd) and >(cmd)

Process substitution lets you hand a command’s input or output to another command as if it were a file. Bash replaces the expression with a path (such as /dev/fd/63, or a named pipe on systems without /dev/fd) that the other command opens like any regular file.

  • <(command): Expands to a path that, when read, yields the command’s stdout.

  • >(command): Expands to a path that, when written to, feeds the command’s stdin.

  • Example Command:

    # Compare output of two command lists directly using diff
    diff <(echo "A"; echo "B") <(echo "A"; echo "C")
  • Expected Output:

    2c2
    < B
    ---
    > C

29.4 Variable Scope in Subshells

Variables defined in the parent shell are copied to the subshell by value, meaning the subshell receives a copy of the variable, not a reference. Any changes the subshell makes to the variable are lost when the subshell terminates.
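
A common place this bites is a while loop at the end of a pipeline, which runs in a subshell under Bash's default settings (lastpipe off). The sketch below shows the lost update, a process substitution variant that keeps the loop in the parent shell, and the usual way for a subshell to hand a value back through stdout.

```shell
#!/bin/bash
# Each part of a pipeline runs in a subshell, so this counter update is lost.
piped_count=0
printf 'a\nb\nc\n' | while read -r _; do
    piped_count=$((piped_count + 1))
done
echo "After pipeline: $piped_count"

# Feeding the loop through process substitution keeps it in the parent shell.
kept_count=0
while read -r _; do
    kept_count=$((kept_count + 1))
done < <(printf 'a\nb\nc\n')
echo "After process substitution: $kept_count"

# A subshell can still return a value through its standard output.
doubled=$( n=21; echo $((n * 2)) )
echo "Returned via stdout: $doubled"
```

The first counter stays at 0, the second reaches 3, and the variable n never exists in the parent shell.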


29.5 The exec Command

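The exec command replaces the current shell with the specified command, which keeps the shell’s PID. Any commands after exec in the script are not executed because the shell program has been replaced. When exec is given only redirections and no command, the shell keeps running and the redirections apply to the shell itself.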

  • Syntax:
    exec command
  • Example Command:
    cat << 'EOF' > exec_demo.sh
    #!/bin/bash
    echo "Before exec..."
    exec echo "Shell replaced!"
    echo "This line will never execute."
    EOF
    bash exec_demo.sh
  • Expected Output:
    Before exec...
    Shell replaced!

29.6 Replacing the Shell with exec

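You can use exec in wrapper scripts to set environment variables or paths before launching a binary. Because exec replaces the wrapper instead of starting a child process, the binary takes over the wrapper’s PID and receives signals directly. This is common practice in Docker entrypoint scripts: the wrapper starts as PID 1, and exec lets the application become PID 1 in its place.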
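
A minimal sketch of such a wrapper follows. APP_ENV is a hypothetical setting, and a fresh bash that prints its own PID stands in for the real binary so the PID handover is visible.

```shell
#!/bin/bash
# Build a throwaway wrapper script; APP_ENV is a hypothetical setting.
wrapper=$(mktemp)
cat > "$wrapper" << 'EOF'
#!/bin/bash
export APP_ENV="production"
echo "wrapper PID: $$"
# Stand-in for the real binary: a fresh bash that reports its own PID.
exec bash -c 'echo "app PID: $$ APP_ENV=$APP_ENV"'
EOF

output=$(bash "$wrapper")
echo "$output"
rm -f "$wrapper"
```

Both lines report the same PID, which shows that the launched program took over the wrapper's process rather than running as its child.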


29.7 File Descriptors Beyond 0,1,2

By default, the shell uses three standard file descriptors (FDs): 0 (stdin), 1 (stdout), and 2 (stderr). You can open additional file descriptors to read from or write to files throughout your script. Numbers 3 through 9 are the conventional choice, because descriptors above 9 may conflict with ones the shell uses internally.

  • Syntax:
    # Open file descriptor 3 for writing
    exec 3>output_log.txt
    
    # Write to descriptor 3
    echo "Data" >&3
    
    # Close descriptor 3
    exec 3>&-
  • Example Command:
    exec 4>custom_out.txt
    echo "Writing to custom FD 4" >&4
    exec 4>&-
    cat custom_out.txt
    rm -f custom_out.txt
  • Expected Output:
    Writing to custom FD 4
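
Custom descriptors work for reading as well. The sketch below, which uses a temporary file with made-up contents, opens FD 3 for input and reads it line by line while stdin stays untouched.

```shell
#!/bin/bash
# Temporary input file with made-up contents.
input_file=$(mktemp)
printf 'first line\nsecond line\n' > "$input_file"

exec 3< "$input_file"        # Open FD 3 for reading
read -r line_one <&3         # Each read continues where the previous one stopped
read -r line_two <&3
exec 3<&-                    # Close FD 3

echo "1: $line_one"
echo "2: $line_two"
rm -f "$input_file"
```

Because the descriptor stays open between the two read calls, the second read picks up at the second line instead of starting over.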

29.8 Duplicating File Descriptors

Duplicating a file descriptor makes a second descriptor point at the same destination. For example, exec 3>&1 saves the current stdout (FD 1) in FD 3, so you can redirect stdout temporarily and restore it later.

  • Example Command:
    # Redirect all stdout to a file temporarily, then restore it
    exec 3>&1                 # Copy stdout descriptor (1) to descriptor 3
    exec 1>temp_stdout.txt    # Redirect stdout to a file
    echo "To file"            # This goes to the file
    exec 1>&3                 # Restore stdout from descriptor 3
    exec 3>&-                 # Close the saved copy, which is no longer needed
    echo "To terminal"        # This goes back to the terminal
    rm -f temp_stdout.txt
  • Expected Output:
    To terminal
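
Duplication also explains why the order of redirections matters. As a sketch, the hypothetical noisy function below writes to both streams, and the two command substitutions differ only in the order of their redirections.

```shell
#!/bin/bash
# Hypothetical command that writes to both streams.
noisy() {
    echo "normal output"
    echo "problem report" >&2
}

# Redirections apply left to right: FD 2 is pointed at the capture pipe first,
# then FD 1 is sent to /dev/null, so only stderr is captured.
errors_only=$(noisy 2>&1 >/dev/null)
echo "Captured: $errors_only"

# Reversed order: FD 1 goes to /dev/null first, then FD 2 copies that target.
nothing=$(noisy >/dev/null 2>&1)
echo "Captured: '$nothing'"
```

In the first form 2>&1 copies where stdout pointed at that moment, so the later change to stdout does not affect stderr.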

29.9 Using Coprocesses: coproc

The coproc command launches a command asynchronously as a background process while establishing bidirectional pipes to read from and write to it.

  • Syntax:
    coproc COPROC_NAME { command; }
  • Example Command:
    # Start coprocess that capitalizes text using tr
    coproc MY_COPROC { tr '[:lower:]' '[:upper:]'; }
    # Write input to the coprocess pipe
    echo "lowercase data" >&"${MY_COPROC[1]}"
    # Close write pipe to signal EOF; {var}>&- closes the FD whose number is stored in var
    WRITE_FD=${MY_COPROC[1]}
    exec {WRITE_FD}>&-
    # Read capitalized output
    read -r RESULT <&"${MY_COPROC[0]}"
    echo "Result: $RESULT"
  • Expected Output:
    Result: LOWERCASE DATA
  • Flag & Command Breakdown:
    • ${MY_COPROC[1]}: The file descriptor used to write to the coprocess’s stdin.
    • ${MY_COPROC[0]}: The file descriptor used to read from the coprocess’s stdout.

29.10 Performance: Subshells vs Subprocesses

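Launching a subshell (e.g. (cmd) or $(cmd)) requires the operating system to fork the shell, which duplicates the process’s memory mappings and file descriptors. Running an external command as a subprocess (e.g. sed) needs a fork followed by an exec to load the new program.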

  • Performance Hit: While fork overhead is minimal for small tasks, running thousands of subprocesses inside a loop will slow down your script.
  • Optimization: Use built-in shell features (like parameter expansions or arithmetic brackets) instead of calling external commands (like sed or expr) inside loops.
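
As a rough illustration of that optimization, the sketch below computes the same values twice, once with external commands and once with builtins. The file path is hypothetical.

```shell
#!/bin/bash
# Hypothetical file path used for illustration.
file_path="/var/log/myapp/server.log"

# External commands: each call forks and runs a separate program.
name_ext=$(basename "$file_path")
sum_ext=$(expr 7 + 5)

# Builtin equivalents: no new process is created.
name_builtin="${file_path##*/}"     # strip everything up to the last slash
stem="${name_builtin%.log}"         # strip the .log suffix
sum_builtin=$((7 + 5))

echo "$name_ext $name_builtin $stem $sum_ext $sum_builtin"
```

For a single call the difference is negligible; it adds up when the same lines run inside a loop over thousands of items.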

Chapter 30: Working with CSV & Logs

30.1 Parsing CSV with awk

awk is a text processing utility designed for structured, column-based data. It processes input line by line and splits each line into fields, by default on runs of spaces and tabs; use -F to split on another separator such as a comma.

  • Syntax:
    awk -F"separator" '{print $field_number}' filename
  • Example Command:
    printf "suresh,admin,active\njohn,user,inactive\n" > users.csv
    awk -F"," '{print "User: " $1 " is " $3}' users.csv
    rm -f users.csv
  • Expected Output:
    User: suresh is active
    User: john is inactive
  • Flag & Command Breakdown:
    • -F",": Sets the field separator (delimiter) to a comma.
    • $1 and $3: Refer to the first and third columns in the row.
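
awk can also filter rows and compute totals. The sketch below assumes a made-up CSV with a header row and shows a row condition plus an END block.

```shell
# Made-up data with a header row.
csv_file=$(mktemp)
printf 'name,role,status\nsuresh,admin,active\njohn,user,inactive\nmeera,user,active\n' > "$csv_file"

# NR > 1 skips the header; the condition before { } filters rows.
active_users=$(awk -F',' 'NR > 1 && $3 == "active" {print $1}' "$csv_file")
echo "$active_users"

# END runs once after the last line, which suits totals.
active_total=$(awk -F',' 'NR > 1 && $3 == "active" {n++} END {print n + 0}' "$csv_file")
echo "Active accounts: $active_total"
rm -f "$csv_file"
```

The n + 0 in the END block makes awk print 0 rather than an empty string when no row matches.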

30.2 Parsing CSV with cut and tr

For simple delimited text files, you can use cut to extract columns and tr to clean up or convert delimiters.

  • Syntax:
    cut -d"delimiter" -f[field_index] filename
  • Example Command:
    echo "suresh:admin:active" | cut -d":" -f2 | tr '[:lower:]' '[:upper:]'
  • Expected Output:
    ADMIN
  • Flag & Command Breakdown:
    • -d":": Sets the delimiter to a colon.
    • -f2: Selects only the second field.
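
cut accepts several fields at once, and tr can convert one delimiter into another. A short sketch with a made-up record:

```shell
# Made-up record in name,role,status order.
record="suresh,admin,active"

# -f1,3 keeps the first and third fields; the delimiter stays between them.
name_status=$(echo "$record" | cut -d',' -f1,3)
echo "$name_status"

# tr swaps every comma for a tab, one character for one character.
tabbed=$(echo "$record" | tr ',' '\t')
echo "$tabbed"
```

The first command prints suresh,active; the second prints the three fields separated by tabs.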

30.3 Handling Quoted Fields

Simple delimiter splitting with cut or awk -F"," breaks on CSV files that contain commas inside quotes (e.g., "Delhi, India",Asia,Active), because the quoted comma is treated as a field boundary. To parse these files correctly, use a real CSV parser such as Python’s csv module or Python-based tools like csvkit.

  • Example Python Inline command:
    echo '"Delhi, India",Asia' | python3 -c "import csv,sys; print(list(csv.reader(sys.stdin))[0][0])"
  • Expected Output:
    Delhi, India

30.4 Generating Reports from Logs

You can build data pipelines to analyze log files by chaining commands like grep, awk, sort, and uniq together with pipes.

  • Example Command:
    # Simulate a simplified web server access log
    printf "192.168.1.1 GET /blog\n192.168.1.2 GET /index\n192.168.1.1 GET /blog\n" > access.log
    # Extract IP addresses, count requests per IP, and sort by hits
    awk '{print $1}' access.log | sort | uniq -c | sort -rn
    rm -f access.log
  • Expected Output:
          2 192.168.1.1
          1 192.168.1.2
  • Flag & Command Breakdown:
    • sort: Sorts lines of text alphabetically or numerically.
    • uniq -c: Groups adjacent duplicate lines and prefixes them with a count of occurrences.
    • sort -rn: Sorts lines numerically (-n) in reverse (-r) order, placing the highest values first.
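
The counting step can also be done inside awk with an associative array, which avoids the first sort. The sketch below uses made-up lines in the same "IP METHOD PATH" layout and counts hits per requested path.

```shell
# Made-up log lines in "IP METHOD PATH" layout.
log_file=$(mktemp)
printf '192.168.1.1 GET /blog\n192.168.1.2 GET /index\n192.168.1.1 GET /blog\n192.168.1.3 GET /blog\n' > "$log_file"

# hits[] is an awk associative array keyed by the requested path.
report=$(awk '{hits[$3]++} END {for (p in hits) print hits[p], p}' "$log_file" | sort -rn)
echo "$report"
rm -f "$log_file"
```

awk does not guarantee any order when looping over an array, so the final sort -rn is still needed to rank the paths.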

30.5 Rotating Log Files with Scripts

Writing a custom log rotation script allows you to manage disk space by compressing and archiving old log files.

  • Example Script:
    #!/bin/bash
    LOG_FILE="/var/log/myapp.log"
    MAX_SIZE=$((10 * 1024 * 1024)) # 10MB limit
    
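    # stat -c%s is the GNU coreutils way to print a file's size in bytes
    if [[ -f "$LOG_FILE" && $(stat -c%s "$LOG_FILE") -gt $MAX_SIZE ]]; then
        # Build the archive name once so mv and gzip agree even across midnight
        ARCHIVE="${LOG_FILE}.$(date +%Y%m%d).bak"
        mv "$LOG_FILE" "$ARCHIVE"
        gzip "$ARCHIVE"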
        touch "$LOG_FILE"
        chmod 640 "$LOG_FILE"
    fi
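
Date-stamped archives accumulate over time. One alternative scheme, sketched here as an illustration rather than a replacement for the script above, keeps a fixed number of numbered copies. The rotate_numbered helper is hypothetical, and the demo runs in a temporary directory so no real log is touched.

```shell
#!/bin/bash
# rotate_numbered FILE KEEP: hypothetical helper that shifts FILE.1 to FILE.2,
# and so on, keeping at most KEEP old copies.
rotate_numbered() {
    local file=$1 keep=$2 i
    rm -f "$file.$keep"                       # drop the oldest copy
    for (( i = keep - 1; i >= 1; i-- )); do
        [[ -f "$file.$i" ]] && mv "$file.$i" "$file.$((i + 1))"
    done
    mv "$file" "$file.1"
    : > "$file"                               # start a fresh, empty log
}

# Demo in a temporary directory.
demo_dir=$(mktemp -d)
echo "old entries" > "$demo_dir/app.log"
rotate_numbered "$demo_dir/app.log" 3
echo "new entries" > "$demo_dir/app.log"
rotate_numbered "$demo_dir/app.log" 3

newest=$(cat "$demo_dir/app.log.1")
oldest=$(cat "$demo_dir/app.log.2")
echo "app.log.1: $newest"
echo "app.log.2: $oldest"
rm -rf "$demo_dir"
```

After two rotations the most recent entries sit in app.log.1 and the older ones have moved to app.log.2.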

30.6 Colorizing Log Output

Using ANSI escape color codes in your scripts makes warnings and errors stand out in the terminal, making logs much easier to scan.

  • Syntax:
    echo -e "\033[<style>;<color_code>mText\033[0m"
  • Example Command:
    RED='\033[0;31m'
    GREEN='\033[0;32m'
    NC='\033[0m' # No Color (reset)
    
    echo -e "${GREEN}[OK]${NC} Database is online."
    echo -e "${RED}[ERROR]${NC} Disk space critical!"
  • Expected Output:
    [OK] Database is online.
    [ERROR] Disk space critical!
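
Escape codes become clutter when output is redirected to a file or pipe. A minimal sketch of one way to handle this, assuming a hypothetical log_msg helper that emits colors only when stdout is a terminal:

```shell
#!/bin/bash
# log_msg LEVEL MESSAGE: hypothetical helper; colors are used only on a terminal.
log_msg() {
    local level=$1 message=$2 color='' reset=''
    if [[ -t 1 ]]; then                  # true when stdout is a terminal
        reset='\033[0m'
        case $level in
            OK)    color='\033[0;32m' ;; # green
            ERROR) color='\033[0;31m' ;; # red
        esac
    fi
    printf "${color}[%s]${reset} %s\n" "$level" "$message"
}

log_msg OK "Database is online."
log_msg ERROR "Disk space critical!"
```

Run interactively, the tags are colored; piped through cat or redirected to a file, the same calls produce plain text.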

30.7 Extracting JSON with jq (External)

JSON is the standard format for modern web APIs. The external jq utility is the industry standard for parsing, filtering, and extracting data from JSON strings in shell scripts.

  • Syntax:
    jq '.key_name' filename.json
  • Example Command:
    echo '{"user": "suresh", "permissions": ["read", "write"]}' | jq -r '.permissions[0]'
  • Expected Output:
    read
  • Flag & Command Breakdown:
    • -r: Output raw strings instead of JSON-formatted strings (which are wrapped in double quotes).

30.8 Extracting XML with xpath / xmlstarlet

To parse XML files from shell scripts, you can use command-line XML parsing utilities like xmlstarlet or xpath.

  • Example Command:
    # Extract element using xmlstarlet
    echo "<site><name>FreeTech</name></site>" | xmlstarlet sel -t -v "/site/name"
  • Expected Output:
    FreeTech
  • Flag & Command Breakdown:
    • sel -t -v: sel runs an XPath query against the input, -t starts an output template, and -v prints the value of the given XPath expression.

30.9 Monitoring Logs with a Script

You can monitor log files in real time by piping tail -f into a while read loop and matching trigger patterns (like “ERROR” or “FAIL”) as new lines are appended.

  • Example Script:
    #!/bin/bash
    # Monitor log and trigger alerts on errors
    tail -f /var/log/syslog | while read -r line; do
        if [[ "$line" =~ "ERROR" ]]; then
            echo -e "\033[0;31m[CRITICAL ALERT]\033[0m: $line"
            # Trigger custom alert function
        fi
    done
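
Putting the matching logic in a function makes it testable without a live log. The sketch below assumes a hypothetical scan_for_alerts filter that reads lines on stdin and flags both trigger words; made-up sample lines stand in for the log.

```shell
#!/bin/bash
# scan_for_alerts: hypothetical filter that reads log lines on stdin and
# prints one alert for each line containing ERROR or FAIL.
scan_for_alerts() {
    local line pattern='ERROR|FAIL'
    while IFS= read -r line; do
        if [[ "$line" =~ $pattern ]]; then
            echo "[CRITICAL ALERT]: $line"
        fi
    done
}

# Made-up sample lines stand in for a live log.
alerts=$(printf '%s\n' "INFO boot ok" "ERROR disk full" "WARN slow" "FAIL backup" | scan_for_alerts)
echo "$alerts"
```

For live monitoring the same function can sit at the end of a pipeline, for example tail -F /var/log/syslog | scan_for_alerts, where -F keeps following the file name across log rotation.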

30.10 Email Alerts on Log Patterns

When a script detects a critical error pattern, it can send email notifications to system administrators using terminal mail clients like mailx or sendmail.

  • Syntax:
    echo "Body text" | mailx -s "Subject Line" recipient@example.com
  • Example Command:
    # If a backup fails, send email alert (recipient@example.com is a placeholder address)
    echo "Daily database backup failed!" | mailx -s "CRITICAL: Backup Fail" recipient@example.com 2>/dev/null || echo "Email client not configured."
  • Expected Output (on a system without a working mail client):
    Email client not configured.