In Part 10 of the Linux Bash Course, we cover advanced text processing and system execution techniques. You will learn regular expression syntax for pattern matching, process sandboxing using subshells, input/output redirection with custom file descriptors, and how to write tools to parse CSV logs and JSON APIs.
For details on pattern matching expansion rules, see the GNU Bash Reference Manual on Pattern Matching.
Chapter 28: Regular Expressions Deep Dive
28.1 Literal Characters
Literal characters in regular expressions match themselves exactly. For instance, the regex bash matches the sequence “bash” in any input string.
- Syntax:
grep "literal" filename
- Example Command:
echo "Welcome to free tech learning" | grep -o "tech"
- Expected Output:
tech
- Flag & Command Breakdown:
-o: Prints only the matching part of the line, rather than the entire line.
28.2 The Dot (Any Character)
The dot (.) is a wildcard metacharacter that matches any single character except a newline.
- Syntax:
# Matches any 3-character pattern starting with 'b' and ending with 't'
grep "b.t"
- Example Command:
echo -e "bat\nboot\nbit\nbet" | grep -E "^b.t$"
- Expected Output:
bat
bit
bet
- Flag & Command Breakdown:
-E: Uses Extended Regular Expressions (ERE) syntax.
^ and $: Anchors matching the exact beginning and end of the line.
28.3 Character Classes ([abc], [a-z])
A character class (or character set) matches any single character enclosed within the square brackets. You can specify ranges of characters using a hyphen (e.g., [a-z] for lowercase letters, [0-9] for digits).
- Syntax:
grep "[allowed_characters]"
- Example Command:
echo -e "track\ntrick\ntruck" | grep -E "tr[au]ck"
- Expected Output:
track
truck
28.4 Negated Classes ([^abc])
Placing a caret (^) as the first character inside square brackets negates the class, causing it to match any single character except those listed in the brackets.
- Syntax:
grep "[^excluded_characters]"
- Example Command:
echo -e "tack\ntick\ntock" | grep -E "t[^i]ck"
- Expected Output:
tack
tock
28.5 Anchors: Start (^) and End ($)
Anchors do not match physical characters. Instead, they lock the match to specific positions in the line:
- ^: Forces the match to start at the absolute beginning of a line.
- $: Forces the match to end at the absolute end of a line.
- Syntax:
grep "^pattern$"
- Example Command:
echo -e "error log\nsystem error\nerror" | grep "^error$"
- Expected Output:
error
28.6 Quantifiers: *, +, ?, {n,m}
Quantifiers specify how many times the preceding character or group must match:
- *: Matches 0 or more times.
- +: Matches 1 or more times (Extended Regex).
- ?: Matches 0 or 1 time (Extended Regex).
- {n,m}: Matches between n and m times (Extended Regex; basic regex writes it as \{n,m\}).
- Example Command:
echo -e "color\ncolour\ncolouur" | grep -E "colou?r"
- Expected Output:
color
colour
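The {n,m} quantifier is listed above without an example. A minimal sketch follows, assuming a hypothetical ticket ID format of three uppercase letters, a hyphen, and two to four digits. The function name is_ticket_id is illustrative only.

```shell
#!/bin/bash
# Hypothetical format: 3 uppercase letters, a hyphen, then 2 to 4 digits.
is_ticket_id() {
  printf '%s\n' "$1" | grep -Eq '^[A-Z]{3}-[0-9]{2,4}$'
}

for candidate in "SRV-84" "SRV-8492" "SRV-8" "SRV-84921"; do
  if is_ticket_id "$candidate"; then
    echo "$candidate: valid"
  else
    echo "$candidate: invalid"
  fi
done
```

The anchors matter here: without ^ and $, a longer string such as SRV-84921 would still contain a valid match.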
28.7 Capturing Groups ((...))
Parentheses group parts of a regular expression together. This allows you to apply quantifiers to the entire group or extract specific sub-matches.
- Bash Context: Inside conditional double brackets [[ ... ]] using the =~ regex operator, captured groups are automatically saved to the BASH_REMATCH array variable. Index 0 holds the full match, and indexes 1 and up hold the groups.
- Example Script:
#!/bin/bash
TEXT="Server ID: SRV-8492"
# Match "ID: " and capture the identifier that follows it
if [[ "$TEXT" =~ ID:\ ([A-Z0-9-]+) ]]; then
  echo "Full Match: ${BASH_REMATCH[0]}"
  echo "Captured ID: ${BASH_REMATCH[1]}"
fi
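Several groups can be captured in one match. The sketch below assumes a hypothetical log line layout (date, level, message) and stores the pattern in a variable, which sidesteps quoting problems inside [[ ]].

```shell
#!/bin/bash
# Hypothetical log line; the layout (date, level, message) is an assumption.
line="2024-05-17 ERROR Disk space critical"

# Keeping the pattern in a variable avoids escaping spaces inside [[ ]].
pattern='^([0-9]{4})-([0-9]{2})-([0-9]{2}) ([A-Z]+) (.*)$'

if [[ "$line" =~ $pattern ]]; then
  year="${BASH_REMATCH[1]}"
  month="${BASH_REMATCH[2]}"
  level="${BASH_REMATCH[4]}"
  message="${BASH_REMATCH[5]}"
  echo "Year: $year, Month: $month, Level: $level"
  echo "Message: $message"
fi
```

Note that $pattern is left unquoted on the right side of =~. Quoting it would make Bash treat the contents as a literal string.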
28.8 Alternation: | (OR)
The pipe symbol (|) matches either the pattern on its left or the pattern on its right.
- Syntax:
grep -E "pattern1|pattern2"
- Example Command:
echo -e "info log\nwarning log\ndebug log" | grep -E "info|warning"
- Expected Output:
info log
warning log
28.9 Escaping Metacharacters
To match a metacharacter (like ., *, ?, [, \) literally, escape it by prefixing it with a backslash (\).
- Syntax:
grep "\metacharacter"
- Example Command:
echo "Search results for 127.0.0.1" | grep -o "127\.0\.0\.1"
- Expected Output:
127.0.0.1
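When the entire pattern is meant literally, grep's -F flag (fixed strings) is an alternative to escaping each metacharacter. A small sketch with made-up sample input compares the two behaviors:

```shell
#!/bin/bash
# Sample input: only the first line contains the literal address.
input=$'host 127.0.0.1 up\nhost 127a0b0c1 up'

# Unescaped dots match any character, so both lines match.
unescaped_count=$(printf '%s\n' "$input" | grep -c "127.0.0.1")

# -F treats the pattern as a fixed string, so only the literal address matches.
fixed_count=$(printf '%s\n' "$input" | grep -cF "127.0.0.1")

echo "Unescaped regex matched $unescaped_count lines"
echo "Fixed-string search matched $fixed_count lines"
```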
28.10 PCRE (Perl-Compatible Regex) in Bash
Perl-Compatible Regular Expressions (PCRE) support advanced features like lookaheads, lookbehinds, and non-greedy quantifiers. GNU grep supports PCRE patterns via the -P flag.
- Syntax:
grep -P "pcre_pattern"
- Example Command:
# Extract only text inside brackets using lookaround assertions
echo "Data [SecretKey123] parsed" | grep -oP '(?<=\[)[a-zA-Z0-9]+(?=\])'
- Expected Output:
SecretKey123
- Flag & Command Breakdown:
-P: Interprets the pattern as a Perl-Compatible Regular Expression (PCRE).
(?<=\[): Positive lookbehind. Asserts that the match must be preceded by [.
(?=\]): Positive lookahead. Asserts that the match must be followed by ].
Chapter 29: Subshells & Command Substitution
29.1 Creating a Subshell with ( )
Wrapping commands in parentheses ( ) runs them inside a subshell process. The subshell is a duplicate copy of the parent shell process. Variables defined or modified inside a subshell do not affect the parent shell environment.
- Syntax:
( command1; command2 )
- Example Command:
APP_STATUS="Active"
( APP_STATUS="Stopped"; echo "Subshell: $APP_STATUS" )
echo "Parent: $APP_STATUS"
- Expected Output:
Subshell: Stopped
Parent: Active
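The same isolation applies to other shell state, such as the working directory and the umask. A minimal sketch, using / as an arbitrary target directory:

```shell
#!/bin/bash
start_dir=$PWD

(
  # Both changes below affect the subshell only.
  cd / || exit 1
  umask 077
  echo "Inside subshell: $PWD (umask $(umask))"
)

echo "Back in parent: $PWD (umask $(umask))"
```

This is a common way to run a few commands in another directory without having to cd back afterwards.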
29.2 Command Substitution: `cmd` vs $(cmd)
Command substitution runs a command and replaces it with its standard output.
- Legacy: Backticks `command`. They are difficult to nest because inner backticks must be escaped with backslashes.
- Modern: $(command). This is the preferred syntax because it is easy to nest and handles inner quoting cleanly.
- Syntax:
variable=$(command)
- Example Command:
FILES_COUNT=$(ls | wc -l)
echo "Files in folder: $FILES_COUNT"
- Expected Output (for a folder that contains five entries):
Files in folder: 5
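The nesting difference is easiest to see side by side. The sketch below extracts the parent directory name of a hypothetical path with both syntaxes. The path is illustrative and does not need to exist.

```shell
#!/bin/bash
# Hypothetical path used only for illustration; it does not need to exist.
log_path="/var/log/myapp/archive/app.log"

# Modern syntax: the inner substitution needs no escaping.
parent_modern=$(basename "$(dirname "$log_path")")

# Legacy syntax: the inner backticks must be escaped with backslashes.
parent_legacy=`basename \`dirname "$log_path"\``

echo "Modern: $parent_modern"
echo "Legacy: $parent_legacy"
```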
29.3 Process Substitution: <(cmd) and >(cmd)
Process substitution allows you to pass the output of a command as a temporary file path parameter to another command.
- <(command): Expands to a file path (such as /dev/fd/63, or a named pipe) from which the command’s stdout can be read.
- >(command): Expands to a file path; data written to that path is piped into the command’s stdin.
- Example Command:
# Compare output of two command lists directly using diff
diff <(echo "A"; echo "B") <(echo "A"; echo "C")
- Expected Output:
2c2
< B
---
> C
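Process substitution is also useful for feeding a while loop without a pipeline. The sketch below assumes Bash's default settings (the lastpipe option is off) and shows that a counter updated in a piped loop is lost, while the same loop fed by <(...) keeps it.

```shell
#!/bin/bash
# Pipeline version: the loop runs in a subshell, so the count is lost.
piped_count=0
printf 'a\nb\nc\n' | while read -r line; do
  piped_count=$((piped_count + 1))
done

# Process substitution version: the loop runs in the current shell.
direct_count=0
while read -r line; do
  direct_count=$((direct_count + 1))
done < <(printf 'a\nb\nc\n')

echo "After pipeline: $piped_count"
echo "After process substitution: $direct_count"
```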
29.4 Variable Scope in Subshells
Variables defined in the parent shell are copied to the subshell by value, meaning the subshell receives a copy of the variable, not a reference. Any changes the subshell makes to the variable are lost when the subshell terminates.
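A related distinction is worth a short sketch: a subshell receives every variable of the parent shell, while a separately executed program only receives exported ones. The variable names below are hypothetical.

```shell
#!/bin/bash
LOCAL_SETTING="only in this shell"
export SHARED_SETTING="exported"

# A subshell is a copy of the shell, so it sees both variables.
from_subshell=$( echo "${LOCAL_SETTING:-unset} / ${SHARED_SETTING:-unset}" )

# A separately executed program only receives exported variables.
from_child=$(bash -c 'echo "${LOCAL_SETTING:-unset} / ${SHARED_SETTING:-unset}"')

echo "Subshell sees:      $from_subshell"
echo "Child process sees: $from_child"
```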
29.5 The exec Command
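The exec command replaces the current shell process with the specified command, inheriting the shell’s PID. Any commands after exec in the script are not executed because the shell process has been replaced. When exec is given only redirections and no command (as in 29.7), the shell keeps running and the redirections apply to the shell itself.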
- Syntax:
exec command
- Example Command:
cat << 'EOF' > exec_demo.sh
#!/bin/bash
echo "Before exec..."
exec echo "Shell replaced!"
echo "This line will never execute."
EOF
bash exec_demo.sh
- Expected Output:
Before exec...
Shell replaced!
29.6 Replacing the Shell with exec
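You can use exec in wrapper scripts to configure environment variables or paths before launching a binary. Because exec keeps the wrapper’s PID, the binary takes over as the primary process (PID 1) when the wrapper is a container’s entrypoint, so it receives signals such as SIGTERM directly. This is a common practice in Docker containers.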
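The wrapper pattern can be sketched as follows. The wrapper script, the APP_ENV variable, and the use of sh as a stand-in for the real binary are all assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical wrapper: sets APP_ENV, then hands its process over to the target.
wrapper=$(mktemp)
cat << 'EOF' > "$wrapper"
#!/bin/bash
export APP_ENV="production"
echo "wrapper PID: $$"
# sh stands in for the real binary here and reports its own PID.
exec sh -c 'echo "binary PID: $$ (APP_ENV=$APP_ENV)"'
EOF

wrapper_output=$(bash "$wrapper")
rm -f "$wrapper"
echo "$wrapper_output"
```

Both lines of output should report the same PID, which shows that the launched program took over the wrapper's process instead of running as a child.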
29.7 File Descriptors Beyond 0,1,2
By default, the shell uses three standard file descriptors (FDs): 0 (stdin), 1 (stdout), and 2 (stderr). You can open custom file descriptors to read from or write to files throughout your script. Numbers 3 through 9 are the safe choice, because Bash may use higher numbers internally.
- Syntax:
# Open file descriptor 3 for writing
exec 3>output_log.txt
# Write to descriptor 3
echo "Data" >&3
# Close descriptor 3
exec 3>&-
- Example Command:
exec 4>custom_out.txt
echo "Writing to custom FD 4" >&4
exec 4>&-
cat custom_out.txt
rm -f custom_out.txt
- Expected Output:
Writing to custom FD 4
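Custom descriptors work for reading as well. A minimal sketch, assuming a temporary file with hypothetical host names, reads the file through FD 3 so that stdin stays free for commands inside the loop:

```shell
#!/bin/bash
# Temporary input file created only for this demonstration.
hosts_file=$(mktemp)
printf 'web01\nweb02\n' > "$hosts_file"

# Open FD 3 for reading; stdin (FD 0) stays untouched.
exec 3< "$hosts_file"

hosts_read=0
while read -r -u 3 host; do
  # Commands in the loop body can still read the original stdin.
  echo "Checking $host"
  hosts_read=$((hosts_read + 1))
done

# Close FD 3 and clean up.
exec 3<&-
rm -f "$hosts_file"
```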
29.8 Duplicating File Descriptors
You can duplicate or redirect file descriptors to redirect multiple outputs at once. E.g., exec 3>&1 saves the current state of stdout (FD 1) to FD 3, allowing you to temporarily redirect stdout and restore it later.
- Example Command:
# Redirect all stdout to a file temporarily, then restore it
exec 3>&1               # Copy stdout descriptor (1) to descriptor 3
exec 1>temp_stdout.txt  # Redirect stdout to a file
echo "To file"          # This goes to the file
exec 1>&3               # Restore stdout from descriptor 3
exec 3>&-               # Close the saved copy, which is no longer needed
echo "To terminal"      # This goes back to the terminal
rm -f temp_stdout.txt
- Expected Output:
To terminal
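Duplication is order-sensitive, because redirections are applied from left to right. The sketch below uses a hypothetical function named noisy to show how swapping two redirections changes what a command substitution captures.

```shell
#!/bin/bash
# Hypothetical command that writes to both streams.
noisy() {
  echo "normal output"
  echo "error output" >&2
}

# FD 2 is pointed at the capture pipe first, then FD 1 is sent to /dev/null,
# so only stderr is captured.
errors_only=$(noisy 2>&1 >/dev/null)

# Reversed order: FD 1 goes to /dev/null first, then FD 2 copies that target.
nothing=$(noisy >/dev/null 2>&1)

echo "Captured: '$errors_only'"
echo "Captured: '$nothing'"
```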
29.9 Using Coprocesses: coproc
The coproc command launches a command asynchronously as a background process while establishing bidirectional pipes to read from and write to it.
- Syntax:
coproc COPROC_NAME { command; }
- Example Command:
# Start coprocess that capitalizes text using tr
coproc MY_COPROC { tr '[:lower:]' '[:upper:]'; }
# Duplicate the read end first, since Bash may close the coprocess descriptors once the coprocess exits
exec 3<&"${MY_COPROC[0]}"
# Write input to the coprocess pipe
echo "lowercase data" >&"${MY_COPROC[1]}"
# Close write pipe to signal EOF (a quoted "N>&-" would be run as a command name, so use the {name} form)
exec {MY_COPROC[1]}>&-
# Read capitalized output
read -r RESULT <&3
exec 3<&-
echo "Result: $RESULT"
- Expected Output:
Result: LOWERCASE DATA
- Flag & Command Breakdown:
${MY_COPROC[1]}: The file descriptor used to write to the coprocess’s stdin.
${MY_COPROC[0]}: The file descriptor used to read from the coprocess’s stdout.
29.10 Performance: Subshells vs Subprocesses
Both a subshell (e.g. (cmd)) and a command substitution (e.g. $(cmd)) require the operating system to perform a fork operation, which duplicates the shell process and its file descriptors. Running an external command adds an exec on top of the fork.
- Performance Hit: While fork overhead is minimal for small tasks, running thousands of subprocesses inside a loop will slow down your script.
- Optimization: Use built-in shell features (like parameter expansions or arithmetic expansion) instead of calling external commands (like sed or expr) inside loops.
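The optimization can be sketched by pairing a few external commands with their built-in equivalents. The path is hypothetical, and both forms should produce the same values.

```shell
#!/bin/bash
path="/var/log/myapp/app.log"   # hypothetical path

# External commands: each call forks a new process.
name_external=$(basename "$path")
stem_external=$(echo "$name_external" | sed 's/\.log$//')
sum_external=$(expr 7 + 5)

# Built-in equivalents: no fork required.
name_builtin=${path##*/}           # strip the longest prefix ending in "/"
stem_builtin=${name_builtin%.log}  # strip the ".log" suffix
sum_builtin=$((7 + 5))             # arithmetic expansion

echo "$name_external / $name_builtin"
echo "$stem_external / $stem_builtin"
echo "$sum_external / $sum_builtin"
```

For a single call the difference is negligible. It becomes noticeable when the same work runs inside a loop with thousands of iterations.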
Chapter 30: Working with CSV & Logs
30.1 Parsing CSV with awk
awk is a powerful text processing utility designed for parsing structured, column-based data. By default it splits each line into fields on whitespace (spaces and tabs); the -F option selects another separator, such as a comma. It processes files line-by-line.
- Syntax:
awk -F"separator" '{print $field_number}' filename
- Example Command:
printf "suresh,admin,active\njohn,user,inactive\n" > users.csv
awk -F"," '{print "User: " $1 " is " $3}' users.csv
rm -f users.csv
- Expected Output:
User: suresh is active
User: john is inactive
- Flag & Command Breakdown:
-F",": Sets the field separator (delimiter) to a comma.
$1 and $3: Refer to the first and third columns in the row.
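awk can also aggregate columns, not just print them. The sketch below assumes hypothetical CSV data with a header row and a numeric third column, and sums that column per role.

```shell
#!/bin/bash
# Hypothetical CSV data with a header row.
csv_data=$'name,role,logins\nsuresh,admin,12\njohn,user,3\nmeera,admin,7'

# NR > 1 skips the header; the array sums the third column per role.
# Piping through sort makes the output order predictable.
role_totals=$(printf '%s\n' "$csv_data" |
  awk -F',' 'NR > 1 { total[$2] += $3 } END { for (r in total) print r ": " total[r] }' |
  sort)

echo "$role_totals"
```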
30.2 Parsing CSV with cut and tr
For simple delimited text files, you can use cut to extract columns and tr to clean up or convert delimiters.
- Syntax:
cut -d"delimiter" -f[field_index] filename
- Example Command:
echo "suresh:admin:active" | cut -d":" -f2 | tr '[:lower:]' '[:upper:]'
- Expected Output:
ADMIN
- Flag & Command Breakdown:
-d":": Sets the delimiter to a colon.
-f2: Selects only the second field.
30.3 Handling Quoted Fields
Standard column splitting tools like cut and awk fail when parsing CSV files that contain commas inside quotes (e.g., "Delhi, India",Asia,Active), because they split on every comma. To parse these files correctly, use a CSV-aware parser such as Python's csv module or a Python-based CSV tool such as csvkit.
- Example Python Inline command:
echo '"Delhi, India",Asia' | python3 -c "import csv,sys; print(list(csv.reader(sys.stdin))[0][0])"
- Expected Output:
Delhi, India
30.4 Generating Reports from Logs
You can build data pipelines to analyze log files by chaining commands like grep, awk, sort, and uniq together with pipes.
- Example Command:
# Simulate an Apache access log file
printf "192.168.1.1 GET /blog\n192.168.1.2 GET /index\n192.168.1.1 GET /blog\n" > access.log
# Extract IP addresses, count requests per IP, and sort by hits
awk '{print $1}' access.log | sort | uniq -c | sort -rn
rm -f access.log
- Expected Output (uniq -c pads each count with leading spaces):
2 192.168.1.1
1 192.168.1.2
- Flag & Command Breakdown:
sort: Sorts lines of text alphabetically or numerically.
uniq -c: Groups adjacent duplicate lines and prefixes them with a count of occurrences.
sort -rn: Sorts lines numerically (-n) in reverse (-r) order, placing the highest values first.
30.5 Rotating Log Files with Scripts
Writing a custom log rotation script allows you to manage disk space by compressing and archiving old log files.
- Example Script:
#!/bin/bash
LOG_FILE="/var/log/myapp.log"
MAX_SIZE=$((10 * 1024 * 1024)) # 10MB limit
# Compute the archive name once so that mv and gzip always agree on it
ARCHIVE="${LOG_FILE}.$(date +%Y%m%d).bak"
# stat -c%s prints the file size in bytes (GNU coreutils syntax)
if [[ -f "$LOG_FILE" && $(stat -c%s "$LOG_FILE") -gt $MAX_SIZE ]]; then
  mv "$LOG_FILE" "$ARCHIVE"
  gzip "$ARCHIVE"
  touch "$LOG_FILE"
  chmod 640 "$LOG_FILE"
fi
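The same idea can be wrapped in a reusable function and tried safely in a temporary directory. This is only a sketch: rotate_log is a hypothetical helper name, the size check uses wc -c instead of stat to avoid GNU-specific flags, and gzip is assumed to be installed.

```shell
#!/bin/bash
# rotate_log FILE MAX_BYTES: hypothetical helper, not a replacement for logrotate.
rotate_log() {
  local file=$1 max_bytes=$2 size
  [[ -f "$file" ]] || return 0
  # wc -c reads the size portably; stat flags differ between GNU and BSD.
  size=$(wc -c < "$file")
  if (( size > max_bytes )); then
    local archive="${file}.$(date +%Y%m%d%H%M%S)"
    mv "$file" "$archive"
    gzip "$archive"
    : > "$file"        # recreate an empty log
    chmod 640 "$file"
  fi
}

# Demonstration in a temporary directory with a 2048-byte file.
demo_dir=$(mktemp -d)
printf '%2048s' '' > "$demo_dir/app.log"
rotate_log "$demo_dir/app.log" 1024
ls "$demo_dir"
```

After the call, the directory should contain an empty app.log and one compressed archive. Remove the temporary directory when finished.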
30.6 Colorizing Log Output
Using ANSI escape color codes in your scripts makes warnings and errors stand out in the terminal, making logs much easier to scan.
- Syntax:
echo -e "\033[<style>;<color_code>mText\033[0m"
- Example Command:
RED='\033[0;31m'
GREEN='\033[0;32m'
NC='\033[0m' # No Color (reset)
echo -e "${GREEN}[OK]${NC} Database is online."
echo -e "${RED}[ERROR]${NC} Disk space critical!"
- Expected Output:
[OK] Database is online.
[ERROR] Disk space critical!
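Escape codes become clutter when output is redirected to a file or a pipe. One possible approach, sketched below with a hypothetical helper named log_line, is to emit colors only when stdout is a terminal.

```shell
#!/bin/bash
# log_line COLOR_CODE TAG MESSAGE: hypothetical helper.
# Colors are used only when stdout (FD 1) is a terminal.
log_line() {
  if [[ -t 1 ]]; then
    printf '\033[%sm[%s]\033[0m %s\n' "$1" "$2" "$3"
  else
    printf '[%s] %s\n' "$2" "$3"
  fi
}

log_line '0;32' OK "Database is online."
log_line '0;31' ERROR "Disk space critical!"
```

Run directly in a terminal, the tags appear in color. Piped through cat or redirected to a file, the same calls produce plain text.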
30.7 Extracting JSON with jq (External)
JSON is the standard format for modern web APIs. The external jq utility is the industry standard for parsing, filtering, and extracting data from JSON strings in shell scripts.
- Syntax:
jq '.key_name' filename.json
- Example Command:
echo '{"user": "suresh", "permissions": ["read", "write"]}' | jq -r '.permissions[0]'
- Expected Output:
read
- Flag & Command Breakdown:
-r: Output raw strings instead of JSON-formatted strings (which are wrapped in double quotes).
30.8 Extracting XML with xpath / xmlstarlet
To parse XML files from shell scripts, you can use command-line XML parsing utilities like xmlstarlet or xpath.
- Example Command:
# Extract element using xmlstarlet
echo "<site><name>FreeTech</name></site>" | xmlstarlet sel -t -v "/site/name"
- Expected Output:
FreeTech
- Flag & Command Breakdown:
sel -t -v: Selects and extracts the value of the specified XPath query.
30.9 Monitoring Logs with a Script
You can monitor log files in real-time by piping tail -f into a while read loop and matching specific pattern triggers (like “ERROR” or “FAIL”) as new lines are appended.
- Example Script:
#!/bin/bash
# Monitor log and trigger alerts on errors
tail -f /var/log/syslog | while read -r line; do
  if [[ "$line" =~ "ERROR" ]]; then
    echo -e "\033[0;31m[CRITICAL ALERT]\033[0m: $line"
    # Trigger custom alert function
  fi
done
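Because tail -f never exits, the matching logic is easier to test when it lives in a function that reads from stdin. A sketch under that assumption follows. The function name scan_for_errors and its default pattern are hypothetical.

```shell
#!/bin/bash
# scan_for_errors [PATTERN]: reads log lines from stdin and reports the ones
# that match PATTERN. Name and default pattern are hypothetical.
scan_for_errors() {
  local pattern=${1:-'ERROR|FAIL'} line
  while IFS= read -r line; do
    if [[ "$line" =~ $pattern ]]; then
      echo "[CRITICAL ALERT]: $line"
    fi
  done
}

# Finite sample input for a dry run. Live use would look like:
#   tail -F /var/log/syslog | scan_for_errors
alerts=$(printf 'INFO boot ok\nERROR disk full\nWARN slow\nbackup FAIL\n' | scan_for_errors)
echo "$alerts"
```

The live example uses tail -F instead of tail -f so that monitoring continues after the log file is rotated.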
30.10 Email Alerts on Log Patterns
When a script detects a critical error pattern, it can send email notifications to system administrators using terminal mail clients like mailx or sendmail.
- Syntax:
echo "Body text" | mailx -s "Subject Line" [email protected]
- Example Command:
# If a backup fails, send email alert
echo "Daily database backup failed!" | mailx -s "CRITICAL: Backup Fail" [email protected] 2>/dev/null || echo "Email client not configured."
- Expected Output (on a system without a working mail client):
Email client not configured.
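To tie the alert to a log pattern, the message can be built from a match count before it is sent. The sketch below is a dry run under several assumptions: build_alert is a hypothetical helper, the threshold of 2 is arbitrary, and ALERT_RECIPIENT is an assumed environment variable. Nothing is sent unless that variable is set and mailx is installed.

```shell
#!/bin/bash
# build_alert LOGFILE THRESHOLD prints a message body once the number of
# ERROR lines reaches THRESHOLD. The name and the threshold are hypothetical.
build_alert() {
  local file=$1 threshold=$2 count
  count=$(grep -c "ERROR" "$file")
  if (( count >= threshold )); then
    echo "$count ERROR lines found in $file"
  fi
}

# Temporary sample log for a dry run.
sample_log=$(mktemp)
printf 'INFO start\nERROR db down\nERROR disk full\n' > "$sample_log"
alert_body=$(build_alert "$sample_log" 2)
rm -f "$sample_log"

# ALERT_RECIPIENT is an assumed environment variable; nothing is sent without it.
if [[ -n "$alert_body" && -n "${ALERT_RECIPIENT:-}" ]] && command -v mailx >/dev/null 2>&1; then
  echo "$alert_body" | mailx -s "CRITICAL: Errors in log" "$ALERT_RECIPIENT"
else
  echo "Dry run, would send: ${alert_body:-nothing}"
fi
```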