Linux Shell Operations

The Ultimate CSV Scripting Guide

Automate data validation, cleaning, and reporting using standard Unix utilities like awk, sed, and join.

Counting Records

Standard counting often includes the header row. Use these methods to get accurate data counts.

# Count lines including header
wc -l data.csv

# Count data rows only (Excludes line 1)
awk 'NR > 1' data.csv | wc -l

# Count using variable for scripts
ROW_COUNT=$(tail -n +2 data.csv | grep -c .)

Column & Field Validation

Check if every row adheres to the expected schema (e.g., exactly 8 columns).

# Find rows that DO NOT have exactly 8 columns
awk -F',' 'NF != 8 { print "Error: Line " NR " has " NF " fields" }' data.csv

# Validate that Column 3 is always numeric
awk -F',' '$3 !~ /^[0-9]+$/ { print "Non-numeric ID at line " NR ": " $3 }' data.csv

Data Transformation

Convert case, replace delimiters, or calculate new fields on the fly.

# Convert CSV to Pipe-Delimited (|)
sed 's/,/|/g' data.csv > output.psv

# Uppercase the names in Column 2
awk -F',' 'BEGIN{OFS=","} {$2=toupper($2); print}' data.csv

# Add a "Total" column (Col4 = Col2 + Col3)
awk -F',' 'BEGIN{OFS=","} {if(NR>1) $4=$2+$3; else $4="Total"; print}' data.csv

Merging Files (Relational Joins)

Use the join command to combine two CSVs on a common ID column.

# Sort files first (Join requires sorted input)
sort -t',' -k1,1 users.csv > users_sorted.csv
sort -t',' -k1,1 orders.csv > orders_sorted.csv

# Join on Column 1 (ID)
join -t',' -1 1 -2 1 users_sorted.csv orders_sorted.csv > joined_data.csv

Note: The -1 1 -2 1 flags tell join to use Column 1 of File 1 and Column 1 of File 2 as the keys.

Common One-Liners

Requirement Command
Remove Header Row sed '1d' file.csv
Sum values in Col 5 awk -F, '{s+=$5} END{print s}' file.csv
Unique values in Col 1 cut -d, -f1 file.csv | sort -u
Filter by Date (regex) grep "2023-12-.." file.csv