Text manipulation is an essential part of a developer's toolbox. Whether it's processing logs, transforming configuration files or wrangling messy data, being able to find and replace text quickly is a must-have skill.
This is where sed shines.
sed stands for stream editor. As the name suggests, it performs non-interactive edits on text streams, whether the input comes from a file or from another command in a pipeline.
With its strong regex capabilities, sed enables developers to restructure, clean and transform text without manually opening files. And it's fast too – sed was designed as a lightweight editor to be used in scripts and pipelines.
However, while basic sed finding and replacing is easy enough, handling multiple find-replace expressions can get complex.
In this comprehensive 3200+ word guide, we will systematically cover various methods to achieve multi-substitutions using sed, with detailed examples for each approach.
You will learn:
- Techniques to specify multiple sed replacement expressions
- Using external sed scripts for managing find-replace rules
- Chaining multiple sed commands together
- Advanced substitution features like ranges, flags and captures
- Applying multi-sed workflows to real-world examples
Follow along, and you will level up your text-wrangling skills considerably!
Why Learn Multiple sed Substitutions?
Before we jump into the techniques, it's worth covering why you may need multiple substitutions in the first place.
In simple cases, doing one global search-replace per sed invocation works fine. For example:
sed 's/foo/bar/g' file.txt
However, text transformations often require multiple find-replace passes. Some common cases where this happens:
Structured text processing – Data formats like JSON, CSV, XML contain nested logical units like objects, rows, and tags. Multi-pass sed lets you manipulate specific sections separately.
Multi-stage transformations – Converting between formats like HTML→Markdown or encoding changes utf-8→ascii is best done in steps.
Stream editing – When processing pipeline data, you may progressively sanitize, structure and filter the stream.
Large refactors – Refactoring code or data structures often needs changes to multiple interconnected pieces.
As you can see, chained sed substitutions help process text in logical stages.
Understanding multi-replace best practices is thus key for any developer working with text processing.
Alright, enough talk – let's get hands-on!
Sed Replacements 101
Before we get into advanced techniques, let's recap basic sed replacements as a refresher.
The syntax for substitutions in sed is:
sed 's/search_pattern/replacement_text/flags' input_file
Here:
- s: the substitute command
- search_pattern: text pattern to find
- replacement_text: new text that will replace matches
- flags: optional flags like g for global replace
- input_file: file to run sed on
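To make the flags concrete, here is a small sketch that runs the same substitution with no flag, with g, and with a numeric flag. The sample line is made up purely for illustration.

```shell
#!/bin/sh
# Hypothetical sample line, used only for illustration.
line='apple apple apple'

# No flag: only the first match on each line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/apple/orange/')

# g flag: every match on the line is replaced.
all_matches=$(printf '%s\n' "$line" | sed 's/apple/orange/g')

# Numeric flag: only the Nth match (here the 2nd) is replaced.
second_only=$(printf '%s\n' "$line" | sed 's/apple/orange/2')

printf '%s\n' "$first_only" "$all_matches" "$second_only"
```

The three results differ only in which occurrences were touched, which is worth keeping in mind when several rules run against the same line.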
For example, to globally replace all apple instances with orange in input.txt:
sed 's/apple/orange/g' input.txt
This prints the modified text to stdout and leaves input.txt itself untouched. To edit the file in place, use the -i flag (GNU sed syntax shown; BSD/macOS sed requires a backup-suffix argument after -i, such as -i ''):
sed -i 's/apple/orange/g' input.txt
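Because in-place edits overwrite the file, it can help to keep a backup while experimenting. The sketch below works in a throwaway directory (the file contents are invented for the demo) and attaches a .bak suffix directly to -i, a form that both GNU and BSD sed accept.

```shell
#!/bin/sh
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
printf 'apple pie\napple juice\n' > "$workdir/input.txt"

# -i.bak edits input.txt in place and keeps the original as input.txt.bak.
sed -i.bak 's/apple/orange/g' "$workdir/input.txt"

edited=$(cat "$workdir/input.txt")
backup=$(cat "$workdir/input.txt.bak")
printf 'edited:\n%s\nbackup:\n%s\n' "$edited" "$backup"

# Clean up the demo directory.
rm -rf "$workdir"
```

If the result is wrong, the .bak file can simply be moved back over the edited one.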
Easy enough! Now let's understand approaches for more advanced multi-substitutions.
Method 1: Chaining Expressions
Our first method for multi-replace involves chaining multiple substitute commands together in one sed invocation.
There are two popular ways to achieve this:
- Using multiple -e expressions
- Joining expressions with semicolons
Let's explore both techniques.
1.1 Multiple -e Expressions
The -e flag allows adding multiple expressions to a sed command:
sed -e 's/find1/replace1/' -e 's/find2/replace2/' file
For example, to replace both apple and banana in file.txt:
sed -e 's/apple/orange/' -e 's/banana/strawberry/' file.txt
The main benefit of -e is that it clearly separates each substitution rule, improving readability.
Here is an expanded example:
# Remove sensitivity classification from file header
sed -e 's/(C)//' -e 's/(I)//' -e 's/(S)//' report.txt
# Standardize date formats
sed -e 's/JAN/01/' -e 's/FEB/02/' -e 's/MAR/03/' logs.csv
# Anonymize PII
sed -e 's/Alice/<FirstName>/' -e 's/Smith/<LastName>/' records.json
In addition to readability, keeping each rule in its own -e expression makes it easy to copy individual rules into other commands.
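One detail worth checking before chaining rules: sed applies the expressions in the order given, and each one sees the output of the previous one. A minimal sketch with made-up fruit names shows how swapping two rules changes the result.

```shell
#!/bin/sh
# Hypothetical sample line, used only for illustration.
line='apple banana'

# apple becomes banana first, then EVERY banana (old and new) becomes cherry.
chained=$(printf '%s\n' "$line" |
  sed -e 's/apple/banana/g' -e 's/banana/cherry/g')

# Same two rules in the opposite order: the original banana is converted
# before apple turns into banana, so one banana survives.
reversed=$(printf '%s\n' "$line" |
  sed -e 's/banana/cherry/g' -e 's/apple/banana/g')

printf '%s\n' "$chained" "$reversed"
```

When one rule's replacement text can match a later rule's pattern, the order of the -e expressions is part of the logic, not just a matter of style.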
1.2 Joining Expressions
Instead of repeating -e, we can also chain expressions directly using semicolons:
sed -e 's/find1/replace1/; s/find2/replace2/' file
Our earlier fruit example becomes:
sed -e 's/apple/orange/g; s/banana/strawberry/g' file.txt
Benefits of joining expressions:
- Compact one-liner form, great for piping
- Avoids repeating the -e flag for every rule
When to use each approach?
Generally:
- Use separate -e expressions when readability matters
- Use semicolons to join expressions into condensed one-liners
So pick the method aligning with your needs.
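As a quick sanity check, the sketch below runs the same two rules in three spellings: separate -e expressions, a semicolon-joined one-liner, and a quoted script with one rule per line. The input string is invented for the demo.

```shell
#!/bin/sh
# Hypothetical input, used only for illustration.
input='apple and banana'

# Form 1: one -e per rule.
with_e=$(printf '%s\n' "$input" |
  sed -e 's/apple/orange/g' -e 's/banana/strawberry/g')

# Form 2: rules joined with a semicolon.
with_semicolon=$(printf '%s\n' "$input" |
  sed 's/apple/orange/g; s/banana/strawberry/g')

# Form 3: rules separated by newlines inside one quoted script.
with_newline=$(printf '%s\n' "$input" | sed '
s/apple/orange/g
s/banana/strawberry/g
')

printf '%s\n' "$with_e" "$with_semicolon" "$with_newline"
```

All three produce the same text, so the choice comes down to which form is easiest to read and maintain in context.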
Method 2: External Sed Script Files
So far we covered specifying replacements within the same sed invocation. But what about running a larger set of reusable find-replace rules?
This is where external sed script files help.
A sed script is just a text file containing substitution commands – one expression per line:
# script.sed
s/find1/replace1/flags
s/find2/replace2/flags
s/findN/replaceN/flags
To execute it, use sed -f:
sed -f script.sed input.txt
Let's see an example script fruits.sed to standardize fruit names:
# fruits.sed
s/apples/apple/
s/mangoess/mango/
s/banannas/banana/
Run it like so:
sed -f fruits.sed smoothie-list.txt
This approach makes sed expressions reusable across files. Just run sed -f fruits.sed anytime you need to fix fruit spellings.
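For a self-contained trial run, the sketch below writes a script file with a here-document and applies it with -f. The rules are based on fruits.sed (with the g flag added so every occurrence on a line is fixed), and the sample smoothie list is invented.

```shell
#!/bin/sh
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)

# The quoted 'EOF' stops the shell from expanding anything in the script.
cat > "$workdir/fruits.sed" <<'EOF'
# fruits.sed: normalize fruit names
s/apples/apple/g
s/mangoess/mango/g
s/banannas/banana/g
EOF

# Hypothetical input file for the demo.
printf 'apples, mangoess and banannas\n' > "$workdir/smoothie-list.txt"

fixed=$(sed -f "$workdir/fruits.sed" "$workdir/smoothie-list.txt")
printf '%s\n' "$fixed"

rm -rf "$workdir"
```

Lines starting with # inside the script file are comments, which makes it a natural place to document why each rule exists.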
Pro tips for sed scripts:
- Name them like .sedscripts, .sed_n to group in directories
- Include comment headers describing the transformations
- Maintain consistent flags like g, p across rules
External scripts are great when handling lots of tribal knowledge around data cleansing, log parsing etc. Extracting the rules into a separate sed file allows reapplying them easily.
Some real-world examples next.
Real-World Examples
Let's apply what we've learned to some practical multi-sed use cases.
1. Anonymizing Log Files
Shared log files often contain personally identifiable information (PII) like email addresses, names, IDs which need scrubbing.
We can pseudonymize logs using a lookup file like:
# anonymize_map.txt
alice@example.com -> anon_user1@example.com
bob@example.com -> anon_user2@example.com
Then sed substitute the real emails with anonymized versions:
sed -f anonymizer.sed access.log > clean_access.log
Here is the script anonymizer.sed:
# anonymizer.sed
s/alice@example\.com/anon_user1@example.com/g
s/bob@example\.com/anon_user2@example.com/g
The \. escapes match literal dots in the email addresses; an unescaped . would match any character.
This approach maintains utility of logs for debugging while scrubbing personal identifiers!
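The lookup file and the sed script hold the same information, so one option is to generate the script from the map rather than maintain both by hand. The loop below is a sketch under a few assumptions: each map line has the form "real -> replacement", comment lines start with #, and the replacement side contains no /, & or backslash characters.

```shell
#!/bin/sh
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)

cat > "$workdir/anonymize_map.txt" <<'EOF'
# anonymize_map.txt
alice@example.com -> anon_user1@example.com
bob@example.com -> anon_user2@example.com
EOF

# Turn each "real -> replacement" line into an s/real/replacement/g rule.
while read -r real _arrow anon; do
  # Skip blank lines and comment lines.
  case $real in ''|'#'*) continue ;; esac
  # Backslash-escape characters that are special in a sed pattern.
  pattern=$(printf '%s\n' "$real" | sed 's,[][\.*^$/],\\&,g')
  printf 's/%s/%s/g\n' "$pattern" "$anon"
done < "$workdir/anonymize_map.txt" > "$workdir/anonymizer.sed"

generated=$(cat "$workdir/anonymizer.sed")

# Hypothetical log line to confirm the generated rules work.
scrubbed=$(printf 'login alice@example.com from 10.0.0.1\n' |
  sed -f "$workdir/anonymizer.sed")

printf '%s\n' "$generated" "$scrubbed"
rm -rf "$workdir"
```

Generating the rules keeps the dot escaping consistent and means a new user only has to be added in one place.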
2. Converting HTML to Markdown
Static site generators like Jekyll, Hugo work with Markdown formatted document content. When migrating from an old HTML site, reformatting content can be tedious.
Let's automate it with sed!
We will use the html2markdown tool to handle the HTML->MD conversion, and post-process with sed for cleanup:
for file in *.html; do
html2markdown "$file" | sed -f markdown_postproc.sed > "${file/%html/md}";
done
This loops through HTML files, converting and post-processing each to Markdown.
The script markdown_postproc.sed does additional formatting:
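# markdown_postproc.sed
# Images: rewrite the start of the tag into Markdown image syntax
s/<img src="/![image](/g
# Lists: opening tags become bullets, closing tags are dropped
s/<ul>/* /g
s/<ol>/* /g
s/<\/ul>//g
s/<\/ol>//g
# Bold
s/<b>/**/g
s/<\/b>/**/g
# Italic (single asterisk in Markdown)
s/<i>/*/g
s/<\/i>/*/g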
It ensures images, lists and styling syntax follow Markdown conventions. The result is polished HTML→Markdown conversions!
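Tags that carry two pieces of information, such as links, need capture groups so that both parts can be reused in the replacement. The rule below is an illustrative addition rather than part of markdown_postproc.sed; it assumes simple anchors whose only attribute is href and whose link text contains no nested tags.

```shell
#!/bin/sh
# Hypothetical HTML fragment, used only for illustration.
html='See <a href="https://example.com/docs">the docs</a> for details.'

# \( ... \) captures text: \1 is the URL, \2 is the link text.
# The replacement emits them in Markdown order: [text](url).
# Using | as the delimiter avoids escaping the slashes in </a>.
markdown=$(printf '%s\n' "$html" |
  sed 's|<a href="\([^"]*\)">\([^<]*\)</a>|[\2](\1)|g')

printf '%s\n' "$markdown"
```

The negated bracket expressions [^"]* and [^<]* keep each capture from running past the end of its own attribute or tag.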
3. Stream Data Processing
Ingest pipelines often require multi-stage processing on streaming data.
For example, here is a sed pipeline to cleanse live app server logs:
tail -F app.log | \
sed 's/WARN/warning/g' | \
sed 's/ERR/error/g' | \
sed -f parsers.sed | \
sed 's/id=[0-9]*/id=xxx/' > clean_logs.txt
This tails the live log, standardizes severity levels, applies parsing rules and scrubs IDs – all using pipelined seds!
The script parsers.sed contains tribal cleanup knowledge:
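# parsers.sed (basic regex syntax with GNU extensions: \s, \w, \?, \+)
# Users
s/User:\s*\([a-zA-Z]*\)/User: <name>/
# URLs (\? makes the s optional; a bare ? is a literal in basic regex)
s|https\?://[a-zA-Z0-9_./?&=-]*|<url>|g
# Filepaths (grouping and repetition need backslashes in basic regex)
s|\(/\w\+\)\+|<path>|g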
Such data pipelines enable smoothly ingesting and transforming unstructured streams.
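Since every stage in that pipeline is sed, the same rules can also run in a single sed process by mixing -e and -f options, which sed concatenates in the order given. The sketch below compares both forms on one invented log line; its parsers.sed is a simplified stand-in with a single rule, not the real script.

```shell
#!/bin/sh
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)

# Hypothetical, simplified stand-in for parsers.sed.
cat > "$workdir/parsers.sed" <<'EOF'
s/User: *[a-zA-Z]*/User: <name>/
EOF

# Hypothetical log line for the comparison.
sample='ERR User: alice id=42 WARN disk'

# Four sed processes connected by pipes.
piped=$(printf '%s\n' "$sample" |
  sed 's/WARN/warning/g' |
  sed 's/ERR/error/g' |
  sed -f "$workdir/parsers.sed" |
  sed 's/id=[0-9]*/id=xxx/')

# One sed process running the same rules in the same order.
single=$(printf '%s\n' "$sample" |
  sed -e 's/WARN/warning/g' -e 's/ERR/error/g' \
      -f "$workdir/parsers.sed" -e 's/id=[0-9]*/id=xxx/')

printf '%s\n' "$piped" "$single"
rm -rf "$workdir"
```

The single-process form has fewer pipes between the log and the output file, while the piped form makes it easy to insert another tool between stages.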
Alternative Tools Beyond Sed
While powerful, sed is not the only text manipulation game in town. How does it compare to alternatives like awk or perl?
awk is another popular text-processing tool in Unix. It processes input record by record (by default, line by line) like sed, making it well-suited for text transformations.
However, sed has a few advantages for replacements:
- Simpler syntax – s/find/replace/ reads easier than awk's sub() and gsub()
- Regex support – sed uses basic regular expressions by default and switches to extended regular expressions with -E
- In-place editing – GNU and BSD sed provide -i to edit files directly, with no manual temporary file
- Portability – sed is specified by POSIX and ships with virtually every Unix-like system
That said, awk shines in cases involving more advanced data structure and type conversions vs plain-text find-replace.
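For a side-by-side feel of the syntax difference, here is the earlier fruit replacement written once with sed and once with awk's gsub(). The input string is invented for the demo.

```shell
#!/bin/sh
# Hypothetical input, used only for illustration.
input='apple and banana'

# sed: two substitute commands joined with a semicolon.
with_sed=$(printf '%s\n' "$input" |
  sed 's/apple/orange/g; s/banana/strawberry/g')

# awk: gsub() edits the current record in place, then print emits it.
with_awk=$(printf '%s\n' "$input" |
  awk '{ gsub(/apple/, "orange"); gsub(/banana/, "strawberry"); print }')

printf '%s\n' "$with_sed" "$with_awk"
```

Both produce the same line; awk needs an explicit print, but in exchange it can combine the replacement with field-based logic in the same program.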
Perl is another common choice due to its fully featured string manipulation capabilities. The perl -pi -e option allows direct in-place editing like sed.
Downsides compared to sed:
- Heavier tool – The Perl runtime brings more overhead, while sed is designed to be lightweight
- Readability – Perl's dense regex syntax can be more cryptic than sed's
- Availability – Perl is common but not guaranteed to be present, for example on minimal container images
So while Perl is extremely powerful, sed strikes the right balance for most text transformation tasks.
Best Practices for Multiple Sed Substitutions
We've covered a ton of material on multi-sed techniques. Let's wrap up with some best practices to keep in mind:
- Stick to line-by-line processing where you can; multi-line edits that rely on sed's buffers are much harder to read and debug
- Use an external sed script file for anything more than 2-3 substitutions
- Validate that the replacement logic works correctly before running in place with -i
- Watch escaping! Things can get complex with nested slashes, pipes etc
- Comment substitution rules clearly for later understanding
- Prefer readability with -e over condensed semicolon sequences when possible
- Capture groups help handle repeated logic, but balance with readability
Adopting these tips will ensure you avoid pitfalls and create maintainable multi-sed text processing flows.
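Two of these tips can be sketched together: previewing a change with diff before committing to -i, and choosing a different delimiter so that slashes in paths need no escaping. The config file and paths below are hypothetical.

```shell
#!/bin/sh
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
printf 'path=/usr/local/bin\nname=demo\n' > "$workdir/app.conf"

# Using | as the delimiter avoids escaping every slash in the paths.
rule='s|/usr/local|/opt|g'

# Dry run: compare the file with sed's output. diff exits non-zero when
# the inputs differ, so "|| true" keeps that from looking like a failure.
preview=$(sed "$rule" "$workdir/app.conf" | diff "$workdir/app.conf" - || true)
printf '%s\n' "$preview"

# Apply only after the preview looks right, keeping a .bak copy.
sed -i.bak "$rule" "$workdir/app.conf"
result=$(cat "$workdir/app.conf")
printf '%s\n' "$result"

rm -rf "$workdir"
```

An empty preview means the rule matched nothing, which is often the first sign of an escaping mistake.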
Additional Resources
With that, we have reached the end of our guided tour of multi-sed replacements!
We covered a lot of ground explaining the leading methods, real-world use cases, comparisons and best practices around sed substitutions.
To take your skills even further, here are some bonus resources:
- Advanced Sed Tutorial With Examples For Beginners – Great examples extending sed's capabilities
- IBM Developer Sed Guide – Well-structured overview of sed with tutorials
- Sed One-Liners Explained – Collections of useful sed tricks for reference
Additionally, be sure to experiment with sed hands-on using the techniques covered here. As the saying goes – practice makes perfect!
I hope this guide serves you well as a master reference for all things multi-sed. Happy text slicing and dicing!