Text manipulation is an essential part of a developer's toolbox. Whether it's processing logs, transforming configuration files or wrangling messy data, being able to find and replace text quickly is a must-have skill.

This is where sed shines.

sed stands for stream editor. As the name suggests, it performs non-interactive edits on a text stream, read from a file or from a pipeline, and writes the result to stdout.

With its strong regex capabilities, sed enables developers to restructure, clean and transform text without manually opening files. And it's blazingly fast too – sed was designed as a lightweight editor to be used in scripts and pipelines.

However, while basic sed finding and replacing is easy enough, handling multiple find-replace expressions can get complex.

In this guide, we will systematically cover various methods to achieve multiple substitutions using sed, with detailed examples for each approach.

You will learn:

  • Techniques to specify multiple sed replacement expressions
  • Using external sed scripts for managing find-replace rules
  • Chaining multiple sed commands together
  • Advanced substitution features like ranges, flags and captures
  • Applying multi-sed workflows to real-world examples

Follow along, and you will level up your text-wrangling skills considerably!

Why Learn Multiple sed Substitutions?

Before we jump into the techniques, it's worth covering why you may need multiple substitutions in the first place.

In simple cases, doing one global search-replace per sed invocation works fine. For example:

sed 's/foo/bar/g' file.txt

However, text transformations often require multiple find-replace passes. Some common cases where this happens:

Structured text processing – Data formats like JSON, CSV, XML contain nested logical units like objects, rows, and tags. Multi-pass sed lets you manipulate specific sections separately.

Multi-stage transformations – Converting between formats like HTML→Markdown or encoding changes utf-8→ascii is best done in steps.

Stream editing – When processing pipeline data, you may progressively sanitize, structure and filter the stream.

Large refactors – Refactoring code or data structures often needs changes to multiple interconnected pieces.

As you can see, chained sed substitutions help process text in logical stages.

Understanding multi-replace best practices is thus key for any developer working with text processing.

Alright, enough talk – let's get hands-on!

Sed Replacements 101

Before we get into advanced techniques, let's recap basic sed replacements as a refresher.

The syntax for substitutions in sed is:

sed 's/search_pattern/replacement_text/flags' input_file

Here:

  • s: the substitute command
  • search_pattern: text pattern to find
  • replacement_text: new text that will replace matches
  • flags: optional flags like g for global replace
  • input_file: file to run sed on

For example, to globally replace all apple instances with orange in input.txt:

sed 's/apple/orange/g' input.txt

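This prints the modified text to stdout and leaves the file itself unchanged. To edit the file in place, use the -i flag (GNU sed syntax; BSD and macOS sed expect a backup suffix argument, as in -i ''):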

sed -i 's/apple/orange/g' input.txt
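
To see both forms side by side, here is a minimal sketch. The file name fruits_demo.txt and its contents are made up for illustration. It uses the -i.bak form, which both GNU and BSD sed accept and which keeps a backup copy of the original.

```shell
# Create a small sample file (hypothetical name and contents).
printf 'apple pie\napple and apple\n' > fruits_demo.txt

# Without -i: the result goes to stdout and the file is untouched.
preview=$(sed 's/apple/orange/g' fruits_demo.txt)
printf '%s\n' "$preview"

# With -i.bak: the file is rewritten, the original is saved as fruits_demo.txt.bak.
sed -i.bak 's/apple/orange/g' fruits_demo.txt
cat fruits_demo.txt
```

Running the preview first and the in-place edit second is a cheap way to check a rule before it touches the file.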

Easy enough! Now let's understand approaches for more advanced multi-substitutions.

Method 1: Chaining Expressions

Our first method for multi-replace involves chaining multiple substitute commands together in one sed invocation.

There are two popular ways to achieve this:

  1. Using multiple -e expressions
  2. Joining expressions with semicolons

Let's explore both techniques.

1.1 Multiple -e Expressions

The -e flag allows adding multiple expressions to a sed command:

sed -e 's/find1/replace1/' -e 's/find2/replace2/' file

For example, to replace both apple and banana in file.txt:

sed -e 's/apple/orange/' -e 's/banana/strawberry/' file.txt

The main benefit of -e is it allows clearly separating out each substitution rule, improving readability.

Here is an expanded example:

# Remove sensitivity classification from file header
sed -e 's/(C)//' -e 's/(I)//' -e 's/(S)//' report.txt

# Standardize date formats
sed -e 's/JAN/01/' -e 's/FEB/02/' -e 's/MAR/03/' logs.csv

# Anonymize PII
sed -e 's/Alice/<FirstName>/' -e 's/Smith/<LastName>/' records.json

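In addition to readability, each -e expression is a separate argument, so individual rules are easy to add, remove or reorder. The expressions are applied to every line in the order given.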
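
Because every expression sees the output of the one before it, the order of the -e options can change the result. A small sketch with made-up input illustrates this:

```shell
# Rule 2 runs on the output of rule 1, so the "banana" produced by rule 1 is rewritten.
first=$(printf 'apple banana\n' | sed -e 's/apple/banana/' -e 's/banana/cherry/')
printf '%s\n' "$first"

# The same two rules in the opposite order give a different line.
second=$(printf 'apple banana\n' | sed -e 's/banana/cherry/' -e 's/apple/banana/')
printf '%s\n' "$second"
```

The first command prints "cherry banana" and the second prints "banana cherry", so it is worth ordering rules deliberately when one replacement can match another rule's pattern.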

1.2 Joining Expressions

Instead of -e, we can also chain together expressions directly using semicolons:

sed -e 's/find1/replace1/; s/find2/replace2/' file

Our earlier fruit example becomes:

sed -e 's/apple/orange/g; s/banana/strawberry/g' file.txt

Benefits of joining expressions:

  • Compact one-liner form, great for piping
  • Avoids repeating the -e flag or calling sed multiple times

When to use each approach?

Generally:

  • Use separate -e expressions when readability matters
  • Use semicolons to join expressions into condensed one-liners

So pick the method aligning with your needs.
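
The forms are interchangeable for simple substitutions. As a quick sketch (the input string is invented), the following three invocations produce the same output, including a third variant where the quoted script spans several lines:

```shell
input='apple and banana'

with_e=$(printf '%s\n' "$input" | sed -e 's/apple/orange/g' -e 's/banana/strawberry/g')
with_semicolon=$(printf '%s\n' "$input" | sed 's/apple/orange/g; s/banana/strawberry/g')

# A quoted script may also hold one command per line.
with_newlines=$(printf '%s\n' "$input" | sed '
s/apple/orange/g
s/banana/strawberry/g
')

printf '%s\n' "$with_e" "$with_semicolon" "$with_newlines"
```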

Method 2: External Sed Script Files

So far we covered specifying replacements within the same sed invocation. But what about running a larger set of reusable find-replace rules?

This is where external sed script files help.

A sed script is just a text file containing substitution commands – one expression per line:

# script.sed
s/find1/replace1/flags
s/find2/replace2/flags 
s/findN/replaceN/flags

To execute it, use sed -f:

sed -f script.sed input.txt 

Let's see an example script fruits.sed to standardize fruit names:

# fruits.sed
s/apples/apple/
s/mangoess/mango/
s/banannas/banana/

Run it like so:

sed -f fruits.sed smoothie-list.txt

This approach makes sed expressions reusable across files. Just pass fruits.sed to sed -f anytime you need to fix fruit spellings.
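
For experimenting, a script file can be created straight from the shell with a here-document. This is only a sketch: the file name fruits_demo.sed and the sample input are hypothetical.

```shell
# Write the rules to a script file; the quoted 'EOF' stops the shell from expanding anything.
cat > fruits_demo.sed <<'EOF'
# Normalize fruit spellings
s/apples/apple/g
s/banannas/banana/g
EOF

# The same script works on files and on piped input.
fixed=$(printf 'apples and banannas\n' | sed -f fruits_demo.sed)
printf '%s\n' "$fixed"
```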

Pro tips for sed scripts:

  • Give them a consistent extension such as .sed and keep them together in one directory
  • Include comment headers describing the transformations
  • Maintain consistent flags like g, p across rules

External scripts are great when handling lots of tribal knowledge around data cleansing, log parsing etc. Extracting the rules into a separate sed file allows reapplying them easily.

Some real-world examples next.

Real-World Examples

Let's apply what we've learned to some practical multi-sed use cases.

1. Anonymizing Log Files

Shared log files often contain personally identifiable information (PII) like email addresses, names, IDs which need scrubbing.

We can pseudonymize logs using a lookup file like:

# anonymize_map.txt  
alice@example.com -> anon_user1@example.com
bob@example.com -> anon_user2@example.com 

Each mapping then becomes one substitution rule in a sed script, which replaces the real emails with their anonymized versions:

sed -f anonymizer.sed access.log > clean_access.log

Here is the script anonymizer.sed:

# anonymizer.sed
s/alice@example\.com/anon_user1@example.com/g
s/bob@example\.com/anon_user2@example.com/g

The \. escapes match the literal dots in the email addresses; an unescaped . would match any character.

This approach maintains utility of logs for debugging while scrubbing personal identifiers!
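
With more than a handful of mappings, the script can be generated from the lookup file rather than written by hand. The loop below is a sketch under a few assumptions: the lookup file uses exactly the "real -> replacement" layout shown above, and the addresses contain no regex-special characters other than dots.

```shell
# Sample lookup file in the "real -> replacement" layout.
cat > anonymize_map.txt <<'EOF'
# anonymize_map.txt
alice@example.com -> anon_user1@example.com
bob@example.com -> anon_user2@example.com
EOF

# Turn every "real -> anon" line into an s/real/anon/g rule.
while read -r real arrow anon; do
  # Skip comments and anything that is not a "real -> anon" line.
  [ "$arrow" = "->" ] || continue
  # Escape dots so they match literally on the pattern side.
  pattern=$(printf '%s\n' "$real" | sed 's/\./\\./g')
  printf 's/%s/%s/g\n' "$pattern" "$anon"
done < anonymize_map.txt > anonymizer.sed

scrubbed=$(printf 'login alice@example.com ok\n' | sed -f anonymizer.sed)
printf '%s\n' "$scrubbed"
```

Keeping the mapping in one file and regenerating the script means the rules never drift apart from the lookup table.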

2. Converting HTML to Markdown

Static site generators like Jekyll and Hugo work with Markdown content. When migrating from an old HTML site, reformatting content can be tedious.

Let's automate it with sed!

We will use the html2markdown tool to handle the HTML->MD conversion, and post-process with sed for cleanup:

for file in *.html; do
   html2markdown "$file" | sed -f markdown_postproc.sed > "${file/%html/md}";  
done

This loops through HTML files, converting and post-processing each to Markdown.

The script markdown_postproc.sed does additional formatting:

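# markdown_postproc.sed

# Images: <img src="x.png"> becomes ![image](x.png)
s/<img src="\([^"]*\)"[^>]*>/![image](\1)/g

# Lists: each item becomes a bullet, wrapper tags are dropped
s/<li>/* /g
s/<\/li>//g
s/<[uo]l>//g
s/<\/[uo]l>//g

# Bold / italic
s/<b>/**/g
s/<\/b>/**/g
s/<i>/*/g
s/<\/i>/*/g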

It rewrites any leftover image, list and styling tags into Markdown syntax, giving cleaner HTML→Markdown conversions.
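
Capture groups let a single rule rewrite a whole tag rather than only its opening part. The following is a hedged sketch on an invented HTML line. Real-world HTML is often too irregular for regex-based conversion, so treat it as a cleanup aid rather than a converter.

```shell
html='<b>bold</b> and <i>italic</i> <img src="cat.png">'

# \([^"]*\) captures the image path and \1 reuses it in the replacement.
markdown=$(printf '%s\n' "$html" | sed \
  -e 's|<img src="\([^"]*\)"[^>]*>|![image](\1)|g' \
  -e 's|</*b>|**|g' \
  -e 's|</*i>|*|g')
printf '%s\n' "$markdown"
```

Using | as the delimiter avoids having to escape the slashes in the closing tags.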

3. Stream Data Processing

Ingest pipelines often require multi-stage processing on streaming data.

For example, here is a sed pipeline to cleanse live app server logs:

tail -F app.log | \
sed 's/WARN/warning/g' | \
sed 's/ERR/error/g' | \
sed -f parsers.sed | \
sed 's/id=[0-9]*/id=xxx/' > clean_logs.txt

This tails the live log, standardizes severity levels, applies parsing rules and scrubs IDs – all using pipelined seds!
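
The same steps can also run inside a single sed process by combining them as -e expressions, which saves a process per stage. The sketch below uses a hypothetical helper named clean_stream and invented log lines. The parsers.sed step is left out so the example stays self-contained; it could be added as a -f option between the -e options.

```shell
# Hypothetical helper: the clean-up steps in one sed process.
clean_stream() {
  sed -e 's/WARN/warning/g' \
      -e 's/ERR/error/g' \
      -e 's/id=[0-9]*/id=xxx/'
}

cleaned=$(printf 'WARN disk low id=42\nERR write failed id=7\n' | clean_stream)
printf '%s\n' "$cleaned"

# Live usage would look like: tail -F app.log | clean_stream > clean_logs.txt
```

When the output feeds another program instead of a file, GNU sed's -u option keeps lines from being held back in an output buffer.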

The script parsers.sed contains tribal cleanup knowledge:

# parsers.sed (GNU sed basic-regex syntax: \s, \w, \? and \+ are GNU extensions)
# Users
s/User:\s*\([a-zA-Z]*\)/User: <name>/

# URLs
s|https\?://[a-zA-Z0-9_./?&=-]*|<url>|g

# Filepaths
s|\(/\w\+\)\+|<path>|g

Such data pipelines enable smoothly ingesting and transforming unstructured streams.
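
In sed's default basic regular expressions, ? and + are ordinary characters. The -E flag switches to extended syntax, where they act as operators. Here is a minimal sketch of the URL rule under -E, using an invented log line:

```shell
line='GET https://example.com/a?b=1 from http://localhost/health'

# With -E, "s?" means an optional "s", so both http and https URLs are masked.
masked=$(printf '%s\n' "$line" | sed -E 's|https?://[a-zA-Z0-9_./?&=-]*|<url>|g')
printf '%s\n' "$masked"
```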

Alternative Tools Beyond Sed

While powerful, sed is not the only text manipulation game in town. How does it compare to alternatives like awk or perl?

awk is another popular text-processing tool on Unix. Like sed, it reads its input one record (by default, one line) at a time, making it well-suited for text transformations.

However, sed has a few advantages for replacements:

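  • Simpler syntax – s/find/replace/ reads easier than awk's sub() and gsub()
  • Regex support – sed uses basic regular expressions by default and extended ones with -E, and it can reuse captured groups such as \1 in the replacement, which POSIX awk's sub() and gsub() cannot
  • In-place editing – GNU and BSD sed have -i to edit files directly without managing temporary output yourself
  • Portability – sed is specified by POSIX and available by default on virtually every Unix platform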

That said, awk shines in cases involving more advanced data structure and type conversions vs plain-text find-replace.
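
For a plain global replacement the two tools are equivalent. The sketch below (with an invented input string) shows the same substitution in both:

```shell
input='apple tart, apple juice'

with_sed=$(printf '%s\n' "$input" | sed 's/apple/orange/g')
# gsub() edits the current record in place; print then writes it out.
with_awk=$(printf '%s\n' "$input" | awk '{ gsub(/apple/, "orange"); print }')

printf '%s\n' "$with_sed" "$with_awk"
```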

Perl is another common choice thanks to its fully featured string manipulation capabilities. The perl -pi -e invocation allows direct in-place editing like sed.

Downsides compared to sed:

  • Heavier tool – The Perl runtime brings more overhead, whereas sed was designed to be lightweight.
  • Readability – Cryptic, dense Perl regex syntax vs. intuitive sed
  • Availability – Perl is not installed by default on every system, for example on minimal container images

So while Perl is extremely powerful, sed strikes the right balance for most text transformation tasks.

Best Practices for Multiple Sed Substitutions

We've covered a ton of material on multi-sed techniques. Let's wrap up with some best practices to keep in mind:

  • Stick to line-by-line processing instead of multi-line tricks with the hold space, for simplicity
  • Use an external sed script file for anything more than 2-3 substitutions
  • Validate that the replacement logic works correctly before running it in place with -i
  • Watch escaping! Things can get complex with nested slashes, pipes etc
  • Comment substitution rules clearly for later understanding
  • Prefer readability with -e over condensed semicolon sequences when possible
  • Capture groups help handle repeated logic, but balance with readability

Adopting these tips will ensure you avoid pitfalls and create maintainable multi-sed text processing flows.
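
One way to validate a rule before editing in place is to preview it as a diff. This is a sketch with a made-up file name, preview_demo.txt, not the only way to do it:

```shell
printf 'apple\nbanana\n' > preview_demo.txt

# Preview the change as a diff; diff exits with status 1 when the inputs differ.
changes=$(sed -e 's/apple/orange/' preview_demo.txt | diff preview_demo.txt - || true)
printf '%s\n' "$changes"

# Only after reviewing the diff, apply the rule in place and keep a backup.
sed -i.bak -e 's/apple/orange/' preview_demo.txt
```

The backup suffix gives a second safety net if the diff review missed something.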

Additional Resources

With that, we have reached the end of our guided tour of multi-sed replacements!

We covered a lot of ground explaining the leading methods, real-world use cases, comparisons and best practices around sed substitutions.

To take your skills even further, be sure to experiment with sed hands-on using the techniques covered here. As the saying goes – practice makes perfect!

I hope this guide serves you well as a master reference for all things multi-sed. Happy text slicing and dicing!
