toofishes.net

Delete lines between patterns with sed

sed is a powerful but tricky tool to master, and it rose to the occasion tonight when I was cleaning up several old HTML pages. I had some JS code that needed killing, and it always sat between two known HTML comments, so some sort of multi-line removal seemed perfect.

Given some HTML that looked like this:

<div id="footer">
    <!-- Start of StatCounter Code -->
    <script type="text/javascript">
        var sc_project=000000; 
        var sc_invisible=1; 
        var sc_partition=9; 
        var sc_security="00000000"; 
        var sc_text=2; 
    </script>
    <script type="text/javascript" src="https://www.statcounter.com/counter/counter.js"></script>
    <!-- End of StatCounter Code -->
    <p><img src="./validxhtml10.png" alt="Valid XHTML 1.0" /></p>
</div>

The following sed script did exactly what I wanted, which was to leave the footer <div/> intact but kill all of the code and comments nested within.

# kill-statcounter.sed
# remove all lines between 'Start' and 'End' inclusive
/Start of StatCounter/ {
    :loop
        # pull in the next line to the pattern space
        N
        # if our line matches, delete entire pattern space
        # AND restart the cycle outside of the loop
        /End of StatCounter/d
        # if we get here we didn't match delete, so keep looking
        b loop
}
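To try the script, something like the following sketch should work. It writes a shortened copy of the sample HTML and of the script (comments dropped) into a temporary directory, runs the script with `sed -f`, and also runs sed's built-in range address `/start/,/end/d` for comparison. The range form is a different technique from the loop above, and the `workdir`, `loop_out`, and `range_out` names and the file names inside the temporary directory are made up for this demo.

```shell
# Hypothetical demo setup: a scratch directory with a cut-down sample page
workdir=$(mktemp -d)
cat > "$workdir/page.html" <<'EOF'
<div id="footer">
    <!-- Start of StatCounter Code -->
    <script type="text/javascript">
        var sc_invisible=1;
    </script>
    <!-- End of StatCounter Code -->
    <p><img src="./validxhtml10.png" alt="Valid XHTML 1.0" /></p>
</div>
EOF

# The loop-based script from the post, with its comments removed
cat > "$workdir/kill-statcounter.sed" <<'EOF'
/Start of StatCounter/ {
    :loop
        N
        /End of StatCounter/d
        b loop
}
EOF

# Run the script file against the page; output goes to stdout
loop_out=$(sed -f "$workdir/kill-statcounter.sed" "$workdir/page.html")

# Alternative: a range address deletes from the first match through the
# second match, inclusive, without an explicit loop
range_out=$(sed '/Start of StatCounter/,/End of StatCounter/d' "$workdir/page.html")

printf '%s\n' "$loop_out"
rm -r "$workdir"
```

For this sample, both commands leave just the opening `<div id="footer">`, the `<p>` line, and the closing `</div>`. One difference worth knowing: a range whose closing pattern never matches runs to the end of the file, so the one-liner would delete the rest of a page that is missing its End comment. To edit files in place, GNU sed accepts `sed -i -f kill-statcounter.sed *.html`, while BSD sed expects `-i ''`.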
