| This is ../../doc/sed.info, produced by makeinfo version 4.12 from |
| ../../doc//config.texi. |
| |
| INFO-DIR-SECTION Text creation and manipulation |
| START-INFO-DIR-ENTRY |
| * sed: (sed). Stream EDitor. |
| |
| END-INFO-DIR-ENTRY |
| |
| This file documents version 4.2.1 of GNU `sed', a stream editor. |
| |
| Copyright (C) 1998, 1999, 2001, 2002, 2003, 2004 Free Software |
| Foundation, Inc. |
| |
| This document is released under the terms of the GNU Free |
| Documentation License as published by the Free Software Foundation; |
| either version 1.1, or (at your option) any later version. |
| |
| You should have received a copy of the GNU Free Documentation |
| License along with GNU `sed'; see the file `COPYING.DOC'. If not, |
| write to the Free Software Foundation, 59 Temple Place - Suite 330, |
| Boston, MA 02110-1301, USA. |
| |
| There are no Cover Texts and no Invariant Sections; this text, along |
| with its equivalent in the printed manual, constitutes the Title Page. |
| |
| |
| File: sed.info, Node: Top, Next: Introduction, Up: (dir) |
| |
| sed, a stream editor |
| ******************** |
| |
| This file documents version 4.2.1 of GNU `sed', a stream editor. |
| |
| Copyright (C) 1998, 1999, 2001, 2002, 2003, 2004 Free Software |
| Foundation, Inc. |
| |
| This document is released under the terms of the GNU Free |
| Documentation License as published by the Free Software Foundation; |
| either version 1.1, or (at your option) any later version. |
| |
| You should have received a copy of the GNU Free Documentation |
| License along with GNU `sed'; see the file `COPYING.DOC'. If not, |
| write to the Free Software Foundation, 59 Temple Place - Suite 330, |
| Boston, MA 02110-1301, USA. |
| |
| There are no Cover Texts and no Invariant Sections; this text, along |
| with its equivalent in the printed manual, constitutes the Title Page. |
| |
| * Menu: |
| |
| * Introduction:: Introduction |
| * Invoking sed:: Invocation |
| * sed Programs:: `sed' programs |
| * Examples:: Some sample scripts |
| * Limitations:: Limitations and (non-)limitations of GNU `sed' |
| * Other Resources:: Other resources for learning about `sed' |
| * Reporting Bugs:: Reporting bugs |
| |
| * Extended regexps:: `egrep'-style regular expressions |
| |
| * Concept Index:: A menu with all the topics in this manual. |
| * Command and Option Index:: A menu with all `sed' commands and |
| command-line options. |
| |
| --- The detailed node listing --- |
| |
| sed Programs: |
| * Execution Cycle:: How `sed' works |
| * Addresses:: Selecting lines with `sed' |
| * Regular Expressions:: Overview of regular expression syntax |
| * Common Commands:: Often used commands |
| * The "s" Command:: `sed''s Swiss Army Knife |
| * Other Commands:: Less frequently used commands |
| * Programming Commands:: Commands for `sed' gurus |
| * Extended Commands:: Commands specific to GNU `sed' |
| * Escapes:: Specifying special characters |
| |
| Examples: |
| * Centering lines:: |
| * Increment a number:: |
| * Rename files to lower case:: |
| * Print bash environment:: |
| * Reverse chars of lines:: |
| * tac:: Reverse lines of files |
| * cat -n:: Numbering lines |
| * cat -b:: Numbering non-blank lines |
| * wc -c:: Counting chars |
| * wc -w:: Counting words |
| * wc -l:: Counting lines |
| * head:: Printing the first lines |
| * tail:: Printing the last lines |
| * uniq:: Make duplicate lines unique |
| * uniq -d:: Print duplicated lines of input |
| * uniq -u:: Remove all duplicated lines |
| * cat -s:: Squeezing blank lines |
| |
| |
| File: sed.info, Node: Introduction, Next: Invoking sed, Prev: Top, Up: Top |
| |
| 1 Introduction |
| ************** |
| |
| `sed' is a stream editor. A stream editor is used to perform basic text |
| transformations on an input stream (a file or input from a pipeline). |
| While in some ways similar to an editor which permits scripted edits |
| (such as `ed'), `sed' works by making only one pass over the input(s), |
| and is consequently more efficient. But it is `sed''s ability to |
| filter text in a pipeline which particularly distinguishes it from |
| other types of editors. |
| |
| |
| File: sed.info, Node: Invoking sed, Next: sed Programs, Prev: Introduction, Up: Top |
| |
| 2 Invocation |
| ************ |
| |
| Normally `sed' is invoked like this: |
| |
| sed SCRIPT INPUTFILE... |
| |
| The full format for invoking `sed' is: |
| |
| sed OPTIONS... [SCRIPT] [INPUTFILE...] |
| |
| If you do not specify INPUTFILE, or if INPUTFILE is `-', `sed' |
| filters the contents of the standard input. The SCRIPT is actually the |
| first non-option parameter, which `sed' specially considers a script |
| and not an input file if (and only if) none of the other OPTIONS |
| specifies a script to be executed, that is if neither of the `-e' and |
| `-f' options is specified. |
| |
| `sed' may be invoked with the following command-line options: |
| |
| `--version' |
| Print out the version of `sed' that is being run and a copyright |
| notice, then exit. |
| |
| `--help' |
| Print a usage message briefly summarizing these command-line |
| options and the bug-reporting address, then exit. |
| |
| `-n' |
| `--quiet' |
| `--silent' |
| By default, `sed' prints out the pattern space at the end of each |
| cycle through the script (*note How `sed' works: Execution Cycle.). |
| These options disable this automatic printing, and `sed' only |
| produces output when explicitly told to via the `p' command. |
| |
| `-e SCRIPT' |
| `--expression=SCRIPT' |
| Add the commands in SCRIPT to the set of commands to be run while |
| processing the input. |
| |
| `-f SCRIPT-FILE' |
| `--file=SCRIPT-FILE' |
| Add the commands contained in the file SCRIPT-FILE to the set of |
| commands to be run while processing the input. |
| |
| `-i[SUFFIX]' |
| `--in-place[=SUFFIX]' |
| This option specifies that files are to be edited in-place. GNU |
| `sed' does this by creating a temporary file and sending output to |
| this file rather than to the standard output.(1) |
| |
| This option implies `-s'. |
| |
| When the end of the file is reached, the temporary file is renamed |
| to the output file's original name. The extension, if supplied, |
| is used to modify the name of the old file before renaming the |
| temporary file, thereby making a backup copy.(2) |
| |
| This rule is followed: if the extension doesn't contain a `*', |
| then it is appended to the end of the current filename as a |
| suffix; if the extension does contain one or more `*' characters, |
| then _each_ asterisk is replaced with the current filename. This |
| allows you to add a prefix to the backup file, instead of (or in |
| addition to) a suffix, or even to place backup copies of the |
| original files into another directory (provided the directory |
| already exists). |
| |
| If no extension is supplied, the original file is overwritten |
| without making a backup. |
| |
| `-l N' |
| `--line-length=N' |
| Specify the default line-wrap length for the `l' command. A |
| length of 0 (zero) means to never wrap long lines. If not |
| specified, it is taken to be 70. |
| |
| `--posix' |
| GNU `sed' includes several extensions to POSIX sed. In order to |
| simplify writing portable scripts, this option disables all the |
| extensions that this manual documents, including additional |
| commands. Most of the extensions accept `sed' programs that are |
| outside the syntax mandated by POSIX, but some of them (such as |
| the behavior of the `N' command described in *note Reporting |
| Bugs::) actually violate the standard. If you want to disable |
| only the latter kind of extension, you can set the |
| `POSIXLY_CORRECT' variable to a non-empty value. |
| |
| `-b' |
| `--binary' |
| This option is available on every platform, but is only effective |
| where the operating system makes a distinction between text files |
| and binary files. When such a distinction is made--as is the case |
| for MS-DOS, Windows, Cygwin--text files are composed of lines |
| separated by a carriage return _and_ a line feed character, and |
| `sed' does not see the ending CR. When this option is specified, |
| `sed' will open input files in binary mode, thus not requesting |
| this special processing and considering lines to end at a line |
| feed. |
| |
| `--follow-symlinks' |
| This option is available only on platforms that support symbolic |
| links and has an effect only if option `-i' is specified. In this |
| case, if the file that is specified on the command line is a |
| symbolic link, `sed' will follow the link and edit the ultimate |
| destination of the link. The default behavior is to break the |
| symbolic link, so that the link destination will not be modified. |
| |
| `-r' |
| `--regexp-extended' |
| Use extended regular expressions rather than basic regular |
| expressions. Extended regexps are those that `egrep' accepts; |
| they can be clearer because they usually have fewer backslashes, |
| but are a GNU extension and hence scripts that use them are not |
| portable. *Note Extended regular expressions: Extended regexps. |
| |
| `-s' |
| `--separate' |
| By default, `sed' will consider the files specified on the command |
| line as a single continuous long stream. This GNU `sed' extension |
| allows the user to consider them as separate files: range |
| addresses (such as `/abc/,/def/') are not allowed to span several |
| files, line numbers are relative to the start of each file, `$' |
| refers to the last line of each file, and files invoked from the |
| `R' commands are rewound at the start of each file. |
| |
| `-u' |
| `--unbuffered' |
| Buffer both input and output as minimally as practical. (This is |
| particularly useful if the input is coming from the likes of `tail |
| -f', and you wish to see the transformed output as soon as |
| possible.) |
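| |
| As a minimal sketch of how `-n', `-s' and `-i' interact, the following |
| shell session can be tried with GNU `sed'. The temporary directory, |
| the file names `a.txt' and `b.txt', and the variable names are |
| illustrative assumptions, not part of `sed' itself. |
| |
```shell
# Work in a throwaway directory so that no real files are touched.
tmpdir=$(mktemp -d)
printf 'one\ntwo\n' > "$tmpdir/a.txt"
printf 'three\nfour\n' > "$tmpdir/b.txt"

# -n suppresses automatic printing; only the explicit p produces output.
# Without -s both files form one stream, so $ is the last line overall.
joined=$(sed -n '$p' "$tmpdir/a.txt" "$tmpdir/b.txt")

# With -s (a GNU extension) $ matches the last line of each file.
separate=$(sed -n -s '$p' "$tmpdir/a.txt" "$tmpdir/b.txt")

# -i with a suffix edits in place and keeps the original as a.txt.bak.
sed -i.bak 's/one/ONE/' "$tmpdir/a.txt"
edited=$(cat "$tmpdir/a.txt")
backup=$(cat "$tmpdir/a.txt.bak")

printf '%s\n' "$joined" "$separate" "$edited" "$backup"
rm -rf "$tmpdir"
```
| |
| Here `joined' holds only `four', while `separate' holds `two' and |
| `four'; the backup keeps the unmodified first line `one'. |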
| |
| |
| If no `-e', `-f', `--expression', or `--file' options are given on |
| the command-line, then the first non-option argument on the command |
| line is taken to be the SCRIPT to be executed. |
| |
| If any command-line parameters remain after processing the above, |
| these parameters are interpreted as the names of input files to be |
| processed. A file name of `-' refers to the standard input stream. |
| The standard input will be processed if no file names are specified. |
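| |
| The two ways of supplying a script can be sketched as follows. The |
| variable names `chained' and `explicit' are only illustrative. |
| |
```shell
# Two -e options are combined, in order, into a single script.
chained=$(printf 'a\n' | sed -e 's/a/b/' -e 's/b/c/')

# With no -e or -f option, the first non-option argument is the script,
# and a file name of - names the standard input explicitly.
explicit=$(printf 'a\n' | sed 's/a/b/' -)

printf '%s\n' "$chained" "$explicit"
```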
| |
| ---------- Footnotes ---------- |
| |
| (1) This applies to commands such as `=', `a', `c', `i', `l', `p'. |
| You can still write to the standard output by using the `w' or `W' |
| commands together with the `/dev/stdout' special file. |
| |
| (2) Note that GNU `sed' creates the backup file whether or not any |
| output is actually changed. |
| |
| |
| File: sed.info, Node: sed Programs, Next: Examples, Prev: Invoking sed, Up: Top |
| |
| 3 `sed' Programs |
| **************** |
| |
| A `sed' program consists of one or more `sed' commands, passed in by |
| one or more of the `-e', `-f', `--expression', and `--file' options, or |
| the first non-option argument if none of these options is used. This |
| document will refer to "the" `sed' script; this is understood to mean |
| the in-order concatenation of all of the SCRIPTs and SCRIPT-FILEs |
| passed in. |
| |
| Each `sed' command consists of an optional address or address range, |
| followed by a one-character command name and any additional |
| command-specific code. |
| |
| * Menu: |
| |
| * Execution Cycle:: How `sed' works |
| * Addresses:: Selecting lines with `sed' |
| * Regular Expressions:: Overview of regular expression syntax |
| * Common Commands:: Often used commands |
| * The "s" Command:: `sed''s Swiss Army Knife |
| * Other Commands:: Less frequently used commands |
| * Programming Commands:: Commands for `sed' gurus |
| * Extended Commands:: Commands specific to GNU `sed' |
| * Escapes:: Specifying special characters |
| |
| |
| File: sed.info, Node: Execution Cycle, Next: Addresses, Up: sed Programs |
| |
| 3.1 How `sed' Works |
| =================== |
| |
| `sed' maintains two data buffers: the active _pattern_ space, and the |
| auxiliary _hold_ space. Both are initially empty. |
| |
| `sed' operates by performing the following cycle on each line of |
| input: first, `sed' reads one line from the input stream, removes any |
| trailing newline, and places it in the pattern space. Then commands |
| are executed; each command can have an address associated with it: |
| addresses are a kind of condition code, and a command is only executed |
| if the condition holds before the command is to be executed. |
| |
| When the end of the script is reached, unless the `-n' option is in |
| use, the contents of pattern space are printed out to the output |
| stream, adding back the trailing newline if it was removed.(1) Then the |
| next cycle starts for the next input line. |
| |
| Unless special commands (like `D') are used, the pattern space is |
| deleted between two cycles. The hold space, on the other hand, keeps |
| its data between cycles (see commands `h', `H', `x', `g', `G' to move |
| data between both buffers). |
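| |
| To illustrate how the hold space carries data from one cycle to the |
| next, here is a small sketch that reverses the order of its input |
| lines. It relies on the `G', `h' and `p' commands and on the `-n' |
| option; the variable name `reversed' is an assumption made for the |
| example. |
| |
```shell
# Reverse the order of lines using the hold space.
#   1!G  on every line but the first, append the hold space
#        (after a newline) to the pattern space
#   h    copy the pattern space into the hold space
#   $p   on the last line, print the accumulated result
reversed=$(printf '1\n2\n3\n' | sed -n '1!G;h;$p')
printf '%s\n' "$reversed"
```
| |
| The pattern space is rebuilt on every cycle, but the hold space still |
| contains all earlier lines, so the last cycle can print them in |
| reverse order. |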
| |
| ---------- Footnotes ---------- |
| |
| (1) Actually, if `sed' prints a line without the terminating |
| newline, it will nevertheless print the missing newline as soon as more |
| text is sent to the same output stream, which gives the "least expected |
| surprise" even though it does not make commands like `sed -n p' exactly |
| identical to `cat'. |
| |
| |
| File: sed.info, Node: Addresses, Next: Regular Expressions, Prev: Execution Cycle, Up: sed Programs |
| |
| 3.2 Selecting lines with `sed' |
| ============================== |
| |
| Addresses in a `sed' script can be in any of the following forms: |
| `NUMBER' |
| Specifying a line number will match only that line in the input. |
| (Note that `sed' counts lines continuously across all input files |
| unless `-i' or `-s' options are specified.) |
| |
| `FIRST~STEP' |
| This GNU extension matches every STEPth line starting with line |
| FIRST. In particular, lines will be selected when there exists a |
| non-negative N such that the current line-number equals FIRST + (N |
| * STEP). Thus, to select the odd-numbered lines, one would use |
| `1~2'; to pick every third line starting with the second, `2~3' |
| would be used; to pick every fifth line starting with the tenth, |
| use `10~5'; and `50~0' is just an obscure way of saying `50'. |
| |
| `$' |
| This address matches the last line of the last file of input, or |
| the last line of each file when the `-i' or `-s' options are |
| specified. |
| |
| `/REGEXP/' |
| This will select any line which matches the regular expression |
| REGEXP. If REGEXP itself includes any `/' characters, each must |
| be escaped by a backslash (`\'). |
| |
| The empty regular expression `//' repeats the last regular |
| expression match (the same holds if the empty regular expression is |
| passed to the `s' command). Note that modifiers to regular |
| expressions are evaluated when the regular expression is compiled, |
| thus it is invalid to specify them together with the empty regular |
| expression. |
| |
| `\%REGEXP%' |
| (The `%' may be replaced by any other single character.) |
| |
| This also matches the regular expression REGEXP, but allows one to |
| use a different delimiter than `/'. This is particularly useful |
| if the REGEXP itself contains a lot of slashes, since it avoids |
| the tedious escaping of every `/'. If REGEXP itself includes any |
| delimiter characters, each must be escaped by a backslash (`\'). |
| |
| `/REGEXP/I' |
| `\%REGEXP%I' |
| The `I' modifier to regular-expression matching is a GNU extension |
| which causes the REGEXP to be matched in a case-insensitive manner. |
| |
| `/REGEXP/M' |
| `\%REGEXP%M' |
| The `M' modifier to regular-expression matching is a GNU `sed' |
| extension which causes `^' and `$' to match respectively (in |
| addition to the normal behavior) the empty string after a newline, |
| and the empty string before a newline. There are special character |
| sequences (`\`' and `\'') which always match the beginning or the |
| end of the buffer. `M' stands for `multi-line'. |
| |
| |
| If no addresses are given, then all lines are matched; if one |
| address is given, then only lines matching that address are matched. |
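| |
| The single-address forms above can be tried with a sketch like the |
| following. The helper function `sample' and its five input lines are |
| hypothetical; `FIRST~STEP' and the `I' modifier require GNU `sed'. |
| |
```shell
# Hypothetical five-line input used by every example below.
sample() { printf '%s\n' alpha Beta /usr/bin gamma delta; }

second=$(sample | sed -n '2p')          # NUMBER: line 2 only
odd=$(sample | sed -n '1~2p')           # FIRST~STEP: lines 1, 3, 5
last=$(sample | sed -n '$p')            # $: the last line
slashes=$(sample | sed -n '\%/usr/%p')  # \%REGEXP%: no escaping of /
nocase=$(sample | sed -n '/beta/Ip')    # I: case-insensitive match

printf '%s\n' "$second" "$odd" "$last" "$slashes" "$nocase"
```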
| |
| An address range can be specified by specifying two addresses |
| separated by a comma (`,'). An address range matches lines starting |
| from where the first address matches, and continues until the second |
| address matches (inclusively). |
| |
| If the second address is a REGEXP, then checking for the ending |
| match will start with the line _following_ the line which matched the |
| first address: a range will always span at least two lines (except of |
| course if the input stream ends). |
| |
| If the second address is a NUMBER less than (or equal to) the line |
| matching the first address, then only the one line is matched. |
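| |
| A short sketch of these range rules, assuming a hypothetical |
| four-line input produced by the helper function `lines': |
| |
```shell
# Hypothetical four-line input: a1, b, a2, c.
lines() { printf '%s\n' a1 b a2 c; }

# Both addresses are the same regexp. The search for the end of the
# range starts on the line after a1, so the range runs through a2.
regexp_range=$(lines | sed -n '/a/,/a/p')

# The second address is a NUMBER not greater than the first,
# so only line 3 is matched.
backwards=$(lines | sed -n '3,1p')

# An ordinary numeric range is inclusive at both ends.
numeric=$(lines | sed -n '2,3p')

printf '%s\n' "$regexp_range" "$backwards" "$numeric"
```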
| |
| GNU `sed' also supports some special two-address forms; all these |
| are GNU extensions: |
| `0,/REGEXP/' |
| A line number of `0' can be used in an address specification like |
| `0,/REGEXP/' so that `sed' will try to match REGEXP in the first |
| input line too. In other words, `0,/REGEXP/' is similar to |
| `1,/REGEXP/', except that if ADDR2 matches the very first line of |
| input the `0,/REGEXP/' form will consider it to end the range, |
| whereas the `1,/REGEXP/' form will match the beginning of its |
| range and hence make the range span up to the _second_ occurrence |
| of the regular expression. |
| |
| Note that this is the only place where the `0' address makes |
| sense; there is no 0-th line and commands which are given the `0' |
| address in any other way will give an error. |
| |
| `ADDR1,+N' |
| Matches ADDR1 and the N lines following ADDR1. |
| |
| `ADDR1,~N' |
| Matches ADDR1 and the lines following ADDR1 until the next line |
| whose input line number is a multiple of N. |
| |
| Appending the `!' character to the end of an address specification |
| negates the sense of the match. That is, if the `!' character follows |
| an address range, then only lines which do _not_ match the address range |
| will be selected. This also works for singleton addresses, and, |
| perhaps perversely, for the null address. |
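| |
| The special two-address forms and the `!' negation can be compared |
| with a sketch such as this one. All forms except `!' are GNU |
| extensions; the helper function `lines' and the variable names are |
| assumptions made for the example. |
| |
```shell
# Hypothetical four-line input: a1, b, a2, c.
lines() { printf '%s\n' a1 b a2 c; }

# 0,/REGEXP/ may end the range on the very first line ...
from_zero=$(lines | sed -n '0,/a/p')
# ... whereas 1,/REGEXP/ only looks for the end from line 2 onwards.
from_one=$(lines | sed -n '1,/a/p')

# ADDR1,+N: the addressed line and the N lines after it.
plus_one=$(lines | sed -n '2,+1p')

# ADDR1,~N: up to the next line whose number is a multiple of N.
to_multiple=$(lines | sed -n '1,~2p')

# A trailing ! selects the lines that do not match the address.
negated=$(lines | sed -n '2!p')

printf '%s\n' "$from_zero" "$from_one" "$plus_one" "$to_multiple" "$negated"
```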
| |
| |
| File: sed.info, Node: Regular Expressions, Next: Common Commands, Prev: Addresses, Up: sed Programs |
| |
| 3.3 Overview of Regular Expression Syntax |
| ========================================= |
| |
| To know how to use `sed', people should understand regular expressions |
| ("regexp" for short). A regular expression is a pattern that is |
| matched against a subject string from left to right. Most characters |
| are "ordinary": they stand for themselves in a pattern, and match the |
| corresponding characters in the subject. As a trivial example, the |
| pattern |
| |
| The quick brown fox |
| |
| matches a portion of a subject string that is identical to itself. The |
| power of regular expressions comes from the ability to include |
| alternatives and repetitions in the pattern. These are encoded in the |
| pattern by the use of "special characters", which do not stand for |
| themselves but instead are interpreted in some special way. Here is a |
| brief description of regular expression syntax as used in `sed'. |
| |
| `CHAR' |
| A single ordinary character matches itself. |
| |
| `*' |
| Matches a sequence of zero or more instances of matches for the |
| preceding regular expression, which must be an ordinary character, |
| a special character preceded by `\', a `.', a grouped regexp (see |
| below), or a bracket expression. As a GNU extension, a postfixed |
| regular expression can also be followed by `*'; for example, `a**' |
| is equivalent to `a*'. POSIX 1003.1-2001 says that `*' stands for |
| itself when it appears at the start of a regular expression or |
| subexpression, but many non-GNU implementations do not support this |
| and portable scripts should instead use `\*' in these contexts. |
| |
| `\+' |
| As `*', but matches one or more. It is a GNU extension. |
| |
| `\?' |
| As `*', but only matches zero or one. It is a GNU extension. |
| |
| `\{I\}' |
| As `*', but matches exactly I sequences (I is a decimal integer; |
| for portability, keep it between 0 and 255 inclusive). |
| |
| `\{I,J\}' |
| Matches between I and J, inclusive, sequences. |
| |
| `\{I,\}' |
| Matches more than or equal to I sequences. |
| |
| `\(REGEXP\)' |
| Groups the inner REGEXP as a whole; this is used to: |
| |
| * Apply postfix operators, like `\(abcd\)*': this will search |
| for zero or more whole sequences of `abcd', while `abcd*' |
| would search for `abc' followed by zero or more occurrences |
| of `d'. Note that support for `\(abcd\)*' is required by |
| POSIX 1003.1-2001, but many non-GNU implementations do not |
| support it and hence it is not universally portable. |
| |
| * Use back references (see below). |
| |
| `.' |
| Matches any character, including newline. |
| |
| `^' |
| Matches the null string at beginning of the pattern space, i.e. |
| what appears after the circumflex must appear at the beginning of |
| the pattern space. |
| |
| In most scripts, pattern space is initialized to the content of |
| each line (*note How `sed' works: Execution Cycle.). So, it is a |
| useful simplification to think of `^#include' as matching only |
| lines where `#include' is the first thing on the line--if there are |
| spaces before, for example, the match fails. This simplification |
| is valid as long as the original content of pattern space is not |
| modified, for example with an `s' command. |
| |
| `^' acts as a special character only at the beginning of the |
| regular expression or subexpression (that is, after `\(' or `\|'). |
| Portable scripts should avoid `^' at the beginning of a |
| subexpression, though, as POSIX allows implementations that treat |
| `^' as an ordinary character in that context. |
| |
| `$' |
| It is the same as `^', but refers to end of pattern space. `$' |
| also acts as a special character only at the end of the regular |
| expression or subexpression (that is, before `\)' or `\|'), and |
| its use at the end of a subexpression is not portable. |
| |
| `[LIST]' |
| `[^LIST]' |
| Matches any single character in LIST: for example, `[aeiou]' |
| matches all vowels. A list may include sequences like |
| `CHAR1-CHAR2', which matches any character between (inclusive) |
| CHAR1 and CHAR2. |
| |
| A leading `^' reverses the meaning of LIST, so that it matches any |
| single character _not_ in LIST. To include `]' in the list, make |
| it the first character (after the `^' if needed); to include `-' |
| in the list, make it the first or last; to include `^', put it |
| after the first character. |
| |
| The characters `$', `*', `.', `[', and `\' are normally not |
| special within LIST. For example, `[\*]' matches either `\' or |
| `*', because the `\' is not special here. However, strings like |
| `[.ch.]', `[=a=]', and `[:space:]' are special within LIST and |
| represent collating symbols, equivalence classes, and character |
| classes, respectively, and `[' is therefore special within LIST |
| when it is followed by `.', `=', or `:'. Also, when not in |
| `POSIXLY_CORRECT' mode, special escapes like `\n' and `\t' are |
| recognized within LIST. *Note Escapes::. |
| |
| `REGEXP1\|REGEXP2' |
| Matches either REGEXP1 or REGEXP2. Use parentheses to use complex |
| alternative regular expressions. The matching process tries each |
| alternative in turn, from left to right, and the first one that |
| succeeds is used. It is a GNU extension. |
| |
| `REGEXP1REGEXP2' |
| Matches the concatenation of REGEXP1 and REGEXP2. Concatenation |
| binds more tightly than `\|', `^', and `$', but less tightly than |
| the other regular expression operators. |
| |
| `\DIGIT' |
| Matches the DIGIT-th `\(...\)' parenthesized subexpression in the |
| regular expression. This is called a "back reference". |
| Subexpressions are implicitly numbered by counting occurrences of |
| `\(' left-to-right. |
| |
| `\n' |
| Matches the newline character. |
| |
| `\CHAR' |
| Matches CHAR, where CHAR is one of `$', `*', `.', `[', `\', or `^'. |
| Note that the only C-like backslash sequences that you can |
| portably assume to be interpreted are `\n' and `\\'; in particular |
| `\t' is not portable, and matches a `t' under most implementations |
| of `sed', rather than a tab character. |
| |
| |
| Note that the regular expression matcher is greedy, i.e., matches |
| are attempted from left to right and, if two or more matches are |
| possible starting at the same character, it selects the longest. |
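| |
| Both halves of this rule can be observed with the `s' command, as in |
| the sketch below. The input strings and variable names are |
| illustrative only. |
| |
```shell
# Longest match: starting at the a, .*X extends to the last X,
# so the whole of aXbX is replaced.
longest=$(printf 'aXbXc\n' | sed 's/a.*X/-/')

# Leftmost match: \(ab\)* can match the empty string at the very
# start of the line, and that match wins over the later abab.
leftmost=$(printf 'xxabab\n' | sed 's/\(ab\)*/[&]/')

printf '%s\n' "$longest" "$leftmost"
```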
| |
| Examples: |
| `abcdef' |
| Matches `abcdef'. |
| |
| `a*b' |
| Matches zero or more `a's followed by a single `b'. For example, |
| `b' or `aaaaab'. |
| |
| `a\?b' |
| Matches `b' or `ab'. |
| |
| `a\+b\+' |
| Matches one or more `a's followed by one or more `b's: `ab' is the |
| shortest possible match, but other examples are `aaaab' or |
| `abbbbb' or `aaaaaabbbbbbb'. |
| |
| `.*' |
| `.\+' |
| These two both match all the characters in a string; however, the |
| first matches every string (including the empty string), while the |
| second matches only strings containing at least one character. |
| |
| `^main.*(.*)' |
| This matches a string starting with `main', followed by an opening |
| and closing parenthesis. The `n', `(' and `)' need not be |
| adjacent. |
| |
| `^#' |
| This matches a string beginning with `#'. |
| |
| `\\$' |
| This matches a string ending with a single backslash. The regexp |
| contains two backslashes for escaping. |
| |
| `\$' |
| Instead, this matches a string consisting of a single dollar sign, |
| because it is escaped. |
| |
| `[a-zA-Z0-9]' |
| In the C locale, this matches any ASCII letters or digits. |
| |
| `[^ tab]\+' |
| (Here `tab' stands for a single tab character.) This matches a |
| string of one or more characters, none of which is a space or a |
| tab. Usually this means a word. |
| |
| `^\(.*\)\n\1$' |
| This matches a string consisting of two equal substrings separated |
| by a newline. |
| |
| `.\{9\}A$' |
| This matches nine characters followed by an `A'. |
| |
| `^.\{15\}A' |
| This matches the start of a string that contains 16 characters, |
| the last of which is an `A'. |
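| |
| A few of the patterns above can be checked from the shell with a |
| sketch like the following. The sample input produced by the helper |
| function `code' is hypothetical, and the `N' command (which appends |
| the next input line to the pattern space) is used only to obtain a |
| pattern space that contains a newline. |
| |
```shell
# Hypothetical three-line input.
code() { printf '%s\n' 'main (void)' 'int x;' '#define Y'; }

# ^main.*(.*) : a line starting with main that has ( and ) later on.
has_main=$(code | sed -n '/^main.*(.*)/p')

# ^# : a line beginning with #.
has_hash=$(code | sed -n '/^#/p')

# .\{9\}A$ : nine characters followed by an A at the end.
nine_a=$(printf '123456789A\n' | sed -n '/.\{9\}A$/p')

# ^\(.*\)\n\1$ : two equal substrings separated by a newline.
twice=$(printf 'foo\nfoo\n' | sed -n 'N;/^\(.*\)\n\1$/p')

printf '%s\n' "$has_main" "$has_hash" "$nine_a" "$twice"
```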
| |
| |
| |
| File: sed.info, Node: Common Commands, Next: The "s" Command, Prev: Regular Expressions, Up: sed Programs |
| |
| 3.4 Often-Used Commands |
| ======================= |
| |
| If you use `sed' at all, you will quite likely want to know these |
| commands. |
| |
| `#' |
| [No addresses allowed.] |
| |
| The `#' character begins a comment; the comment continues until |
| the next newline. |
| |
| If you are concerned about portability, be aware that some |
| implementations of `sed' (which are not POSIX conformant) may only |
| support a single one-line comment, and then only when the very |
| first character of the script is a `#'. |
| |
| Warning: if the first two characters of the `sed' script are `#n', |
| then the `-n' (no-autoprint) option is forced. If you want to put |
| a comment in the first line of your script and that comment begins |
| with the letter `n' and you do not want this behavior, then be |
| sure to either use a capital `N', or place at least one space |
| before the `n'. |
| |
| `q [EXIT-CODE]' |
| This command only accepts a single address. |
| |
| Exit `sed' without processing any more commands or input. Note |
| that the current pattern space is printed if auto-print is not |
| disabled with the `-n' option. The ability to return an exit |
| code from the `sed' script is a GNU `sed' extension. |
| |
| `d' |
| Delete the pattern space; immediately start next cycle. |
| |
| `p' |
| Print out the pattern space (to the standard output). This |
| command is usually only used in conjunction with the `-n' |
| command-line option. |
| |
| `n' |
| If auto-print is not disabled, print the pattern space, then, |
| regardless, replace the pattern space with the next line of input. |
| If there is no more input then `sed' exits without processing any |
| more commands. |
| |
| `{ COMMANDS }' |
| A group of commands may be enclosed between `{' and `}' characters. |
| This is particularly useful when you want a group of commands to |
| be triggered by a single address (or address-range) match. |
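| |
| The commands above can be exercised with a sketch along these lines. |
| The helper function `nums' and the variable names are assumptions; the |
| exit code given to `q' is a GNU `sed' extension. |
| |
```shell
# Hypothetical four-line input: 1, 2, 3, 4.
nums() { printf '%s\n' 1 2 3 4; }

quit_early=$(nums | sed '2q')      # print lines 1 and 2, then exit
deleted=$(nums | sed '2d')         # drop line 2, print the rest
grouped=$(nums | sed -n '2{p;p}')  # two commands under one address
evens=$(nums | sed -n 'n;p')       # n skips to the next line, p prints it

# q with an exit code; || keeps a non-zero status from aborting
# a shell that runs with set -e.
status=0
printf 'x\n' | sed 'q5' > /dev/null || status=$?

printf '%s\n' "$quit_early" "$deleted" "$grouped" "$evens" "$status"
```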
| |
| |
| |
| File: sed.info, Node: The "s" Command, Next: Other Commands, Prev: Common Commands, Up: sed Programs |
| |
| 3.5 The `s' Command |
| =================== |
| |
| The syntax of the `s' (as in substitute) command is |
| `s/REGEXP/REPLACEMENT/FLAGS'. The `/' characters may be uniformly |
| replaced by any other single character within any given `s' command. |
| The `/' character (or whatever other character is used in its stead) |
| can appear in the REGEXP or REPLACEMENT only if it is preceded by a `\' |
| character. |
| |
| The `s' command is probably the most important in `sed' and has a |
| lot of different options. Its basic concept is simple: the `s' command |
| attempts to match the pattern space against the supplied REGEXP; if the |
| match is successful, then that portion of the pattern space which was |
| matched is replaced with REPLACEMENT. |
| |
| The REPLACEMENT can contain `\N' (N being a number from 1 to 9, |
| inclusive) references, which refer to the portion of the match which is |
| contained between the Nth `\(' and its matching `\)'. Also, the |
| REPLACEMENT can contain unescaped `&' characters which reference the |
| whole matched portion of the pattern space. Finally, as a GNU `sed' |
| extension, you can include a special sequence made of a backslash and |
| one of the letters `L', `l', `U', `u', or `E'. The meaning is as |
| follows: |
| |
| `\L' |
| Turn the replacement to lowercase until a `\U' or `\E' is found, |
| |
| `\l' |
| Turn the next character to lowercase, |
| |
| `\U' |
| Turn the replacement to uppercase until a `\L' or `\E' is found, |
| |
| `\u' |
| Turn the next character to uppercase, |
| |
| `\E' |
| Stop case conversion started by `\L' or `\U'. |
| |
| To include a literal `\', `&', or newline in the final replacement, |
| be sure to precede the desired `\', `&', or newline in the REPLACEMENT |
| with a `\'. |
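| |
| As a rough sketch of what the REPLACEMENT can contain, consider the |
| following substitutions. The input strings and variable names are |
| illustrative; the case-conversion sequences require GNU `sed'. |
| |
```shell
# \1 and \2 refer to the first and second \( ... \) groups.
swapped=$(printf 'hello world\n' | sed 's/\(hello\) \(world\)/\2 \1/')

# An unescaped & stands for the whole matched text ...
wrapped=$(printf 'hello world\n' | sed 's/world/[&]/')
# ... while \& is a literal ampersand.
literal=$(printf 'hello world\n' | sed 's/world/\&/')

# \U uppercases until \E; \u uppercases only the next character.
upper=$(printf 'hello world\n' | sed 's/\(hello\) \(world\)/\U\1\E \2/')
capital=$(printf 'hello world\n' | sed 's/hello/\u&/')

# Any single character may replace / as the delimiter.
path=$(printf '/usr/bin\n' | sed 's|/usr|/opt|')

printf '%s\n' "$swapped" "$wrapped" "$literal" "$upper" "$capital" "$path"
```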
| |
| The `s' command can be followed by zero or more of the following |
| FLAGS: |
| |
| `g' |
| Apply the replacement to _all_ matches to the REGEXP, not just the |
| first. |
| |
| `NUMBER' |
| Only replace the NUMBERth match of the REGEXP. |
| |
| Note: the POSIX standard does not specify what should happen when |
| you mix the `g' and NUMBER modifiers, and currently there is no |
| widely agreed upon meaning across `sed' implementations. For GNU |
| `sed', the interaction is defined to be: ignore matches before the |
| NUMBERth, and then match and replace all matches from the NUMBERth |
| on. |
| |
| `p' |
| If the substitution was made, then print the new pattern space. |
| |
| Note: when both the `p' and `e' options are specified, the |
| relative ordering of the two produces very different results. In |
| general, `ep' (evaluate then print) is what you want, but |
| operating the other way round can be useful for debugging. For |
| this reason, the current version of GNU `sed' interprets specially |
| the presence of `p' options both before and after `e', printing |
| the pattern space before and after evaluation, while in general |
| flags for the `s' command show their effect just once. This |
| behavior, although documented, might change in future versions. |
| |
| `w FILE-NAME' |
| If the substitution was made, then write out the result to the |
| named file. As a GNU `sed' extension, two special values of |
| FILE-NAME are supported: `/dev/stderr', which writes the result to |
| the standard error, and `/dev/stdout', which writes to the standard |
| output.(1) |
| |
| `e' |
| This flag allows one to pipe input from a shell command into |
| pattern space. If a substitution was made, the command that is |
| found in pattern space is executed and pattern space is replaced |
| with its output. A trailing newline is suppressed; results are |
| undefined if the command to be executed contains a NUL character. |
| This is a GNU `sed' extension. |
| |
| `I' |
| `i' |
| The `I' modifier to regular-expression matching is a GNU extension |
| which makes `sed' match REGEXP in a case-insensitive manner. |
| |
| `M' |
| `m' |
| The `M' modifier to regular-expression matching is a GNU `sed' |
| extension which causes `^' and `$' to match respectively (in |
| addition to the normal behavior) the empty string after a newline, |
| and the empty string before a newline. There are special character |
| sequences (`\`' and `\'') which always match the beginning or the |
| end of the buffer. `M' stands for `multi-line'. |
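| |
| The flags above can be tried out with a few one-liners. This is |
| only a sketch on invented input, assuming GNU `sed' (the `I' and |
| `M' flags and the combination of NUMBER with `g' are GNU-specific); |
| the variable names are hypothetical. |

```shell
# NUMBER with g: leave the first match alone, replace the 2nd onward.
from_second=$(printf 'a-b-c-d\n' | sed 's/-/+/2g')

# I: match REGEXP case-insensitively.
no_case=$(printf 'Hello HELLO hello\n' | sed 's/hello/bye/Ig')

# M: after N the pattern space holds "one\ntwo"; with M, ^ also
# matches just after the embedded newline, so ^t can match "two".
multi=$(printf 'one\ntwo\n' | sed 'N; s/^t/T/M')

# p together with -n: print only lines where a substitution happened.
changed_only=$(printf 'keep\nswap\n' | sed -n 's/swap/SWAP/p')

printf '%s\n' "$from_second" "$no_case" "$multi" "$changed_only"
```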
| |
| |
| ---------- Footnotes ---------- |
| |
| (1) This is equivalent to `p' unless the `-i' option is being used. |
| |
| |
| File: sed.info, Node: Other Commands, Next: Programming Commands, Prev: The "s" Command, Up: sed Programs |
| |
| 3.6 Less Frequently-Used Commands |
| ================================= |
| |
| Though perhaps less frequently used than those in the previous section, |
| some very small yet useful `sed' scripts can be built with these |
| commands. |
| |
| `y/SOURCE-CHARS/DEST-CHARS/' |
| (The `/' characters may be uniformly replaced by any other single |
| character within any given `y' command.) |
| |
| Transliterate any characters in the pattern space which match any |
| of the SOURCE-CHARS with the corresponding character in DEST-CHARS. |
| |
| Instances of the `/' (or whatever other character is used in its |
| stead), `\', or newlines can appear in the SOURCE-CHARS or |
| DEST-CHARS lists, provided that each instance is escaped by a `\'. |
| The SOURCE-CHARS and DEST-CHARS lists _must_ contain the same |
| number of characters (after de-escaping). |
| |
| `a\' |
| `TEXT' |
| As a GNU extension, this command accepts two addresses. |
| |
| Queue the lines of text which follow this command (each but the |
| last ending with a `\', which are removed from the output) to be |
| output at the end of the current cycle, or when the next input |
| line is read. |
| |
| Escape sequences in TEXT are processed, so you should use `\\' in |
| TEXT to print a single backslash. |
| |
| As a GNU extension, if between the `a' and the newline there is |
| other than a whitespace-`\' sequence, then the text of this line, |
| starting at the first non-whitespace character after the `a', is |
| taken as the first line of the TEXT block. (This enables a |
| simplification in scripting a one-line add.) This extension also |
| works with the `i' and `c' commands. |
| |
| `i\' |
| `TEXT' |
| As a GNU extension, this command accepts two addresses. |
| |
| Immediately output the lines of text which follow this command |
| (each but the last ending with a `\', which are removed from the |
| output). |
| |
| `c\' |
| `TEXT' |
| Delete the lines matching the address or address-range, and output |
| the lines of text which follow this command (each but the last |
| ending with a `\', which are removed from the output) in place of |
| the last line (or in place of each line, if no addresses were |
| specified). A new cycle is started after this command is done, |
| since the pattern space will have been deleted. |
| |
| `=' |
| As a GNU extension, this command accepts two addresses. |
| |
| Print out the current input line number (with a trailing newline). |
| |
| `l N' |
| Print the pattern space in an unambiguous form: non-printable |
| characters (and the `\' character) are printed in C-style escaped |
| form; long lines are split, with a trailing `\' character to |
| indicate the split; the end of each line is marked with a `$'. |
| |
| N specifies the desired line-wrap length; a length of 0 (zero) |
| means to never wrap long lines. If omitted, the default as |
| specified on the command line is used. The N parameter is a GNU |
| `sed' extension. |
| |
| `r FILENAME' |
| As a GNU extension, this command accepts two addresses. |
| |
| Queue the contents of FILENAME to be read and inserted into the |
| output stream at the end of the current cycle, or when the next |
| input line is read. Note that if FILENAME cannot be read, it is |
| treated as if it were an empty file, without any error indication. |
| |
| As a GNU `sed' extension, the special value `/dev/stdin' is |
| supported for the file name, which reads the contents of the |
| standard input. |
| |
| `w FILENAME' |
| Write the pattern space to FILENAME. As a GNU `sed' extension, |
| two special values of FILENAME are supported: `/dev/stderr', |
| which writes the result to the standard error, and `/dev/stdout', |
| which writes to the standard output.(1) |
| |
| The file will be created (or truncated) before the first input |
| line is read; all `w' commands (including instances of the `w' flag on |
| successful `s' commands) which refer to the same FILENAME are |
| output without closing and reopening the file. |
| |
| `D' |
| Delete text in the pattern space up to the first newline. If any |
| text is left, restart cycle with the resultant pattern space |
| (without reading a new line of input), otherwise start a normal |
| new cycle. |
| |
| `N' |
| Add a newline to the pattern space, then append the next line of |
| input to the pattern space. If there is no more input then `sed' |
| exits without processing any more commands. |
| |
| `P' |
| Print out the portion of the pattern space up to the first newline. |
| |
| `h' |
| Replace the contents of the hold space with the contents of the |
| pattern space. |
| |
| `H' |
| Append a newline to the contents of the hold space, and then |
| append the contents of the pattern space to that of the hold space. |
| |
| `g' |
| Replace the contents of the pattern space with the contents of the |
| hold space. |
| |
| `G' |
| Append a newline to the contents of the pattern space, and then |
| append the contents of the hold space to that of the pattern space. |
| |
| `x' |
| Exchange the contents of the hold and pattern spaces. |
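| |
| A few of the commands in this section can be exercised as follows. |
| This is a sketch on invented input rather than a reference; the |
| one-line form of `a' assumes GNU `sed', and the variable names are |
| illustrative only. |

```shell
# y: map each character of the first list to the character at the
# same position in the second list (e -> i, l -> p).
translit=$(printf 'hello\n' | sed 'y/el/ip/')

# G with an empty hold space appends a bare newline: double spacing.
# (Command substitution strips the trailing newlines.)
double_spaced=$(printf 'a\nb\n' | sed 'G')

# h, n, G: hold a line, read the next one, append the held line.
# With -n and p this swaps each pair of lines.
swapped=$(printf '1\n2\n3\n4\n' | sed -n 'h; n; G; p')

# l: show the pattern space unambiguously (tab as \t, end as $).
visible=$(printf 'a\tb\n' | sed -n 'l')

# One-line form of a (GNU extension): the text follows the command.
appended=$(printf 'x\ny\n' | sed '1a added')

printf '%s\n' "$translit" "$double_spaced" "$swapped" "$visible" "$appended"
```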
| |
| |
| ---------- Footnotes ---------- |
| |
| (1) This is equivalent to `p' unless the `-i' option is being used. |
| |
| |
| File: sed.info, Node: Programming Commands, Next: Extended Commands, Prev: Other Commands, Up: sed Programs |
| |
| 3.7 Commands for `sed' gurus |
| ============================ |
| |
| In most cases, use of these commands indicates that you are probably |
| better off programming in something like `awk' or Perl. But |
| occasionally one is committed to sticking with `sed', and these |
| commands can enable one to write quite convoluted scripts. |
| |
| `: LABEL' |
| [No addresses allowed.] |
| |
| Specify the location of LABEL for branch commands. In all other |
| respects, a no-op. |
| |
| `b LABEL' |
| Unconditionally branch to LABEL. The LABEL may be omitted, in |
| which case the next cycle is started. |
| |
| `t LABEL' |
| Branch to LABEL only if there has been a successful `s'ubstitution |
| since the last input line was read or conditional branch was taken. |
| The LABEL may be omitted, in which case the next cycle is started. |
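| |
| Two small loops may make the branch commands more concrete. They |
| are sketches on made-up input; the label name `a' and the target |
| width of 5 are arbitrary choices, not anything `sed' prescribes. |

```shell
# Loop with b: append each following line (N) and branch back to :a
# until the last line is reached ($!ba), then turn newlines into commas.
joined=$(printf '1\n2\n3\n' | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/,/g')

# Loop with t: keep prepending a zero for as long as the substitution
# succeeds, that is, until the line is 5 characters wide.
padded=$(printf '42\n' | sed -e ':a' -e 's/^.\{1,4\}$/0&/' -e 'ta')

printf '%s\n' "$joined" "$padded"
```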
| |
| |
| |
| File: sed.info, Node: Extended Commands, Next: Escapes, Prev: Programming Commands, Up: sed Programs |
| |
| 3.8 Commands Specific to GNU `sed' |
| ================================== |
| |
| These commands are specific to GNU `sed', so you must use them with |
| care and only when you are sure that hindering portability is not evil. |
| They allow you to check for GNU `sed' extensions or to do tasks that |
| are required quite often, yet are unsupported by standard `sed's. |
| |
| `e [COMMAND]' |
| This command allows one to pipe input from a shell command into |
| pattern space. Without parameters, the `e' command executes the |
| command that is found in pattern space and replaces the pattern |
| space with the output; a trailing newline is suppressed. |
| |
| If a parameter is specified, instead, the `e' command interprets |
| it as a command and sends its output to the output stream (like |
| `r' does). The command can run across multiple lines, all but the |
| last ending with a back-slash. |
| |
| In both cases, the results are undefined if the command to be |
| executed contains a NUL character. |
| |
| `L N' |
| This GNU `sed' extension fills and joins lines in pattern space to |
| produce output lines of (at most) N characters, like `fmt' does; |
| if N is omitted, the default as specified on the command line is |
| used. This command is considered a failed experiment and, unless |
| there is enough demand (which seems unlikely), will be removed in |
| future versions. |
| |
| `Q [EXIT-CODE]' |
| This command only accepts a single address. |
| |
| This command is the same as `q', but will not print the contents |
| of pattern space. Like `q', it provides the ability to return an |
| exit code to the caller. |
| |
| This command can be useful because the only alternative ways to |
| accomplish this apparently trivial function are to use the `-n' |
| option (which can unnecessarily complicate your script) or |
| resorting to the following snippet, which wastes time by reading |
| the whole file without any visible effect: |
| |
| :eat |
| $d Quit silently on the last line |
| N Read another line, silently |
| g Overwrite pattern space each time to save memory |
| b eat |
| |
| `R FILENAME' |
| Queue a line of FILENAME to be read and inserted into the output |
| stream at the end of the current cycle, or when the next input |
| line is read. Note that if FILENAME cannot be read, or if its end |
| is reached, no line is appended, without any error indication. |
| |
| As with the `r' command, the special value `/dev/stdin' is |
| supported for the file name, which reads a line from the standard |
| input. |
| |
| `T LABEL' |
| Branch to LABEL only if there have been no successful |
| `s'ubstitutions since the last input line was read or conditional |
| branch was taken. The LABEL may be omitted, in which case the next |
| cycle is started. |
| |
| `v VERSION' |
| This command does nothing, but makes `sed' fail if GNU `sed' |
| extensions are not supported, simply because other versions of |
| `sed' do not implement it. In addition, you can specify the |
| version of `sed' that your script requires, such as `4.0.5'. The |
| default is `4.0' because that is the first version that |
| implemented this command. |
| |
| This command enables all GNU extensions even if `POSIXLY_CORRECT' |
| is set in the environment. |
| |
| `W FILENAME' |
| Write to the given filename the portion of the pattern space up to |
| the first newline. Everything said under the `w' command about |
| file handling holds here too. |
| |
| `z' |
| This command empties the content of pattern space. It is usually |
| the same as `s/.*//', but is more efficient and works in the |
| presence of invalid multibyte sequences in the input stream. |
| POSIX mandates that such sequences are _not_ matched by `.', so |
| that there is no portable way to clear `sed''s buffers in the |
| middle of the script in most multibyte locales (including UTF-8 |
| locales). |
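| |
| The following sketch contrasts some of these commands with their |
| standard counterparts. It requires GNU `sed' by definition; the |
| input and variable names are invented for the example. |

```shell
# q prints the pattern space before quitting; Q quits silently.
with_q=$(printf '1\n2\n3\n' | sed '2q')
with_Q=$(printf '1\n2\n3\n' | sed '2Q')

# z empties the pattern space; the now-empty line is still printed.
zapped=$(printf 'a\nb\n' | sed '1z')

# T branches (here: to the end of the script) when no substitution
# succeeded, so only lines that contained "foo" get the "!" suffix.
tagged=$(printf 'foo\nbar\n' | sed -e 's/foo/FOO/' -e 'T' -e 's/$/!/')

printf '%s\n' "$with_q" "$with_Q" "$zapped" "$tagged"
```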
| |
| |
| File: sed.info, Node: Escapes, Prev: Extended Commands, Up: sed Programs |
| |
| 3.9 GNU Extensions for Escapes in Regular Expressions |
| ===================================================== |
| |
| Until this chapter, we have only encountered escapes of the form `\^', |
| which tell `sed' not to interpret the circumflex as a special |
| character, but rather to take it literally. For example, `\*' matches |
| a single asterisk rather than zero or more backslashes. |
| |
| This chapter introduces another kind of escape(1)--that is, escapes |
| that are applied to a character or sequence of characters that |
| ordinarily are taken literally, and that `sed' replaces with a special |
| character. This provides a way of encoding non-printable characters in |
| patterns in a visible manner. There is no restriction on the |
| appearance of non-printing characters in a `sed' script but when a |
| script is being prepared in the shell or by text editing, it is usually |
| easier to use one of the following escape sequences than the binary |
| character it represents. |
| |
| The list of these escapes is: |
| |
| `\a' |
| Produces or matches a BEL character, that is an "alert" (ASCII 7). |
| |
| `\f' |
| Produces or matches a form feed (ASCII 12). |
| |
| `\n' |
| Produces or matches a newline (ASCII 10). |
| |
| `\r' |
| Produces or matches a carriage return (ASCII 13). |
| |
| `\t' |
| Produces or matches a horizontal tab (ASCII 9). |
| |
| `\v' |
| Produces or matches a so-called "vertical tab" (ASCII 11). |
| |
| `\cX' |
| Produces or matches `CONTROL-X', where X is any character. The |
| precise effect of `\cX' is as follows: if X is a lower case |
| letter, it is converted to upper case. Then bit 6 of the |
| character (hex 40) is inverted. Thus `\cz' becomes hex 1A, but |
| `\c{' becomes hex 3B, while `\c;' becomes hex 7B. |
| |
| `\dXXX' |
| Produces or matches a character whose decimal ASCII value is XXX. |
| |
| `\oXXX' |
| Produces or matches a character whose octal ASCII value is XXX. |
| |
| `\xXX' |
| Produces or matches a character whose hexadecimal ASCII value is |
| XX. |
| |
| `\b' (backspace) was omitted because of the conflict with the |
| existing "word boundary" meaning. |
| |
| Other escapes match a particular character class and are valid only |
| in regular expressions: |
| |
| `\w' |
| Matches any "word" character. A "word" character is any letter or |
| digit or the underscore character. |
| |
| `\W' |
| Matches any "non-word" character. |
| |
| `\b' |
| Matches a word boundary; that is, it matches if the character to |
| the left is a "word" character and the character to the right is a |
| "non-word" character, or vice-versa. |
| |
| `\B' |
| Matches everywhere but on a word boundary; that is, it matches if |
| the character to the left and the character to the right are |
| either both "word" characters or both "non-word" characters. |
| |
| `\`' |
| Matches only at the start of pattern space. This is different |
| from `^' in multi-line mode. |
| |
| `\'' |
| Matches only at the end of pattern space. This is different from |
| `$' in multi-line mode. |
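| |
| A handful of these escapes in action, as a sketch that assumes |
| GNU `sed' (all of them are GNU extensions) and uses made-up input |
| and variable names: |

```shell
# \t in the regexp matches a tab character.
untabbed=$(printf 'a\tb\n' | sed 's/\t/ /')

# \x41 is the character with hexadecimal value 41, that is "A".
hexed=$(printf 'ABA\n' | sed 's/\x41/a/g')

# \b anchors at word boundaries, so "cat" inside "concat" is skipped.
bounded=$(printf 'cat concat\n' | sed 's/\bcat\b/dog/g')

# \W matches any non-word character (here the dash and the space).
wordy=$(printf 'a-b c\n' | sed 's/\W/_/g')

printf '%s\n' "$untabbed" "$hexed" "$bounded" "$wordy"
```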
| |
| |
| ---------- Footnotes ---------- |
| |
| (1) All the escapes introduced here are GNU extensions, with the |
| exception of `\n'. In basic regular expression mode, setting |
| `POSIXLY_CORRECT' disables them inside bracket expressions. |
| |
| |
| File: sed.info, Node: Examples, Next: Limitations, Prev: sed Programs, Up: Top |
| |
| 4 Some Sample Scripts |
| ********************* |
| |
| Here are some `sed' scripts to guide you in the art of mastering `sed'. |
| |
| * Menu: |
| |
| Some exotic examples: |
| * Centering lines:: |
| * Increment a number:: |
| * Rename files to lower case:: |
| * Print bash environment:: |
| * Reverse chars of lines:: |
| |
| Emulating standard utilities: |
| * tac:: Reverse lines of files |
| * cat -n:: Numbering lines |
| * cat -b:: Numbering non-blank lines |
| * wc -c:: Counting chars |
| * wc -w:: Counting words |
| * wc -l:: Counting lines |
| * head:: Printing the first lines |
| * tail:: Printing the last lines |
| * uniq:: Make duplicate lines unique |
| * uniq -d:: Print duplicated lines of input |
| * uniq -u:: Remove all duplicated lines |
| * cat -s:: Squeezing blank lines |
| |
| |
| File: sed.info, Node: Centering lines, Next: Increment a number, Up: Examples |
| |
| 4.1 Centering Lines |
| =================== |
| |
| This script centers all lines of a file within an 80-column width. To |
| change that width, the number in `\{...\}' must be replaced, and the |
| number of added spaces must be changed to match. |
| |
| Note how the buffer commands are used to separate parts in the |
| regular expressions to be matched--this is a common technique. |
| |
| #!/usr/bin/sed -f |
| |
| # Put 80 spaces in the buffer |
| 1 { |
| x |
| s/^$/          / |
| s/^.*$/&&&&&&&&/ |
| x |
| } |
| |
| # del leading and trailing spaces |
| y/tab/ / |
| s/^ *// |
| s/ *$// |
| |
| # add a newline and 80 spaces to end of line |
| G |
| |
| # keep first 81 chars (80 + a newline) |
| s/^\(.\{81\}\).*$/\1/ |
| |
| # \2 matches half of the spaces, which are moved to the beginning |
| s/^\(.*\)\n\(.*\)\2/\2\1/ |
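| |
| To watch the technique at work on a small scale, here is a sketch |
| of the same script narrowed to 10 columns. The width, the function |
| name `center10' and the sample input are assumptions made for the |
| example; the padding is built from one space repeated ten times, |
| and `tr' turns spaces into dots so the result is easy to read. It |
| relies on `.' matching a newline in pattern space, as GNU `sed' |
| does. |

```shell
center10() {
    # Line 1 only: fill the hold space with 10 spaces (1 space x 10).
    # Then trim the line, append the padding (G), keep the first
    # 11 characters (10 + newline), and move half of the remaining
    # spaces (matched twice via \2) in front of the text.
    sed -e '1{x;s/^$/ /;s/.*/&&&&&&&&&&/;x;}' \
        -e 's/^ *//' -e 's/ *$//' \
        -e 'G' \
        -e 's/^\(.\{11\}\).*$/\1/' \
        -e 's/^\(.*\)\n\(.*\)\2/\2\1/'
}

centered=$(printf 'abcd\n  hi  \n' | center10 | tr ' ' '.')
printf '%s\n' "$centered"
```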
| |
| |
| File: sed.info, Node: Increment a number, Next: Rename files to lower case, Prev: Centering lines, Up: Examples |
| |
| 4.2 Increment a Number |
| ====================== |
| |
| This script is one of a few that demonstrate how to do arithmetic in |
| `sed'. This is indeed possible,(1) but must be done manually. |
| |
| To increment a number you just add 1 to its last digit, replacing it |
| with the following digit. There is one exception: when the digit is a |
| nine, the preceding digits must also be incremented until one of them |
| is not a nine. |
| |
| This solution by Bruno Haible is very clever because it |
| uses a single buffer; if you don't have this limitation, the algorithm |
| used in *note Numbering lines: cat -n, is faster. It works by |
| replacing trailing nines with an underscore, then using multiple `s' |
| commands to increment the last digit, and then again substituting |
| underscores with zeros. |
| |
| #!/usr/bin/sed -f |
| |
| /[^0-9]/ d |
| |
| # replace all trailing 9s by _ (any other character except digits could |
| # be used) |
| :d |
| s/9\(_*\)$/_\1/ |
| td |
| |
| # incr last digit only. The first line adds a most-significant |
| # digit of 1 if we have to add a digit. |
| # |
| # Except for the first one (which keeps the new leading 1 from being |
| # incremented again), the `tn' commands are not necessary, but make |
| # the thing faster |
| |
| s/^\(_*\)$/1\1/; tn |
| s/8\(_*\)$/9\1/; tn |
| s/7\(_*\)$/8\1/; tn |
| s/6\(_*\)$/7\1/; tn |
| s/5\(_*\)$/6\1/; tn |
| s/4\(_*\)$/5\1/; tn |
| s/3\(_*\)$/4\1/; tn |
| s/2\(_*\)$/3\1/; tn |
| s/1\(_*\)$/2\1/; tn |
| s/0\(_*\)$/1\1/; tn |
| |
| :n |
| y/_/0/ |
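| |
| To trace the script on a few values, it can be wrapped in a shell |
| function, as in this sketch. The function name `increment' and the |
| sample numbers are made up for the example; the `sed' commands are |
| the ones shown above, passed with `-e'. |

```shell
increment() {
    sed -e '/[^0-9]/d' \
        -e ':d' -e 's/9\(_*\)$/_\1/' -e 'td' \
        -e 's/^\(_*\)$/1\1/' -e 'tn' \
        -e 's/8\(_*\)$/9\1/' -e 'tn' \
        -e 's/7\(_*\)$/8\1/' -e 'tn' \
        -e 's/6\(_*\)$/7\1/' -e 'tn' \
        -e 's/5\(_*\)$/6\1/' -e 'tn' \
        -e 's/4\(_*\)$/5\1/' -e 'tn' \
        -e 's/3\(_*\)$/4\1/' -e 'tn' \
        -e 's/2\(_*\)$/3\1/' -e 'tn' \
        -e 's/1\(_*\)$/2\1/' -e 'tn' \
        -e 's/0\(_*\)$/1\1/' \
        -e ':n' -e 'y/_/0/'
}

# 199 becomes 1__ after the :d loop, then 2__, then 200.
# The line "x1" contains a non-digit and is deleted.
incremented=$(printf '41\n199\n999\nx1\n' | increment)
printf '%s\n' "$incremented"
```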
| |
| ---------- Footnotes ---------- |
| |
| (1) `sed' guru Greg Ubben wrote an implementation of the `dc' RPN |
| calculator! It is distributed together with sed. |
| |
| |
| File: sed.info, Node: Rename files to lower case, Next: Print bash environment, Prev: Increment a number, Up: Examples |
| |
| 4.3 Rename Files to Lower Case |
| ============================== |
| |
| This is a pretty strange use of `sed'. We transform text, and |
| transform it to be shell commands, then just feed them to shell. Don't |
| worry, even worse hacks are done when using `sed'; I have seen a script |
| converting the output of `date' into a `bc' program! |
| |
| The main body of this is the `sed' script, which remaps the name |
| from lower to upper (or vice-versa) and even checks whether the remapped |
| name is the same as the original name. Note how the script is |
| parameterized using shell variables and proper quoting. |
| |
| #! /bin/sh |
| # rename files to lower/upper case... |
| # |
| # usage: |
| # move-to-lower * |
| # move-to-upper * |
| # or |
| # move-to-lower -R . |
| # move-to-upper -R . |
| # |
| |
| help() |
| { |
| cat << eof |
| Usage: $0 [-n] [-R] [-h] files... |
| |
| -n do nothing, only see what would be done |
| -R recursive (use find) |
| -h this message |
| files files to remap to lower case |
| |
| Examples: |
| $0 -n * (see if everything is ok, then...) |
| $0 * |
| |
| $0 -R . |
| |
| eof |
| } |
| |
| apply_cmd='sh' |
| finder='echo "$@" | tr " " "\n"' |
| files_only= |
| |
| while : |
| do |
| case "$1" in |
| -n) apply_cmd='cat' ;; |
| -R) finder='find "$@" -type f';; |
| -h) help ; exit 1 ;; |
| *) break ;; |
| esac |
| shift |
| done |
| |
| if [ -z "$1" ]; then |
| echo "Usage: $0 [-h] [-n] [-R] files..." |
| exit 1 |
| fi |
| |
| LOWER='abcdefghijklmnopqrstuvwxyz' |
| UPPER='ABCDEFGHIJKLMNOPQRSTUVWXYZ' |
| |
| case `basename $0` in |
| *upper*) TO=$UPPER; FROM=$LOWER ;; |
| *) FROM=$UPPER; TO=$LOWER ;; |
| esac |
| |
| eval $finder | sed -n ' |
| |
| # remove all trailing slashes |
| s/\/*$// |
| |
| # add ./ if there is no path, only a filename |
| /\//! s/^/.\// |
| |
| # save path+filename |
| h |
| |
| # remove path |
| s/.*\/// |
| |
| # do conversion only on filename |
| y/'$FROM'/'$TO'/ |
| |
| # now line contains original path+file, while |
| # hold space contains the new filename |
| x |
| |
| # add converted file name to line, which now contains |
| # path/file-name\nconverted-file-name |
| G |
| |
| # check if converted file name is equal to original file name, |
| # if it is, do not print anything |
| /^.*\/\(.*\)\n\1/b |
| |
| # now, transform path/fromfile\n, into |
| # mv path/fromfile path/tofile and print it |
| s/^\(.*\/\)\(.*\)\n\(.*\)$/mv "\1\2" "\1\3"/p |
| |
| ' | $apply_cmd |
| |
| |
| File: sed.info, Node: Print bash environment, Next: Reverse chars of lines, Prev: Rename files to lower case, Up: Examples |
| |
| 4.4 Print `bash' Environment |
| ============================ |
| |
| This script strips the definition of the shell functions from the |
| output of the `set' Bourne-shell command. |
| |
| #!/bin/sh |
| |
| set | sed -n ' |
| :x |
| |
| # if no occurrence of "=()" print and load next line |
| /=()/! { p; b; } |
| / () $/! { p; b; } |
| |
| # possible start of functions section |
| # save the line in case this is a var like FOO="() " |
| h |
| |
| # if the next line has a brace, we quit because |
| # nothing comes after functions |
| n |
| /^{/ q |
| |
| # print the old line |
| x; p |
| |
| # work on the new line now |
| x; bx |
| ' |
| |
| |
| File: sed.info, Node: Reverse chars of lines, Next: tac, Prev: Print bash environment, Up: Examples |
| |
| 4.5 Reverse Characters of Lines |
| =============================== |
| |
| This script can be used to reverse the position of characters in lines. |
| The technique moves two characters at a time, hence it is faster than |
| more intuitive implementations. |
| |
| Note the `tx' command before the definition of the label. This is |
| often needed to reset the flag that is tested by the `t' command. |
| |
| Imaginative readers will find uses for this script. An example is |
| reversing the output of `banner'.(1) |
| |
| #!/usr/bin/sed -f |
| |
| /../! b |
| |
| # Reverse a line. Begin by embedding the line between two newlines |
| s/^.*$/\ |
| &\ |
| / |
| |
| # Move the first character to the end. The regexp matches until |
| # there are zero or one characters between the markers |
| tx |
| :x |
| s/\(\n.\)\(.*\)\(.\n\)/\3\2\1/ |
| tx |
| |
| # Remove the newline markers |
| s/\n//g |
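| |
| The script can be tried on short strings as sketched below. The |
| function name `reverse_line' is invented, and the newline markers |
| are written as `\n' in the replacement, which assumes GNU `sed'; |
| otherwise the commands are those of the script above. |

```shell
reverse_line() {
    # Wrap the line in newline markers, then repeatedly swap the
    # character after the first marker with the one before the last
    # marker (moving both markers inward), and finally drop the markers.
    sed -e '/../! b' \
        -e 's/^.*$/\n&\n/' \
        -e 'tx' -e ':x' \
        -e 's/\(\n.\)\(.*\)\(.\n\)/\3\2\1/' -e 'tx' \
        -e 's/\n//g'
}

reversed=$(printf 'abcd\nxyz\n' | reverse_line)
printf '%s\n' "$reversed"
```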
| |
| ---------- Footnotes ---------- |
| |
| (1) This requires another script to pad the output of banner; for |
| example |
| |
| #! /bin/sh |
| |
| banner -w $1 $2 $3 $4 | |
| sed -e :a -e '/^.\{0,'$1'\}$/ { s/$/ /; ba; }' | |
| ~/sedscripts/reverseline.sed |
| |
| |
| File: sed.info, Node: tac, Next: cat -n, Prev: Reverse chars of lines, Up: Examples |
| |
| 4.6 Reverse Lines of Files |
| ========================== |
| |
| This one begins a series of totally useless (yet interesting) scripts |
| emulating various Unix commands. This, in particular, is a `tac' |
| workalike. |
| |
| Note that on implementations other than GNU `sed' this script might |
| easily overflow internal buffers. |
| |
| #!/usr/bin/sed -nf |
| |
| # reverse all lines of input, i.e. the first line becomes the last, ... |
| |
| # from the second line, the buffer (which contains all previous lines) |
| # is *appended* to current line, so, the order will be reversed |
| 1! G |
| |
| # on the last line we're done -- print everything |
| $ p |
| |
| # store everything on the buffer again |
| h |
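| |
| Run as a one-liner on three invented lines, the same three |
| commands behave as sketched here (the variable name is only |
| illustrative): |

```shell
# 1!G  prepends nothing on line 1, otherwise appends the lines so far
# $p   prints the accumulated text once the last line is reached
# h    saves the accumulated text for the next cycle
reversed_lines=$(printf 'first\nsecond\nthird\n' | sed -n -e '1!G' -e '$p' -e 'h')
printf '%s\n' "$reversed_lines"
```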
| |
| |
| File: sed.info, Node: cat -n, Next: cat -b, Prev: tac, Up: Examples |
| |
| 4.7 Numbering Lines |
| =================== |
| |
| This script replaces `cat -n'; in fact it formats its output exactly |
| like GNU `cat' does. |
| |
| Of course this is completely useless, for two reasons: first, |
| because somebody else did it in C, second, because the following |
| Bourne-shell script could be used for the same purpose and would be |
| much faster: |
| |
| #! /bin/sh |
| sed -e "=" "$@" | sed -e ' |
| s/^/      / |
| N |
| s/^ *\(......\)\n/\1  / |
| ' |
| |
| It uses `sed' to print the line number, then groups lines two by two |
| using `N'. Of course, this script does not teach as much as the one |
| presented below. |
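| |
| As a sketch of what the two-stage pipeline produces, here it is |
| applied to two made-up lines. The first substitution adds six |
| spaces and the last one leaves two spaces after the number; `tr' |
| turns spaces into dots purely to make the padding visible. |

```shell
# Stage 1 (sed =) prints each line number on its own line.
# Stage 2 pads the number, joins it with the text line (N), and keeps
# exactly the last six characters of the padded number.
numbered=$(printf 'foo\nbar\n' |
    sed -e '=' |
    sed -e 's/^/      /' -e 'N' -e 's/^ *\(......\)\n/\1  /' |
    tr ' ' '.')
printf '%s\n' "$numbered"
```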
| |
| The algorithm used for incrementing uses both buffers, so the line |
| is printed as soon as possible and then discarded. The number is split |
| so that changing digits go in a buffer and unchanged ones go in the |
| other; the changed digits are modified in a single step (using a `y' |
| command). The line number for the next line is then composed and |
| stored in the hold space, to be used in the next iteration. |
| |
| #!/usr/bin/sed -nf |
| |
| # Prime the pump on the first line |
| x |
| /^$/ s/^.*$/1/ |
| |
| # Add the correct line number before the pattern |
| G |
| h |
| |
| # Format it and print it |
| s/^/      / |
| s/^ *\(......\)\n/\1  /p |
| |
| # Get the line number from hold space; add a zero |
| # if we're going to add a digit on the next line |
| g |
| s/\n.*$// |
| /^9*$/ s/^/0/ |
| |
| # separate changing/unchanged digits with an x |
| s/.9*$/x&/ |
| |
| # keep changing digits in hold space |
| h |
| s/^.*x// |
| y/0123456789/1234567890/ |
| x |
| |
| # keep unchanged digits in pattern space |
| s/x.*$// |
| |
| # compose the new number, remove the newline implicitly added by G |
| G |
| s/\n// |
| h |
| |
| |
| File: sed.info, Node: cat -b, Next: wc -c, Prev: cat -n, Up: Examples |
| |
| 4.8 Numbering Non-blank Lines |
| ============================= |
| |
| Emulating `cat -b' is almost the same as `cat -n'--we only have to |
| select which lines are to be numbered and which are not. |
| |
| The part that is common to this script and the previous one is not |
| commented to show how important it is to comment `sed' scripts |
| properly... |
| |
| #!/usr/bin/sed -nf |
| |
| /^$/ { |
| p |
| b |
| } |
| |
| # Same as cat -n from now |
| x |
| /^$/ s/^.*$/1/ |
| G |
| h |
| s/^/      / |
| s/^ *\(......\)\n/\1  /p |
| x |
| s/\n.*$// |
| /^9*$/ s/^/0/ |
| s/.9*$/x&/ |
| h |
| s/^.*x// |
| y/0123456789/1234567890/ |
| x |
| s/x.*$// |
| G |
| s/\n// |
| h |
| |
| |
| File: sed.info, Node: wc -c, Next: wc -w, Prev: cat -b, Up: Examples |
| |
| 4.9 Counting Characters |
| ======================= |
| |
| This script shows another way to do arithmetic with `sed'. In this |
| case we have to add possibly large numbers, so implementing this by |
| successive increments would not be feasible (and possibly even more |
| complicated to contrive than this script). |
| |
| The approach is to map numbers to letters, kind of an abacus |
| implemented with `sed'. `a's are units, `b's are tens and so on: we |
| simply add the number of characters on the current line as units, and |
| then propagate the carry to tens, hundreds, and so on. |
| |
| As usual, running totals are kept in hold space. |
| |
| On the last line, we convert the abacus form back to decimal. For |
| the sake of variety, this is done with a loop rather than with some 80 |
| `s' commands(1): first we convert units, removing `a's from the number; |
| then we rotate letters so that tens become `a's, and so on until no |
| more letters remain. |
| |
| #!/usr/bin/sed -nf |
| |
| # Add n+1 a's to hold space (+1 is for the newline) |
| s/./a/g |
| H |
| x |
| s/\n/a/ |
| |
| # Do the carry. The t's and b's are not necessary, |
| # but they do speed up the thing |
| t a |
| : a; s/aaaaaaaaaa/b/g; t b; b done |
| : b; s/bbbbbbbbbb/c/g; t c; b done |
| : c; s/cccccccccc/d/g; t d; b done |
| : d; s/dddddddddd/e/g; t e; b done |
| : e; s/eeeeeeeeee/f/g; t f; b done |
| : f; s/ffffffffff/g/g; t g; b done |
| : g; s/gggggggggg/h/g; t h; b done |
| : h; s/hhhhhhhhhh//g |
| |
| : done |
| $! { |
| h |
| b |
| } |
| |
| # On the last line, convert back to decimal |
| |
| : loop |
| /a/! s/[b-h]*/&0/ |
| s/aaaaaaaaa/9/ |
| s/aaaaaaaa/8/ |
| s/aaaaaaa/7/ |
| s/aaaaaa/6/ |
| s/aaaaa/5/ |
| s/aaaa/4/ |
| s/aaa/3/ |
| s/aa/2/ |
| s/a/1/ |
| |
| : next |
| y/bcdefgh/abcdefg/ |
| /[a-h]/ b loop |
| p |
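| |
| The abacus representation is easier to follow on one number. The |
| sketch below shows only the first carry step for a count of 23; |
| the count and the variable names are chosen for illustration. |

```shell
# One "a" per character counted so far: here 23 of them.
units=$(printf '%023d\n' 0 | sed 's/./a/g')

# Carry: every group of ten a's (units) becomes one b (a ten).
abacus=$(printf '%s\n' "$units" | sed 's/aaaaaaaaaa/b/g')

# Two b's and three a's: 2 tens + 3 units = 23.
printf '%s\n' "$abacus"
```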
| |
| ---------- Footnotes ---------- |
| |
| (1) Some implementations have a limit of 199 commands per script |
| |
| |
| File: sed.info, Node: wc -w, Next: wc -l, Prev: wc -c, Up: Examples |
| |
| 4.10 Counting Words |
| =================== |
| |
| This script is almost the same as the previous one, once each of the |
| words on the line is converted to a single `a' (in the previous script |
| each letter was changed to an `a'). |
| |
| It is interesting that real `wc' programs have optimized loops for |
| `wc -c', so they are much slower at counting words than |
| characters. This script's bottleneck, instead, is arithmetic, and |
| hence the word-counting one is faster (it has to manage smaller |
| numbers). |
| |
| Again, the common parts are not commented to show the importance of |
| commenting `sed' scripts. |
| |
| #!/usr/bin/sed -nf |
| |
| # Convert words to a's |
| s/[ tab][ tab]*/ /g |
| s/^/ / |
| s/ [^ ][^ ]*/a /g |
| s/ //g |
| |
| # Append them to hold space |
| H |
| x |
| s/\n// |
| |
| # From here on it is the same as in wc -c. |
| /aaaaaaaaaa/! bx; s/aaaaaaaaaa/b/g |
| /bbbbbbbbbb/! bx; s/bbbbbbbbbb/c/g |
| /cccccccccc/! bx; s/cccccccccc/d/g |
| /dddddddddd/! bx; s/dddddddddd/e/g |
| /eeeeeeeeee/! bx; s/eeeeeeeeee/f/g |
| /ffffffffff/! bx; s/ffffffffff/g/g |
| /gggggggggg/! bx; s/gggggggggg/h/g |
| s/hhhhhhhhhh//g |
| :x |
| $! { h; b; } |
| :y |
| /a/! s/[b-h]*/&0/ |
| s/aaaaaaaaa/9/ |
| s/aaaaaaaa/8/ |
| s/aaaaaaa/7/ |
| s/aaaaaa/6/ |
| s/aaaaa/5/ |
| s/aaaa/4/ |
| s/aaa/3/ |
| s/aa/2/ |
| s/a/1/ |
| y/bcdefgh/abcdefg/ |
| /[a-h]/ by |
| p |
| |
| |
| File: sed.info, Node: wc -l, Next: head, Prev: wc -w, Up: Examples |
| |
| 4.11 Counting Lines |
| =================== |
| |
| No strange things are done now, because `sed' gives us `wc -l' |
| functionality for free!!! Look: |
| |
| #!/usr/bin/sed -nf |
| $= |
| |
| |
| File: sed.info, Node: head, Next: tail, Prev: wc -l, Up: Examples |
| |
| 4.12 Printing the First Lines |
| ============================= |
| |
| This script is probably the simplest useful `sed' script. It displays |
| the first 10 lines of input; the number of displayed lines is right |
| before the `q' command. |
| |
| #!/usr/bin/sed -f |
| 10q |
| |
| |
| File: sed.info, Node: tail, Next: uniq, Prev: head, Up: Examples |
| |
| 4.13 Printing the Last Lines |
| ============================ |
| |
| Printing the last N lines rather than the first is more complex but |
| indeed possible. N is encoded in the second line, before the bang |
| character. |
| |
| This script is similar to the `tac' script in that it keeps the |
| final output in the hold space and prints it at the end: |
| |
| #!/usr/bin/sed -nf |
| |
| 1! {; H; g; } |
| 1,10 !s/[^\n]*\n// |
| $p |
| h |
| |
| Mainly, the script keeps a window of 10 lines and slides it by |
| adding a line and deleting the oldest (the substitution command on the |
| second line works like a `D' command but does not restart the loop). |
| |
| The "sliding window" technique is a very powerful way to write |
| efficient and complex `sed' scripts, because commands like `P' would |
| require a lot of work if implemented manually. |
| |
| To introduce the technique, which is fully demonstrated in the rest |
| of this chapter and is based on the `N', `P' and `D' commands, here is |
| an implementation of `tail' using a simple "sliding window." |
| |
| This looks complicated but in fact the working is the same as the |
| last script: after we have kicked in the appropriate number of lines, |
| however, we stop using the hold space to keep inter-line state, and |
| instead use `N' and `D' to slide pattern space by one line: |
| |
| #!/usr/bin/sed -f |
| |
| 1h |
| 2,10 {; H; g; } |
| $q |
| 1,9d |
| N |
| D |
| |
| Note how the first, second and fourth lines are inactive after the |
| first ten lines of input. After that, all the script does is exit on |
| the last line of input, append the next input line to pattern space, |
| and remove the first line. |
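| |
| To watch the window slide, the following sketch runs the `N'/`D' |
| variant with a window of 3 lines instead of 10. It assumes GNU `sed', |
| `seq' and `tail'; the variable names and the 8-line sample are only |
| illustrative. |
| |
```shell
# Lines 1-3 fill the window through the hold space; after that,
# N appends the next input line and D drops the oldest one.
tail3='1h
2,3{
H
g
}
$q
1,2d
N
D'
with_sed=$(seq 1 8 | sed "$tail3")
with_tail=$(seq 1 8 | tail -n 3)
echo "$with_sed"
```
| |
| Changing the window size means changing the three numbers together: |
| `2,N', `1,N-1' and the implicit first line handled by `1h'. |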
| |
| |
| File: sed.info, Node: uniq, Next: uniq -d, Prev: tail, Up: Examples |
| |
| 4.14 Make Duplicate Lines Unique |
| ================================ |
| |
| This is an example of the art of using the `N', `P' and `D' commands, |
| probably the most difficult to master. |
| |
| #!/usr/bin/sed -f |
| h |
| |
| :b |
| # On the last line, print and exit |
| $b |
| N |
| /^\(.*\)\n\1$/ { |
| # The two lines are identical. Undo the effect of |
| # the N command. |
| g |
| bb |
| } |
| |
| # If the `N' command had added the last line, print and exit |
| $b |
| |
| # The lines are different; print the first and go |
| # back working on the second. |
| P |
| D |
| |
| As you can see, we maintain a 2-line window using `P' and `D'. This |
| technique is often used in advanced `sed' scripts. |
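| |
| A small sketch for trying the script against `uniq' itself, assuming |
| GNU `sed' and `uniq' are available; the sample input and the variable |
| names are invented for the example. |
| |
```shell
# The script from this section, comments removed.
uniq_script='h
:b
$b
N
/^\(.*\)\n\1$/{
g
bb
}
$b
P
D'
with_sed=$(printf 'a\na\nb\nc\nc\n' | sed "$uniq_script")
with_uniq=$(printf 'a\na\nb\nc\nc\n' | uniq)
echo "$with_sed"
```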
| |
| |
| File: sed.info, Node: uniq -d, Next: uniq -u, Prev: uniq, Up: Examples |
| |
| 4.15 Print Duplicated Lines of Input |
| ==================================== |
| |
| This script prints only duplicated lines, like `uniq -d'. |
| |
| #!/usr/bin/sed -nf |
| |
| $b |
| N |
| /^\(.*\)\n\1$/ { |
| # Print the first of the duplicated lines |
| s/.*\n// |
| p |
| |
| # Loop until we get a different line |
| :b |
| $b |
| N |
| /^\(.*\)\n\1$/ { |
| s/.*\n// |
| bb |
| } |
| } |
| |
| # The last line cannot be followed by duplicates |
| $b |
| |
| # Found a different one. Leave it alone in the pattern space |
| # and go back to the top, hunting its duplicates |
| D |
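| |
| The script can be compared with `uniq -d' on a small sample. This is |
| only a sketch: it assumes GNU `sed' and `uniq', and the input and |
| variable names are hypothetical. |
| |
```shell
# The script from this section, comments removed; note the -n option.
dups_script='$b
N
/^\(.*\)\n\1$/{
s/.*\n//
p
:b
$b
N
/^\(.*\)\n\1$/{
s/.*\n//
bb
}
}
$b
D'
with_sed=$(printf 'a\na\nb\nc\nc\nc\nd\n' | sed -n "$dups_script")
with_uniq=$(printf 'a\na\nb\nc\nc\nc\nd\n' | uniq -d)
echo "$with_sed"
```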
| |
| |
| File: sed.info, Node: uniq -u, Next: cat -s, Prev: uniq -d, Up: Examples |
| |
| 4.16 Remove All Duplicated Lines |
| ================================ |
| |
| This script prints only unique lines, like `uniq -u'. |
| |
| #!/usr/bin/sed -f |
| |
| # Search for a duplicate line; until one is found, print what you read. |
| $b |
| N |
| /^\(.*\)\n\1$/ ! { |
| P |
| D |
| } |
| |
| :c |
| # Got two equal lines in pattern space. At the |
| # end of the file we simply exit |
| $d |
| |
| # Else, we keep reading lines with `N' until we |
| # find a different one |
| s/.*\n// |
| N |
| /^\(.*\)\n\1$/ { |
| bc |
| } |
| |
| # Remove the last instance of the duplicate line |
| # and go back to the top |
| D |
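| |
| As with the previous scripts, a quick comparison against `uniq -u' |
| can be sketched as follows, assuming GNU `sed' and `uniq'; the sample |
| input and variable names are made up. |
| |
```shell
# The script from this section, comments removed.
uniq_u_script='$b
N
/^\(.*\)\n\1$/!{
P
D
}
:c
$d
s/.*\n//
N
/^\(.*\)\n\1$/{
bc
}
D'
with_sed=$(printf 'a\na\nb\nc\nc\nc\nd\n' | sed "$uniq_u_script")
with_uniq=$(printf 'a\na\nb\nc\nc\nc\nd\n' | uniq -u)
echo "$with_sed"
```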
| |
| |
| File: sed.info, Node: cat -s, Prev: uniq -u, Up: Examples |
| |
| 4.17 Squeezing Blank Lines |
| ========================== |
| |
| As a final example, here are three scripts, of increasing complexity |
| and speed, that implement the same function as `cat -s', that is |
| squeezing blank lines. |
| |
| The first leaves a blank line at the beginning and end if there are |
| some already. |
| |
| #!/usr/bin/sed -f |
| |
| # on empty lines, join with next |
| # Note there is a star in the regexp |
| :x |
| /^\n*$/ { |
| N |
| bx |
| } |
| |
| # now, squeeze the leading '\n's into one; this can also be done by: |
| # s/^\(\n\)*/\1/ |
| s/^\n\n*/\ |
| / |
| |
| This one is a bit more complex and removes all empty lines at the |
| beginning. It does leave a single blank line at end if one was there. |
| |
| #!/usr/bin/sed -f |
| |
| # delete all leading empty lines |
| 1,/^./{ |
| /./!d |
| } |
| |
| # on an empty line we remove it and all the following |
| # empty lines, but one |
| :x |
| /./!{ |
| N |
| s/^\n$// |
| tx |
| } |
| |
| This removes leading and trailing blank lines. It is also the |
| fastest. Note that loops are completely done with `n' and `b', without |
| relying on `sed' to restart the script automatically at the end of |
| a line. |
| |
| #!/usr/bin/sed -nf |
| |
| # delete all (leading) blanks |
| /./!d |
| |
| # get here: so there is a non empty |
| :x |
| # print it |
| p |
| # get next |
| n |
| # got chars? print it again, etc... |
| /./bx |
| |
| # no, don't have chars: got an empty line |
| :z |
| # get next, if last line we finish here so no trailing |
| # empty lines are written |
| n |
| # also empty? then ignore it, and get next... this will |
| # remove ALL empty lines |
| /./!bz |
| |
| # all empty lines were deleted/ignored, but we have a non empty. As |
| # what we want to do is to squeeze, insert a blank line artificially |
| i\ |
| |
| bx |
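| |
| To see the squeezing in action, this sketch feeds a small sample to |
| the second script of this section. It assumes GNU `sed'; the sample |
| input and the variable names are illustrative only. |
| |
```shell
# Second script of this section, comments removed.
squeeze='1,/^./{
/./!d
}
:x
/./!{
N
s/^\n$//
tx
}'
# Two leading blank lines, then "a", three blank lines, then "b".
squeezed=$(printf '\n\na\n\n\n\nb\n' | sed "$squeeze")
expected=$(printf 'a\n\nb\n')
printf '%s\n' "$squeezed"
```
| |
| The leading blank lines disappear and the run of three blank lines |
| between `a' and `b' collapses to one. |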
| |
| |
| File: sed.info, Node: Limitations, Next: Other Resources, Prev: Examples, Up: Top |
| |
| 5 GNU `sed''s Limitations and Non-limitations |
| ********************************************* |
| |
| For those who want to write portable `sed' scripts, be aware that some |
| implementations have been known to limit line lengths (for the pattern |
| and hold spaces) to be no more than 4000 bytes. The POSIX standard |
| specifies that conforming `sed' implementations shall support at least |
| 8192 byte line lengths. GNU `sed' has no built-in limit on line length; |
| as long as it can `malloc()' more (virtual) memory, you can feed or |
| construct lines as long as you like. |
| |
| However, recursion is used to handle subpatterns and indefinite |
| repetition. This means that the available stack space may limit the |
| size of the buffer that can be processed by certain patterns. |
| |
| |
| File: sed.info, Node: Other Resources, Next: Reporting Bugs, Prev: Limitations, Up: Top |
| |
| 6 Other Resources for Learning About `sed' |
| ****************************************** |
| |
| In addition to several books that have been written about `sed' (either |
| specifically or as chapters in books which discuss shell programming), |
| one can find out more about `sed' (including suggestions of a few |
| books) from the FAQ for the `sed-users' mailing list, available from: |
| `http://sed.sourceforge.net/sedfaq.html' |
| |
| Also of interest are |
| `http://www.student.northpark.edu/pemente/sed/index.htm' and |
| `http://sed.sf.net/grabbag', which include `sed' tutorials and other |
| `sed'-related goodies. |
| |
| The `sed-users' mailing list itself is maintained by Sven Guckes. To |
| subscribe, visit `http://groups.yahoo.com' and search for the |
| `sed-users' mailing list. |
| |
| |
| File: sed.info, Node: Reporting Bugs, Next: Extended regexps, Prev: Other Resources, Up: Top |
| |
| 7 Reporting Bugs |
| **************** |
| |
| Email bug reports to <bonzini@gnu.org>. Be sure to include the word |
| "sed" somewhere in the `Subject:' field. Also, please include the |
| output of `sed --version' in the body of your report if at all possible. |
| |
| Please do not send a bug report like this: |
| |
| while building frobme-1.3.4 |
| $ configure |
| error--> sed: file sedscr line 1: Unknown option to 's' |
| |
| If GNU `sed' doesn't configure your favorite package, take a few |
| extra minutes to identify the specific problem and make a stand-alone |
| test case. Unlike other programs such as C compilers, making such test |
| cases for `sed' is quite simple. |
| |
| A stand-alone test case includes all the data necessary to perform |
| the test, and the specific invocation of `sed' that causes the problem. |
| The smaller a stand-alone test case is, the better. A test case should |
| not involve something as far removed from `sed' as "try to configure |
| frobme-1.3.4". Yes, that is in principle enough information to look |
| for the bug, but that is not a very practical prospect. |
| |
| Here are a few commonly reported bugs that are not bugs. |
| |
| `N' command on the last line |
| Most versions of `sed' exit without printing anything when the `N' |
| command is issued on the last line of a file. GNU `sed' prints |
| pattern space before exiting unless of course the `-n' command |
| switch has been specified. This choice is by design. |
| |
| For example, the behavior of |
| sed N foo bar |
| would depend on whether foo has an even or an odd number of |
| lines(1). Or, when writing a script to read the next few lines |
| following a pattern match, traditional implementations of `sed' |
| would force you to write something like |
| /foo/{ $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N } |
| instead of just |
| /foo/{ N;N;N;N;N;N;N;N;N; } |
| |
| In any case, the simplest workaround is to use `$d;N' in scripts |
| that rely on the traditional behavior, or to set the |
| `POSIXLY_CORRECT' variable to a non-empty value. |
| |
| Regex syntax clashes (problems with backslashes) |
| `sed' uses the POSIX basic regular expression syntax. According to |
| the standard, the meaning of some escape sequences is undefined in |
| this syntax; notable in the case of `sed' are `\|', `\+', `\?', |
| `\`', `\'', `\<', `\>', `\b', `\B', `\w', and `\W'. |
| |
| As in all GNU programs that use POSIX basic regular expressions, |
| `sed' interprets these escape sequences as special characters. |
| So, `x\+' matches one or more occurrences of `x'. `abc\|def' |
| matches either `abc' or `def'. |
| |
| This syntax may cause problems when running scripts written for |
| other `sed's. Some `sed' programs have been written with the |
| assumption that `\|' and `\+' match the literal characters `|' and |
| `+'. Such scripts must be modified by removing the spurious |
| backslashes if they are to be used with modern implementations of |
| `sed', like GNU `sed'. |
| |
| On the other hand, some scripts use s|abc\|def||g to remove |
| occurrences of _either_ `abc' or `def'. While this worked until |
| `sed' 4.0.x, newer versions interpret this as removing the string |
| `abc|def'. This is again undefined behavior according to POSIX, |
| and this interpretation is arguably more robust: older `sed's, for |
| example, required that the regex matcher parsed `\/' as `/' in the |
| common case of escaping a slash, which is again undefined |
| behavior; the new behavior avoids this, and this is good because |
| the regex matcher is only partially under our control. |
| |
| In addition, this version of `sed' supports several escape |
| characters (some of which are multi-character) to insert |
| non-printable characters in scripts (`\a', `\c', `\d', `\o', `\r', |
| `\t', `\v', `\x'). These can cause similar problems with scripts |
| written for other `sed's. |
| |
| `-i' clobbers read-only files |
| In short, `sed -i' will let you delete the contents of a read-only |
| file, and in general the `-i' option (*note Invocation: Invoking |
| sed.) lets you clobber protected files. This is not a bug, but |
| rather a consequence of how the Unix filesystem works. |
| |
| The permissions on a file say what can happen to the data in that |
| file, while the permissions on a directory say what can happen to |
| the list of files in that directory. `sed -i' will not ever open |
| for writing a file that is already on disk. Rather, it will work |
| on a temporary file that is finally renamed to the original name: |
| if you rename or delete files, you're actually modifying the |
| contents of the directory, so the operation depends on the |
| permissions of the directory, not of the file. For this same |
| reason, `sed' does not let you use `-i' on a writeable file in a |
| read-only directory, and will break hard or symbolic links when |
| `-i' is used on such a file. |
| |
| `0a' does not work (gives an error) |
| There is no line 0. 0 is a special address that is only used to |
| treat addresses like `0,/RE/' as active when the script starts: if |
| you write `1,/abc/d' and the first line includes the word `abc', |
| then that match would be ignored because address ranges must span |
| at least two lines (barring the end of the file); but what you |
| probably wanted is to delete every line up to the first one |
| including `abc', and this is obtained with `0,/abc/d'. |
| |
| `[a-z]' is case insensitive |
| You are encountering problems with locales. POSIX mandates that |
| `[a-z]' uses the current locale's collation order - in C parlance, |
| that means using `strcoll(3)' instead of `strcmp(3)'. Some |
| locales have a case-insensitive collation order, others don't. |
| |
| Another problem is that `[a-z]' tries to use collation symbols. |
| This only happens if you are on the GNU system, using GNU libc's |
| regular expression matcher instead of compiling the one supplied |
| with GNU sed. In a Danish locale, for example, the regular |
| expression `^[a-z]$' matches the string `aa', because this is a |
| single collating symbol that comes after `a' and before `b'; `ll' |
| behaves similarly in Spanish locales, or `ij' in Dutch locales. |
| |
| To work around these problems, which may cause bugs in shell |
| scripts, set the `LC_COLLATE' and `LC_CTYPE' environment variables |
| to `C'. |
| |
| `s/.*//' does not clear pattern space |
| This happens if your input stream includes invalid multibyte |
| sequences. POSIX mandates that such sequences are _not_ matched |
| by `.', so that `s/.*//' will not clear pattern space as you would |
| expect. In fact, there is no way to clear sed's buffers in the |
| middle of the script in most multibyte locales (including UTF-8 |
| locales). For this reason, GNU `sed' provides a `z' command (for |
| `zap') as an extension. |
| |
| To work around these problems, which may cause bugs in shell |
| scripts, set the `LC_COLLATE' and `LC_CTYPE' environment variables |
| to `C'. |
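| |
| Several of the non-bugs above are easy to reproduce. The following is |
| a minimal sketch that assumes GNU `sed' run without `POSIXLY_CORRECT' |
| in the environment; the sample inputs and variable names are invented |
| for the example. |
| |
```shell
# N on the last line: GNU sed still prints the unpaired third line,
# while $d;N drops it, as traditional implementations do.
n_default=$(printf '1\n2\n3\n' | sed N)
n_traditional=$(printf '1\n2\n3\n' | sed '$d;N')

# \+ and \| act as special characters in GNU basic regular expressions.
plus=$(echo 'xxx+' | sed 's/x\+/Y/')
either=$(echo 'abc def' | sed 's/abc\|def/X/g')

# 1,/abc/d ignores a match on line 1 and keeps deleting up to the
# next match; 0,/abc/d lets the range end on line 1.
from_one=$(printf 'abc\nfoo\nabc\nbar\n' | sed '1,/abc/d')
from_zero=$(printf 'abc\nfoo\nabc\nbar\n' | sed '0,/abc/d')

printf '%s\n' "$n_default" "$n_traditional" "$plus" "$either"
```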
| |
| ---------- Footnotes ---------- |
| |
| (1) which is the actual "bug" that prompted the change in behavior |
| |
| |
| File: sed.info, Node: Extended regexps, Next: Concept Index, Prev: Reporting Bugs, Up: Top |
| |
| Appendix A Extended regular expressions |
| *************************************** |
| |
| The only difference between basic and extended regular expressions is in |
| the behavior of a few characters: `?', `+', parentheses, braces (`{}'), |
| and `|'. While basic regular expressions require these to be escaped if |
| you want them to behave as special characters, when using extended |
| regular expressions you must escape them if you want them _to match a |
| literal character_. |
| |
| Examples: |
| `abc?' |
| becomes `abc\?' when using extended regular expressions. It |
| matches the literal string `abc?'. |
| |
| `c\+' |
| becomes `c+' when using extended regular expressions. It matches |
| one or more `c's. |
| |
| `a\{3,\}' |
| becomes `a{3,}' when using extended regular expressions. It |
| matches three or more `a's. |
| |
| `\(abc\)\{2,3\}' |
| becomes `(abc){2,3}' when using extended regular expressions. It |
| matches either `abcabc' or `abcabcabc'. |
| |
| `\(abc*\)\1' |
| becomes `(abc*)\1' when using extended regular expressions. |
| Backreferences must still be escaped when using extended regular |
| expressions. |
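| |
| The pairs above can be tried side by side. This sketch assumes GNU |
| `sed', whose `-r' option selects extended regular expressions; the |
| sample strings and variable names are arbitrary. |
| |
```shell
# Each pair runs the same substitution in basic and extended (-r) syntax.
bre_plus=$(echo 'ccc' | sed 's/c\+/X/')
ere_plus=$(echo 'ccc' | sed -r 's/c+/X/')

bre_rep=$(echo 'abcabc' | sed 's/\(abc\)\{2,3\}/X/')
ere_rep=$(echo 'abcabc' | sed -r 's/(abc){2,3}/X/')

# A literal question mark: unescaped in basic, escaped in extended.
bre_lit=$(echo 'abc?' | sed 's/abc?/X/')
ere_lit=$(echo 'abc?' | sed -r 's/abc\?/X/')

echo "$bre_plus $ere_plus $bre_rep $ere_rep $bre_lit $ere_lit"
```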
| |
| |
| File: sed.info, Node: Concept Index, Next: Command and Option Index, Prev: Extended regexps, Up: Top |
| |
| Concept Index |
| ************* |
| |
| This is a general index of all issues discussed in this manual, with the |
| exception of the `sed' commands and command-line options. |
| |
| [index] |
| * Menu: |
| |
| * 0 address: Reporting Bugs. (line 103) |
| * Additional reading about sed: Other Resources. (line 6) |
| * ADDR1,+N: Addresses. (line 78) |
| * ADDR1,~N: Addresses. (line 78) |
| * Address, as a regular expression: Addresses. (line 27) |
| * Address, last line: Addresses. (line 22) |
| * Address, numeric: Addresses. (line 8) |
| * Addresses, in sed scripts: Addresses. (line 6) |
| * Append hold space to pattern space: Other Commands. (line 125) |
| * Append next input line to pattern space: Other Commands. (line 105) |
| * Append pattern space to hold space: Other Commands. (line 117) |
| * Appending text after a line: Other Commands. (line 27) |
| * Backreferences, in regular expressions: The "s" Command. (line 19) |
| * Branch to a label, if s/// failed: Extended Commands. (line 63) |
| * Branch to a label, if s/// succeeded: Programming Commands. |
| (line 22) |
| * Branch to a label, unconditionally: Programming Commands. |
| (line 18) |
| * Buffer spaces, pattern and hold: Execution Cycle. (line 6) |
| * Bugs, reporting: Reporting Bugs. (line 6) |
| * Case-insensitive matching: The "s" Command. (line 94) |
| * Caveat -- #n on first line: Common Commands. (line 20) |
| * Command groups: Common Commands. (line 50) |
| * Comments, in scripts: Common Commands. (line 12) |
| * Conditional branch <1>: Extended Commands. (line 63) |
| * Conditional branch: Programming Commands. |
| (line 22) |
| * Copy hold space into pattern space: Other Commands. (line 121) |
| * Copy pattern space into hold space: Other Commands. (line 113) |
| * Delete first line from pattern space: Other Commands. (line 99) |
| * Disabling autoprint, from command line: Invoking sed. (line 34) |
| * empty regular expression: Addresses. (line 31) |
| * Emptying pattern space <1>: Reporting Bugs. (line 130) |
| * Emptying pattern space: Extended Commands. (line 85) |
| * Evaluate Bourne-shell commands: Extended Commands. (line 12) |
| * Evaluate Bourne-shell commands, after substitution: The "s" Command. |
| (line 85) |
| * Exchange hold space with pattern space: Other Commands. (line 129) |
| * Excluding lines: Addresses. (line 101) |
| * Extended regular expressions, choosing: Invoking sed. (line 113) |
| * Extended regular expressions, syntax: Extended regexps. (line 6) |
| * Files to be processed as input: Invoking sed. (line 141) |
| * Flow of control in scripts: Programming Commands. |
| (line 11) |
| * Global substitution: The "s" Command. (line 51) |
| * GNU extensions, /dev/stderr file <1>: Other Commands. (line 88) |
| * GNU extensions, /dev/stderr file: The "s" Command. (line 78) |
| * GNU extensions, /dev/stdin file <1>: Extended Commands. (line 53) |
| * GNU extensions, /dev/stdin file: Other Commands. (line 78) |
| * GNU extensions, /dev/stdout file <1>: Other Commands. (line 88) |
| * GNU extensions, /dev/stdout file <2>: The "s" Command. (line 78) |
| * GNU extensions, /dev/stdout file: Invoking sed. (line 149) |
| * GNU extensions, 0 address <1>: Reporting Bugs. (line 103) |
| * GNU extensions, 0 address: Addresses. (line 78) |
| * GNU extensions, 0,ADDR2 addressing: Addresses. (line 78) |
| * GNU extensions, ADDR1,+N addressing: Addresses. (line 78) |
| * GNU extensions, ADDR1,~N addressing: Addresses. (line 78) |
| * GNU extensions, branch if s/// failed: Extended Commands. (line 63) |
| * GNU extensions, case modifiers in s commands: The "s" Command. |
| (line 23) |
| * GNU extensions, checking for their presence: Extended Commands. |
| (line 69) |
| * GNU extensions, disabling: Invoking sed. (line 81) |
| * GNU extensions, emptying pattern space <1>: Reporting Bugs. (line 130) |
| * GNU extensions, emptying pattern space: Extended Commands. (line 85) |
| * GNU extensions, evaluating Bourne-shell commands <1>: Extended Commands. |
| (line 12) |
| * GNU extensions, evaluating Bourne-shell commands: The "s" Command. |
| (line 85) |
| * GNU extensions, extended regular expressions: Invoking sed. (line 113) |
| * GNU extensions, g and NUMBER modifier interaction in s command: The "s" Command. |
| (line 57) |
| * GNU extensions, I modifier <1>: The "s" Command. (line 94) |
| * GNU extensions, I modifier: Addresses. (line 49) |
| * GNU extensions, in-place editing <1>: Reporting Bugs. (line 85) |
| * GNU extensions, in-place editing: Invoking sed. (line 51) |
| * GNU extensions, L command: Extended Commands. (line 26) |
| * GNU extensions, M modifier: The "s" Command. (line 99) |
| * GNU extensions, modifiers and the empty regular expression: Addresses. |
| (line 31) |
| * GNU extensions, N~M addresses: Addresses. (line 13) |
| * GNU extensions, quitting silently: Extended Commands. (line 36) |
| * GNU extensions, R command: Extended Commands. (line 53) |
| * GNU extensions, reading a file a line at a time: Extended Commands. |
| (line 53) |
| * GNU extensions, reformatting paragraphs: Extended Commands. (line 26) |
| * GNU extensions, returning an exit code <1>: Extended Commands. |
| (line 36) |
| * GNU extensions, returning an exit code: Common Commands. (line 30) |
| * GNU extensions, setting line length: Other Commands. (line 65) |
| * GNU extensions, special escapes <1>: Reporting Bugs. (line 78) |
| * GNU extensions, special escapes: Escapes. (line 6) |
| * GNU extensions, special two-address forms: Addresses. (line 78) |
| * GNU extensions, subprocesses <1>: Extended Commands. (line 12) |
| * GNU extensions, subprocesses: The "s" Command. (line 85) |
| * GNU extensions, to basic regular expressions <1>: Reporting Bugs. |
| (line 51) |
| * GNU extensions, to basic regular expressions: Regular Expressions. |
| (line 26) |
| * GNU extensions, two addresses supported by most commands: Other Commands. |
| (line 25) |
| * GNU extensions, unlimited line length: Limitations. (line 6) |
| * GNU extensions, writing first line to a file: Extended Commands. |
| (line 80) |
| * Goto, in scripts: Programming Commands. |
| (line 18) |
| * Greedy regular expression matching: Regular Expressions. (line 143) |
| * Grouping commands: Common Commands. (line 50) |
| * Hold space, appending from pattern space: Other Commands. (line 117) |
| * Hold space, appending to pattern space: Other Commands. (line 125) |
| * Hold space, copy into pattern space: Other Commands. (line 121) |
| * Hold space, copying pattern space into: Other Commands. (line 113) |
| * Hold space, definition: Execution Cycle. (line 6) |
| * Hold space, exchange with pattern space: Other Commands. (line 129) |
| * In-place editing: Reporting Bugs. (line 85) |
| * In-place editing, activating: Invoking sed. (line 51) |
| * In-place editing, Perl-style backup file names: Invoking sed. |
| (line 62) |
| * Inserting text before a line: Other Commands. (line 46) |
| * Labels, in scripts: Programming Commands. |
| (line 14) |
| * Last line, selecting: Addresses. (line 22) |
| * Line length, setting <1>: Other Commands. (line 65) |
| * Line length, setting: Invoking sed. (line 76) |
| * Line number, printing: Other Commands. (line 62) |
| * Line selection: Addresses. (line 6) |
| * Line, selecting by number: Addresses. (line 8) |
| * Line, selecting by regular expression match: Addresses. (line 27) |
| * Line, selecting last: Addresses. (line 22) |
| * List pattern space: Other Commands. (line 65) |
| * Mixing g and NUMBER modifiers in the s command: The "s" Command. |
| (line 57) |
| * Next input line, append to pattern space: Other Commands. (line 105) |
| * Next input line, replace pattern space with: Common Commands. |
| (line 44) |
| * Non-bugs, 0 address: Reporting Bugs. (line 103) |
| * Non-bugs, in-place editing: Reporting Bugs. (line 85) |
| * Non-bugs, localization-related: Reporting Bugs. (line 112) |
| * Non-bugs, N command on the last line: Reporting Bugs. (line 31) |
| * Non-bugs, regex syntax clashes: Reporting Bugs. (line 51) |
| * Parenthesized substrings: The "s" Command. (line 19) |
| * Pattern space, definition: Execution Cycle. (line 6) |
| * Perl-style regular expressions, multiline: Addresses. (line 54) |
| * Portability, comments: Common Commands. (line 15) |
| * Portability, line length limitations: Limitations. (line 6) |
| * Portability, N command on the last line: Reporting Bugs. (line 31) |
| * POSIXLY_CORRECT behavior, bracket expressions: Regular Expressions. |
| (line 105) |
| * POSIXLY_CORRECT behavior, enabling: Invoking sed. (line 84) |
| * POSIXLY_CORRECT behavior, escapes: Escapes. (line 11) |
| * POSIXLY_CORRECT behavior, N command: Reporting Bugs. (line 46) |
| * Print first line from pattern space: Other Commands. (line 110) |
| * Printing line number: Other Commands. (line 62) |
| * Printing text unambiguously: Other Commands. (line 65) |
| * Quitting <1>: Extended Commands. (line 36) |
| * Quitting: Common Commands. (line 30) |
| * Range of lines: Addresses. (line 65) |
| * Range with start address of zero: Addresses. (line 78) |
| * Read next input line: Common Commands. (line 44) |
| * Read text from a file <1>: Extended Commands. (line 53) |
| * Read text from a file: Other Commands. (line 78) |
| * Reformat pattern space: Extended Commands. (line 26) |
| * Reformatting paragraphs: Extended Commands. (line 26) |
| * Replace hold space with copy of pattern space: Other Commands. |
| (line 113) |
| * Replace pattern space with copy of hold space: Other Commands. |
| (line 121) |
| * Replacing all text matching regexp in a line: The "s" Command. |
| (line 51) |
| * Replacing only Nth match of regexp in a line: The "s" Command. |
| (line 55) |
| * Replacing selected lines with other text: Other Commands. (line 52) |
| * Requiring GNU sed: Extended Commands. (line 69) |
| * Script structure: sed Programs. (line 6) |
| * Script, from a file: Invoking sed. (line 46) |
| * Script, from command line: Invoking sed. (line 41) |
| * sed program structure: sed Programs. (line 6) |
| * Selecting lines to process: Addresses. (line 6) |
| * Selecting non-matching lines: Addresses. (line 101) |
| * Several lines, selecting: Addresses. (line 65) |
| * Slash character, in regular expressions: Addresses. (line 41) |
| * Spaces, pattern and hold: Execution Cycle. (line 6) |
| * Special addressing forms: Addresses. (line 78) |
| * Standard input, processing as input: Invoking sed. (line 143) |
| * Stream editor: Introduction. (line 6) |
| * Subprocesses <1>: Extended Commands. (line 12) |
| * Subprocesses: The "s" Command. (line 85) |
| * Substitution of text, options: The "s" Command. (line 47) |
| * Text, appending: Other Commands. (line 27) |
| * Text, deleting: Common Commands. (line 36) |
| * Text, insertion: Other Commands. (line 46) |
| * Text, printing: Common Commands. (line 39) |
| * Text, printing after substitution: The "s" Command. (line 65) |
| * Text, writing to a file after substitution: The "s" Command. |
| (line 78) |
| * Transliteration: Other Commands. (line 14) |
| * Unbuffered I/O, choosing: Invoking sed. (line 131) |
| * Usage summary, printing: Invoking sed. (line 28) |
| * Version, printing: Invoking sed. (line 24) |
| * Working on separate files: Invoking sed. (line 121) |
| * Write first line to a file: Extended Commands. (line 80) |
| * Write to a file: Other Commands. (line 88) |
| * Zero, as range start address: Addresses. (line 78) |
| |
| |
| File: sed.info, Node: Command and Option Index, Prev: Concept Index, Up: Top |
| |
| Command and Option Index |
| ************************ |
| |
| This is an alphabetical list of all `sed' commands and command-line |
| options. |
| |
| [index] |
| * Menu: |
| |
| * # (comments): Common Commands. (line 12) |
| * --binary: Invoking sed. (line 93) |
| * --expression: Invoking sed. (line 41) |
| * --file: Invoking sed. (line 46) |
| * --follow-symlinks: Invoking sed. (line 104) |
| * --help: Invoking sed. (line 28) |
| * --in-place: Invoking sed. (line 51) |
| * --line-length: Invoking sed. (line 76) |
| * --quiet: Invoking sed. (line 34) |
| * --regexp-extended: Invoking sed. (line 113) |
| * --silent: Invoking sed. (line 34) |
| * --unbuffered: Invoking sed. (line 131) |
| * --version: Invoking sed. (line 24) |
| * -b: Invoking sed. (line 93) |
| * -e: Invoking sed. (line 41) |
| * -f: Invoking sed. (line 46) |
| * -i: Invoking sed. (line 51) |
| * -l: Invoking sed. (line 76) |
| * -n: Invoking sed. (line 34) |
| * -n, forcing from within a script: Common Commands. (line 20) |
| * -r: Invoking sed. (line 113) |
| * -u: Invoking sed. (line 131) |
| * : (label) command: Programming Commands. |
| (line 14) |
| * = (print line number) command: Other Commands. (line 62) |
| * a (append text lines) command: Other Commands. (line 27) |
| * b (branch) command: Programming Commands. |
| (line 18) |
| * c (change to text lines) command: Other Commands. (line 52) |
| * D (delete first line) command: Other Commands. (line 99) |
| * d (delete) command: Common Commands. (line 36) |
| * e (evaluate) command: Extended Commands. (line 12) |
| * G (appending Get) command: Other Commands. (line 125) |
| * g (get) command: Other Commands. (line 121) |
| * H (append Hold) command: Other Commands. (line 117) |
| * h (hold) command: Other Commands. (line 113) |
| * i (insert text lines) command: Other Commands. (line 46) |
| * L (fLow paragraphs) command: Extended Commands. (line 26) |
| * l (list unambiguously) command: Other Commands. (line 65) |
| * N (append Next line) command: Other Commands. (line 105) |
| * n (next-line) command: Common Commands. (line 44) |
| * P (print first line) command: Other Commands. (line 110) |
| * p (print) command: Common Commands. (line 39) |
| * q (quit) command: Common Commands. (line 30) |
| * Q (silent Quit) command: Extended Commands. (line 36) |
| * r (read file) command: Other Commands. (line 78) |
| * R (read line) command: Extended Commands. (line 53) |
| * s command, option flags: The "s" Command. (line 47) |
| * T (test and branch if failed) command: Extended Commands. (line 63) |
| * t (test and branch if successful) command: Programming Commands. |
| (line 22) |
| * v (version) command: Extended Commands. (line 69) |
| * w (write file) command: Other Commands. (line 88) |
| * W (write first line) command: Extended Commands. (line 80) |
| * x (eXchange) command: Other Commands. (line 129) |
| * y (transliterate) command: Other Commands. (line 14) |
| * z (Zap) command: Extended Commands. (line 85) |
| * {} command grouping: Common Commands. (line 50) |
| |
| |
| |
| Tag Table: |
| Node: Top944 |
| Node: Introduction3867 |
| Node: Invoking sed4421 |
| Ref: Invoking sed-Footnote-110512 |
| Ref: Invoking sed-Footnote-210704 |
| Node: sed Programs10803 |
| Node: Execution Cycle11951 |
| Ref: Execution Cycle-Footnote-113129 |
| Node: Addresses13430 |
| Node: Regular Expressions18174 |
| Node: Common Commands26082 |
| Node: The "s" Command28085 |
| Ref: The "s" Command-Footnote-132422 |
| Node: Other Commands32494 |
| Ref: Other Commands-Footnote-137636 |
| Node: Programming Commands37708 |
| Node: Extended Commands38622 |
| Node: Escapes42630 |
| Ref: Escapes-Footnote-145641 |
| Node: Examples45832 |
| Node: Centering lines46928 |
| Node: Increment a number47820 |
| Ref: Increment a number-Footnote-149380 |
| Node: Rename files to lower case49500 |
| Node: Print bash environment52203 |
| Node: Reverse chars of lines52958 |
| Ref: Reverse chars of lines-Footnote-153959 |
| Node: tac54176 |
| Node: cat -n54943 |
| Node: cat -b56765 |
| Node: wc -c57512 |
| Ref: wc -c-Footnote-159420 |
| Node: wc -w59489 |
| Node: wc -l60953 |
| Node: head61197 |
| Node: tail61528 |
| Node: uniq63209 |
| Node: uniq -d63997 |
| Node: uniq -u64708 |
| Node: cat -s65419 |
| Node: Limitations67270 |
| Node: Other Resources68111 |
| Node: Reporting Bugs68956 |
| Ref: Reporting Bugs-Footnote-176092 |
| Node: Extended regexps76163 |
| Node: Concept Index77349 |
| Node: Command and Option Index92298 |
| |
| End Tag Table |