| * ABOUT BUGS |
| |
| Before reporting a bug, please check the list of known bugs |
| and the list of oft-reported non-bugs (below). |
| |
| Bugs and comments may be sent to bonzini@gnu.org; please |
| include in the Subject: header the first line of the output of |
| ``sed --version''. |
| |
| Please do not send a bug report like this: |
| |
| [while building frobme-1.3.4] |
| $ configure |
| sed: file sedscr line 1: Unknown option to 's' |
| |
| If sed doesn't configure your favorite package, take a few extra |
| minutes to identify the specific problem and make a stand-alone test |
| case. |
| |
| A stand-alone test case includes all the data necessary to perform the |
| test, and the specific invocation of sed that causes the problem. The |
| smaller a stand-alone test case is, the better. A test case should |
| not involve something as far removed from sed as ``try to configure |
| frobme-1.3.4''. Yes, that is in principle enough information to look |
| for the bug, but that is not a very practical prospect. |
| |
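As a rough illustration, a stand-alone test case can be as small as the
sketch below. The input line and the script are invented for this
example; substitute whatever actually triggers your problem.

```shell
# Everything needed to reproduce: the input, the exact script, the version.
version=$(sed --version | head -n 1)   # first line, for the Subject: header
result=$(printf 'hello world\n' | sed 's/o/0/2')   # replace the 2nd `o'

echo "$version"
echo "expected: hello w0rld"
echo "got:      $result"
```

A report in this shape can be pasted into a shell and run as is, without
unpacking or configuring any other package.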
| |
| |
| * NON-BUGS |
| |
| `N' command on the last line |
| |
| Most versions of sed exit without printing anything when the `N' |
| command is issued on the last line of a file. GNU sed instead |
| prints pattern space before exiting, unless the `-n' command-line |
| switch has been specified. More information on the reason |
| behind this choice can be found in the Info manual. |
| |
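The difference is easy to see with an odd number of input lines. This
sketch assumes GNU sed and that POSIXLY_CORRECT is not set in the
environment.

```shell
# Three input lines: `N' joins lines 1 and 2, then finds no next line
# when it runs on line 3.
out=$(printf '1\n2\n3\n' | sed N)
echo "$out"      # GNU sed still prints line 3

# With `-n' nothing is printed automatically, so line 3 is not shown.
out_n=$(printf '1\n2\n3\n' | sed -n 'N;p')
echo "$out_n"
```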
| |
| regex syntax clashes (problems with backslashes) |
| |
| sed uses the POSIX basic regular expression syntax. According to |
| the standard, the meaning of some escape sequences is undefined in |
| this syntax; notable in the case of GNU sed are `\|', `\+', `\?', |
| `\`', `\'', `\<', `\>', `\b', `\B', `\w', and `\W'. |
| |
| As in all GNU programs that use POSIX basic regular expressions, sed |
| interprets these escape sequences as meta-characters. So, `x\+' |
| matches one or more occurrences of `x'. `abc\|def' matches either |
| `abc' or `def'. |
| |
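A quick check of both escapes, assuming GNU sed (the input strings are
made up for the example):

```shell
plus=$(echo 'xxxy' | sed 's/x\+/Z/')              # one or more `x'
alt=$(echo 'abc def ghi' | sed 's/abc\|def/X/g')  # either `abc' or `def'

echo "$plus"
echo "$alt"
```
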
| This syntax may cause problems when running scripts written for other |
| seds. Some sed programs have been written with the assumption that |
| `\|' and `\+' match the literal characters `|' and `+'. Such scripts |
| must be modified by removing the spurious backslashes if they are to |
| be used with recent versions of sed (not only GNU sed). |
| |
| On the other hand, some scripts use `s|abc\|def||g' to remove occurrences |
| of _either_ `abc' or `def'. While this worked until sed 4.0.x, newer |
| versions interpret this as removing the string `abc|def'. This is |
| again undefined behavior according to POSIX, but the new interpretation |
| is arguably more robust. The older one, for example, required that |
| the regex matcher parse `\/' as `/' in the common case of escaping |
| a slash, which is again undefined behavior. The new behavior avoids |
| this, which is good because the regex matcher is only partially |
| under our control. |
| |
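The two cases above can be tried as follows. This is only a sketch and
assumes a GNU sed newer than 4.0.x; the input strings are invented.

```shell
# An unescaped `+' is an ordinary character in a basic regular expression.
literal=$(echo '1+1' | sed 's/1+1/2/')

# With `|' as the delimiter, `\|' stands for a literal `|', not alternation,
# so only the string `abc|def' is removed and the lone `abc' stays.
delim=$(echo 'abc|def abc' | sed 's|abc\|def||g')

echo "$literal"
echo "[$delim]"
```

To get alternation in the second case, pick a delimiter that does not
appear in the regex, for example `s/abc\|def//g'.
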
| In addition, GNU sed supports several escape characters (some of |
| which are multi-character) to insert non-printable characters |
| in scripts (`\a', `\c', `\d', `\o', `\r', `\t', `\v', `\x'). These |
| can cause similar problems with scripts written for other seds. |
| |
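For instance, assuming GNU sed, `\t' and `\x' in the replacement are
turned into the corresponding characters rather than being taken
literally:

```shell
tab=$(echo 'a b' | sed 's/ /\t/')    # the space becomes a tab character
hex=$(echo 'a' | sed 's/a/\x41/')    # \x41 is the letter `A'

printf '%s\n' "$tab" "$hex"
```

A script written for another sed, which may treat `\t' as a plain `t',
would produce different output here.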
| |
| -i clobbers read-only files |
| |
| In short, `sed d -i' will let one delete the contents of |
| a read-only file, and in general the `-i' option will let |
| one clobber protected files. This is not a bug, but rather a |
| consequence of how the Unix filesystem works. |
| |
| The permissions on a file say what can happen to the data |
| in that file, while the permissions on a directory say what can |
| happen to the list of files in that directory. `sed -i' |
| never opens for writing a file that is already on disk; |
| rather, it works on a temporary file that is finally renamed |
| to the original name. If you rename or delete files, you're actually |
| modifying the contents of the directory, so the operation depends on |
| the permissions of the directory, not of the file. For this same |
| reason, sed will not let one use `-i' on a writeable file in a |
| read-only directory, and will break hard or symbolic links when |
| `-i' is used on such a file. |
| |
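The effect can be reproduced in a scratch directory. This sketch assumes
GNU sed and a `mktemp' that supports `-d'; the file names are
hypothetical.

```shell
dir=$(mktemp -d)                 # scratch directory we are allowed to modify
echo 'old' > "$dir/file"
ln "$dir/file" "$dir/link"       # second hard link to the same data
chmod a-w "$dir/file"            # the file itself is now read-only

sed -i 's/old/new/' "$dir/file"  # works: only the directory is modified

edited=$(cat "$dir/file")        # contents of the renamed temporary file
linked=$(cat "$dir/link")        # the other link still holds the old data
echo "$edited $linked"

chmod -R u+w "$dir" && rm -rf "$dir"
```

After the edit, `file' and `link' no longer refer to the same data,
which is the broken hard link mentioned above.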
| |
| `0a' does not work (gives an error) |
| |
| There is no line 0. 0 is a special address that is only used to treat |
| addresses like `0,/RE/' as active when the script starts: if you |
| write `1,/abc/d' and the first line includes the word `abc', then |
| that match would be ignored because address ranges must span at least |
| two lines (barring the end of the file); but what you probably wanted is |
| to delete every line up to the first one including `abc', and this |
| is obtained with `0,/abc/d'. |
| |
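The two address forms can be compared side by side on a small input.
This assumes GNU sed, since the `0,/RE/' form is an extension.

```shell
input='abc
x
abc
y'

# `1,/abc/d': the end pattern is first checked on line 2, so the range
# runs from line 1 to the second `abc' on line 3.
from_one=$(printf '%s\n' "$input" | sed '1,/abc/d')

# `0,/abc/d': the end pattern may already match on line 1, so only that
# line is deleted.
from_zero=$(printf '%s\n' "$input" | sed '0,/abc/d')

echo "$from_one"
echo "---"
echo "$from_zero"
```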
| |
| `[a-z]' is case insensitive |
| `s/.*//' does not clear pattern space |
| |
| You are encountering problems with locales. POSIX mandates that `[a-z]' |
| uses the current locale's collation order -- in C parlance, that means |
| strcoll(3) instead of strcmp(3). Some locales have a case insensitive |
| strcoll, others don't. |
| |
| Another problem is that `[a-z]' tries to use collation symbols. This |
| only happens if you are on the GNU system, using GNU libc's regular |
| expression matcher instead of compiling the one supplied with GNU sed. |
| In a Danish locale, for example, the regular expression `^[a-z]$' |
| matches the string `aa', because `aa' is a single collating symbol that |
| comes after `a' and before `b'; `ll' behaves similarly in Spanish |
| locales, or `ij' in Dutch locales. |
| |
| Another common localization-related problem happens if your input stream |
| includes invalid multibyte sequences. POSIX mandates that such |
| sequences are _not_ matched by `.', so that `s/.*//' will not clear |
| pattern space as you would expect. In fact, there is no way to clear |
| sed's buffers in the middle of the script in most multibyte locales |
| (including UTF-8 locales). For this reason, GNU sed provides a `z' |
| command (for `zap') as an extension. |
| |
| To work around these problems, which may cause bugs |
| in shell scripts, you can set the LC_ALL environment variable to `C', |
| or set the locale on a more fine-grained basis with the other LC_* |
| environment variables. |
| |
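A short sketch of the workarounds, assuming a GNU sed recent enough to
provide the `z' command. The byte \377 is used only as an example of
input that is not valid UTF-8.

```shell
# In the C locale every byte is a character, so `.' matches all of them.
cleared=$(printf 'ab\377cd\n' | LC_ALL=C sed 's/.*//')

# The `z' command empties pattern space whatever the locale is.
zapped=$(printf 'ab\377cd\n' | sed 'z')

# In the C locale `[a-z]' matches lowercase ASCII letters only.
upper=$(echo 'ABC' | LC_ALL=C sed 's/[a-z]/x/g')

echo "[$cleared] [$zapped] [$upper]"
```

Setting LC_ALL on the single command, as here, keeps the rest of the
shell script running in the user's own locale.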