Description of problem:
With Mageia 1 or 2 on x86_64, [a-z] in a sed substitution is case insensitive. On Debian squeeze, [a-z] is case sensitive. Both have the same version of sed (4.2.1).

Version-Release number of selected component (if applicable):
4.2.1-4.mga1

How reproducible:
Use a range expression such as [a-z] in a sed substitution.

Steps to Reproduce:
With Mageia:
$ echo "A" | sed 's/[a-z]/b/'
> b

With Debian squeeze:
$ echo "A" | sed 's/[a-z]/b/'
> A

(I set the severity to major because the problem can affect scripts such as post-install and configuration scripts.)
CC: (none) => boklm
To add some tests:

* the [.-.] form is not good ([a-b], [a-z], ...)
$ echo "B" | sed 's/[b-c]/a/'
> a

* the [..] form is good ([abc] for example)
$ echo "B" | sed 's/[bc]/a/'
> B
All this apparently strange behaviour is due to the LC_COLLATE environment variable, which affects sed and other commands. For instance, try the following in your bash shell (I guess it would be the same on Debian, which probably simply has no LC_COLLATE defined by default):

$ export LC_COLLATE=fr_FR.UTF8
$ echo "A" | sed 's/[a-z]/b/'
b
$ export LC_COLLATE=C
$ echo "A" | sed 's/[a-z]/b/'
A

You can see that the sorting order matters in the [.-.] form, and of course not in the [..] form, where you explicitly specify the characters to test for.

To get the sorting order in a given locale:

$ export LC_COLLATE=fr_FR.UTF8
$ echo $(printf '%s\n' {A..z} | sort)
` ^ _ [ ] a A b B c C d D e E f F g G h H i I j J k K l L m M n N o O p P q Q r R s S t T u U v V w W x X y Y z Z
$ export LC_COLLATE=C
$ echo $(printf '%s\n' {A..z} | sort)
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z

This means that in some locales, when you say [a-z], you end up as though you were saying [aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYz], which explains the behaviour you observe. It is not directly a question of case insensitivity: in that ordering "Z" sorts after "z", so it falls outside the range. You can check this by typing, in your original locale:

$ echo "Y" | sed 's/[a-z]/b/'
b
$ echo "Z" | sed 's/[a-z]/b/'
Z
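To see at a glance which letters a range covers in a given locale, the check above can be wrapped in a small function. This is only a sketch: the name letters_matched_by_range is made up for illustration, and the result for any locale other than C depends on the locale data installed on the machine.

```shell
#!/bin/sh
# Hypothetical helper (not part of sed or of this report): print the ASCII
# letters that the range [a-z] matches when sed runs under the locale
# passed as $1.
letters_matched_by_range() {
    printf '%s\n' A B C D E F G H I J K L M N O P Q R S T U V W X Y Z \
                  a b c d e f g h i j k l m n o p q r s t u v w x y z |
        LC_ALL="$1" sed -n '/^[a-z]$/p' | tr -d '\n'
    echo
}

# In the C locale the range covers only the 26 lowercase letters.
letters_matched_by_range C    # prints abcdefghijklmnopqrstuvwxyz
```

Calling it with another locale name, for example letters_matched_by_range fr_FR.UTF-8, should show whether uppercase letters slip into the range on that system.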
CC: (none) => eonwir.ardamire+mageia
CC: (none) => mageia
Assignee: bugsquad => shlomif
Thanks for all the information. But I don't really understand whether this is normal or not.

On macOS:

$ locale
LANG="fr_FR.UTF-8"
LC_COLLATE="fr_FR.UTF-8"
LC_CTYPE="fr_FR.UTF-8"
LC_MESSAGES="fr_FR.UTF-8"
LC_MONETARY="fr_FR.UTF-8"
LC_NUMERIC="fr_FR.UTF-8"
LC_TIME="fr_FR.UTF-8"
LC_ALL=
$ echo "A" | sed 's/[a-z]/b/'
A
$ export LC_COLLATE=fr_FR.UTF8
$ echo $(printf '%s\n' {A..z} | sort)
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z

I'll try on a Debian machine tomorrow, but I guess the result will be identical, won't it?
From what I know it is dependent on one's locale, so not a bug.
But I can reproduce the case only on Mageia, not on Debian, Ubuntu or Mac OS X, for example. All four use the same locale (fr_FR.UTF8).

The initial test in the bug report was run with the same locale on each computer.

Perhaps the Mageia sed package is compiled with a specific flag?
Hello Yves,

(In reply to comment #5)
> But I reproduce the case only in mageia. Not in debian, ubuntu or macox by
> example. Each 4 with the same locale (fr_FR.UTF8)
>
> The initial test of the bug report is executed with the same locale on each
> computer.
>
> Perhaps is the mageia package of sed compiled with a specific flag ?

Mageia's sed is not built with any special flags:

%configure2_5x --bindir=/bin
%make LDFLAGS=-s
%make html
%make check

I've now built GNU sed from sources under ~/apps/temp-sed and it yields the same results:

shlomif[rpms]:$mageia/sed$ ( echo "A" | ~/apps/temp-sed/bin/sed 's/[a-z]/b/' )
b
shlomif[rpms]:$mageia/sed$ ( export LC_ALL=C ; echo "A" | ~/apps/temp-sed/bin/sed 's/[a-z]/b/' )
A

So I don't think the problem is in the sed package.

Regards,

-- Shlomi Fish
Hi,

I have finally reproduced it on Debian with a French locale (my earlier test must have been wrong).

So it really is a locale problem.

Thanks for the time spent on my problem. The bug can be closed, as it's not a real sed bug.

Regards,
Yves
Hi,

I was doing more or less the same thing as Shlomi, compiling sed from source without any special flags, and I also saw exactly the same behaviour as with the default sed shipped with Mageia.

Then I read the following page:
http://www.gnu.org/software/sed/manual/html_node/Reporting-Bugs.html

Especially the end:

---- begin included text ----
Here are a few commonly reported bugs that are not bugs.
(...)
[a-z] is case insensitive
You are encountering problems with locales. POSIX mandates that [a-z] uses the current locale's collation order -- in C parlance, that means using strcoll(3) instead of strcmp(3). Some locales have a case-insensitive collation order, others don't.

Another problem is that [a-z] tries to use collation symbols. This only happens if you are on the GNU system, using GNU libc's regular expression matcher instead of compiling the one supplied with GNU sed. In a Danish locale, for example, the regular expression ^[a-z]$ matches the string 'aa', because this is a single collating symbol that comes after 'a' and before 'b'; 'll' behaves similarly in Spanish locales, or 'ij' in Dutch locales.

To work around these problems, which may cause bugs in shell scripts, set the LC_COLLATE and LC_CTYPE environment variables to 'C'.
---- end included text ----

As a second step, I compiled two versions from the same original source, one with --with-included-regex and one with --without-included-regex. Here we can see the difference on the test case you mentioned:

$ echo "A" | ./sed_with_included_regex 's/[a-z]/b/'
b

$ echo "A" | ./sed_without_included_regex 's/[a-z]/b/'
A

I guess that's quite a subtle difference in behaviour, to be aware of when writing scripts that use the [.-.] syntax.

Cheers,
Arnaud
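For scripts that must behave the same whatever the user's locale, the workaround from the manual could be applied per command. Below is a minimal sketch, not something tested on Mageia in this report: the wrapper name sed_c is made up, and it uses LC_ALL=C rather than LC_COLLATE and LC_CTYPE because LC_ALL, when set, takes precedence over the individual LC_* variables.

```shell
#!/bin/sh
# Hypothetical wrapper (not from the sed manual): run sed in the C locale,
# whatever locale the caller has set.
sed_c() {
    LC_ALL=C sed "$@"
}

echo "A" | sed_c 's/[a-z]/b/'        # prints A
echo "a" | sed_c 's/[a-z]/b/'        # prints b

# Alternative that avoids ranges altogether: a POSIX character class.
echo "A" | sed 's/[[:lower:]]/b/'    # prints A, since "A" is not lowercase
```

Note that forcing the C locale also makes sed treat its input as plain bytes, so the character class form is probably the better choice when the input may contain accented or other multibyte characters.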
Hi,

Thanks for this information.

> In a second step, i did compile two versions from the same original source, one with the --with-included-regex and one with the --without-included-regex. And here we can see the difference on the test case you mentioned:
>
> $ echo "A" | ./sed_with_included_regex 's/[a-z]/b/'
> b
>
> $ echo "A" | ./sed_without_included_regex 's/[a-z]/b/'
> A

OK, that may explain why my Mac with LC_COLLATE=fr_FR does not give the same result as my Mageia system.

Thank you all!
Yves
Resolving as invalid then. Thanks for the report.
Status: NEW => RESOLVED
Resolution: (none) => INVALID
CC: boklm => (none)