Bug 7108 - PostgreSQL initializes databases with the SQL_ASCII encoding
Summary: PostgreSQL initializes databases with the SQL_ASCII encoding
Status: RESOLVED OLD
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages (show other bugs)
Version: 2
Hardware: i586 Linux
Priority: Normal normal
Target Milestone: ---
Assignee: Olivier Thauvin
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-08-19 17:37 CEST by Frédéric "LpSolit" Buclin
Modified: 2013-11-23 16:13 CET (History)
0 users

See Also:
Source RPM: postgresql9.1-server-9.1.4-1.mga2
CVE:
Status comment:


Attachments

Description Frédéric "LpSolit" Buclin 2012-08-19 17:37:53 CEST
I just installed PostgreSQL 9.1.5 on Cauldron, but it has the same problem as Pg 9.1.4 on Mageia 2:

[postgres@localhost ~]$ psql -l

                             List of databases
   Name    |  Owner   | Encoding  | Collate | Ctype |   Access privileges   
-----------+----------+-----------+---------+-------+-----------------------
 postgres  | postgres | SQL_ASCII | C       | C     | 
 template0 | postgres | SQL_ASCII | C       | C     | =c/postgres          +
           |          |           |         |       | postgres=CTc/postgres
 template1 | postgres | SQL_ASCII | C       | C     | =c/postgres          +
           |          |           |         |       | postgres=CTc/postgres
(3 rows)


Per http://www.postgresql.org/docs/9.1/static/app-initdb.html, about the encoding:

"The default is derived from the locale, or SQL_ASCII if that does not work."

So for some reason, the locale (fr_CH.UTF-8) couldn't be determined, and Pg falled back to SQL_ASCII instead of using UTF8. But http://www.postgresql.org/docs/9.1/static/sql-createdatabase.html says:

"CREATE DATABASE will allow superusers to specify SQL_ASCII encoding regardless of the locale settings, but this choice is deprecated and may result in misbehavior of character-string functions"

This is exactly what happened to me:

bugs_cvs=> select quipid, length(quip), quip from quips;

 quipid | length |         quip         
--------+--------+----------------------
      5 |     16 | moi je suis mini
      6 |     21 | moi aussi, très mini
      8 |      6 | ààà

Here, only the first length is correct, the other two are totally wrong due to letters outside the US-ASCII range. To fix this problem, databases must be created with ENCODING = 'UTF8'. But as new databases are created by copying template1, which uses SQL_ASCII, all newly created databases have this encoding, which should be avoided at all price. UTF8 is compatible with all locales, AFAIK, so it should be used instead.

http://www.postgresql.org/docs/9.1/static/multibyte.html says that SQL_ASCII stores 1 byte per character, which is why it sees e.g. "à" as being 2 characters, as it needs two bytes to store it.
Manuel Hiebel 2012-08-19 20:48:09 CEST

Assignee: bugsquad => nanardon

Comment 1 Olivier Thauvin 2012-08-19 22:10:23 CEST
Since Postgresql 8.4 (iirc) it is impossible to freely choose the encoding of new database if the storage has been initialize w/ another encoding.

So this behavior is the expect one.

It is still possible to create UTF-8 database by using template0 as template for new database.

I'll try to check if with postgresql 9.1 another a better solution is possible, but I don't see a real problem here.
Comment 2 Frédéric "LpSolit" Buclin 2012-08-20 12:02:45 CEST
(In reply to comment #1)
> I'll try to check if with postgresql 9.1 another a better solution is possible,
> but I don't see a real problem here.

I see two problems:

1) As the Pg doc says, initdb falls back to SQL_ASCII only if it cannot find anything better. So in our case, this means that it's unable to correctly detect and parse our locale. This is a bug. The doc also says that UTF8 works with all locales, so I'm sure a post-script in the RPM can run it with
 "initdb -E UTF8" to force this encoding.

2) SQL_ASCII can only store one byte per character, which makes it behaves stupidly as shown above (ààà = 6 characters). By default, Pg uses template1, not template0, so it would be a big win to have it encoded with UTF8.
Comment 3 Manuel Hiebel 2013-10-22 12:10:04 CEST
This message is a reminder that Mageia 2 is nearing its end of life.
Approximately one month from now Mageia will stop maintaining and issuing updates for Mageia 2. At that time this bug will be closed as WONTFIX (EOL) if it remains open with a Mageia 'version' of '2'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Mageia version prior to Mageia 2's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Mageia 2 is end of life.  If you would still like to see this bug fixed and are able to reproduce it against a later version of Mageia, you are encouraged to click on "Version" and change it against that version of Mageia.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Mageia release includes newer upstream software that fixes bugs or makes them obsolete.

-- 
The Mageia Bugsquad
Comment 4 Manuel Hiebel 2013-11-23 16:13:40 CET
Mageia 2 changed to end-of-life (EOL) status on ''22 November''. Mageia 2 is no
longer maintained, which means that it will not receive any further security or
bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Mageia
please feel free to click on "Version" change it against that version of Mageia
and reopen this bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

--
The Mageia Bugsquad

Status: NEW => RESOLVED
Resolution: (none) => OLD


Note You need to log in before you can comment on or make changes to this bug.