
Re: [PHP-DB] PHP + PostgreSQL: invalid byte sequence for encoding
aldnin

2007-07-21, 6:58 pm

> My guess is that your PHP is not set up to handle UTF8, and is really
> sending something else. UTF8 is the default client encoding because that
> is the encoding of the database. It does not mean that PHP has set the
> right one. Before running your test, try executing this: "SET
> client_encoding TO LATIN1;" and see if that fixes it.


I already tried that and all encoding settings are correct, but I have found out something more.

1) Using pg_query to fetch UTF8 data from the database works properly. Of course, when I output it directly I get something like "lacarriÃ¨re", but when I use utf8_decode() on the UTF-8 bytes I get the correct "lacarrière".

2) I found another PHP application that inserts UTF8 data properly, phpPgAdmin, although it seems to use the ADODB layer to execute SQL statements.
Since phpPgAdmin runs on the same machine and handles UTF8 data correctly, my PHP installation must be configured properly to handle UTF8.

3) When I call utf8_encode() in my DB class on the query string before sending it to the database, it works: the insert succeeds. So that is a temporary solution for my first problem.

4) When I fetch data from the database, I would normally have to call utf8_decode() on EVERY string. So my solution for now is to keep all strings coming from the database as UTF-8 bytes, and to decode them only when I actually need the decoded form for further use.
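PHP's utf8_encode() converts ISO-8859-1 (Latin-1) strings to UTF-8, and utf8_decode() converts UTF-8 back to ISO-8859-1. The sketch below mirrors both in Python rather than PHP to show the byte-level picture behind points 1 and 3, assuming the PHP source strings and the output page are Latin-1. Latin-1 bytes such as 0xE8 for "è" are not valid UTF-8, which is the kind of input a UTF8 database rejects. Conversely, UTF-8 bytes displayed as Latin-1 turn into "Ã¨". The PHP function names are reused only for illustration, and this simplified utf8_decode raises on malformed UTF-8 input instead of attempting any recovery.

```python
# Python stand-ins for PHP's utf8_encode()/utf8_decode() (illustration only):
#   utf8_encode: ISO-8859-1 bytes -> UTF-8 bytes
#   utf8_decode: UTF-8 bytes -> ISO-8859-1 bytes, characters outside
#                Latin-1 become '?'; malformed UTF-8 raises here.

def utf8_encode(latin1: bytes) -> bytes:
    return latin1.decode("latin-1").encode("utf-8")


def utf8_decode(utf8: bytes) -> bytes:
    return utf8.decode("utf-8").encode("latin-1", errors="replace")


latin1_word = "lacarrière".encode("latin-1")  # b'lacarri\xe8re'
utf8_word = utf8_encode(latin1_word)          # b'lacarri\xc3\xa8re'

# Sending the raw Latin-1 bytes to a UTF8 database fails: 0xE8 announces a
# three-byte UTF-8 sequence that the following 'r' does not continue.
try:
    latin1_word.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)

# Printing UTF-8 bytes on a Latin-1 page produces mojibake.
print(utf8_word.decode("latin-1"))               # lacarriÃ¨re
# utf8_decode() gives back the Latin-1 bytes such a page expects.
print(utf8_decode(utf8_word).decode("latin-1"))  # lacarrière
```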

Problem:
--------
Just declaring the string 'lacarrière' 10 million times takes 5 seconds; calling utf8_encode() on it each time takes 13 seconds. So always calling utf8_encode() costs 2-3 times more resources, even when the string contains no special characters, and those resources are wasted whenever the string does not need to be UTF-8 encoded.
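A comparison of this kind can be reproduced roughly as follows. This is a Python stand-in, not the original PHP benchmark, so the absolute times and the ratio will differ from the figures above. The iteration count is an arbitrary assumption, reduced from 10 million so the sketch runs quickly, and function-call overhead is included in both measurements.

```python
import timeit

# Hypothetical re-creation of the benchmark; N is far smaller than in the post.
WORD = "lacarrière".encode("latin-1")
N = 200_000


def plain() -> bytes:
    # Baseline: just use the string, no conversion.
    return WORD


def always_convert() -> bytes:
    # Equivalent of calling utf8_encode() on every string.
    return WORD.decode("latin-1").encode("utf-8")


t_plain = timeit.timeit(plain, number=N)
t_convert = timeit.timeit(always_convert, number=N)
print(f"plain: {t_plain:.3f}s  convert: {t_convert:.3f}s  "
      f"ratio: {t_convert / t_plain:.1f}x")
```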

Workaround:
-----------
To avoid wasting resources, you have to call utf8_encode() only when you "guess" that the string might contain special characters. Have fun with that, but it's the only way I see to work properly with these special characters in combination with Postgres.
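The "guess" can be made exact, if not free. Bytes 0x00-0x7F are identical in ISO-8859-1 and UTF-8, so a string with no byte above 0x7F can be passed through unchanged. The sketch below shows this ASCII fast path in Python (3.7+ for `bytes.isascii()`). The function name is hypothetical, and it assumes the input really is Latin-1. Applying it to a string that is already UTF-8 would double-encode it, just as calling utf8_encode() twice would.

```python
def to_utf8_if_needed(latin1: bytes) -> bytes:
    """Convert ISO-8859-1 bytes to UTF-8, skipping pure-ASCII input.

    Hypothetical helper: ASCII bytes (0x00-0x7F) encode identically in
    ISO-8859-1 and UTF-8, so only strings with a byte above 0x7F need work.
    """
    if latin1.isascii():
        return latin1  # no special characters: return the input unchanged
    return latin1.decode("latin-1").encode("utf-8")


print(to_utf8_if_needed(b"SELECT 1"))       # b'SELECT 1'
print(to_utf8_if_needed(b"lacarri\xe8re"))  # b'lacarri\xc3\xa8re'
```

The check still scans the string once, but for plain-ASCII strings it skips both the decode and the allocation of a converted copy.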

Copyright 2008 codecomments.com