Re: [PHP-DB] PHP + PostgreSQL: invalid byte sequence for encoding "UTF8"
John DeSoi, 2007-07-21, 6:58 pm
On Jul 21, 2007, at 7:53 AM, aldnin wrote:
> When I try to send this query (select 'lacarrière' as test;) to a
> UTF8 initialized pgsql-database (8.2.4) from PHP 5.2.3 I get this
> error:
>
> ERROR: invalid byte sequence for encoding "UTF8": 0xe87265
>
> I use pg_query for the query delivery.
>
> Client Encoding is set to:
> client_encoding
> -----------------
> UTF8
> (1 row)
>
> pg_client_encoding() also returns "UTF8".
My guess is that your PHP is not set up to handle UTF8, and is really
sending something else. UTF8 is the default client encoding because
that is the encoding of the database. It does not mean that PHP is
actually sending data in that encoding. Before running your test, try
executing this: "SET client_encoding TO LATIN1;" and see if that fixes it.
John DeSoi, Ph.D.
http://pgedit.com/
Power Tools for PostgreSQL
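
The bytes quoted in the error message can be inspected directly. What follows is a minimal sketch, written in Python purely to show the bytes (the thread itself concerns PHP). It assumes the query string reached the server Latin-1 encoded, which is John's guess rather than a confirmed fact, and the variable names are hypothetical.

```python
# "lacarrière", written with an escape so the source file's own encoding is irrelevant.
word = "lacarri\u00e8re"

# Assumption: the client actually sent the text as Latin-1 (ISO-8859-1).
latin1_bytes = word.encode("latin-1")

# 0xE8 is "è" in Latin-1; followed by "re" (0x72 0x65) it forms the
# 0xe87265 sequence quoted in the PostgreSQL error message.
reported = latin1_bytes[7:10]

# The same bytes are not valid UTF-8: 0xE8 announces a three-byte
# sequence, but 0x72 is not a continuation byte.
try:
    latin1_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Correctly encoded UTF-8 represents "è" as the two bytes 0xC3 0xA8.
utf8_bytes = word.encode("utf-8")

print(reported.hex(), valid_utf8, utf8_bytes)
```

If the guess holds, either declaring the real encoding (as with the SET client_encoding statement above) or converting the string to UTF-8 before sending it should make the error disappear.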

Hi
Please configure your email client so we don't receive 5 copies of your
mail.
> I already did this and all encoding settings are right, but I figured out
> something more.
>
> 1) Using pg_query for fetching UTF8 data from database is working properly.
> Of course when I try to output it directly then I get something like that
> as output "lacarriÃ¨re" - but when I use utf8_decode() on the UTF8-bytes
> I get it the right way "lacarrière".
This indicates that PHP is not using UTF-8. That output is typical of
UTF-8 bytes being displayed as Latin-1 characters.
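
The effect described here can be reproduced at the byte level. This is a small sketch in Python, used only for illustration and not as a statement about PHP internals; the variable names are hypothetical.

```python
# "lacarrière", written with an escape so the source file's own encoding is irrelevant.
word = "lacarri\u00e8re"

# The bytes a UTF-8 client connection would deliver for this word.
utf8_bytes = word.encode("utf-8")

# Interpreting those bytes as Latin-1 turns the two-byte "è" into
# two separate characters (U+00C3 and U+00A8).
misread = utf8_bytes.decode("latin-1")

# Converting UTF-8 to Latin-1 bytes, which is what PHP's utf8_decode()
# does, gives back the single byte 0xE8 for "è".
latin1_bytes = utf8_bytes.decode("utf-8").encode("latin-1")

print(misread, latin1_bytes)
```

The data is intact in both cases. Only the interpretation of the bytes differs, which is why the output looks right once it is converted to the encoding the display expects.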
> 2) I found another PHP application which is able to insert UTF8 data
> properly, phpPgAdmin, but it seems that it uses the ADODB-Layers for
> executing SQL-statements.
> Well, the fact that phpPgAdmin runs on the same machine handling UTF8
> data properly means that my PHP is well configured for handling UTF8.
Not true, it only indicates that phpPgAdmin is configured to handle
UTF-8 correctly.
> 3) When I add to my DB-Class utf8_encode() on the querystring I send to
> the database, it works properly, the insert is fine, so that's a temporary
> solution for my first problem.
> 4) When I get data from database I usually would have to do a utf8_decode
> on EVERY string which is fetched from database. So my solution is now, to
> handle all strings coming UTF8 from database as they are coming with
> UTF8-bytes, and really only then when I need to decode them I decode them
> for further use.
Once again indicating your data needs to be converted from some other
character set.
I had similar problems getting PHP to work with UTF-8 and MySQL. Many
of PHP's functions are not multibyte aware and assume a Latin character set.
What, if any, output buffering are you using? What is your
default_charset set to?
--
Niel Archer

Hi
>
> Well, I searched all the source code of phpPgAdmin for charsets and I found:
>
> "echo "\t<meta http-equiv=\"Content-Type\" content=\"text/html; charset={$data->codemap[$dbEncoding]}\" />\r\n";"
>
> So this means, phpPgAdmin sets the output charset to the charset which
> is used by the database it is connected to - but that's still not the
> problem, because I also know how to fix charset output in browsers.
Not exactly. As far as I can see, it only changes the value of the
Content-Type header in the HTML; it doesn't change the actual encoding
of the output.
>
> It's already converted to be compatible to utf8 when fetching it from
> some other resources.
I didn't mean the content of the database. I was referring to the data
that PHP is actually processing, which appears to have been converted
within PHP.
> Well, I've set the default_charset to UTF8, it was set before to "" (empty) -
> but the output on console (cli) and the problem is still the same also
> after changing this to UTF8, so: this is not the problem.
It should be "UTF-8", which is the official designation from Unicode,
although case will likely be ignored. As far as I know, "UTF8" is not a
recognised encoding name.
This, however, is only the value that will be output as the
Content-Type charset, as noted above.
> I fetch something from database, which looks like "lacarriÃ¨re" when
> I output it in
> PHP - well don't let us get from PHP's output. Then I fetch
> something from another resource looking like "lacarrière" - when I
> compare both strings in PHP it tells me that they are "not equal".
As I said before, many of PHP's functions (the string ones for
comparing, for example) are NOT multi-byte aware, so are NOT guaranteed
to work correctly.
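
The "not equal" result is what a byte-for-byte comparison gives when the two strings hold the same text in different encodings. Here is a hedged sketch in Python, for illustration only. It assumes the second source delivers Latin-1, which the thread suggests but does not confirm, and the variable names are hypothetical.

```python
# "lacarrière", written with an escape so the source file's own encoding is irrelevant.
word = "lacarri\u00e8re"

from_database = word.encode("utf-8")     # "è" stored as 0xC3 0xA8
from_elsewhere = word.encode("latin-1")  # assumption: "è" stored as 0xE8

# A byte-wise comparison sees different lengths and different content.
lengths = (len(from_database), len(from_elsewhere))
bytes_equal = from_database == from_elsewhere

# Decoding each value with its own encoding first makes the texts comparable.
text_equal = from_database.decode("utf-8") == from_elsewhere.decode("latin-1")

print(lengths, bytes_equal, text_equal)
```

In other words, both values need to be brought into one common encoding before they are compared or measured.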
You did not answer the most important question. What, if any, output
buffering are you using? Are you using the mbstring module? If so, is
it set to overload the old string functions?
--
Niel Archer

Hi
You still haven't answered whether you're using any output handler, and
if so which one. I use
output_handler=mb_output_handler
> I overloaded the mbstring variables with:
> mbstring.func_overload = 6
> Setting it to "7" won't let me even echo something else.
Very strange, the only additional function overloaded is mail() and that
shouldn't stop you using echo.
As well as setting the internal encoding and enabling it with
mbstring.encoding_translation = On
mbstring.internal_encoding = UTF-8
I would also use:
mbstring.language = English
; or German in your case
mbstring.detect_order = UTF-8,eucjp-win,sjis-win
mbstring.http_input = UTF-8,SJIS,EUC-JP
mbstring.http_output = UTF-8
> Is it possible for mbstring to overload the pg-functions I need?
No, and it shouldn't be needed. Those functions should be UTF-8 enabled
in order to communicate with the database and supply the correct data.
You're still referring to 'UTF8' which as I pointed out isn't the
official name of the encoding system. I have no idea if PHP will
recognise it, but to be safe I suggest you use the official 'UTF-8'
(hyphen between letters and number) in case it's causing problems.
The other thing to be wary of is output to the console. Some OSes do
not support Unicode in the console. So unless you're certain yours does,
I wouldn't use it as a test.
--
Niel Archer