
Foreign characters behaving oddly
Matthew White

2007-07-16, 10:02 pm

Hello,
I have a website that is supposed to take a French word and return the
English translation. The front end has an AJAX script that dynamically
POSTs the value to the back end:

function post() {
var string = document.getElementById("string").value;
var poststr = "string=" + encodeURI( string );
makePOSTRequest('dict.eng.php', poststr);
}

Then the back end takes the string and queries the database for the 30 words
most like that word:

$query = "SELECT * FROM dictionary WHERE fr like ('" . $string . "%')
ORDER BY fr LIMIT 30";
$query = mysql_query($query);

If I enter a word like "bonjour", the script returns the words that are
most like "bonjour". A word with an accented character, like "français",
returns no results, even though it is in the dictionary. The page is in
UTF-8, and the database, tables, and fields all use utf8_bin. Can anyone
please point me in the right direction?

Matt
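The prefix search described above can be sketched as follows. The example is written in JavaScript, the language of the front-end snippet, rather than PHP. likePrefixPattern() and prefixSearch() are hypothetical helpers and are not part of Matt's code.

- likePrefixPattern() escapes LIKE's wildcards "%" and "_", plus "\", which is MySQL's default LIKE escape character. It then appends "%".
- prefixSearch() imitates "WHERE fr LIKE 'prefix%' ORDER BY fr LIMIT 30" under a binary collation such as utf8_bin, where characters match only if they are exactly the same.

```javascript
// Hypothetical helper: turn user input into a LIKE prefix pattern.
// % and _ are LIKE wildcards and \ is MySQL's default LIKE escape
// character, so all three are escaped to match literally.
function likePrefixPattern(input) {
  return input.replace(/[\\%_]/g, (ch) => "\\" + ch) + "%";
}

// Hypothetical in-memory stand-in for
//   SELECT * FROM dictionary WHERE fr LIKE 'prefix%' ORDER BY fr LIMIT 30
// with a binary collation: startsWith() compares characters exactly,
// so "c" never matches "ç".
function prefixSearch(words, prefix, limit = 30) {
  return words
    .filter((word) => word.startsWith(prefix))
    .sort() // UTF-16 code-unit order, close to binary order for these words
    .slice(0, limit);
}

const words = ["bonjour", "francais", "français", "française", "françaises"];

console.log(likePrefixPattern("français")); // français%
console.log(likePrefixPattern("50%_off")); // 50\%\_off%
console.log(prefixSearch(words, "français").join(", ")); // français, française, françaises
console.log(prefixSearch(words, "francais").join(", ")); // francais
```

Escaping the wildcards does not protect against SQL injection. On the PHP side, the value would still need to go through mysql_real_escape_string(), or be passed as a bound parameter with mysqli or PDO, before it reaches the query.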




Harrie Verveer

2007-07-17, 4:04 am

Hi Matthew,

you might want to look into soundex functions:

http://dev.mysql.com/doc/refman/5.0...unction_soundex

Kind regards,

Harrie Verveer


eisenstein

2007-07-17, 4:04 am


Try to first set the connection encoding to utf8 (SET NAMES utf8) before
doing any DB transactions:

mysql_query("SET NAMES 'utf8'");

Be aware that not all PHP functions can handle Unicode strings.

eisenstein
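Here is a simplified model of what can go wrong without SET NAMES. Suppose the connection is left at a Latin-1 default. MySQL then reads each byte of the UTF-8 text the page sends as a separate Latin-1 character.

The JavaScript sketch below reproduces only that byte-level effect; it is not MySQL's actual conversion code, and misreadUtf8AsLatin1() is a hypothetical name. It shows that "ç" is two bytes in UTF-8, and those two bytes turn into two characters. An ASCII-only word such as "bonjour" passes through unchanged, which would explain why only accented words misbehave.

```javascript
// Simplified model (not MySQL itself): UTF-8 bytes reinterpreted as
// ISO-8859-1, where byte value n is the character U+00nn.
function misreadUtf8AsLatin1(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 encoding
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

const cedilla = new TextEncoder().encode("ç");
console.log(Array.from(cedilla, (b) => b.toString(16)).join(" ")); // c3 a7

console.log(misreadUtf8AsLatin1("français")); // franÃ§ais
console.log(misreadUtf8AsLatin1("bonjour")); // bonjour
```

MySQL's latin1 is actually Windows-1252. That encoding differs from ISO-8859-1 only for bytes 0x80 to 0x9F, and the two bytes of "ç" fall outside that range.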

Matthew White

2007-07-17, 7:01 pm

Soundex might work, but the MySQL documentation you provided clearly states:

Important: When using SOUNDEX(), you should be aware of the following
limitations:
This function, as currently implemented, is intended to work well with
strings that are in the English language only. Strings in other languages
may not produce reliable results.
This function is not guaranteed to provide consistent results with strings
that use multi-byte character sets, including utf-8.

Thanks for your help, though!
Matt

Markus

2007-07-17, 7:01 pm


Your Ajax function does encodeURI( string ) - do you decode it somewhere
before you do the database query? You can check this with var_dump($string).

Anyway, since you are doing a POST request, I would try going without
URL-encoding the string (that is needed with the GET method).
JavaScript's encodeURI()/decodeURI() and PHP's urlencode()/urldecode()
may behave differently. I'd rather have your function send the data as
UTF-8. In a normal form, this would be done with the
accept-charset="UTF-8" attribute in the form tag; I don't know whether
this also works when sending data with your method.

HTH
Markus
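For comparison, this is how the JavaScript encoding functions Markus mentions behave on the word in question. The snippet only illustrates standard built-ins (encodeURI, encodeURIComponent, URLSearchParams). The thread does not show how makePOSTRequest() sends the body, so I am assuming it uses Content-Type: application/x-www-form-urlencoded, in which case PHP URL-decodes the body into $_POST by itself.

Under that assumption:
- encodeURI() already produces UTF-8 percent-escapes, so the encoding step is unlikely to be the culprit.
- Its weak spot is that it leaves delimiters such as & and = unescaped.

```javascript
const word = "français";

// Both functions percent-encode non-ASCII characters as UTF-8 bytes.
console.log(encodeURI(word)); // fran%C3%A7ais
console.log(encodeURIComponent(word)); // fran%C3%A7ais

// The difference: encodeURI() leaves URI delimiters such as & and =
// alone, so a value containing them would break the "name=value" body.
console.log(encodeURI("a&b=c")); // a&b=c
console.log(encodeURIComponent("a&b=c")); // a%26b%3Dc

// URLSearchParams serializes an application/x-www-form-urlencoded body
// using UTF-8 and escapes the delimiters as well.
const body = new URLSearchParams({ string: word }).toString();
console.log(body); // string=fran%C3%A7ais
```

For a form-encoded POST body, encodeURIComponent() or URLSearchParams is therefore the safer choice for each value.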
Matthew White

2007-07-17, 7:01 pm

I added that query right after connecting to the database, and it now
works, but here is a problem: "français" returns three matches:
franÃ§ais
franÃ§aise
franÃ§aises

Why is "Ã§" being substituted for "ç", even when I pass each returned
string through htmlentities()?

Matt

Matthew White

2007-07-17, 7:01 pm

Well, the AJAX script passes the string correctly, because the PHP script
picks it up without problem. The issue seems to be with MySQL (see
eisenstein's post above).


Allodoxaphobia

2007-07-17, 7:01 pm

On Mon, 16 Jul 2007 21:31:14 GMT, Matthew White posted:

> Subject: Foreign characters behaving oddly


I need to mention that around here the foreign characters behave quite
normally. It's the local characters that seem to be behaving oddly.
Don't *even* get me started about the elected characters.
Rik

2007-07-17, 10:00 pm

On Tue, 17 Jul 2007 20:31:56 +0200, Matthew White <mgw854@msn.com> wrote:

> I added that query right after calling the database, and it now works
> fine, but here is a problem- "français" returns three matches:
> franÃ§ais
> franÃ§aise
> franÃ§aises
>
> Why is "Ã§" being substituted for "ç", even when I pass each returned
> string through htmlentities()?


Well, it's clearly not interpreted as UTF8 as it should be. Maybe use
iconv to ensure all internal encoding is in utf8?

http://www.php.net/iconv
--
Rik Wasmus
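Before converting anything with iconv, it can help to check which encoding the bytes are actually in. Below is a JavaScript sketch of such a check using the standard TextDecoder in fatal mode; isValidUtf8() is a hypothetical name.

"français" is 9 bytes in UTF-8 and 8 bytes in Latin-1. The lone Latin-1 byte 0xE7 for "ç" is not valid UTF-8. In PHP, mb_check_encoding() from the mbstring extension performs a comparable test.

```javascript
// Hypothetical helper: true if the bytes are well-formed UTF-8.
// With { fatal: true }, TextDecoder throws instead of substituting U+FFFD.
function isValidUtf8(bytes) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

// The same word in two encodings.
const utf8Bytes = new TextEncoder().encode("français"); // ç is C3 A7
// Latin-1 by hand: valid here because every character is below U+0100.
const latin1Bytes = Uint8Array.from("français", (ch) => ch.charCodeAt(0)); // ç is E7

console.log(utf8Bytes.length, latin1Bytes.length); // 9 8
console.log(isValidUtf8(utf8Bytes)); // true
console.log(isValidUtf8(latin1Bytes)); // false
```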
Matthew White

2007-07-17, 10:00 pm

I tried iconv, for both the internal and external encoding, but to no
avail. I also added the mysql_query() call that sets UTF-8, and I have
passed "utf-8" as the third argument to htmlentities(). The output is
still corrupted.

Matt


Markus

2007-07-18, 4:01 am

Matthew White wrote:

> I tried the iconv, both for internal and external, but to no avail. I
> also added in the mysql_query that set UTF-8, and I have also set
> htmlentities with the third argument of "utf-8". The output is still
> corrupted.


It looks like your string is in UTF-8 encoding, but the output is
converted to Latin-1 or whatever. Check the following points:

1. All scripts (PHP, HTML) are saved in UTF-8 encoding.

2. Send a UTF-8 header to the browser:
header('Content-Type: text/html; charset=UTF-8');

3. Also set the appropriate meta tag in the HTML source (this should not
be necessary if the correct header is sent, but you never know with
browsers):
<meta http-equiv="content-type" content="text/html;charset=UTF-8">


BTW, please get used to bottom-posting when you correspond with
newsgroups and mailing lists (add your answer below the text you quote,
rather than above it as you do in normal e-mail).

HTH
Markus
Matthew White

2007-07-18, 7:02 pm


I had already taken care of the first and last points, but I did add the
header() call to my PHP file. It has made no difference to the output.

Matt

Good Man

2007-07-18, 7:02 pm


"Matthew White" <mgw854@msn.com> wrote in
news:H8pni.8062$fj5.7565@trnddc08:

> I had already made sure of the first and last, but I did add the
> header() to my PHP file. It has made no difference in the output.


Sorry to see this struggle go on for days!

I know some versions of MySQL were buggy when mixing collation types,
and perhaps that is a clue to your problem. Have you looked into using
COLLATE in your SQL query? Not sure if it's the right tree to bark up,
but hey, it's another tree:
http://dev.mysql.com/doc/refman/5.1...collations.html

and then further back,
http://dev.mysql.com/doc/refman/5.1/en/charset.html

good luck
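Good Man's pointer to collations also bears on a related point. With utf8_bin, comparisons are exact, so a user who types "francais" without the cedilla will not find "français".

The JavaScript sketch below contrasts exact comparison with Intl.Collator at sensitivity "base". It is an analogy for an accent-insensitive collation, not a reproduction of MySQL's collation rules. It would not by itself repair data that was stored with the wrong encoding.

```javascript
// Binary comparison, as with utf8_bin: "c" (U+0063) and "ç" (U+00E7)
// are different characters, so the strings are not equal.
console.log("francais" === "français"); // false

// Accent- and case-insensitive comparison: sensitivity "base" treats
// letters that differ only by accent or case as equal.
const loose = new Intl.Collator("fr", { sensitivity: "base" });
console.log(loose.compare("francais", "français") === 0); // true
console.log(loose.compare("FRANCAIS", "français") === 0); // true
console.log(loose.compare("francais", "franchise") === 0); // false
```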




Markus

2007-07-19, 8:01 am


Matthew White wrote:
> I had already made sure of the first and last, but I did add the
> header() to my PHP file. It has made no difference in the output.


Hmm... if you don't find the solution in the links posted by Good Man,
you could try adding

ini_set('default_charset', 'utf-8');

to your PHP script (somewhere at the top), but I also think it is rather
a MySQL issue now. BTW, which MySQL version are you using?

One possible reason is that the DB contents that existed before you
added mysql_query("SET NAMES 'utf8'") are now returned distorted: you
entered them without telling the DB they were UTF-8, so "ç" was stored
as "Ã§", which is now returned in proper UTF-8 encoding. To test this,
repeat the test with data you entered after you added the "SET NAMES"
query.

Anyway, if this is the case, it is likely that your original problem will
come back with all data entered with the proper SET NAMES setting!
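Markus's diagnosis can be checked, and in simple cases reversed, outside the database. The JavaScript sketch below assumes the damage is exactly one extra round of "UTF-8 bytes read as Latin-1"; undoDoubleEncoding() is a hypothetical name.

The sketch treats Latin-1 as strict ISO-8859-1, whereas MySQL's latin1 is really Windows-1252. Words containing characters such as "œ", whose second UTF-8 byte falls in the 0x80 to 0x9F range, would therefore need a Windows-1252-aware mapping. When the original source is still available, re-importing it with the correct connection charset is usually the cleaner fix.

```javascript
// Hypothetical repair for text that was UTF-8 encoded twice: its UTF-8
// bytes were read as Latin-1 characters and stored as UTF-8 again.
function undoDoubleEncoding(text) {
  // Double-encoded text contains only characters U+0000..U+00FF
  // (one per original byte); anything else is left untouched.
  if (/[^\u0000-\u00ff]/.test(text)) return text;

  // Turn each character back into the byte it came from...
  const bytes = Uint8Array.from(text, (ch) => ch.charCodeAt(0));
  try {
    // ...and decode those bytes as UTF-8, failing on invalid sequences.
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch {
    return text; // not valid UTF-8, so probably not double-encoded
  }
}

console.log(undoDoubleEncoding("fran\u00c3\u00a7ais")); // français
console.log(undoDoubleEncoding("français")); // français (already correct)
console.log(undoDoubleEncoding("bonjour")); // bonjour
```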
Matthew White

2007-07-19, 7:03 pm


Retracing my steps, I opened up the MySQL database, only to find those
values were corrupted. After adding in mysql_query("SET NAMES 'utf8'") to
the script that parses the dictionary file, I was able to make everything
work well. Thanks for everyone's help!

Matt


Copyright 2008 codecomments.com