Hi List,
I've been trying hard for the last 2 days to solve a UTF-8 to
ISO-8859-1 conversion problem.
I receive a POST of a UTF-8 XML document; the declaration is okay, and the
document is sent by a Windows server.
Now I have tried to convert the document to Latin1 (ISO-8859-1) in all the
ways I can imagine, but nothing really modifies the UTF-8 flag.
When I change the text to iso-8859-1 and put it into my database (utf8
as well as latin1) I get this sign " Â " before the sign I want to save in my
database!
When I print out the string on the screen of the server (logfile) it shows
me that the data comes in with the UTF-8 flag set on (Â sign I guess), and
after transforming it I print it out with data::dump and the signs become
something like \x{c2}\x.... The \x{c2} I guess is the special character set
by UTF-8. Okay, now I transform the string using Unicode::String,
and the string becomes Latin1 in the logfile, but not in my database: in the
UTF-8 table the signs are good, but in the latin1 table the signs become
weird.
Maybe someone has a hint on how to convert an XML::Simple document (received
by POST) in UTF-8 with the flag set on to a simple Latin1 document so that I
can save it into my latin1 table!
Thanks for any help
Ciao Thomas

On 08/04/2006 05:07 AM, webmaster@echtwahr.com wrote:
> Hi List,
> [...]

Hi Web.

> Maybe someone has a hint how to convert a XML::Simple document (by POST) in
> UTF-8 with the FLAG set on to a Simple LATIN1 document so that I can safe it
> into my latin1 table!
>
> Tanks for any help
>
> Ciao Thomas

Use the Encode module to convert the string to iso-8859-1.
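
A minimal sketch of what that suggestion could look like, assuming the POST body arrives as raw UTF-8 bytes. The variable names and the sample pound sign are illustrative and do not come from the original post. It also hints at where a stray "Â" may come from: the UTF-8 encoding of "£" is the two bytes C2 A3, and the byte C2 read as Latin-1 is "Â".

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: raw UTF-8 bytes of a pound sign, as they might arrive in a POST body.
my $utf8_bytes = "\xC2\xA3";

# Decode the UTF-8 bytes into a Perl character string (one character, U+00A3).
my $chars = decode('UTF-8', $utf8_bytes);

# Encode the characters as ISO-8859-1 bytes for the latin1 table.
my $latin1_bytes = encode('ISO-8859-1', $chars);

print "UTF-8 bytes:   ", unpack('H*', $utf8_bytes),   "\n";  # c2a3
print "Latin-1 bytes: ", unpack('H*', $latin1_bytes), "\n";  # a3
```

If the two-byte form is stored in a latin1 column unchanged, each byte is shown as its own Latin-1 character, which would match the symptom described above.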

You can try *Encode::Unicode*
<http://search.cpan.org/author/DANKO...code/Unicode.pm>
or *Unicode::Transform*
<http://search.cpan.org/author/SADAH...34/Transform.pm>

Another way is to use the substitution operator (s///) to do the
inverse conversion.

--
Sincerely,
J. Alejandro Ceballos Z.
buzon@alejandro.ceballos.info
http://alejandro.ceballos.info
mobile: (33) 3849-8936

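
One way the substitution idea might be sketched, under the assumptions that the data is a plain byte string (no UTF-8 flag) and that every character fits in Latin-1 (U+0000 to U+00FF). The sample string is illustrative. Anything outside that range is left untouched by this pattern, so the Encode module is usually the safer route.

```perl
use strict;
use warnings;

# Assumption: $bytes is a byte string holding UTF-8 data whose characters
# all fit in Latin-1. Here: "café" as UTF-8 bytes.
my $bytes = "caf\xC3\xA9";

# Collapse each two-byte UTF-8 sequence (lead byte C2 or C3) into the
# single Latin-1 byte it represents.
$bytes =~ s/([\xC2\xC3])([\x80-\xBF])/chr(((ord($1) & 0x03) << 6) | (ord($2) & 0x3F))/eg;

print unpack('H*', $bytes), "\n";   # 636166e9
```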

Thomas,
I've had a similar experience and will provide my solution below. I'm
not sure it's optimal, but it works for me. I'm working with a file, not
an HTTP POST. (In addition to what is below, I would suggest looking at
how you specify the charset encoding of your POST to be sure it is what
you think it is. That part is beyond me.)
As far as I can tell, Perl works in UTF-8 internally and can mangle
diacritics given to it in other character sets. The key is that you convert
TWICE: first to get the data into Perl, then once more right before you put
it in the database. As soon as Perl does any transformations on text, it
seems to go back to UTF-8. When I leave off the first or second conversion,
I get mangled diacritics.
use Encode;

# Path to a file whose contents are in iso-8859-1 (Latin1).
my $file = "file data in iso-8859-1 or LATIN1";
# This could be a string too, i.e., what you receive from your POST, but
# then you would use the second command below, I think.
open(my $fh, "<:encoding(iso-8859-1)", $file) or die "Cannot open $file: $!";
# Read the whole file into $string (added here so the snippet is complete).
my $string = do { local $/; <$fh> };
close($fh);
# This gets the data in cleanly. You do transformations on the text as
# you please, but then Perl has it in UTF-8 again. So *right before* you
# put it in your SQL query, take your $string and put it into the proper
# encoding for your database.
$string = encode("iso-8859-1", $string);
# Probably a good idea to use a bound parameter, i.e., ? in the query, and
# provide the $string as a parameter in the execute command.
At least in my case, this solves the problem.
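
For the POST case, where the data is a string rather than a file, the same two steps might look like the sketch below. The helper name utf8_to_latin1 and the sample inputs are hypothetical. The strict check flags are a choice made here, not something from the post: they make the conversion die on bad input instead of silently substituting "?".

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: convert raw UTF-8 bytes (e.g. a POST body) to Latin-1 bytes.
# Dies if the input is not valid UTF-8 or contains a character outside Latin-1.
sub utf8_to_latin1 {
    my ($utf8_bytes) = @_;
    # FB_CROAK: die on errors; LEAVE_SRC: do not modify the source string.
    my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();
    my $chars = decode('UTF-8', $utf8_bytes, $check);   # first conversion: into Perl
    return encode('ISO-8859-1', $chars, $check);        # second: out to the database
}

my $ok = utf8_to_latin1("\xC3\xBC");            # "ü" as UTF-8 bytes
print unpack('H*', $ok), "\n";                  # fc

# The euro sign (U+20AC) has no ISO-8859-1 equivalent, so this call dies.
my $bad = eval { utf8_to_latin1("\xE2\x82\xAC") };
print "conversion failed\n" unless defined $bad;
```

The returned bytes could then be handed to the database as a bound parameter, for example through DBI's execute on a prepared statement with a ? placeholder.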
-Chris Cosner

webmaster@echtwahr.com wrote:
> Hi List,
>
> I'm trying really hard the last 2 days to get around the problem UTF-8 to
> ISO-8859-1
>
> I receive a POST of an UTF-8 XML Document, declaration is okay, the document
> is send by a Windows Server.
>
> Now I have tried to convert the document to Latin1 (ISO-8859-1) by all the
> ways I can imagine, but nothing really modifies the utf flag.
>
> When I change the text to iso-8859-1 and I put it into my database (utf8
> also as latin1) I get this sign " Â " before the sign I want to save in my
> database!
>
> When I print out the string on the screen of the server (logfile) it shows
> me that the data comes in with the utf-8 flag set on (Â sign I guess) an
> after transforming it I print it out by data::dump and the signs become
> something like \x{c2}\x.... the \x{c2} I guess is the special character set
> by utf, okay now I transform the string using Unicode::String
> And the string becomes Latin1 in the logfile, but in my database not, in the
> UTF-8 table the signs are good, but in the latin1 table the signs become
> weird.
>
> Maybe someone has a hint how to convert a XML::Simple document (by POST) in
> UTF-8 with the FLAG set on to a Simple LATIN1 document so that I can safe it
> into my latin1 table!
>
> Tanks for any help
>
> Ciao Thomas

--
Chris Cosner
Systems Administrator
Stanford University Press
1450 Page Mill Road
Palo Alto, CA 94304
(650) 724-7276
ccosner@stanford.edu
http://www.sup.org