











 

Author Convert utf-8 XML Document to ISO format
webmaster@echtwahr.com

2006-08-04, 7:55 am

Hi List,

I've been trying really hard for the last 2 days to get around the problem
of converting UTF-8 to ISO-8859-1.

I receive a POST of a UTF-8 XML document. The declaration is okay; the
document is sent by a Windows server.

Now I have tried to convert the document to Latin1 (ISO-8859-1) in every
way I can imagine, but nothing really modifies the utf flag.

When I change the text to iso-8859-1 and put it into my database (utf8 as
well as latin1), I get this sign " Â " before the sign I want to save in my
database!

When I print the string on the screen of the server (logfile), it shows me
that the data comes in with the utf-8 flag set (the Â sign, I guess). After
transforming it I print it with Data::Dump and the signs become something
like \x{c2}\x.... I guess the \x{c2} is the special character set by utf.
Then I transform the string using Unicode::String and the string becomes
Latin1 in the logfile, but not in my database: in the UTF-8 table the signs
are good, but in the latin1 table the signs become weird.

Maybe someone has a hint on how to convert an XML::Simple document (received
by POST) in UTF-8 with the flag set to a simple Latin1 document, so that I
can save it into my latin1 table!

Thanks for any help


Ciao Thomas



Mumia W.

2006-08-04, 6:55 pm

On 08/04/2006 05:07 AM, webmaster@echtwahr.com wrote:
> Hi List,
> [...]


Hi Web.



Use the Encode module to convert the string to iso-8859-1.
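
A minimal sketch of what that could look like, assuming the POST body
arrives as raw UTF-8 octets. The helper name utf8_octets_to_latin1 and the
sample string are made up for illustration; decode() and encode() are the
functions exported by the Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn UTF-8 octets into ISO-8859-1 octets.
sub utf8_octets_to_latin1 {
    my ($utf8_octets) = @_;

    # Step 1: decode the octets into a Perl character string.
    my $chars = decode('UTF-8', $utf8_octets);

    # Step 2: encode the characters as ISO-8859-1 octets.
    return encode('iso-8859-1', $chars);
}

# Sample input (an assumption): the UTF-8 octets for "Größe".
my $utf8_octets   = "Gr\xC3\xB6\xC3\x9Fe";
my $latin1_octets = utf8_octets_to_latin1($utf8_octets);

printf "%d UTF-8 bytes in, %d Latin1 bytes out\n",
    length($utf8_octets), length($latin1_octets);
```

Characters that do not exist in ISO-8859-1 are replaced with a substitution
character by default, so this only round-trips cleanly for text that fits
into Latin1.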


J. Alejandro Ceballos Z. -JOAL-

2006-08-07, 9:55 pm

You can try with

*Encode::Unicode*
<http://search.cpan.org/author/DANKO...code/Unicode.pm>
or *Unicode::Transform*
<http://search.cpan.org/author/SADAH...34/Transform.pm>

Another approach is to use the substitution operator (s///) to do the
inverse conversion.
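
A rough sketch of the substitution idea, assuming the input is an undecoded
byte string and only contains characters up to U+00FF (those are the
two-byte UTF-8 sequences starting with 0xC2 or 0xC3). The helper name
utf8_to_latin1_by_regex is made up for illustration, and anything outside
that range is left untouched, so the Encode module is probably the safer
choice in general.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse two-byte UTF-8 sequences for
# U+0080..U+00FF into the matching single Latin1 byte.
sub utf8_to_latin1_by_regex {
    my ($octets) = @_;

    # Lead byte 0xC2/0xC3 carries the top 2 bits, the continuation
    # byte (0x80..0xBF) carries the low 6 bits of the code point.
    $octets =~ s/([\xC2\xC3])([\x80-\xBF])/chr(((ord($1) & 0x03) << 6) | (ord($2) & 0x3F))/ge;

    return $octets;
}

# Sample input (an assumption): UTF-8 octets for "café °C".
my $latin1 = utf8_to_latin1_by_regex("caf\xC3\xA9 \xC2\xB0C");

printf "%s\n", join ' ', map { sprintf '%02X', ord($_) } split //, $latin1;
```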


--

Atentamente,

,_,
(O,O) J. Alejandro Ceballos Z. buzon@alejandro.ceballos.info
( )
-"-"-----------------------------------------------------------------
http://alejandro.ceballos.info movil: (33) 3849-8936




Chris Cosner

2006-08-14, 6:55 pm

Thomas,

I've had a similar experience and will provide my solution below. I'm
not sure it's optimal, but it works for me. I'm working with a file, not
an HTTP POST. (In addition to what is below, I would suggest looking at
how you specify the charset encoding of your POST to be sure it is what
you think it is. That part is beyond me.)

As far as I can tell, Perl works in UTF-8 internally and can mangle
diacritics given to it in other character sets. The key is that you convert
TWICE: first decode to get the data into Perl, then encode once more right
before you put the data in the database. As soon as Perl does any
transformations on text, it seems to go back to its internal form. When I
leave off the first or second step, I get mangled diacritics.

use Encode;

# Placeholder: the path to a file whose contents are in iso-8859-1 (Latin1).
# This could be a string too, i.e., what you receive from your POST, but
# then you would use the encode() call below, I think.
my $file = "file data in iso-8859-1 or LATIN1";

# This gets the data in cleanly: the :encoding layer decodes it on read.
open(my $fh, "<:encoding(iso-8859-1)", $file) or die "Cannot open $file: $!";
my $string = do { local $/; <$fh> };
close($fh);

# You do transformations on the text as you please, but then Perl has it
# in its internal (UTF-8 based) form again. So *right before* you put it
# in your SQL query, take your $string and put it into the proper encoding
# for your database.
$string = encode("iso-8859-1", $string);

# Probably a good idea to use a bound parameter, i.e., ? in the query, and
# provide the $string as a parameter in the execute command.

At least in my case, this solves the problem.
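
To illustrate why leaving off the first step mangles things, here is a small
sketch that may also explain the extra "Â" from the original question. It
assumes the incoming data is the raw UTF-8 octets for a degree sign; the
variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Sample input (an assumption): UTF-8 octets for the degree sign, U+00B0.
my $utf8_octets = "\xC2\xB0";

# Without decoding, Perl treats each octet as one Latin1 character, so
# encode() keeps both bytes: 0xC2 (shown as "Â") followed by 0xB0.
my $skipped_decode = encode('iso-8859-1', $utf8_octets);

# Decoding first turns the two octets into one character, which then
# encodes to a single Latin1 byte.
my $decoded_first = encode('iso-8859-1', decode('UTF-8', $utf8_octets));

for my $pair (['without decode', $skipped_decode],
              ['with decode',    $decoded_first]) {
    my ($label, $bytes) = @$pair;
    printf "%-15s %s\n", $label,
        join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}
```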

-Chris Cosner


--
Chris Cosner

Systems Administrator
Stanford University Press
1450 Page Mill Road
Palo Alto, CA 94304
(650) 724-7276
ccosner@stanford.edu
http://www.sup.org