Author convert unicode to ANSI or ASCII
Daniel Chan

2006-01-28, 6:55 pm

Hi



I have a CGI script that takes web form data and sends it to a
different system by e-mail. On that system I have a process that loads the
e-mail data into the database. The problem is that if a user enters a
European name with a special character in one of the form fields,
it chokes the data loading process. We do not want to remove
those special characters from the input string because they are part of the
name. We think that converting those Unicode characters to ANSI or ASCII
characters will fix the problem, but we do not know how to do it in the Perl
CGI script. Please help.



Thank you,



Daniel


Zentara

2006-01-29, 6:55 pm

On Fri, 27 Jan 2006 18:58:32 -0700, Daniel.Chan@lsil.com ("Chan,
Daniel") wrote:

>I have a CGI script that takes web form data and sends it to a
>different system by e-mail. On that system I have a process that loads the
>e-mail data into the database. The problem is that if a user enters a
>European name with a special character in one of the form fields,
>it chokes the data loading process. We do not want to remove
>those special characters from the input string because they are part of the
>name. We think that converting those Unicode characters to ANSI or ASCII
>characters will fix the problem, but we do not know how to do it in the Perl
>CGI script. Please help.
>Daniel


Hi, this is a collection of tips I've accumulated from the Unicode
experts at http://perlmonks.com

I hope one of them suits your needs.

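#########################################################
#!/usr/bin/perl
use warnings;
use strict;
use Unicode::Normalize;
use Encode;

# Sample input, treated as windows-1250 encoded bytes.
my $string = "+lsctzùïåé}";
print "$string\n";
$string = decode("windows-1250", $string);  # bytes -> Perl characters
$string = NFD($string);                     # split letters into base + combining marks
$string =~ s/\pM//g;                        # strip the combining marks (accents)
print "$string\n";
__END__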
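
If the form data arrives as UTF-8 rather than windows-1250, the same decode / normalize / strip steps might be wrapped up roughly like this. It is only a sketch: `to_ascii` is a made-up name, the UTF-8 input is an assumption, and turning any leftover non-ASCII character into `?` is my own choice, not something from the tips above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFKD);

# Hypothetical helper: takes UTF-8 encoded bytes, returns plain ASCII text.
sub to_ascii {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);   # bytes -> Perl characters
    $text = NFKD($text);                  # split letters from their accents
    $text =~ s/\p{Mn}//g;                 # drop the combining marks
    $text =~ s/[^\x00-\x7F]/?/g;          # assumption: anything left becomes '?'
    return $text;
}

# "René Müller" written out as UTF-8 bytes
print to_ascii("Ren\xC3\xA9 M\xC3\xBCller"), "\n";   # prints "Rene Muller"
```

Letters that have no decomposition (for example "ø") do not lose anything under NFKD, so in this sketch they end up as `?`. If that matters for your names, you would need your own replacement table.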


Some other tips:
#####################################################################
# To convert every character instead of just one:

use Unicode::String;

my $result = '';
while (m/(.)/gs) {               # /g steps through $_ one character at a time
    my $char = $1;
    my $ustr = Unicode::String::utf8($char);
    $result .= $ustr->latin1();
}
$_ = $result;

However, this can all be condensed into exactly one line:

s/(.)/Unicode::String::utf8($1)->latin1()/eg;
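
If Unicode::String is not installed, the core Encode module can probably do the same UTF-8 to Latin-1 conversion on the whole string at once. A minimal sketch, assuming the input really is UTF-8 encoded bytes; `utf8_to_latin1` is a made-up name:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: UTF-8 bytes in, ISO-8859-1 (Latin-1) bytes out.
sub utf8_to_latin1 {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);    # bytes -> Perl characters
    # With no CHECK argument, characters that Latin-1 cannot represent
    # are replaced by Encode's substitution character instead of dying.
    return encode('ISO-8859-1', $text);
}

my $latin1 = utf8_to_latin1("Ren\xC3\xA9");   # "René" as UTF-8 bytes
printf "%d bytes\n", length $latin1;           # prints "4 bytes"
```

This only helps if the loading process on the other system actually expects Latin-1; names with characters outside Latin-1 would still lose information.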

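##################################################################
Or try this regex, which turns each two-byte UTF-8 sequence in the
range U+0080 to U+00FF into the matching single Latin-1 byte:

s/([\xc2-\xc3])([\x80-\xbf])/chr(64*ord($1&"\x03")+ord($2&"\x3f"))/eg;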
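
To see what that regex is doing, here is a small worked example on the word "café". The sample string and the variable names are mine; the substitution itself is the one above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# "é" is U+00E9; in UTF-8 it is the two bytes C3 A9.
my $utf8 = "caf\xC3\xA9";

# Copy the string, then rewrite each C2/C3 lead byte plus its
# continuation byte as one Latin-1 byte.
(my $latin1 = $utf8) =~
    s/([\xc2-\xc3])([\x80-\xbf])/chr(64*ord($1&"\x03")+ord($2&"\x3f"))/eg;

# 64 * (0xC3 & 0x03) + (0xA9 & 0x3F) = 64 * 3 + 41 = 233 = 0xE9
print join(' ', map { sprintf '%02X', ord } split //, $latin1), "\n";
# prints "63 61 66 E9"
```

Because the character class only covers the lead bytes C2 and C3, longer UTF-8 sequences (anything above U+00FF) are left untouched by this substitution.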

#########################################################################

You could also try the Text::Iconv module.
###################################################################




--
I'm not really a human, but I play one on earth.
http://zentara.net/japh.html