PERL CGI Beginners, January 2006
convert unicode to ANSI or ASCII
Daniel Chan 2006-01-28, 6:55 pm
Hi,
I have a CGI script that takes web form data and sends it to a
different system by e-mail. On that system I have a process that loads the
e-mail data into the database. The problem is that if a user enters a
European name with a special character in one of the form fields,
it chokes the data loading process. We do not want to remove
those special characters from the input string because they are part of the
name. We think that converting those Unicode characters to ANSI or ASCII
characters will fix the problem, but we do not know how to do it in the Perl
CGI script. Please help.
Thank you,
Daniel
Zentara 2006-01-29, 6:55 pm
On Fri, 27 Jan 2006 18:58:32 -0700, Daniel.Chan@lsil.com ("Chan,
Daniel") wrote:
>I have a CGI script that takes web form data and sends it to a
>different system by e-mail. On that system I have a process that loads the
>e-mail data into the database. The problem is that if a user enters a
>European name with a special character in one of the form fields,
>it chokes the data loading process. We do not want to remove
>those special characters from the input string because they are part of the
>name. We think that converting those Unicode characters to ANSI or ASCII
>characters will fix the problem, but we do not know how to do it in the Perl
>CGI script. Please help.
>Daniel
Hi, this is a collection of tips I've accumulated from the Unicode
experts at http://perlmonks.com
I hope one of them suits your needs.
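########################################
#!/usr/bin/perl
use warnings;
use strict;
use Unicode::Normalize;
use Encode;
my $string = "+lsctzùïåé}";
print "$string\n";
# treat the input as windows-1250 bytes and decode it to Perl characters
$string = decode("windows-1250", $string);
# decompose each accented letter into base letter + combining mark
$string = NFD($string);
# strip the combining marks, leaving only the base letters
$string =~ s/\pM//g;
print "$string\n";
__END__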
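The same idea can be wrapped in a small sub for use inside the CGI script. This is only a minimal sketch: the name to_ascii is made up for illustration, and it assumes the form data arrives as UTF-8 encoded bytes (if the form is submitted in another encoding, the decode() call has to name that encoding instead).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);
use Unicode::Normalize qw(NFD);

# Hypothetical helper: takes UTF-8 encoded bytes, returns plain ASCII bytes.
sub to_ascii {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);   # bytes -> Perl character string
    $text = NFD($text);                   # split accented letters into base + mark
    $text =~ s/\p{M}//g;                  # drop the combining marks
    return encode('ascii', $text);        # anything still non-ASCII becomes "?"
}

# "René Müller" written out as UTF-8 bytes
print to_ascii("Ren\xc3\xa9 M\xc3\xbcller"), "\n";   # prints "Rene Muller"
```

Letters that have no decomposition (for example ø, ß or æ) are not touched by the NFD step and come out as "?", so they would need an explicit mapping table if they must survive.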
Some other tips:
########################################
# To convert the whole string instead of a single character,
# with Unicode::String ($string must hold UTF-8 encoded bytes):
use Unicode::String qw(utf8);
my $latin1 = utf8($string)->latin1();
Or as a one-line substitution on $_ that converts each complete
multi-byte UTF-8 sequence:
s/([\xc0-\xf7][\x80-\xbf]+)/Unicode::String::utf8($1)->latin1()/eg;
########################################
Or try this regex, which turns the two-byte UTF-8 sequences for
U+0080 through U+00FF into single Latin-1 bytes:
s/([\xc2-\xc3])([\x80-\xbf])/chr(64*ord($1&"\x03")+ord($2&"\x3f"))/eg;
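To see what that regex does, here is a small check that compares it with the Encode module on one sample name. It is a sketch only: the sample string and variable names are made up, and the input is assumed to be UTF-8 bytes containing nothing above U+00FF.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

my $utf8 = "Andr\xc3\xa9";   # "André" as UTF-8 bytes (sample input)

# The hand-rolled regex: the low 2 bits of the lead byte and the low 6 bits
# of the continuation byte are combined into one Latin-1 byte.
(my $by_regex = $utf8)
    =~ s/([\xc2-\xc3])([\x80-\xbf])/chr(64*ord($1&"\x03")+ord($2&"\x3f"))/eg;

# The same conversion done with Encode.
my $by_encode = encode('iso-8859-1', decode('UTF-8', $utf8));

print $by_regex eq $by_encode ? "same\n" : "different\n";
```

The regex leaves any longer UTF-8 sequence (characters above U+00FF) untouched, whereas encode() would replace such characters with "?".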
########################################
You could try Text::Iconv?
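A rough sketch of the Text::Iconv route, assuming the input is UTF-8 bytes and the target is ISO-8859-1 (Latin-1). Text::Iconv is a CPAN module and not part of core Perl, so this example loads it defensively; the sub name utf8_to_latin1 is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Load Text::Iconv only if it is installed.
my $have_iconv = eval { require Text::Iconv; 1 };

# Hypothetical helper: UTF-8 bytes in, Latin-1 bytes out (undef on failure).
sub utf8_to_latin1 {
    my ($bytes) = @_;
    return undef unless $have_iconv;
    my $converter = Text::Iconv->new('UTF-8', 'ISO-8859-1');
    return $converter->convert($bytes);   # undef if the text cannot be converted
}

my $out = utf8_to_latin1("Andr\xc3\xa9");   # "André" as UTF-8 bytes
print defined $out ? "converted to " . length($out) . " bytes\n"
                   : "no conversion available\n";
```

Which encoding names are accepted depends on the iconv library installed on the system.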
########################################
--
I'm not really a human, but I play one on earth.
http://zentara.net/japh.html