











 

Author problem with Sterling pound sign
Dermot Paikkos

2007-08-06, 7:58 am

Hi All,

CGI;
MIME::Lite;

I am trying to take the input from a text field on an HTML page and
send it as an email. The text contains a UK sterling £ sign. It looks
fine in the HTML page but when I send the mail or output the text
to STDERR, it gets transformed into this: Â£

Here are a few of the lines from the script:

my $str = $q->param('pro');
my ($p) = ($str =~ /THIS IS GOING TO COST (.*)320/);
my $o = ord($p);
my ($hex) = unpack( 'H', $p);
print STDERR "Text=",$q->param('pro')," \"$p\" $hex $o\n";

And this is the output:
Text=THIS IS GOING TO COST Â£320 "Â£" c 194

I am a bit lost by this as I thought CGI did the heavy lifting with
character-encoding. Can anyone give me some pointers?

TIA,
Dp.





Mumia W.

2007-08-06, 6:59 pm

On 08/06/2007 05:52 AM, Dermot Paikkos wrote:
> Hi All,
>
> CGI;
> Mime::Lite;
>
> I am trying to take the input from a text field from a html page and
> send it as an email. The text contains a UK sterling £ sign. It looks
> fine on in the html page but when I send the mail or output the text
> to STDERR, it gets transformed into this: Â£
> [...]


Evidently you forgot to set the correct charset in your HTTP headers.
You're outputting UTF-8 data, so you want to declare that in the headers.
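
As a rough illustration of that advice, the sketch below emits a Content-Type header that declares UTF-8 and encodes the output to match. The header is built by hand here so the example needs no extra modules; the sample paragraph text is only an assumption for demonstration.

```perl
use strict;
use warnings;

# Hand-built header for illustration. With CGI.pm the usual equivalent is:
#   print $q->header(-type => 'text/html', -charset => 'utf-8');
my $header = "Content-Type: text/html; charset=utf-8\r\n\r\n";

# Encode character strings as UTF-8 on the way out, so the bytes sent
# agree with the charset declared in the header.
binmode(STDOUT, ':encoding(UTF-8)');

print $header;
print "<p>THIS IS GOING TO COST \x{A3}320</p>\n";   # \x{A3} is the pound sign
```

The point is only that the declared charset and the bytes actually written should agree; which of the two gets changed depends on the application.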

Beginner

2007-08-06, 6:59 pm

On 6 Aug 2007 at 12:50, Rob Dixon wrote:

> Dermot Paikkos wrote:
>
> Hey Dermot
>
> I think you are grabbing two characters from the text instead of one.
> Your ord() is looking only at the first byte (and your unpack only at the
> first four bits!) and HTML entity &#194; is capital A circumflex. Quite
> what it's doing in there I don't know, but try using just /(.)320/ as your
> regex (it's not optional and you don't want more than one). You should get
> a character code of 163 for the pound sign.
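
To make the point about ord() and unpack concrete, here is a minimal sketch. It assumes the field arrives as the two UTF-8 bytes C2 A3 (which matches the output reported in this thread) and uses the core Encode module to turn them into a single character; the variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumed input: the pound sign as raw UTF-8 bytes, as a browser would send it.
my $bytes = "\xC2\xA3";

# On the byte string, ord() reports only the first byte, and the 'H'
# template with no count unpacks a single hex digit (four bits).
my $first_byte = ord($bytes);            # 194
my $one_nibble = unpack('H',  $bytes);   # "c"
my $all_hex    = unpack('H*', $bytes);   # "c2a3"

# After decoding, the two bytes become one character, U+00A3.
my $char      = decode('UTF-8', $bytes);
my $codepoint = ord($char);              # 163
my $n_chars   = length($char);           # 1

print "$first_byte $one_nibble $all_hex $codepoint $n_chars\n";
```

So the 194 is the first byte of the pound sign's UTF-8 encoding rather than a separate character in the text.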


Thanks for the tip, Rob, and you're right that my regex was too greedy. I
now have this:

my $str = $q->param('pro');
my $length = length($str);
my ($p1,$p2) = ($str =~ /(.)(.)320/);
my $o1 = ord($p1);
my $o2 = ord($p2);
my ($hex1) = unpack( 'H', $p1);
my ($hex2) = unpack( 'H', $p2);
print STDERR "Project=",$q->param('pro')," \"$p1\" \"$p2\" $hex1 $hex2 $o1 $o2 $length\n";


Which outputs this:
Text=THIS IS GOING TO COST Â£320 "Â" "£" c a 194 163 27

Interestingly, I count 26 characters in the field prior to submitting
but length is reported as 27 once it is in the CGI.

So the character is there, but there is some misinterpretation of the
space prior to it as &#194;. If I copy the data from the field into a
text/hex editor it's shown as x20 (space).
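
One likely explanation for the 26 versus 27 discrepancy is that length() is counting bytes rather than characters: in UTF-8 the pound sign takes two bytes (C2 A3), so byte 194 would be part of the sign itself and not a misread space. The sketch below assumes the field value arrives as raw UTF-8 bytes and decodes it with the core Encode module; the variable names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumed raw value of the 'pro' field, as UTF-8 bytes.
my $raw = "THIS IS GOING TO COST \xC2\xA3320";
my $byte_len = length($raw);     # 27 bytes

# Decode the bytes into a character string before inspecting it.
my $text = decode('UTF-8', $raw);
my $char_len = length($text);    # 26 characters

# On the decoded string a single "." captures the whole pound sign.
my ($sign) = $text =~ /(.)320/;
my $code = ord($sign);           # 163

print "$byte_len $char_len $code\n";
```

With the decoded string, the single-character regex suggested earlier in the thread yields code 163 directly.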

UTF-8: The referring page has this in the head:
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" >
which I think should make it legitimate UTF-8. I have tried using
charset => 'utf-8' in the start_html prior to read....but wait, there
is charset, and $q->charset('utf-8') gives me the desired result.

So thanks, Mumia W.
Dp.


