
Author parsing special characters in perl
soyebb@gmail.com

2006-08-29, 9:57 pm

Hi,
I am trying to parse the line

...Read product information for the AMD Athlon™ 64 3200+, 2.0 GHz
Athlon 64 (AMDADA3200BIBOX) AMD Processor in a Box (PIB)

using HTML::TokeParser in Perl. The regular expression that I have
is

if($subject =~
m/ Read\s*product\s*information\s*for\s*the\s*([\s\w\&\:\;\"'\-\_\>\<\\\/\)\(,\.\?\!\~\%\*\+\=\@\$]+)/gi)


The expression matches everything after 'the', for example "AMD Athlon",
but when it encounters the TM symbol it drops everything after that,
whereas I need everything up to the end of the line.

Is there a special symbol or wildcard character that I can include in
this regular expression so that it keeps matching text up to the end of
the line?

Thank you in anticipation.

Thomas J.

2006-08-29, 9:57 pm

soyebb@gmail.com wrote:

>
> if($subject =~
> m/ Read\s*product\s*information\s*for\s*the\s*([\s\w\&\:\;\"'\-\_\>\<\\\/\)\(,\.\?\!\~\%\*\+\=\@\$]+)/gi)
>
>
> The expression matches everything after 'the', for example "AMD Athlon",
> but when it encounters the TM symbol it drops everything after that,
> whereas I need everything up to the end of the line.


... the end of your variable, I think?

>
> Is there a special symbol or wildcard character that I can include in
> this regular expression so that it keeps matching text up to the end of
> the line?
>


The answer depends on the charset of your input.
The "TM" glyph is not in your character class, so the match stops there.
If you want everything up to the end of the line, why not use
(.+) instead of
([\s\w\&\:\;\"'\-\_\>\<\\\/\)\(,\.\?\!\~\%\*\+\=\@\$]+) ?

Thomas
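
A small script may make the difference between the two patterns easier to see. This is only a sketch under some assumptions: the trademark sign is taken to be the decoded character U+2122, the leading space in the posted pattern is dropped, the character class is written with the redundant backslashes removed (same set of characters), and the second input line and the variable names are made up for illustration.

```perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';

# Sample input modelled on the line from the question. The trademark sign
# is assumed to be U+2122; the second line is invented to show where
# "end of line" falls.
my $subject = "...Read product information for the AMD Athlon\x{2122} 64 3200+, 2.0 GHz "
            . "Athlon 64 (AMDADA3200BIBOX) AMD Processor in a Box (PIB)\n"
            . "a second line that should not be captured\n";

# Fixed text in front of the part to capture.
my $prefix = qr/Read\s*product\s*information\s*for\s*the\s*/i;

# Same characters as the class in the question, without the redundant
# backslashes. U+2122 is neither \w nor \s nor listed, so a match stops
# in front of it.
my $class = qr/[\s\w&:;"'\-_><\\\/)(,.?!~%*+=\@\$]+/;

my ($by_class) = $subject =~ /$prefix($class)/;

# "." matches any character except a newline, so (.+) runs to the end of
# the current line whatever symbols it contains.
my ($to_eol) = $subject =~ /$prefix(.+)/;

print "character class: '$by_class'\n";
print "(.+):            '$to_eol'\n";
```

With this input the character class captures only "AMD Athlon", while (.+) captures the rest of the first line including the trademark sign and stops at the newline.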

soyebb@gmail.com

2006-08-29, 9:57 pm

Thanks so much Thomas.
It worked

Thomas J. wrote:
> soyebb@gmail.com wrote:
>
>
> ... the end of your variable, I think?
>
>
> The answer depends on the charset of your input.
> The "TM" glyph is not in your character class, so the match stops there.
> If you want everything up to the end of the line, why not use
> (.+) instead of
> ([\s\w\&\:\;\"'\-\_\>\<\\\/\)\(,\.\?\!\~\%\*\+\=\@\$]+) ?
>
> Thomas



Copyright 2008 codecomments.com