Home > Archive > PERL Beginners > August 2006 > parsing special characters in perl
parsing special characters in perl
| soyebb@gmail.com 2006-08-29, 9:57 pm |
| Hi,
I am trying to parse a line
....Read product information for the AMD Athlon™ 64 3200+, 2.0 GHz
Athlon 64 (AMDADA3200BIBOX) AMD Processor in a Box (PIB)
using HTML::TokeParser in Perl. The regular expression that I have
is
if($subject =~
m/ Read\s*product\s*information\s*for\s*the
\s*([\s\w\&\:\;\"'\-\_\>\<\\\/\)\(,\.\?\!\~\%\*\+\=\@\$]+)/gi)
The pattern captures everything after 'the', for example "AMD Athlon",
but when it encounters the TM symbol it drops everything after that,
whereas I need everything up to the end of the line.
Is there a special symbol or wildcard character that I can include in
this regular expression so that it keeps matching text up to the end of
the line?
Thank you in anticipation.
| |
| Thomas J. 2006-08-29, 9:57 pm |
| soyebb@gmail.com wrote:
>
> if($subject =~
> m/ Read\s*product\s*information\s*for\s*the
\s*([\s\w\&\:\;\"'\-\_\>\<\\\/\)\(,\.\?\!\~\%\*\+\=\@\$]+)/gi)
>
>
> The pattern captures everything after 'the', for example "AMD Athlon",
> but when it encounters the TM symbol it drops everything after that,
> whereas I need everything up to the end of the line.
... you mean the end of your variable, I think?
>
> Is there a special symbol / wild card character that i can include in
> this regular expression so as to keep parsing the text till the end of
> line.
>
The answer depends on the character set of your input.
The "TM" glyph is not in your character class, so the match stops there.
If you want everything up to the end of the line, why not use
(.+) instead of
([\s\w\&\:\;\"'\-\_\>\<\\\/\)\(,\.\?\!\~\%\*\+\=\@\$]+) ?
Thomas
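
For anyone who wants to see the difference side by side, here is a minimal sketch. It assumes the TM sign arrives as the single byte \x99 (as it would in Windows-1252 input); the variable names $by_class and $by_dot are made up for this illustration, and the pattern is written on one line without the leading space from the posted version.

```perl
use strict;
use warnings;

# Sample text from the question. \x99 stands in for the TM sign as a
# single Windows-1252 byte (an assumption about the input encoding).
my $subject = "Read product information for the AMD Athlon\x99 64 3200+, 2.0 GHz\n"
            . "Athlon 64 (AMDADA3200BIBOX) AMD Processor in a Box (PIB)";

# Original approach: an explicit character class. The capture ends at
# the first character that is not listed, here the \x99 byte.
my ($by_class) = $subject =~
    m/Read\s*product\s*information\s*for\s*the\s*([\s\w&:;"'\-_><\\\/)(,.?!~%*+=\@\$]+)/i;

# Suggested approach: (.+) takes every character up to the newline,
# because "." does not match "\n" unless the /s modifier is given.
my ($by_dot) = $subject =~
    m/Read\s*product\s*information\s*for\s*the\s*(.+)/i;

print "class: [$by_class]\n";
print "dot:   [$by_dot]\n";
```

With this sample the character-class version should capture only "AMD Athlon", while the (.+) version captures the rest of the first line, TM byte included. If the text has been decoded to Perl characters instead, the TM sign would be \x{2122}, and (.+) still matches it.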
| |
| soyebb@gmail.com 2006-08-29, 9:57 pm |
| Thanks so much Thomas.
It worked
Thomas J. wrote:
> soyebb@gmail.com wrote:
>
>
> ... end of your variable, i think?
>
>
> The answer depends on the character set of your input.
> The "TM" glyph is not in your character class, so the match stops there.
> If you want everything up to the end of the line, why not use
> (.+) instead of
> ([\s\w\&\:\;\"'\-\_\>\<\\\/\)\(,\.\?\!\~\%\*\+\=\@\$]+) ?
>
> Thomas