regular expression pb. with tags
steeve_dun@SoftHome.net

2006-09-26, 8:03 am

Hi,
I want to do some pattern replacement, i.e. to delete everything
that's between two tags.
For example, for

1<tag> 2</tag>3
x<tag> a<tag> b </tag> c</tag>z

I want to get

1 3
x z

But I have a problem with embedded tags.
I've tried:
$text =~ s/\<tag\>(.*?)\<\/tag\>//sg;
but it doesn't work for embedded tags. It gives:
13
x c</tag>z

Is there a way to deal with this?

Thank you

-steeve

David Squire

2006-09-26, 8:03 am

steeve_dun@SoftHome.net wrote:
> Is there a way to deal with this?


Yep. Don't try to use regular expressions to parse XML. Use a module
that understands XML. Go to CPAN and you will find many.


DS

anno4000@radom.zrz.tu-berlin.de

2006-09-26, 8:03 am

<steeve_dun@SoftHome.net> wrote in comp.lang.perl.misc:
> Is there a way to deal with this?


Not using regular expressions directly. Use one of the HTML-parsing
modules from CPAN.

Anno

Xicheng Jia

2006-09-26, 6:59 pm

steeve_dun@SoftHome.net wrote:
> Is there a way to deal with this?


Since you are using Perl, and XML is quite well formatted, you may try
something like:

my $ptn;
# (??{ ... }) re-evaluates $ptn at match time, so the pattern can contain itself
$ptn = qr(<tag>(?:(??{$ptn})|.)*?</tag>)s;
$line =~ s/$ptn//g;

I am not encouraging you to use regexes at work. But for small
programs, regexes can be much faster and easier if you know what you
are doing.
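
On Perl 5.10 or later, the same idea can be written without a separate variable, using the built-in recursion construct (?R). What follows is only a minimal sketch under that version assumption; the helper name strip_nested_tags is made up for illustration and is not from this thread.

```perl
use strict;
use warnings;

# Hypothetical helper name; assumes Perl 5.10+ for (?R).
sub strip_nested_tags {
    my ($text) = @_;
    # (?R) recurses into the whole pattern, so an inner <tag>...</tag>
    # pair is consumed as a unit before the outer </tag> is matched.
    $text =~ s{<tag>(?:(?R)|.)*?</tag>}{}gs;
    return $text;
}

print strip_nested_tags("1<tag> 2</tag>3"), "\n";                  # 13
print strip_nested_tags("x<tag> a<tag> b </tag> c</tag>z"), "\n";  # xz
```

Like the (??{...}) version, this assumes the tags are balanced. On unbalanced input the "." alternative can still swallow a stray <tag>.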

Regards,
Xicheng

Ted Zlatanov

2006-09-26, 6:59 pm

On 26 Sep 2006, steeve_dun@softhome.net wrote:

> Is there a way to deal with this?


For the first example, you're getting exactly what you wanted ("13").
Look at your input data.

For the second example, your requirements are unclear. You don't say
whether you want to replace the outermost tags (in which case a regex
would work) or you want to balance tags. For outermost tag
replacement, use

$text =~ s/\<tag\>(.*)\<\/tag\>//sg;

but note that this will also replace "<tag>a</tag> extra <tag>b</tag>"
with "" and not " extra " as you may expect.
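
To make the trade-off concrete, here is a small sketch that runs the non-greedy and greedy patterns over both kinds of input. The helper strip_with and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Apply one substitution to a copy of the input and return the result.
sub strip_with {
    my ($regex, $text) = @_;
    $text =~ s/$regex//g;
    return $text;
}

my $lazy   = qr{<tag>.*?</tag>}s;  # stops at the first </tag>
my $greedy = qr{<tag>.*</tag>}s;   # runs to the last </tag>

for my $input ('x<tag> a<tag> b </tag> c</tag>z',
               '<tag>a</tag> extra <tag>b</tag>') {
    printf "%-35s lazy: '%s'  greedy: '%s'\n",
        $input, strip_with($lazy, $input), strip_with($greedy, $input);
}
```

The non-greedy pattern is right for sibling tags and wrong for nested ones. The greedy pattern is the other way around. Neither one handles both.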

My guess is that you do want to balance tags, and you can use
Text::Balanced for that (especially if your text is not valid XML or
even SGML). If you are working with SGML, HTML, XML or a similar tagged
format, then you should search CPAN for the appropriate parser, as
others have suggested. Look at "perldoc -q html" as well.
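
As a rough illustration of what "balancing tags" means, here is a hand-rolled depth counter. It does not use Text::Balanced or any CPAN parser. The helper name strip_balanced and its lenient handling of unbalanced input are assumptions made for this sketch only.

```perl
use strict;
use warnings;

# Hypothetical helper: drop everything inside balanced <tag>...</tag> pairs
# by tracking nesting depth. Unbalanced input is handled leniently: a stray
# </tag> at depth 0 is kept as text, and an unclosed <tag> swallows the rest.
sub strip_balanced {
    my ($text) = @_;
    my ($out, $depth) = ('', 0);
    # The capturing group makes split return the tags as separate tokens.
    for my $token (split m{(</?tag>)}, $text) {
        if ($token eq '<tag>') {
            $depth++;                 # entering a (possibly nested) tag
        }
        elsif ($token eq '</tag>' && $depth > 0) {
            $depth--;                 # leaving one nesting level
        }
        elsif ($depth == 0) {
            $out .= $token;           # keep only text outside all tags
        }
    }
    return $out;
}

print strip_balanced("x<tag> a<tag> b </tag> c</tag>z"), "\n";   # xz
print strip_balanced("<tag>a</tag> extra <tag>b</tag>"), "\n";   # " extra "
```

Unlike either single regex, this handles nested and sibling tags in one pass. It only knows the literal tags <tag> and </tag>, so real markup with attributes, comments or CDATA still calls for a proper parser.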

Ted

steeve_dun@SoftHome.net

2006-09-27, 4:00 am

Thank you all
-steve
