

Home > Archive > PERL Beginners > April 2005 > Some part of the text should not be converted










 

Author Some part of the text should not be converted
N. Ganesh Babu

2005-04-26, 8:57 am

Dear All,

I am capitalizing titles. Text inside <u></u> tags should stay exactly
as it is in the input; my conversion program should not touch it.

My approach is to remove the tagged text from the line at the start and
put it back at the end. This works fine if there is only one tagged
span in the line. With several tagged spans this logic does not work.

my code is:

if($line=~m!<u>(.+)</u>!i)
{
$un=$1;
}
$line=~s!<u>(.+)</u>!<u></u>!ig;

code for conversion ....

$line=~s!<u></u>!$un!ig;

The above code works if the input is like this: "A Practical
Guide to <u>CD-Rom</u>"
The output is "A Practical Guide to CD-Rom"

I have tried making the match non-greedy by putting a question mark
after the +, but then DVD also gets replaced with CD-Rom.

The above code does not work if the input is like this: "A Practical
Guide to <u>CD-Rom</u> and <u>DVD</u>"
The output is "A Practical Guide to CD-Rom and CD-Rom"

Please help me solve this problem.

Regards,
Ganesh

Chris Cole

2005-04-26, 8:57 am

On Tue, 26 Apr 2005 12:52:32 +0530, N. Ganesh Babu wrote:

> Dear All,
>
> I am doing capitalization of the titles. In that the text within <u></u>
> tag should be as input. means my conversion program should not touch
> this text.
>
> What I have used is, I have removed this part from the text in the
> beginning and in the last once again I put the text back. This working
> fine if I have only one case in the line. If I have multiple cases this
> logic is not working.
>
> my code is:
>
> if($line=~m!<u>(.+)</u>!i)
> {
> $un=$1;
> }
> $line=~s!<u>(.+)</u>!<u></u>!ig;
>
> code for conversion ....
>
> $line=~s!<u></u>!$un!ig;
>
> the above code is working if the input is like this. "A Practical
> Guide to <u>CD-Rom</u>"
> the output "A Practical Guide to CD-Rom"
>
> I have tried with non-greedy by putting the question mark after + but
> the DVD is also getting replaced with CD-Rom.


This is not a greedy problem. Your logic is flawed.

> the above code is not working if the input is like this. "A Practical
> Guide to <u>CD-Rom</u> and <u>DVD</u>" the output "A Practical Guide to
> CD-Rom and CD-Rom"


This is because you're only capturing the first instance of <u>(.+)</u> on
a line and storing it in $un. So when you've got several on a line they
will all be replaced with the same $un.
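
To make the explanation above concrete, here is a minimal sketch that reproduces the reported behaviour. It uses the non-greedy variant of the original code; the sub name restore_with_single_scalar is made up for this illustration.

```perl
use strict;
use warnings;

# Reproduces the behaviour described above: one scalar holds only the
# first match, so every placeholder is filled with the same text.
# (restore_with_single_scalar is a hypothetical name for this demo.)
sub restore_with_single_scalar {
    my ($line) = @_;
    my $un;
    if ($line =~ m!<u>(.+?)</u>!i) {
        $un = $1;                              # only the first <u>...</u> is remembered
    }
    $line =~ s!<u>(.+?)</u>!<u></u>!ig;        # every tagged span becomes a placeholder
    $line =~ s!<u></u>!$un!ig if defined $un;  # every placeholder gets the same $un
    return $line;
}

print restore_with_single_scalar(
    'A Practical Guide to <u>CD-Rom</u> and <u>DVD</u>'), "\n";
# prints: A Practical Guide to CD-Rom and CD-Rom
```

The text "DVD" is never stored anywhere, so it cannot be restored.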

> Please help to solve this problem.
>
> Regards,
> Ganesh


Why not do the capturing and substituting at the same time? e.g:

$line =~ s|<u>(.+?)</u>|$1|gi;

Chris.
Offer Kaye

2005-04-26, 8:57 am

On 4/26/05, N. Ganesh Babu wrote:
>
> the above code is not working if the input is like this. "A Practical
> Guide to <u>CD-Rom</u> and <u>DVD</u>"
> the output "A Practical Guide to CD-Rom and CD-Rom"
>


One way is to collect the list of texts between the <u> and </u> tags. I
chose to do it together with the substitution, using the "e"
modifier:
my @un;
$line=~s!<u>(.+?)</u>!push @un,$1;"<u></u>"!ige;

After processing, put the texts back using a for loop over the list of
saved texts, or an s///e construct:
$line=~s!<u></u>!shift @un!ige;

Note the question mark in ".+?". In reference to your question, yes,
you must use it, or the match will be greedy.
Personally I think you're working too hard - you should be able to do
any processing on the line and not touch the <u> delimited text,
without having to resort to removing it. But of course TIMTOWTDI :-)
Here is a complete working example:
###################### begin code
use strict;
use warnings;
while(defined(my $line=<DATA>)) {
    print '-' x 80 , "\n";
    print "Original line: $line";
    my @un;
    $line=~s!<u>(.+?)</u>!push @un,$1;"<u></u>"!ige;
    #do something with line...
    print "line after data removal: $line";
    # put back the data
    $line=~s!<u></u>!shift @un!ige;
    print "line after data replace: $line";
}

__DATA__
A Practical Guide to <u>CD-Rom</u>
A Practical Guide to <u>CD-Rom</u> and <u>DVD</u>
###################### end code
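
The remark above about processing the line without removing the <u> text can be sketched as follows. This is only one possible approach: it splits the line on the tagged spans and converts just the pieces in between. The names convert_text and convert_outside_tags are made up here, and convert_text is an assumed stand-in for the real conversion code, which the thread does not show.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real conversion: capitalise the first
# letter of every word and lower-case the rest.
sub convert_text {
    my ($text) = @_;
    $text =~ s/(\w+)/\u\L$1/g;
    return $text;
}

# Convert only the text outside <u>...</u>; tagged spans pass through as-is.
sub convert_outside_tags {
    my ($line) = @_;
    my $out = '';
    # The capturing parentheses make split return the <u>...</u> pieces too.
    for my $piece (split m!(<u>.*?</u>)!i, $line) {
        $out .= $piece =~ m!^<u>!i ? $piece : convert_text($piece);
    }
    return $out;
}

print convert_outside_tags('a practical guide to <u>CD-Rom</u> and <u>DVD</u>'), "\n";
# prints: A Practical Guide To <u>CD-Rom</u> And <u>DVD</u>
```

Because nothing is removed, no placeholder and no saved list are needed, and the tags themselves stay in the output.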

BTW, I would be happy if any of the gurus on the list could shorten:
my @un;
$line=~s!<u>(.+?)</u>!push @un,$1;"<u></u>"!ige;
to a single line. Unlike "m//", "s///" never seems to return the
results of "()" in the RE, even in list context. Annoying :-(
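
On the point above about m// returning captures: with the /g modifier in list context it returns the captures of every match, so one possible variation collects the texts with a plain match and leaves the substitution to insert the placeholders only. This is a sketch, not a true single-line replacement; the variable $stripped is introduced here just for illustration.

```perl
use strict;
use warnings;

my $line = 'A Practical Guide to <u>CD-Rom</u> and <u>DVD</u>';

# In list context, m//g returns every capture from every match.
my @un = $line =~ m!<u>(.+?)</u>!ig;

# The substitution then only needs to leave the placeholders behind.
(my $stripped = $line) =~ s!<u>.+?</u>!<u></u>!ig;

print "saved: @un\n";        # prints: saved: CD-Rom DVD
print "line:  $stripped\n";  # prints: line:  A Practical Guide to <u></u> and <u></u>
```

It still takes two statements, but the push inside the replacement goes away.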

HTH,
--
Offer Kaye
Offer Kaye

2005-04-26, 3:57 pm

On 4/26/05, N. Ganesh Babu wrote:
> Dear Offer Kaye,
>
> I want to preserve the <u> tag also in the context. Can you help me how to
> do it. If you run 2nd time also the same action will happen. If we remove,
> in the 2nd execution again the conversion will take place on these words.
>


Hi Ganesh,
I'm not following you - what do you mean by "context"? What "2nd execution"?
Wild guess - you want the final line output from the code to include
the <u> tags? If so, simply use:
$line=~s!(<u>.+?</u>)!push @un,$1;"<u></u>"!ige;
So now the tags as well as the text are saved into @un and will appear
in the final output line.
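
As a rough check of the tag-preserving variant above, the sketch below runs the whole remove/convert/restore cycle twice on the same title. The subs convert_text and convert_keeping_tags are names made up for this example, and convert_text is only an assumed stand-in for the real conversion, which is not shown in the thread.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real conversion routine.
sub convert_text {
    my ($text) = @_;
    $text =~ s/(\w+)/\u\L$1/g;
    return $text;
}

sub convert_keeping_tags {
    my ($line) = @_;
    my @un;
    # Save the tags together with their text; leave a placeholder behind.
    $line =~ s!(<u>.+?</u>)!push @un, $1; "<u></u>"!ige;
    $line = convert_text($line);   # this stand-in turns <u></u> into <U></U>
    # /i lets the restore step find the placeholder even after case changes.
    $line =~ s!<u></u>!shift @un!ige;
    return $line;
}

my $once  = convert_keeping_tags('a practical guide to <u>CD-Rom</u> and <u>DVD</u>');
my $twice = convert_keeping_tags($once);
print "$once\n";   # prints: A Practical Guide To <u>CD-Rom</u> And <u>DVD</u>
print "same on second run\n" if $once eq $twice;
```

Since the tags survive the first pass, a second pass protects the same words again, which may be what the "2nd execution" question was about.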

Please read "perldoc perlrequick", it will help you learn regular
expressions in Perl. You can read it online at:
http://perldoc.perl.org/perlrequick.html

HTH,
--
Offer Kaye
