Code Comments
Dear All,
I am capitalizing titles. Text inside a <u></u> tag should be left exactly
as it is in the input; that is, my conversion program should not touch
this text.
What I do is remove this part from the text at the beginning and put it
back at the end. This works fine if there is only one case in the line.
If there are multiple cases, this logic does not work.
My code is:
if($line=~m!<u>(.+)</u>!i)
{
$un=$1;
}
$line=~s!<u>(.+)</u>!<u></u>!ig;
code for conversion ....
$line=~s!<u></u>!$un!ig;
The above code works if the input is like this: "A Practical
Guide to <u>CD-Rom</u>"
The output is "A Practical Guide to CD-Rom"
The above code does not work if the input is like this: "A Practical
Guide to <u>CD-Rom</u> and <u>DVD</u>"
The output is "A Practical Guide to CD-Rom and CD-Rom"
I have tried non-greedy matching by putting a question mark after the +,
but the DVD is still replaced with CD-Rom.
Please help me solve this problem.
Regards,
Ganesh
On Tue, 26 Apr 2005 12:52:32 +0530, N. Ganesh Babu wrote:
> Dear All,
>
> I am doing capitalization of the titles. In that the text within <u></u>
> tag should be as input. means my conversion program should not touch
> this text.
>
> What I have used is, I have removed this part from the text in the
> beginning and in the last once again I put the text back. This working
> fine if I have only one case in the line. If I have multiple cases this
> logic is not working.
>
> my code is:
>
> if($line=~m!<u>(.+)</u>!i)
> {
> $un=$1;
> }
> $line=~s!<u>(.+)</u>!<u></u>!ig;
>
> code for conversion ....
>
> $line=~s!<u></u>!$un!ig;
>
> the above code is working if the input is like this. "A Practical
> Guide to <u>CD-Rom</u>"
> the output "A Practical Guide to CD-Rom"
>
> I have tried with non-greedy by putting the question mark after + but
> the DVD is also getting replaced with CD-Rom.
This is not a greedy problem. Your logic is flawed.
> the above code is not working if the input is like this. "A Practical
> Guide to <u>CD-Rom</u> and <u>DVD</u>" the output "A Practical Guide to
> CD-Rom and CD-Rom"
This is because you're only capturing the first instance of <u>(.*)</u> on
a line and storing it in $un. So when you've got several on a line they
will all be replaced with the same $un.
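To make that concrete, here is a minimal sketch that reproduces the reported behaviour. It assumes the non-greedy variant of the original pattern; the variable names $stripped and $restored are only illustrative and are not from the original code.

```perl
use strict;
use warnings;

my $line = 'A Practical Guide to <u>CD-Rom</u> and <u>DVD</u>';

# A plain match (no /g) stops at the first <u>...</u> pair,
# so $un only ever holds one value.
my $un;
if ($line =~ m!<u>(.+?)</u>!i) {
    $un = $1;
}

# Every tagged span is emptied ...
(my $stripped = $line) =~ s!<u>(.+?)</u>!<u></u>!ig;

# ... and every placeholder is then filled with the same saved string.
(my $restored = $stripped) =~ s!<u></u>!$un!ig;

print "$restored\n";    # A Practical Guide to CD-Rom and CD-Rom
```

The second tag's text is never stored anywhere, so no change to greediness alone can bring it back.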
> Please help to solve this problem.
>
> Regards,
> Ganesh
Why not do the capturing and substituting at the same time? e.g:
$line =~ s|<u>(.+?)</u>|$1|gi;
Chris.
On 4/26/05, N. Ganesh Babu wrote:
>
> the above code is not working if the input is like this. "A Practical
> Guide to <u>CD-Rom</u> and <u>DVD</u>"
> the output "A Practical Guide to CD-Rom and CD-Rom"
>
One way is to get the list of texts between the <u> and </u> tag. I
choose to do it together with the substitution, using the "e"
modifier:
my @un;
$line=~s!<u>(.+?)</u>!push @un,$1;"<u></u>"!ige;
After processing, put it back in using a for loop over the list of
saved texts, or an s///e construct:
$line=~s!<u></u>!shift @un!ige;
Note the question mark in ".+?". In reference to your question, yes,
you must use it, or the match will be greedy.
Personally I think you're working too hard - you should be able to do
any processing on the line and not touch the <u> delimited text,
without having to resort to removing it. But of course TIMTOWTDI :-)
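One possible way to do that, as a rough sketch: split the line on the tagged spans with a capturing group, so the tagged pieces come back in the list, and convert only the pieces in between. The helper names convert_title and convert_outside_u are made up for illustration, and convert_title is just a stand-in for whatever the real conversion does. This version also keeps the <u> tags in the output.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real title conversion:
# capitalize the first letter of every word.
sub convert_title {
    my ($text) = @_;
    $text =~ s/(\w+)/ucfirst(lc($1))/ge;
    return $text;
}

sub convert_outside_u {
    my ($line) = @_;

    # Because the pattern has a capturing group, split returns the
    # delimiters as well: even indexes are ordinary text, odd indexes
    # are the <u>...</u> spans.
    my @parts = split m!(<u>.+?</u>)!i, $line;

    for my $i (0 .. $#parts) {
        $parts[$i] = convert_title($parts[$i]) if $i % 2 == 0;
    }
    return join '', @parts;
}

print convert_outside_u('a practical guide to <u>CD-Rom</u> and <u>DVD</u>'), "\n";
# A Practical Guide To <u>CD-Rom</u> And <u>DVD</u>
```

Nothing is removed and put back here, so there is no list of saved texts to keep in step with the placeholders.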
Here is a complete working example:
###################### begin code
use strict;
use warnings;
while(defined(my $line=<DATA>)) {
    print '-' x 80 , "\n";
    print "Original line: $line";
    my @un;
    $line=~s!<u>(.+?)</u>!push @un,$1;"<u></u>"!ige;
    # do something with line...
    print "line after data removal: $line";
    # put back the data
    $line=~s!<u></u>!shift @un!ige;
    print "line after data replace: $line";
}
__DATA__
A Practical Guide to <u>CD-Rom</u>
A Practical Guide to <u>CD-Rom</u> and <u>DVD</u>
###################### end code
BTW, I would be happy if any of the gurus on the list could shorten:
my @un;
$line=~s!<u>(.+?)</u>!push @un,$1;"<u></u>"!ige;
To a single line. Unlike "m//", "s///" never seems to return the
results of "()" in the RE, even in list context. Annoying :-(
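This does not get it down to a single statement, but as a sketch of an alternative: m//g in list context does return every capture, so the list can be collected with a match first and the line blanked with a plain s///g afterwards, with no side effect inside /e. The names $blanked and $restored are only illustrative.

```perl
use strict;
use warnings;

my $line = 'A Practical Guide to <u>CD-Rom</u> and <u>DVD</u>';

# In list context, m//g returns the text of every capture group match.
my @un = $line =~ m!<u>(.+?)</u>!ig;
print "saved: @un\n";               # saved: CD-Rom DVD

# Blank out the tagged spans with an ordinary substitution.
(my $blanked = $line) =~ s!<u>.+?</u>!<u></u>!ig;
print "blanked: $blanked\n";

# Put the saved texts back in order, as in the example above.
my @queue = @un;
(my $restored = $blanked) =~ s!<u></u>!shift @queue!ige;
print "restored: $restored\n";      # restored: A Practical Guide to CD-Rom and DVD
```

The cost is that the pattern is scanned twice, once for the match and once for the substitution.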
HTH,
--
Offer Kaye
On 4/26/05, N. Ganesh Babu wrote:
> Dear Offer Kaye,
>
> I want to preserve the <u> tag also in the context. Can you help me how to
> do it. If you run 2nd time also the same action will happen. If we remove,
> in the 2nd execution again the conversion will take place on these words.
>
Hi Ganesh,
I'm not following you - what do you mean by "context"? What "2nd execution"?
Wild guess - you want the final line output from the code to include the
<u> tags? If so, simply use:
$line=~s!(<u>.+?</u>)!push @un,$1;"<u></u>"!ige;
So now the tags as well as the text are saved into @un and will appear in
the final output line.
Please read "perldoc perlrequick", it will help you learn regular
expressions in Perl. You can read it online at:
http://perldoc.perl.org/perlrequick.html
HTH,
--
Offer Kaye