Some part of the text should not be converted
| N. Ganesh Babu 2005-04-26, 8:57 am |
| Dear All,
I am capitalizing titles. Text inside <u></u> tags should be left exactly
as it appears in the input; that is, my conversion program should not
touch it.
My approach is to remove this text from the line at the start and put it
back at the end. This works fine if there is only one such case in the
line, but it fails when there are several.
my code is:
if($line=~m!<u>(.+)</u>!i)
{
$un=$1;
}
$line=~s!<u>(.+)</u>!<u></u>!ig;
code for conversion ....
$line=~s!<u></u>!$un!ig;
The above code works if the input is like this: "A Practical
Guide to <u>CD-Rom</u>"
The output is "A Practical Guide to CD-Rom".
The code does not work if the input is like this: "A Practical
Guide to <u>CD-Rom</u> and <u>DVD</u>"
The output is "A Practical Guide to CD-Rom and CD-Rom".
I have tried making the match non-greedy by putting a question mark
after the +, but DVD still gets replaced with CD-Rom.
Please help me solve this problem.
Regards,
Ganesh
| |
| Chris Cole 2005-04-26, 8:57 am |
| On Tue, 26 Apr 2005 12:52:32 +0530, N. Ganesh Babu wrote:
> Dear All,
>
> I am doing capitalization of the titles. In that the text within <u></u>
> tag should be as input. means my conversion program should not touch
> this text.
>
> What I have used is, I have removed this part from the text in the
> beginning and in the last once again I put the text back. This working
> fine if I have only one case in the line. If I have multiple cases this
> logic is not working.
>
> my code is:
>
> if($line=~m!<u>(.+)</u>!i)
> {
> $un=$1;
> }
> $line=~s!<u>(.+)</u>!<u></u>!ig;
>
> code for conversion ....
>
> $line=~s!<u></u>!$un!ig;
>
> the above code is working if the input is like this. "A Practical
> Guide to <u>CD-Rom</u>"
> the output "A Practical Guide to CD-Rom"
>
> I have tried with non-greedy by putting the question mark after + but
> the DVD is also getting replaced with CD-Rom.
This is not a greedy problem. Your logic is flawed.
> the above code is not working if the input is like this. "A Practical
> Guide to <u>CD-Rom</u> and <u>DVD</u>" the output "A Practical Guide to
> CD-Rom and CD-Rom"
This is because you're only capturing the first instance of <u>(.*)</u> on
a line and storing it in $un. So when you've got several on a line they
will all be replaced with the same $un.
> Please help to solve this problem.
>
> Regards,
> Ganesh
Why not do the capturing and substituting at the same time? E.g.:
$line =~ s|<u>(.+?)</u>|$1|gi;
Chris.
| |
| Offer Kaye 2005-04-26, 8:57 am |
| On 4/26/05, N. Ganesh Babu wrote:
>
> the above code is not working if the input is like this. "A Practical
> Guide to <u>CD-Rom</u> and <u>DVD</u>"
> the output "A Practical Guide to CD-Rom and CD-Rom"
>
One way is to get the list of texts between the <u> and </u> tag. I
choose to do it together with the substitution, using the "e"
modifier:
my @un;
$line=~s!<u>(.+?)</u>!push @un,$1;"<u></u>"!ige;
After processing, put it back in using a for loop over the list of
saved texts, or an s///e construct:
$line=~s!<u></u>!shift @un!ige;
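For the for-loop variant mentioned above, a minimal sketch might look like this. The helper names protect() and restore() are made up for illustration and are not part of the original code; the sample title is the one from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: swap each <u>...</u> span for an empty <u></u>
# placeholder and return the new line plus the saved inner texts, in order.
sub protect {
    my ($line) = @_;
    my @un;
    $line =~ s!<u>(.+?)</u>!push @un, $1; "<u></u>"!ige;
    return ($line, @un);
}

# Hypothetical helper: put the saved texts back with a plain for loop.
# There is no /g here, so each pass fills only the first placeholder
# that is still left in the line.
sub restore {
    my ($line, @un) = @_;
    for my $text (@un) {
        $line =~ s!<u></u>!$text!i;
    }
    return $line;
}

my ($stripped, @saved) =
    protect("A Practical Guide to <u>CD-Rom</u> and <u>DVD</u>");
print "$stripped\n";
print restore($stripped, @saved), "\n";
```

The loop relies on the placeholders and the saved texts being in the same left-to-right order, which is the same assumption the "shift @un" version makes.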
Note the question mark in ".+?". In reference to your question, yes,
you must use it, or the match will be greedy.
Personally I think you're working too hard - you should be able to do
any processing on the line and not touch the <u> delimited text,
without having to resort to removing it. But of course TIMTOWTDI :-)
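One possible way to process the line without removing anything, as a rough sketch only: split the line on the <u>...</u> spans and convert just the pieces in between. The convert() sub below is a hypothetical stand-in for the real capitalization code (it upper-cases the first character of each word), and convert_outside_u() is an invented name. Note that this version leaves the <u> tags in the output.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real conversion: upper-case the first
# character of every whitespace-separated word.
sub convert {
    my ($text) = @_;
    $text =~ s/(\S+)/\u$1/g;
    return $text;
}

sub convert_outside_u {
    my ($line) = @_;
    # Because the pattern has a capturing group, split also returns the
    # <u>...</u> spans themselves, in between the surrounding pieces.
    my @parts = split m!(<u>.+?</u>)!i, $line;
    for my $part (@parts) {
        next if $part =~ m!^<u>.+</u>$!i;    # protected span: leave as is
        $part = convert($part);              # $part aliases the array element
    }
    return join '', @parts;
}

print convert_outside_u("a practical guide to <u>CD-Rom</u> and <u>DVD</u>"), "\n";
```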
Here is a complete working example:
###################### begin code
use strict;
use warnings;
while(defined(my $line=<DATA>)) {
    print '-' x 80 , "\n";
    print "Original line: $line";
    my @un;
    $line=~s!<u>(.+?)</u>!push @un,$1;"<u></u>"!ige;
    #do something with line...
    print "line after data removal: $line";
    # put back the data
    $line=~s!<u></u>!shift @un!ige;
    print "line after data replace: $line";
}
__DATA__
A Practical Guide to <u>CD-Rom</u>
A Practical Guide to <u>CD-Rom</u> and <u>DVD</u>
###################### end code
BTW, I would be happy if any of the gurus on the list could shorten:
my @un;
$line=~s!<u>(.+?)</u>!push @un,$1;"<u></u>"!ige;
To a single line. Unlike "m//", "s///" never seems to return the
results of "()" in the RE, even in list context. Annoying :-(
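To illustrate the difference in return values, here is a small sketch. It does not get the job down to one statement; it only shows that m//g in list context hands back the captures while s///g hands back a count, so one alternative is to collect with m//g first and then substitute.

```perl
use strict;
use warnings;

my $line = "A Practical Guide to <u>CD-Rom</u> and <u>DVD</u>";

# m//g in list context returns every captured group, in order.
my @un = $line =~ m!<u>(.+?)</u>!ig;

# s///g returns the number of substitutions it made, not the captures.
my $count = ($line =~ s!<u>.+?</u>!<u></u>!ig);

print "saved: @un\n";
print "count: $count\n";
print "line:  $line\n";
```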
HTH,
-- 
Offer Kaye
| |
| Offer Kaye 2005-04-26, 3:57 pm |
| On 4/26/05, N. Ganesh Babu wrote:
> Dear Offer Kaye,
>
> I want to preserve the <u> tag also in the context. Can you help me how to
> do it. If you run 2nd time also the same action will happen. If we remove,
> in the 2nd execution again the conversion will take place on these words.
>
Hi Ganesh,
I'm not following you - what do you mean "context"? What "2nd execution"?
Wild guess- you want the final line output from the code to include
the <u> tags? If so, simply use:
$line=~s!(<u>.+?</u>)!push @un,$1;"<u></u>"!ige;
So now the tags as well as the text are saved into @un and will appear
in the final output line.
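A short sketch of the whole round trip with the tags kept, assuming the guess above is right. The sub name convert_line() and the word-capitalizing step are placeholders for the real conversion, not anything from the original program. Because the tags survive in the output, running the sub on its own output protects the same words again.

```perl
use strict;
use warnings;

sub convert_line {
    my ($line) = @_;
    my @un;
    # Save each whole <u>...</u> span, tags included, leaving a placeholder.
    $line =~ s!(<u>.+?</u>)!push @un, $1; "<u></u>"!ige;
    # Hypothetical stand-in for the real conversion: capitalize each word.
    $line =~ s/(\S+)/\u$1/g;
    # Put the saved spans back in their original order.
    $line =~ s!<u></u>!shift @un!ige;
    return $line;
}

my $once  = convert_line("a practical guide to <u>CD-Rom</u> and <u>DVD</u>");
my $twice = convert_line($once);
print "first run:  $once\n";
print "second run: $twice\n";
```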
Please read "perldoc perlrequick", it will help you learn regular
expressions in Perl. You can read it online at:
http://perldoc.perl.org/perlrequick.html
HTH,
-- 
Offer Kaye