PERL Beginners archive, September 2006: selecting a part of a string

Adriano Allora

2006-09-23, 6:57 pm

hi to all,

another silly question about a pattern matching which should work but
it doesn't.

I have a list of strings similar to this one:

parola|n.c.,0,fem,sg,0|parola

and I need to select all the chars before the pipe and put them in a
variable.

That substitution doesn't work:

#!/usr/bin/perl -w
use strict;
my(%gen, %act, %record, $tex, $char, @parola, $zut);
while(<> )
{
$tex =~ s/^([^|]+).*/$1/o;
print STDOUT "$i errore sulla linea: $_\n" if !$tex;
}

The error message is (for example):

Use of uninitialized value in substitution (s///) at ./contalettere.pl
line 6, <> line 10.

I suppose the error is in the expression: [^|].

Can someone help me?

Thanks a lot,

alladr

http://www.e-allora.net

nobull67@gmail.com

2006-09-23, 6:57 pm


Adriano Allora wrote:

> another silly question about a pattern matching which should work but
> it doesn't.


What makes you think it should work?

> I have a list of strings similar to this one:
>
> parola|n.c.,0,fem,sg,0|parola


Do you? Are you sure? Where do you have it?

> and I need to select all the chars before the pipe and put them in a
> variable.


Well, you can use m// or split.

my ($before) = /(.*?)\|/;

my ($before) = split /\|/;
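Both suggestions can be tried on the sample line from the original post. A minimal sketch, using an explicit `$line` instead of `$_` so it runs standalone; it also shows why the parentheses in `my ($before)` matter, since without them the match is in scalar context and returns a success flag rather than the captured text.

```perl
use strict;
use warnings;

my $line = 'parola|n.c.,0,fem,sg,0|parola';   # sample line from the original post

# List context: the capture group's contents are assigned to $by_match.
my ($by_match) = $line =~ /(.*?)\|/;

# List context: split returns the fields; only the first one is kept.
my ($by_split) = split /\|/, $line;

# Scalar context (no parentheses): the match returns true/false, not the capture.
my $flag = $line =~ /(.*?)\|/;

print "match: $by_match\n";   # match: parola
print "split: $by_split\n";   # split: parola
print "flag:  $flag\n";       # flag:  1
```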

> That substitution doesn't work:


Which substitution? Why would you want substitution? You said you
wanted to select (by which I assume you mean capture) part of the
string into a variable. You never said to change the original string.
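The difference between capturing and substituting, as a small sketch on the sample line: the match leaves the string alone, while s/// edits its target, so a copy is taken first (the same `( my $x = ... ) =~ s///` idiom John W. Krahn uses further down). The variable names are illustrative.

```perl
use strict;
use warnings;

my $line = 'parola|n.c.,0,fem,sg,0|parola';

# Capturing with m// leaves the original string untouched.
my ($word) = $line =~ /^([^|]+)/;

# s/// edits its target in place; copying first keeps $line intact.
( my $trimmed = $line ) =~ s/\|.*//s;

print "$word\n";      # parola
print "$trimmed\n";   # parola
print "$line\n";      # parola|n.c.,0,fem,sg,0|parola
```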

> #!/usr/bin/perl -w


In recent versions of perl the -w switch has been superseded by "use warnings".

> use strict;
> my(%gen, %act, %record, $tex, $char, @parola, $zut);


Nasty case of premature declaration you have there. Are you taking
anything for it?

> while(<> )
> {
> $tex =~ s/^([^|]+).*/$1/o;


Do you know what the /o does there? (Trick question, it does nothing in
that code). Don't use /o unless you know what it does. Once you know
what it does you probably won't want to use it.
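For contrast, a sketch of a case where /o does change the result: a pattern that interpolates a variable whose value changes between iterations. The separator list and the `$q` variable are made up for illustration; the original pattern contains no variables, which is why /o is a no-op there.

```perl
use strict;
use warnings;

my $line = 'parola|n.c.,0,fem,sg,0|parola';
my ( @with_o, @without_o );

for my $sep ( '|', ',' ) {
    my $q = quotemeta $sep;    # escape '|' so it is taken literally

    # /o: the interpolated pattern is compiled on the first pass only,
    # so the later change to $q is silently ignored.
    push @with_o, ( $line =~ /^([^$q]+)/o )[0];

    # Without /o the pattern is recompiled whenever $q changes.
    push @without_o, ( $line =~ /^([^$q]+)/ )[0];
}

print "with /o:    @with_o\n";       # with /o:    parola parola
print "without /o: @without_o\n";    # without /o: parola parola|n.c.
```

The /o version quietly keeps using the first separator, which is the kind of surprise that makes the modifier rarely worth using.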

You have not put anything into $tex so it's undefined and when you
apply s/// to the string in $tex you'll get a warning.

You've been bitten by your premature declaration. If you got into
the habit of declaring variables where you first put something in them,
you'd most likely have noticed that the line where you first put a
value in $tex is _completely_missing_ from your program.

> print STDOUT "$i errore sulla linea: $_\n" if !$tex;


You never declared $i so that line would not compile. This isn't your
real code then, is it? When asking people to explain the behaviour of
your code it is good manners to show them the code.

> }
>
> The error message is (for example):
>
> Use of uninitialized value in substitution (s///) at ./contalettere.pl
> line 6, <> line 10.


That is not an error, it is a warning. Note that it's badly worded: it
should say "undefined value".
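A small sketch showing that this is only a warning: a `$SIG{__WARN__}` handler (the `@caught` array is an illustrative name) collects the message, and execution continues past the substitution on the undefined `$tex`.

```perl
use strict;
use warnings;

my @caught;
# Collect warnings instead of letting them go to STDERR.
local $SIG{__WARN__} = sub { push @caught, $_[0] };

my $tex;                      # declared but never assigned, as in the original post
$tex =~ s/^([^|]+).*/$1/;     # s/// on an undefined target: a warning, not an error
my $still_running = 1;        # execution carries on past the warning

print "warnings caught: ", scalar(@caught), "\n";
print @caught;
```

If you would rather have perl stop at that point, `use warnings FATAL => 'uninitialized';` turns that category of warning into a fatal error.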

> I suppose the error is in the expression: [^|].


Why would you suppose that?

John W. Krahn

2006-09-23, 6:57 pm

Adriano Allora wrote:
> hi to all,


Hello,



The problem is that you are using the variable $tex but it doesn't contain
anything. You could do either:

while ( <> ) {
    ( my $tex = $_ ) =~ s/^([^|]+).*/$1/s;
    print "$i errore sulla linea: $_\n" if !$tex;
}

Or:

while ( <> ) {
    /^([^|]+)/ and my $tex = $1;
    print "$i errore sulla linea: $_\n" if !$tex;
}

Or:

while ( <> ) {
    my ( $tex ) = /^([^|]+)/;
    print "$i errore sulla linea: $_\n" if !$tex;
}


But then you also have the problem that $i is not defined.
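Assuming `$i` was meant to be the line number, Perl already keeps that in `$.` (Dani's reply uses it too). A sketch of the third loop along those lines; the in-memory filehandle and its two sample lines are hypothetical stand-ins for the files read through `<>`.

```perl
use strict;
use warnings;

# Hypothetical input; the second line has nothing before its first pipe.
my $input = "parola|n.c.,0,fem,sg,0|parola\n|n.c.,0,fem,sg,0|parola\n";
open my $fh, '<', \$input or die "Cannot open in-memory input: $!";

my @bad_lines;
while ( my $line = <$fh> ) {
    chomp $line;
    my ($tex) = $line =~ /^([^|]+)/;    # declared where it first gets a value
    if ( !defined $tex ) {
        push @bad_lines, $.;            # $. holds the current input line number
        print "errore sulla linea $. ($line)\n";
    }
}
close $fh;
```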



John
--
use Perl;
program
fulfillment
D. Bolliger

2006-09-23, 6:57 pm

Adriano Allora on Sunday, 24 September 2006, 01:12:
> hi to all,
>
> another silly question about a pattern matching which should work but
> it doesn't.
>
> I have a list of strings similar to this one:
>
> parola|n.c.,0,fem,sg,0|parola
>
> and I need to select all the chars before the pipe and put them in a
> variable.
>
> That substitution doesn't work:
>
> #!/usr/bin/perl -w
> use strict;
> my(%gen, %act, %record, $tex, $char, @parola, $zut);
> while(<> )
> {
> $tex =~ s/^([^|]+).*/$1/o;


$tex is never set before it is used.

> print STDOUT "$i errore sulla linea: $_\n" if !$tex;


$i is not defined.

> }
>
> The error message is (for example):
>
> Use of uninitialized value in substitution (s///) at ./contalettere.pl
> line 6, <> line 10.
>
> I suppose the error is in the expression: [^|].


No, it's referring to the fact that $tex is never defined, so there is
nothing for the substitution to work on (and $1 is never set either) :-)

Here's a first modified version:

#!/usr/bin/perl

use strict;
use warnings;

while (<DATA>) {
    chomp;
    s/^([^|]+).*/$1/;
    print $_
        ? "ok: $_\n"
        : "errore sulla linea $. ($_)\n";
}

__DATA__
heyq24614q35gh|-------------
efbäpkmmeth|qrhqeerherth


But you don't need a substitution, a simple match suffices:

#!/usr/bin/perl

use strict;
use warnings;

while (<DATA>) {
    chomp;
    my $tex = /(.*?)\|/
        ? do { warn "ok: $1\n"; $1 }
        : do { warn "errore sulla linea $. ($_)\n"; undef };
}
__DATA__
heyq24614q35gh|-------------
efbäpkmmeth|qrhqeerherth



Hope this helps!

Dani
Rob Dixon

2006-09-23, 6:57 pm

Adriano Allora wrote:
>
> hi to all,


Hi Adriano.

Your regex is correct, but it doesn't do what you said you wanted! You're
substituting the entire string in $tex for just those characters up to the first
pipe, but there's nothing in $tex - the data has been read into $_.

If you want to do what you've written, then s/|.*// is a lot easier: it just
removes everything starting at the first pipe.

If you want to do what you said, and put everything up to the pipe into a
variable (scalar $tex?) then

/([^|]+)/;
$tex = $1;

will do it for you. (It captures 'parola' in the example string. Is that right?)

And, by the way, you haven't declared the $i in the print() call.

And, by the way again, there's no point in putting the /o modifier on the
substitute as the regex contains no variables and so will be compiled only once
anyway.

HTH,

Rob


Mumia W.

2006-09-23, 9:57 pm

On 09/23/2006 07:07 PM, Rob Dixon wrote:
> [...]
> If you want to do what you said, and put everything up to the pipe into a
> variable (scalar $tex?) then
>
> /([^|]+)/;
> $tex = $1;
> [...]


No, you should only use the match variables after you've
determined that the match was successful:

/^([^|]+)/ and $tex = $1;   # 'and', not '&&': '&&' binds tighter than '='
or this:
$tex = $1 if /^([^|]+)/;
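The stale-capture problem is easy to reproduce. A minimal sketch, assuming a hypothetical second line `'|||'` on which the pattern cannot match: the failed match leaves `$1` holding the value from the earlier success, while the guarded forms leave the variable undefined.

```perl
use strict;
use warnings;

my $good = 'parola|n.c.,0,fem,sg,0|parola';
my $bad  = '|||';                # hypothetical line with no text between pipes

$good =~ /([^|]+)/;              # succeeds, sets $1 to 'parola'
$bad  =~ /([^|]+)/;              # fails, so $1 is NOT reset
my $stale = $1;                  # still 'parola', from the earlier match

# Guarded forms: the capture is only read if the match succeeded.
my $guarded;
$guarded = $1 if $bad =~ /([^|]+)/;     # stays undef
my ($listed) = $bad =~ /([^|]+)/;       # a failed match returns an empty list

print "stale:   $stale\n";                                  # stale:   parola
print "guarded: ", defined $guarded ? $guarded : 'undef', "\n";
print "listed:  ", defined $listed  ? $listed  : 'undef', "\n";
```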



John W. Krahn

2006-09-23, 9:57 pm

Rob Dixon wrote:
> Adriano Allora wrote:
>
> Your regex is correct, but it doesn't do what you said you wanted! You're
> substituting the entire string in $tex for just those characters up to
> the first
> pipe, but there's nothing in $tex - the data has been read into $_.
>
> If you want to do what you've written, then s/|.*// is a lot easier: it
> just removes everything starting at the first pipe.


The | is for alternation so if you want to match a literal | character you
have to escape it:

s/\|.*//s
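A quick sketch of the difference on the sample line. Unescaped, `s/|.*//` means "the empty string or `.*`", and the empty alternative already matches at position 0, so nothing is removed. The escaped form strips everything from the first pipe on, and /s lets `.` also match a newline.

```perl
use strict;
use warnings;

my $unescaped = 'parola|n.c.,0,fem,sg,0|parola';
my $escaped   = $unescaped;

# Unescaped: "empty string OR .*". The empty alternative matches at
# position 0, so an empty string is replaced by an empty string.
$unescaped =~ s/|.*//;

# Escaped: \| is a literal pipe, so everything from the first pipe on goes.
$escaped =~ s/\|.*//s;

print "$unescaped\n";   # parola|n.c.,0,fem,sg,0|parola (unchanged)
print "$escaped\n";     # parola
```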


> If you want to do what you said, and put everything up to the pipe into a
> variable (scalar $tex?) then
>
> /([^|]+)/;
> $tex = $1;


You should only use the numeric variables if the match was successful
otherwise their contents may not be what you expected.

if ( /([^|]+)/ ) {
$tex = $1;
}



John
--
use Perl;
program
fulfillment
nobull67@gmail.com

2006-09-24, 3:58 am


John W. Krahn wrote:
> Rob Dixon wrote:


>
> You should only use the numeric variables if the match was successful
> otherwise their contents may not be what you expected.
>
> if ( /([^|]+)/ ) {
> $tex = $1;
> }


If you look at the OP's code, this still makes the same mistake! In Rob's
code a failed match would leave $tex containing whatever happened to be in
$1 previously. In John's it would contain whatever happened to be in
$tex previously.

This is because the OP is suffering from that oddly common affliction:
premature declaration. Variables should always be declared in the
smallest applicable scope unless there is a reason to do otherwise.
(Note: this is not specific to Perl, it applies in all languages).
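A sketch of what the smaller scope buys in a loop, using three hypothetical lines where the middle one has nothing before its pipes: the variable declared outside the loop carries the previous value through the failed match, while the one declared inside starts out undefined on every pass.

```perl
use strict;
use warnings;

# Hypothetical lines: the middle one has no text before a pipe.
my @lines = ( 'parola|n.c.', '|||', 'casa|n.c.' );

# Declared once, outside the loop: a failed match leaves the previous value.
my $tex;
my @outer;
for my $line (@lines) {
    if ( $line =~ /^([^|]+)/ ) {
        $tex = $1;
    }
    push @outer, defined $tex ? $tex : 'undef';
}

# Declared inside the loop, in the smallest scope: a fresh variable each pass.
my @inner;
for my $line (@lines) {
    my ($tex) = $line =~ /^([^|]+)/;
    push @inner, defined $tex ? $tex : 'undef';
}

print "outer: @outer\n";   # outer: parola parola casa
print "inner: @inner\n";   # inner: parola undef casa
```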

BTW: The pattern does not do what the OP asked for. It does something
else which, truth be told, is probably close enough. But it's a good
habit to always think about the edge cases.

What if $_ does not contain a '|'? Rob's and John's pattern would return the
entire string, whereas by a strict interpretation of the OP's requirement
you should get undef, as there is no such thing as everything before the
'|' in a string with no '|'. This probably doesn't matter.

More realistically, what if the first characters are one or more '|'s?
Rob's and John's pattern would return the part after the leading '|'s. By a
strict interpretation of the OP's requirement you should get an empty
string.

I'd KISS:

my ($tex) = /^(.*?)\|/; # ^ is redundant but aids readability
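The edge cases can be compared side by side. A sketch with made-up inputs, covering the character-class pattern, the non-greedy pattern suggested here, and `split` (whose first field matches the strict reading for leading pipes but, like the character class, returns the whole string when there is no pipe). The helper `show` is an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical inputs covering the edge cases discussed above.
my @inputs = ( 'parola|n.c.,0,fem,sg,0|parola', 'no pipe here', '||x' );

# Format a possibly-undefined value for display.
sub show { defined $_[0] ? "'$_[0]'" : 'undef' }

my %result;
for my $s (@inputs) {
    my ($class) = $s =~ /([^|]+)/;     # first run of non-pipe characters
    my ($lazy)  = $s =~ /^(.*?)\|/;    # everything before the first pipe
    my ($split) = split /\|/, $s;      # first field returned by split
    $result{$s} = [ map { show($_) } $class, $lazy, $split ];
    printf "%-32s class=%-16s lazy=%-10s split=%s\n", $s, @{ $result{$s} };
}
```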

Rob Dixon

2006-09-24, 7:57 am

John W. Krahn wrote:
> Rob Dixon wrote:
>
>
>
> The | is for alternation so if you want to match a literal | character you
> have to escape it:
>
> s/\|.*//s


Quite right, John, thank you.

>
>
> You should only use the numeric variables if the match was successful
> otherwise their contents may not be what you expected.
>
> if ( /([^|]+)/ ) {
> $tex = $1;
> }


Mumia made the same point. In context, this was the first and only pattern match
in the program, so $1 would have remained undefined if the match had failed. But
I agree, the code was not correct in general and I should have made that clear.

I must stop posting after 1:00 am :-/

Rob


Copyright 2008 codecomments.com