PERL Beginners archive, September 2006
selecting a part of a string

Adriano Allora 2006-09-23, 6:57 pm
hi to all,
another silly question about a pattern matching which should work but
it doesn't.
I have a list of strings similar to this one:
parola|n.c.,0,fem,sg,0|parola
and I need to select all the chars before the pipe and put them in a
variable.
That substitution doesn't work:
#!/usr/bin/perl -w
use strict;
my(%gen, %act, %record, $tex, $char, @parola, $zut);
while(<> )
{
$tex =~ s/^([^|]+).*/$1/o;
print STDOUT "$i errore sulla linea: $_\n" if !$tex;
}
The error message is (for example):
Use of uninitialized value in substitution (s///) at ./contalettere.pl
line 6, <> line 10.
I suppose the error is in the expression: [^|].
Can someone help me?
Thanks a lot,
alladr
http://www.e-allora.net

nobull67@gmail.com 2006-09-23, 6:57 pm
Adriano Allora wrote:
> another silly question about a pattern matching which should work but
> it doesn't.
What makes you think it should work?
> I have a list of strings similar to this one:
>
> parola|n.c.,0,fem,sg,0|parola
Do you? Are you sure? Where do you have it?
> and I need to select all the chars before the pipe and put them in a
> variable.
Well, you can use m// or split.
my ($before) = /(.*?)\|/;
my ($before) = split /\|/;
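A minimal sketch of both suggestions run against the sample line from the original post. It binds to an explicit `$line` instead of `$_`, and the variable names are illustrative only. The two forms disagree when a line contains no pipe: the match yields undef, while split hands back the whole line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = 'parola|n.c.,0,fem,sg,0|parola';

# List-context match: receives the first capture group,
# or undef if the pattern does not match at all.
my ($by_match) = $line =~ /(.*?)\|/;

# split on a literal (escaped) pipe; only the first field is kept.
my ($by_split) = split /\|/, $line;

print "match: $by_match\n";   # match: parola
print "split: $by_split\n";   # split: parola

# On a line with no pipe the two forms give different answers.
my ($none_match) = 'parola' =~ /(.*?)\|/;   # undef
my ($none_split) = split /\|/, 'parola';    # 'parola'
```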
> That substitution doesn't work:
Which substitution? Why would you want substitution? You said you
wanted to select (by which I assume you mean capture) part of the
string into a variable. You never said to change the original string.
> #!/usr/bin/perl -w
In recent versions of Perl, the -w switch has been superseded by "use warnings".
> use strict;
> my(%gen, %act, %record, $tex, $char, @parola, $zut);
Nasty case of premature declaration you have there. Are you taking
anything for it?
> while(<> )
> {
> $tex =~ s/^([^|]+).*/$1/o;
Do you know what the /o does there? (Trick question, it does nothing in
that code). Don't use /o unless you know what it does. Once you know
what it does you probably won't want to use it.
You have not put anything into $tex so it's undefined and when you
apply s/// to the string in $tex you'll get a warning.
You've been bitten by your premature declaration. If you got into
the habit of declaring variables where you first put something in them,
you'd most likely have noticed that the line where you first put a
value in $tex is _completely_missing_ from your program.
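A sketch of the loop with `$tex` declared on the very line that gives it a value, so a missing assignment cannot go unnoticed. The in-memory filehandle and its two sample lines are assumptions standing in for `<>`, used only to keep the example self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input standing in for the files read via <> in the original post.
my $data = "parola|n.c.,0,fem,sg,0|parola\nnopipe\n";
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

my @firsts;
while (my $line = <$fh>) {
    chomp $line;
    # Declared where it first receives a value: a fresh $tex for every line.
    my ($tex) = $line =~ /^(.*?)\|/;
    push @firsts, defined $tex ? $tex : '(no pipe)';
}
close $fh;

print "$_\n" for @firsts;
```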
> print STDOUT "$i errore sulla linea: $_\n" if !$tex;
You never declared $i, so that line would not compile. This isn't your
real code, then, is it? When asking people to explain the behaviour of
your code, it is good manners to show them the code.
> }
>
> The error message is (for example):
>
> Use of uninitialized value in substitution (s///) at ./contalettere.pl
> line 6, <> line 10.
That is not an error; it is a warning. Note that it's badly worded: it
should say "undefined value".
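One way to confirm that this is only a warning is to trap it with a `__WARN__` handler and see that execution carries on. A minimal sketch; the exact wording of the message varies between Perl versions (newer ones also name the variable), so only the common part is checked.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @warnings;
# Collect warnings instead of letting them go to STDERR.
$SIG{__WARN__} = sub { push @warnings, $_[0] };

my $tex;                      # declared but never given a value
$tex =~ s/^([^|]+).*/$1/;     # warns: substitution on an undefined value
my $still_running = 1;        # reached: a warning does not stop the program

print scalar(@warnings), " warning(s) caught: $warnings[0]";
```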
> I suppose the error is in the expression: [^|].
Why would you suppose that?

John W. Krahn 2006-09-23, 6:57 pm
Adriano Allora wrote:
> hi to all,
Hello,
> [...]
The problem is that you are using the variable $tex but it doesn't contain
anything. You could do either:
while ( <> ) {
( my $tex = $_ ) =~ s/^([^|]+).*/$1/s;
print "$i errore sulla linea: $_\n" if !$tex;
}
Or:
while ( <> ) {
/^([^|]+)/ and my $tex = $1;
print "$i errore sulla linea: $_\n" if !$tex;
}
Or:
while ( <> ) {
my ( $tex ) = /^([^|]+)/;
print "$i errore sulla linea: $_\n" if !$tex;
}
But then you also have the problem that $i is not defined.
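If `$i` was meant to be the line number (an assumption; the original post does not say), Perl already tracks it in the built-in `$.`. A minimal sketch along the lines of the third variant above, with an in-memory filehandle and made-up sample lines standing in for `<>`, and the messages collected in an array so the result can be checked.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input: the second line has nothing before its first pipe.
my $data = "parola|n.c.,0,fem,sg,0|parola\n|n.c.,0,fem,sg,0|parola\n";
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

my @errors;
while (<$fh>) {
    my ($tex) = /^([^|]+)/;
    # $. is the current input line number of the last filehandle read.
    push @errors, "$. errore sulla linea: $_" if !$tex;
}
close $fh;

print @errors;
```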
John
--
use Perl;
program
fulfillment

D. Bolliger 2006-09-23, 6:57 pm
Adriano Allora wrote on Sunday, 24 September 2006 01:12:
> [...]
> $tex =~ s/^([^|]+).*/$1/o;
$tex is never set before it is used.
> print STDOUT "$i errore sulla linea: $_\n" if !$tex;
$i is not defined.
> }
>
> The error message is (for example):
>
> Use of uninitialized value in substitution (s///) at ./contalettere.pl
> line 6, <> line 10.
>
> I suppose the error is in the expression: [^|].
No, it's referring to the fact that $tex is never defined, so the substitution
is applied to an undefined value (and $1 is never set either) :-)
Here's a first modified version:
#!/usr/bin/perl
use strict;
use warnings;
while (<DATA>) {
    chomp;
    s/^([^|]+).*/$1/;
    print $_
        ? "ok: $_\n"
        : "errore sulla linea $. ($_)\n";
}
__DATA__
heyq24614q35gh|-------------
efbäpkmmeth|qrhqeerherth
But you don't need a substitution, a simple match suffices:
#!/usr/bin/perl
use strict;
use warnings;
while (<DATA>) {
    chomp;
    my $tex = /(.*?)\|/
        ? do { warn "ok: $1\n"; $1 }
        : do { warn "errore sulla linea $. ($_)\n"; undef };
}
__DATA__
heyq24614q35gh|-------------
efbäpkmmeth|qrhqeerherth
Hope this helps!
Dani

Rob Dixon 2006-09-23, 6:57 pm
Adriano Allora wrote:
> [...]
Hi Adriano.
Your regex is correct, but it doesn't do what you said you wanted! You're
substituting the entire string in $tex for just those characters up to the first
pipe, but there's nothing in $tex - the data has been read into $_.
If you want to do what you've written, then s/|.*// is a lot easier: it just
removes everything starting at the first pipe.
If you want to do what you said, and put everything up to the pipe into a
variable (scalar $tex?) then
/([^|]+)/;
$tex = $1;
will do it for you. (It captures 'parola' in the example string. Is that right?)
And, by the way, you haven't declared the $i in the print() call.
And, by the way again, there's no point in putting the /o modifier on the
substitution, as the regex contains no variables and so will be compiled only
once anyway.
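Where /o does change behaviour is in a pattern that interpolates a variable: the pattern is compiled the first time that match runs and is never recompiled, even after the variable changes. A small illustrative sketch of that pitfall; the loop and values are made up for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my (@with_o, @without_o);
for my $pat ('a', 'b') {
    # With /o the pattern is compiled once, using the first value of $pat.
    push @with_o,    ('b' =~ /$pat/o) ? 1 : 0;
    # Without /o the pattern follows the current value of $pat.
    push @without_o, ('b' =~ /$pat/)  ? 1 : 0;
}

print "with /o:    @with_o\n";      # with /o:    0 0
print "without /o: @without_o\n";   # without /o: 0 1
```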
HTH,
Rob

Mumia W. 2006-09-23, 9:57 pm
On 09/23/2006 07:07 PM, Rob Dixon wrote:
> [...]
> If you want to do what you said, and put everything up to the pipe into a
> variable (scalar $tex?) then
>
> /([^|]+)/;
> $tex = $1;
> [...]
No, you should always only use the match variables after you've
determined that the match was successful:
/^([^|]+)/ and $tex = $1;    # 'and', not '&&', which binds tighter than '='
or this:
$tex = $1 if /^([^|]+)/;
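The reason for the guard: a failed match leaves the capture variables untouched, so `$1` still holds whatever the last successful match in the same scope captured. A minimal demonstration with arbitrary strings, ending with a guarded assignment written with the low-precedence `and`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $first_ok  = 'parola|n.c.' =~ /^([^|]+)/;   # succeeds: $1 is 'parola'
my $second_ok = '|||'         =~ /^([^|]+)/;   # fails: $1 is NOT reset
my $stale     = $1;                            # still 'parola'

# Guarded: $tex is only assigned when this particular match succeeds.
my $line = '|||';
my $tex;
$line =~ /^([^|]+)/ and $tex = $1;

print "stale \$1: $stale\n";
print "guarded \$tex: ", (defined $tex ? $tex : 'undef'), "\n";
```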

John W. Krahn 2006-09-23, 9:57 pm
Rob Dixon wrote:
> Adriano Allora wrote:
>
> Your regex is correct, but it doesn't do what you said you wanted! You're
> substituting the entire string in $tex for just those characters up to
> the first
> pipe, but there's nothing in $tex - the data has been read into $_.
>
> If you want to do what you've written, then s/|.*// is a lot easier: it
> just removes everything starting at the first pipe.
The | is for alternation so if you want to match a literal | character you
have to escape it:
s/\|.*//s
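The difference is easy to check. Unescaped, the pattern is an alternation between an empty pattern and `.*`; the empty alternative matches at position 0, so the substitution "succeeds" but removes nothing. A small sketch on the sample line:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = 'parola|n.c.,0,fem,sg,0|parola';

# Unescaped: empty alternative matches at offset 0, nothing is removed.
(my $unescaped = $line) =~ s/|.*//;

# Escaped: \| is a literal pipe; /s lets . match a newline as well.
(my $escaped = $line) =~ s/\|.*//s;

print "unescaped: $unescaped\n";   # unescaped: parola|n.c.,0,fem,sg,0|parola
print "escaped:   $escaped\n";     # escaped:   parola
```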
> If you want to do what you said, and put everything up to the pipe into a
> variable (scalar $tex?) then
>
> /([^|]+)/;
> $tex = $1;
You should only use the numeric variables if the match was successful
otherwise their contents may not be what you expected.
if ( /([^|]+)/ ) {
$tex = $1;
}
John

nobull67@gmail.com 2006-09-24, 3:58 am
John W. Krahn wrote:
> Rob Dixon wrote:
>
> You should only use the numeric variables if the match was successful
> otherwise their contents may not be what you expected.
>
> if ( /([^|]+)/ ) {
> $tex = $1;
> }
If you look at the OP's code, this still makes the same mistake! In Rob's
code a failed match would leave $tex containing whatever happened to be in
$1 previously. In John's it would contain whatever happened to be in
$tex previously.
This is because the OP is suffering from that oddly common affliction:
premature declaration. Variables should always be declared in the
smallest applicable scope unless there is a reason to do otherwise.
(Note: this is not specific to Perl, it applies in all languages).
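A small sketch of the stale-value problem described above, using two made-up lines where the second cannot match `/([^|]+)/`. With the variable declared outside the loop, the failed match silently reuses the previous line's value; declared inside the loop, it is undef as expected.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines = ('parola|n.c.,0,fem,sg,0|parola', '|||');

# Declared outside the loop: a failed match keeps the previous value.
my $tex;
my @outer;
for (@lines) {
    if (/([^|]+)/) {
        $tex = $1;
    }
    push @outer, defined $tex ? $tex : 'undef';
}

# Declared in the smallest scope: each line starts with a fresh variable.
my @inner;
for (@lines) {
    my ($field) = /([^|]+)/;
    push @inner, defined $field ? $field : 'undef';
}

print "outer: @outer\n";   # outer: parola parola
print "inner: @inner\n";   # inner: parola undef
```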
BTW: The pattern does not do what the OP asked for. It does something
else which is, truth be told, probably close enough. But it's a good
habit to always think about the edge cases.
What if $_ does not contain a '|'? Rob/John's pattern would return the
entire string, whereas by a strict interpretation of the OP's requirement
you should get undef, as there is no such thing as everything before the
'|' in a string with no '|'. This probably doesn't matter.
More realistically, what if the first characters are one or more '|'s?
Rob/John's pattern would return the part after the leading '|'s. By a
strict interpretation of the OP's requirement you should get an empty
string.
I'd KISS:
my ($tex) = /^(.*?)\|/; # ^ is redundant but aids readability
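The edge cases above can be checked side by side. A sketch comparing the strict pattern with the looser `/([^|]+)/` on three illustrative lines: a normal one, one with no pipe, and one with leading pipes. The helper `show` is hypothetical, only there to print undef visibly.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative lines: normal, no pipe at all, and leading pipes.
my @lines = ('parola|n.c.,0,fem,sg,0|parola', 'parola', '||parola');

my (@strict, @loose);
for (@lines) {
    my ($before) = /^(.*?)\|/;   # everything before the first '|', or undef
    my ($run)    = /([^|]+)/;    # first run of non-'|' characters
    push @strict, $before;
    push @loose,  $run;
}

# Hypothetical helper: quote defined values, spell out undef.
sub show { defined $_[0] ? "'$_[0]'" : 'undef' }

for my $i (0 .. $#lines) {
    printf "%-32s strict=%-8s loose=%s\n",
        $lines[$i], show($strict[$i]), show($loose[$i]);
}
```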

Rob Dixon 2006-09-24, 7:57 am
John W. Krahn wrote:
> Rob Dixon wrote:
>
>
>
> The | is for alternation so if you want to match a literal | character you
> have to escape it:
>
> s/\|.*//s
Quite right, John. Thank you.
>
>
> You should only use the numeric variables if the match was successful
> otherwise their contents may not be what you expected.
>
> if ( /([^|]+)/ ) {
> $tex = $1;
> }
Mumia made the same point. In context, this was the first and only pattern match
in the program, so $1 would have remained undefined if the match had failed. But
I agree, the code was not correct in general and I should have made that clear.
I must stop posting after 1:00 am :-/
Rob