
Author including . in a pattern match
Keith Worthington

2006-01-27, 6:57 pm

Hi All,

I am still a newbie in Perl and it is only with the help of this list that I was
able to construct the following pattern match.

($v_size_str =~ /\d+\s*['"]\s*x\s*\d+\s*['"]/i)

The problem, I have just realized, is that this pattern will match the first line
of this input file but not the second.

Border: None Size: 2.5' x 10' Tag: None
Border: None Size: 10' x 2.5' Tag: None

Here is the test perl script.
#!/usr/bin/env perl
use strict;
use warnings;

# Name the file once so the error message always reports the right file.
my $input_file = "small_input.txt";
open(my $in, '<', $input_file) or die "Can't open $input_file: $!";

while (<$in>) { # assigns each line in turn to $_
    my $v_border_id = "";
    my $v_dim1_total = 0;
    my $v_dim2_total = 0;
    my $v_tag = "";

    # Echo out the input line.
    print "\nInput line: $_";
    # Perform a case insensitive check for the proper data format. Capture the
    # desired parts of the data using parentheses.
    if (/.*border:\s*(.*?)\s*size:\s*(.*?)\s*tag:\s*(.*)\s*/i) {
        print "properly formatted\n";
        # Store the capture patterns in variables to avoid unpredictable results.
        my ($v_border_str, $v_size_str, $v_tag_str) = ($1, $2, $3);
        # Parse up the size string.
        if ($v_size_str =~ /\d+\s*['"]\s*x\s*\d+\s*['"]/i) {
            print "It looks like a size string.\n";
        } else {
            print "It doesn't look like a size string.\n";
        }
    } else {
        print "bad format\n";
        $v_border_id = "";
        $v_dim1_total = 0;
        $v_dim2_total = 0;
        $v_tag = "";
    }
}

close $in;

Okay, so the pattern is able to ignore a decimal in the first dimension because it
only needs one digit before the unit indicator. On the right it is trying to
match the whole thing, so I need something to match the potential decimal point.
I have tried a couple of things but I am struggling with how to optionally
match the decimal point.

I think what I need is the code equivalent of:
zero or more numbers followed by
zero or one decimal point followed by
one or more numbers followed by
either ' or "

I appreciate your time in giving me some help with this issue. URLs to
documentation are welcome.

Kind Regards,
Keith
Paul Lalli

2006-01-27, 6:57 pm

Keith Worthington wrote:

> I am still a newbie in Perl and it is only with the help of this list that I was
> able to construct the following pattern match.
>
> ($v_size_str =~ /\d+\s*['"]\s*x\s*\d+\s*['"]/i)
>
> The problem, I have just realized, is that this pattern will match the first line
> of this input file but not the second.
>
> Border: None Size: 2.5' x 10' Tag: None
> Border: None Size: 10' x 2.5' Tag: None



> I think what I need is the code equivalent of:
> zero or more numbers followed by


\d*

> zero or one decimal point followed by


\.?

> one or more numbers followed by


\d+

> either ' or "


['"]


Putting that together is: \d*\.?\d+['"]

which will match things like
2'
2.5'
.5'

And it's that last one that concerns me. Is no whole-number portion an
acceptable data format? If so, you're all good. If not, if the whole
number portion will always be given and the decimal will only appear if
there is a fractional portion, then what you really want is:
one or more digits, possibly followed by a decimal and one or more
digits:
\d+(\.\d+)?
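
A minimal sketch of the "one or more digits, possibly followed by a decimal and one or more digits" description, dropped into the original size check and run against the two sample sizes from the question. The variable name $number, the non-capturing group, and the ^ and $ anchors are choices made for this sketch, not something taken from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One or more digits, optionally followed by a decimal point and more digits.
# (?: ... ) groups without capturing; $number is a name chosen for this sketch.
my $number = qr/\d+(?:\.\d+)?/;

my @sizes = ("2.5' x 10'", "10' x 2.5'");

for my $size (@sizes) {
    # Anchored so the whole size string has to fit the pattern.
    if ($size =~ /^$number\s*['"]\s*x\s*$number\s*['"]$/i) {
        print "$size looks like a size string\n";
    } else {
        print "$size does not look like a size string\n";
    }
}
```

With this pattern both sample sizes should be reported as size strings, while a value such as .5' (no whole-number part) would be rejected.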

> I appreciate your time in giving me some help with this issue. URLs to
> documentation are welcome.


perldoc perlretut
perldoc perlre
perldoc perlreref
http://search.cpan.org/~abigail/Reg...ommon/number.pm

Paul Lalli

Chas Owens

2006-01-27, 6:57 pm

On 1/27/06, Keith Worthington <keithw@narrowpathinc.com> wrote:
snip
> I have tried a couple of things but I am struggling with how to optionally
> match the decimal point.
>
> I think what I need is the code equivalent of:
> zero or more numbers followed by
> zero or one decimal point followed by
> one or more numbers followed by
> either ' or "

snip

Take a look at Regexp::Common
(http://search.cpan.org/~abigail/Reg...egexp/Common.pm).
What you want is $RE{num}{real} out of that package. It will match
any real number.
Chas Owens

2006-01-27, 6:57 pm

Another thing you can do is break your larger regexes into parts to
make them more readable/maintainable. It also helps to use the x flag
so that you can separate the individual tokens and comment them.

#!/usr/bin/perl

use strict;
use warnings;

use Regexp::Common;

#FIXME: the border value should be (foo|bar|baz) where foo,
#bar, and baz are the valid values for a border
my $border = qr{        #match a border value
    border:             #constant indicating this is a border value
    \s*                 #optional spaces
    (.*)                #the border value
}xi;
my $num = qr{           #match a number of feet or inches
    ($RE{num}{real})    #a real number
    (['"]|'')           #unit indicator ' is foot, " or '' is inches
}x;
my $dim = qr{           #match a dimension
    $num                #height value
    \s*                 #optional spaces
    (?:x|by)            #x or by, don't capture
    \s*                 #optional spaces
    $num                #width value
}xi;
my $size = qr{          #match a size value
    size:               #constant indication this is a size value
    \s*                 #optional spaces
    $dim                #the dimensions
}xi;
my $tag = qr{           #match a tag value
    tag:                #constant indication this is a tag value
    \s*                 #optional spaces
    (.*)                #the tag
}xi;
my $match = qr{         #match the whole record for a widget
    ^                   #start of the record
    $border             #a border value
    \s*                 #optional spaces
    $size               #a size value
    \s*                 #optional spaces
    $tag                #a tag value
    $                   #optional spaces
}x;

while (<>) {
    chomp;
    if (/$border\s*$size\s*$tag/) {
        my ($border, $h, $h_type, $w, $w_type, $tag) =
            ($1, $2, $3, $4, $5, $6);
        print
            "Border is $border\n",
            "Height is ", ($h_type !~ /"|''/ ? $h*12 : $h), " inches\n",
            "Width is ", ($w_type !~ /"|''/ ? $w*12 : $w), " inches\n",
            "Tag is $tag\n";
    } else {
        print "invalid entry\n";
    }
}
Chas Owens

2006-01-27, 6:57 pm

> my $match = qr{ #match the whole record for a widget
> ^ #start of the record
> $border #a border value
> \s* #optional spaces
> $size #a size value
> \s* #optional spaces
> $tag #a tag value
> $ #optional spaces
> }x;

whoops, this should read:
$ #end of record
Keith Worthington

2006-01-30, 6:56 pm

On Fri, 27 Jan 2006 18:31:18 -0500, Chas Owens wrote
> Another thing you can do is break your larger regexes into parts to
> make them more readable/maintainable. It also helps to use the x flag
> so that you can separate the individual tokens and comment them.


Chas,

Wow, it is going to take me some time to wrap my head around this code. I
really like the commenting idea. That certainly will help the next time around.
I don't get the lines where you defined the pattern.
i.e. my $border = qr{ _pattern_stuff }xi
Hmm is qr something special? I guess I will start with the x modifier man page.

What I would like to do first is to use this ($RE{num}{real}) construct. That
will simplify the code enormously. I am off to the man pages.

http://search.cpan.org/~abigail/Reg...egexp/Common.pm
http://search.cpan.org/~abigail/Reg...ommon/number.pm

Ahhh, a bright day dawns with plenty of opportunity for learning in sight. :-)

Kind Regards,
Keith
Chas Owens

2006-01-30, 6:56 pm

On 1/30/06, Keith Worthington <keithw@narrowpathinc.com> wrote:
snip
> Wow, it is going to take me some time to wrap my head around this code. I
> really like the commenting idea. That certainly will help the next time around.
> I don't get the lines where you defined the pattern.
> i.e. my $border = qr{ _pattern_stuff }xi
> Hmm is qr something special? I guess I will start with the x modifier man page.
snip

Yes, it is the "quote regex" operator. It takes a string and returns
a compiled regex. You can read more at "perldoc perlop" and "perldoc
perlre".
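
As a small illustration of what that means in practice, the sketch below builds one piece with qr and interpolates it into a larger pattern. The names $feet and $dim are made up for this example, and only core Perl is used.

```perl
use strict;
use warnings;

# qr{...}x compiles a pattern once; the x flag allows whitespace and comments.
my $feet = qr{
    (\d+)      # whole number of feet, captured
    \s* '      # optional spaces, then the foot mark
}x;

# A compiled pattern can be interpolated into a bigger one like any variable.
my $dim = qr{ $feet \s* x \s* $feet }xi;

if ("Size: 10' x 12'" =~ $dim) {
    # Capture groups are numbered left to right across the combined pattern.
    print "height $1 ft, width $2 ft\n";
}
```

Because $feet appears twice inside $dim, its capture group is counted twice, which is why the two values arrive in $1 and $2.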
Keith Worthington

2006-01-30, 6:56 pm

On Mon, 30 Jan 2006 10:26:51 -0500, Chas Owens wrote
> On 1/30/06, Keith Worthington <keithw@narrowpathinc.com> wrote:
> snip
> snip
>
> Yes, it is the "quote regex" operator. It takes a string and returns
> a compiled regex. You can read more at "perldoc perlop" and "perldoc
> perlre".


Chas,

Thanks. I will run off to that man page.

I am considering making that change the next time around. It certainly simplifies the
area where I am trying to perform the main match. It does, however, introduce a
little confusion for a newbie such as myself, having a variable contain a
variable sort of thing.

The challenge that I am working on now is that although the $RE{num}{real}
simplifies the match of real numbers it does not appear to help me with
fractions. i.e. 36 1/2' Sighhhhh. I looked through Regexp::Common::number and
didn't see anything that pattern matched fractions. A search turned up
Number::Fraction but that doesn't seem to be about pattern matching.

I guess I will try to roll my own using the qr operator. A learning day! ;-)

Thanks for all your help.

Kind Regards,
Keith
Keith Worthington

2006-01-30, 6:56 pm

On Mon, 30 Jan 2006 10:26:51 -0500, Chas Owens wrote
> On 1/30/06, Keith Worthington <keithw@narrowpathinc.com> wrote:
> snip
> snip
>
> Yes, it is the "quote regex" operator. It takes a string and returns
> a compiled regex. You can read more at "perldoc perlop" and "perldoc
> perlre".


Chas,

Well, I am having some fun with this qr{} now. :-) I created the two pattern
matches and then threw the result into my expression. Absolutely amazing! Woohoo!

The only challenge is that the $p_number pattern seems to match all of the
following.
10 1/2
10-1/2
10--1/2

I started out with (\s+|-?) as my delimiter pattern and then when that didn't
work I changed it to (\s+|-{1,1}). It still matches the case with two hyphens.
Can you tell me why?

my $p_fraction = qr{        # match a fractional number
    ($RE{num}{int})         # integer pattern
    (/)                     # fraction delimiter
    ($RE{num}{int})         # integer pattern
}x;

my $p_number = qr{          # match a number
    (
        ($RE{num}{real})    # real pattern
    |
        ($RE{num}{real})    # real pattern
        (\s+|-{1,1})        # mixed number delimiter
        ($p_fraction)       # fraction pattern
    |
        ($p_fraction)       # fraction pattern
    )
}x;

if ($v_size_str =~ /$p_number\s*['"]\s*x\s*$p_number['"]/i){

Kind Regards,
Keith
Paul Lalli

2006-01-30, 6:56 pm

Keith Worthington wrote:
> The only challenge is that the $p_number pattern seems to match all of the
> following.
> 10 1/2
> 10-1/2
> 10--1/2
>
> I started out with (\s+|-?) as my delimiter pattern and then when that didn't
> work I changed it to (\s+|-{1,1}). It still matches the case with two hyphens.
> Can you tell me why?


Recall that pattern matching does not determine if the given string
"IS" the pattern, but only if it CONTAINS the pattern. You are
checking to see if $v_size_str CONTAINS a $p_number, space, a quote,
space, x, space, $p_number, and quote. There is nothing preventing
$v_size_str from having additional text either before or after this
pattern.

> my $p_fraction = qr{ # match a fractional number
> ($RE{num}{int}) # integer pattern
> (/) # fraction delimiter
> ($RE{num}{int}) # integer pattern
> }x;
>
> my $p_number = qr{ # match a number
> (
> ($RE{num}{real}) # real pattern
> |
> ($RE{num}{real}) # real pattern
> (\s+|-{1,1}) # mixed number delimiter
> ($p_fraction) # fraction pattern
> |
> ($p_fraction) # fraction pattern
> )
> }x;


Here, your $p_number pattern must CONTAIN either: a real number OR a
real followed by a space or dash followed by a fraction OR a fraction.

The string
10--1/2
does indeed contain this pattern. In fact, it contains it right off
the bat: 10 is a valid real number, so 10--1/2 CONTAINS a real number,
and therefore matches a $p_number.

You need to learn about "anchoring" patterns with the ^ and $ symbols.
These are "zero-width assertions" in a regexp, meaning that they do not
match any actual characters, but instead are "true" or not depending on
where the pattern is currently matching in the string. If the pattern
is currently at the beginning of the string, ^ is true. If it's at the
end of the string (or just before a newline that ends the string), $ is true.

So, we can re-write your pattern to be:

qr{                         # match a number
    ^                       # beginning of string
    (
        ($RE{num}{real})    # real pattern
    |
        ($RE{num}{real})    # real pattern
        (
            \s+             # multiple spaces
        |                   # or
            -               # a single dash
        )
        ($p_fraction)       # fraction pattern
    |
        ($p_fraction)       # fraction pattern
    )
    $                       # end of string
}x;
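
The difference between "contains" and "is" can be seen in a short sketch. It uses a plain digits-with-optional-decimal pattern as a stand-in for $RE{num}{real}, so $num here is an assumption made to keep the example free of non-core modules.

```perl
use strict;
use warnings;

# A simple stand-in for a number: digits with an optional decimal part.
my $num = qr/\d+(?:\.\d+)?/;

for my $str ("10", "10--1/2") {
    my $contains = $str =~ /$num/   ? "yes" : "no";   # may match anywhere
    my $is       = $str =~ /^$num$/ ? "yes" : "no";   # must span the string
    print "${str}: contains a number? $contains; is a number? $is\n";
}
```

Both strings contain a number, but only "10" is a number from start to end. If the number pattern is meant to be interpolated into a larger expression, the anchors would normally go on the outer pattern rather than inside the reusable piece.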


I strongly suggest, if you haven't already, reading the RegExp tutorial
at:
perldoc perlretut
and following it up with
perldoc perlre
and
perldoc perlreref

Paul Lalli

Keith Worthington

2006-01-30, 6:56 pm

snip
>
> The only challenge is that the $p_number pattern seems to match all of
> the following. 10 1/2 10-1/2 10--1/2
>
> I started out with (\s+|-?) as my delimiter pattern and then when that
> didn't work I changed it to (\s+|-{1,1}). It still matches the case
> with two hyphens. Can you tell me why?
>
> my $p_fraction = qr{ # match a fractional number
> ($RE{num}{int}) # integer pattern
> (/) # fraction delimiter
> ($RE{num}{int}) # integer pattern
> }x;
>
> my $p_number = qr{ # match a number
> (
> ($RE{num}{real}) # real pattern
> |
> ($RE{num}{real}) # real pattern
> (\s+|-{1,1}) # mixed number delimiter
> ($p_fraction) # fraction pattern
> |
> ($p_fraction) # fraction pattern
> )
> }x;


Well, the answer was right in front of me. The delimiter pattern is matching
the first hyphen and the integer pattern is matching the second hyphen as a
negative sign. Arrrgh!

The Regexp::Common::number documentation doesn't indicate a way to match only
numbers without a sign. I am thinking I will just lose the - as part of the
delimiter and let the integer pattern take care of it. A bit messy but it will
work. If anyone can think of a better idea I would appreciate your time in
sharing it.
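
The effect can be reproduced without the module. The sketch assumes a signed-integer pattern of the form [+-]?\d+ as a stand-in for $RE{num}{int}; the module's real pattern is more elaborate, so $signed_int and $mixed should be read as hypothetical names for illustration only.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a signed integer pattern such as $RE{num}{int}.
my $signed_int = qr/[+-]?\d+/;

# Whole part, one delimiter (spaces or a single hyphen), then a fraction.
my $mixed = qr{^ (\d+) (\s+|-) ($signed_int) / ($signed_int) $}x;

if ("10--1/2" =~ $mixed) {
    # The delimiter takes the first hyphen; the numerator takes the second.
    print "delimiter '$2', numerator '$3', denominator '$4'\n";
}
```

Even with anchors at both ends, the string with two hyphens matches, because the second hyphen is consumed as the sign of the numerator.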

Kind Regards,
Keith
Chas Owens

2006-01-30, 6:56 pm

On 1/30/06, Keith Worthington <keithw@narrowpathinc.com> wrote:
snip
> Well, the answer was right in front of me. The delimiter pattern is matching
> the first hyphen and the integer pattern is matching the second hyphen as a
> negative sign. Arrrgh!
>
> The Regexp::Common::number documentation doesn't indicate a way to match only
> numbers without a sign. I am thinking I will just lose the - as part of the
> delimiter and let the integer pattern take care of it. A bit messy but it will
> work. If anyone can think of a better idea I would appreciate your time in
> sharing it.
>
> Kind Regards,
> Keith


This may motivate me to hack on that module to add -unsigned and
-negative options.
Keith Worthington

2006-01-30, 6:56 pm

On Mon, 30 Jan 2006 13:26:42 -0500, Chas Owens wrote
> On 1/30/06, Keith Worthington <keithw@narrowpathinc.com> wrote:
> snip
>
> This may motivate me to hack on that module to add -unsigned and
> -negative options.


Obviously that would benefit someone in my situation who really doesn't want to
allow negative numbers. I think that hacking a module is well beyond me for
the next couple of months though. ;-)

Thanks for all the help. With the exception of that one delimiter issue my code
looks pretty good. Functionality is okay as well. I am going to take what you
have taught me and rewrite my script during the next phase of my project.

Kind Regards,
Keith
Chas Owens

2006-01-30, 6:56 pm

On 1/30/06, Keith Worthington <keithw@narrowpathinc.com> wrote:
snip
>
> Obviously that would benefit someone in my situation who really doesn't want to
> allow negative numbers. I think that hacking a module is well beyond me for
> the next couple of months though. ;-)
>
> Thanks for all the help. With the exception of that one delimiter issue my code
> looks pretty good. Functionality is okay as well. I am going to take what you
> have taught me and rewrite my script during the next phase of my project.
>
> Kind Regards,
> Keith
>


After a bit of digging I have realized why the unsigned option isn't
there: the pattern is too simple to be needed. Instead of using
$RE{num}{int} use qr{[1-9][0-9]*} or qr{[0-9]+} (depending on whether
or not you want to forbid leading zeros respectively).
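
A sketch of how that suggestion might slot into a mixed-number pattern like the one earlier in the thread. To stay self-contained it uses plain digit classes throughout instead of Regexp::Common, and the names $uint, $fraction and $mixed are invented for the example.

```perl
use strict;
use warnings;

my $uint     = qr{[0-9]+};               # unsigned integer, leading zeros allowed
my $fraction = qr{ $uint / $uint }x;     # e.g. 1/2
my $mixed    = qr{
    ^
    (?:
        $uint (?: \s+ | - ) $fraction    # 10 1/2 or 10-1/2
      | $fraction                        # 1/2
      | $uint (?: \. $uint )?            # 10 or 10.5
    )
    $
}x;

for my $str ("10 1/2", "10-1/2", "10--1/2", "1/2", "2.5") {
    printf "%-8s %s\n", $str, ($str =~ $mixed ? "ok" : "rejected");
}
```

Because no piece of the pattern can absorb a sign, the second hyphen in 10--1/2 has nowhere to go and that string is the only one of the five that is rejected.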