Code Comments


Trouble with m///g
Hi,

I'm trying to extract all four-digit numbers from a string in one fell
swoop, but I can't seem to come up with the proper regexp.  This is my
first time using /g in a match so maybe there's a trick I'm missing.

For example, the string

"1111 2222aa3333 444 55555555 6666 7777-8888"

should yield

1111, 2222, 3333, 6666, 7777, 8888.

Here's one attempt that I thought had a reasonable chance.

- - - - -
#!/usr/bin/perl -w
my $foo = "1111 2222aa3333 444 55555555 6666 7777-8888";
my @a = ($foo =~ m'[\D^](\d{4})[\D$]'g);
print "<$foo>\n";
print(join(":",@a)."\n");
- - - - -

<1111 2222aa3333 444 55555555 6666 7777-8888>
2222:3333:6666

Thanks for your consideration,
Chap


Chap Harrison
09-30-04 04:24 PM


RE: Trouble with m///g
I think this might work.

/\b\d{4}\b/

Rob

-----Original Message-----
From: Chap Harrison [mailto:clh@pobox.com]
Sent: Thursday, September 30, 2004 10:38 AM
To: beginners@perl.org
Subject: Trouble with m///g


Hi,

I'm trying to extract all four-digit numbers from a string in one fell
swoop, but I can't seem to come up with the proper regexp.  This is my
first time using /g in a match so maybe there's a trick I'm missing.

For example, the string

"1111 2222aa3333 444 55555555 6666 7777-8888"

should yield

1111, 2222, 3333, 6666, 7777, 8888.

Here's one attempt that I thought had a reasonable chance.

- - - - -
#!/usr/bin/perl -w
my $foo = "1111 2222aa3333 444 55555555 6666 7777-8888";
my @a = ($foo =~ m'[\D^](\d{4})[\D$]'g);
print "<$foo>\n";
print(join(":",@a)."\n");
- - - - -

<1111 2222aa3333 444 55555555 6666 7777-8888>
2222:3333:6666

Thanks for your consideration,
Chap


--
To unsubscribe, e-mail: beginners-unsubscribe@perl.org
For additional commands, e-mail: beginners-help@perl.org
<http://learn.perl.org/> <http://learn.perl.org/first-response>


Rob Hanson
09-30-04 04:24 PM


Re: Trouble with m///g
> For example, the string
>
> "1111 2222aa3333 444 55555555 6666 7777-8888"
>
> should yield
>
> 1111, 2222, 3333, 6666, 7777, 8888.

That's actually kind of tricky. How about:

$aa = "1111 2222aa3333 444 55555555 6666 7777-8888";
@aa = $aa =~ /(?<!\d)\d{4}(?!\d)/g;
print "$_\n" for @aa;

That gets 2222 and 3333 also, which the \b solution skips. What it
says is to get every group of 4 digits that is neither preceded nor
followed by another digit.

Dave

ps - also see perldoc perlre and look for zero-width negative
look(ahead|behind) assertions
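
For readers following along, here is a minimal annotated sketch of the same lookaround pattern. The variable names ($text, @found) are illustrative and not from the post; the comments reflect how perlre describes zero-width assertions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "1111 2222aa3333 444 55555555 6666 7777-8888";

# (?<!\d) : the current position must not be preceded by a digit.
# (?!\d)  : the current position must not be followed by a digit.
# Both assertions are zero-width: they test the neighbouring character
# without consuming it, so adjacent matches can share one delimiter.
my @found = $text =~ /(?<!\d)\d{4}(?!\d)/g;

print join(":", @found), "\n";   # prints 1111:2222:3333:6666:7777:8888
```

Because nothing around the four digits is consumed, the space after 1111 is still available to serve as the "not a digit" check before 2222.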

Dave Gray
09-30-04 04:24 PM


RE: Trouble with m///g
Please bottom post...

> I think this might work.
>

It might, but doesn't. Some testing would be good before posting
inaccurate responses.

> /\b\d{4}\b/
>

\b matches only at word boundaries, and digits and letters both count
as word characters, so you miss the two sets with the 'aa' between
them (2222 and 3333)....

> Rob
>

Out of curiosity, based on your description, shouldn't it return

1111:2222:3333:5555:5555:6666:7777:8888

Or do you really mean, you are trying to capture all 4 digit strings
that are not in a string of longer digits?  You need to be very explicit
about what you are after.  I think (and have tested) that,

my @a = ($foo =~ m'(?<!\d{4})\d{4}(?!\d)'g);

gives you what you want, though I don't claim to be a regex expert like
others on the list (who are experts, rather than claiming to be). I
*believe* it says: match any 4-digit string not preceded by a 4-digit
string and not followed by a digit.

Works?

http://danconia.org



Wiggins d Anconia
09-30-04 04:24 PM


Re: Trouble with m///g
Hmmm...

m'\b(\d{4})\b'g
<1111 2222aa3333 444 55555555 6666 7777-8888>
1111:6666:7777:8888

Doesn't give me 2222 or 3333.  I think the problem has to do with where
m///g starts on subsequent iterations.  The pattern specifies a
delimiter for both the start and the end of the target substring, but
that means it will want to find an ending delim on iteration n,
followed by a beginning delim on iteration n+1.
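
One way to test that hypothesis is sketched below; it is not a definitive diagnosis. \b is zero-width, so it cannot be "used up" between iterations. What the sketch checks instead is whether a word boundary exists at all between a digit and a letter. The variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "1111 2222aa3333 444 55555555 6666 7777-8888";

# \b matches between a \w character and a \W character (or at the
# start/end of the string). Digits and letters are both \w, so there
# is no boundary between "2222" and "aa", or between "aa" and "3333".
my @with_b = $text =~ /\b(\d{4})\b/g;
print join(":", @with_b), "\n";   # prints 1111:6666:7777:8888

# Direct check: is there a word boundary right after "2222"?
my $boundary = ($text =~ /2222\b/) ? "yes" : "no";
print "boundary after 2222: $boundary\n";   # prints no
```

The hyphen in 7777-8888 is a \W character, which is why both of those numbers are found.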


On Sep 30, 2004, at 9:41 AM, Hanson, Rob wrote:

> I think this might work.
>
> /\b\d{4}\b/
>
> Rob
>


Chap Harrison
09-30-04 04:24 PM


RE: Trouble with m///g
Chap Harrison wrote:
> Hi,
>
> I'm trying to extract all four-digit numbers from a string in one fell
> swoop, but I can't seem to come up with the proper regexp.  This is my
> first time using /g in a match so maybe there's a trick I'm missing.
>
> For example, the string
>
> "1111 2222aa3333 444 55555555 6666 7777-8888"
>
> should yield
>
> 1111, 2222, 3333, 6666, 7777, 8888.

TIMTOWTDI:

@list = grep length==4, /\d+/g
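
A runnable sketch of that one-liner, assuming the string lives in a variable named $foo as in the original post. A bare /\d+/g matches against $_, so the sketch aliases $_ to $foo first; that aliasing step is an assumption about how the line was meant to be used.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $foo = "1111 2222aa3333 444 55555555 6666 7777-8888";

# /\d+/g in list context returns every maximal run of digits;
# grep then keeps only the runs that are exactly 4 characters long.
my @list;
for ($foo) {                       # alias $_ to $foo for the bare match
    @list = grep { length($_) == 4 } /\d+/g;
}

print join(":", @list), "\n";      # prints 1111:2222:3333:6666:7777:8888
```

This splits the job in two: the regex only has to find digit runs, and the length test does the filtering, so no lookaround is needed.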

Bob Showalter
09-30-04 08:57 PM


Re: Trouble with m///g
Chap Harrison wrote:
> I'm trying to extract all four-digit numbers from a string in one fell
> swoop, but I can't seem to come up with the proper regexp.  This is my
> first time using /g in a match so maybe there's a trick I'm missing.
>
> For example, the string
>
> "1111 2222aa3333 444 55555555 6666 7777-8888"
>
> should yield
>
> 1111, 2222, 3333, 6666, 7777, 8888.
>
> Here's one attempt that I thought had a reasonable chance.
>
> - - - - -
> #!/usr/bin/perl -w
> my $foo = "1111 2222aa3333 444 55555555 6666 7777-8888";
> my @a = ($foo =~ m'[\D^](\d{4})[\D$]'g);

The first character class requires that the number is preceded by a
non-digit character. (Inside a character class, ^ is not an anchor; it
only negates the class when it comes first, and $ is likewise taken
literally.) Since the first number is not preceded by anything,
1111 is not matched.

I suppose you meant to do:

my @a = ($foo =~ m'(?:\D|^)(\d{4})(?:\D|$)'g);

which gives

1111:3333:6666:8888

but that's not what you want either. The reason why e.g. 2222 is not
matched is that the space after 1111 is included in the first match,
so the second attempt to match starts at the first '2'...
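
The consumed-delimiter effect described above can be made visible with pos(), which reports where the next m//g attempt will start. This is a sketch using the same pattern; the variable names ($text, @steps) are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "1111 2222aa3333 444 55555555 6666 7777-8888";

# In scalar context, m//g finds one match per call and remembers
# where it stopped. The trailing (?:\D|$) consumes the delimiter, so
# the next attempt starts one character past it.
my @steps;
while ($text =~ /(?:\D|^)(\d{4})(?:\D|$)/g) {
    push @steps, [ $1, pos($text) ];
}

for my $step (@steps) {
    print "matched $step->[0], next search starts at offset $step->[1]\n";
}
# The offsets are 5, 16, 34 and 43. Offset 5 is the first '2' of 2222,
# so no leading non-digit is left for it to match.
```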

You'd better use extended patterns, i.e. zero-width assertions:

my @a = $foo =~ /(?<!\d)\d{4}(?!\d)/g;

Read about extended patterns in "perldoc perlre".

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Gunnar Hjalmarsson
09-30-04 08:57 PM


Re: Trouble with m///g
On Sep 30, 2004, at 9:55 AM, Wiggins d Anconia wrote:

> Out of curiosity, based on your description, shouldn't it return
>
> 1111:2222:3333:5555:5555:6666:7777:8888
>
> Or do you really mean, you are trying to capture all 4 digit strings
> that are not in a string of longer digits?  You need to be very
> explicit
> about what you are after.
>
>

The example was intended to resolve the ambiguities of my informal
description :-)   You correctly surmised what I was after.

> my @a = ($foo =~ m'(?<!\d{4})\d{4}(?!\d)'g);

And your solution works.  Now I'm going to study up on *how* it works!

Thanks, and also thanks to Dave and Gunnar for what appears to be the
same solution, and the references to extended patterns and zero-width
assertions.

Chap


Chap Harrison
09-30-04 08:57 PM


Re: Trouble with m///g
> TIMTOWTDI:
>
>   @list = grep length==4, /\d+/g

Shouldn't that be:

@list = grep length==4, $foo =~ /\d+/g;

Cool solution, I wouldn't have thought to do it that way. I'm getting
varying Benchmarking results, though. I think it might have something
to do with grep speedups from 5.6.1 to 5.8.0... can anyone confirm
this?

On a box with 4 Xeon 2gigs with 5.6.1 and Benchmark v1:
          Rate   grep wregex  regex
grep   55586/s     --   -13%   -23%
wregex 64061/s    15%     --   -12%
regex  72569/s    31%    13%     --

But, on another box with 1 AMD 1gig with 5.8.0 and Benchmark v1.0501:
          Rate wregex  regex   grep
wregex 31437/s     --   -14%   -18%
regex  36470/s    16%     --    -5%
grep   38212/s    22%     5%     --


Confusing!

#!/usr/bin/perl -w
use strict;
use Benchmark qw/cmpthese/;

my ($aa);
$aa = "1111 2222aa3333 444 55555555 6666 7777-8888";

sub regex { my @aa = $aa =~ /(?<!\d)\d{4}(?!\d)/g }

# Wiggins ;-)
sub wregex { my @aa = $aa =~ /(?<!\d{4})\d{4}(?!\d{4})/g }

sub grep { my @aa = grep length==4, $aa =~ /\d+/g }

cmpthese(100000, {
regex => \&regex,
wregex => \&wregex,
grep  => \&grep,
});

Dave Gray
09-30-04 08:57 PM


Re: Trouble with m///g
Chap Harrison wrote on 30.09.2004:

>
>On Sep 30, 2004, at 9:55 AM, Wiggins d Anconia wrote:
> 
>
>The example was intended to resolve the ambiguities of my informal
>description :-)   You correctly surmised what I was after.
> 
>
Careful, you mistyped the original proposition:

my @a = ($foo =~ m'(?<!\d)\d{4}(?!\d)'g);

This one will find a string consisting of four digits, neither preceded nor
followed by a digit. In other words: exactly four digits. Your quote will
find a string of four digits not preceded by another four digits, so it
could match the last four digits of a run of five, six or seven digits.
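
The difference is easy to see on a test string with longer digit runs. The string below is hypothetical, chosen only to contain runs of 4, 5, 6, 7 and 8 digits; it does not appear in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: digit runs of length 4, 5, 6, 7 and 8.
my $text = "1234 12345 123456 1234567 12345678";

# Not preceded by ONE digit: only standalone four-digit runs match.
my @strict = $text =~ /(?<!\d)\d{4}(?!\d)/g;

# Not preceded by FOUR digits: the tail of a 5-7 digit run slips through,
# because fewer than four digits come before it.
my @loose = $text =~ /(?<!\d{4})\d{4}(?!\d)/g;

print "strict: @strict\n";   # prints strict: 1234
print "loose:  @loose\n";    # prints loose:  1234 2345 3456 4567
```

The eight-digit run is rejected by both patterns, which is why the two behave identically on the example string from the original post.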

- Jan
-- 
How many Microsoft engineers does it take to screw in a lightbulb? None.
They just redefine "dark" as the new standard.

Jan Eden
09-30-04 08:57 PM

