Regex Help (PERL Beginners, November 2007)
| Omega -1911 2007-11-10, 7:59 am |
| Hello,
Can anyone assist with a regex to pull lottery numbers from a page?
All data from the web page is pushed into an array (header and
footer info is removed before being pushed into the array). The entire
goal here is an exercise I was playing with to produce a report of the
most common winning lottery numbers. What I need to be able to do is
place the five main numbers (the ones before the word "Powerball") into
one array, then place the Powerball numbers into another array. Thanks
in advance.
The following is one example I have tried:
@liners = split /(\s\[0-9],\s)Powerball:\s[0-9]/,$data_string;
_DATA_
22, 29, 35, 46, 52, Powerball: 2, Power Play: 5
1, 31, 38, 40, 53, Powerball: 42, Power Play: 2
6, 16, 18, 29, 37, Powerball: 24, Power Play: 2
| |
| Joy Peng 2007-11-10, 7:59 am |
| On Nov 10, 2007 5:10 PM, Omega -1911 <1911que@gmail.com> wrote:
>What I will need to be able to do
> is place the most common 5 numbers (before the word "powerball") into
> an array then place the powerball numbers into another array. Thanks
> in advance.
>
> @liners = split /(\s\[0-9],\s)Powerball:\s[0-9]/,$data_string;
>
> _DATA_
> 22, 29, 35, 46, 52, Powerball: 2, Power Play: 5
> 1, 31, 38, 40, 53, Powerball: 42, Power Play: 2
> 6, 16, 18, 29, 37, Powerball: 24, Power Play: 2
>
Hi,
I think the data structure you need is a hash, not two arrays. The
entire code can be:
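use strict;
use warnings;
use Data::Dumper;
my %hash;
while (<DATA>) {
    # skip lines that don't have the expected "..., Powerball: N" layout
    my ($li, $powerb) = /^(.+),\s*Powerball:\s*(\d+)/ or next;
    # several draws can share a Powerball number, so keep a list of draws per key
    push @{ $hash{$powerb} }, [ split /,\s*/, $li ];
}
print Dumper \%hash;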
__DATA__
22, 29, 35, 46, 52, Powerball: 2, Power Play: 5
1, 31, 38, 40, 53, Powerball: 42, Power Play: 2
6, 16, 18, 29, 37, Powerball: 24, Power Play: 2
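Since the OP's end goal is a report of the most common numbers, here is a minimal sketch of the counting step with a hash keyed by number, in the spirit of the hash suggestion above. It assumes every relevant line has the same layout as the three sample lines; the names %common_count, %powerball_count and top_n are made up for illustration:

```perl
use strict;
use warnings;

# Sample lines in the format posted by the OP.
my @lines = (
    '22, 29, 35, 46, 52, Powerball: 2, Power Play: 5',
    '1, 31, 38, 40, 53, Powerball: 42, Power Play: 2',
    '6, 16, 18, 29, 37, Powerball: 24, Power Play: 2',
);

my (%common_count, %powerball_count);
for my $line (@lines) {
    # Capture the five main numbers and the Powerball number.
    my @n = $line =~ /^(\d+), (\d+), (\d+), (\d+), (\d+), Powerball: (\d+)/
        or next;    # skip lines that do not match
    $common_count{$_}++ for @n[0 .. 4];
    $powerball_count{ $n[5] }++;
}

# Hypothetical helper: the $n keys with the highest counts,
# ties broken by the smaller number first.
sub top_n {
    my ($count, $n) = @_;
    my @sorted = sort { $count->{$b} <=> $count->{$a} || $a <=> $b } keys %$count;
    $n = @sorted if $n > @sorted;
    return @sorted[0 .. $n - 1];
}

print "Most common main numbers: @{[ top_n(\%common_count, 5) ]}\n";
print "Most common Powerball numbers: @{[ top_n(\%powerball_count, 3) ]}\n";
```

The tie-breaking rule is an arbitrary choice; with the three sample lines, 29 is the only number drawn twice.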
| |
| Jonathan Lang 2007-11-10, 7:59 am |
| Omega -1911 wrote:
> @liners = split /(\s\[0-9],\s)Powerball:\s[0-9]/,$data_string;
Instead of split, just do a pattern match:
($a[0], $a[1], $a[2], $a[3], $a[4], $b) = /(\d+), (\d+), (\d+), (\d+), (\d+), Powerball: (\d+)/;
This puts the first five numbers into the array @a, and puts the
powerball number into scalar $b.
Note that this tackles a single line of data. To get everything,
cycle through the lines using a "while (<DATA> )" and push the results
onto the two arrays as you get them:
push @common, @a; push @powerball, $b;
In whole, you get:
while (<DATA>) {
    ($a[0], $a[1], $a[2], $a[3], $a[4], $b) = /(\d+), (\d+), (\d+), (\d+), (\d+), Powerball: (\d+)/;
    push @common, @a; push @powerball, $b;
}
When you're done, @common is (22, 29, 35, 46, 52, 1, 31, 38, 40, 53,
6, 16, 18, 29, 37), and @powerball is (2, 42, 24).
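The snippet above relies on undeclared package variables (and $a and $b are also the variables sort uses), so it won't compile under use strict as written. A sketch of the same list-assignment idea with lexical variables, which also skips non-matching lines instead of pushing undef values; @draw and $pb are illustrative names:

```perl
use strict;
use warnings;

my @lines = (
    '22, 29, 35, 46, 52, Powerball: 2, Power Play: 5',
    '1, 31, 38, 40, 53, Powerball: 42, Power Play: 2',
    '6, 16, 18, 29, 37, Powerball: 24, Power Play: 2',
);

my (@common, @powerball);
for (@lines) {
    # The five main numbers go into @draw, the Powerball number into $pb.
    my (@draw, $pb);
    (@draw[0 .. 4], $pb) = /(\d+), (\d+), (\d+), (\d+), (\d+), Powerball: (\d+)/
        or next;    # no match: push nothing for this line
    push @common,    @draw;
    push @powerball, $pb;
}

print "common: @common\npowerball: @powerball\n";
```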
--
Jonathan "Dataweaver" Lang
| |
| Dr.Ruud 2007-11-10, 10:00 pm |
| "Jonathan Lang" wrote:
> while (<DATA> ) {
> ($a[0], $a[1], $a[2], $a[3], $a[4], $b) = /(\d+), (\d+), (\d+),
> (\d+), (\d+), Powerball: (\d+)/;
> push @common, @a; push @powerball, $b;
> }
A slightly different way to do that is:
while (<DATA> ) {
if (my @numbers =
/(\d+), (\d+), (\d+), (\d+), (\d+), Powerball: (\d+)/) {
push @common, @numbers[0..4];
push @powerball, $numbers[5];
}
else {
...
}
}
--
Affijn, Ruud
"Gewoon is een tijger."
| |
| Omega -1911 2007-11-10, 10:00 pm |
| Thank you both Dr.Ruud & Jonathan Lang. I will give both examples a
try later today and let you know how it all turns out.
| |
| John W . Krahn 2007-11-11, 10:00 pm |
| On Saturday 10 November 2007 06:39, Dr.Ruud wrote:
> "Jonathan Lang" wrote:
>
> A slightly different way to do that, is:
>
> while (<DATA> ) {
> if (my @numbers =
> /(\d+), (\d+), (\d+), (\d+), (\d+), Powerball: (\d+)/) {
Another way to do that:
/Powerball:/ and my @numbers = /\d+/g;
> push @common, @numbers[0..4];
> push @powerball, $numbers[5];
> }
> else {
> ...
> }
> }
John
--
use Perl;
program
fulfillment
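With no capture groups, a //g match in list context returns every match, so /\d+/g collects all digit runs on the line, including the Power Play number as a seventh element. A quick sketch comparing it with the capture-group version on one of the sample lines (variable names are illustrative):

```perl
use strict;
use warnings;

my $line = '22, 29, 35, 46, 52, Powerball: 2, Power Play: 5';

# No capture groups: list-context //g returns every digit run on the line.
my @all = $line =~ /\d+/g;

# Capture groups: only the six captured numbers are returned.
my @six = $line =~ /(\d+), (\d+), (\d+), (\d+), (\d+), Powerball: (\d+)/;

print scalar(@all), " numbers: @all\n";
print scalar(@six), " numbers: @six\n";
```

Since the loop only looks at @numbers[0..4] and $numbers[5], the extra element is harmless, provided no other numbers appear earlier on the line.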
| |
| Dr.Ruud 2007-11-11, 10:00 pm |
| John W . Krahn wrote:
> Dr.Ruud:
>
> Another way to do that:
>
> /Powerball:/ and my @numbers = /\d+/g;
>
>
I wouldn't use such a conditional "my".
So maybe you meant it more like:
if ( /Powerball:/ ) {
if ( (my @numbers = /\d+/g) >= 6 ) {
push @common, @numbers[0..4];
push @powerball, $numbers[5];
}
else {
...
}
}
else {
...
}
For example:
#!/usr/bin/perl
use strict;
use warnings;
{ local ($", $\) = (", ", "\n");
my @common;
my @powerball;
while (<DATA> ) {
if ( /Powerball:/ ) {
if ( (my @numbers = /\b\d+\b/g) > 5 ) {
push @common, @numbers[0..4];
push @powerball, $numbers[5];
}
else {
print << "EOS";
*********
* ERROR * parsing input line $.
*********
EOS
}
}
else {
# do nothing
}
}
print "common : @common";
print "powerball : @powerball";
}
__DATA__
abc 01 def 02 ghi 03 ijk 04 lmn 05 Powerbalx: 06 xyz
abc 11 def 12 ghi 13 ijk 14 lmn 15 Powerball: 16 xyz
abc 21 def 22 ghi 23 ijk 24 lmn 25 Powerball: X6 xyz
abc 31 def 32 ghi 33 ijk 34 lmn 35 Powerball: 36 xyz
test
abc 41 def 42 ghi 43 ijk 44 lmn 45 Powerball: 46.3 xyz
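The \b anchors are what make the "X6" line fail the count check: without them, /\d+/g would pull the 6 out of "X6". They do not reject "46.3", though, which still yields 46 and 3. A small sketch comparing the two patterns on those two test lines:

```perl
use strict;
use warnings;

# Two of the test lines from the example above.
my $x6  = 'abc 21 def 22 ghi 23 ijk 24 lmn 25 Powerball: X6 xyz';
my $dec = 'abc 41 def 42 ghi 43 ijk 44 lmn 45 Powerball: 46.3 xyz';

my @plain_x6 = $x6  =~ /\d+/g;        # also picks up the 6 in "X6"
my @bound_x6 = $x6  =~ /\b\d+\b/g;    # skips "X6": no word boundary between X and 6
my @bound_dc = $dec =~ /\b\d+\b/g;    # "46.3" still gives 46 and 3

print "plain X6:     @plain_x6\n";
print "bounded X6:   @bound_x6\n";
print "bounded 46.3: @bound_dc\n";
```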
--
Affijn, Ruud
"Gewoon is een tijger."
| |
| Rob Dixon 2007-11-11, 10:00 pm |
| Dr.Ruud wrote:
> John W . Krahn wrote:
>
>
> I wouldn't use such a conditional "my".
>
> So maybe you meant it more like:
>
> if ( /Powerball:/ ) {
> if ( (my @numbers = /\d+/g) >= 5 ) {
> push @common, @numbers[0..4];
> push @powerball, $numbers[5];
> }
> else {
> ....
> }
> }
> else {
> ...
> }
There is no conditional 'my': it is a declaration. I believe John was
suggesting a replacement just for your conditional expression:
if (/Powerball:/ and my @numbers = /\d+/g) {
push @common, @numbers[0..4];
push @powerball, $numbers[5];
}
else {
:
}
which isn't equivalent to yours (it simply makes sure that the
record contains 'Powerball:' and at least one digit), but I'm sure it is
adequate. My own solution didn't even do this much checking, since I
read the OP as saying that all irrelevant data records had been removed.
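To make that difference concrete: the combined condition only checks for "Powerball:" and at least one number, while a count check also makes sure there are enough numbers to fill @numbers[0..5]. A sketch with a made-up malformed record; loose_ok and strict_ok are hypothetical names, and @numbers is declared on its own line to sidestep the conditional-my question:

```perl
use strict;
use warnings;

# Looser check: "Powerball:" plus at least one number.
sub loose_ok {
    my ($line) = @_;
    my @numbers;
    return ($line =~ /Powerball:/ and @numbers = $line =~ /\d+/g) ? 1 : 0;
}

# Stricter check: at least six numbers, so $numbers[5] is defined.
sub strict_ok {
    my ($line) = @_;
    my @numbers;
    return ($line =~ /Powerball:/ and (@numbers = $line =~ /\d+/g) >= 6) ? 1 : 0;
}

# The second line is hypothetical: it mentions Powerball but has one number.
for my $line ('22, 29, 35, 46, 52, Powerball: 2, Power Play: 5', 'Powerball: 7') {
    printf "%-50s loose=%d strict=%d\n", $line, loose_ok($line), strict_ok($line);
}
```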
Rob
| |
| Omega -1911 2007-11-11, 10:00 pm |
| > which isn't an equivalent to yours - it simply makes sure that the
> record contains 'Powerball:' and at least one digit - but I'm sure it is
> adequate. My own solution didn't even do this much checking, since I
> read the OP as saying that all irrelevant data records had been removed.
I appreciate the help, and I do understand the examples, but when I
ran Dr. Ruud's example, I got weird data in the common array (notice
the number 440):
common : 120, 10, 07, 440, 6, 120, 7, 07, 440, 22, 120, 3, 07, 440, 1,
120, 31, 07, 440, 6, 120, 27, 07, 440, 13, 120, 24, 07, 440, 10, 120,
20, 07, 440, 10, 120, 17, 07, 440, 14, 120, 13, 07, 440, 21, 120, 10,
07, 440, 12, 120, 6, 07, 440, 8, 120, 3, 07, 440, 2, 120, 29, 07, 440,
31, 120, 26, 07, 440, 25, 120, 22, 07, 440, 4, 120, 19, 07, 440, 20,
120, 15, 07, 440, 13, 120, 12, 07, 440, 5, 120, 8, 07, 440, 7, 120, 5,
07, 440, 11, 120, 1, 07, 440, 12, 120, 29, 07, 440, 13, 120, 25, 07,
440, 2, 120, 22, 07, 440, 12, 120, 18, 07, 440, 12, 120, 15, 07, 440,
19, 120, 11, 07, 440, 1, 120, 8, 07, 440, 9, 120, 4, 07, 440, 2, 120,
1, 07, 440, 9, 120, 28, 07, 440, 15, 120, 25, 07, 440, 28, 120, 21,
07, 440, 14, 120, 18, 07, 440, 3, 120, 14, 07, 440, 1, 120, 11, 07,
440, 8, 120, 7, 07, 440, 15, 120, 4, 07, 440, 1, 120, 30, 07, 440, 24,
120, 27, 07, 440, 9, 120, 23, 07, 440, 14, 120, 20, 07, 440, 23, 120,
16, 07, 440, 4, 120, 13, 07, 440, 10, 120, 9, 07, 440, 7, 120, 6, 07,
440, 5, 120, 2, 07, 440, 2, 120, 30, 07, 440, 7, 120, 26, 07, 440, 1,
120, 23, 07, 440, 3, 120, 19, 07, 440, 3, 120, 16, 07, 440, 6, 120,
12, 07, 440, 30, 120, 9, 07, 440, 2, 120, 5, 07, 440, 13, 120, 2, 07,
440, 1, 120, 28, 07, 440, 16, 120, 25, 07, 440, 12, 120, 21, 07, 440,
22, 120, 18, 07, 440, 6, 120, 14, 07, 440, 12, 120, 11, 07, 440, 6,
120, 7, 07, 440, 2, 120, 4, 07, 440, 19, 120, 31, 07, 440, 2, 120, 28,
07, 440, 6, 120, 24, 07, 440, 10, 120, 21, 07, 440, 16, 120, 17, 07,
440, 7, 120, 14, 07, 440, 4, 120, 10, 07, 440, 14, 120, 7, 07, 440,
13, 120, 3, 07, 440, 1, 120, 28, 07, 440, 13, 120, 24, 07, 440, 36,
120, 21, 07, 440, 2, 120, 17, 07, 440, 1, 120, 14, 07, 440, 3, 120,
10, 07, 440, 2, 120, 7, 07, 440, 4, 120, 3, 07, 440, 12, 120, 31, 07,
440, 2, 120, 27, 07, 440, 10, 120, 24, 07, 440, 9, 120, 20, 07, 440,
1, 120, 17, 07, 440, 16, 120, 13, 07, 440, 1, 120, 10, 07, 440, 36,
120, 6, 07, 440, 1, 120, 3, 07, 440, 10, 120, 30, 06, 440, 9, 120, 27,
06, 440, 14, 120, 23, 06, 440, 8, 120, 20, 06, 440, 1, 120, 16, 06,
440, 5, 120, 13, 06, 440, 19, 120, 9, 06, 440, 19, 120, 6, 06, 440, 7,
120, 2, 06, 440, 17, 120, 29, 06, 440, 2, 120, 25, 06, 440, 5, 120,
22, 06, 440, 22, 120, 18, 06, 440, 1, 120, 15, 06, 440, 11, 120, 11,
06, 440, 35
powerball : 22, 29, 31, 16, 25, 11, 11, 15, 30, 16, 30, 4,
33, 27, 9, 25, 16, 24, 12, 20, 19, 16, 8, 37, 15, 22, 10, 16, 23, 16,
19, 35, 30, 9, 21, 20, 21, 2, 38, 11, 15, 31, 8, 13, 10, 9, 22, 23, 5,
11, 19, 7, 44, 13, 21, 8, 22, 13, 26, 10, 21, 15, 28, 30, 5, 20, 38,
24, 17, 27, 18, 16, 5, 20, 38, 8, 15, 26, 11, 22, 13, 15, 19, 19, 5,
35, 21, 42, 24, 12, 23, 16, 27, 6, 17, 32, 22, 34, 34, 8, 18, 32, 8,
28, 38
BUT when I ran his other example (below), everything worked, as did
the other examples you all supplied:
while (<DATA> ) {
if (my @numbers =
/(\d+), (\d+), (\d+), (\d+), (\d+), Powerball: (\d+)/) {
push @common, @numbers[0..4];
push @powerball, $numbers[5];
}
else {
...
}
}
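One likely source of the extra values: /\d+/g (and the \b\d+\b variant) takes the first five digit runs on the line, whatever they are, while the working version only captures the five numbers immediately before "Powerball:". If the real lines carry other numbers ahead of the draw, such as a date, those land in @common. A sketch with a made-up line, since the actual page content isn't shown in the thread:

```perl
use strict;
use warnings;

# Hypothetical line: assumes a date precedes the draw on the real page.
my $line = '11/10/07 22, 29, 35, 46, 52, Powerball: 2, Power Play: 5';

# Every digit run: the date fields come first and end up in "common".
my @all        = $line =~ /\d+/g;
my @common_all = @all[0 .. 4];

# The explicit pattern only matches the five numbers right before "Powerball:".
my @draw = $line =~ /(\d+), (\d+), (\d+), (\d+), (\d+), Powerball: (\d+)/;

print "with /\\d+/g:   @common_all\n";
print "with captures: @draw[0 .. 4] (Powerball $draw[5])\n";
```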
| |
| Dr.Ruud 2007-11-11, 10:00 pm |
| Rob Dixon wrote:
> Dr.Ruud:
>
> There is no conditional 'my': it is a de[c]laration.
I call it a conditional "my". A "my" can be just a declaration, or a
declaration and an initialisation. In this case only the initialisation
is conditional.
A "my" in a condition has special behaviour if the condition is constant
false: "0 and my $var;" creates a static $var.
As I wrote: *I* wouldn't use *such* a conditional "my". I put the
declaration on its own line, just before the conditional initialisation.
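A sketch of that style, with the declaration unconditional and only the initialisation conditional; the loop and the two sample lines are illustrative:

```perl
use strict;
use warnings;

my (@common, @powerball);
for my $line ('22, 29, 35, 46, 52, Powerball: 2, Power Play: 5', 'test') {
    my @numbers;    # declared unconditionally, fresh on every iteration
    # Only the assignment depends on the condition.
    if ($line =~ /Powerball:/ and (@numbers = $line =~ /\d+/g) >= 6) {
        push @common,    @numbers[0 .. 4];
        push @powerball, $numbers[5];
    }
}
print "common: @common\npowerball: @powerball\n";
```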
I sometimes use a conditional my if I want the static behaviour, but not
in production code. Perl 5.10 has "state" variables.
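The Perl 5.10 feature is the state declaration, enabled with use feature 'state' (or use 5.010). A minimal sketch; the sub name next_id is made up. The 0 and my $var trick was later deprecated and is a fatal error as of Perl 5.30, so state is the supported way to get this behaviour on a modern perl.

```perl
use strict;
use warnings;
use feature 'state';

sub next_id {
    state $id = 0;    # initialised once, keeps its value between calls
    return ++$id;
}

my @ids = map { next_id() } 1 .. 3;
print "@ids\n";
```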
--
Affijn, Ruud
"Gewoon is een tijger."
| |
| Jenda Krynicky 2007-11-12, 7:00 pm |
| From: "Dr.Ruud" <rvtol+news@isolution.nl>
> Rob Dixon wrote:
>
>
> I call it a conditional "my". A "my" can be just a declaration, or a
> declaration and an initialisation. In this case only the initialisation
> is conditional.
>
> A "my" in a condition has special behaviour if the condition is constant
> false: "0 and my $var;" creates a static $var.
>
> As I wrote: *I* wouldn't use *such* a conditional "my". I put the
> declaration on its own line, just before the conditional initialisation.
>
> I sometimes use a conditional my if I want the static behaviour, but not
> in production code. Perl 5.10 has "static".
Perl 5.x has
{
my $static;
sub foo {
$static++;
...
}
}
which even lets you create variables that are shared by several
subroutines.
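The bare-block pattern, fleshed out so that two subs share one private counter; bump and reset_count are illustrative names. One caveat: an initial value assigned in the block is only set when execution reaches the block, so subs defined this way should not be called from code that runs earlier in the file.

```perl
use strict;
use warnings;

{
    my $count = 0;    # visible only to the two subs below

    sub bump        { return ++$count }
    sub reset_count { $count = 0; return }
}

bump() for 1 .. 3;
my $after_three = bump();    # 4
reset_count();
my $after_reset = bump();    # 1
print "$after_three $after_reset\n";
```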
I do understand you might want to use my() like this:
open my $FH, '>', $filename or die $^E;
or
if (my $foo = foo($x, $y, $z) and my $bar = bar(1,2,3)) {
# use $foo and $bar here
}
but I'd definitely never ever do
condition and my $x = blah();
and if
0 and my $x;
creates a static $x, I call it a bug.
Jenda
===== Jenda@Krynicky.cz === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed
to get drunk and croon as much as they like.
-- Terry Pratchett in Sourcery
| |
| Dr.Ruud 2007-11-13, 7:01 pm |
| "Jenda Krynicky" wrote:
> if
>
> 0 and my $x;
>
> creates a static $x I call it a bug.
It's called a feature.
--
Affijn, Ruud
"Gewoon is een tijger."
| |
| Dr.Ruud 2007-11-13, 7:01 pm |
| "Jenda Krynicky" wrote:
> I'd definitely never ever do
>
> condition and my $x = blah();
That is what I said. It is "technically" OK to use it with a condition
that cannot be decided at compile time, but I still recommend against
using it.
--
Affijn, Ruud
"Gewoon is een tijger."
| |
| Dr.Ruud 2007-11-13, 7:01 pm |
| "Jenda Krynicky" wrote:
> {
> my $static;
> sub foo {
> $static++;
> ...
> }
> }
There (the first declared version of) the variable $static is part of
the environment of foo(). Don't mistake that for staticness.
In Perl 5.8.8 you can enforce $static to be static, like this:
{
0 and my $static;
sub foo {
$static++;
...
}
}
That ugly my() can only occur once, but it still makes the variable
lexical.
There is just no better way to set up a real static variable in Perl
5.8.8.
Check out the differences between the following two "academic" examples:
$ perl -le'
for (7..9)
{
my $static = $_; # declared and initialised 3 times
sub foo {
$static++; # uses the first of the declared $static's
print " foo:$static";
}
foo() for 0..1;
print "for:$static";
}
'
foo:8
foo:9
for:9
foo:10
foo:11
for:8 (would be undef without the initialisation)
foo:12
foo:13
for:9 (would be undef without the initialisation)
$ perl -le'
for (7..9)
{
0 and my $static = $_; # declared *once*,
# *never* initialised
sub foo {
$static++;
print " foo:$static";
}
foo() for 0..1;
print "for:$static";
}
'
foo:1
foo:2
for:2
foo:3
foo:4
for:4
foo:5
foo:6
for:6
--
Affijn, Ruud
"Gewoon is een tijger."
| |
| Jenda Krynicky 2007-11-19, 7:59 am |
| From: "Dr.Ruud" <rvtol+news@isolution.nl>
> "Jenda Krynicky" wrote:
>
> There (the first declared version of) the variable $static is part of
> the environment of foo(). Don't mistake that for staticness.
Maybe I don't know what "staticness" means, then. I thought a
static variable is one that is private to a function but keeps its
value between the function's invocations. How do you define
staticness?
> In Perl 5.8.8 you can enforce $static to be static, like this:
>
> {
> 0 and my $static;
> sub foo {
> $static++;
> ...
> }
> }
>
> That ugly my() can only occur once, ut it still makes the variable
> lexical.
> There is just no better way to set up a real static variable in Perl
> 5.8.8.
>
>
> Check out the differences between the following two "academic" examples:
>
> $ perl -le'
> for (7..9)
> {
> my $static = $_; # declared and initialised 3 times
>
> sub foo {
> $static++; # uses the first of the declared $static's
> print " foo:$static";
> }
> foo() for 0..1;
> print "for:$static";
> }
> '
With -w you get a "Variable $static will not stay shared" warning.
And rightly so. You are doing something you are not supposed to do.
A named subroutine inside another subroutine or a loop is a red flag.
Something that (unless found in an obfuscation) suggests that the
author of the code misunderstood something. It's yet another "please
don't do this".
$ perl -le'
for (7..9)
{
my $static = $_; # declared and initialised 3 times
my $foo = sub {
$static++; # each closure has its own $static from this iteration
print " foo:$static";
};
$foo->() for 0..1;
print "for:$static";
}
'
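With an anonymous sub, each pass through the loop builds a new closure over that pass's $static, so nothing is shared between iterations. A sketch that collects the values instead of printing them, to make the result easy to check:

```perl
use strict;
use warnings;

my @log;
for (7 .. 9) {
    my $static = $_;            # a fresh variable on every iteration
    my $foo = sub {
        $static++;              # this closure's own $static
        push @log, "foo:$static";
    };
    $foo->() for 0 .. 1;
    push @log, "for:$static";
}
print "$_\n" for @log;
```

Each iteration starts from its own value (7, 8, 9), so the "for:" lines show 9, 10 and 11, unlike the named-sub version where foo() keeps incrementing the first $static.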
Jenda
===== Jenda@Krynicky.cz === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed
to get drunk and croon as much as they like.
-- Terry Pratchett in Sourcery
| |
| Matthew Whipple 2007-11-21, 10:01 pm |
| Omega -1911 wrote:
>
>
> I appreciate the help as I am understanding the examples, but when I
> ran Dr. Rudd's example, I had weird data in the common array (Notice
> the number 440):
>
>
I'd guess the first example didn't include the conditional that checks the
data format, and there were some 440 HTTP errors while retrieving the
data (obviously hinging on whether that applies to the data source,
especially since it was initially specified as a web page rather than a
web site).