Code Comments

Help needed for perl rookie
I am new to perl, but so far have had decent success in writing/modifying
code to do what I want to do. However I am stuck trying to modify the
following code. I am sure the solution is quite simple, but I can't
completely figure out what this piece of code does. I think it is just
matching up a data pattern but this is an area I am unfamiliar with.

All I want to do is change the format of the data file from example #1 to
example #2 and need this section of code to work with the new format. I
would be grateful for any help provided in understanding what this piece of
code does and suggestions on the modification needed.

If more information or a larger chunk of the code is needed please let me
know and I will provide.

EXAMPLE #1 - Current format of data file:
0000000050 20041227 0000000003 'my-page.shtml'
0000000054 20041227 0000000004 'another-page.shtml'
0000000020 20041227 0000000003 'yet-another-page.shtml'

EXAMPLE #2 - New format of data file:
0000000050|20041227|0000000003|my-page.shtml
0000000054|20041227|0000000004|another-page.shtml
0000000020|20041227|0000000003|yet-another-page.shtml

Current code that reads original data format:

&LockOpen (COUNT,"$AccessFile");
$location = tell COUNT;
while ($line = <COUNT> ) {
if (($acc,$day,$dayacc,$uri) = ($line =~ /^(\d+) (\d+) (\d+) '(\S+)'$/)) {
if ($uri eq $doc_uri) {
last;
}
}
last if ($uri eq $doc_uri);
$location = tell COUNT;
$acc = 0;
$dayacc = 0;
}


Thanks!

PM


GRLCOPM
12-28-04 01:56 AM


Re: Help needed for perl rookie
GRLCOPM wrote:
> I am new to perl, but so far have had decent success in writing/modifying
> code to do what I want to do. However I am stuck trying to modify the
> following code. I am sure the solution is quite simple, but I can't
> completely figure out what this piece of code does. I think it is just
> matching up a data pattern but this is an area I am unfamiliar with.
>
> All I want to do is change the format of the data file from example #1 to
> example #2 and need this section of code to work with the new format. I
> would be grateful for any help provided in understanding what this piece of
> code does and suggestions on the modification needed.
>
> If more information or a larger chunk of the code is needed please let me
> know and I will provide.
>
> EXAMPLE #1 - Current format of data file:
> 0000000050 20041227 0000000003 'my-page.shtml'
> 0000000054 20041227 0000000004 'another-page.shtml'
> 0000000020 20041227 0000000003 'yet-another-page.shtml'
>
> EXAMPLE #2 - New format of data file:
> 0000000050|20041227|0000000003|my-page.shtml
> 0000000054|20041227|0000000004|another-page.shtml
> 0000000020|20041227|0000000003|yet-another-page.shtml
>

looks like you are replacing the spaces after the numbers with a '|'
and removing the single quotes.

s/(\d+)\s/$1|/g;
s/'//g;

maybe
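
To try those two substitutions on the sample data, here is a minimal sketch. The convert_line helper is a name made up for this illustration (it is not part of the original script), and it assumes every line looks like the EXAMPLE #1 lines quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): convert one
# old-format line to the new pipe-delimited format.
sub convert_line {
    my ($line) = @_;
    $line =~ s/(\d+)\s/$1|/g;   # space after each run of digits becomes '|'
    $line =~ s/'//g;            # drop the single quotes around the page name
    return $line;
}

my @old_lines = (
    "0000000050 20041227 0000000003 'my-page.shtml'\n",
    "0000000054 20041227 0000000004 'another-page.shtml'\n",
    "0000000020 20041227 0000000003 'yet-another-page.shtml'\n",
);

print convert_line($_) for @old_lines;
```

For the three sample lines this prints the EXAMPLE #2 lines. A page name containing an apostrophe would lose it too, so check the real data before relying on this.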

> Current code that reads original data format:
>
> &LockOpen (COUNT,"$AccessFile");
> $location = tell COUNT;
> while ($line = <COUNT> ) {
>   if (($acc,$day,$dayacc,$uri) = ($line =~ /^(\d+) (\d+) (\d+) '(\S+)'$/)) {
>         if ($uri eq $doc_uri) {
>             last;
>         }
>     }
>     last if ($uri eq $doc_uri);
>     $location = tell COUNT;
>     $acc = 0;
>     $dayacc = 0;
> }
>
>
> Thanks!
>
> PM


ioneabu@yahoo.com
12-28-04 01:56 AM


Re: Help needed for perl rookie

> From: ioneabu@yahoo.com
> Organization: http://groups.google.com
> Newsgroups: comp.lang.perl.misc
> Date: 27 Dec 2004 14:17:37 -0800
> Subject: Re: Help needed for perl rookie
>
>
> looks like you are replacing the spaces after the numbers with a '|'
> and removing the single quotes.
>
> s/(\d+)\s/$1|/g;
> s/'//g;
>

Yes, that is how I have changed the format of the data file. Replaced the
spaces with | and removed the single quotes from the last item on the line.

Can someone please explain what the following line of code does and what
the replacement would be?

if (($acc,$day,$dayacc,$uri) = ($line =~ /^(\d+) (\d+) (\d+) '(\S+)'$/)) {

Thanks

Patrick


GRLCOPM
12-28-04 01:56 AM


Re: Help needed for perl rookie
In article <BDF5CC8C.30353%grlcopm@pacbell.net>, GRLCOPM
<grlcopm@pacbell.net> wrote:
 
>
> Yes, that is how I have changed the format of the data file. Replaced the
> spaces with | and removed the single quotes from the last item on the line.
>
> Can someone please explain what the following line of code does and what
> the replacement would be?
>
> if (($acc,$day,$dayacc,$uri) = ($line =~ /^(\d+) (\d+) (\d+) '(\S+)'$/)) {

Starting from the inner =~ operator and working outwards:

The /.../ is a regular expression that will match a string if starting
at the beginning of the string (^) there occurs one or more digits 0-9
(\d+), a space character, one or more digits, a space, one or more
digits, a space, a single quote, one or more non-whitespace characters
(\S+), another single quote, followed by the end of the line.

If the contents of $line match the above expression, as all of your
examples do, then the substrings of $line that match each of the
parenthesized sub-patterns in the regular expression will be put into
a list and assigned to the variables $acc, $day, $dayacc, and $uri in
the order in which they occur in $line. The if statement evaluates the
list assignment in scalar context, which yields the number of elements
in that list: four when the line matches and zero when it does not. So
the statements within the if statement are executed only if the line
matches.
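
The value that the if statement tests can be printed directly. This is only a sketch: assignment_count is a name invented for the illustration, and the two input lines are taken from the examples in the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: return the value the if statement would test,
# i.e. the list assignment evaluated in scalar context.
sub assignment_count {
    my ($line) = @_;
    my $count = ( my ($acc, $day, $dayacc, $uri) =
                  ( $line =~ /^(\d+) (\d+) (\d+) '(\S+)'$/ ) );
    return $count;
}

# Old-format line: the pattern matches, so four captures are assigned.
print "old format: ",
      assignment_count("0000000050 20041227 0000000003 'my-page.shtml'\n"), "\n";

# New-format line: the pattern fails, the list is empty, the count is 0.
print "new format: ",
      assignment_count("0000000050|20041227|0000000003|my-page.shtml\n"), "\n";
```

This prints 4 for the old-format line and 0 for the new-format line, which is why the unmodified pattern never enters the if block once the data file has been converted.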

You should replace this statement with the following to match your
modified data lines (untested):

if( ($acc,$day,$dayacc,$uri) = ($line =~ /^(\d+)\|(\d+)\|(\d+)\|(\S+)$/) ) {

Note that the | character is special in regular expressions and needs
to be escaped.
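
To see what goes wrong without the backslash, here is a small sketch comparing the two patterns on one of the new-format lines. The show helper is made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: render a capture list, marking undefined captures.
sub show {
    return join ",", map { defined($_) ? $_ : "undef" } @_;
}

my $line = "0000000050|20041227|0000000003|my-page.shtml";

# Unescaped: | means "or", so the first alternative ^(\d+) matches on its
# own and the other three capture groups never take part in the match.
my @unescaped = $line =~ /^(\d+)|(\d+)|(\d+)|(\S+)$/;

# Escaped: \| matches a literal pipe character between the fields.
my @escaped = $line =~ /^(\d+)\|(\d+)\|(\d+)\|(\S+)$/;

print show(@unescaped), "\n";
print show(@escaped), "\n";
```

The first print shows 0000000050,undef,undef,undef and the second shows all four fields. With the unescaped pattern the match still succeeds, so the if block runs, but $uri is undefined and the comparison with $doc_uri cannot succeed.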

See the documentation that comes with your perl installation to learn
more about regular expressions:

perldoc perlre
perldoc perlretut
perldoc perlrequick


----== Posted via mcse.ms - Unlimited-Uncensored-Secure Usenet News==----
http://www.mcse.ms The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= East/West-Coast Server Farms - Total Privacy via Encryption =---

Jim Gibson
12-28-04 01:56 AM


Re: Help needed for perl rookie
GRLCOPM wrote:

> I am new to perl, but so far have had decent success in writing/modifying
> code to do what I want to do. However I am stuck trying to modify the
> following code. I am sure the solution is quite simple, but I can't
> completely figure out what this piece of code does. I think it is just
> matching up a data pattern but this is an area I am unfamiliar with.
>
> All I want to do is change the format of the data file from example #1 to
> example #2 and need this section of code to work with the new format. I
> would be grateful for any help provided in understanding what this piece of
> code does and suggestions on the modification needed.
>
> If more information or a larger chunk of the code is needed please let me
> know and I will provide.
>
> EXAMPLE #1 - Current format of data file:
> 0000000050 20041227 0000000003 'my-page.shtml'
> 0000000054 20041227 0000000004 'another-page.shtml'
> 0000000020 20041227 0000000003 'yet-another-page.shtml'
>
> EXAMPLE #2 - New format of data file:
> 0000000050|20041227|0000000003|my-page.shtml
> 0000000054|20041227|0000000004|another-page.shtml
> 0000000020|20041227|0000000003|yet-another-page.shtml

Your example #2 is in "pipe-delimited" form -- the best way to
split it apart is with the split() function, as in:

($acc,$day,$dayacc,$uri)=split /\|/,$line;

For that, using a regexp ("regular expression") is a bit
cumbersome and slow in comparison.  See:

perldoc -f split
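
One thing to watch when swapping split in for the regexp: a line read from a file still ends in a newline, and split leaves that newline on the last field. The sketch below shows the difference. parse_record is a name invented for this illustration, and the check for exactly four fields is an assumption about what counts as a valid line.

```perl
use strict;
use warnings;

# Hypothetical helper: split one pipe-delimited record into its fields.
# Returns an empty list when the line does not have exactly four fields.
sub parse_record {
    my ($line) = @_;
    chomp $line;                          # remove the trailing newline first
    my @fields = split /\|/, $line, -1;   # -1 keeps trailing empty fields
    return @fields == 4 ? @fields : ();
}

my $raw = "0000000050|20041227|0000000003|my-page.shtml\n";

# Without chomp, the last field still carries the newline.
my $uri_raw = (split /\|/, $raw)[3];
print $uri_raw eq 'my-page.shtml' ? "raw: equal\n" : "raw: not equal\n";

# With chomp, the comparison behaves as expected.
my ($acc, $day, $dayacc, $uri) = parse_record($raw);
print $uri eq 'my-page.shtml' ? "chomped: equal\n" : "chomped: not equal\n";
```

This prints "raw: not equal" and then "chomped: equal". The old regexp did not have this problem because $ allows a trailing newline and (\S+) cannot capture one.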

...

>   if (($acc,$day,$dayacc,$uri) = ($line =~ /^(\d+) (\d+) (\d+) '(\S+)'$/)) {

Regarding your question of what the regexp does, please see the
docs that come with Perl:

perldoc perlre
perldoc perlretut
perldoc perlrequick
perldoc perlreref
perldoc perlop

and probably some more.  Hint:  Type the above at a command
prompt on your computer.  You might also want to check out:

http://www.perlpod.com
http://learn.perl.org

or a good book on the subject like "Mastering Regular Expressions".
...
> PM

--
Bob Walton
Email: http://bwalton.com/cgi-bin/emailbob.pl


Bob Walton
12-28-04 08:56 AM


Re: Help needed for perl rookie
GRLCOPM wrote:
> I am new to perl,


For your future reference:  Read the posting guidelines for this Usenet
news group:  http://mail.augustmail.com/~tadmc/clpmisc.shtml

As the guidelines advise, put your real Perl subject in the "Subject"
line.  There's no need to apologize for being a rookie provided you make
a genuine effort to solve the problem yourself prior to posting.  But
the fact that you're new to Perl, should you wish to include it, is best
included in the body of your posting.  Don't take up valuable Net real
estate by wasting it on the Subject line.  HTH.

Jim Keenan

Jim Keenan
12-28-04 08:56 AM


Re: Help needed for perl rookie
> From: Bob Walton <see_sig@invalid>
> Organization: Newsfeed.com http://www.mcse.ms 100,000+ UNCENSORED
> Newsgroups.
> Newsgroups: comp.lang.perl.misc
> Date: Mon, 27 Dec 2004 22:34:20 -0500
> Subject: Re: Help needed for perl rookie
>
> GRLCOPM wrote:
> 
>
> Your example #2 is in "pipe-delimited" form -- the best way to
> split it apart is with the split() function, as in:
>
> ($acc,$day,$dayacc,$uri)=split /\|/,$line;
>
> 
>
>
> --
> Bob Walton
> Email: http://bwalton.com/cgi-bin/emailbob.pl

Thanks Bob,

I am familiar with the split function and have been looking for a solution
that utilizes it, but the line you provided does not seem to work as a
replacement for the line I included. I have been looking through
documentation including the references you provided, but I am still having a
hard time with this. I guess what I am looking for is someone to break down
what is happening in this line so that I can modify it to work as I need it
to. Here is the section of code in question.

&LockOpen (COUNT,"$AccessFile");
$location = tell COUNT;
while ($line = <COUNT> ) {
if (($acc,$day,$dayacc,$uri) = ($line =~ /^(\d+) (\d+) (\d+) '(\S+)'$/)) {
if ($uri eq $doc_uri) {
last;
}
}
last if ($uri eq $doc_uri);
$location = tell COUNT;
$acc = 0;
$dayacc = 0;
}

And here is the specific line:

if (($acc,$day,$dayacc,$uri) = ($line =~ /^(\d+) (\d+) (\d+) '(\S+)'$/)) {

It reads the data file that is in this format:

EXAMPLE #1 - Current format of data file:
0000000050 20041227 0000000003 'my-page.shtml'
0000000054 20041227 0000000004 'another-page.shtml'
0000000020 20041227 0000000003 'yet-another-page.shtml'

I need it to perform the same function on a data file in this format:

EXAMPLE #2 - New format of data file:
0000000050|20041227|0000000003|my-page.shtml
0000000054|20041227|0000000004|another-page.shtml
0000000020|20041227|0000000003|yet-another-page.shtml

Based on the way this program works, my guess is that $uri is being compared
with the data inside the quotes '(\S+)' taken from the current line of the
data file. Right?

I appreciate your help and any further advice you or anyone else can offer.

- Patrick


GRLCOPM
12-28-04 08:57 PM


Re: Help needed for perl rookie
> From: Jim Keenan <jkeen_via_google@yahoo.com>
> Newsgroups: comp.lang.perl.misc
> Date: Tue, 28 Dec 2004 03:59:08 GMT
> Subject: Re: Help needed for perl rookie
>
> GRLCOPM wrote: 
>
>
> For your future reference:  Read the posting guidelines for this Usenet
> news group:  http://mail.augustmail.com/~tadmc/clpmisc.shtml
>
> As the guidelines advise, put your real Perl subject in the "Subject"
> line.  There's no need to apologize for being a rookie provided you make
> a genuine effort to solve the problem yourself prior to posting.  But
> the fact that you're new to Perl, should you wish to include it, is best
> included in the body of your posting.  Don't take up valuable Net real
> estate by wasting it on the Subject line.  HTH.
>
> Jim Keenan

Thanks Jim.

At the time I posted the question I was really lost as to what exactly my
question was... that is why I used the subject line I did.

I apologize to the group and hope you will excuse my breach of etiquette
this time. I will try to be more specific with my subject line in the
future.

-Patrick


GRLCOPM
12-28-04 08:57 PM


Re: Help needed for perl rookie

> From: Jim Gibson <jgibson@mail.arc.nasa.gov>
> Organization: Newsfeed.com http://www.mcse.ms 100,000+ UNCENSORED
> Newsgroups.
> Newsgroups: comp.lang.perl.misc
> Date: Mon, 27 Dec 2004 15:05:04 -0800
> Subject: Re: Help needed for perl rookie
>
> In article <BDF5CC8C.30353%grlcopm@pacbell.net>, GRLCOPM
> <grlcopm@pacbell.net> wrote:
> 
>
> Starting from the inner =~ operator and working outwards:
>

Thanks Jim!

At first I missed your reply this morning so please excuse my re-post of the
question.

The line you provided works exactly as desired and in addition I really
appreciate you taking your time to explain the code. Stumbling around in the
dark I managed to get pretty close, but failed to escape the | char.

Thanks again!

Happy Holidays,

Patrick


GRLCOPM
12-28-04 08:57 PM


Re: Help needed for perl rookie
GRLCOPM wrote:
 
...
 
>
>
> Thanks Bob,
>
> I am familiar with the split function and have been looking for a solution
> that utilizes it, but the line you provided does not seem to work as a
> replacement for the line I included. I have been looking through

Here is an example using split:

use warnings;
use strict;
while(my $line=<DATA> ){
    chomp $line; #remove newline at end of line
    # split on literal | characters; the if is false for an empty line
    if(my($acc,$day,$dayacc,$uri)=split /\|/,$line){
        print "acc=$acc\nday=$day\ndayacc=$dayacc\nuri=$uri\n";
    }
}
__END__
0000000050|20041227|0000000003|my-page.shtml
0000000054|20041227|0000000004|another-page.shtml
0000000020|20041227|0000000003|yet-another-page.shtml

That generates:

D:\junk>perl junk510.pl
acc=0000000050
day=20041227
dayacc=0000000003
uri=my-page.shtml
acc=0000000054
day=20041227
dayacc=0000000004
uri=another-page.shtml
acc=0000000020
day=20041227
dayacc=0000000003
uri=yet-another-page.shtml

D:\junk>

which seems to me to be what you want.  If that isn't what you
want, please describe in full detail exactly what it is you do
want.  Note that your statement "does not seem to work" doesn't
convey much information.  What *exactly* did it do that you
didn't want it to do?  What didn't it do that you did want it to
do?  Did it generate any error messages?  If so, what *exactly*
(copy/pasted, not retyped) were they?

Also note the use of a simplified example code complete with data
(and lacking unrelated obfuscating details) that illustrates the
point and that anyone can copy/paste/execute.  Providing such is
good form in this newsgroup.

> documentation including the references you provided, but I am still having a
> hard time with this. I guess what I am looking for is someone to break down
> what is happening in this line so that I can modify it to work as I need it
> to. Here is the section of code in question.
>
> &LockOpen (COUNT,"$AccessFile");
> $location = tell COUNT;
> while ($line = <COUNT> ) {
>   if (($acc,$day,$dayacc,$uri) = ($line =~ /^(\d+) (\d+) (\d+) '(\S+)'$/)) {
>       if ($uri eq $doc_uri) {
>           last;
>       }
>   }
>   last if ($uri eq $doc_uri);
>   $location = tell COUNT;
>   $acc = 0;
>   $dayacc = 0;
> }
>
> And here is the specific line:
>
> if (($acc,$day,$dayacc,$uri) = ($line =~ /^(\d+) (\d+) (\d+) '(\S+)'$/)) {

OK, in detail:

The if(expression){block} statement tests an expression for a true
value, and if true, it executes the statements in the block (in this
case, another if statement).  Otherwise it does not execute them.
Here the expression is a list assignment from a regular expression
match; evaluated in scalar context, a list assignment yields the
number of elements on its right-hand side, which is zero when the
match fails.  In the case of a pattern match, it is a *very* good
idea to test for the success of the pattern match before using the
purported results, as you are doing in this if statement.

Now, the expression executed is:

($acc,$day,$dayacc,$uri)=($line=~/^(\d+) (\d+) (\d+) '(\S+)'$/)

The left-hand side of the = is a list of four lvalues, which will be
assigned the first four list elements generated by the right-hand
side.  The right-hand side is:

($line=~/^(\d+) (\d+) (\d+) '(\S+)'$/)

which has an unneeded set of parens around it, so:

$line=~/^(\d+) (\d+) (\d+) '(\S+)'$/

which is a pattern-matching expression.  The left-hand side of the
=~ matching operator designates the source of the string to be
matched.  The right-hand side starts with a / , which indicates
to Perl that it is a shortcut for the "m" operator using /'s as
delimiters.  Between the matching /'s then is a regular
expression.  This regular expression contains many metacharacters
(characters with special meaning inside regular expressions).
Specifically:

^ -> start the match with the first character of the string
(anchored match)

(\d+) -> a parenthesized group "captures" the portion of the
string matched by the contents of the parens.  Each capture
generates another element in the list output by the pattern match
(so there will be a four-element list generated by this regexp if
it matches).

\d+ -> The + metacharacter means the regexp element
immediately to the left of the + is repeated one or more times.
So in this case, a "\d" will be repeated one or more times.

\d -> This is a shortcut code for "any digit" (or, in other
words, the character class [0-9]).  It matches any single digit.
Thus, we see that \d+ matches any string of one or more digits.
And (\d+) captures that string of one or more digits on the
output list.

space character -> the space character is not a
metacharacter, and is matched literally.  Since it is not inside
of capturing parentheses, it is not output on the output list.

Three occurrences of "(\d+) " occur, which will match three
strings of digits followed by space characters, and capture the
three strings of digits in the output list.

' -> the apostrophe is not a metacharacter, so it is matched
literally.  It is not captured.

(\S+) -> captures the results of \S repeated one or more
times.  \S is a shortcut code for any non-whitespace character.

' -> is a literal apostrophe

$ -> anchors the trailing end of the match at the end of the
string.  In other words, if the string isn't exhausted at the
point where the $ metacharacter occurs, the match will backtrack
and try an alternative or fail if the alternatives are exhausted.
By default, a trailing newline (like what you've got with your
data) is permitted on the end of the string -- the match will
succeed if everything up to the newline has been matched.

So in English, your regexp will match a string that starts with
three strings of digits, each followed by a single space character,
followed by ' followed by any string of non-whitespace characters
followed by ' followed by the end of the string.  The three strings
of digits and the string of non-whitespace characters will be
captured and, upon match success, will be returned as the output
list of the =~ match operator (and also, BTW, stored in the special
variables $1, $2, $3 and $4; various pieces of the match may also be
assigned to other builtin variables such as $', $`, $&, @+, @-,
etc.).  See the docs for details, particularly perldoc perlvar.
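
The same breakdown can be written into the pattern itself with the /x modifier, which lets a regexp carry whitespace and comments. This is a sketch rather than a drop-in replacement: $old_format and parse_old are names made up for the illustration. Under /x a bare space is ignored, so the literal spaces are written as [ ].

```perl
use strict;
use warnings;

# The original pattern, spread out and commented with the /x modifier.
my $old_format = qr{
    ^        # anchor at the start of the string
    (\d+)    # capture 1: one or more digits
    [ ]      # a literal space
    (\d+)    # capture 2: one or more digits
    [ ]      # a literal space
    (\d+)    # capture 3: one or more digits
    [ ]      # a literal space
    '        # literal apostrophe, not captured
    (\S+)    # capture 4: one or more non-whitespace characters
    '        # literal apostrophe, not captured
    $        # end of string; one trailing newline is allowed
}x;

# Hypothetical helper: return the four captures, or an empty list on failure.
sub parse_old {
    my ($line) = @_;
    my @captures = $line =~ $old_format;
    return @captures;
}

my @fields = parse_old("0000000050 20041227 0000000003 'my-page.shtml'\n");
print join("|", @fields), "\n";
```

For the sample line, which still has its trailing newline, this prints the four captured fields joined with pipes: 0000000050|20041227|0000000003|my-page.shtml.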

>
> It reads the data file that is in this format:
>
> EXAMPLE #1 - Current format of data file:
> 0000000050 20041227 0000000003 'my-page.shtml'
> 0000000054 20041227 0000000004 'another-page.shtml'
> 0000000020 20041227 0000000003 'yet-another-page.shtml'
>
> I need it to perform the same function on a data file in this format:
>
> EXAMPLE #2 - New format of data file:
> 0000000050|20041227|0000000003|my-page.shtml
> 0000000054|20041227|0000000004|another-page.shtml
> 0000000020|20041227|0000000003|yet-another-page.shtml
>

If you insist on a regexp to match the above, try:

if(($acc,$day,$dayacc,$uri)=
($line=~/^(\d+)\|(\d+)\|(\d+)\|(\S+)$/)) {

Note that | is a regexp metacharacter and thus literal instances
of it must be escaped with the \ metacharacter or equivalent.

> Based on the way this program works, my guess is that $uri is being compared
> with the data inside the quotes '(\S+)' taken from the current line of the
> data file. Right?

Yes, if the match succeeds.

>
> I appreciate your help and any further advice you or anyone else can offer.

My advice is:  read and study the documentation that is already
on your computer.  It is wonderful stuff, and is where all the
answers may be found.  And found more quickly than asking on a
newsgroup, where folks are generally not too willing to
regurgitate the docs in specific detail.

>
> - Patrick
>

HTH.
--
Bob Walton
Email: http://bwalton.com/cgi-bin/emailbob.pl


Bob Walton
12-29-04 08:57 AM

