[NEWBIE] newline question
Jan Biel

2004-03-31, 1:34 pm

Hello!

From some tutorials on the web I managed to create a perl script which finds
and replaces certain occurrences in text files via regular expressions.

Then something happened which I cannot really explain, so I hope you can
clarify it for me.

The original perl script looks like this:

-------------------------------
$filein = 'a.txt';
$fileout = 'b.txt';

open(INFO, $filein);
open(INFO2, ">$fileout");

@lines = <INFO>;

grep(s/\n//g,@lines);
grep(s/ab/found/g,@lines);

print INFO2 @lines;

close(INFO);
close(INFO2);
--------------------------------

where a.txt is a file containing:

--------------------------------
a
b
c
--------------------------------

The resulting b.txt contains:

--------------------------------
abc
--------------------------------

So the second regular expression is ignored.

But if I write two perl scripts where each executes only one of the regular
expressions it works with the result:

--------------------------------
foundc
--------------------------------

as expected.

What is the mystery here?

I hope this wasn't too confusing :)
Janbiel

Richard Morse

2004-03-31, 1:34 pm

In article <c4eupc$mqe$1@ariadne.rz.tu-clausthal.de>,
"Jan Biel" <jan.biel@tu-clausthal.de> wrote:

> -------------------------------
> $filein = 'a.txt';
> $fileout = 'b.txt';
>
> open(INFO, $filein);
> open(INFO2, ">$fileout");
>
> @lines = <INFO>;


@lines = ( 'a\n', 'b\n', 'c\n' );

> grep(s/\n//g,@lines);


@lines = ( 'a', 'b', 'c');

> grep(s/ab/found/g,@lines);


At this point, no entry in @lines matched 'ab', so the substitute never
occurs.


Try this:

#!/usr/bin/perl

# always use these next two lines
use strict;
use warnings;

my $filein = 'a.txt';
my $fileout = 'b.txt';

open(my $in, "<", $filein) or die("Can't open $filein: $!");

# slurp all of the data into one string, since you really don't
# care about newline separations
my $data;
{
local $/;
$data = <$in>;
}
close($in);

# remove any newline characters
$data =~ s/\n//g;

# change 'ab' to 'found'
$data =~ s/ab/found/g;

# save the data
open(my $out, ">", $fileout) or die("Couldn't open >$fileout: $!");
print $out $data, "\n";
close($out);

__END__

HTH,
Ricky

--
Pukku
Gunnar Hjalmarsson

2004-03-31, 1:34 pm

Jan Biel wrote:
> The original perl script looks like this:


use strict; # Make Perl help you detect errors
use warnings; # ditto

> $filein = 'a.txt';
> $fileout = 'b.txt';


my $filein = 'a.txt';
my $fileout = 'b.txt';
----^^
Declare variables with my()

> open(INFO, $filein);
> open(INFO2, ">$fileout");


open(INFO, $filein) or die $!;
open(INFO2, "> $fileout") or die $!;
--------------------------^^^^^^^^^
Check if file was successfully opened

> @lines = <INFO>;


my @lines = <INFO>;

> grep(s/\n//g,@lines);


That works, but it's clearer written as:

@lines = map { tr/\n//d; $_ } @lines;

Now it's time for reflection. :)

@lines is an array, and at this point, it contains three elements. You
seem to want to concatenate the elements to a string. That can be
done like this:

my $string = join '', @lines;

> grep(s/ab/found/g,@lines);


That takes one element at a time, and replaces occurrences of the
string 'ab'. None of the elements contains that string, so nothing happens.

You can apply the s/// operator to $string instead:

$string =~ s/ab/found/g;

> print INFO2 @lines;


print INFO2 "$string\n";

> close(INFO);
> close(INFO2);
> --------------------------------



HTH

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Gunnar Hjalmarsson

2004-03-31, 2:36 pm

Richard Morse wrote:
> In article <c4eupc$mqe$1@ariadne.rz.tu-clausthal.de>,
> "Jan Biel" <jan.biel@tu-clausthal.de> wrote:
>
> @lines = ( 'a\n', 'b\n', 'c\n' );


No, that wouldn't populate @lines with the same thing. This would:

@lines = ( "a\n", "b\n", "c\n" );
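A minimal sketch, not part of the original post, of why the quoting matters here: single quotes keep the backslash and the n as two literal characters, while double quotes turn \n into one newline character. The variable names are illustrative only.

```perl
use strict;
use warnings;

# Single quotes: no escape processing, so this is 'a', '\', 'n'.
my $single = 'a\n';
# Double quotes: \n is interpolated into a single newline character.
my $double = "a\n";

print length($single), "\n";   # 3
print length($double), "\n";   # 2
```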

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Brian McCauley

2004-03-31, 2:36 pm

Gunnar Hjalmarsson <noreply@gunnar.cc> writes:

>
> That works, but it's clearer written as:
>
> @lines = map { tr/\n//d; $_ } @lines;


That works, but it's not clearer. Using tr/// rather than using s///
adds efficiency not clarity. Using "map" where you really want "for"
instead of using "grep" where you really want "for" is a neutral change.
Adding a redundant assignment is just obfuscation.

Clearer would be something like

tr/\n//d for @lines;

Or

s/\n//g for @lines;

--
\\ ( )
. _\\__[oo
.__/ \\ /\@
. l___\\
# ll l\\
###LL LL\\
Gunnar Hjalmarsson

2004-03-31, 2:36 pm

Brian McCauley wrote:
> Gunnar Hjalmarsson <noreply@gunnar.cc> writes:
>
>
> That works, but it's not clearer.


Well, I could argue, but I won't. Let's just agree that I should
better have used 'for'. :)

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Richard Morse

2004-03-31, 3:35 pm

In article <c4f1j1$2hkaud$1@ID-184292.news.uni-berlin.de>,
Gunnar Hjalmarsson <noreply@gunnar.cc> wrote:

> Richard Morse wrote:
>
> No, that wouldn't populate @lines with the same thing. This would:
>
> @lines = ( "a\n", "b\n", "c\n" );


Right. I knew that.

/me starts flipping through documentation

My bad,
Ricky

--
Pukku
Jan Biel

2004-03-31, 3:35 pm

Richard Morse wrote:

> Try this:


[...]

Thanks a lot.
I really appreciate the comments.

Always glad to learn decent style along with learning the basics of a
language.

Thanks,
Janbiel

Jan Biel

2004-03-31, 3:35 pm

Gunnar Hjalmarsson wrote:

> my $filein = 'a.txt';
> my $fileout = 'b.txt';
> ----^^
> Declare variables with my()


Is that a style hint or is it crucial to the script?
I'm always eager to learn coding good style, I was just wondering which it
is ;)

>
> That works, but it's clearer written as:
>
> @lines = map { tr/\n//d; $_ } @lines;


I guess I'll need to read some tutorials to get a better hang of the
language. Trial and error won't cut it with perl I guess. I wasted too many
hours trying to find out why my code doesn't work. RTFM would have helped I
guess.

Thanks a lot for your input. I appreciate it.
Janbiel

David K. Wall

2004-03-31, 3:35 pm

Gunnar Hjalmarsson <noreply@gunnar.cc> wrote:

> Brian McCauley wrote:
>
> Well, I could argue, but I won't. Let's just agree that I should
> better have used 'for'. :)


Since the OP was just stripping newlines from lines read from a file,
why not use this:

chomp( my @lines = <INFO> );

Did I miss something somewhere? Why use map() when chomp() is made for
this purpose?
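For illustration only, a small sketch of the chomp() approach combined with the join and substitution suggested earlier in the thread. It reads from an in-memory string rather than a.txt so that it is self-contained; that substitution is an assumption of this sketch, not something the posters used.

```perl
use strict;
use warnings;

# Stand-in for the contents of a.txt (assumption: three lines a, b, c).
my $text = "a\nb\nc\n";
open(my $in, '<', \$text) or die "Can't open in-memory handle: $!";

# chomp() removes the trailing newline from every element it is given.
chomp(my @lines = <$in>);
close($in);

# Join the lines into one string, then substitute across the old line breaks.
my $string = join '', @lines;
$string =~ s/ab/found/g;
print "$string\n";   # foundc
```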
Gunnar Hjalmarsson

2004-03-31, 5:37 pm

Jan Biel wrote:
> Gunnar Hjalmarsson wrote:
>
> Is that a style hint or is it crucial to the script?


Running the script with strictures enabled:

use strict;

is a very important style hint. Doing so requires that all variables
are declared, and normally you declare the variables lexically using my().

> I guess I'll need to read some tutorials to get a better hang of
> the language. Trial and error won't cut it with perl I guess. I
> wasted too many hours trying to find out why my code doesn't work.
> RTFM would have helped I guess.


Good thoughts. :)

> Thanks a lot for your input. I appreciate it.


You are welcome.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Gunnar Hjalmarsson

2004-03-31, 5:37 pm

David K. Wall wrote:
> Gunnar Hjalmarsson <noreply@gunnar.cc> wrote:
>
> Since the OP was just stripping newlines from lines read from a file,
> why not use this:
>
> chomp( my @lines = <INFO> );
>
> Did I miss something somewhere? Why use map() when chomp() is made for
> this purpose?


No, I think it was I who missed something (twice). (And Brian once.)

Thanks.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Randal L. Schwartz

2004-03-31, 5:37 pm


Gunnar> That works, but it's clearer written as:

Gunnar> @lines = map { tr/\n//d; $_ } @lines;

I don't consider that clearer. The loop first modifies @lines (via
the side effect of having changed $_ in the map block), then gathers
all those results together to create a new list, then assigns the
entire new list over the top of the identically updated list.

Weird. Definitely not clearer, and more dangerous too. Consider
the obvious cut-and-paste mangling:

@newlines = map { tr/\n//d; $_ } @lines;

Your copy of @lines and @newlines are identical, even though you might
expect @lines to remain unaffected!

Definitely bad. Definitely don't do this. Not without the required
BIG HONKIN COMMENT to the right describing how wasteful you are.
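A short sketch of the aliasing effect described above, with a copy-first variant for contrast. The names @orig, @copy and $c are made up for the example and do not come from the thread.

```perl
use strict;
use warnings;

my @lines = ("a\n", "b\n");
# Inside map, $_ aliases each element of @lines, so tr/// edits @lines itself.
my @newlines = map { tr/\n//d; $_ } @lines;
print "original modified\n" unless grep { /\n/ } @lines;

# Copying $_ into a lexical first leaves the source array untouched.
my @orig = ("a\n", "b\n");
my @copy = map { my $c = $_; $c =~ tr/\n//d; $c } @orig;
print "original kept\n" if grep { /\n/ } @orig;
```

If the goal is only to modify the array in place, the plain `tr/\n//d for @lines;` form suggested earlier in the thread avoids the extra list entirely.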

print "Just another Perl hacker,"; # yeah, the guy who invented this phrase

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!



Tad McClellan

2004-03-31, 6:40 pm

Jan Biel <jan.biel@tu-clausthal.de> wrote:
> Gunnar Hjalmarsson wrote:
>
>
> Is that a style hint or is it crucial to the script?



It is a style hint (a really strong one).

It _may_ be crucial to _a_ script, but I don't think it is
crucial to the code quoted above. (dodged a bullet)

It may be crucial to having folks here decide between looking
at the code and moving on to the next post. :-)


> I'm always eager to learn coding good style, I was just wondering which it
> is ;)



Controlling the scope of variables is a general CS-type topic; much
of it is the same regardless of the programming language being used.


Learn about scoping in Perl:

"Coping with Scoping":

http://perl.plover.com/FAQs/Namespaces.html

Without my(), it is a Global Variable, and everybody knows that the
indiscriminate use of Global Variables is bad design.


I like to encourage these preliminary "rules" regarding variable scope:

For a Perl Beginner:

Always prefer lexical (my) over package (our) variables,
except for the built-in variables.

For a Perl Intermediate:

Always prefer lexical (my) over package (our) variables,
except when your program is so big as to require breaking
it up into several files.

For Everyone Else:

Always prefer lexical (my) over package (our) variables,
except when you can't. (you'll know when you can't by this point)


:-)
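As a rough illustration of the lexical versus package distinction (a sketch under the assumption of a single-file script in package main; the variable name $count is hypothetical):

```perl
use strict;
use warnings;

our $count = 1;          # package variable, also reachable as $main::count
{
    my $count = 2;       # lexical, visible only inside this block
    print "$count\n";    # 2
}
print "$count\n";        # 1
```

The lexical disappears at the closing brace, while the package variable stays reachable from anywhere in the program, which is why the rules above prefer my() by default.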

--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas