Search and Replace
| Joseph L. Casale 2007-07-12, 3:59 am |
| Hi,
Now that I am learning Perl, I am expected to use it at work :)
Problem is I am still too green for the current problem I have. The data is always left justified and has a space between each value.
I have a text file of about ~500 lines like this:
-11.67326 23.95923 0.4617566
5.075023 24.27938 0.4484084
6.722163 -24.68986 1.399011
-11.2023 -25.0398 1.145933
I need to do the following:
Insert an X, Y and Z as one script:
X-11.67326 Y23.95923 Z0.4617566
X5.075023 Y24.27938 Z0.4484084
X6.722163 Y-24.68986 Z1.399011
X-11.2023 Y-25.0398 Z1.145933
Lastly, I'll need to make an additional copy of the program to strip out any numerical values for Z, and replace them with the following, [some_var].
X-11.67326 Y23.95923 Z[some_var]
X5.075023 Y24.27938 Z[some_var]
X6.722163 Y-24.68986 Z[some_var]
X-11.2023 Y-25.0398 Z[some_var]
Any help would be appreciated greatly, I am still just too new for this!
Thanks guys!
jlc
| |
| John Moon 2007-07-12, 3:59 am |
| From: Joseph L. Casale [mailto:JCasale@ActiveNetwerx.com]
Sent: Wednesday, July 11, 2007 1:51 PM
To: beginners@perl.org
Subject: Search and Replace
Hi,
Know that I am learning perl, I am expected to use it at work :)
Problem is I am still to green for the current problem I have. The data
is always left justified and has a space between each value.
I have a text file of about ~500 lines like this:
-11.67326 23.95923 0.4617566
5.075023 24.27938 0.4484084
6.722163 -24.68986 1.399011
-11.2023 -25.0398 1.145933
I need to do the following:
Insert and X, Y and Z as one script:
X-11.67326 Y23.95923 Z0.4617566
X5.075023 Y24.27938 Z0.4484084
X6.722163 Y-24.68986 Z1.399011
X-11.2023 Y-25.0398 Z1.145933
Lastly, I'll need to make an additional copy of the program to strip out
any numerical values for Z, and replace them with the following,
[some_var].
X-11.67326 Y23.95923 Z[some_var]
X5.075023 Y24.27938 Z[some_var]
X6.722163 Y-24.68986 Z[some_var]
X-11.2023 Y-25.0398 Z[some_var]
Any help would be appreciated greatly, I am still just to new for this!
Thanks guys!
jlc
maybe
....
while (my $dat = <SOMEINPUT>) {
    $dat =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
    my ($x, $y, $z) = split /\s+/, $dat;
    print SOMEFILE "X$x Y$y Z$z\n";
}
....
Hope this helps...
| |
| Paul Lalli 2007-07-12, 3:59 am |
| On Jul 11, 1:50 pm, JCas...@ActiveNetwerx.com (Joseph L. Casale)
wrote:
> Hi,
> Know that I am learning perl, I am expected to use it at work :)
> Problem is I am still to green for the current problem I have. The data is always left justified and has a space between each value.
>
> I have a text file of about ~500 lines like this:
> -11.67326 23.95923 0.4617566
> 5.075023 24.27938 0.4484084
> 6.722163 -24.68986 1.399011
> -11.2023 -25.0398 1.145933
>
> I need to do the following:
> Insert and X, Y and Z as one script:
> X-11.67326 Y23.95923 Z0.4617566
> X5.075023 Y24.27938 Z0.4484084
> X6.722163 Y-24.68986 Z1.399011
> X-11.2023 Y-25.0398 Z1.145933
>
> Lastly, I'll need to make an additional copy of the program to strip out any numerical values for Z, and replace them with the following, [some_var].
> X-11.67326 Y23.95923 Z[some_var]
> X5.075023 Y24.27938 Z[some_var]
> X6.722163 Y-24.68986 Z[some_var]
> X-11.2023 Y-25.0398 Z[some_var]
>
> Any help would be appreciated greatly, I am still just to new for this!
#!/usr/bin/env perl
use strict;
use warnings;
my @data = <DATA>;
#Part A
for my $line (@data) {
    $line =~ s/(\S+)\s+(\S+)\s+(\S+)/X$1 Y$2 Z$3/;
    print $line;
}
print "\n\n";
#Part B
for my $line (@data) {
    $line =~ s/(?<=Z)\d+(\.\d+)?/[some var]/;
    print $line;
}
__DATA__
-11.67326 23.95923 0.4617566
5.075023 24.27938 0.4484084
6.722163 -24.68986 1.399011
-11.2023 -25.0398 1.145933
Output:
X-11.67326 Y23.95923 Z0.4617566
X5.075023 Y24.27938 Z0.4484084
X6.722163 Y-24.68986 Z1.399011
X-11.2023 Y-25.0398 Z1.145933
X-11.67326 Y23.95923 Z[some var]
X5.075023 Y24.27938 Z[some var]
X6.722163 Y-24.68986 Z[some var]
X-11.2023 Y-25.0398 Z[some var]
In the first, we simply search for the three strings of non-whitespace
and capture them in $1, $2, and $3, then we replace the whole string
with X followed by $1, Y followed by $2, and Z followed by $3.
In the second, we search for any sequence of digits (possibly
followed by a period and more digits) that was preceded by a Z, and
replace that sequence with [some var].
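As a small sketch of the second substitution: the pattern only matches an unsigned number after the Z, which is enough for the sample data. If a Z value could be negative, an optional minus sign can be allowed for. The lines are in the form Part A produces; the negative Z value is made up for illustration.

```perl
use strict;
use warnings;

# Lines as they look after Part A; the negative Z is a made-up test value.
my @lines = (
    "X-11.67326 Y23.95923 Z0.4617566\n",
    "X5.075023 Y24.27938 Z-0.4484084\n",
);

for my $line (@lines) {
    # (?<=Z) is a lookbehind: the number must follow a Z, but the Z itself
    # is not part of the match, so it survives the replacement.
    # -? additionally accepts a leading minus sign.
    $line =~ s/(?<=Z)-?\d+(?:\.\d+)?/[some var]/;
    print $line;
}
```

Because $line is an alias for each array element, @lines itself is changed by the loop.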
For more information on regular expressions, please see:
perldoc perlre
perldoc perlretut
perldoc perlreref
Paul Lalli
| |
| Rob Dixon 2007-07-12, 3:59 am |
| Joseph L. Casale wrote:
> Hi,
> Know that I am learning perl, I am expected to use it at work :)
> Problem is I am still to green for the current problem I have. The data is always left justified and has a space between each value.
>
> I have a text file of about ~500 lines like this:
> -11.67326 23.95923 0.4617566
> 5.075023 24.27938 0.4484084
> 6.722163 -24.68986 1.399011
> -11.2023 -25.0398 1.145933
>
> I need to do the following:
> Insert and X, Y and Z as one script:
> X-11.67326 Y23.95923 Z0.4617566
> X5.075023 Y24.27938 Z0.4484084
> X6.722163 Y-24.68986 Z1.399011
> X-11.2023 Y-25.0398 Z1.145933
>
>
> Lastly, I'll need to make an additional copy of the program to strip out any numerical values for Z, and replace them with the following, [some_var].
> X-11.67326 Y23.95923 Z[some_var]
> X5.075023 Y24.27938 Z[some_var]
> X6.722163 Y-24.68986 Z[some_var]
> X-11.2023 Y-25.0398 Z[some_var]
>
> Any help would be appreciated greatly, I am still just to new for this!
How about the program below? For your final variation, just change the line
printf "X%s Y%s Z%s\n", @data;
to
printf "X%s Y%s Z[some_var]\n", @data;
And note that you could open two output files and write to them both
using two print statements, rather than running a different program to
create the second file.
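A minimal sketch of the two-output idea, assuming in-memory filehandles as stand-ins for the real output files; the variable names are invented for illustration:

```perl
use strict;
use warnings;

my @input = (
    "-11.67326 23.95923 0.4617566\n",
    "5.075023 24.27938 0.4484084\n",
);

# In-memory "files" keep the sketch self-contained; with real files the
# third argument to open would be a file name instead of a scalar reference.
my ($plain, $templated) = ('', '');
open my $out_a, '>', \$plain     or die "could not open first output: $!";
open my $out_b, '>', \$templated or die "could not open second output: $!";

for (@input) {
    my ($x, $y, $z) = split;                   # split $_ on whitespace
    print {$out_a} "X$x Y$y Z$z\n";            # first variation
    print {$out_b} "X$x Y$y Z[some_var]\n";    # second variation
}

close $out_a;
close $out_b;
print $plain, $templated;
```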
Hope this helps.
Rob
use strict;
use warnings;
while (<DATA>) {
    my @data = split;
    printf "X%s Y%s Z%s\n", @data;
}
__DATA__
-11.67326 23.95923 0.4617566
5.075023 24.27938 0.4484084
6.722163 -24.68986 1.399011
-11.2023 -25.0398 1.145933
**OUTPUT**
X-11.67326 Y23.95923 Z0.4617566
X5.075023 Y24.27938 Z0.4484084
X6.722163 Y-24.68986 Z1.399011
X-11.2023 Y-25.0398 Z1.145933
| |
| Mr. Shawn H. Corey 2007-07-12, 3:59 am |
| Joseph L. Casale wrote:
> Paul,
> Reading the perlre doc I am starting to understand this line:
> $line =~ s/(\S+)\s+(\S+)\s+(\S+)/X$1 Y$2 Z$3/;
>
> I have a few questions.
> 1. What is the tilde for?
From `perldoc perlop`:
Binding Operators
Binary "=~" binds a scalar expression to a pattern match.
Certain operations search or modify the string $_ by default. This
operator makes that kind of operation work on some other string. The
right argument is a search pattern, substitution, or transliteration.
The left argument is what is supposed to be searched, substituted, or
transliterated instead of the default $_. When used in scalar context,
the return value generally indicates the success of the operation.
Behavior in list context depends on the particular operator. See
"Regexp Quote-Like Operators" for details and perlretut for examples
using these operators.
If the right argument is an expression rather than a search
pattern, substitution, or transliteration, it is interpreted as a search
pattern at run time.
Binary "!~" is just like "=~" except the return value is negated
in the logical sense.
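To make the quoted documentation concrete, here is a short sketch; the variable names are made up for illustration and the sample line comes from the data in this thread:

```perl
use strict;
use warnings;

my $line = "6.722163 -24.68986 1.399011";

# Without =~ the match operator looks at $_ ...
my $default_hit;
for ("no digits here") {
    $default_hit = /\d/ ? 1 : 0;      # tests $_, not $line
}

# ... with =~ it looks at the string on the left instead.
my $bound_hit = $line =~ /\d/ ? 1 : 0;

# !~ is the same test with the result negated.
my $no_letters = $line !~ /[a-z]/i ? 1 : 0;

# In list context a match returns its captures.
my ($first) = $line =~ /^(\S+)/;

print "default=$default_hit bound=$bound_hit no_letters=$no_letters first=$first\n";
```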
> 2. I see you built a pattern to search for consisting of a non-whitespace followed by a whitespace followed by a non etc. I see the replacement, but cant figure out how to modify it for the case where I want to go straight to the last file, X# Y# Z[some var].
>
> $line =~ s/(\S+)\s+(\S+)\s+(\S+)/X$1 Y$2 [some var]$3/;
>
> Doesn't work? I assume it's the [] chars?
> How do I escape these in to that expression?
my $some_var = 'Z'; # change to whatever you want
$line =~ s/(\S+)\s+(\S+)\s+(\S+)/X$1 Y$2 $some_var$3/;
--
Just my 0.00000002 million dollars worth,
Shawn
"For the things we have to learn before we can do them, we learn by
doing them."
Aristotle
| |
| Rob Dixon 2007-07-12, 9:59 pm |
| Joseph L. Casale wrote:
>
> How can I make this expression:
> $line =~ s/(\S+)\s+(\S+)\s+(\S+)/X$1 Y$2 Z$3/
>
> Add some numerical value to the Z$3 part, so if $3
> was 3.14, I want it to be Z4.14 for example by adding 1 to it.
May I reply amending my original solution to your problem, which seems to me
to suit your purpose better? In the code below the value that would have been
held in $3 is in $z, so the extra line of code simply increments it by one.
HTH,
Rob
use strict;
use warnings;
while (<DATA>) {
    my ($x, $y, $z) = split;
    $z += 1;
    printf "X%s Y%s Z%s\n", $x, $y, $z;
}
__DATA__
-11.67326 23.95923 0.4617566
5.075023 24.27938 0.4484084
6.722163 -24.68986 1.399011
-11.2023 -25.0398 1.145933
**OUTPUT**
X-11.67326 Y23.95923 Z1.4617566
X5.075023 Y24.27938 Z1.4484084
X6.722163 Y-24.68986 Z2.399011
X-11.2023 Y-25.0398 Z2.145933
| |
| Chas Owens 2007-07-13, 3:59 am |
| On 7/12/07, Joseph L. Casale <JCasale@activenetwerx.com> wrote:
> Hi All,
>
> How can I make this expression:
> $line =~ s/(\S+)\s+(\S+)\s+(\S+)/X$1 Y$2 Z$3/
>
> Add some numerical value to the Z$3 part, so if $3 was 3.14, I want it to be Z4.14 for example by adding 1 to it.
Use the e option to turn the replacement into an expression instead of
a double quoted string.
$line =~ s/(\S+)\s+(\S+)\s+(\S+)/"X$1 Y$2 Z" . ($3 + 1)/e
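A runnable sketch of the same substitution, using the 3.14 value from the question as the third field; the first two fields are borrowed from the sample data:

```perl
use strict;
use warnings;

my $line = "6.722163 -24.68986 3.14\n";

# With /e the replacement is evaluated as Perl code, so it can do arithmetic
# on the captures before the result is substituted in.
$line =~ s/(\S+)\s+(\S+)\s+(\S+)/"X$1 Y$2 Z" . ($3 + 1)/e;

print $line;
```

The trailing newline is not part of the match, so it stays where it is.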
| |
| Rob Dixon 2007-07-13, 9:58 pm |
| Joseph L. Casale wrote:
> One of these scripts has a loop like this:
>
> for my $line (@lines){
> my $line2 = $line;
> $line =~ s/(\S+)\s+(\S+)\s+(\S+)/X$1 Y$2/;
> print FILEOUT $line;
> $line2 =~ s/(\S+)\s+(\S+)\s+(\S+)/Z[$3+DPad]/;
> print FILEOUT $line2;
> print FILEOUT "M98PDRILL.SUBL1\n";
> print FILEOUT "G90\n";
> print FILEOUT "G00 Z[CPlane]\n"
> }
>
> What would be a wise way of trapping a condition such as the line read
> and passed into the loop is not 3 sets of numbers and if so, skip?
I'm sticking with using split() - this sort of thing is exactly what it's for!
The first 'next' statement makes sure there are exactly three data items
in the line, the second one makes sure that none of them contain anything
except the digits 0 through 9, a decimal point or a minus sign.
HTH,
Rob
foreach (@lines) {
    my @data = split;
    next unless @data == 3;
    next if grep /[^0-9.-]/, @data;
    printf FILEOUT "X%s Y%s\n", $data[0], $data[1];
    printf FILEOUT "Z[%s+DPad]\n", $data[2];
    print FILEOUT "M98PDRILL.SUBL1\n";
    print FILEOUT "G90\n";
    print FILEOUT "G00 Z[CPlane]\n"
}
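The character-class test above rejects stray letters but still lets through strings such as 1.2.3 or a lone minus sign. If that matters, each field can be checked against a stricter pattern. A minimal sketch, assuming plain decimal notation without exponents; looks_like_decimal is a hypothetical helper name, not something from the thread:

```perl
use strict;
use warnings;

# Hypothetical helper: true if a field is a plain signed decimal
# such as 5, -11.2023 or .5 (no exponent notation).
sub looks_like_decimal {
    my ($field) = @_;
    return $field =~ /^-?(?:\d+\.?\d*|\.\d+)$/ ? 1 : 0;
}

my @lines = (
    "-11.2023 -25.0398 1.145933\n",   # good
    "1.2.3 4 5\n",                    # passes the character-class test only
    "6.722163 -24.68986\n",           # too few fields
);

my @kept;
for (@lines) {
    my @data = split;
    next unless @data == 3;                            # exactly three fields
    next if grep { !looks_like_decimal($_) } @data;    # every field numeric
    push @kept, "X$data[0] Y$data[1] Z$data[2]\n";
}
print @kept;
```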
| |
| Rob Dixon 2007-07-13, 9:58 pm |
| Joseph L. Casale wrote:
> One of these scripts has a loop like this:
>
> for my $line (@lines){
> my $line2 = $line;
> $line =~ s/(\S+)\s+(\S+)\s+(\S+)/X$1 Y$2/;
> print FILEOUT $line;
> $line2 =~ s/(\S+)\s+(\S+)\s+(\S+)/Z[$3+DPad]/;
> print FILEOUT $line2;
> print FILEOUT "M98PDRILL.SUBL1\n";
> print FILEOUT "G90\n";
> print FILEOUT "G00 Z[CPlane]\n"
> }
>
> What would be a wise way of trapping a condition such as the line read
> and passed into the loop is not 3 sets of numbers and if so, skip?
It's also worth pointing out that Paul originally loaded the file data
into an array so that he could use it twice in two successive loops.
Unless the amount of your data is tiny it's much better to remove the
array @lines and write this loop as
while (<DATA>) {
    :
}
which reads the file one line at a time and doesn't need to draw it all
into memory at once.
Cheers,
Rob
| |
| Chas Owens 2007-07-14, 7:58 am |
| On 7/13/07, Joseph L. Casale <JCasale@activenetwerx.com> wrote:
snip
> open (FILEIN, "< $ARGV[0]") or die $!;
> my @lines = <FILEIN>;
snip
In list context the <> operator returns all lines, but in scalar
context it returns one line at a time. This can be used with a while
loop to walk over the file in pieces (a necessity for large files).
Also, you do not need a copy of $line, just change the substitution to
a match.
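The difference between the two contexts can be seen in a short sketch. It assumes an in-memory filehandle in place of a real file, and the sample text is invented:

```perl
use strict;
use warnings;

my $text = "line one\nline two\nline three\n";

# List context: every line is read at once.
open my $in, '<', \$text or die "could not open in-memory file: $!";
my @all = <$in>;
close $in;

# Scalar context: one line per call, so only one line is held at a time.
open $in, '<', \$text or die "could not open in-memory file: $!";
my $count = 0;
while (defined(my $line = <$in>)) {
    $count++;
}
close $in;

print scalar(@all), " lines in list context, $count in the loop\n";
```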
#!/usr/bin/perl
use strict;
use warnings;
unless (@ARGV == 2) {
    die qq(usage: ConvertASCII.pl "input file name" "output file name"\n)
}
open my $in, '<', $ARGV[0]
    or die "could not open $ARGV[0]: $!";
open my $out, '>', $ARGV[1]
    or die "could not open $ARGV[1]: $!";
while (defined (my $line = <$in>)) {
    # a match in list context returns its captures;
    # skip any line that does not have three fields
    my ($x, $y, $z) = $line =~ /(\S+)\s+(\S+)\s+(\S+)/
        or next;
    print $out "X$x Y$y\nZ[$z+DPad]\nM98PDRILL.SUBL1\nG90\nG00 Z[CPlane]\n";
}
or if you prefer for the print to be more readable:
print $out
"X$x Y$y\n",
"Z[$z+DPad]\n",
"M98PDRILL.SUBL1\n",
"G90\n",
"G00 Z[CPlane]\n";
Never use multiple print statements when you can use just one.
| |
| Dr.Ruud 2007-07-14, 7:58 am |
| "Chas Owens" schreef:
> print $out
> "X$x Y$y\n",
> "Z[$z+DPad]\n",
> "M98PDRILL.SUBL1\n",
> "G90\n",
> "G00 Z[CPlane]\n";
>
> Never use multiple print statements when you can use just one.
In general that's true, but because of the "Never" I have to object. :)
Sometimes multiple print statements look like only one, I am thinking of
the "print for LIST" construct.
print +(join "\n", @LIST), "\n" ;
print "$_\n" for @LIST;
I often use the way of the second one, but then like this:
{ local $\ = "\n"; print for @LIST }
(because it doesn't have to construct copies of the data).
I hardly ever go the join-way.
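For anyone who wants to compare the three forms side by side, here is a sketch. The capture helper is hypothetical and only exists so the output can be compared as strings; the list contents are made up:

```perl
use strict;
use warnings;

my @LIST = qw(G90 G91 G00);

# Hypothetical helper: run a piece of code and return what it printed.
sub capture {
    my ($code) = @_;
    my $buffer = '';
    open my $fh, '>', \$buffer or die "could not open in-memory file: $!";
    my $old = select $fh;      # make $fh the default handle for print
    $code->();
    select $old;
    close $fh;
    return $buffer;
}

my $joined   = capture(sub { print +(join "\n", @LIST), "\n" });
my $looped   = capture(sub { print "$_\n" for @LIST });
# $\ is the output record separator: print appends it after every call.
my $with_ors = capture(sub { local $\ = "\n"; print for @LIST });

my $verdict = ($joined eq $looped && $looped eq $with_ors) ? 'same' : 'different';
print "$verdict output\n";
```

The local keeps the change to $\ confined to the enclosing block, so later print statements are unaffected.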
--
Affijn, Ruud
"Gewoon is een tijger."
| |
| Rob Dixon 2007-07-14, 7:58 am |
| Joseph L. Casale wrote:
>
> From: Rob Dixon
>
> OK, I saw your example and noted it. I intended on using next time as
> I know there will be:) But now I am convinced, as the lack of error
> checking in my script worries me. I'll take yours and fit it in!
>
> I do need to read up on what you're doing as I am not clear on its
> syntax in this email. I am using this:
>
> if ($#ARGV != 1) {
> print "usage: ConvertASCII.pl \"input file name\" \"output file name\"\n";
> exit;
> }
>
> open (FILEIN, "< $ARGV[0]") or die $!;
> my @lines = <FILEIN>;
> open (FILEOUT, "> $ARGV[1]") or die $!;
>
> So using your syntax escapes me at the moment:)
Hi Joseph
(Please bottom-post your responses to this list, so that longer threads remain
readable. Thank you.)
So your complete script should look like the program below. I wouldn't write it
exactly like this, but I've changed a minimum of your code so that you can see
where it's come from.
One thing I have changed is your usage check. An array in scalar context provides
the number of elements it has, whereas the $#ARGV you used gives the index of the
last element. Together with 'unless' instead of 'if' I believe it makes this
clause more readable.
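A quick sketch of the difference, using an ordinary array as a stand-in for @ARGV; the file names are placeholders:

```perl
use strict;
use warnings;

my @args = ('input.txt', 'output.txt');   # stand-in for @ARGV

my $count      = @args;     # array in scalar context: number of elements
my $last_index = $#args;    # index of the last element, always one less

print "count=$count last_index=$last_index\n";

# The two usage checks are equivalent; the first states the intent directly.
my $ok_by_count = (@args == 2)  ? 1 : 0;
my $ok_by_index = ($#args == 1) ? 1 : 0;
print "ok_by_count=$ok_by_count ok_by_index=$ok_by_index\n";
```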
Post again if there's anything else you don't understand.
HTH,
Rob
use strict;
use warnings;
unless (@ARGV == 2) {
    print "usage: ConvertASCII.pl \"input file name\" \"output file name\"\n";
    exit;
}
open (FILEIN, "< $ARGV[0]") or die $!;
open (FILEOUT, "> $ARGV[1]") or die $!;
while (<FILEIN>) {
    my @data = split;
    next unless @data == 3;
    next if grep /[^0-9.-]/, @data;
    printf FILEOUT "X%s Y%s\n", $data[0], $data[1];
    printf FILEOUT "Z[%s+DPad]\n", $data[2];
    print FILEOUT "M98PDRILL.SUBL1\n";
    print FILEOUT "G90\n";
    print FILEOUT "G00 Z[CPlane]\n"
}
| |
| Mr. Shawn H. Corey 2007-07-14, 7:58 am |
| Joseph L. Casale wrote:
> OK, I saw your example and noted it. I intended on using next time as I know there will be:)
> But now I am convinced, as the lack of error checking in my script worries me. I'll take yours and fit it in!
>
> I do need to read up on what you're doing as I am not clear on its syntax in this email. I am using this:
>
> if ($#ARGV != 1) {
> print "usage: ConvertASCII.pl \"input file name\" \"output file name\"\n";
> exit;
> }
Error messages should be printed to STDERR (that's what it's there for).
It's better to use die or warn. A module or an object should use the
subroutines in Carp.
See:
perldoc -f die
perldoc -f warn
perldoc Carp
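A minimal sketch of what that might look like for the usage check. The usage_error helper is hypothetical, and warn is used here only so the example keeps running; a real script would more likely call die:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the usage text when the argument count
# is wrong, and nothing when it is right.
sub usage_error {
    my (@args) = @_;
    return if @args == 2;
    return qq(usage: ConvertASCII.pl "input file name" "output file name"\n);
}

# warn prints to STDERR and carries on; die prints to STDERR and exits
# with a non-zero status. A message ending in "\n" suppresses the
# "at SCRIPT line N." suffix that Perl would otherwise append.
if (my $message = usage_error('only-one-argument')) {
    warn $message;
}
```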
--
Just my 0.00000002 million dollars worth,
Shawn
"For the things we have to learn before we can do them, we learn by
doing them."
Aristotle
| |
| Chas Owens 2007-07-14, 6:59 pm |
| On 7/14/07, Dr.Ruud <rvtol+news@isolution.nl> wrote:
snip
>
> In general that's true, but because of the "Never" I have to object. :)
>
> Sometimes multiple print statements look like only one, I am thinking of
> the "print for LIST" construct.
>
> print +(join "\n", @LIST), "\n" ;
>
> print "$_\n" for @LIST;
>
>
> I often use the way of the second one, but then like this:
>
> { local $\ = "\n"; print for @LIST }
>
> (because it doesn't have to construct copies of the data).
>
> I hardly ever go the join-way.
I was referring to writing, not calling, the print function. It is more of
a style/readability thing than an efficiency thing.
By the way, an easier way to write the join version is
print map { "$_\n" } @list;
| |
| Mr. Shawn H. Corey 2007-07-14, 6:59 pm |
| Chas Owens wrote:
> By the way, an easier way to write the join version is
>
> print map { "$_\n" } @list;
>
BTW, that's not the same. Join inserts its string between each element,
map (in this case) appends it to the end. A subtle difference that may
lead to confusion and errors.
--
Just my 0.00000002 million dollars worth,
Shawn
"For the things we have to learn before we can do them, we learn by
doing them."
Aristotle
| |
| Chas Owens 2007-07-14, 6:59 pm |
| On 7/14/07, Mr. Shawn H. Corey <shawnhcorey@magma.ca> wrote:
> Chas Owens wrote:
>
> BTW, that's not the same. Join inserts its string between each element,
> map (in this case) appends it to the end. A subtle difference that may
> lead to confusion and errors.
snip
The code I am referring to is
print +(join "\n", @LIST), "\n" ;
Which does the same thing as
print map { "$_\n" } @list;
The only difference between them is if $, is set.
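A sketch of that difference, assuming a small hypothetical capture helper so the output of each print can be inspected as a string. Both statements use the same array here:

```perl
use strict;
use warnings;

my @list = qw(a b c);

# Hypothetical helper: run a piece of code and return what it printed.
sub capture {
    my ($code) = @_;
    my $buffer = '';
    open my $fh, '>', \$buffer or die "could not open in-memory file: $!";
    my $old = select $fh;      # make $fh the default handle for print
    $code->();
    select $old;
    close $fh;
    return $buffer;
}

my ($join_out, $map_out);
{
    local $, = '-';     # output field separator, printed between list items
    # print receives two items: the joined string and the final "\n"
    $join_out = capture(sub { print +(join "\n", @list), "\n" });
    # print receives three items, one per element
    $map_out  = capture(sub { print map { "$_\n" } @list });
}

print "join version: ", length($join_out), " characters\n";
print "map version:  ", length($map_out), " characters\n";
```

With $, unset, both statements write the same three lines.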
| |
| Mr. Shawn H. Corey 2007-07-14, 6:59 pm |
| Chas Owens wrote:
>
> The code I am referring to is
>
> print +(join "\n", @LIST), "\n" ;
>
> Which does the same thing as
>
> print map { "$_\n" } @list;
>
> The only difference between them is if $, is set.
True, in this case they are.
But the way you stated your preferences implied they are the same, or at
least, a replacement. As I said, the difference is subtle but significant.
Join works on the "gaps" between elements of an array.
Map works on every element.
--
Just my 0.00000002 million dollars worth,
Shawn
"For the things we have to learn before we can do them, we learn by
doing them."
Aristotle
| |
| Rob Dixon 2007-07-14, 6:59 pm |
| Chas Owens wrote:
>
> On 7/14/07, Mr. Shawn H. Corey <shawnhcorey@magma.ca> wrote:
> snip
>
> The code I am referring to is
>
> print +(join "\n", @LIST), "\n" ;
>
> Which does the same thing as
>
> print map { "$_\n" } @list;
>
> The only difference between them is if $, is set.
No it doesn't. As Shawn said, the join doesn't append a newline
after the last element.
Rob
| |
| Rob Dixon 2007-07-14, 6:59 pm |
| Rob Dixon wrote:
>
> Chas Owens wrote:
>
> No it doesn't. As Shawn said, the join doesn't append a newline
> after the last element.
I realise what you're saying now - the statements as a whole produce
the same output, yes.
Rob
| |
| Chas Owens 2007-07-14, 6:59 pm |
| On 7/14/07, Rob Dixon <rob.dixon@350.com> wrote:
> Chas Owens wrote:
>
> No it doesn't. As Shawn said, the join doesn't append a newline
> after the last element.
>
> Rob
No, it doesn't, but there is an extra "\n" being tacked on after the
join. This is sloppy and confusing, which is why the map version is
better. The join function is not the right function to use here. If
the join character was ",", then join would be appropriate.
| |
| Mr. Shawn H. Corey 2007-07-14, 6:59 pm |
| Chas Owens wrote:
> Eh, writing "join version" was confusing. I am surprised no one
> called me on the real error: @LIST and @list aren't the same variable.
That's because most people who read the mailing list do not run code as
presented.
You will find that most of the arguments here are about philosophical
differences. I should say entrenched philosophical differences.
Nobody can argue with a parser. Either it accepts your code or not.
Nobody can argue with the results of a program. Either it works or it
does not.
What we can and do argue about is the best way to program. The thing to
remember is that if your program achieves your results, it's good
programming. Of course, it can be made faster, smaller, or more
understandable.
Which is where the argument comes in. Some want it faster. Some want
it smaller. Some want it easier to understand.
Vive la différence.
--
Just my 0.00000002 million dollars worth,
Shawn
"For the things we have to learn before we can do them, we learn by
doing them."
Aristotle