Home > Archive > PERL Miscellaneous > June 2005 > Masking by columns for grep
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Masking by columns for grep
|
|
| Shannon Jacobs 2005-06-09, 3:57 am |
| I want to restrict a grep to certain columns. After extensive contemplation
and a lot of study of the camel book (1991 printing), various manuals and
Web pages, and assorted newsgroup posts, I finally indirectly arrived at
this:
@foo2 = grep(/.{50}$form_values{'a_SEARCH_VALUE'}.{6}/,@foo1);
That's how it appears in the actual file, though for the sake of this
question, a reduced form would be:
@foo = grep(/.{50}$theString.{6},@foo);
The "records" are fixed length, and I want to ignore the first 50 characters
and the last 6. This approach mostly works (unless the string has a | in it,
which is probably a different problem), but it seems to have pegged my
nauseous code detector. I'm interested in constructive suggestions or
bemusing feedback. (Or maybe I should just appeal for a reference to a
source to replace the entire grotesque thing, appealing to the camel page
xviii citation of "laziness" as the first great virtue. Right now this thing
feeds off of another database system that goes back over 20 years, and it
would probably be a good thing to port the entire mess to some reasonable
environment...)
For the sake of context, the entire program appears below. It is actually
live on the Web at shanenj.tripod.com/search0.html. (Yes, I know Tripod has
a brain-damaged interface, but returning the encoded error messages in the
URL is kind of unique. I also know I should move to a decent server.
Consider Perl/CGI on Tripod as one of those semi-religious rituals, rather
like flogging oneself with a chain.)
Warning: If you proceed past this point you are at risk of laughing yourself
sick. By any standard, I'm sure this is some really peculiar old code,
containing about 10 years of occasional hacking and tweaking. I think it was
the Lisp influence that did it... Or maybe the parts that came from the
Chinese guy with the Berkeley Ph.D.? (I'm sure he doesn't want a more
explicit citation for what I've done to his code.) Or the Y2K windowing? Or
something. <Imagine a movie of the "Lost in Space" robot wildly waving its
arms and shouting "Danger, Will Robinson! Danger!">
#!/usr/local/bin/perl
print '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">';
print "Content-type: text/html\n\n";
print "<HTML><HEAD><TITLE>Search Result</TITLE>\n";
print '<SCRIPT language="JavaScript" src="../formfunc.js"></SCRIPT>';
print "\n";
print '<SCRIPT language="JavaScript">var inHTMLdate = "';
print scalar localtime;
print '";</SCRIPT>';
print "\n";
print "</HEAD>\n";
print "<BODY bgcolor=\"#00bfbf\">\n";
print "<hr size=8>\n";
print "<center><h3>BookList Search Form Result</H3></center>\n";
print "<hr size=8>";
%form_values = &html_parse;
if ( $form_values{'a_SEARCH_VALUE'} ) {
open(DATAFILE,'titles.txt');
$i = 0;
while (<DATAFILE> ) { $foo0[$i++] = $_; }
close(DATAFILE);
open(AUTHFILE,'authors.txt');
$i = 1;
while (<AUTHFILE> ) { $authlist[$i++] = $_; }
close(AUTHFILE);
# Time for fishing against authors (Main event of 8JUN2005)
if ($form_values{'AuthorFishing'} eq 'gofish' ) {
@foobar = grep(/$form_values{'a_SEARCH_VALUE'}/i,@authlist);
$form_values{'a_SEARCH_VALUE'} = '';
foreach $GREPPED (@foobar) {
$form_values{'a_SEARCH_VALUE'} =
$form_values{'a_SEARCH_VALUE'}.substr($GREPPED,34,4)."|";
}
$form_values{'a_SEARCH_VALUE'} = $form_values{'a_SEARCH_VALUE'}."90AZ";
}
# Main searches from here
if ($form_values{'sorttype'} eq 'title' ) {
@foo1 = sort ignore3 @foo0
} else { if ($form_values{'sorttype'} eq 'subject') {
@foo1 = sort {substr($a,62,6) cmp substr($b,62,6)} @foo0
} else {
@foo1 = @foo0
}}
if ($form_values{'searchfields'} eq 'authornums' ) {
@foo2 = grep(/.{50}$form_values{'a_SEARCH_VALUE'}.{6}/,@foo1);
# fut } else { if ($form_values{'searchfields'} eq 'dates' ) {
# @foo2 = grep(/$form_values{'a_SEARCH_VALUE'}/,@foo2);
# } else { if ($form_values{'searchfields'} eq 'subjects' ) {
# fut @foo2 = grep(/$form_values{'a_SEARCH_VALUE'}/,@foo2);
} else {
@foo2 = @foo1
}
if ($form_values{'sensecase'} eq 'usecase' ) {
@foo3 = grep(/$form_values{'a_SEARCH_VALUE'}/,@foo2);
} else {
@foo3 = grep(/$form_values{'a_SEARCH_VALUE'}/i,@foo2);
}
$hits = 1+$#foo3;
print ("<center>This search ( $form_values{'a_SEARCH_VALUE'} ) found $hits
books.</center><p>\n");
if ($form_values{'datetype'} eq 'human' ) {
#only load $months once if required
$months[1] = 'January';
$months[2] = 'February';
$months[3] = 'March';
$months[4] = 'April';
$months[5] = 'May';
$months[6] = 'June';
$months[7] = 'July';
$months[8] = 'August';
$months[9] = 'September';
$months[10] = 'October';
$months[11] = 'November';
$months[12] = 'December';
}
print("<center><table border = 1 cell padding = 1>\n");
print("<th>Title of Book</th><th>Pub. Year</th><th>Date Read</th>\n");
print("<th colspan=\"3\">Authors</th><th colspan=\"3\">Subjects</th>\n");
print("<tr></tr><tr></tr>\n");
foreach $GREPPED (@foo3) {
print("<tr>");
print("<td>",substr($GREPPED,0,40),"</td>\n");
print("<td>",substr($GREPPED,40,4),"</td>");
# print("<td>",substr($GREPPED,44,6),"</td>");
print("<td>".nicedate(substr($GREPPED,44,6))."</td>");
# From here store the authors and get names if required
$auth1 = substr($GREPPED,50,4);
$auth2 = substr($GREPPED,54,4);
$auth3 = substr($GREPPED,58,4);
if ( $form_values{'numorname'} eq 'authnames' ) {
print("<td>".substr($authlist[$auth1],0,34)."<br></td>");
print("<td>".substr($authlist[$auth2],0,34)."<br></td>");
print("<td>".substr($authlist[$auth3],0,34)."<br></td>\n");
} else { if ( $form_values{'numorname'} eq 'both' ) {
print("<td>".substr($authlist[$auth1],0,34)."( $auth1 )<br></td>");
print("<td>".substr($authlist[$auth2],0,34)."( $auth2 )<br></td>");
print("<td>".substr($authlist[$auth3],0,34)."( $auth3 )<br></td>\n");
} else {
print("<td>$auth1<br></td>");
print("<td>$auth2<br></td>");
print("<td>$auth3<br></td>\n");
}}
# print("<td>",substr($GREPPED,50,4),"<br></td>");
# print("<td>",substr($GREPPED,54,4),"<br></td>");
# print("<td>",substr($GREPPED,58,4),"<br></td>\n");
print("<td>",substr($GREPPED,62,2),"<br></td>");
print("<td>",substr($GREPPED,64,2),"<br></td>");
print("<td>",substr($GREPPED,66,2),"<br></td>");
print("</tr>\n\n");
}
print("</table></center>\n");
} else {
print ("Nothing to Search For<br>\n");
}
print '<SCRIPT language="JavaScript">';
print "\n";
print 'threefooter('Return
to<br>shanen\\'s<br>Home','/index.html','procbook');';
print "\n";
print '</SCRIPT>';
print "\n";
# print "<hr size=8>";
# print "<center><P>End of Search Results<P><br>\nClick to go back to my
Index page </center>\n";
# print "<P><center>";
# print "<A http=\"/index.html\"><img src=\"/images/home.gif\"
border=0></a>";
# print "</center><P>";
# print "<hr size=8><P>";
print "</BODY></HTML>";
exit(0);
sub ignore3 {
# local ($acut,$bcut);
$acut = 0;
$bcut = 0;
if (index($a,'A ') == 0) {$acut = 2}
if (index($a,'An ') == 0) {$acut = 3}
if (index($a,'The ') == 0) {$acut = 4}
if (index($b,'A ') == 0) {$bcut = 2}
if (index($b,'An ') == 0) {$bcut = 3}
if (index($b,'The ') == 0) {$bcut = 4}
#print ("$a1 cmp $b1");
substr($a,$acut) cmp substr($b,$bcut)
# $a cmp $b
}
sub nicedate {
if ($form_values{'datetype'} eq 'human' ) {
if (substr(@_[0],4,1) eq '0') {
$foo = $months[substr(@_[0],2,2)].'
'.substr(@_[0],5,1)
} else {
$foo = $months[substr(@_[0],2,2)].'
'.substr(@_[0],4,2)
};
# Y2K window added to 1960-2059, slj 3/20/2000 including above #s
if (substr(@_[0],0,1) gt '5') {
$foo.', 19'.substr(@_[0],0,2)
} else {
$foo.', 20'.substr(@_[0],0,2)
}
# Yeah, I know it isn't pretty but I haven't touched PERL in a LONG time...
} else {
if ($form_values{'datetype'} eq 'usa' ) {
substr(@_[0],2,2).'/'.substr(@_[0],4,2).'/'.substr(@_[0],0,2)
} else {
@_[0]
}
}
}
sub html_parse
{
local($line, $length, $offset, @pairs);
# if ($ENV{"REQUEST_METHOD"} eq "GET") {
# $line = $ENV{"QUERY_STRING"};
# }
# elsif ($ENV{"REQUEST_METHOD"} eq "POST") {
#
# ($ENV{"CONTENT_TYPE"} ne "application/x-www-form-#urlencoded") &&
# &html_fatal("Illegal Content-Type '" . $ENV
# $length = $ENV{"CONTENT_LENGTH"};
# (($length =~ m/^\d+$/) == 0) &&
# &html_fatal("Content-Length variable not found");
#
# (read(STDIN, $line, $length) == $length) ||
# &html_fatal("Could not read form info from stdin: " .
#$length);
# }
# else {
# &html_fatal("Illegal Request-Method '" . $ENV
#{"REQUEST_METHOD"} . "'");
# }
# Probably not allowed to check ENV so just take the query string?
# Just use a GET and the one line?
$line = $ENV{"QUERY_STRING"};
((!defined($line)) || (length($line) == 0)) &&
&html_fatal("No values passed in");
@pairs = split(/[=&]/, $line);
(substr($line, $length - 1, 1) eq "=") &&
($pairs[@pairs] = "");
$offset = 0;
foreach $value (@pairs) {
($offset++ & 1) &&
&html_unescape($value);
}
return @pairs;
}
sub html_unescape
{
local($offset);
$_[0] =~ s/\+/ /g;
$offset = -1;
while (($offset = index($_[0], "%", $offset + 1)) >= 0) {
substr($_[0], $offset, 3) =
pack("c", hex(substr($_[0], $offset + 1, 2)));
}
}
sub html_fatal
{
print "<H3>Error Detected</H3><P>";
print @_[0], "\n";
exit 0;
}
| |
| Shannon Jacobs 2005-06-09, 3:57 am |
| Shannon Jacobs wrote:
<snip>
> @foo = grep(/.{50}$theString.{6},@foo);
Should be:
@foo = grep(/.{50}$theString.{6}/,@foo);
| |
| John Bokma 2005-06-09, 3:57 am |
| Shannon Jacobs wrote:
> For the sake of
our sanity, first many tips:
> #!/usr/local/bin/perl
CGI, so add -T to the end of that line
and next:
use strict;
use warnings;
> print '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0
> Transitional//EN">'; print "Content-type: text/html\n\n";
Ouch! You should print the header first...
And since it's CGI:
use CGI::Carp qw(fatalsToBrowser);
use CGI;
my $cgi = new CGI;
print $cgi->header;
next: here docs, instead of:
> print "<HTML><HEAD><TITLE>Search Result</TITLE>\n";
:
:
> print "<center><h3>BookList Search Form Result</H3></center>\n";
> print "<hr size=8>";
print <<HTML;
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>....
:
:
<hr size="8">
HTML
> %form_values = &html_parse;
Don't use & in front of a sub, unless you need it's special effects.
throw away that CGI for Dummies (1996)
> if ( $form_values{'a_SEARCH_VALUE'} ) {
> open(DATAFILE,'titles.txt');
or die
> $i = 0;
> while (<DATAFILE> ) { $foo0[$i++] = $_; }
> close(DATAFILE);
or die
>
> open(AUTHFILE,'authors.txt');
or die
> $i = 1;
> while (<AUTHFILE> ) { $authlist[$i++] = $_; }
> close(AUTHFILE);
or die
> $hits = 1+$#foo3;
$hits = @foo3;
use clear names.
> $months[1] = 'January';
:
> $months[12] = 'December';
aargh:
my @months = qw( January February ...... December );
if you really need 1 based, put a dummy before January
> foreach $GREPPED (@foo3) {
Don't use ALLCAPS for non-constant scalars
[ snip ]
A *lot* of stuff that could be made more readable with here docs.
> # Yeah, I know it isn't pretty but I haven't touched PERL in a LONG
> time...
10 years?
> sub html_parse
Dump that one
> sub html_fatal
With fatals to browser you can just use die
--
John Small Perl scripts: http://johnbokma.com/perl/
Perl programmer available: http://castleamber.com/
Happy Customers: http://castleamber.com/testimonials.html
| |
| Shannon Jacobs 2005-06-09, 8:57 am |
| John Bokma wrote:
> Shannon Jacobs wrote:
>
>
> our sanity, first many tips:
Well, thanks, and you show some of the third virtue, hubris, but I'm not
sure how many of these tips apply to the environment of Tripod. To call
Tripod's CGI/Perl interface "twisted" is a charitable description, but
"lobotomized" is probably more accurate. I'm pretty sure I tried a number of
these things, especially in your early recommendations. I'm not even sure
the here docs worked properly in that environment, though it has also been a
long time since I tried those experiments. (That was probably during the
time when I was running the same code on two or three servers, and Tripod
was definitely the LCD. The Tripod version only survived because it's a
reasonably stable system, and because it's hard to argue with the price.) It
would be nice if you would have included a bit of explanation for some of
the less obvious parts. For example, I think there's something non-obvious
about the "or die", since I already knew about it, and I'm reasonably sure
it was used in the original code. There must have been some reason I took it
out here...
<snipping to conserve resources in the reply>
| |
| Jim Gibson 2005-06-09, 3:58 pm |
| In article <42a7cd2b_2@news1.prserv.net>, Shannon Jacobs
<shanen@cashette.com> wrote:
> Shannon Jacobs wrote:
> <snip>
>
> Should be:
>
> @foo = grep(/.{50}$theString.{6}/,@foo);
>
If you really want to ignore the first 50 columns and ignore the last
6, you should anchor your pattern:
@foo = grep(/^.{50}$theString.{6}$/,@foo);
----== Posted via Newsfeeds.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeeds.com The #1 Newsgroup Service in the World! 120,000+ Newsgroups
----= East and West-Coast Server Farms - Total Privacy via Encryption =----
| |
| John Bokma 2005-06-09, 3:58 pm |
| Shannon Jacobs wrote:
> John Bokma wrote:
>
> Well, thanks, and you show some of the third virtue, hubris,
:-D. Yeah, while on the other news people say I have a low self esteem,
so yup.
> but I'm
> not sure how many of these tips apply to the environment of Tripod.
Me neither, but lets see:
> To
> call Tripod's CGI/Perl interface "twisted" is a charitable
> description, but "lobotomized" is probably more accurate.
You can try a mini-script, and see what it does:
#!/usr/bin/perl -T
use strict;
use warnings;
use CGI::Carp qw(fatalsToBrowser);
use CGI;
my $cgi = new CGI;
print $cgi->header( 'text/plain' ),
<<"HELLO";
Hello, World!
HELLO
If they don't have CGI.pm installed, which I doubt, their interface is
not that twisted, just incomplete.
> I'm pretty
> sure I tried a number of these things, especially in your early
> recommendations. I'm not even sure the here docs worked properly in
> that environment,
I think here docs are in Perl4, no idea about 3, but have *nothing* to
do with CGI.
> though it has also been a long time since I tried
> those experiments. (That was probably during the time when I was
> running the same code on two or three servers, and Tripod was
> definitely the LCD. The Tripod version only survived because it's a
> reasonably stable system, and because it's hard to argue with the
> price.)
I have no idea about the price, but I have webspace which doesn't criple
me, Perl, or CGI at 1 USD a month (excluding domain registration, all in
it comes down to 19.95 USD a year!)
> It would be nice if you would have included a bit of
> explanation for some of the less obvious parts.
That would come down to a Perl course for beginners :-)
> For example, I think
> there's something non-obvious about the "or die", since I already knew
> about it, and I'm reasonably sure it was used in the original code.
> There must have been some reason I took it out here...
Your script died? Which means that the open ( or close, etc ) failed.
If you output the value of $! you would know the reason.
Programming is not cooking, even though there are cookbooks. It's not:
ah, it tastes not nice, lets add some salt, and remove some of the
vegetables.
Programming is planning ahead what you want, then shop for the
ingredients, and the result *will* taste good, since you planned and
bought the right ingredients, and used the right amounts.
And not blaming the color of the egg plant, because the result tastes
too salty :-)
--
John Small Perl scripts: http://johnbokma.com/perl/
Perl programmer available: http://castleamber.com/
Happy Customers: http://castleamber.com/testimonials.html
| |
| Brian McCauley 2005-06-10, 3:58 pm |
|
Jim Gibson wrote:
> In article <42a7cd2b_2@news1.prserv.net>, Shannon Jacobs
> <shanen@cashette.com> wrote:
>
>
>
>
> If you really want to ignore the first 50 columns and ignore the last
> 6, you should anchor your pattern:
>
> @foo = grep(/^.{50}$theString.{6}$/,@foo);
Actually sinde the OP stated fixed length records you can ignore the
last 6 by, er, simply ignoring them.
@foo = grep(/^.{50}\Q$theString/,@foo);
(Note: \Q inserted as per other branch of this thread as $theString is a
target string not a target regex).
|
|
|
|
|