

Home > Archive > PERL Beginners > July 2004 > Re: sort by extension










 

Author Re: sort by extension
James Edward Gray II

2004-07-28, 8:56 pm

(Let's keep our discussion on the list for all to help and learn,
please.)

On Jul 28, 2004, at 12:55 PM, John West wrote:

> James Edward Gray II <james@grayproductions.net>
>
> On Wed, 28 Jul 2004 12:41:08 -0500, James Edward Gray II wrote
>
> Thanks for the suggestion. I'm not really sure what I mean by better
> - seems like there are tradeoffs between readability, performance and
> length; mine is too long. Unfortunately I can't follow this code (I'm
> not a big fan of map or grep) so I probably wouldn't take this
> approach unless it really improves performance.


Not a "fan" of map() and grep() or just don't understand them? I'm
hoping it's the second, since the first doesn't make much sense to me.
Luckily, we can fix the second.

map() is generally just a shorthand for a foreach loop. You feed it a
list and a chunk of processing code. It runs the code on each item in
the list and spits out a list of the results the code returned for
each item. Example:

my @names = qw(JAMES BOB JEFF);

my @title_case = map { "\u\L$_" } @names; # process each name

print "@title_case\n"; # prints James Bob Jeff
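For comparison, here is a rough sketch of the same title-casing written both ways, once with map() and once as a plain foreach loop. The array name @title_case_loop is made up for illustration; the point is only that both arrays should end up holding the same list.

```perl
use strict;
use warnings;

my @names = qw(JAMES BOB JEFF);

# map() version: one expression in, one list out
my @title_case = map { "\u\L$_" } @names;

# foreach version: build the same list by hand
my @title_case_loop;    # hypothetical name, for illustration only
foreach my $name (@names) {
    push @title_case_loop, "\u\L$name";
}

print "@title_case\n";
print "@title_case_loop\n";
```

The loop needs a temporary array and a push; map() hands back the finished list directly.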

Let's apply that thinking to demystify the code I offered you. It's
easier to read that code bottom to top, so let's start with:

map { m/\.([^.]+)$/ ? [$_, $1] : [$_, ''] } @input;

That processes each element of the array @input. The processing code
looks for an extension and returns a two element array (by reference)
that contains the original string in the first slot and the extension
in the second (or an empty string, if none could be found). Our
modified list of originals and extensions side-by-side is then handed
up the chain to:

sort { lc($a->[1]) cmp lc($b->[1]) }

Are you a "fan" of sort()? It works just like map() and grep().

You feed it a list, a condition or conditions you want it sorted on and
it returns the results. My condition here is simply what you asked
for, by extension case-insensitively. We have our answer at this
point, but not in the format expected so we feed it to one more map():

@input = map { $_->[0] }

This is simply the reverse of the first map(). Where it turned
everything into a two-part list (original and extension), this one
returns everything to its original format, discarding referenced
arrays and extensions. The result of that final operation is then
stored back in the array.
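Put back together, the three pieces discussed above read bottom to top as one statement. This is a sketch only: the file names in @input are invented for the example, and the numbered comments are mine.

```perl
use strict;
use warnings;

# Hypothetical file names, made up for this example.
my @input = qw(notes.TXT archive.tar.gz README photo.jpg script.pl);

@input =
    map  { $_->[0] }                              # 3. keep only the original string
    sort { lc($a->[1]) cmp lc($b->[1]) }          # 2. order by extension, ignoring case
    map  { m/\.([^.]+)$/ ? [$_, $1] : [$_, ''] }  # 1. pair each name with its extension
    @input;

print "$_\n" for @input;
```

With these names, README (no extension) comes first and notes.TXT last, since "txt" sorts after "gz", "jpg" and "pl" once case is ignored.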

grep() is more a list filtering tool. Feed it a list, tell it what
you're looking for, and you get just those elements back. Example:

my @numbers = (104, 3, 102, 1, 100);

my @small_numbers = grep { $_ < 10 } @numbers; # search list

print "@small_numbers\n"; # prints 3 1
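If it helps when thinking about a port to another language, the grep() above can be sketched as a loop with a conditional push. The name @small_loop is hypothetical; it should match @small_numbers.

```perl
use strict;
use warnings;

my @numbers = (104, 3, 102, 1, 100);

# grep() version: keep the elements for which the block is true
my @small_numbers = grep { $_ < 10 } @numbers;

# loop version: test each element and keep it by hand
my @small_loop;    # hypothetical name, for illustration only
foreach my $number (@numbers) {
    push @small_loop, $number if $number < 10;
}

print "@small_numbers\n";
print "@small_loop\n";
```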

map() and grep() are powerful tools, because they allow us to specify
what work we want done. You can argue that they hurt readability in
some cases, as long as I can argue that they help it in others.
Eventually, when you're learning any language, you have to start
picking up the slang too, so you can speak it like the pros do.
Nothing wrong with that.

Work with them a little and see if you can't put them to good use. I
think you'll surprise yourself and hopefully, become a "fan"...

James

Perl.Org

2004-07-29, 3:56 pm

Thanks for the detailed response. I know the interspersed comments won't make
some members of the list happy, but they're just opinions.

On Wed, 28 Jul 2004 13:46:07 -0500, James Edward Gray II wrote
>
> Not a "fan" of map() and grep() or just don't understand them?


Both, didn't understand them and still find them hard to read. I have to
worry about programmers with no Perl experience maintaining the code, and
since these structures don't seem to exist in any other languages they are
intimidating. Plus, the Perl may need to get ported to Java or C#, which is
easiest if the logic structures can be similar. I would only use them if they
significantly improve performance, not to reduce typing (since then I would
have to type a long comment reminding me what the logic does - your
explanation of this solution is precise but lengthy).

> It's
> easier to read that code bottom to top, so let's start with:
>
> map { m/\.([^.]+)$/ ? [$_, $1] : [$_, ''] } @input;


Unfortunately I really don't find this easy to read.

> Are you a "fan" of sort()? It works just like map() and grep().


I like the convenient functionality, but I don't like the syntax. I don't
understand why there is no between logic and @data here:

print join( $/, sort( { $a <=> $b } @data ));

These are the kinds of idiosyncrasies that make Perl difficult when coming
from other languages.

> I think you'll surprise yourself and hopefully, become a "fan"...


I agree, I could really use these, but the fact is I probably won't be
programming Perl much longer so I may not make the effort. If I could just
think of a way to make them clear to other programmers - maybe I'll just link
to your post in the docs. Again, thanks for the thorough explanation.

Jeff 'Japhy' Pinyan

2004-07-29, 3:56 pm

On Jul 29, perl.org said:

>
>Unfortunately I really don't find this easy to read.


That's why he broke it down. Is the map() really the problem, or is it
the regex, the ?: operator, and the two array references?

>I like the convenient functionality, but I don't like the syntax. I don't
>understand why there is no between logic and @data here:
>
>print join( $/, sort( { $a <=> $b } @data ));


I'm not sure I understand YOU. Did you mean "... why there is no COMMA
between THE SORT logic and @data here:"? That makes more sense.

The reason there's no comma is because Perl's grammar says you don't put
commas after blocks like that.

map BLOCK LIST
grep BLOCK LIST
sort BLOCK LIST

cf.:

map EXPR, LIST
grep EXPR, LIST
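A small sketch of the two forms side by side, in case it helps. The data and variable names here are made up; the only thing being shown is where the comma does and does not go.

```perl
use strict;
use warnings;

my @data = (10, 9, 100, 1);    # hypothetical sample data

# BLOCK form: no comma between the block and the list
my @doubled_block = map  { $_ * 2 } @data;
my @big_block     = grep { $_ > 9 } @data;

# EXPR form: a comma separates the expression from the list
my @doubled_expr = map  $_ * 2, @data;
my @big_expr     = grep $_ > 9, @data;

# sort with a BLOCK, again with no comma before the list
my @sorted = sort { $a <=> $b } @data;

print "@doubled_block\n@doubled_expr\n";
print "@big_block\n@big_expr\n";
print "@sorted\n";
```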

--
Jeff "japhy" Pinyan % How can we ever be the sold short or
RPI Acacia Brother #734 % the cheated, we who for every service
http://japhy.perlmonk.org/ % have long ago been overpaid?
http://www.perlmonks.org/ % -- Meister Eckhart

Perl.Org

2004-07-29, 3:56 pm

On Thu, 29 Jul 2004 12:08:20 -0400 (EDT), Jeff 'japhy' Pinyan wrote
>
> That's why he broke it down. Is the map() really the problem, or is
> it the regex, the ?: operator, and the two array references?


All of the above ;), but now that I think about it the map looks easiest, then
?: (which I also prefer not to use, along with unless). At least map is a
word instead of a (cryptic) token.

>
> I'm not sure I understand YOU. Did you mean "... why there is no COMMA
> between THE SORT logic and @data here:"? That makes more sense.


Yep, typo on sort, thought the other was obvious.

> The reason there's no comma is because Perl's grammar says you don't
> put commas after blocks like that.


Great, maybe this is what I'm missing - what is the difference between
sort/map/grep blocks and other blocks? I don't understand why I would want
multiple syntaxes for what I consider function calls. Just out of curiosity,
what other functions/blocks work this way?

Thanks again,

-John

James Edward Gray II

2004-07-29, 3:56 pm

On Jul 29, 2004, at 10:42 AM, perl.org wrote:

> Thanks for the detailed response.


Anytime.

>
> Unfortunately I really don't find this easy to read.


That's unfortunate, because I actually changed it to that because I
hoped it would be easier to follow. I originally used:

map { [ m/^(.+?(?:\.([^.]*))?)$/ ] } @input;

when I was testing your problem, so I'll post that here just to show
another approach. (You must use my lc()ed sort with this, which is the
reason I had it that way.)

This solution is more my style but, I worried, a bit trickier. It
avoids the if/else shorthand of ?:, but to read it you need to at least
understand that a regex evaluated in list context returns the list of
all the items captured by its execution.
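To make the list-context behaviour concrete, here is a sketch that runs that regex against two invented names and keeps what it returns. The variable names are hypothetical. Note that a capture group that took no part in the match comes back as undef.

```perl
use strict;
use warnings;

# In list context a successful match returns its captures:
# here, the whole name and then the extension.
my @captures = 'report.PDF' =~ m/^(.+?(?:\.([^.]*))?)$/;

# With no dot in the name, the second capture never takes part,
# so the second element of the returned list is undef.
my @no_ext = 'README' =~ m/^(.+?(?:\.([^.]*))?)$/;

print "name: $captures[0], extension: $captures[1]\n";
print "name: $no_ext[0], extension: ",
      ( defined $no_ext[1] ? $no_ext[1] : '(none)' ), "\n";
```

Wrapping the match in [ ... ], as the map() block does, stores that two-element list as an array reference.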

TIMTOWTDI, as they say. One of the things I prefer about Perl is how
it usually leaves me with the choice to approach problems in a way I
like.

James

James Edward Gray II

2004-07-29, 3:56 pm

On Jul 29, 2004, at 11:23 AM, perl.org wrote:

> On Thu, 29 Jul 2004 12:08:20 -0400 (EDT), Jeff 'japhy' Pinyan wrote
>
> All of the above ;), but now that I think about it the map looks
> easiest, then ?: (which I also prefer not to use, along with unless).
> At least map is a word instead of a (cryptic) token.


Just for the sake of completeness.

COND ? TRUE CODE : FALSE CODE

is generally the same as

if (COND) { TRUE CODE; }
else { FALSE CODE; }
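A concrete sketch of the two spellings, using the extension test from earlier in the thread. The file name and variable names are invented. One practical difference is that ?: is an expression that yields a value, so it can sit on the right-hand side of an assignment, while if/else is a statement.

```perl
use strict;
use warnings;

my $file = 'index.htm';    # hypothetical file name

# ?: form: the whole thing is an expression, so it can be assigned
my $ext_ternary = $file =~ m/\.([^.]+)$/ ? $1 : '';

# if/else form: the assignment has to move inside each branch
my $ext_if;
if ( $file =~ m/\.([^.]+)$/ ) { $ext_if = $1; }
else                          { $ext_if = ''; }

print "$ext_ternary $ext_if\n";
```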

James

Charles K. Clarkson

2004-07-29, 3:56 pm

perl.org <perl.org@jpw3.com> wrote:

: Thanks for the detailed response. I know the interspersed
: comments won't make some members of the list happy, but
: they're just opinions.
:
: James Edward Gray II wrote:
: : Not a "fan" of map() and grep() or just don't understand them?
:
: Both, didn't understand them and still find them hard to
: read. I have to worry about programmers with no Perl
: experience maintaining the code, and since these structures
: don't seem to exist in any other languages they are
: intimidating. Plus, the Perl may need to get ported to
: Java or C#, which is easiest if the logic structures can
: be similar. I would only use them if they significantly
: improve performance, not to reduce typing (since then I
: would have to type a long comment reminding me what the
: logic does - your explanation of this solution is precise
: but lengthy).

There is another solution to this problem that involves
only simple perl statements that should be easier to read
than map and ?:. It is based on some slides by MJ Dominus
about an Indirect Sort.

http://perl.plover.com/yak/hw2/samples/slide003.html


use File::Basename 'fileparse';

my @files = qw(
    /path/to/file/index.ext
    /path/to/another/file/with.htm
    /path/to/YA/file/small
    /path/to/file/foo.eml
    /path/to/file/bar.pdf
    /index.htm
);

# create an array of extensions
my @extensions;
foreach my $file ( @files ) {
    push @extensions, ( fileparse( $file, '\..*' ) )[2];
}

# sort the indices
my @sorted_indices =
    sort { $extensions[$a] cmp $extensions[$b] }
    0 .. $#extensions;

my @sorted_files = @files[@sorted_indices];


I used File::Basename's fileparse() subroutine to
get the file extensions. You could expand this for
clarity.

my @extensions;
foreach my $file ( @files ) {
    my( $base, $path, $type ) = fileparse( $file, '\..*' );
    push @extensions, $type;
}


Note that you could easily create additional
arrays for other sorts.

# create an array of extensions and filenames
my( @extensions, @filenames );
foreach my $file ( @files ) {
    my( $base, $path, $type ) = fileparse( $file, '\..*' );
    push @extensions, $type;
    # $type already includes the leading dot, so no extra "." here
    push @filenames, "$base$type";
}

# sort indices by extension
my @extension_sorted_indices =
    sort { $extensions[$a] cmp $extensions[$b] }
    0 .. $#extensions;

# sort indices by filename
my @filename_sorted_indices =
    sort { $filenames[$a] cmp $filenames[$b] }
    0 .. $#filenames;

my @extension_sorted_files = @files[@extension_sorted_indices];
my @filename_sorted_files = @files[@filename_sorted_indices];
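One possible extension of the same idea, offered as a sketch rather than as part of the slides: two files in the sample share the .htm extension, so a second comparison can be chained on with "or" to break the tie. Lower-casing the extension with lc and falling back to the full path are my own assumptions here, chosen to match the case-insensitive request earlier in the thread.

```perl
use strict;
use warnings;
use File::Basename 'fileparse';

my @files = qw(
    /path/to/file/index.ext
    /path/to/another/file/with.htm
    /path/to/YA/file/small
    /path/to/file/foo.eml
    /path/to/file/bar.pdf
    /index.htm
);

# create an array of lower-cased extensions (lc is an assumption,
# added so the comparison ignores case)
my @extensions;
foreach my $file (@files) {
    my ( $base, $path, $type ) = fileparse( $file, '\..*' );
    push @extensions, lc $type;
}

# sort the indices by extension; when two extensions are equal,
# cmp returns 0 and the "or" falls through to the full path
my @sorted_indices =
    sort {
        $extensions[$a] cmp $extensions[$b]
            or $files[$a] cmp $files[$b]
    } 0 .. $#files;

my @sorted_files = @files[@sorted_indices];

print "$_\n" for @sorted_files;
```

With this data the file with no extension comes first, and /index.htm lands ahead of /path/to/another/file/with.htm because the tie is settled by comparing the paths.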


HTH,

Charles K. Clarkson
--
Mobile Homes Specialist
254 968-8328








Randy W. Sims

2004-07-30, 3:56 am

perl.org wrote:
> On Thu, 29 Jul 2004 12:08:20 -0400 (EDT), Jeff 'japhy' Pinyan wrote
>
>
>
> All of the above ;), but now that I think about it the map looks easiest, then
> ?: (which I also prefer not to use, along with unless). At least map is a
> word instead of a (cryptic) token.


map() is easier to understand if you rearrange it a bit. Take the
expression you've been discussing:

map { m/\.([^.]+)$/ ? [$_, $1] : [$_, ''] } @input;

Well, let's make it a complete expression

my @results = map { m/\.([^.]+)$/ ? [$_, $1] : [$_, ''] } @input;

Let's add a little formatting:

my @results = map {
    m/\.([^.]+)$/
        ? [$_, $1]
        : [$_, '']
} @input;

Change the spelling of map and rearrange a bit:

my @results;
foreach (@input) {
    push( @results, m/\.([^.]+)$/
        ? [$_, $1]
        : [$_, ''] );
}

We can change the spelling of that ?: thingy too:

my @results;
foreach (@input) {
    if ( m/\.([^.]+)$/ ) {
        push( @results, [$_, $1] );
    } else {
        push( @results, [$_, ''] );
    }
}

Let's be a little more specific about what we're working with:

my @results;
foreach my $file (@input) {
    if ( $file =~ m/\.([^.]+)$/ ) {
        push( @results, [$file, $1] );
    } else {
        push( @results, [$file, ''] );
    }
}

That is easier to read. It's more obvious what is going on. I hardly
ever write code like this. ;-)

For better or worse, I've gotten used to the terseness that perl is
capable of. The original map expression reads just like good prose to
me. I guess a person can get used to about anything.

Actually, there does seem to be an advantage to the terseness: it seems
to help me comprehend larger chunks of code when reading unfamiliar
code. The above is not the greatest example, but you can see that in the
rearranged version you have to read about a half dozen lines to figure
out what is going on whereas in the original, it's one line.

I can't say which is better, terseness or verbosity. I can say that I
had very similar opinions to yours when I first encountered perl. I come
from a C/C++ background, and I like minimalist languages. The fewer
constructs in a language, the fewer the complexities, the fewer the
problems. I remember reading the camel book and hating all the ways of
saying if/else/unless, among other things. Too many ways to do it
(TMWTDI). I would never use some of those extraneous constructs...

Oh well, so much for first impressions. I'm now a sinner with the worst
of them. There's not a construct I won't abuse. I glory in terseness. I
spend untold amounts of time trying to squeeze two expressions into one.
I spend precious cycles poring over sections of code not because it
doesn't work, but because it doesn't look right... esthetically. I want
to find ways to restructure my code... poetically.

Confused?






