Archive: PERL Beginners, July 2004
sort files by extension
| Perl.Org 2004-07-28, 8:56 pm |
I have a list of files that I want to sort case-insensitively by extension,
with files that have no extension appearing first. It should handle both
Windows and Unix directory separators. I think I have working code, but I am
interested in the various syntaxes for this one - there MUST be a better way
than my feeble attempt:
use strict;
use warnings;
my @input = ( '/path/to/file/with.ext', '/path/to/file/with.htm',
    '/path/to/file/without', '/path/to/file/with.eml', '/path/to/file/with.pdf' );
my @output = sort {
    # Extension = text after the first non-leading dot in the last path
    # component ('/' or '\' ends a component); '' if there is none.
    my $ex1 = $a =~ m#[^\\/]\.([^\\/]+)$# ? $1 : '';
    my $ex2 = $b =~ m#[^\\/]\.([^\\/]+)$# ? $1 : '';
    # Case-insensitive; '' sorts before any real extension.
    lc($ex1) cmp lc($ex2);
} @input;
print join( "\n", @output ), "\n";
C:\temp>sortext.pl
/path/to/file/without
/path/to/file/with.eml
/path/to/file/with.ext
/path/to/file/with.htm
/path/to/file/with.pdf
| James Edward Gray II 2004-07-28, 8:56 pm |
On Jul 28, 2004, at 12:27 PM, perl.org wrote:
> I have a list of files I want to case-insensitive sort by extension,
> files with no extension appearing first. It should handle both Windows
> and Unix directory separators. I think I have working code, but I am
> interested in the various syntax for this one - there MUST be a better
> way than my feeble attempt:
I'm never one to abuse working code, but if your definition of "better"
has something to do with shorter, a simple Schwartzian Transformation
seems to work:
@input = map { $_->[0] }
sort { lc($a->[1]) cmp lc($b->[1]) }
map { m/\.([^.]+)$/ ? [$_, $1] : [$_, ''] } @input;
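A self-contained sketch of the transform above, run on the original poster's list plus one hypothetical Windows-style path (`C:\dir.d\README`). Two assumptions differ from the code as posted: the character class also excludes `/` and `\`, so a dot in a directory name is not mistaken for an extension, and `lc` is applied once while building the pairs. Files with the same extension may come out in either order, since only the extension is compared.

```perl
use strict;
use warnings;

my @input = ( '/path/to/file/with.ext', '/path/to/file/with.htm',
              '/path/to/file/without', '/path/to/file/with.eml',
              '/path/to/file/with.pdf', 'C:\\dir.d\\README' );

# Schwartzian Transform: build [original, key] pairs, sort on the key,
# then map the pairs back to the original strings.
# [^.\\/] rejects '.', '\' and '/', so only a dot in the final path
# component (and after the last dot there) can start an extension.
my @output = map  { $_->[0] }
             sort { $a->[1] cmp $b->[1] }
             map  { m{\.([^.\\/]+)$} ? [ $_, lc($1) ] : [ $_, '' ] } @input;

print "$_\n" for @output;
```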
Hope that helps.
James
| Jeff 'Japhy' Pinyan 2004-07-28, 8:56 pm |
On Jul 28, perl.org said:
>I have a list of files I want to case-insensitive sort by extension,
>files with no extension appearing first. It should handle both Windows
>and Unix directory separators. I think I have working code, but I am
>interested in the various syntax for this one - there MUST be a better
>way than my feeble attempt:
I would use File::Basename so that I can be sure it works on all
platforms. What is your take on files with multiple extensions, like
program.pl.bak or jeff.pinyan.txt?
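A quick sketch for checking what File::Basename's `fileparse` returns for names with several dots; the paths are made up. `fileparse` returns `(name, directory, suffix)`, and each suffix pattern is matched against the end of the name, so `qr/\.[^.]*/` strips only the last extension. It assumes Unix-style paths: on Unix, backslashes are not treated as separators unless `fileparse_set_fstype` is called first.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

my %parts;
for my $path ( 'program.pl.bak', '/home/jeff/jeff.pinyan.txt', '/path/to/file/without' ) {
    # List context: (name, directory, suffix).
    my ( $name, $dir, $suffix ) = fileparse( $path, qr/\.[^.]*/ );
    $parts{$path} = [ $name, $dir, $suffix ];
    print "$path: name='$name' dir='$dir' suffix='$suffix'\n";
}
```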
--
Jeff "japhy" Pinyan % How can we ever be the sold short or
RPI Acacia Brother #734 % the cheated, we who for every service
http://japhy.perlmonk.org/ % have long ago been overpaid?
http://www.perlmonks.org/ % -- Meister Eckhart
| James Edward Gray II 2004-07-28, 8:56 pm |
On Jul 28, 2004, at 2:26 PM, WilliamGunther@aol.com wrote:
> Maybe I'm missing something but since you're doing Schwartzian
> Transformation already why call lc() every time?
>
> @input = map { $_->[0] }
> sort { $a->[1] cmp $b->[1] }
> map { m/\.([^.]+)$/ ? [$_, lc($1)] : [$_, ''] } @input;
Good point. Your way is more efficient.
James
| WilliamGunther@aol.com 2004-07-28, 8:56 pm |
In a message dated 7/28/2004 12:41:43 PM Eastern Standard Time,
james@grayproductions.net writes:
> @input = map { $_->[0] }
> sort { lc($a->[1]) cmp lc($b->[1]) }
> map { m/\.([^.]+)$/ ? [$_, $1] : [$_, ''] } @input;
Maybe I'm missing something, but since you're doing a Schwartzian
Transformation already, why call lc() every time?
@input = map { $_->[0] }
sort { $a->[1] cmp $b->[1] }
map { m/\.([^.]+)$/ ? [$_, lc($1)] : [$_, ''] } @input;
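One way to see this point is to count the calls. The sketch below wraps `lc` in two hypothetical counting helpers, `lc_in_sort` and `lc_in_map`, and runs both versions over a made-up list of 50 names. With `lc` in the sort block it runs twice per comparison, so the count grows with the number of comparisons (roughly N log N); in the first `map` it runs exactly once per name. The exact count for the first version depends on Perl's sort implementation, so none is claimed here.

```perl
use strict;
use warnings;

# Hypothetical input: 50 names with mixed-case extensions.
my @exts  = ( 'TXT', 'htm', 'Pdf', 'eml' );
my @files = map { sprintf '/tmp/file%d.%s', $_, $exts[ $_ % 4 ] } 1 .. 50;

# Counting wrappers around lc(), one per variant.
my ( $calls_in_sort, $calls_in_map ) = ( 0, 0 );
sub lc_in_sort { $calls_in_sort++; return lc $_[0] }
sub lc_in_map  { $calls_in_map++;  return lc $_[0] }

# lc() inside the sort block: two calls for every comparison.
my @v1 = map  { $_->[0] }
         sort { lc_in_sort( $a->[1] ) cmp lc_in_sort( $b->[1] ) }
         map  { m/\.([^.]+)$/ ? [ $_, $1 ] : [ $_, '' ] } @files;

# lc() in the first map: one call per element, before sorting.
my @v2 = map  { $_->[0] }
         sort { $a->[1] cmp $b->[1] }
         map  { m/\.([^.]+)$/ ? [ $_, lc_in_map($1) ] : [ $_, '' ] } @files;

print "lc() calls, lc in sort block: $calls_in_sort\n";
print "lc() calls, lc in first map:  $calls_in_map\n";
```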
--
-will
http://www.wgunther.tk
(the above message is double rot13 encoded for security reasons)
Most Useful Perl Modules
-strict
-warnings
-Devel::DProf
-Benchmark
-B::Deparse
-Data::Dumper
-Clone
-Perl::Tidy
-Beautifier
-DBD::SQLite
| Perl.Org 2004-07-28, 8:56 pm |
On Wed, 28 Jul 2004 14:04:26 -0400 (EDT), Jeff 'japhy' Pinyan wrote
> On Jul 28, perl.org said:
>
> I would use File::Basename so that I can be sure it works on all
> platforms. What is your take on files with multiple extensions, like
> program.pl.bak or jeff.pinyan.txt?
Excellent point about File::Basename. I knew of the module; I'm not sure why I
didn't use it. For multi-extension files (which there shouldn't be, since it's
a website) I just want the last extension. Now to find out which one basename
returns...
Thanks,
-John
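Since only the last extension matters here, note that the regex in the original post takes everything after the first non-leading dot in the file name, so `program.pl.bak` yields `pl.bak`. The sketch below compares it with a variant (an assumption, not something posted in the thread) that also excludes `.` from the captured part; both helper names are hypothetical.

```perl
use strict;
use warnings;

# The original poster's pattern. The regex engine takes the leftmost
# match, so the capture starts at the FIRST qualifying dot.
sub ext_from_first_dot {
    my ($path) = @_;
    return $path =~ m#[^\\/]\.([^\\/]+)$# ? $1 : '';
}

# Variant: '.' is also excluded from the capture, so only the text
# after the LAST dot can match.
sub ext_from_last_dot {
    my ($path) = @_;
    return $path =~ m#[^\\/]\.([^.\\/]+)$# ? $1 : '';
}

for my $path ( '/site/program.pl.bak', 'C:\\site\\jeff.pinyan.txt', '/site/.htaccess' ) {
    printf "%-26s first=%-10s last=%s\n",
        $path, ext_from_first_dot($path), ext_from_last_dot($path);
}
```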
| Jeff 'Japhy' Pinyan 2004-07-29, 3:56 pm |
On Jul 28, perl.org said:
>On Wed, 28 Jul 2004 14:04:26 -0400 (EDT), Jeff 'japhy' Pinyan wrote
>
>Excellent point about basename - I knew of the module, not sure why I didn't
>use it. For multi-extension files (which there shouldn't be, it's a website)
>I just want the last extension. Now to find out which basename returns...
Here's one approach:
use strict;
use warnings;
use File::Basename;
my @sorted =
    map +(split /\0/, $_, 3)[2],                                 # 3. keep the original name
    sort                                                          # 2. plain string sort
    map lc((fileparse($_, qr/\.[^.]*$/))[2] . "\0$_") . "\0$_",  # 1. build the key
    glob "*";
This is a Guttman-Rosler Transform. Unlike a Schwartzian Transform, which
creates data structures (an array reference per element), a GRT creates plain
strings that go directly to sort(), which is faster than supplying sort() with
a comparison block.
The strings created look like this:
lowercase ext \0 lowercase filename \0 original filename
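A self-contained run of the transform above, with a hypothetical fixed list in place of `glob "*"` so the result does not depend on the current directory. `index.html` is included to show that `.htm` still sorts ahead of `.html` even though `index.html` would come first alphabetically.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical sample list standing in for glob "*".
my @files = ( 'with.ext', 'with.HTM', 'without', 'with.eml', 'index.html', 'with.pdf' );

# Guttman-Rosler Transform:
#   1. pack each name into one string: lc(ext) \0 lc(name) \0 original
#   2. sort those strings with the default string comparison (no block)
#   3. split off the original name again
my @sorted =
    map  { ( split /\0/, $_, 3 )[2] }
    sort
    map  { lc( ( fileparse( $_, qr/\.[^.]*$/ ) )[2] . "\0$_" ) . "\0$_" }
    @files;

print "$_\n" for @sorted;
```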
I'm using NUL characters ("\0") because they sort lower than any other
character. They ensure that shorter extensions and filenames sort "earlier"
than longer ones that start the same way. For example, ".htm" should come
before ".html", and no matter what the filenames look like, this sorting
mechanism guarantees that.
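To see why the separator has to sort below every character that can appear in an extension or name, the sketch below builds keys for two hypothetical files, `zeta.htm` and `alpha.html`, with no separator, with `|` (which sorts above lowercase letters in ASCII), and with `\0`. Only the NUL separator puts `.htm` first; the helper name is made up for illustration.

```perl
use strict;
use warnings;

# Two hypothetical files whose extensions share a prefix.
my @pairs = ( [ '.htm', 'zeta.htm' ], [ '.html', 'alpha.html' ] );

# Build "ext SEP name" keys, sort them as plain strings, and report
# which extension ends up first.
sub first_ext_with_separator {
    my ($sep) = @_;
    my ($first) = sort map { $_->[0] . $sep . $_->[1] } @pairs;
    return index( $first, '.html' ) == 0 ? '.html' : '.htm';
}

for my $case ( [ 'no separator', '' ], [ q{"|"}, '|' ], [ q{"\0"}, "\0" ] ) {
    printf "%-13s %s sorts first\n", $case->[0], first_ext_with_separator( $case->[1] );
}
```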
It might look messy, but that's the price you pay.