whocares 2005-08-06, 4:59 pm
Hi all. I am trying to sort through 13,000 images of album art to match
the 2,500 albums I have right now. I put together a script to hit amazon.com
to retrieve the covers, but I got some sloppy results and am now trying
to decide the most efficient way of filtering through them all. The
directory structure looks like this:
/artist_name/album_name/(picture.jpg, artist.jpg), etc.
Here is an example listing:
--------------------------
sean@bitwise ~/art/testing $ ls -R ../complete/Vines
.../complete/Vines:
Highly Evolved  Winning Days

.../complete/Vines/Highly Evolved:
B.R.M.C. by Black Rebel Motorcycle Club.jpg
Elephant by The White Stripes,White Stripes.jpg
Highly Evolved by The Vines.jpg
Highly Evolved by Vines.jpg
Is This It by The Strokes.jpg
Rubber Factory by The Black Keys.jpg
Winning Days by The Vines.jpg
Winning Days_Highly Evolved by Vines.jpg

.../complete/Vines/Winning Days:
B.R.M.C. by Black Rebel Motorcycle Club.jpg
Elephant by The White Stripes,White Stripes.jpg
Highly Evolved by The Vines.jpg
Is This It by The Strokes.jpg
The Beatles 1 by The Beatles.jpg
Winning Days, Pt. 1 by The Vines.jpg
Winning Days, Pt. 2 by The Vines.jpg
-----------------------------
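Because every image sits two levels down in /artist_name/album_name/, both names can be read back from the directory path instead of from cwd alone. A minimal sketch, assuming that layout holds for every album folder; artist_album is a hypothetical helper name, and the splitting is done by the core File::Spec->splitdir method:

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: split a directory path such as
# "../complete/Vines/Highly Evolved" and return its last two
# components as (artist, album). Assumes the /artist/album/ layout.
sub artist_album {
    my ($dir) = @_;
    # grep drops the empty component a trailing slash would produce
    my @parts = grep { length } File::Spec->splitdir($dir);
    return unless @parts >= 2;    # path too short to hold artist/album
    return @parts[-2, -1];
}

my ($artist, $album) = artist_album("../complete/Vines/Highly Evolved");
print "$artist / $album\n";    # prints "Vines / Highly Evolved"
```

Inside a File::Find callback it could be called as artist_album($File::Find::dir).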
I wrote this script to filter through it all:
-----------------------------
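#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Basename;

my $slop = "/home/sean/art/slop";    # where rejected images are moved
@ARGV = (".") unless @ARGV;

# File::Find chdir()s into each directory, so inside the callback $_ is
# the bare file name and $File::Find::dir is the directory being scanned.
find(sub {
    return unless -f $_ && /\.jpg$/i;          # only look at .jpg files
    my $album = basename($File::Find::dir);    # e.g. "Highly Evolved"
    # Keep files whose name starts with the album name; \Q...\E stops
    # characters such as "." or "(" in the name acting as regex syntax.
    return if /^\Q$album\E/;
    rename($_, "$slop/$_")
        or die "Cannot move $File::Find::name to $slop: $!\n";
}, @ARGV);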
-----------------------------
It works *most* of the time, but some folders wind up with no art
because the regex doesn't quite match the image name. I was wondering
what might be a more accurate way to do this based on the two directory
names, $artist and $album, since the image may be named
$artist - $album.jpg, or $album - $artist.jpg, or possibly not quite
either. I also tried
unless ( $img_name =~ /.*$basename.*/ )
but then I wound up keeping extras that don't exactly match the album.
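One way to use both directory names is to parse the Amazon file names, which in the listing above follow an "<album> by <artist>[,<artist>].jpg" pattern, and compare each half against the folder names after normalizing case, a leading "The", and punctuation. This is a sketch under that naming assumption; normalize and image_matches are hypothetical helpers, not part of any module:

```perl
use strict;
use warnings;

# Canonical form for comparing titles: lower-case, drop a leading "the ",
# and turn every run of characters other than ASCII letters/digits into
# a single space.
sub normalize {
    my ($s) = @_;
    $s = lc $s;
    $s =~ s/^the\s+//;
    $s =~ s/[^a-z0-9]+/ /g;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

# Hypothetical check, assuming names of the form
# "<album> by <artist>[,<artist>...].jpg" as in the listing above.
# Returns 1 if the album part and any one listed artist match the folders.
sub image_matches {
    my ($file, $artist, $album) = @_;
    my ($img_album, $img_artists) = $file =~ /^(.*) by (.*)\.jpg$/i
        or return 0;    # not in the expected "<album> by <artist>" form
    return 0 unless normalize($img_album) eq normalize($album);
    for my $name (split /,/, $img_artists) {
        return 1 if normalize($name) eq normalize($artist);
    }
    return 0;
}
```

With these rules, "Highly Evolved by The Vines.jpg" and "Highly Evolved by Vines.jpg" are kept for the Highly Evolved folder and "Winning Days_Highly Evolved by Vines.jpg" is rejected. The exact comparison also rejects "Winning Days, Pt. 1 by The Vines.jpg" for Winning Days, so folders like that one would still need a looser test.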
Would it be better to read the names of both directories into two arrays
and base the filter on the longest run of consecutive letters the jpg name
shares with them?
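The "most consecutive letter matches" idea is the longest common substring problem. A minimal sketch, assuming you want to keep only the single best-scoring image per album folder: each file name (minus .jpg) is scored against the string "<album> by <artist>" by the length of the longest shared run divided by the longer of the two lengths, so a long combined name such as "Winning Days_Highly Evolved by Vines" is not rewarded just for containing the album name. The "<album> by <artist>" target and the helper names lcs_length, score and best_image are assumptions for illustration:

```perl
use strict;
use warnings;
use List::Util qw(max);

# Length of the longest run of consecutive characters two strings share,
# ignoring case (dynamic-programming longest common substring).
sub lcs_length {
    my ($s, $t) = (lc $_[0], lc $_[1]);
    my $best = 0;
    my @prev = (0) x (length($t) + 1);    # row i-1 of the DP table
    for my $i (1 .. length $s) {
        my @cur = (0) x (length($t) + 1);
        for my $j (1 .. length $t) {
            next unless substr($s, $i - 1, 1) eq substr($t, $j - 1, 1);
            $cur[$j] = $prev[$j - 1] + 1;  # extend the run ending at (i-1, j-1)
            $best = $cur[$j] if $cur[$j] > $best;
        }
        @prev = @cur;
    }
    return $best;
}

# Hypothetical score in [0, 1]: longest shared run divided by the longer of
# the file name (without .jpg) and the assumed target "<album> by <artist>".
sub score {
    my ($file, $artist, $album) = @_;
    (my $name = $file) =~ s/\.jpg$//i;
    my $want = "$album by $artist";
    my $len  = max(length($name), length($want));
    return $len ? lcs_length($name, $want) / $len : 0;
}

# Hypothetical: return the best-scoring image for one album folder.
sub best_image {
    my ($artist, $album, @files) = @_;
    my %score = map { $_ => score($_, $artist, $album) } @files;
    my ($best) = sort { $score{$b} <=> $score{$a} } @files;
    return $best;
}
```

In the Highly Evolved folder this picks "Highly Evolved by Vines.jpg", the only name that equals the target exactly (score 1). Each comparison is O(n*m) in the two name lengths, which should stay cheap for file names this short. Ties, such as the two "Winning Days, Pt." images, are not broken by anything meaningful, so those may still need a manual look.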
Any help will be greatly appreciated!
-Sean