Home > Archive > PERL Beginners > August 2005 > Filtering









Author Filtering
whocares

2005-08-06, 4:59 pm

Hi all. I am trying to sort through 13,000 images of album art to match
the 2,500 albums I have right now. I put together a script to hit amazon.com
to retrieve this stuff, but I got some sloppy results and am now trying
to decide the most efficient way of filtering through them all. The
directory structure looks like

/artist_name/album_name/(picture.jpg, artist.jpg) etc

Here is an example listing:
--------------------------
sean@bitwise ~/art/testing $ ls -R ../complete/Vines
.../complete/Vines:
Highly Evolved
Winning Days

.../complete/Vines/Highly Evolved:
B.R.M.C. by Black Rebel Motorcycle Club.jpg
Elephant by The White Stripes,White Stripes.jpg
Highly Evolved by The Vines.jpg
Highly Evolved by Vines.jpg
Is This It by The Strokes.jpg
Rubber Factory by The Black Keys.jpg
Winning Days by The Vines.jpg
Winning Days_Highly Evolved by Vines.jpg

.../complete/Vines/Winning Days:
B.R.M.C. by Black Rebel Motorcycle Club.jpg
Elephant by The White Stripes,White Stripes.jpg
Highly Evolved by The Vines.jpg
Is This It by The Strokes.jpg
The Beatles 1 by The Beatles.jpg
Winning Days, Pt. 1 by The Vines.jpg
Winning Days, Pt. 2 by The Vines.jpg
-----------------------------

I wrote this script to filter through it all:

-----------------------------
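#!/usr/bin/perl
use strict;
use warnings;

use File::Find ();
use Cwd;
use File::Basename;

# Images that do not match their album directory get moved here.
my $slop = "/home/sean/art/slop";

@ARGV = (".") unless @ARGV;

# find2perl-style wrapper so that find can take a bare block.
sub find(&@) { &File::Find::find }
our $name;
*name = *File::Find::name;

find {
    # File::Find chdirs into each directory it visits, so cwd is the
    # directory that holds the current file (the album directory).
    my $img_name = basename $name;
    if ( $img_name =~ /.*jpg/ ) {
        my $basename = basename cwd;
        # Keep only images whose name starts with the album directory name.
        unless ( $img_name =~ /^$basename.*/ ) {
            rename( $img_name, "$slop/$img_name" ) or die $!;
        }
    }
} @ARGV;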
-----------------------------

It works *most* of the time, but some folders wind up with no art
because that regex doesn't quite match the image name. Is there a more
accurate way to do this based on the two directory names, $artist and
$album? The image may be named $artist - $album.jpg, or maybe
$album - $artist.jpg, or possibly not quite either. I also tried

unless ( $img_name =~ /.*$basename.*/ )

but I wound up with extras that don't exactly match the album as well.
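
One way to match on both directory names regardless of their order is to
compare words instead of raw regex text. The following is only a sketch
under assumptions: matches_album and words are hypothetical helpers, "the"
and "by" are assumed to be the only filler words, and $max_extra is an
assumed tolerance for leftover words such as "Pt. 1". Comparing words also
sidesteps the problem that names like B.R.M.C. contain regex
metacharacters when interpolated unquoted.

```perl
use strict;
use warnings;

# Split a name into lowercase alphanumeric words, dropping a .jpg suffix.
sub words {
    my ($text) = @_;
    $text = lc $text;
    $text =~ s/\.jpe?g$//;
    return grep { length $_ } split /[^a-z0-9]+/, $text;
}

# Hypothetical helper: true if the image name contains every word of the
# artist and of the album, with at most $max_extra other words left over.
sub matches_album {
    my ( $img_name, $artist, $album, $max_extra ) = @_;
    $max_extra = 0 unless defined $max_extra;

    my %filler = map { ( $_ => 1 ) } qw(the by);    # assumed filler words
    my %wanted = map { ( $_ => 1 ) }
                 grep { !$filler{$_} } words($artist), words($album);

    my %seen;
    my $extra = 0;
    for my $word ( words($img_name) ) {
        next if $filler{$word};
        if ( $wanted{$word} ) { $seen{$word} = 1 }
        else                  { $extra++ }
    }

    # Reject if any artist or album word never showed up in the image name.
    return 0 if keys(%seen) < keys(%wanted);
    return $extra <= $max_extra ? 1 : 0;
}
```

With the default tolerance of zero, this should accept both
"Highly Evolved by The Vines.jpg" and "Vines - Highly Evolved.jpg" for
Vines/Highly Evolved, and reject "Winning Days_Highly Evolved by Vines.jpg".
A two-part album such as "Winning Days, Pt. 1 by The Vines.jpg" would need
a tolerance of 2 to get through.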

Would it be better to read the names of both dirs into two arrays and
base the filter on the most consecutive letter matches with the jpg name?
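
The "most consecutive letter matches" idea could be sketched roughly as
below. It is an illustration under assumptions, not a tested solution:
longest_common_run and best_image are hypothetical names, the comparison is
case-insensitive, and the score is simply the longest shared run with the
album name plus the longest shared run with the artist name.

```perl
use strict;
use warnings;

# Length of the longest run of consecutive characters that two strings
# share (longest common substring, by dynamic programming).
sub longest_common_run {
    my ( $left, $right ) = map { lc $_ } @_;
    my @prev = (0) x ( length($right) + 1 );
    my $best = 0;
    for my $i ( 1 .. length($left) ) {
        my @curr = (0) x ( length($right) + 1 );
        for my $j ( 1 .. length($right) ) {
            next
              unless substr( $left, $i - 1, 1 ) eq substr( $right, $j - 1, 1 );
            # Extend the run that ended at the previous character pair.
            $curr[$j] = $prev[ $j - 1 ] + 1;
            $best = $curr[$j] if $curr[$j] > $best;
        }
        @prev = @curr;
    }
    return $best;
}

# Hypothetical helper: return the candidate image name with the highest
# combined score against the album and artist directory names.
sub best_image {
    my ( $artist, $album, @candidates ) = @_;
    my ( $best_name, $best_score ) = ( undef, -1 );
    for my $candidate (@candidates) {
        my $score = longest_common_run( $candidate, $album )
                  + longest_common_run( $candidate, $artist );
        ( $best_name, $best_score ) = ( $candidate, $score )
          if $score > $best_score;
    }
    return $best_name;
}
```

Note that this picks a single winner per folder, so an album split across
"Pt. 1" and "Pt. 2" images would keep only one of them, and ties go to
whichever candidate comes first in the list.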

Any help will be greatly appreciated!

-Sean