Code Comments

Finding duplicate files
Hello,

I have a very large directory structure which I need to copy to a
Windows server. Unfortunately there are several directories which have
multiple files which have the same name but different case which is
obviously not going to be tolerated by Windows. I need to make a list
of all of these files so that the user can determine whether the
duplicates need to be moved, renamed, or deleted. I've searched around
but I can't find a script that will do this and I'm not very good with
regular expressions or recursion =)

The directory structure in question resides on a Fedora Core 4 server.

Any help would be greatly appreciated.

ChrisV
04-03-08 12:41 AM


Re: Finding duplicate files
On 2008-04-02, ChrisV <rpmonkey@gmail.com> wrote:

> I have a very large directory structure which I need to copy to a
> Windows server. Unfortunately there are several directories which have
> multiple files which have the same name but different case which is
> obviously not going to be tolerated by Windows. I need to make a list
> of all of these files so that the user can determine whether the
> duplicates need to be moved, renamed, or deleted. I've searched around
> but I can't find a script that will do this and I'm not very good with
> regular expressions or recursion =)
>
> The directory structure in question resides on a Fedora Core 4 server.
>
> Any help would be greatly appreciated.

I have a ready-made python script, called xdoubles. Feel free to pick it up
on http://www.rikishi42.net/SkunkWorks/Junk/.



If python is not available, you can still use the md5 key approach; how
about :

find /start_dir/ -type f -exec md5sum '{}' ';' > md5_list.txt

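Just sort the list, and run it through uniq to display only the doubles.
Have uniq compare only the first 32 characters (the md5 key), since the
file name on each line is always different:
sort md5_list.txt | uniq -w32 -D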

... should do it.


PS: If your only problem is a difference in case, and you're sure the only
difference between the files is the case (not the content), then don't worry.
Just copy the whole bunch. As Windows doesn't really care about case, the
second file will just overwrite the first. Just enable overwrite in the
copy.
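
For the case-only name clashes the original question asks about, the same
find-and-sort idea can be applied to the names rather than the content. The
following is a minimal sketch, not a tested tool: the function name
list_case_collisions is made up for illustration, it assumes no file name
contains a newline, and case folding of non-ASCII names depends on the awk
in use.

```shell
# list_case_collisions [DIR]
# (hypothetical helper) Print every path under DIR that becomes identical
# to another path once both are lower-cased, i.e. paths that would clash
# on a case-insensitive file system.
list_case_collisions() {
    find "${1:-.}" -print |
    awk '
        {
            key = tolower($0)                 # case-folded path is the group key
            paths[key] = paths[key] $0 "\n"   # remember every real spelling
            count[key]++
        }
        END {
            for (key in count)
                if (count[key] > 1)           # only groups with 2+ spellings
                    printf "%s", paths[key]
        }
    ' |
    LC_ALL=C sort -f                          # keep clashing names next to each other
}
```

Something like "list_case_collisions /start_dir > clashes.txt" would then give
the user a list to review before deciding what to move, rename, or delete.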


--
There is an art, it says, or rather, a knack to flying.
The knack lies in learning how to throw yourself at the ground and miss.
Douglas Adams

Rikishi 42
04-03-08 12:41 AM


Re: Finding duplicate files
ChrisV wrote:
> Hello,
>
> I have a very large directory structure which I need to copy to a
> Windows server. Unfortunately there are several directories which have
> multiple files which have the same name but different case which is
> obviously not going to be tolerated by Windows. I need to make a list
> of all of these files so that the user can determine whether the
> duplicates need to be moved, renamed, or deleted. I've searched around
> but I can't find a script that will do this and I'm not very good with
> regular expressions or recursion =)
>
> The directory structure in question resides on a Fedora Core 4 server.
>
> Any help would be greatly appreciated.

I can't speak for the find command that ships with WinDOS, but if you
have Cygwin (or MKS) installed there, make sure its find command is
listed first in PATH (i.e. before the WinDOS find). You can then list
the directory structure recursively from any current working directory
with
find . -type f | sort >all_win_files

(The sorting may also be a separate step performed on all_win_files
after transferring it to the Unix box.)

The same on the Unix box

find . -type f | sort >all_unix_files

And finally compare with case ignored to see the differences

diff -i all_win_files all_unix_files >differences

You can also use comm instead of diff (see "man comm" for details).
Its options -1 and -2 suppress the names found only in the first or
only in the second list, and -3 suppresses the names common to both.
My comm program doesn't seem to support case-insensitive comparison,
so it may be necessary to use a tr command to convert case

find . -type f | tr 'A-Z' 'a-z' | sort >file_list

Once you have the file listing you can use that information to build
a tar archive or zip file from the files to be copied.

In case you have a modern shell that's what you can do...

On WinDOS:

find . -type f >winfiles    # then transfer file winfiles to Unix

On Unix:

# -23 keeps only the names found on the Unix side
comm -23 <( find . -type f | tr 'A-Z' 'a-z' | sort ) \
         <( cat winfiles   | tr 'A-Z' 'a-z' | sort ) |
    xargs tar -cvf tarfile.tar    # then transfer to WinDOS

Note: cat is generally best avoided in that context, but I've used it
here for clarity.
Note: In case the file list is large, xargs may run tar more than once,
so you may want to use tar's append function (option -r with GNU tar)
instead of -c.
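
A caveat with the pipeline above is that tr lower-cases every name, so what
reaches tar may not match the real spelling on the Unix side. One possible
way around that, sketched here under assumptions, is to fold case only for
the comparison and print the original names. The function name
missing_on_windows and the list name unixfiles are made up for illustration;
winfiles is the list from the WinDOS step, and names are assumed not to
contain newlines.

```shell
# missing_on_windows UNIX_LIST WIN_LIST
# (hypothetical helper) Print, with their original spelling, the names in
# UNIX_LIST that have no case-insensitive match in WIN_LIST.
missing_on_windows() {
    awk '
        # First file read is the Windows list: remember each folded name.
        FILENAME == ARGV[1] { seen[tolower($0)] = 1; next }
        # Second file is the Unix list: print names with no folded match.
        !(tolower($0) in seen)
    ' "$2" "$1"
}
```

With a plain find listing from the Unix box saved as unixfiles, the output of
"missing_on_windows unixfiles winfiles" can be piped into
"tar -cvf tarfile.tar -T -", since GNU tar's -T - reads the names to archive
from standard input.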

(All programs untested.)

Janis

Janis Papanagnou
04-03-08 03:54 AM


Unix Shell Programming archive
