using File::Find with big files on Linux

Matt Rohm

2006-10-30, 7:09 pm


I've noticed something peculiar with a script that
checks file sizes using the File::Find - finddepth
routine. When it comes across really big files,
say >4GB it examines (or compares) the file twice.

Anyone seen this before? Care to share the cause of
this problem, and some ideas for a solution?

thx.

DJ Stunks

2006-10-30, 7:09 pm


Matt Rohm wrote:
> I've noticed something peculiar with a script that
> checks file sizes using the File::Find - finddepth
> routine. When it comes across really big files,
> say >4GB it examines (or compares) the file twice.
>
> Anyone seen this before? Care to share the cause of
> this problem, and some ideas for a solution?


I am highly suspicious.

Where's your "smallest _but complete_ code sample which demonstrates
the problem"? Have you read the posting guidelines?

-jp

Joe Smith

2006-10-30, 7:09 pm

Matt Rohm wrote:
> I've noticed something peculiar with a script that
> checks file sizes using the File::Find - finddepth
> routine. When it comes across really big files,
> say >4GB it examines (or compares) the file twice.


I haven't seen that problem.

linux% ls -lR
.:
total 16
drwxr-xr-x 4 jms jms 4096 Oct 24 20:47 one/
drwxr-xr-x 4 jms jms 4096 Oct 24 20:47 two/

./one:
total 36
drwxr-xr-x 2 jms jms 4096 Oct 24 20:45 1a/
drwxr-xr-x 2 jms jms 4096 Oct 24 20:46 1b/
-rw-r--r-- 1 jms jms 5368709121 Oct 24 20:46 bar-1

./one/1a:
total 20
-rw-r--r-- 1 jms jms 5368709121 Oct 24 20:44 foo-1a

./one/1b:
total 20
-rw-r--r-- 1 jms jms 5368709121 Oct 24 20:44 foo-1b

./two:
total 36
drwxr-xr-x 2 jms jms 4096 Oct 24 20:46 2a/
drwxr-xr-x 2 jms jms 4096 Oct 24 20:46 2b/
-rw-r--r-- 1 jms jms 5368709121 Oct 24 20:46 bar-2

./two/2a:
total 20
-rw-r--r-- 1 jms jms 5368709121 Oct 24 20:44 foo-2a

./two/2b:
total 20
-rw-r--r-- 1 jms jms 5368709121 Oct 24 20:44 foo-2b
linux%
linux% find . -depth -print
./one/1a/foo-1a
./one/1a
./one/1b/foo-1b
./one/1b
./one/bar-1
./one
./two/2a/foo-2a
./two/2a
./two/2b/foo-2b
./two/2b
./two/bar-2
./two
.
linux% perl -MFile::Find -le 'finddepth(sub {print $File::Find::name},".")'
./one/bar-1
./one/1a/foo-1a
./one/1a
./one/1b/foo-1b
./one/1b
./one
./two/bar-2
./two/2a/foo-2a
./two/2a
./two/2b/foo-2b
./two/2b
./two
.
linux% perl -MFile::Find -le 'find(sub {print $File::Find::name},".")'
.
./one
./one/bar-1
./one/1a
./one/1a/foo-1a
./one/1b
./one/1b/foo-1b
./two
./two/bar-2
./two/2a
./two/2a/foo-2a
./two/2b
./two/2b/foo-2b
linux% uname -a
Linux mathras 2.6.15-1.1831_FC4 #1 Tue Feb 7 13:37:42 EST 2006 i686

You'll need to post some actual code that we can run to
duplicate the problem.
-Joe

Michele Dondi

2006-10-30, 7:09 pm

On Tue, 24 Oct 2006 20:55:50 -0700, Joe Smith <joe@inwap.com> wrote:

^^^^^^^^^^^^^^^^^
>
>I haven't seen that problem.

[snip]
>linux% perl -MFile::Find -le 'finddepth(sub {print $File::Find::name},".")'


Not that I expect it to make a difference, but yours is a flawed
comparison, since you do *not* "checks file sizes"...
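
For reference, a minimal sketch of a test that does check file sizes while walking the tree with finddepth. It also counts how often the callback is invoked for each file, so a double visit would show up in the output. The helper name `scan_sizes` is hypothetical and not taken from the original poster's script, which was never shown.

```perl
use strict;
use warnings;
use File::Find;

# Walk $top depth-first and record, for each regular file, its size and
# how many times the callback was invoked for it.
# (scan_sizes is an illustrative name, not part of File::Find.)
sub scan_sizes {
    my ($top) = @_;
    my (%size, %visits);
    finddepth(sub {
        return unless -f $_;                      # one stat() per entry
        $size{$File::Find::name} = (-s _) || 0;   # size from the cached stat
        $visits{$File::Find::name}++;
    }, $top);
    return (\%size, \%visits);
}

# Only runs when a directory is given on the command line.
if (@ARGV) {
    my ($size, $visits) = scan_sizes($ARGV[0]);
    for my $path (sort keys %$size) {
        printf "%12s  %s  (callback calls: %d)\n",
            $size->{$path}, $path, $visits->{$path};
    }
}
```

Any file reported with more than one callback call would be evidence of the behaviour described in the original post.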


Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
.'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,

Michele Dondi

2006-10-30, 7:09 pm

On Tue, 24 Oct 2006 16:50:09 -0700, Matt Rohm <rohm@cisco.com> wrote:

>routine. When it comes across really big files,
>say >4GB it examines (or compares) the file twice.
>
>Anyone seen this before? Care to share the cause of
>this problem, and some ideas for a solution?


As others said, this sounds strange, and it would be easier to believe
if you gave some evidence. Wild guess: could it be that you're doing
multiple stat()s, perhaps disguised as -X file tests? If so, just use
the _ filehandle where possible. That would affect *all* files
regardless of their size, though; perhaps you were only concerned about
"really big files" and noticed it only in connection with them...
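
A minimal sketch of the _ filehandle idea, assuming the checks are a handful of -X tests on one path. The first test performs the real stat(); every later test written against _ reads the cached result instead of hitting the filesystem again. The function name `describe` and the keys of the returned hash are made up for illustration.

```perl
use strict;
use warnings;

# Classify one path with a single stat() call.
# Returns nothing (undef in scalar context) if the path cannot be stat'ed.
sub describe {
    my ($path) = @_;
    return unless -e $path;            # the only real stat() call
    my $type = (-d _) ? 'dir'          # each "_" test reads the cached result
             : (-f _) ? 'file'
             :          'other';
    my $size     = (-s _) || 0;        # -s is false for empty files
    my $age_days = -M _;               # days since last modification
    return { type => $type, size => $size, age_days => $age_days };
}

# Only runs when paths are given on the command line.
for my $path (@ARGV) {
    my $info = describe($path);
    print defined $info
        ? "$path: $info->{type}, $info->{size} bytes\n"
        : "$path: cannot stat\n";
}
```

Writing `-d $path`, `-f $path` and `-s $path` instead would stat the same file three more times.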


Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
.'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,

Matt Rohm

2006-10-30, 7:09 pm

Thanks for pointing this out, I think you have hit the basic
problem. The results of the stat are compared against a number
of rules, trying to filter all files and find the old, large files.

Sorry for ignoring the posting rules, next time I will include code and
everyone can point out my stupid mistake.


Michele Dondi wrote:
> On Tue, 24 Oct 2006 16:50:09 -0700, Matt Rohm <rohm@cisco.com> wrote:
>
>
>
>
> As others said, this sounds strange, and it would be easier to believe
> it if you gave some evidence. Wild guess: isn't it that you're doing
> multiple stat(s) perhaps in the disguised form of some -X? If so, then
> just use the _ filehandle, if possible of course. However this should
> affect *all* files independently of their size, but perhaps you were
> concerned about "really big files" and only noticed the thing in
> connection with them...
>
>
> Michele

Michele Dondi

2006-10-30, 7:09 pm

On Wed, 25 Oct 2006 11:57:44 -0700, Matt Rohm <rohm@cisco.com> wrote:

>Thanks for pointing this out, I think you have hit the basic
>problem. The results of the stat are compared against a number
>of rules, trying to filter all files and find the old, large files.


If you use the *results* (I think you mean the return value) of a
single stat(), then you should not have problems. If instead you call,
say, -f, -s and so on on the same file, then you may, since each one
calls stat() again. You can use the _ filehandle, as hinted in the
other post: it reuses the cached values from the last stat() instead.
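
A minimal sketch of the "use the return value of one stat()" approach for the stated goal of finding old, large files. It assumes "old" means last modified at least a given number of days ago and "large" means at least a given number of bytes; the function name `old_large_files` and the thresholds in the usage lines are illustrative, not taken from the original script.

```perl
use strict;
use warnings;
use File::Find;

# Return [path, size, mtime] for every regular file under $top that is
# at least $min_bytes long and was last modified at least $min_days ago.
sub old_large_files {
    my ($top, $min_bytes, $min_days) = @_;
    my $cutoff = time - $min_days * 86_400;    # seconds per day
    my @hits;
    finddepth(sub {
        my @st = stat($_);                     # the one stat() per entry
        return unless @st;                     # stat failed
        return unless -f _;                    # reuses the buffer stat() filled
        my ($size, $mtime) = @st[7, 9];        # fields 7 and 9: size, mtime
        push @hits, [$File::Find::name, $size, $mtime]
            if $size >= $min_bytes && $mtime <= $cutoff;
    }, $top);
    return @hits;
}

# Only runs when a directory is given; 1 GB and 30 days are example rules.
if (@ARGV) {
    for my $hit (old_large_files($ARGV[0], 1_000_000_000, 30)) {
        my ($path, $size, $mtime) = @$hit;
        print "$size\t", scalar(localtime $mtime), "\t$path\n";
    }
}
```

All rules are applied to the values already in `@st`, so each file is stat'ed once no matter how many rules there are.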

>Sorry for ignoring the posting rules, next time I will include code and
>everyone can point out my stupid mistake.


Well, there's no guarantee either. We're not perfect of course. But
indeed if you paste some minimal but complete example exhibiting the
problem you're having, then it will be easier to help you.

BTW: *please* do not top-post. It's annoying, and makes editing and
replying harder for mostly everybody here. (See below!)
>Michele Dondi wrote:
[snip full quoted context]


Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
.'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,