-s vs du - different results
| Zebee Johnstone 2004-08-25, 5:05 am |
| As people recommended using stat, I've tried.
But I seem to get different results to du, and different
to what my CD burning prog says.
#!/usr/bin/perl -w
use strict;
use File::Find;
my $total;
my $dir = shift;
find(\&wanted, $dir);

print "total = $total \n";

sub wanted {
    $total += -s $File::Find::name;
}
produces:
total = 695543582
Running du -sb on the directory given to that program gets me:
750284800
So what am I missing about -s? That's a huge discrepancy, so
there's something that's not being counted.
I am running it as root, so it's not a permissions problem.
am I overflowing some buffer somewhere?
Zebee
--
Zebee Johnstone (zebee@zip.com.au), proud holder of
aus.motorcycles Poser Permit #1.
"Motorcycles are like peanuts... who can stop at just one?"
| |
| Uri Guttman 2004-08-25, 5:05 am |
| >>>>> "ZJ" == Zebee Johnstone <zebee@zip.com.au> writes:
ZJ> print "total = $total \n";
ZJ> sub wanted {
ZJ> $total += -s $File::Find::name;
ZJ> }
ZJ> produces:
ZJ> total = 695543582
ZJ> Running du -sb on the directory given to that program gets me:
ZJ> 750284800
ZJ> So what am I missing about -s? That's a huge discrepancy, so
ZJ> there's something that's not being counted.
du is not the same as -s.
du measures real blocks in use. unix files (notably dbm types as well as
others) can have gaps so the maximum offset (what -s sees) can be much
greater than the actual storage used. du gets into the inode itself and
finds all the allocated blocks and counts them.
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
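
The gap effect described above can be reproduced with a small experiment. This is a minimal sketch, assuming a Unix-like system whose filesystem supports sparse files and reports st_blocks in 512-byte units; the helper name sizes_of and the temporary file are illustrative and not from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Return (apparent size in bytes, allocated bytes) for one path.
# Assumption: st_blocks (stat field 12) counts 512-byte units.
sub sizes_of {
    my ($path) = @_;
    my @st = lstat $path or die "lstat $path: $!";
    return ($st[7], $st[12] * 512);
}

# Create a file with a 10 MB hole: seek past the end, then write one byte.
my ($fh, $name) = tempfile(UNLINK => 1);
seek($fh, 10 * 1024 * 1024, 0) or die "seek: $!";
print {$fh} "x";
close($fh) or die "close: $!";

my ($apparent, $allocated) = sizes_of($name);
print "apparent  = $apparent\n";     # what -s reports
print "allocated = $allocated\n";    # what a block-counting du reports
```

If the filesystem stores the hole sparsely, the allocated figure comes out far below the apparent one, which is the opposite direction from the discrepancy in the original post.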
| |
| Jürgen Exner 2004-08-25, 5:05 am |
| Zebee Johnstone wrote:
> But I seem to get different results to du, and different
> to what my CD burning prog says.
[...]
> $total += -s $File::Find::name;
> produces:
> total = 695543582
Which apparently is the sum of the sizes of all files.
> Running du -sb on the directory given to that program gets me:
> 750284800
Which is how much space all files together occupy on the disk.
> So what am I missing about -s?
Nothing. You are just calculating two different things.
> That's a huge discrepancy, so
> there's something that's not being counted.
Not at all. It's just the trivial fact that usually the size of a file and
the amount of disk space it occupies are not identical and in some cases can
be very different, e.g. for sparse files.
jue
| |
| Sam Holden 2004-08-25, 5:05 am |
| On Wed, 25 Aug 2004 04:17:01 GMT, Zebee Johnstone <zebee@zip.com.au> wrote:
> As people recommended using stat, I've tried.
>
> But I seem to get different results to du, and different
> to what my CD burning prog says.
>
> #!/usr/bin/perl -w
> use strict;
> use File::Find;
> my $total;
> my $dir = shift;
> find(\&wanted, $dir);
>
> print "total = $total \n";
>
> sub wanted {
> $total += -s $File::Find::name;
> }
>
> produces:
> total = 695543582
>
> Running du -sb on the directory given to that program gets me:
> 750284800
>
> So what am I missing about -s? That's a huge discrepancy, so
> there's something that's not being counted.
>
> I am running it as root, so it's not a permissions problem.
>
> am I overflowing some buffer somewhere?
-s is doing a stat(), which will give different answers than
"du -sb" in the presence of symbolic links.
What happens with:
sub wanted {
lstat $_;
$total += -s _;
}
?
--
Sam Holden
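
To see how much symbolic links can move the total, the two calls can be compared side by side. This is a rough sketch, assuming a system with symlink support; the sub name totals and the demo tree are made up for illustration. It tests $_ rather than $File::Find::name because File::Find changes into each directory by default, so a relative $File::Find::name may not resolve from there.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Sum apparent sizes under $dir two ways: following symlinks (stat)
# and not following them (lstat). Returns (stat_total, lstat_total).
sub totals {
    my ($dir) = @_;
    my ($followed, $not_followed) = (0, 0);
    find(sub {
        # File::Find has chdir'ed into the current directory, so $_ is
        # the name to test here.
        my $via_stat  = (stat $_)[7];    # undef for a dangling symlink
        my $via_lstat = (lstat $_)[7];
        $followed     += $via_stat  if defined $via_stat;
        $not_followed += $via_lstat if defined $via_lstat;
    }, $dir);
    return ($followed, $not_followed);
}

# Demo tree: one 1000-byte file plus a symlink pointing at it.
my $dir = tempdir(CLEANUP => 1);
open(my $out, '>', "$dir/data") or die "open: $!";
print {$out} 'x' x 1000;
close($out) or die "close: $!";
symlink('data', "$dir/link") or die "symlink: $!";

my ($stat_total, $lstat_total) = totals($dir);
print "stat  total = $stat_total\n";     # counts the file's bytes twice
print "lstat total = $lstat_total\n";    # counts the link itself instead
```

Both totals also include the size of the directory entries themselves, since the callback does not filter by file type.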
| |
| Sam Holden 2004-08-25, 5:05 am |
| On Wed, 25 Aug 2004 04:30:10 GMT, Uri Guttman <uri@stemsystems.com> wrote:
>
> ZJ> print "total = $total \n";
>
> ZJ> sub wanted {
> ZJ> $total += -s $File::Find::name;
> ZJ> }
>
> ZJ> produces:
> ZJ> total = 695543582
>
> ZJ> Running du -sb on the directory given to that program gets me:
> ZJ> 750284800
>
> ZJ> So what am I missing about -s? That's a huge discrepancy, so
> ZJ> there's something that's not being counted.
>
> du is not the same as -s.
>
> du measures real blocks in use. unix files (notably dbm types as well as
> others) can have gaps so the maximum offset (what -s sees) can be much
> greater than the actual storage used. du gets into the inode itself and
> finds all the allocated blocks and counts them.
The '-b' option to GNU du changes that behaviour to calculate the
"apparent size" and not the disk usage (which is silly for a program named
"du", but that's another issue). I don't know the various flavours of
du, but the non-GNU ones I have access to don't have a '-b' option at all.
So it's likely (given my small sample) that the OP is using GNU du.
Also, wouldn't that result in "du" giving a smaller total, not a larger
total?
--
Sam Holden
| |
| Zebee Johnstone 2004-08-25, 5:05 am |
| In comp.lang.perl.misc on Wed, 25 Aug 2004 04:30:10 GMT
Uri Guttman <uri@stemsystems.com> wrote:
>
> du is not the same as -s.
>
> du measures real blocks in use. unix files (notably dbm types as well as
> others) can have gaps so the maximum offset (what -s sees) can be much
> greater than the actual storage used. du gets into the inode itself and
> finds all the allocated blocks and counts them.
Much greater? So shouldn't -s therefore come up with a bigger size?
But it came up with a much smaller one.
Or am I misunderstanding what you mean by offset?
Is there a perl method that does the right thing? If -s is
undercounting then it's not very helpful to find sizes...
So it might well be back to du!
Zebee
| |
| Zebee Johnstone 2004-08-25, 5:05 am |
| In comp.lang.perl.misc on Wed, 25 Aug 2004 04:31:43 GMT
Jürgen Exner <jurgenex@hotmail.com> wrote:
>
>
> Not at all. It's just the trivial fact that usually the size of a file and
> the amount of disk space it occupies are not identical and in some cases can
> be very different, e.g. for sparse files.
Which is something not being counted :) If only unused blocks...
I want to take enough files to fit on a CD and put those files in
a directory and then make a CD from the directory.
If -s won't do it, what will? Or do I just use du in backticks?
Zebee
| |
| Zebee Johnstone 2004-08-25, 5:05 am |
| In comp.lang.perl.misc on 25 Aug 2004 04:57:09 GMT
Sam Holden <sholden@flexal.cs.usyd.edu.au> wrote:
> -s is doing a stat(), which will give different answers than
> "du -sb" in the presence of symbolic links.
There aren't many of those in the given dir
>
> What happens with:
>
> sub wanted {
> lstat $_;
> $total += -s _;
> }
total = 695472130
compared to the simple -s which was total = 695543582
and du which was 750284800
Zebee
| |
| Zebee Johnstone 2004-08-25, 5:05 am |
| In comp.lang.perl.misc on 25 Aug 2004 05:05:23 GMT
Sam Holden <sholden@flexal.cs.usyd.edu.au> wrote:
>
> The '-b' option to GNU du changes that behaviour to calculate the
> "apparent size" and not the disk usage (which is silly for a program named
> "du", but that's another issue). I don't know the various flavours of
> du, but the non-GNU ones I have access to don't have a '-b' option at all.
> So it's likely (given my small sample) that the OP is using GNU du.
It's a Linux box, so it uses GNU du, although the info page says nothing about
"apparent size". It says "`du' reports the amount of disk space used by
the specified files and for each subdirectory (of directory arguments)."
and "-b, --bytes print size in bytes".
I used -b because otherwise it gives it in 1024 byte chunks:
[root@clone backups]# du -s burn
732700 burn
Zebee
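
A quick arithmetic check on the figures quoted in this thread, using only numbers that appear above: the 1K-block count from `du -s`, multiplied by 1024, gives exactly the `du -sb` result. That suggests this version of du reports allocated blocks in both cases and only changes the unit. The variable names are illustrative.

```perl
use strict;
use warnings;

my $du_s_kblocks = 732700;       # from `du -s burn` (1024-byte units)
my $du_sb        = 750284800;    # from `du -sb` on the same directory
my $perl_total   = 695543582;    # from summing -s over the tree

print "du -s converted to bytes: ", $du_s_kblocks * 1024, "\n";
print "du -sb minus -s total:    ", $du_sb - $perl_total, "\n";
```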
| |
| Uri Guttman 2004-08-25, 5:05 am |
| >>>>> "SH" == Sam Holden <sholden@flexal.cs.usyd.edu.au> writes:
SH> On Wed, 25 Aug 2004 04:30:10 GMT, Uri Guttman <uri@stemsystems.com> wrote:
ZJ> print "total = $total \n";
ZJ> sub wanted {
ZJ> $total += -s $File::Find::name;
ZJ> }
ZJ> produces:
ZJ> total = 695543582
ZJ> Running du -sb on the directory given to that program gets me:
ZJ> 750284800
ZJ> So what am I missing about -s? That's a huge discrepancy, so
ZJ> there's something that's not being counted.
SH> The '-b' option to GNU du changes that behaviour to calculate the
SH> "apparent size" and not the disk usage (which is silly for a program named
SH> "du", but that's another issue). I don't know the various flavours of
SH> du, but the non-GNU ones I have access to don't have a '-b' option at all.
SH> So it's likely (given my small sample) that the OP is using GNU du.
SH> Also, wouldn't that result in "du" giving a smaller total, not a larger
SH> total?
good point but it still is a real discrepancy. gnu du says:
-b, --bytes
print size in bytes
so it prints the size used in bytes. it still isn't -s.
ls -l pad_bench.pl
-rw-r--r-- 1 uri staff 523 May 3 03:25 pad_bench.pl
perl -le 'print -s "pad_bench.pl"'
523
du -sb pad_bench.pl
1024 pad_bench.pl
so du will add up the unused bytes in the trailing blocks. and it will
still skip missing blocks (maybe he has none of those in that dir tree).
uri
| |
| Sam Holden 2004-08-25, 5:05 am |
| On Wed, 25 Aug 2004 05:51:09 GMT, Uri Guttman <uri@stemsystems.com> wrote:
[snip du and perl's '-s' giving different results]
>
> good point but it still is a real discrepancy. gnu du says:
>
> -b, --bytes
> print size in bytes
>
> so it prints the size used in bytes. it still isn't -s.
>
> ls -l pad_bench.pl
> -rw-r--r-- 1 uri staff 523 May 3 03:25 pad_bench.pl
> perl -le 'print -s "pad_bench.pl"'
> 523
> du -sb pad_bench.pl
> 1024 pad_bench.pl
>
> so du will add up the unused bytes in the trailing blocks. and it will
> still skip missing blocks (maybe he has none of those in that dir tree).
My du doesn't seem to do that:
; ls -l resolver.pl
-rw-r--r-- 1 sholden pgrad 436 Jul 18 15:58 resolver.pl
; perl -le 'print -s "resolver.pl"'
436
; du -sb resolver.pl
436 resolver.pl
;
But I see that it's a version thing...
; du --version
du (coreutils) 5.2.1
; ./du --version
du (fileutils) 4.1
; du -bs resolver.pl
436 resolver.pl
; ./du -bs resolver.pl
4096 resolver.pl
So the --apparent-size was added sometime between those two versions
and -b changed to be:
-b, --bytes equivalent to `--apparent-size --block-size=1'
The joys of incompatible unix tools - someone should write a portable
scripting language to avoid these issues...
For the OP: You could use the blocks count in the stat result, but I
don't know how to determine the blocksize for the filesystem. Plus,
if you are creating a CD image, then you are creating a new
filesystem whose blocksize may well be different, so the
count may be useless anyway.
--
Sam Holden
| |
| Martien Verbruggen 2004-08-25, 5:05 am |
| On Wed, 25 Aug 2004 05:18:42 GMT,
Zebee Johnstone <zebee@zip.com.au> wrote:
> In comp.lang.perl.misc on Wed, 25 Aug 2004 04:31:43 GMT
> Jürgen Exner <jurgenex@hotmail.com> wrote:
>
> Which is something not being counted :) If only unused blocks...
>
> I want to take enough files to fit on a CD and put those files in
> a directory and then make a CD from the directory.
>
> If -s won't do it, what will? Or do I just use du in backticks?
Have you tried
my ($block_size, $blocks) = (stat $_)[11, 12];
my $du_size = $block_size * $blocks;
If your find doesn't span multiple file systems, $block_size probably
is constant, and you can optimise that out. Even if you span multiple
file systems, the chance is still high that $block_size is constant.
Martien
--
|
Martien Verbruggen | Make it idiot proof and someone will make a
Trading Post Australia | better idiot.
|
| |
| Sam Holden 2004-08-25, 5:05 am |
| On 25 Aug 2004 06:39:29 GMT, Martien Verbruggen <mgjv@tradingpost.com.au> wrote:
> On Wed, 25 Aug 2004 05:18:42 GMT,
> Zebee Johnstone <zebee@zip.com.au> wrote:
>
> Have you tried
>
> my ($block_size, $blocks) = (stat $_)[11, 12];
> my $du_size = $block_size * $blocks;
>
> If your find doesn't span multiple file systems, $block_size probably
> is constant, and you can optimise that out. Even if you span multiple
> file systems, the chance is still high that $block_size is constant.
On my system multiplying those two numbers doesn't work.
; echo a >test.file
; perl -le 'print join "\t", (stat "test.file")[11, 12]'
4096 8
;
Clearly the file uses 1 block (it is only 1 byte) but it is
reported as using 8. It seems to be reporting the block
count in terms of 512 byte blocks even though 4096 byte
blocks are actually used.
Of course (as shown in my other posts) my system seems
a little strange (with a strangely behaving du...)
I'd still argue that taking the file size and rounding up to
a multiple of the blocksize of the CD file system you are
going to create is the only correct approach. But I
know next to nothing about the specifics of those file
systems so there could be tail packing or something to
ruin that approach...
--
Sam Holden
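
The rounding approach argued for above might look roughly like this. It is only a sketch under stated assumptions: 2048 bytes is taken as the block size of the target CD filesystem (the usual ISO 9660 logical block size, but check your burning tool), directory records and other filesystem overhead are ignored, and the names round_up and estimate_tree are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Assumption: 2048 bytes, the usual ISO 9660 logical block size.
my $CD_BLOCK = 2048;

# Round a byte count up to a whole number of blocks.
sub round_up {
    my ($bytes, $block) = @_;
    return int(($bytes + $block - 1) / $block) * $block;
}

# Sum the rounded-up sizes of regular files under $dir. Directory
# records and other filesystem overhead are deliberately ignored.
sub estimate_tree {
    my ($dir, $block) = @_;
    my $total = 0;
    find(sub {
        my $size = (lstat $_)[7];
        $total += round_up($size, $block) if -f _;   # reuse lstat result
    }, $dir);
    return $total;
}

# Demo tree: a 1-byte file and a 2049-byte file.
my $dir = tempdir(CLEANUP => 1);
for my $spec (['small', 1], ['larger', 2049]) {
    my ($name, $len) = @$spec;
    open(my $out, '>', "$dir/$name") or die "open: $!";
    print {$out} 'x' x $len;
    close($out) or die "close: $!";
}
print "estimated bytes on CD: ", estimate_tree($dir, $CD_BLOCK), "\n";
```

The 1-byte file is charged one block and the 2049-byte file two, so the estimate is always at least the plain -s total. Tail packing or per-directory overhead on the real target filesystem would still make it inexact.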
| |
| Uri Guttman 2004-08-25, 3:57 pm |
| >>>>> "SH" == Sam Holden <sholden@flexal.cs.usyd.edu.au> writes:
SH> On Wed, 25 Aug 2004 05:51:09 GMT, Uri Guttman <uri@stemsystems.com> wrote:
SH> [snip du and perl's '-s' giving different results]
SH> My du doesn't seem to do that:
SH> ; ls -l resolver.pl
SH> -rw-r--r-- 1 sholden pgrad 436 Jul 18 15:58 resolver.pl
SH> ; perl -le 'print -s "resolver.pl"'
SH> 436
SH> ; du -sb resolver.pl
SH> 436 resolver.pl
SH> ;
SH> But I see that it's a version thing...
SH> ; du --version
SH> du (coreutils) 5.2.1
SH> ; ./du --version
SH> du (fileutils) 4.1
SH> ; du -bs resolver.pl
SH> 436 resolver.pl
SH> ; ./du -bs resolver.pl
SH> 4096 resolver.pl
interesting. i have du (GNU fileutils) 4.0 on my sparc/solaris.
SH> So the --apparent-size was added sometime between those two versions
SH> and -b changed to be:
SH> -b, --bytes equivalent to `--apparent-size --block-size=1'
bah!
i have always used du with block counts as i wanted 'disk usage'. i
never cared about byte usage. in fact i always use the -k option for du
since i want to know storage that way.
SH> The joys of incompatible unix tools - someone should write a portable
SH> scripting language to avoid these issues...
hmmmm.
uri
| |
| Zebee Johnstone 2004-08-25, 3:57 pm |
| In comp.lang.perl.misc on Wed, 25 Aug 2004 14:13:04 GMT
Uri Guttman <uri@stemsystems.com> wrote:
>
> i have always used du with block counts as i wanted 'disk usage'. i
> never cared about byte usage. in fact i always use the -k option for du
> since i want to know storage that way.
du (fileutils) 4.1
Written by Torbjorn Granlund, David MacKenzie, Larry McVoy, and Paul
Eggert.
Copyright (C) 2001 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is
NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE.
[zebee@clone zebee]$ ls -l netgear.cfg
-rw-r--r-- 1 zebee www 30452 Aug 28 2003 netgear.cfg
[zebee@clone zebee]$ du -b netgear.cfg
32768 netgear.cfg
[zebee@clone zebee]$ du netgear.cfg
32 netgear.cfg
[zebee@clone zebee]$ perl -le 'print -s "netgear.cfg"'
30452
is there a perl way to get block usage?
Zebee
| |
| Uri Guttman 2004-08-25, 3:57 pm |
| >>>>> "ZJ" == Zebee Johnstone <zebee@zip.com.au> writes:
ZJ> du (fileutils) 4.1
ZJ> Written by Torbjorn Granlund, David MacKenzie, Larry McVoy, and Paul
ZJ> Eggert.
ZJ> Copyright (C) 2001 Free Software Foundation, Inc.
ZJ> This is free software; see the source for copying conditions. There is
ZJ> NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
ZJ> PURPOSE.
ZJ> [zebee@clone zebee]$ ls -l netgear.cfg
ZJ> -rw-r--r-- 1 zebee www 30452 Aug 28 2003 netgear.cfg
ZJ> [zebee@clone zebee]$ du -b netgear.cfg
ZJ> 32768 netgear.cfg
ZJ> [zebee@clone zebee]$ du netgear.cfg
ZJ> 32 netgear.cfg
ZJ> [zebee@clone zebee]$ perl -le 'print -s "netgear.cfg"'
ZJ> 30452
ZJ> is there a perl way to get block usage?
see the other posts by sam. he is using a more recent du which makes -b
act more like -s. but that still won't handle gaps correctly. sam
recommends rounding up the -s to the next block size (or you could just
count blocks with a mod (%) operation on the block size). you really
need block counts IMO as that is what the cdrom will need. fractional
trailing blocks still take up whole blocks on most file systems (reiser
is one that doesn't do that).
uri
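
Counting blocks with the mod operator, as suggested above, can be sketched like this. The sub name blocks_needed is made up, and the block sizes tried are guesses rather than anything measured on the poster's filesystem; 30452 is the netgear.cfg size quoted in the thread.

```perl
use strict;
use warnings;

# Number of whole blocks needed to hold $bytes; a partial trailing
# block still occupies a full block.
sub blocks_needed {
    my ($bytes, $block_size) = @_;
    my $blocks = int($bytes / $block_size);
    $blocks++ if $bytes % $block_size;    # leftover bytes need one more
    return $blocks;
}

my $size = 30452;    # netgear.cfg, from the thread
for my $block_size (1024, 2048, 4096) {
    my $blocks = blocks_needed($size, $block_size);
    printf "%4d-byte blocks: %2d blocks = %d bytes\n",
        $block_size, $blocks, $blocks * $block_size;
}
```

With 4096-byte blocks this gives 8 blocks, or 32768 bytes, which is the figure `du -b netgear.cfg` printed above. That is consistent with a 4096-byte allocation unit on that filesystem, though it does not prove it.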
| |
| Paul Gaborit 2004-08-25, 3:57 pm |
|
At Wed, 25 Aug 2004 14:36:59 GMT,
Zebee Johnstone <zebee@zip.com.au> wrote:
> [zebee@clone zebee]$ perl -le 'print -s "netgear.cfg"'
> 30452
>
> is there a perl way to get block usage?
Yes:
$ perl -le 'print +(stat "netgear.cfg")[12]'
But, is there a perl way to get block size? ;-)
--
Paul Gaborit - <http://www.enstimac.fr/~gaborit/>
Perl en français - <http://www.enstimac.fr/Perl/>
| |
| Martien Verbruggen 2004-08-26, 3:56 am |
| On 25 Aug 2004 06:55:32 GMT,
Sam Holden <sholden@flexal.cs.usyd.edu.au> wrote:
> On 25 Aug 2004 06:39:29 GMT, Martien Verbruggen <mgjv@tradingpost.com.au> wrote:
>
> On my system multiplying those two numbers doesn't work.
>
> ; echo a >test.file
> ; perl -le 'print join "\t", (stat "test.file")[11, 12]'
> 4096 8
That's correct. I was wrong. The block size in stat is not the size of
the blocks on disk, but the preferred block size for file system I/O.
The nblocks field gives the count in blocks of 512 bytes, as in the
block size on the original UFS.
> I'd still argue that taking the file size and rounding up to
> a multiple of the blocksize of the CD file system you are
> going to create is the only correct approach. But I
> know next to nothing about the specifics of those file
> systems so there could be tail packing or something to
> ruin that approach...
(stat $_)[12] * 512 should, I believe, work on all Unix systems. It
seems to work on both the Linux and the Solaris system I've got here.
However, it does not work the same way under Cygwin, where the
multiplication rule that I came up with earlier holds up.
It looks like as long as you know what the "fundamental" block size of
your file system is, you can translate the number of blocks to space
taken up on the file system.
Since, indeed, the idea seems to be that these files are going to be
packed onto a CD, using the block size of the ISO9660 file system in
the way you describe should be ok. I don't know whether it's easy to
predict what directory sizes and such will be.
Martien
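
Put together with File::Find, the `(stat $_)[12] * 512` idea could be used for a du-like total along these lines. This is a sketch, not a drop-in du replacement: it assumes st_blocks counts 512-byte units (reported above for Linux and Solaris, but not for Cygwin), it uses lstat so symlinks are not followed, and it counts hard-linked files once per link, which du does not. The sub name allocated_bytes is invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Sum allocated space under $dir from st_blocks (stat field 12).
# Assumption: st_blocks counts 512-byte units on this platform.
sub allocated_bytes {
    my ($dir) = @_;
    my $total = 0;
    find(sub {
        my $blocks = (lstat $_)[12];    # lstat: do not follow symlinks
        # Skip entries where lstat failed or the field is empty.
        $total += $blocks * 512 if defined $blocks && length $blocks;
    }, $dir);
    return $total;
}

my $dir   = @ARGV ? $ARGV[0] : '.';
my $bytes = allocated_bytes($dir);
print "$bytes bytes allocated under $dir\n";
```

Run it with a directory name as the only argument; with no argument it measures the current directory.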
--
|
Martien Verbruggen | Freudian slip: when you say one thing but
Trading Post Australia | mean your mother.
|
| |
| Joe Smith 2004-08-27, 8:57 am |
| Zebee Johnstone wrote:
> $total += -s $File::Find::name;
> produces: total = 695543582
> Running du -sb on the directory given to that program gets me:
> 750284800
For file systems where you know for sure that disk allocation
is based on 1K blocks, I have used something like this:
$bytes = -s $name;
$blocks = ($bytes + 1023) >> 10;   # round up to whole 1K blocks
$total += $blocks << 10;           # convert blocks back to bytes
But that would not take into consideration the overhead for
large files (indirect blocks and double indirect blocks)
the way that (stat $name)[12]*512 does.
The overhead required for large files on the CD is something you
ought to include in your calculations.
-Joe