Code Comments
In <comp.lang.awk> Fingers McGee <newsscarce1@hotmail.com> wrote:
> To be used as a cron job. The directories where this is located can
> get tens of thousands of small identical message files every day.
> This works but I'd like to see if someone might be able to improve
> upon it. I'll let the script explain:
>
> ls -l > ./outputfiles
> cat ./outputfiles | grep outfile | awk '{print $5 " " $9}' | grep ^35 |
> sed s/^35/'rm -f '/ > ./35files
> chmod 700 ./35files
> ./35files
> rm -f ./5files
> cat ./outputfiles | grep outfile | awk '{print $5 " " $9}' | grep ^5 |
> sed s/^5/'rm -f '/ > ./5files
> chmod 700 ./5files
> ./5files
> rm -f ./5files
> rm -f ./outputfiles
You are deleting files whose size starts with '35' and '5'. So, 35,
350, 3500, 35678, 5, 50, 567, 5320, ... will all match. Most direct way
is
ls -l | while read a{1,2,3,4,5,6,7,8,9} ; do
case $a5 in
35*|5*) rm -f $a9 ;;
esac
done
--
William Park, Open Geometry Consulting, <opengeometry@yahoo.ca>
Linux solution for data management and processing.
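
To illustrate the point about prefix matching, here is a small sketch that applies the same case pattern to a few sample sizes. The helper name size_matches and the sample values are made up for illustration; only the 35*|5* pattern comes from the post above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when a size string matches the glob
# patterns 35* or 5*, i.e. any size that merely starts with 35 or 5.
size_matches() {
    case $1 in
        35*|5*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Collect every sample size that the pattern accepts.
matched=""
for size in 35 350 35678 5 50 5320 135 45; do
    if size_matches "$size"; then
        matched="$matched $size"
    fi
done
echo "matched:$matched"
```

Sizes such as 135 and 45 are rejected because the pattern is anchored at the start of the string, while 350 and 5320 are accepted even though they are not exactly 35 or 5 bytes.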
On 5 Feb 2004 04:59:37 GMT, William Park <opengeometry@yahoo.ca> wrote:
>
>You are deleting files whose size starts with '35' and '5'. So, 35,
I think you missed the first grep in pipeline.
In comp.unix.shell A. Alper ATICI <alper.aticiSTRIP@softhome.net> wrote:
# On 4 Feb 2004 11:46:12 -0800, newsscarce1@hotmail.com (Fingers McGee)
# wrote:
#
#>To be used as a cron job. The directories where this is located can
#>get tens of thousands of small identical message files every day.
#>This works but I'd like to see if someone might be able to improve
#>upon it.
#
# find . -name 'outfile*' \( -size 35c -o -size 5c \) -exec rm -f '{}' ';'
If I'm not mistaken,
find . -name '*outfile*' \( -size 35c -o -size 5c \) -exec rm -f '{}' ';'
has closer semantics to the original poster's pipe. The 'grep outfile'
does match more than outfile*. In case the list of files to rm is large,
xargs may perform better:
find . -name '*outfile*' \( -size 35c -o -size 5c \) -print | xargs rm
In case the files can contain white space and find/xargs can
produce/consume \0 terminated lists,
find . -name '*outfile*' \( -size 35c -o -size 5c \) -print0 | xargs -0 rm
may be almost bullet proof.
Regards,
Jens
--
Jens Schweikhardt http://www.schweikhardt.net/
SIGSIG -- signature too long (core dumped)
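
For anyone who wants to try the -print0 | xargs -0 variant without risking real data, the following is a minimal sketch run in a throwaway directory. The file names and the demo_dir variable are invented for the demonstration, and it assumes a find and xargs that support -print0 and -0 (GNU and BSD versions do).

```shell
#!/bin/sh
# Build a throwaway directory so nothing real is deleted.
demo_dir=$(mktemp -d) || exit 1

# printf with a zero-padded width writes exactly that many bytes.
printf '%035d' 0  > "$demo_dir/outfile one"   # 35 bytes, name contains a space
printf '%05d' 0   > "$demo_dir/my outfile.2"  # 5 bytes, matched by *outfile*
printf '%0350d' 0 > "$demo_dir/outfile3"      # 350 bytes, should survive
printf '%035d' 0  > "$demo_dir/other"         # 35 bytes, wrong name, should survive

# Same command as in the post, pointed at the demo directory.
find "$demo_dir" -name '*outfile*' \( -size 35c -o -size 5c \) -print0 | xargs -0 rm -f

# List what is left, then clean up.
remaining=$(ls "$demo_dir" | sort | tr '\n' ' ')
echo "remaining: $remaining"
rm -rf "$demo_dir"
```

Only the two files whose names contain "outfile" and whose sizes are exactly 35 or 5 bytes are removed; the 350-byte file and the file named "other" are left alone, and the names containing spaces cause no trouble.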
On 5 Feb 2004 13:58:15 GMT, Jens Schweikhardt <usenet@schweikhardt.net>
wrote:
>In comp.unix.shell A. Alper ATICI <alper.aticiSTRIP@softhome.net> wrote:
># On 4 Feb 2004 11:46:12 -0800, newsscarce1@hotmail.com (Fingers McGee)
># wrote:
>#
>#>To be used as a cron job. The directories where this is located can
>#>get tens of thousands of small identical message files every day.
>#>This works but I'd like to see if someone might be able to improve
>#>upon it.
>#
># find . -name 'outfile*' \( -size 35c -o -size 5c \) -exec rm -f '{}' ';'
>
>If I'm not mistaken,
>
> find . -name '*outfile*' \( -size 35c -o -size 5c \) -exec rm -f '{}' ';'
>
>has closer semantics to the original poster's pipe. The 'grep outfile'
>does match more than outfile*.
You're absolutely right.
I've made assumptions based on the nature of the problem, and would refine that
further if/when the OP gave feedback.
>In case the list of files to rm is large,
>xargs may perform better:
>
{} in -exec is replaced by the current filename, not a list of filenames,
so I don't think the list of files to rm will ever get large.
However, your solution below might be more efficient due to lack of
repetitive rm invocation, but I can't confirm that without knowing inner
workings of xargs.
> find . -name '*outfile*' \( -size 35c -o -size 5c \) -print | xargs rm
>
>In case the files can contain white space and find/xargs can
>produce/consume \0 terminated lists,
>
> find . -name '*outfile*' \( -size 35c -o -size 5c \) -print0 | xargs -0 rm
>
>may be almost bullet proof.
>
>Regards,
>
> Jens
Jens Schweikhardt wrote:
> find . -name '*outfile*' \( -size 35c -o -size 5c \) -print0 | xargs -0 rm
or
find . -name '*outfile*' \( -size 35c -o -size 5c \) -exec rm '{}' '+'
;-)
(s.a. <bupe7d$3h2$1@news.in-ulm.de> )
[f'up c.u.shell]
[Followups trimmed to comp.unix.shell]
In comp.unix.shell A. Alper ATICI <alper.aticiSTRIP@softhome.net> wrote:
...
# However, your solution below might be more efficient due to lack of
# repetitive rm invocation, but I can't confirm that without knowing inner
# workings of xargs.
#
#> find . -name '*outfile*' \( -size 35c -o -size 5c \) -print | xargs rm
Yes, the point of xargs is to save a lot of process creation overhead.
For N files, the "find ... -exec rm" will fork N processes, while with
xargs it is 1 + a small number (xargs also may invoke rm repeatedly if
the list gets too long, but it will invoke rm with as many args as
possible). The overhead becomes more noticeable the more files the find
prints and the more expensive process creation is. On my FreeBSD system,
finding and deleting 100 files gives a 6x speedup with xargs.
Regards,
Jens
--
Jens Schweikhardt http://www.schweikhardt.net/
SIGSIG -- signature too long (core dumped)
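
The difference in process count can be observed directly. The sketch below is only an illustration: instead of rm it runs a hypothetical logging script (count_calls, invented here) that appends one line per invocation, so counting the lines shows how often each form starts the command. The 100 test files and all variable names are assumptions for the demo, and no timing is measured.

```shell
#!/bin/sh
work=$(mktemp -d) || exit 1
mkdir "$work/files"

# Hypothetical stand-in for rm: logs one line per invocation, deletes nothing.
cat > "$work/count_calls" <<EOF
echo called >> "$work/calls.log"
EOF

# Create 100 empty files to operate on.
i=1
while [ "$i" -le 100 ]; do
    : > "$work/files/outfile$i"
    i=$((i + 1))
done

# Number of invocations recorded so far.
log_lines() { wc -l < "$work/calls.log" | tr -d ' '; }

# One process per file.
: > "$work/calls.log"
find "$work/files" -type f -exec sh "$work/count_calls" '{}' ';'
per_file=$(log_lines)

# xargs batches as many names as fit on one command line.
: > "$work/calls.log"
find "$work/files" -type f -print | xargs sh "$work/count_calls"
with_xargs=$(log_lines)

# -exec ... '+' batches in the same spirit, without a pipe.
: > "$work/calls.log"
find "$work/files" -type f -exec sh "$work/count_calls" '{}' '+'
with_plus=$(log_lines)

echo "-exec ';': $per_file   xargs: $with_xargs   -exec '+': $with_plus"
rm -rf "$work"
```

With the ';' form the count equals the number of files, while the two batching forms should report a much smaller number, typically a single invocation for a list this short.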