Author Re: Is there a better way?
William Park

2004-03-19, 8:23 pm

In <comp.lang.awk> Fingers McGee <newsscarce1@hotmail.com> wrote:
> To be used as a cron job. The directories where this is located can
> get tens of thousands of small identical message files every day.
> This works but I'd like to see if someone might be able to improve
> upon it. I'll let the script explain:
>
> ls -l > ./outputfiles
> cat ./outputfiles | grep outfile | awk '{print $5 " " $9}' | grep ^35
> | sed s/^35/'rm -f '/ > ./35files
> chmod 700 ./35files
> ./35files
> rm -f ./5files
> cat ./outputfiles | grep outfile | awk '{print $5 " " $9}' | grep ^5 |
> sed s/^5/'rm -f '/ > ./5files
> chmod 700 ./5files
> ./5files
> rm -f ./5files
> rm -f ./outputfiles


You are deleting files whose size starts with '35' and '5'. So, 35,
350, 3500, 35678, 5, 50, 567, 5320, ... will all match. The most direct
way is
# a{1,...,9} is brace expansion (bash): read a1 a2 ... a9
ls -l | while read a{1,2,3,4,5,6,7,8,9} ; do
    case $a5 in
        35*|5*) rm -f "$a9" ;;
    esac
done

--
William Park, Open Geometry Consulting, <opengeometry@yahoo.ca>
Linux solution for data management and processing.

A. Alper ATICI

2004-03-19, 8:23 pm

On 5 Feb 2004 04:59:37 GMT, William Park <opengeometry@yahoo.ca> wrote:

>
>You are deleting files whose size starts with '35' and '5'. So, 35,


I think you missed the first grep in the pipeline.
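
For illustration, both points (the name filter done by the first grep, and the prefix match on the size) can be seen in a single awk filter. This is only a sketch: the `ls -l` lines are made-up sample data, and it assumes the intent is sizes of exactly 35 and 5 bytes, with the size in field 5 and the name in field 9 as in the original script.

```shell
#!/bin/sh
# Hypothetical `ls -l` lines standing in for a real directory listing.
listing='-rw-r--r-- 1 u g 35 Feb  4 11:00 outfile.001
-rw-r--r-- 1 u g 350 Feb  4 11:00 outfile.002
-rw-r--r-- 1 u g 5 Feb  4 11:00 outfile.003
-rw-r--r-- 1 u g 35 Feb  4 11:00 other.004'

# /outfile/ plays the role of `grep outfile` (it matches anywhere in the line).
# == compares the whole size field, so 350 is not selected the way
# `grep ^35` would select it.
selected=$(printf '%s\n' "$listing" |
    awk '/outfile/ && ($5 == 35 || $5 == 5) { print $9 }')

echo "$selected"
```

Like the original script, this breaks on file names containing white space, since awk splits the name into several fields.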


Jens Schweikhardt

2004-03-19, 8:23 pm

In comp.unix.shell A. Alper ATICI <alper.aticiSTRIP@softhome.net> wrote:
# On 4 Feb 2004 11:46:12 -0800, newsscarce1@hotmail.com (Fingers McGee)
# wrote:
#
#>To be used as a cron job. The directories where this is located can
#>get tens of thousands of small identical message files every day.
#>This works but I'd like to see if someone might be able to improve
#>upon it.
#
# find . -name 'outfile*' \( -size 35c -o -size 5c \) -exec rm -f '{}' ';'

If I'm not mistaken,

find . -name '*outfile*' \( -size 35c -o -size 5c \) -exec rm -f '{}' ';'

has closer semantics to the original poster's pipe. The 'grep outfile'
does match more than outfile*. In case the list of files to rm is large,
xargs may be performing better:

find . -name '*outfile*' \( -size 35c -o -size 5c \) -print | xargs rm

In case the files can contain white space and find/xargs can
produce/consume \0 terminated lists,

find . -name '*outfile*' \( -size 35c -o -size 5c \) -print0 | xargs -0 rm

may be almost bullet proof.
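
A minimal sketch of that last form, run against a scratch directory so nothing real is deleted. The file names and the use of `mktemp -d` are assumptions made for the demonstration, and `-print0`/`-0` are extensions that both find and xargs must support.

```shell
#!/bin/sh
# Scratch directory so nothing outside it is touched.
demo_dir=$(mktemp -d) || exit 1

# Hypothetical sample files of 35, 5, 350 and 35 bytes; one name has a space.
printf '%35s'  '' > "$demo_dir/outfile.001"
printf '%5s'   '' > "$demo_dir/my outfile.002"
printf '%350s' '' > "$demo_dir/outfile.003"
printf '%35s'  '' > "$demo_dir/other.004"

# -size 35c matches exactly 35 bytes, so the 350-byte file survives.
# -print0 and -0 pass names NUL-terminated, keeping the space intact.
find "$demo_dir" -type f -name '*outfile*' \( -size 35c -o -size 5c \) -print0 |
    xargs -0 rm -f

# List what is left: the 350-byte outfile and the non-matching name.
remaining=$(cd "$demo_dir" && echo *)
echo "$remaining"

rm -rf "$demo_dir"
```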

Regards,

Jens
--
Jens Schweikhardt http://www.schweikhardt.net/
SIGSIG -- signature too long (core dumped)

A. Alper ATICI

2004-03-19, 8:23 pm

On 5 Feb 2004 13:58:15 GMT, Jens Schweikhardt <usenet@schweikhardt.net>
wrote:

>In comp.unix.shell A. Alper ATICI <alper.aticiSTRIP@softhome.net> wrote:
># On 4 Feb 2004 11:46:12 -0800, newsscarce1@hotmail.com (Fingers McGee)
># wrote:
>#
>#>To be used as a cron job. The directories where this is located can
>#>get tens of thousands of small identical message files every day.
>#>This works but I'd like to see if someone might be able to improve
>#>upon it.
>#
># find . -name 'outfile*' \( -size 35c -o -size 5c \) -exec rm -f '{}' ';'
>
>If I'm not mistaken,
>
> find . -name '*outfile*' \( -size 35c -o -size 5c \) -exec rm -f '{}' ';'
>
>has closer semantics to the original poster's pipe. The 'grep outfile'
>does match more than outfile*.


You're absolutely right.
I made assumptions based on the nature of the problem, and would refine them
further if/when the OP gives feedback.

>In case the list of files to rm is large,
>xargs may be performing better:
>


{} in -exec is replaced by the current filename, not a list of filenames,
so I don't think the list of files to rm will ever get large.
However, your solution below might be more efficient due to lack of
repetitive rm invocation, but I can't confirm that without knowing inner
workings of xargs.


> find . -name '*outfile*' \( -size 35c -o -size 5c \) -print | xargs rm
>
>In case the files can contain white space and find/xargs can
>produce/consume \0 terminated lists,
>
> find . -name '*outfile*' \( -size 35c -o -size 5c \) -print0 | xargs -0 rm
>
>may be almost bullet proof.
>
>Regards,
>
> Jens


Sven Mascheck

2004-03-19, 8:23 pm

Jens Schweikhardt wrote:

> find . -name '*outfile*' \( -size 35c -o -size 5c \) -print0 | xargs -0 rm


or
find . -name '*outfile*' \( -size 35c -o -size 5c \) -exec rm '{}' '+'

;-)

(s.a. <bupe7d$3h2$1@news.in-ulm.de> )
[f'up c.u.shell]
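
A small sketch of what the '+' form does, assuming a find that supports `-exec ... {} +`. To keep it harmless, a hypothetical `sh -c` command that prints its argument count stands in for rm; the scratch directory and file names are made up for the demonstration.

```shell
#!/bin/sh
demo_dir=$(mktemp -d) || exit 1

# Hypothetical sample files of 35, 5 and 7 bytes, all with a space in the name.
printf '%35s' '' > "$demo_dir/outfile a"
printf '%5s'  '' > "$demo_dir/outfile b"
printf '%7s'  '' > "$demo_dir/outfile c"

# With '+', find collects matching paths and appends them to one command line,
# each path as its own argument, so white space in names needs no special care.
# The inner sh prints how many arguments it received.
calls=$(find "$demo_dir" -type f -name '*outfile*' \( -size 35c -o -size 5c \) \
    -exec sh -c 'echo "$#"' sh '{}' '+')

echo "$calls"
rm -rf "$demo_dir"
```

A single output line means the command ran once, and the number on it is how many file names that one run received.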

Jens Schweikhardt

2004-03-19, 8:23 pm

[Followups trimmed to comp.unix.shell]

In comp.unix.shell A. Alper ATICI <alper.aticiSTRIP@softhome.net> wrote:
....
# However, your solution below might be more efficient due to lack of
# repetitive rm invocation, but I can't confirm that without knowing inner
# workings of xargs.
#
#> find . -name '*outfile*' \( -size 35c -o -size 5c \) -print | xargs rm

Yes, the point of xargs is to save a lot of process creation overhead.
For N files, the "find ... -exec rm" will fork N processes, while with
xargs it is 1 + a small number (xargs also may invoke rm repeatedly if
the list gets too long, but it will invoke rm with as many args as
possible). The overhead becomes more noticeable the more files the
find prints and the more expensive process creation is. On my FreeBSD
system, finding and deleting 100 files gives a 6x speedup with xargs.
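
The difference in process count can be observed directly with a sketch like the following. It is not a timing benchmark: a hypothetical `sh -c 'echo run'` stands in for rm so that each started process prints one line, and the 100 empty files in a scratch directory are made up for the demonstration.

```shell
#!/bin/sh
demo_dir=$(mktemp -d) || exit 1

# Create 100 empty sample files.
i=1
while [ "$i" -le 100 ]; do
    : > "$demo_dir/outfile.$i"
    i=$((i + 1))
done

# Each started command prints one "run" line, so counting lines counts processes.
per_file=$(find "$demo_dir" -type f -name '*outfile*' \
    -exec sh -c 'echo run' sh '{}' ';' | grep -c run)
batched=$(find "$demo_dir" -type f -name '*outfile*' -print |
    xargs sh -c 'echo run' sh | grep -c run)

echo "-exec ... ';' started $per_file commands, xargs started $batched"
rm -rf "$demo_dir"
```

With only 100 short names, xargs would normally fit everything into a single command line, but the exact batch count depends on the system's argument length limit.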

Regards,

Jens
--
Jens Schweikhardt http://www.schweikhardt.net/
SIGSIG -- signature too long (core dumped)