Code Comments

getting list of all .html files in a directory and its directories
I need to get a list of all the files that end with '.html' in a directory and
all of its subdirectories. I then want to search through each file and remove
the ones from the list that contain '<%perl>' or '<%init>'. How can I do this?
Thanks for any help.

--
Andrew Gaffney
Network Administrator
Skyline Aeronautics, LLC.
636-357-1548


Andrew Gaffney
07-30-04 08:55 PM


Re: getting list of all .html files in a directory and its directories
On Fri, 30 Jul 2004, Andrew Gaffney wrote:

> I need to get a list of all the files that end with '.html' in a
> directory and all of its subdirectories. I then want to search through
> each file and remove the ones from the list that contain '<%perl>' or
> '<%init>'. How can I do this? Thanks for any help.

From a Unix command line, you could do something like this:

$ find /path/to/htdocs -type f | xargs egrep -li '<%(perl|init)>'

The above line results in a list of all the files that have either
'<%perl>' or '<%init>' in them.
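The original question asked only about '.html' files, and the spaces-in-names problem can be sidestepped without xargs. A minimal sketch, assuming a grep that supports `-l` and `-E` (GNU and BSD grep do); the function name and the demo tree are made up for illustration:

```shell
# list_html_with_tags DIR   (hypothetical helper name)
# Prints the .html files under DIR that contain '<%perl>' or '<%init>'.
list_html_with_tags() {
    # -name '*.html' limits the search to .html files.
    # -exec ... {} + hands the names directly to grep, so spaces are safe.
    find "$1" -type f -name '*.html' -exec grep -liE '<%(perl|init)>' {} +
}

# Throwaway demo tree; the directory name contains a space on purpose.
demo=$(mktemp -d)
mkdir -p "$demo/sub dir"
printf '<html>plain</html>\n' > "$demo/plain.html"
printf '<%%perl>\nmy $x = 1;\n</%%perl>\n' > "$demo/sub dir/comp.html"
printf '<%%init>\n</%%init>\n' > "$demo/notes.txt"

list_html_with_tags "$demo"
```

Only `comp.html` is listed here: `plain.html` has neither tag, and `notes.txt` is skipped by the name filter even though it contains one.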

From here, you can go a step further by deleting them all. Because files
with spaces in their name (or their path) can break this horribly, I'll
use `sed` to wrap each line in quotes before removing them:

$ find /path/to/htdocs -type f | \
> xargs egrep -li '<%(perl|init)>' | \
> sed 's/\(.*\)/"\1"/' | \
> xargs rm -i

This should also prompt you before taking any action, in case you
realize that you really wanted one of these files. If you want to just
proceed blindly -- and my but you're brave if you do -- then delete the
"-i" from the last line.


Of course, you probably wanted to do this in Perl, but sometimes things
are just as easy to do with shell tools, and this seems like a good
example. Unless you want to do this all the time -- in which case go
ahead & script it in Perl -- a shell one liner like this should be fine.



And of course this all breaks down if you're using Windows, in which
case unless you're a fan of Cygwin you can just ignore all of this :)



--
Chris Devers

Chris Devers
07-30-04 08:55 PM


Re: getting list of all .html files in a directory and its directories
Chris Devers wrote:
> On Fri, 30 Jul 2004, Andrew Gaffney wrote:
> 
>
>
>  From a Unix command line, you could do something like this:
>
>     $ find /path/to/htdocs -type f | xargs egrep -li '<%(perl|init)>'
>
> The above line results in a list of all the files that have either
> '<%perl>' or '<%init>' in them.
>
>  From here, you can go a step further by deleting them all. Because files
> with spaces in their name (or their path) can break this horribly, I'll
> use `sed` to wrap each line in quotes before removing them:
>
>     $ find /path/to/htdocs -type f | \ 
>
> This should also prompt you before taking any action, in case you
> realize that you really wanted one of these files. If you want to just
> proceed blindly -- and my but you're brave if you do -- then delete the
> "-i" from the last line.

I think you misunderstand. I don't want to delete the files that contain
'<%perl>' or '<%init>'. I just want to make a list of all .html files in a
directory tree and remove the ones that contain '<%perl>' or '<%init>' from my
list.

--
Andrew Gaffney
Network Administrator
Skyline Aeronautics, LLC.
636-357-1548


Andrew Gaffney
07-30-04 08:55 PM


Re: getting list of all .html files in a directory and its directories
On Fri, 30 Jul 2004, Andrew Gaffney wrote:

> I think you misunderstand. I don't want to delete the files that
> contain '<%perl>' or '<%init>'. I just want to make a list of all
> .html files in a directory tree and remove the ones that contain
> '<%perl>' or '<%init>' from my list.

Then yes, I misunderstood. This version should do what you want:

$ find /path/to/htdocs -type f | xargs egrep -liv '<%(perl|init)>'

It's exactly like the first one I sent before, but I've added "-v" to
the egrep arguments, which inverts the meaning from "all files with this
pattern" to "all files NOT with this pattern". In this case, that's what
you're trying to get.

If you then want to remove / delete files, tack on the sed & rm commands
I had in the earlier version, but it sounds like you just mean "omit
from the list" rather than "remove from the hard drive".


--
Chris Devers

Chris Devers
07-30-04 08:55 PM


Re: getting list of all .html files in a directory and its directories
Chris Devers wrote:
> On Fri, 30 Jul 2004, Andrew Gaffney wrote:
> 
>
>
> Then yes, I misunderstood. This version should do what you want:
>
>     $ find /path/to/htdocs -type f | xargs egrep -liv '<%(perl|init)>'
>
> It's exactly like the first one I sent before, but I've added "-v" to
> the egrep arguments, which inverts the meaning from "all files with this
> pattern" to "all files NOT with this pattern". In this case, that's what
> you're trying to get.
>
> If you then want to remove / delete files, tack on the sed & rm commands
> I had in the earlier version, but it sounds like you just mean "omit
> from the list" rather than "remove from the hard drive".

That still doesn't appear to do what I want. I believe it is showing me all
files where *all* lines don't contain '<%perl>' or '<%init>'. Since not *all*
lines contain either one of those, all files still show in the list.

--
Andrew Gaffney
Network Administrator
Skyline Aeronautics, LLC.
636-357-1548


Andrew Gaffney
07-31-04 01:55 AM


Re: getting list of all .html files in a directory and its directories
On Fri, 30 Jul 2004, Andrew Gaffney wrote:

> Chris Devers wrote: 
>
> That still doesn't appear to do what I want. I believe it is showing
> me all files where *all* lines don't contain '<%perl>' or '<%init>'.
> Since not *all* lines contain either one of those, all files still
> show in the list.

Okay, let's try again then:

$ grep -li '<title>' *html  # print all html files with '<title>'
20things.html
bookmarks.html
gas.html
gas_form.html
itunes.html
noise.html
$

$ grep -Li '<title>' *html  # print all html files WITHOUT '<title>'
HEADER.shtml
$

The two sets do not intersect, which is apparently what you meant.

If you want to refine this further, try `egrep --help` or `man egrep`.
I should have tested what I sent before sending it, but ten seconds of
skimming over the documentation on your own should have been enough to
show you these lines from `egrep --help`:

$ egrep --help | grep -i 'files.*match.*print'
-L, --files-without-match only print FILE names containing no match
-l, --files-with-matches  only print FILE names containing matches
$

So, as with many Unix commands, shift-L inverts the usual sense of L,
meaning that '-L' gets you the opposite of what '-l' does.
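The difference between `-l`, `-lv` and `-L` discussed in this thread can be checked on two throwaway files. This is only a sketch: the file names and the `pattern` variable are invented for the demo, and it assumes a grep with `-L` and `-E` (GNU and BSD grep have both).

```shell
demo=$(mktemp -d)
printf '<html>\n<%%init>\n</html>\n' > "$demo/tagged.html"
printf '<html>\n<p>hi</p>\n</html>\n' > "$demo/plain.html"
cd "$demo" || exit 1

pattern='<%(perl|init)>'

# -l : files in which at least one line matches
with_match=$(grep -liE "$pattern" *.html)
# -lv: files in which at least one line does NOT match (nearly every file)
some_other=$(grep -livE "$pattern" *.html)
# -L : files in which no line matches at all
without_match=$(grep -LiE "$pattern" *.html)

printf 'grep -l  lists: %s\n' "$with_match"
printf 'grep -lv lists: %s\n' "$(echo $some_other)"
printf 'grep -L  lists: %s\n' "$without_match"
```

`-lv` lists both files because each has at least one line without the tag, which is why adding `-v` did not shrink the list earlier in the thread.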

Now have we got it? :-)




--
Chris Devers

Chris Devers
07-31-04 01:55 AM


Re: getting list of all .html files in a directory and its directories
In DOS:
> perl -n0 -e "push @b, $ARGV unless /<%(?:perl|init)>/; END{print \"@b\"}" file1.html file2.html file3.html

In *nix (untested):
> perl -n0 -e 'push @b, $ARGV unless /<%(?:perl|init)>/; END{print "@b"}' *.html

"Andrew Gaffney" <agaffney@skylineaero.com> wrote in message:
> That still doesn't appear to do what I want. I believe it is showing me all
> files where *all* lines don't contain '<%perl>' or '<%init>'. Since not *all*
> lines contain either one of those, all files still show in the list.



Zeus Odin
07-31-04 01:55 AM


Re: getting list of all .html files in a directory and its directories
Chris Devers wrote:
> On Fri, 30 Jul 2004, Andrew Gaffney wrote:
> 
>
>
> Okay, let's try again then:
>
>   $ grep -li '<title>' *html  # print all html files with '<title>'
>   20things.html
>   bookmarks.html
>   gas.html
>   gas_form.html
>   itunes.html
>   noise.html
>   $
>
>   $ grep -Li '<title>' *html  # print all html files WITHOUT '<title>'
>   HEADER.shtml
>   $
>
> The sets are non-intersecting, and so apparently what you meant.
>
> If you want to refine this further, try `egrep --help` or `man egrep`.
> I should have tested what I sent before sending it, but ten seconds of
> skimming over the documentation on your own should have been enough to
> show you these lines from `egrep --help`:
>
>   $ egrep --help | grep -i 'files.*match.*print'
>     -L, --files-without-match only print FILE names containing no match
>     -l, --files-with-matches  only print FILE names containing matches
>   $
>
> So, as with many Unix commands, shift-L inverts the usual sense of L,
> meaning that '-L' gets you the opposite of what '-l' does.
>
> Now have we got it? :-)

I think it is a problem with the regex. If I change it to:

grep -RLi '<%init>' * | grep '.html'

I get all files that don't have '<%init>', but it doesn't work with the
'<%(init|perl)>'. That regex doesn't seem to match anything.

--
Andrew Gaffney
Network Administrator
Skyline Aeronautics, LLC.
636-357-1548


Andrew Gaffney
07-31-04 08:55 AM


Re: getting list of all .html files in a directory and its directories
On Fri, 30 Jul 2004, Andrew Gaffney wrote:

> I think it is a problem with the regex. If I change it to:
>
> grep -RLi '<%init>' * | grep '.html'
>
> I get all files that don't have '<%init>', but it doesn't work with
> the '<%(init|perl)>'. That regex doesn't seem to match anything.

More man page material: I was using `egrep` for the earlier examples,
not `grep`. On my computer (a Mac), `egrep` is equivalent to `grep -E`;
either way, this enables extended regular expressions, which in this case
are being used to match multiple patterns (by|doing|this).

Hence, these two lines are equivalent:

egrep    'pattern|anotherpattern'  *
grep  -E 'pattern|anotherpattern'  *

Also, the line you ended up with --

grep -RLi '<%init>' * | grep '.html'

-- should be equivalent to this one --

grep -RLi '<%init>' *html

-- without needing the second grep statement.

And to weave the multiple pattern matching back in, you can do these:

egrep -RLi  '<%(init|perl)>' *html
grep  -RLiE '<%(init|perl)>' *html

Both of these should list files that contain neither of the two patterns
you were asking about, /<%init>/ or /<%perl>/.
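The role of extended regular expressions here can be seen in a small sketch. The file names are invented for the demo; `-E` selects extended regular expressions, while `-e` only introduces a pattern and can be repeated.

```shell
demo=$(mktemp -d)
printf '<%%init>\n' > "$demo/init.html"
printf '<%%perl>\n' > "$demo/perl.html"
printf '<p>static</p>\n' > "$demo/static.html"
cd "$demo" || exit 1

# Extended regex (-E): ( ) group and | alternates.
extended=$(grep -LiE '<%(init|perl)>' *.html)

# Basic regex, one -e per pattern: same result without alternation syntax.
two_patterns=$(grep -Li -e '<%init>' -e '<%perl>' *.html)

# Basic regex with the extended-style pattern: ( | ) are literal characters,
# so no file matches and -L lists every file.
basic=$(grep -Li '<%(init|perl)>' *.html)

echo "extended:     $extended"
echo "two patterns: $two_patterns"
echo "basic:        $(echo $basic)"
```

The third command reproduces the "regex doesn't seem to match anything" symptom reported with plain `grep`.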

Make sense?



--
Chris Devers

Chris Devers
07-31-04 08:55 AM


Re: getting list of all .html files in a directory and its directories
Chris Devers wrote:
> On Fri, 30 Jul 2004, Andrew Gaffney wrote:
> 
>
>
> More man page material: I was using `egrep` for the earlier examples,
> not `grep`. On my computer (a Mac), `egrep` is equivalent to `grep -E`;
> either way, this enables extended regular expressions, which in this case
> are being used to match multiple patterns (by|doing|this).
>
> Hence, these two lines are equivalent:
>
>   egrep    'pattern|anotherpattern'  *
>   grep  -E 'pattern|anotherpattern'  *
>
> Also, the line you ended up with --
>
>   grep -RLi '<%init>' * | grep '.html'
>
> -- should be equivalent to this one --
>
>   grep -RLi '<%init>' *html
>
> -- without needing the second grep statement.

It isn't though. I had the '-R' flag in, which means I want it to search
subdirectories also. The '*html' gets interpreted by the shell and it ends up
not recursing.

> And to weave the multiple pattern matching back in, you can do these:
>
>   egrep -RLi  '<%(init|perl)>' *html
>   grep  -RLiE '<%(init|perl)>' *html

I ended up with "egrep -RLi  '<%(init|perl)>' * | egrep '.html$'" which seems to
get me exactly what I wanted.
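One detail in that final filter: an unescaped dot matches any character, so '.html$' would also accept a name such as "page_html". A hedged sketch of a slightly stricter variant, using an invented demo tree and assuming a grep with `-R` and `-L`:

```shell
demo=$(mktemp -d)
mkdir -p "$demo/sub"
printf '<p>static</p>\n' > "$demo/sub/page.html"
printf '<%%init>\n' > "$demo/sub/comp.html"
printf '<p>no extension</p>\n' > "$demo/sub/page_html"
cd "$demo" || exit 1

# Start the recursion at "." so the shell has no glob to expand, then
# filter the names. '\.html$' requires a literal dot before "html".
result=$(grep -RLiE '<%(init|perl)>' . | grep '\.html$')
echo "$result"
```

Starting at "." rather than "*" also picks up dot-files in the top directory, which may or may not be wanted.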

> Both of these should match files that have neither of the two patterns
> you were asking about : /<%init>/ nor /<%perl>/ .
>
> Make sense?

Yes. Thanks for the help.

--
Andrew Gaffney
Network Administrator
Skyline Aeronautics, LLC.
636-357-1548


Andrew Gaffney
07-31-04 08:55 AM


PERL Beginners archive
