| Author |
is file ascii only?
|
|
| Gernot Frisch 2005-02-07, 8:56 pm |
| hi,
I want to do something like this:
ls -1 | gawk {isfileascii} | do_something
do_something could be: xargs cat|grep something
The problem is that I want an awk script that prints the file name only
if the file is ASCII only. If it contains binary characters, it should
not print anything at all.
Thank you in advance,
--
-Gernot
int main(int argc, char** argv) {printf
("%silto%c%cf%cgl%ssic%ccom%c", "ma", 58, 'g', 64, "ba", 46, 10);}
| |
| Kenny McCormack 2005-02-07, 8:56 pm |
| In article <36pcjrF50rn1gU1@individual.net>,
Gernot Frisch <Me@Privacy.net> wrote:
>hi,
>
>I want to do something like this:
>
>ls -1 | gawk {isfileascii} | do_something
>
>do_something could be: xargs cat|grep something
>
>The problem is that I want an awk script that prints the file name only
>if the file is ASCII only. If it contains binary characters, it should
>not print anything at all.
Define "binary character". That's always the hardest part of this
exercise.
| |
| Jürgen Kahrs 2005-02-07, 8:56 pm |
| Gernot Frisch wrote:
> I want to do something like this:
>
> ls -1 | gawk {isfileascii} | do_something
file *.txt | awk '$2~/ASCII/{print substr($1,1, length($1)-1)}'
| |
| Gernot Frisch 2005-02-07, 8:56 pm |
|
"Jürgen Kahrs" <Juergen.KahrsDELETETHIS@vr-web.de> wrote in message
news:36pd25F54ripaU1@individual.net...
> Gernot Frisch wrote:
>
>
> file *.txt | awk '$2~/ASCII/{print substr($1,1, length($1)-1)}'
I don't have the command "file", I use a cygwin box.
Gernot
| |
| Gernot Frisch 2005-02-07, 8:56 pm |
|
"Kenny McCormack" <gazelle@yin.interaccess.com> wrote in message
news:cu7ujf$40t$1@yin.interaccess.com...
> In article <36pcjrF50rn1gU1@individual.net>,
> Gernot Frisch <Me@Privacy.net> wrote:
>
> Define "binary character". That's always the hardest part of this
> exercise.
>
No character above 127, and none below 9 (TAB).
| |
| Kenny McCormack 2005-02-07, 8:56 pm |
| In article <36pfqqF4v170vU2@individual.net>,
Gernot Frisch <Me@Privacy.net> wrote:
>
>"Kenny McCormack" <gazelle@yin.interaccess.com> wrote in message
>news:cu7ujf$40t$1@yin.interaccess.com...
>
>No character above 127, and none below 9 (TAB).
Well then, gawk may not be the best tool since it has no built-in ord()
function. TAWK does (as does P***). It is trivial to write an ord() for
gawk as an extension() function - I believe I posted one a year or so back
(Google is your friend). Unclear whether or not extension() functions work
in Cygwin gawk (I say unclear because I know they didn't work before, but
progress may have been made in the interim).
Then, there's the problem that if the file really is binary, you might have
"issues" reading it via a text-oriented tool like (g)awk. Again, this *can*
be done cleanly and w/o issues in TAWK (or P****).
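
Without a built-in ord(), a common workaround in plain awk is a lookup table built with sprintf("%c", i). The following is only a rough sketch of that idea, not the extension function mentioned above. The function name ascii_only is made up for illustration, the limits 9 and 127 come from the definition given earlier in the thread, and the sketch assumes an awk that passes raw bytes through unchanged under LC_ALL=C. Some awk implementations truncate lines at a NUL byte, so files containing NUL may be misjudged there.

```shell
# Hypothetical helper: print each argument whose bytes all lie in 9..127.
ascii_only() {
    for f in "$@"; do
        LC_ALL=C awk '
            # Build a character -> code table, since awk has no ord().
            BEGIN { for (i = 1; i < 256; i++) ord[sprintf("%c", i)] = i }
            {
                n = length($0)
                for (i = 1; i <= n; i++) {
                    # Bytes missing from the table (such as NUL) look up
                    # as 0 and are therefore rejected as well.
                    c = ord[substr($0, i, 1)]
                    if (c < 9 || c > 127) { bad = 1; exit }
                }
            }
            # Exit status 0 means no offending byte was seen.
            END { exit bad }
        ' "$f" && printf '%s\n' "$f"
    done
}
```

It could then be used in place of the ls stage of the original pipeline, for example: ascii_only * | xargs grep something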
| |
| Jürgen Kahrs 2005-02-07, 8:56 pm |
| Gernot Frisch wrote:
> I don't have the command "file", I use a cygwin box.
Have you tried? I can't believe this.
The "file" command is required by the
POSIX standard:
http://www.opengroup.org/onlinepubs...9/xcu/file.html
I would be very surprised if this command
were missing on Cygwin.
| |
| Janis Papanagnou 2005-02-07, 8:56 pm |
| Jürgen Kahrs wrote:
> Gernot Frisch wrote:
>
>
> file *.txt | awk '$2~/ASCII/{print substr($1,1, length($1)-1)}'
AFAIK the output of the command 'file' is OS dependent.
And "ASCII" seems not to be a good pattern.
ksh.1: troff or preprocessor input text
ksh.txt: ASCII English text, with overstriking
Both are ASCII-based text files.
Janis
| |
| Jürgen Kahrs 2005-02-07, 8:56 pm |
| Janis Papanagnou wrote:
> AFAIK the output of the command 'file' is OS dependent.
That's true.
> And "ASCII" seems not to be a good pattern.
Well, it worked for me. But this is one
of the things the user has to change ad hoc.
| |
| Janis Papanagnou 2005-02-07, 8:56 pm |
| Jürgen Kahrs wrote:
> Janis Papanagnou wrote:
>
>
> That's true.
>
>
> Well, it worked for me. But this is one
> of the things the user has to change ad hoc.
But these two points I mentioned are exactly what makes it hard (if
possible at all) to _definitely_ check whether the requirements are
satisfied or not.
With the given requirements some more elementary code is necessary;
for example checking the output of... [*]
od -t x1 yourfile | awk '{$1="" ; print}' | tr " " "\n" | sort -u
...by adding a grep to look for ^[89a-f] and 00 (or any such).
Janis
[*] That's just an outline; sort -u is only there so you can inspect the
outcome manually. Use cut instead of awk if you like, etc.
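
The outline above can be turned into a yes/no test by putting grep where sort -u was. This is a minimal sketch under a few assumptions rather than the exact pipeline from the post: the function name is_ascii_od is made up, od is called with -A n to drop the offset column (so no awk or cut stage is needed), and the pattern rejects 00 to 08 as well as 80 to ff so that it matches the "none below 9, none above 127" definition from earlier in the thread.

```shell
# Hypothetical helper: succeed if the file has no byte below 0x09
# and no byte above 0x7f.
is_ascii_od() {
    # od prints one two-digit hex value per byte; tr puts each value on
    # its own line; grep -q succeeds as soon as an offending value shows up.
    # The leading "!" inverts that, so success means "ASCII only".
    ! od -A n -v -t x1 "$1" |
        tr -s ' ' '\n' |
        grep -Eqi '^(0[0-8]|[89a-f][0-9a-f])$'
}
```

A loop such as the following would then print only the matching names: for f in *; do is_ascii_od "$f" && printf '%s\n' "$f"; done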
| |
| Jan van den Broek 2005-02-07, 8:56 pm |
Mon, 7 Feb 2005 16:30:02 +0100
"Gernot Frisch" <Me@Privacy.net> wrote:
>
>"Jürgen Kahrs" <Juergen.KahrsDELETETHIS@vr-web.de> wrote in message
>news:36pd25F54ripaU1@individual.net...
>
>I don't have the command "file", I use a cygwin box.
?
I installed Cygwin on a machine last week, and it has 'file' in /bin.
--
Jan van den Broek balglaas@xs4all.nl
"It's guys like Jan who give us weird people a bad name."
"Sid" <sid@siddhartha.8m.com> in alt.fan.douglas-adams
| |
| Ian Stirling 2005-02-10, 8:55 am |
| Kenny McCormack <gazelle@yin.interaccess.com> wrote:
> In article <36pfqqF4v170vU2@individual.net>,
> Gernot Frisch <Me@Privacy.net> wrote:
<snip "what is ASCII">
>
> Well then, gawk may not be the best tool since it has no built-in ord()
> function. TAWK does (as does P***). It is trivial to write an ord() for
> gawk as an extension() function - I believe I posted one a year or so back
> (Google is your friend). Unclear whether or not extension() functions work
> in Cygwin gawk (I say unclear because I know they didn't work before, but
> progress may have been made in the interim).
BEGIN{RS="[^\t-~]"   # any byte outside TAB..~ ends a record
FILENAME=ARGV[1]
}
FNR==2{              # a second record means such a byte was seen
nextfile
}
FILENAME!=oldfilename{
oldfilename=FILENAME
}
> Then, there's the problem that if the file really is binary, you might have
> "issues" reading it via a text-oriented tool like (g)awk. Again, this *can*
> be done cleanly and w/o issues in TAWK (or P****).
>
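
The script above stops short of printing anything. One way to finish the idea is sketched here, under several assumptions: it needs gawk 4.0 or later for BEGINFILE and ENDFILE, it uses gawk's RT variable (the text that actually terminated the record) instead of the FNR==2 test so that a file ending in a single bad byte is also caught, and the function name ascii_names is made up. The bracket range is kept as posted, which means DEL (127) counts as binary here.

```shell
# Hypothetical helper: print the names of the argument files that contain
# only bytes in the range TAB..~ (gawk 4.0+ only).
ascii_names() {
    LC_ALL=C gawk '
        # Any byte outside TAB..~ acts as a record separator.
        BEGIN     { RS = "[^\t-~]" }
        BEGINFILE { bad = 0 }
        # RT is non-empty only when the separator regex really matched,
        # i.e. when an out-of-range byte was found; skip the rest of the file.
        RT != ""  { bad = 1; nextfile }
        ENDFILE   { if (!bad) print FILENAME }
    ' "$@"
}
```

For example: ascii_names * | xargs grep something. A clean file is read as one single record, so very large text files are held in memory in full.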