
is file ascii only?
Gernot Frisch

2005-02-07, 8:56 pm

hi,

I want to do something like this:

ls -1 | gawk {isfileascii} | do_something

do_something could be: xargs cat|grep something

The problem is, I want an awk script that only prints the file name
if the file is ASCII only. If it contains binary characters, don't
print anything at all.

Thanks in advance,
--
-Gernot
int main(int argc, char** argv) {printf
("%silto%c%cf%cgl%ssic%ccom%c", "ma", 58, 'g', 64, "ba", 46, 10);}



Kenny McCormack

2005-02-07, 8:56 pm

In article <36pcjrF50rn1gU1@individual.net>,
Gernot Frisch <Me@Privacy.net> wrote:
>hi,
>
>I want to do something like this:
>
>ls -1 | gawk {isfileascii} | do_something
>
>do_something could be: xargs cat|grep something
>
>The problem is, I want an awk script that only prints the file name
>if the file is ASCII only. If it contains binary characters, don't
>print anything at all.


Define "binary character". That's always the hardest part of this
exercise.

Jürgen Kahrs

2005-02-07, 8:56 pm

Gernot Frisch wrote:

> I want to do something like this:
>
> ls -1 | gawk {isfileascii} | do_something


file *.txt | awk '$2~/ASCII/{print substr($1, 1, length($1)-1)}'

Gernot Frisch

2005-02-07, 8:56 pm


"Jürgen Kahrs" <Juergen.KahrsDELETETHIS@vr-web.de> wrote in
message news:36pd25F54ripaU1@individual.net...
> Gernot Frisch wrote:
>
>
> file *.txt | awk '$2~/ASCII/{print substr($1, 1, length($1)-1)}'


I don't have the "file" command; I use a Cygwin box.
Gernot


Gernot Frisch

2005-02-07, 8:56 pm


"Kenny McCormack" <gazelle@yin.interaccess.com> wrote in message
news:cu7ujf$40t$1@yin.interaccess.com...
> In article <36pcjrF50rn1gU1@individual.net>,
> Gernot Frisch <Me@Privacy.net> wrote:
>
> Define "binary character". That's always the hardest part of this
> exercise.
>


No character above 127, and none below 9 (TAB).
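Taking that definition literally (bytes 9 through 127 allowed, anything else "binary"), the check does not even need awk. A minimal sketch, assuming a POSIX shell with tr and wc and a byte-oriented C locale; is_text_file is a hypothetical helper name, not something from the thread:

```shell
# is_text_file FILE: hypothetical helper for this sketch.
# Exit status 0 if every byte of FILE lies in 9..127 (nothing below TAB,
# nothing above 127), 1 if not, 2 if FILE cannot be read.
is_text_file() {
    [ -r "$1" ] || return 2
    # tr deletes every allowed byte (octal 011..177); wc counts what is
    # left, i.e. the "binary" bytes. LC_ALL=C makes tr work on raw bytes.
    n=$(LC_ALL=C tr -d '\011-\177' < "$1" | wc -c)
    [ "$n" -eq 0 ]
}

# Usage in the spirit of the original pipeline: print only the names of
# regular files that pass the test (assumes no newlines in file names).
ls -1 | while IFS= read -r f; do
    if [ -f "$f" ] && is_text_file "$f"; then
        printf '%s\n' "$f"
    fi
done
```

The names printed by the loop can go to the do_something stage from the question, e.g. xargs cat | grep something, although file names containing blanks would need extra care with xargs.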


Kenny McCormack

2005-02-07, 8:56 pm

In article <36pfqqF4v170vU2@individual.net>,
Gernot Frisch <Me@Privacy.net> wrote:
>
>"Kenny McCormack" <gazelle@yin.interaccess.com> wrote in message
>news:cu7ujf$40t$1@yin.interaccess.com...
>
>No character above 127, and none below 9 (TAB).


Well then, gawk may not be the best tool since it has no built-in ord()
function. TAWK does (as does P***). It is trivial to write an ord() for
gawk as an extension() function - I believe I posted one a year or so back
(Google is your friend). Unclear whether or not extension() functions work
in Cygwin gawk (I say unclear because I know it didn't work before, but
progress may have been made in the interim).

Then, there's the problem that if the file really is binary, you might have
"issues" reading it via a text-oriented tool like (g)awk. Again, this *can*
be done cleanly and w/o issues in TAWK (or P****).
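On the ord() point: the gawk manual's own ord() library function avoids an extension by building a lookup table with sprintf("%c", i). A minimal sketch of that idea applied to Gernot's definition, assuming an awk run under LC_ALL=C so that each byte is one character; list_text_files is a hypothetical name. It still reads the raw bytes through awk, so Kenny's caveat about binary input applies (an older awk might, for instance, mishandle NUL bytes), and empty files are never listed.

```shell
# list_text_files FILE...: hypothetical helper for this sketch.
# Prints the name of each non-empty FILE whose bytes all lie in 9..127.
list_text_files() {
    LC_ALL=C awk '
        # ord() via a lookup table built with sprintf("%c", i); anything
        # not in the table (e.g. a NUL byte) looks up as 0.
        function ord(c) { return ORD[c] + 0 }
        function report() { if (cur != "" && !bad) print cur }
        BEGIN { for (i = 1; i < 256; i++) ORD[sprintf("%c", i)] = i }
        FNR == 1 { report(); cur = FILENAME; bad = 0 }   # a new file starts
        !bad {
            for (i = 1; i <= length($0); i++) {
                c = ord(substr($0, i, 1))
                if (c < 9 || c > 127) { bad = 1; break }
            }
        }
        END { report() }
    ' "$@"
}
```

Scanning character by character is slow for large files; the point is only that a pure-awk ord() is enough for this definition.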

Jürgen Kahrs

2005-02-07, 8:56 pm

Gernot Frisch wrote:

> I don't have the "file" command; I use a Cygwin box.


Have you tried it? I can't believe this.
The "file" command is required by the
POSIX standard:

http://www.opengroup.org/onlinepubs...9/xcu/file.html

I would be very surprised if this command
were missing on Cygwin.

Janis Papanagnou

2005-02-07, 8:56 pm

Jürgen Kahrs wrote:
> Gernot Frisch wrote:
>
>
> file *.txt | awk '$2~/ASCII/{print substr($1, 1, length($1)-1)}'


AFAIK the output of the command 'file' is OS dependent.

And "ASCII" does not seem to be a good pattern:

ksh.1: troff or preprocessor input text
ksh.txt: ASCII English text, with overstriking

Both are ASCII-based text files, yet only the second one matches.

Janis

Jürgen Kahrs

2005-02-07, 8:56 pm

Janis Papanagnou wrote:

> AFAIK the output of the command 'file' is OS dependent.


That's true.

> And "ASCII" does not seem to be a good pattern.


Well, it worked for me. But this is one
of the things the user has to change ad hoc.

Janis Papanagnou

2005-02-07, 8:56 pm

Jürgen Kahrs wrote:
> Janis Papanagnou wrote:
>
>
> That's true.
>
>
> Well, it worked for me. But this is one
> of the things the user has to change ad hoc.


But these two points I mentioned are exactly what makes it hard (if
possible at all) to _definitely_ check whether the requirements are
satisfied or not.

With the given requirements some more elementary code is necessary;
for example checking the output of... [*]

od -t x1 yourfile | awk '{$1="" ; print}' | tr " " "\n" | sort -u

...by adding a grep to look for ^[89a-f] and 00 (or any such).

Janis

[*] That's just an outline: sort -u is only there to inspect the outcome
manually; use cut instead of awk if you like, etc.

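Janis's outline can be turned into a yes/no test. A minimal sketch, assuming POSIX od and awk; is_ascii_text is a hypothetical name. Because od prints each byte as an unsigned decimal number, awk only ever reads plain text, which also sidesteps the concern about feeding binary data to a text tool. A numeric comparison replaces the grep so that bytes 1 to 8 are caught along with 0 and everything from 128 up, and -v is needed because od otherwise collapses repeated lines into a single "*" line, which awk would misread as a non-text byte.

```shell
# is_ascii_text FILE: hypothetical helper for this sketch.
# Exit status 0 if every byte of FILE lies in 9..127, 1 if not,
# 2 if FILE cannot be read.
is_ascii_text() {
    [ -r "$1" ] || return 2
    # -An: no offset column; -v: print every line instead of collapsing
    # repeats into "*"; -t u1: each byte as an unsigned decimal number.
    od -An -v -t u1 -- "$1" |
        awk '{ for (i = 1; i <= NF; i++) if ($i < 9 || $i > 127) exit 1 }'
}
```

It is slower than a single tr call, but it keeps the decision in awk, in the spirit of the thread.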
Jan van den Broek

2005-02-07, 8:56 pm

Mon, 7 Feb 2005 16:30:02 +0100
"Gernot Frisch" <Me@Privacy.net> wrote:
>
>"Jürgen Kahrs" <Juergen.KahrsDELETETHIS@vr-web.de> wrote in
>message news:36pd25F54ripaU1@individual.net...
>
>I don't have the "file" command; I use a Cygwin box.


?

I installed Cygwin on a machine last week, and it has 'file' in /bin.

--
Jan van den Broek balglaas@xs4all.nl

"It's guys like Jan who give us weird people a bad name."
"Sid" <sid@siddhartha.8m.com> in alt.fan.douglas-adams

Ian Stirling

2005-02-10, 8:55 am

Kenny McCormack <gazelle@yin.interaccess.com> wrote:
> In article <36pfqqF4v170vU2@individual.net>,
> Gernot Frisch <Me@Privacy.net> wrote:

<snip "what is ASCII">
>
> Well then, gawk may not be the best tool since it has no built-in ord()
> function. TAWK does (as does P***). It is trivial to write an ord() for
> gawk as an extension() function - I believe I posted one a year or so back
> (Google is your friend). Unclear whether or not extension() functions work
> in Cygwin gawk (I say unclear because I know it didn't work before, but
> progress may have been made in the interim).


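# gawk only: RS as a regexp, RT and nextfile are gawk extensions.
# Run as: LC_ALL=C gawk -f script.awk file...  (one byte = one character)
# Any character outside TAB..~ ends a record, so RT is non-empty
# exactly when such a character was read. (DEL, 127, counts as binary
# here; use "[^\t-\177]" to allow it.) Empty files are never listed.
BEGIN { RS = "[^\t-~]" }
FNR == 1 { if (prev != "" && !bad) print prev; prev = FILENAME; bad = 0 }
RT != "" { bad = 1; nextfile }
END { if (prev != "" && !bad) print prev }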

> Then, there's the problem that if the file really is binary, you might have
> "issues" reading it via a text-oriented tool like (g)awk. Again, this *can*
> be done cleanly and w/o issues in TAWK (or P****).
>


Copyright 2008 codecomments.com