Home > Archive > AWK > May 2005 > OT: Re: command-line vs. script file
OT: Re: command-line vs. script file
| Ed Morton 2005-05-15, 3:55 pm |
|
Chris F.A. Johnson wrote:
> On Sun, 15 May 2005 at 13:21 GMT, Ed Morton wrote:
>
>
>
> There's nothing wrong with "-" in a filename, except at the
> beginning.
Any time you have to say "except..." there is a problem.
> The POSIX portable filename standard allows letters,
> numbers, periods, hyphens and underscores, but a name may not
> begin with a hyphen.
>
We often write scripts that manipulate file names, holding parts of the
name in variables, creating tmp file names from the parts, etc. so the
impact of a hyphen isn't restricted to where it appears in the original
file name.
I agree you can later code to avoid the potential problems introduced by
"-"s, but if you just choose to avoid them when creating the files then
you don't need to deal with them at all.
Ed
| |
| Kenny McCormack 2005-05-15, 3:55 pm |
| In article <ta6dnY-rs8WWHRrfRVn-1w@comcast.com>,
Ed Morton <morton@lsupcaemnt.com> wrote:
....
>We often write scripts that manipulate file names, holding parts of the
>name in variables, creating tmp file names from the parts, etc. so the
>impact of a hyphen isn't restricted to where it appears in the original
>file name.
Exactly. That's the point.
| |
| Chris F.A. Johnson 2005-05-15, 8:55 pm |
| On Sun, 15 May 2005 at 16:59 GMT, Ed Morton wrote:
>
>
> Chris F.A. Johnson wrote:
>
> Any time you have to say "except..." there is a problem.
>
> The POSIX portable filename standard allows letters,
>
> We often write scripts that manipulate file names, holding parts of the
> name in variables, creating tmp file names from the parts, etc. so the
> impact of a hyphen isn't restricted to where it appears in the original
> file name.
>
> I agree you can later code to avoid the potential problems introduced by
> "-"s, but if you just choose to avoid them when creating the files then
> you don't need to deal with them at all.
When is a hyphen a problem, except at the beginning of a
filename?
--
Chris F.A. Johnson <http://cfaj.freeshell.org>
==================================================================
Shell Scripting Recipes: A Problem-Solution Approach, 2005, Apress
<http://www.torfree.net/~chris/books/ssr.html>
| |
| Ed Morton 2005-05-15, 8:55 pm |
|
Chris F.A. Johnson wrote:
> On Sun, 15 May 2005 at 16:59 GMT, Ed Morton wrote:
>
>
>
> When is a hyphen a problem, except at the beginning of a
> filename?
>
As a trivial example:
$ file=a-b
$ tmp="${file#[a-z]}"
$ > $tmp
$ rm $tmp
rm: invalid option -- b
Try `rm --help' for more information.
Had I instead used underscores in the name of "file":
$ file=a_b
$ tmp="${file#[a-z]}"
$ > $tmp
$ rm $tmp
The hyphen was not at the start of the original file name but by
manipulation to produce a tmp file, it ended up at the front of the tmp
file name and so became a problem.
Regards,
Ed.
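
A minimal sketch of the defensive coding Ed alludes to, assuming a POSIX shell and an `rm` that follows the usual `--` end-of-options convention. The scratch directory from `mktemp -d` is an assumption made for illustration (widely available, but not required by POSIX).

```shell
# Work in a throwaway directory so the demo touches nothing else.
dir=$(mktemp -d) && cd "$dir" || exit 1

file=a-b
tmp="${file#[a-z]}"    # strips the leading letter, leaving "-b"

: > "./$tmp"           # create the file; "./" keeps the name from looking like an option
[ -e "./$tmp" ] && created=yes

rm -- "$tmp"           # "--" tells rm that no more options follow
[ -e "./$tmp" ] || removed=yes

echo "created=$created removed=$removed"
```

Either remedy (the `./` prefix or the `--` separator) should be enough on its own. Both still have to be remembered at every use, which is the cost Ed is arguing against.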
| |
| Sebastian Luque 2005-05-15, 8:55 pm |
| Ed Morton <morton@lsupcaemnt.com> wrote:
[...]
> setupFiles.awk *
Yes, that does it, although I'm surprised one cannot pipe 'ls' to an awk
script. I've gotten used to doing that with other programs.
Doing test runs of the script, I discovered yet something else that needs
to be added to it. Some lines (in some files) do not end in comma ",", so
I said "piece of cake" and added:
/[^,]$/ { print $0, "," }
which disastrously added a comma at the end of every line! Obviously my
regexp must be wrong. How could I fail at something so simple!? Thanks for
your patience...this is my first awk exercise.
Cheers,
--
Sebastian P. Luque
| |
| Sebastian Luque 2005-05-16, 3:55 am |
| Sebastian Luque <sluque@mun.ca> wrote:
[...]
> /[^,]$/ { print $0, "," }
This is driving me insane. I created a separate fake file with a few lines
having the same structure as the real one, and that pair above works
without problems on the fake file. So it seems there's something weird
about the real files that renders the regexp useless. I hope this is not
some encoding hell.
Cheers,
--
Sebastian P. Luque
| |
| Ed Morton 2005-05-16, 3:55 am |
|
Sebastian Luque wrote:
> Ed Morton <morton@lsupcaemnt.com> wrote:
>
> [...]
>
>
>
>
> Yes, that does it, although I'm surprised one cannot pipe 'ls' to an awk
> script. I've gotten used to doing that with other programs.
ls produces a list of file names. Piping that to ANY program causes that
program to work on that list of file names, NOT on the contents of those
files. Your awk script is behaving exactly as any other UNIX program would.
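
A small sketch of the difference, using two hypothetical sample files in a scratch directory (`mktemp -d` is assumed to be available). In the first command awk's input is the output of `ls`, that is, the names. In the second, awk opens the files named on its command line.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'one\ntwo\nthree\n' > a.txt
printf 'four\n' > b.txt

# awk reads what ls prints: two file names, one per line.
names=$(ls | awk 'END { print NR }')

# awk reads the files themselves: four lines of content in total.
lines=$(awk 'END { print NR }' ./*.txt)

echo "names=$names lines=$lines"    # prints: names=2 lines=4
```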
> Doing test runs of the script, I discovered yet something else that needs
> to be added to it. Some lines (in some files) do not end in comma ",", so
> I said "piece of cake" and added:
>
> /[^,]$/ { print $0, "," }
>
> which disastrously added a comma at the end of every line! Obviously my
> regexp must be wrong. How could I fail at something so simple!? Thanks for
> your patience...this is my first awk exercise.
The regexp looks fine. Check whether or not you have spaces at the end
of the lines that appear to end in commas. In a DOS-created file you'll
have control-Ms (or some other DOS-inspired eoln char).
Ed.
| |
| Sebastian Luque 2005-05-16, 3:55 am |
| Ed Morton <morton@lsupcaemnt.com> wrote:
[...]
> ls produces a list of file names. Piping that to ANY program causes that
> program to work on that list of file names, NOT on the contents of those
> files. Your awk script is behaving exactly as any other UNIX program
> would.
Exactly! Thanks, I often miss these subtleties.
[...]
> The regexp looks fine. Check whether or not you have spaces at the end
> of the lines that appear to end in commas. In a DOS-created file you'll
> have control-Ms (or some other DOS-inspired eoln char).
I thought about that and checked both visually and doing 'C-x =' in Emacs
to get information on the character at the end of the lines. There are no
spaces, Emacs says the end of line character is C-j as it is in other
files, and there's nothing weird visually. But go figure, copying the
contents and saving into a new file solved the problem. With so many files
to run the script through, I'll have to look for an automatic way of
fixing this.
>
> Ed.
--
Sebastian P. Luque
| |
| Sebastian Luque 2005-05-16, 3:55 am |
| Sebastian Luque <sluque@mun.ca> wrote:
[...]
> I thought about that and checked both visually and doing 'C-x =' in
> Emacs to get information on the character at the end of the lines. There
> are no spaces, Emacs says the end of line character is C-j as it is in
> other files, and there's nothing weird visually. But go figure, copying
> the contents and saving into a new file solved the problem. With so many
> files to run the script through, I'll have to look for an automatic way
> of fixing this.
Sorry I accidentally sent my message before finishing.
This looks like what you were talking about with DOS generated files
putting control-M at the end of lines. When doing:
awk '/[^,]$/ { print $0 "," }' original-file
the output has "^M," added to every line, which is something I didn't
fully describe when I first posted the problem. That goes to show me not
to make assumptions about the relevance of some pieces of information!
Thank you,
--
Sebastian P. Luque
| |
| Sebastian Luque 2005-05-16, 3:55 am |
| Ed Morton <morton@lsupcaemnt.com> wrote:
[...]
> The regexp looks fine. Check whether or not you have spaces at the end
> of the lines that appear to end in commas. In a DOS-created file you'll
> have control-Ms (or some other DOS-inspired eoln char).
You're right; when opening any of these files, Emacs is showing "(DOS)" at
the left-hand side of the mode line. However, I don't understand why the
"^M"s don't show up at all when viewing the file either directly in Emacs
or from a console. I can only see them in the output of the awk command.
Cheers,
--
Sebastian P. Luque
[scratching head]
| |
| Kenny McCormack 2005-05-16, 3:55 am |
| In article <87acmwq7nk.fsf@mun.ca>, Sebastian Luque <sluque@mun.ca> wrote:
>
>awk '/[^,]$/ { print $0 "," }' original-file
>
>the output has "^M," added to every line, which is something I didn't
>fully describe when I first posted the problem. That goes to show me not
>to make assumptions about the relevance of some pieces of information!
You can probably fix this by changing your regexp to:
/[^,]\r*$/
I've used this on occasion; the nice thing is the *, which matches 0 or
more, so should be safe against most permutations.
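
One caveat, offered as a hedged observation rather than a correction of anyone's working script: in `/[^,]\r*$/` the `[^,]` can itself match the trailing carriage return (with `\r*` matching nothing), so a line ending in a comma followed by CR would still appear to match. The sketch below sidesteps this by removing the CRs first and only then testing for the comma. The function name `fix_commas` is made up for illustration, and the output is written in Unix format (the CRs are dropped, not preserved).

```shell
fix_commas() {
  awk '
    { sub(/\r+$/, "") }          # drop any trailing carriage returns
    /[^,]$/ { $0 = $0 "," }      # append a comma only where one is missing
    { print }
  ' "$@"
}

# Demo on two DOS-style lines: the first already ends in a comma.
out=$(printf 'a,b,\r\nc,d\r\n' | fix_commas)
printf '%s\n' "$out"
```

With no file arguments the function reads standard input, so it can be used either as `fix_commas file ...` or at the end of a pipeline.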
| |
| Kenny McCormack 2005-05-16, 3:55 am |
| In article <8764xkq70k.fsf@mun.ca>, Sebastian Luque <sluque@mun.ca> wrote:
>Ed Morton <morton@lsupcaemnt.com> wrote:
>
>[...]
>
>
>You're right; when opening any of these files, Emacs is showing "(DOS)" at
>the left-hand side of the mode line. However, I don't understand why the
>"^M"s don't show up at all when viewing the file either directly in Emacs
>or from a console. I can only see them in the output of the awk command.
VIM effectively converts the file to Unix format internally, but keeps
track of the fact that the file was categorized as "DOS" format when it was
read in. It then writes it back out in DOS mode. I assume Emacs does much
the same.
I think VIM has a "binary" mode that turns off this special processing
(making it more "what you see is what you get"). Again, I would guess that
Emacs has one, too.
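
One way to see the bytes regardless of any conversion an editor performs is to dump the file outside the editor. A sketch, assuming a hypothetical sample file written with DOS line endings. `od -c` is POSIX, and the `\r` escape in the awk regexp is supported by the common awks.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'a,b,\r\nc,d\r\n' > dos_sample.txt   # two lines, each ending in CR LF

od -c dos_sample.txt    # each line end shows up as \r followed by \n

# Count the lines that still carry a carriage return before the newline.
crlines=$(awk '/\r$/ { n++ } END { print n + 0 }' dos_sample.txt)
echo "lines ending in CR: $crlines"
```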
| |
| Ed Morton 2005-05-16, 3:55 pm |
|
Kenny McCormack wrote:
> In article <87acmwq7nk.fsf@mun.ca>, Sebastian Luque <sluque@mun.ca> wrote:
>
>
>
> You can probably fix this, by changing your reg exp to:
>
> /[^,]\r*$/
>
> I've used this on occasion; the nice thing is the *, which matches 0 or
> more, so should be safe against most permutations.
Alternatively consider using a character class (see
http://www.gnu.org/software/gawk/ma...har_002dclasses)
to detect any control characters:
/[^,][[:cntrl:]]*$/
or any non-printable characters:
/[^,][^[:graph:]]*$/
or ....
Regards,
Ed.
| |
| Ed Morton 2005-05-16, 3:55 pm |
|
Sebastian Luque wrote:
> Sebastian Luque <sluque@mun.ca> wrote:
>
> [...]
>
>
Hint: look for "dos2unix". For non-awk questions, comp.unix.shell is a
good resource (and you may recognize a couple of contributors ;-) ).
Ed.
| |
| Kenny McCormack 2005-05-16, 3:55 pm |
| In article <gs-dnUD1XYkmAhXfRVn-1A@comcast.com>,
Ed Morton <morton@lsupcaemnt.com> wrote:
....
>
>Hint: look for "dos2unix". For non-awk questions, comp.unix.shell is a
>good resource (and you may recognize a couple of contributors ;-) ).
>
> Ed.
Note that:
1) He said (in an earlier post) that he did use dos2unix - but it
didn't help. Whether or not this statement should be taken
at face value is, of course, open to debate.
2) dos2unix is a weird command. In particular, under Solaris, it
doesn't do what you think it does.
Note: I should amend #2 above to say "The system-supplied 'dos2unix'
command, under some OSs, is a weird command. Therefore, I never use
system-supplied versions; I write my own."
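
For the "many files" case raised earlier in the thread, a hand-rolled converter in the spirit of "I write my own" could be as small as the sketch below. The function name `strip_cr_in_place` is hypothetical, `mktemp` is assumed to be available, and only a single CR immediately before each newline is removed.

```shell
strip_cr_in_place() {
  for f in "$@"; do
    tmp=$(mktemp) || return 1
    # Remove one trailing CR per line, then copy the result back over the
    # original so that its permissions are kept.
    awk '{ sub(/\r$/, ""); print }' "$f" > "$tmp" &&
      cat "$tmp" > "$f"
    rm -f "$tmp"
  done
}

# Demo on a hypothetical sample file in a scratch directory.
dir=$(mktemp -d) || exit 1
printf 'x\r\ny\r\n' > "$dir/sample.txt"
strip_cr_in_place "$dir/sample.txt"
remaining=$(awk '/\r/ { n++ } END { print n + 0 }' "$dir/sample.txt")
echo "lines still containing CR: $remaining"
```

Running it a second time on an already converted file should leave the file unchanged, since lines without a trailing CR pass through as they are.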
| |
| Kenny McCormack 2005-05-16, 3:55 pm |
| In article <YcudnbpqnZdDBhXfRVn-2Q@comcast.com>,
Ed Morton <morton@lsupcaemnt.com> wrote:
>
>
>Kenny McCormack wrote:
>
>
>Alternatively consider using a character class (see
>http://www.gnu.org/software/gawk/ma...har_002dclasses)
>to detect any control characters:
I don't like your so-called "character classes" and never use them.
They are convoluted, contrived, and unnecessary. Also, prone to failure
when so-called "internationalization" (aka, "locale") issues come into
play.
| |
| Ed Morton 2005-05-16, 3:55 pm |
|
Kenny McCormack wrote:
> In article <YcudnbpqnZdDBhXfRVn-2Q@comcast.com>,
> Ed Morton <morton@lsupcaemnt.com> wrote:
>
>
>
> I don't like your so-called "character classes" and never use them.
They aren't mine
> They are convoluted, contrived, and unnecessary.
Many things that aren't necessary are still useful. I find it clearer,
and sometimes more concise, to use some of them, though I'd have been
happier if the names weren't quite so long. e.g. 2-character names
instead of 5+:
[:alnum:] -> [:an:]
[:alpha:] -> [:al:]
[:blank:] -> [:bl:]
[:cntrl:] -> [:cn:]
[:digit:] -> [:di:]
[:graph:] -> [:gr:]
[:lower:] -> [:lc:]
[:print:] -> [:pr:]
[:punct:] -> [:pu:]
[:space:] -> [:sp:]
[:upper:] -> [:uc:]
[:xdigit:] -> [:xd:]
wouldn't have been noticeably harder to understand and would've made the
character class names about as brief as the explicit ranges they
replace. As-is I'd find it hard to justify using "[[:lower:]]" instead
of "[a-z]" unless it's for future-proofing against languages that don't
have that range (any?), but I could just about buy into using "[[:lc:]]"
for consistency with other ranges.
> Also, prone to failure
> when so-called "internationalization" (aka, "locale") issues come into
> play.
I'm surprised to hear that. Could you give an example?
Ed.
| |
| Janis Papanagnou 2005-05-16, 3:55 pm |
| Ed Morton wrote:
> Kenny McCormack wrote:
>
>
> I'm surprised to hear that. Could you give an example?
An example from my environment where character classes work perfectly...
Umlauts Ä Ö Ü (upper) and ä ö ü ß (lower) are all recognized correctly with
LANG=de_DE@euro (while with LANG=C they are not recognized as upper/lower).
Hard coded character ranges, OTOH, would not fit I18N requirements.
Janis