AWK archive, November 2004: Re: awk challenge
|
|
| William Park 2004-11-16, 6:50 pm |
| A Ferenstein <epaalx@hotmail.com> wrote:
> Here's the task: create an array containing beginning and end of columns (of
> some text not matching a regexp string) separated by a regexp string.
> Condition: can't use substr() to read character-by-character!
>
> For example, let's say regexp=" " (ie. a space), result "delimiter_array"
> (where x represents some character not matched by regexp):
>
> xxx xxxx xxxxx
>
> would result in
> delimiter_array[1] is 1
> delimiter_array[2] is 3
> delimiter_array[3] is 6
> delimiter_array[4] is 9
> delimiter_array[5] is 11
> delimiter_array[6] is 15
Your example doesn't match your numbers. Count again, and repost.
| |
| Ed Morton 2004-11-16, 6:50 pm |
|
William Park wrote:
> A Ferenstein <epaalx@hotmail.com> wrote:
>
>
>
> Your example doesn't match your numbers. Count again, and repost.
Or not :-). Don't know about anyone else but IMHO "let's see how clever
you are" exercises are just newsgroup clutter. In that category I
include anything that says "solve this without doing ABC" without a good
reason for that restriction.
Ed.
| |
| Kenny McCormack 2004-11-16, 6:50 pm |
| In article <cn08hr$llc@netnews.proxy.lucent.com>,
Ed Morton <morton@lsupcaemnt.com> wrote:
....
>Or not :-). Don't know about anyone else but IMHO "let's see how clever
>you are" exercises are just newsgroup clutter. In that category I include
>anything that says "solve this without doing ABC" without a good reason
>for that restriction.
You are obviously an engineer and not a mathematician.
I need say no more.
| |
| Ed Morton 2004-11-16, 6:50 pm |
|
Kenny McCormack wrote:
> In article <cn08hr$llc@netnews.proxy.lucent.com>,
> Ed Morton <morton@lsupcaemnt.com> wrote:
> ...
>
>
>
> You are obviously an engineer and not a mathematician.
He shoots, he scores. Guilty as charged.
> I need say no more.
A mathematician would have found a way to say "I need say no more"
without using vowels.
Ed.
| |
| A Ferenstein 2004-11-16, 6:50 pm |
| The reason for not using substr() character-by-character is self-evident
(clue: 'very big line'); neither would it exploit awk's strengths -
optimised functions related to strings. Plus, the aim of the exercise is to
think rather than to use brute force.
Please look at my version of the solution.
> Or not :-). Don't know about anyone else but IMHO "let's see how clever
> you are" exercises are just newsgroup clutter. In that category I
> include anything that says "solve this without doing ABC" without a good
> reason for that restriction.
| |
| Kenny McCormack 2004-11-16, 6:50 pm |
| In article <cn1qeh$m5s$1@newstree.wise.edt.ericsson.se>,
A Ferenstein <epaalx@hotmail.com> wrote:
>The reason for not using substr() character-by-character is self evident
>(clue. 'very big line'), neither would it exploit awk's strengths -
>optimised functions related to strings. Plus the aim of exercise to think
>rather than force.
>Please look at my version of solution.
Ed has made it clear that he is a simpleton who doesn't like theoretical
problems. You are unlikely to change his way of thinking (or not).
| |
| Ed Morton 2004-11-16, 6:50 pm |
|
Kenny McCormack wrote:
> In article <cn1qeh$m5s$1@newstree.wise.edt.ericsson.se>,
> A Ferenstein <epaalx@hotmail.com> wrote:
>
The clue appears to be missing from your original posting. Had you said
"I have a problem where my lines are very long and using substr() takes
too long because XYZ - is there a way to optimize this code?" then we
wouldn't be having this chat.
> neither would it exploit awk's strengths -
But it does exploit awk's strengths -
a rich set of component functions from which you can quickly and easily
build solutions to problems. No one's using awk rather than C for its
speed of execution.
> Plus the aim of exercise to think
> rather than force.
I suspect most of us get a reasonable workout every day solving real
problems and don't need the additional exercise.
> Please look at my version of solution.
I did. It's not immediately obvious that a recursive solution with
multiple calls to "match()" would be any faster than using substr().
>
> Ed has made it clear that he is a simpleton who doesn't like theoretical
> problems. You are unlikely to change his way of thinking (or not).
Yes, exactly. Hey, hang on.... I need to go ask someone if that's an insult.
Ed.
| |
| Ed Morton 2004-11-16, 6:50 pm |
|
A Ferenstein wrote:
>
>
> Given that column separators are regexp's, match() is un-avoidable.
>
>
No it isn't:
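gawk '{
    # Append SUBSEP to every character so that each one becomes a field.
    c = gsub(/./, "&" SUBSEP); FS = SUBSEP; $0 = $0; idx = 1; fnd = 0
    for (i = 1; i < NF; i++) {
        if ($i == " ") {
            if (fnd) {
                delimiter_array[++idx] = i-1   # end of the current column
                idx++
            }
            fnd = 0
        } else {
            if (!fnd) {
                delimiter_array[idx] = i       # start of a new column
            }
            fnd = 1
        }
    }
    delimiter_array[++idx] = i-1               # end of the last column
}'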
I have no idea if the above is any faster or slower than using substr()
or your recursion with match(). My only point is that there are alternatives.
Regards,
Ed.
| |
| A Ferenstein 2004-11-16, 6:50 pm |
| Ed, sorry, I guess I'm slow. Can you please explain what this is supposed to
do?
The function must be able to show beginning and ending columns of text whose
(delimiter) separator is specified by a regular expression!
> No it isn't:
>
> gawk '{c=gsub("\(.\)","&"SUBSEP);FS=SUBSEP;$0=$0;idx=1;
> for (i=1; i<NF; i++) {
> if ($i == " ") {
> if (fnd) {
> delimiter_array[++idx] = i-1
> idx++;
> }
> fnd = 0
> } else {
> if (!fnd) {
> delimiter_array[idx] = i
> }
> fnd = 1
> }
> }
> delimiter_array[++idx] = i-1
> }'
>
> I have no idea if the above is any faster or slower than using substr()
> or your recursion with match(). My only point is that there are
> alternatives.
>
> Regards,
>
> Ed.
| |
| Ed Morton 2004-11-16, 6:50 pm |
|
A Ferenstein wrote:
> Ed, sorry, I guess I'm slow.
That's OK, I have it on good authority that there are simpletons in this
NG ;-).
> Can you please explain what this is supposed to
> do?
It finds the start and end character positions in lines where chains of
single spaces are the separator.
> The function must be able to show beginning and ending columns of text whose
> (delimiter) separator is specified by a regular expression!
I shouldn't have posted a response without re-reading your original
posting as you weren't trying to do what I thought.
For an RE-separated solution, if you're processing a single line of text
(which seems likely for this type of problem) then the obvious way to
handle that would be to use a running count of "length($0) + length(RT)"
to track the start and end points rather than invoking "match()".
Off the top of my head I don't know a built-in equivalent to "RT" in
identifying field separators, so if this had to be applied to every line
then I'd probably invent a comparable solution which may or may not use
match().
For any GNU implementers listening, here's a suggestion - we could
really use a "FT" equivalent to "RT".
Ed.
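A minimal sketch of the running-count idea described above, assuming gawk (a regexp RS and RT are gawk extensions) and a single line of input. The shell function name column_bounds_rt is hypothetical, not something from the thread. Each "record" is one column, and length($0) + length(RT) advances the position past the column and its separator:

```shell
# column_bounds_rt LINE REGEXP: print "start end" for each column of LINE.
# Requires gawk: RT and a regexp RS are gawk extensions.
column_bounds_rt() {
    printf '%s' "$1" | gawk -v RS="$2" '
        {
            if (length($0) > 0)                  # skip empty columns
                print pos + 1, pos + length($0)  # 1-based start and end
            pos += length($0) + length(RT)       # running count of consumed text
        }'
}
# Example: column_bounds_rt 'xxx xxxx xxxxx' ' '
```

For the example line this should print 1 3, 5 8 and 10 14 on separate lines. Because the separator is passed with -v, backslashes in the regexp would need doubling.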
| |
| Aharon Robbins 2004-11-16, 6:50 pm |
| In article <lYWdndip-pRxXAXcRVn-3w@comcast.com>,
Ed Morton <morton@lsupcaemnt.com> wrote:
>Off the top of my head I don't know a built-in equivalent to "RT" in
>identifying field separators,
There isn't one.
>For any GNU implementers listening, here's a suggestion - we could
>really use a "FT" equivalent to "RT".
Really? How many times a week do you say to yourself, "Boy, I wish we
had an FT?" Seriously, is the lack of FT really a big impediment
in doing your day to day awk programming?
First of all, FT would have to be an array, not a single variable.
Secondly, the overhead for having it would be huge. Gawk would have
to clear and then fill in the array for EVERY record. When you're
processing multi-gigabyte log files, that's a lot of overhead for
a feature that probably won't be used all that much.
I know: For 3.1.3, I fixed gawk so that it only changes RT if something
really changed. In the normal case, where RS = "\n", RT's value
doesn't change between records. This brought a huge speedup for
something like
gawk '{ print }' /some/huge/file
If you really need the info an FT array would give you, it's pretty
easy to cobble up a function using match and substr to get it.
Arnold
--
Aharon (Arnold) Robbins --- Pioneer Consulting Ltd. arnold AT skeeve DOT com
P.O. Box 354 Home Phone: +972 8 979-0381 Fax: +1 206 350 8765
Nof Ayalon Cell Phone: +972 50 729-7545
D.N. Shimshon 99785 ISRAEL
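One way such a function might look, as a sketch only: it uses nothing beyond POSIX match() and substr(), chopping off one whole field plus its separator per iteration rather than reading character by character. The names field_bounds and bounds_of are made up for illustration, and anchored regexps (^ or $) would not behave sensibly because the string is shortened as it goes:

```shell
# field_bounds REGEXP: for each line on stdin, print the start and end
# column of every field delimited by matches of REGEXP.
field_bounds() {
    awk -v sep="$1" '
        # Store start/end pairs for the fields of str in bounds[1..2n] and
        # return n. Empty fields (adjacent separators) are skipped.
        function bounds_of(str, re, bounds,    n, pos, flen, slen) {
            n = 0; pos = 0
            while (length(str) > 0) {
                if (match(str, re) && RLENGTH > 0) {
                    flen = RSTART - 1; slen = RLENGTH   # field, then separator
                } else {
                    flen = length(str); slen = 0        # no separator left
                }
                if (flen > 0) {
                    bounds[2*n + 1] = pos + 1
                    bounds[2*n + 2] = pos + flen
                    n++
                }
                pos += flen + slen
                str = substr(str, flen + slen + 1)      # drop field and separator
            }
            return n
        }
        {
            split("", b)                                # start with an empty array
            n = bounds_of($0, sep, b)
            out = ""
            for (i = 1; i <= 2*n; i++) out = out (i > 1 ? " " : "") b[i]
            print out
        }'
}
# Example: printf 'xxx xxxx xxxxx\n' | field_bounds ' '
```

With the original poster's example line and a single space as the separator, this should print 1 3 5 8 10 14.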
| |
| Kenny McCormack 2004-11-16, 6:50 pm |
| In article <41990592$1@news.012.net.il>,
Aharon Robbins <arnold@skeeve.com> wrote:
>In article <lYWdndip-pRxXAXcRVn-3w@comcast.com>,
>Ed Morton <morton@lsupcaemnt.com> wrote:
>
>There isn't one.
>
>
>Really? How many times a week do you say to yourself, "Boy, I wish we
>had an FT?" Seriously, is the lack of FT really a big impediment
>in doing your day to day awk programming?
I'm not disagreeing with anything you're saying - and the arguments based
on efficiency certainly sound right. However, I have a few comments:
1) I think Ed just was thinking in terms of completeness, not
really asserting that it would be used all that often. In fact, RT is
a neat idea, but I don't think I've used it more than twice since its
inception. FT would be even less frequently used.
2) For the most part, if you wanted the functionality of FT,
couldn't you just set RS=<whatYouHadPreviouslyThoughtOfAsYourFieldDelimiter>
and then use RT? I've seen a lot of this sort of technique - where you set
RS to something and then process each field in the line as if it were a
"record".
| |
| Aharon Robbins 2004-11-16, 6:50 pm |
| In article <cnb1dt$nkv$1@yin.interaccess.com>,
Kenny McCormack <gazelle@interaccess.com> wrote:
>I'm not disagreeing with anything you're saying - and the arguments based
>on efficiency certainly sound right. However, I have a few comments:
>
> 1) I think Ed just was thinking in terms of completeness, not
>really asserting that it would be used all that often.
Completeness is great in theory, but every new feature has a practical
implication. If the cost of the feature (code complexity, code
maintainability, run time efficiency) outweighs the expressive power
that the feature brings, then it's not worth it. This has been a
painful lesson for me: there are features in gawk that have taken
literally *years* to get right, and that I'd happily remove, except
that it's Too Late Now.
>In fact, RT is
>a neat idea, but I don't think I've used it more than twice since its
>inception. FT would be even less frequently used.
I actually use RT quite a lot, especially when working with XML files.
Using something like RS = "<[^>]+>", RT becomes an XML open or close
tag, and $0 is all the text up to the tag. Incredibly useful.
> 2) For the most part, if you wanted the functionality of FT,
>couldn't you just set RS=<whatYouHadPreviouslyThoughtOfAsYourFieldDelimiter>
>and then use RT? I've seen a lot of this sort of technique - where you set
>RS to something and then process each field in the line as if it were a
>"record".
Yes, this would work, and it's even more elegant than having FT.
Arnold
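A small sketch of the RS/RT tag technique mentioned above, assuming gawk. The function name list_tags and the output format are invented for illustration. Every record is the text before a tag, and RT holds the tag that ended it:

```shell
# list_tags: print each tag found on stdin together with the text before it.
# Requires gawk (a regexp RS and RT are gawk extensions).
list_tags() {
    gawk 'BEGIN { RS = "<[^>]+>" }
          RT != "" { printf "text=[%s] tag=[%s]\n", $0, RT }'
}
# Example: printf 'intro<a>hello</a>' | list_tags
```

The RT != "" guard skips any trailing text that is not followed by a tag. This is pattern matching on angle brackets, not XML parsing, so comments, CDATA sections and a ">" inside an attribute value would confuse it.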
| |
| Jürgen Kahrs 2004-11-16, 6:50 pm |
| Aharon Robbins wrote:
> I actually use RT quite a lot, especially when working with XML files.
> Using something like RS = "<[^>]+>", RT becomes an XML open or close
> tag, and $0 is all the text up to the tag. Incredibly useful.
After extraction of open tags, do you also work
on XML attributes ? Or are you mostly interested
in tag names ?
You were talking about the cost of an additional
feature as compared to its use. What do you think
about the relation between cost and effect in the
XML extension ?
| |
| Ed Morton 2004-11-16, 6:50 pm |
|
Aharon Robbins wrote:
> In article <lYWdndip-pRxXAXcRVn-3w@comcast.com>,
> Ed Morton <morton@lsupcaemnt.com> wrote:
>
>
>
> There isn't one.
>
>
>
>
> Really? How many times a week do you say to yourself, "Boy, I wish we
> had an FT?" Seriously, is the lack of FT really a big impediment
> in doing your day to day awk programming?
I wish for FT about as often as I'm thankful for RT which is about once
a month or so and that's mostly when responding to NG articles. I've
rarely needed either of them in my own usage.
> First of all, FT would have to be an array, not a single variable.
That'd be the most obvious implementation. I was just picturing some
variable that gets set by context within loops on fields just like RT
gets set by context on loops through records, but I hadn't thought at
all about the implementation.
> Secondly, the overhead for having it would be huge. Gawk would have
> to clear and then fill in the array for EVERY record. When you're
> processing multi-gigabyte log files, that's a lot of overhead for
> a feature that probably won't be used all that much.
I don't process multi-gigabyte files at all and if I did I'd probably
write a C program to do it. I use awk for speed of writing scripts, not
speed of executing them.
>
> I know: For 3.1.3, I fixed gawk so that it only changes RT if something
> really changed. In the normal case, where RS = "\n", RT's value
> doesn't change between records. This brought a huge speedup for
> something like
>
> gawk '{ print }' /some/huge/file
>
> If you really need the info an FT array would give you, it's pretty
> easy to cobble up a function using match and substr to get it.
It's not an "if", it's a "when". I'm just suggesting that some builtin
method to avoid multiple users having to repeatedly write their own
function for this would be a useful addition. If performance is a
significant issue, as I'm sure it is, you could always have a flag that
controls whether the functionality is on or off.
As with any suggestion, you can always reject it for many good reasons.
Ed.
> Arnold
| |
| Ed Morton 2004-11-16, 6:50 pm |
|
Aharon Robbins wrote:
> In article <cnb1dt$nkv$1@yin.interaccess.com>,
> Kenny McCormack <gazelle@interaccess.com> wrote:
>
>
>
> Completeness is great in theory, but every new feature has a practical
> implication. If the cost of the feature (code complexity, code
> maintainability, run time efficiency) outweighs the expressive power
> that the feature brings, then it's not worth it.
Yes, that's right. Users suggest enhancements and developers weigh the
pros-and-cons based on their expertise and decide whether to implement
or reject the suggestions.
<snip>
>
>
> Yes, this would work, and it's even more elegant than having FT.
I considered that for this example, but it's very inelegant if the
input file is more than one line long and you want to get line-by-line
output as usual since you need to go looking for newlines (or whatever
the original RS was) in the text and special-case them.
Ed.
| |
| Stepan Kasal 2004-11-16, 6:50 pm |
| Hello,
In article <cnbcdk$hgq@netnews.proxy.lucent.com>, Ed Morton wrote:
> Aharon Robbins wrote:
>
> I considered that for this example, but it's very in-elegant if the
> input file is more than one line long and you want to get line-by-line
> output as usual since you need to go looking for newlines (or whatever
> the original RS was) in the text and special-case them.
Of course, you have to craft the RS so that it matches the newline too,
something like RS="...|\n".
Then, your code has to start by something like
if (RT == "\n") ...
Hope this helps,
Stepan
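A sketch of how that might be put together, assuming gawk and runs of spaces as the field delimiter. The function name line_bounds_rt is hypothetical. RS matches either the delimiter or a newline, and RT says which one ended the current piece, so the per-line output is emitted only when RT is a newline:

```shell
# line_bounds_rt: for each input line, print the start and end column of
# every space-separated column. Requires gawk (regexp RS and RT).
line_bounds_rt() {
    gawk 'BEGIN { RS = " +|\n" }             # field delimiter OR newline
        {
            if (length($0) > 0)
                out = out (out == "" ? "" : " ") (pos + 1) " " (pos + length($0))
            pos += length($0) + length(RT)
            if (RT == "\n") {                # a real line ended: emit and reset
                print out
                out = ""; pos = 0
            }
        }
        END { if (out != "") print out }     # last line had no trailing newline'
}
# Example: printf 'xxx xxxx xxxxx\nab  cd\n' | line_bounds_rt
```

For the two example lines this should print 1 3 5 8 10 14 and then 1 2 5 6.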
| |
| Stepan Kasal 2004-11-16, 6:50 pm |
| Hello Aharon,
you wrote:
> painful lesson for me: there are features in gawk that have taken
> literally *years* to get right, and that I'd happily remove, except
> that it's Too Late Now.
It might be enlightening for us to know which features you mean.
Could you please tell us? (At least the ones you can name off the top of your head.)
It's always less painful to learn from others' mistakes. ;-)
Thanks,
Stepan
| |
| Stepan Kasal 2004-11-16, 6:50 pm |
| Hello Ed,
you wrote:
> Aharon Robbins wrote:
>
> That'd be the most obvious implementation. I was just picturing some
> variable that gets set by context within loops on fields just like RT
> gets set by context on loops through records, but I hadn't thought at
> all about the implementation.
Well, then you cannot speak about elegancy and completeness.
I see no consistent way to implement such a hack. I'd say that FT has
to be an array.
> If performance is a significant issue, as I'm sure it is, you could
> always have a flag that controls whether the functionality is on or
> off.
> As with any suggestion, you can always reject it for many good
> reasons.
Well, I see two issues: 1) performance overhead,
2) maintenance overhead
I see a possible solution to 1)--see below--but there is still 2).
If we came up with a good solution, which presented no slowdown for
programs which don't use the FT array, Arnold could still reject it
because it clutters up the source tree.
I would perhaps accept the patch in such a situation, but I cannot say
that I have successfully managed a free software package for many
years. Thus we have to respect Arnold's decision about 2).
Back to 1)--performance:
Luckily, awk has no eval, so we can tell at compile time whether the
awk program is using FT or not. (Thus there is no need for any
external option; the optimization can be done automatically.)
If this were done right, there would be no performance penalty for
programs that don't use FT.
Last, but not least, who would do it? There is no point trying to
convince Arnold about the maintainability of the code unless there is
someone who has the capacity to implement it. Arnold is not going to
implement it, and I don't volunteer either.
Ed, could you write the patch, if there were a chance that Arnold
would accept it? Or can you raise some funds for this?
Regards,
Stepan Kasal
| |
| Aharon Robbins 2004-11-16, 6:50 pm |
| In article <2vsjudF2njgctU1@uni-berlin.de>,
Jürgen Kahrs <Juergen.KahrsDELETETHIS@vr-web.de> wrote:
>Aharon Robbins wrote:
>
>
>After extraction of open tags, do you also work
>on XML attributes ? Or are you mostly interested
>in tag names ?
Mainly the tag names.
>You were talking about the cost of an additional
>feature as compared to its use. What do you think
>about the relation between cost and effect in the
>XML extension ?
I don't have a feel for it. It seems that if you don't use the XML
features, they're not in the way and don't add runtime overhead
for a regular use. That's good.
Otherwise, the gain in expressiveness for working with XML
data does seem to be worth the tradeoff of increased executable
size, and the source code changes didn't appear to be too
pervasive at first glance, which is also good.
Arnold
| |
| Aharon Robbins 2004-11-16, 6:50 pm |
| In article <slrncpjk05.gku.kasal@matsrv.math.cas.cz>,
Stepan Kasal <kasal@ucw.cz> wrote:
>Hello Aharon,
>
>you wrote:
>
>It might be enlightening for us to know which features do you mean.
>Could you please tell us? (At least the ones you can tell off your mind.)
>
>It's always less painful to learn from others' mistakes. ;-)
>
>Thanks,
> Stepan
IGNORECASE is #1. It's taken a long time to get all the semantics right,
and as standard awk already has tolower() and toupper(), in retrospect
it seems to not have been worth the trouble.
The /dev/pid and so on special files. Those will actually disappear
eventually, now that PROCINFO is in.
The bit manipulation functions seem to be overkill; I wonder if anyone
actually uses them?
I also tend to wonder if it was worth the trouble to bring the i18n
features of gettext out to the awk language level.
Arnold
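For comparison, a sketch of case-insensitive matching with only the standard tolower(), which runs in any POSIX awk and needs no IGNORECASE. The function name grep_nocase is made up for illustration, and the pattern is assumed to be written in lower case:

```shell
# grep_nocase PATTERN: print input lines that match PATTERN regardless of
# case. PATTERN must be written in lower case.
grep_nocase() {
    awk -v pat="$1" 'tolower($0) ~ pat'
}
# Example: printf 'Error: disk\nok\nERROR again\n' | grep_nocase 'error'
```

The line is lowercased only for the comparison, so the printed output keeps its original case.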
| |
| Aharon Robbins 2004-11-16, 6:50 pm |
| In article <cnbbv1$hae@netnews.proxy.lucent.com>,
Ed Morton <morton@lsupcaemnt.com> wrote:
>I wish for FT about as often as I'm thankful for RT which is about once
>a month or so and that's mostly when responding to NG articles. I've
>rarely needed either of them in my own usage.
My point exactly; I suspect that it's not broadly enough useful to
be worth the trouble to implement.
>I don't process multi-gigabyte files at all
Ah, but there are plenty of people who do. I have to worry about my
entire customer base.
> and if I did I'd probably write a C program to do it. I use awk for
> speed of writing scripts, not speed of executing them.
You might be surprised at the tradeoff you're making. Gawk is faster
than Unix awk, and mawk is usually faster than gawk and perl. I occasionally
get reports that gawk is faster than perl too. It may be that gawk or
mawk are "fast enough" that you can use that awk script for production.
>
>It's not an "if", it's a "when". I'm just suggesting that some builtin
>method to avoid multiple users having to repeatedly write their own
>function for this would be a useful addition.
Alternatively, the web is a great resource. When you write that function,
post it here, and then newbies can be referred to your article as found
via google or dejanews.
Or contribute it to the FSF for me to include in the gawk doc. There's
more than one way to skin a cat. :-)
It's nothing personal, but I've learned the hard way not to go dropping
in features every time an idea comes up in comp.lang.awk (or elsewhere).
Arnold
| |
| Aharon Robbins 2004-11-16, 6:50 pm |
| In article <slrncpjleu.gku.kasal@matsrv.math.cas.cz>,
Stepan Kasal <kasal@ucw.cz> wrote:
>Back to 1)--performance:
>Luckily, awk has no eval, so we can say in compile time whether the
>awk program is using FT or not. (Thus there is no need for any
>external option, the optimization can be done automatically.)
>If this were done right, there would be no performance penalty for
>programs that don't use FT.
It's still one if test per record. Small, but not zero.
The penalty when using it would be severe though. Gawk would have to
completely parse the record into fields for every record, whether or
not fields get referenced. Thus, the lazy field parsing stuff goes out
the window. I suspect therefore that we're looking at significant code
complexity too.
And, it's still more incremental goo in the code. Another straw bringing
the camel's back ever closer to breaking. Better to do it with an
awk function. That's also portable to other awks.
Arnold
| |
| Ed Morton 2004-11-16, 6:50 pm |
|
Stepan Kasal wrote:
> Hello Ed,
>
> you wrote:
>
>
>
> Well, then you cannot speak about elegancy and completeness.
Sure I can: Having access to the string that matches the field separator
regexp would be the elegant and complete functionality given that we
have access to the string that matches the record separator regexp. How
that's implemented is a minor detail to a user, and whether or not it's
worth implementing is a significant concern to the developer, but that
doesn't affect the abstract appeal to a user.
If you want to get all esoteric about it, part of the QWAN
(http://c2.com/cgi/wiki?QualityWithoutaName) are symmetry and
"wholeness" and having RT without an equivalent FT is neither
symmetrical nor whole.
<snip>
> Ed, could you write the patch, if there were a chance that Arnold
> would accept it? Or can you raise some funds for this?
Nope. My impetus for suggesting it was that I see people asking
questions in this NG where that would be the most elegant solution, but
whether or not it gets implemented isn't a big deal to me personally.
Ed.
| |
| Ed Morton 2004-11-16, 6:50 pm |
|
Aharon Robbins wrote:
> In article <cnbbv1$hae@netnews.proxy.lucent.com>,
> Ed Morton <morton@lsupcaemnt.com> wrote:
>
>
>
> My point exactly; I suspect that it's not broadly enough useful to
> be worth the trouble to implement.
Right, that's where your judgment kicks in.
>
>
> Ah, but there are plenty of people who do. I have to worry about my
> entire customer base.
Right, that's where your judgment kicks in.
>
>
> You might be surprised at the tradeoff you're making. Gawk is faster
> than Unix awk, and mawk is usually faster than gawk and perl. I occasionally
> get reports that gawk is faster than perl too. It may be that gawk or
> mawk are "fast enough" that you can use that awk script for production.
It's not that much harder to write C than awk.
>
>
> Alternatively, the web is a great resource. When you write that function,
> post it here, and then newbies can be referred to your article as found
> via google or dejanews.
Yes, I've started putting some examples on my web site. When it looks a
bit more presentable I'll start referring people to it rather than
posting the same solutions here repeatedly.
> Or contribute it to the FSF for me to include in the gawk doc. There's
> more than one way to skin a cat. :-)
I wouldn't have thought of that, I'd have thought of getting it into the
FAQ (though I don't know how to make that happen either).
> It's nothing personal, but I've learned the hard way not to go dropping
> in features every time an idea comes up in comp.lang.awk (or elsewhere).
That's fine, but all you had to say in response to my suggestion was
"Thanks for the suggestion but I've considered it and it's not worth the
development and performance overhead" to which I'd have replied "Fair
enough, thanks for considering it" and we wouldn't have had this chain
of postings.
What I'm taking from this is that you prefer not to see suggestions for
enhancements posted to the NG unless it's something that the poster
personally cares deeply about and is willing to debate and/or take on
the development work personally and that's fine too - lesson learned.
By the way, I really appreciate all the hard work that's gone into gawk.
It makes my working life much easier on a daily basis. Thank you.
Ed.
> Arnold
| |
| Aharon Robbins 2004-11-16, 6:50 pm |
| In article <EfKdnSBrwLNdZgTcRVn-jw@comcast.com>,
Ed Morton <morton@lsupcaemnt.com> wrote:
>
>It's not that much harder to write C than awk.
I think I'll let that one go without comment ...
>
>I wouldn't have thought of that, I'd have thought of getting it into the
>FAQ (though I don't know how to make that happen either).
The FAQ seems abandoned. I don't have the cycles to take it over myself.
>
>That's fine, but all you had to say in response to my suggestion was
>"Thanks for the suggestion but I've considered it and it's not worth the
>development and performance overhead" to which I'd have replied "Fair
>enough, thanks for considering it" and we wouldn't have had this chain
>of postings.
Oh, the postings are OK. It's worth clarifying these things in public
every once in a while. I don't mind.
And it's a nice change from the usual contents of the group.
>What I'm taking from this is that you prefer not to see suggestions for
>enhancements posted to the NG unless it's something that the poster
>personally cares deeply about and is willing to debate and/or take on
>the development work personally and that's fine too - lesson learned.
Well, what probably got me was the "we really need" part. My point
was to try to explore how necessary such a feature is, and that's a
worthwhile exercise occasionally. It was more the "this is absolute
fact" tone of your post, if you get what I mean, that pushed my button.
I don't mind discussing (or shooting down :-) features in this group,
at least once in a while. I do try to be somewhat open minded.
(And I'm sure you didn't mean anything personally, nor do I mean anything
personally.)
I will add that I think gawk has grown about as large as it should grow.
See some of my other posts for the details. Thus I think that when
people see a need for something, the first attempt to solve it should
be an awk function. Failing that, it should be a dynamic module. Failing
that, it should be a suggestion for a built-in feature.
>By the way, I really appreciate all the hard work that's gone into gawk.
>It makes my working life much easier on a daily basis. Thank you.
You're quite welcome. I'm glad it helps. (This is the fun part! :-)
Arnold
| |
| E. Rosten 2004-11-16, 6:50 pm |
| Aharon Robbins wrote:
> The bit manipulation functions seem to be overkill; I wonder if anyone
> actually uses them?
Yes!
I do. Not often, but when I need them, they are invaluable. I'd have to
write horrible C code otherwise.
-Ed
--
(You can't go wrong with psycho-rats.) (er258)(@)(eng.cam)(.ac.uk)
/d{def}def/f{/Times findfont s scalefont setfont}d/s{10}d/r{roll}d f 5/m
{moveto}d -1 r 230 350 m 0 1 179{1 index show 88 rotate 4 mul 0 rmoveto}
for /s 15 d f pop 240 420 m 0 1 3 { 4 2 1 r sub -1 r show } for showpage
| |
| Stepan Kasal 2004-11-16, 6:50 pm |
| Hello,
Ed Morton wrote:
> Stepan Kasal wrote:
> > Well, then you cannot speak about elegancy and completeness.
when I wrote the above sentence, I meant that it cannot be ``set by
context''. Other than that, yes, I agree that having FT would be
elegant and symmetrical.
But well, we have another symmetry: Arnold doesn't want one more straw
on his back, and nobody has the capacity to implement it.
Well, it's not that important.
I'd like to thank you for the proposal. Yes, it's good to make a
suggestion when you have an idea. Sorry for the storm that was
triggered by it; but well, we are mere humans...
I for one couldn't resist adding one (well, four or five) comments.
Thanks again,
Stepan
| |
| Ed Morton 2004-11-16, 6:50 pm |
|
Aharon Robbins wrote:
<snip>
> Well, what probably got me was the "we really need" part.
When I said "really need" I just meant "in actuality would be best
served by" rather than "desperately require". Sorry for the confusion.
Ed.
| |
| Don Stokes 2004-11-17, 3:55 am |
| In article <4199e370$1@news.012.net.il>,
Aharon Robbins <arnold@skeeve.com> wrote:
>I also tend to wonder if it was worth the trouble to bring the i18n
>features of gettext out to the awk language level.
On a related note, is it possible to set the language from inside
an awk program? Lots of times I really want to be sure LC_CTYPE=C and
not some other randomness.
IMAO, it would be nice if the default was to use C as the locale, and
allow users to change it if their programs were going to actually use
the locale stuff, e.g.
BEGIN {
setlocale("LC_CTYPE", ENVIRON["LC_CTYPE"])
...
}
Mostly, if you aren't really trying to use the locale stuff, it just
causes problems.
-- don
| |
| Patrick TJ McPhee 2004-11-17, 3:55 am |
| In article <41990592$1@news.012.net.il>,
Aharon Robbins <arnold@skeeve.com> wrote:
% In article <lYWdndip-pRxXAXcRVn-3w@comcast.com>,
% Ed Morton <morton@lsupcaemnt.com> wrote:
% >For any GNU implementers listening, here's a suggestion - we could
% >really use a "FT" equivalent to "RT".
%
% Really? How many times a week do you say to yourself, "Boy, I wish we
% had an FT?" Seriously, is the lack of FT really a big impediment
% in doing your day to day awk programming?
I've never thought that, but I've frequently thought it would be
nice to have an array with the offset of each field.
Just my two bits.
--
Patrick TJ McPhee
North York Canada
ptjm@interlog.com
| |
| Aharon Robbins 2004-11-17, 8:55 am |
| In article <2sxmd.6391$3U4.136633@news02.tsnz.net>,
Don Stokes <don@daedalus.co.not-this-bit.nz> wrote:
>On a related note, is it possible to set the language from inside
>an awk program? Lots of times I really want to be sure LC_CTYPE=C and
>not some other randomness.
Nope. Sorry. This is best done with a shell wrapper. (Or an extension
function, hint, hint.)
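A shell wrapper along those lines might look like the following minimal sketch. The function name awk_c is hypothetical, and the sketch assumes a POSIX shell and that forcing LC_ALL=C (which overrides LC_CTYPE and the other LC_* categories) is acceptable for the script being run.

```sh
# Hypothetical wrapper: run awk with the C locale forced for this one
# invocation only; the caller's environment is left untouched.
awk_c() {
    LC_ALL=C awk "$@"
}

# Example: count characters per line with C-locale semantics.
# printf 'abc\n' | awk_c '{ print length($0) }'
```

Setting the variable as a prefix to the command keeps the change local to that awk process, so the rest of the calling script still sees the user's locale.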
>IMAO, it would be nice if the default was to use C as the locale, and
>allow users to change it if their programs were going to actually use
>the locale stuff,
Yeah, but that's anti-POSIX. D*mned if you do and d*mned if you don't.
>Mostly, if you aren't really trying to use the locale stuff, it just
>causes problems.
Amen brother.
--
Aharon (Arnold) Robbins --- Pioneer Consulting Ltd. arnold AT skeeve DOT com
P.O. Box 354 Home Phone: +972 8 979-0381 Fax: +1 206 350 8765
Nof Ayalon Cell Phone: +972 50 729-7545
D.N. Shimshon 99785 ISRAEL
| |
| Aharon Robbins 2004-11-17, 8:56 am |
| In article <cndfnq$etg@netnews.proxy.lucent.com>,
Ed Morton <morton@lsupcaemnt.com> wrote:
>Aharon Robbins wrote:
><snip>
>
>When I said "really need" I just meant "in actuality would be best
>served by" rather than "desperately require". Sorry for the confusion.
No problem. Unfortunately, I can only read what's there. I have
yet to get gawk's /dev/telepath feature to work correctly. :-)
Take it easy,
Arnold
P.S. Email bounced. Don't know why.
--
Aharon (Arnold) Robbins --- Pioneer Consulting Ltd. arnold AT skeeve DOT com
P.O. Box 354 Home Phone: +972 8 979-0381 Fax: +1 206 350 8765
Nof Ayalon Cell Phone: +972 50 729-7545
D.N. Shimshon 99785 ISRAEL
| |
| Don Stokes 2004-11-17, 8:55 pm |
| In article <419b1916$1@news.012.net.il>,
Aharon Robbins <arnold@skeeve.com> wrote:
>In article <2sxmd.6391$3U4.136633@news02.tsnz.net>,
>Don Stokes <don@daedalus.co.not-this-bit.nz> wrote:
>
>Nope. Sorry. This is best done with a shell wrapper. (Or an extension
>function, hint, hint.)
Sure, I could do that. But given that gawk's external include stuff is
clunky at best, a setlocale() builtin isn't an unreasonable request, no?
I'll even code it if required.
-- don
| |
| Kenny McCormack 2004-11-18, 3:55 am |
| In article <41990592$1@news.012.net.il>,
Aharon Robbins <arnold@skeeve.com> wrote:
>In article <lYWdndip-pRxXAXcRVn-3w@comcast.com>,
>Ed Morton <morton@lsupcaemnt.com> wrote:
>
>There isn't one.
>
>
>Really? How many times a week do you say to yourself, "Boy, I wish we
>had an FT?" Seriously, is the lack of FT really a big impediment
>in doing your day to day awk programming?
I'm not disagreeing with anything you're saying - and the arguments based
on efficiency certainly sound right. However, I have a few comments:
1) I think Ed just was thinking in terms of completeness, not
really asserting that it would be used all that often. In fact, RT is
a neat idea, but I don't think I've used it more than twice since its
inception. FT would be even less frequently used.
2) For the most part, if you wanted the functionality of FT,
couldn't you just set RS=<whatYouHadPreviouslyThoughtOfAsYourFieldDelimiter>
and then use RT? I've seen a lot of this sort of technique - where you set
RS to something and then process each field in the line as if it were a
"record".
| |
| Ed Morton 2004-11-18, 8:55 am |
|
Aharon Robbins wrote:
> In article <lYWdndip-pRxXAXcRVn-3w@comcast.com>,
> Ed Morton <morton@lsupcaemnt.com> wrote:
>
>
>
> There isn't one.
>
>
>
>
> Really? How many times a week do you say to yourself, "Boy, I wish we
> had an FT?" Seriously, is the lack of FT really a big impediment
> in doing your day to day awk programming?
I wish for FT about as often as I'm thankful for RT which is about once
a month or so and that's mostly when responding to NG articles. I've
rarely needed either of them in my own usage.
> First of all, FT would have to be an array, not a single variable.
That'd be the most obvious implementation. I was just picturing some
variable that gets set by context within loops on fields just like RT
gets set by context on loops through records, but I hadn't thought at
all about the implementation.
> Secondly, the overhead for having it would be huge. Gawk would have
> to clear and then fill in the array for EVERY record. When you're
> processing multi-gigabyte log files, that's a lot of overhead for
> a feature that probably won't be used all that much.
I don't process multi-gigabyte files at all and if I did I'd probably
write a C program to do it. I use awk for speed of writing scripts, not
speed of executing them.
>
> I know: For 3.1.3, I fixed gawk so that it only changes RT if something
> really changed. In the normal case, where RS = "\n", RT's value
> doesn't change between records. This brought a huge speedup for
> something like
>
> gawk '{ print }' /some/huge/file
>
> If you really need the info an FT array would give you, it's pretty
> easy to cobble up a function using match and substr to get it.
It's not an "if", it's a "when". I'm just suggesting that some builtin
method to avoid multiple users having to repeatedly write their own
function for this would be a useful addition. If performance is a
significant issue, as I'm sure it is, you could always have a flag that
controls whether the functionality is on or off.
As with any suggestion, you can always reject it for many good reasons.
Ed.
> Arnold
| |
| Ed Morton 2004-11-18, 8:55 am |
|
Aharon Robbins wrote:
> In article <cnb1dt$nkv$1@yin.interaccess.com>,
> Kenny McCormack <gazelle@interaccess.com> wrote:
>
>
>
> Completeness is great in theory, but every new feature has a practical
> implication. If the cost of the feature (code complexity, code
> maintainability, run time efficiency) outweighs the expressive power
> that the feature brings, then it's not worth it.
Yes, that's right. Users suggest enhancements and developers weigh the
pros-and-cons based on their expertise and decide whether to implement
or reject the suggestions.
<snip>
>
>
> Yes, this would work, and it's even more elegant than having FT.
I considered that for this example, but it's very in-elegant if the
input file is more than one line long and you want to get line-by-line
output as usual since you need to go looking for newlines (or whatever
the original RS was) in the text and special-case them.
Ed.
| |
| Aharon Robbins 2004-11-18, 8:55 am |
| In article <ykPmd.6493$3U4.143215@news02.tsnz.net>,
Don Stokes <don@daedalus.co.not-this-bit.nz> wrote:
>In article <419b1916$1@news.012.net.il>,
>Aharon Robbins <arnold@skeeve.com> wrote:
>
>Sure, I could do that. But given that gawk's external include stuff is
>clunky at best, a setlocale() builtin isn't an unreasonable request, no?
>
>I'll even code it if required.
>
>-- don
I still think it's best done with a shell wrapper. You're the only
one to request it so far, in the almost 4 years since 3.1.0 and
the i18n features were released.
Arnold
--
Aharon (Arnold) Robbins --- Pioneer Consulting Ltd. arnold AT skeeve DOT com
P.O. Box 354 Home Phone: +972 8 979-0381 Fax: +1 206 350 8765
Nof Ayalon Cell Phone: +972 50 729-7545
D.N. Shimshon 99785 ISRAEL
| |
| Kenny McCormack 2004-11-18, 3:56 pm |
| In article <3003o1F2p63sjU2@uni-berlin.de>,
Patrick TJ McPhee <ptjm@interlog.com> wrote:
>In article <41990592$1@news.012.net.il>,
>Aharon Robbins <arnold@skeeve.com> wrote:
>% In article <lYWdndip-pRxXAXcRVn-3w@comcast.com>,
>% Ed Morton <morton@lsupcaemnt.com> wrote:
>
>% >For any GNU implementers listening, here's a suggestion - we could
>% >really use a "FT" equivalent to "RT".
>%
>% Really? How many times a week do you say to yourself, "Boy, I wish we
>% had an FT?" Seriously, is the lack of FT really a big impediment
>% in doing your day to day awk programming?
>
>I've never thought that, but I've frequently thought it would be
>nice to have an array with the offset of each field.
As it happens, I had to write a script today to solve something along these
lines. Working in TAWK, I realized that TAWK's splitp() function is just
the ticket. The idea of splitp is that you specify an RE that matches your
data - as opposed to split() (and the regular AWK field splitting) where
you specify an RE that matches the delimiters. I.e., with FPAT/splitp(), you
specify what to keep, whereas with FS/split(), you specify what to throw
away. Note that the original idea of FPAT/splitp() was to handle CSV-ish
files, but this thread illustrates a more general conception.
The effect is that if you use both functions, you end up with everything on
the line, categorized as wheat or chaff. I.e.,
n = split($0,T,/someRE/)
n1 = splitp($0,T1,/theSameRE/)
You end up with the fields in T[] and the delimiters in T1[].
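For awks without splitp(), the delimiter half of that pairing can be approximated with a match() loop. The following is only a sketch: the function below is an emulation written for illustration, not TAWK's implementation, and the wrapper name delims and the output format are assumptions. It stops if the RE matches the empty string, to avoid looping forever.

```sh
# Hypothetical emulation: collect every substring of each input line that
# matches the regexp given as $1, and print the count and the matches.
delims() {
    awk -v re="$1" '
    # Fill arr[1..n] with successive matches of re in s; return n.
    function splitp(s, arr, re,    n) {
        n = 0
        while (match(s, re) && RLENGTH > 0) {
            arr[++n] = substr(s, RSTART, RLENGTH)   # keep the matched text
            s = substr(s, RSTART + RLENGTH)         # continue after the match
        }
        return n
    }
    {
        n = splitp($0, T1, re)
        out = ""
        for (i = 1; i <= n; i++) out = out "[" T1[i] "]"
        print n ": " out
    }'
}

# Example: printf 'a, b;c\n' | delims '[,;] *'
```

Calling split() with the same RE on the same line would then give the fields, so the two arrays together account for the whole line.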
May I humbly suggest that splitp() would be a nice addition to GAWK?
| |
| Ed Morton 2004-11-18, 3:56 pm |
|
Kenny McCormack wrote:
> In article <cn08hr$llc@netnews.proxy.lucent.com>,
> Ed Morton <morton@lsupcaemnt.com> wrote:
> ...
>
>
>
> You are obviously an engineer and not a mathematician.
He shoots, he scores. Guilty as charged.
> I need say no more.
A mathematician would have found a way to say "I need say no more"
without using vowels.
Ed.
| |
| A Ferenstein 2004-11-18, 8:56 pm |
| Ed, sorry, I guess I'm slow. Can you please explain what this is supposed to
do?
The function must be able to show beginning and ending columns of text whose
(delimiter) separator is specified by a regular expression!
> No it isn't:
>
> gawk '{c=gsub("\(.\)","&"SUBSEP);FS=SUBSEP;$0=$0;idx=1;
> for (i=1; i<NF; i++) {
> if ($i == " ") {
> if (fnd) {
> delimiter_array[++idx] = i-1
> idx++;
> }
> fnd = 0
> } else {
> if (!fnd) {
> delimiter_array[idx] = i
> }
> fnd = 1
> }
> }
> delimiter_array[++idx] = i-1
> }'
>
> I have no idea if the above is any faster or slower than using substr()
> or your recursion with match(). My only point is that there are
> alternatives.
>
> Regards,
>
> Ed.
| |
| Steve Calfee 2004-11-18, 8:56 pm |
| On 16 Nov 2004 13:24:32 +0200, arnold@skeeve.com (Aharon Robbins)
wrote:
>In article <slrncpjk05.gku.kasal@matsrv.math.cas.cz>,
>Stepan Kasal <kasal@ucw.cz> wrote:
>
>IGNORECASE is #1. It's taken a long time to get all the semantics right,
>and as standard awk already has tolower() and toupper(), in retrospect
>it seems to not have been worth the trouble.
>
>The /dev/pid and so on special files. Those will actually disappear
>eventually, now that PROCINFO is in.
>
>The bit manipulation functions seem to be overkill; I wonder if anyone
>actually uses them?
>
OH, yes. I wrote a machine language interpreter in AWK. This made a
machine simulator. Try to do that without the bit functions.
By the way, what is the precision of an integer in gawk 3.1.0 for
windows? My problem is that some operators like "<" are important in
simulating a computer, but signed are probably not the desired
comparison. In the gawk document it said all integers are represented
as double precision floating point. That cannot be correct, how would
the bit operators work? If the manual is true and the double floats
are truncated to say 64 bit integers, for my purpose all the stuff is
unsigned for a 32 bit machine simulation. That is good.
If I am comparing the pc at greater than 2**31 with one at say 2**3 I
would like a > to work, and not fail because it is a signed test.
The first version was done in tawk, which had different tests for
signed and unsigned less than. I would like to run on a newer,
supported awk, so I have tried gawk. Empirically it works with the 32
bit address space of the simulated machine, but I would like to know
what the actual workings are especially in regards to 32 bit
arithmetic. For instance, what is the result of adding 1 to (2**31)-1?
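As a rough illustration of the arithmetic side of this question: awk numbers are double-precision floats, which represent integers exactly up to 2^53, so plain addition does not wrap at 32 bits and values below 2^32 compare as non-negative numbers. The sketch below uses only portable awk arithmetic and says nothing about how gawk's bit functions convert their operands; the names add32 and ult32 are hypothetical.

```sh
# Hypothetical helper: add two values and wrap the result the way a
# 32-bit register would, using the modulo operator on doubles.
add32() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        # %.0f avoids any dependence on how print formats large integers.
        printf "%.0f\n", (a + b) % 4294967296
    }'
}

# Hypothetical helper: unsigned less-than for values in 0 .. 2^32-1.
# Both operands are non-negative doubles, so the ordinary < is enough.
ult32() {
    awk -v a="$1" -v b="$2" 'BEGIN { print ((a + 0 < b + 0) ? 1 : 0) }'
}

# Examples:
# add32 2147483647 1    (no wrap: the sum still fits in 32 bits unsigned)
# add32 4294967295 1    (wraps around to zero)
```

Under these assumptions a simulated program counter above 2^31 still compares greater than a small one, because nothing is ever stored as a signed 32-bit value.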
Thanks, Steve
| |
| Jürgen Kahrs 2004-11-19, 3:55 am |
| Aharon Robbins wrote:
> I actually use RT quite a lot, especially when working with XML files.
> Using something like RS = "<[^>]+>", RT becomes an XML open or close
> tag, and $0 is all the text up to the tag. Incredibly useful.
After extraction of open tags, do you also work
on XML attributes ? Or are you mostly interested
in tag names ?
You were talking about the cost of an additional
feature as compared to its use. What do you think
about the relation between cost and effect in the
XML extension ?
| |
| Stepan Kasal 2004-11-19, 3:55 am |
| Hello Aharon,
you wrote:
> painful lesson for me: there are features in gawk that have taken
> literally *years* to get right, and that I'd happily remove, except
> that it's Too Late Now.
It might be enlightening for us to know which features you mean.
Could you please tell us? (At least the ones you can name off the top of your head.)
It's always less painful to learn from others' mistakes. ;-)
Thanks,
Stepan
| |
| Aharon Robbins 2004-11-19, 3:55 am |
| In article <lYWdndip-pRxXAXcRVn-3w@comcast.com>,
Ed Morton <morton@lsupcaemnt.com> wrote:
>Off the top of my head I don't know a built-in equivalent to "RT" in
>identifying field separators,
There isn't one.
>For any GNU implementers listening, here's a suggestion - we could
>really use a "FT" equivalent to "RT".
Really? How many times a week do you say to yourself, "Boy, I wish we
had an FT?" Seriously, is the lack of FT really a big impediment
in doing your day to day awk programming?
First of all, FT would have to be an array, not a single variable.
Secondly, the overhead for having it would be huge. Gawk would have
to clear and then fill in the array for EVERY record. When you're
processing multi-gigabyte log files, that's a lot of overhead for
a feature that probably won't be used all that much.
I know: For 3.1.3, I fixed gawk so that it only changes RT if something
really changed. In the normal case, where RS = "\n", RT's value
doesn't change between records. This brought a huge speedup for
something like
gawk '{ print }' /some/huge/file
If you really need the info an FT array would give you, it's pretty
easy to cobble up a function using match and substr to get it.
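One way such a function might look is sketched below, in portable awk wrapped in a shell function. The names field_spans and spans are hypothetical, and the sketch assumes that empty fields (text of zero length between two separators) should be skipped and that the separator RE never matches the empty string. Because match() is run on the remainder of the line, anchors such as ^ in the RE would not behave as they do on the whole line.

```sh
# Hypothetical helper: for each input line, print the start and end column
# of every field, where fields are separated by matches of the regexp $1.
field_spans() {
    awk -v re="$1" '
    # Fill pos[] with start/end pairs; return the number of entries.
    function spans(s, re, pos,    n, off) {
        n = 0
        off = 0                              # characters already consumed
        while (match(s, re) && RLENGTH > 0) {
            if (RSTART > 1) {                # non-empty text before separator
                pos[++n] = off + 1
                pos[++n] = off + RSTART - 1
            }
            off += RSTART + RLENGTH - 1      # skip the field and the separator
            s = substr(s, RSTART + RLENGTH)
        }
        if (length(s) > 0) {                 # trailing field after last separator
            pos[++n] = off + 1
            pos[++n] = off + length(s)
        }
        return n
    }
    {
        n = spans($0, re, pos)
        out = ""
        for (i = 1; i <= n; i++) out = out (i > 1 ? " " : "") pos[i]
        print out
    }'
}

# Example: printf 'xxx xxxx xxxxx\n' | field_spans ' '
```

Here substr() is used only to drop the part of the line that has already been scanned, not to walk the line character by character.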
Arnold
--
Aharon (Arnold) Robbins --- Pioneer Consulting Ltd. arnold AT skeeve DOT com
P.O. Box 354 Home Phone: +972 8 979-0381 Fax: +1 206 350 8765
Nof Ayalon Cell Phone: +972 50 729-7545
D.N. Shimshon 99785 ISRAEL
| |
| Stepan Kasal 2004-11-19, 8:55 am |
| Hello,
Kenny McCormack wrote:
> As it happens, I had to write a script today to solve something along these
> lines. Working in TAWK, I realized that TAWK's splitp() function [...]
> n = split($0,T,/someRE/)
> n1 = splitp($0,T1,/theSameRE/)
when we sacrifice the comfort of automatic field splitting to $n, other
possibilities open up.
Perhaps split() could save the offsets somehow.
We could use the same notation as match() uses for substrings, ie.
arr[1, "start"] and arr[1, "length"]
The problem with this is that it cannot be optimized out for programs which
don't use it, as there is no easy way to trace when the index is
num SUBSEP "start"
Or we could add a fourth parameter, which would be filled with the offsets
of the fields:
split($0, T, /someRE/, T1)
> May I humbly suggest that splitp() would be a nice addition to GAWK?
In a following post, Kenny posted an implementation, featuring:
well, this has the usual limitations:
1) you cannot use static RE (delimited by slashes) as parameter
(we are discussing non-Thompson awk, of course)
2) when you run match on substring, you change semantics of ^, \<, \>, etc.
Problem 1) is worked around by using ``dynamic RE's'' a.k.a. backslash hell.
The 2) cannot be worked around, it simply makes the ``general'' functions
less general. I proposed long ago that match() might have yet another
parameter which would specify that the search starts from n-th character of
the string but Arnold didn't seem to like the idea. Well, match() is
already complicated enough...
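Limitation 2) can be seen in a two-character example. This is a minimal sketch in portable awk; the function name anchor_demo is hypothetical.

```sh
# Hypothetical demo: the same anchored RE gives different answers on the
# whole string and on a substring of it.
anchor_demo() {
    awk 'BEGIN {
        s = "ab"
        whole = match(s, /^b/)              # 0: "b" is not at the start of s
        tail  = match(substr(s, 2), /^b/)   # 1: the substring starts with "b"
        print whole, tail
    }'
}
```

A helper that repeatedly calls match() on the unscanned remainder of a line therefore treats every restart point as a new beginning of string, which is why such helpers are less general than a built-in could be.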
Regards,
Stepan
| |
| Stepan Kasal 2004-11-21, 3:57 am |
| Hello,
In article <cnbcdk$hgq@netnews.proxy.lucent.com>, Ed Morton wrote:
> Aharon Robbins wrote:
>
> I considered that for this example, but it's very in-elegant if the
> input file is more than one line long and you want to get line-by-line
> output as usual since you need to go looking for newlines (or whatever
> the original RS was) in the text and special-case them.
Of course, you have to craft the RS so that it matches the newline too,
something like RS="...|\n".
Then, your code has to start by something like
if (RT == "\n") ...
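Put together with a running count of length($0) + length(RT), that approach might look like the sketch below. It needs gawk, since RT and a regexp RS are gawk features. The function name rt_spans and the output format are hypothetical, and the separator RE is assumed not to match a newline or the empty string.

```sh
# Hypothetical helper (gawk only): print start/end columns of each field on
# every input line, treating each field as a record and reading RT to see
# which separator ended it.
rt_spans() {
    gawk -v sep="$1" '
    BEGIN { RS = "(" sep ")|\n" }
    {
        if (length($0) > 0)   # record text is a field; note its columns
            line = line (line == "" ? "" : " ") (col + 1) " " (col + length($0))
        col += length($0) + length(RT)
        if (RT == "\n") {     # separator was the newline: the input line is done
            print line
            col = 0; line = ""
        }
    }
    END { if (line != "") print line }   # last line had no trailing newline
    '
}

# Example: printf 'xxx xxxx xxxxx\nab  cd\n' | rt_spans ' +'
```

The newline special case is the extra bookkeeping discussed above: the column counter has to be reset by hand whenever RT shows that a real line boundary was crossed.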
Hope this helps,
Stepan
| |
| Aharon Robbins 2004-11-21, 3:57 am |
| In article <cnbbv1$hae@netnews.proxy.lucent.com>,
Ed Morton <morton@lsupcaemnt.com> wrote:
>I wish for FT about as often as I'm thankful for RT which is about once
>a month or so and that's mostly when responding to NG articles. I've
>rarely needed either of them in my own usage.
My point exactly; I suspect that it's not broadly enough useful to
be worth the trouble to implement.
>I don't process multi-gigabyte files at all
Ah, but there are plenty of people who do. I have to worry about my
entire customer base.
> and if I did I'd probably write a C program to do it. I use awk for
> speed of writing scripts, not speed of executing them.
You might be surprised at the tradeoff you're making. Gawk is faster
than Unix awk, and mawk is usually faster than gawk and perl. I occasionally
get reports that gawk is faster than perl too. It may be that gawk or
mawk are "fast enough" that you can use that awk script for production.
>
>It's not an "if", it's a "when". I'm just suggesting that some builtin
>method to avoid multiple users having to repeatedly write their own
>function for this would be a useful addition.
Alternatively, the web is a great resource. When you write that function,
post it here, and then newbies can be referred to your article as found
via google or dejanews.
Or contribute it to the FSF for me to include in the gawk doc. There's
more than one way to skin a cat. :-)
It's nothing personal, but I've learned the hard way not to go dropping
in features every time an idea comes up in comp.lang.awk (or elsewhere).
Arnold
--
Aharon (Arnold) Robbins --- Pioneer Consulting Ltd. arnold AT skeeve DOT com
P.O. Box 354 Home Phone: +972 8 979-0381 Fax: +1 206 350 8765
Nof Ayalon Cell Phone: +972 50 729-7545
D.N. Shimshon 99785 ISRAEL
| |
| Aharon Robbins 2004-11-21, 3:57 am |
| In article <slrncpjleu.gku.kasal@matsrv.math.cas.cz>,
Stepan Kasal <kasal@ucw.cz> wrote:
>Back to 1)--performance:
>Luckily, awk has no eval, so we can say in compile time whether the
>awk program is using FT or not. (Thus there is no need for any
>external option, the optimization can be done automatically.)
>If this were done right, there would be no performance penalty for
>programs that don't use FT.
It's still one if test per record. Small, but not zero.
The penalty when using it would be severe though. Gawk would have to
completely parse the record into fields for every record, whether or
not fields get referenced. Thus, the lazy field parsing stuff goes out
the window. I suspect therefore that we're looking at significant code
complexity too.
And, it's still more incremental goo in the code. Another straw bringing
the camel's back ever closer to breaking. Better to do it with an
awk function. That's also portable to other awks.
Arnold
--
Aharon (Arnold) Robbins --- Pioneer Consulting Ltd. arnold AT skeeve DOT com
P.O. Box 354 Home Phone: +972 8 979-0381 Fax: +1 206 350 8765
Nof Ayalon Cell Phone: +972 50 729-7545
D.N. Shimshon 99785 ISRAEL
| |
| Ed Morton 2004-11-21, 3:57 am |
|
Aharon Robbins wrote:
> In article <cnbbv1$hae@netnews.proxy.lucent.com>,
> Ed Morton <morton@lsupcaemnt.com> wrote:
>
>
>
> My point exactly; I suspect that it's not broadly enough useful to
> be worth the trouble to implement.
Right, that's where your judgment kicks in.
>
>
> Ah, but there are plenty of people who do. I have to worry about my
> entire customer base.
Right, that's where your judgment kicks in.
>
>
> You might be surprised at the tradeoff you're making. Gawk is faster
> than Unix awk, and mawk is usually faster than gawk and perl. I occasionally
> get reports that gawk is faster than perl too. It may be that gawk or
> mawk are "fast enough" that you can use that awk script for production.
It's not that much harder to write C than awk.
>
>
> Alternatively, the web is a great resource. When you write that function,
> post it here, and then newbies can be referred to your article as found
> via google or dejanews.
Yes, I've started putting some examples on my web site. When it looks a
bit more presentable I'll start referring people to it rather than
posting the same solutions here repeatedly.
> Or contribute it to the FSF for me to include in the gawk doc. There's
> more than one way to skin a cat. :-)
I wouldn't have thought of that, I'd have thought of getting it into the
FAQ (though I don't know how to make that happen either).
> It's nothing personal, but I've learned the hard way not to go dropping
> in features every time an idea comes up in comp.lang.awk (or elsewhere).
That's fine, but all you had to say in response to my suggestion was
"Thanks for the suggestion but I've considered it and it's not worth the
development and performance overhead" to which I'd have replied "Fair
enough, thanks for considering it" and we wouldn't have had this chain
of postings.
What I'm taking from this is that you prefer not to see suggestions for
enhancements posted to the NG unless it's something that the poster
personally cares deeply about and is willing to debate and/or take on
the development work personally and that's fine too - lesson learned.
By the way, I really appreciate all the hard work that's gone into gawk.
It makes my working life much easier on a daily basis. Thank you.
Ed.
> Arnold
| |
| E. Rosten 2004-11-21, 3:57 am |
| Aharon Robbins wrote:
> The bit manipulation functions seem to be overkill; I wonder if anyone
> actually uses them?
Yes!
I do. Not often, but when I need them, they are invaluable. I'd have to
write horrible C code otherwise.
-Ed
--
(You can't go wrong with psycho-rats.) (er258)(@)(eng.cam)(.ac.uk)
/d{def}def/f{/Times findfont s scalefont setfont}d/s{10}d/r{roll}d f 5/m
{moveto}d -1 r 230 350 m 0 1 179{1 index show 88 rotate 4 mul 0 rmoveto}
for /s 15 d f pop 240 420 m 0 1 3 { 4 2 1 r sub -1 r show } for showpage
| |
| Ed Morton 2004-11-21, 3:57 am |
|
A Ferenstein wrote:
> Ed, sorry, I guess I'm slow.
That's OK, I have it on good authority that there are simpletons in this
NG ;-).
> Can you please explain what this is supposed to
> do?
It finds the start and end character positions in lines where chains of
single spaces are the separator.
> The function must be able to show beginning and ending columns of text whose
> (delimiter) separator is specified by a regular expression!
I shouldn't have posted a response without re-reading your original
posting as you weren't trying to do what I thought.
For an RE-separated solution, if you're processing a single line of text
(which seems likely for this type of problem) then the obvious way to
handle that would be to use a running count of "length($0) + length(RT)"
to track the start and end points rather than invoking "match()".
Off the top of my head I don't know a built-in equivalent to "RT" in
identifying field separators, so if this had to be applied to every line
then I'd probably invent a comparable solution which may or may not use
match().
For any GNU implementers listening, here's a suggestion - we could
really use a "FT" equivalent to "RT".
Ed.
| |
| Aharon Robbins 2004-11-21, 3:55 pm |
| In article <cnb1dt$nkv$1@yin.interaccess.com>,
Kenny McCormack <gazelle@interaccess.com> wrote:
>I'm not disagreeing with anything you're saying - and the arguments based
>on efficiency certainly sound right. However, I have a few comments:
>
> 1) I think Ed just was thinking in terms of completeness, not
>really asserting that it would be used all that often.
Completeness is great in theory, but every new feature has a practical
implication. If the cost of the feature (code complexity, code
maintainability, run time efficiency) outweighs the expressive power
that the feature brings, then it's not worth it. This has been a
painful lesson for me: there are features in gawk that have taken
literally *years* to get right, and that I'd happily remove, except
that it's Too Late Now.
>In fact, RT is
>a neat idea, but I don't think I've used it more than twice since its
>inception. FT would be even less frequently used.
I actually use RT quite a lot, especially when working with XML files.
Using something like RS = "<[^>]+>", RT becomes an XML open or close
tag, and $0 is all the text up to the tag. Incredibly useful.
> 2) For the most part, if you wanted the functionality of FT,
>couldn't you just set RS=<whatYouHadPreviouslyThoughtOfAsYourFieldDelimiter>
>and then use RT? I've seen a lot of this sort of technique - where you set
>RS to something and then process each field in the line as if it were a
>"record".
Yes, this would work, and it's even more elegant than having FT.
Arnold
--
Aharon (Arnold) Robbins --- Pioneer Consulting Ltd. arnold AT skeeve DOT com
P.O. Box 354 Home Phone: +972 8 979-0381 Fax: +1 206 350 8765
Nof Ayalon Cell Phone: +972 50 729-7545
D.N. Shimshon 99785 ISRAEL
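A small sketch of the RS/RT idiom described above for XML-like input, assuming gawk. The function name list_tags and the "tag:"/"text:" labels are invented for the example. It is not a real XML parser (comments, CDATA and ">" inside attribute values are ignored):

```sh
# list_tags < file.xml
# With RS set to a tag regexp, $0 is the text before each tag and RT is the tag.
list_tags() {
  gawk '
    BEGIN { RS = "<[^>]+>" }
    {
      text = $0
      gsub(/^[ \t\n]+|[ \t\n]+$/, "", text)   # trim whitespace around the text
      if (text != "") print "text: " text
      if (RT != "")   print "tag: " RT        # RT is empty after the last record
    }
  '
}
```

For example, `printf '<a><b>hello</b> world</a>\n' | list_tags` lists the four tags in order, with "hello" and "world" reported as text between them.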
| |
| A Ferenstein 2004-11-21, 8:55 pm |
| Ed, sorry, I guess I'm slow. Can you please explain what this is supposed to
do?
The function must be able to show beginning and ending columns of text whose
(delimiter) separator is specified by a regular expression!
> No it isn't:
>
> gawk '{c=gsub("\(.\)","&"SUBSEP);FS=SUBSEP;$0=$0;idx=1;
>     for (i=1; i<NF; i++) {
>         if ($i == " ") {
>             if (fnd) {
>                 delimiter_array[++idx] = i-1
>                 idx++;
>             }
>             fnd = 0
>         } else {
>             if (!fnd) {
>                 delimiter_array[idx] = i
>             }
>             fnd = 1
>         }
>     }
>     delimiter_array[++idx] = i-1
> }'
>
> I have no idea if the above is any faster or slower than using substr()
> or your recursion with match(). My only point is that there are
> alternatives.
>
> Regards,
>
> Ed.
| |
| Aharon Robbins 2004-11-22, 3:56 am |
| In article <lYWdndip-pRxXAXcRVn-3w@comcast.com>,
Ed Morton <morton@lsupcaemnt.com> wrote:
>Off the top of my head I don't know a built-in equivalent to "RT" in
>identifying field separators,
There isn't one.
>For any GNU implementers listening, here's a suggestion - we could
>really use an "FT" equivalent to "RT".
Really? How many times a week do you say to yourself, "Boy, I wish we
had an FT?" Seriously, is the lack of FT really a big impediment
in doing your day to day awk programming?
First of all, FT would have to be an array, not a single variable.
Secondly, the overhead for having it would be huge. Gawk would have
to clear and then fill in the array for EVERY record. When you're
processing multi-gigabyte log files, that's a lot of overhead for
a feature that probably won't be used all that much.
I know: For 3.1.3, I fixed gawk so that it only changes RT if something
really changed. In the normal case, where RS = "\n", RT's value
doesn't change between records. This brought a huge speedup for
something like
gawk '{ print }' /some/huge/file
If you really need the info an FT array would give you, it's pretty
easy to cobble up a function using match and substr to get it.
Arnold
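One possible shape for such a function, as a sketch only and not Arnold's code. The names split_seps and show_separators are hypothetical. It uses only match() and substr(), so it should run in any POSIX awk. The regexp is passed as a string, and a "^" anchor in it would not behave sensibly because the string is consumed piece by piece:

```sh
# show_separators REGEXP < input
# Prints each line with the text matched by every separator shown in [brackets].
show_separators() {
  awk -v re="$1" '
    # Split s on regexp re. Fills fld[1..n] with the fields and sep[1..n-1]
    # with the exact text found between fld[i] and fld[i+1]. Returns n.
    function split_seps(s, fld, sep, re,    n) {
      n = 0
      while (match(s, re) && RLENGTH > 0) {   # RLENGTH > 0 avoids looping on empty matches
        fld[++n] = substr(s, 1, RSTART - 1)
        sep[n]   = substr(s, RSTART, RLENGTH)
        s = substr(s, RSTART + RLENGTH)
      }
      fld[++n] = s
      return n
    }
    {
      n = split_seps($0, fld, sep, re)
      for (i = 1; i < n; i++) printf "%s[%s]", fld[i], sep[i]
      print fld[n]
    }
  '
}
```

For example, `printf 'a, b;c\n' | show_separators '[,;] *'` prints `a[, ]b[;]c`. The arrays are not cleared between records, so only indexes up to the returned n are meaningful.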
| |
| Jürgen Kahrs 2004-11-22, 3:56 am |
| Aharon Robbins wrote:
> I actually use RT quite a lot, especially when working with XML files.
> Using something like RS = "<[^>]+>", RT becomes an XML open or close
> tag, and $0 is all the text up to the tag. Incredibly useful.
After extraction of open tags, do you also work
on XML attributes? Or are you mostly interested
in tag names?
You were talking about the cost of an additional
feature as compared to its use. What do you think
about the relation between cost and effect in the
XML extension?
| |
| Ed Morton 2004-11-22, 3:56 am |
|
Aharon Robbins wrote:
> In article <lYWdndip-pRxXAXcRVn-3w@comcast.com>,
> Ed Morton <morton@lsupcaemnt.com> wrote:
>
>
>
> There isn't one.
>
>
>
>
> Really? How many times a week do you say to yourself, "Boy, I wish we
> had an FT?" Seriously, is the lack of FT really a big impediment
> in doing your day to day awk programming?
I wish for FT about as often as I'm thankful for RT which is about once
a month or so and that's mostly when responding to NG articles. I've
rarely needed either of them in my own usage.
> First of all, FT would have to be an array, not a single variable.
That'd be the most obvious implementation. I was just picturing some
variable that gets set by context within loops on fields just like RT
gets set by context on loops through records, but I hadn't thought at
all about the implementation.
> Secondly, the overhead for having it would be huge. Gawk would have
> to clear and then fill in the array for EVERY record. When you're
> processing multi-gigabyte log files, that's a lot of overhead for
> a feature that probably won't be used all that much.
I don't process multi-gigabyte files at all and if I did I'd probably
write a C program to do it. I use awk for speed of writing scripts, not
speed of executing them.
>
> I know: For 3.1.3, I fixed gawk so that it only changes RT if something
> really changed. In the normal case, where RS = "\n", RT's value
> doesn't change between records. This brought a huge speedup for
> something like
>
> gawk '{ print }' /some/huge/file
>
> If you really need the info an FT array would give you, it's pretty
> easy to cobble up a function using match and substr to get it.
It's not an "if", it's a "when". I'm just suggesting that some builtin
method to avoid multiple users having to repeatedly write their own
function for this would be a useful addition. If performance is a
significant issue, as I'm sure it is, you could always have a flag that
controls whether the functionality is on or off.
As with any suggestion, you can always reject it for many good reasons.
Ed.
> Arnold
| |
| Stepan Kasal 2004-11-22, 3:56 am |
| Hello Ed,
you wrote:
> Aharon Robbins wrote:
>
> That'd be the most obvious implementation. I was just picturing some
> variable that gets set by context within loops on fields just like RT
> gets set by context on loops through records, but I hadn't thought at
> all about the implementation.
Well, then you cannot speak about elegance and completeness.
I see no consistent way to implement such a hack. I'd say that FT has
to be an array.
> If performance is a significant issue, as I'm sure it is, you could
> always have a flag that controls whether the functionality is on or
> off.
> As with any suggestion, you can always reject it for many good
> reasons.
Well, I see two issues: 1) performance overhead,
2) maintenance overhead
I see a possible solution to 1)--see below--but there is still 2).
If we came up with a good solution, which presented no slowdown for
programs which don't use the FT array, Arnold could still reject it
because it clutters up the source tree.
I would perhaps accept the patch in such a situation, but I cannot say
that I have successfully managed a free software package for many
years. Thus we have to respect Arnold's decision about 2).
Back to 1)--performance:
Luckily, awk has no eval, so we can tell at compile time whether the
awk program is using FT or not. (Thus there is no need for any
external option, the optimization can be done automatically.)
If this were done right, there would be no performance penalty for
programs that don't use FT.
Last, but not least, who would do it? There is no point trying to
convince Arnold about the maintainability of the code unless there is
someone who has the capacity to implement it. Arnold is not going to
implement it, and I don't volunteer either.
Ed, could you write the patch, if there were a chance that Arnold
would accept it? Or can you raise some funds for this?
Regards,
Stepan Kasal
| |
| Aharon Robbins 2004-11-22, 3:56 am |
| In article <2vsjudF2njgctU1@uni-berlin.de>,
Jürgen Kahrs <Juergen.KahrsDELETETHIS@vr-web.de> wrote:
>Aharon Robbins wrote:
>
>
>After extraction of open tags, do you also work
>on XML attributes? Or are you mostly interested
>in tag names?
Mainly the tag names.
>You were talking about the cost of an additional
>feature as compared to its use. What do you think
>about the relation between cost and effect in the
>XML extension?
I don't have a feel for it. It seems that if you don't use the XML
features, they're not in the way and don't add runtime overhead
for regular use. That's good.
Otherwise, the gain in expressiveness for working with XML
data does seem to be worth the tradeoff of increased executable
size, and the source code changes didn't appear to be too
pervasive at first glance, which is also good.
Arnold
| |
| Aharon Robbins 2004-11-22, 3:56 am |
| In article <slrncpjk05.gku.kasal@matsrv.math.cas.cz>,
Stepan Kasal <kasal@ucw.cz> wrote:
>Hello Aharon,
>
>you wrote:
>
>It might be enlightening for us to know which features you mean.
>Could you please tell us? (At least the ones you can recall off the top of your head.)
>
>It's always less painful to learn from others' mistakes. ;-)
>
>Thanks,
> Stepan
IGNORECASE is #1. It's taken a long time to get all the semantics right,
and as standard awk already has tolower() and toupper(), in retrospect
it seems to not have been worth the trouble.
The /dev/pid and so on special files. Those will actually disappear
eventually, now that PROCINFO is in.
The bit manipulation functions seem to be overkill; I wonder if anyone
actually uses them?
I also tend to wonder if it was worth the trouble to bring the i18n
features of gettext out to the awk language level.
Arnold
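For readers unfamiliar with the tolower() alternative mentioned above, a minimal sketch in standard awk. The wrapper name grep_nocase is invented, and lowercasing the pattern assumes a simple pattern with no case-sensitive pieces such as [A-Z]:

```sh
# grep_nocase PATTERN < input
# Case-insensitive matching without IGNORECASE: lowercase both sides first.
grep_nocase() {
  awk -v pat="$1" 'BEGIN { pat = tolower(pat) } tolower($0) ~ pat'
}
```

For example, `printf 'Error: disk\nok\nFATAL ERROR\n' | grep_nocase error` prints the first and third lines unchanged, since only the comparison is done on lowercased copies.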
| |
| Ed Morton 2004-11-22, 3:56 am |
|
Stepan Kasal wrote:
> Hello Ed,
>
> you wrote:
>
>
>
> Well, then you cannot speak about elegance and completeness.
Sure I can: Having access to the string that matches the field separator
regexp would be the elegant and complete functionality given that we
have access to the string that matches the record separator regexp. How
that's implemented is a minor detail to a user, and whether or not it's
worth implementing is a significant concern to the developer, but that
doesn't affect the abstract appeal to a user.
If you want to get all esoteric about it, part of the QWAN
(http://c2.com/cgi/wiki?QualityWithoutaName) is symmetry and
"wholeness", and having RT without an equivalent FT is neither
symmetrical nor whole.
<snip>
> Ed, could you write the patch, if there were a chance that Arnold
> would accept it? Or can you raise some funds for this?
Nope. My impetus for suggesting it was that I see people asking
questions in this NG where that would be the most elegant solution, but
whether or not it gets implemented isn't a big deal to me personally.
Ed.
| |
| Aharon Robbins 2004-11-22, 3:56 am |
| In article <EfKdnSBrwLNdZgTcRVn-jw@comcast.com>,
Ed Morton <morton@lsupcaemnt.com> wrote:
>
>It's not that much harder to write C than awk.
I think I'll let that one go without comment ...
>
>I wouldn't have thought of that, I'd have thought of getting it into the
>FAQ (though I don't know how to make that happen either).
The FAQ seems abandoned. I don't have the cycles to take it over myself.
>
>That's fine, but all you had to say in response to my suggestion was
>"Thanks for the suggestion but I've considered it and it's not worth the
>development and performance overhead" to which I'd have replied "Fair
>enough, thanks for considering it" and we wouldn't have had this chain
>of postings.
Oh, the postings are OK. It's worth clarifying these things in public
every once in a while. I don't mind.
And it's a nice change from the usual contents of the group.
>What I'm taking from this is that you prefer not to see suggestions for
>enhancements posted to the NG unless it's something that the poster
>personally cares deeply about and is willing to debate and/or take on
>the development work personally and that's fine too - lesson learned.
Well, what probably got me was the "we really need" part. My point
was to try to explore how necessary such a feature is, and that's a
worthwhile exercise occasionally. It was more the "this is absolute
fact" tone of your post, if you get what I mean, that pushed my button.
I don't mind discussing (or shooting down :-) features in this group,
at least once in a while. I do try to be somewhat open minded.
(And I'm sure you didn't mean anything personally, nor do I mean anything
personally.)
I will add that I think gawk has grown about as large as it should grow.
See some of my other posts for the details. Thus I think that when
people see a need for something, the first attempt to solve it should
be an awk function. Failing that, it should be a dynamic module. Failing
that, it should be a suggestion for a built-in feature.
>By the way, I really appreciate all the hard work that's gone into gawk.
>It makes my working life much easier on a daily basis. Thank you.
You're quite welcome. I'm glad it helps. (This is the fun part! :-)
Arnold
| |
| Stepan Kasal 2004-11-22, 3:56 am |
| Hello,
Ed Morton wrote:
> Stepan Kasal wrote:
when I wrote the above sentence, I meant that it cannot be ``set by
context''. Other than that, yes, I agree that having FT would be
elegant and symmetrical.
But well we have another symmetry: Arnold doesn't want one more straw
on his back and there is no capacity to implement it.
Well it's not that important.
I'd like to thank you for the proposal. Yes, it's good to make a
suggestion when you have an idea. Sorry for the storm that was
triggered by it; but well, we are mere humans...
I for one couldn't resist adding one (well, four or five) comments.
Thanks again,
Stepan
| |
| Patrick TJ McPhee 2004-11-22, 3:56 am |
| In article <41990592$1@news.012.net.il>,
Aharon Robbins <arnold@skeeve.com> wrote:
% In article <lYWdndip-pRxXAXcRVn-3w@comcast.com>,
% Ed Morton <morton@lsupcaemnt.com> wrote:
% >For any GNU implementers listening, here's a suggestion - we could
% >really use an "FT" equivalent to "RT".
%
% Really? How many times a week do you say to yourself, "Boy, I wish we
% had an FT?" Seriously, is the lack of FT really a big impediment
% in doing your day to day awk programming?
I've never thought that, but I've frequently thought it would be
nice to have an array with the offset of each field.
Just my two bits.
--
Patrick TJ McPhee
North York Canada
ptjm@interlog.com
| |
| Don Stokes 2004-11-22, 8:55 am |
| In article <419b1916$1@news.012.net.il>,
Aharon Robbins <arnold@skeeve.com> wrote:
>In article <2sxmd.6391$3U4.136633@news02.tsnz.net>,
>Don Stokes <don@daedalus.co.not-this-bit.nz> wrote:
>
>Nope. Sorry. This is best done with a shell wrapper. (Or an extension
>function, hint, hint.)
Sure, I could do that. But given that gawk's external include stuff is
clunky at best, a setlocale() builtin isn't an unreasonable request, no?
I'll even code it if required.
-- don
| |
| Stepan Kasal 2004-11-22, 8:55 pm |
| Hello,
Kenny McCormack wrote:
> As it happens, I had to write a script today to solve something along these
> lines. Working in TAWK, I realized that TAWK's splitp() function [...]
> n = split($0,T,/someRE/)
> n1 = splitp($0,T1,/theSameRE/)
when we sacrifice the comfort of automatic field splitting to $n, other
possibilities open up.
Perhaps split() could save the offsets somehow.
We could use the same notation as match() uses for substrings, ie.
arr[1, "start"] and arr[1, "length"]
The problem with this is that it cannot be optimized out for programs which
don't use it, as there is no easy way to trace when the index is
num SUBSEP "start"
Or we could add a fourth parameter, which would be filled with the offsets
of the fields:
split($0, T, /someRE/, T1)
> May I humbly suggest | | |