| Author |
awk and sorting file
|
|
| happytoday 2007-11-19, 7:01 pm |
| Is there a function that can sort the lines in a file by a certain
field, where field=substr($0,4,6)?
| |
| Janis Papanagnou 2007-11-19, 7:01 pm |
| happytoday wrote:
> is there a function can sort lines in file according to certain
> field=substr($0,4,6)
There's no sort function builtin in standard awk. That's a task
for a specialized external program (e.g. 'sort' on Unix). We'd
need some more information about your requirements to help you
further. (Do you need the sorted data just as output or will you
process it further in the awk program? Are you working in a Unix
environment? Are you using GNU awk or something different? What
will you do with your data after sorting?)
Janis
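Because standard awk has no builtin sort, one workaround that stays entirely inside awk is to write a small sort function yourself. The sketch below is only an illustration, not something proposed in this thread: it assumes blank-separated data lines with no header, that the account number is always the second-to-last field ($(NF - 1)), and that the whole file fits in memory. The names sort_by_account and isort are hypothetical. Insertion sort is O(n^2), so for large files the external sort program Janis mentions is the better choice.

```shell
#!/bin/sh
# Sketch: sort lines by account number using only portable awk (no
# gawk extensions). Assumption: blank-separated data lines, no header,
# account number is always the second-to-last field.
sort_by_account() {
    awk '
    # isort (hypothetical helper): insertion sort on keys k[1..n] that
    # moves the matching lines l[1..n] along with their keys
    function isort(k, l, n,    i, j, tk, tl) {
        for (i = 2; i <= n; i++) {
            tk = k[i]; tl = l[i]
            # shift larger keys one slot to the right
            for (j = i - 1; j >= 1 && k[j] > tk; j--) {
                k[j + 1] = k[j]; l[j + 1] = l[j]
            }
            k[j + 1] = tk; l[j + 1] = tl
        }
    }
    # collect key and line; appending "" forces string comparison
    NF >= 2 { n++; key[n] = $(NF - 1) ""; line[n] = $0 }
    END {
        isort(key, line, n)
        for (i = 1; i <= n; i++) print line[i]
    }' "$@"
}
```

Usage: `sort_by_account input.txt > sorted.txt`. Taking the key by field rather than by column position means names with two or three words both work.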
| |
| happytoday 2007-11-19, 7:01 pm |
| On Nov 19, 6:16 pm, Janis Papanagnou <Janis_Papanag...@hotmail.com>
wrote:
> happytoday wrote:
>
> There's no sort function builtin in standard awk. That's a task
> for a specialized external program (e.g. 'sort' on Unix). We'd
> need some more information about your requirements to help you
> further. (Do you need the sorted data just as output or will you
> process it further in the awk program? Are you working in a Unix
> environment? Are you using GNU awk or something different? What
> will you do with your data after sorting?)
>
> Janis
I am using a Windows version of awk, and this is the file; it
contains these fields:
Name Account number Sum
James Botte 0001-0092-30-33 99.6625
Henry Byle 0001-0092-30-21 81.2211
Moray Noll 0001-0092-30-51 9552.11
etc ....
I need to sort the file so that it looks like this:
Henry Byle 0001-0092-30-21 81.2211
James Botte 0001-0092-30-33 99.6625
Moray Noll 0001-0092-30-51 9552.11
That is, sorted by the key (the account number).
Thanks
| |
| Kenny McCormack 2007-11-19, 7:01 pm |
| In article <65fd1454-84ee-4376-bc1a-bfbb2a3e71a6@i29g2000prf.googlegroups.com>,
happytoday <ehabaziz2001@gmail.com> wrote:
>On Nov 19, 6:16 pm, Janis Papanagnou <Janis_Papanag...@hotmail.com>
>wrote:
>
>I am using awk version for the windows and that is the file which
>contain those fields :
>Name Account number Sum
>
>
>
>James Botte 0001-0092-30-33 99.6625
>Henry Byle 0001-0092-30-21 81.2211
>Moray Noll 0001-0092-30-51 9552.11
>etc ....
>
>
>I need to sort that file to be like this :
1) What exact version of AWK are you using?
Try: awk -W version
Does that give any useful information? If so, please post the output of
that command here.
2) Can you just use the "sort" command (Windows has one - SORT.EXE) ?
| |
| loki harfagr 2007-11-19, 7:01 pm |
| On Mon, 19 Nov 2007 11:03:58 -0800, happytoday wrote:
> On Nov 19, 6:16 pm, Janis Papanagnou <Janis_Papanag...@hotmail.com>
> wrote:
>
> I am using awk version for the windows and that is the file which
> contain those fields :
> Name Account number Sum
>
>
>
> James Botte 0001-0092-30-33 99.6625
> Henry Byle 0001-0092-30-21 81.2211
> Moray Noll 0001-0092-30-51 9552.11
> etc ....
>
>
> I need to sort that file to be like this :
> Henry Byle 0001-0092-30-21 81.2211
> James Botte 0001-0092-30-33 99.6625
> Moray Noll 0001-0092-30-51 9552.11
>
> According to the key (account number)
It seems the fields are TAB-separated; if so, just use
$ sort -k2,2
If not, use awk or sed to normalize your field separators,
then tell 'sort' which one to use; its man page says:
-t, --field-separator=SEP
use SEP instead of non-blank to blank transition
If you run into new problems from here, please give a short
test sample (infile and outfile), like you just did in your post :-)
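A caution on the command above, shown as a minimal sketch for a Unix shell: without -t, sort starts a new field at every blank-to-nonblank transition and treats spaces and TABs alike, so with a name such as "James Botte" field 2 is the surname, not the account number. In the sample data the surnames (Botte, Byle, Noll) happen to sort in a different order than the account numbers, so the difference shows. Naming the TAB explicitly with -t avoids the problem. The function name sort_tab_k2 is hypothetical, and the LC_ALL=C prefix (plain byte-order comparison) is an addition of this sketch, not part of the post.

```shell
#!/bin/sh
# Sketch: sort TAB-separated records on field 2 (the account number).
# Without -t, a space inside the name would also start a new field,
# so -k2,2 would compare surnames instead of account numbers.
# LC_ALL=C is an assumption of this sketch: compare plain bytes.
TAB=$(printf '\t')
sort_tab_k2() {
    LC_ALL=C sort -t "$TAB" -k2,2 "$@"
}
```

Usage: `sort_tab_k2 input.txt > sorted.txt`.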
| |
| Kenny McCormack 2007-11-19, 7:01 pm |
| In article <pan.2007.11.19.19.32.06@DarkDesign.free.fr>,
loki harfagr <loki@DarkDesign.free.fr> wrote:
>On Mon, 19 Nov 2007 11:03:58 -0800, happytoday wrote:
>
>
>
> Seems the fields are TAB separated, then just use
>$ sort -k2,2
>
> If not use awk or sed to calibrate your field separators
>then tell to 'sort' which is the one to use, its man says
>
>-t, --field-separator=SEP
> use SEP instead of non-blank to blank transition
>
> If there are new problems from here please give a short
>test sample, infile and outfile, like you just did in your post :-)
This person is working on Windows, so don't assume too many Unix-y
conventions.
Note that it is because they are working on Windows that they ask for
AWK help. In Unix, you'd just do it with the system 'sort' (1), but on
Windows, what often happens is that AWK is the only tool they have (2).
1) Actually, I've never liked the Unix sort utility. The command line
syntax is (was - yes, I know it's been updated and made somewhat more
logical, but that just makes things worse - now you have to deal with
both old and new syntax...) insane. I usually use AWK (TAWK or GAWK) to
do my processing, including sorting, even on Unix.
2) Yes, I know that Windows does have a SORT.EXE. I don't much like it,
as the syntax is even more broken than it is in Unix, but it is there,
and probably will work for the OP (as I suggested in a previous post in
this thread).
| |
| happytoday 2007-11-19, 7:01 pm |
| On Nov 19, 9:40 pm, gaze...@xmission.xmission.com (Kenny McCormack)
wrote:
> In article <pan.2007.11.19.19.32...@DarkDesign.free.fr>,
> loki harfagr <l...@DarkDesign.free.fr> wrote:
> <snip>
awk95.exe
| |
| Ted Davis 2007-11-19, 7:01 pm |
| On Mon, 19 Nov 2007 11:03:58 -0800, happytoday wrote:
> I am using awk version for the windows and that is the file which contain
> those fields :
> Name Account number Sum
>
>
>
> James Botte 0001-0092-30-33 99.6625 Henry Byle
> 0001-0092-30-21 81.2211 Moray Noll 0001-0092-30-51 9552.11
> etc ....
>
>
> I need to sort that file to be like this : Henry Byle
> 0001-0092-30-21 81.2211 James Botte 0001-0092-30-33 99.6625
> Moray Noll 0001-0092-30-51 9552.11
>
> According to the key (account number)
Assuming that the file is really like this
James Botte 0001-0092-30-33 99.6625
Henry Byle 0001-0092-30-21 81.2211
Moray Noll 0001-0092-30-51 9552.11
and is space delimited, then from a command line or in a batch file
sort /+19 sourcefile > targetfile
produces
Henry Byle 0001-0092-30-21 81.2211
James Botte 0001-0092-30-33 99.6625
Moray Noll 0001-0092-30-51 9552.11
--
T.E.D. (tdavis@umr.edu)
| |
| Steffen Schuler 2007-11-20, 6:58 pm |
| Hi happytoday, hello netlanders,
On Mon, 19 Nov 2007 11:03:58 -0800, happytoday wrote:
> On Nov 19, 6:16 pm, Janis Papanagnou <Janis_Papanag...@hotmail.com>
> wrote:
>
> I am using awk version for the windows and that is the file which
> contain those fields :
> Name Account number Sum
>
>
>
> James Botte 0001-0092-30-33 99.6625
> Henry Byle 0001-0092-30-21 81.2211
> Moray Noll 0001-0092-30-51 9552.11
> etc ....
>
>
> I need to sort that file to be like this :
> Henry Byle 0001-0092-30-21 81.2211
> James Botte 0001-0092-30-33 99.6625
> Moray Noll 0001-0092-30-51 9552.11
>
> According to the key (account number)
>
> Thanks
Since "awk95" is the original "awk" from Brian W. Kernighan, I assume that
it doesn't contain a sort function, but you still have MS-DOS "sort":
In the following I abbreviate "awk95" with "awk".
(A) Tab Separated Case
**********************
If the fields in the input file input.txt are tab separated and
if file1.awk is the file:
BEGIN { OFS = FS = "\t" }  # input and output fields are TAB-separated
{ print $2, $0 }           # prefix each line with its key (field 2)
and file2.awk is the file:
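BEGIN { OFS = FS = "\t" }  # input and output fields are TAB-separated
{ $1 = ""; sub(/\t/, "") } # blank out the key, then drop the leading TAB
1                          # print the restored line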
then use under Windows:
awk -f file1.awk input.txt | sort | awk -f file2.awk
That should do your task.
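For a Unix-like shell, the same three steps of case (A) can be sketched as one function with both awk programs written inline, assuming the same TAB-separated input. The name sort_tab_by_field2 is hypothetical, and LC_ALL=C is an addition of this sketch to make sort compare plain bytes.

```shell
#!/bin/sh
# Sketch of the decorate / sort / undecorate pipeline from case (A).
sort_tab_by_field2() {
    # 1) prepend field 2 (the key) and a TAB to every line
    awk 'BEGIN { OFS = FS = "\t" } { print $2, $0 }' "$@" |
        # 2) sort whole lines, which now start with the key
        LC_ALL=C sort |
        # 3) empty field 1, then remove the TAB left in front of the line
        awk 'BEGIN { OFS = FS = "\t" } { $1 = ""; sub(/\t/, "") } 1'
}
```

Usage: `sort_tab_by_field2 input.txt > sorted.txt`. Because the key is copied in front of the line rather than moved, step 3 gives back each original line unchanged.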
(B) Blank Separated Case
************************
If your fields are blank separated, Ted Davis gave you a solution.
Another portable and more awk-centric solution would be:
Let file1.awk be the file:
{ print substr($0, 19, 15), $0 }  # key (account number), a blank, the line
and input.txt your input file with blank separated fields (field2
starting at position 19 with length 15) and file2.awk the file:
{ sub(/^[^ ]+ /, "") }  # strip the key and the blank after it
1                       # print the restored line
then the following code (same code as in the tab case) solves your
problem:
awk -f file1.awk input.txt | sort | awk -f file2.awk
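The substr($0, 19, 15) key in case (B) assumes the columns are padded so that the account number always starts at position 19; in the samples quoted in this thread the padding has collapsed to single spaces. A variant, sketched here under the assumption that only the name can contain blanks (so the account number is always the second-to-last field), takes the key by field instead of by position. sort_blank_by_account is a hypothetical name; the second awk program is the same as file2.awk above.

```shell
#!/bin/sh
# Variant of case (B): key = second-to-last field, not a fixed column.
# Assumes the account number itself never contains a blank.
sort_blank_by_account() {
    # decorate: key, one blank, original line (skip lines with < 2 fields)
    awk 'NF >= 2 { print $(NF - 1), $0 }' "$@" |
        LC_ALL=C sort |
        # undecorate: remove the key and the single blank after it
        awk '{ sub(/^[^ ]+ /, "") } 1'
}
```

Usage: `sort_blank_by_account input.txt > sorted.txt`.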
(C) General AWK-SORT Pattern
****************************
The code
awk-call | sort | awk-call
is a general, often-used pattern described in "The AWK Programming
Language" by A. V. Aho, B. W. Kernighan, and P. J. Weinberger.
The first awk-call prepares the input data for sort, and the second
awk-call creates the output from the sorted, modified input.
Hope I could help you,
Steffen "goedel" Schuler
| |
| happytoday 2007-11-22, 6:58 pm |
| On Nov 21, 1:13 am, Steffen Schuler <schuler.stef...@googlemail.com>
wrote:
> <snip>
Is there any version of awk that contains a sort function?
| |
| Steffen Schuler 2007-11-22, 6:58 pm |
| Hi happytoday, hello netlanders,
On Thu, 22 Nov 2007 13:28:21 -0800, happytoday wrote:
> Is there is any version of awk contain sort function ?
try gawk (ftp://ftp.gnu.org). It contains the two sort functions asort()
and asorti().
Kind regards,
Steffen "goedel" Schuler
| |
| Kenny McCormack 2007-11-22, 6:58 pm |
| In article <5qmb4vF10uc6sU1@mid.uni-berlin.de>,
Steffen Schuler <schuler.steffen@googlemail.com> wrote:
>Hi happytoday, hello netlanders,
>
>On Thu, 22 Nov 2007 13:28:21 -0800, happytoday wrote:
>
>
>try gawk (ftp://ftp.gnu.org). It contains the two sort functions asort()
>and asorti().
Both of which are, IMHO, all but useless (but, of course, better than
nothing).
The real answer to this question is: Use GAWK (URL given above) and use
the WHINY_USERS feature, which gives true array sorting.
| |
| Ted Davis 2007-11-22, 9:58 pm |
| On Thu, 22 Nov 2007 23:51:01 +0000, Kenny McCormack wrote:
> In article <5qmb4vF10uc6sU1@mid.uni-berlin.de>, Steffen Schuler
> <schuler.steffen@googlemail.com> wrote:
>
> Both of which are, IMHO, all but useless (but, of course, better than
> nothing).
>
> The real answer to this question is: Use GAWK (URL given above) and use
> the WHINY_USERS feature, which gives true array sorting.
Which works in *ix and Cygwin, but not otherwise in the available Windows
ports.
--
T.E.D. (tdavis@umr.edu)
| |
| Steffen Schuler 2007-11-22, 9:58 pm |
| Hi Ted, hello netlanders,
On Thu, 22 Nov 2007 19:23:55 -0600, Ted Davis wrote:
> On Thu, 22 Nov 2007 23:51:01 +0000, Kenny McCormack wrote:
>
<snip>
>
> Which works in *ix and Cygwin, but not otherwise in the available
> Windows ports.
<snip>
That's not true; see /gawk-3.1.6/README_d/README.pc in the source
distribution of gawk 3.1.6. It can be compiled with MSVC or MinGW32 and
can run without Cygwin on Win32.
Kind regards,
Steffen "goedel" schuler
| |
| Ted Davis 2007-11-23, 6:58 pm |
| On Fri, 23 Nov 2007 02:04:50 +0000, Steffen Schuler wrote:
> Hi Ted, hello netlanders,
>
> On Thu, 22 Nov 2007 19:23:55 -0600, Ted Davis wrote:
>
> <snip>
> <snip>
>
> that's not true, see /gawk-3.1.6/README_d/README.pc in the source
> distribution of gawk 3.1.6. It can be compiled with MSVC or Mingw32 and
> can run without cygwin on Win32.
>
Unlike *ix users, Windows users, with a very few exceptions, have to use
whatever binaries are available - compilers, especially expensive ones
like Visual C, don't come with the OS package, and very few users have the
skill or inclination to use them (they are not very friendly). There is a
good free compiler package available, but hardly any users know about it
(<http://www.openwatcom.org/index.php/Main_Page> ).
--
T.E.D. (tdavis@umr.edu)
| |
| Kenny McCormack 2007-11-23, 6:58 pm |
| In article <pan.2007.11.23.16.57.44.547000@umr.edu>,
Ted Davis <tdavis@umr.edu> wrote:
....
>Unlike *ix users, Windows users, with a very few exceptions, have to use
>whatever binaries are available - compilers, especially expensive ones
>like Visual C, don't come with the OS package, and very few users have the
>skill or inclination to use them (they are not very friendly).
While what you say is true in absolute numbers - there being literally
billions of Windows users in the world, while the number of Unix users
is probably only numbered in the millions - it really doesn't mean that
much w.r.t. a Usenet newsgroup. I.e., around here, we assume that most
people can and will use a C compiler if they need to. And if they
don't/can't, we're not really interested in helping them.
Further, there are literally dozens of free compiler suites for Windows,
most of which will compile GAWK just fine. Cygwin, Visual C++ Express,
MinGW, Borland, etc, etc.
>There is a good free compiler package available, but hardly any users
>know about it (<http://www.openwatcom.org/index.php/Main_Page> ).
Well, there ya go. One of many. So, I don't see what the problem is.
| |
| Ted Davis 2007-11-23, 9:58 pm |
| On Fri, 23 Nov 2007 17:30:31 +0000, Kenny McCormack wrote:
> In article <pan.2007.11.23.16.57.44.547000@umr.edu>, Ted Davis
> <tdavis@umr.edu> wrote:
> ...
>
> While what you say is true in absolute numbers - there being literally
> billions of Windows users in the world, while the number of Unix users is
> probably only numbered in the millions - it really doesn't mean that much
> w.r.t. a Usenet newsgroup. I.e., around here, we assume that most people
> can and will use a C compiler if they need to. And if they don't/can't,
> we're not really interested in helping them.
>
> Further, there are literally dozens of free compiler suites for Windows,
> most of which will compile GAWK just fine. Cygwin, Visual C++ Express,
> MinGW, Borland, etc, etc.
>
>
> Well, there ya go. One of many. So, I don't see what the problem is.
The problem is a general fear of HLLs, especially C. Of the users I
provide support for, the older ones may feel comfortable with a text mode
interface, and a few of those can manage a batch file - curiously, the old
ones are also mostly comfortable with a command line FORTRAN compiler, but
don't mention C around them (done that, been reprimanded). The younger
ones mostly don't even know how to get a command line window, but many of
them are comfortable with things like drag and drop prototyping languages,
just don't mention C around them. A very few, mostly grad students, can
manage a text mode language like AWK, batch language, or WSH, and I
encounter one or two a year who use Visual C, mostly because some
simulation package they use requires it. But compile a language? The
concept is far beyond all but the rarest Windows user - other people get
paid to do that.
| |
| Kenny McCormack 2007-11-24, 3:58 am |
| In article <pan.2007.11.24.01.02.31.703000@umr.edu>,
Ted Davis <tdavis@umr.edu> wrote:
....
>The problem is a general fear of HLLs, especially C. Of the users I
>provide support for, the older ones may feel comfortable with a text mode
>interface, and a few of those can manage a batch file - curiously, the old
>ones are also mostly comfortable with a command line FORTRAN compiler, but
>don't mention C around them (done that, been reprimanded). The younger
>ones mostly don't even know how to get a command line window, but many of
>them are comfortable with things like drag and drop prototyping languages,
>just don't mention C around them. A very few, mostly grad students, can
>manage a text mode language like AWK, batch language, or WSH, and I
>encounter one or two a year who use Visual C, mostly because some
>simulation package they use requires it. But compile a language? The
>concept is far beyond all but the rarest Windows user - other people get
>paid to do that.
Yes. There are a lot of stupid people in the world.
This is news?
| |
| Jürgen Kahrs 2007-11-24, 6:57 pm |
| Kenny McCormack schrieb:
>
> Yes. There are a lot of stupid people in the world.
>
> This is news?
No, that's not news. But it was still interesting to hear
about the division between older and younger users. It
was also interesting to hear that FORTRAN isn't dead yet.
BTW: FORTRAN is 50, ain't it?
| |
| Ted Davis 2007-11-24, 6:57 pm |
| On Sat, 24 Nov 2007 16:44:20 +0100, Jürgen Kahrs wrote:
> It was also interesting to hear
> that FORTRAN isnt dead yet. BTW: FORTRAN is 50, aint it ?
More or less - it dates to 1954, 1957, or "the late fifties" depending on
your source. The language dates to '53/'54 but the first compiler came
out in April '57. It still has a place in numerically intensive computing
for engineering and science.
That's still younger than these older users - most of the command line
FORTRAN people are Mechanical Engineering professors who are well beyond
retirement age. I had to keep a DOS compiler working well into the
Win95 era and even beyond, but I did manage to convince a couple of them
to use g77 a few years after our Watfor license expired.
|