Code Comments

Selecting blocks of data
Hello,

I am new to awk and am having problems with selecting groups of data.
I have a file in which each 'group' of data I'm interested in consists
of 5 lines.
If I don't get a complete group I want to ignore all the remaining
information associated with that partial group.
In this example file I have three complete groups: lines 1-5, lines 6-10
and lines 14-18.
Lines 11, 12 and 13 form an incomplete group and need to be removed, such
that I end up with a file of 15 lines.

Example File
Note: I have added line numbers to help describe my problem.
They do not appear in the actual file.

1  AAA
2  BBB
3  CCC
4  DDD
5  DDD
6  AAA
7  BBB
8  CCC
9  DDD
10  DDD
11  AAA
12  BBB
13  CCC
14  AAA
15  BBB
16  CCC
17  DDD
18  DDD


The incomplete groups can appear anywhere in my data file: at the
start, middle or end.
There can be 0 or more incomplete groups.
They are always missing the two lines with the repetitive string. (The
DDD lines.)
I don't know if that makes the task easier or harder!
They


All help appreciated!

mick
11-16-04 11:50 PM


Re: Selecting blocks of data
In article <4a67a4e9.0411090831.22108b01@posting.google.com>,
mick <mick_merlin@hotmail.com> wrote:
>Hello,
>
>I am new to awk and am having problems with selecting groups of data.
>I have a file of the format with 5 lines of data consisting of a
>'group' of data i'm interested in.
>If I don't get a complete group I want to ignore all the remaining
>information associated with that partial group.
>i.e. in this example file I have three groups, lines 1-5, lines 6-10
>and lines 14-18,
>Lines 11, 12 and 13 are incomplete, and need to be removed, such that
>I end up with a file of 15 lines.

(With the standard caveat that the problem specification is unclear - one
has to intuit what is really going on)

One way to do it is to key on your "AAA" string and only output groups that
are (exactly) 5 lines long.  Something like:

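# A new group starts at AAA: flush the previous one first.
/AAA/ {p()}
# Buffer every line of the current group.
{x[++n]=$0}
# Flush the last group at end of input.
END {p()}
# Print the buffered group only if it is exactly 5 lines long, then reset.
function p(    i) {
        if (n==5)
                for (i=1; i<=5; i++)
                        print x[i]
        delete x
        n=0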
}
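
For anyone who wants to try the buffering idea above, here is a minimal sketch run against the example data from the original post. It assumes the loop is meant to print the element at the loop index i, and the names sample, buf and flush are placeholders chosen for this illustration, not anything from the thread.

```shell
#!/bin/sh
# Rebuild the 18-line example from the original post (without line numbers).
sample=$(mktemp)
printf '%s\n' AAA BBB CCC DDD DDD \
              AAA BBB CCC DDD DDD \
              AAA BBB CCC \
              AAA BBB CCC DDD DDD > "$sample"

# Buffer each group that starts at AAA and print it only if it has 5 lines.
out=$(awk '
/AAA/ { flush() }            # a new group begins: emit or drop the old one
      { buf[++n] = $0 }      # collect the current line
END   { flush() }            # handle the final group
function flush(   i) {
    if (n == 5)
        for (i = 1; i <= 5; i++)
            print buf[i]
    delete buf
    n = 0
}' "$sample")

printf '%s\n' "$out"
rm -f "$sample"
```

Because a group is only printed when the next AAA (or the end of input) arrives, a partial group is simply discarded whenever it is followed by a new AAA.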


Kenny McCormack
11-16-04 11:50 PM


Re: Selecting blocks of data
mick <mick_merlin@hotmail.com> wrote:
> [ . . . ]

Since the presence of DDD is the key, reverse the file and print from each
DDD down to the next AAA (5 lines), then reverse the result again to restore
the original order.  Off the top of my head,
tac file | awk '/DDD/,/AAA/' | tac
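
To see the reversal trick end to end, here is a small sketch on the sample data. Since tac reverses the file, the selected lines come out bottom-up, so the sketch pipes the result through tac a second time to restore the original order. It assumes GNU tac is installed; the variable names are placeholders for this illustration.

```shell
#!/bin/sh
# Rebuild the 18-line example from the original post (without line numbers).
sample=$(mktemp)
printf '%s\n' AAA BBB CCC DDD DDD \
              AAA BBB CCC DDD DDD \
              AAA BBB CCC \
              AAA BBB CCC DDD DDD > "$sample"

# Reversed, every complete group reads DDD DDD CCC BBB AAA, so the awk
# range /DDD/,/AAA/ selects it; an incomplete group has no DDD to open a range.
out=$(tac "$sample" | awk '/DDD/,/AAA/' | tac)

printf '%s\n' "$out"
rm -f "$sample"
```

This relies on the stated property that incomplete groups are only ever missing their DDD lines. A group that lost its AAA instead would leave the range open and pull in lines from the neighbouring group.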

William Park
11-16-04 11:50 PM


Re: Selecting blocks of data
mick wrote:
> [ . . . ]

This should work regardless of where the missing strings are located.
There's extra stuff in there that can be trimmed out, but I wrote it
that way so that I can clean things up with a function.  But I wasn't
able to write the function.  So now I need help.  How do I write a
function for the first three lines in each pattern?  The function must
be able to tell me the value of both the array (a) and the index (x).

##########
#!/usr/bin/awk -f
x == 0 {
        if (!/AAA/) x = /AAA/ ? 0 : -1
        if (x >= 0) a[x++] = $0
        else x = 0
        next
}
x == 1 {
        if (!/BBB/) x = /AAA/ ? 0 : -1
        if (x >= 0) a[x++] = $0
        else x = 0
        next
}
x == 2 {
        if (!/CCC/) x = /AAA/ ? 0 : -1
        if (x >= 0) a[x++] = $0
        else x = 0
        next
}
x == 3 {
        if (!/DDD/) x = /AAA/ ? 0 : -1
        if (x >= 0) a[x++] = $0
        else x = 0
        next
}
x == 4 {
        if (!/DDD/) x = /AAA/ ? 0 : -1
        if (x >= 0) a[x++] = $0
        else x = 0
        if (x == 5) {for (i = 0; i < x; i++) print a[i]; x = 0}
}
##########
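
On the function question: in awk, variables that are not listed as function parameters are global, so a function can read and update both the array a and the index x directly without having to return them. (When passed as parameters, arrays go by reference and scalars by value.) A minimal sketch along those lines follows; the function name advance and its parameter ok are hypothetical, and the script is wrapped in a small shell harness only so that it can be run on the sample data.

```shell
#!/bin/sh
# Rebuild the 18-line example from the original post (without line numbers).
sample=$(mktemp)
printf '%s\n' AAA BBB CCC DDD DDD \
              AAA BBB CCC DDD DDD \
              AAA BBB CCC \
              AAA BBB CCC DDD DDD > "$sample"

out=$(awk '
# Hypothetical helper: ok is 1 if the current line is the one expected.
# a and x are globals, so the updates made here are visible to the callers.
function advance(ok) {
    if (!ok) x = ($0 ~ /AAA/) ? 0 : -1   # restart on AAA, otherwise discard
    if (x >= 0) a[x++] = $0
    else x = 0
}
x == 0 { advance($0 ~ /AAA/); next }
x == 1 { advance($0 ~ /BBB/); next }
x == 2 { advance($0 ~ /CCC/); next }
x == 3 { advance($0 ~ /DDD/); next }
x == 4 {
    advance($0 ~ /DDD/)
    if (x == 5) { for (i = 0; i < x; i++) print a[i]; x = 0 }
}' "$sample")

printf '%s\n' "$out"
rm -f "$sample"
```

The three repeated lines from each pattern become the body of advance, and each pattern only supplies the test for the line it expects.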

--
Regards,

---Robert

Robert Katz
11-16-04 11:50 PM


Re: Selecting blocks of data
Robert Katz wrote:

[ . . . ]

> There's extra stuff in there that can be trimmed out, but I wrote it
> that way so that I can clean things up with a function.  But I wasn't
> able to write the function.  So now I need help.  How do I write a
> function for the first three lines in each pattern?  The function must
> be able to tell me the value of both the array (a) and the index (x).
>
> ##########
> #!/usr/bin/awk -f
>         x == 0 {
>                 if (!/AAA/) x = /AAA/ ? 0 : -1
>                 if (x >= 0) a[x++] = $0
>                 else x = 0
>                 next
>             }
>         x == 1 {
>                 if (!/BBB/) x = /AAA/ ? 0 : -1
>                 if (x >= 0) a[x++] = $0
>                 else x = 0
>                 next
>             }
>         x == 2 {
>                 if (!/CCC/) x = /AAA/ ? 0 : -1
>                 if (x >= 0) a[x++] = $0
>                 else x = 0
>                 next
>             }
>         x == 3 {
>                 if (!/DDD/) x = /AAA/ ? 0 : -1
>                 if (x >= 0) a[x++] = $0
>                 else x = 0
>                 next
>             }
>         x == 4 {
>                 if (!/DDD/) x = /AAA/ ? 0 : -1
>                 if (x >= 0) a[x++] = $0
>                 else x = 0
>                 if (x == 5) {for (i = 0; i < x; i++) print a[i]; x = 0}
>         }
> ##########
>
Okay, I rewrote it so that there are three identical lines of action for
each of the four patterns.  But I still couldn't write the function, let
alone figure out how to call it.

#!/usr/bin/awk -f
x == 0 {
        pattern = /AAA/
        if (!pattern) x = /AAA/ ? 0 : -1
        if (x >= 0) a[x++] = $0
        else x = 0
        next
}
x == 1 {
        pattern = /BBB/
        if (!pattern) x = /AAA/ ? 0 : -1
        if (x >= 0) a[x++] = $0
        else x = 0
        next
}
x == 2 {
        pattern = /CCC/
        if (!pattern) x = /AAA/ ? 0 : -1
        if (x >= 0) a[x++] = $0
        else x = 0
        next
}
x == 3 || x == 4 {
        pattern = /DDD/
        if (!pattern) x = /AAA/ ? 0 : -1
        if (x >= 0) a[x++] = $0
        else x = 0
        if (x == 5) {for (i = 0; i < x; i++) print a[i]; x = 0}
}

--
Regards,

---Robert

Robert Katz
11-16-04 11:50 PM


Re: Selecting blocks of data
mick wrote:
> [ . . . ]

Forget all the function stuff that other guy was jabbering about, this
ought to do what you want regardless of which lines are missing.

#!/usr/bin/awk -f
{
        if (x == 0) pattern = /AAA/
        else if (x == 1) pattern = /BBB/
        else if (x == 2) pattern = /CCC/
        else if (x == 3 || x == 4) pattern = /DDD/
        if (!pattern) x = /AAA/ ? 0 : -1
        if (x >= 0) a[x++] = $0
        else x = 0
        if (x == 5) {for (i = 0; i < x; i++) print a[i]; x = 0}
}

--
Regards,

---Robert

Robert Katz
11-16-04 11:50 PM


Re: Selecting blocks of data

Robert Katz wrote:
> Forget all the function stuff that other guy was jabbering about, this
> ought to do what you want regardless of which lines are missing.
>
> #!/usr/bin/awk -f
> {
>         if (x == 0) pattern = /AAA/
>         else if (x == 1) pattern = /BBB/
>         else if (x == 2) pattern = /CCC/
>         else if (x == 3 || x == 4) pattern = /DDD/
>         if (!pattern) x = /AAA/ ? 0 : -1
>         if (x >= 0) a[x++] = $0
>         else x = 0
>         if (x == 5) {for (i = 0; i < x; i++) print a[i]; x = 0}
> }
>

Or alternatively just set the appropriate RS and FS, then print out the 3
lines before each RS followed by the RS, e.g.:

gawk -vRS="DDD\nDDD\n" -vFS="\n" '
{printf "%s\n%s\n%s\n%s",$(NF-3),$(NF-2),$(NF-1),RS}'

Regards,

Ed.

Ed Morton
11-16-04 11:50 PM


Re: Selecting blocks of data
Robert Katz wrote:
> Forget all the function stuff that other guy was jabbering about, this
> ought to do what you want regardless of which lines are missing.
>
> #!/usr/bin/awk -f
> {
>         if (x == 0) pattern = /AAA/
>         else if (x == 1) pattern = /BBB/
>         else if (x == 2) pattern = /CCC/
>         else if (x == 3 || x == 4) pattern = /DDD/
>         if (!pattern) x = /AAA/ ? 0 : -1
>         if (x >= 0) a[x++] = $0
>         else x = 0
>         if (x == 5) {for (i = 0; i < x; i++) print a[i]; x = 0}
> }
>

Okay, just a bit simpler.

#!/usr/bin/awk -f
{
        if (x == 0) pattern = /AAA/
        else if (x == 1) pattern = /BBB/
        else if (x == 2) pattern = /CCC/
        else pattern = /DDD/
        if (!pattern) x = /AAA/ ? 0 : -1
        if (x >= 0) a[x++] = $0
        if (x == 5) {for (i = 0; i < x; i++) print a[i]}
}

--
Regards,

---Robert

Robert Katz
11-16-04 11:50 PM


Re: Selecting blocks of data

Ed Morton wrote:
<snip>
> Or alternatively just set the appropriate RS and FS then print out the 3
> lines before each RS followed by the RS, e.g.:
>
> gawk -vRS="DDD\nDDD\n" -vFS="\n" '
> {printf "%s\n%s\n%s\n%s",$(NF-3),$(NF-2),$(NF-1),RS}'

Just occurred to me that'll fail if the input file doesn't end in
DDD\nDDD\n, so you need a small tweak. You just need to check that RT
got set so this'll work:

gawk -vRS="DDD\nDDD\n" -vFS="\n" '
RT{printf "%s\n%s\n%s\n%s",$(NF-3),$(NF-2),$(NF-1),RS}'
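
For readers unfamiliar with RT: in gawk, RT holds the text that actually matched RS for the current record, and it is empty when the last record runs to the end of the file without a terminator. The sketch below runs the record-separator idea on the sample data with one extra incomplete group appended at the end, so that the RT check has something to reject. It requires gawk; printing RT instead of RS is a small variation chosen for this illustration, and the variable names are placeholders.

```shell
#!/bin/sh
# RT and multi-character RS are gawk features, so stop quietly without gawk.
command -v gawk >/dev/null 2>&1 || { echo "gawk not found" >&2; exit 0; }

# The example data plus a trailing incomplete group (21 lines in total).
sample=$(mktemp)
printf '%s\n' AAA BBB CCC DDD DDD \
              AAA BBB CCC DDD DDD \
              AAA BBB CCC \
              AAA BBB CCC DDD DDD \
              AAA BBB CCC > "$sample"

# Each record ends at a DDD pair; the newline FS leaves one empty last field,
# so the three lines just before the separator are fields NF-3 .. NF-1.
out=$(gawk -v RS='DDD\nDDD\n' -v FS='\n' '
RT { printf "%s\n%s\n%s\n%s", $(NF-3), $(NF-2), $(NF-1), RT }' "$sample")

printf '%s\n' "$out"
rm -f "$sample"
```

The final record here (the trailing AAA, BBB, CCC) has an empty RT and is skipped, which is the failure case the RT check is meant to cover.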

Regards,

Ed.

Ed Morton
11-16-04 11:50 PM


Re: Selecting blocks of data
Robert Katz wrote:
> Okay, just a bit simpler.
>
> #!/usr/bin/awk -f
> {
>         if (x == 0) pattern = /AAA/
>         else if (x == 1) pattern = /BBB/
>         else if (x == 2) pattern = /CCC/
>         else pattern = /DDD/
>         if (!pattern) x = /AAA/ ? 0 : -1
>         if (x >= 0) a[x++] = $0
>         if (x == 5) {for (i = 0; i < x; i++) print a[i]}
> }
>

And simpler still.  I changed the variable to make it clearer that valid
is just a boolean with values of 0 or 1.

#!/usr/bin/awk -f
{
        if (x == 0) valid = /AAA/
        else if (x == 1) valid = /BBB/
        else if (x == 2) valid = /CCC/
        else valid = /DDD/
        if (!valid) x = 0
        a[x++] = $0
        if (x == 5) for (i = 0; i < x; i++) print a[i]
}
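
One caveat that seems worth checking: the shortest version stores whatever line caused the reset as element 0, so a stray line followed by BBB, CCC, DDD, DDD would be printed as if it were a complete group. For the data described in the original post (only the DDD lines ever go missing) this cannot happen. Assuming stray lines are possible, a slightly stricter sketch lets only an AAA line restart a group. The ZZZ line in the test input is made up for this illustration.

```shell
#!/bin/sh
# Hypothetical input: a stray ZZZ line stands where an AAA was expected.
sample=$(mktemp)
printf '%s\n' ZZZ BBB CCC DDD DDD \
              AAA BBB CCC DDD DDD > "$sample"

out=$(awk '
{
    if (x == 0) valid = ($0 ~ /AAA/)
    else if (x == 1) valid = ($0 ~ /BBB/)
    else if (x == 2) valid = ($0 ~ /CCC/)
    else valid = ($0 ~ /DDD/)
    if (!valid) {
        x = 0
        if ($0 !~ /AAA/) next      # only AAA may restart a group
    }
    a[x++] = $0
    if (x == 5) {
        for (i = 0; i < x; i++) print a[i]
        x = 0                      # start looking for the next AAA
    }
}' "$sample")

printf '%s\n' "$out"
rm -f "$sample"
```

With this input only the second group is printed. The cost is two extra lines compared with the shortest version.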

--
Regards,

---Robert

Robert Katz
11-16-04 11:50 PM

