Need help on cobol
| florence 2006-08-05, 6:55 pm |
|
I have two physical sequential files, both sorted by CUST ID.
File1 contains 10,000 records.
File2 contains 22 million records (there are multiple records for the
same CUST ID).
I need to compare the File1 CUST ID with the File2 CUST ID, and the
matching rows would be written to an output file.
I am thinking of two possible solutions:
1. Fetch each record from File1 and compare it with File2
sequentially until the CUST ID in File1 is greater than the File2
CUST ID. When a match is found, write an output record.
2. Put all 22 million rows in a table and use SEARCH ALL for
each record in File1.
I want to know which method is preferable.
Questions:
1. If I store 22 million records in a table declaration, how much
storage is needed? Is it OK to use this method?
2. Sequential processing is taking a very, very long time.
If there are any other methods, let me know.
Please give me your opinions.
Your help is appreciated.
Thanks,
| |
| Arnold Trembley 2006-08-05, 6:55 pm |
| In my opinion, this kind of problem is better solved by a sequential
match-merge process. This is a well-known, reliable, and efficient
batch processing technique.
You don't say which COBOL compiler you are using, or what operating
environment this will run in, or if either of the files are on tape
versus disk. It might also be helpful to know the record lengths of
each file.
Few COBOL environments will be able to support a working-storage table
containing 22 million records. If we assume each record is 80 bytes
long, the working-storage table would occupy 80 * 22 million bytes or
about 1.76 gigabytes of memory. And loading an in-memory table still
requires you to read every record in the larger file.
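As a rough check on that arithmetic, here is a minimal COBOL sketch that multiplies the entry count by the entry length. The program and data names are made up for illustration; 80 bytes is the assumed length used above, and 120 bytes is File2's LRECL as reported later in the thread.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TBLSIZE.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * 22 million entries, as stated in the original post.
       01  WS-ENTRY-COUNT      PIC 9(9)   VALUE 22000000.
      * 80 is the assumed length; 120 is tried afterwards.
       01  WS-ENTRY-LENGTH     PIC 9(4)   VALUE 80.
       01  WS-TABLE-BYTES      PIC 9(12)  VALUE 0.
       PROCEDURE DIVISION.
       MAIN-PARA.
           PERFORM SHOW-TABLE-SIZE
           MOVE 120 TO WS-ENTRY-LENGTH
           PERFORM SHOW-TABLE-SIZE
           STOP RUN.
       SHOW-TABLE-SIZE.
      * Table size in bytes = number of entries * entry length.
           COMPUTE WS-TABLE-BYTES =
               WS-ENTRY-COUNT * WS-ENTRY-LENGTH
           DISPLAY "ENTRY LENGTH " WS-ENTRY-LENGTH
                   " NEEDS " WS-TABLE-BYTES " BYTES".
```

With 80-byte entries this works out to 1,760,000,000 bytes, and with 120-byte entries to 2,640,000,000 bytes.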
Some database products may allow you to allocate a database table and
load it, but not in my limited experience with DB2.
With kindest regards,
--
http://arnold.trembley.home.att.net/
| |
| Richard 2006-08-05, 6:55 pm |
|
florence wrote:
> I have two physical sequential files, both sorted by CUST ID.
>
> File1 contains 10,000 records.
> File2 contains 22 million records (there are multiple records for the
> same CUST ID).
>
> I need to compare the File1 CUST ID with the File2 CUST ID, and the
> matching rows would be written to an output file.
That sounds like a homework question the answer to which is the MERGE
statement.
| |
| florence 2006-08-05, 6:55 pm |
| Thanks Arnold,
I am working with IBM mainframes with z/OS. Merging is not possible
because for each matching record, I need to do some calculations and
write output. Both these files are on 3390 disk.
Thanks in advance,
| |
| florence 2006-08-05, 6:55 pm |
|
Thanks Arnold,
I am working with IBM mainframes with z/OS. Merging is not possible
because for each matching record, I need to do some calculations and
write output. Both these files are on 3390 disk.
File1 LRECL = 20
File2 LRECL = 120
Output record length = 200
I hope this helps.
| |
| Richard 2006-08-05, 6:55 pm |
|
florence wrote:
> I am working with IBM mainframes with z/OS. Merging is not possible
> because for each matching record, I need to do some calculations and
> write output. Both these files are on 3390 disk.
What part of "some calculations and write output" prevents standard
two-file merge logic from being used?
The whole point of merging two files is that at some point in the
program you are holding a pair of matching records. That is where you
would then calculate and output.
| |
| Richard 2006-08-05, 6:55 pm |
|
florence wrote:
> I have two physical sequential files, both sorted by CUST ID.
>
> File1 contains 10,000 records.
> File2 contains 22 million records (there are multiple records for the
> same CUST ID).
>
> I need to compare the File1 CUST ID with the File2 CUST ID, and the
> matching rows would be written to an output file.
You haven't specified whether it is the file1 matching records or the
file2 matching records that are written to output, or both.
Given custids of:
File1: B D E E F ...
File2: A(1) A(2) B(1) C(1) D(1) D(2) E(1) E(2) E(3) F(1) ...
(n) indicating there are 2 A records, 3 E records.
Which records will be output? Those from file1? Those from file2?
Both? If there are two records with the same custid in file1 (is this
possible?), do you need to output all the matching file2 records for
each, duplicating the output?
> I am thinking of two possible solutions:
>
> 1. Fetch each record from File1 and compare it with File2
> sequentially until the CUST ID in File1 is greater than the File2
> CUST ID. When a match is found, write an output record.
You imply that it would be necessary to start again at the beginning of
file2 for each record in file1. You said that the files are sorted by
Cust-Id. For each record in File1 it is only necessary to read forward
in file2, because all the records already read in file2 must have a
lower CustId than the current File1 CustId. That is the nature of them
being sorted.
> 2. Put all 22 million rows in a table and use SEARCH ALL for
> each record in File1.
SEARCH ALL does not give you 'all' the records that match. It gives
only one, and because it may use a binary chop search (or any other
method), the one that it finds need not be the first with that key. The
'ALL' means only that it might search all of the table when searching.
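A minimal sketch of that behaviour, assuming a small in-memory table built from the File2 sample keys above (the program and data names are hypothetical). SEARCH ALL positions the index on one entry with key E; which of E1, E2 or E3 it lands on is up to the implementation, so the other duplicates would still have to be found by stepping the index backwards and forwards from there.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SRCHDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Ten two-byte entries: a one-byte key plus a sequence
      * digit, already in ascending key order.
       01  TABLE-DATA          PIC X(20)
                               VALUE "A1A2B1C1D1D2E1E2E3F1".
       01  CUST-TABLE REDEFINES TABLE-DATA.
           05  CUST-ENTRY      OCCURS 10 TIMES
                               ASCENDING KEY IS CUST-KEY
                               INDEXED BY CUST-IDX.
               10  CUST-KEY    PIC X.
               10  CUST-SEQ    PIC X.
       01  WANTED-KEY          PIC X      VALUE "E".
       PROCEDURE DIVISION.
       MAIN-PARA.
      * SEARCH ALL stops at whichever entry it probes first
      * that satisfies the WHEN condition.
           SEARCH ALL CUST-ENTRY
               AT END
                   DISPLAY "KEY " WANTED-KEY " NOT FOUND"
               WHEN CUST-KEY (CUST-IDX) = WANTED-KEY
                   DISPLAY "FOUND " CUST-KEY (CUST-IDX)
                           CUST-SEQ (CUST-IDX)
           END-SEARCH
           STOP RUN.
```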
> I want to know which method is preferable.
Neither, probably.
> Questions:
>
> 1. If I store 22 million records in a table declaration, how much
> storage is needed? Is it OK to use this method?
Simple: 22,000,000 x table item size. Are you allowed to use a gigabyte
of RAM or so? Note that a SEARCH ALL (which is unlikely to be what you
want anyway) will potentially access all parts of the table for each
SEARCH and so will hammer the virtual memory mercilessly and will
thrash. The operators will kill your program.
> 2. Sequential processing is taking a very, very long time.
Are you reading the whole of file2 for each file1 record? Why?
| |
| florence 2006-08-05, 6:55 pm |
| Thank you very much for your analysis. That's excellent.
I need to match File1 (no duplicate CUST IDs in this file)
with File2 (duplicate CUST IDs in this file).
Once a CUST ID matches, then based on the status category field in
File2 I need to pass one field's data to one of seven output fields in
the output record. All other output fields in the output record would
be populated from File2 fields only, and the output record would be
"PIPE" delimited.
File1 LRECL 20 (it has 10,000 records)
File2 LRECL 120 (it has 22 million records)
I hope this helps.
What is PIPE delimited? (Is it JUST putting "|" after each field in the
output record?)
Once again, I appreciate your help.
Thanks
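For what it is worth, pipe-delimited normally means the fields are written one after another with a "|" character between them. A minimal sketch using the STRING statement follows; the field names, sizes and values are hypothetical, since the real record layouts are not given in the thread.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PIPEDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical fields standing in for the real layout.
       01  WS-CUST-ID          PIC X(8)    VALUE "C0001234".
       01  WS-STATUS           PIC X(2)    VALUE "A1".
       01  WS-AMOUNT           PIC 9(5)V99 VALUE 123.45.
       01  WS-AMOUNT-EDIT      PIC 9(5).99.
      * 200 bytes is the output record length from the thread.
       01  WS-OUT-REC          PIC X(200)  VALUE SPACES.
       01  WS-PTR              PIC 9(4)    VALUE 1.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Edit the amount so it carries an explicit decimal point.
           MOVE WS-AMOUNT TO WS-AMOUNT-EDIT
      * Concatenate the fields with "|" between them; WS-PTR
      * ends up one past the last character stored.
           STRING WS-CUST-ID     DELIMITED BY SIZE
                  "|"            DELIMITED BY SIZE
                  WS-STATUS      DELIMITED BY SIZE
                  "|"            DELIMITED BY SIZE
                  WS-AMOUNT-EDIT DELIMITED BY SIZE
                  INTO WS-OUT-REC
                  WITH POINTER WS-PTR
           END-STRING
           DISPLAY WS-OUT-REC (1:WS-PTR - 1)
           STOP RUN.
```

This should display C0001234|A1|00123.45. Whether a trailing "|" is also wanted after the last field depends on whoever consumes the file.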
| |
| Richard 2006-08-05, 6:55 pm |
|
florence wrote:
> Thank you very much for your analysis. That's excellent.
>
> I need to match File1 (no duplicate CUST IDs in this file)
> with File2 (duplicate CUST IDs in this file).
>
> Once a CUST ID matches, then based on the status category field in
> File2 I need to pass one field's data to one of seven output fields in
> the output record. All other output fields in the output record would
> be populated from File2 fields only, and the output record would be
> "PIPE" delimited.
>
> File1 LRECL 20 (it has 10,000 records)
> File2 LRECL 120 (it has 22 million records)
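Then you probably need:
      * assumes File2-CustId starts below any real key
      * (e.g. LOW-VALUES) so that the first read of file2 happens
           PERFORM UNTIL File1-CustId = HIGH-VALUES
               READ file1
                   AT END MOVE HIGH-VALUES TO File1-CustId
               END-READ
      *        skip file2 records whose key is too low
               PERFORM Read-File2
                   UNTIL File2-CustId >= File1-CustId
               PERFORM
                   UNTIL File1-CustId = HIGH-VALUES
                      OR File2-CustId > File1-CustId
      *            deal with matching file2 record here
                   PERFORM Read-File2
               END-PERFORM
           END-PERFORM
           .
       Read-File2.
           READ file2
               AT END MOVE HIGH-VALUES TO File2-CustId
           END-READ
           .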
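To see that control flow in action without any datasets, here is a self-contained sketch in which two small in-memory tables stand in for the sorted files. The keys are the sample keys from earlier in the thread (File1 reduced to B, D, E, F because it has no duplicates); the READ-FILE1 and READ-FILE2 paragraphs and all data names are assumptions made for the demonstration, and a real program would issue READ statements with AT END instead.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. MATCHDEM.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * In-memory tables stand in for the two sorted files.
       01  FILE1-DATA          PIC X(4)   VALUE "BDEF".
       01  FILE1-TABLE REDEFINES FILE1-DATA.
           05  F1-KEY          PIC X      OCCURS 4 TIMES.
       01  FILE2-DATA          PIC X(20)
                               VALUE "A1A2B1C1D1D2E1E2E3F1".
       01  FILE2-TABLE REDEFINES FILE2-DATA.
           05  F2-ENTRY        OCCURS 10 TIMES.
               10  F2-KEY      PIC X.
               10  F2-SEQ      PIC X.
       01  F1-POS              PIC 9(4)   VALUE 0.
       01  F2-POS              PIC 9(4)   VALUE 0.
      * Current keys start below any real key so that the
      * first pass through the loops triggers the first reads.
       01  FILE1-CUSTID        PIC X      VALUE LOW-VALUES.
       01  FILE2-CUSTID        PIC X      VALUE LOW-VALUES.
       01  FILE2-SEQ           PIC X      VALUE SPACE.
       PROCEDURE DIVISION.
       MAIN-PARA.
           PERFORM UNTIL FILE1-CUSTID = HIGH-VALUES
               PERFORM READ-FILE1
      *        Skip File2 records whose key is too low.
               PERFORM READ-FILE2
                   UNTIL FILE2-CUSTID >= FILE1-CUSTID
      *        Process every File2 record with the current key.
               PERFORM UNTIL FILE1-CUSTID = HIGH-VALUES
                          OR FILE2-CUSTID > FILE1-CUSTID
                   DISPLAY "MATCH " FILE1-CUSTID " "
                           FILE2-CUSTID FILE2-SEQ
                   PERFORM READ-FILE2
               END-PERFORM
           END-PERFORM
           STOP RUN.
       READ-FILE1.
      * HIGH-VALUES in the key marks end of file.
           IF F1-POS < 4
               ADD 1 TO F1-POS
               MOVE F1-KEY (F1-POS) TO FILE1-CUSTID
           ELSE
               MOVE HIGH-VALUES TO FILE1-CUSTID
           END-IF.
       READ-FILE2.
           IF F2-POS < 10
               ADD 1 TO F2-POS
               MOVE F2-KEY (F2-POS) TO FILE2-CUSTID
               MOVE F2-SEQ (F2-POS) TO FILE2-SEQ
           ELSE
               MOVE HIGH-VALUES TO FILE2-CUSTID
           END-IF.
```

It should display one MATCH line for each of B1, D1, D2, E1, E2, E3 and F1, and each table is walked only once.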
| |
|
| In article <1154797366.419417.233450@m73g2000cwd.googlegroups.com>,
florence <hari_junk1@yahoo.com> wrote:
>
>
> I have two physical sequential files, both sorted by CUST ID.
>
> File1 contains 10,000 records.
> File2 contains 22 million records (there are multiple records for the
> same CUST ID).
>
> I need to compare the File1 CUST ID with the File2 CUST ID, and the
> matching rows would be written to an output file.
Hmmmmm... I have, when forced to, suggested a rather similar question as
an interview question for prospective hires.
Please do your own homework.
DD
| |
| Alistair 2006-08-06, 6:55 pm |
|
florence wrote:
> Thanks Arnold,
>
> I am working with IBM mainframes with z/OS. Merging is not possible
> because for each matching record, I need to do some calculations and
> write output. Both these files are on 3390 disk.
>
> Thanks in advance,
I don't think that Arnold meant for you to do an external merge of the
files (which can be done on IBM mainframes using DFSORT, SYNCSORT or
CA-SORT) but instead to use COBOL code (written by yourself) which
would sequentially match records on one file against the other before
performing calculations on matched records.
| |
| William M. Klein 2006-08-06, 6:55 pm |
| Actually, with any conforming COBOL compiler, you COULD use the MERGE statement
with an INPUT PROCEDURE to do exactly this. I am NOT saying it would be the
best way to handle it, but it certainly would be ONE way to do it.
--
Bill Klein
wmklein <at> ix.netcom.com
| |
|
|
"florence" <hari_junk1@yahoo.com> wrote in message
news:1154797366.419417.233450@m73g2000cwd.googlegroups.com...
>
>
> I have two physical sequential files, both sorted by CUST ID.
>
> File1 contains 10,000 records.
> File2 contains 22 million records (there are multiple records for the
> same CUST ID).
>
> I need to compare the File1 CUST ID with the File2 CUST ID, and the
> matching rows would be written to an output file.
>
> I am thinking of two possible solutions:
>
> 1. Fetch each record from File1 and compare it with File2
> sequentially until the CUST ID in File1 is greater than the File2
> CUST ID. When a match is found, write an output record.
>
> 2. Put all 22 million rows in a table and use SEARCH ALL for
> each record in File1.
Skip that idea, definitely not in a table.
Hint: MERGE
>
> I want to know which method is preferable.
>
> Questions:
>
> 1. If I store 22 million records in a table declaration, how much
> storage is needed? Is it OK to use this method?
Way too much!
>
> 2. Sequential processing is taking a very, very long time.
>
> If there are any other methods, let me know.
>
> Please give me your opinions.
>
> Your help is appreciated.
MERGE
>
> Thanks,
>
| |
| Rick Smith 2006-08-07, 6:55 pm |
|
"William M. Klein" <wmklein@nospam.netcom.com> wrote in message
news:KXsBg.318531$GD7.307552@fe08.news.easynews.com...
> "Alistair" <alistair@ld50macca.demon.co.uk> wrote in message
> news:1154888464.351416.21260@i3g2000cwc.googlegroups.com...
>
> Actually, with any conforming COBOL compiler, you COULD use the MERGE
> statement with an INPUT PROCEDURE to do exactly this. I am NOT saying
> it would be the best way to handle it, but it certainly would be ONE
> way to do it.
You misspelled "OUTPUT"; but I really wanted to mention
that I found Micro Focus 3.2.24 (2 Jun 1994) to be a
non-conforming compiler with respect to the MERGE
statement and variable length records.
I set up a test program using ordered input files with fixed
length records of 20 and 120 characters, respectively. I set
up the SD for the merge file for variable length records.
I set the code to use record length to determine the source
of the record. When I ran the program, the RETURN
statement always set the record length to the maximum
size; rather than the original size.
Changing the code from merge to sort, the program worked
as expected.
Here is a simplified version of the program using line sequential
files, which are still fixed length as far as the MERGE statement
is concerned. In "file-1.txt", I placed the letters b, d, and e, on
separate lines. In "file-2.txt", I placed a repeating sequence of
three records for each of the letters a through f: a1, a2, a3, b1,
..., f3, on separate lines. The resulting output should be 9 lines:
bb1, bb2, bb3, dd1, ..., ee3.
Changing "merge" to "sort" and adding "duplicates in order",
the program works as expected.
-----
identification division.
program-id. 2-files.
environment division.
input-output section.
file-control.
    select file-1 assign to "file-1.txt"
        organization line sequential.
    select file-2 assign to "file-2.txt"
        organization line sequential.
    select file-3 assign to "file-3.txt"
        organization line sequential.
    select work-file assign to "work".
data division.
file section.
fd  file-1.
01  file-1-record pic x(1).
fd  file-2.
01  file-2-record pic x(2).
fd  file-3.
01  file-3-record pic x(3).
sd  work-file
    record varying from 1 to 10
    depending on record-length.
01  work-record.
    02  cust-id pic x.
    02  pic x.
01  work-record-min pic x.
working-storage section.
01  record-length comp pic 9(4).
01  saved-key pic x value space.
01  end-of-merge-flag pic x value "0".
    88  end-of-merge value "1".
procedure division.
begin.
    open output file-3
    merge work-file
        ascending key cust-id
        using file-1 file-2
        output procedure is merge-process
    close file-3
    stop run
| |
| epc8@juno.com 2006-08-07, 9:55 pm |
|
Richard wrote:
> florence wrote:
>
>
> What part of "some calculations and write output" prevents a standard 2
> file merge logic being used ?
>
> The whole point of a merge from two files is that you wind up at some
> point in the program with each record matching. That is where you would
> then calculate and output.
Perhaps processing the 22 million record second file sequentially is
the real problem?
The dynamics of the problem depend on the relationship between the keys
in the two files, of course.
a. Suppose each key in the first file has only a few matching records
in the second file.
b. Something in between.
c. Suppose all of the keys in the second file match records in the
first.
I'm guessing, perhaps wrongly, that the OP has a good reason for
rejecting the standard 2 file merge logic that you suggest. :-).
-- elliot
| |
| William M. Klein 2006-08-07, 9:55 pm |
|
"Rick Smith" <ricksmith@mfi.net> wrote in message
news:12df0r656hfp8ec@corp.supernews.com...
>
> "William M. Klein" <wmklein@nospam.netcom.com> wrote in message
> news:KXsBg.318531$GD7.307552@fe08.news.easynews.com...
<snip>
> sd work-file
> record varying from 1 to 10
> depending on record-length.
> 01 work-record.
> 02 cust-id pic x.
> 02 pic x.
> 01 work-record-min pic x.
Rick,
Was this really the way you coded this (approximately, not exactly)?
I believe that when you have a "to 10" but the largest record area is defined as
2, that you will still ONLY be able to get at 2 bytes - even if record-length
shows a value greater than 2.
If this was a "typo" in your stripped down program, can you tell us what MF
support said when you reported your problem to them?
--
Bill Klein
wmklein <at> ix.netcom.com
| |
| Rick Smith 2006-08-07, 9:55 pm |
|
"William M. Klein" <wmklein@nospam.netcom.com> wrote in message
news:IDNBg.223594$1Q1.150987@fe03.news.easynews.com...
>
> "Rick Smith" <ricksmith@mfi.net> wrote in message
> news:12df0r656hfp8ec@corp.supernews.com...
> <snip>
>
> Rick,
> Was this really the way you coded this (approximately, not exactly)?
Yes, but exactly, and I ran the program to be sure that
the error still occurred, as in the first program.
> I believe that when you have a "to 10" but the largest record area is
> defined as 2, that you will still ONLY be able to get at 2 bytes - even
> if record-length shows a value greater than 2.
There are only 2 record lengths used, 1 and 2. I used
"to 10" to demonstrate the error that the maximum is
supplied and not the largest or the original. The program
makes no attempt to access the record unless the length
is 1 or 2.
> If this was a "typo" in your stripped down program, can you tell us
> what MF support said when you reported your problem to them?
It was not a "typo". The implementation I am using is
12 years old. Had I reported the error to Micro Focus,
the most likely response would have been to upgrade
to their latest and greatest.
| |
| Richard 2006-08-07, 9:55 pm |
|
epc8@juno.com wrote:
> Perhaps processing the 22 million record second file sequentially is
> the real problem?
The alternative being?
> The dynamics of the problem depend on the relationship between the keys
> in the two files, of course.
The files were said to be sequential rather than indexed, but then one
probably shouldn't trust what has been said about them.
> I'm guessing, perhaps wrongly, that the OP has a good reason for
> rejecting the standard 2 file merge logic that you suggest. :-).
I suspect that OP is not even aware that there is such a thing as '2
file merge logic' let alone reasons for rejecting it.
| |
| epc8@juno.com 2006-08-07, 9:55 pm |
|
Richard wrote:
> epc8@juno.com wrote:
>
>
> The alternative being?
>
>
> The files were said to be sequential rather than indexed, but then one
> probably shouldn't trust what has been said about them.
>
>
> I suspect that OP is not even aware that there is such a thing as '2
> file merge logic' let alone reasons for rejecting it.
On sober reflection, I think you are correct.
Suppose you had the freedom to load the data into some other type of
file. Would it be worth it or would sequential processing be fast
enough? [I have no way of knowing what is reasonable on a mainframe.
The last time I ran a program on one was '84 or so. :-).]
| |
| Clark F Morris 2006-08-07, 9:55 pm |
| On 7 Aug 2006 17:02:20 -0700, epc8@juno.com wrote:
>
>Richard wrote:
>
>On sober reflection, I think you are correct.
>
>Suppose you had the freedom to load the data into some other type of
>file. Would it be worth it or would sequential processing be fast
>enough? [I have no way of knowing what is reasonable on a mainframe.
>The last time I ran a program on one was '84 or so. :-).]
Sequential processing is very fast on the IBM mainframe unless the
file is created with one record per block (BLOCK CONTAINS 1 RECORD),
which is the default unless BLOCK CONTAINS 0 (let the system decide)
or some reasonable block size is specified.
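A minimal sketch of an FD that leaves the block size to the system, assuming a fixed-length 200-byte output record as described earlier in the thread. The file name OUTFILE and the other names are hypothetical; on z/OS the block size would then come from the JCL or data set label, or be chosen by the system.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. BLKDEMO.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      * "OUTFILE" is a hypothetical name (a DD name on z/OS).
           SELECT OUT-FILE ASSIGN TO "OUTFILE"
               ORGANIZATION IS SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
      * BLOCK CONTAINS 0 asks for the block size to be supplied
      * outside the program instead of one record per block.
       FD  OUT-FILE
           BLOCK CONTAINS 0 RECORDS
           RECORD CONTAINS 200 CHARACTERS.
       01  OUT-RECORD          PIC X(200).
       PROCEDURE DIVISION.
       MAIN-PARA.
           OPEN OUTPUT OUT-FILE
           MOVE "SAMPLE OUTPUT RECORD" TO OUT-RECORD
           WRITE OUT-RECORD
           CLOSE OUT-FILE
           STOP RUN.
```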
| |
| Lüko Willms 2006-08-08, 3:55 am |
| On Sat, 5 Aug 2006 21:58:07 UTC, "Richard"
<riplin@Azonic.co.nz> wrote in comp.lang.cobol:
> Then you probably need:
>
> PERFORM UNTIL File1-CustId = HIGH-VALUES
> READ file1
> AT END MOVE HIGH-VALUES TO File1-CustId
> END-READ
> PERFORM Read-File2
> UNTIL File2-CustId >= File1-CustId
> PERFORM
> UNTIL File1-CustId = HIGH-VALUES
> OR File2-CustId > File1-CustId
>
> * deal with matching file2 record here
>
> PERFORM Read-File2
> END-PERFORM
> END-PERFORM
> .
>
> Read-File2.
>
> READ file2
> AT END MOVE HIGH-VALUES TO File2-CustId
> END-READ
> .
Easier and clearer:
    READ file1
    READ file2
*> file1 is the control file.
*> it is assumed that Customer-ID is unique within file1,
*> while many duplicates can exist in file2
*> probably file1 is the customer master file,
*> while file2 is some transaction file
*>
*> READ without AT END relies on a FILE STATUS clause for each file;
*> EOF is assumed to be a condition-name for the end-of-file status
    PERFORM UNTIL EOF IN FILE-STATUS-file1
               OR EOF IN FILE-STATUS-file2
        EVALUATE TRUE
            WHEN Customer-ID IN file1 < Customer-ID IN file2
                READ file1
            WHEN Customer-ID IN file1 > Customer-ID IN file2
                READ file2
            WHEN OTHER
*>              i.e. Customer-ID IN file1 = the one in file2
                PERFORM UNTIL EOF IN FILE-STATUS-file2
                           OR Customer-ID IN file1 < Customer-ID IN file2
                    PERFORM collect-information-and-write-out
                    READ file2
                END-PERFORM
        END-EVALUATE
    END-PERFORM
Yours,
L.W.
| |
| Richard 2006-08-08, 3:55 am |
|
Lüko Willms wrote:
> Easier and clearer:
What is 'easier and clearer' is entirely what one is used to.
| |
| Lüko Willms 2006-08-09, 3:55 am |
| On Tue, 8 Aug 2006 00:02:20 UTC, epc8@juno.com wrote in
comp.lang.cobol:
> Suppose you had the freedom to load the data into some other type of
> file. Would it be worth it or would sequential processing be fast
> enough?
This would itself require sequential processing of the transaction
file, by whatever utility copies that file somewhere else and does
some processing on it, such as indexing.
I presume that direct sequential processing of that file is the
most economical with respect to system resources.
Yours,
L.W.