Code Comments
I have two physical sequential files, both sorted by CUST ID.

File1 contains 10,000 records.
File2 contains 22 million records (there are multiple records for the same CUST ID).

I need to compare the File1 CUST ID with the File2 CUST ID and write the matching rows to an output file.

I am considering two possible solutions:

1. Read each record from File1 and compare it with File2 sequentially until the CUST ID in File1 is greater than the File2 CUST ID. When a match is found, write an output record.

2. Put all 22 million rows in a table and use SEARCH ALL for each record in File1.

I want to know which method is preferable.

Questions:

1. If I store 22 million records in a table declaration, how much storage is needed? Is it OK to use this method?

2. Sequential processing is taking a very long time. If there are any other methods, please let me know.

Please share your opinions. Your help is appreciated.

Thanks,
In my opinion, this kind of problem is better solved by a sequential match-merge process. This is a well-known, reliable, and efficient batch processing technique.

You don't say which COBOL compiler you are using, what operating environment this will run in, or whether either of the files is on tape rather than disk. It might also be helpful to know the record length of each file.

Few COBOL environments will be able to support a working-storage table containing 22 million records. If we assume each record is 80 bytes long, the working-storage table would occupy 80 * 22 million bytes, or about 1.76 gigabytes of memory. And loading an in-memory table still requires you to read every record in the larger file.

Some database products may allow you to allocate a database table and load it, but not in my limited experience with DB2.

With kindest regards,

--
http://arnold.trembley.home.att.net/
Thanks Arnold,

I am working on IBM mainframes running z/OS. Merging is not possible because for each matching record I need to do some calculations and write output. Both of these files are on 3390 disk.

Thanks in advance,
florence wrote:

Thanks Arnold,

I am working on IBM mainframes running z/OS. Merging is not possible because for each matching record I need to do some calculations and write output. Both of these files are on 3390 disk.

File1 LRECL = 20
File2 LRECL = 120
Output record length = 200

I hope this helps.

Thanks in advance,
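With the record lengths now stated, the table-size estimate from earlier in the thread can be redone. The sketch below is in Python rather than COBOL purely for illustration; `table_bytes` is a hypothetical helper, and the 2 GiB figure is included only as a point of comparison (the size of a 31-bit address space), not as a statement about any particular compiler's limits.

```python
def table_bytes(record_count: int, record_length: int) -> int:
    """Bytes needed to hold record_count fixed-length records in memory."""
    return record_count * record_length


RECORD_COUNT = 22_000_000   # File2 size as stated in the thread
TWO_GIB = 2 ** 31           # comparison point only: a 31-bit address space

for label, lrecl in (("assumed 80-byte records", 80),
                     ("stated File2 LRECL of 120", 120)):
    size = table_bytes(RECORD_COUNT, lrecl)
    print(f"{label}: {size:,} bytes = {size / 1e9:.2f} GB, "
          f"exceeds 2 GiB: {size > TWO_GIB}")
```

At 120 bytes per record the table would need about 2.64 GB, half as much again as the 80-byte estimate.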
florence wrote:
> I am working on IBM mainframes running z/OS. Merging is not possible
> because for each matching record I need to do some calculations and
> write output. Both of these files are on 3390 disk.

What part of "some calculations and write output" prevents standard two-file merge logic from being used?

The whole point of a merge of two files is that at some point in the program you are holding a record from each file with matching keys. That is where you would then calculate and write the output.
florence wrote:
> I have two physical sequential files, both sorted by CUST ID.
>
> File1 contains 10,000 records.
> File2 contains 22 million records (there are multiple records for the
> same CUST ID).
>
> I need to compare the File1 CUST ID with the File2 CUST ID and write
> the matching rows to an output file.

You haven't specified whether it is the File1 matching records or the File2 matching records that are written to output, or both.

Given CustIds of:

File1: B D E E F ...

File2: A(1) A(2) B(1) C(1) D(1) D(2) E(1) E(2) E(3) F(1) ...

with (n) indicating that there are 2 A records, 3 E records, and so on.

Which records will be output? Those from File1? Those from File2? Both? If there are two records with the same CustId in File1 (is this possible?), do you need to output all the matching File2 records for each, duplicating the output?

> I am considering two possible solutions:
>
> 1. Read each record from File1 and compare it with File2 sequentially
> until the CUST ID in File1 is greater than the File2 CUST ID. When a
> match is found, write an output record.

You imply that it would be necessary to start again at the beginning of File2 for each record in File1. You said that the files are sorted by CustId. For each record in File1 it is only necessary to read forward in File2, because all the records already read from File2 must have a lower CustId than the current File1 CustId. That is the nature of them being sorted.

> 2. Put all 22 million rows in a table and use SEARCH ALL for each
> record in File1.

SEARCH ALL does not give you 'all' the records that match. It gives only one, but it may use a binary chop search (or any other method), and the one that it finds need not be the first with that key. That is, it might 'search all' of the table when searching.

> I want to know which method is preferable.

Neither, probably.

> Questions:
>
> 1. If I store 22 million records in a table declaration, how much
> storage is needed? Is it OK to use this method?

Simply 22,000,000 x table item size. Are you allowed to use a gigabyte or so of RAM? Note that a SEARCH ALL (which is unlikely to be what you want anyway) will potentially access all parts of the table for each SEARCH, and so will hammer the virtual memory mercilessly and will thrash. The operators will kill your program.

> 2. Sequential processing is taking a very long time.

Are you reading the whole of File2 for each File1 record? Why?
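The point about a binary search landing on some matching entry, not necessarily the first, can be seen with the example keys above. This is a minimal Python sketch, not COBOL, and it does not claim to be how any compiler implements SEARCH ALL; `binary_search` is a hypothetical textbook binary chop written only for illustration.

```python
from bisect import bisect_left, bisect_right


def binary_search(keys, target):
    """Textbook binary chop: return the index of *a* matching key, or -1."""
    lo, hi = 0, len(keys) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if keys[mid] < target:
            lo = mid + 1
        elif keys[mid] > target:
            hi = mid - 1
        else:
            return mid          # some match, not necessarily the first
    return -1


# File2 keys from the example: A(1) A(2) B(1) C(1) D(1) D(2) E(1) E(2) E(3) F(1)
file2_keys = ["A", "A", "B", "C", "D", "D", "E", "E", "E", "F"]

hit = binary_search(file2_keys, "E")
first = bisect_left(file2_keys, "E")    # index of the first E
end = bisect_right(file2_keys, "E")     # one past the last E

print(f"binary chop landed on index {hit}; E records occupy {first}..{end - 1}")
```

Here the chop lands on the middle E, so a program using this approach would still have to step backward and forward from the hit to collect every record for that key.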
Thank you very much for your analysis. That's excellent.

I need to match File1 (no duplicate CUST IDs in this file) with the File2 CUST IDs (duplicate CUST IDs in this file).

Once a CUST ID matches, then based on the status category field in File2 I need to pass one field's data to one of seven output fields in the output record. All other fields in the output record would be populated from File2 fields only, and the output record would be "PIPE" delimited.

File1 LRECL 20 (it has 10,000 records)
File2 LRECL 120 (it has 22 million records)

I hope this helps.

What is PIPE delimited? (Is it just putting "|" after each field in the output record?)

Once again, I appreciate your help.

Thanks
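On the pipe-delimited question: the term usually means that fields are separated by "|" characters, and whether a trailing "|" follows the last field depends on whatever consumes the file. The Python sketch below illustrates the record layout described above under loudly stated assumptions: the status category is taken to be an integer from 1 to 7, and every name (`build_output_record`, `routed_value`, `other_fields`) is hypothetical, since the thread does not give the real field names or layout.

```python
def build_output_record(cust_id, status_category, routed_value, other_fields):
    """Build one pipe-delimited line (no trailing delimiter).

    Assumption: status_category is an integer 1..7 that selects which of
    the seven output slots receives routed_value; the other six stay empty.
    """
    if not 1 <= status_category <= 7:
        raise ValueError("status_category must be between 1 and 7")
    slots = [""] * 7
    slots[status_category - 1] = routed_value
    return "|".join([cust_id, *other_fields, *slots])


# Hypothetical sample values, for illustration only.
line = build_output_record("C001", 3, "150.00", ["X"])
print(line)   # C001|X|||150.00||||
```

Empty slots still produce their delimiters, so every output line has the same number of "|" characters regardless of which slot is filled.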
florence wrote:
> Thank you very much for your analysis. That's excellent.
>
> I need to match File1 (no duplicate CUST IDs in this file) with the
> File2 CUST IDs (duplicate CUST IDs in this file).
>
> Once a CUST ID matches, then based on the status category field in
> File2 I need to pass one field's data to one of seven output fields in
> the output record. All other fields in the output record would be
> populated from File2 fields only, and the output record would be
> "PIPE" delimited.
>
> File1 LRECL 20 (it has 10,000 records)
> File2 LRECL 120 (it has 22 million records)

Then you probably need:

    * Assumed, not shown here: File2-CustId starts at LOW-VALUES, and
    * Read-File2 moves HIGH-VALUES to File2-CustId at end of file.
      PERFORM UNTIL File1-CustId = HIGH-VALUES
          READ file1
              AT END MOVE HIGH-VALUES TO File1-CustId
          END-READ
          PERFORM Read-File2
              UNTIL File2-CustId >= File1-CustId
          PERFORM UNTIL File1-CustId = HIGH-VALUES
                     OR File2-CustId > File1-CustId
    *         deal with matching file2 record here
              PERFORM Read-File2
          END-PERFORM
      END-PERFORM
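For readers who want to trace the match-merge logic without a COBOL compiler, here is a rough Python sketch of the same idea, using the example keys from earlier in the thread. It is an illustration under assumptions, not a translation of the COBOL: `match_merge` and its `key` parameter are hypothetical names, both inputs are assumed to be sorted ascending by customer id, File1 is assumed to have no duplicates (as stated above), and unlike the loop above it stops as soon as File2 is exhausted.

```python
def match_merge(file1_ids, file2_records, key=lambda rec: rec[0]):
    """Yield each File2 record whose key appears in file1_ids.

    Both inputs must already be sorted ascending by customer id.
    File2 is only ever read forward, so each file is read at most once.
    """
    it2 = iter(file2_records)
    rec2 = next(it2, None)
    for cust_id in file1_ids:
        # Skip File2 records whose key is lower than the current File1 key.
        while rec2 is not None and key(rec2) < cust_id:
            rec2 = next(it2, None)
        # Emit every File2 record that matches the current File1 key.
        while rec2 is not None and key(rec2) == cust_id:
            yield rec2          # the per-match calculation would go here
            rec2 = next(it2, None)
        if rec2 is None:
            break               # File2 exhausted: no further matches possible


file1 = ["B", "D", "E", "F"]
file2 = [("A", 1), ("A", 2), ("B", 1), ("C", 1), ("D", 1),
         ("D", 2), ("E", 1), ("E", 2), ("E", 3), ("F", 1)]

for record in match_merge(file1, file2):
    print(record)
```

Because neither input is ever rewound, the work is proportional to the number of File1 records plus the number of File2 records, rather than their product.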