Code Comments
I came across the Linpack benchmark and gave it a try on my PC (2.6 GHz Pentium, Windows XP, CVF 6.6).
I used a timing function based on DATE_AND_TIME.

It seems that the first Linpack test (rank 100 linear system) is out of date (!?).
The execution takes milliseconds and the results are negative, which according to the Linpack FAQ means that the timing function is not accurate enough.
Obviously it was designed for slower systems.

The second Linpack test (1000) takes about 4 seconds.
(Compilation command: f90 -Optimize:5 1000d.f77 second.f77)
------------------------------------------------
     norm. resid      resid           machep
  6.49150133E+00  7.20701276E-13  2.22044605E-16  1

 times are reported for matrices of order  1000
      factor     solve      total     mflops
 times for array with leading dimension of1001
   3.750E+00 -3.750E-04  3.749E+00  1.783E+02  1.121
 end of tests -- this version dated 10/12/92
------------------------------------------------
It reports a speed of 178.3 MFLOPS, which is close to the result of the Java version of Linpack (167 MFLOPS).

I find it hard to believe the results listed on netlib (http://performance.netlib.org/perfo.../PDSbrowse.html), since they are an order of magnitude greater.

The third test (HPL) I didn't dare to touch, since it requires a lot of work and I didn't know whether it is applicable to my computer.

So my questions are:
1) Is there something basic that I might have overlooked in the comparison with the netlib archives?
2) Obviously this MFLOPS number is OS and compiler dependent. Is there a universal method/program for determining a MFLOPS number for a given hardware setup, independent of software?
3) Can the Java version of Linpack be considered one?
4) Wouldn't it be better if vendors included a MFLOPS number among the specifications of a computer, even if it refers to a particular software setup?

Thank you in advance.
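For reference, the MFLOPS figure in the output above can be reproduced by hand. The Linpack driver charges a fixed nominal operation count of 2n^3/3 + 2n^2 for a system of order n and divides it by the total time. A minimal Python sketch of that arithmetic (the function name linpack_mflops is made up for illustration):

```python
def linpack_mflops(n, total_seconds):
    """MFLOPS as the Linpack driver computes it: nominal flop count / time."""
    # Nominal count for LU factorization plus the triangular solves.
    ops = (2.0 * n**3) / 3.0 + 2.0 * n**2
    return ops / (1.0e6 * total_seconds)


if __name__ == "__main__":
    # Order 1000 and the total time of 3.749 s from the run above.
    print(f"{linpack_mflops(1000, 3.749):.1f} MFLOPS")
```

With n = 1000 and 3.749 s this comes out at roughly 178 MFLOPS, in line with the 1.783E+02 reported. Because the operation count is fixed, a faster algorithm or library shows up directly as a higher MFLOPS number.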
In article <c99s9o$g06$1@nic.grnet.gr>, "Harontas" <harontas@hotmail.com> wrote:

> I came across the Linpack benchmark and gave it a try on my PC (2.6 GHz
> Pentium, Windows XP, CVF 6.6).
> I used a timing function based on DATE_AND_TIME.
>
> It seems that the first Linpack test (rank 100 linear system) is out of date
> (!?).
> The execution takes milliseconds and the results are negative,
> which according to the Linpack FAQ means that the timing function is not
> accurate enough.
> Obviously it was designed for slower systems.

Yes, even PC-class computers are now about 10,000 times faster than computers were when that benchmark was first used in the late 70's. I don't know exactly what you mean by a "negative" result, but it is likely that you need to use a different timer. You can probably find a timer that is accurate to a microsecond. This may still give you some timing noise, but if you run the test a few times you can get a good average value. I ran this calculation on a Mac G5 a while back, and I got INF for the MFLOPS rate because the entire calculation finished within a single timing clock tick.

> The second Linpack test (1000) takes about 4 seconds.
> (Compilation command: f90 -Optimize:5 1000d.f77 second.f77)
> ------------------------------------------------
>      norm. resid      resid           machep
>   6.49150133E+00  7.20701276E-13  2.22044605E-16  1
>
>  times are reported for matrices of order  1000
>       factor     solve      total     mflops
>  times for array with leading dimension of1001
>    3.750E+00 -3.750E-04  3.749E+00  1.783E+02  1.121
>  end of tests -- this version dated 10/12/92
> ------------------------------------------------
> It reports a speed of 178.3 MFLOPS, which is close to the result of the
> Java version of Linpack (167 MFLOPS).

I think that a 2.6 GHz Pentium should test in the 500-1000 MFLOPS range for the 100x100 calculation.
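One way to follow the timing advice earlier in this reply is to use a high-resolution clock and repeat the work until the elapsed time is well above one clock tick, then report the average per run. This avoids the zero or negative intervals that lead to INF or negative rates. A minimal Python sketch, assuming the standard-library time.perf_counter is an acceptable clock; time_per_call and its parameters are hypothetical names for illustration, not part of Linpack:

```python
import time


def time_per_call(work, min_seconds=0.2, max_repeats=1_000_000):
    """Average wall-clock seconds per call of work().

    The repeat count is doubled until the whole batch takes at least
    min_seconds, so a coarse clock tick no longer dominates the result.
    """
    repeats = 1
    while True:
        start = time.perf_counter()
        for _ in range(repeats):
            work()
        elapsed = time.perf_counter() - start
        if elapsed >= min_seconds or repeats >= max_repeats:
            return elapsed / repeats
        repeats *= 2


if __name__ == "__main__":
    # Resolution the platform reports for this clock.
    print("clock resolution:", time.get_clock_info("perf_counter").resolution)
    # A small stand-in workload; a real test would solve the linear system here.
    print("seconds per call:", time_per_call(lambda: sum(range(10_000))))
```

The same idea carries over to Fortran: pick a finer clock than DATE_AND_TIME if one is available, and time a loop over several solves rather than a single one.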
The published 1000x1000 results are computed differently from the reference code that you are using. The vendor/user is allowed to change algorithms to solve the problem in the optimal way for this benchmark. That is why the performance is so much higher than for the 100x100 calculation.

Here is a simpler benchmark. Time a matrix-matrix product using the BLAS3 dgemm() routine for matrices of various sizes. For a 2.6 GHz Pentium, you should get over 2000 MFLOPS. This will give you an estimate of (and a reasonable upper bound on) the 1000x1000 Linpack benchmark results using an optimal algorithm. There are several BLAS3 libraries online that you can download, or you can purchase one, for example the MKL library from Intel, that will have good performance on your hardware. The 1000x1000 Linpack problem can be formulated in terms of dgemm() operations involving subblocks of the large matrix, and that is how optimal performance is achieved.

> I find it hard to believe the results listed on netlib
> (http://performance.netlib.org/perfo.../PDSbrowse.html),
> since they are an order of magnitude greater.

After you experiment a little with dgemm(), it will not be hard to believe. On a 2 GHz Mac G5, for example, you can see 7500 MFLOPS computation rates for dgemm(); the theoretical maximum for this hardware is 8000 MFLOPS.

> 1) Is there something basic that I might have overlooked in the comparison
> with the netlib archives?

Yes, the vendors are allowed to change algorithms for the large problem, but not for the small problem.

> 2) Obviously this MFLOPS number is OS and compiler dependent. Is there a
> universal method/program for determining a MFLOPS number for a given
> hardware setup, independent of software?

No. Different algorithms will perform differently on different hardware.

> 3) Can the Java version of Linpack be considered one?

Java runs on a virtual machine (interpreted or JIT-compiled) rather than as hand-tuned native code, so it will tend to be slower.
As you have seen for yourself, it is about a factor of 10 slower than an optimal implementation of the Linpack benchmark.

> 4) Wouldn't it be better if vendors included a MFLOPS number among the
> specifications of a computer, even if it refers to a particular
> software setup?

Some people want to see Linpack benchmarks because that mimics their application codes the best. Other people want to see Photoshop benchmarks because that is what they mostly do. Yet other people want to see scalar floating-point results, such as SPECfp. You can't please all of the people all of the time.

$.02 -Ron Shepard
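The matrix-product timing suggested in the reply above can be sketched as follows. This is only an illustration of the measurement, not a dgemm() call: the product is a naive pure-Python triple loop standing in for dgemm(), so the rates it prints will be far below what a tuned BLAS3 library reaches on the same machine. The names matmul and matmul_mflops are made up for this example, and 2n^3 is the usual nominal operation count for an n x n product.

```python
import random
import time


def matmul(a, b):
    """Naive O(n^3) product of two square matrices stored as lists of rows."""
    bt = list(zip(*b))  # columns of b, so each entry is a plain dot product
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def matmul_mflops(n, seed=0):
    """Time one n x n product and convert the time to a nominal MFLOPS rate."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n)] for _ in range(n)]
    b = [[rng.random() for _ in range(n)] for _ in range(n)]
    start = time.perf_counter()
    matmul(a, b)
    elapsed = time.perf_counter() - start
    flops = 2.0 * n**3  # nominal count: n^3 multiplies plus n^3 adds
    return flops / (1.0e6 * elapsed)


if __name__ == "__main__":
    for n in (50, 100, 150):
        print(f"n={n:4d}  {matmul_mflops(n):8.2f} MFLOPS")
```

To reproduce the experiment described in the reply, the call to matmul would be replaced by a call into an optimized BLAS3 library, keeping the timing and the 2n^3 conversion unchanged.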