Interpreting Profiler Report
student

2005-06-08, 4:00 am


I got the following flat profile report from gprof. I don't
understand what _g95_section_array really is, which is using 23.47% of
the time. Also, the "calls", "self Ks/call" and "total Ks/call"
fields don't show up for the intrinsic functions. How can I make them
appear? Also, is the intrinsic dot_product() optimal?


     % cumulative     self               self    total
  time    seconds  seconds     calls  Ks/call  Ks/call  name
 23.47    1385.20  1385.20                              _g95_section_array
 16.99    2388.07  1002.87                              _g95_dot_product_r4
 10.80    3025.36   637.29    300000     0.00     0.00  matrix_MP_ludcmp_
  9.14    3564.78   539.41                              _g95_spread
  6.98    3976.99   412.21         1     0.41     2.28  MAIN_
  5.69    4312.65   335.67   7200000     0.00     0.00  nrutil_MP_outerprod_r__
  4.45    4575.04   262.39    550001     0.00     0.00  matrix_MP_matrixmult_
  3.09    4757.65   182.62   7200000     0.00     0.00  matrix_MP_lubksb_
  3.06    4938.47   180.81                              _g95_transpose
  2.18    5067.06   128.59    150000     0.00     0.00  mcmc_MP_gibbsnorm_
  1.50    5155.48    88.43    300000     0.00     0.00  matrix_MP_pdsymminv_
  1.45    5240.78    85.30                              _g95_bump_element
  1.30    5317.56    76.78    100000     0.00     0.00  matrix_MP_multdiag_right__
  1.23    5390.36    72.80  80750000     0.00     0.00  random_MP_random_normal__
  0.96    5447.27    56.91                              xorshf96
  0.92    5501.82    54.55                              _g95_random_4
  0.88    5554.05    52.23   5691750     0.00     0.00  nrutil_MP_swap_rv__
  0.57    5587.82    33.77                              insert_mem
  0.56    5621.15    33.33                              malloc
  0.40    5644.61    23.46    300000     0.00     0.00  matrix_MP_identity_
  0.35    5665.06    20.44                              _g95_maxvald1_r4
  0.35    5685.48    20.42                              free
  0.31    5703.55    18.07                              _g95_maxloc_r4
  0.28    5719.93    16.38                              _g95_init_assumed_shape
  0.26    5735.35    15.42                              _g95_rand
  0.26    5750.52    15.17                              get_user_mem
  0.24    5764.59    14.07                              section_size
  0.22    5777.55    12.96                              delete_treap
  0.22    5790.27    12.72                              compare
  0.20    5801.81    11.54                              free_user_mem
  0.19    5813.15    11.35                              _g95_init_multipliers
  0.19    5824.24    11.09                              _g95_allocate_array
  0.18    5834.81    10.56                              _g95_array_from_section
  0.16    5844.16     9.36                              _g95_deallocate_array
  0.12    5851.02     6.86   7200000     0.00     0.00  nrutil_MP_imaxloc_r__
  0.09    5856.61     5.59                              initialize_memory
  0.09    5861.77     5.16                              _g95_size
  0.08    5866.65     4.88                              delete_root
  0.08    5871.32     4.67                              _g95_xorshift128
  0.05    5874.27     2.95                              largebin_index
  0.05    5877.01     2.73                              put_field
  0.05    5879.73     2.72                              _g95_write_real
  0.04    5881.81     2.08                              malloc_consolidate
  0.04    5883.88     2.07                              rotate_left
  0.03    5885.80     1.92                              _g95_temp_array
  0.02    5886.93     1.13                              _g95_temp_alloc
  0.02    5887.94     1.02                              huge
  0.02    5888.89     0.94                              _g95_temp_free
  0.01    5889.65     0.77                              get_field
  0.01    5890.41     0.76                              _g95_any_4
  0.01    5891.16     0.75                              _g95_huge_4
  0.01    5891.90     0.73                              _g95_write_block
  0.01    5892.59     0.69                              write_fixed
  0.01    5893.24     0.66                              data_transfer_init
  0.01    5893.90     0.65                              _g95_list_formatted_write
  0.01    5894.53     0.64   7500000     0.00     0.00  nrutil_MP_assert_eq3__
  0.01    5895.14     0.61    215596     0.00     0.00  ignlgi_
  0.01    5895.69     0.55    100000     0.00     0.00  matrix_MP_normsquare_
  0.01    5896.20     0.52         7     0.00     0.00  io_MP_writebuff_
  0.01    5896.69     0.48                              size_record_buffer
  0.01    5897.14     0.45    100000     0.00     0.00  sgamma_
  0.01    5897.57     0.44                              _g95_is_internal_unit
  0.01    5897.99     0.41                              random_MP_random_gamma__
  0.01    5898.40     0.41                              _g95_bump_element_dim
  0.01    5898.74     0.34                              _g95_find_unit
  0.01    5899.06     0.32                              write_separator
  0.00    5899.35     0.29    100000     0.00     0.00  snorm_
  0.00    5899.62     0.27                              _g95_transfer_real
  0.00    5899.88     0.26                              _g95_get_float_flavor
  0.00    5900.12     0.24                              write_free
  0.00    5900.31     0.19    100000     0.00     0.00  gengam_
  0.00    5900.45     0.14    215596     0.00     0.00  ranf_
  0.00    5900.59     0.14                              _g95_get_unit
  0.00    5900.73     0.14                              start_transfer
  0.00    5900.86     0.13                              _g95_extract_mint
  0.00    5900.99     0.13                              _g95_get_sign
  0.00    5901.12     0.13                              _g95_salloc_w
  0.00    5901.25     0.13                              _g95_st_write
  0.00    5901.38     0.13                              _g95_st_write_done
  0.00    5901.51     0.13                              _g95_write_integer
  0.00    5901.62     0.11                              fd_flush
  0.00    5901.72     0.10                              _g95_library_start
  0.00    5901.81     0.09                              itoa_4
  0.00    5901.90     0.09                              rotate_right
  0.00    5901.99     0.09                              write_record
  0.00    5902.06     0.07                              _g95_free_fnodes
  0.00    5902.13     0.07                              _g95_library_end
  0.00    5902.20     0.07                              write_formatted_sequential
  0.00    5902.27     0.07                              nrutil_MP_outerprod_d__
  0.00    5902.32     0.06     15411     0.00     0.00  sexpo_
  0.00    5902.39     0.06                              _g95_get_ioparm
  0.00    5902.44     0.06                              writen
  0.00    5902.48     0.04                              _g95_sign_r4
  0.00    5902.52     0.04                              init_write
  0.00    5902.56     0.04                              matrix_MP_diag_
  0.00    5902.60     0.04                              nrutil_MP_assert_eq4__
  0.00    5902.63     0.04                              nrutil_MP_imaxloc_i__
  0.00    5902.66     0.03                              _g95_sfree
  0.00    5902.69     0.03                              recursive_io
  0.00    5902.72     0.03    215629     0.00     0.00  getcgn_
  0.00    5902.74     0.03                              matrix_MP_printmatrix_
  0.00    5902.76     0.02    215630     0.00     0.00  __g95_master_0__
  0.00    5902.78     0.02    215597     0.00     0.00  __g95_master_0__
  0.00    5902.80     0.02                              _g95_transfer_integer
  0.00    5902.82     0.02                              finalize_transfer
  0.00    5902.84     0.02                              matrix_MP_diag_inv__
  0.00    5902.85     0.01        32     0.00     0.00  setcgn_
  0.00    5902.87     0.01                              nrutil_MP_ifirstloc_
  0.00    5902.88     0.01    215661     0.00     0.00  __g95_master_0__
  0.00    5902.89     0.01    215596     0.00     0.00  rgnqsd_
  0.00    5902.90     0.01         3     0.00     0.00  io_MP_readbuff_
  0.00    5902.91     0.01                              free_fnode
  0.00    5902.91     0.00    215629     0.00     0.00  qrgnin_
  0.00    5902.91     0.00        62     0.00     0.00  mltmod_
  0.00    5902.91     0.00        32     0.00     0.00  initgn_
  0.00    5902.91     0.00         1     0.00     0.00  inrgcm_
  0.00    5902.91     0.00         1     0.00     0.00  qrgnsn_
  0.00    5902.91     0.00         1     0.00     0.00  setall_

Tim Prince

2005-06-08, 4:00 am


"student" <adarsh@stat.tamu.edu> wrote in message
news:1118193076.736386.14880@g44g2000cwa.googlegroups.com...
>
> I got the following flat profile report from gprof. I don't
> understand what _g95_section_array really is, which is using 23.47% of
> the time. Also, the "calls", "self Ks/call" and "total Ks/call"
> fields don't show up for the intrinsic functions. How can I make them
> appear? Also, is the intrinsic dot_product() optimal?

To get those data, the intrinsic functions themselves must be compiled with
-pg, unless the compiler came with a copy of its run-time library built that
way. You may want to look at your generated object code to see how the
section_array call is used.
You are the best person to decide whether the way the dot_product you are
using is coded suits your application. On IA scalar processors, good
performance generally requires splitting the sum into 2 partial sums, and
into 4 or 8 partial sums to make use of parallel instructions, if the
operation is big enough.
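
A minimal sketch of the partial-sum idea, for illustration only. The function name dot4, the array size and the test data are hypothetical, and this is not how g95's _g95_dot_product_r4 is written. The loop keeps four independent running sums so that consecutive additions do not depend on one another, then combines them and handles any leftover elements. It assumes both arrays have the same size.

```fortran
program partial_sum_dot
  implicit none
  integer, parameter :: n = 1003
  real :: x(n), y(n)
  integer :: i

  do i = 1, n
     x(i) = real(i)
     y(i) = 1.0 / real(i)
  end do

  ! Each product x(i)*y(i) is close to 1.0, so both results should be near n.
  print *, 'intrinsic :', dot_product(x, y)
  print *, 'four sums :', dot4(x, y)

contains

  ! Dot product accumulated in four independent partial sums.
  ! Hypothetical helper; assumes size(a) == size(b).
  function dot4(a, b) result(s)
    real, intent(in) :: a(:), b(:)
    real :: s
    real :: s1, s2, s3, s4
    integer :: i, m

    m = size(a) - mod(size(a), 4)   ! largest multiple of 4 not exceeding size(a)
    s1 = 0.0
    s2 = 0.0
    s3 = 0.0
    s4 = 0.0
    do i = 1, m, 4
       s1 = s1 + a(i)     * b(i)
       s2 = s2 + a(i + 1) * b(i + 1)
       s3 = s3 + a(i + 2) * b(i + 2)
       s4 = s4 + a(i + 3) * b(i + 3)
    end do
    s = (s1 + s2) + (s3 + s4)
    do i = m + 1, size(a)           ! leftover elements when size(a) is not a multiple of 4
       s = s + a(i) * b(i)
    end do
  end function dot4

end program partial_sum_dot
```

Because the additions happen in a different order, the result can differ from the intrinsic in the last few digits. Whether this is actually faster depends on the compiler, its options and the processor, so it is worth profiling again before keeping it.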


student

2005-06-08, 4:00 am

> > I got the following flat profile report from gprof. I don't
> You must compile the intrinsic functions with -pg to get those data, if the
> compiler didn't come with a copy of the library built that way. You may
> want to look at your generated object code to see how the section_array call
> is used.



What's the command? What exactly do I write on the command line?

Tim Prince

2005-06-08, 4:00 am


"student" <adarsh@stat.tamu.edu> wrote in message
news:1118203274.637823.183020@z14g2000cwz.googlegroups.com...
>
>
> What's the command? What exactly do I write on the command line?
>

Use the same command that was used to build the library you are using, adding
-pg and any other options you think appropriate to your own platform. Or you
could try including the relevant library sources in your own project, using
your favorite compile options. I doubt that many g95 users bother with this;
if you are interested in what goes on in the intrinsics, you might prefer
gfortran.


Joost

2005-06-08, 8:58 am

Hi,

I notice that you were able to speed up the code quite a bit already ...

As for your question concerning _g95_section_array, I find this comment in
the g95 code (runtime/array.c; see the other thread for how to get it):

/* section_array()-- Given an array, calculate a new descriptor that
* is the section of the array. The section_info[] array holds a set
* of records on each dimension. The first word of a record is a flag
* indicating whether the dimension has a fixed value (contracting the
* dimensionality of the result) or a subscript range. For fixed
* values, the next word is the array index. For ranges, the next
* three words give the start, end and stride of the range. From
* this, the new descriptor is calculated.
*
* If a dimension is not a range, the element only contributes to
* calculating the offset of the new array.
*
* For a section, the multiplier of the new descriptor is the product
* of the old descriptor and the stride. The bounds are one and the
* extent of the section, which can be truncated by the range of the
* original array. */

That's a bit verbose, but basically it is something that gets invoked
whenever you use an array section. For example,
x=DOT_PRODUCT(a(:,1),a(1,:))
might call it twice (as far as I know).
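
To make the arithmetic in the quoted comment concrete, here is a small sketch that computes an offset, a multiplier and an extent for the section a(1:4:2, 3) and checks them against ordinary indexing. All variable names are hypothetical, and the real g95 descriptor layout is not shown here. This only mirrors what the comment describes: a fixed dimension contributes to the offset alone, while a range sets the new multiplier (old multiplier times stride) and the extent.

```fortran
program section_descriptor_sketch
  implicit none
  integer, parameter :: n1 = 4, n2 = 3
  integer :: flat(n1 * n2), a(n1, n2)
  integer :: lo, hi, step, col
  integer :: offset, mult, extent, k

  ! flat holds 1..12; a holds the same values arranged in column-major
  ! order, so a(i, j) equals flat(i + n1 * (j - 1)).
  flat = (/ (k, k = 1, n1 * n2) /)
  a = reshape(flat, (/ n1, n2 /))

  ! Describe the section a(lo:hi:step, col).
  lo = 1
  hi = 4
  step = 2
  col = 3

  ! Dimension 1 is a range: it sets the new multiplier and the extent.
  mult = 1 * step                  ! old multiplier of dimension 1 is 1
  extent = (hi - lo) / step + 1    ! valid for step > 0 and hi >= lo

  ! Dimension 2 is fixed: it only contributes to the offset.
  offset = (lo - 1) * 1 + (col - 1) * n1

  ! Element k of the section (bounds 1..extent), found two ways.
  do k = 1, extent
     print *, flat(offset + 1 + (k - 1) * mult), a(lo + (k - 1) * step, col)
  end do
end program section_descriptor_sketch
```

Each printed line should show the same value twice, once through the computed offset and multiplier and once through normal subscripts.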

These things are typically well optimized in more mature compilers; such
optimization simply wasn't the first focus of g95 development (better to
get the right answer slowly than the wrong answer fast).

If you want to get around this limitation, you might try to avoid the
array syntax in speed critical parts of the code.
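
As a rough sketch of that suggestion, the array-section dot product from the example above can be written as an explicit loop over plain element references. The array size and test values are made up for illustration, and whether the loop form really avoids the section overhead in a given g95 build is an assumption to confirm by profiling again.

```fortran
program explicit_loop_dot
  implicit none
  integer, parameter :: n = 4
  real :: a(n, n), x_sections, x_loop
  integer :: i, j

  ! Fill a with small exact values: a(i, j) = i + n*(j - 1).
  do j = 1, n
     do i = 1, n
        a(i, j) = real(i + n * (j - 1))
     end do
  end do

  ! Array-section form, as in the example above.
  x_sections = dot_product(a(:, 1), a(1, :))

  ! Explicit-loop form: the same sum using element references only.
  x_loop = 0.0
  do i = 1, n
     x_loop = x_loop + a(i, 1) * a(1, i)
  end do

  print *, x_sections, x_loop
end program explicit_loop_dot
```

With this test data both values should come out as 90 (1*1 + 2*5 + 3*9 + 4*13).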

Joost
