Ways to debug Monte Carlo Code?
Klemens Barfus

2005-08-23, 7:57 am

Dear list members,

I am working with a Monte Carlo radiation code in the Visual Fortran
environment. Small simulations, in which I follow just a small number of
photons on their way through the grid of optical properties, work. But
if I increase the number of photons, I always get errors.
I can imagine two reasons for the errors. The first is that the chance
of a photon entering a grid cell with bad properties is higher with more
photons; the second is an overflow of a variable when using the larger
number of photons.
The exit message is: ... has exited with code 3.
Unfortunately this code is not described in the Visual Fortran manual.
I am not familiar with ways of finding errors other than debugging step
by step in Visual Fortran. Are there other ways to find out why the
program exits, and what state the variables were in when it exited
[something like on_error ...], other than always writing the variables
to a file?

Thanks in advance for your help!

Klemens
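One possible answer to the "on_error" question is to keep only a short in-memory history of recent photon states and write it out when a failure is detected, rather than logging every variable all the time. The sketch below is in Python for brevity; `FlightRecorder`, `run` and the recorded fields are hypothetical names, not from the thread. In Fortran the same idea could be a fixed-size array used as a circular buffer, written out by whichever routine detects the inconsistency before it stops.

```python
import collections
import json
import sys


class FlightRecorder:
    """Keep only the most recent photon states in memory and write them out
    only when something goes wrong (hypothetical helper)."""

    def __init__(self, maxlen: int = 1000):
        # A deque with maxlen silently discards the oldest entry when full.
        self._states = collections.deque(maxlen=maxlen)

    def record(self, **state) -> None:
        self._states.append(state)

    def snapshot(self) -> list:
        return list(self._states)

    def dump(self, stream=None) -> None:
        stream = sys.stderr if stream is None else stream
        json.dump(self.snapshot(), stream, indent=1)
        stream.write("\n")


def run(n_photons: int, trace_photon, recorder: FlightRecorder) -> None:
    """Photon loop: on any failure, dump the recent states, then re-raise."""
    for i in range(n_photons):
        try:
            trace_photon(i, recorder)
        except Exception:
            recorder.dump()
            raise
```

Inside the real transport routine one would call something like `recorder.record(photon=i, cell=c, x=x)` at each step; the cost is a bounded amount of memory instead of a large log file.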
Arjen Markus

2005-08-23, 7:57 am

Have you turned on the stack trace facility?
And do you explicitly initialise the random number generator so that
your computations can be repeated?

Just a few suggestions.

Regards,

Arjen
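To illustrate the point about explicit initialisation, here is a Python sketch that goes one step further: it derives a separate, repeatable stream for each photon, so a photon that fails in a long run can be re-run on its own. The per-photon seeding scheme and the names `BASE_SEED`, `photon_rng`, `trace_photon` and `failing_photons` are assumptions for illustration, not something proposed in the thread. In Fortran, a fixed seed for the whole run can be set with the intrinsic `RANDOM_SEED(PUT=...)`.

```python
import random

BASE_SEED = 20050823  # hypothetical; record whatever seed a run actually uses


def photon_rng(photon_index: int, base_seed: int = BASE_SEED) -> random.Random:
    # random.seed converts a str seed to an int deterministically, so each
    # photon gets its own repeatable stream that does not depend on how many
    # random numbers earlier photons consumed.
    return random.Random(f"{base_seed}:{photon_index}")


def trace_photon(rng: random.Random) -> list:
    # Toy stand-in for the real transport routine: it fails on a rare draw
    # so the replay below has something to find.
    draws = [rng.random() for _ in range(3)]
    if draws[0] < 1e-3:
        raise ValueError(f"rare failure, draws={draws}")
    return draws


def failing_photons(n_photons: int) -> list:
    failures = []
    for i in range(n_photons):
        try:
            trace_photon(photon_rng(i))
        except ValueError:
            failures.append(i)
    return failures
```

Any index returned by `failing_photons` can then be replayed alone with `trace_photon(photon_rng(i))`, for example under the debugger, without re-running the photons before it.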

Rich Townsend

2005-08-23, 7:00 pm

Are you sure that the error message is being produced by the compiler?
You should search through the source code for the phrase "has exited",
to see whether the "error" is actually caused by debugging code.

You may be interested to hear that I've also been working on MC
radiative transfer codes recently. Much to my surprise, the code kept on
halting when it detected an internal inconsistency in the photon's
position (I'd put in debugging code to explicitly check for this, since
I was getting weird results). Turns out, these inconsistencies were
arising when the photon scattered exactly on a cell boundary.

What's the probability of that, you ask? Well, if I use 32-bit floating
point numbers to specify the photon's position, then I should
get an inconsistency approximately once in every 10^7 photons. That's
the thing about MC radiative transfer: you throw so many photons through
a code that often you find problems that only come up one in a million
or more times.
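Rich's figure can be checked with a rough estimate. Assume (a simplification, not something stated in the thread) that positions near a cell face of order-one magnitude are effectively spread over the representable floating-point values, so the chance of landing exactly on the face is about one unit in the last place, i.e. the machine epsilon of the format:

$$
p_{\text{single}} \approx \varepsilon_{32} = 2^{-23} \approx 1.2\times10^{-7},
\qquad
p_{\text{double}} \approx \varepsilon_{64} = 2^{-52} \approx 2.2\times10^{-16}.
$$

In single precision that is roughly one exact boundary hit per $10^7$ events, in line with Rich's estimate; double precision makes such hits far rarer but not impossible, so the boundary case still needs explicit handling.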

So, check through the program to see whether your problems are due to
debugging code. If they are, *don't* remove the debugging code; instead,
find out why it is being triggered.
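One way to make the boundary case explicit rather than accidental is to decide which cell a photon sitting on a face belongs to from its direction of travel, and to keep a consistency check that stops loudly otherwise. The Python sketch below is a hypothetical 1-D version with uniform cells of width `w` (`cell_index` and its arguments are illustrative names); it is not Rich's code, which the thread does not show.

```python
import math


def cell_index(x: float, u: float, w: float, n_cells: int) -> int:
    """Index of the cell a photon at position x is about to cross, for a
    1-D grid of n_cells cells [i*w, (i+1)*w) and direction cosine u."""
    s = x / w
    i = math.floor(s)
    # Tie-break: a photon sitting on face i*w (to working precision) and
    # moving towards -x is entering cell i-1, not cell i.
    if s == i and u < 0.0:
        i -= 1
    # Consistency check: never index outside the grid silently.
    if not 0 <= i < n_cells:
        raise RuntimeError(f"photon at x={x!r}, u={u!r} maps to cell {i}")
    return i
```

With `w = 0.5`, a photon at `x = 1.0` sits on the face between cells 1 and 2: it is assigned to cell 2 when moving in +x and to cell 1 when moving in -x.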

cheers,

Rich
Nachiket Gokhale

2005-08-23, 7:00 pm

I would run the code under the memory checker valgrind
(http://www.valgrind.org, available under the GPL) and look for memory
allocation errors and other similar errors.

-Nachiket.

Steve Lionel

2005-08-23, 7:00 pm

On Tue, 23 Aug 2005 11:07:39 +0200, Klemens Barfus
<klemens.barfus@forst.tu-dresden.de> wrote:

>The exit message is: ... has exited with code 3.


This message is reported by the Microsoft debugger when the program exits;
it shows whatever the program's exit status is. Code 3 is not
particularly meaningful. I would recommend running the program under the
debugger to see if you can determine where and when it fails.
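Since the debugger only relays the exit status, one option is to make the program's own error exits self-describing: print a diagnostic and exit with a number reserved for that failure. The Python sketch below assumes a hypothetical table of codes (`EXIT_BAD_CELL`, `EXIT_NONFINITE` and `fail` are illustrative names). In Fortran the analogue would be a `STOP` with a code, though whether that code becomes the process exit status depends on the compiler.

```python
import sys
from typing import NoReturn

# Hypothetical exit codes: give each failure its own number and a message,
# so "exited with code N" identifies the cause without a debugger.
EXIT_OK = 0
EXIT_BAD_CELL = 10
EXIT_NONFINITE = 11


def fail(code: int, message: str) -> NoReturn:
    """Print a diagnostic to stderr, then exit with a distinctive status."""
    print(f"error {code}: {message}", file=sys.stderr)
    sys.exit(code)
```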

Steve Lionel
Software Products Division
Intel Corporation
Nashua, NH

User communities for Intel Software Development Products
http://softwareforums.intel.com/
Intel Fortran Support
http://developer.intel.com/software/products/support/
Dr Ivan D. Reid

2005-08-23, 7:00 pm

On Tue, 23 Aug 2005 09:18:50 -0400, Rich Townsend <rhdt@barVOIDtol.udel.edu>
wrote in <def7nq$c0g$1@scrotar.nss.udel.edu>:
> [...] That's the thing about MC radiative transfer: you throw so many
> photons through a code that often you find problems that only come up
> one in a million or more times.

These days with much faster machines, the limits of PRNGs and
numerical methods can be much more readily explored than when I was doing
electron transport in gases at 5 M collisions/30 hours CPU. It took me
several months to find my first such problem -- that the PRNG _will_
return 0.0 (but not 1.0) and ln(0.0) will crash the programme. Quick
change to using (1.0-RAND()) instead. You need to carefully evaluate
the granularity and cycle-length of the PRNG to make sure it lives up
to your model (it's no good using a PRNG like the above if you are selecting
events at the 10^-7 probability level!).
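Both pitfalls in this paragraph can be shown in a few lines of Python, whose `random.random()` has the same [0.0, 1.0) range. `free_path` samples an exponential free path using the `1.0 - RAND()` trick; `effective_probability` counts how often `u < p` succeeds for a hypothetical generator whose output is a multiple of 2^-24 (as when a single-precision uniform is built from 24 random bits). Both function names are illustrative. With 24 bits, an event meant to have p = 10^-7 is selected with probability 2/2^24 ≈ 1.19 × 10^-7, about 19% too often; with 53 bits the error is negligible.

```python
import math
import random


def free_path(rng: random.Random, mu: float) -> float:
    """Sample an exponential free path with attenuation coefficient mu.

    rng.random() returns values in [0.0, 1.0): 0.0 is possible, 1.0 is not.
    -log(xi) would fail for xi == 0.0, whereas 1.0 - xi lies in (0.0, 1.0],
    so its logarithm is always finite.
    """
    xi = rng.random()
    return -math.log(1.0 - xi) / mu


def effective_probability(p: float, bits: int) -> float:
    """Chance that u < p when u takes the 2**bits equally likely values
    k / 2**bits, k = 0 .. 2**bits - 1."""
    n = 2 ** bits
    return math.ceil(p * n) / n  # count of k with k/n < p, divided by n
```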

I posted the simple PRNG I used (derived from one by RP Brent)
a while back, but others with good characteristics are also readily
available. Especially with the overwhelming majority of current hardware
using double-precision floats by default, I wouldn't recommend anything
that takes more than a few seconds' run-time to use a single-precision
PRNG! (My generator returns over 20,000,000 randoms/second using g77/cygwin
on a 3.2 GHz laptop; the assembler version in DOS is much faster but I haven't
learnt the GCC assembler & protocols yet...). In fact, use double precision
for everything, unless you are dealing with huge arrays (most MCs won't).
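One concrete reason to prefer double precision for accumulated quantities: a single-precision running total stops growing once each increment is smaller than half a unit in the last place, so a photon count or tally kept in a 32-bit REAL and incremented by 1 sticks at 2^24 = 16,777,216. That is also a failure that could look like a variable "overflowing" at large photon numbers. The Python sketch below emulates single-precision addition with the standard `struct` module; `to_single` and `add_single` are illustrative names.

```python
import struct


def to_single(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def add_single(total: float, increment: float) -> float:
    """One step of a running sum carried out in single precision: the exact
    double sum of two singles, rounded once to single."""
    return to_single(to_single(total) + to_single(increment))
```

For example, `add_single(2.0**24, 1.0)` returns `16777216.0` unchanged, while the same sum in double precision is `16777217.0`.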

--
Ivan Reid, Electronic & Computer Engineering, ___ CMS Collaboration,
Brunel University. Ivan.Reid@[brunel.ac.uk|cern.ch] Room 40-1-B12, CERN
Klemens Barfus

2005-08-26, 3:56 am

Dear list members,

thanks to everybody who contributed to my question about debugging a
Monte Carlo code!
In the end I found some of the errors by disabling the random number
generator and tracking the photons that caused them.
I'm aware that there will still be some combinations of photon positions
and directions of travel that cause errors [as Rich mentioned], but
hopefully I will not run into them in my simulations :-)
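Replacing the random number generator, as described above, is easier if the transport code draws from any object with a `random()` method, so a hand-written sequence can be substituted. The Python sketch below is hypothetical (`ScriptedRNG`, `sample_direction` and `distance_to_face` are illustrative names, not from the thread). It forces xi = 0.5, which makes the sampled direction cosine exactly 0.0, a case a 1-D distance-to-face calculation has to guard against.

```python
import math


class ScriptedRNG:
    """Stand-in for the real generator: returns a fixed list of values so a
    single photon's path can be replayed exactly."""

    def __init__(self, values):
        self._values = iter(values)

    def random(self) -> float:
        return next(self._values)


def sample_direction(rng) -> float:
    # Isotropic direction cosine in [-1, 1); xi = 0.5 gives exactly 0.0.
    return 2.0 * rng.random() - 1.0


def distance_to_face(x: float, u: float, w: float) -> float:
    """Distance from x, inside cell [i*w, (i+1)*w), to the face the photon
    is heading for, given its direction cosine u."""
    i = math.floor(x / w)
    if u > 0.0:
        return ((i + 1) * w - x) / u
    if u < 0.0:
        return (x - i * w) / (-u)
    return math.inf  # u == 0: no face is reached; without this guard, divide by zero
```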

Cheers,

Klemens
Copyright 2008 codecomments.com