Code Comments

64-bit c++ application crashing on solaris
Hi,
We are trying to migrate an existing C++ application on Solaris from a
32-bit build to a 64-bit build.
We have successfully compiled the application in 64-bit mode, but we are
facing problems while running it.
The application seems to crash intermittently for no specific
reason.
We think this could be because not enough memory is available
to the 64-bit application.
We investigated various tunable parameters in Solaris for this
purpose.
A brief description of them is listed below --


lwp_default_stksize

Specifies the default value of the stack size to be used when a kernel
thread is created, and when the calling routine does not provide an
explicit size to be used.

Data Type
Integer

Range
Minimum is the default value:
3 x PAGESIZE on SPARC systems  (3 *  8192 = 24576)
Maximum is 32 times the default value.

Units
Bytes in multiples of the value returned by the getpagesize
parameter. For more information, see getpagesize(3C).

Dynamic?
Yes. Affects threads created after the variable is changed.

Validation
Must be greater than or equal to 8192 and less than or equal to
262,144 (256 x 1024). Also must be a multiple of the system page size.
If these conditions are not met, the following message is displayed:
Illegal stack size, Using N
The value of N is the default value of lwp_default_stksize.

When to Change
When the system panics because it has run out of stack space. The
best solution for this problem is to determine why the system is
running out of space and then make a correction.
Increasing the default stack size means that almost every kernel
thread will have a larger stack, resulting in increased kernel memory
consumption for no good reason. Generally, that space will be unused.
The increased consumption means other resources that are competing for
the same pool of memory will have the amount of space available to
them reduced, possibly decreasing the system's ability to perform
work. Among the side effects is a reduction in the number of threads
that the kernel can create. This solution should be treated as no more
than an interim workaround until the root cause is remedied.
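
The Validation rule quoted above for lwp_default_stksize can be restated as a small check. This is only a sketch of the documented conditions, not the kernel's actual code; the function name is hypothetical and the 8192-byte page size in the usage note is an assumption taken from the SPARC figure in the Range entry.

```cpp
#include <cassert>
#include <cstddef>

// Restates the documented conditions: 8192 <= value <= 262,144 (256 x 1024),
// and value must be a multiple of the system page size.
// Hypothetical helper for illustration, not a Solaris API.
bool is_valid_lwp_default_stksize(std::size_t value, std::size_t page_size) {
    assert(page_size > 0);  // a zero page size would make the modulo undefined
    const std::size_t kMin = 8192;
    const std::size_t kMax = 256 * 1024;
    return value >= kMin && value <= kMax && value % page_size == 0;
}
```

With an 8192-byte page, is_valid_lwp_default_stksize(3 * 8192, 8192) holds, while 12288 is rejected because it is not a multiple of the page size.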

segkpsize
Specifies the amount of kernel pageable memory available. This
memory is used primarily for kernel thread stacks. Increasing this
number allows either larger stacks for the same number of threads or
more threads. This parameter can only be set on systems running 64-bit
kernels. Systems running 64-bit kernels use a default stack size of 24
Kbytes.

Data Type
Unsigned long

Default
64-bit kernels, 2 Gbytes
32-bit kernels, 512 Mbytes

Range
64-bit kernels, 512 Mbytes - 24 Gbytes
32-bit kernels, 512 Mbytes

Units
Mbytes

Dynamic?
No

Validation
Value is compared to minimum and maximum sizes (512 Mbytes and 24
Gbytes for 64-bit systems) and if smaller than the minimum or larger
than the maximum, it is reset to 2 Gbytes and a message to that effect
is displayed.
The actual size used in creation of the cache is the lesser of the
value specified in segkpsize after the constraints checking and 50% of
physical memory.

When to Change
This is one of the steps necessary to support large numbers of
processes on a system. The default size of 2 Gbytes, assuming at least
1 Gbyte of physical memory is present, allows creation of 24-Kbyte
stacks for more than 87,000 kernel threads. The size of a stack in a
64-bit kernel is the same whether the process is a 32-bit process or a
64-bit process. If more than this number is needed, segkpsize can be
increased assuming sufficient physical memory exists.
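
The "more than 87,000 kernel threads" figure follows from dividing the default segkpsize by the stack size. A minimal sketch of that arithmetic, assuming Gbytes and Kbytes mean 2^30 and 2^10 bytes; the helper name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Number of fixed-size kernel thread stacks that fit in a segkp area.
// Hypothetical helper for illustration only.
std::uint64_t stacks_that_fit(std::uint64_t segkp_bytes, std::uint64_t stack_bytes) {
    assert(stack_bytes > 0);  // avoid division by zero
    return segkp_bytes / stack_bytes;
}
```

For the defaults quoted above, stacks_that_fit(2ULL << 30, 24 * 1024) gives 87381, which agrees with the documented figure.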


Does anyone have an idea about these parameters (or any other related
params), and if they could be helpful in resolving the issue at hand ?
Or is there any other area, we should look at which could help in this
case ?

Sumir
03-28-08 01:11 PM


Re: 64-bit c++ application crashing on solaris
Sumir <sumirmehta@gmail.com> writes:

>we are trying to migrate an existing C++ application on solaris
>compiled in 32-bit env to 64-bit env.
>We have successfully compiled the application in 64-bit mode. We are
>facing problems while running the compiled application.
>The application seems to be crashing intermittently for no specific
>reason.

Surely it crashes somewhere?  "Intermittently, unspecific"
feels like memory corruption.

>We think this could probably be due to sufficient memory not available
>to the 64-bit application.

That would result in NULL being returned from allocation functions
and exceptions to be thrown.
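
A minimal sketch of what those two failure modes look like from C++, so they can be told apart from a SIGSEGV. The enum and function names are hypothetical; only operator new, std::nothrow and std::bad_alloc come from the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

enum AllocOutcome { kAllocated, kReturnedNull, kThrewBadAlloc };

// Non-throwing form: running out of memory is reported as a null pointer.
AllocOutcome try_nothrow_new(std::size_t bytes) {
    assert(bytes > 0);
    void* p = ::operator new(bytes, std::nothrow);
    if (p == 0) return kReturnedNull;
    ::operator delete(p);
    return kAllocated;
}

// Default form: running out of memory is reported by throwing std::bad_alloc.
AllocOutcome try_throwing_new(std::size_t bytes) {
    assert(bytes > 0);
    try {
        void* p = ::operator new(bytes);
        ::operator delete(p);
        return kAllocated;
    } catch (const std::bad_alloc&) {
        return kThrewBadAlloc;
    }
}
```

Either outcome is a clean, reportable error. Neither one by itself produces a fault at an unmapped address.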

>we investigated various tunable parameters in solaris for the same
>purpose.
>A brief description of them is listed below --

>lwp_default_stksize

Not relevant to user processes; this is a *kernel* stack limit.

>segkpsize

Something to do with the kernel, not user processes.

>Does anyone have an idea about these parameters (or any other related
>params), and if they could be helpful in resolving the issue at hand ?
>Or is there any other area, we should look at which could help in this
>case ?

If the crash is intermittent then it's not likely a resource issue
unless the crash happens when resources are allocated or shortly
afterwards; it is unlikely that a 64 bit app uses many more
resources than its 32 bit counterpart and you would notice it running
out of swap space (memory resources) and likely out of stack space.
(which you can change using ulimit)
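
The stack limit that ulimit adjusts can also be read from inside the process with the POSIX getrlimit call. A minimal sketch, with hypothetical helper names; it only reports the limit and changes nothing.

```cpp
#include <sys/resource.h>  // getrlimit, RLIMIT_STACK (POSIX)

#include <cassert>
#include <sstream>
#include <string>

// Formats one limit value in bytes, or "unlimited".
std::string describe_limit(rlim_t value) {
    if (value == RLIM_INFINITY) return "unlimited";
    std::ostringstream os;
    os << value << " bytes";
    return os.str();
}

// Returns a string of the form "soft=<limit> hard=<limit>" for this process.
std::string stack_limit_summary() {
    struct rlimit rl;
    const int rc = getrlimit(RLIMIT_STACK, &rl);
    assert(rc == 0);  // fails only for an invalid resource or a bad pointer
    (void)rc;
    return "soft=" + describe_limit(rl.rlim_cur) +
           " hard=" + describe_limit(rl.rlim_max);
}
```

Logging stack_limit_summary() at startup of both the 32-bit and the 64-bit build is a cheap way to confirm whether the two actually run under different limits.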

Casper
--
Expressed in this posting are my opinions.  They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

Casper H.S. Dik
03-29-08 12:22 AM


Re: 64-bit c++ application crashing on solaris
Sumir <sumirmehta@gmail.com> writes:

> The application seems to be crashing intermittently for no specific
> reason.

Oh, there *is* a reason (or a few).

The very first question you should ask is where exactly is it
crashing? Use debugger to find out.

> We think this could probably be due to sufficient memory not available
> to the 64-bit application.

You have not presented any reason why you'd think that.

Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.

Paul Pluzhnikov
03-29-08 09:41 AM


Re: 64-bit c++ application crashing on solaris
On Mar 28, 7:29 pm, Casper H.S. Dik <Casper....@Sun.COM> wrote:
> Sumir <sumirme...@gmail.com> writes: 
>
> Surely it crashes somewhere?  "Intermittently, unspecific"
> feels like memory corruption.
> 
>
> That would result in NULL being returned from allocation functions
> and exceptions to be thrown.
> 
>
> Not relevant to user processes; this is a *kernel* stack limit.
> 
>
> Something to do with the kernel, not user processes.
> 
>
> If the crash is intermittent then it's not likely a resource issue
> unless the crash happens when resources are allocated or shortly
> afterwards; it is unlikely that a 64 bit app uses many more
> resources than its 32 bit counterpart and you would notice it running
> out of swap space (memory resources) and likely out of stack space.
> (which you can change using ulimit)
>
> Casper
> --
> Expressed in this posting are my opinions.  They are in no way related
> to opinions held by my employer, Sun Microsystems.
> Statements on Sun products included here are not gospel and may
> be fiction rather than truth.



Hi Casper,

Thanks for your reply.

The application does not crash upon resource allocation. In fact it
does run for some time before it dies. Also, this behaviour is visible
when there are multiple clients (say a multithreaded process)
connected to this application and hitting it continuously with
requests.

Can you briefly elaborate on the usage of ulimit, and how it could
affect the application in our case?

Sumir
03-31-08 10:46 AM


Re: 64-bit c++ application crashing on solaris
On Mar 29, 10:14 am, Paul Pluzhnikov <ppluzhnikov-...@gmail.com>
wrote:
> Sumir <sumirme...@gmail.com> writes: 
>
> Oh, there *is* a reason (or a few).
>
> The very first question you should ask is where exactly is it
> crashing? Use debugger to find out.
> 
>
> You have not presented any reason why you'd think that.
>
> Cheers,
> --
> In order to understand recursion you must first understand recursion.
> Remove /-nsp/ for email.

Hi Paul,

I ran a truss on the application to have a trace. It resulted into
this ...

lwp_sema_post(0xFFFFFFFF7D003D60)               = 0
lwp_mutex_lock(0xFFFFFFFF7E72B068)              = 0
read(39, " 3 e c 0 a 8 f 1 c 7 d 0".., 512)     = 512
lwp_mutex_wakeup(0xFFFFFFFF7E72B068)            = 0
time()                                          = 1205757131
lwp_mutex_lock(0xFFFFFFFF7E72B068)              = 0
Incurred fault #6, FLTBOUNDS  %pc = 0xFFFFFFFF7E449BBC
siginfo: SIGSEGV SEGV_MAPERR addr=0x0FAE0350
Received signal #11, SIGSEGV [default]
siginfo: SIGSEGV SEGV_MAPERR addr=0x0FAE0350
*** process killed ***


Seems there is some memory access violation. But the thing is, this
same application compiled in 32-bit mode, run on the same environment
(same machine) fares well. So if it were something to do within the
code, in way of accessing memory wrongly, it should have surfaced in
the 32-bit version as well.

Sumir
03-31-08 10:46 AM


Re: 64-bit c++ application crashing on solaris
Sumir wrote:
> On Mar 29, 10:14 am, Paul Pluzhnikov <ppluzhnikov-...@gmail.com>
> wrote: 
>
> Hi Paul,
>
> I ran a truss on the application to have a trace. It resulted into
> this ...
>
What do you see in your debugger?  If there's a core file, load that.

--
Ian Collins.

Ian Collins
03-31-08 10:46 AM


Re: 64-bit c++ application crashing on solaris
On Mar 31, 11:22 am, Ian Collins <ian-n...@hotmail.com> wrote:
> Sumir wrote:
>
> What do you see in your debugger?  If there's a core file, load that.
>
> --
> Ian Collins.- Hide quoted text -
>
> - Show quoted text -



I did have a couple of core files. Loading them gives the following
stack trace --

CORE 1 -->

=>[1] __rwstd::time_reader<char,std::istreambuf_iterator<char,std::char_traits<char> 0xffffffff7ac06978, 0x0, 0x10fe72590, 0x1005c20d8), at 0x100386884
[2] std::time_get<char,std::istreambuf_iterator<char,std::char_traits<char> 0xffffffff7ac0688c, 0xffffffff7ac06800), at 0x100384078
[3] Date::Date(0x10bc4f910, 0xffffffff7ac06a18, 0x6800, 0x6070, 0x1005c20d8, 0x68a0), at 0x1001b17f0
[4] ImagineToDbml::getCalendar(0x80000006c, 0x1, 0x1005d0748, 0xffffffff7ac069b0, 0x1000000f4, 0x1005c20d8), at 0x1001184e0
[5] IMDFn::getCalendarXML(0xffffffff7ac07238, 0xffffffff7ac06dd8, 0xffffffff7ac07250, 0xffffffff7ac06cf0, 0xffffffff7ac06d60, 0xffffffff7ac06fd0), at 0x1000b6dc4
[6] SOAPServer::doFnRequest(0xffffffff7fffc810, 0x10064fb78, 0xffffffff7ac07728, 0xffffffff7ac07250, 0xffffffff7ac07250, 0x1005cbe30), at 0x1000e9ccc
[7] SOAPServer::callFn(0xffffffff7fffc810, 0xffffffff7ac07210, 0x10047b7aa, 0x10064fb78, 0xffffffff7ac07728, 0xffffffff7ac07690), at 0x1000e9b20
[8] SOAPServer::onBody(0xffffffff7fffc810, 0xffffffff7ac07478, 0xffffffff7ac073f8, 0xffffffff7ac07728, 0xffffffff7ac07690, 0xffffffff), at 0x1000e97b8
[9] SOAPServer::onMessage(0xffffffff7fffc810, 0xffffffff7ac07608, 0x1005c20d8, 0xffffffff7ac07728, 0xffffffff7ac07690, 0x0), at 0x1000e9408
[10] SOAPServerTCPIP::onReceive(0xffffffff7fffc810, 0x1074ded50, 0x0, 0xbd, 0x105b99060, 0x0), at 0x1000ed170
[11] IOServer::check(0x1005d3988, 0x0, 0x1074ded50, 0x1079cfd00, 0x32, 0x1079cfd38), at 0x1001009f4
[12] IOServerMT::ChildServer::loop(0x1079cfd00, 0x32, 0x64, 0xffffffff7e3912ac, 0x0, 0x0), at 0x1000ff640
[13] IOServerMT::ChildServer::run(0x1079cfd00, 0x15, 0x100639e38, 0x0, 0x0, 0x0), at 0x1001018cc
[14] Thread::entryFun(0x1079cfdc8, 0xffffffff7e722bb0, 0x0, 0x1, 0xffffffff7e720000, 0x0), at 0x1001afe9c


CORE 2 -->

=>[1] __rwstd::timepunct_data<char>::__initpat(0xb0, 0x10060d3d8, 0x10fe3d490, 0x1005c20d8, 0x0, 0x0), at 0x1003956ac
[2] __rwstd::timepunct<char>::__initfacet(0x10fe32730, 0x1005d5e50, 0x1, 0x2, 0x0, 0x10fe32760), at 0x10039466c
[3] std::locale::__install(0x1005d5e50, 0x10fe32730, 0x10060ded0, 0x10036aae4, 0x1005c20d8, 0x1), at 0x10036ab78
[4] std::locale::__make_explicit(0x1005d5e50, 0x10060ded0, 0x1, 0x100, 0x100382db8, 0x5800), at 0x10036aaf4
[5] std::time_get<char,std::istreambuf_iterator<char,std::char_traits<char> 0x1005d5ea0), at 0x100382d1c
[6] std::locale::__install(0x1005d5e50, 0x10fe37b20, 0x10060ddb0, 0x10036aae4, 0x1005c20d8, 0x0), at 0x10036ab78
[7] std::locale::__make_explicit(0x1005d5e50, 0x10060ddb0, 0x1, 0x100, 0x1001b1900, 0x5800), at 0x10036aaf4
[8] Date::Date(0x10fe42c50, 0x100481fdc, 0x6800, 0x6070, 0x1005c20d8, 0x68a0), at 0x1001b1770
[9] ImagineToDbml::getCalendar(0xffffffff7be04dd8, 0xffffffff7be04f28, 0x1005d0748, 0xffffffff7be049b0, 0xffffffff7be04fb8, 0xffffffff7be04e27), at 0x1001184c4
[10] IMDFn::getCalendarXML(0xffffffff7be05238, 0xffffffff7be04dd8, 0xffffffff7be05250, 0xffffffff7be04cf0, 0xffffffff7be04d60, 0xffffffff7be04fd0), at 0x1000b6dc4
[11] SOAPServer::doFnRequest(0xffffffff7fffc800, 0x10064fb78, 0xffffffff7be05728, 0xffffffff7be05250, 0xffffffff7be05250, 0x1005cbe30), at 0x1000e9ccc
[12] SOAPServer::callFn(0xffffffff7fffc800, 0xffffffff7be05210, 0x10047b7aa, 0x10064fb78, 0xffffffff7be05728, 0xffffffff7be05690), at 0x1000e9b20
[13] SOAPServer::onBody(0xffffffff7fffc800, 0xffffffff7be05478, 0xffffffff7be053f8, 0xffffffff7be05728, 0xffffffff7be05690, 0xffffffff), at 0x1000e97b8
[14] SOAPServer::onMessage(0xffffffff7fffc800, 0xffffffff7be05608, 0x1005c20d8, 0xffffffff7be05728, 0xffffffff7be05690, 0x0), at 0x1000e9408
[15] SOAPServerTCPIP::onReceive(0xffffffff7fffc800, 0x102fb2b80, 0x0, 0xbd, 0x1006aad30, 0x0), at 0x1000ed170
[16] IOServer::check(0x1005d3988, 0x0, 0x102fb2b80, 0x103a03270, 0x32, 0x103a032a8), at 0x1001009f4
[17] IOServerMT::ChildServer::loop(0x103a03270, 0x32, 0x64, 0xffffffff7e3912ac, 0x0, 0x0), at 0x1000ff640
[18] IOServerMT::ChildServer::run(0x103a03270, 0x15, 0x100639e38, 0x0, 0x0, 0x0), at 0x1001018cc
[19] Thread::entryFun(0x103a03338, 0xffffffff7e722bb0, 0x0, 0x1, 0xffffffff7e720000, 0x0), at 0x1001afe9c


Sumir
03-31-08 10:46 AM


Re: 64-bit c++ application crashing on solaris
Sumir wrote:
> On Mar 31, 11:22 am, Ian Collins <ian-n...@hotmail.com> wrote: 

*Please* don't quote signatures or that google nonsense.

>
>
> I did have a couple of core files. Loading them gives the following
> stack trace --
>
Your Date constructor looks to be a prime contender, run the application
under dbx until it crashes and see what it is passing to do_get_date.

I'm guessing you are building with Sun Studio, using the default STL.
If so, try stlport4.
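
The source of Date::Date is not shown in the thread, so the following is only a hypothetical stand-in for a constructor that parses a date through the std::time_get facet. It uses std::get_time (C++11, so newer than the toolchain discussed here), which goes through that same facet, and it checks the stream state so that bad input is reported rather than silently used. The function name and the "%m/%d/%Y" format are assumptions.

```cpp
#include <cassert>
#include <ctime>
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical stand-in for the date parsing done inside Date::Date.
// Returns false, leaving *out untouched, if the text does not parse.
bool parse_date(const std::string& text, std::tm* out) {
    assert(out != 0);
    std::istringstream in(text);
    in.imbue(std::locale::classic());  // fixed "C" locale, independent of the environment
    std::tm parsed = std::tm();
    in >> std::get_time(&parsed, "%m/%d/%Y");
    if (in.fail()) return false;
    *out = parsed;
    return true;
}
```

Logging the input string whenever parse_date returns false is one way to see what is being passed down to the facet without attaching a debugger.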

--
Ian Collins.

Ian Collins
03-31-08 10:47 AM


Re: 64-bit c++ application crashing on solaris
On Sun, 30 Mar 2008 23:11:53 -0700 (PDT), Sumir <sumirmehta@gmail.com> wrote:
>
> I ran a truss on the application to have a trace. It resulted into
> this ...
>
> lwp_sema_post(0xFFFFFFFF7D003D60)               = 0
> lwp_mutex_lock(0xFFFFFFFF7E72B068)              = 0
> read(39, " 3 e c 0 a 8 f 1 c 7 d 0".., 512)     = 512
> lwp_mutex_wakeup(0xFFFFFFFF7E72B068)            = 0
> time()                                          = 1205757131
> lwp_mutex_lock(0xFFFFFFFF7E72B068)              = 0
>     Incurred fault #6, FLTBOUNDS  %pc = 0xFFFFFFFF7E449BBC
>       siginfo: SIGSEGV SEGV_MAPERR addr=0x0FAE0350
>     Received signal #11, SIGSEGV [default]
>       siginfo: SIGSEGV SEGV_MAPERR addr=0x0FAE0350
>         *** process killed ***
>
> Seems there is some memory access violation. But the thing is, this
> same application compiled in 32-bit mode, run on the same environment
> (same machine) fares well. So if it were something to do within the
> code, in way of accessing memory wrongly, it should have surfaced in
> the 32-bit version as well.

Not necessarily.  There are many programs which implicitly assume that
`int' is large enough to hold a memory address.  They tend to work fine
in 32-bit mode (because their assumption happens to be true), but fail
randomly in 64-bit mode.
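
A minimal sketch of that failure mode, assuming an LP64 build where int is 32 bits and pointers are 64 bits. To keep it deterministic it works on address values rather than real pointers; the sample values are copied from the truss output earlier in the thread purely as examples, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Round-trips an address value through a 32-bit int, as code that stores
// pointers in `int` effectively does. The narrowing conversion is
// implementation-defined before C++20; typical compilers keep the low 32 bits.
std::uint64_t address_via_int(std::uint64_t address) {
    const std::int32_t stored = static_cast<std::int32_t>(address);
    // Widening back sign-extends whatever survived the truncation.
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(stored));
}

// The same round trip through uintptr_t, which is wide enough for a pointer.
std::uint64_t address_via_uintptr(std::uint64_t address) {
    assert(sizeof(std::uintptr_t) == 8);  // this sketch assumes an LP64 build
    const std::uintptr_t stored = static_cast<std::uintptr_t>(address);
    return static_cast<std::uint64_t>(stored);
}
```

Here address_via_int(0xFFFFFFFF7E72B068) comes back as 0x7E72B068, while an address that already fits in 32 bits, such as 0x0FAE0350, survives unchanged. That is why the same code can appear correct in a 32-bit build.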


Giorgos Keramidas
03-31-08 10:47 AM


Re: 64-bit c++ application crashing on solaris
Sumir wrote:
> I ran a truss on the application to have a trace. It resulted into
> this ...
>
> lwp_sema_post(0xFFFFFFFF7D003D60)               = 0
> lwp_mutex_lock(0xFFFFFFFF7E72B068)              = 0
> read(39, " 3 e c 0 a 8 f 1 c 7 d 0".., 512)     = 512
> lwp_mutex_wakeup(0xFFFFFFFF7E72B068)            = 0
> time()                                          = 1205757131
> lwp_mutex_lock(0xFFFFFFFF7E72B068)              = 0
>     Incurred fault #6, FLTBOUNDS  %pc = 0xFFFFFFFF7E449BBC
>       siginfo: SIGSEGV SEGV_MAPERR addr=0x0FAE0350
>     Received signal #11, SIGSEGV [default]
>       siginfo: SIGSEGV SEGV_MAPERR addr=0x0FAE0350
>         *** process killed ***
>
>
> Seems there is some memory access violation. But the thing is, this
> same application compiled in 32-bit mode, run on the same environment
> (same machine) fares well. So if it were something to do within the
> code, in way of accessing memory wrongly, it should have surfaced in
> the 32-bit version as well.

It looks like a typical sign extension problem.  It's the most common
bug when porting to 64-bit and only shows itself when 'long' is 64-bit
wide.  On 32-bit, long and int are the same width, so it doesn't happen;
whatever arithmetics you can do with int, apply to long too since
they're the same on 32-bit.

You have to inspect the code for constants like 0xFFFF, 0xFFFF0000 (and
similar patterns) and make them unsigned long if needed (UL), search for
things like shift operations with longs, longs that ought to be unsigned
longs, stuff like that.  Travel up the stacktrace of the crash to see
when you hit code that looks like what I just described.
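
Two small sketches of the kind of pattern to look for, assuming an LP64 build with a 32-bit int. Nothing here is taken from the application; the function names and the page-mask example are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Sign extension: a 32-bit int with its top bit set is negative, so widening
// it to 64 bits fills the upper half with ones.
std::uint64_t widen_from_int(std::uint32_t bits) {
    const std::int32_t as_int = static_cast<std::int32_t>(bits);
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(as_int));
}

// Buggy mask: ~0xFFFu is a 32-bit unsigned constant (0xFFFFF000). It is
// zero-extended before the &, so the upper 32 bits of the address are cleared.
std::uint64_t page_base_buggy(std::uint64_t address) {
    assert(sizeof(unsigned) == 4);  // the sketch assumes a 32-bit unsigned int
    return address & ~0xFFFu;
}

// Fixed mask: build the constant at 64-bit width before inverting it.
std::uint64_t page_base_fixed(std::uint64_t address) {
    return address & ~static_cast<std::uint64_t>(0xFFF);
}
```

On LP64, writing the mask as ~0xFFFUL has the same effect as the fixed version, which is the UL suffix suggested above.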

Nikos Chantziaras
03-31-08 10:47 AM


All times are GMT.

 