Failing projects (Software Engineering archive, May 2006)
| Hello,
I am working on a very large (12+ developers plus testers etc., approx.
500,000 lines of code) real-time embedded software project which is way
overdue, and we can't seem to get down to a point where the number of
critical issues (i.e. restarts, lockups etc.) is zero.
It does seem that the developers (external company) are injecting new
problems by rushing the fixes for the old problems although this is just
a hunch.
Is there anything (and I mean anything) that we can do NOW to make the
software more stable and also prevent "knock on" issues?
In an ideal world we would have started with full processes and
procedures in place, however they didn't so here I am :-(
Thanks
Paul
| |
| Bradley K. Sherman 2006-04-25, 10:00 pm |
| In article <e2ja8t$ls3$1@emma.aioe.org>, paul <paul@xxx.co.uk> wrote:
>Hello,
> I am working on an a very large (12+ developers plus testers etc approx
>500, 000 lines of code) real time embedded software project which is way
>overdue and also we cant seem to get down to a point where the number of
>critical issues (ie restarts, lockups etc) is zero.
>
>It does seem that the developers (external company) are injecting new
>problems by rushing the fixes for the old problems although this is just
>a hunch.
>
>Is there anything (and I mean anything) that we can do NOW to make the
>software more stable and also prevent "knock on" issues?
Three things:
1) Make management aware of the problem,
2) Make management aware of the problem,
3) Set a reasonable new delivery date.
--bks
| |
| James Bond 007 2006-04-25, 10:00 pm |
|
"Bradley K. Sherman" <bks@panix.com> wrote in message
news:e2jb8d$1rc$1@reader1.panix.com...
> In article <e2ja8t$ls3$1@emma.aioe.org>, paul <paul@xxx.co.uk> wrote:
>
> Three things:
> 1) Make management aware of the problem,
> 2) Make management aware of the problem,
> 3) Set a reasonable new delivery date.
>
> --bks
Perhaps try to get the developers to do some unit and integration
testing to make sure that new problems are not being introduced. Ask
them to do some code reviews.
Long term, have a post-mortem and implement some process changes so that
these kinds of problems don't happen next time. Besides what I already
mentioned, have more realistic schedules and add in a 15%-20% fudge
factor for the unexpected problems that always occur, especially in
organizations such as yours that do not have good processes.
| |
| Bill Stevenson 2006-04-25, 10:00 pm |
| On 2006-04-24 15:53:09 -0400, paul <paul@xxx.co.uk> said:
> Hello,
> I am working on an a very large (12+ developers plus testers etc
> approx 500, 000 lines of code) real time embedded software project
> which is way overdue and also we cant seem to get down to a point where
> the number of critical issues (ie restarts, lockups etc) is zero.
>
> It does seem that the developers (external company) are injecting new
> problems by rushing the fixes for the old problems although this is
> just a hunch.
>
> Is there anything (and I mean anything) that we can do NOW to make the
> software more stable and also prevent "knock on" issues?
>
> In an ideal world we would have started with full processes and
> procedures in place, however they didn't so here I am :-(
>
> Thanks
> Paul
Can you tell us more about your current testing methods? How do you
prioritize bugs? How are new ones discovered? For example, do you have
tests that you run on every build of the software to identify when an
issue is fixed and that a component functions correctly?
Here's another way of looking at it: What are your criteria for
declaring the project finished? Are you currently at a point where the
project is feature-complete and you're just trying to shake out bugs?
Or are you actively still adding features while also quickly patching
bugs introduced by the features you added yesterday?
On the low level, does your build process still generate compiler
warnings? Have you run the code through a lint-like tool? Those are
certainly low-hanging fruit for getting rid of some crashes. (Here I
assume you're using C.) Is the problem buggy code in general, or
buggy features, as in features implemented incorrectly compared to
their specification?
I wish that I could give you some magical advice, but like the others
will likely say, you should try to put together unit tests and
integration tests. Certainly you need to figure out what bugs need to
be fixed to ship, and triage them with the other less serious bugs. Of
course in your embedded situation, if you're unable to deliver updates
easily, then you'll want to organize your priorities differently.
If you tell us a little more, the newsgroup can perhaps be more
helpful. Good luck.
--
Bill Stevenson
Mac OS X Product Release Group
Apple Computer
bstevenson at remove this text dot apple dot com
For Technical Support, refer to http://www.apple.com/support
| |
| STE ;¬! 2006-04-25, 10:00 pm |
| paul wrote:
> It does seem that the developers (external company) are injecting new
> problems by rushing the fixes for the old problems although this is
> just a hunch.
>
> Is there anything (and I mean anything) that we can do NOW to make
> the software more stable and also prevent "knock on" issues?
Well assuming the external company is off site to you, and assuming you
can take some kind of portable test centre with you (laptop?), I would
suggest you get yourself to them and make them hand over the code build to
you as a first quality gateway before anyone else gets to see it.
The theory being that you have a quicker turnaround of severe defects,
and the external company might work a little better when they have an
on site human face to deliver to/communicate with.
--
STE ;¬!
| |
| Matthias Wolpers 2006-04-25, 10:00 pm |
| In article <e2ja8t$ls3$1@emma.aioe.org>, paul <paul@xxx.co.uk> wrote:
> Hello,
> I am working on an a very large (12+ developers plus testers etc approx
> 500, 000 lines of code) real time embedded software project which is way
> overdue and also we cant seem to get down to a point where the number of
> critical issues (ie restarts, lockups etc) is zero.
having been in a similar situation, the following sort of worked for us:
a) make an automated stress test
b) run this nightly (and do invest in one fancy graphic (i do mean one,
not two or twenty) that will show the worry decreasing when it happens:
marketing is important!)
c) use the failure rate as control device for code change management:
anything that ups the fail rate gets backed out. ignore all non
stability bugs for the time being
d) get a senior joe to sit in the daily test run appraisals with the
express purpose of keeping the not-my-fault-brigade in check
e) get those externals on-site and don't let them go back until they are
done.
please note that i don't _like_ all of these in equal measure: e.g. i wish
to believe the world will one day work so that d) is unnecessary. also,
the psychological assumptions behind e) are rather closer to slander
than i would allow. however, this is what worked.
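Point c) above can be sketched roughly as follows. This is only an illustration under assumptions: the `NightlyRun` record, its fields, the build names, and the `tolerance` parameter are all hypothetical and not taken from any real project or tool.

```python
from dataclasses import dataclass


@dataclass
class NightlyRun:
    """Result of one nightly stress run (all fields are hypothetical)."""
    build_id: str
    iterations: int  # stress iterations attempted
    failures: int    # restarts, lockups and other critical failures

    @property
    def fail_rate(self) -> float:
        # Treat a run that executed nothing as a total failure.
        return self.failures / self.iterations if self.iterations else 1.0


def should_back_out(baseline: NightlyRun, candidate: NightlyRun,
                    tolerance: float = 0.0) -> bool:
    """Return True when the candidate build fails more often than the baseline."""
    return candidate.fail_rate > baseline.fail_rate + tolerance


baseline = NightlyRun("build-101", iterations=2000, failures=40)
candidate = NightlyRun("build-102", iterations=2000, failures=55)
print(should_back_out(baseline, candidate))
```

A non-zero `tolerance` may be useful if the stress test is noisy from night to night, so that a change is not backed out over random variation alone.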
when you have that in place, you can identify the top 3 problem areas
and start working them (this is basically just divide and conquer) :
review the specs, review the code, and generally one by one plug all the
holes that were left during the first pass. you may again need d) to
push this through. it'll create a sense of getting somewhere after all,
mostly.
finding the _real_ top 3 isn't all that important: when your system is a
mess, any focus area will yield results, and results are what you need. i
found that simply counting assigned bugs was a plausible indicator of
buggy function areas so don't start a committee over this:-)
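Counting assigned bugs per function area needs very little tooling. A minimal sketch, assuming the bug tracker can export (area, bug id) pairs; the area names and ids below are made up for illustration:

```python
from collections import Counter


def top_problem_areas(bugs, n=3):
    """Return the n function areas with the most assigned bugs."""
    counts = Counter(area for area, _bug_id in bugs)
    return counts.most_common(n)


# Hypothetical export from a bug tracker: (function area, bug id).
bugs = [
    ("scheduler", 101), ("flash driver", 102), ("scheduler", 103),
    ("ui", 104), ("comms stack", 105), ("scheduler", 106),
    ("comms stack", 107), ("flash driver", 108), ("comms stack", 109),
    ("scheduler", 110),
]

for area, count in top_problem_areas(bugs):
    print(f"{area}: {count}")
```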
hth, and good luck,
matthias
>
> It does seem that the developers (external company) are injecting new
> problems by rushing the fixes for the old problems although this is just
> a hunch.
>
> Is there anything (and I mean anything) that we can do NOW to make the
> software more stable and also prevent "knock on" issues?
>
> In an ideal world we would have started with full processes and
> procedures in place, however they didn't so here I am :-(
>
> Thanks
> Paul
| |
| Jose Cornado 2006-04-25, 10:00 pm |
| "It does seem that the developers (external company) are injecting new
problems by rushing the fixes for the old problems although this is
just a hunch."
The solution is to re-test everything all the time. Could you achieve
that? I do not know. It will all depend on what you try to minimize
(operative word) to increase output.
But given that you are dealing with a rte system, I would assume that
you have low tolerance for failures.
For the short term, you have plenty of advice here.
For the long term, try to find an angel high in the food chain.
Automation is a lot about politics.
Make sure your boss and your boss's boss are in.
If things start to work out, the first thing that the CEO/VP will say
is: how come we didn't do this before? And your boss does not want to be
the one who couldn't deliver it before.
So if he is not the one getting the credit or part of the credit, he
(or she) will try to derail the effort.
| |
| H. S. Lahman 2006-04-25, 10:00 pm |
| Responding to Paul...
> Hello,
> I am working on an a very large (12+ developers plus testers etc
> approx 500, 000 lines of code) real time embedded software project which
> is way overdue and also we cant seem to get down to a point where the
> number of critical issues (ie restarts, lockups etc) is zero.
This just announces the fact that there were serious process problems in
the development that led to a poor architecture. Unfortunately, it
announces it far too late...
>
> It does seem that the developers (external company) are injecting new
> problems by rushing the fixes for the old problems although this is just
> a hunch.
If you are providing a single acceptance test suite, then you would
probably know this as a fact because the test suite would fail in new
ways as problems were fixed. (OTOH, if the problems found so far are
very fundamental and fatal, the test suite may simply not be executing
fully.)
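One way to turn the hunch into a fact is to diff the results of the same acceptance suite across two deliveries. The sketch below assumes each run can be reduced to a mapping from test name to pass/fail; the test names are hypothetical. It also reports tests that did not execute at all, which covers the case where an early fatal failure hides the rest of the suite.

```python
def compare_runs(previous, current):
    """Compare two runs of the same suite (dicts of test name -> passed?)."""
    # Passed before, fails now: a likely knock-on problem.
    regressions = sorted(name for name, ok in previous.items()
                         if ok and current.get(name) is False)
    # Failed before, passes now.
    fixed = sorted(name for name, ok in previous.items()
                   if not ok and current.get(name) is True)
    # Present before, missing now: the suite may not be executing fully.
    not_run = sorted(name for name in previous if name not in current)
    return regressions, fixed, not_run


previous = {"boot": True, "watchdog": False,
            "uart_loopback": True, "flash_write": True}
current = {"boot": True, "watchdog": True, "uart_loopback": False}

regressions, fixed, not_run = compare_runs(previous, current)
print("regressions:", regressions)
print("fixed:", fixed)
print("not run:", not_run)
```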
>
> Is there anything (and I mean anything) that we can do NOW to make the
> software more stable and also prevent "knock on" issues?
Probably not. However, as others have suggested, if you do not have a
single, reasonably comprehensive acceptance test suite that you control,
then you should create one ASAP. While acceptance testing of a 500
KLOC R-T/E application is woefully inadequate as an evaluation of
reliability, it is a whole lot better than nothing or a moving test
target. In an arms-length development, it may be your only control
without other visibility into the development processes.
A second step is to get one of your people who is thoroughly familiar
with the problem domain (or a qualified consultant) to do a design
review of what they have done so far. That is the only way to uncover
any fundamental design flaws so that they can be fixed. Firefighting
the symptoms without addressing the root causes could go on until
bankruptcy.
A third step is to set up a joint team to analyze the failure data with
the goal of isolating patterns that can be traced back to the
implementation. IOW, the team would be doing root cause analysis based
on all the failure data rather than reacting to specific problems. [A
joint team is necessary to represent the white box view of the
implementation and the black box view of the problem domain. IME, it is
a good bet that such a team will uncover requirements errors.]
*************
There is nothing wrong with me that could
not be cured by a capful of Drano.
H. S. Lahman
hsl@pathfindermda.com
Pathfinder Solutions -- Put MDA to Work
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
(888)OOA-PATH
| |
| Paul F. Dietz 2006-04-30, 7:04 pm |
| paul wrote:
> Is there anything (and I mean anything) that we can do NOW to make the
> software more stable and also prevent "knock on" issues?
If you don't have adequate unit tests, it's never too late to
add them. RTE software teams have successfully outsourced
unit testing, by the way:
http://csdl2.computer.org/persagen/...9/ISSRE.2004.44
| |
| Michael Bolton 2006-05-01, 7:02 pm |
| >Is there anything (and I mean anything) that we can do NOW to make the software more stable and also prevent "knock on" issues?
Who is "we"? Does "we" include someone to whom the developers report?
If yes, then there are bunches of possible things, but I hesitate to
offer them without knowing a whole bunch more about your context.
If no, then there may be plenty of things that you can do to make your
testing more efficient, to get better coverage, to handle regression
testing more capably, and so on. But there's nothing that testers can
do to make software more stable unless those testers are also
developers or managers, or unless the testers have support from the
developers and managers. Until you have budgetary authority,
scheduling authority, hiring-and-firing authority, and so on, all you
have is potential influence. Testers don't drive the bus. We're the
bus driver's friend.
So: what's your role in the project?
---Michael B.
| |
| JXStern 2006-05-07, 10:05 pm |
| On Mon, 24 Apr 2006 20:53:09 +0100, paul <paul@xxx.co.uk> wrote:
>Hello,
> I am working on an a very large (12+ developers plus testers etc approx
>500, 000 lines of code)
That is not very large, it is sort of middle-sized. I'd say "large"
starts somewhere around 100.
> real time embedded software project which is way
>overdue
Define "way". Many projects are scheduled impossibly to begin with
(nudge, nudge, wink, wink), so that 100% overrun is actually expected.
> and also we cant seem to get down to a point where the number of
>critical issues (ie restarts, lockups etc) is zero.
That is worrisome.
Under the heading of "quality is built in, you can't test it in", you
have serious problems.
>It does seem that the developers (external company) are injecting new
>problems by rushing the fixes for the old problems although this is just
>a hunch.
Are they competent?
Have you ever met with their development team?
Do you have any visibility into their process?
>Is there anything (and I mean anything) that we can do NOW to make the
>software more stable and also prevent "knock on" issues?
Get on a plane and go visit them for a w .
>In an ideal world we would have started with full processes and
>procedures in place, however they didn't so here I am :-(
Enjoy.
Such is our industry today.
J.
>
>Thanks
>Paul
| |
| carhar 2006-05-08, 8:02 am |
| If you don't know the number of test cases required to establish a
confidence interval, then there is nothing you can say about the system's
correctness. Can you now say something like "I have 95% +/- 2% confidence
in the correctness of the system"? The literature says this a lot
better: http://www.cs.utk.edu/sqrl/MBSTtutorialWeb.pdf The intent is
to verify function, not to test exhaustively, which is most likely
impossible due to the combinatorial explosion of test cases.
Ask yourself what kind of quantitative statement can you make about the
accuracy of the system now. Have you looked at using a Gamma
distribution of the defects found over time versus the total defects to
date to show the curve of when you can expect to get to your acceptable
sigma number? You can see the slope of the curve trending downward,
hopefully, over time.
I believe it was the late Dr Harlan Mills who said the second-largest
source of software defects (after requirements problems) is the
test/debug/code cycle. Defects should be sent back to the designers for
evaluation, else the programmers are introducing new design or changing
the design unintentionally and naively as they rework the code to
accommodate the "fix", meaning the designers lose intellectual control
over the system.
carl wayne
paul wrote:
> Hello,
> I am working on an a very large (12+ developers plus testers etc approx
> 500, 000 lines of code) real time embedded software project which is way
> overdue and also we cant seem to get down to a point where the number of
> critical issues (ie restarts, lockups etc) is zero.
>
> It does seem that the developers (external company) are injecting new
> problems by rushing the fixes for the old problems although this is just
> a hunch.
>
> Is there anything (and I mean anything) that we can do NOW to make the
> software more stable and also prevent "knock on" issues?
>
> In an ideal world we would have started with full processes and
> procedures in place, however they didn't so here I am :-(
>
> Thanks
> Paul