Home > Archive > Software Testing > August 2007 > identifying factors for tester evaluation
identifying factors for tester evaluation
| zubair 2007-08-11, 4:43 am |
| Hi all
I am working on identifying factors and their weightage for evaluating
a tester's performance. Can anybody suggest which factors to include and
which not?
E.g. one way can be to use the following factors:
1. Number of bugs
- For major bugs give 3 points
- For intermediate bugs give 2 points
- For minor bugs give 1 point
2. Time spent
- Total time given was 5 days; for each day use 1 point. E.g. if the
task is completed in 5 days give 5/5 points, and in case the task was
completed in less than 5 days then add 1 point for each day saved, so if
it was completed in 3 days then give the resource 7/5 points.
Similarly, if the task was completed in 7 days then give 3/5 points.
At the end just add up the points and express the total as a %. E.g. a
resource was given 5 days to complete a task and the task was
completed in 4 days with 30 bugs (10 major (10x3) + 12 intermediate (12x2)
+ 8 minor (8x1)). This will be
(10x3 + 12x2 + 8x1)/30 + 6/5
= 62/30 + 6/5
= approx. 206% + 120% = 326%, which is the total score.
I know this lacks a number of things, such as complexity of the requirements
tested, standard of development, experience of resources etc. But this
is just a rough idea and I would appreciate it if someone could improve
it.
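The scoring scheme above can be sketched in Python roughly as follows. The function and variable names are illustrative assumptions, not part of the post. The time rule is read here as (2 x allotted - actual) / allotted, which reproduces the 7/5, 5/5 and 3/5 examples, and the zero-bug case is an assumption because the post does not define it.

```python
# Hypothetical names; the weights and formula follow the post above.
SEVERITY_POINTS = {"major": 3, "intermediate": 2, "minor": 1}


def bug_score(bug_counts):
    """Weighted bug points divided by the total number of bugs found."""
    total_bugs = sum(bug_counts.values())
    if total_bugs == 0:
        # Assumption: the post does not say what to do with zero bugs.
        return 0.0
    points = sum(SEVERITY_POINTS[sev] * n for sev, n in bug_counts.items())
    return points / total_bugs


def time_score(days_allotted, days_taken):
    """One point per allotted day, plus one per day saved, minus one per day late."""
    return (2 * days_allotted - days_taken) / days_allotted


def total_score_percent(bug_counts, days_allotted, days_taken):
    """Sum of the two ratios, expressed as a percentage."""
    return 100 * (bug_score(bug_counts) + time_score(days_allotted, days_taken))


example = {"major": 10, "intermediate": 12, "minor": 8}
print(round(total_score_percent(example, days_allotted=5, days_taken=4), 1))
```

For the worked example this gives about 326.7, which the post rounds down to 326%.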
Zubair
| |
| Rajeshkz 2007-08-11, 7:21 pm |
| On Aug 11, 11:33 am, zubair <zubairaslam1...@gmail.com> wrote:
> [snip]
I'd avoid having an objective model for evaluating the performance of a
tester. The parameters you have mentioned have external factors
influencing them. For example, the module being worked on by the
tester may be buggy or near perfect, making the number of bugs found
dependent on the developer. Also, how can one decide the "total
expected time given"? What if the original estimates were incorrect?
A more subjective and effective way of evaluation would be customer
satisfaction. Again, a tester's evaluation should not be limited to the
number of defects he finds; it's more about the number and quality of
defects he can influence to be addressed before the release. Observing
testers in their activities and giving positive or negative feedback
on their accomplishments and failures is, I feel, a more effective
method. Testing is a creative activity and cannot be compared to, say,
achieving a sales target.
And always remember Deming's point #11 ("Eliminate management by
objectives". Deming saw production targets as encouraging the delivery
of poor-quality goods.)
http://www.hci.com.au/hcisite2/articles/deming.htm
Rajesh.
| |
| H. S. Lahman 2007-08-11, 7:21 pm |
| Responding to Zubair...
> I am working on identifying factors and their weightage for evaluating
> a tester's performance. Can anybody suggest which factors to include and
> which not?
I am not enthusiastic about specific performance metrics at the
individual level. Managers have plenty of other ways of evaluating their
people and such individual metrics are easily abused. If one is going to
collect explicit performance metrics it should be at the team level and
then only if there is an existing infrastructure for process improvement
that the team can use to improve its performance.
The culture one wants to promote is focused on solving problems, not
manipulating statistics for individual benefit. So collecting data
should be initiated by the team with some specific team problem in mind.
<a favorite anecdote>
Before retiring I worked in a process-oriented shop. At one point the
process for procedural development had evolved to the point where we
had very detailed functional specifications; one spec line usually
resulted in 1-3 lines of C code. All the problems had been worked out in
specification reviews so the coding was essentially a rote exercise.
During one process improvement cycle the Fat Rabbit problem happened to
be lines in the spec that were never implemented at all. The developers
averaged 8-12 such errors per K of spec lines. But one developer
always had 0 errors. So we asked her what she did that was different. It
turns out she used a felt pen to high-light each spec line as she
implemented it. We were able to eliminate an entire class of defects by
simply having everyone use a high-lighter pen.
The point of the anecdote is that in that environment the culture was
focused on process improvement. While individual data was collected, it
was used solely to analyze the root cause of problems like omitting
lines from the spec when implementing. The fact that a particular
developer had 0 defects where all the rest had 8-12 was a process issue,
not a performance issue. (Besides, everyone already knew she was a
superstar and things like that Just Happened for her.)
</a favorite anecdote>
> E.g. one way can be to use the following factors:
> 1. Number of bugs
> - For major bugs give 3 points
> - For intermediate bugs give 2 points
> - For minor bugs give 1 point
If you are measuring QA performance, the severity of the bugs is
probably not of much interest. That usually only becomes important in
triage when the shop needs to decide what should be fixed with given
resources in a fixed time. The question you need to ask is: How does the
tester control what severity of defects are found?
As a practical matter that is difficult to do. Normally the closest one
can come is by prioritizing test cases in the Test Plan based upon some
criteria like most common usage. In practice it is usually easier to
prioritize the requirements and then use that to prioritize test cases
in the Test Plan. IOW, one handles severity by deciding what test cases
will be implemented and executed first. By the time the individual
tester is implementing or running test cases the work has already been done.
Another question to ask: Does the tester really have control over the
number of defects found? It seems to me that is primarily up to the
developer rather than the tester. IOW, if the developer inserts a lot of
defects into the application, the tester will likely find quite a few.
But if the developer doesn't insert a lot of defects the tester is not
going to find many.
Note that in shops where Engineering releases 5-Sigma software or better
to QA, there is a very real problem in getting a decent sampling of defect
classes. At 5-Sigma there are only going to be 23 bugs in 100 KLOC total
and it is going to take a /lot/ of testing to find them. Trying to get a
good sampling for each of a dozen different fault classes will be
impossible. The vast majority of test cases will find no bugs at all so
it becomes serendipity if a particular tester's test cases happen to be
the ones that detect some of those very few defects. IOW, it becomes a
lottery with low expected value for whose test cases will catch the bugs.
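The 23-bugs-per-100-KLOC figure can be checked with a short Python sketch. It assumes the usual Six Sigma convention of a 1.5-sigma long-term shift and treats each line of code as one defect opportunity; both are assumptions on my part, not something stated above.

```python
from statistics import NormalDist


def defects_per_million(sigma_level, shift=1.5):
    """Expected defects per million opportunities for a given sigma level.

    Uses the one-sided normal tail beyond (sigma_level - shift), which is
    the conventional Six Sigma calculation (an assumption here).
    """
    return (1.0 - NormalDist().cdf(sigma_level - shift)) * 1_000_000


dpmo = defects_per_million(5.0)
# Assumption: one line of code counts as one defect opportunity.
defects_per_100_kloc = dpmo * 100_000 / 1_000_000
print(round(dpmo), round(defects_per_100_kloc))
```

Under those assumptions 5-Sigma works out to roughly 233 defects per million lines, i.e. about 23 in 100 KLOC.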
The point here is that there is little point in evaluating performance
via metrics whose results the tester really can't influence very much.
> 2. Time spent
> - Total time given was 5 days; for each day use 1 point. E.g. if the
> task is completed in 5 days give 5/5 points, and in case the task was
> completed in less than 5 days then add 1 point for each day saved, so if
> it was completed in 3 days then give the resource 7/5 points.
> Similarly, if the task was completed in 7 days then give 3/5 points.
Effort and elapsed time are certainly good metrics to have when
estimating the testing effort needed for a new project. However, I am a
bit unsure about your calculation. The variance between estimated and
actual is normally expressed as a simple percentage.
Note that for estimation you need the actual data to be expressed in
time units so that one can form the time/size ratio for future estimates.
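As a rough Python sketch of the two quantities just mentioned: the sign convention (positive means over the estimate) and the choice of size unit are assumptions made for illustration only.

```python
def schedule_variance_percent(estimated_days, actual_days):
    """Variance between estimate and actual as a simple percentage.

    Assumed convention: positive means over the estimate, negative under.
    """
    return 100.0 * (actual_days - estimated_days) / estimated_days


def time_per_size(actual_days, size_units):
    """Time/size ratio, kept in time units so it can feed later estimates.

    size_units is whatever size measure the shop uses (hypothetical here),
    e.g. number of requirements or test cases.
    """
    return actual_days / size_units


print(schedule_variance_percent(5, 4))  # -20.0
print(schedule_variance_percent(5, 7))  # 40.0
print(time_per_size(4, 20))             # 0.2
```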
>
> At the end just add up the points and express the total as a %. E.g. a
> resource was given 5 days to complete a task and the task was
> completed in 4 days with 30 bugs (10 major (10x3) + 12 intermediate (12x2)
> + 8 minor (8x1)). This will be
> (10x3 + 12x2 + 8x1)/30 + 6/5
> = 62/30 + 6/5
> = approx. 206% + 120% = 326%, which is the total score.
While you are normalizing in a clever way to combine defect-finding with
schedule compliance, I think this is risky for several reasons.
(1) Because the defect denominator is 6X larger than the schedule
denominator, small changes in schedule variance will have a larger
effect on the overall score if the ratio values are nearly the same.
(2) One factor will dominate the other if the ratios are quite
different. Note that in your example the defects provide nearly twice
the contribution to the total as the schedule. That difference in
contribution increases as the individual ratio values diverge. When
combined with (1) this could produce some interesting effects across
projects.
(3) You are still combining apples and oranges even though each ratio is
normalized. That is, defect density is not schedule variance. So you
need some sort of conversion factor to weight them relative to one
another. That bugger factor will define the relative importance the shop
gives to schedules vs. reliability.
(4) Ratio metrics are more difficult to interpret properly in general.
Combining two ratio metrics for quite different things just makes
interpretation that much harder. One way this is manifested is through
the point I made above, that it is hard to see why defect severity is
important to tester performance. But in your example it is the dominant
contributor.
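Points (1) to (3) can be illustrated with a small Python sketch using the numbers from the quoted example. The function names and the 0.5 default weight are hypothetical, chosen only to make the trade-off visible.

```python
def ratios(major, intermediate, minor, days_allotted, days_taken):
    """Return (defect_ratio, schedule_ratio) as used in the quoted formula."""
    bugs = major + intermediate + minor
    defect_ratio = (3 * major + 2 * intermediate + 1 * minor) / bugs
    schedule_ratio = (2 * days_allotted - days_taken) / days_allotted
    return defect_ratio, schedule_ratio


base_defect, base_schedule = ratios(10, 12, 8, 5, 4)

# Point (2): how much larger the defect term is than the schedule term.
print(round(base_defect / base_schedule, 2))

# Point (1): finishing one day later vs. one minor bug reclassified as major.
_, late_schedule = ratios(10, 12, 8, 5, 5)
more_defect, _ = ratios(11, 12, 7, 5, 4)
print(round(base_schedule - late_schedule, 3), round(more_defect - base_defect, 3))


# Point (3): an explicit (hypothetical) weight states the relative importance
# of reliability vs. schedule instead of leaving it implicit in the ratios.
def weighted_total(defect_ratio, schedule_ratio, defect_weight=0.5):
    return defect_weight * defect_ratio + (1 - defect_weight) * schedule_ratio


print(round(weighted_total(base_defect, base_schedule), 3))
```

With these inputs the defect term is about 1.72 times the schedule term, one day of slip moves the total by 0.2, and upgrading one bug from minor to major moves it by about 0.067.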
> I know this lacks a number of things, such as complexity of the requirements
> tested, standard of development, experience of resources etc. But this
> is just a rough idea and I would appreciate it if someone could improve
> it.
Other testing metrics that you might want to think about...
- test cases per effort hour. This may need to be normalized to the
number of requirements per test case. Also essential for estimation.
This can be collected for individuals if one is adamant about evaluating
individual performance.
- test cases actually executed. Sounds simplistic but I have a great
story about a test case that somehow didn't get executed that resulted
in an irate customer CEO bending the ear of a Group VP for half an hour.
It is also important if test cases are prioritized when resources are
limited.
- number of test cases / size of product. Also number of test cases per
feature (though this should usually be weighted by feature size class).
This track record is essential for estimating new projects.
- number of requirements intended to be tested per test case. This is
actually easier to collect than it sounds. Since in the initial cut at
test cases each test case targets only one requirement, when one
combines test cases for efficiency the requirements for the merged test
case is just the sum of the atomic test cases merged.
- average execution time per test case. (This is usually only important
for hybrid systems where there are complex hardware setups.)
- number of automated tests / total tests.
- defects found per test case. Both total and individual classes of
defects may be of interest. This can be very useful in evaluating the
effectiveness of the test suite.
Bottom line: if it can be measured it is fodder for a metric, especially
in process-oriented shops.
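A few of the ratios listed above could be collected at the team level along these lines. This is only a sketch: the class name, field names and sample numbers are made up for illustration and do not come from any real project.

```python
from dataclasses import dataclass


@dataclass
class SuiteSummary:
    """Team-level counts for one test cycle (all fields are hypothetical)."""
    cases_planned: int
    cases_executed: int
    automated_cases: int
    defects_found: int
    effort_hours: float

    def executed_fraction(self):
        # Test cases actually executed vs. planned.
        return self.cases_executed / self.cases_planned

    def automation_ratio(self):
        # Number of automated tests / total tests.
        return self.automated_cases / self.cases_planned

    def defects_per_case(self):
        # Defects found per executed test case.
        return self.defects_found / self.cases_executed

    def cases_per_effort_hour(self):
        # Test cases per effort hour.
        return self.cases_executed / self.effort_hours


def merged_requirement_count(atomic_requirement_counts):
    """Requirements covered by a merged test case: the sum over the atomic
    test cases that were merged, as described in the list above."""
    return sum(atomic_requirement_counts)


summary = SuiteSummary(cases_planned=200, cases_executed=180,
                       automated_cases=50, defects_found=9, effort_hours=60.0)
print(summary.executed_fraction(), summary.automation_ratio())
print(summary.defects_per_case(), summary.cases_per_effort_hour())
print(merged_requirement_count([1, 1, 1]))
```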
Note that there are different definitions of 'test case'. Purists tend
to regard a test case as a single pair of stimuli and responses while
others regard a test case as all of the stimuli and responses needed to
test a particular requirement while still others regard a test case as
simply a logical grouping (e.g., by feature) of an arbitrary suite of
stimuli and responses that are applied together.
*************
There is nothing wrong with me that could
not be cured by a capful of Drano.
H. S. Lahman
hsl@pathfindermda.com
Pathfinder Solutions
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
"Model-Based Translation: The Next Step in Agile Development". Email
info@pathfindermda.com for your copy.
Pathfinder is hiring:
http://www.pathfindermda.com/about_us/careers_pos3.php.
(888)OOA-PATH
| |
| Pradeep Soundararajan 2007-08-12, 7:18 pm |
| Cem Kaner suggests that a random number generator would make more
sense than the formulas proposed in this thread for rating a tester's
performance.
Here are a few questions I'd ask myself if I wanted to use such
formulas to calculate a tester's performance:
What if a tester finds no bugs and yet the customer is happy?
What if a smart skilled tester quits the job because he doesn't like
this way of rating his performance?
What if a tester works with the developer to introduce some major issues
and then puts on an act of finding them?
What if the customers are already happy?
What if the testers come and ask me for a training program that costs
$34998, which could actually help them find fantastic bugs and score
very high? Will I be able to fund the training?
What if testers decide to share the bugs among themselves in order to
normalize the scores?
What if my manager rates me based on the number of phone calls I make
to the customer?
What if all testers revolt and start hating me, which could make
management feel that I am a bad manager because I contribute to
attrition?
What if some testers use this as evidence to say, "this is why I
kept telling you he was a fool"?
What if testers challenge me to find more issues than them?
What if testers ask me, "Were you ever rated like this?" ?
What if testers spread the message that such a thing is happening in
their company and spoil my reputation and the company's? [ as
you have written about this in this group, search other groups where your
subordinates are talking about this topic ]
What if testers think that testing is a number game and become happy
when their rating is good enough although there are many, many issues
to be found?
What if there are 9843349879834784937843 valid questions that I can
ask myself that can make me feel stupid for thinking about rating
testers with such formulas?
-- Pradeep Soundararajan - http://testertested.blogspot.com -
+91-98451-76817 - pradeep.srajan@gmail.com
"Pradeep's first language is not English--his first language appears
to be testing." -- Michael Bolton
| |
| Vladimir Trushkin 2007-08-13, 8:16 am |
| On Aug 11, 9:33 am, zubair <zubairaslam1...@gmail.com> wrote:
> [snip]
There is a well-known saying: "what you measure is what you get". If you
measure defects you get defects (mainly small, silly and duplicate ones).
If you measure tests they become so small that a thousand can be
executed in 10 minutes.
The moral here is never to be bound to objective parameters in measuring
one's contribution. You may use those numbers but only in the context
of what a person was doing. If two testers run 100 tests each, and one
did it in two days and found 5 defects whereas the other did
it in one day but found nothing, who is the better? I can hardly
answer this question without context.
I suggest looking at the problem from the other end. Just ask yourself
who is the best tester in your group, then try to figure out why. If
you can't, you are simply not ready to create any kind of performance
metric. Once you have figured out what makes your best tester truly the
best, it is not a big deal to evaluate others using that scale.
Besides looking at quantities start looking at qualities
(communication skills, professional skills, responsibility, etc.).
Quantity is just one of the metrics.
And there is absolutely no magic formula. At one time I had the same
idea of creating one, but I realized in time that I was on the wrong path.
----
Best Wishes,
Vladimir
| |
| BridgeCollapse 2007-08-13, 7:25 pm |
| On Aug 13, 5:37 am, Vladimir Trushkin <Vladimir.Trush...@gmail.com>
wrote:
> [snip]
I'd measure them by the number of words spoken in a normalized 30
minute meeting period. Secondary measure: number of belt loops on
belt, unless you're a female and it's Tuesday, then I'd go by number of
tires on their car.
As with the prior (and more sane) responses, there is not a reasonable,
standard, measurable test by which to gauge tester performance. Try
being a manager, and manage your resources, rather than pigeonholing
them.
Mr. Gibberish.
| |
| Vladimir Trushkin 2007-08-14, 4:53 am |
| On Aug 14, 1:42 am, BridgeCollapse <tomhori...@gmail.com> wrote:
>
> I'd measure them by the number of words spoken in a normalized 30
> minute meeting period. Secondary measure: number of belt loops on
> belt, unless you're a female and it's Tuesday, then I'd go by number of
> tires on their car.
>
Did you respond to my post or to the top of this topic? If you responded
to mine then you did not get it. I was against measuring bugs, hours
or anything else alone, and I am not a fan of building a magic number that
will tell that one person is 100 [something] better than another. Please
re-read my post.
----
Best Wishes,
Vladimir