For Programmers: Free Programming Magazines  


Home > Archive > Functional > April 2007 > Finding median and mode (Haskell)









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Finding median and mode (Haskell)
w3stfa11

2007-04-30, 4:06 am

I'm not quite sure how to do this. I need to find the median and the
mode of an unsorted list of integers. I was wondering if someone could
give me a hint or show me how to approach the problem. I'm very new to
Haskell and functional programming in general and I would appreciate
all the help I can get. Thank you.

Anthony

Paul Rubin

2007-04-30, 4:06 am

w3stfa11 <w3stfa11@gmail.com> writes:
> I'm not quite sure how to do this. I need to find the median and the
> mode of an unsorted list of integers.


The simplest (but not fastest) way to get the median is sort the
list (use List.sort) and pick the middle element. I can think of
some functional ways to find the mode but I'd rather see what the
experts say.
Daniel C. Wang

2007-04-30, 4:06 am

sort | uniq -c | sort -rn

A hint as a unix pipeline.


Paul Rubin wrote:
> w3stfa11 <w3stfa11@gmail.com> writes:
>
> The simplest (but not fastest) way to get the median is sort the
> list (use List.sort) and pick the middle element. I can think of
> some functional ways to find the mode but I'd rather see what the
> experts say.

Torben Ęgidius Mogensen

2007-04-30, 8:12 am

Paul Rubin <http://phr.cx@NOSPAM.invalid> writes:

> w3stfa11 <w3stfa11@gmail.com> writes:
>
> The simplest (but not fastest) way to get the median is sort the
> list (use List.sort) and pick the middle element.


That is, indeed, the easist way, but far from the fastest. There are
linear-time median finders, and sorting is n*log(n). In practice,
guaranteed linear-time median finders are complex and a bit slow, but
probabilistic linear-time median finders are easy and not too
difficult. Here is one based on quicksort:

median l = nthLeast l (length l `div` 2)

nthLeast (x:xs) 0 = x
nthLeast (x:xs) n
= let (m,low,high) = split x xs 0 [] [] in
if n<=m then nthLeast low n
else if n=n+1 then x
else nthLeast high (n-m-1)

split x [] m low high = (m,low,high)
split x (y:ys) m low high
= if y<=x then split x ys (m+1) (y:low) high
else split x ys m low (y:high)

It is just hacked down with no testing, but even if I mistyped
something, the idea should be clear enough.

Since the pivot element is always the first in the list, it works bad
for sorted lists. If you think the list may be sorted, randomize it
first or pick a random elemnt for pivot.

> I can think of some functional ways to find the mode but I'd rather
> see what the experts say.


A simple way is to sort and look for the longest run (subsequence of
identical elements), but it can be done faster. Look at
http://www.plan-x.org/msd/multiset-discrimination.pdf for a fast way
of grouping identical elements. After that, it is easy enough to find
the largest group.

Torben
Simon Richard Clarkstone

2007-04-30, 10:04 pm

w3stfa11 wrote:
> I'm not quite sure how to do this. I need to find the median and the
> mode of an unsorted list of integers. I was wondering if someone could
> give me a hint or show me how to approach the problem. I'm very new to
> Haskell and functional programming in general and I would appreciate
> all the help I can get. Thank you.


It is possible to find the median (or any other nth-order statistic) in
linear time. The basic algorithm is to guess at its value, partition
the list around this value, and see if you have the correct number of
elements greater and less than your guess. If you guessed wrong, you
pick the partition that you know it must be in, then figure out which
order statistic it is of that partition and re-try. It is a bit like a
binary search.

For example, to find the median of the list L: 1 2 3 4 5 6 7. You know
L has 7 elements, so the median (M) is the 4th smallest element (the 4th
order-statistic).

You guess that the 4th-smallest element of L is 2, then partition:
(1) 2 (3 4 5 6 7)
You can see from the list lengths that 2 must be the 2nd-smallest
element, so by subtracting 2 from 4, you know that M is the 2nd-smallest
element of the second partition.

You guess that the 2nd-smallest of (3 4 5 6) is 4, then partition:
(3) 4 (5 6)
You can see from the list lengths that you are right: M is 4.

-----

But how to generate a good guess for the approximate median at each
stage? That can be done like this:

To estimate the median of a list:

********************

divide the list into groups of 5:

***** ***** ***** *****

find the exact median of each of those groups of 5 (via hard-coded logic):
* * * *

Then estimate the median of that list (i.e. tail recursion).

This is guaranteed to produce a good-enough guess (between 30th and 70th
percentile IIRC) that the above divide-and-conquer algorithm is linear
when using it.

-----

See also:

http://en.wikipedia.org/wiki/Selection_algorithm
http://en.wikipedia.org/wiki/Order_statistic

--
Simon Richard Clarkstone:
s.r.cl?rkst?n?@durham.ac.uk/s?m?n.cl?rkst?n?@hotmail.com
Scheme guy says: (> Scheme Haskell)
Haskell guy says: (> Scheme) Haskell
Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2009 codecomments.com