OJ's rants

It's not about you, it's about the software

A Better 'Nub'

| Comments

During my Haskell travels I have found myself using the nub function quite regularly. For those too lazy to click the link: nub removes duplicates from a list of items. eg:

1
2
Prelude> nub [1,1,3,3,5,5,6,6,6,1]
[1,3,5,6]

Fairly simple stuff. Until recently I hadn’t bothered pondering the internal implementation of this function because I hadn’t really been too worried about performance. That’s no longer the case. I recently cracked open the hood of nub, and was rather surprised by what I saw.

Since the source of Prelude is in the public domain, I’m safe to show you the content:

1
2
3
4
5
6
nub                     :: (Eq a) => [a] -> [a]
nub                      = nubBy (==)

nubBy                   :: (a -> a -> Bool) -> [a] -> [a]
nubBy eq []              = []
nubBy eq (x:xs)          = x : nubBy eq (filter (\y -> not (eq x y)) xs)

So nub uses nubBy and passes (==) as the filter for the items. This is all well and good, but the result is an O(n2) level of complexity! In this day and age that’s pretty poor for a simple filtering mechanism, especially when we have more powerful mechanisms built-in to most libraries.

Since seeing this I had added a “write my own nub” task to my TODO list, but I haven’t yet got round to it. So you can imagine my delight when I found this comment by Jedai over at hvergi.net. He’d pointed out that nub was indeed inefficient, and showed an alternative implementation:

Note that nub only demands that the list element be part of the Eq typeclass. As a result it is very inefficient and a better solution must always be prefered whenever the nature of the elements allows it.

1
2
3
4
5
6
import qualified Data.Set as S
nub' :: (Ord a) => [a] -> [a]
nub' = go S.empty
  where go _ [] = []
        go s (x:xs) | S.member x s = go s xs
                    | otherwise    = x : go (S.insert x s) xs
La diffrence de performance est norme.

You got that right! Thanks to Jedai (and Arnar via hvergi.net) I now have a much faster nub implementation. Cheers guys!

It just goes to show that even these days you can’t always trust the libraries. There is no harm is taking a look under the hood to see what’s going on! You don’t have to reinvent the wheel, but there’s nothing wrong with making the wheel smoother :)

Comments