During my Haskell travels I have found myself using the nub function quite regularly. For those too lazy to click the link: nub removes duplicates from a list of items. eg:
1 2 | |
Fairly simple stuff. Until recently I hadn’t bothered pondering the internal implementation of this function because I hadn’t really been too worried about performance. That’s no longer the case. I recently cracked open the hood of nub, and was rather surprised by what I saw.
Since the source of Prelude is in the public domain, I’m safe to show you the content:
1 2 3 4 5 6 | |
So nub uses nubBy and passes (==) as the filter for the items. This is all well and good, but the result is an O(n2) level of complexity! In this day and age that’s pretty poor for a simple filtering mechanism, especially when we have more powerful mechanisms built-in to most libraries.
Since seeing this I had added a “write my own nub” task to my TODO list, but I haven’t yet got round to it. So you can imagine my delight when I found this comment by Jedai over at hvergi.net. He’d pointed out that nub was indeed inefficient, and showed an alternative implementation:
Note that nub only demands that the list element be part of the Eq typeclass. As a result it is very inefficient and a better solution must always be prefered whenever the nature of the elements allows it.
La diffrence de performance est norme.
1 2 3 4 5 6import qualified Data.Set as S nub' :: (Ord a) => [a] -> [a] nub' = go S.empty where go _ [] = [] go s (x:xs) | S.member x s = go s xs | otherwise = x : go (S.insert x s) xs
You got that right! Thanks to Jedai (and Arnar via hvergi.net) I now have a much faster nub implementation. Cheers guys!
It just goes to show that even these days you can’t always trust the libraries. There is no harm is taking a look under the hood to see what’s going on! You don’t have to reinvent the wheel, but there’s nothing wrong with making the wheel smoother :)