<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Data Crunching in Haskell</title>
	<atom:link href="http://buffered.io/2009/06/25/data-crunching-in-haskell/feed/" rel="self" type="application/rss+xml" />
	<link>http://buffered.io/2009/06/25/data-crunching-in-haskell/</link>
	<description>What would OJ do?</description>
	<lastBuildDate>Fri, 23 Jul 2010 14:33:02 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
	<item>
		<title>By: OJ</title>
		<link>http://buffered.io/2009/06/25/data-crunching-in-haskell/comment-page-1/#comment-1349</link>
		<dc:creator>OJ</dc:creator>
		<pubDate>Sat, 27 Jun 2009 09:28:38 +0000</pubDate>
		<guid isPermaLink="false">http://buffered.io/?p=719#comment-1349</guid>
		<description>&lt;p&gt;@James: Wow, thanks! You&#039;ve saved me the pain of having to do it myself. :) The timings don&#039;t surprise me to be honest. The compiler is pretty damned smart.

I do prefer my solution so far though. Transposing is a very obvious solution to the problem and if anyone needed to read the code afterwards I think the solution with transpose would be digested the fastest. Yup that&#039;s the maintenance coder coming out in me.

Anyway, thanks for taking the time to do the profiling, and the commenting :)

@Axman6: Thanks for commenting! I agree, it does look like it&#039;d be nice and quick. If and when I get the chance to profile it I&#039;ll put some stats up. Unless you&#039;ve already profiled it...??? :)

@Dan: That&#039;s a great question :) One that I&#039;ve spent a few days thinking about because I wasn&#039;t sure exactly what the answer was. To be honest, I&#039;m still not sure! But I&#039;ll give it a crack. Pardon me if I don&#039;t articulate it very well.

The first part of the problem is that most people who are writing code that lacks performance don&#039;t know what they don&#039;t know. It&#039;s becoming more and more common for non-developers, or inexperienced developers, to build production level systems in code without really having a full grasp of what&#039;s required to write high quality and efficient software. That wasn&#039;t intended to be a stab :) That&#039;s a gross generalisation!

So people will go through their formal education in their own areas and then attempt to fill whatever gaps they have in the real world with their untrained efforts.

The perfect parallel to this is programmer art! Everyone knows what happens when a developer attempts to create a user interface. It&#039;s abysmal. The colours are bad, there&#039;s no &quot;flow&quot;, it&#039;s abrasive to use and the UI looks awful. It&#039;s a typical case of a programmer thinking they can design. Most of the time, they can&#039;t. They just think they can! Thankfully, I am in the boat where I know I can&#039;t do design :) I can write you an interface that works, but don&#039;t expect it to be nice! That&#039;s where I aim to utilise other people&#039;s talents.

So I guess what I&#039;m saying is that without a good background in development or some formal training in algorithms, you&#039;re going to find it hard because the points where there are inefficiencies aren&#039;t going to be immediately obvious to you.

Having said all that, there are definitely things you can do to aid you in making your systems function better. These are the kinds of things that anyone can do. Those things are the content of another blog post :) I&#039;m inspired to write them up in a more formal way and actually try to do a better job of justifying what I&#039;ve said here.

So without sounding cheesy, watch this space! I&#039;ll have a solid answer for you very shortly.&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>@James: Wow, thanks! You&#8217;ve saved me the pain of having to do it myself. <img src='http://buffered.io/wp-content/plugins/smilies-themer/Silk/emoticon_smile.png' alt=':)' class='wp-smiley' /> The timings don&#8217;t surprise me to be honest. The compiler is pretty damned smart.</p>
<p>I do prefer my solution so far though. Transposing is a very obvious solution to the problem and if anyone needed to read the code afterwards I think the solution with transpose would be digested the fastest. Yup that&#8217;s the maintenance coder coming out in me.</p>
<p>Anyway, thanks for taking the time to do the profiling, and the commenting <img src='http://buffered.io/wp-content/plugins/smilies-themer/Silk/emoticon_smile.png' alt=':)' class='wp-smiley' /> </p>
<p>@Axman6: Thanks for commenting! I agree, it does look like it&#8217;d be nice and quick. If and when I get the chance to profile it I&#8217;ll put some stats up. Unless you&#8217;ve already profiled it&#8230;??? <img src='http://buffered.io/wp-content/plugins/smilies-themer/Silk/emoticon_smile.png' alt=':)' class='wp-smiley' /> </p>
<p>@Dan: That&#8217;s a great question <img src='http://buffered.io/wp-content/plugins/smilies-themer/Silk/emoticon_smile.png' alt=':)' class='wp-smiley' /> One that I&#8217;ve spent a few days thinking about because I wasn&#8217;t sure exactly what the answer was. To be honest, I&#8217;m still not sure! But I&#8217;ll give it a crack. Pardon me if I don&#8217;t articulate it very well.</p>
<p>The first part of the problem is that most people who are writing code that lacks performance don&#8217;t know what they don&#8217;t know. It&#8217;s becoming more and more common for non-developers, or inexperienced developers, to build production level systems in code without really having a full grasp of what&#8217;s required to write high quality and efficient software. That wasn&#8217;t intended to be a stab <img src='http://buffered.io/wp-content/plugins/smilies-themer/Silk/emoticon_smile.png' alt=':)' class='wp-smiley' /> That&#8217;s a gross generalisation!</p>
<p>So people will go through their formal education in their own areas and then attempt to fill whatever gaps they have in the real world with their untrained efforts.</p>
<p>The perfect parallel to this is programmer art! Everyone knows what happens when a developer attempts to create a user interface. It&#8217;s abysmal. The colours are bad, there&#8217;s no &#8220;flow&#8221;, it&#8217;s abrasive to use and the UI looks awful. It&#8217;s a typical case of a programmer thinking they can design. Most of the time, they can&#8217;t. They just think they can! Thankfully, I am in the boat where I know I can&#8217;t do design <img src='http://buffered.io/wp-content/plugins/smilies-themer/Silk/emoticon_smile.png' alt=':)' class='wp-smiley' /> I can write you an interface that works, but don&#8217;t expect it to be nice! That&#8217;s where I aim to utilise other people&#8217;s talents.</p>
<p>So I guess what I&#8217;m saying is that without a good background in development or some formal training in algorithms, you&#8217;re going to find it hard because the points where there are inefficiencies aren&#8217;t going to be immediately obvious to you.</p>
<p>Having said all that, there are definitely things you can do to aid you in making your systems function better. These are the kinds of things that anyone can do. Those things are the content of another blog post <img src='http://buffered.io/wp-content/plugins/smilies-themer/Silk/emoticon_smile.png' alt=':)' class='wp-smiley' /> I&#8217;m inspired to write them up in a more formal way and actually try to do a better job of justifying what I&#8217;ve said here.</p>
<p>So without sounding cheesy, watch this space! I&#8217;ll have a solid answer for you very shortly.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Axman6</title>
		<link>http://buffered.io/2009/06/25/data-crunching-in-haskell/comment-page-1/#comment-1345</link>
		<dc:creator>Axman6</dc:creator>
		<pubDate>Fri, 26 Jun 2009 16:05:14 +0000</pubDate>
		<guid isPermaLink="false">http://buffered.io/?p=719#comment-1345</guid>
		<description>Just my 2 cents, this looks fairly ugly, but I have a feeling it might be fairly efficient:
&lt;pre lang=&quot;haskell&quot;&gt;extremes = (\(y:ys) -&gt; foldl&#039; (\(l,h) x -&gt; (min l x, max h x)) (y,y) ys) . foldl (zipWith (+)) (repeat 0)&lt;/pre&gt;
Seems I had a lot of the same ideas as Dougal. I think that at least the left foldl&#039; should be optimised pretty nicely.
-- Axman6</description>
		<content:encoded><![CDATA[<p>Just my 2 cents, this looks fairly ugly, but I have a feeling it might be fairly efficient:</p>

<div class="wp_syntax"><div class="code"><pre class="haskell" style="font-family:monospace;">extremes <span style="color: #339933; font-weight: bold;">=</span> <span style="color: green;">&#40;</span>\<span style="color: green;">&#40;</span>y:ys<span style="color: green;">&#41;</span> <span style="color: #339933; font-weight: bold;">-&gt;</span> <span style="font-weight: bold;">foldl</span>' <span style="color: green;">&#40;</span>\<span style="color: green;">&#40;</span>l<span style="color: #339933; font-weight: bold;">,</span>h<span style="color: green;">&#41;</span> x <span style="color: #339933; font-weight: bold;">-&gt;</span> <span style="color: green;">&#40;</span><span style="font-weight: bold;">min</span> l x<span style="color: #339933; font-weight: bold;">,</span> <span style="font-weight: bold;">max</span> h x<span style="color: green;">&#41;</span><span style="color: green;">&#41;</span> <span style="color: green;">&#40;</span>y<span style="color: #339933; font-weight: bold;">,</span>y<span style="color: green;">&#41;</span> ys<span style="color: green;">&#41;</span> <span style="color: #339933; font-weight: bold;">.</span> <span style="font-weight: bold;">foldl</span> <span style="color: green;">&#40;</span><span style="font-weight: bold;">zipWith</span> <span style="color: green;">&#40;</span><span style="color: #339933; font-weight: bold;">+</span><span style="color: green;">&#41;</span><span style="color: green;">&#41;</span> <span style="color: green;">&#40;</span><span style="font-weight: bold;">repeat</span> <span style="color: red;">0</span><span style="color: green;">&#41;</span></pre></div></div>

<p>Seems I had a lot of the same ideas as Dougal. I think that at least the left foldl&#8217; should be optimised pretty nicely.<br />
&#8211; Axman6</p>
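<p>A minimal sketch of the same pipeline with the pieces named, assuming <code>Int</code> data and at least one input row (with no rows the column sums are the infinite <code>repeat 0</code> and the fold never finishes). The names <code>columnSums</code>, <code>range</code> and <code>step</code> are made up for illustration. One thing worth knowing: <code>foldl'</code> only forces the pair itself, not the <code>min</code>/<code>max</code> inside it, so this sketch forces both halves with <code>seq</code>:</p>

<div class="wp_syntax"><div class="code"><pre class="haskell" style="font-family:monospace;">import Data.List (foldl')

-- Add the rows together column by column; zipWith stops at the shortest row.
columnSums :: [[Int]] -&gt; [Int]
columnSums = foldl (zipWith (+)) (repeat 0)

-- (smallest, largest) of a non-empty list in a single pass.
range :: [Int] -&gt; (Int, Int)
range []     = error "range: empty list"
range (y:ys) = foldl' step (y, y) ys
  where
    -- seq forces both halves so no chain of min/max thunks builds up
    step (l, h) x =
      let l' = min l x
          h' = max h x
      in l' `seq` h' `seq` (l', h')

extremes :: [[Int]] -&gt; (Int, Int)
extremes = range . columnSums

main :: IO ()
main = print (extremes [[1, 2, 3], [4, 5, 6], [7, 8, 9]])  -- prints (12,18)</pre></div></div>

<p>Whether the extra <code>seq</code> makes a measurable difference here would need profiling; GHC&#8217;s strictness analysis may well make it unnecessary with optimisations on.</p>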
]]></content:encoded>
	</item>
	<item>
		<title>By: James</title>
		<link>http://buffered.io/2009/06/25/data-crunching-in-haskell/comment-page-1/#comment-1342</link>
		<dc:creator>James</dc:creator>
		<pubDate>Thu, 25 Jun 2009 09:33:17 +0000</pubDate>
		<guid isPermaLink="false">http://buffered.io/?p=719#comment-1342</guid>
		<description>I ran these three:
&lt;pre lang=&quot;haskell&quot;&gt;e1 = (maximum &amp;&amp;&amp; minimum) . map sum . transpose
e2 = foldl1 (\(h, l) (x, y) -&gt; (max h x, min l y)) . join zip . map sum . transpose
e3 = foldl1 (\(h, l) (x, y) -&gt; (max h x, min l y)) . join zip . foldr (zipWith (+)) (repeat 0)&lt;/pre&gt;
... and they all came up around the 3s mark, jostling for top spot depending on how many times I ran them.

I think this might mean that GHC is clever enough to prune the intermediate matrix? Or this could be just an effect of lazy evaluation (effectively doing the pruning automatically). Hard to tell really. 

@Dan - The Haskell profiling stuff is pretty good. The easiest way to find out if something is going wrong (wrong order polynomial) is to run it a bunch of times and graph it!</description>
		<content:encoded><![CDATA[<p>I ran these three:</p>

<div class="wp_syntax"><div class="code"><pre class="haskell" style="font-family:monospace;">e1 <span style="color: #339933; font-weight: bold;">=</span> <span style="color: green;">&#40;</span><span style="font-weight: bold;">maximum</span> <span style="color: #339933; font-weight: bold;">&amp;&amp;</span>&amp; <span style="font-weight: bold;">minimum</span><span style="color: green;">&#41;</span> <span style="color: #339933; font-weight: bold;">.</span> <span style="font-weight: bold;">map</span> <span style="font-weight: bold;">sum</span> <span style="color: #339933; font-weight: bold;">.</span> transpose
e2 <span style="color: #339933; font-weight: bold;">=</span> <span style="font-weight: bold;">foldl1</span> <span style="color: green;">&#40;</span>\<span style="color: green;">&#40;</span>h<span style="color: #339933; font-weight: bold;">,</span> l<span style="color: green;">&#41;</span> <span style="color: green;">&#40;</span>x<span style="color: #339933; font-weight: bold;">,</span> y<span style="color: green;">&#41;</span> <span style="color: #339933; font-weight: bold;">-&gt;</span> <span style="color: green;">&#40;</span><span style="font-weight: bold;">max</span> h x<span style="color: #339933; font-weight: bold;">,</span> <span style="font-weight: bold;">min</span> l y<span style="color: green;">&#41;</span><span style="color: green;">&#41;</span> <span style="color: #339933; font-weight: bold;">.</span> join <span style="font-weight: bold;">zip</span> <span style="color: #339933; font-weight: bold;">.</span> <span style="font-weight: bold;">map</span> <span style="font-weight: bold;">sum</span> <span style="color: #339933; font-weight: bold;">.</span> transpose
e3 <span style="color: #339933; font-weight: bold;">=</span> <span style="font-weight: bold;">foldl1</span> <span style="color: green;">&#40;</span>\<span style="color: green;">&#40;</span>h<span style="color: #339933; font-weight: bold;">,</span> l<span style="color: green;">&#41;</span> <span style="color: green;">&#40;</span>x<span style="color: #339933; font-weight: bold;">,</span> y<span style="color: green;">&#41;</span> <span style="color: #339933; font-weight: bold;">-&gt;</span> <span style="color: green;">&#40;</span><span style="font-weight: bold;">max</span> h x<span style="color: #339933; font-weight: bold;">,</span> <span style="font-weight: bold;">min</span> l y<span style="color: green;">&#41;</span><span style="color: green;">&#41;</span> <span style="color: #339933; font-weight: bold;">.</span> join <span style="font-weight: bold;">zip</span> <span style="color: #339933; font-weight: bold;">.</span> <span style="font-weight: bold;">foldr</span> <span style="color: green;">&#40;</span><span style="font-weight: bold;">zipWith</span> <span style="color: green;">&#40;</span><span style="color: #339933; font-weight: bold;">+</span><span style="color: green;">&#41;</span><span style="color: green;">&#41;</span> <span style="color: green;">&#40;</span><span style="font-weight: bold;">repeat</span> <span style="color: red;">0</span><span style="color: green;">&#41;</span></pre></div></div>

<p>&#8230; and they all came up around the 3s mark, jostling for top spot depending on how many times I ran them.</p>
<p>I think this might mean that GHC is clever enough to prune the intermediate matrix? Or this could be just an effect of lazy evaluation (effectively doing the pruning automatically). Hard to tell really. </p>
<p>@Dan &#8211; The Haskell profiling stuff is pretty good. The easiest way to find out if something is going wrong (wrong order polynomial) is to run it a bunch of times and graph it!</p>
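<p>A rough sketch of the &#8220;run it a bunch of times and graph it&#8221; idea, using only <code>base</code>. The test data (<code>grid</code>), the sizes and the helper <code>timeE1</code> are made up for illustration, and the measured CPU time includes building the grid, so treat the numbers as a shape to plot rather than a proper benchmark:</p>

<div class="wp_syntax"><div class="code"><pre class="haskell" style="font-family:monospace;">import Control.Arrow ((&amp;&amp;&amp;))
import Control.Exception (evaluate)
import Data.List (transpose)
import System.CPUTime (getCPUTime)

-- e1 as quoted above, pinned to Int for the test
e1 :: [[Int]] -&gt; (Int, Int)
e1 = (maximum &amp;&amp;&amp; minimum) . map sum . transpose

-- Hypothetical test data: n rows of n columns.
grid :: Int -&gt; [[Int]]
grid n = [[r * c | c &lt;- [1 .. n]] | r &lt;- [1 .. n]]

-- CPU seconds spent evaluating e1 on an n-by-n grid.
timeE1 :: Int -&gt; IO Double
timeE1 n = do
  start &lt;- getCPUTime
  let (hi, lo) = e1 (grid n)
  _ &lt;- evaluate (hi + lo)  -- the sum forces both halves of the pair
  end &lt;- getCPUTime
  return (fromIntegral (end - start) / 1e12)  -- getCPUTime is in picoseconds

main :: IO ()
main = mapM_ report [250, 500, 1000, 2000]
  where
    -- one "size, tab, seconds" line per run, ready to paste into a plot
    report n = do
      t &lt;- timeE1 n
      putStrLn (show n ++ "\t" ++ show t)</pre></div></div>

<p>Doubling <code>n</code> quadruples the number of cells, so roughly 4x the time per step is what linear-in-the-data behaviour would look like; anything much steeper is worth a closer look with the profiler.</p>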
]]></content:encoded>
	</item>
	<item>
		<title>By: Dan</title>
		<link>http://buffered.io/2009/06/25/data-crunching-in-haskell/comment-page-1/#comment-1340</link>
		<dc:creator>Dan</dc:creator>
		<pubDate>Thu, 25 Jun 2009 00:02:29 +0000</pubDate>
		<guid isPermaLink="false">http://buffered.io/?p=719#comment-1340</guid>
		<description>@jberryman @oj - this raises another question (at least in my mind) - are there any giveaways that suggest code needs optimisation (other than the knowledge that I&#039;m not a coding guru)? For example, OJ, you talked about &lt;a href=&quot;http://www.shiftperception.com/blog/2009/06/16/unique-records-in-as3/&quot; rel=&quot;nofollow&quot;&gt;O(n) with relation to the indexOf method in AS3&lt;/a&gt; - but how did you know that that method was the weakest link and the thing that needed improvement?</description>
		<content:encoded><![CDATA[<p>@jberryman @oj &#8211; this raises another question (at least in my mind) &#8211; are there any giveaways that suggest code needs optimisation (other than the knowledge that I&#8217;m not a coding guru)? For example, OJ, you talked about <a href="http://www.shiftperception.com/blog/2009/06/16/unique-records-in-as3/" rel="nofollow">O(n) with relation to the indexOf method in AS3</a> &#8211; but how did you know that that method was the weakest link and the thing that needed improvement?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: OJ</title>
		<link>http://buffered.io/2009/06/25/data-crunching-in-haskell/comment-page-1/#comment-1339</link>
		<dc:creator>OJ</dc:creator>
		<pubDate>Wed, 24 Jun 2009 22:47:32 +0000</pubDate>
		<guid isPermaLink="false">http://buffered.io/?p=719#comment-1339</guid>
		<description>&lt;p&gt;@Dougal: Thanks for your comment mate. You&#039;re right, the list folding method with zipWith is indeed a better option. I saw your &lt;a href=&quot;http://www.reddit.com/r/haskell/comments/8v95i/data_crunching_in_haskell/&quot; rel=&quot;nofollow&quot;&gt;comment&lt;/a&gt; on reddit (didn&#039;t know it was submitted there until you posted ;)) and I have to agree, the removal of the transpose looks to be a good one.&lt;/p&gt;
&lt;p&gt;@jberryman: You raise a good point :) I am yet to profile any of the solutions mentioned above. That is something I&#039;ll aim to do at some stage today or tomorrow and I&#039;ll post the results as an edit to the blog post. Thanks for the feedback!&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>@Dougal: Thanks for your comment mate. You&#8217;re right, the list folding method with zipWith is indeed a better option. I saw your <a href="http://www.reddit.com/r/haskell/comments/8v95i/data_crunching_in_haskell/" rel="nofollow">comment</a> on reddit (didn&#8217;t know it was submitted there until you posted ;)) and I have to agree, the removal of the transpose looks to be a good one.</p>
<p>@jberryman: You raise a good point <img src='http://buffered.io/wp-content/plugins/smilies-themer/Silk/emoticon_smile.png' alt=':)' class='wp-smiley' /> I am yet to profile any of the solutions mentioned above. That is something I&#8217;ll aim to do at some stage today or tomorrow and I&#8217;ll post the results as an edit to the blog post. Thanks for the feedback!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jberryman</title>
		<link>http://buffered.io/2009/06/25/data-crunching-in-haskell/comment-page-1/#comment-1338</link>
		<dc:creator>jberryman</dc:creator>
		<pubDate>Wed, 24 Jun 2009 18:58:47 +0000</pubDate>
		<guid isPermaLink="false">http://buffered.io/?p=719#comment-1338</guid>
		<description>Out of curiosity, did you profile your various approaches? It would probably take a larger data set, but I often find that what looks like an obviously-inefficient piece of code turns out to be no slower than my more explicit and longer version.
Would be interested to see if that is the case here.</description>
		<content:encoded><![CDATA[<p>Out of curiosity, did you profile your various approaches? It would probably take a larger data set, but I often find that what looks like an obviously-inefficient piece of code turns out to be no slower than my more explicit and longer version.<br />
Would be interested to see if that is the case here.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dougal Stanton</title>
		<link>http://buffered.io/2009/06/25/data-crunching-in-haskell/comment-page-1/#comment-1337</link>
		<dc:creator>Dougal Stanton</dc:creator>
		<pubDate>Wed, 24 Jun 2009 17:10:48 +0000</pubDate>
		<guid isPermaLink="false">http://buffered.io/?p=719#comment-1337</guid>
		<description>I said this on Reddit more succinctly, but:
There shouldn&#039;t be any need to transpose the lists. For each pair of lists &lt;code&gt;[a1,b1,c1,...]&lt;/code&gt; and &lt;code&gt;[a2,b2,c2,...]&lt;/code&gt; you want to join them element-wise: &lt;code&gt;zipWith (+) [a1,b1,c1,...] [a2,b2,c2,...]&lt;/code&gt;.
If you&#039;re guaranteed to have valid lists you can use &lt;code&gt;foldr1&lt;/code&gt; to repeatedly apply this until all the lists are joined together element-wise. If not, &lt;code&gt;foldr f (repeat 0)&lt;/code&gt; will work.
I don&#039;t actually know if this is any cheaper, but it seems foolish to transpose a list-of-lists if you don&#039;t actually want that resulting format.</description>
		<content:encoded><![CDATA[<p>I said this on Reddit more succinctly, but:<br />
There shouldn&#8217;t be any need to transpose the lists. For each pair of lists <code>[a1,b1,c1,...]</code> and <code>[a2,b2,c2,...]</code> you want to join them element-wise: <code>zipWith (+) [a1,b1,c1,...] [a2,b2,c2,...]</code>.<br />
If you&#8217;re guaranteed to have valid lists you can use <code>foldr1</code> to repeatedly apply this until all the lists are joined together element-wise. If not, <code>foldr f (repeat 0)</code> will work.<br />
I don&#8217;t actually know if this is any cheaper, but it seems foolish to transpose a list-of-lists if you don&#8217;t actually want that resulting format.</p>
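<p>A minimal sketch of both variants, reading <code>f</code> as <code>zipWith (+)</code>. The names <code>columnSums</code> and <code>columnSums1</code> are made up for illustration. Note that <code>zipWith</code> stops at the shortest list, so ragged rows are silently truncated:</p>

<div class="wp_syntax"><div class="code"><pre class="haskell" style="font-family:monospace;">-- Accepts an empty list of rows, but then returns the infinite list repeat 0.
columnSums :: Num a =&gt; [[a]] -&gt; [a]
columnSums = foldr (zipWith (+)) (repeat 0)

-- Needs at least one row; foldr1 raises an error on [].
columnSums1 :: Num a =&gt; [[a]] -&gt; [a]
columnSums1 = foldr1 (zipWith (+))

main :: IO ()
main = do
  let rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]] :: [[Int]]
  print (columnSums rows)   -- prints [12,15,18]
  print (columnSums1 rows)  -- prints [12,15,18]</pre></div></div>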
]]></content:encoded>
	</item>
</channel>
</rss>
