OJ’s rants What would OJ do?

25Mar/097

A Quirk in List.Find()

Earlier today I was having a chat with a friend of mine, who lives in Vancouver, about finding items that are stored in generic Lists. He flicked me a code snippet that looked something like this:

List<foo> list = new List<foo>();
// .. do some stuff
Foo f = list.Find(delegate(Foo f) { return foo.Name == "Bar"; });

Straight away I fired back with an update to the code which used lambda expressions instead, as I'm a fan of how concise they are ;)

List<foo> list = new List<foo>();
// .. do some stuff
Foo f = list.Find(foo => foo.Name == "Bar");

My friend ran this code against a data set that he had constructed and found that when the call to Find() was made, a NullReferenceException was being thrown. I found this odd as I hadn't seen that before. list was definitely a valid reference and the lambda expression was well-formed as well. So what was wrong?

It turns out that even though list was a valid reference, it didn't contain any elements.

How odd! Why would the generic List object throw an exception when the user calls Find() when no elements are present? After a little bit of thinking I thought that I had the answer. I thought to myself:

What if the List was a container for a value-type, such as int? If you attempt to find a value in an empty list, then the function cannot return null because that isn't valid for value-types! Throwing an exception does make sense!

Isn't it amazing how easy it is to convince yourself of your own greatness? I thought I'd nailed it first go. So I proposed my argument to my friend, who initially was semi-sold on the idea.

Then I thought about it again and managed to convince myself that my apparent "brilliance" was, in fact, a failure. The perfect counter-argument to the above point is:

What happens when you have a List of ints which does contain elements and you attempt to search for a value that is not in the list?

It wasn't immediately obvious. So I tried something to see what would happen:

List<int> list = new List<int>(new int[] { 1, 2, 3 });
int i = list.Find(x => x > 3);
// ....

So what do you think the value of i is after those first two lines? Yes, you guessed it: Zero. Why? Well, duh, it's because default(T) for integers is Zero!

This is where little alarm bells started to ring in my head. I immediately whipped up an example where this would be considered bad:

List<int> list = new List<int>(new int[] { 0, 1, 2, 3 });
int i = list.Find(x => x > 3);
// ....

Again, i is Zero when this code is executed, but the result is very misleading. Zero is contained in the collection but doesn't match the predicate, yet Zero is returned because that's the default value for this value-type.

I thought this was a bit of a glaring hole in the design. So I went straight to the documentation and found this:

Important Note:

When searching a list containing value types, make sure the default value for the type does not satisfy the search predicate. Otherwise, there is no way to distinguish between a default value indicating that no match was found and a list element that happens to have the default value for the type. If the default value satisfies the search predicate, use the FindIndex method instead.

This was concerning for a couple of reasons. First of all, the designers have left it up to you to determine that this is the default behaviour. Yes I should be able to come to that conclusion myself, but I didn't until I got bitten :) So shut up! Secondly, you have to check your result value against your predicate again to be sure that it's not dodgey. For example:

List<int> list = new List<int>(new int[] { 0, 1, 2, 3 });
int i = list.Find(x => x > 3);
if(i > 3)
{
  // .. valid value, do stuff ..
}
else
{
  // .. no item found
}

Do you want to do that? I certainly don't. After a bit of back-and-forth with Jimbo, I thought that the best option for a generic List Find() function would be one that is akin to the good old C++ days. It would look something like this:

bool Find<t>(Predicate<t> predicate, ref T output);

This would mean that you could change your code to something like the following:

int i;
List<int> list = new List<int>(new int[] { 0, 1, 2, 3 });
if(list.Find(x => x > 3, ref i))
{
  // .. valid value, do stuff ..
}
else
{
  // .. no item found, or empty list!
}

Note how with this option you could easily support the case for empty lists at the same time. It would be helpful and meaningful. Only when the function returns true can you rely on the output parameter. It's very clear and caters for value-types and reference-types. It'd be easy to implement in an extension method as well. I'd prefer this solution over using FindIndex().

In case it's not obvious, this problem would no doubt exist in all functions on generic objects that attempt to return a single instance of T based on some form of predicate. FindLast() would be another example.

I'm very keen to know the reasons behind the original design decision. I'm sure that minds far greater than mine parsed that problem and came up with that solution, probably for a very good reason.

What do you guys think?

  • James R
    Welcome to my parlour said the spider to the fly!

    After the pain of using the Find method in pre .Net ADO, I was just thinking of getting back into the water with Find in the generic lists, now you are ringing the shark alarm.

    I agree, having the Find method looking like the various TryParse methods would be more intuitive and useful.
  • OJ
    @James R: I was waiting for someone to say that ;) Along with FindIndex(), Contains() adds unnecessary overhead.

    The thing is, I want to find out if the item exists and get a reference to it. Having said that, this might not be such an issue for value types.

    What we need is to determine if something is in a list and get a handle to it regardless of whether it is a value type or a reference type. Contains() doesn't do that, and neither does FindIndex(), as you need to make further calls to get a handle.

    I certainly wouldn't be using a List<T> at all if I wanted performance across a large dataset, so you could say that I'm optimising unnecessarily. The point for me is that the Find() method is counter-intuitive if you haven't read the documentation. It should be more indicative of what it's doing for both value and reference types.

    Cheers!
  • James R
    You can use the Contains method, list.Contains(4) will return false in your example.

    However, if the list is long performance may be affected as the documentation notes that the method performs a linear search. Also, I do not know how efficient the method is if the list is not of a simple type.
  • OJ
    @Sergey: Thank you for attempting to justify your position. In my view, nobody with a solid understanding of C#, the .NET framework and the CLR would have made the comments you have just made. Until you brush up on your understanding and perhaps even use C# and .NET there's no point in continuing this conversation.
  • Sergey Shepelev
    C# is impossible without .NET. And i believe, C# is nearly all code written for .NET. Is there a point to distinguish them so much? If there is, let me rephrase - .NET implementation of generic list sucks.

    It's good to agree on using same therms, but this didn't change the point - C# sucks, because it's underlying (.NET) implementation of generic list sucks. It would be wrong if there was alternative implementation usable from C#. There isn't one, right?

    I think difference is big for .NET developer (i mean those guys who write the actual framework). For end user (developer who uses .NET framework)... Find()-ing item in list is just that tricky and hard to do right way as you wrote. It doesn't matter, whether is it framework or language problem.
  • OJ
    @Sergey: I think you just perfectly proved the point that you have no idea what you're talking about.

    Thank you for your uneducated opinion. Perhaps once you've finished making such "strong" arguments against the language you can begin to tell us why this "point" relates to C# as opposed to the .NET framework's implementation of a generic list container?
  • Sergey Shepelev
    I think you just perfectly proved another point why C# sucks.
blog comments powered by Disqus