Friday, January 4, 2013

Deconstructing Zed's K&R2 Deconstruction

I recently stumbled upon Zed Shaw's deconstruction of K&R2. The post is well-intentioned but, in my opinion, flawed. I believe Zed fails to make a valid argument and also fails to provide a valid solution to the issue he raises.

The chapter is clearly not finished, so this rebuttal might not be valid at the time of reading.

The Argument

The primary argument is that K&R2 is not an appropriate tool for learning C in our modern age. The example given is a function called copy which is effectively strcpy. Zed points out that if the function is not given a valid string, as C defines it, the behaviour of the function is undefined.
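
For reference, here is roughly what the function under discussion looks like. This is reproduced from memory of K&R2 section 1.9, so treat it as a sketch rather than an exact quotation; the comment is mine.

```c
/* K&R2-style copy: copies `from` into `to`, including the '\0'.
 * Assumes `from` is '\0'-terminated and `to` is big enough.
 * If either assumption is false, the behaviour is undefined. */
void copy(char to[], char from[])
{
    int i;

    i = 0;
    while ((to[i] = from[i]) != '\0')
        ++i;
}
```

Given a valid string and enough room, as in `char buf[8]; copy(buf, "hello");`, it does what you would expect. Nothing in the function can check either precondition.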

This provides a formal proof that the function is defective because there are possible inputs that causes the while-loop to run forever or overflow the target.

When presented with the rebuttal that the cases where it fails are not valid C strings, the response is that it doesn't matter:

... but I'm saying the function is defective because most of the possible inputs cause it to crash the software.

The problem with this mindset is there's no way to confirm that a C string is valid.

Also:

Another argument in favor of this copy() function is when the proponents of K&RC state that you are "just supposed to not use bad strings". Despite the mountains of empirical evidence that this is impossible in C code...

To reiterate, the problem with copy is that:

  1. It depends on valid C strings to operate correctly
  2. C strings are impossible to validate at run-time
  3. The behaviour of copy is undefined for most values that are possible to be put into a char*

Proposed Solution

The solution is a function called safercopy which takes the sizes of the source and destination storage as input, allegedly guaranteeing that safercopy terminates:

In every case the for-loop variant with string length given as arguments will terminate no matter what.
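
To make the discussion concrete, here is a minimal sketch of a length-taking copy in the spirit of what Zed describes. It is not his exact code: the signature, the use of size_t, and the return value are my assumptions.

```c
#include <stddef.h>

/* Hypothetical length-taking copy, not Zed's exact code.
 * Copies at most to_len - 1 bytes and always terminates `to`.
 * Returns the number of bytes copied, not counting the '\0'.
 * Assumes from_len and to_len describe real storage; if they
 * do not, the behaviour is undefined, just as with copy. */
size_t safercopy(size_t from_len, const char *from, size_t to_len, char *to)
{
    size_t i;
    size_t max;

    if (to_len == 0)
        return 0;               /* no room even for the terminator */

    max = from_len < to_len - 1 ? from_len : to_len - 1;

    for (i = 0; i < max; i++)
        to[i] = from[i];

    to[i] = '\0';
    return i;
}
```

With honest lengths it truncates cleanly: copying "hello" (length 5) into a 4-byte buffer yields "hel". The loop bound is only as trustworthy as the lengths the caller passes in.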

What's Wrong With This

We can write what is wrong with safercopy using the exact same criteria Zed used for copy:

  1. It depends on valid lengths to operate correctly
  2. The lengths are impossible to validate at run-time
  3. The behaviour of safercopy is undefined for most values that are possible to be put into a size_t (I am presuming that the lengths would be a size_t)
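
Point 2 can be seen with a small experiment. The helper below is hypothetical and exists only to show what a callee can learn about the storage behind a pointer, which is nothing.

```c
#include <stddef.h>

/* Hypothetical helper: reports what sizeof sees inside a callee. */
size_t size_seen_by_callee(char *buf)
{
    /* buf is a pointer here, so this is the size of a pointer,
     * not the size of the caller's array. */
    return sizeof buf;
}
```

Pass it a 3-byte array or a 1000-byte array and the answer is the same, sizeof(char *). A function that receives a pointer and a length has to take the length on faith.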

Additionally, Zed instills a false confidence in his safercopy. The function is no more guaranteed to terminate than copy when given bad input. Specifically, if the lengths are wrong and the copy loop runs past the bounds of its storage, it could easily overwrite anything, including the lengths and pointer values used by the loop it's in. It could blow up, it could loop forever, who knows. It's undefined.

Finally, if it is hard to properly handle C strings, why should we think it is any easier to track the length of a string separately? Remember, in C the length of a string is encoded in the string itself by the location of the '\0'. The solution provided by Zed takes the length of the strings as separate input. But he provides no reason to believe developers will get this correct. If the solution proposed had been implementing a safe string library, I might be able to agree.
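
By a safe string library I mean something where the length travels with the data, so callers never supply it by hand. The following is only a sketch of the idea; the type and function names are invented for illustration and do not come from Zed's article or any real library.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string type: length and capacity live with the data. */
struct sstr {
    char  *data;   /* always '\0'-terminated */
    size_t len;    /* bytes in use, excluding the '\0' */
    size_t cap;    /* bytes allocated, including room for the '\0' */
};

/* Build an sstr from a C string. This is the one place that still
 * has to trust that src is '\0'-terminated. Returns 0 or -1. */
int sstr_init(struct sstr *s, const char *src)
{
    size_t len = strlen(src);

    s->data = malloc(len + 1);
    if (s->data == NULL)
        return -1;
    memcpy(s->data, src, len + 1);
    s->len = len;
    s->cap = len + 1;
    return 0;
}

/* Copy src into dst, growing dst if needed. Returns 0 or -1. */
int sstr_copy(struct sstr *dst, const struct sstr *src)
{
    if (dst == src)
        return 0;               /* nothing to do, and memcpy must not overlap */

    if (dst->cap < src->len + 1) {
        char *p = realloc(dst->data, src->len + 1);
        if (p == NULL)
            return -1;          /* dst is left unchanged */
        dst->data = p;
        dst->cap = src->len + 1;
    }
    memcpy(dst->data, src->data, src->len + 1);
    dst->len = src->len;
    return 0;
}

/* Release the storage and reset the fields. */
void sstr_free(struct sstr *s)
{
    free(s->data);
    s->data = NULL;
    s->len = 0;
    s->cap = 0;
}
```

The difference from safercopy is that the lengths are maintained by the library instead of being retyped at every call site. Someone can still corrupt the struct, but there are far fewer places to get it wrong.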

And that is the crux of the problem. It's not whether K&R2 is any good, it's that the solution given isn't any better. It doesn't address the faults of C strings. There are known ways to safely handle C strings; the problem tends to be that it's tedious, so people get it wrong. Many C string issues have to do with lack of proper allocation rather than forgetting the '\0' terminator. In what way does the solution solve this problem?
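
As an example of the tedious-but-known approach, here is a sketch of joining two strings with the allocation sized correctly. The function name join is made up for this post, and it assumes both inputs are valid '\0'-terminated strings.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated a + b, or NULL
 * if allocation fails. Assumes a and b are valid C strings.
 * The caller must free the result. */
char *join(const char *a, const char *b)
{
    size_t alen = strlen(a);
    size_t blen = strlen(b);
    char *out = malloc(alen + blen + 1);   /* +1 for the '\0' */

    if (out == NULL)
        return NULL;
    memcpy(out, a, alen);
    memcpy(out + alen, b, blen + 1);       /* copies b's '\0' too */
    return out;
}
```

The classic mistake here is dropping the `+ 1`, which is an allocation bug and not a missing terminator. Taking lengths as parameters does nothing to prevent it.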

If the solution given is no better than the problem it's solving, then it isn't a very good solution.

K&R2

Is K&R2 not suitable for teaching people C in this day and age? It has plenty of faults, but I don't think this particular article, as it exists now, makes a compelling argument. Nor does it provide anything better than what it's critiquing.

3 comments:

  1. Your argument is flawed in two ways:

    1. You cannot say that K&R's string copy function is correct because you believe my function is wrong. All that does is show mine's flawed, not disprove that theirs is flawed.

    2. I actually point out this problem with sizes, but no matter what size you give it *my function does terminate*. Pick any size integer you want, it is still finite on every machine. Meanwhile, the while-loop version does not terminate, and provably so. That is the defect, and failure to terminate logically is a flaw.

    So no, your critique of mine is nothing more than repeating what I already said, then claiming it disproves that K&R's function is broken but not actually presenting a formal proof that revalidates their function.

    Replies
    1. Your first point is correct, I never actually say whether I think the K&R2 function is the correct solution or not.

      Your second point is false though. Your function is not guaranteed to terminate. If you give invalid lengths to safercopy, specifically larger than the strings, it is not guaranteed to terminate. The behaviour is undefined. So your analysis of your own solution is incorrect.

    2. Dear Zed.

      In your article I note that you say "The only way to solve it is to include the length of every string and use that to scan it."

      Wouldn't another solution be to use that length to put a '\0' at string[length]? Doesn't that mean your solution isn't the only one?

      Wa la. Problem solved. No more strlen problems. You lose. Who's your farkin' daddeh? I am! Next...
