2007-06-11

A Columbo Moment

One of my favourite television shows when I was younger (and still a favourite today when I'm in a reminiscent mood) was Columbo. Peter Falk (one of my favourite character actors) played a natty little police detective who doggedly (and non-violently) pursued murderers with his stock catch-phrase "just one more question".

My Columbo moment came while voraciously reading a book Jeff pointed me to. Written by Richard P. Gabriel, the book is a collection of essays on software development and its community. The "just one more question" popped up as I read this passage:

A small language defined by a group of mathematicians or would-be mathematicians might have a usable formal semantics, and people who wish to prove correctness would be in luck with such a language. But—I hate to be the one to break the news—I’ve been in industry for over 10 years, and I don’t see a whole lot of people proving correctness; I see them reasoning about their programs, but they don’t need a formal semantics to do that.

If you Google on "Haskell prove correct" or some similar phrase, you'll find a lot of people extolling the virtues of Haskell based on how you can make code that is "provably correct". Yet, oddly, I would guess that most of the same people extolling this virtue have themselves never formally proven anything in code beyond some trivial things they did as an exercise.

"Provably correct" code is, I think, a chimera. Whether code is "provable" or not is irrelevant if nobody actually proves it. I think Haskell's virtues are legion without invoking chimeras.
  • It's concise.
  • It's expressive.
  • It permits powerful abstractions to the point of not requiring huge, trivial libraries.
  • It supports multi-threaded programming with no change of source code -- the compiler does the work behind the scenes.
  • The monadic stuff still blows me away with just how powerful the ability to define whole computational approaches in a libraries. (There's a future blog entry in this if I can ever find the words.)

That's only a short, representative list. I could go on for a long time explaining Haskell's virtues. So why burble on about a capability which is perhaps true, but never used by anybody? Why not focus on things people actually care about?

2007-06-09

Functional Pollution

In the Ruby mailing list a self-proclaimed "Newbie" (I'll avoid mentioning names since there's no need for them here, although if the people involved want to be named I'll cheerfully edit the blog) asked for hints as to how to improve this code:

dpath = Dir.getwd
Dir.foreach(dpath) do
|x|
ddir = File.join(dpath,x)
if File.directory?(ddir) && x != "." && x != ".." then
puts %x{du -sh #{x}}
end
end

I found it very promising that the person in question identified this code as "ugly" and was actively soliciting input for improving it. I've worked with a lot of people who'll cheerfully churn out hundreds of pages of boilerplate code with small, subtle changes on each iteration and not notice how unmanageably ugly their code is.

One of the first responses to this question was posted while I was composing my own response. It looked like this:
require 'rio'
rio.dirs do |x|
puts %x{du -sh #{x}}
end

Of course I didn't actually see that post until after mine got put up, but when I did see it, I looked at it and thought to myself:
This is the typical sort of answer for an imperative programmer.

As a (relatively) recently-reformed imperative programmer myself, I've got a special sensitivity toward spotting imperative solutions in contrast to more functional solutions. As I said in an earlier entry, learning new paradigms of programming isn't just useful for using those paradigms directly. When I learned OOP, it informed even my straightforward procedural programming. And as I've learned more and more functional thinking, it informs both my procedural and my object-oriented thinking.

So what's so imperative about this second block of code? Well, it does what every good imperative programmer wants: it wraps specific, desired functionality into a library and gives you a new API to access it. And yes, it is indeed, much prettier than the first code.

Look at the costs, however:
  • You have to download and install a new library. The Rio library isn't part of standard Ruby, so you have to go through the hassles there. And if this is an application for distribution, you've just made your build and installation that much more complicated.
  • You have to learn the API of that new library. Rio is a large, complex library with a lot of functionality (most of which isn't likely to be needed in any single given application). It is a good library, well-written and well-documented, but there is a serious overhead in learning attached to it before you can be fluent in using it.
  • Apropos the large, complex library – Rio brings in all that functionality whether you're using it or not. This can have an impact on applications.
  • Using Rio isn't as flexible as working from scratch using good, functional composition techniques (at least in my opinion).

The solution I put forth (a small bug later corrected by Dan Zwell, thanks Dan!) is, to my mind, a much cleaner solution. There's a bit more typing involved, true, but I'm not afraid of typing. Any decent developer spends at most 10% of the time typing anyway. The rest should be thinking, planning, testing, designing, etc. Here's what it looks like:
Dir.entries(Dir.getwd).reject{|e| /^\.{1,2}$/ =~ e}.find_all{|e| File.directory? e}.each do |x|
puts %x{du -sh "#{x}"}
end

Now I don't claim that this code is perfect. First, I hacked it together quickly in a couple of minutes as a proof-of-concept. I did briefly test it, but as Dan demonstrated there was a clear bug (the regular expression would have matched any file beginning or ending in ".") in that version. Too, I decided to separate out the reject logic from the accept logic. I find that much easier to think about instead of making large, complicated conditional expressions, but I acknowledge that this could have dire performance impact in some circumstances. (My policy is write for correctness first, then refactor for performance if it's proven you need it.)

Now to the average imperative programmer, the second block of code—the Rio one—is the best, hands-down. The reason for this is, I suspect, because of the typing. Most (all?) procedural languages are so verbose and so full of boilerplate scaffolding that typing is something people get paranoid of. The best solution is the one that involves the least typing. Or something. I disagree, however. I like my solution better.

I like it better because it's more flexible. I can adjust the conditions very easily to precisely do that which I want and only that which I want. I can make huge, convoluted logic statements if I choose to, or if I choose not to I can wrap complicated membership checks into a function and pass the function in. I don't have to process by the block the way I did. I can instead just take the (very finely-tuned) returned list and use functional calls like inject (a.k.a. "fold" to those used to functional languages), collect (a.k.a. map), etc. to continue my merry, functional way. I can take the string of chained conditions I've got there and wrap them up into a function and use any source of string lists that correspond to file/directory names as my inputs. And, best of all, I can do all of this without having to muck around with another library (well-written as Rio is) and without being forced into what another library's writer thinks my interface to the world should be.

My mind has been polluted with functional thinking.