Functional Pollution
In the Ruby mailing list a self-proclaimed "Newbie" (I'll avoid mentioning names since there's no need for them here, although if the people involved want to be named I'll cheerfully edit the blog) asked for hints as to how to improve this code:
dpath = Dir.getwd
Dir.foreach(dpath) do
  |x|
  ddir = File.join(dpath,x)
  if File.directory?(ddir) && x != "." && x != ".." then
    puts %x{du -sh #{x}}
  end
end
I found it very promising that the person in question identified this code as "ugly" and was actively soliciting input for improving it. I've worked with a lot of people who'll cheerfully churn out hundreds of pages of boilerplate code with small, subtle changes on each iteration and not notice how unmanageably ugly their code is.
One of the first responses to this question was posted while I was composing my own response. It looked like this:
require 'rio'
rio.dirs do |x|
  puts %x{du -sh #{x}}
end
Of course I didn't actually see that post until after mine got put up, but when I did see it, I looked at it and thought to myself:
This is the typical sort of answer for an imperative programmer.
As a (relatively) recently-reformed imperative programmer myself, I've got a special sensitivity toward spotting imperative solutions in contrast to more functional solutions. As I said in an earlier entry, learning new paradigms of programming isn't just useful for using those paradigms directly. When I learned OOP, it informed even my straightforward procedural programming. And as I've learned more and more functional thinking, it informs both my procedural and my object-oriented thinking.
So what's so imperative about this second block of code? Well, it does what every good imperative programmer wants: it wraps specific, desired functionality into a library and gives you a new API to access it. And yes, it is indeed much prettier than the first code.
Look at the costs, however:
- You have to download and install a new library. The Rio library isn't part of standard Ruby, so you have to go through the hassles there. And if this is an application for distribution, you've just made your build and installation that much more complicated.
- You have to learn the API of that new library. Rio is a large, complex library with a lot of functionality (most of which isn't likely to be needed in any single given application). It is a good library, well-written and well-documented, but there is a serious overhead in learning attached to it before you can be fluent in using it.
- Apropos the large, complex library – Rio brings in all that functionality whether you're using it or not. This can have an impact on applications.
- Using Rio isn't as flexible as working from scratch using good, functional composition techniques (at least in my opinion).
The solution I put forth (a small bug later corrected by Dan Zwell, thanks Dan!) is, to my mind, a much cleaner solution. There's a bit more typing involved, true, but I'm not afraid of typing. Any decent developer spends at most 10% of the time typing anyway. The rest should be thinking, planning, testing, designing, etc. Here's what it looks like:
Dir.entries(Dir.getwd).reject{|e| /^\.{1,2}$/ =~ e}.find_all{|e| File.directory? e}.each do |x|
  puts %x{du -sh "#{x}"}
end
Now I don't claim that this code is perfect. First, I hacked it together quickly in a couple of minutes as a proof-of-concept. I did briefly test it, but as Dan demonstrated there was a clear bug in that version (the regular expression would have matched any file beginning or ending in "."). Too, I decided to separate out the reject logic from the accept logic. I find that much easier to think about than large, complicated conditional expressions, but I acknowledge that this could have a dire performance impact in some circumstances. (My policy is to write for correctness first, then refactor for performance if it's proven you need it.)
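To make that trade-off concrete, here is a rough sketch of the two shapes side by side. The method names (subdirs_two_pass, subdirs_one_pass) and the path parameter are my own inventions for illustration, not anything from the mailing list thread. I've also added a File.join so the check works for directories other than the current one, which the one-liner above doesn't need.

```ruby
# Hypothetical helper names, for illustration only.

# Two passes: first reject the "." and ".." entries, then keep
# only the entries that are directories.
def subdirs_two_pass(path)
  Dir.entries(path)
     .reject   { |e| /^\.{1,2}$/ =~ e }
     .find_all { |e| File.directory?(File.join(path, e)) }
end

# One pass: the same conditions folded into a single predicate.
# This walks the list once, at the cost of a busier conditional.
def subdirs_one_pass(path)
  Dir.entries(path).select do |e|
    e != "." && e != ".." && File.directory?(File.join(path, e))
  end
end
```

Both should return the same names for the same directory. Whether the extra pass ever matters is something I'd want to measure before caring about it.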
Now to the average imperative programmer, the second block of code (the Rio one) is the best, hands-down. The reason, I suspect, is the typing. Most (all?) procedural languages are so verbose and so full of boilerplate scaffolding that typing is something people get paranoid about. The best solution is the one that involves the least typing. Or something. I disagree, however. I like my solution better.
I like it better because it's more flexible. I can adjust the conditions very easily to precisely do that which I want and only that which I want. I can make huge, convoluted logic statements if I choose to, or if I choose not to I can wrap complicated membership checks into a function and pass the function in. I don't have to process by the block the way I did. I can instead just take the (very finely-tuned) returned list and use functional calls like inject (a.k.a. "fold" to those used to functional languages), collect (a.k.a. map), etc. to continue my merry, functional way. I can take the string of chained conditions I've got there and wrap them up into a function and use any source of string lists that correspond to file/directory names as my inputs. And, best of all, I can do all of this without having to muck around with another library (well-written as Rio is) and without being forced into what another library's writer thinks my interface to the world should be.
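Here is a minimal sketch of what I mean by all that. The names NOT_DOT_ENTRY and directories_in, and the base: and keep: parameters, are made up for this example. Everything else is plain core Ruby.

```ruby
# Hypothetical names throughout; only core Ruby methods are used.

# A membership check wrapped up as a function (a lambda) so it can
# be passed around like any other value.
NOT_DOT_ENTRY = ->(name) { name != "." && name != ".." }

# The chained conditions wrapped into one function. `names` can be any
# Enumerable of strings; `base` is the directory those names are relative
# to; `keep` is whatever membership check the caller wants applied.
def directories_in(names, base: Dir.getwd, keep: NOT_DOT_ENTRY)
  names.select(&keep).select { |n| File.directory?(File.join(base, n)) }
end

# The source of names is just a list of strings. Here it happens to
# come from the current directory.
dirs = directories_in(Dir.entries(Dir.getwd))

# collect (a.k.a. map): turn each name into an absolute path.
paths = dirs.collect { |d| File.expand_path(d) }

# inject (a.k.a. fold): reduce the list to a single value, here the
# total number of characters across all the directory names.
total_chars = dirs.inject(0) { |sum, d| sum + d.length }
```

Swapping in a different check is then just a matter of passing a different lambda as keep:, and the list of names could as easily come from a file or a database as from Dir.entries.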
My mind has been polluted with functional thinking.

4 comments:
I think this is almost outside of the "functional vs object vs imperative" arguments...
Richard P. Gabriel's Patterns of Software: Reuse vs Compression essay comes the closest to capturing what's going on here.
The crux of the argument as you point out is that there's a cost to reuse no matter how small. To which I add that the value must exceed the cost to be beneficial.
The post which says 'just use rio' is correct... for that individual and their use. For one-off use, the same post is very wrong. rio as a DSL is extremely valuable if you're doing a lot of file manipulation, and it's a fantastic aid if you're using it together with rake or shell commands. But to frame the question another way: would someone learn a new language (albeit a small one) in order to do this task? And is it worth forcing all who follow, using the resulting code, to learn this same language?
In this sense there's a very natural tradeoff between learning new languages and "just getting the job done with the tools at hand". In fact there's an argument for using reusable code you're already familiar with during development and systematically removing infrequently used bits to reduce dependencies before releasing it into the wild.
I agree with a lot of what you said, but with one caveat, Jeff: with sufficient abstraction in the language you're using, a lot of these "DSLs" are so trivial that they're pointless.
First, let me get it out of the way: I am in no way saying that Rio is a bad library. (Nor am I saying that Ruby is a bad language.) It is a really, really good library. (Just like Ruby is a really, really good language.) I'm very impressed with it. But the very need for a library like that is symptomatic of some problems of the expressiveness of many languages—even in a language as expressive as Ruby.
To go with the language I'm wrestling with the most at the moment: in Haskell you'd likely never see a library like Rio. You'd have no need for it. Composition of chains of functions (anonymous or otherwise) is so trivial that making a large, complicated library like Rio would be downright quixotic. Doing what I did in the post above in Ruby using Haskell instead would be such a trivial undertaking that you'd likely get laughed at if you suggested wrapping it all up into a library.
Codepoet had an interesting article showing IO in Haskell. Read the article and you'll see what I mean; why a Rio-like library for Haskell is likely not of much use. If you're still not convinced, drop by the Simple Unix Tools page of the Haskell Wiki. Take a look at a whole bunch of Unix string utilities sort-of replicated in Haskell. Count the number of lines per utility. (Of course the functionality isn't identical since it's only proof-of-concept code, but what you do see is pretty jaw-dropping.)
I think that there is a strong correlation between the expressiveness of a language. I'm sure that Gabriel's paper has some very good points to make on the Reuse vs. Compression issue (got a link?) but I think there's a diagonal issue of the level at which you have to start making that trade-off decision. Given a sufficiently expressive language, you can get a lot farther a lot easier without the need for special "little languages" to help you through it.
(I still do think that Rio is an awesome library.)
sigh
Teach me not to proof-read....
s/(correlation between the expressiveness of a language)/negative $1 and the size of "required" support libraries/
Link to Gabriel's book, in which the cited essay appears: http://www.dreamsongs.com/Files/PatternsOfSoftware.pdf
And yes, I agree with you... what you're really objecting to is trivial libraries. For example, I've never warmed up to the Ruby Facets library (http://facets.rubyforge.org/), even though I've read through the source on more than one occasion, borrowed ideas from it, and have the gem installed. I tend not to use it directly because it's just such a nuisance to learn the bits and pieces. Each piece of it is just too small to be useful.
Of course I also work with Java J2EE developers during my day job and see Java projects collapsing under the weight of add-on libraries and frameworks. In that environment you can do virtually nothing without the aid of extras. And of course each project (and each programmer) has different favourites that overlap. To maintain trivial code I need to know several frameworks, each of which does the same thing. This is far worse than having to know multiple different programming languages.
The more expressive the language the farther you can go before the weight of the environment collapses on you in a giant code cave-in leaving you smothering and gasping for breath.