Writing clearer code without map and filter.
I love programming by slicing and dicing maps and lists in Clojure. The data-first approach of Clojure — and functional programming languages in general — feels right after years of hiding data behind object-oriented walls.
But one thing has been a thorn in my side since starting with Clojure four years ago: sometimes I still feel like a clumsy beginner when it comes to processing nested trees of maps and lists.
Here’s what I mean. Say I have a list of sites owned by a restauranteur. Each one of the sites has one or more stations that provide service to customers, and I want to generate a list of the ID’s of every station on all sites.
(def sites [{:time-zone "Adelaide/Australia" :stations [{:id "1" :name "Barista"}]} {:time-zone "Melbourne/Australia" :stations [{:id "2" :name "Kitchen"}]}])
Many developers, myself included, get introduced to processing lists and maps using the higher-order functions map
and filter
.1 This tends to be a bit of a revelation, and the natural instinct is to always reach for them whenever we want to process data.
The problem is that logic for processing complex data structures using map/filter can be:
- Hard to write.
- Even harder to read later.
Here’s Clojure that generates that list of station ID’s I wanted:
(mapcat (fn [site] (map :id (:stations site))) sites) -> ("1" "2")
This is not great code. It’s not obvious that this is going to generate a list of station ID’s: you need to mentally parse into the inner statement before you can see that it’s mapping :id
over stations.
You can make this a little shorter using an anonymous function (although whether this makes it more clear is another matter):
(mapcat #(map :id (:stations %)) sites)
And it can be even better using the thrush macro:
(->> sites (mapcat :stations) (map :id))
Which is probably the best form since you can see that ID’s come out at the end (but I always forget the mapcat!).2
But learning how to tame map
with ->>
doesn’t help much when you start to process nested data structures. And then you add filtering into the mix.
Let’s find all ID’s of stations in the Adelaide time zone:
(mapcat (fn [site] (map :id (:stations site))) (filter #(= (:time-zone %) "Adelaide/Australia") sites))
Or, using a thrush pipeline:
(->> sites (filter #(= (:time-zone %) "Adelaide/Australia")) (mapcat :stations) (map :id))
But you can also do it this way, using a list-comprehension in Clojure’s for
syntax:
(for [site sites :when (= (:time-zone site) "Adelaide/Australia") station (:stations site)] (:id station))
This reads as: “select every site
in sites
, remove the ones that aren’t in the Adelaide time zone, select the stations from those, and return their ID”. This generates the same results as the first expressions, but reads pretty much as you’d say it to someone (and there’s no mapcat
to forget).
Now, depending on your background, you might actually prefer the pipelined map/mapcat
style. That style is very similar to the point free form used in many functional languages such as Haskell, where new functions are composed from others without naming parameters.3 This can be nice — especially in languages with automatic currying like Haskell — but it requires discipline to use as expressions get more complex.
But it’s when you get to following example, a real part of a system I’ve been working on, that list-comprehension approach comes into its own:
(for [scheduled-task scheduled-tasks schedule (:schedules scheduled-task) [station-id span] (get spans (:id schedule)) :when (in-span? span current-time)] (let [local-time (-> current-time (t/to-time-zone (timezone-for-station-id sites station-id)))] [station-id (make-station-task scheduled-task schedule local-time)]))
Without going into details, this has three levels of nesting and a filter, generating a series of pairs of station ID’s and tasks to be scheduled for them. For someone familiar with the system, this will be far easier to comprehend than a map/filter.
Clojure’s For Comprehension
Like many “built in” parts of Clojure, for
is a macro, so it’s easy to use macroexpand-all
to see what code it generates. When I did that, I found it the results a little surprising. While you can always formulate a list comprehension in terms of map/mapcat/filter
, the for
macro produces a series of inline nested loops that generate a lazy Clojure “chunked” sequence. In Clojure 1.7, the first small example above generates 80 lines (76 sexp’s) and defines 4 closures (i.e. fn’s).
This is a lot more code than would be generated by the map/filter route, and I’d be interested to know why Rich Hickey took that approach. My initial guess was that it might be easier for the JVM to optimise, but I ran a quick-and-dirty benchmark of the simplest for
expression selecting station ID’s the and map/mapcat
equivalent, and there was no discernible difference in speed.
Going Further: Specter
The examples I’ve shown so far are queries: I’m filtering and transforming data into result sets. What if I want to “modify” data, i.e. produce a new copy with the same shape but some changes? This is where Specter comes in.
To change the name of all the stations named “Barista” to “Coffee Master”, I can use a for comprehension:
(for [site sites station (:stations site)] (if (= (:name station) "Barista") (assoc station :name "Coffee Master") station))
But using Specter:
(->> sites (transform [ALL :stations ALL #(= (:name %) "Barista")] #(assoc % :name "Coffee Master")))
Now, this example is not actually doing justice to Specter. But in a previous (superseded) versions of the system, Specter replaced what would have been some really hairy logic with a two lines of code. I expect to be using it more in the future.
A Useful Tool
Looking around online, I can see that list comprehensions are especially idiomatic in Python, and I’ve seen them used regularly in Haskell code too. I wrote this post because I was missing out on a powerful part of Clojure, partly because I don’t see a lot of use of for
in the Clojure code I read. So perhaps this will add a useful tool to your toolbox next time you’re wrangling data.