Understanding Transducers
What are transducers? Using them is easy enough, but how do they work under the hood?
This article explores transducers by ignoring transducers. Instead, we will examine two ordinary functions, map and filter. We’ll play with them and scrutinize them, and we’ll marvel at the power of higher-order functions as we apply abstractions. And perhaps, if we’re lucky, we’ll bump into transducers along the way.
And since we ignore transducers, you won’t need to know what transducers are to follow along. If you don’t know Clojure or a Lisp, this quick primer may help.
Lastly, I encourage you to type these examples into your REPL, or use clojurescript.net. The source code from this post can be found here.
Power of reduce
You are probably familiar with map and filter, and know that we can combine them, like this:
(map inc (range 10))
; ⇒ (1 2 3 4 5 6 7 8 9 10)
(filter even? '(1 2 3 4 5 6 7 8 9 10))
; ⇒ (2 4 6 8 10)
(filter even? (map inc (range 10)))
; ⇒ (2 4 6 8 10)
A key insight, however, is that map and filter can be defined using reduce. Let’s implement the expression (map inc (range 10)) in terms of reduce:
(defn map-inc-reducer
  [result input]
  (conj result (inc input)))
(reduce map-inc-reducer [] (range 10))
; ⇒ [1 2 3 4 5 6 7 8 9 10]
But note map-inc-reducer’s explicit use of inc as its transformer. What if we extract that out and let the user pass in whatever function they want? We can define a new function that takes a transforming function like inc and returns a new function.
(defn map-reducer
  [f]
  (fn [result input]
    (conj result (f input))))
(reduce (map-reducer inc) [] (range 10))
; ⇒ [1 2 3 4 5 6 7 8 9 10]
Functions like map-reducer are called higher-order functions because they accept functions and return functions. Let’s play around:
(reduce (map-reducer dec) [] (range 10))
; ⇒ [-1 0 1 2 3 4 5 6 7 8]
(reduce (map-reducer #(* % %)) [] (range 10))
; ⇒ [0 1 4 9 16 25 36 49 64 81]
Let’s also implement the expression (filter even? '(1 2 3 4 5 6 7 8 9 10)) in terms of reduce:
(defn filter-even-reducer
  [result input]
  (if (even? input)
    (conj result input)
    result))
(reduce filter-even-reducer [] '(1 2 3 4 5 6 7 8 9 10))
; ⇒ [2 4 6 8 10]
Again, notice that filter-even-reducer explicitly uses even? as its predicate. As before, let’s extract that out and let the user pass in whatever they want.
(defn filter-reducer
  [predicate]
  (fn [result input]
    (if (predicate input)
      (conj result input)
      result)))
(reduce (filter-reducer even?) [] '(1 2 3 4 5 6 7 8 9 10))
; ⇒ [2 4 6 8 10]
We can even compose map-reducer and filter-reducer together, for example by feeding the result of one reduce into another:
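(reduce (filter-reducer even?)
        []
        (reduce (map-reducer inc) [] (range 10)))
; ⇒ [2 4 6 8 10]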
The expression above is equivalent (ignoring vectors versus lists) to the expression below.
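(filter even? (map inc (range 10)))
; ⇒ (2 4 6 8 10)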
We see that with higher-order functions, we are able to define map and filter in terms of reduce.
Both versions, however, required an intermediate collection: the map step built an entire vector just so the filter step could consume it. One important property of transducers is that only one collection should be built, regardless of how many transformations we compose. How can we accomplish that?
Another step in abstraction
Let’s scrutinize map-reducer and filter-reducer. Here they are again:
(defn map-reducer
  [f]
  (fn [result input]
    (conj result (f input))))
(defn filter-reducer
  [predicate]
  (fn [result input]
    (if (predicate input)
      (conj result input)
      result)))
What do you observe? conj is used in both of them. Why? What’s so special about it? Can we use other functions in place of conj?
Well, notice that result and input can be of any type. If result is 10 and input is 1, conj would not work here; (conj 10 1) throws an error. Instead of conj, we would want something like +, because (+ 10 1) makes sense.
We can say that conj and + are both reducing functions. Reducing functions have the type result, input -> result; they take a result and an input, and return a new result. For example:
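(conj [1 2 3] 4)
; ⇒ [1 2 3 4]
(+ 10 1)
; ⇒ 11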
Now, instead of always using conj in map-reducer and filter-reducer, what if we let the user pass in whatever reducing function they want?
The result is another higher-order function that, as usual, takes our map’s transforming function or filter’s predicate. But now it returns a function that accepts a reducing function. Let’s use the names mapping and filtering for our new functions.
(defn mapping
  [f]
  (fn [reducing]
    (fn [result input]
      (reducing result (f input)))))

(defn filtering
  [predicate]
  (fn [reducing]
    (fn [result input]
      (if (predicate input)
        (reducing result input)
        result))))
And now let’s use them as before:
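(reduce ((mapping inc) conj) [] (range 10))
; ⇒ [1 2 3 4 5 6 7 8 9 10]
(reduce ((filtering even?) conj) [] '(1 2 3 4 5 6 7 8 9 10))
; ⇒ [2 4 6 8 10]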
We see here that we can choose the reducing function. In this case, we choose conj.
Arriving at transducers
Take note of the functions ((mapping inc) conj) and ((filtering even?) conj). Their types are result, input -> result. We can test this:
(((mapping inc) conj) [] 1)
; ⇒ [2]
(((mapping inc) conj) [2] 2)
; ⇒ [2 3]
(((mapping inc) conj) [2 3] 3)
; ⇒ [2 3 4]
(((filtering even?) conj) [2 4] 5)
; ⇒ [2 4]
(((filtering even?) conj) [2 4] 6)
; ⇒ [2 4 6]
This means that ((mapping inc) conj) and ((filtering even?) conj) are also reducing functions, just like conj and +.
So what happens if we compose these two functions like this:
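; pass ((filtering even?) conj) to (mapping inc) as its reducing function
((mapping inc) ((filtering even?) conj))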
This is also a function. But what is its type? Go on, evaluate it.
It turns out, this function also has the type result, input -> result. It is also a reducing function. This means that we can use it via reduce:
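(reduce ((mapping inc) ((filtering even?) conj)) [] (range 10))
; ⇒ [2 4 6 8 10]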
This is a bit messy, so let’s clean it up by using comp instead. Recall that (comp a b c d) returns the function
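(fn [x] (a (b (c (d x)))))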
Here’s the cleaned-up version, using comp:
(def xform
  (comp
    (mapping inc)
    (filtering even?)))
(reduce (xform conj) [] (range 10))
; ⇒ [2 4 6 8 10]
And what about something more complex:
(defn square [x] (* x x))
(def xform
  (comp
    (filtering even?)
    (filtering #(< % 10))
    (mapping square)
    (mapping inc)))
(reduce (xform conj) [] (range 10))
; ⇒ [1 5 17 37 65]
Beautiful.
But we were talking about transducers. Where are our transducers?
It turns out, mapping and filtering are transducer-returning functions. The functions (mapping inc), (filtering even?) and xform are the very transducers we were looking for.
Transducers are functions that accept a reducing function and return a reducing function. For example, the function (mapping inc) is a transducer because it accepts a reducing function like conj, and returns another reducing function, as we observed above. As Rich Hickey pointed out in Transducers are coming, transducers have the type
(result, input -> result) -> (result, input -> result)
Also observe that no intermediate collections are created when we evaluate (reduce (xform conj) [] (range 10)), other than the initial vector. This satisfies our goal of not allocating intermediate collections.
A more intuitive understanding
It may be difficult to understand why the xform transducer works because it’s quite complicated. Let’s try to get a better understanding. Here’s xform as before:
(defn square [x] (* x x))
(def xform
  (comp
    (filtering even?)
    (filtering #(< % 10))
    (mapping square)
    (mapping inc)))
Say we invoke this composed function by passing in some reducing function, perhaps our favorite, conj. This would be passed to the transducer (mapping inc), and we would have ((mapping inc) conj). We know that this returns another reducing function. This new reducing function is then passed into the function (mapping square), which is another transducer. Naturally, this returns another reducing function. And we do this all the way to the first transducer in our composition, (filtering even?).
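In other words, (xform conj) unrolls into the following nesting, with conj at the center and the even? filter as the outermost layer:
((filtering even?)
 ((filtering #(< % 10))
  ((mapping square)
   ((mapping inc) conj))))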
This means that when we give a reducing function to xform, as in (xform conj), we get back a function that applies the left-most transformation first, then works down the stack until the last reducing function, conj, is applied to the current result and input.
Imagine this transducer is being used in some reduce, and so far we have collected the vector [1 5 17] as our result. Say the current input in question is 12. Since 12 is even, it will pass the first filter. Instead of terminating early and returning the current result, the first filter will call its reducing function, passing in [1 5 17] and 12. In this case, the reducing function is the “rest” of the transformation, which starts with the second filter #(< % 10). Since 12 fails the second filter, the third reducing function is not called, and the result-so-far [1 5 17] is returned.
But if the input in question is 6, it would pass both filters and arrive at the mapping transforms, which turn 6 into 37 (squaring gives 36, and incrementing gives 37). We then pass this value to the final reducing function, conj, which joins [1 5 17] with the new value 37.
((xform conj) [1 5 17] 12)
; ⇒ [1 5 17]
((xform conj) [1 5 17] 6)
; ⇒ [1 5 17 37]
(reduce (xform conj) [] (range 10))
; ⇒ [1 5 17 37 65]
Being able to compose transducers is important. We see that it’s quite simple to do, and that ordinary functions power all of it.
Transducers in core.async
Another major selling point of transducers is that a transducer can work across core.async channels. For example, we should be able to take our xform transducer and use it to filter and transform items in a channel.
This time, let’s use the transducers built into Clojure’s core library; map and filter, when called with a single argument, return transducers:
(require '[clojure.core.async :as async])

(defn square [x] (* x x))

(def xform
  (comp
    (filter even?)
    (filter #(< % 10))
    (map square)
    (map inc)))

(def my-chan (async/chan 1 xform))

; Waiting for an item to print...
(async/take! my-chan println)

(async/put! my-chan 3)
; nothing printed to screen, since 3 is not even

(async/put! my-chan 4)
; "17" printed to screen, since 4 is even and less than 10
How do transducers work across core.async channels?
First, note that channel buffers are linked lists underneath (in fact, java.util.LinkedLists). When you put an item into a channel, the internal helper method add! is called to add your item into the buffer.
But if a transducer xform is supplied, core.async will use add! as the reducing function passed into xform.
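Here is a toy illustration of that idea, not the actual core.async source (the real implementation has more machinery around buffers, locking, and completion); it reuses the xform we just defined:
; Pretend buf is a channel buffer and add! is the reducing function
; that puts an item into it.
(def buf (atom []))

(defn add! [b item]
  (swap! b conj item)
  b)

; If a transducer was supplied, the channel wraps add! with it:
(def add-with-xform! (xform add!))

(add-with-xform! buf 3)
; nothing added, since 3 is odd
(add-with-xform! buf 4)
; 17 added, since 4 is even, less than 10, and (inc (square 4)) is 17

@buf
; ⇒ [17]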
This means that any item put into a channel will first be transformed by our transducer. And if the transducer filters out an item (e.g. due to (filter even?)), then the final reducing function add! is never called. Thus the item is never added to the channel’s buffer and no takers ever see it.
The pertinent code can be found in the core.async sources, here.
Conclusion
We’ve come a long way. We started with regular map and filter and observed how they can be implemented using reduce. We then abstracted our reducing functions until we found ourselves with transducer-building functions, mapping and filtering.
By building, analyzing and using transducers, I hope that you gained a better understanding of how they work. They are, after all, just functions.
Problem sets
If you’re interested in learning more, I encourage you to tackle the problems below. Solutions can be found here.
Write a transduce helper function
Right now, our use of reduce is a bit clunky. Write a function transduce that lets us use transducers more directly, perhaps like this:
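; one possible calling convention: transducer, reducing fn, initial value, collection
(transduce xform conj [] (range 10))
; ⇒ [1 5 17 37 65]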
The Caesar Cipher
In our examples above, we used conj and + as reducing functions. Let’s write a more complex one.
Given a string, use transducers to:
- Filter out vowels and non-ASCII characters
- Filter out upper-case characters
- Rotate all remaining characters via a Caesar cipher,
- And reduce the rotated characters into a map counting the number of occurrences of each character.
Example:
(defn caesar-count
  [string cipher]
  ???)
(caesar-count "abc" 0)
; ⇒ {\c 1, \b 1}
(caesar-count "abc" 1)
; ⇒ {\d 1, \c 1}
(caesar-count "hello world" 0)
; ⇒ {\d 1, \r 1, \w 1, \l 3, \h 1}
(caesar-count "hello world" 13)
; ⇒ {\q 1, \e 1, \j 1, \y 3, \u 1}
Write a mapcat transducer
Write mapcatting, a function that returns a mapcat transducer.
Examples of mapcat (no transducers), using a small helper, twins, that returns a two-element vector of its argument:
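(defn twins [x] [x x])
(mapcat twins (range 10))
; ⇒ (0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9)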
The transducer should work like this:
(defn mapcatting [f] ???)
(reduce ((mapcatting twins) conj) [] (range 10))
; ⇒ [0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9]
Write a take transducer
Write taking, a function that returns a take transducer.
Note that you may need to keep some state for this one.
References
Tom Ashworth: CSP and transducers in JavaScript. This is the original blog post that helped me understand transducers.