ClojureDocs

pmap

clojure.core

Available since 1.0
  • (pmap f coll)
  • (pmap f coll & colls)
Like map, except f is applied in parallel. Semi-lazy in that the
parallel computation stays ahead of the consumption, but doesn't
realize the entire result unless required. Only useful for
computationally intensive functions where the time of f dominates
the coordination overhead.
5 Examples
;; This function operates just like map.  See
;; clojure.core/map for more details.
user=> (pmap inc [1 2 3 4 5])
(2 3 4 5 6)
;; A function that simulates a long-running process by calling Thread/sleep:
(defn long-running-job [n]
    (Thread/sleep 3000) ; wait for 3 seconds
    (+ n 10))

;; Use `doall` to eagerly evaluate `map`, which evaluates lazily by default.

;; With `map`, the total elapsed time is just under 4 * 3 seconds:
user=> (time (doall (map long-running-job (range 4))))
"Elapsed time: 11999.235098 msecs"
(10 11 12 13)

;; With `pmap`, the total elapsed time is just over 3 seconds:
user=> (time (doall (pmap long-running-job (range 4))))
"Elapsed time: 3200.001117 msecs"
(10 11 12 13)
;; pmap is implemented using Clojure futures.  See examples for 'future'
;; for discussion of an undesirable 1-minute wait that can occur before
;; your standalone Clojure program exits if you do not use shutdown-agents.
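The note above can be sketched as follows. This is a minimal illustration, not taken from the original examples: the names `squares` and `-main` are hypothetical, and it assumes a standalone program whose entry point is `-main`.

```clojure
;; Hypothetical helper: forces a pmap result before returning it.
(defn squares [n]
  (doall (pmap #(* % %) (range n))))

;; Hypothetical entry point of a standalone program.
(defn -main [& _args]
  (println (squares 5))
  ;; pmap runs f inside futures, which use Clojure's agent thread pools.
  ;; Shutting the pools down lets the JVM exit without the idle wait.
  (shutdown-agents))
```

Call `shutdown-agents` only once all parallel work is done, since futures cannot be started after the pools are shut down.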
;; Applying 'f' in parallel does NOT mean that the results are ordered by
;; completion time. The result sequence is ordered the same way as with map,
;; i.e. it preserves the order of the items in the 'coll' (or 'colls')
;; argument(s) of pmap. In other words: the calculation is done in parallel,
;; but the results are delivered in the order of the input ('coll'/'colls').

;; So, e.g. if the first item of 'coll' takes 1 hour to be processed (by 'f'), and
;; each of the rest requires 1 sec, nothing is delivered by pmap during the 1st hour:
;; the 1st item "blocks" the appearance of the others in the result of pmap,
;; even if the others are already calculated. E.g. (take 5 (pmap ...)) will not
;; return in 5 secs (but in 1 hour), even if we calculated 5 items in 5 secs
;; -- we wait for the results of the first five items in 'coll'.

;; In contrast, side effects of 'f' (if any) occur in "random" order (due to
;; parallelism): in the example above, we might see the side effects (e.g. swap!-s)
;; of many applications of 'f' to different elements of 'coll' long before we
;; get the result of (take 1 (pmap ...)).

;; To illustrate the statements above, run this:
(defn proc
  [i]
  (println "processing: " i "(" (System/currentTimeMillis) ")")
  (Thread/sleep
   (if (= i 0)
     5000
     10)))

(take 1 (pmap proc (range 5)))
;; output:
(processing: processing: processing: processing: processing:     3 42  ( ((1 
 1539007947561(  1539007947561 ) )1539007947561 0 )

1539007947561( ) 1539007947561 )

nil)
;; We can see that 5 threads are started at the same time, immediately, in parallel
;; (their println output is interleaved). 4 of them finish after about 10 msecs,
;; but we get back the REPL prompt only after 5 secs, because we wait for the
;; result of the i=0 item.
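The same point can be shown without reading interleaved output. The sketch below is an illustration only: `completion-order` and `slow-first` are hypothetical names. It records the order in which calls finish and compares that with the order of the result.

```clojure
;; Records the order in which the calls to slow-first actually finish.
(def completion-order (atom []))

(defn slow-first [i]
  ;; The first item is deliberately the slowest one.
  (Thread/sleep (if (zero? i) 500 10))
  (swap! completion-order conj i)
  i)

;; doall forces the whole result, so every call has finished afterwards.
(def result (doall (pmap slow-first (range 4))))

(println "result order:    " result)
(println "completion order:" @completion-order)
```

`result` follows the input order, while `completion-order` depends on thread timing; with these sleep times, item 0 is expected to finish last.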
;; pmap is implemented using Clojure futures. Futures run in threads.
;; The threads of a pmap evaluation run independently of each other.
;; This means that even if one of these threads has already determined the outcome
;; of the whole pmap*, all the other, already started threads keep running
;; until they finish their own calculations (although these calculations might
;; already be entirely unnecessary).
;; This can be especially important when these threads have side effects:
;; the side effects (e.g. swap!-s) might happen later, when they are no longer
;; expected.
;; Moreover, these "cowboy" threads keep occupying the resources (CPU, memory...)
;; they need.
;; *: this is the case e.g. when one of the threads throws an exception.
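A small sketch of the exception case described above. The names `finished`, `job`, and `outcome` are hypothetical and only serve the illustration. The first item throws at once, yet the remaining calls still run to completion and perform their side effects.

```clojure
;; Collects the items whose calculation ran to completion.
(def finished (atom #{}))

(defn job [i]
  (if (zero? i)
    (throw (ex-info "boom" {:i i}))   ; the first item fails immediately
    (do (Thread/sleep 200)
        (swap! finished conj i)       ; side effect of the slower items
        i)))

;; Forcing the result rethrows the failure of the first item
;; (wrapped in a java.util.concurrent.ExecutionException).
(def outcome
  (try
    (doall (pmap job (range 4)))
    (catch Exception _ :failed)))

;; Give the already started threads time to finish, then inspect them.
(Thread/sleep 500)
(println outcome @finished)
```

Although the failure is known almost immediately, the side effects of the other items still show up in `finished` afterwards.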
See Also

  • map: Returns a lazy sequence consisting of the result of applying f to the set of first items of each c... (added by gstamp)
  • future: Takes a body of expressions and yields a future object that will invoke the body in another thread... (added by jafingerhut)
  • pcalls: Executes the no-arg fns in parallel, returning a lazy sequence of their values (added by MicahElliott)
  • pvalues: Returns a lazy sequence of the values of the exprs, which are evaluated in parallel (added by MicahElliott)
  • partition: Returns a lazy sequence of lists of n items each, at offsets step apart. If step is not supplied, ... (added by MicahElliott)
2 Notes
    By , created 14.4 years ago

    For insight into how pmap works, see the presentation "From Concurrency to Parallelism" by David Edgar Liebke: http://incanter.org/downloads/fjclj.pdf

    By , created 6.0 years ago, updated 6.0 years ago

    The following can be used to understand how many threads pmap runs at once (assuming the tasks have roughly the same computational cost). The min level corresponds to the situation where the consumer is slower than the producer, while the max level applies when the consumer is faster than the producer:

    • When the sequence is not chunked (for example, a subvec), the min parallelism is 1 and the max parallelism is (+ 2 n-cores). Example: with 12 cores, (doall (pmap #(Thread/sleep %) (subvec (into [] (range 1000)) 0 999))) keeps 12+2 threads busy.
    • For chunked sequences (the vast majority have a chunk size of 32), the min parallelism is (min chunk-size (+ 2 n-cores)), while the max parallelism is (+ chunk-size 2 n-cores). Example: with 12 cores, (doall (pmap #(Thread/sleep %) (range 1000))) keeps 12+2+32 threads busy.
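    One way to check these numbers on a given machine is to count how many calls are in flight at the same time. The sketch below is an illustration only: `running`, `peak`, `tracked`, and `peak-parallelism` are hypothetical names, and the inputs follow the note's own examples (a subvec as the non-chunked case, a range as the chunked case). The printed values depend on the machine and on timing.

```clojure
(def running (atom 0))   ; calls of `tracked` currently in flight
(def peak    (atom 0))   ; highest value of `running` seen so far

(defn tracked [ms]
  (let [now (swap! running inc)]
    (swap! peak max now)
    (Thread/sleep (long ms))
    (swap! running dec)
    ms))

(defn peak-parallelism
  "Runs pmap over coll and returns the highest number of concurrent calls seen."
  [coll]
  (reset! running 0)
  (reset! peak 0)
  (dorun (pmap tracked coll))   ; dorun waits for every call to finish
  @peak)

(def n-cores (.availableProcessors (Runtime/getRuntime)))

(println "n-cores:  " n-cores)
(println "unchunked:" (peak-parallelism (subvec (vec (repeat 200 20)) 0 199)))
(println "chunked:  " (peak-parallelism (range 20 220)))
```

    Here the consumer (`dorun`) does no work of its own, so the observed peaks should be compared with the max levels given in the note.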