Applies f to each value in coll, splitting it each time f returns a new value. Returns a lazy seq of partitions. Returns a stateful transducer when no collection is provided.
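The transducer arity mentioned above can be sketched as follows. This is a minimal illustration, not part of the original examples; the name `by-parity` is made up for the sketch. When used as a transducer, each partition is emitted as a vector rather than a lazy seq.

```clojure
;; Hypothetical name: by-parity. No collection is passed to partition-by,
;; so it returns a (stateful) transducer.
(def by-parity (partition-by odd?))

;; Apply the transducer with into; partitions come out as vectors.
(def partitions (into [] by-parity [1 1 1 2 2 3 3]))
;; partitions => [[1 1 1] [2 2] [3 3]]

;; It composes with other transducers, here counting each run.
(def run-lengths
  (into [] (comp (partition-by odd?) (map count)) [1 1 1 2 2 3 3]))
;; run-lengths => [3 2 2]
```

Fresh state is created each time the transducer is applied, so `by-parity` can be reused across calls.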
user=> (partition-by odd? [1 1 1 2 2 3 3])
((1 1 1) (2 2) (3 3))

user=> (partition-by even? [1 1 1 2 2 3 3])
((1 1 1) (2 2) (3 3))
;; (this is part of a solution from 4clojure.com/problem 30)
user=> (partition-by identity "Leeeeeerrroyyy")
((\L) (\e \e \e \e \e \e) (\r \r \r) (\o) (\y \y \y))
;; Note that previously created 'bins' are not reused when the same value is seen again
user=> (partition-by identity "ABBA")
((\A) (\B \B) (\A))

;; That is why you use the group-by function if you want all the same values in the same 'bins' :)
;; It gives you a map, but you can extract the values from it if you need to.
(group-by identity "ABBA")
=> {\A [\A \A], \B [\B \B]}
;; Arbitrary partitioning
(let [seen (atom true)]
  (partition-by
    #(cond
       (#{1} %) (reset! seen (not @seen))
       (or (and (string? %) (< (count %) 2))
           (char? %)) "letter"
       (string? %) "string"
       (#{0} %) 0
       (vector? %) (count %)
       :else "rest")
    [1 1 1 2 3 nil "a" \l 0 4 5 {:a 1} "bc" "aa" "k" [0] [1 1] [2 2]]))
;;=> ((1) (1) (1) (2 3 nil) ("a" \l) (0) (4 5 {:a 1}) ("bc" "aa") ("k") ([0]) ([1 1] [2 2]))
I think this is a better (more agnostic and more declarative/functional) arbitrary partitioning. The ratio of true to false signals can be tweaked for a longer average partition length.

(defn arbitrarily-partition [coll]
  (let [signals     (take (count coll) (cycle [true false]))
        shuffled    (shuffle signals)
        zipped      (map vector shuffled coll)
        partitioned (partition-by first zipped)]
    (for [c partitioned]
      (map second c))))

;; One possible result; the output varies between calls because of shuffle.
(arbitrarily-partition (range 100))
;;=> ((0 1) (2) (3) (4 5) (6 7) (8 9 10) (11) (12) (13 14 15) (16) (17 18) (19) (20) (21) (22 23 24 25 26) (27 28) (29) (30 31) (32) (33 34 35 36 37 38) (39 40 41) (42) (43 44 45 46 47) (48 49) (50) (51 52 53) (54) (55 56) (57 58) (59 60 61) (62) (63) (64) (65 66) (67 68) (69) (70) (71) (72 73 74 75) (76) (77 78) (79) (80) (81 82) (83 84 85) (86) (87 88) (89 90 91 92 93) (94) (95 96 97) (98) (99))
Returns a lazy sequence of lists of n items each, at offsets step apart. If step is not supplied, ...
Returns a lazy sequence of lists like partition, but may include partitions with fewer than n item...
Returns a map of the elements of coll keyed by the result of f on each element. The value at each ...
Returns a lazy sequence removing consecutive duplicates in coll. Returns a transducer when no coll...
Returns a sorted sequence of the items in coll. If no comparator is supplied, uses compare. compa...
Returns a vector of [(take-while pred coll) (drop-while pred coll)]
It's worth mentioning that (partition-by identity …) is equivalent to the Data.List.group function in Haskell:
(defn group [coll] (partition-by identity coll))
Which proves to be an interesting idiom:
user=> (apply str
         (for [ch (group "fffffffuuuuuuuuuuuu")]
           (str (first ch) (count ch))))
⇒ "f7u12"
Many other programming languages, such as Kotlin or Haskell, define partition slightly differently. They split the given collection into exactly two collections: the first contains all elements for which the predicate is truthy, and the second contains all elements for which it is falsy. This function does that:
(defn partition-2
  "Partitions the collection into exactly two [[all-truthy] [all-falsy]] collections."
  [pred coll]
  (mapv persistent!
        (reduce
          (fn [[t f] x]
            ;; Always carry forward the value returned by conj!
            (if (pred x)
              [(conj! t x) f]
              [t (conj! f x)]))
          [(transient []) (transient [])]
          coll)))

(partition-2 odd? (range 5))
;;=> [[1 3] [0 2 4]]
I tried this implementation of your Kotlin/Haskell partition, which is simpler but somewhat slower (less than 2x):
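(defn partition-3
  "Partitions the collection into exactly two [[all-truthy] [all-falsy]] collections."
  [pred coll]
  ;; Coerce the predicate result to a boolean so that any truthy value lands
  ;; under the true key, and default to [] when one side has no elements.
  (let [m (group-by (comp boolean pred) coll)]
    [(m true []) (m false [])]))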
A third implementation which in my limited testing in cljs is the fastest so far:
(defn split-by
  "Efficiently, though non-lazily, splits the `coll`ection using `pred`,
  essentially like `[(filter pred coll) (remove pred coll)]`"
  [pred coll]
  (let [match    (transient [])
        no-match (transient [])]
    ;; Note: this discards the return value of conj! and relies on the
    ;; transient being mutated in place, which the transient contract
    ;; does not guarantee.
    (doseq [v coll]
      (if (pred v)
        (conj! match v)
        (conj! no-match v)))
    [(persistent! match) (persistent! no-match)]))
Using simple-benchmark in cljs, these are the results:
[r (range 1000)], (partition-2 odd? r), 1000 runs, 167 msecs
[r (range 1000)], (partition-3 odd? r), 1000 runs, 364 msecs
[r (range 1000)], (split-by odd? r), 1000 runs, 60 msecs
It's worth noting that all implementations of the java-esque partition in this thread are non-lazy.
On big collections where you don't want to realize the whole list, this is the fastest:
[(filter odd? r) (filter (complement odd?) r)]
Can also be written as:
((juxt filter remove) odd? r)
(taken from: http://blog.jayfields.com/2011/08/clojure-partition-by-split-with-group.html)
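To illustrate the laziness point, here is a small sketch that applies the same `juxt` idiom to an infinite sequence. The helper name `lazy-split` is an assumption made for this example and does not come from the thread.

```clojure
;; Hypothetical helper: returns [matching non-matching] as two lazy seqs.
(defn lazy-split [pred coll]
  ((juxt filter remove) pred coll))

;; Works on an infinite input because nothing is realized up front.
(def first-few
  (let [[odds evens] (lazy-split odd? (range))]
    [(take 3 odds) (take 3 evens)]))
;; first-few => [(1 3 5) (0 2 4)]
```

The trade-off is that the predicate runs once per element for each of the two sequences that gets consumed, so the input is walked twice if both sides are fully realized.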
Perhaps this will be of help to someone trying to find the regions delimited by marker elements:
(defn partition-at
  "Like partition-by but will start a new run when f returns true"
  [f coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (let [run (cons (first s) (take-while #(not (f %)) (rest s)))]
        (cons run (partition-at f (drop (count run) s)))))))
(taken from: http://cninja.blogspot.com/2011/02/clojure-partition-at.html#comments)
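A short usage sketch may help show how partition-at differs from partition-by. The definition is repeated so the block runs on its own; the sample input and the var names `at-zero` and `by-zero` are made up for illustration.

```clojure
;; Same definition as above, repeated for a self-contained example.
(defn partition-at
  "Like partition-by but will start a new run when f returns true"
  [f coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (let [run (cons (first s) (take-while #(not (f %)) (rest s)))]
        (cons run (partition-at f (drop (count run) s)))))))

;; Every 0 starts a new run and stays attached to the items that follow it.
(def at-zero (partition-at zero? [0 1 2 0 3 0 0 4]))
;; at-zero => ((0 1 2) (0 3) (0) (0 4))

;; partition-by instead splits whenever the result of zero? changes.
(def by-zero (partition-by zero? [0 1 2 0 3 0 0 4]))
;; by-zero => ((0) (1 2) (0) (3) (0 0) (4))
```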