Collections and Sequences in Clojure

Purpose

Newcomers to Clojure are often confused by the collection and sequence abstractions and how they relate to one another. This document aims to provide an overview of these concepts and how they may be used in one's code.

TL;DR

  1. "Collection" and "sequence" are abstractions, not properties that can be determined from a given value.
  2. Collections are bags of values.
  3. Sequences are a type of collection, supporting linear access only.
  4. A seq can be derived from any collection (and from some non-collections).
  5. Many linear-access functions derive a seq from their argument using seq.
  6. The main use of seq in user code is to check whether a collection or collection-like value will yield any elements.
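
As a rough illustration of points 4 and 5, the sketch below calls a few linear-access functions from clojure.core on values that are not themselves seqs. The inputs are arbitrary examples chosen here.

```clojure
;; first, rest, take, and map call seq on their argument,
;; so they accept maps, strings, sets, and nil as well as seqs.
(first {:a 1})      ;; => [:a 1]   (a map entry)
(rest "hello")      ;; => (\e \l \l \o)
(take 2 "hello")    ;; => (\h \e)
(map inc #{1})      ;; => (2)
(first nil)         ;; => nil
```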

Collections

[Figure: Venn diagram of some collection types, divided by their properties]

Clojure's Collection API provides a generic mechanism for creating and handling compound data. Technically, a Clojure collection is an object implementing the clojure.lang.IPersistentCollection interface; the predicate coll? tests for this. The associated conceptual abstraction is that of a bag o' values, supporting certain operations: adding, removing, counting, finding, and iterating over the values. Commonly seen examples are lists, maps, vectors, sets, and seqs, but these are not the only collection types that Clojure provides.
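
A minimal sketch of the above, using four literal collections. The name examples is only for illustration.

```clojure
;; coll? is true for anything implementing IPersistentCollection.
(def examples ['(1 2 3) [4 5 6] {:a 1, :b 2} #{7 8 9}])

(every? coll? examples)
;; => true

(every? #(instance? clojure.lang.IPersistentCollection %) examples)
;; => true

;; The same generic functions work across all of them.
(count {:a 1, :b 2})   ;; => 2
(conj [4 5 6] 7)       ;; => [4 5 6 7]
(conj '(1 2 3) 0)      ;; => (0 1 2 3)
(conj #{7 8 9} 10)     ;; a set containing 7, 8, 9, and 10
```

Note that conj adds at the end of a vector but at the front of a list; where the new value lands depends on the collection type.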

Different collection types have different APIs, performance characteristics, and intended patterns of usage. Any given collection may match one or more of the following predicates, which group collection types by their broad characteristics:

counted?
These colls know their size and can calculate their count in constant time, without actually traversing their data. This is not just a performance characteristic — some collections are infinite or may not be able to predict their size without running arbitrary code.
associative?
Associative colls support key-value lookups. Maps are the traditional associative data structure, but vectors can be treated as mappings of indices to values.
sequential?
Sequential colls retain a linear ordering under insertion and deletion. Lists, seqs, and vectors have this property. Note that a vector is both sequential and associative, while a set is neither.
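
The following sketch probes these predicates at a REPL. The values are arbitrary; the point is how each predicate differs from the operations that happen to work anyway.

```clojure
;; counted?: a lazy seq can still be counted, but only by traversing it.
(counted? [1 2 3])            ;; => true
(counted? (map inc [1 2 3]))  ;; => false
(count (map inc [1 2 3]))     ;; => 3

;; associative?: a vector maps indices to values.
(get [4 5 6] 1)               ;; => 5
(assoc [4 5 6] 1 :x)          ;; => [4 :x 6]
(contains? [4 5 6] 1)         ;; => true  (index 1 exists)
(contains? [4 5 6] 5)         ;; => false (no index 5)

;; A set supports get, yet is not associative.
(get #{7 8 9} 7)              ;; => 7
(associative? #{7 8 9})       ;; => false

;; sequential?
(sequential? [4 5 6])         ;; => true
(sequential? #{7 8 9})        ;; => false
```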

Here we see which collections support which predicates, as well as how some non-collections are treated:

              '(1 2 3)  [4 5 6]  {:a 1, :b 2}  #{7 8 9}  "hello"  nil    (range)  (seq "hello")
coll?         true      true     true          true      false    false  true     true
counted?      true      true     true          true      false    false  false    true
sequential?   true      true     false         false     false    false  true     true
associative?  false     true     true          false     false    false  false    false

Note that the string is not a collection, but may be converted into one.
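
For instance, a string can be turned into a seq or a vector of characters. This is only a small sketch of two possible conversions.

```clojure
(coll? "hello")         ;; => false
(seq "hello")           ;; => (\h \e \l \l \o)
(coll? (seq "hello"))   ;; => true
(vec "hello")           ;; => [\h \e \l \l \o]

;; count still works on the string itself, even though counted? is false.
(counted? "hello")      ;; => false
(count "hello")         ;; => 5
```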

TODO: Raid clojure.core repo for instances of IPersistentCollection, ISeq, etc.

Sequences

A sequence is a data structure that is expected to be accessed in a sequential manner. It may be infinitely long, and reading its elements may require additional computation.
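
A small sketch of both properties. The names realized and squares are made up for this example; the atom merely counts how many elements have been computed so far.

```clojure
;; (range) with no arguments is infinite; take reads only what it needs.
(take 5 (range))
;; => (0 1 2 3 4)

;; Count how many times the mapping function actually runs.
(def realized (atom 0))

(def squares
  (map (fn [n] (swap! realized inc) (* n n))
       (iterate inc 1)))

@realized                  ;; => 0, nothing has been computed yet
(doall (take 3 squares))   ;; => (1 4 9)
@realized                  ;; => 3, only the elements that were read
```

iterate is used here rather than range because some seqs (range among them) are realized in chunks, which would make the counter jump ahead of the elements actually read.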

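Each entry shows the truthiness of the result, not the result itself:

        (range)  '(1 2 3)  [4 5 6]  {:a 1, :b 2}  #{7 8 9}  "hello"  ()     []     ""     nil    17
seq     true     true      true     true          true      true     false  false  false  false  exception
empty?  false    false     false    false         false     false    true   true   true   true   exception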
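
Because seq returns nil for anything that will yield no elements, it is commonly used as the test in a conditional. A minimal sketch, with describe as a hypothetical helper name:

```clojure
(defn describe
  "Reports the first element of coll, if it has one."
  [coll]
  (if (seq coll)
    (str "first element: " (first coll))
    "nothing to see"))

(describe [4 5 6])   ;; => "first element: 4"
(describe #{})       ;; => "nothing to see"
(describe "")        ;; => "nothing to see"
(describe nil)       ;; => "nothing to see"
```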

TODO: seq API

Truthiness of the result of each call:

       [1 2]  [1]    []     nil    17
first  true   true   false  false  exception
next   true   false  false  false  exception
rest   true   true   true   true   exception
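
The difference between next and rest matters when walking a seq by hand: next returns nil when nothing is left, while rest returns an empty seq, which is truthy. The sketch below uses a hypothetical sum-all function to show one way of looping with seq, first, and next.

```clojure
(next [1])   ;; => nil
(rest [1])   ;; => ()

(defn sum-all
  "Adds up the numbers in coll by walking its seq."
  [coll]
  (loop [s (seq coll), total 0]
    (if s                                   ; nil once the seq is exhausted
      (recur (next s) (+ total (first s)))
      total)))

(sum-all [1 2])      ;; => 3
(sum-all #{7 8 9})   ;; => 24
(sum-all [])         ;; => 0
(sum-all nil)        ;; => 0
```

In practice reduce would do this job; the explicit loop is only there to expose the seq functions.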

Relationship between the two

Nota Bene: Sequences are not implemented as lists; they just act a lot like them and may be backed by similar data structures.

       (range)  '(1 2 3)  [4 5 6]  {:a 1, :b 2}  #{7 8 9}  "hello"  nil    17
coll?  true     true      true     true          true      false    false  false
seq?   true     true      false    false         false     false    false  false
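
A short sketch of how seq?, list?, and seq interact, using an arbitrary vector v:

```clojure
(def v [4 5 6])

;; A vector is a collection but not a seq; calling seq derives one from it.
(seq? v)              ;; => false
(seq? (seq v))        ;; => true

;; A list is both a list and a seq.
(list? '(1 2 3))      ;; => true
(seq? '(1 2 3))       ;; => true

;; Seqs derived from other collections are seqs, but not lists.
(list? (seq v))       ;; => false
(seq? (map inc v))    ;; => true
(list? (map inc v))   ;; => false
```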

TODO: counted?

Equality

TODO: Equality partitions re: seqs and colls

            ()     []     {}     #{}
#(= () %)   true   true   false  false
#(= [] %)   true   true   false  false
#(= {} %)   false  false  true   false
#(= #{} %)  false  false  false  true
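
The same pattern holds for non-empty values: lists, vectors, and other sequential collections compare equal when their elements match in order, while maps and sets are only equal to their own kind. A brief sketch with arbitrary values:

```clojure
(= '(1 2 3) [1 2 3])       ;; => true
(= [1 2 3] (range 1 4))    ;; => true
(= [1 2 3] #{1 2 3})       ;; => false
(= [] {})                  ;; => false
(= {} #{})                 ;; => false

;; Equal values are interchangeable as map keys and set members.
(get {[1 2] :found} '(1 2))     ;; => :found
(contains? #{[1 2]} '(1 2))     ;; => true
```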

TODO: effect of metadata, sortedness

TODO: comparison with other Java Collections

Further reading