Streaming & Iteration

New in version 19.10.0.

glom’s helpers for streaming use cases.

These are specifier types which yield their results incrementally, so that they can be applied to targets which are themselves streaming (e.g. chunks of rows from a database, lines from a file) without excessive memory usage.

glom’s streaming functionality revolves around a single Iter Specifier type, which has methods to transform the target stream.

class glom.Iter(subspec=T, **kwargs)

Iter() is glom’s counterpart to Python’s built-in iter() function. Given an iterable target, Iter() yields the result of applying the passed spec to each element of the target, similar to the built-in [] spec, but streaming.

The following turns a list of strings into integers using Iter(), before deduplicating and converting it to a tuple:

>>> glom(['1', '2', '1', '3'], (Iter(int), set, tuple))
(1, 2, 3)

Iter() also has many useful methods which can be chained to compose a stream processing pipeline. The above can also be written as:

>>> glom(['1', '2', '1', '3'], (Iter().map(int).unique(), tuple))
(1, 2, 3)

Iter() also respects glom’s SKIP and STOP singletons for filtering and breaking iteration.

Parameters:
  • subspec – A subspec to be applied on each element from the iterable.
  • sentinel – Keyword-only argument, which, when found in the iterable stream, causes the iteration to stop. Same as with the built-in iter().
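
For comparison, this is how a sentinel works with the two-argument form of the built-in iter(), using only the standard library. One difference is worth keeping in mind: the built-in requires a callable as its first argument, whereas Iter() is described above as watching the elements of the target stream itself. The sample text is made up for the illustration.

```python
import io

stream = io.StringIO("alpha\nbeta\n\ngamma\n")

# iter(callable, sentinel) calls stream.readline() repeatedly and stops
# as soon as a call returns the sentinel, here a blank line.
lines = list(iter(stream.readline, "\n"))
print(lines)  # ['alpha\n', 'beta\n']
```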

map(subspec)

Return a new Iter() spec which will apply the provided subspec to each element of the iterable.

>>> glom(range(5), Iter().map(lambda x: x * 2).all())
[0, 2, 4, 6, 8]

Because a spec can be a callable, Iter.map() does everything the built-in map() does, but with the full power of glom specs.

>>> glom(['a', 'B', 'C'], Iter().map(T.islower()).all())
[True, False, False]

filter(key=T)

Return a new Iter() spec which will include only elements matching the given key.

>>> glom(range(6), Iter().filter(lambda x: x % 2).all())
[1, 3, 5]

Because a spec can be a callable, Iter.filter() does everything the built-in filter() does, but with the full power of glom specs. For even more power, combine Iter.filter() with Check().

>>> # PROTIP: Python's ints know how many binary digits they require, using the bit_length method
>>> glom(range(9), Iter().filter(Check(T.bit_length(), one_of=(2, 4), default=SKIP)).all())
[2, 3, 8]

chunked(size, fill=Sentinel('_MISSING'))

Return a new Iter() spec which groups elements in the iterable into lists of length size.

If the optional fill argument is provided, iterables not evenly divisible by size will be padded out by the fill constant. Otherwise, the final chunk will be shorter than size.
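
The padding rule can be sketched in plain Python. This is an illustration of the semantics described above, not glom's implementation; chunked_sketch and _NO_FILL are hypothetical names introduced here.

```python
from itertools import islice

_NO_FILL = object()  # hypothetical marker meaning "no fill value given"


def chunked_sketch(iterable, size, fill=_NO_FILL):
    """Lazily yield lists of up to `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))  # pull at most `size` items
        if not chunk:
            return
        if fill is not _NO_FILL and len(chunk) < size:
            # Pad only the final, short chunk.
            chunk.extend([fill] * (size - len(chunk)))
        yield chunk


short_tail = list(chunked_sketch(range(10), 3))
padded_tail = list(chunked_sketch(range(10), 3, fill=None))
```

Using a private marker object rather than None as the default lets None itself serve as a fill value.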

>>> list(glom(range(10), Iter().chunked(3)))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
>>> list(glom(range(10), Iter().chunked(3, fill=None)))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, None, None]]

split(sep=None, maxsplit=None)

Return a new Iter() spec which will lazily split an iterable based on a separator (or list of separators), sep. Like str.split(), but for all iterables.

The resulting stream yields lists of non-separator values. A separator will never appear in the output.

>>> target = [1, 2, None, None, 3, None, 4, None]
>>> list(glom(target, Iter().split()))
[[1, 2], [3], [4]]

Note that split() is modeled on str.split(): if sep is None, runs of contiguous separators are grouped together and produce no empty lists. If empty lists are desired between two contiguous None values, simply use sep=[None]:

>>> list(glom(target, Iter().split(sep=[None])))
[[1, 2], [], [3], [4], []]
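
The difference between the two modes can be sketched in plain Python. This is a simplified illustration, not glom's implementation: split_sketch is a hypothetical name, maxsplit is left out, and sep is assumed to be either None or a collection of separator values.

```python
def split_sketch(iterable, sep=None):
    """Lazily split `iterable` into lists, dropping separator values."""
    group_separators = sep is None
    separators = (None,) if sep is None else tuple(sep)
    current = []
    for item in iterable:
        if item in separators:
            # With sep=None, an empty group between separators is skipped.
            if current or not group_separators:
                yield current
            current = []
        else:
            current.append(item)
    # The same rule applies to whatever is left at the end of the stream.
    if current or not group_separators:
        yield current


target = [1, 2, None, None, 3, None, 4, None]
grouped = list(split_sketch(target))
ungrouped = list(split_sketch(target, sep=[None]))
```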

A max number of splits may also be set:

>>> list(glom(target, Iter().split(maxsplit=2)))
[[1, 2], [3], [4, None]]

flatten()

Returns a new Iter() instance which combines iterables into a single iterable.

>>> target = [[1, 2], [3, 4], [5]]
>>> list(glom(target, Iter().flatten()))
[1, 2, 3, 4, 5]

unique(key=T)

Return a new Iter() spec which lazily filters out duplicate values, i.e., only the first appearance of a value in a stream will be yielded.
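
The "first appearance wins" rule can be sketched in plain Python. This is an illustration of the semantics, not glom's implementation; unique_sketch is a hypothetical name, and the sketch assumes the computed keys are hashable because it tracks them in a set.

```python
def unique_sketch(iterable, key=lambda item: item):
    """Lazily yield items whose key has not been seen before."""
    seen = set()
    for item in iterable:
        marker = key(item)
        if marker not in seen:
            seen.add(marker)
            yield item  # the original item is yielded, not its key


deduped = ''.join(unique_sketch('gloMolIcious', key=str.lower))
```

Note that the key only decides what counts as a duplicate. The yielded values keep their original form, which is why the capital letters survive.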

>>> target = list('gloMolIcious')
>>> out = list(glom(target, Iter().unique(T.lower())))
>>> print(''.join(out))
gloMIcus

limit(count)

A convenient alias for slice(), which takes a single argument, count, the max number of items to yield.

slice(*args)

Returns a new Iter() spec which trims iterables in the same manner as itertools.islice().

>>> target = [0, 1, 2, 3, 4, 5]
>>> glom(target, Iter().slice(3).all())
[0, 1, 2]
>>> glom(target, Iter().slice(2, 4).all())
[2, 3]

This method accepts only positional arguments.
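
Since the arguments mirror itertools.islice(), the standard-library function is a handy reference for the accepted forms. The sketch below uses only itertools and runs against an infinite counter to show that trimming is lazy.

```python
from itertools import count, islice

# One argument: stop. Equivalent in spirit to a "limit".
first_three = list(islice(count(), 3))

# Two arguments: start, stop.
middle = list(islice(count(), 2, 4))

# Three arguments: start, stop, step.
stepped = list(islice(count(), 0, 10, 3))
```

As with Iter.slice(), islice() takes these values positionally and does not support negative indices.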

takewhile(key=T)

Returns a new Iter() spec which stops the stream once key becomes falsy.

>>> glom([3, 2, 0, 1], Iter().takewhile().all())
[3, 2]

See itertools.takewhile() for more details.
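
The standard-library function behaves the same way on the example above. This sketch uses bool as the predicate, on the assumption that it matches the truthiness test implied by the default key=T. It also shows a detail of itertools.takewhile(): the first failing item is consumed from the underlying iterator and discarded.

```python
from itertools import takewhile

source = iter([3, 2, 0, 1])

taken = list(takewhile(bool, source))  # stops at the first falsy item
leftover = list(source)                # the 0 was consumed, only 1 remains
```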

dropwhile(key=T)

Returns a new Iter() spec which drops stream items until key becomes falsy.

>>> glom([0, 0, 3, 2, 0], Iter().dropwhile(lambda t: t < 1).all())
[3, 2, 0]

Note that while similar to Iter.filter(), the filter only applies to the beginning of the stream. In a way, Iter.dropwhile() can be thought of as lstrip() for streams. See itertools.dropwhile() for more details.
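
The contrast with filtering, and the lstrip() analogy, can be seen with the standard library alone. The predicate name below_one is made up for this illustration.

```python
from itertools import dropwhile


def below_one(value):
    return value < 1


stream = [0, 0, 3, 2, 0]

# dropwhile only trims the leading run; the trailing 0 survives.
dropped = list(dropwhile(below_one, stream))

# A filter removes every matching item, wherever it appears.
filtered = [value for value in stream if not below_one(value)]

# lstrip() does the same thing for leading whitespace in a string.
stripped = "  a b  ".lstrip()
```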

all()

A convenience method which returns a new spec which turns an iterable into a list.

>>> glom(range(5), Iter(lambda t: t * 2).all())
[0, 2, 4, 6, 8]

Note that this spec will always consume the whole iterable, and as such, the spec returned is not an Iter() instance.

first(key=T, default=None)

A convenience method for lazily yielding a single truthy item from an iterable.

>>> target = [False, 1, 2, 3]
>>> glom(target, Iter().first())
1

This method takes a condition, key, which can also be a glom spec, as well as a default, in case nothing matches the condition.

As this spec yields at most one item, and not an iterable, the spec returned from this method is not an Iter() instance.
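
The lazy "first match or default" behavior can be sketched in plain Python. This is an illustration of the semantics, not glom's implementation; first_sketch and numbers are hypothetical names, and the visited list exists only to show how far the stream was read.

```python
def first_sketch(iterable, key=bool, default=None):
    """Return the first item for which key(item) is truthy, else default."""
    return next((item for item in iterable if key(item)), default)


visited = []


def numbers():
    # Records each value it produces so consumption can be inspected.
    for n in range(1, 1000):
        visited.append(n)
        yield n


hit = first_sketch(numbers(), key=lambda n: n % 7 == 0)
miss = first_sketch([0, '', None], default='nothing')
```

After the call, visited holds only the values 1 through 7, so the rest of the stream is never produced.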