Streaming & Iteration
New in version 19.10.0.
glom’s helpers for streaming use cases: Specifier types which yield their results incrementally, so that they can be applied to targets which are themselves streaming (e.g. chunks of rows from a database, lines from a file) without excessive memory usage.
glom’s streaming functionality revolves around a single Iter Specifier type, which has methods to transform the target stream.
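To make "incrementally" concrete, here is a minimal stdlib-only sketch of lazy, line-by-line processing. It is not glom code: `parse_lines` and `fake_file` are hypothetical names, and `io.StringIO` merely stands in for an open file.

```python
import io


def parse_lines(lines):
    """Yield one parsed integer per input line, only when asked for it."""
    for line in lines:
        yield int(line.strip())


# io.StringIO stands in for a real file; iterating it yields one line at a time.
fake_file = io.StringIO("1\n2\n3\n4\n")
stream = parse_lines(fake_file)

# Pull just two items through the pipeline.
first_two = [next(stream), next(stream)]

# The rest of the "file" has not been read yet.
remaining_text = fake_file.read()
```

Only the lines that were actually requested are read and parsed, which is the property that keeps memory usage flat on large inputs.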
- class glom.Iter(subspec=T, **kwargs)
Iter() is glom’s counterpart to Python’s built-in iter() function. Given an iterable target, Iter() yields the result of applying the passed spec to each element of the target, similar to the built-in [] spec, but streaming.
The following turns a list of strings into integers using Iter(), before deduplicating and converting it to a tuple:
>>> glom(['1', '2', '1', '3'], (Iter(int), set, tuple))
(1, 2, 3)
Iter() also has many useful methods which can be chained to compose a stream processing pipeline. The above can also be written as:
>>> glom(['1', '2', '1', '3'], (Iter().map(int).unique(), tuple))
(1, 2, 3)
Iter() also respects glom’s SKIP and STOP singletons for filtering and breaking iteration.
- Parameters
subspec – A subspec to be applied on each element from the iterable.
sentinel – Keyword-only argument, which, when found in the iterable stream, causes the iteration to stop. Same as with the built-in iter().
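For reference, the sentinel behavior of the built-in iter() that the sentinel parameter is compared to can be sketched with the standard library alone. This is not glom code; the `rows` data and the "END" marker are made up for illustration.

```python
from functools import partial

rows = iter(["a", "b", "END", "c"])

# iter(callable, sentinel) calls the callable repeatedly and stops
# as soon as it returns a value equal to the sentinel.
before_end = list(iter(partial(next, rows), "END"))

# The sentinel itself is consumed but not yielded; later items remain unread.
leftover = list(rows)
```

Here `before_end` is `['a', 'b']` and `leftover` is `['c']`.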
- map(subspec)
Return a new Iter() spec which will apply the provided subspec to each element of the iterable.
>>> glom(range(5), Iter().map(lambda x: x * 2).all())
[0, 2, 4, 6, 8]
Because a spec can be a callable, Iter.map() does everything the built-in map() does, but with the full power of glom specs.
>>> glom(['a', 'B', 'C'], Iter().map(T.islower()).all())
[True, False, False]
- filter(key=T)
Return a new Iter() spec which will include only elements matching the given key.
>>> glom(range(6), Iter().filter(lambda x: x % 2).all())
[1, 3, 5]
Because a spec can be a callable, Iter.filter() does everything the built-in filter() does, but with the full power of glom specs. For even more power, combine Iter.filter() with Check().
>>> # PROTIP: Python's ints know how many binary digits they require, using the bit_length method
>>> glom(range(9), Iter().filter(Check(T.bit_length(), one_of=(2, 4), default=SKIP)).all())
[2, 3, 8]
- chunked(size, fill=Sentinel('_MISSING'))
Return a new Iter() spec which groups elements in the iterable into lists of length size.
If the optional fill argument is provided, iterables not evenly divisible by size will be padded out by the fill constant. Otherwise, the final chunk will be shorter than size.
>>> list(glom(range(10), Iter().chunked(3)))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
>>> list(glom(range(10), Iter().chunked(3, fill=None)))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, None, None]]
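The chunking and padding rules described above can be approximated in plain Python. This is a hedged sketch of the semantics only, not glom's implementation; the `chunked` function and `_MISSING` marker below are hypothetical stand-ins.

```python
from itertools import count, islice

_MISSING = object()  # hypothetical marker meaning "no fill value given"


def chunked(iterable, size, fill=_MISSING):
    """Lazily yield lists of up to `size` items, padding the last one if `fill` is given."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        if fill is not _MISSING and len(chunk) < size:
            # Pad the short final chunk out to the full size.
            chunk.extend([fill] * (size - len(chunk)))
        yield chunk


plain = list(chunked(range(10), 3))
padded = list(chunked(range(10), 3, fill=None))

# Because chunks are produced on demand, an endless stream is fine too.
first_chunk = next(chunked(count(), 2))
```

With these inputs, `plain` ends in `[9]`, `padded` ends in `[9, None, None]`, and `first_chunk` is `[0, 1]`.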
- split(sep=None, maxsplit=None)
Return a new Iter() spec which will lazily split an iterable based on a separator (or list of separators), sep. Like str.split(), but for all iterables.
Iter.split() yields lists of non-separator values. A separator will never appear in the output.
>>> target = [1, 2, None, None, 3, None, 4, None]
>>> list(glom(target, Iter().split()))
[[1, 2], [3], [4]]
Note that split() is based on str.split(), so if sep is None, split() groups separators. If empty lists are desired between two contiguous None values, simply use sep=[None]:
>>> list(glom(target, Iter().split(sep=[None])))
[[1, 2], [], [3], [4], []]
A max number of splits may also be set:
>>> list(glom(target, Iter().split(maxsplit=2)))
[[1, 2], [3], [4, None]]
- flatten()
Returns a new Iter() instance which combines iterables into a single iterable.
>>> target = [[1, 2], [3, 4], [5]]
>>> list(glom(target, Iter().flatten()))
[1, 2, 3, 4, 5]
- unique(key=T)
Return a new Iter() spec which lazily filters out duplicate values, i.e., only the first appearance of a value in a stream will be yielded.
>>> target = list('gloMolIcious')
>>> out = list(glom(target, Iter().unique(T.lower())))
>>> print(''.join(out))
gloMIcus
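The "first appearance wins" rule with a key can be sketched in plain Python as follows. This only illustrates the described behavior and is not glom's implementation; the `unique` helper is hypothetical.

```python
def unique(iterable, key=lambda item: item):
    """Lazily yield items whose key has not been seen earlier in the stream."""
    seen = set()
    for item in iterable:
        marker = key(item)
        if marker not in seen:
            seen.add(marker)
            yield item  # the original item is yielded, not its key


out = "".join(unique("gloMolIcious", key=str.lower))
```

Here `out` is `'gloMIcus'`: comparison uses the lowercased key, while the yielded characters keep their original case. Note that in this sketch the `seen` set grows with the number of distinct keys.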
- limit(count)
A convenient alias for slice(), which takes a single argument, count, the max number of items to yield.
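Since slice() is described as trimming in the manner of itertools.islice(), capping a stream at count items can be pictured with the standard library alone. This is a sketch of the idea, not glom code; the `limit` helper is hypothetical.

```python
from itertools import count, islice


def limit(iterable, n):
    """Yield at most `n` items, then stop pulling from the source."""
    return islice(iterable, n)


# count() never ends, so this only terminates because of the cap.
first_three = list(limit(count(), 3))
```

The result is `[0, 1, 2]`, and the source is never consumed past the cap.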
- slice(*args)
Returns a new Iter() spec which trims iterables in the same manner as itertools.islice().
>>> target = [0, 1, 2, 3, 4, 5]
>>> glom(target, Iter().slice(3).all())
[0, 1, 2]
>>> glom(target, Iter().slice(2, 4).all())
[2, 3]
This method accepts only positional arguments.
- takewhile(key=T)
Returns a new Iter() spec which stops the stream once key becomes falsy.
>>> glom([3, 2, 0, 1], Iter().takewhile().all())
[3, 2]
See itertools.takewhile() for more details.
- dropwhile(key=T)
Returns a new Iter() spec which drops stream items until key becomes falsy.
>>> glom([0, 0, 3, 2, 0], Iter().dropwhile(lambda t: t < 1).all())
[3, 2, 0]
Note that while similar to Iter.filter(), the filter only applies to the beginning of the stream. In a way, Iter.dropwhile() can be thought of as lstrip() for streams. See itertools.dropwhile() for more details.
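The difference from filtering is easiest to see side by side. The following uses itertools.dropwhile() and a plain comprehension rather than glom, purely to illustrate the semantics on the same data as above.

```python
from itertools import dropwhile

data = [0, 0, 3, 2, 0]


def is_small(t):
    return t < 1


# dropwhile only strips the leading run of matching items...
dropped = list(dropwhile(is_small, data))

# ...whereas a filter removes matching items wherever they occur.
filtered = [t for t in data if not is_small(t)]
```

`dropped` keeps the trailing zero (`[3, 2, 0]`), while `filtered` does not (`[3, 2]`).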
- all()
A convenience method which returns a new spec which turns an iterable into a list.
>>> glom(range(5), Iter(lambda t: t * 2).all())
[0, 2, 4, 6, 8]
Note that this spec will always consume the whole iterable, and as such, the spec returned is not an Iter() instance.
- first(key=T, default=None)
A convenience method for lazily yielding a single truthy item from an iterable.
>>> target = [False, 1, 2, 3]
>>> glom(target, Iter().first())
1
This method takes a condition, key, which can also be a glom spec, as well as a default, in case nothing matches the condition.
As this spec yields at most one item, and not an iterable, the spec returned from this method is not an Iter() instance.
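The roles of key and default can be sketched with a small plain-Python helper. This illustrates the described behavior only and is not glom's implementation; the `first` function and `is_even` predicate are hypothetical.

```python
def first(iterable, key=bool, default=None):
    """Return the first item for which key(item) is truthy, else `default`."""
    return next((item for item in iterable if key(item)), default)


def is_even(x):
    return x % 2 == 0


truthy = first([False, 1, 2, 3])                        # no key: first truthy item
even = first([1, 4, 6], key=is_even)                    # key decides what matches
fallback = first([1, 3, 5], key=is_even, default="no match")  # nothing matches
```

Here `truthy` is `1`, `even` is `4`, and `fallback` is `'no match'`. Because the generator expression is consumed with next(), iteration stops at the first match.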