Reduction & Grouping
This document contains glom techniques for transforming a collection of data to a smaller set, otherwise known as “grouping” or “reduction”.
Combining iterables with Flatten and Merge
New in version 19.1.0.
Got lists of lists? Sets of tuples? A sequence of dicts (but only want one)? Do you find yourself reaching for Python’s builtin sum() and reduce()? To handle these situations and more, glom has five specifier types and two convenience functions:
- glom.flatten(target, **kwargs)
At its most basic, flatten() turns an iterable of iterables into a single list. But it has a few arguments which give it more power:
- Parameters
init (callable) – A function or type which gives the initial value of the return. The value must support addition. Common values might be list (the default), tuple, or even int. You can also pass init="lazy" to get a generator.
levels (int) – A positive integer representing the number of nested levels to flatten. Defaults to 1.
spec – The glom spec to fetch before flattening. This defaults to the root level of the object.
Usage is straightforward.
>>> target = [[1, 2], [3], [4]]
>>> flatten(target)
[1, 2, 3, 4]
Because integers themselves support addition, we actually have two levels of flattening possible, to get back a single integer sum:
>>> flatten(target, init=int, levels=2)
10
However, flattening a non-iterable like an integer will raise an exception:
>>> target = 10
>>> flatten(target)
Traceback (most recent call last):
...
FoldError: can only Flatten on iterable targets, not int type (...)
By default, flatten() will add a mix of iterables together, making it a more robust alternative to the built-in sum(list_of_lists, list()) trick most experienced Python programmers are familiar with:
>>> list_of_iterables = [range(2), [2, 3], (4, 5)]
>>> sum(list_of_iterables, [])
Traceback (most recent call last):
...
TypeError: can only concatenate list (not "range") to list
Whereas flatten() handles this just fine:
>>> flatten(list_of_iterables)
[0, 1, 2, 3, 4, 5]
The flatten() function is a convenient wrapper around the Flatten specifier type. For embedding in larger specs, and more involved flattening, see Flatten and its base, Fold.
- class glom.Flatten(subspec=T, init=<class 'list'>)
The Flatten specifier type is used to combine iterables. By default it flattens an iterable of iterables into a single list containing items from all iterables.
>>> target = [[1], [2, 3]]
>>> glom(target, Flatten())
[1, 2, 3]
You can also set init to "lazy", which returns a generator instead of a list. Use this to avoid making extra lists and other collections during intermediate processing steps.
- glom.merge(target, **kwargs)
By default, merge() turns an iterable of mappings into a single, merged dict, leveraging the behavior of the update() method. A new mapping is created and none of the passed mappings are modified.
>>> target = [{'a': 'alpha'}, {'b': 'B'}, {'a': 'A'}]
>>> res = merge(target)
>>> pprint(res)
{'a': 'A', 'b': 'B'}
- Parameters
target – The list of dicts, or some other iterable of mappings.
The start state can be customized with the init keyword argument, as well as the update operation, with the op keyword argument. For more on those customizations, see the Merge spec.
- class glom.Merge(subspec=T, init=<class 'dict'>, op=None)
By default, Merge turns an iterable of mappings into a single, merged dict, leveraging the behavior of the update() method. The start state can be customized with init, as well as the update operation, with op.
- Parameters
subspec – The location of the iterable of mappings. Defaults to T.
init (callable) – A type or callable which returns a base instance into which all other values will be merged.
op (callable) – A callable, which takes two arguments, and performs a merge of the second into the first. Can also be the string name of a method to fetch on the instance created from init. Defaults to "update".
Note
Besides the differing defaults, the primary difference between Merge and other Fold subtypes is that its op argument is assumed to be a two-argument function which has no return value and modifies the left parameter in-place. Because the initial state is a new object created with the init parameter, none of the target values are modified.
- class glom.Sum(subspec=T, init=<class 'int'>)
The Sum specifier type is used to aggregate integers and other numeric values using addition, much like the sum() builtin.
>>> glom(range(5), Sum())
10
Note that this specifier takes a callable init parameter like its friends, so to change the start value, be sure to wrap it in a callable:
>>> glom(range(5), Sum(init=lambda: 5.0))
15.0
To “sum” lists and other iterables, see the Flatten spec. For other objects, see the Fold specifier type.
- class glom.Fold(subspec, init, op=<built-in function iadd>)
The Fold specifier type is glom’s building block for reducing iterables in data, implementing the classic fold from functional programming, similar to Python’s built-in reduce().
- Parameters
subspec – A spec representing the target to fold, which must be an iterable, or otherwise registered to ‘iterate’ (with register()).
init (callable) – A function or type which will be invoked to initialize the accumulator value.
op (callable) – A function to call on the accumulator value and every value, the result of which will become the new accumulator value. Defaults to operator.iadd().
Usage is as follows:
>>> target = [set([1, 2]), set([3]), set([2, 4])]
>>> result = glom(target, Fold(T, init=frozenset, op=frozenset.union))
>>> result == frozenset([1, 2, 3, 4])
True
Note the required subspec and init arguments. op is optional, but here must be used because the set and frozenset types do not work with addition.
While Fold is powerful, Flatten and Sum are subtypes with more convenient defaults for day-to-day use.