Reduction & Grouping

This document contains glom techniques for transforming a collection of data to a smaller set, otherwise known as “grouping” or “reduction”.

Combining iterables with Flatten and Merge

New in version 19.1.0.

Got lists of lists? Sets of tuples? A sequence of dicts (but only want one)? Do you find yourself reaching for Python’s built-in sum() and functools.reduce()? To handle these situations and more, glom has five specifier types and two convenience functions:

glom.flatten(target, **kwargs)

At its most basic, flatten() turns an iterable of iterables into a single list. But it has a few arguments which give it more power:

Parameters

init (callable) – A function or type which gives the initial value of the result. The value must support addition. Common values might be list (the default), tuple, or even int. You can also pass init="lazy" to get a generator.
levels (int) – A positive integer representing the number of nested levels to flatten. Defaults to 1.
spec – The glomspec to fetch before flattening. This defaults to the root level of the object.

Usage is straightforward.

>>> target = [[1, 2], [3], [4]]
>>> flatten(target)
[1, 2, 3, 4]

Because integers themselves support addition, we can actually flatten two levels deep and get back a single integer sum:

>>> flatten(target, init=int, levels=2)
10
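To make the interplay of init and levels concrete, here is a minimal pure-Python sketch. The name flatten_sketch is hypothetical, and the sketch assumes that every level but the last is chained lazily and that the last level is folded with += starting from init(); it illustrates the documented behavior and is not glom's actual implementation.

```python
import itertools
import operator
from functools import reduce


def flatten_sketch(target, init=list, levels=1):
    """Hypothetical stand-in for glom.flatten(); not glom's own code."""
    if levels < 1:
        raise ValueError(f"levels must be a positive integer, not {levels!r}")
    # Every level except the last is chained lazily, without building lists.
    for _ in range(levels - 1):
        target = itertools.chain.from_iterable(target)
    if init == "lazy":
        # Assumed analogue of init="lazy": return an iterator, not a list.
        return itertools.chain.from_iterable(target)
    # The last level starts from a fresh init() and adds each item with +=.
    return reduce(operator.iadd, target, init())


nested = [[1, 2], [3], [4]]
print(flatten_sketch(nested))                      # [1, 2, 3, 4]
print(flatten_sketch(nested, init=int, levels=2))  # 10
```

Because the accumulator is always a fresh init() value, the nested input lists are never modified.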

However, flattening a non-iterable like an integer will raise an exception:

>>> target = 10
>>> flatten(target)
Traceback (most recent call last):
...
FoldError: can only Flatten on iterable targets, not int type (...)

By default, flatten() will add a mix of iterables together, making it a more robust alternative to the sum(list_of_lists, list()) trick that most experienced Python programmers are familiar with:

>>> list_of_iterables = [range(2), [2, 3], (4, 5)]
>>> sum(list_of_iterables, [])
Traceback (most recent call last):
...
TypeError: can only concatenate list (not "range") to list

Whereas flatten() handles this just fine:

>>> flatten(list_of_iterables)
[0, 1, 2, 3, 4, 5]
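The difference comes down to + versus +=. Assuming Flatten keeps Fold's default op of operator.iadd (as documented for Fold below), the accumulator list is extended in place, and list += accepts any iterable, whereas list + requires another list. A small standard-library demonstration:

```python
import operator

mixed = [range(2), [2, 3], (4, 5)]

# sum() combines with binary +, and list.__add__ only accepts another list.
try:
    sum(mixed, [])
except TypeError as exc:
    print("sum() failed:", exc)

# In-place += on a list works like list.extend() and accepts any iterable.
acc = []
for item in mixed:
    acc = operator.iadd(acc, item)
print(acc)  # [0, 1, 2, 3, 4, 5]
```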

The flatten() function is a convenient wrapper around the Flatten specifier type. For embedding in larger specs, and more involved flattening, see Flatten and its base, Fold.

class glom.Flatten(subspec=T, init=<class 'list'>)

The Flatten specifier type is used to combine iterables. By default it flattens an iterable of iterables into a single list containing items from all iterables.

>>> target = [[1], [2, 3]]
>>> glom(target, Flatten())
[1, 2, 3]

You can also set init to "lazy", which returns a generator instead of a list. Use this to avoid making extra lists and other collections during intermediate processing steps.
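To illustrate why lazy mode can save work, the sketch below models it with the standard library's itertools.chain.from_iterable(). Treating that as equivalent to Flatten(init="lazy") is an assumption, and chunks() and produced are hypothetical names. Inner iterables are only pulled as their items are consumed:

```python
import itertools

produced = []  # records which inner lists have been generated so far


def chunks():
    """Yield three small lists, noting each one as it is produced."""
    for i in range(3):
        produced.append(i)
        yield [f"{i}a", f"{i}b"]


# Assumed analogue of Flatten(init="lazy"): chain the inner iterables on demand.
lazy = itertools.chain.from_iterable(chunks())
first_two = list(itertools.islice(lazy, 2))
print(first_two)  # ['0a', '0b']
print(produced)   # [0]: only the first inner list has been generated
```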

glom.merge(target, **kwargs)

By default, merge() turns an iterable of mappings into a single, merged dict, leveraging the behavior of the update() method. A new mapping is created and none of the passed mappings are modified.

>>> target = [{'a': 'alpha'}, {'b': 'B'}, {'a': 'A'}]
>>> res = merge(target)
>>> pprint(res)
{'a': 'A', 'b': 'B'}

Parameters: target – The list of dicts, or some other iterable of mappings.

The start state can be customized with the init keyword argument, and the update operation with the op keyword argument. For more on those customizations, see the Merge spec.

class glom.Merge(subspec=T, init=<class 'dict'>, op=None)

By default, Merge turns an iterable of mappings into a single, merged dict, leveraging the behavior of the update() method. The start state can be customized with init, and the update operation with op.

Parameters

subspec – The location of the iterable of mappings. Defaults to T.
init (callable) – A type or callable which returns a base instance into which all other values will be merged.
op (callable) – A callable, which takes two arguments, and performs a merge of the second into the first. Can also be the string name of a method to fetch on the instance created from init. Defaults to "update".

Note

Besides the differing defaults, the primary difference between Merge and other Fold subtypes is that its op argument is assumed to be a two-argument function which has no return value and modifies the left parameter in-place. Because the initial state is a new object created with the init parameter, none of the target values are modified.
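The in-place contract described in this note can be sketched in plain Python. The function merge_sketch and its handling of a string op are illustrative assumptions based on the parameter descriptions above, not glom's implementation:

```python
def merge_sketch(target, init=dict, op="update"):
    """Hypothetical stand-in for glom's Merge; not glom's own code."""
    result = init()  # a fresh object, so no target value is modified
    if isinstance(op, str):
        # A string names a method on the new instance, e.g. dict.update.
        merge_op = getattr(result, op)
    else:
        # A two-argument callable that mutates its first argument in place.
        def merge_op(item):
            op(result, item)
    for item in target:
        merge_op(item)  # the return value (typically None) is ignored
    return result


target = [{'a': 'alpha'}, {'b': 'B'}, {'a': 'A'}]
print(merge_sketch(target))  # {'a': 'A', 'b': 'B'}
print(target[0])             # {'a': 'alpha'}: the inputs are unchanged
print(merge_sketch([{1, 2}, {2, 3}], init=set, op=set.update))  # {1, 2, 3}
```

Note that set.update, like dict.update, returns None; a Fold-style op such as set.union instead returns a new value, which is the distinction the note draws.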

class glom.Sum(subspec=T, init=<class 'int'>)

The Sum specifier type is used to aggregate integers and other numeric types using addition, much like the built-in sum().

>>> glom(range(5), Sum())
10

Note that this specifier takes a callable init parameter like its friends, so to change the start value, be sure to wrap it in a callable:

>>> glom(range(5), Sum(init=lambda: 5.0))
15.0
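Assuming Sum behaves like a Fold whose op is operator.iadd (the default documented for Fold below), its behavior, including why init has to be callable, can be sketched with functools.reduce(). sum_sketch is a hypothetical name:

```python
import operator
from functools import reduce


def sum_sketch(target, init=int):
    """Hypothetical stand-in for glom's Sum: fold with += from init()."""
    return reduce(operator.iadd, target, init())


print(sum_sketch(range(5)))                    # 10
print(sum_sketch(range(5), init=lambda: 5.0))  # 15.0

# Passing the start value itself, rather than a callable, fails when init()
# is called.
try:
    sum_sketch(range(5), init=5.0)
except TypeError as exc:
    print("TypeError:", exc)  # 'float' object is not callable
```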

To “sum” lists and other iterables, see the Flatten spec. For other objects, see the Fold specifier type.

class glom.Fold(subspec, init, op=<built-in function iadd>)

The Fold specifier type is glom’s building block for reducing iterables in data, implementing the classic fold from functional programming, similar to Python’s functools.reduce().

Parameters

subspec – A spec representing the target to fold, which must be an iterable, or otherwise registered to ‘iterate’ (with register()).
init (callable) – A function or type which will be invoked to initialize the accumulator value.
op (callable) – A function to call on the accumulator value and every value, the result of which will become the new accumulator value. Defaults to operator.iadd().

Usage is as follows:

>>> target = [set([1, 2]), set([3]), set([2, 4])]
>>> result = glom(target, Fold(T, init=frozenset, op=frozenset.union))
>>> result == frozenset([1, 2, 3, 4])
True

Note the required subspec and init arguments. op is optional, but must be used here because the set and frozenset types do not support addition.
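Since Fold is described as the classic functional fold, a minimal pure-Python sketch of those semantics may help. fold_sketch is a hypothetical name, and the sketch ignores glom-specific details such as subspec evaluation and types registered to iterate:

```python
from functools import reduce


def fold_sketch(target, init, op):
    """Hypothetical stand-in for glom's Fold: the classic left fold."""
    acc = init()  # a fresh accumulator for each call
    for value in target:
        acc = op(acc, value)  # the result becomes the new accumulator
    return acc


sets = [{1, 2}, {3}, {2, 4}]
result = fold_sketch(sets, init=frozenset, op=frozenset.union)
print(result == frozenset([1, 2, 3, 4]))  # True

# The same computation with the standard library's reduce():
print(reduce(frozenset.union, sets, frozenset()) == result)  # True

# frozenset does not support +, which is why op must be given here.
try:
    frozenset() + frozenset()
except TypeError as exc:
    print("TypeError:", exc)
```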

While Fold is powerful, Flatten and Sum are subtypes with more convenient defaults for day-to-day use.

Exceptions

class glom.FoldError[source]: Error raised when Fold() is called on non-iterable targets, and possibly other uses in the future.