Matching & Validation

New in version 20.7.0.

Sometimes you want to confirm that your target data matches your code’s assumptions. With glom, you don’t need a separate validation step; you can do these checks inline with your glom spec, using Match and friends.

Validation with Match

For matching whole data structures, use a Match spec.

class glom.Match(spec, default=Sentinel('_MISSING'))[source]

glom’s Match specifier type enables a new mode of glom usage: pattern matching. In particular, this mode has been designed for nested data validation.

Pattern specs are evaluated as follows:

  1. Spec instances are always evaluated first
  2. Types match instances of that type
  3. Instances of dict, list, tuple, set, and frozenset are matched recursively
  4. Any other values are compared for equality to the target with ==

By itself, this lets you assert that structures match certain patterns, and may be especially familiar to users of the schema library.

For example, let’s load some data:

>>> target = [
... {'id': 1, 'email': 'alice@example.com'},
... {'id': 2, 'email': 'bob@example.com'}]

A Match pattern can be used to ensure this data is in its expected form:

>>> spec = Match([{'id': int, 'email': str}])

This spec succinctly describes our data structure’s pattern: specifically, a list of dict objects, each of which has exactly two keys, 'id' and 'email', whose values are an int and str, respectively. Now, glom() will ensure our target matches our pattern spec:

>>> result = glom(target, spec)
>>> assert result == \
... [{'id': 1, 'email': 'alice@example.com'}, {'id': 2, 'email': 'bob@example.com'}]

With a more complex Match spec, we can be more precise:

>>> spec = Match([{'id': And(M > 0, int), 'email': Regex('[^@]+@[^@]+')}])

And allows multiple conditions to be applied. Regex evaluates the regular expression against the target value under the 'email' key. In this case, we take a simple approach: an email has exactly one @, with at least one character before and after.

Finally, M is our stand-in for the current target we’re matching against, allowing us to perform in-line comparisons using Python’s native greater-than operator (as well as others). We apply our Match pattern as before:

>>> assert glom(target, spec) == \
... [{'id': 1, 'email': 'alice@example.com'}, {'id': 2, 'email': 'bob@example.com'}]

And as usual, upon a successful match, we get the matched result.

Note

For Python 3.6+ where dictionaries are ordered, keys in the target are matched against keys in the spec in their insertion order.

Parameters:
  • spec – The glom spec representing the pattern to match data against.
  • default – The default value to be returned if a match fails. If not set, a match failure will raise a MatchError.

matches(target)[source]

A convenience method on a Match instance that returns True if the target matches and False if it does not.

>>> Match(int).matches(-1.0)
False
Parameters: target – Target value or data structure to match against.

verify(target)[source]

A convenience method on a Match instance which returns the matched value when target matches, or raises a MatchError when it does not.

Parameters: target – Target value or data structure to match against.
Raises: glom.MatchError

Optional and required dict key matching

Note that our four Match rules above imply that object is a match-anything pattern. Because isinstance(val, object) is true for all values in Python, object is a useful stopping case. For instance, if we wanted to extend an example above to allow additional keys and values in the user dict, we could add object as a generic pass-through:

>>> target = [{'id': 1, 'email': 'alice@example.com', 'extra': 'val'}]
>>> spec = Match([{'id': int, 'email': str, object: object}])
>>> assert glom(target, spec) == \
... [{'id': 1, 'email': 'alice@example.com', 'extra': 'val'}]

The fact that {object: object} will match any dictionary exposes a subtlety in Match dictionary evaluation.

By default, value match keys are required, and other keys are optional. For example, 'id' and 'email' above are required because they are matched via ==. If either were not present, a MatchError would be raised. object, however, is matched with isinstance(). Since it is not a value-match comparison, it is not required.

This default behavior can be modified with Required and Optional.

class glom.Optional(key, default=Sentinel('_MISSING'))[source]

Used as a dict key in a Match() spec, marks that a value match key which would otherwise be required is optional and should not raise MatchError even if no keys match.

For example:

>>> spec = Match({Optional("name"): str})
>>> glom({"name": "alice"}, spec)
{'name': 'alice'}
>>> glom({}, spec)
{}
>>> spec = Match({Optional("name", default=""): str})
>>> glom({}, spec)
{'name': ''}

class glom.Required(key)[source]

Used as a dict key in Match() mode, marks that a key which might otherwise not be required should raise MatchError if the key in the target does not match.

For example:

>>> spec = Match({object: object})

This spec will match any dict, because object is the base type of every object:

>>> glom({}, spec)
{}

{} will also match because match mode does not require at least one match by default. If we want to require that a key matches, we can use Required:

>>> spec = Match({Required(object): object})
>>> glom({}, spec)
Traceback (most recent call last):
...
MatchError: error raised while processing.
 Target-spec trace, with error detail (most recent last):
 - Target: {}
 - Spec: Match({Required(object): <type 'object'>})
 - Spec: {Required(object): <type 'object'>}
MatchError: target missing expected keys Required(object)

Now our spec requires at least one key of any type. You can refine the spec by putting more specific subpatterns inside of Required.

M Expressions

The most concise way to express validation and guards.

glom.M = M

M is similar to T, a stand-in for the current target, but where T allows for attribute and key access and method calls, M allows for comparison operators.

If a comparison succeeds, the target is returned unchanged. If a comparison fails, MatchError is raised.

Some examples:

>>> glom(1, M > 0)
1
>>> glom(0, M == 0)
0
>>> glom('a', M != 'b') == 'a'
True

M by itself evaluates the current target for truthiness. For example, M | Val(None) is a simple idiom for normalizing all falsey values to None:

>>> from glom import Val
>>> glom([0, False, "", None], [M | Val(None)])
[None, None, None, None]

For convenience, & and | operators are overloaded to construct And and Or instances.

>>> glom(1.0, (M > 0) & float)
1.0

Note

Python’s operator overloading may make for concise code, but it has its limits.

Because bitwise operators (& and |) have higher precedence than comparison operators (>, <, etc.), expressions must be parenthesized.

>>> M > 0 & float
Traceback (most recent call last):
...
TypeError: unsupported operand type(s) for &: 'int' and 'type'

Similarly, because chained comparisons (1 < M < 5) are implemented via short-circuiting evaluation, they also cannot be captured by M.

Boolean operators and matching

While M is an easy way to construct expressions, sometimes a more object-oriented approach can be more suitable.

class glom.Or(*children, **kw)[source]

Tries to apply the first child spec to the target and return the result. If GlomError is raised, tries the next child spec until all child specs have been tried, then raises MatchError.

class glom.And(*children, **kw)[source]

Applies child specs one after the other to the target; if none of the specs raises GlomError, returns the last result.

class glom.Not(child)[source]

Inverts the child. Child spec will be expected to raise GlomError (or subtype), in which case the target will be returned.

If the child spec does not raise GlomError, MatchError will be raised.

String matching

class glom.Regex(pattern, flags=0, func=None)[source]

Checks that the target is a string which matches the passed regex pattern.

Raises MatchError if there isn’t a match; returns the target if there is.

Variables captured in the regex are added to the scope so they can be used by downstream processes.

Control flow with Switch

Match becomes even more powerful when combined with the ability to branch spec execution.

class glom.Switch(cases, default=Sentinel('_MISSING'))[source]

The Switch specifier type routes data processing based on matching keys, much like the classic switch statement.

Here is a spec which differentiates between lowercase English vowel and consonant characters:

>>> switch_spec = Match(Switch([(Or('a', 'e', 'i', 'o', 'u'), Val('vowel')),
...                             (And(str, M, M(T[2:]) == ''), Val('consonant'))]))

The constructor accepts a dict of {keyspec: valspec} or a list of items, [(keyspec, valspec)]. Keys are tried against the current target in order. If a keyspec raises GlomError, the next keyspec is tried. Once a keyspec succeeds, the corresponding valspec is evaluated and returned. Let’s try it out:

>>> glom('a', switch_spec)
'vowel'
>>> glom('z', switch_spec)
'consonant'

If no keyspec succeeds, a MatchError is raised. Our spec only works on characters (strings of length 1). Let’s try a non-character, the integer 3:

>>> glom(3, switch_spec)
Traceback (most recent call last):
...
glom.matching.MatchError: error raised while processing, details below.
 Target-spec trace (most recent last):
 - Target: 3
 - Spec: Match(Switch([(Or('a', 'e', 'i', 'o', 'u'), Val('vowel')), (And(str, M, (M(T[2:]) == '')), Val('...
 + Spec: Switch([(Or('a', 'e', 'i', 'o', 'u'), Val('vowel')), (And(str, M, (M(T[2:]) == '')), Val('conson...
 |\ Spec: Or('a', 'e', 'i', 'o', 'u')
 ||\ Spec: 'a'
 ||X glom.matching.MatchError: 3 does not match 'a'
 ||\ Spec: 'e'
 ||X glom.matching.MatchError: 3 does not match 'e'
 ||\ Spec: 'i'
 ||X glom.matching.MatchError: 3 does not match 'i'
 ||\ Spec: 'o'
 ||X glom.matching.MatchError: 3 does not match 'o'
 ||\ Spec: 'u'
 ||X glom.matching.MatchError: 3 does not match 'u'
 |X glom.matching.MatchError: 3 does not match 'u'
 |\ Spec: And(str, M, (M(T[2:]) == ''))
 || Spec: str
 |X glom.matching.TypeMatchError: expected type str, not int
glom.matching.MatchError: no matches for target in Switch

Note

Switch is one of several branching specifier types in glom. See “Reading Branched Exceptions” for details on interpreting its exception messages.

A default value can be passed to the spec to be returned instead of raising a MatchError.

Note

Switch implements control flow similar to the switch statement proposed in PEP 622.

Exceptions

class glom.MatchError(fmt, *args)[source]

Raised when a Match or M check fails.

>>> glom({123: 'a'}, Match({'id': int}))
Traceback (most recent call last):
...
MatchError: key 123 didn't match any of ['id']

class glom.TypeMatchError(actual, expected)[source]

MatchError subtype raised when a Match fails a type check.

>>> glom({'id': 'a'}, Match({'id': int}))
Traceback (most recent call last):
...
TypeMatchError: error raised while processing.
 Target-spec trace, with error detail (most recent last):
 - Target: {'id': 'a'}
 - Spec: Match({'id': <type 'int'>})
 - Spec: {'id': <type 'int'>}
 - Target: 'a'
 - Spec: int
TypeMatchError: expected type int, not str

Validation with Check

Warning

Given the suite of tools introduced with Match, the Check specifier type may be deprecated in a future release.

class glom.Check(spec=T, **kwargs)[source]

Check objects are used to make assertions about the target data, and either pass through the data or raise exceptions if there is a problem.

If any check condition fails, a CheckError is raised.

Parameters:
  • spec – a sub-spec to extract the data to which other assertions will be checked (defaults to applying checks to the target itself)
  • type – a type or sequence of types to be checked for exact match
  • equal_to – a value to be checked for equality match (“==”)
  • validate – a callable or list of callables, each representing a check condition. If one or more return False or raise an exception, the Check will fail.
  • instance_of – a type or sequence of types to be checked with isinstance()
  • one_of – an iterable of values, any of which can match the target (“in”)
  • default – an optional default value to replace the value when the check fails (if default is not specified, CheckError will be raised)

Aside from spec, all arguments are keyword arguments. Each argument, except for default, represents a check condition. Multiple checks can be passed, and if all check conditions are left unset, Check defaults to performing a basic truthy check on the value.

class glom.CheckError(msgs, check, path)[source]

This GlomError subtype is raised when target data fails to pass a Check’s specified validation.

An uncaught CheckError looks like this:

>>> target = {'a': {'b': 'c'}}
>>> glom(target, {'b': ('a.b', Check(type=int))})
Traceback (most recent call last):
...
CheckError: target at path ['a.b'] failed check, got error: "expected type to be 'int', found type 'str'"

If the Check contains more than one condition, there may be more than one error message. The string rendition of the CheckError will include all messages.

You can also catch the CheckError and programmatically access messages through the msgs attribute on the CheckError instance.