Matching & Validation
New in version 20.7.0.
Sometimes you want to confirm that your target data matches your
code’s assumptions. With glom, you don’t need a separate validation
step, you can do these checks inline with your glom spec, using
Match and friends.
Validation with Match
For matching whole data structures, use a Match spec.
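- class glom.Match(spec, default=Sentinel('_MISSING'))
glom’s Match specifier type enables a new mode of glom usage: pattern matching. In particular, this mode has been designed for nested data validation.
Pattern specs are evaluated as follows:
1. Spec instances are always evaluated first
2. Types match instances of that type
3. Instances of dict, list, tuple, set, and frozenset are matched recursively
4. Any other values are compared for equality to the target with ==
By itself, this allows you to assert that structures match certain patterns, and may be especially familiar to users of the schema library.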
For example, let’s load some data:
>>> from glom import glom, Match, And, M, Regex
>>> target = [
...     {'id': 1, 'email': 'alice@example.com'},
...     {'id': 2, 'email': 'bob@example.com'}]
A Match pattern can be used to ensure this data is in its expected form:
>>> spec = Match([{'id': int, 'email': str}])
This spec succinctly describes our data structure’s pattern. Specifically, a list of dict objects, each of which has exactly two keys, 'id' and 'email', whose values are an int and str, respectively. Now, glom() will ensure our target matches our pattern spec:
>>> result = glom(target, spec)
>>> assert result == \
...     [{'id': 1, 'email': 'alice@example.com'}, {'id': 2, 'email': 'bob@example.com'}]
With a more complex Match spec, we can be more precise:
>>> spec = Match([{'id': And(M > 0, int), 'email': Regex('[^@]+@[^@]+')}])
And allows multiple conditions to be applied. Regex evaluates the regular expression against the target value under the 'email' key. In this case, we take a simple approach: an email has exactly one @, with at least one character before and after.
Finally, M is our stand-in for the current target we’re matching against, allowing us to perform in-line comparisons using Python’s native greater-than operator (as well as others). We apply our Match pattern as before:
>>> assert glom(target, spec) == \
...     [{'id': 1, 'email': 'alice@example.com'}, {'id': 2, 'email': 'bob@example.com'}]
And as usual, upon a successful match, we get the matched result.
Note
For Python 3.6+ where dictionaries are ordered, keys in the target are matched against keys in the spec in their insertion order.
- Parameters
spec – The glom spec representing the pattern to match data against.
default – The default value to be returned if a match fails. If not set, a match failure will raise a MatchError.
- matches(target)
A convenience method on a Match instance, returns True if the target matches, False if not.
>>> Match(int).matches(-1.0)
False
- Parameters
target – Target value or data structure to match against.
- verify(target)
A convenience method on a Match instance which returns the matched value when target matches, or raises a MatchError when it does not.
- Parameters
target – Target value or data structure to match against.
- Raises
MatchError – if the target does not match.
Optional and required dict key matching
Note that our four Match rules above imply that
object is a match-anything pattern. Because
isinstance(val, object) is true for all values in Python,
object is a useful stopping case. For instance, if we wanted to
extend the example above to allow additional keys and values in the
user dict, we could add object as a generic pass through:
>>> target = [{'id': 1, 'email': 'alice@example.com', 'extra': 'val'}]
>>> spec = Match([{'id': int, 'email': str, object: object}])
>>> assert glom(target, spec) == \
...     [{'id': 1, 'email': 'alice@example.com', 'extra': 'val'}]
The fact that {object: object} will match any dictionary exposes
the subtlety in Match dictionary evaluation.
By default, value match keys are required, and other keys are
optional. For example, 'id' and 'email' above are required
because they are matched via ==. If either was not present, it
would raise MatchError. object however is matched
with isinstance(). Since it is not a value-match comparison,
it is not required.
This default behavior can be modified with Required
and Optional.
- class glom.Optional(key, default=Sentinel('_MISSING'))
Used as a dict key in a Match() spec, marks that a value match key which would otherwise be required is optional and should not raise MatchError even if no keys match.
For example:
>>> spec = Match({Optional("name"): str})
>>> glom({"name": "alice"}, spec)
{'name': 'alice'}
>>> glom({}, spec)
{}
>>> spec = Match({Optional("name", default=""): str})
>>> glom({}, spec)
{'name': ''}
- class glom.Required(key)
Used as a dict key in Match() mode, marks that a key which might otherwise not be required should raise MatchError if the key in the target does not match.
For example:
>>> spec = Match({object: object})
This spec will match any dict, because object is the base type of every object:
>>> glom({}, spec)
{}
{} will also match because match mode does not require at least one match by default. If we want to require that a key matches, we can use Required:
>>> spec = Match({Required(object): object})
>>> glom({}, spec)
Traceback (most recent call last):
...
MatchError: error raised while processing.
 Target-spec trace, with error detail (most recent last):
 - Target: {}
 - Spec: Match({Required(object): <type 'object'>})
 - Spec: {Required(object): <type 'object'>}
MatchError: target missing expected keys Required(object)
Now our spec requires at least one key of any type. You can refine the spec by putting more specific subpatterns inside of Required.
M Expressions
The most concise way to express validation and guards.
- glom.M = M
M is similar to T, a stand-in for the current target, but where T allows for attribute and key access and method calls, M allows for comparison operators.
If a comparison succeeds, the target is returned unchanged. If a comparison fails, MatchError is thrown.
Some examples:
>>> glom(1, M > 0)
1
>>> glom(0, M == 0)
0
>>> glom('a', M != 'b') == 'a'
True
M by itself evaluates the current target for truthiness. For example, M | Val(None) is a simple idiom for normalizing all falsey values to None:
>>> from glom import Val
>>> glom([0, False, "", None], [M | Val(None)])
[None, None, None, None]
For convenience, & and | operators are overloaded to construct And and Or instances.
>>> glom(1.0, (M > 0) & float)
1.0
Note
Python’s operator overloading may make for concise code, but it has its limits.
Because bitwise operators (& and |) have higher precedence than comparison operators (>, <, etc.), expressions must be parenthesized.
>>> M > 0 & float
Traceback (most recent call last):
...
TypeError: unsupported operand type(s) for &: 'int' and 'type'
Similarly, because chained comparisons (1 < M < 5) are implemented via short-circuiting evaluation, they cannot be captured by M.
Boolean operators and matching
While M is an easy way to construct expressions, sometimes a more
object-oriented approach can be more suitable.
- class glom.Or(*children, **kw)
Tries to apply the first child spec to the target, and returns the result. If GlomError is raised, tries the next child spec until all child specs have been tried, then raises MatchError.
- class glom.And(*children, **kw)
Applies child specs one after the other to the target; if none of the specs raises GlomError, returns the last result.
- class glom.Not(child)
Inverts the child. Child spec will be expected to raise GlomError (or subtype), in which case the target will be returned.
If the child spec does not raise GlomError, MatchError will be raised.
String matching
Control flow with Switch
Match becomes even more powerful when combined with the ability to branch spec execution.
- class glom.Switch(cases, default=Sentinel('_MISSING'))
The Switch specifier type routes data processing based on matching keys, much like the classic switch statement.
Here is a spec which differentiates between lowercase English vowel and consonant characters:
>>> switch_spec = Match(Switch([(Or('a', 'e', 'i', 'o', 'u'), Val('vowel')),
...                             (And(str, M, M(T[2:]) == ''), Val('consonant'))]))
The constructor accepts a dict of {keyspec: valspec} or a list of items, [(keyspec, valspec)]. Keys are tried against the current target in order. If a keyspec raises GlomError, the next keyspec is tried. Once a keyspec succeeds, the corresponding valspec is evaluated and returned. Let’s try it out:
>>> glom('a', switch_spec)
'vowel'
>>> glom('z', switch_spec)
'consonant'
If no keyspec succeeds, a MatchError is raised. Our spec only works on characters (strings of length 1). Let’s try a non-character, the integer 3:
>>> glom(3, switch_spec)
Traceback (most recent call last):
...
glom.matching.MatchError: error raised while processing, details below.
 Target-spec trace (most recent last):
 - Target: 3
 - Spec: Match(Switch([(Or('a', 'e', 'i', 'o', 'u'), Val('vowel')), (And(str, M, (M(T[2:]) == '')), Val('...
 + Spec: Switch([(Or('a', 'e', 'i', 'o', 'u'), Val('vowel')), (And(str, M, (M(T[2:]) == '')), Val('conson...
 |\ Spec: Or('a', 'e', 'i', 'o', 'u')
 ||\ Spec: 'a'
 ||X glom.matching.MatchError: 3 does not match 'a'
 ||\ Spec: 'e'
 ||X glom.matching.MatchError: 3 does not match 'e'
 ||\ Spec: 'i'
 ||X glom.matching.MatchError: 3 does not match 'i'
 ||\ Spec: 'o'
 ||X glom.matching.MatchError: 3 does not match 'o'
 ||\ Spec: 'u'
 ||X glom.matching.MatchError: 3 does not match 'u'
 |X glom.matching.MatchError: 3 does not match 'u'
 |\ Spec: And(str, M, (M(T[2:]) == ''))
 || Spec: str
 |X glom.matching.TypeMatchError: expected type str, not int
glom.matching.MatchError: no matches for target in Switch
Note
Switch is one of several branching specifier types in glom. See “Reading Branched Exceptions” for details on interpreting its exception messages.
A default value can be passed to the spec to be returned instead of raising a MatchError.
Note
Switch implements control flow similar to the switch statement proposed in PEP 622.
Exceptions
- class glom.MatchError(fmt, *args)
Raised when a Match or M check fails.
>>> glom({123: 'a'}, Match({'id': int}))
Traceback (most recent call last):
...
MatchError: key 123 didn't match any of ['id']
- class glom.TypeMatchError(actual, expected)
MatchError subtype raised when a Match fails a type check.
>>> glom({'id': 'a'}, Match({'id': int}))
Traceback (most recent call last):
...
TypeMatchError: error raised while processing.
 Target-spec trace, with error detail (most recent last):
 - Target: {'id': 'a'}
 - Spec: Match({'id': <type 'int'>})
 - Spec: {'id': <type 'int'>}
 - Target: 'a'
 - Spec: int
TypeMatchError: expected type int, not str
Validation with Check
- class glom.Check(spec=T, **kwargs)
Check objects are used to make assertions about the target data, and either pass through the data or raise exceptions if there is a problem.
If any check condition fails, a CheckError is raised.
- Parameters
spec – a sub-spec to extract the data to which other assertions will be checked (defaults to applying checks to the target itself)
type – a type or sequence of types to be checked for exact match
equal_to – a value to be checked for equality match (“==”)
validate – a callable or list of callables, each representing a check condition. If one or more return False or raise an exception, the Check will fail.
instance_of – a type or sequence of types to be checked with isinstance()
one_of – an iterable of values, any of which can match the target (“in”)
default – an optional default value to replace the value when the check fails (if default is not specified, CheckError will be raised)
Aside from spec, all arguments are keyword arguments. Each argument, except for default, represents a check condition. Multiple checks can be passed, and if all check conditions are left unset, Check defaults to performing a basic truthy check on the value.
- class glom.CheckError(msgs, check, path)
This GlomError subtype is raised when target data fails to pass a Check’s specified validation.
An uncaught CheckError looks like this:
>>> target = {'a': {'b': 'c'}}
>>> glom(target, {'b': ('a.b', Check(type=int))})
Traceback (most recent call last):
...
CheckError: target at path ['a.b'] failed check, got error: "expected type to be 'int', found type 'str'"
If the Check contains more than one condition, there may be more than one error message. The string rendition of the CheckError will include all messages.
You can also catch the CheckError and programmatically access messages through the msgs attribute on the CheckError instance.