glom¶
Restructuring data, the Python way.
glom is a new approach to working with data in Python, featuring:
- Path-based access for nested structures
- Declarative data transformation using lightweight, Pythonic specifications
- Readable, meaningful error messages
- Built-in debugging features
- Plus, deep assignment, streaming, data validation, and more!
While it may sound like a lot, glom’s straightforward approach becomes second-nature very quickly. Get started with a few minutes on the tutorial!
Installation¶
glom is pure Python, and tested on Python 2.7-3.7, as well as PyPy. Installation is easy:
pip install glom
Then you’re ready to get glomming!
from glom import glom
target = {'a': {'b': {'c': 'd'}}}
glom(target, 'a.b.c') # returns 'd'
There’s much, much more to glom. Check out the glom Tutorial and API reference!
Just glom it! ☄️
Tutorial¶
Learn to use glom in no time!
Basic use of glom requires only a glance, not a whole tutorial. The case studies below take a wider look at day-to-day data and object manipulation, helping you develop an eye for writing robust, declarative data transformations.
Go beyond the basics in 10 minutes or less, and even further if you can spare a half hour.
Note
glom’s tutorial is a runnable module. Feel free to run pip install glom and from glom.tutorial import * in the Python REPL to glom along, or try it in your browser in the embedded REPLs below!
Dealing with Data¶
Every application deals with data, and these days, even the simplest applications deal with rich, heavily-nested data.
What does nested data look like? In its most basic form:
>>> data = {'a': {'b': {'c': 'd'}}}
>>> data['a']['b']['c']
'd'
Pretty simple right? On a good day, it certainly can be. But other days, a value might not be set:
>>> data2 = {
... 'a': {
... 'b': None
... }
... }
>>> data2['a']['b']['c']
Traceback (most recent call last):
...
TypeError: 'NoneType' object is not subscriptable
Well that’s no good. We didn’t get our value. We got a TypeError, a type of error that doesn’t help us at all. The error message doesn’t even tell us which access failed. If data2 had been passed to us, we wouldn’t know if 'a', 'b', or 'c' had been set to None.
If only there were a more semantically powerful accessor.
Accessing Nested Data¶
AKA “Access Granted”
After years of research and countless iterations, the glom team landed on this simple construct:
>>> glom(data, 'a.b.c')
'd'
Well that’s short, and reads fine, but what about in the error case?
>>> glom(data2, 'a.b.c')
Traceback (most recent call last):
...
PathAccessError: could not access 'c', index 2 in path Path('a', 'b', 'c'), got error: ...
That’s more like it! We have a function that can give us our data, or give us an error message we can read, understand, and act upon.
See also
For more on glom’s error messages, see Exceptions & Debugging.
Interactive Deep Get¶
And would you believe this “deep access” example doesn’t even scratch the surface of the tip of the iceberg? Welcome to glom.
Going Beyond Access¶
To start out, let’s introduce some basic terminology:
- target is our data, be it a dict, list, or any other object
- spec is what we want output to be
With output = glom(target, spec) committed to memory, we’re ready for some new requirements.
Let’s follow some astronomers on their journey exploring the solar system.
>>> target = {
... 'galaxy': {
... 'system': {
... 'planet': 'jupiter'
... }
... }
... }
>>> spec = 'galaxy.system.planet'
>>> glom(target, spec)
'jupiter'
Our astronomers want to focus in on the Solar system, and represent planets as a list. Let’s restructure the data to make a list of names:
>>> target = {
... 'system': {
... 'planets': [
... {'name': 'earth'},
... {'name': 'jupiter'}
... ]
... }
... }
>>> glom(target, ('system.planets', ['name']))
['earth', 'jupiter']
And let’s say we want to capture a parallel list of moon counts with the names as well:
>>> target = {
... 'system': {
... 'planets': [
... {'name': 'earth', 'moons': 1},
... {'name': 'jupiter', 'moons': 69}
... ]
... }
... }
>>> spec = {
... 'names': ('system.planets', ['name']),
... 'moons': ('system.planets', ['moons'])
... }
>>> pprint(glom(target, spec))
{'moons': [1, 69], 'names': ['earth', 'jupiter']}
We can react to changing data requirements as fast as the data itself can change, naturally restructuring our results, despite the input’s nested nature. Like a list comprehension, but for nested data, our code mirrors our output.
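For comparison, here is roughly the same restructuring written as plain Python comprehensions (an illustrative sketch using the target above, not how glom works internally):
>>> planets = target['system']['planets']
>>> pprint({'names': [p['name'] for p in planets],
...         'moons': [p['moons'] for p in planets]})
{'moons': [1, 69], 'names': ['earth', 'jupiter']}
The glom spec says the same thing, but stays declarative: it is a value you can store, reuse, and compose.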
Handling Nested Lists¶
In the example above we introduced a new wrinkle: the target for planets has multiple entries stored in a list. Previously our targets were all nested dictionaries.
To handle this we use a new spec pattern: (path, [subpath]). In this pattern, path is the path to the list, and subpath is the path within each element of the list. What’s that? You need to handle lists within lists (within lists …)? Then just repeat the pattern, replacing subpath with another (path, [subpath]) tuple. For example, say we have information about each planet’s moons like so:
>>> target = {
... 'system': {
... 'planets': [
... {
... 'name': 'earth',
... 'moons': [
... {'name': 'luna'}
... ]
... },
... {
... 'name': 'jupiter',
... 'moons': [
... {'name': 'io'},
... {'name': 'europa'}
... ]
... }
... ]
... }
... }
We can get the names of each moon from our nested lists by nesting our subpath specs:
>>> spec = {
... 'planet_names': ('system.planets', ['name']),
... 'moon_names': ('system.planets', [('moons', ['name'])])
... }
>>> pprint(glom(target, spec))
{'moon_names': [['luna'], ['io', 'europa']], 'planet_names': ['earth', 'jupiter']}
Changing Requirements¶
Unfortunately, data in the real world is messy. You might be expecting a certain format and end up getting something completely different. No worries, glom to the rescue.
Coalesce is a glom construct that allows you to specify fallback behavior for a list of subspecs. Subspecs are passed as positional arguments, while defaults can be set using keyword arguments.
Let’s say our astronomers recently got a new update in their systems, and sometimes system will contain dwarf_planets instead of planets.
To handle this, we can define the dwarf_planets subspec as a Coalesce fallback.
>>> from glom import Coalesce
>>> target = {
... 'system': {
... 'planets': [
... {'name': 'earth', 'moons': 1},
... {'name': 'jupiter', 'moons': 69}
... ]
... }
... }
>>> spec = {
... 'planets': (Coalesce('system.planets', 'system.dwarf_planets'), ['name']),
... 'moons': (Coalesce('system.planets', 'system.dwarf_planets'), ['moons'])
... }
>>> pprint(glom(target, spec))
{'moons': [1, 69], 'planets': ['earth', 'jupiter']}
You can see here we get the expected results, but say our target changes…
>>> target = {
... 'system': {
... 'dwarf_planets': [
... {'name': 'pluto', 'moons': 5},
... {'name': 'ceres', 'moons': 0}
... ]
... }
... }
>>> pprint(glom(target, spec))
{'moons': [5, 0], 'planets': ['pluto', 'ceres']}
Voila, the target can still be parsed and we can elegantly handle changes in our data formats.
Data-Driven Assignment¶
Quite often, APIs deliver data in dictionaries without constant keys; parts of the data itself serve as the keys. We call this data-driven assignment.
The following example shows one way to handle this situation. It extracts the moon count from a dictionary keyed by planet name.
>>> from glom import glom, T, Merge, Iter, Coalesce
>>> target = {
... "pluto": {"moons": 6, "population": None},
... "venus": {"population": {"aliens": 5}},
... "earth": {"moons": 1, "population": {"humans": 7700000000, "aliens": 1}},
... }
>>> spec = {
... "moons": (
... T.items(),
... Iter({T[0]: (T[1], Coalesce("moons", default=0))}),
... Merge(),
... )
... }
>>> pprint(glom(target, spec))
{'moons': {'earth': 1, 'pluto': 6, 'venus': 0}}
Don’t worry if you do not fully understand how this works at this point. If you would like to learn more, look up Iter(), T, or Merge in the glom API reference.
True Python Native¶
Most other implementations are limited to a particular data format or pure model, be it jmespath or XPath/XSLT. glom makes no such sacrifices of practicality, harnessing the full power of Python itself.
Going back to our example, let’s say we wanted to get an aggregate moon count:
>>> target = {
... 'system': {
... 'planets': [
... {'name': 'earth', 'moons': 1},
... {'name': 'jupiter', 'moons': 69}
... ]
... }
... }
>>> pprint(glom(target, {'moon_count': ('system.planets', ['moons'], sum)}))
{'moon_count': 70}
With glom, you have full access to Python at any given moment. Pass values to functions, whether built-in, imported, or defined inline with lambda.
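For instance, continuing with the target above, swapping in a different built-in or an inline lambda works the same way (a small illustrative sketch):
>>> glom(target, ('system.planets', ['moons'], max))
69
>>> glom(target, ('system.planets', ['moons'], lambda moons: sum(moons) / len(moons)))
35.0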
Practical Production Use¶
AKA “Point of Contact”
glom is a practical tool for production use. To best demonstrate how you can use it, we’ll be building an API response. We’re implementing a Contacts web service, like an address book, but backed by an ORM/database and compatible with web and mobile frontends.
Let’s create a Contact to familiarize ourselves with our test data:
>>> from glom.tutorial import * # import the tutorial module members
>>> contact = Contact('Julian',
... emails=[Email(email='jlahey@svtp.info')],
... location='Canada')
>>> contact.save()
>>> contact.primary_email
Email(id=5, email='jlahey@svtp.info', email_type='personal')
>>> contact.add_date
datetime.datetime(...)
>>> contact.id
5
As you can see, the Contact object has fields for primary_email, defaulting to the first email in the email list, and add_date, to track the date the contact was added. And as the unique, autoincrementing id suggests, there appear to be a few other contacts already in our system.
>>> len(Contact.objects.all())
5
Sure enough, we’ve got a little address book going here. But right now it consists of plain Python objects, which aren’t very API friendly:
>>> json.dumps(Contact.objects.all())
Traceback (most recent call last):
...
TypeError: Contact(id=1, name='Kurt', ...) ... is not JSON serializable
But at least we know our data, so let’s get to building the API response with glom.
First, let’s set our source object, conventionally named target:
>>> target = Contact.objects.all() # here we could do filtering, etc.
Next, let’s specify the format of our result. Remember, the processing is not happening here; this is just declaring the format. We’ll be going over the specifics of what each line does after we get our results.
>>> spec = {'results': [{'id': 'id',
... 'name': 'name',
... 'add_date': ('add_date', str),
... 'emails': ('emails', [{'id': 'id',
... 'email': 'email',
... 'type': 'email_type'}]),
... 'primary_email': Coalesce('primary_email.email', default=None),
... 'pref_name': Coalesce('pref_name', 'name', skip='', default=''),
... 'detail': Coalesce('company',
... 'location',
... ('add_date.year', str),
... skip='', default='')}]}
With target and spec in hand, we’re ready to glom, build our response, and take a look at the final JSON-serialized form:
>>> resp = glom(target, spec)
>>> print(json.dumps(resp, indent=2, sort_keys=True))
{
"results": [
{
"add_date": "20...",
"detail": "Mountain View",
"emails": [
{
"email": "kurt@example.com",
"id": 1,
"type": "personal"
}
],
"id": 1,
"name": "Kurt",
"pref_name": "Kurt",
"primary_email": "kurt@example.com"
},
...
}
As we can see, our response looks a lot like our glom specification. This type of WYSIWYG code is one of glom’s most important features. After we’ve appreciated that simple fact, let’s look at it line by line.
Understanding the Specification¶
For id and name, we’re just doing simple copy-overs. For add_date, we use a tuple to denote repeated gloms; we access add_date and pass the result to str to convert it to a string.
For emails we need to serialize a list of subobjects. Good news, glom subgloms just fine, too. We use a tuple to access emails, iterate over that list, and from each we copy over id and email. Note how email_type is easily remapped to simply type.
For primary_email we see our first usage of glom’s Coalesce feature. Much like SQL’s keyword of the same name, Coalesce returns the result of the first spec that returns a valid value. In our case, primary_email can be None, so a further access of primary_email.email would, outside of glom, result in an AttributeError or TypeError like the one we described before the Contact example. Inside of a glom Coalesce, exceptions are caught and we move on to the next spec. glom raises a CoalesceError when no specs match, so we use default to tell it to return None instead.
Some Contacts have nicknames or other names they prefer to go by, so for pref_name, we want to return the stored pref_name, or fall back to the normal name. Again, we use Coalesce, but this time we tell it not only to ignore the default GlomError exceptions, but also ignore empty string values, and finally default to empty string if all specs result in empty strings or GlomError.
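As a standalone illustration of skip (using a hypothetical record, separate from the Contact data above):
>>> glom({'pref_name': '', 'name': 'Julian'},
...      Coalesce('pref_name', 'name', skip='', default=''))
'Julian'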
And finally, for our last field, detail, we want to conjure up a bit of info that’ll help jog the user’s memory. We’re going to include the location, or company, or year the contact was added. You can see an example of this feature as implemented by GitHub, here: https://github.com/mahmoud/glom/stargazers
Conclusion¶
We’ve seen a crash course in how glom can tame your data and act as a powerful source of code coherency. glom transforms not only your data, but also your code, bringing it in line with the data itself.
glom tamed our nested data, avoiding tedious, bug-prone lines and replacing what would have been large sections of code with specifications that are declarative yet flexible, an ideal balance for maintainability.
Frequently Asked Questions¶
Paradigm shifts always raise a question or two.
What does “glom” mean?¶
“glom” is short for “conglomerate”, which means “gather into a compact form”, coming from the Latin “glom-” meaning ball, like globe.
glom can be used as a noun or verb. A developer might say, “I glommed together this API response.” An astronomer might say, “these gloms of space dust are forming planets and comets.”
Got some data you need to transform? glom it! ☄️
Any other glom terminology worth knowing?¶
A couple of conventional terms that help navigate around glom’s semantics:
- target - glom operates on a variety of inputs, so we simply refer to the object being accessed (i.e., the first argument to glom()) as the “target”
- spec - (aka “glomspec”) The accompanying template used to specify the structure and sources of the output.
- output - The value retrieved or created and returned by glom().
All of these can be seen in the conventional call to glom():
output = glom(target, spec)
Nothing too wild, but these standard terms really do help clarify the complex situations glom was built to handle.
Other glom tips?¶
Just a few (for now):
- Specs don’t have to live in the glom call. You can put them anywhere. Commonly-used specs work as class attributes and globals (see the sketch after this list).
- Using glom’s declarative approach does wonders for code coverage, much like attrs and schema, both of which go great with glom.
- Advanced tips:
  - glom is designed to support all of Python’s built-ins as targets, and is readily extensible to other types and special handling, through register().
  - If you’re trying to minimize global state, consider instantiating your own Glommer object to encapsulate any type registration changes.
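Here is a minimal sketch of that first tip: a commonly-used spec stored as a module-level constant and reused across call sites (the names are hypothetical):
from glom import glom, Coalesce

CONTACT_SUMMARY_SPEC = {
    'name': Coalesce('pref_name', 'name', skip='', default=''),
    'email': Coalesce('primary_email.email', default=None),
}

def summarize(contact):
    return glom(contact, CONTACT_SUMMARY_SPEC)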
If you’ve got more tips or patterns, send them our way!
Why not just write more Python?¶
The answer is more than just DRY (“Don’t Repeat Yourself”).
Here on the glom team, we’re big fans of Python. Have been for years. In fact, Python is one of a tiny handful of languages that could support something as powerful as glom.
But not all Python code is the same. We built glom to replace the kind of Python that is about as un-Pythonic as code gets: simultaneously fluffy, but also fragile. Simple transformations requiring countless lines.
Before glom, the “right” way to write this transformation code was verbose. Whether trying to fetch values nested within objects that may contain attributes set to None, or performing a list comprehension which may raise an exception, the correct code was many lines of repetitious try-except blocks with a lot of hand-written exception messages.
Written any more compactly, this Python would produce failures expressed in errors too low-level to associate with the higher-level transformation.
So the glom-less code was hard to change, hard to debug, or both. glom specifications are none of the above, thanks to meaningful, high-level error messages, a built-in debugging facility, and a compact, composable design.
In short, thanks to Python, glom can provide a Pythonic solution for those times when pure Python wasn’t Pythonic enough.
How does glom work?¶
The core conceptual engine of glom is a very simple recursive loop. It could fit on a business card. OK maybe a postcard.
In fact, here it is, in literate form, modified from this early point in glom history:
def glom(target, spec):
    # if the spec is a string or a Path, perform a deep-get on the target
    if isinstance(spec, (basestring, Path)):
        return _get_path(target, spec)
    # if the spec is callable, call it on the target
    elif callable(spec):
        return spec(target)
    # if the spec is a dict, assign the result of
    # the glom on the right to the field key on the left
    elif isinstance(spec, dict):
        ret = {}
        for field, subspec in spec.items():
            ret[field] = glom(target, subspec)
        return ret
    # if the spec is a list, run the spec inside the list on every
    # element in the list and return the new list
    elif isinstance(spec, list):
        subspec = spec[0]
        iterator = _get_iterator(target)
        return [glom(t, subspec) for t in iterator]
    # if the spec is a tuple of specs, chain the specs by running the
    # first spec on the target, then running the second spec on the
    # result of the first, and so on.
    elif isinstance(spec, tuple):
        res = target
        for subspec in spec:
            res = glom(res, subspec)
        return res
    else:
        raise TypeError('expected one of the above types')
Does Python need a null-coalescing operator?¶
Not technically a glom question, but it is frequently asked!
Null coalescing operators traverse nested objects and return null (or None for us) on the first null or non-traversable object, depending on implementation.
It’s basically a compact way of doing a deep getattr() with a default set to None.
Suffice to say that glom(target, T.a.b.c, default=None) achieves this with ease, but I still want to revisit the question, since it’s part of what got me thinking about glom in the first place.
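For example, a minimal sketch of the default keyword doing the null-coalescing job:
from glom import glom

glom({'a': {'b': None}}, 'a.b.c', default=None)  # returns None instead of raising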
First off, working in PayPal’s SOA environment, my team dealt with literally tens of thousands of service objects, with object definitions (from other teams) nested so deep as to make an 80-character line length laughable.
But null coalescing wouldn’t have helped, because in most of those cases None wasn’t what we needed. We needed a good, automatically generated error message when a deeply-nested field wasn’t accessible. Not NoneType has no attribute 'x', but not plain old None either.
To solve this, I wrote my share of deep-gets before glom, including the open-source boltons.iterutils.get_path(). For whatever reason, it took me years of usage to realize just how often the deep-gets were coupled with the other transformations that glom enables. Now, I can never go back to a simple deep-get.
Another years-in-the-making observation, from my time doing JavaScript then PHP then Django templates: all were much more lax on typing than Python. Not because of a fierce belief in weak types, though. More because when you’re templating, it’s inherently safer to return a blank value on lookup failures. You’re so close to text formats that this default achieves a pretty desirable result. While implicitly doing this isn’t my cup of tea, and glom opts for explicit Coalesce specifiers, this connection contributed to the concept of glom as an “object templating” system.
glom by Analogy¶
glom is pure Python, and you don’t need to know anything but Python to use it effectively.
Still, most everyone who encounters glom for the first time finds analogies to tools they already know. Whether SQL, list comprehensions, or HTML templates, there seems to be no end to the similarities. Many of them intentional!
While glom is none of those tools, and none of those tools are glom, a little comparison doesn’t hurt. This document collects analogies to help guide understanding along.
Similarity to list comprehensions¶
One of the key inspirations for glom was the humble list comprehension, one of my favorite Python features.
List comprehensions make your code look like its output, and that goes a long way in readability. glom itself does list processing with square brackets like [lambda x: x % 2], which actually makes it more like a list comp and the old filter() function.
glom’s list processing differs in two ways:
- Required use of a callable or other glom spec, to enable deferred processing.
- Ability to return SKIP, which can exclude items from a list.
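A small side-by-side sketch: the glom spec mirrors the comprehension, but is itself a value that can be stored and applied later.
doubled = [x * 2 for x in [1, 2, 3]]            # [2, 4, 6]
doubled = glom([1, 2, 3], [lambda x: x * 2])    # [2, 4, 6]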
Similarity to templating (Jinja, Django, Mustache)¶
glom is a lot like templating engines, including modern formatters like gofmt, but with all the format affordances distilled out. glom doesn’t just work on HTML, XML, JSON, or even just strings.
glom works on objects, including functions, dicts, and all other primitives. In fact, it would be safe to call glom an “object templating” system.
A lot of insights for glom came (and continue to come) from writing ashes.
Similarity to SQL and GraphQL¶
In some ways, glom is a Python query language for Python objects. But thanks to its restructuring capabilities, it’s much more than SQL or GraphQL.
With SQL the primary abstraction is a table, or table-like resultset. With GraphQL, the analogous answer to this is, of course, the graph.
glom goes further, not only offering the Python object tree as a graph, but also allowing you to change the shape of the data, restructuring it while fetching and transforming values, which GraphQL only minimally supports, and SQL barely supports at all. Table targets get you table outputs.
Similarity to validation (jsonschema, schema, cerberus)¶
glom is a generalized form of intake libraries, and will have explicit validation support soon. We definitely took schema becoming successful as a sign that others shared our appetite for succinct, declarative Python data structure manipulation.
More importantly, these libraries seem to excel at structuring and parsing data, and don’t solve much on the other end. Translating valid, structured objects like database models to JSON serializable objects is glom’s forté.
Similarity to jq¶
The CLI that glom packs is very similar in function to jq, except it uses Python as its query language, instead of making its own. Most importantly glom gives you a programmatic way forward.
Similarity to XPath/XSLT¶
These hallowed technologies of yore, they were way ahead of the game in many ways. glom intentionally avoids their purity and verbosity, while trying to take as much inspiration as possible from their function.
Others¶
Beyond what’s listed above, several other packages and language features exist in glom’s ballpark. If you know of other useful comparisons, let us know!
Examples & Snippets¶
glom can do a lot of things, in the right hands. This doc makes those hands yours, through sample code of useful building blocks and common glom tasks.
Note
All samples below assume from glom import glom, T, Call and any other dependencies.
Reversing a Target¶
Here are a couple ways to reverse the current target. The first uses basic Python builtins, the second uses the T object.
glom([1, 2, 3], (reversed, list))
glom([1, 2, 3], T[::-1])
Iteration Result as Tuple¶
The default glom iteration specifier returns a list, but it’s easy to turn that list into a tuple. The following returns a tuple of absolute-valued integers:
glom([-1, 2, -3], ([abs], tuple))
Data-Driven Assignment¶
glom’s dict specifier interprets the keys as constants. A different technique is required if the dict keys are part of the target data rather than the spec.
glom({1:2, 2:3}, Call(dict, args=(T.items(),)))
glom({1:2, 2:3}, lambda t: dict(t.items()))
glom({1:2, 2:3}, dict)
Construct Instance¶
A common use case is to construct an instance. In the most basic case, glom’s default behavior for callables will suffice.
The following converts a list of ints to a list of decimal.Decimal objects.
glom([1, 2, 3], [Decimal])
If additional arguments are required, Call or lambda are good options.
This converts a list to a collections.deque, while specifying a max size of 10.
glom([1, 2, 3], Call(deque, args=[T, 10]))
glom([1, 2, 3], lambda t: deque(t, 10))
Filtered Iteration¶
Sometimes in addition to stepping through an iterable, you’d like to omit some of the items from the result set altogether. Here are two ways to keep only the odd numbers from a list.
glom([1, 2, 3, 4, 5, 6], lambda t: [i for i in t if i % 2])
glom([1, 2, 3, 4, 5, 6], [lambda i: i if i % 2 else SKIP])
The second approach demonstrates the use of glom.SKIP to back out of an execution.
This can also be combined with Coalesce to filter items which are missing sub-attributes.
Here is an example of extracting the primary email from a group of contacts, skipping where the email is empty string, None, or the attribute is missing.
glom(contacts, [Coalesce('primary_email.email', skip=('', None), default=SKIP)])
Preserve Type¶
The iteration specifier will walk lists and tuples. In some cases it is convenient to preserve the target type in the result.
This glomspec iterates over a tuple or list, adding one to each element, and uses T to return a tuple or list depending on the target input’s type.
glom((1, 2, 3), (
    {
        "type": type,
        "result": [lambda v: v + 1]  # arbitrary operation
    }, T['type'](T['result'])))
This demonstrates an advanced technique – just as a tuple can be used to process sub-specs “in series”, a dict can be used to store intermediate results while processing sub-specs “in parallel” so they can then be recombined later on.
Automatic Django ORM type handling¶
In day-to-day Django ORM usage, Managers and QuerySets are everywhere. They work great with glom, too, but they work even better when you don’t have to call .all() all the time. Enable automatic iteration using the following register() technique:
import glom
import django.db.models
glom.register(django.db.models.Manager, iterate=lambda m: m.all())
glom.register(django.db.models.QuerySet, iterate=lambda qs: qs.all())
Call this in settings or somewhere similarly early in your application setup for the best results.
Filter Iterable¶
An iteration specifier can filter items out by using SKIP as the default of a Check object.
glom(['cat', 1, 'dog', 2], [Check(types=str, default=SKIP)])
# ['cat', 'dog']
You can also truncate the list at the first failing check by using STOP, as in the sketch below.
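A minimal sketch (a hypothetical target, assuming the same imports as above):
glom([3, 4, 'five', 6], [Check(types=int, default=STOP)])
# [3, 4]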
Lisp-style If Extension¶
Any class with a glomit method will be treated as a spec by glom. As an example, here is a lisp-style If expression custom spec type:
class If(object):
    def __init__(self, cond, if_, else_=None):
        self.cond, self.if_, self.else_ = cond, if_, else_

    def glomit(self, target, scope):
        g = lambda spec: scope[glom](target, spec, scope)
        if g(self.cond):
            return g(self.if_)
        elif self.else_:
            return g(self.else_)
        else:
            return None
glom(1, If(bool, {'yes': T}, {'no': T}))
# {'yes': 1}
glom(0, If(bool, {'yes': T}, {'no': T}))
# {'no': 0}
Parallel Evaluation of Sub-Specs¶
This is another example of a simple glom extension. Sometimes it is convenient to execute multiple glom-specs in parallel against a target, and get a sequence of their results.
class Seq(object):
    def __init__(self, *subspecs):
        self.subspecs = subspecs

    def glomit(self, target, scope):
        return [scope[glom](target, spec, scope) for spec in self.subspecs]
glom('1', Seq(float, int))
# [1.0, 1]
Without this extension, the simplest way to achieve the same result is with a dict:
glom('1', ({1: float, 2: int}, T.values()))
Clamp Values¶
A common numerical operation is to clamp values – if they are above or below a certain value, assign them to that value.
Using a pattern-matching glom idiom, this can be implemented simply:
glom(range(10), [(M < 7) | Literal(7)])
# [0, 1, 2, 3, 4, 5, 6, 7, 7, 7]
What if you want to drop rather than clamp out-of-range values?
glom(range(10), [(M < 7) | Literal(SKIP)])
# [0, 1, 2, 3, 4, 5, 6]
Transform Tree¶
With an arbitrary depth tree, Ref can be used to express a recursive spec.
For example, this etree2dicts spec will recursively walk an ElementTree instance and transform it from nested objects to nested dicts.
etree2dicts = Ref('ElementTree',
                  {"tag": "tag", "text": "text", "attrib": "attrib",
                   "children": (iter, [Ref('ElementTree')])})
Alternatively, say we only wanted to generate tuples of tag and children:
etree2tuples = Fill(Ref('ElementTree', (T.tag, Iter(Ref('ElementTree')).all())))
(Note also the use of Fill mode to easily construct a tuple.)
This HTML document:
<html>
<head>
<title>the title</title>
</head>
<body id="the-body">
<p>A paragraph</p>
</body>
</html>
Will translate to the following tuples:
>>> etree = ElementTree.fromstring(html_text)
>>> glom(etree, etree2tuples)
('html', [('head', [('title', [])]), ('body', [('p', [])])])
Fix Up Strings in Parsed JSON¶
Tree-walking with Ref() combines powerfully with pattern matching from Match().
In this case, consider that we want to transform parsed JSON recursively, such that all unicodes are converted to native strings.
glom(json.loads(data),
     Ref('json',
         Match(Or(
             And(dict, {Ref('json'): Ref('json')}),
             And(list, [Ref('json')]),
             And(type(u''), Auto(str)),
             object))))
Match() above splits the Ref() evaluation into 4 cases:
- on dict, use Ref() to recurse for all keys and values
- on list, use Ref() to recurse on each item
- on text objects (type(u'')) – py3 str or py2 unicode – transform the target with str
- for all other values (object), pass them through
As motivation for why this might come up: attributes, class names, function names, and identifiers must be the native string type for a given Python, i.e., bytestrings in Python 2 and unicode in Python 3.
glom Command-Line Interface¶
Note
glom’s CLI is still under construction. Definitely usable and useful, but glom is a library first, and if you’re reading this, the CLI should not be considered stable.
All the power of glom, without even opening your text editor!
$ glom --help
Usage: /home/mahmoud/virtualenvs/glom/bin/glom [FLAGS] [spec [target]]
Command-line interface to the glom library, providing nested data
access and data restructuring with the power of Python.
Flags:
--help / -h show this help message and exit
--target-file TARGET_FILE path to target data source (optional)
--target-format TARGET_FORMAT format of the source data (json or python)
(defaults to 'json')
--spec-file SPEC_FILE path to glom spec definition (optional)
--spec-format SPEC_FORMAT format of the glom spec definition (json, python,
python-full) (defaults to 'python')
--indent INDENT number of spaces to indent the result, 0 to disable
pretty-printing (defaults to 2)
--debug interactively debug any errors that come up
--inspect interactively explore the data
The glom command will also read from standard input (stdin) and process that data as the target.
Here’s an example, filtering a GitHub API example to something much more flat and readable:
$ pip install glom
$ curl -s https://api.github.com/repos/mahmoud/glom/events \
| glom '[{"type": "type", "date": "created_at", "user": "actor.login"}]'
This yields:
[
{
"date": "2018-05-09T03:39:44Z",
"type": "WatchEvent",
"user": "asapzacy"
},
{
"date": "2018-05-08T22:51:46Z",
"type": "WatchEvent",
"user": "CameronCairns"
},
{
"date": "2018-05-08T03:27:27Z",
"type": "PushEvent",
"user": "mahmoud"
},
{
"date": "2018-05-08T03:27:27Z",
"type": "PullRequestEvent",
"user": "mahmoud"
}
...
]
By default the CLI target is JSON and the spec is a Python literal.
Note
Because the default CLI spec is a Python literal, there are no lambdas and other Python/glom constructs available. These features are gated behind the --spec-format python-full option to avoid code injection and other unwanted consequences.
The --debug and --inspect flags are useful for exploring data. Note that they are not available when piping data through stdin. Save that API response to a file and use --target-file to do your interactive experimenting.
Core glom API¶
glom gets results.
The glom package has one central entrypoint, glom.glom(). Everything else in the package revolves around that one function. Sometimes, big things come in small packages.
A couple of conventional terms you’ll see repeated many times below:
- target - glom is built to work on any data, so we simply refer to the object being accessed as the “target”
- spec - (aka “glomspec”, short for specification) The accompanying template used to specify the structure of the return value.
Now that you know the terms, let’s take a look around glom’s powerful semantics.
See also
As the glom API grows, we’ve refactored the docs into separate domains. The core API is below. More specialized types can also be found in the following docs:
Longtime glom docs readers: thanks in advance for reporting/fixing any broken links you may find.
The glom Function¶
Where it all happens. The reason for the season. The eponymous function, glom().
glom.glom(target, spec, **kwargs)[source]¶
Access or construct a value from a given target based on the specification declared by spec.
Accessing nested data, aka deep-get:
>>> target = {'a': {'b': 'c'}}
>>> glom(target, 'a.b')
'c'
Here the spec was just a string denoting a path, 'a.b'. As simple as it should be. The next example shows how to use nested data to access many fields at once, and make a new nested structure.
Constructing, or restructuring more-complicated nested data:
>>> target = {'a': {'b': 'c', 'd': 'e'}, 'f': 'g', 'h': [0, 1, 2]}
>>> spec = {'a': 'a.b', 'd': 'a.d', 'h': ('h', [lambda x: x * 2])}
>>> output = glom(target, spec)
>>> pprint(output)
{'a': 'c', 'd': 'e', 'h': [0, 2, 4]}
glom also takes a keyword-argument, default. When set, if a glom operation fails with a GlomError, the default will be returned, very much like dict.get():
>>> glom(target, 'a.xx', default='nada')
'nada'
The skip_exc keyword argument controls which errors should be ignored.
>>> glom({}, lambda x: 100.0 / len(x), default=0.0, skip_exc=ZeroDivisionError)
0.0
Parameters:
- target (object) – the object on which the glom will operate.
- spec (object) – Specification of the output object in the form of a dict, list, tuple, string, other glom construct, or any composition of these.
- default (object) – An optional default to return in the case an exception, specified by skip_exc, is raised.
- skip_exc (Exception) – An optional exception or tuple of exceptions to ignore and return default (None if omitted). If skip_exc and default are both not set, glom raises errors through.
- scope (dict) – Additional data that can be accessed via S inside the glom-spec.
It’s a small API with big functionality, and glom’s power is only surpassed by its intuitiveness. Give it a whirl!
Basic Specifiers¶
Basic glom specifications consist of dict, list, tuple, str, and callable objects. However, as data calls for more complicated interactions, glom provides specialized specifier types that can be used with the basic set of Python builtins.
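As a quick sketch of how these basic types compose (a hypothetical target, following the conventions above):
>>> target = {'planets': [{'name': 'earth'}, {'name': 'mars'}]}
>>> spec = {'names': ('planets', ['name']), 'count': ('planets', len)}
>>> pprint(glom(target, spec))
{'count': 2, 'names': ['earth', 'mars']}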
class glom.Path(*path_parts)[source]¶
Path objects specify explicit paths when the default 'a.b.c'-style general access syntax won’t work or isn’t desirable. Use this to wrap ints, datetimes, and other valid keys, as well as strings with dots that shouldn’t be expanded.
>>> target = {'a': {'b': 'c', 'd.e': 'f', 2: 3}}
>>> glom(target, Path('a', 2))
3
>>> glom(target, Path('a', 'd.e'))
'f'
Paths can be used to join together other Path objects, as well as T objects:
>>> Path(T['a'], T['b'])
T['a']['b']
>>> Path(Path('a', 'b'), Path('c', 'd'))
Path('a', 'b', 'c', 'd')
Paths also support indexing and slicing, with each access returning a new Path object:
>>> path = Path('a', 'b', 1, 2)
>>> path[0]
Path('a')
>>> path[-2:]
Path(1, 2)
class glom.Literal(value)[source]¶
Literal objects specify literal values in rare cases when part of the spec should not be interpreted as a glommable subspec. Wherever a Literal object is encountered in a spec, it is replaced with its wrapped value in the output.
>>> target = {'a': {'b': 'c'}}
>>> spec = {'a': 'a.b', 'readability': Literal('counts')}
>>> pprint(glom(target, spec))
{'a': 'c', 'readability': 'counts'}
Instead of accessing 'counts' as a key like it did with 'a.b', glom() just unwrapped the literal and included the value. Literal takes one argument, the literal value that should appear in the glom output.
This could also be achieved with a callable, e.g., lambda x: 'literal_string' in the spec, but using a Literal object adds explicitness, code clarity, and a clean repr().
class glom.Spec(spec, scope=None)[source]¶
Spec objects serve three purposes; here they are, roughly ordered by utility:
- As a form of compiled or “curried” glom call, similar to Python’s built-in re.compile().
- As a marker that an object represents a spec rather than a literal value, in certain cases where that might be ambiguous.
- As a way to update the scope within another Spec.
In the second usage, Spec objects are the complement to Literal, wrapping a value and marking that it should be interpreted as a glom spec, rather than a literal value. This is useful in places where it would be interpreted as a value by default. (Such as T[key], Call(func) where key and func are assumed to be literal values and not specs.)
Parameters:
- spec – The glom spec.
- scope (dict) – additional values to add to the scope when evaluating this Spec
See also
Note that many of the Specifier types previously mentioned here have moved into their own docs, among them:
Object-Oriented Access and Method Calls with T¶
glom’s shortest-named feature may be its most powerful.
glom.T = T¶
T, short for “target”. A singleton object that enables object-oriented expression of a glom specification.
Note
T is a singleton, and does not need to be constructed.
Basically, think of T as your data’s stunt double. Everything that you do to T will be recorded and executed during the glom() call. Take this example:
>>> spec = T['a']['b']['c']
>>> target = {'a': {'b': {'c': 'd'}}}
>>> glom(target, spec)
'd'
So far, we’ve relied on the 'a.b.c'-style shorthand for access, or used the Path objects, but if you want to explicitly do attribute and key lookups, look no further than T.
But T doesn’t stop with unambiguous access. You can also call methods and perform almost any action you would with a normal object:
>>> spec = ('a', (T['b'].items(), list))  # reviewed below
>>> glom(target, spec)
[('c', 'd')]
A T object can go anywhere in the spec. As seen in the example above, we access 'a', use a T to get 'b' and iterate over its items, turning them into a list.
You can even use T with Call to construct objects:
>>> class ExampleClass(object):
...    def __init__(self, attr):
...        self.attr = attr
...
>>> target = {'attr': 3.14}
>>> glom(target, Call(ExampleClass, kwargs=T)).attr
3.14
On a further note, while lambda works great in glom specs, and can be very handy at times, T and Call eliminate the need for the vast majority of lambda usage with glom.
Unlike lambda and other functions, T roundtrips beautifully and transparently:
>>> T['a'].b['c']('success')
T['a'].b['c']('success')
T-related access errors raise a PathAccessError during the glom() call.
Note
While T is clearly useful, powerful, and here to stay, its semantics are still being refined. Currently, operations beyond method calls and attribute/item access are considered experimental and should not be relied upon.
Defaults with Coalesce¶
Data isn’t always where or what you want it to be. Use these specifiers to declare away overly branchy procedural code.
class glom.Coalesce(*subspecs, **kwargs)[source]¶
Coalesce objects specify fallback behavior for a list of subspecs.
Subspecs are passed as positional arguments, and keyword arguments control defaults. Each subspec is evaluated in turn, and if none match, a CoalesceError is raised, or a default is returned, depending on the options used.
Note
This operation may seem very familiar if you have experience with SQL or even C# and others.
In practice, this fallback behavior’s simplicity is only surpassed by its utility:
>>> target = {'c': 'd'}
>>> glom(target, Coalesce('a', 'b', 'c'))
'd'
glom tries to get 'a' from target, but gets a KeyError. Rather than raise a PathAccessError as usual, glom coalesces into the next subspec, 'b'. The process repeats until it gets to 'c', which returns our value, 'd'. If our value weren’t present, we’d see:
>>> target = {}
>>> glom(target, Coalesce('a', 'b'))
Traceback (most recent call last):
...
CoalesceError: no valid values found. Tried ('a', 'b') and got (PathAccessError, PathAccessError) ...
Same process, but because target is empty, we get a CoalesceError. If we want to avoid an exception, and we know which value we want by default, we can set default:
>>> target = {}
>>> glom(target, Coalesce('a', 'b', 'c'), default='d-fault')
'd-fault'
'a', 'b', and 'c' weren’t present so we got 'd-fault'.
Parameters:
- subspecs – One or more glommable subspecs
- default – A value to return if no subspec results in a valid value
- default_factory – A callable whose result will be returned as a default
- skip – A value, tuple of values, or predicate function representing values to ignore
- skip_exc – An exception or tuple of exception types to catch and move on to the next subspec. Defaults to GlomError, the parent type of all glom runtime exceptions.
If all subspecs produce skipped values or exceptions, a CoalesceError will be raised. For more examples, check out the glom Tutorial, which makes extensive use of Coalesce.
glom.SKIP = Sentinel('SKIP')¶
The SKIP singleton can be returned from a function or included via a Literal to cancel assignment into the output object.
>>> target = {'a': 'b'}
>>> spec = {'a': lambda t: t['a'] if t['a'] == 'a' else SKIP}
>>> glom(target, spec)
{}
>>> target = {'a': 'a'}
>>> glom(target, spec)
{'a': 'a'}
Mostly used to drop keys from dicts (as above) or filter objects from lists.
Note
SKIP was known as OMIT in versions 18.3.1 and prior. Versions 19+ will remove the OMIT alias entirely.
glom.STOP = Sentinel('STOP')¶
The STOP singleton can be used to halt iteration of a list or execution of a tuple of subspecs.
>>> target = range(10)
>>> spec = [lambda x: x if x < 5 else STOP]
>>> glom(target, spec)
[0, 1, 2, 3, 4]
Calling Callables with Invoke¶
New in version 19.10.0.
From calling functions to constructing objects, it’s hardly Python if you’re not invoking callables. By default, single-argument functions work great on their own in glom specs. The function gets passed the target and it just works:
>>> glom(['1', '3', '5'], [int])
[1, 3, 5]
Zero-argument and multi-argument functions get a lot trickier, especially when more than one of those arguments comes from the target, thus the Invoke spec.
class glom.Invoke(func)[source]¶
Specifier type designed for easy invocation of callables from glom.
Parameters: func (callable) – A function or other callable object.
Invoke is similar to functools.partial(), but with the ability to set up a “templated” call which interleaves constants and glom specs.
For example, the following creates a spec which can be used to check if targets are integers:
>>> is_int = Invoke(isinstance).specs(T).constants(int)
>>> glom(5, is_int)
True
And this composes like any other glom spec:
>>> target = [7, object(), 9]
>>> glom(target, [is_int])
[True, False, True]
Another example, mixing positional and keyword arguments:
>>> spec = Invoke(sorted).specs(T).constants(key=int, reverse=True)
>>> target = ['10', '5', '20', '1']
>>> glom(target, spec)
['20', '10', '5', '1']
Invoke also helps with evaluating zero-argument functions:
>>> glom(target={}, spec=Invoke(int))
0
(A trivial example, but from timestamps to UUIDs, zero-arg calls do come up!)
Note
Invoke is mostly for functions, object construction, and callable objects. For calling methods, consider the T object.
constants(*a, **kw)[source]¶
Returns a new Invoke spec, with the provided positional and keyword argument values stored for passing to the underlying function.
>>> spec = Invoke(T).constants(5)
>>> glom(range, (spec, list))
[0, 1, 2, 3, 4]
Subsequent positional arguments are appended:
>>> spec = Invoke(T).constants(2).constants(10, 2)
>>> glom(range, (spec, list))
[2, 4, 6, 8]
Keyword arguments also work as one might expect:
>>> round_2 = Invoke(round).constants(ndigits=2).specs(T)
>>> glom(3.14159, round_2)
3.14
constants() and other Invoke methods may be called multiple times, just remember that every call returns a new spec.
classmethod specfunc(spec)[source]¶
Creates an Invoke instance where the function is indicated by a spec.
>>> spec = Invoke.specfunc('func').constants(5)
>>> glom({'func': range}, (spec, list))
[0, 1, 2, 3, 4]
specs(*a, **kw)[source]¶
Returns a new Invoke spec, with the provided positional and keyword arguments stored to be interpreted as specs, with the results passed to the underlying function.
>>> spec = Invoke(range).specs('value')
>>> glom({'value': 5}, (spec, list))
[0, 1, 2, 3, 4]
Subsequent positional arguments are appended:
>>> spec = Invoke(range).specs('start').specs('end', 'step')
>>> target = {'start': 2, 'end': 10, 'step': 2}
>>> glom(target, (spec, list))
[2, 4, 6, 8]
Keyword arguments also work as one might expect:
>>> multiply = lambda x, y: x * y
>>> times_3 = Invoke(multiply).constants(y=3).specs(x='value')
>>> glom({'value': 5}, times_3)
15
specs() and other Invoke methods may be called multiple times, just remember that every call returns a new spec.
star(args=None, kwargs=None)[source]¶
Returns a new Invoke spec, with args and/or kwargs specs set to be “starred” or “star-starred” (respectively)
>>> import os.path
>>> spec = Invoke(os.path.join).star(args='path')
>>> target = {'path': ['path', 'to', 'dir']}
>>> glom(target, spec)
'path/to/dir'
Parameters:
- args (spec) – A spec to be evaluated and “starred” into the underlying function.
- kwargs (spec) – A spec to be evaluated and “star-starred” into the underlying function.
One or both of the above arguments should be set.
The star(), like other Invoke methods, may be called multiple times. The args and kwargs will be stacked in the order in which they are provided.
Alternative approach to functions: Call¶
An earlier, more primitive approach to callables in glom was the Call specifier type.
Warning
Given the superiority of its successor, Invoke, the Call type may be deprecated in a future release.
class glom.Call(func=None, args=None, kwargs=None)[source]¶
Call specifies when a target should be passed to a function, func.
Call is similar to partial() in that it is no more powerful than lambda or other functions, but it is designed to be more readable, with a better repr.
Parameters: func (callable) – a function or other callable to be called with the target
Call combines well with T to construct objects. For instance, to generate a dict and then pass it to a constructor:
>>> class ExampleClass(object):
...    def __init__(self, attr):
...        self.attr = attr
...
>>> target = {'attr': 3.14}
>>> glom(target, Call(ExampleClass, kwargs=T)).attr
3.14
This does the same as glom(target, lambda target: ExampleClass(**target)), but it’s easy to see which one reads better.
Note
Call is mostly for functions. Use a T object if you need to call a method.
Self-Referential Specs¶
Sometimes nested data repeats itself, whether through recursive structure or just plain redundancy.
class glom.Ref(name, subspec=Sentinel('_MISSING'))[source]¶
Name a part of a spec and refer to it elsewhere in the same spec, useful for trees and other self-similar data structures.
Parameters:
- name (str) – The name of the spec to reference.
- subspec – Pass a spec to name it name, or leave unset to refer to an already-named spec.
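A minimal sketch of Ref on a hypothetical recursive tree structure, renaming keys at every level:
>>> tree = {'value': 1, 'children': [{'value': 2, 'children': []},
...                                  {'value': 3, 'children': []}]}
>>> glom(tree, Ref('node', {'v': 'value', 'kids': ('children', [Ref('node')])}))
{'v': 1, 'kids': [{'v': 2, 'kids': []}, {'v': 3, 'kids': []}]}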
Core Exceptions¶
Not all data is going to match specifications. Luckily, glom errors are designed to be as readable and actionable as possible.
All glom exceptions inherit from GlomError, described below, along with other core exception types. For more details about handling and debugging exceptions, see “Exceptions & Debugging”.
class glom.PathAccessError(exc, path, part_idx)[source]¶
This GlomError subtype represents a failure to access an attribute as dictated by the spec. The most commonly-seen error when using glom, it maintains a copy of the original exception and produces a readable error message for easy debugging.
If you see this error, you may want to:
- Check the target data is accurate using Inspect
- Catch the exception and return a semantically meaningful error message
- Use glom.Coalesce to specify a default
- Use the top-level default kwarg on glom()
In any case, be glad you got this error and not the one it was wrapping!
Parameters:
- exc (Exception) – The error that arose when we tried to access path. Typically an instance of KeyError, AttributeError, IndexError, or TypeError, and sometimes others.
- path (Path) – The full Path glom was in the middle of accessing when the error occurred.
- part_idx (int) – The index of the part of the path that caused the error.
>>> target = {'a': {'b': None}}
>>> glom(target, 'a.b.c')
Traceback (most recent call last):
...
PathAccessError: could not access 'c', part 2 of Path('a', 'b', 'c'), got error: ...
class glom.CoalesceError(coal_obj, skipped, path)[source]¶
This GlomError subtype is raised from within a Coalesce spec’s processing, when none of the subspecs match and no default is provided.
The exception object itself keeps track of several values which may be useful for processing:
Parameters:
>>> target = {}
>>> glom(target, Coalesce('a', 'b'))
Traceback (most recent call last):
...
CoalesceError: no valid values found. Tried ('a', 'b') and got (PathAccessError, PathAccessError) ...
class glom.UnregisteredTarget(op, target_type, type_map, path)[source]¶
This GlomError subtype is raised when a spec calls for an unsupported action on a target type. For instance, trying to iterate on a non-iterable target:
>>> glom(object(), ['a.b.c'])
Traceback (most recent call last):
...
UnregisteredTarget: target type 'object' not registered for 'iterate', expected one of registered types: (...)
It should be noted that this is a pretty uncommon occurrence in production glom usage. See the Setup and Registration section for details on how to avoid this error.
An UnregisteredTarget takes and tracks a few values:
Parameters:
Setup and Registration¶
When it comes to targets, glom() will operate on the vast majority of objects out there in Python-land. However, for that very special remainder, glom is readily extensible!
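As a quick sketch of what that extensibility looks like, here is a hypothetical wrapper type taught to act as a glom target via register(), documented below:
import glom

class Box(object):
    # a hypothetical wrapper whose data lives in a .data dict
    def __init__(self, data):
        self.data = data

# teach glom to look up path segments inside Box.data
glom.register(Box, get=lambda box, name: box.data[name])

glom.glom(Box({'a': {'b': 'c'}}), 'a.b')  # 'c'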
glom.register(target_type, **kwargs)[source]¶
Register target_type so glom() will know how to handle instances of that type as targets.
Parameters:
- target_type (type) – A type expected to appear in a glom() call target
- get (callable) – A function which takes a target object and a name, acting as a default accessor. Defaults to getattr().
- iterate (callable) – A function which takes a target object and returns an iterator. Defaults to iter() if target_type appears to be iterable.
- exact (bool) – Whether or not to match instances of subtypes of target_type.
Note
The module-level register() function affects the module-level glom() function’s behavior. If this global effect is undesirable for your application, or you’re implementing a library, consider instantiating a Glommer instance, and using the register() and Glommer.glom() methods instead.
class glom.Glommer(**kwargs)[source]¶
The Glommer type mostly serves to encapsulate type registration context so that advanced uses of glom don’t need to worry about stepping on each other.
Glommer objects are lightweight and, once instantiated, provide a glom() method:
>>> glommer = Glommer()
>>> glommer.glom({}, 'a.b.c', default='d')
'd'
>>> Glommer().glom({'vals': list(range(3))}, ('vals', len))
3
Instances also provide a register() method for localized control over type handling.
Parameters: register_default_types (bool) – Whether or not to enable the handling behaviors of the default glom(). These default actions include dict access, list and iterable iteration, and generic object attribute access. Defaults to True.
Assignment & Mutation¶
By default, glom aims to safely return a transformed copy of your data. But sometimes you really need to transform an existing object.
When you already have a large or complex bit of nested data that you are sure you want to modify in-place, glom has you covered, with the assign() function and the Assign() specifier type.
Assignment¶
Deeply assign within an existing structure, given a path and a value.
glom.assign(obj, path, val, missing=None)[source]¶
New in glom 18.3.0
The assign() function provides convenient “deep set” functionality, modifying nested data structures in-place:
>>> target = {'a': [{'b': 'c'}, {'d': None}]}
>>> _ = assign(target, 'a.1.d', 'e')  # let's give 'd' a value of 'e'
>>> pprint(target)
{'a': [{'b': 'c'}, {'d': 'e'}]}
Missing structures can also be automatically created with the missing parameter. For more information and examples, see the Assign specifier type, which this function wraps.
class glom.Assign(path, val, missing=None)[source]¶
New in glom 18.3.0
The Assign specifier type enables glom to modify the target, performing a “deep-set” to mirror glom’s original deep-get use case.
Assign can be used to perform spot modifications of large data structures when making a copy is not desired:
>>> # deep assignment into a nested dictionary
>>> target = {'a': {}}
>>> spec = Assign('a.b', 'value')
>>> _ = glom(target, spec)
>>> pprint(target)
{'a': {'b': 'value'}}
The value to be assigned can also be a Spec, which is useful for copying values around within the data structure:
>>> # copying one nested value to another
>>> _ = glom(target, Assign('a.c', Spec('a.b')))
>>> pprint(target)
{'a': {'b': 'value', 'c': 'value'}}
Another handy use of Assign is to deep-apply a function:
>>> # sort a deep nested list
>>> target = {'a': {'b': [3, 1, 2]}}
>>> _ = glom(target, Assign('a.b', Spec(('a.b', sorted))))
>>> pprint(target)
{'a': {'b': [1, 2, 3]}}
Like many other specifier types, Assign’s destination path can be a T expression, for maximum control:
>>> # changing the error message of an exception in an error list
>>> err = ValueError('initial message')
>>> target = {'errors': [err]}
>>> _ = glom(target, Assign(T['errors'][0].args, ('new message',)))
>>> str(err)
'new message'
Assign has built-in support for assigning to attributes of objects, keys of mappings (like dicts), and indexes of sequences (like lists). Additional types can be registered through register() using the "assign" operation name.
Attempting to assign to an immutable structure, like a tuple, will result in a PathAssignError. Attempting to assign to a path that doesn’t exist will raise a PathAccessError.
To automatically backfill missing structures, you can pass a callable to the missing argument. This callable will be called for each path segment along the assignment which is not present.
>>> target = {}
>>> assign(target, 'a.b.c', 'hi', missing=dict)
{'a': {'b': {'c': 'hi'}}}
Deletion¶
Delete attributes from objects and keys from containers.
glom.delete(obj, path, ignore_missing=False)[source]¶
The delete() function provides “deep del” functionality, modifying nested data structures in-place:
>>> target = {'a': [{'b': 'c'}, {'d': None}]}
>>> delete(target, 'a.0.b')
{'a': [{}, {'d': None}]}
Attempting to delete missing keys, attributes, and indexes will raise a PathDeleteError. To ignore these errors, use the ignore_missing argument:
>>> delete(target, 'does_not_exist', ignore_missing=True)
{'a': [{}, {'d': None}]}
For more information and examples, see the Delete specifier type, which this convenience function wraps.
New in version 20.5.0.
class glom.Delete(path, ignore_missing=False)[source]¶
In addition to glom’s core “deep-get” and Assign’s “deep-set”, the Delete specifier type performs a “deep-del”, which can remove items from larger data structures by key, attribute, and index.
>>> target = {'dict': {'x': [5, 6, 7]}}
>>> glom(target, Delete('dict.x.1'))
{'dict': {'x': [5, 7]}}
>>> glom(target, Delete('dict.x'))
{'dict': {}}
If a target path is missing, a PathDeleteError will be raised. To ignore missing targets, use the ignore_missing flag:
>>> glom(target, Delete('does_not_exist', ignore_missing=True))
{'dict': {}}
Delete has built-in support for deleting attributes of objects, keys of dicts, and indexes of sequences (like lists). Additional types can be registered through register() using the "delete" operation name.
New in version 20.5.0.
Exceptions¶
-
class
glom.
PathAssignError
(exc, path, dest_name)[source]¶ This
GlomError
subtype is raised when an assignment fails, stemming from anassign()
call or otherAssign
usage.One example would be assigning to an out-of-range position in a list:
>>> assign(["short", "list"], Path(5), 'too far') Traceback (most recent call last): ... PathAssignError: could not assign 5 on object at Path(), got error: IndexError(...
Other assignment failures could be due to assigning to an
@property
or exception being raised inside a__setattr__()
.
-
class
glom.
PathDeleteError
(exc, path, dest_name)[source]¶ This
GlomError
subtype is raised when a deletion fails, stemming from a
call or otherDelete
usage.One example would be deleting an out-of-range position in a list:
>>> delete(["short", "list"], Path(5)) Traceback (most recent call last): ... PathDeleteError: could not delete 5 on object at Path(), got error: IndexError(...
Other deletion failures could be due to deleting a read-only
@property
or exception being raised inside a__delattr__()
.
Streaming & Iteration¶
New in version 19.10.0.
glom’s helpers for streaming use cases.
Specifier types which yield their results incrementally so that they can be applied to targets which are themselves streaming (e.g. chunks of rows from a database, lines from a file) without excessive memory usage.
glom’s streaming functionality revolves around a single Iter
Specifier type, which has methods to transform the target stream.
-
class
glom.
Iter
(subspec=T, **kwargs)[source]¶ Iter()
is glom’s counterpart to Python’s built-initer()
function. Given an iterable target,Iter()
yields the result of applying the passed spec to each element of the target, similar to the built-in[]
spec, but streaming.The following turns a list of strings into integers using Iter(), before deduplicating and converting it to a tuple:
>>> glom(['1', '2', '1', '3'], (Iter(int), set, tuple)) (1, 2, 3)
Iter()
also has many useful methods which can be chained to compose a stream processing pipeline. The above can also be written as:>>> glom(['1', '2', '1', '3'], (Iter().map(int).unique(), tuple)) (1, 2, 3)
Iter()
also respects glom’sSKIP
andSTOP
singletons for filtering and breaking iteration.Parameters: - subspec – A subspec to be applied on each element from the iterable.
- sentinel – Keyword-only argument, which, when found in the
iterable stream, causes the iteration to stop. Same as with the
built-in
iter()
.
-
map
(subspec)[source]¶ Return a new
Iter()
spec which will apply the provided subspec to each element of the iterable.>>> glom(range(5), Iter().map(lambda x: x * 2).all()) [0, 2, 4, 6, 8]
Because a spec can be a callable,
Iter.map()
does everything the built-inmap()
does, but with the full power of glom specs.>>> glom(['a', 'B', 'C'], Iter().map(T.islower()).all()) [True, False, False]
-
filter
(key=T)[source]¶ Return a new
Iter()
spec which will include only elements matching the given key.>>> glom(range(6), Iter().filter(lambda x: x % 2).all()) [1, 3, 5]
Because a spec can be a callable,
Iter.filter()
does everything the built-infilter()
does, but with the full power of glom specs. For even more power, combine Iter.filter()
withCheck()
.>>> # PROTIP: Python's ints know how many binary digits they require, using the bit_length method >>> glom(range(9), Iter().filter(Check(T.bit_length(), one_of=(2, 4), default=SKIP)).all()) [2, 3, 8]
-
chunked
(size, fill=Sentinel('_MISSING'))[source]¶ Return a new
Iter()
spec which groups elements in the iterable into lists of length size.If the optional fill argument is provided, iterables not evenly divisible by size will be padded out by the fill constant. Otherwise, the final chunk will be shorter than size.
>>> list(glom(range(10), Iter().chunked(3))) [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]] >>> list(glom(range(10), Iter().chunked(3, fill=None))) [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, None, None]]
-
split
(sep=None, maxsplit=None)[source]¶ Return a new
Iter()
spec which will lazily split an iterable based on a separator (or list of separators), sep. Likestr.split()
, but for all iterables.split_iter()
yields lists of non-separator values. A separator will never appear in the output.>>> target = [1, 2, None, None, 3, None, 4, None] >>> list(glom(target, Iter().split())) [[1, 2], [3], [4]]
Note that
split_iter
is based onstr.split()
, so if sep isNone
,split()
groups separators. If empty lists are desired between two contiguousNone
values, simply usesep=[None]
:>>> list(glom(target, Iter().split(sep=[None]))) [[1, 2], [], [3], [4], []]
A max number of splits may also be set:
>>> list(glom(target, Iter().split(maxsplit=2))) [[1, 2], [3], [4, None]]
-
flatten
()[source]¶ Returns a new
Iter()
instance which combines iterables into a single iterable.>>> target = [[1, 2], [3, 4], [5]] >>> list(glom(target, Iter().flatten())) [1, 2, 3, 4, 5]
-
unique
(key=T)[source]¶ Return a new
Iter()
spec which lazily filters out duplicate values, i.e., only the first appearance of a value in a stream will be yielded.>>> target = list('gloMolIcious') >>> out = list(glom(target, Iter().unique(T.lower()))) >>> print(''.join(out)) gloMIcus
-
limit
(count)[source]¶ A convenient alias for
slice()
, which takes a single argument, count, the max number of items to yield.
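For illustration, here is a brief sketch (not from the original docs) of limit() in action, assuming it behaves exactly like slice() with a single count argument:
>>> from glom import glom, Iter
>>> glom(range(10), Iter().limit(3).all())
[0, 1, 2]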
-
slice
(*args)[source]¶ Returns a new
Iter()
spec which trims iterables in the same manner asitertools.islice()
.>>> target = [0, 1, 2, 3, 4, 5] >>> glom(target, Iter().slice(3).all()) [0, 1, 2] >>> glom(target, Iter().slice(2, 4).all()) [2, 3]
This method accepts only positional arguments.
-
takewhile
(key=T)[source]¶ Returns a new
Iter()
spec which stops the stream once key becomes falsy.>>> glom([3, 2, 0, 1], Iter().takewhile().all()) [3, 2]
See itertools.takewhile()
for more details.
-
dropwhile
(key=T)[source]¶ Returns a new
Iter()
spec which drops stream items until key becomes falsy.>>> glom([0, 0, 3, 2, 0], Iter().dropwhile(lambda t: t < 1).all()) [3, 2, 0]
Note that while similar to
Iter.filter()
, the filter only applies to the beginning of the stream. In a way,Iter.dropwhile()
can be thought of aslstrip()
for streams. Seeitertools.dropwhile()
for more details.
-
all
()[source]¶ A convenience method which returns a new spec which turns an iterable into a list.
>>> glom(range(5), Iter(lambda t: t * 2).all()) [0, 2, 4, 6, 8]
Note that this spec will always consume the whole iterable, and as such, the spec returned is not an
Iter()
instance.
-
first
(key=T, default=None)[source]¶ A convenience method for lazily yielding a single truthy item from an iterable.
>>> target = [False, 1, 2, 3] >>> glom(target, Iter().first()) 1
This method takes a condition, key, which can also be a glomspec, as well as a default, in case nothing matches the condition.
As this spec yields at most one item, and not an iterable, the spec returned from this method is not an
Iter()
instance.
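As an illustrative sketch (the lambda condition and default value here are arbitrary, and it is assumed first() returns the matching element itself): a condition that matches nothing falls back to the default, while otherwise the first matching element is returned.
>>> from glom import glom, Iter
>>> glom([2, 4, 5, 6], Iter().first(lambda x: x % 2))
5
>>> glom([2, 4, 6], Iter().first(lambda x: x % 2, default=-1))
-1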
Reduction & Grouping¶
This document contains glom techniques for transforming a collection of data to a smaller set, otherwise known as “grouping” or “reduction”.
Combining iterables with Flatten and Merge¶
New in version 19.1.0.
Got lists of lists? Sets of tuples? A sequence of dicts (but only want
one)? Do you find yourself reaching for Python’s builtin sum()
and reduce()
? To handle these situations and more, glom has five
specifier types and two convenience functions:
-
glom.
flatten
(target, **kwargs)[source]¶ At its most basic,
flatten()
turns an iterable of iterables into a single list. But it has a few arguments which give it more power:Parameters: - init (callable) – A function or type which gives the initial
value of the return. The value must support addition. Common
values might be
list
(the default),tuple
, or evenint
. You can also passinit="lazy"
to get a generator. - levels (int) – A positive integer representing the number of nested levels to flatten. Defaults to 1.
- spec – The glomspec to fetch before flattening. This defaults to the root level of the object.
Usage is straightforward.
>>> target = [[1, 2], [3], [4]] >>> flatten(target) [1, 2, 3, 4]
Because integers themselves support addition, we actually have two levels of flattening possible, to get back a single integer sum:
>>> flatten(target, init=int, levels=2) 10
However, flattening a non-iterable like an integer will raise an exception:
>>> target = 10 >>> flatten(target) Traceback (most recent call last): ... FoldError: can only Flatten on iterable targets, not int type (...)
By default,
flatten()
will add a mix of iterables together, making it a more-robust alternative to the built-insum(list_of_lists, list())
trick most experienced Python programmers are familiar with using:>>> list_of_iterables = [range(2), [2, 3], (4, 5)] >>> sum(list_of_iterables, []) Traceback (most recent call last): ... TypeError: can only concatenate list (not "tuple") to list
Whereas flatten() handles this just fine:
>>> flatten(list_of_iterables) [0, 1, 2, 3, 4, 5]
The
flatten()
function is a convenient wrapper around theFlatten
specifier type. For embedding in larger specs, and more involved flattening, seeFlatten
and its base,Fold
.
-
class
glom.
Flatten
(subspec=T, init=<type 'list'>)[source]¶ The Flatten specifier type is used to combine iterables. By default it flattens an iterable of iterables into a single list containing items from all iterables.
>>> target = [[1], [2, 3]] >>> glom(target, Flatten()) [1, 2, 3]
You can also set init to
"lazy"
, which returns a generator instead of a list. Use this to avoid making extra lists and other collections during intermediate processing steps.
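A small sketch of the lazy form, assuming the returned object behaves like any other Python iterator:
>>> from glom import glom, Flatten
>>> lazy = glom([[1], [2, 3]], Flatten(init="lazy"))
>>> next(lazy)
1
>>> list(lazy)
[2, 3]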
-
glom.
merge
(target, **kwargs)[source]¶ By default,
merge()
turns an iterable of mappings into a single, mergeddict
, leveraging the behavior of theupdate()
method. A new mapping is created and none of the passed mappings are modified.>>> target = [{'a': 'alpha'}, {'b': 'B'}, {'a': 'A'}] >>> res = merge(target) >>> pprint(res) {'a': 'A', 'b': 'B'}
Parameters: target – The list of dicts, or some other iterable of mappings. The start state can be customized with the init keyword argument, as well as the update operation, with the op keyword argument. For more on those customizations, see the
Merge
spec.
-
class
glom.
Merge
(subspec=T, init=<type 'dict'>, op=None)[source]¶ By default, Merge turns an iterable of mappings into a single, merged
dict
, leveraging the behavior of theupdate()
method. The start state can be customized with init, as well as the update operation, with op.Parameters: - subspec – The location of the iterable of mappings. Defaults to
T
. - init (callable) – A type or callable which returns a base instance into which all other values will be merged.
- op (callable) – A callable, which takes two arguments, and
performs a merge of the second into the first. Can also be
the string name of a method to fetch on the instance created
from init. Defaults to
"update"
.
Note
Besides the differing defaults, the primary difference between
Merge
and otherFold
subtypes is that its op argument is assumed to be a two-argument function which has no return value and modifies the left parameter in-place. Because the initial state is a new object created with the init parameter, none of the target values are modified.
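As a sketch of those customizations, the defaults can also be spelled out explicitly; dict.update fits the two-argument, modify-in-place contract described in the note above:
>>> from glom import merge
>>> from pprint import pprint
>>> target = [{'a': 'alpha'}, {'b': 'B'}, {'a': 'A'}]
>>> pprint(merge(target, init=dict, op=dict.update))
{'a': 'A', 'b': 'B'}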
-
class
glom.
Sum
(subspec=T, init=<type 'int'>)[source]¶ The Sum specifier type is used to aggregate integers and other numericals using addition, much like the
sum()
builtin.>>> glom(range(5), Sum()) 10
Note that this specifier takes a callable init parameter like its friends, so to change the start value, be sure to wrap it in a callable:
>>> glom(range(5), Sum(init=lambda: 5.0)) 15.0
To “sum” lists and other iterables, see the
Flatten
spec. For other objects, see theFold
specifier type.
-
class
glom.
Fold
(subspec, init, op=<built-in function iadd>)[source]¶ The Fold specifier type is glom’s building block for reducing iterables in data, implementing the classic fold from functional programming, similar to Python’s built-in
reduce()
.Parameters: - subspec – A spec representing the target to fold, which must be
an iterable, or otherwise registered to ‘iterate’ (with
register()
). - init (callable) – A function or type which will be invoked to initialize the accumulator value.
- op (callable) – A function to call on the accumulator value and
every value, the result of which will become the new
accumulator value. Defaults to
operator.iadd()
.
Usage is as follows:
>>> target = [set([1, 2]), set([3]), set([2, 4])] >>> result = glom(target, Fold(T, init=frozenset, op=frozenset.union)) >>> result == frozenset([1, 2, 3, 4]) True
Note the required
spec
andinit
arguments.op
is optional, but here must be used because theset
andfrozenset
types do not work with addition.While
Fold
is powerful,Flatten
andSum
are subtypes with more convenient defaults for day-to-day use.
Matching & Validation¶
New in version 20.7.0.
Sometimes you want to confirm that your target data matches your
code’s assumptions. With glom, you don’t need a separate validation
step, you can do these checks inline with your glom spec, using
Match
and friends.
Contents
Validation with Match¶
For matching whole data structures, use a Match
spec.
-
class
glom.
Match
(spec, default=Sentinel('_MISSING'))[source]¶ glom’s
Match
specifier type enables a new mode of glom usage: pattern matching. In particular, this mode has been designed for nested data validation.Pattern specs are evaluated as follows:
1. Spec instances are always evaluated first
2. Types match instances of that type
3. Instances of dict, list, tuple, set, and frozenset are matched recursively
4. Any other values are compared to the target for equality with ==
By itself, this allows us to assert that structures match certain patterns, and may be especially familiar to users of the schema library.
For example, let’s load some data:
>>> target = [ ... {'id': 1, 'email': 'alice@example.com'}, ... {'id': 2, 'email': 'bob@example.com'}]
A
Match
pattern can be used to ensure this data is in its expected form:>>> spec = Match([{'id': int, 'email': str}])
This
spec
succinctly describes our data structure’s pattern. Specifically, a list
ofdict
objects, each of which has exactly two keys,'id'
and'email'
, whose values are anint
andstr
, respectively. Now,glom()
will ensure ourtarget
matches our patternspec
:>>> result = glom(target, spec) >>> assert result == \ ... [{'id': 1, 'email': 'alice@example.com'}, {'id': 2, 'email': 'bob@example.com'}]
With a more complex
Match
spec, we can be more precise:>>> spec = Match([{'id': And(M > 0, int), 'email': Regex('[^@]+@[^@]+')}])
And
allows multiple conditions to be applied.Regex
evaluates the regular expression against the target value under the'email'
key. In this case, we take a simple approach: an email has exactly one@
, with at least one character before and after.Finally,
M
is our stand-in for the current target we’re matching against, allowing us to perform in-line comparisons using Python’s native greater-than operator (as well as others). We apply ourMatch
pattern as before:>>> assert glom(target, spec) == \ ... [{'id': 1, 'email': 'alice@example.com'}, {'id': 2, 'email': 'bob@example.com'}]
And as usual, upon a successful match, we get the matched result.
Note
For Python 3.6+ where dictionaries are ordered, keys in the target are matched against keys in the spec in their insertion order.
Parameters: - spec – The glomspec representing the pattern to match data against.
- default – The default value to be returned if a match fails. If not
set, a match failure will raise a
MatchError
.
-
matches
(target)[source]¶ A convenience method on a
Match
instance, returnsTrue
if the target matches,False
if not.>>> Match(int).matches(-1.0) False
Parameters: target – Target value or data structure to match against.
-
verify
(target)[source]¶ A convenience method on a
Match
instance which returns the matched value when target matches, or raises aMatchError
when it does not.Parameters: target – Target value or data structure to match against. Raises: glom.MatchError
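A minimal sketch of the difference from matches(): verify() hands back the value itself on success, and raises MatchError on mismatch, per the description above.
>>> from glom import Match
>>> Match(int).verify(5)
5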
Optional and required dict
key matching¶
Note that our four Match
rules above imply that
object
is a match-anything pattern. Because
isinstance(val, object)
is true for all values in Python,
object
is a useful stopping case. For instance, if we wanted to
extend the user dict example above to allow additional keys
and values, we could add object
as a generic pass-through:
>>> target = [{'id': 1, 'email': 'alice@example.com', 'extra': 'val'}]
>>> spec = Match([{'id': int, 'email': str, object: object}])
>>> assert glom(target, spec) == \
...     [{'id': 1, 'email': 'alice@example.com', 'extra': 'val'}]
The fact that {object: object}
will match any dictionary exposes
the subtlety in Match
dictionary evaluation.
By default, value match keys are required, and other keys are
optional. For example, 'id'
and 'email'
above are required
because they are matched via ==
. If either were not present, it
would raise MatchError. object, however, is matched
with isinstance(). Since it is not a value-match comparison,
it is not required.
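To make the distinction concrete, here is an illustrative sketch (not from the original docs): 'id' is a required value-match key, while the object: object entry lets extra keys through. A target without an 'id' key would raise a MatchError.
>>> from glom import glom, Match
>>> spec = Match({'id': int, object: object})
>>> glom({'id': 1, 'color': 'blue'}, spec)
{'id': 1, 'color': 'blue'}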
This default behavior can be modified with Required
and Optional
.
-
class
glom.
Optional
(key, default=Sentinel('_MISSING'))[source]¶ Used as a
dict
key in aMatch()
spec, marks that a value match key which would otherwise be required is optional and should not raiseMatchError
even if no keys match.For example:
>>> spec = Match({Optional("name"): str}) >>> glom({"name": "alice"}, spec) {'name': 'alice'} >>> glom({}, spec) {} >>> spec = Match({Optional("name", default=""): str}) >>> glom({}, spec) {'name': ''}
-
class
glom.
Required
(key)[source]¶ Used as a
dict
key inMatch()
mode, marks that a key which might otherwise not be required should raiseMatchError
if the key in the target does not match.For example:
>>> spec = Match({object: object})
This spec will match any dict, because
object
is the base type of every object:>>> glom({}, spec) {}
{}
will also match because match mode does not require at least one match by default. If we want to require that a key matches, we can useRequired
:>>> spec = Match({Required(object): object}) >>> glom({}, spec) Traceback (most recent call last): ... MatchError: error raised while processing. Target-spec trace, with error detail (most recent last): - Target: {} - Spec: Match({Required(object): <type 'object'>}) - Spec: {Required(object): <type 'object'>} MatchError: target missing expected keys Required(object)
Now our spec requires at least one key of any type. You can refine the spec by putting more specific subpatterns inside of
Required
.
M
Expressions¶
The most concise way to express validation and guards.
-
glom.
M
= M¶ M
is similar toT
, a stand-in for the current target, but whereT
allows for attribute and key access and method calls,M
allows for comparison operators.If a comparison succeeds, the target is returned unchanged. If a comparison fails,
MatchError
is thrown.Some examples:
>>> glom(1, M > 0) 1 >>> glom(0, M == 0) 0 >>> glom('a', M != 'b') == 'a' True
M
by itself evaluates the current target for truthiness. For example, M | Literal(None) is a simple idiom for normalizing all falsey values to None:>>> from glom import Literal >>> glom([0, False, "", None], [M | Literal(None)]) [None, None, None, None]
For convenience,
&
and|
operators are overloaded to constructAnd
andOr
instances.>>> glom(1.0, (M > 0) & float) 1.0
Note
Python’s operator overloading may make for concise code, but it has its limits.
Because bitwise operators (
&
and|
) have higher precedence than comparison operators (>
,<
, etc.), expressions must be parenthesized.>>> M > 0 & float Traceback (most recent call last): ... TypeError: unsupported operand type(s) for &: 'int' and 'type'
Similarly, because Python implements chained comparisons (
1 < M < 5
) via short-circuit evaluation, they cannot be captured by M
.
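One workaround sketch: spell the bounds check out as two comparisons joined with &, which builds an And, as described above:
>>> from glom import glom, M
>>> glom(3, (M > 1) & (M < 5))
3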
Boolean operators and matching¶
While M
is an easy way to construct expressions, sometimes a more
object-oriented approach is a better fit.
-
class
glom.
Or
(*children, **kw)[source]¶ Tries to apply the first child spec to the target and return the result. If a GlomError is raised, the next child spec is tried, and so on. If every child spec has been tried and none succeeds, a MatchError is raised.
-
class
glom.
And
(*children, **kw)[source]¶ Applies child specs one after the other to the target; if none of the specs raises GlomError, returns the last result.
-
class
glom.
Not
(child)[source]¶ Inverts the child. Child spec will be expected to raise
GlomError
(or subtype), in which case the target will be returned.If the child spec does not raise
GlomError
,MatchError
will be raised.
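A few illustrative sketches of these combinators, reusing M and Literal from above (the specific values are arbitrary):
>>> from glom import glom, And, Or, Not, M, Literal
>>> glom(5, And(M > 0, M < 10))      # both checks pass, the last result is returned
5
>>> glom(-3, Or(M > 0, Literal(0)))  # the first check fails, fall back to the literal
0
>>> glom(0, Not(M))                  # M alone fails on falsy targets, so Not succeeds
0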
Exceptions¶
-
class
glom.
MatchError
(fmt, *args)[source]¶ Raised when a
Match
orM
check fails.>>> glom({123: 'a'}, Match({'id': int})) Traceback (most recent call last): ... MatchError: key 123 didn't match any of ['id']
-
class
glom.
TypeMatchError
(actual, expected)[source]¶ MatchError
subtype raised when aMatch
fails a type check.>>> glom({'id': 'a'}, Match({'id': int})) Traceback (most recent call last): ... TypeMatchError: error raised while processing. Target-spec trace, with error detail (most recent last): - Target: {'id': 'a'} - Spec: Match({'id': <type 'int'>}) - Spec: {'id': <type 'int'>} - Target: 'a' - Spec: int TypeMatchError: expected type int, not str
Validation with Check¶
Warning
Given the suite of tools introduced with Match
, the
Check
specifier type may be deprecated in a future
release.
-
class
glom.
Check
(spec=T, **kwargs)[source]¶ Check objects are used to make assertions about the target data, and either pass through the data or raise exceptions if there is a problem.
If any check condition fails, a
CheckError
is raised.Parameters: - spec – a sub-spec to extract the data to which other assertions will be checked (defaults to applying checks to the target itself)
- type – a type or sequence of types to be checked for exact match
- equal_to – a value to be checked for equality match (“==”)
- validate – a callable or list of callables, each representing a check condition. If one or more return False or raise an exception, the Check will fail.
- instance_of – a type or sequence of types to be checked with isinstance()
- one_of – an iterable of values, any of which can match the target (“in”)
- default – an optional default value to replace the value when the check fails (if default is not specified, CheckError will be raised)
Aside from spec, all arguments are keyword arguments. Each argument, except for default, represents a check condition. Multiple checks can be passed, and if all check conditions are left unset, Check defaults to performing a basic truthy check on the value.
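An illustrative sketch combining two of the keyword conditions above; when every condition passes, Check hands the value through unchanged:
>>> from glom import glom, Check
>>> glom({'a': {'b': 5}}, ('a.b', Check(type=int, validate=lambda x: x > 0)))
5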
-
class
glom.
CheckError
(msgs, check, path)[source]¶ This
GlomError
subtype is raised when target data fails to pass aCheck
’s specified validation.An uncaught
CheckError
looks like this:>>> target = {'a': {'b': 'c'}} >>> glom(target, {'b': ('a.b', Check(type=int))}) Traceback (most recent call last): ... CheckError: target at path ['a.b'] failed check, got error: "expected type to be 'int', found type 'str'"
If the
Check
contains more than one condition, there may be more than one error message. The string rendition of theCheckError
will include all messages.You can also catch the
CheckError
and programmatically access messages through themsgs
attribute on theCheckError
instance.
Exceptions & Debugging¶
While glom works well when all goes as intended, it really shines when data doesn’t match expectations. glom’s error messages and exception hierarchy have been designed to maximize readability and debuggability. Read on for a listing of glom’s exceptions and how to debug them.
Contents
Exceptions¶
glom introduces several new exception types designed to maximize
readability and debuggability. Note that all these errors derive from
GlomError
, and are only raised from glom()
calls, not
from spec construction or glom type registration. Those declarative
and setup operations raise ValueError
, TypeError
, and
other standard Python exceptions as appropriate.
Reading a glom Exception¶
glom errors are regular Python exceptions, but may look a little different from other Python errors. Because glom is a data manipulation library, glom errors include a data traceback, interleaving spec and target data.
For example, let’s raise an error by glomming up some data that doesn’t exist:
 1  >>> target = {'planets': [{'name': 'earth', 'moons': 1}]}
 2  >>> glom(target, ('planets', ['rings']))
 3  Traceback (most recent call last):
 4    File "<stdin>", line 1, in <module>
 5    File "/home/mahmoud/projects/glom/glom/core.py", line 1787, in glom
 6      raise err
 7  glom.core.PathAccessError: error raised while processing, details below.
 8   Target-spec trace (most recent last):
 9   - Target: {'planets': [{'name': 'earth', 'moons': 1}]}
10   - Spec: ('planets', ['rings'])
11   - Spec: 'planets'
12   - Target: [{'name': 'earth', 'moons': 1}]
13   - Spec: ['rings']
14   - Target: {'name': 'earth', 'moons': 1}
15   - Spec: 'rings'
16  glom.core.PathAccessError: could not access 'rings', part 0 of Path('rings'), got error: KeyError('rings')
Let’s step through this output:
- Line 1: We created a planet registry, similar to the one in the glom Tutorial.
- Line 2-3: We try to get a listing of
rings
of all the planets. Instead, we get a Python traceback. - Line 7: We see we have a
PathAccessError
. - Line 8-9: The “target-spec trace”, our data stack, begins. It always starts with the target data as it was passed in.
- Line 10: Next is the top-level spec, as passed in:
('planets', ['rings'])
- Line 11: glom takes the first part of the spec from line 10,
'planets'
, to get the next target. - Line 12: Because the spec on line 11 updated the current target, glom outputs it. When a spec is evaluated but the target value is unchanged, the target is skipped in the trace.
- Line 14-15: We get to the last two lines, which include the culprit target and spec
- Line 16: Finally, our familiar
PathAccessError
message, with more details about the error, including the originalKeyError('rings')
.
This view of glom evaluation answers many of the questions a developer or user would ask upon encountering the error:
- What was the data?
- Which part of the spec failed?
- What was the original error?
The data trace does this by peeling away at the target and spec until it homes in on the failure. Both targets and specs in traces are truncated to terminal width to maximize readability.
Note
If for some reason you need the full Python stack instead of the
glom data traceback, pass glom_debug=True
to the top-level glom
call.
Debugging¶
Good error messages are great when the data has a problem, but what about when a spec is incorrect?
Even the most carefully-constructed specifications eventually need debugging. If the error message isn’t enough to fix your glom issues, that’s where Inspect comes in.
-
class
glom.
Inspect
(*a, **kw)[source]¶ The
Inspect
specifier type provides a way to get visibility into glom’s evaluation of a specification, enabling debugging of those tricky problems that may arise with unexpected data.Inspect
can be inserted into an existing spec in one of two ways. First, as a wrapper around the spec in question, or second, as an argument-less placeholder wherever a spec could be.Inspect
supports several modes, controlled by keyword arguments. Its default, no-argument mode, simply echoes the state of the glom at the point where it appears:>>> target = {'a': {'b': {}}} >>> val = glom(target, Inspect('a.b')) # wrapping a spec --- path: ['a.b'] target: {'a': {'b': {}}} output: {} ---
Debugging behavior aside,
Inspect
has no effect on values in the target, spec, or result.Parameters: - echo (bool) – Whether to print the path, target, and output of each inspected glom. Defaults to True.
- recursive (bool) – Whether or not the Inspect should be applied at every level, at or below the spec that it wraps. Defaults to False.
- breakpoint (bool) – This flag controls whether a debugging prompt should appear before evaluating each inspected spec. Can also take a callable. Defaults to False.
- post_mortem (bool) – This flag controls whether exceptions
should be caught and interactively debugged with
pdb
on inspected specs.
All arguments above are keyword-only to avoid overlap with a wrapped spec.
Note
Just like
pdb.set_trace()
, be careful about leaving strayInspect()
instances in production glom specs.
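The placeholder form mentioned above is not shown in the example, so here is a sketch (output elided, since the exact echo format may vary):
from glom import glom, Inspect

target = {'a': {'b': 1}}
# Inspect() with no arguments acts as a pass-through placeholder: it echoes
# the current path, target, and output at this point in the spec, then
# evaluation continues on to 'b'.
glom(target, ('a', Inspect(), 'b'))
# prints the echo described above, then returns 1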
Writing a custom Specifier Type¶
While glom comes with a lot of built-in features, no library can ever encompass all data manipulation operations.
To cover every case out there, glom provides a way to extend its functionality with your own data handling hooks. This document explains glom’s execution model and how to integrate with it when writing a custom Specifier Type.
When to write a Specifier Type¶
glom
has always supported arbitrary callables, like so:
glom({'nums': range(5)}, ('nums', sum))
# 10
With this built-in extensibility, what does a glom specifier type add?
Custom specifier types are useful when you want to:
- Perform validation at spec construction time
- Enable users to interact with new target types and operations
- Improve readability and reusability of your data transformations
- Temporarily change the glom runtime behavior
If you’re just building a one-off spec for transforming your own data,
there’s no reason to reach for an extension. glom
’s extension API
is easy, but a good old Python lambda
is even easier.
Building your Specifier Type¶
Any object instance with a glomit
method can participate in a glom
call. By way of example, here is a programming cliché implemented as a
glom specifier type, with comments referencing notes below.
class HelloWorldSpec(object): # 1
def glomit(self, target, scope): # 2
print("Hello, world!")
return target
And now let’s put it to use!
from glom import glom
target = {'example': 'object'}
glom(target, HelloWorldSpec()) # 3
# prints "Hello, world!" and returns target
There are a few things to note from this example:
- Specifier types do not need to inherit from any type. Just implement the
glomit
method.- The
glomit
signature takes two parameters,target
andscope
. Thetarget
should be familiar from usingglom()
, and it’s thescope
that makes glom really tick.- By convention, instances are used in specs passed to
glom()
calls, not the types themselves.
The glom Scope¶
The glom scope exposes runtime state to the specifier type. Let’s take a look inside a scope:
from glom import glom
from pprint import pprint
class ScopeInspectorSpec(object):
def glomit(self, target, scope):
pprint(dict(scope))
return target
glom(target, ScopeInspectorSpec())
Which gives us:
{T: {'example': 'object'},
<function glom at 0x7f208984d140>: <function _glom at 0x7f208984d5f0>,
<class 'glom.core.Path'>: [],
<class 'glom.core.Spec'>: <__main__.ScopeInspectorSpec object at 0x7f208bf58690>,
<class 'glom.core.Inspect'>: None,
<class 'glom.core.TargetRegistry'>: <glom.core.TargetRegistry object at 0x7f208984b4d0>}
As you can see, all glom’s core workings are present, all under familiar keys:
- The current target, accessible using
T
as a scope key.- The current spec, accessible under
Spec
.- The current path, accessible under
Path
.- The
TargetRegistry
, used to register new operations and target types.- Even the
glom()
function itself, filed underglom()
.
To learn how to use the scope’s powerful features idiomatically, let’s reimplement one of glom’s standard specifier types.
Specifiers by example¶
While we’ve technically created a couple of extensions above, let’s really dig into the features of the scope using an example.
Sum
is a standard extension that ships with glom, and
it works like this:
from glom import glom, Sum
glom([1, 2, 3], Sum())
# 6
The version below does not have as much error handling, but reproduces
all the same basic principles. This version of Sum()
code also
contains comments with references to explanatory notes below.
from glom import glom, Path, T
from glom.core import TargetRegistry, UnregisteredTarget # 1
class Sum(object):
def __init__(self, subspec=T, init=int): # 2
self.subspec = subspec
self.init = init
def glomit(self, target, scope):
if self.subspec is not T:
target = scope[glom](target, self.subspec, scope) # 3
try:
# 4
iterate = scope[TargetRegistry].get_handler('iterate', target, path=scope[Path])
except UnregisteredTarget as ut:
# 5
raise TypeError('can only %s on iterable targets, not %s type (%s)'
% (self.__class__.__name__, type(target).__name__, ut))
try:
iterator = iterate(target)
except Exception as e:
raise TypeError('failed to iterate on instance of type %r at %r (got %r)'
% (target.__class__.__name__, Path(*scope[Path]), e))
return self._sum(iterator)
def _sum(self, iterator): # 6
ret = self.init()
for v in iterator:
ret += v
return ret
Now, let’s take a look at the interesting parts, referencing the comments above:
- Specifier types often reference the TargetRegistry, which is not part of the top-level
glom
API, and must be imported fromglom.core
. More on this in #4.- Specifier type
__init__
methods may take as many or as few arguments as desired, but many glom specifier types take a first parameter of a subspec, meant to be fetched right before the actual specifier’s operation. This helps readability of glomspecs. SeeCoalesce
for an example of this idiom.- Specifier types should not reference the
glom()
function directly, instead use theglom()
function as a key to thescope
map to get the currently activeglom()
. This ensures that the extension type is compatible with advanced specifier types which override theglom()
function.- To maximize compatiblity with new target types,
glom
allows new types and operations to be registered with theTargetRegistry
. Specifier types should respect this by contextually fetching these standard operators as demonstrated above. At the time of writing, three primary operators are used by glom itself,"get"
,"iterate"
, and"assign"
.- In the event that the current target does not support your Specifier type’s desired operation, it’s customary to raise a helpful error. Consider creating your own exception type and inheriting from
GlomError
.- Specifier types may have other methods and members in addition to the primary
glomit()
method. This_sum()
method implements most of the core of our custom specifier type.
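As a quick sanity check of the sketch above (assuming the class definition has been executed), the simplified Sum behaves like the built-in one and honors its subspec:
glom(range(5), Sum())
# 10
glom({'nums': [1, 2, 3]}, Sum('nums'))
# 6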
Check out the implementation of the real glom.Sum()
specifier for more details.
Summing up¶
glom
Specifier Types are more than just add-ons; the extension
architecture is how most of glom
itself is implemented. Build
knowing that the paradigm is as powerful as anything built-in.
If you need more examples, another simple one can be found in
this snippet. glom
’s source code itself
contains many specifiers more advanced than the above. Simply search
the codebase for glomit()
methods and you will find no shortage.
Happy extending!
glom
Modes¶
Note
Be sure to read “Writing a custom Specifier Type” before diving into the deep details below.
A glom “mode” determines how Python built-in data structures are
evaluated. Think of it like a dialect for how dict
,
tuple
, list
, etc., are interpreted in a spec. Modes
do not change the behavior of T, or many other core
specifiers. Modes are one of the keys to keeping glom specs short and
readable.
A mode is used much like a spec: whatever Python data structure is passed to the mode constructor will be evaluated under that mode. Once set, the mode remains in place until it is overridden by another mode.
glom only has a few modes:
Auto
- The default glom behavior, used for data transformation, with the spec acting as a template.Fill
- A variant of the default transformation behavior; preferring to “fill” containers instead of iterating, chaining, etc.Match
- Treats the spec as a pattern, checking that the target matches.
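To make the Auto/Fill distinction concrete, here is a small sketch, assuming Fill is importable from the top-level glom namespace:
>>> from glom import glom, Fill, T
>>> glom([1, 2, 3], [T])   # Auto (default): a list spec iterates over the target
[1, 2, 3]
>>> glom(5, Fill([T, T]))  # Fill: a list spec is a template filled from the target
[5, 5]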
Adding a new mode is relatively rare, but when it comes up this document includes relevant details.
Writing custom Modes¶
A mode is a spec which sets scope[MODE]
to a function which
accepts target
, spec
, and scope
and returns a result, a
signature very similar to the top-level glom()
method
itself.
For example, here is an abbreviated version of the Fill
mode:
class Fill(object):
def __init__(self, spec):
self.spec = spec
def glomit(self, target, scope):
scope[MODE] = _fill
return scope[glom](target, self.spec, scope)
def _fill(target, spec, scope):
recurse = lambda val: scope[glom](target, val, scope)
if type(spec) is dict:
return {recurse(key): recurse(val)
for key, val in spec.items()}
if type(spec) in (list, tuple, set, frozenset):
result = [recurse(val) for val in spec]
if type(spec) is list:
return result
return type(spec)(result)
if callable(spec):
return spec(target)
return spec
Like any other Specifier Type, Fill
has
a glomit()
method, and this method sets the MODE
key in the
glom scope to our _fill
function. The name
itself doesn’t matter, but the signature must match exactly:
(target, spec, scope)
.
As mentioned above, custom modes are relatively rare for glom. If you write one, let us know!