glom Extensions

While glom comes with a lot of built-in features, no library can ever encompass all data manipulation operations.

To cover every case out there, glom provides a way to extend its functionality with your own data handling hooks. This document explains glom’s execution model and how to integrate with it using glom’s Extension API.

When to make an extension

From day one, glom has had built-in support for arbitrary callables, like so:

glom({'nums': range(5)}, ('nums', sum))
# 10

With this built-in extensibility, what does a glom extension add?

Glom extensions are useful when you want to:

  • Perform validation at spec construction time
  • Enable users to interact with new target types and operations
  • Improve readability and reusability of your data transformations
  • Temporarily change the glom runtime behavior

If you’re just building a one-off spec for transforming your own data, there’s no reason to reach for an extension. glom’s extension API is easy, but a good old Python lambda is even easier.

Making a Specifier Type

Any object instance with a glomit method can participate in a glom call. By way of example, here is a programming cliche implemented as a glom extension type, with comments referencing notes below.

class HelloWorldSpec(object):  # 1
    def glomit(self, target, scope):  # 2
        print("Hello, world!")
        return target

And now let’s put it to use!

from glom import glom

target = {'example': 'object'}

glom(target, HelloWorldSpec())  # 3
# prints "Hello, world!" and returns target

There are a few things to note from this example:

  1. Specifier types do not need to inherit from any type. Just implement the glomit method.
  2. The glomit signature takes two parameters, target and scope. The target should be familiar from using glom(), and it’s the scope that makes glom really tick.
  3. By convention, instances are used in specs passed to glom() calls, not the types themselves.

The glom Scope

The glom scope exposes glom-internal state to the extension. Let’s take a look inside a scope:

from glom import glom
from pprint import pprint

class ScopeInspectorSpec(object):
    def glomit(self, target, scope):
        pprint(dict(scope))  # dump the scope's contents
        return target

glom(target, ScopeInspectorSpec())

Which gives us:

{T: {'example': 'object'},
<function glom at 0x7f208984d140>: <function _glom at 0x7f208984d5f0>,
<class 'glom.core.Path'>: [],
<class 'glom.core.Spec'>: <__main__.ScopeInspectorSpec object at 0x7f208bf58690>,
<class 'glom.core.Inspect'>: None,
<class 'glom.core.TargetRegistry'>: <glom.core.TargetRegistry object at 0x7f208984b4d0>}

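As you can see, all of glom’s core workings are present, each under a familiar key:

  • The current target, under T
  • The currently active glom() function, under glom
  • The path to the current target, under Path
  • The current spec, under Spec
  • The Inspect instance, if any, under Inspect
  • The TargetRegistry in effect, under TargetRegistry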

To learn how to use the scope’s powerful features idiomatically, let’s reimplement one of glom’s standard specifier types.

Extensions by example

While we’ve technically created a couple of extensions above, let’s really dig into the features of the scope using an example.

Sum is a standard extension that ships with glom, and it works like this:

from glom import glom, Sum

glom([1, 2, 3], Sum())
# 6

The version below does not have as much error handling, but it reproduces all the same basic principles. The code also contains numbered comments that refer to the explanatory notes below.

from glom import glom, Path, T
from glom.core import TargetRegistry, UnregisteredTarget  # 1

class Sum(object):
   def __init__(self, subspec=T, init=int):  # 2
       self.subspec = subspec
       self.init = init

   def glomit(self, target, scope):
       if self.subspec is not T:
           target = scope[glom](target, self.subspec, scope)  # 3

       try:
           # 4
           iterate = scope[TargetRegistry].get_handler('iterate', target, path=scope[Path])
       except UnregisteredTarget as ut:
           # 5
           raise TypeError('can only %s on iterable targets, not %s type (%s)'
                           % (self.__class__.__name__, type(target).__name__, ut))

       try:
           iterator = iterate(target)
       except Exception as e:
           raise TypeError('failed to iterate on instance of type %r at %r (got %r)'
                           % (target.__class__.__name__, Path(*scope[Path]), e))

       return self._sum(iterator)

   def _sum(self, iterator):  # 6
       ret = self.init()

       for v in iterator:
           ret += v

       return ret

Now, let’s take a look at the interesting parts, referencing the comments above:

  1. Extensions often reference the TargetRegistry, which is not part of the top-level glom API, and must be imported from glom.core. More on this in #4.
  2. Specifier type __init__ methods may take as many or as few arguments as desired, but many glom specifier types take a subspec as their first parameter. The subspec is evaluated against the target right before the specifier’s own operation runs, which helps the readability of glom specs. See Coalesce for an example of this idiom.
  3. Extension specifiers should not reference the glom() function directly. Instead, use the glom() function as a key into the scope map to get the currently active glom(). This ensures that the extension type is compatible with advanced specifier types which override the glom() function.
  4. To maximize compatibility with new target types, glom allows new types and operations to be registered with the TargetRegistry. Extensions should respect this by contextually fetching these standard operators as demonstrated above. At the time of writing, three primary operators are used by glom itself: "get", "iterate", and "assign".
  5. In the event that the current target does not support your extension’s desired operation, it’s customary to raise a helpful error. Consider creating your own exception type and inheriting from GlomError.
  6. Extension types may have other methods and members in addition to the primary glomit() method. This _sum() method implements most of the core of our custom extension.

Check out the implementation of the real glom.Sum() specifier for more details.

Summing up

glom extensions are more than just add-ons; the extension architecture is how most of glom itself is implemented. Build knowing that the paradigm is powerful enough to achieve your data transformation requirements.

If you need more examples, a simple one can be found in this snippet, and glom itself contains many specifiers more advanced than the above. Simply search the codebase for glomit() methods and you will find no shortage.

Happy extending!