Serialization of stored/cached data
By default, both cache and data files (created using the APIs described in
Persistent data) are serialized using
cPickle. This provides
a good compromise between speed and the ability to store arbitrary objects.
When changing or specifying a serializer, use the name under which the serializer is registered with the workflow.manager object.
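Serializer names map to objects on the manager. A small sketch, assuming SerializerManager's serializer() lookup method and serializers property (check the library's API reference for these):

from workflow import manager

# Object registered under the name 'json'
json_serializer = manager.serializer('json')

# Names of all registered serializers, e.g. ['cpickle', 'json', 'pickle']
print(manager.serializers)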
When it comes to cache data, it is strongly recommended to stick with the
default. cPickle is very fast and fully supports standard Python
data structures (dict, list, tuple, set etc.).
If you really must customise the cache data format, you can change the
default cache serialization format to pickle thus:

wf = Workflow()
wf.cache_serializer = 'pickle'
Unlike the stored data API, the cached data API can’t determine the format of the cached data. If you change the serializer without clearing the cache, errors will probably result as the serializer tries to load data in a foreign format.
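If you do need to switch serializers in a released workflow, one way to avoid such errors is to clear the cache first. A minimal sketch, assuming the clear_cache(), cache_data() and cached_data() methods from the Persistent data APIs:

from workflow import Workflow

wf = Workflow()

# Delete cache files written in the old (default) format
wf.clear_cache()

# Switch to the slower, more flexible `pickle` serializer
wf.cache_serializer = 'pickle'

# Cache files are now written and read as `pickle`
wf.cache_data('posts', {'ids': [1, 2, 3]})
posts = wf.cached_data('posts', max_age=600)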
In the case of stored data, you are free to specify either a global default serializer or one for each individual datastore:
wf = Workflow()

# Use `pickle` as the global default serializer
wf.data_serializer = 'pickle'

# Use the JSON serializer only for these data
wf.store_data('name', data, serializer='json')
This is primarily so you can create files that are human-readable or useable by other software. The generated JSON is formatted to make it readable.
The stored_data() method can
automatically determine the serialization of the stored data (based on the file
extension, which is the same as the name the serializer is registered under),
provided the corresponding serializer is registered. If it isn’t, a
ValueError will be raised.
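For example, the following sketch stores some settings as JSON and reads them back without naming the serializer a second time (the 'settings' name and data are illustrative):

from workflow import Workflow

wf = Workflow()

# Written as `settings.json`, so it is human-readable and
# easily consumed by other programs
wf.store_data('settings', {'units': 'metric'}, serializer='json')

# The `.json` extension tells stored_data() which serializer to use
settings = wf.stored_data('settings')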
There are 3 built-in, pre-configured serializers:
cpickle — the default serializer for both cached and stored data, with very good support for native Python data types;
pickle — a more flexible, but much slower, alternative to cpickle;
json — a very common data format, but with limited support for native Python data types (see the example below).
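To illustrate that last point, here is a small, standalone sketch using the standard json module (which the built-in json serializer is based on): tuples come back as lists, and sets cannot be serialized at all.

import json

# A tuple survives a pickle round trip, but JSON turns it into a list
restored = json.loads(json.dumps({'point': (1, 2)}))
print(restored['point'])  # [1, 2] -- a list, not a tuple

# Sets (and most custom classes) raise TypeError
try:
    json.dumps({'tags': {'a', 'b'}})
except TypeError as err:
    print('JSON cannot handle sets: %s' % err)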
You can add your own serializer, or replace the built-in ones, using the
configured instance of the SerializerManager class at workflow.manager, e.g.
from workflow import manager.

A serializer object must have load() and dump() methods that work the same
way as in the built-in json and pickle libraries, i.e.:
# Reading
obj = serializer.load(open('filename', 'rb'))
# Writing
serializer.dump(obj, open('filename', 'wb'))
To register a new serializer, call the
register() method of the
workflow.manager object with the name of the serializer and the object
that performs serialization:
from workflow import Workflow, manager


class MySerializer(object):

    @classmethod
    def load(cls, file_obj):
        # load data from file_obj
        pass

    @classmethod
    def dump(cls, obj, file_obj):
        # serialize obj to file_obj
        pass


manager.register('myformat', MySerializer())
The name you specify for your serializer will be the file extension of the stored files.
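As a concrete sketch, the class and the 'prettyjson' name below are illustrative, not part of the library; the serializer writes indented JSON and is then used via store_data():

import json

from workflow import Workflow, manager


class PrettyJSONSerializer(object):
    """Store data as human-readable, indented JSON."""

    @classmethod
    def load(cls, file_obj):
        # read data back from the open file handle
        return json.load(file_obj)

    @classmethod
    def dump(cls, obj, file_obj):
        # write indented JSON for easier manual inspection
        json.dump(obj, file_obj, indent=2)


manager.register('prettyjson', PrettyJSONSerializer())

wf = Workflow()
# Saved as `colours.prettyjson` in the workflow's data directory
wf.store_data('colours', {'background': 'blue'}, serializer='prettyjson')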