FuzzyDates, or: One Django Model Field, multiple database columns

A while ago, I implemented support for fuzzy dates in critify, i.e. incomplete dates like “January 2008”, “2008” or “3Q 2008”. I decided to use two separate columns on the database level: one storing the date itself, the other a “precision”. Based on the latter, I can then determine which parts of the date value to consider missing. The reasoning is that two columns provide more flexibility when it comes to sorting; e.g. do we want to show “2008” dates before or after all the non-fuzzy values?
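To make the two-column idea concrete, here is a plain-Python sketch (no Django involved) of what a stored (date, precision) pair means, and how a separate precision column lets us decide where fuzzy values sort relative to full dates. The `DatePrecision` values and the `describe()`/`sort_key()` helpers are hypothetical; the post does not show critify's actual definitions.

```python
import datetime
from enum import IntEnum

# Hypothetical precision levels: the post does not show how critify's
# DatePrecision is actually defined.
class DatePrecision(IntEnum):
    YEAR = 1
    QUARTER = 2
    MONTH = 3
    DAY = 4

def describe(date, precision):
    """Render a stored (date, precision) pair, hiding the missing parts."""
    if precision is None or precision == DatePrecision.DAY:
        return date.isoformat()  # no precision stored: treat as a full date
    if precision == DatePrecision.MONTH:
        return date.strftime("%B %Y")
    if precision == DatePrecision.QUARTER:
        return "%dQ %d" % ((date.month - 1) // 3 + 1, date.year)
    return str(date.year)  # DatePrecision.YEAR

def sort_key(row, fuzzy_first=True):
    """Order by date; on equal dates, the precision column breaks the tie."""
    date, precision = row
    level = DatePrecision.DAY if precision is None else precision
    return (date, level if fuzzy_first else -level)

rows = [
    (datetime.date(2008, 1, 1), DatePrecision.DAY),      # a full date
    (datetime.date(2008, 1, 1), DatePrecision.YEAR),     # "2008"
    (datetime.date(2008, 7, 1), DatePrecision.QUARTER),  # "3Q 2008"
]
print([describe(*r) for r in sorted(rows, key=sort_key)])
# ['2008', '2008-01-01', '3Q 2008']
```

Passing `fuzzy_first=False` flips the tie-break, which is exactly the kind of decision a single date column could not express.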

In Django, I wanted to hide as many of those implementation details as possible: define a FuzzyDateField() on a model, done. However, the field would have to manage two database columns, which is not really something Django model fields are designed to do. All things considered, though, it worked out surprisingly well.

While each model field maps to exactly one database column (and there’s not much we can do about that), nothing is stopping us from adding a second field dynamically:

class FuzzyDateField(models.DateField):
    def contribute_to_class(self, cls, name):
        precision_field = models.IntegerField(
            choices=enum_as_choices(DatePrecision), editable=False,
            null=True, blank=True)
        cls.add_to_class(_precision_field_name(name), precision_field)

        # add the date field normally
        super(FuzzyDateField, self).contribute_to_class(cls, name)

During model creation, Django will call contribute_to_class() on each field, basically instructing it to add itself. A perfect opportunity to sneak a hidden precision field in there. Note that this is now a field like any other, and works just fine with syncdb, the ORM, etc.
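The "each field adds itself" pattern can be illustrated without Django. The following is a toy simulation, not Django's code: `ToyField`, `ToyFuzzyDateField` and `ToyModelBase` are made-up names, and the toy registry simply keeps contribution order (real Django additionally orders fields by `creation_counter`, which this sketch ignores).

```python
class ToyField:
    """Stand-in for a model field; it only records its name on the model."""
    def contribute_to_class(self, cls, name):
        self.name = name
        cls.fields.append(self)

class ToyFuzzyDateField(ToyField):
    def contribute_to_class(self, cls, name):
        # sneak a hidden sibling field in first, then add ourselves
        ToyField().contribute_to_class(cls, name + "_precision")
        super().contribute_to_class(cls, name)

class ToyModelBase(type):
    """Toy metaclass: lets each field declared in the class body add itself."""
    def __new__(mcs, clsname, bases, attrs):
        declared = {k: v for k, v in attrs.items() if isinstance(v, ToyField)}
        for key in declared:
            del attrs[key]
        cls = super().__new__(mcs, clsname, bases, attrs)
        cls.fields = []
        # this toy keeps contribution order; Django also sorts by creation_counter
        for name, field in declared.items():
            field.contribute_to_class(cls, name)
        return cls

class Review(metaclass=ToyModelBase):
    title = ToyField()
    published = ToyFuzzyDateField()

print([f.name for f in Review.fields])
# ['title', 'published_precision', 'published']
```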

Now, according to the docs, there are a couple of methods we are supposed to consider overriding, like get_db_prep_save() or get_db_prep_lookup(). Most importantly, there’s to_python(), which is required if our field uses a custom value type (a FuzzyDate instance, in our case). See, all the default Django model fields just use standard Python types like integers and strings, provided directly by the database backend. So when a model is loaded from the database, Django simply sets an attribute on the model instance for each column (remember that).

When is to_python() called, then? As stated in the documentation, we need to give our field the SubfieldBase metaclass. Doing this ensures, at class definition time, that the model attribute for the field is an object implementing the descriptor protocol. Now, when the model is created from the database and Django is again setting an attribute for each column, the descriptor object can intercept the assignment for our custom field and call the field class’s to_python() method.
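Roughly, such a descriptor behaves like the plain-Python sketch below. It is a simplified approximation for illustration, not Django's actual SubfieldBase code; `ToDateField`, `Creator` and `Event` are made-up names, and the ISO string stands in for whatever raw value a database backend might hand over.

```python
import datetime

class ToDateField:
    """Hypothetical stand-in for a model field with a custom to_python()."""
    def __init__(self, name):
        self.name = name

    def to_python(self, value):
        # accept ISO strings (as a backend might return) or date objects
        if isinstance(value, str):
            return datetime.date.fromisoformat(value)
        return value

class Creator:
    """Simplified sketch of a SubfieldBase-style descriptor."""
    def __init__(self, field):
        self.field = field

    def __get__(self, obj, type=None):
        if obj is None:
            return self
        return obj.__dict__[self.field.name]

    def __set__(self, obj, value):
        # every assignment, including one made while loading a row, is converted
        obj.__dict__[self.field.name] = self.field.to_python(value)

class Event:
    when = Creator(ToDateField("when"))

e = Event()
e.when = "2008-01-03"  # what a backend might hand us
print(repr(e.when))    # datetime.date(2008, 1, 3)
```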

How is this relevant? Well, we want to return a FuzzyDate instance, and for that we need access to both the date value itself and the value from the precision field. The former we receive via the to_python() call, but we never get to touch the latter. Except… the descriptor object, which is responsible for the to_python() call, could, as it has access to the model instance (because that’s how descriptors work). So if we can make sure that the precision column is loaded/assigned to the model first, we can use a custom version of SubfieldBase, with a custom descriptor object, that passes us both column values.
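A toy illustration of why the assignment order matters: a descriptor that combines two columns when it is assigned can only do so if the sibling value is already on the instance. Everything here (`FuzzyDateStub`, `EagerCreator`, `Row`, `load()`) is hypothetical and Django-free; `FuzzyDateStub` is a minimal stand-in, not critify's FuzzyDate.

```python
import datetime

class FuzzyDateStub:
    """Minimal stand-in for a fuzzy date value (hypothetical)."""
    def __init__(self, date, precision=None):
        self.date, self.precision = date, precision

class EagerCreator:
    """Combines two columns at assignment time (illustration only)."""
    def __init__(self, name, precision_name):
        self.name, self.precision_name = name, precision_name

    def __get__(self, obj, type=None):
        return self if obj is None else obj.__dict__[self.name]

    def __set__(self, obj, value):
        # only works if the precision column was assigned before this one
        precision = obj.__dict__[self.precision_name]
        obj.__dict__[self.name] = FuzzyDateStub(value, precision)

class Row:
    published = EagerCreator("published", "published_precision")

def load(columns, values):
    """Toy row loader: sets one attribute per column, in column order."""
    obj = Row()
    for name, value in zip(columns, values):
        setattr(obj, name, value)
    return obj

ok = load(["published_precision", "published"], [1, datetime.date(2008, 1, 1)])
print(ok.published.date, ok.published.precision)  # 2008-01-01 1
```

Swapping the column order makes `__set__` fail with a `KeyError`, because `published_precision` is not on the instance yet when the date arrives.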

Bear with me, we’re almost done. There are two more issues. First, it turns out that SubfieldBase is not really extensible; we’d basically have to redo the entire fields.subclassing module (which is where it’s defined). But as we have a very specific case and don’t need to keep things generic, we can just create a descriptor object that does all the required work itself (no need to delegate anything to the FuzzyDateField class):

_precision_field_name = lambda name: "%s_precision"%name
class FuzzyDateCreator(object):
    def __init__(self, field):
        self.field = field
        self.precision_name = _precision_field_name(self.field.name)

    def __get__(self, obj, type=None):
        if obj is None:
            raise AttributeError('Can only be accessed via an instance.')

        date = obj.__dict__[self.field.name]
        if date is None: return None
        else:
            # ===> Build a fuzzydate based on both the date and the precision value
            return FuzzyDate(date, precision=getattr(obj, self.precision_name))

    def __set__(self, obj, value):
        if isinstance(value, FuzzyDate):
            # fuzzy date is assigned: take over its values
            obj.__dict__[self.field.name] = value.date
            setattr(obj, self.precision_name, value.precision)
        else:
            # plain date (or raw database value): store the date, keep the current precision
            obj.__dict__[self.field.name] = self.field.to_python(value)
            # you could be tempted to reset the precision to "day" whenever
            # a user assigns a plain date - however, don't do this. when django
            # assigns to this while loading a row from the database, we want
            # to keep the precision that was already set!

We also don’t need the fancy metaclass stuff. We simply put the descriptor object in place right after we created the fields, in contribute_to_class():

class FuzzyDateField(models.DateField):
    def contribute_to_class(self, cls, name):
        # .... OLD CODE, see above ...
        super(FuzzyDateField, self).contribute_to_class(cls, name)
        # NEW => set descriptor
        setattr(cls, self.name, FuzzyDateCreator(self))
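The reason a plain setattr() is enough: a data descriptor (one that defines `__set__`) placed on the class takes priority over the instance `__dict__`, no matter when it is attached. A minimal Django-free check, using a made-up `Upper` descriptor purely for illustration:

```python
class Model:
    pass

class Upper:
    """Data descriptor that upper-cases assigned strings (illustration only)."""
    def __init__(self, name):
        self.name = name

    def __get__(self, obj, type=None):
        return self if obj is None else obj.__dict__[self.name]

    def __set__(self, obj, value):
        obj.__dict__[self.name] = value.upper()

early = Model()                          # created before the descriptor exists
setattr(Model, "title", Upper("title"))  # attached after class creation
late = Model()
late.title = "fuzzy"
early.title = "dates"                    # existing instances are affected too
print(late.title, early.title)           # FUZZY DATES
```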

Now, all that would almost work, but try it and you’ll see that it doesn’t: when the descriptor object tries to do its job, the precision attribute has not yet been set. Why would that be, after we explicitly took care to create the precision field first in contribute_to_class(), before adding ourselves? As it turns out, Django reorders all fields internally, using a class variable creation_counter on the base Field class. Whenever a field class is instantiated (i.e. usually when defining a model), the new instance records the current counter value and the counter is incremented. Because all the metaclass magic that Django employs during model creation only begins after the model class body has been fully evaluated, all the field instances will have been created before our contribute_to_class() is even called. The precision field created there will thus always have a higher creation_counter than all the “normally” defined fields.
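The ordering problem can be reproduced with a toy model. As an assumption, this sketch keeps fields sorted by `creation_counter` using `bisect.insort`; Django's actual bookkeeping may differ in detail, and `ToyField` and the labels are made up.

```python
from bisect import insort

class ToyField:
    creation_counter = 0  # shared class-level counter

    def __init__(self, label):
        self.label = label
        # each instance takes the next counter value
        self.creation_counter = ToyField.creation_counter
        ToyField.creation_counter += 1

    def __lt__(self, other):
        return self.creation_counter < other.creation_counter

# 1) the class body runs first: all declared fields are instantiated
title = ToyField("title")
published = ToyField("published")  # pretend this is the FuzzyDateField
rating = ToyField("rating")

# 2) only then are the fields contributed, kept sorted by creation_counter
fields = []
for field in (title, published, rating):
    if field is published:
        # the hidden field is created *now*, so it gets the highest counter
        insort(fields, ToyField("published_precision"))
    insort(fields, field)

print([f.label for f in fields])
# ['title', 'published', 'rating', 'published_precision']
```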

To fix this, we employ our first real hack: we manually set the precision field’s creation_counter to ensure the correct order, which leads us to the following contribute_to_class() method:

    def contribute_to_class(self, cls, name):
        precision_field = models.IntegerField(
            choices=enum_as_choices(DatePrecision), editable=False,
            null=True, blank=True) # if not set, assume full precision
        # ==> HACK: manually fix creation_counter
        precision_field.creation_counter = self.creation_counter
        cls.add_to_class(_precision_field_name(name), precision_field)

        # add the date field as normal
        super(FuzzyDateField, self).contribute_to_class(cls, name)

        setattr(cls, self.name, FuzzyDateCreator(self))
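A self-contained toy check of why copying the counter works, under the assumption that fields are kept sorted by `creation_counter` with ties resolved by insertion order (which is what `bisect.insort` does: it inserts after equal keys). `ToyField` and the labels are made up; this is not Django's code.

```python
from bisect import insort

class ToyField:
    creation_counter = 0  # shared class-level counter

    def __init__(self, label):
        self.label = label
        self.creation_counter = ToyField.creation_counter
        ToyField.creation_counter += 1

    def __lt__(self, other):
        return self.creation_counter < other.creation_counter

title = ToyField("title")
published = ToyField("published")
rating = ToyField("rating")

fields = []
for field in (title, published, rating):
    if field is published:
        precision = ToyField("published_precision")
        # the hack: borrow the date field's counter
        precision.creation_counter = published.creation_counter
        insort(fields, precision)
    # insort() inserts after equal keys, so the date field lands behind its precision field
    insort(fields, field)

print([f.label for f in fields])
# ['title', 'published_precision', 'published', 'rating']
```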

And that’s it. I’m still amazed that it was that simple (if the above doesn’t sound simple, that’s mostly because I’m a bad writer, and possibly also because of all the Django internals that need to be explained; if you look at the final mechanics only, you’ll agree that the whole thing really is quite straightforward). One thing to note, though: lookups that involve fuzzy dates don’t work 100% correctly. So far, I haven’t found a way to fix this (and at first glance, it doesn’t even seem possible), so any lookup, be it exact, range, year…, simply queries the date field while completely ignoring the precision value. As “2008” is stored as 2008/01/01, it will be included in a range query from 2007/12/29 to 2008/01/02, which is probably not desirable.
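To make the range-query problem concrete: the lookup only sees the stored date column, so “2008” passes a naive range check. The sketch below contrasts that with a check against the full span a fuzzy date covers. `fuzzy_span()`, `span_within_range()` and the integer precision codes are hypothetical; this runs in Python rather than as a database query, and it is not something FuzzyDateField does.

```python
import datetime
from calendar import monthrange

YEAR, QUARTER, MONTH, DAY = 1, 2, 3, 4  # hypothetical precision codes

def fuzzy_span(date, precision):
    """First and last day a (date, precision) pair could refer to."""
    if precision in (None, DAY):
        return date, date
    if precision == YEAR:
        return datetime.date(date.year, 1, 1), datetime.date(date.year, 12, 31)
    if precision == QUARTER:
        first = 3 * ((date.month - 1) // 3) + 1
        last = first + 2
        return (datetime.date(date.year, first, 1),
                datetime.date(date.year, last, monthrange(date.year, last)[1]))
    # MONTH
    return (datetime.date(date.year, date.month, 1),
            datetime.date(date.year, date.month, monthrange(date.year, date.month)[1]))

def naive_in_range(date, start, end):
    # what a plain 'range' lookup on the date column effectively checks
    return start <= date <= end

def span_within_range(date, precision, start, end):
    # stricter alternative: the whole fuzzy span must lie inside the range
    lo, hi = fuzzy_span(date, precision)
    return start <= lo and hi <= end

stored = datetime.date(2008, 1, 1)  # "2008"
start, end = datetime.date(2007, 12, 29), datetime.date(2008, 1, 2)
print(naive_in_range(stored, start, end))           # True
print(span_within_range(stored, YEAR, start, end))  # False
```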

I skipped over a lot of non-relevant stuff in this post, like the actual implementation of the FuzzyDate class, the newforms field and widget classes, and a couple of other things. None of that is in any way special, and the post is long enough as it is.

To close out, here’s the complete code of the FuzzyDateField implementation itself:

from django.db import models
from djutils import enum_as_choices
from core import DatePrecision, FuzzyDate
import forms

_precision_field_name = lambda name: "%s_precision"%name

class FuzzyDateCreator(object):
    """
    An equivalent to Django's default attribute descriptor class (enabled via
    the SubfieldBase metaclass, see module doc for details). However, instead
    of calling to_python() on our FuzzyDateField class, it stores the two
    different parts of a fuzzy date, the date and the precision, separately, and
    updates them whenever something is assigned. If the attribute is read, it
    builds the FuzzyDate instance "on-demand" with the current data.
    """
    def __init__(self, field):
        self.field = field
        self.precision_name = _precision_field_name(self.field.name)

    def __get__(self, obj, type=None):
        if obj is None:
            raise AttributeError('Can only be accessed via an instance.')

        date = obj.__dict__[self.field.name]
        if date is None: return None
        else:
            return FuzzyDate(date, precision=getattr(obj, self.precision_name))

    def __set__(self, obj, value):
        if isinstance(value, FuzzyDate):
            # fuzzy date is assigned: take over its values
            obj.__dict__[self.field.name] = value.date
            setattr(obj, self.precision_name, value.precision)
        else:
            # plain date (or raw database value): store the date, keep the current precision
            obj.__dict__[self.field.name] = self.field.to_python(value)
            # you could be tempted to reset the precision to "day" whenever
            # a user assigns a plain date - however, don't do this. when django
            # assigns to this while loading a row from the database, we want
            # to keep the precision that was already set!

class FuzzyDateField(models.DateField):
    """
    A field that stores a fuzzy date. See the module doc for more information.
    """
    def contribute_to_class(self, cls, name):
        # first, create a hidden "precision" field. It is *crucial* that this
        # field appears *before* the actual date field (i.e. self) in the
        # model's _meta.fields; to achieve this, we need to change its
        # creation_counter attribute.
        precision_field = models.IntegerField(
            choices=enum_as_choices(DatePrecision), editable=False,
            null=True, blank=True) # if not set, assume full precision
        # setting the counter to the same value as the date field itself will
        # ensure the precision field appears first: it is added first, after all,
        # and when the date field is added later, it won't be sorted before it.
        precision_field.creation_counter = self.creation_counter
        cls.add_to_class(_precision_field_name(name), precision_field)

        # add the date field as normal
        super(FuzzyDateField, self).contribute_to_class(cls, name)

        # as we are not using SubfieldBase (see intro), we need to do its job
        # ourselves. we don't need to be generic, so don't use a metaclass, but
        # just assign the descriptor object here.
        setattr(cls, self.name, FuzzyDateCreator(self))

    def get_db_prep_save(self, value):
        if isinstance(value, FuzzyDate): value = value.date
        return super(FuzzyDateField, self).get_db_prep_save(value)

    def get_db_prep_lookup(self, lookup_type, value):
        if lookup_type == 'exact':
            return [self.get_db_prep_save(value)]
        elif lookup_type == 'in':
            return [self.get_db_prep_save(v) for v in value]
        else:
            # let the base class deal with the rest; some will work out fine,
            # like 'year', others will probably give unexpected results,
            # like 'range'.
            return super(FuzzyDateField, self).get_db_prep_lookup(lookup_type, value)

    def formfield(self, **kwargs):
        defaults = {'form_class': forms.FuzzyDateField}
        defaults.update(kwargs)
        return super(FuzzyDateField, self).formfield(**defaults)

    # Although we need flatten_data for (oldforms) admin, we don't need to
    # implement it here, as the DateField baseclass will just call strftime on
    # our FuzzyDate object, which is something we support.

9 thoughts on “FuzzyDates, or: One Django Model Field, multiple database columns”

  1. Well done. This is a very logical approach. The SubfieldBase metaclass is just a helper class for the common case; it’s not required to use it and people *should* just write their own __get__ and __set__ methods when they need to, as you’ve done.

    Making this work in the fully general case (where lookups work) is actually pretty fiddly. I’ve been working on a design for a couple of weeks now without having anything brilliant pop out (although it seems to be getting closer). So it’s work in progress. We need it for reverse-lookups with generic relations for example — which map two database columns to one model attribute.

  2. Thanks for your comment, Malcolm.

    I haven’t looked at generic relations at all so far, which is something I now regret. I wasn’t aware that they need to deal with the same issue. I’ll check them out.

  3. Great post. I’m trying to implement a file field with a foreignkey db column. You really inspired me. Thanks!

  4. You have shown how to create extra fields dynamically along with the custom field. How can I assign values to the created fields whenever we populate the custom field?

    For example,
    fullname = MyCustomField(max_length=100).

    In the definition of MyCustomField, I created 2 fields dynamically, firstname and lastname. If fullname is ‘abc def’, then firstname is ‘abc’ and lastname is ‘def’. I am able to create these fields, but unable to populate them with ‘abc’ for firstname and ‘def’ for lastname. By the way, this firstname logic is just an example, and my requirement is something like this.
