Default Dict

You have probably run into this situation: you don't know whether a key is already in a dict, but you want to update its value, for instance by incrementing a counter or appending to a list. For example:

Dict Awkward Situation

There is a list of animal types:

animals = ['cat', 'dog', 'cat', 'parrot', 'lion', 'tiger', 'dog', 'parrot', 'dog']

And we want to use a dict to store the count of each type.

Let's do a naive approach:

Codes

def count_animals(animals):
    stat = {}
    for animal in animals:
        if animal not in stat:
            stat[animal] = 0
        stat[animal] += 1
    return stat

Run

count_animals(animals)

Output

{'cat': 2, 'dog': 3, 'parrot': 2, 'lion': 1, 'tiger': 1}

It solves the problem, but handling missing keys by hand gets tedious as the logic grows more complicated.
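To see the error that the membership check guards against, here is a minimal sketch of the same loop without the check. The function name count_animals_unchecked is made up for this illustration.

```python
def count_animals_unchecked(animals):
    stat = {}
    for animal in animals:
        # No membership check: the first lookup of a new key fails.
        stat[animal] += 1
    return stat


try:
    count_animals_unchecked(['cat', 'dog'])
    error = None
except KeyError as exc:
    error = exc

print(repr(error))  # KeyError('cat')
```

Every place that updates the dict would need its own guard against this error.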

If we could handle it in one place and reuse it everywhere, wouldn't that be great? Sounds familiar?

Poor Man Default Dict

I know implementing a perfect default dict is tempting, but the details would obscure the mechanism. Here we only build a poor man's version of defaultdict to show how it works.

When you look up a key that doesn't exist with d[key], dict.__getitem__ calls the __missing__ method if a subclass defines one. Implementing it does the trick.

Codes

class PoorManDefaultDictV1(dict):
    def __missing__(self, key):
        return 0

Run

stat = PoorManDefaultDictV1()
stat['cat']

Output

0
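Two details of V1 are worth checking. The sketch below repeats the class so it runs on its own. It shows that __missing__ is only consulted by the square-bracket lookup, and that V1 returns the default without storing it.

```python
class PoorManDefaultDictV1(dict):
    def __missing__(self, key):
        return 0


stat = PoorManDefaultDictV1()

print(stat['cat'])      # 0, supplied by __missing__
print('cat' in stat)    # False, nothing was stored
print(stat.get('cat'))  # None, get() does not call __missing__
print(len(stat))        # 0
```

With an integer default this hardly matters, but a mutable default such as a list would be lost right after the lookup.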

This version always returns 0 and never stores it. We need to let the user provide a callable that produces the default value, and we need to store that value in the dict.

Codes

class PoorManDefaultDictV2(dict):

    def __init__(self, default_factory, *args, **kwargs):
        # set the default factory
        self.default_factory = default_factory
        super().__init__(*args, **kwargs)

    def __missing__(self, key):
        value = self.default_factory()
        self.__setitem__(key, value)
        return value

Run

stat = PoorManDefaultDictV2(int)
stat['cat']

Output

0

The same works with list as the factory:

Run

stat = PoorManDefaultDictV2(list)
stat['cat']

Output

[]
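Storing the value in __missing__ is what makes a mutable default useful. Here is a small sketch with the class repeated so it runs standalone; the variable pets and the pet names are made up for this example.

```python
class PoorManDefaultDictV2(dict):
    def __init__(self, default_factory, *args, **kwargs):
        self.default_factory = default_factory
        super().__init__(*args, **kwargs)

    def __missing__(self, key):
        value = self.default_factory()
        self[key] = value
        return value


pets = PoorManDefaultDictV2(list)
pets['cat'].append('Tom')       # the first lookup creates and stores []
pets['cat'].append('Garfield')  # the second lookup reuses the stored list
pets['dog'].append('Rex')       # each new key gets its own list

print(pets)  # {'cat': ['Tom', 'Garfield'], 'dog': ['Rex']}
```

Because the factory is called once per missing key, the lists are never shared between keys.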

Now let's use our poor man version of defaultdict to solve the previous problem.

Codes

def poor_man_count_animals(animals):
    stat = PoorManDefaultDictV2(int)
    for animal in animals:
        stat[animal] += 1
    return stat

Run

poor_man_count_animals(animals)

Output

{'cat': 2, 'dog': 3, 'parrot': 2, 'lion': 1, 'tiger': 1}

By now you have seen how to implement a very basic defaultdict.

Default Dict Source Codes

defaultdict is a great tool and very straightforward to use. Its implementation is quite similar to the one above, but it is written in C.

Here is how CPython defines the __missing__ function for defaultdict:

static PyObject *
defdict_missing(defdictobject *dd, PyObject *key)
{
    PyObject *factory = dd->default_factory;
    PyObject *value;
    if (factory == NULL || factory == Py_None) {
        /* XXX Call dict.__missing__(key) */
        PyObject *tup;
        tup = PyTuple_Pack(1, key);
        if (!tup) return NULL;
        PyErr_SetObject(PyExc_KeyError, tup);
        Py_DECREF(tup);
        return NULL;
    }
    value = _PyObject_CallNoArg(factory);
    if (value == NULL)
        return value;
    if (PyObject_SetItem((PyObject *)dd, key, value) < 0) {
        Py_DECREF(value);
        return NULL;
    }
    return value;
}

Then it registers the function, along with the other methods, on the defaultdict type:

static PyMethodDef defdict_methods[] = {
    {"__missing__", (PyCFunction)defdict_missing, METH_O,
    defdict_missing_doc},
    {"copy", (PyCFunction)defdict_copy, METH_NOARGS,
    defdict_copy_doc},
    {"__copy__", (PyCFunction)defdict_copy, METH_NOARGS,
    defdict_copy_doc},
    {"__reduce__", (PyCFunction)defdict_reduce, METH_NOARGS,
    reduce_doc},
    {"__class_getitem__", (PyCFunction)Py_GenericAlias, METH_O|METH_CLASS,
    PyDoc_STR("See PEP 585")},
    {NULL}
};
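Read as Python, defdict_missing does roughly the following. This is a hand-written paraphrase of the C function above, not code from CPython. The class name DefaultDictSketch and the helper lookup_raises_key_error are invented for this illustration.

```python
from collections import defaultdict


class DefaultDictSketch(dict):
    """Rough Python paraphrase of the C function defdict_missing."""

    def __init__(self, default_factory=None, *args, **kwargs):
        self.default_factory = default_factory
        super().__init__(*args, **kwargs)

    def __missing__(self, key):
        # factory == NULL || factory == Py_None: raise KeyError(key)
        if self.default_factory is None:
            raise KeyError(key)
        # value = factory()
        value = self.default_factory()
        # PyObject_SetItem(dd, key, value)
        self[key] = value
        return value


def lookup_raises_key_error(mapping, key):
    """Hypothetical helper: report whether mapping[key] raises KeyError."""
    try:
        mapping[key]
    except KeyError:
        return True
    return False


# Without a factory, both behave like a plain dict on a missing key.
print(lookup_raises_key_error(DefaultDictSketch(), 'cat'))  # True
print(lookup_raises_key_error(defaultdict(), 'cat'))        # True

# "__missing__" is registered in defdict_methods, so it can be called directly.
print(defaultdict(int).__missing__('cat'))  # 0
```

The None check explains why a defaultdict created without a factory still raises KeyError like a regular dict.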

defaultdict Examples

Codes

from collections import defaultdict


def count_animals(animals):
    stat = defaultdict(int)
    for animal in animals:
        stat[animal] += 1
    return stat

Run

count_animals(animals)

Output

defaultdict(<class 'int'>, {'cat': 2, 'dog': 3, 'parrot': 2, 'lion': 1, 'tiger': 1})

Run

import random
from collections import defaultdict
random_default = defaultdict(lambda: random.randint(0, 10))
random_default['cat']

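Output

4

The default value of a missing key is a random integer between 0 and 10 (both included), so the output varies from run to run. Once generated, the value is stored, so later lookups of the same key return the same number.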
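Since defaultdict stores the generated value, the factory runs only once per key. Here is a small check using the same random factory:

```python
import random
from collections import defaultdict

random_default = defaultdict(lambda: random.randint(0, 10))

first = random_default['cat']   # the factory runs and the result is stored
second = random_default['cat']  # the stored value is returned, no new draw

print(first == second)          # True
print(0 <= first <= 10)         # True, randint includes both endpoints
print(len(random_default))      # 1, the lookup itself inserted the key
```

Note the side effect: merely reading a missing key adds it to the dict.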