Documentation unit tests
28th July 2018
Or: Test-driven documentation.
Keeping documentation synchronized with an evolving codebase is difficult. Without extreme discipline, it’s easy for documentation to get out-of-date as new features are added.
One thing that can help is keeping the documentation for a project in the same repository as the code itself. This allows you to construct the ideal commit: one that includes the code change, the updated unit tests AND the accompanying documentation all in the same unit of work.
When combined with a code review system (like Phabricator or GitHub pull requests) this pattern lets you enforce documentation updates as part of the review process: if a change doesn’t update the relevant documentation, point that out in your review!
Good code review systems also execute unit tests automatically and attach the results to the review. This provides an opportunity to have the tests enforce other aspects of the codebase: for example, running a linter so that no-one has to waste their time arguing over coding style.
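For example, one way to wire a linter into the same test run is a test that shells out to it and fails on any violations. This is just a sketch: flake8 and the target directory names here are assumptions, not a prescription.
import subprocess

def test_code_passes_linter():
    # Fail the suite if the linter reports any problems.
    # flake8 and the target directories are illustrative assumptions.
    result = subprocess.run(
        ["flake8", "datasette", "tests"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    assert result.returncode == 0, result.stdout.decode()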
I’ve been experimenting with using unit tests to ensure that aspects of a project are covered by the documentation. I think it’s a very promising technique.
Introspect the code, introspect the docs
The key to this trick is introspection: interrogating the code to figure out what needs to be documented, then parsing the documentation to see if each item has been covered.
I’ll use my Datasette project as an example. Datasette’s test_docs.py module contains three relevant tests:
- test_config_options_are_documented checks that every one of Datasette’s configuration options is documented.
- test_plugin_hooks_are_documented ensures all of the plugin hooks (powered by pluggy) are covered in the plugin documentation.
- test_view_classes_are_documented iterates through all of the *View classes (corresponding to pages in the Datasette user interface) and makes sure they are covered.
In each case, the test uses introspection against the relevant code areas to figure out what needs to be documented, then runs a regular expression against the documentation to make sure it is mentioned in the correct place.
Obviously the tests can’t confirm the quality of the documentation, so they are easy to cheat: but they do at least protect against adding a new option but forgetting to document it.
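As a rough, hypothetical sketch of the shape such a test takes (CONFIG_OPTIONS, its name attribute and the config.rst filename are assumptions for illustration, not necessarily Datasette’s real internals):
from pathlib import Path
from datasette import app

docs_path = Path(__file__).parent.parent / "docs"

def test_config_options_are_documented():
    # Introspect the code for options, then look for each one in the docs.
    # CONFIG_OPTIONS and config.rst are illustrative assumptions.
    documentation = (docs_path / "config.rst").read_text()
    for option in getattr(app, "CONFIG_OPTIONS", []):
        assert option.name in documentation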
Testing that Datasette’s view classes are covered
Datasette’s view classes use a naming convention: they all end in View. The current list of view classes is DatabaseView, TableView, RowView, IndexView and JsonDataView.
Since these classes are all imported into the datasette.app module (in order to be hooked up to URL routes), the easiest way to introspect them is to import that module, then run dir(app) and grab any class names that end in View. We can do that with a Python list comprehension:
from datasette import app
views = [v for v in dir(app) if v.endswith("View")]
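With the classes listed above, views would evaluate to something like the following (dir() returns names in alphabetical order; this assumes no other names in the module happen to end in View):
['DatabaseView', 'IndexView', 'JsonDataView', 'RowView', 'TableView']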
I’m using reStructuredText labels to mark the place in the documentation that addresses each of these classes. This also ensures that each documentation section can be linked to, for example:
http://datasette.readthedocs.io/en/latest/pages.html#tableview
The reStructuredText syntax for that label looks like this:
.. _TableView:
Table
=====
The table page is the heart of Datasette...
We can extract these labels using a regular expression:
from pathlib import Path
import re
docs_path = Path(__file__).parent.parent / 'docs'
label_re = re.compile(r'\.\. _([^\s:]+):')
def get_labels(filename):
    contents = (docs_path / filename).open().read()
    return set(label_re.findall(contents))
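As a quick usage example, running it against the file containing the snippet above should return a set including that label (the pages.rst filename is an assumption based on the URL shown earlier):
labels = get_labels("pages.rst")  # filename is an assumption
assert "TableView" in labels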
Since Datasette’s documentation is spread across multiple *.rst files, and I want the freedom to document a view class in any one of them, I iterate through every file to find the labels and pull out the ones ending in View:
def documented_views():
    view_labels = set()
    for filename in docs_path.glob("*.rst"):
        for label in get_labels(filename):
            first_word = label.split("_")[0]
            if first_word.endswith("View"):
                view_labels.add(first_word)
    return view_labels
We now have a list of class names and a list of labels across all of our documentation. Writing a basic unit test comparing the two lists is trivial:
def test_view_documentation():
    view_labels = documented_views()
    view_classes = set(v for v in dir(app) if v.endswith("View"))
    assert view_labels == view_classes
Taking advantage of pytest
Datasette uses pytest for its unit tests, and documentation unit tests are a great opportunity to take advantage of some advanced pytest features.
Parametrization
The first of these is parametrization: pytest provides a decorator which can be used to execute a single test function multiple times, each time with different arguments.
This example from the pytest documentation shows how parametrization works:
import pytest
@pytest.mark.parametrize("test_input,expected", [
("3+5", 8),
("2+4", 6),
("6*9", 42),
])
def test_eval(test_input, expected):
assert eval(test_input) == expected
pytest treats this as three separate unit tests, even though they share a single function definition.
We can combine this pattern with our introspection to execute an independent unit test for each of our view classes. Here’s what that looks like:
@pytest.mark.parametrize("view", [v for v in dir(app) if v.endswith("View")])
def test_view_classes_are_documented(view):
    assert view in documented_views()
Here’s the output from pytest if we execute just this unit test (and one of our classes is undocumented):
$ pytest -k test_view_classes_are_documented -v
=== test session starts ===
collected 249 items / 244 deselected
tests/test_docs.py::test_view_classes_are_documented[DatabaseView] PASSED [ 20%]
tests/test_docs.py::test_view_classes_are_documented[IndexView] PASSED [ 40%]
tests/test_docs.py::test_view_classes_are_documented[JsonDataView] PASSED [ 60%]
tests/test_docs.py::test_view_classes_are_documented[RowView] PASSED [ 80%]
tests/test_docs.py::test_view_classes_are_documented[TableView] FAILED [100%]
=== FAILURES ===
view = 'TableView'

    @pytest.mark.parametrize("view", [v for v in dir(app) if v.endswith("View")])
    def test_view_classes_are_documented(view):
>       assert view in documented_views()
E       AssertionError: assert 'TableView' in {'DatabaseView', 'IndexView', 'JsonDataView', 'RowView', 'Table2View'}
E        +  where {'DatabaseView', 'IndexView', 'JsonDataView', 'RowView', 'Table2View'} = documented_views()

tests/test_docs.py:77: AssertionError
=== 1 failed, 4 passed, 244 deselected in 1.13 seconds ===
Fixtures
There’s a subtle inefficiency in the above test: for every view class, it calls the documented_views()
function—and that function then iterates through every *.rst
file in the docs/
directory and uses a regular expression to extract the labels. With 5 view classes and 17 documentation files that’s 85 executions of get_labels()
, and that number will only increase as Datasette’s code and documentation grow larger.
We can use pytest’s neat fixtures to reduce this to a single call to documented_views()
that is shared across all of the tests. Here’s what that looks like:
@pytest.fixture(scope="session")
def documented_views():
view_labels = set()
for filename in docs_path.glob("*.rst"):
for label in get_labels(filename):
first_word = label.split("_")[0]
if first_word.endswith("View"):
view_labels.add(first_word)
return view_labels
@pytest.mark.parametrize("view_class", [
v for v in dir(app) if v.endswith("View")
])
def test_view_classes_are_documented(documented_views, view_class):
assert view_class in documented_views
Fixtures in pytest are an example of dependency injection: pytest introspects every test_*
function and checks if it has a function argument with a name matching something that has been annotated with the @pytest.fixture
decorator. If it finds any matching arguments, it executes the matching fixture function and passes its return value in to the test function.
By default, pytest will execute the fixture function once for every test execution. In the above code we use the scope="session"
argument to tell pytest that this particular fixture should be executed only once for every pytest
command-line execution of the tests, and that single return value should be passed to every matching test.
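Here is a minimal, self-contained sketch of that behaviour, with fixture and test names invented for illustration: the session-scoped fixture body runs once, no matter how many parametrized tests request it.
import pytest

calls = []

@pytest.fixture(scope="session")
def documented_things():
    # With scope="session" this body executes once per pytest run
    calls.append(1)
    return {"DatabaseView", "TableView"}

@pytest.mark.parametrize("name", ["DatabaseView", "TableView"])
def test_name_is_documented(name, documented_things):
    assert name in documented_things
    assert len(calls) == 1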
What if you haven’t documented everything yet?
Adding unit tests to your documentation in this way faces an obvious problem: when you first add the tests, you may have to write a whole lot of documentation before they can all pass.
Having tests that protect against future code being added without documentation is only useful once you’ve added them to the codebase—but blocking that on documenting your existing features could prevent that benefit from ever manifesting itself.
Once again, pytest to the rescue. The @pytest.mark.xfail
decorator allows you to mark a test as “expected to fail”—if it fails, pytest will take note but will not fail the entire test suite.
This means you can add deliberately failing tests to your codebase without breaking the build for everyone—perfect for tests that look for documentation that hasn’t yet been written!
I used xfail
when I first added view documentation tests to Datasette, then removed it once the documentation was all in place. Any future code in pull requests without documentation will cause a hard test failure.
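A hedged sketch of how that could be applied to the parametrized test above (this uses standard pytest decorator stacking; the exact form used in Datasette at the time may have differed):
@pytest.mark.xfail
@pytest.mark.parametrize("view_class", [
    v for v in dir(app) if v.endswith("View")
])
def test_view_classes_are_documented(documented_views, view_class):
    assert view_class in documented_views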
Here’s what the test output looks like when some of those tests are marked as “expected to fail”:
$ pytest tests/test_docs.py
collected 31 items
tests/test_docs.py ..........................XXXxx. [100%]
============ 26 passed, 2 xfailed, 3 xpassed in 1.06 seconds ============
Since this reports both the xfailed and the xpassed counts, it shows how much work is still left to be done before the xfail
decorator can be safely removed.
Structuring code for testable documentation
A benefit of comprehensive unit testing is that it encourages you to design your code in a way that is easy to test. In my experience this leads to much higher code quality in general: it encourages separation of concerns and cleanly decoupled components.
My hope is that documentation unit tests will have a similar effect. I’m already starting to think about ways of restructuring my code such that I can cleanly introspect it for the areas that need to be documented. I’m looking forward to discovering code design patterns that help support this goal.