Automated testing has become axiomatic in the Python community. The Django tutorial, for example, explains testing Django projects before even explaining how to deal with CSS. A common measurement for tests is test coverage: the percentage of lines, branches, or files of the code that are executed when the tests are run. How high this number should be is a frequently debated subject, but the general consensus is to aim for 90-95%, with 100% considered utopian: unrealistic and not worth the time it would take. Some don’t care about test coverage at all, because high test coverage doesn’t guarantee the tests are any good.
For my own projects, I’ve adopted a new rule: all code must have 100% test coverage. I am not done until the unit tests have 100% coverage. How is this not utopian, unrealistic, and a waste of time? Because I count test coverage after exclusions. Even that won’t help you catch every scenario, as I’ll discuss below.
What code coverage tells me
I often don’t use TDD when writing code, because I’ve found it isn’t the best way for me to write user interface code, which is a significant part of what I write. This post is not an argument against TDD: I couldn’t care less whether anyone else uses TDD, as long as your code has good tests by the time I see it. Writing tests afterwards does carry the risk that my tests are incomplete.
This is where code coverage helps me out. Code coverage will never catch malicious tests: tests written only to cover code, which don’t actually verify any outcomes. But it often catches me forgetting a test for a scenario in the code. So code coverage won’t guarantee that my tests are any good, but it does often catch it when I’ve simply forgotten to test some functionality.
Coverage tools often report the missed lines as well as percentages, so you can review those lines, check whether any of them relate to your new code, and decide to add tests. Alternatively, you may decide that a line covers a scenario that would be unreasonable to test, and leave it as is. I base that on a rough risk assessment: how complex is this code, what is the worst failure scenario, and how hard is it to add a test? Deciding that something is unreasonable to test should be a rare occurrence.
Where 95% coverage fails
The workflow described above will work for a while. However, as your project grows while remaining at 95% coverage, the number of cases where you decided not to test a scenario increases. Slowly, the number of missed lines grows, up to the point where it becomes hard to figure out which misses relate to your changes. But worst of all: it becomes impossible for another person to determine whether a line is missed due to oversight or due to an intentional decision that testing it is unreasonable.
This is why I adopted my 100% coverage rule: the test coverage must always be 100%. No exceptions. However, you are allowed to exclude something from the tests. I’d prefer not to, but then at least we know that a conscious decision was made, who made it, and when. You still need to be cautious with exclusions, but at least they are explicit. And when I run the tests, it is instantly visible whether or not I am done: if the test coverage is not 100%, there is work left to do. Most likely tests should be added; otherwise, exclusions. I am so strict about this that my CircleCI builds fail if test coverage is lower than 100%.
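One way to enforce a rule like this, assuming coverage.py 4.0 or later, is the `fail_under` option in the `[report]` section of `.coveragerc` (or the equivalent `--fail-under` command-line flag), which is a sketch rather than my exact CI configuration:

```ini
[report]
# Make `coverage report` exit with a non-zero status when total
# coverage is below 100%, so the CI build fails automatically.
fail_under = 100
```

With this in place, the same `coverage report` invocation you run locally doubles as the pass/fail gate in CI.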
Testing when things break
Obviously, this mostly helps against forgotten scenarios in tests. You’ll still need to use your brain to ensure the tests you do write make sense. And there are scenarios that won’t appear in test coverage at all.
In particular, one scenario where test coverage will rarely help is whether something breaks as designed. Sure, it’s nice enough that you can edit that blog post if you are logged in and own it or are the superuser. And test coverage is likely to say you’re 100% done once you test these cases. But what if I’m not the superuser, and don’t own the blog? What if I’m not even logged in? Will your code reject me from seeing the editing page? What if I try to submit an update anyway? Is the entry in the database unchanged?
A failure to test these scenarios is rarely visible in test coverage, at least in Python/Django, because this behaviour is often implemented with decorators or mixins. However, these tests are even more important than testing whether things work as they should: if nobody can edit a blog, users will complain. If everyone can edit a blog, nobody will complain for a long time.
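To illustrate why coverage is blind here, consider a minimal sketch (the `require_login` decorator and `edit_post` view below are hypothetical stand-ins, not Django’s actual API): a single happy-path test executes every line of the view body, so line coverage reports 100%, even though the rejection path inside the decorator is never exercised.

```python
import functools

def require_login(view):
    """Reject anonymous users before the view runs (Django-style decorator sketch)."""
    @functools.wraps(view)
    def wrapper(user, *args, **kwargs):
        if not user.get("is_authenticated"):
            # The rejection lives here, in the decorator, not in the view body.
            return "302: redirect to login"
        return view(user, *args, **kwargs)
    return wrapper

@require_login
def edit_post(user, post):
    # A single test with a logged-in user covers every line of this function,
    # so coverage reads 100% even if the anonymous case is never tested.
    return "200: edit form for %s" % post["slug"]
```

The tests worth writing here are the negative ones: assert that an anonymous user is redirected, and that the database entry is left unchanged after a rejected update.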
I place a strong focus on adding tests like these for everything I make, but haven’t come up with a reliable way of proving I’ve done this.
For Python I use coverage. My typical setup is a `.coveragerc` that looks like this:
```ini
[run]
include = contractportal/*

[report]
exclude_lines =
    pragma: no cover

    # Don't complain about missing debug-only code:
    def __unicode__
    def __repr__
    if self\.debug

    # Don't complain if tests don't hit defensive assertion code:
    raise AssertionError
    raise NotImplementedError

    # Don't complain if non-runnable code isn't run:
    if 0:
    if __name__ == .__main__.:

omit =
    project/settings/*
    project/*/migrations/*

show_missing = True
```
This sets up a number of sensible excludes, and I can exclude any other line or block with `# pragma: no cover`. Excluded lines are counted separately in the reports.
In my Django 1.7 setup, I run this as:

```shell
coverage run ./manage.py test && coverage report
```