Python Testing 101 (How To Decide What To Test)
Dealing with failing tests is miserable.
Imagine your new code breaks old tests, you have no idea why, and it's blocking you from merging your latest changes.
Your tests are slow and break every time you change the code even slightly. Instead of a safety net that protects you or a signal that guides you, they feel like a massive dumbbell you dread lifting every time you merge a PR.
On the other side of the spectrum, maybe you don’t even know what to test.
Should you test every function in your application? What if the code interacts with a database or external service? Should you use mocks?
And if you’re building algorithms, APIs, or data transformations, do you really need a test for every function?
It’s neither feasible nor a good idea to test everything: it takes too much time and effort, and every test carries a maintenance overhead.
In this article, we’ll work through these dilemmas. I’ll show you how to design an effective test strategy around objects and how to decide which features are important enough to test.
You’ll learn principles and test design patterns that can guide you no matter what application you’re building.
We’ll do all of this with a real example, so you can follow along if you choose to.
Goals Of Unit Testing
In our article on Python Unit Testing Best Practices, we discussed several good testing patterns. I’d like to briefly recap some of them here.
Remember, you’re not just writing tests; you’re architecting a safety net that ensures your software is robust, efficient, and, most importantly, trustworthy.
Fast
Your tests should run quickly, enabling frequent execution without slowing down the development process.
Fast tests encourage more frequent testing, leading to earlier detection of bugs and smoother iterations.
Stable
A stable test suite produces the same outcomes under the same conditions, time after time.
This predictability is crucial for identifying when and how new changes affect existing functionalities.
Thorough
A thorough test suite meticulously covers every critical path and most edge cases, ensuring that all important functionality is tested.
Few
While it might seem counterintuitive, aiming for fewer, more focused tests can be more beneficial than having too many.
Each test should have a clear purpose.
Isolated
Isolation in testing means that each test is independent of others; it does not rely on the state produced by previous tests or external dependencies.
This characteristic ensures that tests can run in any order and that the failure of one test does not cascade.
Idempotent
Idempotence here refers to the principle that a test can be run multiple times with the same conditions and produce the same results every time.
This property ensures that the outcome of a test is solely determined by the code under test and not by the test’s previous executions or external state changes.
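As a rough illustration of the isolated and idempotent properties, here is a minimal sketch. The `save_settings` and `load_settings` functions are hypothetical stand-ins for code that touches the filesystem; the test relies on pytest's built-in `tmp_path` fixture, which hands every test its own fresh, empty directory.

```python
import json
from pathlib import Path


def save_settings(path: Path, settings: dict) -> None:
    """Hypothetical function under test: writes settings to a JSON file."""
    path.write_text(json.dumps(settings))


def load_settings(path: Path) -> dict:
    """Hypothetical companion query: reads the settings back from disk."""
    return json.loads(path.read_text())


def test_settings_round_trip(tmp_path: Path):
    # tmp_path is a new, empty directory for each test, so this test never sees
    # files left behind by earlier runs or by other tests (isolated + idempotent)
    settings_file = tmp_path / "settings.json"
    assert not settings_file.exists()  # Nothing leaks in from previous runs
    save_settings(settings_file, {"theme": "dark"})
    assert load_settings(settings_file) == {"theme": "dark"}
```

Running this test once or a hundred times, alone or alongside the rest of the suite, gives the same result, because it creates and reads only its own state.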
Focus on Messages
I would like to take this opportunity to introduce a fantastic talk by Sandi Metz called “The Magic Tricks of Testing”. I highly recommend you watch it.
Sandi sheds some light on testing strategy by recommending that we focus on messages and the routes they take.
I’ll try to explain Sandi’s concepts below, but please watch the talk first; it will help you understand them better.
Objects, or entities as we call them, are like black boxes.
In programming terms, I’m referring to a class, but you can apply the same concept to a function.
Sandi recommends thinking of objects like Space Capsules.
[Photo by Jeremy Straub on Unsplash]
They have a very clear separation between what’s inside and outside.
Objects communicate with other objects via messages.
Each Object:
- Receives incoming messages from another object or external system
- Sends messages to itself (internal messages)
- Sends outgoing messages to another object or external system
Finally, Sandi defines two types of messages:
- Queries — Returns something but changes nothing
- Commands — Returns nothing but changes something
This differentiation is extremely important.
Most people (including me) don’t make this distinction and suffer the consequences in the form of poorly designed tests.
Here’s the table we’re going to populate, step by step, with the help of a real example.
| Message | Query | Command |
|---|---|---|
| Incoming | | |
| Sent-to-self | | |
| Outgoing | | |
Set Up Your Local Environment
Before diving into the source code, let’s set up your local environment so you can follow along.
Clone the Repo
The project has the following structure.

```text
.
├── .gitignore
├── Pipfile
├── Pipfile.lock
├── bicycle
│   ├── gear.py
│   ├── observer.py
│   └── wheel.py
└── tests
    ├── __init__.py
    └── test_bicycle.py
```
Some knowledge of Python and Pytest is useful.
Note: I’m using Pipenv to manage virtual environments and packages instead of Conda, but feel free to use whatever works best for you.

```shell
pipenv shell
pipenv install --dev
```
Once you’re all set up with dependencies, let’s move on.
Example Code
I’m going to use the same example code explained by Sandi in the talk so you can easily make a reference and understand the underlying concept.
The code is a Python adaptation of Sandi’s example, and every reference to an object in this article means a Python class.
We’ll build a project, `bicycle`, with two objects: `Wheel` and `Gear`.
We’ll also have a third object, `Observer`, which stands in for an external object; in a real application it could be a database or an external REST API interface.
bicycle/wheel.py
```python
class Wheel:
    def __init__(self, rim: float, tire: float):
        self.rim = rim
        self.tire = tire

    def diameter(self) -> float:
        """
        Query to calculate the diameter of the wheel
        """
        return round(self.rim + (self.tire * 2), 2)
```
bicycle/gear.py
```python
from bicycle.wheel import Wheel
from bicycle.observer import Observer


class Gear:
    def __init__(
        self, chainring: float, cog: int, wheel: Wheel, observer: Observer = None
    ):
        self.chainring = chainring
        self.cog = cog
        self.wheel = wheel
        self.observer = observer

    def gear_inches(self) -> float:
        """
        Query to calculate the gear inches
        """
        return round(self.__ratio() * self.wheel.diameter(), 2)

    def __ratio(self) -> float:
        """
        Internal method to calculate the ratio
        """
        return round(self.chainring / self.cog, 2)

    def set_cog(self, new_cog_value: int):
        """
        Command to set the cog value
        """
        self.cog = new_cog_value  # Set the cog value for future calculations
        self.changed()

    def changed(self):
        """
        Send a message to the observer that the cog value has changed
        """
        if self.observer:
            # Sends a message to the observer that the cog value has changed; this message MUST be sent
            self.observer.changed(self.chainring, self.cog)
```
bicycle/observer.py
```python
class Observer:
    def changed(self, chainring: float, cog: int):
        """
        Method to be implemented by the observer
        """
        pass
```
Don’t worry too much about the methods of each class; we’ll discuss each of them shortly.
Our goal in the following sections is to populate the table by working through each kind of message (incoming, sent-to-self, and outgoing), first as a query and then as a command.
You’ll write relevant tests to thoroughly understand the concept.
Incoming Query Messages
It’s important to test incoming query messages because the object under test needs to return the correct response.
tests/test_bicycle.py
```python
from bicycle.wheel import Wheel
from bicycle.gear import Gear
from bicycle.observer import Observer


## Incoming Query Messages
def test_diameter():
    """
    Test the diameter of the wheel
    """
    wheel = Wheel(rim=26, tire=1.5)
    # Test incoming query messages by making assertions about what they send back
    assert wheel.diameter() == 29
```
We make assertions about what an object sends back.
Here’s another example:
tests/test_bicycle.py
```python
def test_calculates_gear_inches():
    """
    Test the gear inches calculation
    """
    wheel = Wheel(rim=26, tire=1.5)
    gear = Gear(chainring=52, cog=11, wheel=wheel)
    assert gear.gear_inches() == 137.17  # Test the interface, not the implementation
```
Again, we test the `gear_inches` interface of the `Gear` object.
Incoming Command Messages
An incoming command message changes something in the object under test, i.e. it has a side effect.
For example, `set_cog()` sets `cog` to a new value, and from then on every method in the class that references `cog` will see this new value (hence a “command”).
tests/test_bicycle.py
```python
## Incoming Command Messages
def test_set_cog():
    """
    Test the set cog method
    """
    wheel = Wheel(rim=26, tire=1.5)
    gear = Gear(chainring=52, cog=11, wheel=wheel)
    gear.set_cog(10)
    # Test incoming command messages by making assertions about direct public side effects
    assert gear.cog == 10
```
It’s important to test that the `cog` value has been set correctly.
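Because `cog` is a public attribute, asserting on it directly (as `test_set_cog` does) is fine. Another option, sketched below, is to check the command's effect through the public query that depends on it. To keep the snippet runnable on its own, it inlines trimmed copies of `Wheel` and `Gear` from above (the observer is left out), and the test name `test_set_cog_changes_gear_inches` is hypothetical.

```python
class Wheel:
    """Trimmed copy of bicycle/wheel.py."""

    def __init__(self, rim: float, tire: float):
        self.rim = rim
        self.tire = tire

    def diameter(self) -> float:
        return round(self.rim + (self.tire * 2), 2)


class Gear:
    """Trimmed copy of bicycle/gear.py without the observer, so the sketch runs standalone."""

    def __init__(self, chainring: float, cog: int, wheel: Wheel):
        self.chainring = chainring
        self.cog = cog
        self.wheel = wheel

    def gear_inches(self) -> float:
        return round(self.__ratio() * self.wheel.diameter(), 2)

    def __ratio(self) -> float:
        return round(self.chainring / self.cog, 2)

    def set_cog(self, new_cog_value: int):
        self.cog = new_cog_value


def test_set_cog_changes_gear_inches():
    gear = Gear(chainring=52, cog=11, wheel=Wheel(rim=26, tire=1.5))
    assert gear.gear_inches() == 137.17
    gear.set_cog(10)
    # The command's public side effect is observed through the public query
    assert gear.gear_inches() == 150.8
```

The expected values follow from the formulas above: with `cog=10` the ratio is 52 / 10 = 5.2, and 5.2 × 29 = 150.8. The test never touches the private `__ratio` method.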
Messages Sent To Self (Internal Messages)
All in all, this was the most important piece of the talk for me. 🤯
We’ve often heard the design principle “Don’t test the implementation, test the interface”.
But what does this actually mean? Sandi breaks it down beautifully, calling tests that break this rule “anti-patterns”.
tests/test_bicycle.py
```python
def test_calculates_ratio():
    """
    Test the ratio calculation
    """
    wheel = Wheel(rim=26, tire=1.5)
    gear = Gear(chainring=52, cog=11, wheel=wheel)
    assert gear._Gear__ratio() == 4.73  # DO NOT TEST PRIVATE METHODS
```
In the above test, there is NO NEED to test the private method. How `Gear` calculates the ratio is not visible to the outside world, and the result is already exercised through the public `gear_inches()` test.
Tests like this make it very hard to improve the internal code without breaking them, and they deter colleagues from trying to improve your code.
The caveat: if a private method performs complex calculations or algorithms, then sure, test it, but keep those tests in a separate module with a comment saying they can be deleted or skipped if they break.
Outgoing Query Messages
Now, what about messages sent from one object to another, for example `Gear` calling `Wheel.diameter()`?
tests/test_bicycle.py
```python
## Outgoing Query Messages
# Anti-pattern: testing outgoing query messages. Do NOT test them; they are
# already tested as incoming query messages of the receiving object
def test_calculates_gear_inches_outgoing():
    """
    Test the gear inches calculation
    """
    wheel = Wheel(rim=26, tire=1.5)
    gear = Gear(chainring=52, cog=11, wheel=wheel)
    assert gear.gear_inches() == 137.17
    assert gear.wheel.diameter() == 29  # Redundant and duplicates the Wheel test
```
As evident from the comments, this is another Anti-Pattern.
It’s redundant: it duplicates the incoming query test for the `Wheel` object, so there’s no need to repeat it in the `Gear` tests.
For outgoing query messages, don’t make assertions about their results, and don’t set expectations that they are sent.
Outgoing Command Messages
Let’s look at the last case: outgoing command messages.
Suppose we’re building a game where people ride bikes. When a rider changes gears, we have to notify the rest of the app.
That is, when `Gear` receives the `set_cog` message, it must call its `changed` method, which in turn calls the `Observer` entity.
bicycle/gear.py
```python
def set_cog(self, new_cog_value: int):
    """
    Command to set the cog value
    """
    self.cog = new_cog_value  # Set the cog value for future calculations
    self.changed()

def changed(self):
    """
    Send a message to the observer that the cog value has changed
    """
    if self.observer:
        # Sends a message to the observer that the cog value has changed; this message MUST be sent
        self.observer.changed(self.chainring, self.cog)
```
As you can see above, the `set_cog` method calls the `changed` method, which in turn calls `Observer.changed()`.
The resulting side effect depends on what the `Observer` does, and that could be many things outside our control, e.g. writing to a database, sending an API request, firing a webhook, and so on.
Writing our test
tests/test_bicycle.py
```python
## Outgoing Command Messages
# Anti-pattern: asserting on what's in the DB creates a dependency on a distant side effect
def test_saves_changed_cog_in_db():
    obs = Observer()
    wheel = Wheel(rim=26, tire=1.5)
    gear = Gear(chainring=52, cog=11, wheel=wheel, observer=obs)
    gear.set_cog(27)
    # Assert something about the state of the db
```
As soon as `gear.set_cog()` is called, the outgoing command is sent, and we need a way to check that its side effect happened.
The obvious approach is to test the side effect directly, i.e. read from the database or send a GET request to the API.
But this is NOT the job of our unit tests: it makes them incredibly slow and couples them tightly to side effects in other systems.
At that point, the unit test has turned into an integration test.
What we actually want to test is that `Gear` sent the right message, i.e. that `observer.changed()` was called with the correct arguments. Testing beyond that is out of scope for this unit test.
To solve this, we can isolate our test by replacing the external dependency with a Mock.
If you’re not familiar with Mocks, I highly recommend checking out this article.
We’ll use the `pytest-mock` plugin and its `mocker` fixture, which is a wrapper over the built-in `unittest.mock` library.
tests/test_bicycle.py
```python
def test_notifies_observers_when_cogs_change(mocker):
    # Create a test double that only exposes attributes the real Observer has
    obs_mock = mocker.Mock(spec=Observer)
    wheel = Wheel(rim=26, tire=1.5)
    # Pass the mocked Observer to the Gear instead of a real one
    gear = Gear(chainring=52, cog=11, wheel=wheel, observer=obs_mock)

    gear.set_cog(27)
    obs_mock.changed.assert_called_with(52, 27)  # Assert that the observer was notified

    gear.set_cog(36)
    obs_mock.changed.assert_called_with(52, 36)  # Assert that the observer was notified again
```

Let’s break down what’s going on here.

- We create a mocked `Observer` (commonly called a test double) with `mocker.Mock(spec=Observer)`. The `spec` limits the mock to attributes that exist on `Observer`, so a typo such as `obs_mock.chnaged` raises an `AttributeError` instead of silently passing.
- We initialize the `Wheel` and `Gear` objects as normal, but pass the mocked observer to `Gear` instead of a real `Observer`. This is very important: `Gear` will now use the mocked version instead of the real one.
- We call `gear.set_cog(27)` and assert that `observer.changed()` was called with the correct `chainring` and `cog` values.
- We set `cog` to something else and assert that the mocked `observer.changed()` was called with the new `cog` value.
If you need a refresher on how to test Assert Called Methods, I’ve got you covered with this practical guide.
This allows us to cleanly verify that `gear.set_cog()` calls `observer.changed()` with the correct `cog` value, without relying on `Observer`, which is an external object or event.
The only downside is that you have to keep your mock in sync with the real `Observer` (or external service) as its API drifts.
Rule: honour the contract.
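One way to make a mock honour the contract is the standard library's `unittest.mock.create_autospec`, sketched below. With `instance=True` it builds a mock shaped like an `Observer` instance whose methods check every call against the real method signature, so if `Observer.changed()` gains or loses a parameter, the test fails instead of silently passing. The snippet inlines trimmed copies of `Observer` and `Gear` (the wheel is omitted, which simplifies `Gear`'s constructor) so it runs standalone; it is an illustration, not the repo's test. Inside a pytest-mock test, the same helper is available as `mocker.create_autospec`.

```python
from typing import Optional
from unittest.mock import create_autospec


class Observer:
    def changed(self, chainring: float, cog: int):
        """Method to be implemented by the observer"""


class Gear:
    """Trimmed copy of bicycle/gear.py: only the parts that talk to the observer."""

    def __init__(self, chainring: float, cog: int, observer: Optional[Observer] = None):
        self.chainring = chainring
        self.cog = cog
        self.observer = observer

    def set_cog(self, new_cog_value: int):
        self.cog = new_cog_value
        self.changed()

    def changed(self):
        if self.observer:
            self.observer.changed(self.chainring, self.cog)


def test_notifies_observer_with_autospec():
    # The autospecced mock rejects calls that don't match Observer.changed(chainring, cog)
    obs_mock = create_autospec(Observer, instance=True)
    gear = Gear(chainring=52, cog=11, observer=obs_mock)

    gear.set_cog(27)
    obs_mock.changed.assert_called_once_with(52, 27)
```

The trade-off is that autospeccing inspects the real class, so it is slightly slower to create than a plain `Mock`, but in exchange the test double cannot drift away from the interface it stands in for.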
Running our tests
Understanding the WHY and HOW here is really important, so go back and read it twice if you need to.
Our final table looks like this:
| Message | Query | Command |
|---|---|---|
| Incoming | Assert Result | Assert direct public side effects |
| Sent-to-self | Ignore (optional) | Ignore (optional) |
| Outgoing | Ignore | Mock, Patch, Stub |
Now that we’ve seen how to test objects, which is the foundation of unit testing, let’s move on to designing your own test strategy and deciding what to test.
Design A Test Strategy — Deciding What to Test
In a Python TDD approach, you’d be writing your tests before your code.
If you’re not familiar with Test-Driven Development (TDD) check out this article which walks you through exactly how to get started with TDD.
Let’s assume you’ve tested your core entities/objects using the strategies explained above. What else should you test to make sure your application works as you expect it to?
Core Functionality and Features
Focus on the core functionality and features of your application.
Identify the parts of your application that are most critical to its operation and start by writing tests for these components.
These are the functions and methods that your application relies on to perform its primary tasks.
You could also adopt Behavior-Driven Testing to make sure your application behaves as it should based on features and design.
Boundary Conditions
Test the edges of your application’s input space.
This includes testing with minimum, maximum, and just outside acceptable input ranges.
Boundary conditions often reveal edge cases that you might not have considered during development.
Even better, you can use built-in pytest features such as parametrization, or property-based testing libraries such as Hypothesis, to test edge cases and boundary conditions.
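Here’s a minimal parametrization sketch of the "minimum, maximum, and just outside" idea. The `is_valid_cog` validator and its 11 to 36 range are hypothetical, invented only to have concrete boundaries to test; pytest turns each tuple into its own test case.

```python
import pytest

# Hypothetical valid cog range, chosen for illustration only
MIN_COG, MAX_COG = 11, 36


def is_valid_cog(cog: int) -> bool:
    """Hypothetical validator: accepts cogs in the inclusive range [MIN_COG, MAX_COG]."""
    return MIN_COG <= cog <= MAX_COG


@pytest.mark.parametrize(
    "cog, expected",
    [
        (10, False),  # just below the minimum
        (11, True),  # the minimum itself
        (12, True),  # just above the minimum
        (35, True),  # just below the maximum
        (36, True),  # the maximum itself
        (37, False),  # just above the maximum
    ],
)
def test_is_valid_cog_boundaries(cog: int, expected: bool):
    assert is_valid_cog(cog) is expected
```

If one boundary regresses (say, someone changes `<=` to `<`), only the affected cases fail, and pytest’s report shows exactly which input broke.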
Error Handling
Ensure that your application gracefully handles error conditions.
Write tests that simulate various error states to verify that your application responds appropriately, such as input validation errors or external service failures.
For example, if you’re building an API, make sure it can gracefully handle client errors, server errors, timeouts, pagination, and so on.
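A common way to simulate an external service failure is a mock with `side_effect` set to an exception, as in the sketch below. `fetch_bike_price`, its `client.get_price` dependency, and the "return `None` on timeout" policy are all hypothetical; the point is that the test forces the failure path without needing a real, misbehaving service.

```python
from typing import Optional
from unittest.mock import Mock


def fetch_bike_price(client, bike_id: str) -> Optional[float]:
    """Hypothetical code under test: returns None instead of crashing on a timeout."""
    try:
        return client.get_price(bike_id)
    except TimeoutError:
        return None


def test_fetch_bike_price_handles_timeout():
    client = Mock()
    # side_effect makes the mocked call raise instead of returning a value
    client.get_price.side_effect = TimeoutError("service took too long")
    assert fetch_bike_price(client, "road-52") is None
    client.get_price.assert_called_once_with("road-52")


def test_fetch_bike_price_happy_path():
    client = Mock()
    client.get_price.return_value = 999.0
    assert fetch_bike_price(client, "road-52") == 999.0
```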
Performance Constraints
If your application has specific performance requirements, write tests that verify these constraints are met.
This could include response times for web applications and APIs, or processing times for data-intensive operations.
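A simple performance check times the operation with `time.perf_counter()` and compares it against a budget, as sketched below. The `kmh_to_ms` function and the 1-second budget are hypothetical. Wall-clock assertions can become flaky on busy CI machines, so keep budgets generous; for serious benchmarking, a dedicated tool such as the pytest-benchmark plugin is a better fit.

```python
import time
from typing import List


def kmh_to_ms(speeds_kmh: List[float]) -> List[float]:
    """Hypothetical data-intensive function under test: converts km/h to m/s."""
    return [round(speed / 3.6, 2) for speed in speeds_kmh]


def test_kmh_to_ms_meets_time_budget():
    speeds = [float(i % 60) for i in range(100_000)]

    start = time.perf_counter()
    result = kmh_to_ms(speeds)
    elapsed = time.perf_counter() - start

    assert len(result) == len(speeds)  # Check correctness, not just speed
    # Hypothetical 1-second budget, kept generous so the test stays stable on slow machines
    assert elapsed < 1.0
```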
Regression Tests
Whenever a bug is fixed, write a test that captures the bug’s specific scenario to prevent regressions in the future.
Maintain a growing suite of tests that cover previously discovered issues, ensuring that any changes to the code do not reintroduce old problems.
Security
Is security important? Does the application handle authentication, or build SQL queries that are executed against the database?
You can leverage ORMs (object-relational mappers) like SQLAlchemy or SQLModel to abstract that layer.
Test authentication and security thoroughly as even small data leaks can really hurt your customers and your reputation.
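If your code does talk to the database directly, a test can feed it a classic injection payload and check that nothing leaks. The sketch below uses the standard library's `sqlite3` with an in-memory database; the `riders` table and both `find_rider` functions are hypothetical, and the unsafe version is included only to show that the same test would catch the vulnerability.

```python
import sqlite3
from typing import List, Tuple


def find_rider(conn: sqlite3.Connection, name: str) -> List[Tuple[str]]:
    # Safe: the ? placeholder makes the driver treat `name` as data, never as SQL
    return conn.execute("SELECT name FROM riders WHERE name = ?", (name,)).fetchall()


def find_rider_unsafe(conn: sqlite3.Connection, name: str) -> List[Tuple[str]]:
    # Anti-pattern, shown only for contrast: user input is pasted into the SQL string
    return conn.execute(f"SELECT name FROM riders WHERE name = '{name}'").fetchall()


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")  # In-memory DB: fresh and isolated for every test
    conn.execute("CREATE TABLE riders (name TEXT)")
    conn.executemany("INSERT INTO riders VALUES (?)", [("alice",), ("bob",)])
    return conn


INJECTION = "' OR '1'='1"  # Classic payload that turns the WHERE clause into "always true"


def test_find_rider_resists_sql_injection():
    conn = make_db()
    assert find_rider(conn, "alice") == [("alice",)]
    assert find_rider(conn, INJECTION) == []


def test_unsafe_query_leaks_every_row():
    # The same payload exposes the vulnerable version: every rider comes back
    assert len(find_rider_unsafe(make_db(), INJECTION)) == 2
```

An ORM such as SQLAlchemy binds parameters in the same way, which is one reason it helps here; the test idea stays the same whichever layer builds the query.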
Separate Unit and Integration Tests
Unit tests and integration tests serve distinct purposes.
Unit tests are laser-focused on individual components, ensuring that each part performs as expected in isolation.
Integration tests, on the other hand, are concerned with the interactions between components, verifying that they work together as intended.
Typically, unit tests are fast and stable, providing immediate feedback.
Integration tests tend to be slower and more susceptible to external factors, given their reliance on the integration of components, external systems, or services.
Separating them allows you to run the quick, reliable unit tests frequently, reserving the more comprehensive, though slower, integration tests for key moments in the development cycle.
When tests are well-organized, pinpointing the root cause of a failure becomes significantly easier.
Unit test failures usually indicate issues within the specific component tested, while integration test failures point to problems in the interaction between components.
This clear separation helps with better debugging and faster resolution.
Conclusion
Ok, it’s time to wrap this up.
I hope you enjoyed reading this article as much as I enjoyed researching and writing it.
Learning what to test in your application is arguably even more important than the act of testing itself. After all, tests only make sure the code that’s written is working as intended, so if the code is incorrect, no test can save you.
In this article, you learned how to test objects or entities based on messages (incoming, self and outgoing, command and query).
Sandi Metz’s talk was eye-opening, and its concepts let us focus our tests without writing a test for every function of every object, especially internal private methods.
You worked through a practical implementation of the `bicycle` project, with example code and tests.
Lastly, we thoroughly explained how to design and prioritize your test strategy to produce the best performing, stable, and robust application possible.
With this information, I hope you feel more confident in designing and implementing your test strategy.
If you have ideas for improvement or would like me to cover anything specific, please send me a message via Twitter, GitHub or Email.
Till the next time… Cheers!
Additional Reading
This article was made possible due to the below amazing resources from their wonderful creators.
Talk from Sandi Metz
Example Code Used
Python Unit Testing Best Practices For Building Reliable Applications
An Ultimate Guide To Using Pytest Skip Test And XFail - With Examples
Introduction to Pytest Mocking - What It Is and Why You Need It
How To Practice Test-Driven Development In Python? (Deep Dive)
What Are Pytest Mock Assert Called Methods and How To Leverage Them