Python Testing 101 (How To Decide What To Test)

Dealing with failing tests is a misery.

Imagine your new code breaks old tests, you have no idea why, and now you can’t merge your latest changes.

Or your tests are slow and break every time you change the code even slightly. Instead of a safety net that protects you or a signal that guides you, they feel like a massive dumbbell you dread lifting every time you merge a PR.

On the other side of the spectrum, maybe you don’t even know what to test.

Should you test every function in your application? What if the code interacts with a database or external service? Should you use mocks?

If you’re building algorithms, APIs, or data transformations, does every function need its own test?

It’s neither feasible nor a good idea to test everything: it takes too much time and effort, and every test carries a maintenance overhead.

In this article, we’ll work through these dilemmas. I’ll teach you how to design an effective test strategy around objects and how to decide which features are important to test.

You’ll learn principles and test design patterns that can guide you no matter what application you’re building.

We’ll cover all of this using a real example, so you can follow along if you choose to.


Goals Of Unit Testing

In our article on Python Unit Testing Best Practices, we discussed several good testing patterns. I’d like to briefly recap some of them here.

Remember, you’re not just writing tests; you’re architecting a safety net that ensures your software is robust, efficient, and, most importantly, trustworthy.

Fast

Your tests should run quickly, enabling frequent execution without slowing down the development process.

Fast tests encourage more frequent testing, leading to earlier detection of bugs and smoother iterations.

Stable

A stable test suite produces the same outcomes under the same conditions, time after time.

This predictability is crucial for identifying when and how new changes affect existing functionality.

Thorough

A thorough test suite covers every critical path and most edge cases, ensuring that all critical functionality is tested.

Few

While it might seem counterintuitive, aiming for fewer, more focused tests can be more beneficial than having too many.

Each test should have a clear purpose.

Isolated

Isolation in testing means that each test is independent of others; it does not rely on the state produced by previous tests or external dependencies.

This characteristic ensures that tests can run in any order and that the failure of one test does not cascade.

Idempotent

Idempotence here means that a test can be run multiple times under the same conditions and produce the same result every time.

This property ensures that the outcome of a test is solely determined by the code under test and not by the test’s previous executions or external state changes.

Focus on Messages

I would like to take this opportunity to introduce a fantastic talk by Sandi Metz called “Magic Tricks Of Testing”. I highly recommend you watch it.

Sandi sheds light on testing strategy by focusing on messages and the routes they take.

I’ll try to explain Sandi’s concepts below, but please watch the talk first; it will help you understand them better.

Objects (or entities, as we call them) are like black boxes.

In a programming sense, I’m referring to a class, but you can apply the same concept to a function.

Sandi recommends thinking of objects like Space Capsules.

[Image: a space capsule. Photo by Jeremy Straub on Unsplash]

They have a very clear separation between what’s inside and outside.

Objects communicate with other objects via messages.

Each Object:

  • Receives incoming messages from another object or external system
  • Sends messages to itself
  • Sends outgoing messages to another object or external system

Sandi also defines two types of messages:

  • Queries — Returns something but changes nothing
  • Commands — Returns nothing but changes something

This differentiation is extremely important.

Most people (myself included) don’t make this distinction, and they pay for it with poorly designed tests.
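Here’s a minimal illustration of the two message types. Odometer is a hypothetical class, kept separate from the bicycle example: distance() is a query and record_ride() is a command.

```python
class Odometer:
    """Hypothetical class (not part of the bicycle project) with one message of each type."""

    def __init__(self) -> None:
        self._distance_km = 0.0

    def distance(self) -> float:
        # Query: returns something, changes nothing
        return self._distance_km

    def record_ride(self, km: float) -> None:
        # Command: returns nothing, changes something
        self._distance_km += km


odo = Odometer()
assert odo.distance() == odo.distance() == 0.0  # asking twice changes nothing
assert odo.record_ride(12.5) is None  # the command returns nothing...
assert odo.distance() == 12.5  # ...but its effect is visible through the query
```

Notice that the command’s effect is checked through a public query rather than by peeking at the private _distance_km attribute.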

Here’s a very interesting table; we’re going to populate it with the help of a real example.

Message | Query | Command
Incoming | |
Sent-to-self | |
Outgoing | |

Set Up Your Local Environment

Before diving into the source code, let’s set up your local environment so you can follow along.

Clone the Repo

The project has the following structure.

.
├── .gitignore
├── Pipfile
├── Pipfile.lock
├── bicycle
│   ├── gear.py
│   ├── observer.py
│   └── wheel.py
└── tests
    ├── __init__.py
    └── test_bicycle.py

Some knowledge of Python and Pytest is useful.

Note: I’m using Pipenv to manage virtual environments and packages instead of Conda, but feel free to use whatever works best for you.

$ pipenv shell
$ pipenv install --dev

Once you’re all set up with dependencies let’s move on.

Example Code

I’m going to use the same example code Sandi explains in the talk, so you can easily cross-reference it and understand the underlying concepts.

The code is a Python adaptation of Sandi’s example. In this article, each object is implemented as a Python class.

We’ll build a project called bicycle with two objects, Wheel and Gear.

We’ll also have a third object, Observer, which stands in for an external system; in a real application, this could be a database or an external REST API interface.

bicycle/wheel.py

class Wheel:
    def __init__(self, rim: float, tire: float):
        self.rim = rim
        self.tire = tire

    def diameter(self) -> float:
        """
        Query to calculate the diameter of the wheel
        """
        return round(self.rim + (self.tire * 2), 2)

bicycle/gear.py

from typing import Optional

from bicycle.wheel import Wheel
from bicycle.observer import Observer


class Gear:
    def __init__(
        self,
        chainring: float,
        cog: int,
        wheel: Wheel,
        observer: Optional[Observer] = None,
    ):
        self.chainring = chainring
        self.cog = cog
        self.wheel = wheel
        self.observer = observer

    def gear_inches(self) -> float:
        """
        Query to calculate the gear inches
        """
        return round(self.__ratio() * self.wheel.diameter(), 2)

    def __ratio(self) -> float:
        """
        Internal method to calculate the ratio
        """
        return round(self.chainring / self.cog, 2)

    def set_cog(self, new_cog_value: int):
        """
        Command to set the cog value
        """
        self.cog = new_cog_value  # Set the cog value for future calculations
        self.changed()

    def changed(self):
        """
        Send a message to the observer that the cog value has changed
        """
        if self.observer:
            # Sends a message to the observer that the cog value has changed; message MUST be sent
            self.observer.changed(self.chainring, self.cog)

bicycle/observer.py

class Observer:
    def changed(self, chainring: float, cog: int):
        """
        Method to be implemented by the observer
        """
        pass

Don’t worry too much about the methods of each class; we’ll discuss each of them shortly.

In the following sections, we’ll populate the table by working through each combination of message origin (incoming, sent-to-self, outgoing) and message type (query, command).

You’ll write relevant tests to thoroughly understand the concept.

Incoming Query Messages

It’s important to test incoming query messages because the object under test needs to return the correct response.

tests/test_bicycle.py

from bicycle.wheel import Wheel
from bicycle.gear import Gear
from bicycle.observer import Observer


## Incoming Query Messages
def test_diameter():
    """
    Test the diameter of the wheel
    """
    wheel = Wheel(rim=26, tire=1.5)
    # Test incoming query messages by making assertions about what they send back
    assert wheel.diameter() == 29

We make assertions about what an object sends back.

Here’s another example:

tests/test_bicycle.py

def test_calculates_gear_inches():
    """
    Test the gear inches calculation
    """
    wheel = Wheel(rim=26, tire=1.5)
    gear = Gear(chainring=52, cog=11, wheel=wheel)
    assert gear.gear_inches() == 137.17  # Test the interface, not the implementation

Again, we test the gear_inches interface of the Gear object, not how it computes the result.

Incoming Command Messages

An incoming command asks the object under test to change something, i.e. to produce a side effect.

For example, set_cog() sets cog to a new value, and every later method call that references cog sees the new value (hence a “command”).

tests/test_bicycle.py

## Incoming Command Messages
def test_set_cog():
    """
    Test the set cog method
    """
    wheel = Wheel(rim=26, tire=1.5)
    gear = Gear(chainring=52, cog=11, wheel=wheel)
    gear.set_cog(10)
    # Test incoming command messages by making assertions about direct public side effects
    assert gear.cog == 10

It’s important to test that the cog value has been set correctly.
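Since a command’s effect should be visible through the object’s public interface, you can also check it with a query. The sketch below inlines condensed copies of Wheel and Gear (observer omitted) so it runs on its own; the test name is hypothetical. After set_cog(10), the ratio becomes 52 / 10 = 5.2, so gear_inches() should become 5.2 * 29 = 150.8.

```python
class Wheel:
    # Condensed copy of bicycle/wheel.py so this snippet runs standalone
    def __init__(self, rim: float, tire: float):
        self.rim, self.tire = rim, tire

    def diameter(self) -> float:
        return round(self.rim + self.tire * 2, 2)


class Gear:
    # Condensed copy of bicycle/gear.py (observer omitted) so this snippet runs standalone
    def __init__(self, chainring: float, cog: int, wheel: Wheel):
        self.chainring, self.cog, self.wheel = chainring, cog, wheel

    def gear_inches(self) -> float:
        return round(round(self.chainring / self.cog, 2) * self.wheel.diameter(), 2)

    def set_cog(self, new_cog_value: int) -> None:
        self.cog = new_cog_value


def test_set_cog_changes_gear_inches():
    gear = Gear(chainring=52, cog=11, wheel=Wheel(rim=26, tire=1.5))
    assert gear.gear_inches() == 137.17
    gear.set_cog(10)
    # The command's side effect is observable through a public query
    assert gear.gear_inches() == 150.8
```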

Messages Sent To Self (Internal Messages)

All in all, this was the most important piece of the talk for me. 🤯

We’ve often heard the design principle “Don’t test implementation, just test the interface”.

But what does this actually mean? Sandi breaks it down beautifully, calling tests of an object’s internals “Anti-Patterns”.

tests/test_bicycle.py

def test_calculates_ratio():
    """
    Test the ratio calculation
    """
    wheel = Wheel(rim=26, tire=1.5)
    gear = Gear(chainring=52, cog=11, wheel=wheel)
    assert gear._Gear__ratio() == 4.73  # DO NOT TEST PRIVATE METHODS

The test above is unnecessary: there is NO NEED to test private methods. How Gear calculates the ratio is not visible to the outside world, and the result is already covered by the gear_inches test.

Tests like this make it very hard to improve the internal code without breaking them, and they deter colleagues from trying to improve your code.

The caveat: if you’re testing complex calculations or algorithms, then sure, include such tests, but put them in a separate module with a comment saying they can be deleted or skipped if they break.

Outgoing Query Messages

Now, what about messages sent from one object to another, for example Gear calling Wheel.diameter()?

tests/test_bicycle.py

## Outgoing Query Messages
# AntiPattern: Testing outgoing query messages. Do NOT test them; they are tested as part of the incoming query messages
def test_calculates_gear_inches_outgoing():
    """
    Test the gear inches calculation
    """
    wheel = Wheel(rim=26, tire=1.5)
    gear = Gear(chainring=52, cog=11, wheel=wheel)
    assert gear.gear_inches() == 137.17
    assert gear.wheel.diameter() == 29  # Redundant and duplicates the Wheel test

As the comments point out, this is another Anti-Pattern.

The extra assertion duplicates the incoming query test for the Wheel object, so there’s no need to repeat it in the Gear tests.

Don’t make assertions about the result of an outgoing query, and don’t expect it to be sent.

Outgoing Command Messages

Let’s look at the last case for outgoing command messages.

Imagine a game where people ride bikes: if they change gears, we have to notify the rest of the app.

In other words, when Gear receives the set_cog message, it calls its changed method, which in turn calls the Observer entity.

bicycle/gear.py

def set_cog(self, new_cog_value: int):
    """
    Command to set the cog value
    """
    self.cog = new_cog_value  # Set the cog value for future calculations
    self.changed()

def changed(self):
    """
    Send a message to the observer that the cog value has changed
    """
    if self.observer:
        # Sends a message to the observer that the cog value has changed; message MUST be sent
        self.observer.changed(self.chainring, self.cog)

As you can see above, the set_cog method calls the changed method, which in turn calls Observer.changed().

The resulting side effect is outside our control: the Observer could write to a database, send an API request, trigger a webhook, and so on.

Writing our test

tests/test_bicycle.py

## Outgoing Command Messages
# AntiPattern: If you assert what's in the DB, it creates a dependency on a distant side effect
def test_saves_changed_cog_in_db():
    obs = Observer()
    wheel = Wheel(rim=26, tire=1.5)
    gear = Gear(chainring=52, cog=11, wheel=wheel, observer=obs)
    gear.set_cog(27)
    # Assert something about the state of the db

When gear.set_cog() is called, Gear sends the outgoing command observer.changed(), and we now have to find a way to check that the side effect happened.

The obvious approach is to test the side effect directly, i.e. read from the database or send a GET request to the API.

But this is NOT the job of our unit tests: it makes them incredibly slow and heavily dependent on side effects in distant systems.

At that point, the test has turned into an integration test.

What we want to test is that Gear sends the changed message to the observer with the correct arguments. Testing beyond that is out of scope for this unit test.

To solve this, we can isolate our test by replacing the external dependency with a Mock.

If you’re not familiar with Mocks, I highly recommend checking out this article.

We’ll use the pytest-mock plugin, whose mocker fixture is a wrapper over the built-in unittest.mock library.

tests/test_bicycle.py

def test_notifies_observers_when_cogs_change(mocker):
    # Create a test double that only accepts attributes the real Observer has
    obs_mock = mocker.Mock(spec=Observer)
    wheel = Wheel(rim=26, tire=1.5)
    gear = Gear(
        chainring=52, cog=11, wheel=wheel, observer=obs_mock
    )  # Pass the mocked Observer to the Gear
    gear.set_cog(27)
    obs_mock.changed.assert_called_with(52, 27)  # Assert that the observer was notified

    gear.set_cog(36)
    obs_mock.changed.assert_called_with(52, 36)  # Assert that the observer was notified

Let’s break down what’s going on here.

  • We create a mocked Observer, commonly called a test double, with mocker.Mock(spec=Observer). The spec restricts the mock to attributes the real Observer has, such as changed().
  • We initialize the Wheel and Gear objects as normal, passing the mocked observer to Gear instead of a real Observer. This is very important: Gear will now use the mocked version instead of the real one.
  • We assert that observer.changed() was called with the chainring and the new cog value.
  • We set cog to another value and assert that the mocked observer.changed() was called with the new cog value.

If you need a refresher on how to test Assert Called Methods, I’ve got you covered with this practical guide.

This lets us cleanly verify that gear.set_cog() calls observer.changed() with the correct cog value, without relying on Observer, which represents an external object or event.

The downside is that you have to keep your mock in sync with the real interface; if the external service or API drifts, a stale mock can keep passing while the real integration breaks.

Rule — Honour the contract.
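One way to honour the contract in code is to build the mock from the real class, so it inherits the real method signatures. This self-contained sketch uses create_autospec from the standard library’s unittest.mock (the library pytest-mock wraps); the Observer below is a copy of bicycle/observer.py. Because an autospecced mock rejects calls that don’t match the real signature, a call site that drifts from the contract fails loudly instead of passing silently.

```python
from unittest.mock import create_autospec


class Observer:
    """Copy of bicycle/observer.py so the snippet runs on its own."""

    def changed(self, chainring: float, cog: int):
        pass


# The mock copies Observer's method signatures (instance=True mimics an instance)
obs_mock = create_autospec(Observer, instance=True)

obs_mock.changed(52, 27)  # matches the real signature
obs_mock.changed.assert_called_once_with(52, 27)

try:
    obs_mock.changed(52)  # "cog" is missing, so the contract is broken
except TypeError as exc:
    print(f"Contract violation caught: {exc}")
```

A plain Mock() would have accepted obs_mock.changed(52) without complaint, which is exactly the drift this guards against.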

Running our tests

[Screenshot: running the tests]

Understanding the WHY and HOW here is really important, so go back and read this section twice if you need to.

Our final table looks like this:

Message | Query | Command
Incoming | Assert result | Assert direct public side effects
Sent-to-self | Ignore (optional) | Ignore (optional)
Outgoing | Ignore | Mock, Patch, Stub

Now that we’ve seen how to test objects, the fundamental level of unit testing, let’s move on to designing your own test strategy and deciding what to test.

Design A Test Strategy: Deciding What to Test

With a Test-Driven Development (TDD) approach in Python, you write your tests before your code.

If you’re not familiar with TDD, check out this article, which walks you through exactly how to get started.

Let’s assume you’ve tested your core entities/objects using the strategies above. What else should you test to ensure your application works as you expect?

Core Functionality and Features

Focus on the core functionality and features of your application.

Identify the parts of your application that are most critical to its operation and start by writing tests for these components.

These are the functions and methods that your application relies on to perform its primary tasks.

You could also adopt Behavior-Driven Development (BDD) to make sure your application behaves as it should, based on its features and design.

Boundary Conditions

Test the edges of your application’s input space.

This includes testing with minimum, maximum, and just outside acceptable input ranges.

Boundary conditions often reveal edge cases that you might not have considered during development.

Even better, you can use pytest’s built-in parametrization (@pytest.mark.parametrize) or a property-based testing library like Hypothesis to test edge cases and boundary conditions.
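Here’s a minimal sketch of boundary testing in plain Python. The is_valid_cog validator and its 11 to 36 range are hypothetical (the bicycle project doesn’t validate cogs); the idea is to test each limit exactly, plus the values just outside it.

```python
MIN_COG, MAX_COG = 11, 36  # hypothetical limits, not from the bicycle project


def is_valid_cog(cog: int) -> bool:
    """Hypothetical validator: accept cog sizes within [MIN_COG, MAX_COG]."""
    return MIN_COG <= cog <= MAX_COG


# Boundary cases: each limit, and the values just outside it
BOUNDARY_CASES = [
    (MIN_COG - 1, False),  # just below the minimum
    (MIN_COG, True),  # exactly the minimum
    (MAX_COG, True),  # exactly the maximum
    (MAX_COG + 1, False),  # just above the maximum
]


def test_cog_boundaries():
    for cog, expected in BOUNDARY_CASES:
        assert is_valid_cog(cog) is expected, f"cog={cog}"
```

With pytest, you’d typically pass BOUNDARY_CASES to @pytest.mark.parametrize("cog, expected", BOUNDARY_CASES) so each case is reported as its own test.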

Error Handling

Ensure that your application gracefully handles error conditions.

Write tests that simulate various error states to verify that your application responds appropriately, such as input validation errors or external service failures.

For example, if you’re building an API, make sure it gracefully handles client errors, server errors, timeouts, pagination edge cases, and so on.
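To simulate an external service failure without a real network call, you can make a test double raise. In this sketch, get_rider_name and the client’s get_rider method are hypothetical; Mock, side_effect, return_value, and assert_called_once_with come from the standard library’s unittest.mock.

```python
from unittest.mock import Mock


def get_rider_name(client, rider_id: int) -> str:
    """Hypothetical helper: fall back to a placeholder if the API times out."""
    try:
        return client.get_rider(rider_id)["name"]
    except TimeoutError:
        return "unknown rider"


def test_falls_back_when_api_times_out():
    client = Mock()
    client.get_rider.side_effect = TimeoutError  # simulate the external failure
    assert get_rider_name(client, rider_id=1) == "unknown rider"
    client.get_rider.assert_called_once_with(1)


def test_returns_name_on_success():
    client = Mock()
    client.get_rider.return_value = {"name": "Ada"}
    assert get_rider_name(client, rider_id=1) == "Ada"
```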

Performance Constraints

If your application has specific performance requirements, write tests that verify these constraints are met.

This could include testing response times for web applications and APIs, or processing times for data-intensive operations.
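A simple way to guard a performance requirement is to time the operation and assert that it stays within a budget. build_report and the 0.5-second budget are hypothetical. Timing varies across machines and CI runners, so keep budgets generous, or such tests will undermine the Stable goal from earlier.

```python
import time


def build_report(n: int) -> list:
    """Hypothetical data-intensive operation: sort n records."""
    return sorted(range(n, 0, -1))


def test_build_report_meets_time_budget():
    budget_seconds = 0.5  # hypothetical requirement
    start = time.perf_counter()  # monotonic, high-resolution clock
    report = build_report(100_000)
    elapsed = time.perf_counter() - start
    assert len(report) == 100_000
    assert elapsed < budget_seconds, f"build_report took {elapsed:.3f}s"
```

For more rigorous measurements, plugins such as pytest-benchmark are a common choice.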

Regression Tests

Whenever a bug is fixed, write a test that captures the bug’s specific scenario to prevent regressions in the future.

Maintain a growing suite of tests that cover previously discovered issues, ensuring that any changes to the code do not reintroduce old problems.
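A regression test pins down the exact scenario from a bug report. In this sketch, the average helper and its bug (an empty list raising ZeroDivisionError) are hypothetical.

```python
from typing import Sequence


def average(values: Sequence[float]) -> float:
    """Hypothetical helper. Bug fix: an empty sequence used to raise ZeroDivisionError."""
    if not values:
        return 0.0
    return sum(values) / len(values)


def test_average_of_empty_list_returns_zero():
    # Regression test: reproduces the exact input from the (hypothetical) bug report
    assert average([]) == 0.0


def test_average_of_normal_input_is_unchanged():
    # Guard the existing behaviour while fixing the bug
    assert average([10.0, 20.0]) == 15.0
```

Naming the test after the bug, or referencing the issue in a comment, makes it clear why the test exists and discourages anyone from deleting it.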

Security

Is security important? Does the application handle authentication, or build SQL queries from user input?

You can use ORMs (object-relational mappers) like SQLAlchemy or SQLModel to abstract that layer.

Test authentication and security thoroughly, as even small data leaks can really hurt your customers and your reputation.
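As a sketch of a security-focused test, the snippet below checks that a lookup uses a parameterized query, so a classic SQL injection payload is treated as plain data. find_user and the users table are hypothetical; it uses the standard library’s sqlite3 with an in-memory database, so the test stays fast and isolated.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, username: str) -> list:
    # Parameterized query: the driver passes username as data, never as SQL
    return conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    ).fetchall()


def test_find_user_resists_sql_injection():
    conn = sqlite3.connect(":memory:")  # throwaway database per test run
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])

    assert find_user(conn, "alice") == [(1, "alice")]
    # A classic injection payload must match nothing instead of every row
    assert find_user(conn, "' OR '1'='1") == []
    conn.close()
```

If find_user instead interpolated username into the SQL string (for example with an f-string), the second assertion would fail: the payload would turn the WHERE clause into a condition that is always true, returning every row.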

Separate Unit and Integration Tests

Unit tests and integration tests serve distinct purposes.

Unit tests are laser-focused on individual components, ensuring that each part performs as expected in isolation.

Integration tests, on the other hand, are concerned with the interactions between components, verifying that they work together as intended.

Typically, unit tests are fast and stable, providing immediate feedback.

Integration tests tend to be slower and more susceptible to external factors, given their reliance on other components, external systems, or services.

Separating them allows you to run the quick, reliable unit tests frequently, reserving the more comprehensive, though slower, integration tests for key moments in the development cycle.

When tests are well-organized, pinpointing the root cause of a failure becomes significantly easier.

Unit test failures usually indicate issues within the specific component tested, while integration test failures point to problems in the interaction between components.

This clear separation helps with better debugging and faster resolution.

Conclusion

Ok, it’s time to wrap this up.

I hope you enjoyed reading this article as much as I enjoyed researching and writing it.

Learning what to test in your application is arguably even more important than the act of testing itself. After all, tests only confirm that the code behaves the way the tests expect; if you test the wrong things, or expect the wrong behavior, no test can save you.

In this article, you learned how to test objects or entities based on messages (incoming, sent-to-self, and outgoing; queries and commands).

Sandi Metz’s talk was eye-opening: its concepts let us apply a testing strategy to our projects without writing tests for every function of every object, especially internal private methods.

You went through the practical implementation of the bicycle project with example code and tests.

Lastly, we covered how to design and prioritize your test strategy to produce a fast, stable, and robust application.

With this information, I hope you feel more confident in designing and implementing your test strategy.

If you have ideas for improvement or would like me to cover anything specific, please send me a message via Twitter, GitHub or Email.

Till the next time… Cheers!

Additional Reading

This article was made possible by the following amazing resources from their wonderful creators.

Talk from Sandi Metz
Example Code Used
Python Unit Testing Best Practices For Building Reliable Applications
An Ultimate Guide To Using Pytest Skip Test And XFail - With Examples
Introduction to Pytest Mocking - What It Is and Why You Need It
How To Practice Test-Driven Development In Python? (Deep Dive)
What Are Pytest Mock Assert Called Methods and How To Leverage Them