Pytest Testing Guide

I've learned the hard way that even for prototypes, tests play a crucial role. And despite everything out there, testing isn't easy.

This is a bit surprising given how many LLM applications are available for writing tests. But most of them are not production-ready, and the truth is that if you don't understand the basics of testing, you will just be stuck. These tools churn out a bunch of tests, but they mock what needs to be tested and test what needs to be mocked.

Overview

This guide covers the notes I prepared while building a testing strategy for Archie AI. I have tried to apply the same principles in this open source project as a reference.

1. Setup

Packages to Install

This is one of the easiest parts. Make sure you have the following packages installed.

pip install pytest pytest-mock pytest-asyncio pytest-cov mock

pytest.ini Configuration

This is the most important file. It tells pytest where to find your tests, what counts as a test, and how to run it.

We learned about it a bit late, and that created a lot of issues. This file offers a lot of flexibility. For example, even though our files don't start with test_, pytest still worked because we used a wildcard for Python files (see python_files below).

Create a pytest.ini file at the root of your project with the following configuration:

# This is the pytest configuration file. It tells pytest how to find and run your tests.

[pytest]
# This sets the base directory for Python imports to the current directory.
pythonpath = .

# These options make the test output more verbose and show print statements.
addopts = -v -s

# This specifies the directory where pytest will look for tests.
testpaths = tests

# These options enable logging to the command line interface and set the log level to INFO.
log_cli = True
log_cli_level = INFO

# This option automatically selects the appropriate mode for asyncio.
asyncio_mode = auto

# This tells pytest to ignore the 'mocks' directory when looking for tests.
norecursedirs = mocks

# These options define the naming conventions for test files, classes, and functions.
# (Keep comments on their own lines; inline comments become part of the value.)
# Any Python file can contain tests.
python_files = *
# Test classes should start with 'Test'.
python_classes = Test*
# Test functions should start with 'test_'.
python_functions = test_*

Folder Organization

While there are different conventions, I found the following structure to be most effective: mirror the project's structure in your tests/ folder.

We chose not to name each file with a test_ prefix, but that's just a personal choice. If you prefer the test_ prefix, it works with pytest's default configuration (no python_files override needed).

Option 1: when not using test_ prefix

project/
├── src/
│   └── your_module/
│       ├── __init__.py
│       └── your_code.py
├── tests/
│   ├── __init__.py
│   └── unittest/
│       └── your_module/
│           ├── __init__.py
│           └── your_code.py
├── pytest.ini
├── requirements.txt
└── setup.py

Option 2: when using test_ prefix

project/
├── src/
│   └── your_module/
│       ├── __init__.py
│       └── your_code.py
├── tests/
│   └── your_module/
│       └── test_your_code.py
├── pytest.ini
├── requirements.txt

2. Understanding Testing Concepts

Types of Tests

  • Unit Tests: Test individual functions or classes in isolation.
  • Integration Tests: Test the interaction between different modules or services.

Integration tests are often costly, so they typically run only on PR merge or push, whereas unit tests run frequently, perhaps on every commit. One common way to implement this split is with a marker, as sketched below.
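
Here's a minimal sketch of that split; the integration marker name is an assumption and should be registered under a markers key in pytest.ini to avoid warnings:

import pytest

def test_parse_config_defaults():
    # Unit test: fast and isolated; safe to run on every commit.
    ...

@pytest.mark.integration  # hypothetical marker name
def test_github_api_end_to_end():
    # Integration test: talks to real services. Run on PR merge with
    # `pytest -m integration`; skip locally with `pytest -m "not integration"`.
    ...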

Best Practices

Some tips you might want to follow:

Arrange-Act-Assert (AAA) structure

The idea of this structure is to make sure that your test is clear and easy to understand. It is just a suggestion and not a strict rule.

Here's an example of this structure:

def test_example():
    """
    This is a simple test function to demonstrate the Arrange-Act-Assert (AAA) structure.
    It tests the addition of two numbers.
    """
    # Arrange: Set up the initial conditions and inputs
    a = 1
    b = 2

    # Act: Execute the function or method under test
    c = a + b

    # Assert: Verify that the outcome is as expected
    assert c == 3

This just makes it an easy read for your fellow developers.

Test Isolation

As a best practice, you want to keep each test as small and focused as possible. This makes it easier to understand and maintain.

That means:

  • Don't test multiple things in a single test.
  • Don't use external dependencies in your tests; if you must, mock them.
  • Don't share state between tests.
  • Create and destroy any resources you need within the test, unless their state can't affect other tests (see the sketch below).
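
A minimal sketch of an isolated test using pytest's built-in tmp_path fixture:

def test_writes_report_to_disk(tmp_path):
    # tmp_path is a fresh temporary directory created for this test alone,
    # so no state leaks between tests and no manual cleanup is needed.
    report = tmp_path / "report.txt"
    report.write_text("ok")
    assert report.read_text() == "ok"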

Descriptive Test Names

When a test fails, the key information highlighted is the test name. A well-chosen name can save you a lot of time.

A general convention you can follow is:

test_[component]_[functionality]_[expected_behavior]

Examples

def test_github_repository_create_pull_request_with_installation_id():
    # component: github_repository, functionality: create_pull_request,
    # expected behavior: with_installation_id
    pass

def test_jira_repository_create_issue_with_project_and_summary():
    # component: jira_repository, functionality: create_issue,
    # expected behavior: with_project_and_summary
    pass

def test_jira_repository_create_issue_with_project_and_summary_and_description():
    # component: jira_repository, functionality: create_issue,
    # expected behavior: with_project_and_summary_and_description
    pass

3. Mocking and Patching

Mocking and patching are essential techniques for isolating the code under test by replacing dependencies with mock objects. This section covers both concepts to help you understand when and how to use each effectively.

Mocking Basics

Mocking is like creating pretend objects for testing. These pretend objects act like real parts of your program, but you can control how they behave. This is useful when you want to test one part of your code without worrying about other parts.

For example, imagine you're testing a function that sends an email. Instead of actually sending an email every time you run the test, you can create a mock email sender that pretends to send an email. This way, you can check if your function is trying to send the email correctly, without actually sending anything.
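
A minimal sketch of that idea; notify_user and the sender interface are made up for illustration:

from unittest.mock import Mock

def notify_user(email_sender, address):
    # The code under test delegates actual delivery to email_sender.
    email_sender.send(address, "Welcome!")

def test_notify_user_sends_welcome_email():
    fake_sender = Mock()  # the pretend email sender
    notify_user(fake_sender, "a@example.com")
    # Verify the email would have been sent, without sending anything.
    fake_sender.send.assert_called_once_with("a@example.com", "Welcome!")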

What to Mock

  • Isolate the Unit of Work: Only mock the parts of your system that interact with external resources or are outside the scope of the test. Avoid mocking everything, as this can lead to tests that pass without actually verifying the behavior of your code.
  • Focus on External Dependencies: Common candidates for mocking include:
    • I/O Operations: File systems, network calls, databases.
    • Third-Party Services: APIs, external libraries.
    • Time-Dependent Functions: time.sleep(), datetime.now().
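
For instance, here's a sketch of pinning down a time-dependent function (the function under test is made up for illustration):

import time
from unittest.mock import patch

def seconds_since_epoch():
    return int(time.time())

def test_seconds_since_epoch_is_deterministic():
    # Freeze time.time() so the test doesn't depend on the wall clock.
    with patch('time.time', return_value=1_000_000):
        assert seconds_since_epoch() == 1_000_000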

How to Mock

  • Choose the Right Mock Type:

    • Mock: A general-purpose mock object that can emulate any Python object. Use it when you need flexibility.
    • MagicMock: A subclass of Mock with preconfigured magic methods. Ideal for mocking objects that require magic methods like __str__, __len__, etc.
  • Understand Sync vs. Async:

    • Synchronous Code: Use Mock or MagicMock for standard functions and methods.
    • Asynchronous Code: Use AsyncMock to mock async functions and methods, ensuring they behave correctly with await.

We'll start with a simple example of mocking a function:

from unittest.mock import Mock

# Step 1: Create a mock object that will replace the real dependency.
dependency = Mock()

# Step 2: Define the behavior of the mock object.
# Here, we set the return value of the mock object to 42.
dependency.return_value = 42

def function_under_test():
    """
    This function represents the code we want to test.
    It calls the dependency and returns its result.
    Since we are using a mock, it will return the value we specified (42).
    """
    return dependency()

# Step 3: Use the mock object in the function under test.
result = function_under_test()

# Step 4: Verify the behavior of the function under test.
# The assert statement checks if the function_under_test returns 42 as expected.
assert result == 42

# Step 5: Verify that the mock object was called as expected.
# This ensures that the dependency was actually used in the function.
dependency.assert_called_once()

In this example, we create a mock object to replace a dependency. By setting the return_value to 42, we ensure that our function under test always receives a predictable value, regardless of the actual implementation of the dependency.

Using MagicMock

Occasionally, you may need to mock an object that includes magic methods such as __str__ or __len__. Here's an example of how to achieve this:

from unittest.mock import MagicMock

# Example: Mocking an object that needs __str__
mock_obj = MagicMock()
mock_obj.__str__.return_value = "Mocked Object"

assert str(mock_obj) == "Mocked Object"

Mocking Asynchronous Functions

When working with asynchronous code, you'll need to use AsyncMock. Here's an example:

import asyncio
from unittest.mock import AsyncMock

# Example: Mocking an async function
async_dependency = AsyncMock(return_value="Async Result")

async def async_function_under_test():
    return await async_dependency()

assert asyncio.run(async_function_under_test()) == "Async Result"
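
Since our pytest.ini sets asyncio_mode = auto (via pytest-asyncio), you can also write the async test directly and let pytest run the coroutine for you:

from unittest.mock import AsyncMock

async def test_async_function_under_test():
    # With asyncio_mode = auto, pytest-asyncio awaits this test for us.
    async_dependency = AsyncMock(return_value="Async Result")
    assert await async_dependency() == "Async Result"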

Simulating Exceptions with side_effect

side_effect is a feature of mock objects that lets you simulate an exception or return a sequence of values across successive calls.

Here's how you can use side_effect to simulate an exception:

from unittest.mock import Mock

# Example: Using side_effect to raise an exception
mock_func = Mock(side_effect=ValueError("An error occurred"))

try:
    mock_func()
except ValueError as e:
    assert str(e) == "An error occurred"

# Example: Using side_effect to return a sequence of values
mock_sequence = Mock(side_effect=[1, 2, 3])
assert mock_sequence() == 1
assert mock_sequence() == 2
assert mock_sequence() == 3

For a comprehensive guide on mocking in Python, refer to the official unittest.mock documentation.

Patching Techniques

Patching is like putting a temporary sticker over part of your code during a test. It lets you replace real parts of your program with pretend ones (mocks) just for the duration of the test.

Imagine your code uses a weather service to get the temperature. When testing, you don't want to actually connect to the weather service every time. Instead, you can "patch" the weather service part with a pretend one that always returns the temperature you want for your test.

There are two primary tools: patch from unittest.mock and pytest's monkeypatch fixture.

Using patch

The patch function from unittest.mock allows you to replace objects in specific modules.

from unittest.mock import patch

import module  # placeholder for the module whose ClassName we want to replace

@patch('module.ClassName')
def test_function(mock_class):
    module.ClassName()  # this now creates the mock, not the real class
    assert mock_class.called

Key Points:

  • Scope Control: Patches are applied for the duration of the test.
  • Method Replacement: Can mock methods, properties, and classes.

How Patching Works

The core principle of patching is to change the object that a name points to with another one, typically a mock object. Importantly, you patch where an object is looked up, not necessarily where it's defined. This distinction is crucial for effective patching.
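
For example, suppose a hypothetical mymodule.py imports a helper with from utils import get_data. The name that mymodule looks up at call time is mymodule.get_data, so that is what you patch:

from unittest.mock import patch

# Patching the lookup site works: mymodule's reference is replaced.
with patch('mymodule.get_data', return_value='fake'):
    ...

# Patching the definition site does not affect mymodule's already-imported copy.
with patch('utils.get_data', return_value='fake'):
    ...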

Patching Methods and Properties

You can patch methods and properties of objects as well:

from unittest.mock import PropertyMock, patch

# Patching a method
@patch.object(SomeClass, 'method_name')
def test_method(mock_method):
    instance = SomeClass()
    instance.method_name()
    mock_method.assert_called_once()

# Patching a property
@patch.object(SomeClass, 'property_name', new_callable=PropertyMock)
def test_property(mock_property):
    mock_property.return_value = 'mocked_value'
    instance = SomeClass()
    assert instance.property_name == 'mocked_value'

Patching Techniques: Decorators vs. Context Managers

Patching can be applied using decorators or context managers:

  1. Decorator approach (as seen in previous examples):

@patch('module.ClassName')
def test_function(mock_class):
    ...  # Test implementation

  2. Context manager approach:

def test_function():
    with patch('module.ClassName') as mock_class:
        ...  # Test implementation

Use decorators for patching throughout an entire test function. Context managers are preferable when you need more fine-grained control over when the patch is applied and removed within a test.

Order of Application in Nested Patches

When you stack multiple patch decorators, they are applied from the bottom up, and the mock objects are passed to the decorated function in that same order: the decorator closest to the function provides the first argument:

@patch('module.Class2')
@patch('module.Class1')
def test(mock_class1, mock_class2):
    module.Class1()
    module.Class2()
    assert mock_class1.called and mock_class2.called

In this example, the decorator for module.Class1 sits closest to the function, so it is applied first and mock_class1 is passed as the first argument to the test function.

Using monkeypatch

monkeypatch is a fixture provided by pytest that allows you to modify or replace attributes, dictionaries, environment variables, etc., in a more flexible and cleaner way.

According to Wikipedia, the term "monkeypatching" likely originated from "guerrilla patching" in software development, which referred to quick and informal fixes. Over time, "guerrilla" was mispronounced as "gorilla," and eventually evolved into "monkey." The term "monkey" also fits well with the concept of "monkeying around" or "monkey business," suggesting playful or mischievous alterations.

def mock_function():
    return "mocked"

def test_function_replacement(monkeypatch):
    import mymodule
    # Replace the original_function in mymodule with mock_function
    monkeypatch.setattr(mymodule, "original_function", mock_function)
    # Check if the replacement was successful
    assert mymodule.original_function() == "mocked"

Advantages of monkeypatch:

  • Cleaner Test Functions: Reduces the need for multiple decorators.
  • Automatic Cleanup: Ensures modifications are reverted after tests.
  • Flexibility: Easily modify environment variables, classes, functions, etc.

Changing Environment Variables

Monkeypatch is particularly useful for managing environment variables in tests:

import pytest

@pytest.fixture
def mock_env_vars(monkeypatch):
    """
    This fixture uses the monkeypatch fixture to temporarily set environment variables
    for the duration of the tests. It sets the DATABASE_URL to use an in-memory SQLite
    database and the API_KEY to a test value.
    """
    monkeypatch.setenv("DATABASE_URL", "sqlite:///:memory:")
    monkeypatch.setenv("API_KEY", "test_api_key")

def test_database_connection(mock_env_vars):
    """
    This test checks the database connection using the mocked DATABASE_URL environment variable.
    It ensures that the database URL is set to the in-memory SQLite database.
    """
    db = create_database_connection()
    assert db.url == "sqlite:///:memory:"

def test_api_client(mock_env_vars):
    """
    This test checks the API client using the mocked API_KEY environment variable.
    It ensures that the API client's key is set to the test value.
    """
    client = APIClient()
    assert client.api_key == "test_api_key"

Reducing Code Duplication by Adding to a Fixture

One of the key advantages of monkeypatch is its ability to reduce code duplication across tests. Instead of adding decorators to every test, you can use a fixture:

import pytest

import myapp  # the application module whose external API call we patch

@pytest.fixture(autouse=True)
def mock_external_api(monkeypatch):
    def mock_api_call(*args, **kwargs):
        return {"status": "success", "data": "mocked data"}

    monkeypatch.setattr("myapp.external_api.make_call", mock_api_call)

# This mock will be automatically applied to all tests in this module
def test_using_external_api():
    result = myapp.process_external_data()
    assert result == "processed: mocked data"

When to Use Each

  • Use patch when:

    • You need detailed control over mock object behavior.
    • Patching is specific to a single test.
    • Working with complex mock objects requiring specific method behaviors.
  • Use monkeypatch when:

    • Modifying environment variables or system-level attributes.
    • Applying the same modifications across multiple tests.
    • Simplifying test setup with straightforward attribute/function replacements.

Advanced Patching Examples

Here's a more complex scenario involving multiple stacked patches:

from unittest.mock import MagicMock, patch

# Pre-built mocks returned by the patched constructors (placeholders for this sketch).
mock_repo = MagicMock()
mock_rag = MagicMock()
mock_llm = MagicMock()

@patch('src.git_providers.repository_factory.RepositoryFactory.create_repository', return_value=mock_repo)
@patch('src.rag.rag.Rag', return_value=mock_rag)
@patch('src.llm.llm.AnthropicLLM', return_value=mock_llm)
@patch('yaml.safe_load')
def test_load_tracker(mock_yaml_load, mock_anthropic, mock_rag_cls, mock_repo_factory):
    # Arguments arrive in bottom-up decorator order: yaml.safe_load's mock first.
    ...  # Test implementation

4. Fixtures

Using yield in Fixtures

Use yield in fixtures when you need to perform actions both before and after a test runs. This is particularly useful for resources that require explicit teardown, such as:

  • Database Connections: Establish a connection before the test and close it afterward.
  • File Handling: Open a file for writing in setup and ensure it's properly closed in teardown.
  • External Services: Start a mock server or service before the test and shut it down afterward.

import pytest

@pytest.fixture
def resource_setup_teardown():
    # Setup code
    resource = create_resource()
    yield resource
    # Teardown code
    resource.cleanup()

Fixture Scopes

Pytest offers different scopes for fixtures:

  • function (default): Run once per test function
  • class: Run once per test class
  • module: Run once per module
  • package: Run once per package
  • session: Run once per test session

Choose the scope based on the resource lifecycle and test requirements:

@pytest.fixture(scope="module")
def database_connection():
    conn = create_db_connection()
    yield conn
    conn.close()

Use broader scopes (e.g., module or session) for expensive setup operations, but be cautious of potential state sharing between tests.

When to Use yield vs. return

  • Use yield when you need to perform cleanup actions after the test.
  • Use return when no cleanup is necessary, and you're simply providing a value.

@pytest.fixture
def simple_data():
    return {"key": "value"}  # No cleanup needed

5. Running Tests

This is one of the easiest parts. Just run the following commands:

General Commands

pytest
  • Description: Executes all the test cases in your project.
  • Usage: Navigate to your project's root directory in the terminal and run the pytest command. Pytest will automatically discover and run all tests that match the configured naming patterns (by default, test_*.py or *_test.py).

Run Tests with Coverage Analysis

pytest --cov=src
  • Description: Runs all tests and generates a coverage report for the specified source directory (src in this case).
  • Usage: This command not only runs your tests but also checks which parts of the code are covered by the tests.
  • Output: After execution, you'll receive a coverage summary indicating the percentage of code covered by tests. For a more detailed report, you can add the --cov-report option, such as --cov-report=html to generate an HTML report.

Run a Specific Test File

pytest tests/unittest/connectors/git/github_repository.py
  • Description: Executes all tests within the specified test file.
  • Usage: Replace the file path with the path to the test file you wish to run.

Run a Specific Test Function

pytest tests/unittest/connectors/git/github_repository.py -k test_stage_and_commit_and_push_with_installation_id
  • Description: Runs a specific test function within a test file.
  • Usage: Use the -k option followed by the test function name to target specific tests.

Run Tests with a Specific Marker

pytest -m "marker_name"
  • Description: Executes all tests that are decorated with a specific marker.
  • Usage: Decorate your tests with @pytest.mark.marker_name to categorize them, then use the -m option to run only those tests.
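
For example (the slow marker is an assumption; register it under a markers key in pytest.ini to avoid warnings):

import pytest

@pytest.mark.slow  # hypothetical marker name
def test_full_repository_scan():
    ...

# Run only the marked tests:   pytest -m slow
# Run everything else:         pytest -m "not slow"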

Run Failed Tests from Last Run

pytest --last-failed
  • Description: Re-runs only the tests that failed during the previous test run.
  • Usage: This is useful for quickly iterating on fixing tests without running the entire test suite again.

Viewing Coverage Reports with Python HTTP Server

After generating a coverage report in HTML format, you can serve it locally using Python's built-in HTTP server for easy viewing in your web browser.

  1. Generate HTML Coverage Report

    pytest --cov=src --cov-report=html
    
    • Description: Generates an HTML coverage report in the htmlcov directory.
  2. Navigate to the Coverage Report Directory

    cd htmlcov
    
  3. Start Python HTTP Server

    python -m http.server 8000
    
    • Description: Starts a local HTTP server on port 8000 serving the current directory.
  4. View Coverage Report in Browser

    Open your web browser and navigate to http://localhost:8000. Click on index.html to view the detailed coverage report.

    • Note: Ensure that port 8000 is not in use by another application. You can change the port number if necessary.

Additional Tips

  • Verbose Output

    For more detailed test output, use the -v option:

    pytest -v
    
  • Running Tests with a Specific Keyword

    To run tests that match a specific keyword in their names or markers:

    pytest -k "keyword"
    
  • Excluding Files or Directories

    To exclude certain files or directories from testing, you can add a configuration in your pytest.ini (such as norecursedirs) or use command-line options, as shown below.
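
    For example, the --ignore option skips a path at the command line (the path below is illustrative):

    pytest --ignore=tests/integration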

  • Parallel Test Execution

    To speed up test execution by running tests in parallel, consider using the pytest-xdist plugin:

    pytest -n auto
    
    • Description: Automatically detects the number of available CPU cores and runs tests in parallel.
  • Caching Test Results

    Pytest can cache previous test results (in .pytest_cache directory) to optimize subsequent test runs. Use the --cache-show and --cache-clear options to view and manage the cache.

Conclusion

This is just a basic guide to writing effective tests in Python. There is a lot more to it, but this is a good start, and your tests will evolve as your code evolves.

Happy testing!

Glossary

  • pytest: A testing framework for Python that simplifies test writing and execution.
  • fixture: A function that provides data or objects to tests, often used for setup and teardown.
  • assert: A statement that checks whether a condition is true, raising an exception if it's false.
  • mock: An object that simulates the behavior of real objects in controlled ways for testing.
  • patch: A method to temporarily replace or modify objects, functions, or classes during testing.
  • monkeypatch: A pytest fixture that helps modify or replace attributes, dictionaries, or environment variables.
  • coverage: A measure of how much of your code is executed by your tests.
  • AAA (Arrange-Act-Assert): A pattern for structuring test cases to improve readability and maintainability.
  • test isolation: The practice of ensuring each test is independent and doesn't affect other tests.
  • pytest.ini: A configuration file for pytest that sets up test discovery and execution options.
  • marker: A decorator used to add metadata to test functions for categorization or selective execution.
  • parametrize: A feature in pytest to run the same test function with different input values.
  • conftest.py: A special file recognized by pytest for sharing fixtures across multiple test files.
  • test discovery: The process by which pytest automatically finds and collects test files and functions.
  • xfail: A marker to indicate that a test is expected to fail, useful for documenting known issues.