Testing Guide¶
This is the practical companion to the testing philosophy. It covers where tests live, what fixtures are available, and how to add tests when you introduce new actions.
Running tests¶
uv run pytest -v # All tests (core + packages)
uv run pytest tests/safety # Safety tests only
uv run pytest packages/assistant-email/tests/ # Email tests in isolation (no app config)
uv run pytest packages/assistant-gemini/tests/ # Gemini tests in isolation
uv run pytest packages/assistant-github/tests/ # GitHub tests in isolation
uv run pytest packages/assistant-sdk/tests/ # SDK tests in isolation
CI runs on GitHub Actions (.github/workflows/test.yml): checkout, setup uv, sync, pytest.
Test organization¶
tests/
conftest.py # Shared fixtures
test_actions.py # Shared action partitioning, input resolution, template rendering
test_chat.py # ChatService conversation lifecycle and chat_message_handler
test_chat_routes.py # Chat API endpoint request/response tests
test_cli.py # CLI parser routing, setup wizard, doctor subcommands
test_config.py # YoloAction tag parsing, ScriptConfig, QueuePolicyConfig models
test_llm.py # LLM conversation, schema validation, structured output retry
test_loader.py # Integration discovery, manifest parsing, dynamic model construction
test_queue.py # Queue lifecycle + stateful property tests
test_queue_policy.py # Dedup, rate limiting, fingerprint, policy resolution
test_result_routes.py # Service result routing (note persistence, custom paths, fallbacks)
test_scheduler.py # interval_to_cron conversion and range validation
test_script_execution.py # Script executor, output capture, input injection, timeout handling
test_sdk.py # SDK public API imports, ServiceManifest defaults, runtime registration
test_services.py # Service manifest parsing, handler registration, enqueue with on_result
test_store.py # NoteStore CRUD + archive
test_supervisor.py # Process supervisor lifecycle, sentinel file, restart endpoint
test_ui_presenters.py # UI presenters: secret masking, classification/automation formatting
test_ui_routes.py # Web UI route handlers, secret masking in rendered pages
test_worker.py # Worker dispatch, lifecycle (happy/fail/routing), stale recovery
test_yaml_rw.py # YAML config round-trip read/write, comment preservation
safety/
test_automation_invariants.py # Property tests: all possible classifications
test_chaos.py # Chaos tests: garbage LLM output
test_provenance.py # Provenance derivation + safety validation
test_reference_validation.py # Script/service reference existence warnings
packages/assistant-email/tests/
test_act.py # Action execution, allowlist enforcement
test_check.py # Window parsing, inbox fetch ordering, IMAP criteria
test_classify.py # Condition matching, operators, schema building
test_email_store.py # EmailStore message-ID sanitization, dedup across subdirs
test_evaluate.py # Snapshot construction, resolver patterns, automation evaluation
test_mail_parsing.py # Header parsing (auth, unsubscribe, dates, calendar)
packages/assistant-gemini/tests/
test_client.py # GeminiClient two-pass flow (mocked)
test_web_research.py # Service handler with/without output_schema
packages/assistant-github/tests/
test_client.py # GitHub API client parsing, search dedup, PR status derivation
test_entity_store.py # GitHubEntityStore: save, find, find_anywhere, move_to_synced
test_evaluate.py # PR/Issue snapshot construction, resolver, automation evaluation
test_issue_store.py # IssueStore save, field mappings, URL generation, defaults
test_pr_store.py # PullRequestStore save, field mappings, URL generation
packages/assistant-sdk/tests/
test_action_partitioning.py # Action detection, input resolution, script/service partitioning
test_classify.py # Schema building (confidence/boolean/enum), Jinja2 environment
test_evaluate.py # Automation evaluation engine: conditions, operators, dedup
test_manifest.py # Manifest dataclasses (Service, Platform, Integration)
test_models.py # Pydantic config models: YoloAction, ClassificationConfig
test_note_store.py # NoteStore CRUD, archive, directory creation
test_provenance.py # Provenance resolution from condition keys (rule/llm/hybrid)
test_runtime.py # Runtime registration pattern and RuntimeNotRegistered guards
test_task.py # TaskPayload and TaskRecord TypedDict smoke tests
Package tests import from assistant_sdk.* directly. They run without loading the app config singleton, which means they can be executed in isolation.
Fixtures¶
Defined in tests/conftest.py:
queue_dir- Creates an isolated temp directory with queue subdirectories (pending/,active/,done/,failed/) and monkeypatchesqueue.BASE_DIRto point at it. Each test gets a clean queue.notes_dir- Isolated temp directory for NoteStore operations.- Config bootstrap - If
config.yamldoesn't exist when tests run,conftest.pycreates a minimal one automatically. This is needed because config loads eagerly at import time.
Safety tests¶
Property-based tests (test_automation_invariants.py)¶
Uses Hypothesis to generate 500 random classification outputs per test and asserts:
- Only known actions appear in the output (the
ALLOWED_ACTIONSset) - The number of actions produced is bounded
- Missing classification keys never trigger automations (safe default behavior)
Chaos tests (test_chaos.py)¶
Injects specific fault patterns at the classification level and asserts that safety boundaries still hold. Each chaos scenario produces only allowed actions with no crashes. See the testing philosophy for the full list of fault patterns.
Provenance tests (test_provenance.py)¶
Tests the provenance derivation system (resolve_provenance) and the startup safety validation (_validate_automation_safety). Verifies that irreversible actions from non-deterministic provenance are blocked unless !yolo is set. Covers script actions: scripts from LLM provenance are blocked by default, allowed with !yolo or when the script has reversible: true. Operates at the platform level, matching how automations are configured per-platform in the config.
Stateful queue testing (test_queue.py)¶
QueueStateMachine uses Hypothesis RuleBasedStateMachine to randomly interleave enqueue, dequeue, complete, and fail operations. After every step, two invariants are checked:
- Total task count is conserved (tasks don't appear or disappear)
- No task ID appears in two directories simultaneously
Filesystem snapshot pattern¶
The standard way to assert on queue state after a lifecycle:
def snapshot_tree(base: Path) -> dict:
counts = {}
for subdir in sorted(base.iterdir()):
if subdir.is_dir():
counts[subdir.name] = len(list(subdir.iterdir()))
return {"counts": counts, "total": sum(counts.values())}
Assert on the whole tree, not individual files. After enqueue -> dequeue -> complete: pending is empty, active is empty, done has one file, total is conserved.
Checklist: adding a new action¶
-
Categorize the reversibility tier. Read-only, soft reversible, hard reversible, or irreversible. This comes first. See the safety model for tier definitions.
-
Add the action. For platform-specific actions: add to
SIMPLE_ACTIONSor dict action handling inact.py. For cross-cutting actions (like scripts): add to the shared action layer inassistant_sdk/actions.py. For services: declare in the integration'smanifest.yamlunderservices:. TheSIMPLE_ACTIONSset must not grow without deliberate reversibility review. -
Write tests matching the tier. Read-only gets unit tests. Soft reversible gets filesystem snapshot assertions. Hard reversible gets shadow/dry-run verification. Irreversible gets property-based safety invariants and mandatory dry run.
-
Add property-based coverage. Update
test_automation_invariants.pyso the Hypothesis-generated classifications can produce your new action. Add it to theALLOWED_ACTIONSset. -
Add chaos coverage. Update
test_chaos.pyto verify your action behaves correctly under garbage classification inputs. Add it to theALLOWED_ACTIONSset there too. -
Log it. Use
log.human()when the action fires so it appears in the daily audit log.