Files

Joachim Hummel e51c5184e9 docs: update AGENTS.md with comprehensive repository guidelines

2025-11-13 23:30:23 +00:00

18 KiB

Raw Blame History

Repository Guidelines for AI Agents

Project Overview

This is a Python password hashing utility supporting multiple cryptographic algorithms (PBKDF2, Argon2, bcrypt). The project provides both library functions and two CLI implementations with different UX patterns.

Key Architecture:

Algorithm Registry Pattern: Pluggable hash algorithms implementing a common Algorithm protocol
Auto-registration: Algorithms register themselves on module import
Two CLIs: salt.py (shortcut syntax) and salt2.py (explicit commands)
Backward compatible: Default is PBKDF2 for compatibility with legacy code

Module Structure:

salt.py               # Core library + CLI with shortcut syntax
salt2.py              # Alternative CLI with explicit subcommands
algorithms.py         # Algorithm registry and Protocol definition
pbkdf2_algorithm.py   # PBKDF2-HMAC-SHA256 implementation
argon2_algorithm.py   # Argon2id implementation
bcrypt_algorithm.py   # bcrypt implementation
tests/
  test_hashing.py     # Tests for core hash/verify functions
  test_cli.py         # Tests for salt.py CLI behavior
  test_algorithms.py  # Tests for algorithm implementations

Essential Commands

Running the Tools

# Hash a password with default algorithm (PBKDF2)
python3 salt.py "MyPassword"                    # Shortcut syntax
python3 salt.py hash "MyPassword"               # Explicit command
python3 salt2.py generate "MyPassword"          # Alternative CLI

# Hash with specific algorithm
python3 salt.py hash --algorithm argon2 "MyPassword"
python3 salt.py --algorithm bcrypt "MyPassword"  # Shortcut with algorithm
python3 salt2.py generate "MyPassword"           # Note: salt2.py doesn't support --algorithm yet

# Verify a password
python3 salt.py verify "MyPassword" <salt_b64> <hash_b64>
python3 salt.py verify --algorithm argon2 "MyPassword" "" <hash_b64>

# List available algorithms
python3 salt.py list-algorithms  # Not implemented yet, would be useful addition

Testing

# Run all tests (20 tests total as of now)
python3 -m pytest

# Verbose output
python3 -m pytest -v

# Run specific test files
python3 -m pytest tests/test_hashing.py
python3 -m pytest tests/test_cli.py
python3 -m pytest tests/test_algorithms.py

# Run specific test patterns
python3 -m pytest -k verify
python3 -m pytest -k algorithm

# Run with verbose failure details
python3 -m pytest --maxfail=1 -vv

# Run tests with coverage (if pytest-cov installed)
python3 -m pytest --cov=. --cov-report=term-missing

Development Workflow

# Install dependencies
pip install -r requirements.txt

# Quick smoke test of CLI
python3 salt.py "test" && echo "salt.py works"
python3 salt2.py generate "test" && echo "salt2.py works"

# Test algorithm round-trip
salt_hash=$(python3 salt.py "secret" | tail -1 | cut -d' ' -f2)
# Extract salt and hash from output for verify command

Code Organization and Patterns

Algorithm Registry Pattern

How it works:

algorithms.py defines Algorithm Protocol with hash(), verify(), and identifier attribute
Each algorithm module creates a class implementing the protocol
At module bottom, algorithm imports register_algorithm and registers itself
algorithms.py imports all algorithm modules at bottom to trigger registration
Core functions use get_algorithm(name) to retrieve registered implementations

Adding a new algorithm:

Create <name>_algorithm.py implementing Algorithm protocol
Add register_algorithm(<Name>Algorithm()) at bottom of file
Import module at end of algorithms.py: import <name>_algorithm # noqa: E402, F401
Add tests to tests/test_algorithms.py
Update CLI choices in _build_parser() if using explicit choices (currently uses dynamic list)

Import Pattern for Registration

Critical: The algorithm modules use a specific import pattern to avoid circular dependencies:

# At BOTTOM of algorithm module (e.g., pbkdf2_algorithm.py)
from algorithms import register_algorithm

# Auto-register when module is imported
register_algorithm(PBKDF2Algorithm())

In algorithms.py:

# At BOTTOM of file
# Import to auto-register algorithms
import pbkdf2_algorithm  # noqa: E402, F401
import argon2_algorithm  # noqa: E402, F401
import bcrypt_algorithm  # noqa: E402, F401

The # noqa: E402, F401 comments suppress linter warnings about:

E402: module level import not at top of file (intentional for registration)
F401: imported but unused (side-effect import for registration)

Salt Handling Across Algorithms

IMPORTANT difference between algorithms:

PBKDF2: Generates separate 16-byte salt with os.urandom(), returns (salt_b64, hash_b64) where both are populated
Argon2: Embeds salt in hash string (e.g., $argon2id$v=19$m=65536,t=3,p=4$...), returns ("", hash_b64) with empty salt
bcrypt: Embeds salt in hash string (e.g., $2b$12$...), returns ("", hash_b64) with empty salt

When verifying Argon2/bcrypt passwords, the salt_b64 parameter is ignored since the salt is embedded in hash_b64.

CLI Argument Normalization (salt.py only)

salt.py has special shortcut syntax via _normalize_args():

# These are equivalent:
python3 salt.py "mypassword"
python3 salt.py hash "mypassword"

# Algorithm shortcut also works:
python3 salt.py --algorithm argon2 "mypassword"
# Becomes: python3 salt.py hash --algorithm argon2 "mypassword"

Normalization rules:

If first arg is hash or verify → pass through unchanged
If first arg is -h or --help → pass through unchanged
If first arg is --algorithm or -a → prepend hash
Otherwise (plain password) → prepend hash

Tested in: tests/test_cli.py::test_main_supports_hash_shortcut() and test_main_hash_algorithm_shortcut()

Core Functions

hash_password(password, *, algorithm="pbkdf2", iterations=None, salt_bytes=16) -> tuple[str, str]

Located in salt.py lines 19-30
Returns (salt_b64, hash_b64) tuple
For Argon2/bcrypt: salt_b64 will be empty string
Delegates to algorithm implementation via get_algorithm(algorithm).hash(...)

verify_password(password, salt_b64, hash_b64, *, algorithm="pbkdf2", iterations=None) -> bool

Located in salt.py lines 33-45
Returns True on match, False on mismatch or invalid input
Never raises exceptions (catches binascii.Error, ValueError)
Uses hmac.compare_digest() for timing-safe comparison (PBKDF2)
Delegates to algorithm implementation via get_algorithm(algorithm).verify(...)

Naming Conventions and Style

Python Style

Target version: Python 3.11+ (using | union types, structural pattern matching potential)
Actual version in use: Python 3.12.3 (per pytest output)
Indentation: 4 spaces (never tabs)
Type hints: Required for all public functions, use from __future__ import annotations for forward refs
Function/variable names: lowercase_with_underscores
Class names: CapWords (e.g., PBKDF2Algorithm, Argon2Algorithm)
Private/internal functions: Prefix with _ (e.g., _build_parser(), _normalize_args())

Naming Patterns Observed

Module names: <algorithm>_algorithm.py (e.g., pbkdf2_algorithm.py) Class names: <Algorithm>Algorithm (e.g., PBKDF2Algorithm, BcryptAlgorithm) Test files: test_<feature>.py (e.g., test_hashing.py, test_algorithms.py) Test functions: test_<behavior>_<context>() with docstrings explaining intent

Constants

DEFAULT_SALT_BYTES = 16
DEFAULT_ITERATIONS = int(os.environ.get("PBKDF2_ITERATIONS", "200000"))

Import Order

from __future__ import annotations (always first if present)
Standard library imports
Third-party imports (argon2, bcrypt)
Local imports (at bottom for registration side-effects)

Testing Guidelines

Test Structure

Coverage goal: >90% for hashing utilities

Test organization:

test_hashing.py: Core hash_password() and verify_password() functions
test_cli.py: CLI behavior for salt.py (including shortcut syntax)
test_algorithms.py: Individual algorithm implementations

Testing Patterns

Round-trip tests (most important):

def test_<algo>_algorithm_hash_round_trip():
    """Verify <algo> algorithm can hash and verify passwords."""
    algo = get_algorithm("<algo>")
    salt, hashed = algo.hash("test password")
    assert algo.verify("test password", salt, hashed)
    assert not algo.verify("wrong password", salt, hashed)

Property tests:

Base64 validation for outputs (see test_hash_password_returns_base64)
Invalid input handling (see test_verify_password_handles_invalid_base64)
Algorithm-specific behavior (see test_pbkdf2_algorithm_respects_iterations)

CLI tests:

def test_main_<command>_<scenario>():
    """Test description."""
    assert main([<args>]) == <expected_exit_code>

Exit codes: 0 = success, 1 = verification failure

Test Execution Notes

20 tests total (as of current state)
Test execution time: ~5 seconds (mostly from Argon2/bcrypt hashing)
All tests must pass before committing
Use pytest -v for verbose output
Use pytest -k <pattern> to run subset

Adding Tests for New Algorithms

When adding a new algorithm, add these tests to tests/test_algorithms.py:

def test_<algo>_algorithm_hash_round_trip():
    """Verify <algo> algorithm can hash and verify passwords."""
    algo = get_algorithm("<algo>")
    salt, hashed = algo.hash("test password")
    assert algo.verify("test password", salt, hashed)
    assert not algo.verify("wrong password", salt, hashed)

def test_<algo>_algorithm_identifier():
    """Verify <algo> algorithm has correct identifier."""
    algo = get_algorithm("<algo>")
    assert algo.identifier == "<algo>"

And to tests/test_cli.py:

def test_main_hash_with_algorithm_<algo>():
    """Test hash command with --algorithm <algo>."""
    assert main(["hash", "--algorithm", "<algo>", "test-password"]) == 0

Important Gotchas and Non-Obvious Patterns

1. Algorithm Registration Order Matters

The imports at the bottom of algorithms.py trigger registration. If you import algorithms module before the algorithm modules are imported, the registry will be empty. This is handled correctly in the current code, but be careful when refactoring.

2. CLI Output to Stdout (Tests Must Capture)

The CLI functions print to stdout. Tests use main([...]) to check exit codes but don't capture output. If you need to test output content, use capsys fixture:

def test_output_format(capsys):
    main(["hash", "test"])
    captured = capsys.readouterr()
    assert "Salt:" in captured.out
    assert "Hash:" in captured.out

3. Empty Salt for Argon2/bcrypt

When working with Argon2 or bcrypt hashes:

The first element of the tuple is always "" (empty string)
The second element contains the full hash with embedded salt
When verifying, pass "" for salt or any string (it's ignored)
Tests should handle this: salt, hashed = algo.hash("pw") where salt == ""

4. Environment Variable for Iterations

PBKDF2_ITERATIONS environment variable affects default behavior. Tests should be aware this can change behavior:

DEFAULT_ITERATIONS = int(os.environ.get("PBKDF2_ITERATIONS", "200000"))

5. German UI Text

CLI output and help text are in German:

"✓ Passwort korrekt" / "✗ Passwort falsch"
Help text uses German descriptions
Comments in code are mixed German/English
When adding CLI features, maintain German for user-facing text

6. Exception Handling in Verify

verify_password() catches exceptions and returns False rather than raising:

try:
    salt = base64.b64decode(salt_b64, validate=True)
    stored_hash = base64.b64decode(hash_b64, validate=True)
except (binascii.Error, ValueError):
    return False

This is intentional for security (avoid leaking information via exception types).

7. Dynamic Algorithm List

The CLI parser uses list_algorithms() to populate choices dynamically:

hash_parser.add_argument(
    "--algorithm",
    "-a",
    choices=list_algorithms(),  # Dynamic based on registered algorithms
    default="pbkdf2",
    help="Hash-Algorithmus (Standard: pbkdf2)",
)

This means adding a new algorithm automatically adds it to CLI choices.

8. salt2.py Missing Algorithm Support

Current limitation: salt2.py does NOT yet support the --algorithm flag. It only uses the default (PBKDF2). To add support:

Update _build_parser() to add --algorithm argument to both subparsers
Update _command_generate() to pass algorithm=args.algorithm
Update _command_verify() to pass algorithm=args.algorithm
Add tests to tests/test_cli.py or create tests/test_cli2.py

9. Relative Imports vs Absolute Imports

The code uses absolute imports (not relative):

from algorithms import get_algorithm  # Not: from .algorithms import get_algorithm

This works because modules are in the root directory, not a package. Keep this pattern consistent.

Security Practices

Cryptographic Randomness

Always use os.urandom() for salt generation (never random module)
Never use predictable values or timestamps

Timing Attack Protection

Use hmac.compare_digest() for hash comparison (PBKDF2)
Argon2 and bcrypt libraries handle timing safety internally

Input Validation

Use base64.b64decode(data, validate=True) to validate base64 format
Catch and handle binascii.Error and ValueError
Return False rather than raising exceptions in verify functions

Iteration Count

Default 200,000 for PBKDF2 (OWASP recommended as of 2023)
Configurable via PBKDF2_ITERATIONS environment variable
Always parameterize in functions (don't hardcode)

No Logging of Secrets

Never log passwords, salts, or hashes
Print statements only in CLI code for user output
Tests don't need to capture sensitive values

Configuration and Environment

Environment Variables

PBKDF2_ITERATIONS

Default: "200000" (as string, converted to int)
Used by: pbkdf2_algorithm.py and salt.py
Example: export PBKDF2_ITERATIONS=300000

Dependencies (requirements.txt)

pytest>=7.4              # Testing framework
argon2-cffi>=23.1.0      # Argon2id implementation
bcrypt>=4.1.0            # bcrypt implementation

Standard library only: hashlib, hmac, os, base64, binascii, argparse, sys

Python Version Requirements

Minimum: Python 3.11 (for | union type syntax)
Tested on: Python 3.12.3
Type hints: Using modern syntax (int | None, not Optional[int])

Commit Guidelines

Commit Message Format

Use conventional commit format with imperative mood:

<type>: <short description>

<optional longer description>

<optional footer>

Types in use:

feat: - New features (e.g., "feat: add argon2 algorithm")
fix: - Bug fixes
test: - Test additions/changes
refactor: - Code restructuring without behavior change
docs: - Documentation updates

Examples from Git History

docs: add README.md for repository
docs: translate CLAUDE.md to German
first commit

Before Committing

Run full test suite: python3 -m pytest -v
Ensure all tests pass (20/20)
Verify no regressions in existing functionality
Check that new code follows style conventions
Add tests for new functionality

Pull Request Guidelines

Keep PRs focused on single feature/fix
Include motivation in description
Reference issues with Fixes #ID
For security changes: include CLI output or reproduction
All tests must pass
Maintain >90% coverage

Planned Enhancements and Known Gaps

Based on existing documentation and code:

Missing Features (mentioned in plans but not implemented)

list-algorithms command - Mentioned in plan (Task 9) but not in current code
salt2.py algorithm support - salt2.py doesn't support --algorithm flag yet
Integration tests - tests/test_integration.py mentioned in plan but not created
Backward compatibility test - Specific test mentioned in Task 12 plan

Documentation Gaps

README.md is in German but doesn't mention multi-algorithm support yet
CLAUDE.md is German translation but may be outdated vs English version
No API docs for algorithm developers

Potential Improvements

Add list-algorithms subcommand to CLIs
Complete algorithm support in salt2.py
Add integration tests with subprocess
Document algorithm addition process more clearly
Consider packaging as proper Python package (salt/ directory structure)

Working with Plans and Documentation

Plan Execution Pattern

The repository includes a detailed plan at docs/plans/2025-11-13-multi-algorithm-support.md:

12 tasks with step-by-step TDD approach
Each task has verification steps
Plan includes commit message examples
Some tasks completed, others not yet

If implementing from plans:

Read the full task before starting
Follow TDD: failing test first, then implementation
Run tests after each step
Use suggested commit messages
Mark off checklist items as you complete them

Multiple Documentation Files

AGENTS.md: This file - agent/developer guidelines
CLAUDE.md: German translation, more detailed code examples
README.md: User-facing documentation in German
docs/plans/*.md: Implementation plans with TDD steps

Keep these in sync when making significant changes.

Quick Reference

File a Bug or Add a Feature

Create test demonstrating issue/feature
Implement fix/feature
Run python3 -m pytest -v
Commit with conventional message
Update docs if needed

Debug a Test Failure

# Run specific test with verbose output
python3 -m pytest tests/test_file.py::test_name -vv

# Run with pdb on failure
python3 -m pytest --pdb tests/test_file.py::test_name

# See print statements
python3 -m pytest -s tests/test_file.py

Verify All Systems

# Comprehensive check
python3 -m pytest -v && \
python3 salt.py "test" && \
python3 salt.py hash --algorithm argon2 "test" && \
python3 salt.py hash --algorithm bcrypt "test" && \
python3 salt2.py generate "test" && \
echo "All systems operational"

Document Version: 2025-11-13 Last Updated: Initial comprehensive analysis based on repository state Test Count: 20 tests (all passing) Python Version: 3.12.3

18 KiB Raw Blame History