From e51c5184e96a7f81463939f53154835d1a57c6c9 Mon Sep 17 00:00:00 2001 From: Joachim Hummel Date: Thu, 13 Nov 2025 23:30:23 +0000 Subject: [PATCH] docs: update AGENTS.md with comprehensive repository guidelines --- AGENTS.md | 562 ++++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 545 insertions(+), 17 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index dbf0eac..b905553 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,25 +1,553 @@ -# Repository Guidelines +# Repository Guidelines for AI Agents -## Project Structure & Module Organization -The repo is intentionally small: `salt.py` is a reference implementation that demonstrates secure PBKDF2 hashing, while `salt2.py` wraps the same logic in a CLI with verify mode. Keep new helpers alongside these modules only if they remain single-file utilities; otherwise scaffold a package under `salt/` and mirror tests in `tests/`. Generate derived assets (sample salts, fixtures) in `artifacts/` and never commit secrets or real user data. +## Project Overview -## Build, Test, and Development Commands -Run quick manual checks directly: -```bash -python3 salt.py "MySecret" # prints salt + hash -python3 salt2.py verify # exit 0 on success +This is a Python password hashing utility supporting multiple cryptographic algorithms (PBKDF2, Argon2, bcrypt). The project provides both library functions and two CLI implementations with different UX patterns. + +**Key Architecture:** +- **Algorithm Registry Pattern**: Pluggable hash algorithms implementing a common `Algorithm` protocol +- **Auto-registration**: Algorithms register themselves on module import +- **Two CLIs**: `salt.py` (shortcut syntax) and `salt2.py` (explicit commands) +- **Backward compatible**: Default is PBKDF2 for compatibility with legacy code + +**Module Structure:** ``` -Add dependencies to `requirements.txt` if you introduce new libraries. For automated validation add pytest and expose a default target: -```bash -python3 -m pytest # runs unit tests -python3 -m pytest -k verify --maxfail=1 -vv +salt.py # Core library + CLI with shortcut syntax +salt2.py # Alternative CLI with explicit subcommands +algorithms.py # Algorithm registry and Protocol definition +pbkdf2_algorithm.py # PBKDF2-HMAC-SHA256 implementation +argon2_algorithm.py # Argon2id implementation +bcrypt_algorithm.py # bcrypt implementation +tests/ + test_hashing.py # Tests for core hash/verify functions + test_cli.py # Tests for salt.py CLI behavior + test_algorithms.py # Tests for algorithm implementations ``` -## Coding Style & Naming Conventions -Target Python 3.11+, 4-space indentation, and type-annotate public functions. Use descriptive, lowercase_with_underscores names for variables and helper functions; reserve CapWords for classes if you introduce them. Favor stdlib modules (hashlib, secrets, base64) before external choices. Keep scripts executable (`#!/usr/bin/env python3`) and guard entry points with `if __name__ == "__main__":`. +## Essential Commands + +### Running the Tools + +```bash +# Hash a password with default algorithm (PBKDF2) +python3 salt.py "MyPassword" # Shortcut syntax +python3 salt.py hash "MyPassword" # Explicit command +python3 salt2.py generate "MyPassword" # Alternative CLI + +# Hash with specific algorithm +python3 salt.py hash --algorithm argon2 "MyPassword" +python3 salt.py --algorithm bcrypt "MyPassword" # Shortcut with algorithm +python3 salt2.py generate "MyPassword" # Note: salt2.py doesn't support --algorithm yet + +# Verify a password +python3 salt.py verify "MyPassword" +python3 salt.py verify --algorithm argon2 "MyPassword" "" + +# List available algorithms +python3 salt.py list-algorithms # Not implemented yet, would be useful addition +``` + +### Testing + +```bash +# Run all tests (20 tests total as of now) +python3 -m pytest + +# Verbose output +python3 -m pytest -v + +# Run specific test files +python3 -m pytest tests/test_hashing.py +python3 -m pytest tests/test_cli.py +python3 -m pytest tests/test_algorithms.py + +# Run specific test patterns +python3 -m pytest -k verify +python3 -m pytest -k algorithm + +# Run with verbose failure details +python3 -m pytest --maxfail=1 -vv + +# Run tests with coverage (if pytest-cov installed) +python3 -m pytest --cov=. --cov-report=term-missing +``` + +### Development Workflow + +```bash +# Install dependencies +pip install -r requirements.txt + +# Quick smoke test of CLI +python3 salt.py "test" && echo "salt.py works" +python3 salt2.py generate "test" && echo "salt2.py works" + +# Test algorithm round-trip +salt_hash=$(python3 salt.py "secret" | tail -1 | cut -d' ' -f2) +# Extract salt and hash from output for verify command +``` + +## Code Organization and Patterns + +### Algorithm Registry Pattern + +**How it works:** +1. `algorithms.py` defines `Algorithm` Protocol with `hash()`, `verify()`, and `identifier` attribute +2. Each algorithm module creates a class implementing the protocol +3. At module bottom, algorithm imports `register_algorithm` and registers itself +4. `algorithms.py` imports all algorithm modules at bottom to trigger registration +5. Core functions use `get_algorithm(name)` to retrieve registered implementations + +**Adding a new algorithm:** +1. Create `_algorithm.py` implementing `Algorithm` protocol +2. Add `register_algorithm(Algorithm())` at bottom of file +3. Import module at end of `algorithms.py`: `import _algorithm # noqa: E402, F401` +4. Add tests to `tests/test_algorithms.py` +5. Update CLI choices in `_build_parser()` if using explicit choices (currently uses dynamic list) + +### Import Pattern for Registration + +**Critical:** The algorithm modules use a specific import pattern to avoid circular dependencies: + +```python +# At BOTTOM of algorithm module (e.g., pbkdf2_algorithm.py) +from algorithms import register_algorithm + +# Auto-register when module is imported +register_algorithm(PBKDF2Algorithm()) +``` + +**In algorithms.py:** +```python +# At BOTTOM of file +# Import to auto-register algorithms +import pbkdf2_algorithm # noqa: E402, F401 +import argon2_algorithm # noqa: E402, F401 +import bcrypt_algorithm # noqa: E402, F401 +``` + +The `# noqa: E402, F401` comments suppress linter warnings about: +- E402: module level import not at top of file (intentional for registration) +- F401: imported but unused (side-effect import for registration) + +### Salt Handling Across Algorithms + +**IMPORTANT difference between algorithms:** + +- **PBKDF2**: Generates separate 16-byte salt with `os.urandom()`, returns `(salt_b64, hash_b64)` where both are populated +- **Argon2**: Embeds salt in hash string (e.g., `$argon2id$v=19$m=65536,t=3,p=4$...`), returns `("", hash_b64)` with empty salt +- **bcrypt**: Embeds salt in hash string (e.g., `$2b$12$...`), returns `("", hash_b64)` with empty salt + +When verifying Argon2/bcrypt passwords, the `salt_b64` parameter is ignored since the salt is embedded in `hash_b64`. + +### CLI Argument Normalization (salt.py only) + +**`salt.py` has special shortcut syntax via `_normalize_args()`:** + +```python +# These are equivalent: +python3 salt.py "mypassword" +python3 salt.py hash "mypassword" + +# Algorithm shortcut also works: +python3 salt.py --algorithm argon2 "mypassword" +# Becomes: python3 salt.py hash --algorithm argon2 "mypassword" +``` + +**Normalization rules:** +1. If first arg is `hash` or `verify` → pass through unchanged +2. If first arg is `-h` or `--help` → pass through unchanged +3. If first arg is `--algorithm` or `-a` → prepend `hash` +4. Otherwise (plain password) → prepend `hash` + +**Tested in:** `tests/test_cli.py::test_main_supports_hash_shortcut()` and `test_main_hash_algorithm_shortcut()` + +### Core Functions + +**`hash_password(password, *, algorithm="pbkdf2", iterations=None, salt_bytes=16) -> tuple[str, str]`** +- Located in `salt.py` lines 19-30 +- Returns `(salt_b64, hash_b64)` tuple +- For Argon2/bcrypt: salt_b64 will be empty string +- Delegates to algorithm implementation via `get_algorithm(algorithm).hash(...)` + +**`verify_password(password, salt_b64, hash_b64, *, algorithm="pbkdf2", iterations=None) -> bool`** +- Located in `salt.py` lines 33-45 +- Returns `True` on match, `False` on mismatch or invalid input +- Never raises exceptions (catches `binascii.Error`, `ValueError`) +- Uses `hmac.compare_digest()` for timing-safe comparison (PBKDF2) +- Delegates to algorithm implementation via `get_algorithm(algorithm).verify(...)` + +## Naming Conventions and Style + +### Python Style +- **Target version:** Python 3.11+ (using `|` union types, structural pattern matching potential) +- **Actual version in use:** Python 3.12.3 (per pytest output) +- **Indentation:** 4 spaces (never tabs) +- **Type hints:** Required for all public functions, use `from __future__ import annotations` for forward refs +- **Function/variable names:** `lowercase_with_underscores` +- **Class names:** `CapWords` (e.g., `PBKDF2Algorithm`, `Argon2Algorithm`) +- **Private/internal functions:** Prefix with `_` (e.g., `_build_parser()`, `_normalize_args()`) + +### Naming Patterns Observed + +**Module names:** `_algorithm.py` (e.g., `pbkdf2_algorithm.py`) +**Class names:** `Algorithm` (e.g., `PBKDF2Algorithm`, `BcryptAlgorithm`) +**Test files:** `test_.py` (e.g., `test_hashing.py`, `test_algorithms.py`) +**Test functions:** `test__()` with docstrings explaining intent + +### Constants +```python +DEFAULT_SALT_BYTES = 16 +DEFAULT_ITERATIONS = int(os.environ.get("PBKDF2_ITERATIONS", "200000")) +``` + +### Import Order +1. `from __future__ import annotations` (always first if present) +2. Standard library imports +3. Third-party imports (argon2, bcrypt) +4. Local imports (at bottom for registration side-effects) ## Testing Guidelines -Pin tests under `tests/test_.py` and mirror module names (e.g., `tests/test_hashing.py`). Cover both happy paths and failure handling (invalid base64, mismatched hashes). Include property-style assertions such as round-tripping `hash_password` output into `verify_password`. When adding algorithms, require regression tests with representative salts and ensure coverage stays above 90% for the hashing utilities. -## Commit & Pull Request Guidelines -Use short, imperative subject lines (e.g., `feat: add argon2 hashing option`) and describe motivation plus key changes in the body. Reference related issues with `Fixes #ID` when applicable. Open PRs only after tests pass, include reproduction or CLI output for security changes, and add screenshots or logs if the behavior affects tooling UX. Keep PRs focused; split large refactors into reviewable chunks. +### Test Structure + +**Coverage goal:** >90% for hashing utilities + +**Test organization:** +- `test_hashing.py`: Core `hash_password()` and `verify_password()` functions +- `test_cli.py`: CLI behavior for `salt.py` (including shortcut syntax) +- `test_algorithms.py`: Individual algorithm implementations + +### Testing Patterns + +**Round-trip tests (most important):** +```python +def test__algorithm_hash_round_trip(): + """Verify algorithm can hash and verify passwords.""" + algo = get_algorithm("") + salt, hashed = algo.hash("test password") + assert algo.verify("test password", salt, hashed) + assert not algo.verify("wrong password", salt, hashed) +``` + +**Property tests:** +- Base64 validation for outputs (see `test_hash_password_returns_base64`) +- Invalid input handling (see `test_verify_password_handles_invalid_base64`) +- Algorithm-specific behavior (see `test_pbkdf2_algorithm_respects_iterations`) + +**CLI tests:** +```python +def test_main__(): + """Test description.""" + assert main([]) == +``` + +Exit codes: 0 = success, 1 = verification failure + +### Test Execution Notes + +- 20 tests total (as of current state) +- Test execution time: ~5 seconds (mostly from Argon2/bcrypt hashing) +- All tests must pass before committing +- Use `pytest -v` for verbose output +- Use `pytest -k ` to run subset + +### Adding Tests for New Algorithms + +When adding a new algorithm, add these tests to `tests/test_algorithms.py`: + +```python +def test__algorithm_hash_round_trip(): + """Verify algorithm can hash and verify passwords.""" + algo = get_algorithm("") + salt, hashed = algo.hash("test password") + assert algo.verify("test password", salt, hashed) + assert not algo.verify("wrong password", salt, hashed) + +def test__algorithm_identifier(): + """Verify algorithm has correct identifier.""" + algo = get_algorithm("") + assert algo.identifier == "" +``` + +And to `tests/test_cli.py`: +```python +def test_main_hash_with_algorithm_(): + """Test hash command with --algorithm .""" + assert main(["hash", "--algorithm", "", "test-password"]) == 0 +``` + +## Important Gotchas and Non-Obvious Patterns + +### 1. Algorithm Registration Order Matters + +The imports at the bottom of `algorithms.py` trigger registration. If you import `algorithms` module before the algorithm modules are imported, the registry will be empty. This is handled correctly in the current code, but be careful when refactoring. + +### 2. CLI Output to Stdout (Tests Must Capture) + +The CLI functions print to stdout. Tests use `main([...])` to check exit codes but don't capture output. If you need to test output content, use `capsys` fixture: + +```python +def test_output_format(capsys): + main(["hash", "test"]) + captured = capsys.readouterr() + assert "Salt:" in captured.out + assert "Hash:" in captured.out +``` + +### 3. Empty Salt for Argon2/bcrypt + +When working with Argon2 or bcrypt hashes: +- The first element of the tuple is always `""` (empty string) +- The second element contains the full hash with embedded salt +- When verifying, pass `""` for salt or any string (it's ignored) +- Tests should handle this: `salt, hashed = algo.hash("pw")` where `salt == ""` + +### 4. Environment Variable for Iterations + +`PBKDF2_ITERATIONS` environment variable affects default behavior. Tests should be aware this can change behavior: +```python +DEFAULT_ITERATIONS = int(os.environ.get("PBKDF2_ITERATIONS", "200000")) +``` + +### 5. German UI Text + +CLI output and help text are in German: +- `"✓ Passwort korrekt"` / `"✗ Passwort falsch"` +- Help text uses German descriptions +- Comments in code are mixed German/English +- When adding CLI features, maintain German for user-facing text + +### 6. Exception Handling in Verify + +`verify_password()` catches exceptions and returns `False` rather than raising: +```python +try: + salt = base64.b64decode(salt_b64, validate=True) + stored_hash = base64.b64decode(hash_b64, validate=True) +except (binascii.Error, ValueError): + return False +``` + +This is intentional for security (avoid leaking information via exception types). + +### 7. Dynamic Algorithm List + +The CLI parser uses `list_algorithms()` to populate choices dynamically: +```python +hash_parser.add_argument( + "--algorithm", + "-a", + choices=list_algorithms(), # Dynamic based on registered algorithms + default="pbkdf2", + help="Hash-Algorithmus (Standard: pbkdf2)", +) +``` + +This means adding a new algorithm automatically adds it to CLI choices. + +### 8. salt2.py Missing Algorithm Support + +**Current limitation:** `salt2.py` does NOT yet support the `--algorithm` flag. It only uses the default (PBKDF2). To add support: +1. Update `_build_parser()` to add `--algorithm` argument to both subparsers +2. Update `_command_generate()` to pass `algorithm=args.algorithm` +3. Update `_command_verify()` to pass `algorithm=args.algorithm` +4. Add tests to `tests/test_cli.py` or create `tests/test_cli2.py` + +### 9. Relative Imports vs Absolute Imports + +The code uses **absolute imports** (not relative): +```python +from algorithms import get_algorithm # Not: from .algorithms import get_algorithm +``` + +This works because modules are in the root directory, not a package. Keep this pattern consistent. + +## Security Practices + +### Cryptographic Randomness +- Always use `os.urandom()` for salt generation (never `random` module) +- Never use predictable values or timestamps + +### Timing Attack Protection +- Use `hmac.compare_digest()` for hash comparison (PBKDF2) +- Argon2 and bcrypt libraries handle timing safety internally + +### Input Validation +- Use `base64.b64decode(data, validate=True)` to validate base64 format +- Catch and handle `binascii.Error` and `ValueError` +- Return `False` rather than raising exceptions in verify functions + +### Iteration Count +- Default 200,000 for PBKDF2 (OWASP recommended as of 2023) +- Configurable via `PBKDF2_ITERATIONS` environment variable +- Always parameterize in functions (don't hardcode) + +### No Logging of Secrets +- Never log passwords, salts, or hashes +- Print statements only in CLI code for user output +- Tests don't need to capture sensitive values + +## Configuration and Environment + +### Environment Variables + +**`PBKDF2_ITERATIONS`** +- Default: `"200000"` (as string, converted to int) +- Used by: `pbkdf2_algorithm.py` and `salt.py` +- Example: `export PBKDF2_ITERATIONS=300000` + +### Dependencies (requirements.txt) + +``` +pytest>=7.4 # Testing framework +argon2-cffi>=23.1.0 # Argon2id implementation +bcrypt>=4.1.0 # bcrypt implementation +``` + +**Standard library only:** `hashlib`, `hmac`, `os`, `base64`, `binascii`, `argparse`, `sys` + +### Python Version Requirements + +- **Minimum:** Python 3.11 (for `|` union type syntax) +- **Tested on:** Python 3.12.3 +- **Type hints:** Using modern syntax (`int | None`, not `Optional[int]`) + +## Commit Guidelines + +### Commit Message Format + +Use conventional commit format with imperative mood: + +``` +: + + + + +``` + +**Types in use:** +- `feat:` - New features (e.g., "feat: add argon2 algorithm") +- `fix:` - Bug fixes +- `test:` - Test additions/changes +- `refactor:` - Code restructuring without behavior change +- `docs:` - Documentation updates + +### Examples from Git History + +``` +docs: add README.md for repository +docs: translate CLAUDE.md to German +first commit +``` + +### Before Committing + +1. Run full test suite: `python3 -m pytest -v` +2. Ensure all tests pass (20/20) +3. Verify no regressions in existing functionality +4. Check that new code follows style conventions +5. Add tests for new functionality + +### Pull Request Guidelines + +- Keep PRs focused on single feature/fix +- Include motivation in description +- Reference issues with `Fixes #ID` +- For security changes: include CLI output or reproduction +- All tests must pass +- Maintain >90% coverage + +## Planned Enhancements and Known Gaps + +Based on existing documentation and code: + +### Missing Features (mentioned in plans but not implemented) + +1. **`list-algorithms` command** - Mentioned in plan (Task 9) but not in current code +2. **salt2.py algorithm support** - `salt2.py` doesn't support `--algorithm` flag yet +3. **Integration tests** - `tests/test_integration.py` mentioned in plan but not created +4. **Backward compatibility test** - Specific test mentioned in Task 12 plan + +### Documentation Gaps + +1. README.md is in German but doesn't mention multi-algorithm support yet +2. CLAUDE.md is German translation but may be outdated vs English version +3. No API docs for algorithm developers + +### Potential Improvements + +1. Add `list-algorithms` subcommand to CLIs +2. Complete algorithm support in `salt2.py` +3. Add integration tests with subprocess +4. Document algorithm addition process more clearly +5. Consider packaging as proper Python package (`salt/` directory structure) + +## Working with Plans and Documentation + +### Plan Execution Pattern + +The repository includes a detailed plan at `docs/plans/2025-11-13-multi-algorithm-support.md`: +- 12 tasks with step-by-step TDD approach +- Each task has verification steps +- Plan includes commit message examples +- Some tasks completed, others not yet + +**If implementing from plans:** +1. Read the full task before starting +2. Follow TDD: failing test first, then implementation +3. Run tests after each step +4. Use suggested commit messages +5. Mark off checklist items as you complete them + +### Multiple Documentation Files + +- **AGENTS.md**: This file - agent/developer guidelines +- **CLAUDE.md**: German translation, more detailed code examples +- **README.md**: User-facing documentation in German +- **docs/plans/*.md**: Implementation plans with TDD steps + +Keep these in sync when making significant changes. + +## Quick Reference + +### File a Bug or Add a Feature + +1. Create test demonstrating issue/feature +2. Implement fix/feature +3. Run `python3 -m pytest -v` +4. Commit with conventional message +5. Update docs if needed + +### Debug a Test Failure + +```bash +# Run specific test with verbose output +python3 -m pytest tests/test_file.py::test_name -vv + +# Run with pdb on failure +python3 -m pytest --pdb tests/test_file.py::test_name + +# See print statements +python3 -m pytest -s tests/test_file.py +``` + +### Verify All Systems + +```bash +# Comprehensive check +python3 -m pytest -v && \ +python3 salt.py "test" && \ +python3 salt.py hash --algorithm argon2 "test" && \ +python3 salt.py hash --algorithm bcrypt "test" && \ +python3 salt2.py generate "test" && \ +echo "All systems operational" +``` + +--- + +**Document Version:** 2025-11-13 +**Last Updated:** Initial comprehensive analysis based on repository state +**Test Count:** 20 tests (all passing) +**Python Version:** 3.12.3