docs: update AGENTS.md with comprehensive repository guidelines

2025-11-13 23:30:23 +00:00
parent f456538b11
commit e51c5184e9
1 changed files with 545 additions and 17 deletions
@@ -1,25 +1,553 @@
-# Repository Guidelines
+# Repository Guidelines for AI Agents

-## Project Structure & Module Organization
-The repo is intentionally small: `salt.py` is a reference implementation that demonstrates secure PBKDF2 hashing, while `salt2.py` wraps the same logic in a CLI with verify mode. Keep new helpers alongside these modules only if they remain single-file utilities; otherwise scaffold a package under `salt/` and mirror tests in `tests/`. Generate derived assets (sample salts, fixtures) in `artifacts/` and never commit secrets or real user data.
+## Project Overview

-## Build, Test, and Development Commands
-Run quick manual checks directly:
-```bash
-python3 salt.py "MySecret"        # prints salt + hash
-python3 salt2.py verify <pw> <salt> <hash>  # exit 0 on success
+This is a Python password hashing utility supporting multiple cryptographic algorithms (PBKDF2, Argon2, bcrypt). The project provides both library functions and two CLI implementations with different UX patterns.
+
+**Key Architecture:**
+- **Algorithm Registry Pattern**: Pluggable hash algorithms implementing a common `Algorithm` protocol
+- **Auto-registration**: Algorithms register themselves on module import
+- **Two CLIs**: `salt.py` (shortcut syntax) and `salt2.py` (explicit commands)
+- **Backward compatible**: Default is PBKDF2 for compatibility with legacy code
+
+**Module Structure:**
 ```
-Add dependencies to `requirements.txt` if you introduce new libraries. For automated validation add pytest and expose a default target:
-```bash
-python3 -m pytest          # runs unit tests
-python3 -m pytest -k verify --maxfail=1 -vv
+salt.py               # Core library + CLI with shortcut syntax
+salt2.py              # Alternative CLI with explicit subcommands
+algorithms.py         # Algorithm registry and Protocol definition
+pbkdf2_algorithm.py   # PBKDF2-HMAC-SHA256 implementation
+argon2_algorithm.py   # Argon2id implementation
+bcrypt_algorithm.py   # bcrypt implementation
+tests/
+  test_hashing.py     # Tests for core hash/verify functions
+  test_cli.py         # Tests for salt.py CLI behavior
+  test_algorithms.py  # Tests for algorithm implementations
 ```

-## Coding Style & Naming Conventions
-Target Python 3.11+, 4-space indentation, and type-annotate public functions. Use descriptive, lowercase_with_underscores names for variables and helper functions; reserve CapWords for classes if you introduce them. Favor stdlib modules (hashlib, secrets, base64) before external choices. Keep scripts executable (`#!/usr/bin/env python3`) and guard entry points with `if __name__ == "__main__":`.
+## Essential Commands
+
+### Running the Tools
+
+```bash
+# Hash a password with default algorithm (PBKDF2)
+python3 salt.py "MyPassword"                    # Shortcut syntax
+python3 salt.py hash "MyPassword"               # Explicit command
+python3 salt2.py generate "MyPassword"          # Alternative CLI
+
+# Hash with specific algorithm
+python3 salt.py hash --algorithm argon2 "MyPassword"
+python3 salt.py --algorithm bcrypt "MyPassword"  # Shortcut with algorithm
+python3 salt2.py generate "MyPassword"           # Note: salt2.py doesn't support --algorithm yet
+
+# Verify a password
+python3 salt.py verify "MyPassword" <salt_b64> <hash_b64>
+python3 salt.py verify --algorithm argon2 "MyPassword" "" <hash_b64>
+
+# List available algorithms
+python3 salt.py list-algorithms  # Not implemented yet, would be useful addition
+```
+
+### Testing
+
+```bash
+# Run all tests (20 tests total as of now)
+python3 -m pytest
+
+# Verbose output
+python3 -m pytest -v
+
+# Run specific test files
+python3 -m pytest tests/test_hashing.py
+python3 -m pytest tests/test_cli.py
+python3 -m pytest tests/test_algorithms.py
+
+# Run specific test patterns
+python3 -m pytest -k verify
+python3 -m pytest -k algorithm
+
+# Run with verbose failure details
+python3 -m pytest --maxfail=1 -vv
+
+# Run tests with coverage (if pytest-cov installed)
+python3 -m pytest --cov=. --cov-report=term-missing
+```
+
+### Development Workflow
+
+```bash
+# Install dependencies
+pip install -r requirements.txt
+
+# Quick smoke test of CLI
+python3 salt.py "test" && echo "salt.py works"
+python3 salt2.py generate "test" && echo "salt2.py works"
+
+# Test algorithm round-trip
+salt_hash=$(python3 salt.py "secret" | tail -1 | cut -d' ' -f2)
+# Extract salt and hash from output for verify command
+```
+
+## Code Organization and Patterns
+
+### Algorithm Registry Pattern
+
+**How it works:**
+1. `algorithms.py` defines `Algorithm` Protocol with `hash()`, `verify()`, and `identifier` attribute
+2. Each algorithm module creates a class implementing the protocol
+3. At module bottom, algorithm imports `register_algorithm` and registers itself
+4. `algorithms.py` imports all algorithm modules at bottom to trigger registration
+5. Core functions use `get_algorithm(name)` to retrieve registered implementations
+
+**Adding a new algorithm:**
+1. Create `<name>_algorithm.py` implementing `Algorithm` protocol
+2. Add `register_algorithm(<Name>Algorithm())` at bottom of file
+3. Import module at end of `algorithms.py`: `import <name>_algorithm  # noqa: E402, F401`
+4. Add tests to `tests/test_algorithms.py`
+5. Update CLI choices in `_build_parser()` if using explicit choices (currently uses dynamic list)
+
+### Import Pattern for Registration
+
+**Critical:** The algorithm modules use a specific import pattern to avoid circular dependencies:
+
+```python
+# At BOTTOM of algorithm module (e.g., pbkdf2_algorithm.py)
+from algorithms import register_algorithm
+
+# Auto-register when module is imported
+register_algorithm(PBKDF2Algorithm())
+```
+
+**In algorithms.py:**
+```python
+# At BOTTOM of file
+# Import to auto-register algorithms
+import pbkdf2_algorithm  # noqa: E402, F401
+import argon2_algorithm  # noqa: E402, F401
+import bcrypt_algorithm  # noqa: E402, F401
+```
+
+The `# noqa: E402, F401` comments suppress linter warnings about:
+- E402: module level import not at top of file (intentional for registration)
+- F401: imported but unused (side-effect import for registration)
+
+### Salt Handling Across Algorithms
+
+**IMPORTANT difference between algorithms:**
+
+- **PBKDF2**: Generates separate 16-byte salt with `os.urandom()`, returns `(salt_b64, hash_b64)` where both are populated
+- **Argon2**: Embeds salt in hash string (e.g., `$argon2id$v=19$m=65536,t=3,p=4$...`), returns `("", hash_b64)` with empty salt
+- **bcrypt**: Embeds salt in hash string (e.g., `$2b$12$...`), returns `("", hash_b64)` with empty salt
+
+When verifying Argon2/bcrypt passwords, the `salt_b64` parameter is ignored since the salt is embedded in `hash_b64`.
+
+### CLI Argument Normalization (salt.py only)
+
+**`salt.py` has special shortcut syntax via `_normalize_args()`:**
+
+```python
+# These are equivalent:
+python3 salt.py "mypassword"
+python3 salt.py hash "mypassword"
+
+# Algorithm shortcut also works:
+python3 salt.py --algorithm argon2 "mypassword"
+# Becomes: python3 salt.py hash --algorithm argon2 "mypassword"
+```
+
+**Normalization rules:**
+1. If first arg is `hash` or `verify` → pass through unchanged
+2. If first arg is `-h` or `--help` → pass through unchanged
+3. If first arg is `--algorithm` or `-a` → prepend `hash`
+4. Otherwise (plain password) → prepend `hash`
+
+**Tested in:** `tests/test_cli.py::test_main_supports_hash_shortcut()` and `test_main_hash_algorithm_shortcut()`
+
+### Core Functions
+
+**`hash_password(password, *, algorithm="pbkdf2", iterations=None, salt_bytes=16) -> tuple[str, str]`**
+- Located in `salt.py` lines 19-30
+- Returns `(salt_b64, hash_b64)` tuple
+- For Argon2/bcrypt: salt_b64 will be empty string
+- Delegates to algorithm implementation via `get_algorithm(algorithm).hash(...)`
+
+**`verify_password(password, salt_b64, hash_b64, *, algorithm="pbkdf2", iterations=None) -> bool`**
+- Located in `salt.py` lines 33-45
+- Returns `True` on match, `False` on mismatch or invalid input
+- Never raises exceptions (catches `binascii.Error`, `ValueError`)
+- Uses `hmac.compare_digest()` for timing-safe comparison (PBKDF2)
+- Delegates to algorithm implementation via `get_algorithm(algorithm).verify(...)`
+
+## Naming Conventions and Style
+
+### Python Style
+- **Target version:** Python 3.11+ (using `|` union types, structural pattern matching potential)
+- **Actual version in use:** Python 3.12.3 (per pytest output)
+- **Indentation:** 4 spaces (never tabs)
+- **Type hints:** Required for all public functions, use `from __future__ import annotations` for forward refs
+- **Function/variable names:** `lowercase_with_underscores`
+- **Class names:** `CapWords` (e.g., `PBKDF2Algorithm`, `Argon2Algorithm`)
+- **Private/internal functions:** Prefix with `_` (e.g., `_build_parser()`, `_normalize_args()`)
+
+### Naming Patterns Observed
+
+**Module names:** `<algorithm>_algorithm.py` (e.g., `pbkdf2_algorithm.py`)
+**Class names:** `<Algorithm>Algorithm` (e.g., `PBKDF2Algorithm`, `BcryptAlgorithm`)
+**Test files:** `test_<feature>.py` (e.g., `test_hashing.py`, `test_algorithms.py`)
+**Test functions:** `test_<behavior>_<context>()` with docstrings explaining intent
+
+### Constants
+```python
+DEFAULT_SALT_BYTES = 16
+DEFAULT_ITERATIONS = int(os.environ.get("PBKDF2_ITERATIONS", "200000"))
+```
+
+### Import Order
+1. `from __future__ import annotations` (always first if present)
+2. Standard library imports
+3. Third-party imports (argon2, bcrypt)
+4. Local imports (at bottom for registration side-effects)

 ## Testing Guidelines
-Pin tests under `tests/test_<feature>.py` and mirror module names (e.g., `tests/test_hashing.py`). Cover both happy paths and failure handling (invalid base64, mismatched hashes). Include property-style assertions such as round-tripping `hash_password` output into `verify_password`. When adding algorithms, require regression tests with representative salts and ensure coverage stays above 90% for the hashing utilities.

-## Commit & Pull Request Guidelines
-Use short, imperative subject lines (e.g., `feat: add argon2 hashing option`) and describe motivation plus key changes in the body. Reference related issues with `Fixes #ID` when applicable. Open PRs only after tests pass, include reproduction or CLI output for security changes, and add screenshots or logs if the behavior affects tooling UX. Keep PRs focused; split large refactors into reviewable chunks.
+### Test Structure
+
+**Coverage goal:** >90% for hashing utilities
+
+**Test organization:**
+- `test_hashing.py`: Core `hash_password()` and `verify_password()` functions
+- `test_cli.py`: CLI behavior for `salt.py` (including shortcut syntax)
+- `test_algorithms.py`: Individual algorithm implementations
+
+### Testing Patterns
+
+**Round-trip tests (most important):**
+```python
+def test_<algo>_algorithm_hash_round_trip():
+    """Verify <algo> algorithm can hash and verify passwords."""
+    algo = get_algorithm("<algo>")
+    salt, hashed = algo.hash("test password")
+    assert algo.verify("test password", salt, hashed)
+    assert not algo.verify("wrong password", salt, hashed)
+```
+
+**Property tests:**
+- Base64 validation for outputs (see `test_hash_password_returns_base64`)
+- Invalid input handling (see `test_verify_password_handles_invalid_base64`)
+- Algorithm-specific behavior (see `test_pbkdf2_algorithm_respects_iterations`)
+
+**CLI tests:**
+```python
+def test_main_<command>_<scenario>():
+    """Test description."""
+    assert main([<args>]) == <expected_exit_code>
+```
+
+Exit codes: 0 = success, 1 = verification failure
+
+### Test Execution Notes
+
+- 20 tests total (as of current state)
+- Test execution time: ~5 seconds (mostly from Argon2/bcrypt hashing)
+- All tests must pass before committing
+- Use `pytest -v` for verbose output
+- Use `pytest -k <pattern>` to run subset
+
+### Adding Tests for New Algorithms
+
+When adding a new algorithm, add these tests to `tests/test_algorithms.py`:
+
+```python
+def test_<algo>_algorithm_hash_round_trip():
+    """Verify <algo> algorithm can hash and verify passwords."""
+    algo = get_algorithm("<algo>")
+    salt, hashed = algo.hash("test password")
+    assert algo.verify("test password", salt, hashed)
+    assert not algo.verify("wrong password", salt, hashed)
+
+def test_<algo>_algorithm_identifier():
+    """Verify <algo> algorithm has correct identifier."""
+    algo = get_algorithm("<algo>")
+    assert algo.identifier == "<algo>"
+```
+
+And to `tests/test_cli.py`:
+```python
+def test_main_hash_with_algorithm_<algo>():
+    """Test hash command with --algorithm <algo>."""
+    assert main(["hash", "--algorithm", "<algo>", "test-password"]) == 0
+```
+
+## Important Gotchas and Non-Obvious Patterns
+
+### 1. Algorithm Registration Order Matters
+
+The imports at the bottom of `algorithms.py` trigger registration. If you import `algorithms` module before the algorithm modules are imported, the registry will be empty. This is handled correctly in the current code, but be careful when refactoring.
+
+### 2. CLI Output to Stdout (Tests Must Capture)
+
+The CLI functions print to stdout. Tests use `main([...])` to check exit codes but don't capture output. If you need to test output content, use `capsys` fixture:
+
+```python
+def test_output_format(capsys):
+    main(["hash", "test"])
+    captured = capsys.readouterr()
+    assert "Salt:" in captured.out
+    assert "Hash:" in captured.out
+```
+
+### 3. Empty Salt for Argon2/bcrypt
+
+When working with Argon2 or bcrypt hashes:
+- The first element of the tuple is always `""` (empty string)
+- The second element contains the full hash with embedded salt
+- When verifying, pass `""` for salt or any string (it's ignored)
+- Tests should handle this: `salt, hashed = algo.hash("pw")` where `salt == ""`
+
+### 4. Environment Variable for Iterations
+
+`PBKDF2_ITERATIONS` environment variable affects default behavior. Tests should be aware this can change behavior:
+```python
+DEFAULT_ITERATIONS = int(os.environ.get("PBKDF2_ITERATIONS", "200000"))
+```
+
+### 5. German UI Text
+
+CLI output and help text are in German:
+- `"✓ Passwort korrekt"` / `"✗ Passwort falsch"`
+- Help text uses German descriptions
+- Comments in code are mixed German/English
+- When adding CLI features, maintain German for user-facing text
+
+### 6. Exception Handling in Verify
+
+`verify_password()` catches exceptions and returns `False` rather than raising:
+```python
+try:
+    salt = base64.b64decode(salt_b64, validate=True)
+    stored_hash = base64.b64decode(hash_b64, validate=True)
+except (binascii.Error, ValueError):
+    return False
+```
+
+This is intentional for security (avoid leaking information via exception types).
+
+### 7. Dynamic Algorithm List
+
+The CLI parser uses `list_algorithms()` to populate choices dynamically:
+```python
+hash_parser.add_argument(
+    "--algorithm",
+    "-a",
+    choices=list_algorithms(),  # Dynamic based on registered algorithms
+    default="pbkdf2",
+    help="Hash-Algorithmus (Standard: pbkdf2)",
+)
+```
+
+This means adding a new algorithm automatically adds it to CLI choices.
+
+### 8. salt2.py Missing Algorithm Support
+
+**Current limitation:** `salt2.py` does NOT yet support the `--algorithm` flag. It only uses the default (PBKDF2). To add support:
+1. Update `_build_parser()` to add `--algorithm` argument to both subparsers
+2. Update `_command_generate()` to pass `algorithm=args.algorithm`
+3. Update `_command_verify()` to pass `algorithm=args.algorithm`
+4. Add tests to `tests/test_cli.py` or create `tests/test_cli2.py`
+
+### 9. Relative Imports vs Absolute Imports
+
+The code uses **absolute imports** (not relative):
+```python
+from algorithms import get_algorithm  # Not: from .algorithms import get_algorithm
+```
+
+This works because modules are in the root directory, not a package. Keep this pattern consistent.
+
+## Security Practices
+
+### Cryptographic Randomness
+- Always use `os.urandom()` for salt generation (never `random` module)
+- Never use predictable values or timestamps
+
+### Timing Attack Protection
+- Use `hmac.compare_digest()` for hash comparison (PBKDF2)
+- Argon2 and bcrypt libraries handle timing safety internally
+
+### Input Validation
+- Use `base64.b64decode(data, validate=True)` to validate base64 format
+- Catch and handle `binascii.Error` and `ValueError`
+- Return `False` rather than raising exceptions in verify functions
+
+### Iteration Count
+- Default 200,000 for PBKDF2 (OWASP recommended as of 2023)
+- Configurable via `PBKDF2_ITERATIONS` environment variable
+- Always parameterize in functions (don't hardcode)
+
+### No Logging of Secrets
+- Never log passwords, salts, or hashes
+- Print statements only in CLI code for user output
+- Tests don't need to capture sensitive values
+
+## Configuration and Environment
+
+### Environment Variables
+
+**`PBKDF2_ITERATIONS`**
+- Default: `"200000"` (as string, converted to int)
+- Used by: `pbkdf2_algorithm.py` and `salt.py`
+- Example: `export PBKDF2_ITERATIONS=300000`
+
+### Dependencies (requirements.txt)
+
+```
+pytest>=7.4              # Testing framework
+argon2-cffi>=23.1.0      # Argon2id implementation
+bcrypt>=4.1.0            # bcrypt implementation
+```
+
+**Standard library only:** `hashlib`, `hmac`, `os`, `base64`, `binascii`, `argparse`, `sys`
+
+### Python Version Requirements
+
+- **Minimum:** Python 3.11 (for `|` union type syntax)
+- **Tested on:** Python 3.12.3
+- **Type hints:** Using modern syntax (`int | None`, not `Optional[int]`)
+
+## Commit Guidelines
+
+### Commit Message Format
+
+Use conventional commit format with imperative mood:
+
+```
+<type>: <short description>
+
+<optional longer description>
+
+<optional footer>
+```
+
+**Types in use:**
+- `feat:` - New features (e.g., "feat: add argon2 algorithm")
+- `fix:` - Bug fixes
+- `test:` - Test additions/changes
+- `refactor:` - Code restructuring without behavior change
+- `docs:` - Documentation updates
+
+### Examples from Git History
+
+```
+docs: add README.md for repository
+docs: translate CLAUDE.md to German
+first commit
+```
+
+### Before Committing
+
+1. Run full test suite: `python3 -m pytest -v`
+2. Ensure all tests pass (20/20)
+3. Verify no regressions in existing functionality
+4. Check that new code follows style conventions
+5. Add tests for new functionality
+
+### Pull Request Guidelines
+
+- Keep PRs focused on single feature/fix
+- Include motivation in description
+- Reference issues with `Fixes #ID`
+- For security changes: include CLI output or reproduction
+- All tests must pass
+- Maintain >90% coverage
+
+## Planned Enhancements and Known Gaps
+
+Based on existing documentation and code:
+
+### Missing Features (mentioned in plans but not implemented)
+
+1. **`list-algorithms` command** - Mentioned in plan (Task 9) but not in current code
+2. **salt2.py algorithm support** - `salt2.py` doesn't support `--algorithm` flag yet
+3. **Integration tests** - `tests/test_integration.py` mentioned in plan but not created
+4. **Backward compatibility test** - Specific test mentioned in Task 12 plan
+
+### Documentation Gaps
+
+1. README.md is in German but doesn't mention multi-algorithm support yet
+2. CLAUDE.md is German translation but may be outdated vs English version
+3. No API docs for algorithm developers
+
+### Potential Improvements
+
+1. Add `list-algorithms` subcommand to CLIs
+2. Complete algorithm support in `salt2.py`
+3. Add integration tests with subprocess
+4. Document algorithm addition process more clearly
+5. Consider packaging as proper Python package (`salt/` directory structure)
+
+## Working with Plans and Documentation
+
+### Plan Execution Pattern
+
+The repository includes a detailed plan at `docs/plans/2025-11-13-multi-algorithm-support.md`:
+- 12 tasks with step-by-step TDD approach
+- Each task has verification steps
+- Plan includes commit message examples
+- Some tasks completed, others not yet
+
+**If implementing from plans:**
+1. Read the full task before starting
+2. Follow TDD: failing test first, then implementation
+3. Run tests after each step
+4. Use suggested commit messages
+5. Mark off checklist items as you complete them
+
+### Multiple Documentation Files
+
+- **AGENTS.md**: This file - agent/developer guidelines
+- **CLAUDE.md**: German translation, more detailed code examples
+- **README.md**: User-facing documentation in German
+- **docs/plans/*.md**: Implementation plans with TDD steps
+
+Keep these in sync when making significant changes.
+
+## Quick Reference
+
+### File a Bug or Add a Feature
+
+1. Create test demonstrating issue/feature
+2. Implement fix/feature
+3. Run `python3 -m pytest -v`
+4. Commit with conventional message
+5. Update docs if needed
+
+### Debug a Test Failure
+
+```bash
+# Run specific test with verbose output
+python3 -m pytest tests/test_file.py::test_name -vv
+
+# Run with pdb on failure
+python3 -m pytest --pdb tests/test_file.py::test_name
+
+# See print statements
+python3 -m pytest -s tests/test_file.py
+```
+
+### Verify All Systems
+
+```bash
+# Comprehensive check
+python3 -m pytest -v && \
+python3 salt.py "test" && \
+python3 salt.py hash --algorithm argon2 "test" && \
+python3 salt.py hash --algorithm bcrypt "test" && \
+python3 salt2.py generate "test" && \
+echo "All systems operational"
+```
+
+---
+
+**Document Version:** 2025-11-13
+**Last Updated:** Initial comprehensive analysis based on repository state
+**Test Count:** 20 tests (all passing)
+**Python Version:** 3.12.3