docs: update AGENTS.md with comprehensive repository guidelines

This commit is contained in:
2025-11-13 23:30:23 +00:00
parent f456538b11
commit e51c5184e9

562
AGENTS.md
View File

@@ -1,25 +1,553 @@
# Repository Guidelines
# Repository Guidelines for AI Agents
## Project Structure & Module Organization
The repo is intentionally small: `salt.py` is a reference implementation that demonstrates secure PBKDF2 hashing, while `salt2.py` wraps the same logic in a CLI with verify mode. Keep new helpers alongside these modules only if they remain single-file utilities; otherwise scaffold a package under `salt/` and mirror tests in `tests/`. Generate derived assets (sample salts, fixtures) in `artifacts/` and never commit secrets or real user data.
## Project Overview
## Build, Test, and Development Commands
Run quick manual checks directly:
```bash
python3 salt.py "MySecret" # prints salt + hash
python3 salt2.py verify <pw> <salt> <hash> # exit 0 on success
This is a Python password hashing utility supporting multiple cryptographic algorithms (PBKDF2, Argon2, bcrypt). The project provides both library functions and two CLI implementations with different UX patterns.
**Key Architecture:**
- **Algorithm Registry Pattern**: Pluggable hash algorithms implementing a common `Algorithm` protocol
- **Auto-registration**: Algorithms register themselves on module import
- **Two CLIs**: `salt.py` (shortcut syntax) and `salt2.py` (explicit commands)
- **Backward compatible**: Default is PBKDF2 for compatibility with legacy code
**Module Structure:**
```
Add dependencies to `requirements.txt` if you introduce new libraries. For automated validation add pytest and expose a default target:
```bash
python3 -m pytest # runs unit tests
python3 -m pytest -k verify --maxfail=1 -vv
salt.py # Core library + CLI with shortcut syntax
salt2.py # Alternative CLI with explicit subcommands
algorithms.py # Algorithm registry and Protocol definition
pbkdf2_algorithm.py # PBKDF2-HMAC-SHA256 implementation
argon2_algorithm.py # Argon2id implementation
bcrypt_algorithm.py # bcrypt implementation
tests/
test_hashing.py # Tests for core hash/verify functions
test_cli.py # Tests for salt.py CLI behavior
test_algorithms.py # Tests for algorithm implementations
```
## Coding Style & Naming Conventions
Target Python 3.11+, 4-space indentation, and type-annotate public functions. Use descriptive, lowercase_with_underscores names for variables and helper functions; reserve CapWords for classes if you introduce them. Favor stdlib modules (hashlib, secrets, base64) before external choices. Keep scripts executable (`#!/usr/bin/env python3`) and guard entry points with `if __name__ == "__main__":`.
## Essential Commands
### Running the Tools
```bash
# Hash a password with default algorithm (PBKDF2)
python3 salt.py "MyPassword" # Shortcut syntax
python3 salt.py hash "MyPassword" # Explicit command
python3 salt2.py generate "MyPassword" # Alternative CLI
# Hash with specific algorithm
python3 salt.py hash --algorithm argon2 "MyPassword"
python3 salt.py --algorithm bcrypt "MyPassword" # Shortcut with algorithm
python3 salt2.py generate "MyPassword" # Note: salt2.py doesn't support --algorithm yet
# Verify a password
python3 salt.py verify "MyPassword" <salt_b64> <hash_b64>
python3 salt.py verify --algorithm argon2 "MyPassword" "" <hash_b64>
# List available algorithms
python3 salt.py list-algorithms # Not implemented yet, would be useful addition
```
### Testing
```bash
# Run all tests (20 tests total as of now)
python3 -m pytest
# Verbose output
python3 -m pytest -v
# Run specific test files
python3 -m pytest tests/test_hashing.py
python3 -m pytest tests/test_cli.py
python3 -m pytest tests/test_algorithms.py
# Run specific test patterns
python3 -m pytest -k verify
python3 -m pytest -k algorithm
# Run with verbose failure details
python3 -m pytest --maxfail=1 -vv
# Run tests with coverage (if pytest-cov installed)
python3 -m pytest --cov=. --cov-report=term-missing
```
### Development Workflow
```bash
# Install dependencies
pip install -r requirements.txt
# Quick smoke test of CLI
python3 salt.py "test" && echo "salt.py works"
python3 salt2.py generate "test" && echo "salt2.py works"
# Test algorithm round-trip
salt_hash=$(python3 salt.py "secret" | tail -1 | cut -d' ' -f2)
# Extract salt and hash from output for verify command
```
## Code Organization and Patterns
### Algorithm Registry Pattern
**How it works:**
1. `algorithms.py` defines `Algorithm` Protocol with `hash()`, `verify()`, and `identifier` attribute
2. Each algorithm module creates a class implementing the protocol
3. At module bottom, algorithm imports `register_algorithm` and registers itself
4. `algorithms.py` imports all algorithm modules at bottom to trigger registration
5. Core functions use `get_algorithm(name)` to retrieve registered implementations
**Adding a new algorithm:**
1. Create `<name>_algorithm.py` implementing `Algorithm` protocol
2. Add `register_algorithm(<Name>Algorithm())` at bottom of file
3. Import module at end of `algorithms.py`: `import <name>_algorithm # noqa: E402, F401`
4. Add tests to `tests/test_algorithms.py`
5. Update CLI choices in `_build_parser()` if using explicit choices (currently uses dynamic list)
### Import Pattern for Registration
**Critical:** The algorithm modules use a specific import pattern to avoid circular dependencies:
```python
# At BOTTOM of algorithm module (e.g., pbkdf2_algorithm.py)
from algorithms import register_algorithm
# Auto-register when module is imported
register_algorithm(PBKDF2Algorithm())
```
**In algorithms.py:**
```python
# At BOTTOM of file
# Import to auto-register algorithms
import pbkdf2_algorithm # noqa: E402, F401
import argon2_algorithm # noqa: E402, F401
import bcrypt_algorithm # noqa: E402, F401
```
The `# noqa: E402, F401` comments suppress linter warnings about:
- E402: module level import not at top of file (intentional for registration)
- F401: imported but unused (side-effect import for registration)
### Salt Handling Across Algorithms
**IMPORTANT difference between algorithms:**
- **PBKDF2**: Generates separate 16-byte salt with `os.urandom()`, returns `(salt_b64, hash_b64)` where both are populated
- **Argon2**: Embeds salt in hash string (e.g., `$argon2id$v=19$m=65536,t=3,p=4$...`), returns `("", hash_b64)` with empty salt
- **bcrypt**: Embeds salt in hash string (e.g., `$2b$12$...`), returns `("", hash_b64)` with empty salt
When verifying Argon2/bcrypt passwords, the `salt_b64` parameter is ignored since the salt is embedded in `hash_b64`.
### CLI Argument Normalization (salt.py only)
**`salt.py` has special shortcut syntax via `_normalize_args()`:**
```python
# These are equivalent:
python3 salt.py "mypassword"
python3 salt.py hash "mypassword"
# Algorithm shortcut also works:
python3 salt.py --algorithm argon2 "mypassword"
# Becomes: python3 salt.py hash --algorithm argon2 "mypassword"
```
**Normalization rules:**
1. If first arg is `hash` or `verify` → pass through unchanged
2. If first arg is `-h` or `--help` → pass through unchanged
3. If first arg is `--algorithm` or `-a` → prepend `hash`
4. Otherwise (plain password) → prepend `hash`
**Tested in:** `tests/test_cli.py::test_main_supports_hash_shortcut()` and `test_main_hash_algorithm_shortcut()`
### Core Functions
**`hash_password(password, *, algorithm="pbkdf2", iterations=None, salt_bytes=16) -> tuple[str, str]`**
- Located in `salt.py` lines 19-30
- Returns `(salt_b64, hash_b64)` tuple
- For Argon2/bcrypt: salt_b64 will be empty string
- Delegates to algorithm implementation via `get_algorithm(algorithm).hash(...)`
**`verify_password(password, salt_b64, hash_b64, *, algorithm="pbkdf2", iterations=None) -> bool`**
- Located in `salt.py` lines 33-45
- Returns `True` on match, `False` on mismatch or invalid input
- Never raises exceptions (catches `binascii.Error`, `ValueError`)
- Uses `hmac.compare_digest()` for timing-safe comparison (PBKDF2)
- Delegates to algorithm implementation via `get_algorithm(algorithm).verify(...)`
## Naming Conventions and Style
### Python Style
- **Target version:** Python 3.11+ (using `|` union types, structural pattern matching potential)
- **Actual version in use:** Python 3.12.3 (per pytest output)
- **Indentation:** 4 spaces (never tabs)
- **Type hints:** Required for all public functions, use `from __future__ import annotations` for forward refs
- **Function/variable names:** `lowercase_with_underscores`
- **Class names:** `CapWords` (e.g., `PBKDF2Algorithm`, `Argon2Algorithm`)
- **Private/internal functions:** Prefix with `_` (e.g., `_build_parser()`, `_normalize_args()`)
### Naming Patterns Observed
**Module names:** `<algorithm>_algorithm.py` (e.g., `pbkdf2_algorithm.py`)
**Class names:** `<Algorithm>Algorithm` (e.g., `PBKDF2Algorithm`, `BcryptAlgorithm`)
**Test files:** `test_<feature>.py` (e.g., `test_hashing.py`, `test_algorithms.py`)
**Test functions:** `test_<behavior>_<context>()` with docstrings explaining intent
### Constants
```python
DEFAULT_SALT_BYTES = 16
DEFAULT_ITERATIONS = int(os.environ.get("PBKDF2_ITERATIONS", "200000"))
```
### Import Order
1. `from __future__ import annotations` (always first if present)
2. Standard library imports
3. Third-party imports (argon2, bcrypt)
4. Local imports (at bottom for registration side-effects)
## Testing Guidelines
Pin tests under `tests/test_<feature>.py` and mirror module names (e.g., `tests/test_hashing.py`). Cover both happy paths and failure handling (invalid base64, mismatched hashes). Include property-style assertions such as round-tripping `hash_password` output into `verify_password`. When adding algorithms, require regression tests with representative salts and ensure coverage stays above 90% for the hashing utilities.
## Commit & Pull Request Guidelines
Use short, imperative subject lines (e.g., `feat: add argon2 hashing option`) and describe motivation plus key changes in the body. Reference related issues with `Fixes #ID` when applicable. Open PRs only after tests pass, include reproduction or CLI output for security changes, and add screenshots or logs if the behavior affects tooling UX. Keep PRs focused; split large refactors into reviewable chunks.
### Test Structure
**Coverage goal:** >90% for hashing utilities
**Test organization:**
- `test_hashing.py`: Core `hash_password()` and `verify_password()` functions
- `test_cli.py`: CLI behavior for `salt.py` (including shortcut syntax)
- `test_algorithms.py`: Individual algorithm implementations
### Testing Patterns
**Round-trip tests (most important):**
```python
def test_<algo>_algorithm_hash_round_trip():
"""Verify <algo> algorithm can hash and verify passwords."""
algo = get_algorithm("<algo>")
salt, hashed = algo.hash("test password")
assert algo.verify("test password", salt, hashed)
assert not algo.verify("wrong password", salt, hashed)
```
**Property tests:**
- Base64 validation for outputs (see `test_hash_password_returns_base64`)
- Invalid input handling (see `test_verify_password_handles_invalid_base64`)
- Algorithm-specific behavior (see `test_pbkdf2_algorithm_respects_iterations`)
**CLI tests:**
```python
def test_main_<command>_<scenario>():
"""Test description."""
assert main([<args>]) == <expected_exit_code>
```
Exit codes: 0 = success, 1 = verification failure
### Test Execution Notes
- 20 tests total (as of current state)
- Test execution time: ~5 seconds (mostly from Argon2/bcrypt hashing)
- All tests must pass before committing
- Use `pytest -v` for verbose output
- Use `pytest -k <pattern>` to run subset
### Adding Tests for New Algorithms
When adding a new algorithm, add these tests to `tests/test_algorithms.py`:
```python
def test_<algo>_algorithm_hash_round_trip():
"""Verify <algo> algorithm can hash and verify passwords."""
algo = get_algorithm("<algo>")
salt, hashed = algo.hash("test password")
assert algo.verify("test password", salt, hashed)
assert not algo.verify("wrong password", salt, hashed)
def test_<algo>_algorithm_identifier():
"""Verify <algo> algorithm has correct identifier."""
algo = get_algorithm("<algo>")
assert algo.identifier == "<algo>"
```
And to `tests/test_cli.py`:
```python
def test_main_hash_with_algorithm_<algo>():
"""Test hash command with --algorithm <algo>."""
assert main(["hash", "--algorithm", "<algo>", "test-password"]) == 0
```
## Important Gotchas and Non-Obvious Patterns
### 1. Algorithm Registration Order Matters
The imports at the bottom of `algorithms.py` trigger registration. If you import `algorithms` module before the algorithm modules are imported, the registry will be empty. This is handled correctly in the current code, but be careful when refactoring.
### 2. CLI Output to Stdout (Tests Must Capture)
The CLI functions print to stdout. Tests use `main([...])` to check exit codes but don't capture output. If you need to test output content, use `capsys` fixture:
```python
def test_output_format(capsys):
main(["hash", "test"])
captured = capsys.readouterr()
assert "Salt:" in captured.out
assert "Hash:" in captured.out
```
### 3. Empty Salt for Argon2/bcrypt
When working with Argon2 or bcrypt hashes:
- The first element of the tuple is always `""` (empty string)
- The second element contains the full hash with embedded salt
- When verifying, pass `""` for salt or any string (it's ignored)
- Tests should handle this: `salt, hashed = algo.hash("pw")` where `salt == ""`
### 4. Environment Variable for Iterations
`PBKDF2_ITERATIONS` environment variable affects default behavior. Tests should be aware this can change behavior:
```python
DEFAULT_ITERATIONS = int(os.environ.get("PBKDF2_ITERATIONS", "200000"))
```
### 5. German UI Text
CLI output and help text are in German:
- `"✓ Passwort korrekt"` / `"✗ Passwort falsch"`
- Help text uses German descriptions
- Comments in code are mixed German/English
- When adding CLI features, maintain German for user-facing text
### 6. Exception Handling in Verify
`verify_password()` catches exceptions and returns `False` rather than raising:
```python
try:
salt = base64.b64decode(salt_b64, validate=True)
stored_hash = base64.b64decode(hash_b64, validate=True)
except (binascii.Error, ValueError):
return False
```
This is intentional for security (avoid leaking information via exception types).
### 7. Dynamic Algorithm List
The CLI parser uses `list_algorithms()` to populate choices dynamically:
```python
hash_parser.add_argument(
"--algorithm",
"-a",
choices=list_algorithms(), # Dynamic based on registered algorithms
default="pbkdf2",
help="Hash-Algorithmus (Standard: pbkdf2)",
)
```
This means adding a new algorithm automatically adds it to CLI choices.
### 8. salt2.py Missing Algorithm Support
**Current limitation:** `salt2.py` does NOT yet support the `--algorithm` flag. It only uses the default (PBKDF2). To add support:
1. Update `_build_parser()` to add `--algorithm` argument to both subparsers
2. Update `_command_generate()` to pass `algorithm=args.algorithm`
3. Update `_command_verify()` to pass `algorithm=args.algorithm`
4. Add tests to `tests/test_cli.py` or create `tests/test_cli2.py`
### 9. Relative Imports vs Absolute Imports
The code uses **absolute imports** (not relative):
```python
from algorithms import get_algorithm # Not: from .algorithms import get_algorithm
```
This works because modules are in the root directory, not a package. Keep this pattern consistent.
## Security Practices
### Cryptographic Randomness
- Always use `os.urandom()` for salt generation (never `random` module)
- Never use predictable values or timestamps
### Timing Attack Protection
- Use `hmac.compare_digest()` for hash comparison (PBKDF2)
- Argon2 and bcrypt libraries handle timing safety internally
### Input Validation
- Use `base64.b64decode(data, validate=True)` to validate base64 format
- Catch and handle `binascii.Error` and `ValueError`
- Return `False` rather than raising exceptions in verify functions
### Iteration Count
- Default 200,000 for PBKDF2 (OWASP recommended as of 2023)
- Configurable via `PBKDF2_ITERATIONS` environment variable
- Always parameterize in functions (don't hardcode)
### No Logging of Secrets
- Never log passwords, salts, or hashes
- Print statements only in CLI code for user output
- Tests don't need to capture sensitive values
## Configuration and Environment
### Environment Variables
**`PBKDF2_ITERATIONS`**
- Default: `"200000"` (as string, converted to int)
- Used by: `pbkdf2_algorithm.py` and `salt.py`
- Example: `export PBKDF2_ITERATIONS=300000`
### Dependencies (requirements.txt)
```
pytest>=7.4 # Testing framework
argon2-cffi>=23.1.0 # Argon2id implementation
bcrypt>=4.1.0 # bcrypt implementation
```
**Standard library only:** `hashlib`, `hmac`, `os`, `base64`, `binascii`, `argparse`, `sys`
### Python Version Requirements
- **Minimum:** Python 3.11 (for `|` union type syntax)
- **Tested on:** Python 3.12.3
- **Type hints:** Using modern syntax (`int | None`, not `Optional[int]`)
## Commit Guidelines
### Commit Message Format
Use conventional commit format with imperative mood:
```
<type>: <short description>
<optional longer description>
<optional footer>
```
**Types in use:**
- `feat:` - New features (e.g., "feat: add argon2 algorithm")
- `fix:` - Bug fixes
- `test:` - Test additions/changes
- `refactor:` - Code restructuring without behavior change
- `docs:` - Documentation updates
### Examples from Git History
```
docs: add README.md for repository
docs: translate CLAUDE.md to German
first commit
```
### Before Committing
1. Run full test suite: `python3 -m pytest -v`
2. Ensure all tests pass (20/20)
3. Verify no regressions in existing functionality
4. Check that new code follows style conventions
5. Add tests for new functionality
### Pull Request Guidelines
- Keep PRs focused on single feature/fix
- Include motivation in description
- Reference issues with `Fixes #ID`
- For security changes: include CLI output or reproduction
- All tests must pass
- Maintain >90% coverage
## Planned Enhancements and Known Gaps
Based on existing documentation and code:
### Missing Features (mentioned in plans but not implemented)
1. **`list-algorithms` command** - Mentioned in plan (Task 9) but not in current code
2. **salt2.py algorithm support** - `salt2.py` doesn't support `--algorithm` flag yet
3. **Integration tests** - `tests/test_integration.py` mentioned in plan but not created
4. **Backward compatibility test** - Specific test mentioned in Task 12 plan
### Documentation Gaps
1. README.md is in German but doesn't mention multi-algorithm support yet
2. CLAUDE.md is German translation but may be outdated vs English version
3. No API docs for algorithm developers
### Potential Improvements
1. Add `list-algorithms` subcommand to CLIs
2. Complete algorithm support in `salt2.py`
3. Add integration tests with subprocess
4. Document algorithm addition process more clearly
5. Consider packaging as proper Python package (`salt/` directory structure)
## Working with Plans and Documentation
### Plan Execution Pattern
The repository includes a detailed plan at `docs/plans/2025-11-13-multi-algorithm-support.md`:
- 12 tasks with step-by-step TDD approach
- Each task has verification steps
- Plan includes commit message examples
- Some tasks completed, others not yet
**If implementing from plans:**
1. Read the full task before starting
2. Follow TDD: failing test first, then implementation
3. Run tests after each step
4. Use suggested commit messages
5. Mark off checklist items as you complete them
### Multiple Documentation Files
- **AGENTS.md**: This file - agent/developer guidelines
- **CLAUDE.md**: German translation, more detailed code examples
- **README.md**: User-facing documentation in German
- **docs/plans/*.md**: Implementation plans with TDD steps
Keep these in sync when making significant changes.
## Quick Reference
### File a Bug or Add a Feature
1. Create test demonstrating issue/feature
2. Implement fix/feature
3. Run `python3 -m pytest -v`
4. Commit with conventional message
5. Update docs if needed
### Debug a Test Failure
```bash
# Run specific test with verbose output
python3 -m pytest tests/test_file.py::test_name -vv
# Run with pdb on failure
python3 -m pytest --pdb tests/test_file.py::test_name
# See print statements
python3 -m pytest -s tests/test_file.py
```
### Verify All Systems
```bash
# Comprehensive check
python3 -m pytest -v && \
python3 salt.py "test" && \
python3 salt.py hash --algorithm argon2 "test" && \
python3 salt.py hash --algorithm bcrypt "test" && \
python3 salt2.py generate "test" && \
echo "All systems operational"
```
---
**Document Version:** 2025-11-13
**Last Updated:** Initial comprehensive analysis based on repository state
**Test Count:** 20 tests (all passing)
**Python Version:** 3.12.3