Skip to content

Conversation

@tobyhede
Copy link
Contributor

  1. Slow Statement Logging
    Logs detailed breakdown when statement exceeds threshold (default 2s)

  2. Labeled Metrics
    Existing histograms (statements_session_duration_seconds, statements_execution_duration_seconds) get labels:

  3. ZeroKMS Cipher Init Timing
    New metric: keyset_cipher_init_duration_seconds

  4. Configuration

CS_LOG__SLOW_STATEMENTS=true
CS_LOG__SLOW_STATEMENT_MIN_DURATION_MS=2000
CS_LOG__SLOW_STATEMENTS_LEVEL=warn

Tracks individual phase durations for statement processing:
- parse, encrypt, server_write, server_wait, server_response, client_write, decrypt

Enables detailed performance breakdown in diagnostic logs.
Captures statement-level metadata:
- statement_type (INSERT/UPDATE/DELETE/SELECT/OTHER)
- protocol (simple/extended)
- encrypted flag and value count
- param_bytes and query_fingerprint

Enables labeled metrics and detailed diagnostic logs.
New log target for slow statement logging, configurable via
CS_LOG__SLOW_STATEMENTS_LEVEL environment variable (default: warn).
Adds configuration options for slow statement logging:
- CS_LOG__SLOW_STATEMENTS: enable slow statement logging
- CS_LOG__SLOW_STATEMENT_MIN_DURATION_MS: threshold (default 2000ms)
- CS_LOG__SLOW_STATEMENTS_LEVEL: log level (default warn when enabled)

Simple config: just set CS_LOG__SLOW_STATEMENTS=true
SessionMetricsContext now tracks phase timing breakdown and statement
metadata for diagnostic logging.
Measures time for ZeroKMS cipher initialization, which includes the
network call to ZeroKMS. This helps isolate ZeroKMS latency as a
bottleneck separate from local crypto operations.
…ration() accessors

Convenient accessors for slow statement logging configuration.
Adds methods to Context for recording phase durations:
- record_parse_duration (set once)
- add_encrypt_duration/add_decrypt_duration (accumulate)
- add_server_write_duration/add_server_response_duration/add_client_write_duration (accumulate)
- record_server_wait_or_add_response (helper for first response vs subsequent)
- update_statement_metadata for metadata updates

These enable frontend.rs and backend.rs to instrument phases.
- Frontend tracks server write time
- Backend splits first response wait vs response time
- Backend tracks client write time
Records write_to_server duration using add_server_write_duration helper
parse_handler tracks:
- Parse duration
- Statement type and fingerprint

bind_handler tracks:
- Parameter bytes size
encrypt_literals and encrypt_params now record their duration to
the session phase timing for slow-statement breakdown.
Tracks:
- Parse/typecheck duration for phase breakdown
- Statement type (INSERT/UPDATE/DELETE/SELECT/OTHER)
- Protocol type (simple)
- Query fingerprint
- Multi-statement flag for simple queries
- Task 8: add_encrypt_duration and update_statement_metadata in encrypt_literals and encrypt_params
- Task 9: query_handler already instrumented
- Task 10: parse_handler with Extended protocol timing/metadata, bind_handler with param_bytes
- Task 10a: write_to_server with add_server_write_duration
Documents:
- Configuration options for slow statement logging
- Slow statement log structure and interpretation
- Prometheus metric labels and example queries
- Common symptom/cause mapping
…tatement logging

Existing duration histograms now include labels:
- statement_type: insert/update/delete/select/other
- protocol: simple/extended
- mapped: true/false
- multi_statement: true/false

Slow statement logging with phase breakdown when enabled.
Enables Grafana filtering by statement type and encryption status.
- Integration test scrapes /metrics and asserts statement labels
- Adds diagnostics module for testing metrics endpoint
- Adds reqwest HTTP client dependency for metrics scraping
- Record parse duration before encryption work starts to avoid double-counting
- Update encrypt_params metadata unconditionally for slow-statement logging
- Add decrypt duration recording via add_decrypt_duration() call
- Remove unused re-exports from context/mod.rs
- Fix needless_borrow clippy warnings in query_handler loop

These changes ensure phase timings accurately reflect actual work:
- parse_ms captures SQL parsing + type-checking only
- encrypt_ms and decrypt_ms are tracked separately without overlap
- Statement metadata updates regardless of Prometheus status
Auto-formatting applied by cargo fmt to various files.
Updates duration metric assertions to use regex matching that supports labeled metrics, ensuring quantile matches work regardless of other labels present.
Adds port 9930 to the proxy-tls service in docker-compose.yml to enable metrics collection during TLS-enabled integration tests.
Add doc comments explaining why timing data is accumulated in ExecuteContext
during backend message processing and transferred to SessionMetricsContext
on completion. This pattern exists because the backend operates on the execute
queue without direct access to the session metrics queue.
Replace magic numbers in fetch_metrics_with_retry with documented constants:
- METRICS_FETCH_MAX_RETRIES (5) - ~1 second total wait with 200ms delay
- METRICS_FETCH_RETRY_DELAY_MS (200) - balance responsiveness with scrape time
Address code review suggestions:
- Add warning log when bind_handler falls back to latest_session_id()
- Thread session_id explicitly through write_to_server signature
- Combine duplicate Option<SessionId> guards in encrypt_params
…server

All callers passed None, making the parameter obsolete. Simplified to
always use latest_session_id() directly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants