Tommy-yw/RunbookHermes
RunbookHermes
Hermes-native AIOps Agent for payment incident response, evidence-driven root-cause analysis, approval-gated remediation, and runbook learning.
RunbookHermes is built by adapting the official Hermes Agent runtime into a production-oriented incident-response system. It keeps Hermes Agent's strengths (runtime loop, provider routing, tool system, memory, context engine, skills, gateway, and safety boundaries) and specializes them for AIOps workflows: responding to payment-system failures, collecting observability evidence, approval, checkpoint, rollback, recovery verification, and runbook knowledge accumulation.
RunbookHermes is not a separate toy dashboard beside Hermes Agent. It is a Hermes-native vertical extension: Hermes provides the agent foundation; RunbookHermes adds the incident-response domain layer.
Product Screenshots
The screenshots below show the current RunbookHermes Web Console. Put these images under docs/assets/ and keep the file names consistent with the Markdown paths.
AIOps Console Overview
The overview page shows the high-level AIOps control plane: incident count, pending approvals, generated skills, critical services, recommended operation flow, current capability boundaries, and a live monitoring preview.
Realtime Monitoring System
The monitoring page provides a multi-dimensional service health view for payment-service, coupon-service, and order-service, including HTTP status signals, QPS, p95 latency, service topology, backend mode, and deployment state.
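As a rough illustration of how the QPS, error-rate, and p95 latency signals on such a page could be derived from raw request samples, here is a minimal sketch. The `RequestSample` type, the `summarize_health` function, and the choice of the nearest-rank percentile method are assumptions made for this example; the README does not say how RunbookHermes computes these values.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RequestSample:
    """One observed request (hypothetical shape, not a RunbookHermes type)."""
    status: int
    latency_ms: float


def p95_nearest_rank(latencies_ms: list[float]) -> float:
    """Return the 95th percentile using the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("at least one latency sample is required")
    ordered = sorted(latencies_ms)
    # Integer ceiling of 0.95 * n avoids floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def summarize_health(samples: list[RequestSample], window_seconds: float) -> dict[str, float]:
    """Compute QPS, 5xx error rate, and p95 latency over one time window."""
    if not samples or window_seconds <= 0:
        raise ValueError("need samples and a positive window")
    errors = sum(1 for s in samples if s.status >= 500)
    return {
        "qps": len(samples) / window_seconds,
        "error_rate": errors / len(samples),
        "p95_latency_ms": p95_nearest_rank([s.latency_ms for s in samples]),
    }
```

A real deployment would more likely query these values from the metrics backend than recompute them from individual requests; the sketch only shows what each number means.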
The lower section of the monitoring page shows log signals and trace signals. This is where RunbookHermes connects observability data to incident diagnosis instead of relying only on model guesses.
Incident Command Center
The incident list page normalizes incidents created from Web, Alertmanager, Feishu, WeCom, or API entry points. It shows service, status, severity, root cause, creation time, and quick incident creation actions.
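One way to picture this normalization is a single function that maps each entry point's payload onto a shared record. The sketch below is illustrative only: the `Incident` dataclass, the `normalize_incident` function, the source labels, and the initial `"open"` status are assumptions rather than RunbookHermes code. The `alerts[].labels` layout follows the Alertmanager webhook format, but the `service` and `severity` label names are a common convention that a given deployment may not use.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional

# Hypothetical labels for the entry points named in the text.
SOURCES = {"web", "alertmanager", "feishu", "wecom", "api"}


@dataclass
class Incident:
    """Shared incident shape with the fields shown on the list page."""
    service: str
    status: str
    severity: str
    root_cause: Optional[str]
    created_at: datetime
    source: str


def normalize_incident(source: str, payload: dict[str, Any]) -> Incident:
    """Map a source-specific payload onto the shared Incident shape."""
    if source not in SOURCES:
        raise ValueError(f"unknown incident source: {source}")
    if source == "alertmanager":
        # Read the first alert's labels; fall back to an empty alert if none.
        alerts = payload.get("alerts") or [{}]
        fields = alerts[0].get("labels", {})
    else:
        # Assumption: other entry points send flat "service"/"severity" keys.
        fields = payload
    return Incident(
        service=fields.get("service", "unknown"),
        status="open",  # assumed initial status
        severity=fields.get("severity", "unknown"),
        root_cause=None,  # filled in later by diagnosis
        created_at=datetime.now(timezone.utc),
        source=source,
    )
```

Keeping the root cause empty at creation time mirrors the idea that diagnosis happens after evidence is collected, not at intake.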
Incident Detail: Evidence and Executive Summary
The incident detail page displays evidence cards from metrics, logs, and traces, plus an executive summary with root cause, recommended action, evidence IDs, confidence, and approval status.
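Because the executive summary cites evidence IDs, a natural consistency check is that every cited ID matches a collected evidence card. The following sketch shows one possible data shape and that check. The class names, field names, and the `unsupported_evidence_ids` helper are hypothetical; they are modeled on the fields listed above, not taken from the project.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceCard:
    """One piece of collected evidence (hypothetical shape)."""
    evidence_id: str
    kind: str  # "metric", "log", or "trace"
    finding: str


@dataclass
class ExecutiveSummary:
    """Summary fields as listed for the incident detail page."""
    root_cause: str
    recommended_action: str
    evidence_ids: list[str]
    confidence: float
    approval_status: str


def unsupported_evidence_ids(summary: ExecutiveSummary, cards: list[EvidenceCard]) -> list[str]:
    """Return cited evidence IDs that have no matching evidence card."""
    known = {card.evidence_id for card in cards}
    return [eid for eid in summary.evidence_ids if eid not in known]
```

An empty result means every claim in the summary can be traced back to a metric, log, or trace card.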
Incident Detail: Root Cause and Model-Assisted Summary
The root-cause tab separates deterministic evidence from optional model-assisted explanation. The model summary is only enabled when a model provider is configured.
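The separation described here can be sketched as a view builder that always returns the deterministic findings and only adds a model summary when a summarizer is supplied. `build_root_cause_view` and its `summarize` parameter are hypothetical names for this example; how RunbookHermes actually detects a configured provider is not stated in the README.

```python
from __future__ import annotations

from typing import Callable, Optional


def build_root_cause_view(
    findings: list[str],
    summarize: Optional[Callable[[list[str]], str]] = None,
) -> dict:
    """Return deterministic findings plus an optional model-assisted summary.

    `summarize` stands in for a configured model provider; passing None
    represents the case where no provider is configured.
    """
    view = {"deterministic": list(findings), "model_summary": None}
    if summarize is not None and findings:
        # The model only explains existing evidence; it never replaces it.
        view["model_summary"] = summarize(findings)
    return view
```

For instance, calling `build_root_cause_view(["p95 latency tripled"])` with no summarizer leaves `model_summary` as `None`, so the page can still render the evidence-only diagnosis.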
Incident Detail: Actions, Approvals, and Checkpoints
Risky actions are not executed blindly. RunbookHermes places write or destructive actions behind approval, checkpoint, dry-run, controlled execution, and recovery verification.
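The gating sequence can be sketched as a small function that refuses to act without approval and rolls back when recovery cannot be verified. This is a simplified illustration, not the project's implementation: `run_gated_action`, `ActionResult`, and the callback parameters are assumed names, and the rollback-on-failed-verification behavior is one plausible policy.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ActionResult:
    executed: bool
    recovered: bool
    steps: list[str] = field(default_factory=list)


def run_gated_action(
    *,
    approved: bool,
    checkpoint: Callable[[], None],
    dry_run: Callable[[], bool],
    execute: Callable[[], None],
    rollback: Callable[[], None],
    verify: Callable[[], bool],
) -> ActionResult:
    """Run a risky action only after approval, checkpoint, and dry-run."""
    steps: list[str] = []
    if not approved:
        steps.append("blocked: approval missing")
        return ActionResult(False, False, steps)
    checkpoint()  # capture state so the change can be undone
    steps.append("checkpoint created")
    if not dry_run():
        steps.append("dry-run failed")
        return ActionResult(False, False, steps)
    steps.append("dry-run passed")
    execute()
    steps.append("executed")
    if verify():
        steps.append("recovery verified")
        return ActionResult(True, True, steps)
    rollback()  # assumed policy: undo when recovery is not confirmed
    steps.append("rolled back")
    return ActionResult(True, False, steps)
```

Passing the steps as callbacks keeps the gate independent of any specific remediation, so the same ordering applies to every write or destructive action.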
Incident Detail: Timeline
The timeline records the full incident lifecycle, including incident creation, evidence collection, hypothesis generation, action planning, checkpoint creation, approval request, approval decision, skill generation, and execution result.
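A timeline like this is often modeled as an append-only event log. The sketch below uses the lifecycle stages listed above as an enum. The `EventType`, `TimelineEntry`, and `Timeline` names are hypothetical, and the in-memory list stands in for whatever storage the project actually uses.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class EventType(str, Enum):
    """Lifecycle stages named in the timeline description."""
    INCIDENT_CREATED = "incident_created"
    EVIDENCE_COLLECTED = "evidence_collected"
    HYPOTHESIS_GENERATED = "hypothesis_generated"
    ACTION_PLANNED = "action_planned"
    CHECKPOINT_CREATED = "checkpoint_created"
    APPROVAL_REQUESTED = "approval_requested"
    APPROVAL_DECIDED = "approval_decided"
    SKILL_GENERATED = "skill_generated"
    EXECUTION_RESULT = "execution_result"


@dataclass(frozen=True)
class TimelineEntry:
    at: datetime
    event: EventType
    detail: str = ""


@dataclass
class Timeline:
    """Append-only record of one incident's lifecycle (in-memory sketch)."""
    incident_id: str
    _entries: list[TimelineEntry] = field(default_factory=list)

    def record(self, event: EventType, detail: str = "") -> TimelineEntry:
        """Append a timestamped entry; existing entries are never modified."""
        entry = TimelineEntry(datetime.now(timezone.utc), event, detail)
        self._entries.append(entry)
        return entry

    def entries(self) -> tuple[TimelineEntry, ...]:
        """Return an immutable snapshot in recording order."""
        return tuple(self._entries)
```

Returning a tuple of frozen entries keeps readers from rewriting history, which matters when the timeline doubles as an audit trail for approvals.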