① Model Selection

v6.1

② Alert Input

Paste JSON to validate
Fields detected: alert_id event_type source_ip destination_ip severity raw_log
Loading samples…

Upload a .json array or .csv file (up to 25 alerts for UI; use pipeline.py for larger runs).

Ctrl+Enter

Verdict

Run an investigation to see the verdict here.

Triage Summary

Run an investigation to see the triage summary here.

Executive Summary

One-sentence conclusion appears after investigation.

Alert Identity

Alert metadata appears here after investigation.

Top Signals

0 signals

Main drivers of the classification appear here after investigation.

+ pushes toward Malicious · − pushes toward Benign · High impact (≥66%) · Medium (33–66%) · Low (<33%). Numbers are signed contribution weights; bigger = stronger influence.
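A minimal sketch of how the legend above could be applied, assuming the stated impact bands; the function and field names here are hypothetical, not taken from the project code:

```python
def impact_band(weight: float) -> str:
    """Map a signed contribution weight to an impact band by magnitude."""
    magnitude = abs(weight)  # the sign only encodes direction, not strength
    if magnitude >= 0.66:
        return "High"
    if magnitude >= 0.33:
        return "Medium"
    return "Low"

def direction(weight: float) -> str:
    """Positive weights push toward Malicious, negative toward Benign."""
    return "Malicious" if weight > 0 else "Benign"
```

So a signal with weight −0.5 would render as a Medium-impact signal pushing toward Benign.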

Risk Threshold

0.50

Classification sensitivity. Alerts with malicious probability at or above the threshold are labelled Malicious; those below are labelled Benign.
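The thresholding rule above is simple enough to state as a one-line sketch (the function name is hypothetical):

```python
def classify(malicious_probability: float, threshold: float = 0.50) -> str:
    """Label an alert by comparing its malicious probability to the risk threshold."""
    # At or above the threshold -> Malicious; strictly below -> Benign.
    return "Malicious" if malicious_probability >= threshold else "Benign"
```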

0.00 1.00
Score
Verdict
Delta

Full Agent Reasoning

The agent's full reasoning chain will appear here after investigation.

MITRE ATT&CK Matrix

0 techniques detected

ATT&CK tactic & technique matrix appears here after investigation.

Investigation Trace

0 steps

The agent's subagent trace will stream here during investigation.

Session Timeline

0 alerts in history

Alerts you investigate will appear on this timeline.

Risk Gauge

Awaiting investigation…

Signal Distribution

0 signals

Signal mix appears here after investigation.

Attack Flow

Run an investigation to see the flow.

Signal Contribution

0 signals

Signal magnitudes appear here after investigation.

Risk Score Cascade

A step-by-step waterfall showing how the risk score was built will appear here.

Final Assessment

Summary appears after investigation.

Cross-Incident History

0 incidents
Time · Alert ID · Event Type · Source IP · Destination · Verdict · Conf · Tokens
Loading incident history…

300-Alert Benchmark

Primary · Deep Agent

150 Benign / 150 Malicious · Primary evaluation dataset. Two Deep Agent models compared.

20-Alert Sanity Set

Smoke test · Multi-model

10 Benign / 10 Malicious · Quick sanity test used to evaluate model candidates before committing to full benchmarks.

450-Alert Large-Scale

Legacy · Classic ReAct

225 Benign / 225 Malicious · Previous-generation benchmark run on the classic ReAct architecture before migration to Deep Agent.

Generate Report

No alert loaded

Pick a format for the alert currently loaded in the Verdict panel. Summary is a compact executive-style brief; Full includes the complete subagent trace and raw reasoning.

Summary Report

Compact
  • Alert ID + core metadata
  • Final verdict, risk score, attack type
  • Top Signals (explainability)
  • 1-line reasoning

Full Report

Detailed
  • Everything in Summary
  • Full input alert JSON
  • Subagent analysis trace
  • Complete agent reasoning, metrics

Run an investigation in the Investigate tab first to enable report generation.

Generated Reports

No reports yet. Generate one from the card above.

CERBERUS

MSCS 670 · Spring 2026

An autonomous Tier-1 SOC analyst that classifies security alerts as Malicious or Benign. Built on the LangChain deepagents framework — every alert goes through a lead orchestrator plus three parallel subagents and a calibrated risk scorer. No pre-agent bypass.

299/300
Accuracy · 300-alert benchmark
0.997
F1 Score · Precision 100% · Recall 99.3%
$0.026
Cost · Full 300-alert run
11s
Avg latency · Per alert

Agent Architecture

LangChain Deep Agents
Input
Alert JSON
event_type · raw_log · IPs · severity
Lead Orchestrator
Gemini 2.5 Flash
Emits 3 parallel task calls in a single turn
Subagent · Parallel
network-forensics
  • check_network_context
Subagent · Parallel
log-analysis
  • analyze_log_pattern
Subagent · Parallel
threat-intel
  • check_ip_reputation
  • query_threat_intelligence
  • geolocate_and_check_travel
Risk Synthesis
calculate_risk_score
Weighted evidence + sigmoid calibration → probability
Output
Verdict JSON
verdict · confidence · reasoning → SQLite history
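The Risk Synthesis step (weighted evidence plus sigmoid calibration) can be sketched as below. Only the name `calculate_risk_score` comes from the architecture; the signal names and weights are illustrative assumptions:

```python
import math

def calculate_risk_score(evidence: dict[str, float],
                         weights: dict[str, float]) -> float:
    """Combine subagent evidence into a calibrated malicious probability.

    Hypothetical sketch: a weighted sum of signal scores is squashed
    through a sigmoid so the output is a probability in (0, 1).
    """
    z = sum(weights.get(name, 0.0) * score for name, score in evidence.items())
    return 1.0 / (1.0 + math.exp(-z))

# Example: strong threat-intel hit plus a suspicious log pattern.
evidence = {"ip_reputation": 0.9, "log_pattern": 0.7, "network_context": 0.2}
weights = {"ip_reputation": 2.0, "log_pattern": 1.5, "network_context": 1.0}
prob = calculate_risk_score(evidence, weights)  # well above a 0.50 threshold
```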

Model Comparison

Tested on 300-alert benchmark
Gemini 2.5 Flash RECOMMENDED
google/gemini-2.5-flash
Accuracy: 299/300 (99.7%)
F1 Score: 0.997
Precision: 100%
Recall: 99.3%
Avg Latency: ~12.6 s / alert
Avg Tokens: ~584 / alert
Cost (300 run): ~$0.026

Best overall accuracy and cost-performance balance. Used as primary.
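The headline metrics are internally consistent: precision 100% with one wrong verdict on 300 alerts implies a single false negative. The confusion counts below are inferred from the card above, not taken from the run logs:

```python
# Inferred confusion counts for the Gemini 2.5 Flash run:
# 150 malicious alerts, no false positives, one missed malicious alert.
tp, fp, fn, tn = 149, 0, 1, 150

accuracy = (tp + tn) / (tp + tn + fp + fn)          # 299/300
precision = tp / (tp + fp)                          # 1.0
recall = tp / (tp + fn)                             # 149/150
f1 = 2 * precision * recall / (precision + recall)  # 298/299

print(round(accuracy, 3), round(precision, 3), round(recall, 3), round(f1, 3))
```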

GPT-4o-mini BACKUP
openai/gpt-4o-mini
Accuracy: 293/300 (97.7%)
F1 Score: 0.983
Precision: 99.1%
Recall: 97.5%
Avg Latency: ~24.1 s / alert
Avg Tokens: ~797 / alert
Cost (300 run): ~$0.036

Fallback when Gemini is unavailable. Slower and slightly less accurate.

Tech Stack

Python 3.11+ LangChain Deep Agents OpenRouter Gemini 2.5 Flash Flask SQLite Vanilla JS Inline SVG

Datasets

Dataset · Size · Purpose
Sanity set · 20 labeled alerts · Smoke tests during development
Benchmark set · 300 labeled alerts · Primary evaluation (150 B / 150 M)
Hidden test set · 500 alerts · Professor's final competition evaluation
MSCS 670 · Special Topics: Agentic AI · Spring 2026
Joaan Bin Jassim Academy for Defence Studies