VLM benchmark evaluating indoor home safety hazard detection from static camera frames. Inspired by HomeSafeBench (arXiv 2509.23690), adapted for fixed indoor security cameras with AI-generated fixture images.
40 tests across 5 hazard categories:
| Category | Tests | Example Hazards |
|---|---|---|
| 🔥 Fire / Smoke | 8 | Stove smoke, candle near curtain, iron left face-down |
| ⚡ Electrical | 8 | Overloaded power strip, exposed wire, frayed cord |
| 🚶 Trip / Fall | 8 | Toys on staircase, wet floor, loose rug edge |
| 👶 Child Safety | 8 | Open cabinet with chemicals, uncovered outlet, blind cord |
| 📦 Falling Objects | 8 | Heavy items on high shelf, leaning bookshelf, loose ceiling fan |
Requires `npm install`. This skill ships a `package.json` with dependencies (e.g. the `openai` SDK). Before launching, the deployment agent should check for `package.json` and run `npm install` in the skill directory.
Entry script: scripts/run-benchmark.cjs
node scripts/run-benchmark.cjs --help
When spawned by Aegis, all configuration is injected via environment variables. The benchmark discovers your VLM server automatically, generates an HTML report, and opens it when complete.
# Run all 40 tests
node scripts/run-benchmark.cjs --vlm http://localhost:5405
# Quick mode (2 tests per category = 10 total)
node scripts/run-benchmark.cjs --vlm http://localhost:5405 --mode quick
# Skip report auto-open
node scripts/run-benchmark.cjs --vlm http://localhost:5405 --no-open
| Variable | Default | Description |
|---|---|---|
| `AEGIS_VLM_URL` | (required) | VLM server base URL |
| `AEGIS_VLM_MODEL` | — | Loaded VLM model ID |
| `AEGIS_SKILL_ID` | — | Skill identifier (enables skill mode) |
| `AEGIS_SKILL_PARAMS` | `{}` | JSON params from skill config |

Note: URLs should be base URLs (e.g. `http://localhost:5405`). The benchmark appends `/v1/chat/completions` automatically.
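The base-URL convention can be sketched as a small helper (a hypothetical illustration, not the benchmark's actual code):

```javascript
// Hypothetical sketch: normalize a base URL and append the
// OpenAI-compatible chat completions path.
function completionsUrl(baseUrl) {
  // Strip trailing slashes so we never produce a double "//".
  const base = baseUrl.replace(/\/+$/, "");
  return `${base}/v1/chat/completions`;
}

console.log(completionsUrl("http://localhost:5405"));
// http://localhost:5405/v1/chat/completions
```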
| Parameter | Type | Default | Description |
|---|---|---|---|
| `mode` | select | `full` | Test mode: `full` (40 tests) or `quick` (10 tests, 2 per category) |
| `noOpen` | boolean | `false` | Skip auto-opening the HTML report in the browser |
| Argument | Default | Description |
|---|---|---|
| `--vlm URL` | (required) | VLM server base URL |
| `--mode MODE` | `full` | Test mode: `full` or `quick` |
| `--out DIR` | `~/.aegis-ai/homesafe-benchmarks` | Results directory |
| `--no-open` | — | Don't auto-open the report in the browser |
AEGIS_VLM_URL=http://localhost:5405
AEGIS_SKILL_ID=homesafe-bench
AEGIS_SKILL_PARAMS={}
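Reading that injected configuration might look like the following sketch (hypothetical; variable handling is an assumption about the script's internals):

```javascript
// Hypothetical sketch: read Aegis-injected configuration from the
// environment, falling back to defaults when a variable is unset.
const vlmUrl = process.env.AEGIS_VLM_URL;            // required
const skillId = process.env.AEGIS_SKILL_ID || null;  // enables skill mode when set

let params = {};
try {
  params = JSON.parse(process.env.AEGIS_SKILL_PARAMS || "{}");
} catch (err) {
  console.error("AEGIS_SKILL_PARAMS is not valid JSON; using defaults");
}

const mode = params.mode || "full"; // "full" or "quick"
```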
{"event": "ready", "vlm": "SmolVLM-500M", "system": "Apple M3"}
{"event": "suite_start", "suite": "🔥 Fire / Smoke"}
{"event": "test_result", "suite": "...", "test": "...", "status": "pass", "timeMs": 4500}
{"event": "suite_end", "suite": "...", "passed": 7, "failed": 1}
{"event": "complete", "passed": 36, "total": 40, "timeMs": 180000, "reportPath": "/path/to/report.html"}
Human-readable output goes to stderr (visible in Aegis console tab).
This benchmark is inspired by:
HomeSafeBench: Towards Measuring the Proficiency of Home Safety for Embodied AI Agents arXiv:2509.23690
Unlike the academic benchmark (an embodied agent navigating simulated 3D environments), our version uses static indoor camera frames, matching a real-world indoor security camera deployment (fixed wall/ceiling mount). All fixture images are AI-generated, consistent with DeepCamera's privacy-first approach.
`npm install` (for the `openai` SDK dependency)