VII is a full pipeline: ingest any video source, analyze with vision language models, detect events across time, and deliver structured intelligence via API. Deploy anywhere — cloud, on-prem, or air-gapped.
Each component of the pipeline is independently configurable, testable, and replaceable. No black boxes.
Connect any video source: IP cameras via RTSP/ONVIF, drone feeds, body-worn cameras, uploaded files. Automatic format detection, adaptive frame sampling from 1 to 10 FPS, and intelligent chunking with configurable overlap.
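For a concrete feel, a source definition might look like the sketch below. The `SourceConfig` shape and every field name in it are illustrative assumptions, not VII's published API.

```ts
// Hypothetical ingestion config; all field names are illustrative.
type SourceConfig = {
  uri: string;                                   // rtsp://, onvif://, or an uploaded file
  protocol?: 'rtsp' | 'onvif' | 'file';
  sampling: { minFps: number; maxFps: number };  // adaptive range, e.g. 1 to 10
  chunking: { seconds: number; overlapSeconds: number };
};

const loadingDockCam: SourceConfig = {
  uri: 'rtsp://10.0.4.12:554/stream1',
  protocol: 'rtsp',
  sampling: { minFps: 1, maxFps: 10 },
  chunking: { seconds: 60, overlapSeconds: 5 },  // overlap lets events span chunk edges
};
```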
Route video chunks to the best-suited vision language model. Automatic failover between providers. Every response is validated against strict Zod schemas, so malformed or off-schema output is rejected before it reaches your systems.
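As a sketch of how schema gating works with Zod, assuming a simplified event shape (VII's real schemas would be richer):

```ts
import { z } from 'zod';

// Simplified event schema for illustration only.
const DetectedEvent = z.object({
  type: z.enum(['person_entered', 'vehicle_stopped', 'object_left']),
  confidence: z.number().min(0).max(1),
  startMs: z.number().int().nonnegative(),
  endMs: z.number().int().nonnegative(),
});

const ChunkAnalysis = z.object({ events: z.array(DetectedEvent) });

declare const rawModelResponse: unknown; // whatever the VLM returned

// safeParse rejects malformed output instead of passing it downstream.
const result = ChunkAnalysis.safeParse(rawModelResponse);
if (!result.success) {
  // route to retry or provider failover rather than surfacing bad data
}
```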
Cross-chunk event correlation detects events that span multiple segments. Deduplication ensures each event is reported once, not per-chunk. Temporal anchoring for precise timeline placement.
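The dedup rule can be pictured like this: detections of the same type whose globally anchored time ranges overlap collapse into one event. A simplified sketch, not VII's implementation:

```ts
type Ev = { type: string; startMs: number; endMs: number };

// Collapse per-chunk detections of the same type whose anchored time
// ranges overlap (or nearly touch) into a single global event.
function mergeEvents(events: Ev[], toleranceMs = 2000): Ev[] {
  const byType = new Map<string, Ev[]>();
  for (const ev of events) {
    const group = byType.get(ev.type) ?? [];
    group.push(ev);
    byType.set(ev.type, group);
  }
  const merged: Ev[] = [];
  for (const group of byType.values()) {
    group.sort((a, b) => a.startMs - b.startMs);
    let current: Ev | null = null;
    for (const ev of group) {
      if (current && ev.startMs <= current.endMs + toleranceMs) {
        current.endMs = Math.max(current.endMs, ev.endMs); // extend the anchored span
      } else {
        if (current) merged.push(current);
        current = { ...ev };
      }
    }
    if (current) merged.push(current);
  }
  return merged;
}
```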
Ask questions in natural language across hours of footage. "Show me every time someone enters the loading dock after 9pm." The search index is built automatically during analysis.
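In client code, a search could be a single request. The endpoint path, auth scheme, and response shape below are assumptions for illustration:

```ts
// Hypothetical search call; endpoint and response shape are assumed.
declare const apiKey: string;

const res = await fetch('https://api.example.com/v1/search', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${apiKey}` },
  body: JSON.stringify({
    query: 'every time someone enters the loading dock after 9pm',
    timeRange: { from: '2024-06-01T00:00:00Z', to: '2024-06-08T00:00:00Z' },
  }),
});
const { matches } = await res.json(); // e.g. [{ cameraId, startMs, endMs, clipUrl }, ...]
```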
Define alert rules per domain. "Notify if a person is detected in Zone A after hours." Alerts fire over webhooks, WebSocket, or push notifications with full event context.
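A rule might be expressed like the sketch below; the rule fields and channel names are illustrative, not a documented DSL:

```ts
// Illustrative rule shape; this DSL is an assumption, not documented syntax.
const afterHoursRule = {
  name: 'person-in-zone-a-after-hours',
  when: { eventType: 'person_detected', zone: 'Zone A' },
  schedule: { outside: { start: '08:00', end: '18:00' } }, // fire only after hours
  notify: [
    { channel: 'webhook', url: 'https://ops.example.com/hooks/vii' },
    { channel: 'push', topic: 'security-night-shift' },
  ],
} as const;
```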
Every analysis produces strongly typed JSON. Events, timelines, summaries, objects, persons: all typed in TypeScript and validated at runtime against Zod schemas. No free-text guessing.
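To see what that buys a consumer, here is an assumed result shape (not the published SDK types):

```ts
// Assumed output shape for illustration; published types may differ.
interface AnalysisResult {
  summary: string;
  events: Array<{ id: string; type: string; startMs: number; endMs: number; confidence: number }>;
  persons: Array<{ id: string; description: string; firstSeenMs: number }>;
}

function listEventTypes(result: AnalysisResult): string[] {
  // No regex over free text: every field is typed and validated upstream.
  return Array.from(new Set(result.events.map((e) => e.type)));
}
```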
Anyone can call an AI model. The hard part is making it reliable, accurate, and production-ready at scale. These are the engineering decisions that make VII different.
Vision language models understand context and meaning. Object detection models catch precise locations. VII runs both on every chunk — YOLO enriches VLM output with bounding boxes and catches objects the VLM missed. Fewer false positives. Fewer missed events.
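A rough sketch of the fusion idea, with label matching simplified to string equality (real matching would be spatial and semantic, and all names here are hypothetical):

```ts
type VlmObject = { label: string; description: string };
type YoloBox = { label: string; box: [number, number, number, number]; score: number };

// Attach detector boxes to VLM objects; keep high-confidence detector-only
// objects the VLM missed. Matching is naively by label for illustration.
function fuse(vlm: VlmObject[], yolo: YoloBox[], minScore = 0.6) {
  const enriched = vlm.map((obj) => ({
    ...obj,
    box: yolo.find((d) => d.label === obj.label)?.box ?? null,
  }));
  const vlmLabels = new Set(vlm.map((o) => o.label));
  const detectorOnly = yolo.filter((d) => !vlmLabels.has(d.label) && d.score >= minScore);
  return { enriched, detectorOnly };
}
```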
"Show me every red truck near Gate 3 this week." Not keyword search — semantic understanding over your entire video archive. Every answer cites specific timestamps and source footage.
Cross-camera person tracking uses text descriptions — not facial recognition, not biometric data. Raw video can stay inside your environment, and the platform is designed to support tightly governed deployments.
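One way to picture description-based matching, using a placeholder text-embedding function and an assumed similarity threshold; this is a sketch of the idea, not VII's tracking pipeline:

```ts
// Placeholder embedding function; any text-embedding model would do here.
declare function embed(text: string): Promise<number[]>;

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Two sightings match when their description embeddings are close enough.
// Threshold is an assumption; no face templates or biometrics involved.
async function sameIndividual(descA: string, descB: string, threshold = 0.85) {
  const [a, b] = await Promise.all([embed(descA), embed(descB)]);
  return cosine(a, b) >= threshold;
}
```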
Every AI output is validated against strict schemas before it reaches your systems. Automatic failover between model providers. If one model is down, the next takes over without interruption.
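A minimal failover sketch, assuming a shared `analyze` interface across providers (all names hypothetical):

```ts
type Provider = { name: string; analyze(chunk: Uint8Array): Promise<unknown> };

// Try providers in priority order. If analyze() throws because a provider is
// down or its response failed schema validation, move to the next one.
async function analyzeWithFailover(chunk: Uint8Array, providers: Provider[]) {
  let lastError: unknown;
  for (const p of providers) {
    try {
      return await p.analyze(chunk);
    } catch (err) {
      lastError = err; // record and fall through to the next provider
    }
  }
  throw new Error(`All providers failed: ${String(lastError)}`);
}
```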
Fully managed by VII. Zero infrastructure to maintain. Automatic scaling, updates, and monitoring. Data encrypted at rest and in transit.
Deploy inside your network perimeter. Air-gapped mode available — video never leaves your environment. Full control over data residency and retention.
Process video on-site, sync intelligence to the cloud. Keep raw video local while enabling cross-site dashboards, unified alerting, and central administration.
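All three modes could collapse into one deployment setting; the configuration keys below are assumptions for illustration:

```ts
// Illustrative deployment setting; keys are assumptions, not VII's schema.
type DeploymentConfig =
  | { mode: 'cloud' }                        // fully managed by VII
  | { mode: 'on-prem'; airGapped: boolean }  // video never leaves the network
  | { mode: 'hybrid'; keepRawVideoLocal: boolean };

const hybridSite: DeploymentConfig = {
  mode: 'hybrid',
  keepRawVideoLocal: true, // intelligence syncs up; raw footage stays on-site
};
```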
Request a walkthrough with our team. We'll run your own footage through the pipeline live.
Request Early Access