    Retail Theft Detection with Edge AI on Jetson Nano

    by Deep Parmar

    CTO at Sunbots Innovations LLP | Director at Xwits Developers Pvt Ltd

    The Problem with Cloud CCTV Analytics

    Retail loss prevention is a real problem: the National Retail Federation estimates that 1.6% of retail revenue is lost to theft annually. AI-powered CCTV analytics products exist, but most of them require streaming video to cloud servers for analysis. For mid-size Indian retailers, this creates three problems: bandwidth cost (continuously streaming multiple HD camera feeds is expensive), latency (cloud round trips add 1–3 seconds before an alert fires, which is often too late), and privacy and data sovereignty (customer video leaves the premises and may be retained off-site indefinitely).

    We built a retail theft detection system that runs entirely on a Jetson Nano placed inside the store. Video never leaves the premises. Alerts are generated in under 200ms from event occurrence. The hardware cost is under ₹12,000 per camera, significantly lower than enterprise cloud analytics subscriptions.

    The Three-Model Architecture

    Retail theft detection is harder than it appears because "theft" isn't a visual primitive — it's a behavioral pattern that requires understanding what a person is doing over time, not just what they look like in a single frame. We use three models in sequence:

    Model 1 — Person Detection: YOLOv8n (the nano variant) detects all persons in the frame and can run at ~30fps. The nano variant fits in the Jetson Nano's 4GB RAM with room for the other models. We run detection on every 3rd frame (an effective 10fps for detection) to balance accuracy against inference cost. Between detection frames, a lightweight tracker (ByteTrack) maintains person identities without re-running detection.
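
    The frame-skipping schedule described above can be sketched as follows. This is a minimal illustration, not our production code: `detect` and `track` are hypothetical callables standing in for the detector and tracker, and they do not mirror the actual YOLOv8 or ByteTrack APIs.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

DETECT_EVERY = 3  # run the detector on every 3rd frame


def schedule_frames(
    frames: Iterable[object],
    detect: Callable[[object], List[Box]],
    track: Callable[[object, List[Box]], List[Box]],
    detect_every: int = DETECT_EVERY,
) -> Iterator[Tuple[str, List[Box]]]:
    """Yield ("detect" or "track", boxes) for each incoming frame.

    `detect` and `track` are hypothetical stand-ins for the real models.
    """
    boxes: List[Box] = []
    for index, frame in enumerate(frames):
        if index % detect_every == 0:
            boxes = detect(frame)        # full detection pass
            yield "detect", boxes
        else:
            boxes = track(frame, boxes)  # cheap update from the previous boxes
            yield "track", boxes
```

    With a 30fps camera stream, this runs the detector 10 times per second and the tracker on the remaining 20 frames.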

    Model 2 — Pose Estimation: For each detected person, we run a lightweight pose estimation model (MoveNet Lightning, optimized for Jetson). Pose estimation gives us 17 keypoints per person per frame, including wrists, elbows, shoulders, hips, knees, and ankles. This is the most computationally expensive step and runs at ~8fps for scenes with 5 or fewer people.
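
    As a rough illustration of what the pose stage hands to the next model, the sketch below maps one person's 17 MoveNet keypoints to named joints. MoveNet reports each keypoint as (y, x, score) in normalized coordinates; the `named_keypoints` helper and the 0.3 confidence cutoff are assumptions made for this example.

```python
from typing import Dict, Sequence, Tuple

# MoveNet keypoint order (the standard 17 COCO keypoints).
KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def named_keypoints(
    keypoints: Sequence[Tuple[float, float, float]],
    min_score: float = 0.3,  # assumed cutoff, not a tuned value
) -> Dict[str, Tuple[float, float]]:
    """Map 17 rows of (y, x, score) to {name: (x, y)}, dropping weak joints."""
    if len(keypoints) != len(KEYPOINT_NAMES):
        raise ValueError("expected 17 keypoints")
    return {
        name: (x, y)
        for name, (y, x, score) in zip(KEYPOINT_NAMES, keypoints)
        if score >= min_score
    }
```

    For theft detection the wrist and elbow entries matter most, since they describe where the hands are moving relative to the body.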

    Model 3 — Action Recognition: The pose sequence over a 2-second window is fed into a temporal classification model that assigns the behavior to one of four categories: browsing, selecting, placing in bag, and uncertain. "Placing in bag" triggers an alert. An "uncertain" result defers the decision until the next 2 seconds of poses have been observed.
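
    The windowing and "uncertain" handling can be sketched as a small state machine. Several details here are assumptions for illustration: the windows do not overlap, the deferred decision uses only the following window, a second "uncertain" result is treated as no alert, and `classify` is a hypothetical callable standing in for the temporal model.

```python
from typing import Callable, List, Optional

POSE_FPS = 8                               # approximate pose rate
WINDOW_SECONDS = 2
WINDOW_FRAMES = POSE_FPS * WINDOW_SECONDS  # 16 poses per window


class ActionWindow:
    """Collects poses for one tracked person and decides per window."""

    def __init__(self, classify: Callable[[List[object]], str],
                 window_frames: int = WINDOW_FRAMES) -> None:
        self._classify = classify          # hypothetical temporal classifier
        self._window_frames = window_frames
        self._poses: List[object] = []
        self._deferred = False             # True after one "uncertain" window

    def push(self, pose: object) -> Optional[bool]:
        """Add a pose. Return True (alert), False (no alert), or None (no decision yet)."""
        self._poses.append(pose)
        if len(self._poses) < self._window_frames:
            return None
        label = self._classify(self._poses)
        self._poses = []
        if label == "uncertain" and not self._deferred:
            self._deferred = True          # wait for one more window
            return None
        self._deferred = False
        return label == "placing_in_bag"
```

    One `ActionWindow` would be kept per tracked person ID, so that each person's pose history is classified independently.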

    The Hard Part: False Positive Rate

    A system that alerts 50 times per day is useless because security guards will learn to ignore it. Our target false positive rate was under 3 per 8-hour shift, which works out to fewer than 0.4 per hour in a busy retail environment.

    Achieving this required careful attention to three sources of false positives:

    • Occlusion: When one person walks in front of another, pose tracking can mis-assign joint positions. We added an occlusion detection stage that suppresses action recognition for a person whose bounding box overlaps more than 40% with another person.
    • Children: Children's proportions and movements differ significantly from adults. The action recognition model performs poorly on children because training data was predominantly adult. We added an adult/child classification step and suppress theft alerts for detected children.
    • Legitimate actions that look similar: Placing an item in a reusable shopping bag looks similar to shoplifting. We trained the action recognition model with specific examples of legitimate bag-placing behavior — the key discriminator is whether the item came from the shelf or from the cart/basket.
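
    The occlusion rule in the first bullet can be sketched as below. It assumes "overlap" means the fraction of a person's own bounding box covered by another person's box; intersection-over-union would be another reasonable reading. The function names are illustrative.

```python
from typing import List, Set, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

OCCLUSION_THRESHOLD = 0.40  # 40% overlap, as described above


def overlap_fraction(box: Box, other: Box) -> float:
    """Fraction of `box` area covered by `other` (assumed definition)."""
    x1, y1 = max(box[0], other[0]), max(box[1], other[1])
    x2, y2 = min(box[2], other[2]), min(box[3], other[3])
    intersection = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = (box[2] - box[0]) * (box[3] - box[1])
    return intersection / area if area > 0 else 0.0


def occluded_indices(boxes: List[Box],
                     threshold: float = OCCLUSION_THRESHOLD) -> Set[int]:
    """Indices of people whose action recognition should be suppressed."""
    return {
        i
        for i, box in enumerate(boxes)
        if any(
            overlap_fraction(box, other) > threshold
            for j, other in enumerate(boxes)
            if j != i
        )
    }
```

    Because the fraction is measured against each person's own box, a small box mostly hidden behind a large one is suppressed while the large one may still be analyzed.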

    Alert Design and Workflow

    The system doesn't call the police — it alerts a security guard via a tablet app showing a cropped video clip of the flagged behavior. The guard confirms or dismisses the alert. This human-in-the-loop design is important for several reasons: it prevents automated consequences from AI errors, it provides labeled feedback data for model improvement, and it keeps the system operating within the appropriate bounds of AI decision support rather than AI decision-making.
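
    A minimal sketch of this review loop is shown below. The `Alert` record, the `Review` states, and the feedback log format are hypothetical; they only illustrate how a guard's decision can double as a training label.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Review(Enum):
    PENDING = "pending"
    CONFIRMED = "confirmed"
    DISMISSED = "dismissed"


@dataclass
class Alert:
    alert_id: int
    clip_path: str                  # cropped clip shown on the guard's tablet
    review: Review = Review.PENDING


def record_review(alert: Alert, confirmed: bool,
                  feedback_log: List[Tuple[str, bool]]) -> Alert:
    """Store the guard's decision and keep it as a labeled example."""
    alert.review = Review.CONFIRMED if confirmed else Review.DISMISSED
    feedback_log.append((alert.clip_path, confirmed))
    return alert
```

    The system itself never moves an alert past `PENDING`; only a human decision does.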

    Alert clips are stored locally on the Jetson Nano for 72 hours and then deleted. There is no long-term cloud storage of customer video.
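
    A retention job along these lines could enforce the 72-hour limit. This is a sketch under assumptions: clips are `.mp4` files in a single directory, and file modification time is used as the clip's age.

```python
import time
from pathlib import Path
from typing import List, Optional

RETENTION_SECONDS = 72 * 60 * 60  # 72 hours


def purge_old_clips(clip_dir: Path, now: Optional[float] = None,
                    retention_seconds: int = RETENTION_SECONDS) -> List[Path]:
    """Delete clips older than the retention period and return their paths."""
    now = time.time() if now is None else now
    removed: List[Path] = []
    for clip in clip_dir.glob("*.mp4"):   # assumed clip file extension
        if now - clip.stat().st_mtime > retention_seconds:
            clip.unlink()
            removed.append(clip)
    return removed
```

    Running this from a periodic job (for example hourly) keeps the stored footage within the retention period plus at most one job interval.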

    Results in Deployment

    After 6 months of deployment in a mid-size retail outlet in Ahmedabad, the system had detected 23 confirmed theft incidents. After calibration it generated 4 false positives per week on average, down from 12 per week in the first month. The retailer estimates approximately ₹3.8 lakh in prevented losses over the period, against a system cost of ₹45,000 (hardware + software).
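
    For a quick sense of scale, the figures above work out as follows. The inputs are the retailer's estimate and the system cost quoted in this article; the calculation ignores ongoing costs such as power and maintenance, which is a simplifying assumption.

```python
PREVENTED_LOSS_INR = 380_000  # ₹3.8 lakh, retailer's estimate over 6 months
SYSTEM_COST_INR = 45_000      # hardware + software


def roi_summary(prevented: int, cost: int) -> dict:
    """Net benefit and benefit-to-cost ratio for the deployment period."""
    return {
        "net_benefit_inr": prevented - cost,
        "benefit_to_cost": round(prevented / cost, 1),
    }


summary = roi_summary(PREVENTED_LOSS_INR, SYSTEM_COST_INR)
# net_benefit_inr is 335000 and benefit_to_cost is 8.4
```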

    Interested in edge AI for retail or security applications? Reach out — we've developed playbooks for this type of deployment and I'm happy to share what works.
