    Object Detection on Android: YOLO and TFLite

    by Deep Parmar

    CTO at Sunbots Innovations LLP | Director at Xwits Developers Pvt Ltd

    Why On-Device Object Detection

    For SmartON's currency detection to be useful for visually impaired users, it needed to work without internet — in a market, in a rural area, anywhere. Server-based inference was ruled out from the start. On-device inference using TensorFlow Lite was the only viable path.

    This tutorial covers the full pipeline: training a YOLO model, converting it to TFLite, integrating it with Android's camera API, and optimizing for real-world performance. The code patterns come directly from SmartON's currency detection implementation.

    Step 1: Train Your YOLO Model

    We use YOLOv8 (Ultralytics) for training. It exports cleanly to ONNX and from there to TFLite, has strong documentation, and the nano/small variants are small enough for mobile inference.

    from ultralytics import YOLO
    
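    # Fine-tune from pretrained weights (pass 'yolov8n.yaml' instead to train from scratch)
    model = YOLO('yolov8n.pt')  # nano variant for mobile
    
    results = model.train(
      data='currency.yaml',      # your dataset config
      epochs=100,
      imgsz=640,
      batch=16,
      device='cuda',
      workers=4,
      # Training augmentation is set per transform; augment= only affects prediction
      mosaic=1.0,                # mosaic probability
      mixup=0.1,                 # mixup probability
      fliplr=0.5                 # horizontal flip probability; color jitter (hsv_*) is on by default
    )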

    Your currency.yaml should specify train/val/test paths and class names. For SmartON, we had 7 classes (₹10, ₹20, ₹50, ₹100, ₹200, ₹500, ₹2000) plus an orientation class attribute.
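
    A minimal sketch of generating that config, assuming the usual Ultralytics dataset keys (path, train, val, test, names). The directory names and class labels below are hypothetical placeholders, not SmartON's actual dataset:

```python
# Hypothetical class labels; class ids are their zero-based positions
CLASS_NAMES = ["rupee_10", "rupee_20", "rupee_50", "rupee_100",
               "rupee_200", "rupee_500", "rupee_2000"]


def build_dataset_yaml(root, class_names):
    """Return the text of an Ultralytics-style dataset config."""
    lines = [
        f"path: {root}",        # dataset root (hypothetical location)
        "train: images/train",  # paths below are relative to `path`
        "val: images/val",
        "test: images/test",
        "names:",
    ]
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


yaml_text = build_dataset_yaml("datasets/currency", CLASS_NAMES)
print(yaml_text)
# To save it: open("currency.yaml", "w").write(yaml_text)
```

    Generating the file from a single list keeps the class order in one place, which matters later because the Android postprocessing code maps output indices back to these names.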

    Step 2: Export to TFLite with INT8 Quantization

    Quantizing from FP32 to INT8 stores each weight in 8 bits instead of 32, so the model shrinks by roughly 4×. On mobile it typically also cuts inference time by 2–3×, with under 2% accuracy loss for most detection tasks:

    # A single call: Ultralytics converts to ONNX internally, then to TFLite
    model.export(format='tflite', int8=True, data='currency.yaml')
    # With int8=True, images from the dataset in `data` are run through the model to calibrate quantization
    # Use 100-200 calibration images representative of your production data

    The output is a .tflite file. For SmartON's currency model, the FP32 model was 12.4MB; the INT8 quantized version is 3.1MB — comfortable for embedding in an Android APK.
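
    To see why the calibration images matter, here is a simplified sketch of the affine INT8 scheme that post-training quantization is based on. It is an illustration, not the converter's actual code, and the activation range used in the example is a made-up assumption:

```python
def quant_params(x_min, x_max, q_min=-128, q_max=127):
    """Derive scale and zero point from a float range seen during calibration."""
    # Widen the range to include 0.0 so that zero is exactly representable
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (q_max - q_min)
    zero_point = int(round(q_min - x_min / scale))
    return scale, zero_point


def quantize(x, scale, zero_point, q_min=-128, q_max=127):
    q = int(round(x / scale)) + zero_point
    return max(q_min, min(q_max, q))  # values outside the range saturate


def dequantize(q, scale, zero_point):
    return scale * (q - zero_point)


# Hypothetical activation range observed on the calibration images
scale, zp = quant_params(-1.0, 3.0)
for x in (-1.0, 0.0, 0.5, 3.0, 10.0):
    q = quantize(x, scale, zp)
    print(f"{x:6.2f} -> q={q:4d} -> {dequantize(q, scale, zp):6.3f}")
```

    The last value (10.0) lies outside the calibrated range and is clamped to the largest representable value. That is the failure mode unrepresentative calibration data causes: activations seen in production but not during calibration get clipped.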

    Step 3: Android Integration

    Add the TFLite dependencies to your build.gradle:

    dependencies {
      implementation 'org.tensorflow:tensorflow-lite:2.14.0'
      implementation 'org.tensorflow:tensorflow-lite-gpu:2.14.0'  // GPU delegate
      implementation 'org.tensorflow:tensorflow-lite-support:0.4.4'
    }

    Load and run the model:

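    import android.content.Context
    import android.graphics.Bitmap
    import android.graphics.RectF
    import org.tensorflow.lite.Interpreter
    import org.tensorflow.lite.gpu.GpuDelegate
    import org.tensorflow.lite.support.common.FileUtil
    
    // One detection, with the box in original-image pixel coordinates
    data class Detection(val box: RectF, val classId: Int, val score: Float)
    
    class CurrencyDetector(private val context: Context) {
      private lateinit var interpreter: Interpreter
    
      fun init() {
        val gpuDelegate = GpuDelegate()
        val options = Interpreter.Options().addDelegate(gpuDelegate)
    
        // FileUtil (TFLite Support Library) memory-maps a model stored in assets/
        val modelBuffer = FileUtil.loadMappedFile(context, "currency_int8.tflite")
        interpreter = Interpreter(modelBuffer, options)
      }
    
      // preprocessBitmap() and postprocess() are helpers you implement yourself
      fun detect(bitmap: Bitmap): List<Detection> {
        val input = preprocessBitmap(bitmap)  // resize to 640x640, normalize
        // YOLOv8 output at 640x640 is [1, 4 + numClasses, 8400], i.e. [1, 11, 8400] for 7 classes.
        // This assumes float32 output; check interpreter.getOutputTensor(0) for the real shape and type.
        val output = Array(1) { Array(11) { FloatArray(8400) } }
    
        interpreter.run(input, output)
        return postprocess(output, bitmap.width, bitmap.height)
      }
    }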

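    The postprocessing step decodes the raw output, applies non-maximum suppression (NMS), and maps detection coordinates from model space back to image space. If you'd rather not implement NMS yourself, the ObjectDetector class from the TFLite Task Library can handle it, but it expects a model with the metadata and output layout it was designed for, so a raw YOLO export may not load without changes.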
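
    The postprocessing logic is easier to prototype in Python before porting it to Kotlin. The sketch below assumes a YOLOv8-style output with one row per attribute (cx, cy, w, h, then one score per class), box values in model-input pixels, and a plain resize rather than letterboxing. If your export emits normalized coordinates, multiply by the image size instead. The thresholds are common defaults, not SmartON's values:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def postprocess(output, img_w, img_h, model_size=640, conf_thres=0.25, iou_thres=0.45):
    """output: 4 + num_classes rows, each holding one value per candidate box."""
    num_classes = len(output) - 4
    sx, sy = img_w / model_size, img_h / model_size  # model space -> image space
    candidates = []
    for i in range(len(output[0])):
        scores = [output[4 + c][i] for c in range(num_classes)]
        class_id = max(range(num_classes), key=scores.__getitem__)
        score = scores[class_id]
        if score < conf_thres:
            continue
        cx, cy, w, h = (output[r][i] for r in range(4))
        box = ((cx - w / 2) * sx, (cy - h / 2) * sy,
               (cx + w / 2) * sx, (cy + h / 2) * sy)
        candidates.append((score, class_id, box))

    # Greedy NMS: visit boxes by descending score and drop any box that
    # overlaps an already-kept box of the same class too much
    candidates.sort(key=lambda d: d[0], reverse=True)
    kept = []
    for det in candidates:
        if all(det[1] != k[1] or iou(det[2], k[2]) < iou_thres for k in kept):
            kept.append(det)
    return kept


# Four toy candidates, two classes: the first two overlap heavily,
# the third is a different class, the fourth is below the threshold
toy_output = [
    [320, 324, 100, 50],     # cx
    [320, 322, 100, 50],     # cy
    [100, 100, 40, 20],      # w
    [50, 50, 40, 20],        # h
    [0.9, 0.8, 0.1, 0.1],    # class 0 scores
    [0.05, 0.1, 0.7, 0.1],   # class 1 scores
]
for score, class_id, box in postprocess(toy_output, img_w=1280, img_h=720):
    print(class_id, score, box)
```

    On a phone the same loop runs over 8400 candidates per frame, so filter by confidence before doing any IoU work, as the sketch does.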

    Step 4: Camera Integration with CameraX

    CameraX (Jetpack) is the right choice for camera integration in 2025. It handles device-specific quirks and lifecycle management, and it provides a clean API for analysis use cases:

    val imageAnalysis = ImageAnalysis.Builder()
      .setTargetResolution(Size(640, 640))  // a hint: CameraX picks the closest supported size
      .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
      .build()
    
    imageAnalysis.setAnalyzer(cameraExecutor) { imageProxy ->
      try {
        val bitmap = imageProxy.toBitmap()  // built into CameraX 1.3+; older versions need their own conversion
        val detections = currencyDetector.detect(bitmap)
        runOnUiThread { updateUI(detections) }
      } finally {
        imageProxy.close()  // IMPORTANT: always close the proxy, even if detection throws
      }
    }

    STRATEGY_KEEP_ONLY_LATEST is critical for real-time inference. If your model can't process frames as fast as the camera produces them, this strategy drops stale frames rather than building a backlog — keeping inference current rather than delayed.
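
    The effect is easy to see in a small simulation. This is a toy model of a one-slot, latest-wins buffer versus an unbounded queue, not CameraX code. The 33 ms frame interval (a roughly 30fps camera) is an assumption, and the 68 ms inference time is the 640×640 figure from the performance section:

```python
def keep_only_latest(frame_interval_ms, inference_ms, n_frames):
    """Indices of frames analyzed when only the newest waiting frame is kept.

    A frame still waiting when the simulation ends is not counted.
    """
    processed = []
    free_at = 0     # time at which the analyzer becomes idle
    pending = None  # the single buffered frame, if any
    for k in range(n_frames):
        arrival = k * frame_interval_ms
        # The analyzer went idle before this frame arrived: it takes the buffered frame
        if pending is not None and free_at <= arrival:
            processed.append(pending)
            free_at += inference_ms
            pending = None
        if free_at <= arrival:
            processed.append(k)  # analyzer idle: analyze immediately
            free_at = arrival + inference_ms
        else:
            pending = k          # analyzer busy: overwrite the stale frame
    return processed


def fifo_lag_ms(frame_interval_ms, inference_ms, n_frames):
    """Capture-to-analysis delay of the last frame with an unbounded FIFO queue."""
    free_at = 0
    lag = 0
    for k in range(n_frames):
        arrival = k * frame_interval_ms
        start = max(arrival, free_at)
        lag = start - arrival
        free_at = start + inference_ms
    return lag


print(keep_only_latest(33, 68, 10))  # which frames actually get analyzed
print(fifo_lag_ms(33, 68, 10))       # how stale the 10th frame is in a queue
```

    With these numbers the latest-wins buffer analyzes every other frame and stays current, while the queue's delay grows by 35 ms with every frame, so results fall further and further behind what the camera sees.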

    Performance Optimization

    Achieved performance on SmartON (Snapdragon 870, NNAPI):

    • YOLOv8n INT8, input 320×320: ~28ms per frame (35fps effective)
    • YOLOv8n INT8, input 640×640: ~68ms per frame (14fps effective)
    • YOLOv8s INT8, input 640×640: ~145ms per frame (6fps effective)

    For real-time use cases, use the nano variant with 320×320 input. For accuracy-critical use cases (detailed OCR, precise distance estimation), step up to 640×640 at the cost of frame rate.
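
    The "effective" frame rates above follow directly from the per-frame latency, assuming inference is the only bottleneck and frames are processed one at a time. A quick way to check a latency budget for your own model:

```python
def effective_fps(latency_ms):
    """Frames per second when each frame takes latency_ms to process."""
    return 1000.0 / latency_ms


# Latencies reported above for SmartON (Snapdragon 870, NNAPI)
measurements = {
    "YOLOv8n INT8 320x320": 28,
    "YOLOv8n INT8 640x640": 68,
    "YOLOv8s INT8 640x640": 145,
}
for name, ms in measurements.items():
    print(f"{name}: {ms} ms -> {effective_fps(ms):.1f} fps")
```

    Preprocessing, postprocessing, and UI updates add to the per-frame cost on a real device, so treat these as upper bounds.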

    The Step 3 code uses the GPU delegate. On supported devices (Snapdragon 855+), enable the NNAPI delegate instead: it offloads inference to the dedicated neural processing unit and is typically 2–4× faster than the GPU delegate for quantized models.

    SmartON uses this pipeline for currency detection.
