
Object Detection on Android: YOLO and TFLite
by Deep Parmar
CTO at Sunbots Innovations LLP | Director at Xwits Developers Pvt Ltd

Why On-Device Object Detection
For SmartON's currency detection to be useful for visually impaired users, it needed to work without internet — in a market, in a rural area, anywhere. Server-based inference was ruled out from the start. On-device inference using TensorFlow Lite was the only viable path.
This tutorial covers the full pipeline: training a YOLO model, converting it to TFLite, integrating it with Android's camera API, and optimizing for real-world performance. The code patterns come directly from SmartON's currency detection implementation.
Step 1: Train Your YOLO Model
We use YOLOv8 (Ultralytics) for training. It exports cleanly to ONNX and from there to TFLite, has strong documentation, and the nano/small variants are small enough for mobile inference.
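from ultralytics import YOLO

# Load pretrained weights to fine-tune; pass 'yolov8n.yaml' instead to train from scratch
model = YOLO('yolov8n.pt')  # nano variant for mobile
results = model.train(
    data='currency.yaml',  # your dataset config
    epochs=100,
    imgsz=640,
    batch=16,
    device=0,              # first CUDA GPU
    workers=4,
    # Training augmentation is set per transform; `augment` only affects prediction
    mosaic=1.0,            # mosaic (on by default)
    mixup=0.1,             # mixup is off by default; 0.1 is an illustrative value
    fliplr=0.5,            # horizontal flip probability (default)
)                          # color jitter uses the default hsv_h / hsv_s / hsv_v settings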
Your currency.yaml should specify train/val/test paths and class names. For SmartON, we had 7 classes (₹10, ₹20, ₹50, ₹100, ₹200, ₹500, ₹2000) plus an orientation class attribute.
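For reference, a dataset config in the layout Ultralytics expects could be generated as below. This is a minimal sketch: the dataset root, the image sub-folders, and the class-name strings are placeholders chosen for illustration, not SmartON's actual layout or labels.

```python
# Hypothetical class names for the seven denominations; rename to match your labels.
CLASS_NAMES = ["inr_10", "inr_20", "inr_50", "inr_100", "inr_200", "inr_500", "inr_2000"]


def build_dataset_yaml(root: str, class_names: list[str]) -> str:
    """Return the text of an Ultralytics-style dataset config."""
    lines = [
        f"path: {root}",        # dataset root (placeholder)
        "train: images/train",  # the three paths below are relative to `path`
        "val: images/val",
        "test: images/test",
        "names:",
    ]
    # Class indices must match the integer IDs used in the label files.
    lines += [f"  {idx}: {name}" for idx, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


# Print the config; save the output as currency.yaml next to the training script.
print(build_dataset_yaml("datasets/currency", CLASS_NAMES))
```

The index-to-name mapping under `names` is what ties the numeric class IDs in your label files to readable denominations, so keep the order stable between training runs.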
Step 2: Export to TFLite with INT8 Quantization
Quantizing from FP32 to INT8 shrinks the model to roughly a quarter of its size and typically cuts mobile inference time by 2–3×, with under 2% accuracy loss for most detection tasks:
# Ultralytics runs the intermediate conversion steps internally
model.export(format='tflite', int8=True, data='currency.yaml')
# The export runs images from the dataset in `data` through the model to calibrate quantization
# Provide 100-200 calibration images representative of your production data
The output is a .tflite file. For SmartON's currency model, the FP32 model was 12.4MB; the INT8 quantized version is 3.1MB — comfortable for embedding in an Android APK.
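The roughly 4× size reduction follows from storing each weight in one byte instead of four. The sketch below illustrates the affine mapping that INT8 quantization is built on, real ≈ scale × (q − zero_point). It shows the idea only and is not the TFLite converter's implementation; the helper names and the sample range are made up for this example.

```python
FP32_BYTES = 4  # bytes per float32 weight
INT8_BYTES = 1  # bytes per int8 weight


def quantize_params(x_min: float, x_max: float) -> tuple[float, int]:
    """Scale and zero point that map [x_min, x_max] onto the int8 range [-128, 127]."""
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)  # the range must contain zero
    scale = (x_max - x_min) / 255.0
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero range
    zero_point = round(-128 - x_min / scale)
    return scale, zero_point


def quantize(x: float, scale: float, zero_point: int) -> int:
    """Map a real value to int8, clamping anything outside the calibrated range."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Recover the approximate real value from its int8 code."""
    return (q - zero_point) * scale


# Illustrative activation range; calibration images are what determine this in practice.
scale, zp = quantize_params(0.0, 2.55)
q = quantize(1.0, scale, zp)
print(f"size ratio: {FP32_BYTES // INT8_BYTES}x, q={q}, restored={dequantize(q, scale, zp):.4f}")
```

The calibration images matter because they determine the value ranges, and therefore the scale and zero point, for each tensor. Ranges that don't match production data waste part of the 256 available levels or clip real values.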
Step 3: Android Integration
Add the TFLite dependencies to your build.gradle:
dependencies {
    implementation 'org.tensorflow:tensorflow-lite:2.14.0'
    implementation 'org.tensorflow:tensorflow-lite-gpu:2.14.0' // GPU delegate
    implementation 'org.tensorflow:tensorflow-lite-support:0.4.4'
}
Load and run the model:
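import android.content.Context
import android.graphics.Bitmap
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.gpu.GpuDelegate
import org.tensorflow.lite.support.common.FileUtil

class CurrencyDetector(private val context: Context) {
    private lateinit var interpreter: Interpreter

    fun init() {
        val gpuDelegate = GpuDelegate()
        val options = Interpreter.Options().addDelegate(gpuDelegate)
        // Memory-map the model from the app's assets folder
        val modelBuffer = FileUtil.loadMappedFile(context, "currency_int8.tflite")
        interpreter = Interpreter(modelBuffer, options)
    }

    // Detection is an app-defined result type (box, class, score); the name is a placeholder
    fun detect(bitmap: Bitmap): List<Detection> {
        val input = preprocessBitmap(bitmap) // app helper: resize to 640x640, normalize
        // YOLOv8 output shape is [1, 4 + classes, 8400]: 11 rows for 7 classes at 640x640
        val output = Array(1) { Array(11) { FloatArray(8400) } }
        interpreter.run(input, output)
        return postprocess(output, bitmap.width, bitmap.height) // app helper: decode + NMS
    }
}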
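The postprocessing step applies non-maximum suppression (NMS) and maps detection coordinates from model space back to image space. The ObjectDetector class that handles this automatically lives in the TFLite Task Library (tensorflow-lite-task-vision), not the Support Library, and it expects a model with TFLite metadata and SSD-style detection outputs, so a raw YOLOv8 export generally needs its own decoding and NMS.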
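The two postprocessing operations can be sketched in a few lines. The version below is written in Python for readability, and the same logic ports directly to Kotlin. It is a minimal illustration rather than SmartON's implementation: the `Box` type, the 0.45 IoU threshold, and the assumption of a plain (non-letterboxed) resize with boxes in model-input pixels are all choices made for this example. Some exports emit normalized 0–1 coordinates, in which case `model_size` would be 1.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float
    score: float
    class_id: int


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: list[Box], iou_threshold: float = 0.45) -> list[Box]:
    """Greedy per-class NMS: keep the best box, drop same-class boxes that overlap it."""
    kept: list[Box] = []
    for cand in sorted(boxes, key=lambda b: b.score, reverse=True):
        if all(cand.class_id != k.class_id or iou(cand, k) <= iou_threshold for k in kept):
            kept.append(cand)
    return kept


def to_image_space(box: Box, model_size: int, img_w: int, img_h: int) -> Box:
    """Rescale a box from model-input pixels to source-image pixels."""
    sx, sy = img_w / model_size, img_h / model_size
    return Box(box.x1 * sx, box.y1 * sy, box.x2 * sx, box.y2 * sy, box.score, box.class_id)


# Two overlapping detections of the same note plus one elsewhere in the frame.
raw = [Box(0, 0, 100, 100, 0.9, 0), Box(10, 10, 110, 110, 0.8, 0), Box(300, 300, 400, 400, 0.6, 0)]
for det in nms(raw):
    print(to_image_space(det, 640, 1280, 720))
```

Filtering by a confidence threshold before NMS keeps the sort and the pairwise IoU checks cheap, which matters when the raw output contains thousands of candidate boxes per frame.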
Step 4: Camera Integration with CameraX
CameraX (Jetpack) is the right choice for camera integration in 2025. It handles device-specific quirks and lifecycle management, and it provides a clean API for analysis use cases:
val imageAnalysis = ImageAnalysis.Builder()
    .setTargetResolution(Size(640, 640)) // a hint; CameraX picks the closest supported size
    .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
    .build()

imageAnalysis.setAnalyzer(cameraExecutor) { imageProxy ->
    try {
        // Built into ImageProxy since CameraX 1.3; older versions need your own converter
        val bitmap = imageProxy.toBitmap()
        val detections = currencyDetector.detect(bitmap)
        runOnUiThread { updateUI(detections) }
    } finally {
        imageProxy.close() // IMPORTANT: always close the proxy, even if detection throws
    }
}
STRATEGY_KEEP_ONLY_LATEST is critical for real-time inference. If your model can't process frames as fast as the camera produces them, this strategy drops stale frames rather than building a backlog — keeping inference current rather than delayed.
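A small simulation makes the difference concrete. The sketch below is a simplified model, not CameraX's actual buffering: it assumes a camera delivering a frame every 33 ms, a detector that takes 68 ms per frame, and it measures how old each result is when it is produced. The function names are invented for this example.

```python
def lag_with_queue(n_frames: int, frame_ms: float, infer_ms: float) -> list[float]:
    """Result age in ms when every frame is queued and processed in arrival order."""
    lags, free_at = [], 0.0
    for i in range(n_frames):
        arrival = i * frame_ms
        start = max(arrival, free_at)  # wait until the detector is free
        free_at = start + infer_ms
        lags.append(free_at - arrival)
    return lags


def lag_keep_latest(n_frames: int, frame_ms: float, infer_ms: float) -> list[float]:
    """Result age in ms when only the newest frame is kept while the detector is busy."""
    lags, free_at, last_done = [], 0.0, -1
    while True:
        newest = int(free_at // frame_ms)  # newest frame delivered by now
        if newest <= last_done:
            newest = last_done + 1         # nothing new yet: wait for the next frame
        if newest >= n_frames:
            break                          # camera stream has ended
        arrival = newest * frame_ms
        start = max(arrival, free_at)
        free_at = start + infer_ms
        lags.append(free_at - arrival)
        last_done = newest
    return lags


queue = lag_with_queue(90, 33, 68)
latest = lag_keep_latest(90, 33, 68)
print(f"queued:      last result is {queue[-1]:.0f} ms old")
print(f"keep-latest: worst result is {max(latest):.0f} ms old, {len(latest)} frames processed")
```

In this model the queued variant falls further behind with every frame, while the keep-latest variant stays within one frame interval plus one inference time, at the cost of skipping frames.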
Performance Optimization
Achieved performance on SmartON (Snapdragon 870, NNAPI):
- YOLOv8n INT8, input 320×320: ~28ms per frame (35fps effective)
- YOLOv8n INT8, input 640×640: ~68ms per frame (14fps effective)
- YOLOv8s INT8, input 640×640: ~145ms per frame (6fps effective)
For real-time use cases, use the nano variant with 320×320 input. For accuracy-critical use cases (detailed OCR, precise distance estimation), step up to 640×640 at the cost of frame rate.
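The effective frame rates in the list above are simply 1000 ms divided by the per-frame latency, rounded down. The sketch below reproduces that arithmetic and adds a hypothetical `fits_budget` helper for checking a latency against a target frame rate; the 30 fps target is an example value, not a SmartON requirement.

```python
import math


def effective_fps(latency_ms: float) -> int:
    """Whole frames per second when each frame takes latency_ms end to end."""
    return math.floor(1000.0 / latency_ms)


def fits_budget(latency_ms: float, target_fps: float) -> bool:
    """True if the per-frame latency leaves room for the target frame rate."""
    return latency_ms <= 1000.0 / target_fps


# Latencies taken from the measurements listed above.
for label, ms in [("YOLOv8n 320x320", 28), ("YOLOv8n 640x640", 68), ("YOLOv8s 640x640", 145)]:
    print(f"{label}: {ms} ms -> {effective_fps(ms)} fps, fits 30 fps: {fits_budget(ms, 30)}")
```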
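Enable the NNAPI delegate on supported devices (Snapdragon 855+): it offloads inference to the dedicated neural processing unit and is typically 2–4× faster than the GPU delegate for quantized models. NNAPI is deprecated as of Android 15, so benchmark both delegates on your target devices and OS versions before committing to one.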
SmartON uses this pipeline for currency detection. Read the full SmartON build story, or reach out if you're building computer vision for Android.
