
Transformers.js: Running LLMs in the Browser
by Deep Parmar
CTO at Sunbots Innovations LLP | Director at Xwits Developers Pvt Ltd

What Transformers.js Does
Transformers.js is a JavaScript library from Hugging Face that lets you run transformer models directly in the browser or Node.js, without any Python or server dependency. It uses ONNX Runtime Web as the inference engine, which means models trained in PyTorch can run in a browser tab after export to ONNX format.
The library mirrors the Transformers Python API — if you've used pipeline('sentiment-analysis') in Python, the browser version works almost identically. This makes it accessible to ML practitioners who know Python and to web developers who've never written a neural network.
Installing and Basic Usage
npm install @huggingface/transformers
The library is about 2MB of JavaScript. Models are downloaded from the Hugging Face Hub on first use and stored through the browser's Cache API, so subsequent loads skip the network and read the model from the local cache.
The simplest usage is the pipeline API:
import { pipeline } from '@huggingface/transformers';

// Feature extraction (embeddings)
const extractor = await pipeline(
  'feature-extraction',
  'Xenova/all-MiniLM-L6-v2'
);
const embedding = await extractor('Hello, world!', {
  pooling: 'mean',
  normalize: true
});
console.log(embedding.data); // Float32Array of 384 dimensions

// Text classification
const classifier = await pipeline(
  'text-classification',
  'Xenova/distilbert-base-uncased-finetuned-sst-2-english'
);
const result = await classifier('I love this product!');
// [{ label: 'POSITIVE', score: 0.9998 }]
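Because the feature-extraction call above passes normalize: true, each embedding should have unit length, so the cosine similarity of two embeddings reduces to a plain dot product. The sketch below relies on that property. dotSimilarity is a hypothetical helper name, not part of Transformers.js, and the sample vectors are made up to stand in for real 384-dimension embeddings.

```javascript
// Hypothetical helper: similarity of two L2-normalized embeddings.
// For unit-length vectors, cosine similarity equals the dot product.
function dotSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

// Made-up 2-dimensional unit vectors standing in for real embeddings.
const east = new Float32Array([1, 0]);
const north = new Float32Array([0, 1]);
const alsoEast = new Float32Array([1, 0]);

console.log(dotSimilarity(east, alsoEast)); // 1 (same direction)
console.log(dotSimilarity(east, north));    // 0 (orthogonal)
```

With real embeddings you would pass embedding.data from two extractor calls. If normalize is left off, the dot product is no longer a cosine and the vectors would need to be divided by their norms first.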
Using It in a Web Worker
Always run Transformers.js in a Web Worker. Model loading and inference are compute-heavy, and when they run on the main thread they stall rendering and input handling until they finish, so your UI freezes. The library works inside a module worker without extra setup:
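// worker.js
import { pipeline } from '@huggingface/transformers';

// Loaded once on 'init' and reused for every 'embed' request.
let extractor = null;

self.onmessage = async (event) => {
  const { type, payload } = event.data;
  if (type === 'init') {
    extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
    self.postMessage({ type: 'ready' });
  }
  if (type === 'embed') {
    if (!extractor) {
      // Guard against 'embed' arriving before 'init' has finished.
      self.postMessage({ type: 'error', message: 'Model not loaded; send init first.' });
      return;
    }
    const result = await extractor(payload.text, { pooling: 'mean', normalize: true });
    // Convert the Float32Array to a plain array for the message payload.
    self.postMessage({ type: 'embedding', data: Array.from(result.data) });
  }
};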
// main.js
const worker = new Worker(new URL('./worker.js', import.meta.url), { type: 'module' });

// Register the handler before sending the first message.
worker.onmessage = (e) => {
  if (e.data.type === 'ready') {
    worker.postMessage({ type: 'embed', payload: { text: 'Hello!' } });
  }
  if (e.data.type === 'embedding') {
    console.log('Embedding:', e.data.data.slice(0, 5));
  }
};
worker.postMessage({ type: 'init' });
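The message protocol above has no way to match a reply to the request that produced it, which starts to matter once several embed calls are in flight. One possible approach, sketched here, tags each request with an id and resolves a matching promise when the reply arrives. The id field and the createEmbedClient name are assumptions of this sketch: the worker shown above would need to echo id back in its 'embedding' message for this to work.

```javascript
// Hypothetical wrapper around the worker message protocol.
// Assumes the worker echoes back the `id` it received with each reply.
function createEmbedClient(worker) {
  let nextId = 1;
  const pending = new Map(); // id -> resolve function

  worker.onmessage = (e) => {
    const { type, id, data } = e.data;
    if (type === 'embedding' && pending.has(id)) {
      pending.get(id)(data);
      pending.delete(id);
    }
  };

  return {
    // Returns a promise that resolves with the embedding for `text`.
    embed(text) {
      return new Promise((resolve) => {
        const id = nextId++;
        pending.set(id, resolve);
        worker.postMessage({ type: 'embed', id, payload: { text } });
      });
    },
    // Number of requests still waiting for a reply.
    pendingCount() {
      return pending.size;
    },
  };
}
```

In a page this might be used as const client = createEmbedClient(worker) followed by await client.embed('Hello!'), once the worker has reported 'ready'. Note that the wrapper takes over worker.onmessage, so the 'ready' handling would have to happen before the client is created or be folded into the wrapper.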
Models That Work Well in the Browser
Not all Hugging Face models work in the browser. They need to be in ONNX format and available on the Hub. The Xenova namespace (the Hub account of the library's creator, who works at Hugging Face) hosts pre-converted versions of popular models:
- Embeddings: Xenova/all-MiniLM-L6-v2 (23MB, 384 dims), Xenova/all-mpnet-base-v2 (418MB, 768 dims)
- Classification: Xenova/distilbert-base-uncased-finetuned-sst-2-english (67MB)
- Translation: Xenova/opus-mt-en-hi (298MB, English to Hindi)
- Speech recognition: Xenova/whisper-tiny (75MB), Xenova/whisper-base (145MB)
- Small LLMs: Xenova/LaMini-Flan-T5-783M (783M params, generative)
Dhiya NPM uses all-MiniLM-L6-v2 as the default embedding model — it's the best balance of size, speed, and quality for RAG applications in the browser.
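For readers new to RAG, the retrieval step amounts to ranking stored chunk embeddings against a query embedding and keeping the best few. The following is a generic sketch of that step, not Dhiya NPM's implementation. topK, the chunk objects, and the tiny vectors are all hypothetical, and the vectors are assumed to be normalized so that a dot product can serve as the similarity score.

```javascript
// Hypothetical retrieval helper: rank chunks by dot product with the query.
// Assumes every vector is L2-normalized and has the same length.
function topK(queryVec, chunks, k) {
  const scored = chunks.map((chunk) => {
    let score = 0;
    for (let i = 0; i < queryVec.length; i++) {
      score += queryVec[i] * chunk.vector[i];
    }
    return { text: chunk.text, score };
  });
  scored.sort((a, b) => b.score - a.score); // highest score first
  return scored.slice(0, k);
}

// Made-up 3-dimensional unit vectors standing in for real embeddings.
const chunks = [
  { text: 'about cats', vector: [1, 0, 0] },
  { text: 'about dogs', vector: [0, 1, 0] },
  { text: 'about cats and dogs', vector: [0.6, 0.8, 0] },
];
const results = topK([1, 0, 0], chunks, 2);
console.log(results.map((r) => r.text)); // [ 'about cats', 'about cats and dogs' ]
```

A linear scan like this is usually adequate for the few thousand chunks a browser app tends to hold; larger collections would call for an index.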
Managing Model Downloads
The first use of any model downloads it from the Hub. MiniLM is 23MB, which is fine for most applications. The larger models listed above (Whisper, MPNet) range from 75MB to about 420MB, large enough that users need to see download progress. Design your loading experience accordingly:
const extractor = await pipeline(
  'feature-extraction',
  'Xenova/all-MiniLM-L6-v2',
  {
    progress_callback: (progress) => {
      // Byte counts arrive on events whose status is 'progress'.
      if (progress.status === 'progress') {
        console.log(`Downloading ${progress.file}: ${(progress.loaded / progress.total * 100).toFixed(1)}%`);
      }
    }
  }
);
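A model is usually fetched as several files (weights, tokenizer, config), so a per-file percentage can jump around. One way to show a single overall figure is to track bytes per file and sum them. This sketch assumes each event passed to progress_callback carries status, file, loaded, and total fields, with byte counts arriving under the status 'progress'; check that against the library version you use. createProgressTracker is a hypothetical helper, not a library API.

```javascript
// Hypothetical helper: combine per-file progress events into one percentage.
function createProgressTracker() {
  const files = new Map(); // file name -> { loaded, total }

  return {
    // Feed every event from progress_callback into this method.
    update(event) {
      // Ignore events without byte counts (or with an unknown total).
      if (event.status !== 'progress' || !event.total) return;
      files.set(event.file, { loaded: event.loaded, total: event.total });
    },
    // Overall percentage across all files seen so far (0 when nothing is known yet).
    percent() {
      let loaded = 0;
      let total = 0;
      for (const f of files.values()) {
        loaded += f.loaded;
        total += f.total;
      }
      return total === 0 ? 0 : (loaded / total) * 100;
    },
  };
}

// Simulated events; the file names and byte counts are made up.
const tracker = createProgressTracker();
tracker.update({ status: 'progress', file: 'model.onnx', loaded: 50, total: 100 });
tracker.update({ status: 'progress', file: 'tokenizer.json', loaded: 100, total: 100 });
console.log(tracker.percent().toFixed(1)); // "75.0"
```

To wire it up, call tracker.update(event) inside progress_callback and render tracker.percent() in a progress bar. The figure can dip briefly when a new file is first reported, because the known total grows.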
Models are cached in the browser's Cache Storage after the first download — they're available offline on subsequent visits and load in under 200ms from cache.
Dhiya NPM abstracts Transformers.js into a clean RAG API so you don't have to manage pipelines, workers, or caching directly.
