Tinoosh Mohsenin presents “BitMedViT: Ternary Quantized Vision Transformer for Medical Assistants on the Edge” at the 44th ICCAD Conference in Munich – Laboratory for Computational Sensing + Robotics

Published: February 23, 2026

Author: EDGE AI FOUNDATION

Source: LinkedIn, EDGE AI FOUNDATION

Can you really run generative AI on the edge without blowing up memory, power, or latency?

At EDGE AI FOUNDATION, we sat down with Tinoosh Mohsenin of Johns Hopkins University to break down what it actually takes to make transformers practical on Jetson-class devices and beyond.

This isn’t theory. It’s engineering.

In this session, Tinoosh walks through two complementary strategies for lean, deployable generative edge AI, pruning and quantization:

• Pruning where it matters most: the feedforward layers
• Structured sparsity that works with hardware, not against it
• Quantization that preserves accuracy while slashing memory traffic
• A 2-bit Vision Transformer designed for medical AI at the edge
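To make the quantization bullets concrete, here is a minimal sketch of ternary weight quantization with 2-bit storage. It is an illustration under stated assumptions, not the BitMedViT method: the absmean scaling rule and the helper names (`ternarize`, `pack_2bit`, `unpack_2bit`) are hypothetical choices made for this example, and the weight values are made up.

```python
def ternarize(weights):
    """Map floats to codes in {-1, 0, +1} plus one shared scale.

    Assumption: the scale is the mean absolute weight (an "absmean" rule),
    picked here for simplicity; the talk may use a different scheme.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        return [0] * len(weights), 0.0
    # Round to the nearest integer, then clip into the ternary range.
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def pack_2bit(codes):
    """Pack ternary codes four per byte, storing code + 1 (0, 1 or 2) in 2 bits."""
    packed = bytearray((len(codes) + 3) // 4)
    for i, c in enumerate(codes):
        packed[i // 4] |= (c + 1) << (2 * (i % 4))
    return bytes(packed)


def unpack_2bit(packed, count):
    """Inverse of pack_2bit: recover the first `count` ternary codes."""
    return [((packed[i // 4] >> (2 * (i % 4))) & 0b11) - 1 for i in range(count)]


def dequantize(codes, scale):
    """Approximate the original weights as code * scale."""
    return [c * scale for c in codes]


# Illustrative weights only; not taken from any real model.
weights = [0.9, -0.05, -1.2, 0.4, 0.0, -0.7, 1.5, 0.1]
codes, scale = ternarize(weights)
packed = pack_2bit(codes)
restored = dequantize(unpack_2bit(packed, len(weights)), scale)
```

In this sketch the eight weights occupy 2 bytes once packed, versus 32 bytes as 32-bit floats, which is where the memory-traffic savings of low-bit storage come from. The single shared scale keeps the example short; a real deployment would more likely keep one scale per channel or per block.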

The results speak for themselves:

• Multi-fold reductions in model size and energy
• Up to 43x model compression
• 22x latency improvements
• Accuracy that stays remarkably close to baseline

The big takeaway? FLOPs aren’t the real bottleneck. Memory, movement, and energy are. And if you focus your compression strategy there, edge AI becomes not just possible, but production-ready.
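As a rough back-of-envelope illustration of why weight storage dominates the picture, the sketch below computes the bytes needed to hold a model's weights at different bit widths. The parameter count is hypothetical and chosen only for the arithmetic; it is not a figure from the talk, and `weight_bytes` is a helper invented for this example.

```python
def weight_bytes(num_params, bits_per_weight):
    """Bytes needed to store num_params weights, rounded up to whole bytes."""
    return (num_params * bits_per_weight + 7) // 8


# Hypothetical model size, used only to make the arithmetic concrete.
NUM_PARAMS = 10_000_000

# Storage for 32-bit floats, 8-bit integers and 2-bit ternary codes.
footprint = {bits: weight_bytes(NUM_PARAMS, bits) for bits in (32, 8, 2)}

# How many times smaller each format is than 32-bit storage.
ratio_vs_fp32 = {bits: footprint[32] / size for bits, size in footprint.items()}
```

Under these assumptions, going from 32-bit to 2-bit weights shrinks storage by a factor of 16, and every byte not stored is a byte that never has to be moved from memory. Bit width alone cannot exceed that factor relative to 32-bit weights, so a larger overall compression figure would also need a reduction in parameter count, for example through pruning; how the reported numbers break down is not stated in this post.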

If you’re building robotics, clinical imaging tools, industrial systems, or any application where intelligence must live near the sensor, this one is worth your time.

Watch the full talk here:

Tinoosh Mohsenin presents “BitMedViT: Ternary Quantized Vision Transformer for Medical Assistants on the Edge” at the 44th ICCAD Conference in Munich
