Receipt Parser
The on-device vision pipeline that became Ledgerly. It started as "can we do this on a phone?" and turned into something we'd actually use.
The why
The original question was simple: can a mid-range phone run a vision model fast enough to parse a receipt photo in real time?
If yes, we could build entirely on-device personal finance tools without sending any data to a server, and the economics and privacy story would change completely.
The approach
Started with a stock vision model on a Pixel 4a. Latency was around 2 seconds per receipt, too slow for real-time use.
Moved to int8-quantized variants and pruned the model to receipt-specific tasks: text detection, line-item extraction, and total and subtotal recognition. Latency dropped to under 500 ms.
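The core trade behind int8 quantization can be sketched in a few lines: store each weight as an 8-bit integer plus a shared per-tensor scale, and accept a small round-trip error in exchange for 4x smaller weights and faster integer math. This is a minimal illustrative sketch of symmetric per-tensor quantization, not Ledgerly's actual pipeline (which would use a framework's quantization tooling), and the function names are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: each w is stored as q with w ~= scale * q."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0  # map the largest magnitude onto the int8 range
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the int8 representation."""
    return [qi * scale for qi in q]

weights = [0.42, -1.27, 0.003, 0.98]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# every restored weight is within one quantization step (scale) of the original
assert all(abs(a - b) <= scale for a, b in zip(weights, restored))
```

The per-tensor scale is the simplest variant; per-channel scales usually preserve more accuracy on convolutional weights, at slightly more bookkeeping.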
What we learned
Yes, on-device vision is now fast enough for this kind of work. The real difficulty is not inference speed.
It's the long tail of weird receipt formats: faded thermal paper, restaurant receipts that fold in half, receipts in three languages, screenshots of digital receipts. Each requires its own handling. The experiment proved out the pipeline; the production app gets the long-tail attention.
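To make the long-tail problem concrete, here is a hedged sketch of what even the "easy" step, total and subtotal recognition, has to tolerate once OCR output varies: label spellings, colons, currency symbols, comma decimals. Everything here is illustrative (the regex and function names are assumptions, not Ledgerly's parser).

```python
import re

# Match lines like "TOTAL $12.99", "Subtotal: 10.00", "total 12,99".
# Label alternation tries "sub total" before "total" so subtotals
# are not misread as totals.
LINE_RE = re.compile(
    r"^\s*(?P<label>sub\s*-?\s*total|total|tax)\b[:\s]*\$?\s*"
    r"(?P<amount>\d+[.,]\d{2})\s*$",
    re.IGNORECASE,
)

def parse_amounts(receipt_text):
    """Extract labeled amounts from OCR'd receipt text, one label per line."""
    amounts = {}
    for line in receipt_text.splitlines():
        m = LINE_RE.match(line)
        if m:
            label = re.sub(r"[\s-]", "", m.group("label")).lower()
            amounts[label] = float(m.group("amount").replace(",", "."))
    return amounts

sample = "Latte 4.50\nSUBTOTAL 10.00\nTax: 0.80\nTOTAL $10,80\n"
print(parse_amounts(sample))  # {'subtotal': 10.0, 'tax': 0.8, 'total': 10.8}
```

Each new receipt variant (folded lines, mixed languages, digital screenshots) tends to demand another branch like this, which is exactly why the long tail dominates the engineering effort.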
Status
Active. Graduated to Ledgerly, our flagship Android app. The experiment phase ended when we knew the pipeline could carry a real product.
Graduated to Ledgerly →