THE FUTURE IS HERE

LLaVA for AI image recognition on your home PC

I’m using LLaVA v1.6 13B (the llava-hf release on Hugging Face) as an assistant for general-purpose questions and image recognition. I use the Whisper model for speech recognition and SAPI PC voices for speech synthesis, mostly from CereProc of Edinburgh, Scotland.
https://app.cereproc.com/

LLaVA is here:
https://huggingface.co/llava-hf/llava-v1.6-vicuna-13b-hf

All this runs comfortably on my PC, which has an Nvidia GeForce RTX 5070 Ti graphics card.

It’s about average at mathematics and good at image recognition, and it seems quite powerful compared with my previous attempt, which used Vicuna with an add-on called BLIP. BLIP does describe images, but only in a line or two. LLaVA can count (when it feels like it!) and gives elaborate descriptions. The program runs in Python on your home PC, or with the PC acting as a server via Gradio so that you can use it from any handheld device. I demonstrate its use here. Don’t expect miracles, but compare it with what was possible, say, 4 years ago, and remember this works entirely on your own PC.
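
For anyone curious how a Python program like this might be wired together, here is a minimal sketch and not my actual program. It assumes the transformers, bitsandbytes, accelerate, pillow and gradio packages are installed. The function names (build_prompt, load_llava, describe_image, serve) are made up for illustration, the prompt wording follows the Vicuna format as I understand it from the model card, and the 4-bit loading is an assumption: a 13B model at 16-bit precision needs roughly 26 GB for the weights alone, so some kind of quantization or offloading is likely needed on a home graphics card.

```python
MODEL_ID = "llava-hf/llava-v1.6-vicuna-13b-hf"

# Vicuna-style system prompt (assumed from the model card, check it before relying on it).
SYSTEM_PROMPT = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the human's questions."
)


def build_prompt(question: str) -> str:
    """Wrap a question in the chat prompt; <image> marks where the picture goes."""
    return f"{SYSTEM_PROMPT} USER: <image>\n{question} ASSISTANT:"


def load_llava():
    """Load the processor and model once. 4-bit loading is an assumption."""
    # Heavy imports live here so the rest of the file works without a GPU.
    import torch
    from transformers import (
        BitsAndBytesConfig,
        LlavaNextForConditionalGeneration,
        LlavaNextProcessor,
    )

    quant = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
    processor = LlavaNextProcessor.from_pretrained(MODEL_ID)
    model = LlavaNextForConditionalGeneration.from_pretrained(
        MODEL_ID, quantization_config=quant, device_map="auto"
    )
    return processor, model


def describe_image(processor, model, image, question, max_new_tokens=200):
    """Ask LLaVA a question about a PIL image and return only its answer."""
    inputs = processor(
        images=image, text=build_prompt(question), return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    text = processor.decode(output_ids[0], skip_special_tokens=True)
    # The decoded text repeats the prompt, so keep what follows the last marker.
    return text.split("ASSISTANT:")[-1].strip()


def serve():
    """Start a small Gradio page that other devices on the home network can open."""
    import gradio as gr

    processor, model = load_llava()

    def answer(image, question):
        return describe_image(processor, model, image, question)

    demo = gr.Interface(
        fn=answer,
        inputs=[gr.Image(type="pil", label="Picture"), gr.Textbox(label="Question")],
        outputs=gr.Textbox(label="LLaVA says"),
    )
    # server_name="0.0.0.0" listens on all network interfaces, not just this PC.
    demo.launch(server_name="0.0.0.0")
```

Calling serve() loads the model and starts the page; a phone or tablet on the same network can then open the PC's address in a browser. Loading the model once and reusing it for every question keeps each request from paying the start-up cost again.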

Whisper (from OpenAI) for speech recognition is here:
https://github.com/openai/whisper
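
As a rough idea of what the speech recognition step can look like, here is a short sketch using the openai-whisper package from that repository. The function name transcribe_file and the choice of the "base" model size are assumptions for illustration, not necessarily what my program uses, and Whisper needs ffmpeg installed to read audio files.

```python
def transcribe_file(audio_path, model_name="base"):
    """Return the text Whisper hears in an audio file."""
    # Imported here so the file can be loaded without Whisper installed.
    import whisper

    model = whisper.load_model(model_name)  # downloads the weights on first use
    result = model.transcribe(audio_path)   # dict with "text", "segments", "language"
    return result["text"].strip()
```

The returned text can then be passed to the assistant as the question. Larger model sizes are more accurate but slower and use more memory.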