Llama vs. DeepSeek vs. Phi-3 Mini

Full Video Transcript

I tested four local models and the biggest one was not the winner. All tested inside Jan, same machine. Here’s what I found.

Llama 3.2 3B: 56 tokens per second on GPU.

DeepSeek R1 1.5B: 28 tokens per second on GPU.

Phi-3 Mini: 27 tokens per second on GPU.

Llama 3.1 8B: 7 tokens per second on CPU fallback.
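
If you want to sanity-check numbers like these yourself, one rough approach is to time a completion against Jan's local API server. The sketch below is a minimal Python example assuming Jan's OpenAI-compatible endpoint on localhost port 1337 and a hypothetical model ID; both are assumptions, so check your own Jan settings. It also times prompt processing together with generation, so it slightly understates pure decode speed.

```python
# Minimal throughput check against a local OpenAI-compatible server.
# URL, port, and model ID are assumptions -- verify them in Jan's settings.
import time
import requests

URL = "http://localhost:1337/v1/chat/completions"  # assumed Jan default
MODEL = "llama3.2-3b-instruct"                     # hypothetical model ID

payload = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "Explain KV caching in one paragraph."}],
    "max_tokens": 256,
}

start = time.time()
resp = requests.post(URL, json=payload, timeout=300)
elapsed = time.time() - start

# OpenAI-compatible responses usually report token counts under "usage".
tokens = resp.json()["usage"]["completion_tokens"]
print(f"{tokens} tokens in {elapsed:.1f}s -> {tokens / elapsed:.1f} tok/s")
```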

That 8B model looks impressive on paper, but it overflowed GPU memory, dropped to CPU, and performance collapsed.
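
The collapse is easy to see with back-of-envelope math: weights at a ~4-bit quantization plus KV cache and runtime buffers can push an 8B model past a small GPU's VRAM, while a 3B model fits with room to spare. The bit-width and overhead figures below are rough illustrative assumptions, not measurements.

```python
# Rough VRAM estimate: quantized weights plus a flat overhead allowance
# for KV cache and runtime buffers. Figures are illustrative assumptions.

def vram_estimate_gb(params_billion: float, bits_per_weight: float,
                     overhead_gb: float = 1.5) -> float:
    weights_gb = params_billion * bits_per_weight / 8  # GB for weights alone
    return weights_gb + overhead_gb

for name, params in [("Llama 3.2 3B", 3.2), ("Llama 3.1 8B", 8.0)]:
    est = vram_estimate_gb(params, bits_per_weight=4.5)  # ~Q4-class quant
    print(f"{name}: roughly {est:.1f} GB")
# On a 4 GB card the 8B estimate overflows, forcing CPU fallback,
# while the 3B model fits comfortably.
```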

Now, quality: Llama 3.2 3B crushed it. Content, code, strategy, all strong. Llama 3.1 8B was excellent too, but too heavy for portable setups.

DeepSeek was thoughtful but verbose. Phi-3 was sharp for code, weak for strategy.

Lesson:
Bigger does not mean better. Fit matters. Hardware alignment matters. Llama 3.2 3B hit the sweet spot.

Follow for more AI unlocks.