Microsoft Unleashes Powerful, Compact AI: Phi-4 Models Redefine Efficiency

Microsoft has released two new small language models, Phi-4-multimodal and Phi-4-mini, now available on Azure AI Foundry, Hugging Face, and the NVIDIA API Catalog. Phi-4-multimodal, a 5.6-billion-parameter model, processes speech, vision, and text simultaneously. According to Weizhu Chen, a Microsoft vice president, it uses cross-modal learning techniques to understand and respond more naturally, letting devices interpret several types of input at once.

The release follows last year's Phi-4, a 14-billion-parameter model aimed at complex reasoning. Microsoft says Phi-4-multimodal outperforms Google's Gemini 2.0 Flash and Gemini 1.5 Pro on audio and visual benchmarks and is competitive with OpenAI's GPT-4o. The model is well suited to tasks such as document analysis and speech understanding: it beats OpenAI's Whisper v3 and Meta's SeamlessM4T-v2-Large at speech recognition and translation, and it ranks first on the Hugging Face OpenASR leaderboard with a word error rate of 6.14%. It also performs strongly on document and chart understanding, optical character recognition, and scientific image reasoning.

Phi-4-mini is a 3.8-billion-parameter, text-only model built for reasoning, coding, and long-form text. It supports a 128,000-token context window, is computationally efficient, and can call external tools and APIs through function calling.

Both models target scenarios where computing power is limited, and they can be optimized with ONNX Runtime for cross-platform deployment and lower latency. Microsoft is integrating them into products such as Windows and Copilot+ PCs; Vivek Pradeep of Microsoft says Copilot+ PCs will use Phi-4-multimodal to deliver strong AI capabilities without heavy power consumption. Developers can access both models across platforms and apply them in domains such as finance, healthcare, and automotive.
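Since both models are published on Hugging Face, a quick way to try Phi-4-mini is through the transformers library. The sketch below assumes the repository ID is microsoft/Phi-4-mini-instruct (the instruct-tuned variant listed on Hugging Face); adjust the name if the published checkpoint differs.

```python
# Minimal sketch: running Phi-4-mini for chat-style text generation with
# Hugging Face transformers. The repo ID "microsoft/Phi-4-mini-instruct"
# is assumed from the Hugging Face listing mentioned in the article.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/Phi-4-mini-instruct",
    torch_dtype=torch.bfloat16,  # half precision keeps the 3.8B model light
    device_map="auto",           # place weights on a GPU if one is available
)

# Instruct-tuned Phi models accept role/content chat messages.
messages = [
    {"role": "system", "content": "You are a concise coding assistant."},
    {"role": "user", "content": "Write a Python one-liner that reverses a string."},
]

output = generator(messages, max_new_tokens=64)
print(output[0]["generated_text"][-1]["content"])
```

The same pipeline call works on CPU, just more slowly, which fits the compute-constrained deployments the models are aimed at.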

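The article also notes that both models can be optimized with ONNX Runtime for cross-platform use. A hedged sketch of what that looks like in practice, using the onnxruntime-genai package: the local model folder path is a placeholder, and the generate loop shown follows the package's recent Python API, which has changed between versions.

```python
# Hedged sketch: generating text from an ONNX-exported Phi-4-mini with
# onnxruntime-genai. The "./phi-4-mini-onnx" folder (containing the ONNX
# weights and genai_config.json) is an assumed local path, not an official one.
import onnxruntime_genai as og

model = og.Model("./phi-4-mini-onnx")
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=128)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("Summarize ONNX Runtime in one sentence."))

# Decode token by token until the model emits an end-of-sequence token.
while not generator.is_done():
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))
```

Running through ONNX Runtime rather than a Python deep-learning framework is what enables the lower-latency, cross-platform deployments (including on-device scenarios like Copilot+ PCs) that the article describes.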