How Can Apple Make Siri Smarter With Ferret-UI, Its Multimodal Large Language Model?


Reports that Apple is working on AI-based features for its voice assistant, Siri, have been doing the rounds on the internet for quite some time. Until now, however, they haven't explained how Siri would gain its new abilities or how users would benefit. A newly published Apple research paper describes how the company could integrate Ferret-UI, its generative AI model trained specifically to interpret mobile app screens, into Siri to expand its use cases and make it one of the smartest voice assistants around.

What Is Ferret-UI?

Although the research paper doesn't elaborate on the potential applications of Ferret-UI, it gives a fair idea of how Apple envisions the AI-based tool helping Siri make sense of the images and icons on an iPhone's screen. For those catching up, Ferret-UI is an advanced multimodal large language model (MLLM), meaning it is designed to understand information beyond plain text, such as images, video, and audio, and in this case, the elements of iOS' user interface.
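To make "multimodal" concrete, here is a minimal sketch in Swift of what asking a model about a screen could look like. The ScreenAssistantModel protocol and its answer(prompt:screenshot:) method are made up for illustration; they do not reflect Ferret-UI's actual interface or any Apple API.

```swift
import CoreGraphics

// Hypothetical interface, for illustration only: a multimodal model accepts a
// text prompt plus an image of the current screen and returns a text answer.
// This is not Ferret-UI's real interface or an Apple API.
protocol ScreenAssistantModel {
    func answer(prompt: String, screenshot: CGImage) -> String
}

// Usage sketch, assuming `model` is some conforming implementation and
// `currentScreen` is a screenshot of the foreground app:
//
//   let reply = model.answer(prompt: "What does the blue button at the bottom do?",
//                            screenshot: currentScreen)
//   print(reply)  // e.g. "It confirms and places your order."
```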


How Can Ferret-UI Fuel Siri’s Transformation Into An AI-Powered Voice Assistant?

Mobile App Interface Recognition

According to the research paper, Apple has been training Ferret-UI to recognize and analyze mobile screens. "Given that UI screens typically exhibit a more elongated aspect ratio and contain smaller objects of interest (e.g., icons, texts) than natural images, we incorporate 'any resolution' on top of Ferret to magnify details and leverage enhanced visual features," the paper notes.
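In plain terms, the "any resolution" approach divides a tall, narrow screenshot into sub-images so that small elements like icons and text labels take up a larger share of each encoded image. The Swift snippet below is only a rough sketch of that intuition, not the paper's actual implementation: it cuts a portrait screen into top and bottom halves and a landscape screen into left and right halves.

```swift
import CoreGraphics

/// Illustrative only: split a screen rectangle into two sub-regions so that
/// small UI elements occupy a bigger share of each encoded sub-image.
/// This is a simplification of the "any resolution" idea, not the paper's code.
func subRegions(for screen: CGRect) -> [CGRect] {
    if screen.height >= screen.width {
        // Portrait (elongated) screen: top and bottom halves.
        let half = screen.height / 2
        return [
            CGRect(x: screen.minX, y: screen.minY, width: screen.width, height: half),
            CGRect(x: screen.minX, y: screen.minY + half, width: screen.width, height: half)
        ]
    } else {
        // Landscape screen: left and right halves.
        let half = screen.width / 2
        return [
            CGRect(x: screen.minX, y: screen.minY, width: half, height: screen.height),
            CGRect(x: screen.minX + half, y: screen.minY, width: half, height: screen.height)
        ]
    }
}

// Example: an iPhone-sized portrait screen yields two 390x422 sub-regions.
let regions = subRegions(for: CGRect(x: 0, y: 0, width: 390, height: 844))
```

Each sub-region, alongside the full screenshot, could then be passed to the visual encoder, which is roughly the detail-magnifying effect the paper describes.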

Interaction With Apps

In practice, a Ferret-UI-powered Siri should be able to take commands about on-screen content. That could include opening and closing apps, pressing a particular button, navigating the interface in ways that are otherwise only possible with touch input, summarizing the text on the screen, and so on. What's promising is that the research paper claims better results than GPT-4V and other leading UI-focused MLLMs on such tasks.
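As a thought experiment, here is one way such grounded, screen-aware commands could be represented in Swift. The UIElement, AssistantAction, and action(for:elements:) names are invented for this sketch; neither Apple's paper nor iOS exposes anything like this today.

```swift
import CoreGraphics

// Purely illustrative data model: one way an assistant could represent what a
// UI-grounded model reports about the screen and what to do about it.
// None of these types come from Apple's paper or any shipping API.
struct UIElement {
    let kind: String      // e.g. "button", "text", "icon"
    let label: String     // e.g. "Add to cart"
    let bounds: CGRect    // where the element sits on screen
}

enum AssistantAction {
    case tap(UIElement)
    case summarize(text: String)
    case open(appName: String)
}

// Given a spoken command and the elements the model grounded on screen,
// pick a matching action. A real system would rely on the model itself
// rather than simple string matching; this only shows the shape of the flow.
func action(for command: String, elements: [UIElement]) -> AssistantAction? {
    let lowered = command.lowercased()
    for element in elements where lowered.contains(element.label.lowercased()) {
        return .tap(element)
    }
    if lowered.contains("summarize") {
        let visibleText = elements.filter { $0.kind == "text" }
                                  .map(\.label)
                                  .joined(separator: " ")
        return .summarize(text: visibleText)
    }
    return nil
}
```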


Once Siri becomes capable of recognizing on-screen content and interacting with it, iPhone users should be able to perform a multitude of tasks via voice commands. For instance, a Ferret-UI-powered Siri could interact with apps to order food, add items to a shopping list, book flights, search for TV shows on Netflix, and much more. One thing we're concerned about, though, is how clearly and precisely users might have to phrase their commands.

In its current form, Siri handles basic tasks as intended, but roughly two times out of ten it fails to pick up the right words. This often happens when asking Siri to play a specific track on Apple Music or to open a particular app. If it does get the kind of upgrade the research paper points to, however, Siri could become one of the most capable voice assistants around.


