In the race to deploy the most advanced AI language model, OpenAI (and its largest investor, Microsoft) and Google show no sign of slowing down. OpenAI recently updated GPT-4 with new abilities such as data interpretation and image recognition. Now the Alphabet-owned tech giant has unveiled its most advanced LLM yet, Gemini. Here are five exciting things Google’s latest AI model can do.
What Is Gemini Capable Of?
With advanced multimodality, Gemini can handle text, images, speech, code, video, patterns, and more. Google also says that Gemini is its most flexible model yet, as it can run efficiently on everything from data centers with massive processing power to mobile devices with limited resources. Gemini 1.0, the first version, comes in three sizes optimized for different use cases: Gemini Nano for on-device tasks, Gemini Pro for scaling across a wide range of tasks, and Gemini Ultra for highly complex tasks.
Gemini Ultra Vs. GPT-4: Here’s What The Benchmarks Say
Per Google, Gemini is the first model to outperform human experts on massive multitask language understanding (MMLU), a benchmark spanning 57 subjects, including math, physics, law, medicine, and more. Benchmarks where Gemini Ultra beats OpenAI’s GPT-4 include MMLU, Big-Bench Hard, DROP, GSM8K, MATH, HumanEval, and Natural2Code. This implies that Gemini Ultra is better at handling diverse tasks requiring multi-step reasoning, reading comprehension, basic arithmetic, challenging math problems, and Python code generation.
Gemini Can Detect Similarities And Differences Between Two Images
Google’s multimodal AI model can find similarities between images. In a demo video uploaded to the company’s YouTube channel, Gemini finds connecting points between two rather complicated images. It can identify that both have a curved and organic composition, implying that it understands what’s drawn in each image and can cross-reference that understanding with its training to generate a response, all within seconds.
Gemini Can Explain Reasoning And Math In Simple Steps
Google showcases how Gemini can understand formulas and steps handwritten on paper and tell the correct ones from the wrong ones. In the demo, Gemini is asked to focus on one of the problems solved on the paper and find the mistake in the calculation. Gemini gets this right and can even explain the mathematical or scientific concept behind the formula before performing the correct calculation. This way, Gemini can be useful for students who struggle with tricky math or physics problems.
Gemini Supports Python, Java, C++, And Go
Another demo video on Google’s YouTube channel mentions how Gemini consistently solves 75 percent of 200 Python benchmark problems on the first try, up from 45 percent for PaLM 2. Further, when Gemini is allowed to recheck and repair its code, the solve rate rises above 90 percent, which indicates that the AI model can help coders remove errors from their programs and run them smoothly.
Gemini Can Recognize Clothes
In another example, Google shows how Gemini can understand different pieces of clothing and reason about them. Although Google didn’t demonstrate this, Gemini should also be able to provide outfit ideas based on color combinations and climate. For instance, if someone asks what type of jeans or pants go with a puffer jacket, Gemini should be able to suggest some ideas. Similarly, Gemini can identify what’s going on in a video, whether someone is creating a drawing, performing a magic trick, or playing a movie.
Gemini Can Extract Data From Thousands Of Research Papers In Minutes
Generally, extracting information from a massive body of literature can take months of manual reading and note-taking. However, Google showcases how Gemini identified the research papers relevant to a study from a set of about 200,000. Gemini then extracted the required information from the relevant papers and updated a particular data set.
Gemini can also reason about figures, such as charts and graphs, and create new ones with updated figures. This way, Google’s new AI model can help scientists and scholars get references and citations faster.
Pixel 8 Pro And Bard To Get First Taste
While these demos ran on a custom user interface, they suggest that developers will be able to use Gemini’s advanced capabilities to build their own AI-based tools. Google has already shipped Gemini Nano on the Pixel 8 Pro, which gains two new features: Summarize in the Recorder app and Smart Reply in Gboard. Google’s AI chatbot, Bard, is also getting Gemini Pro’s abilities in the coming days.