4 things Google Gemini users will soon be able to do

admin, May 15, 2024

Google’s artificial intelligence model Gemini is being built into most of the tech giant’s products, and AI will soon appear in Gmail, YouTube, and the company’s smartphones.

CEO Sundar Pichai revealed some of the places the company’s AI models will appear next during his keynote address at the company’s I/O 2024 developer conference on May 14.

Pichai mentioned AI 121 times in his 110-minute keynote speech, and Gemini, which was released last December, took the spotlight.

Google is integrating large language models (LLMs) into nearly all of its services, including Android, Search, and Gmail. Here’s what users can expect going forward.

*Sundar Pichai at Google I/O 2024. Source: Google*

App interaction

Gemini is gaining more context so that it can interact with applications. In a future update, users will be able to call up Gemini to interact with the app they are using, including dragging and dropping AI-generated images into messages.

YouTube users will also be able to tap ‘Ask this video’ to have the AI find specific information within a video.

Gemini on Gmail

Google’s email platform, Gmail, also has AI integration, allowing users to search, summarize, and draft emails using Gemini.

The AI assistant can also act on emails to handle more complex tasks, such as searching your inbox, finding receipts, or helping process e-commerce returns by filling out online forms.

Gemini Live

Google also unveiled a new experience called Gemini Live that will allow users to have ‘deep’ voice chats with AI on their smartphones.

Users can interrupt the chatbot mid-answer to ask for clarification, and it adapts to the user’s speech patterns in real time. Gemini can also see and respond to the user’s physical surroundings through photos or videos captured on the device.

*Screenshot from Gemini promotional video. Source: Google*

Multimodal agents

Google is working to develop intelligent AI agents that can reason, plan, and complete complex, multi-step tasks on behalf of users under their supervision. Multimodal means the AI can process image, audio, and video input in addition to text.

Examples and early use cases include automating shopping returns and exploring new cities.


Other updates in the company’s AI pipeline include replacing Google Assistant on Android with Gemini, which will be fully integrated into the mobile operating system.

The new “Ask Photos” feature lets users search their photo library using natural language queries powered by Gemini. It can understand context, recognize objects and people, and summarize memories from photos in response to questions.

AI-generated summaries of places and areas will appear on Google Maps, drawing on insights from the platform’s mapping data.
