Google I/O 2026: Agentic AI gets serious

This week’s latest iteration of Google’s yearly developer event reiterated the company’s significant AI commitment. What’s different from messaging and examples past? Maturity.

One of the technologies showcased in the most recent edition of my previous-year retrospective series, published on New Year’s Day, was agentic AI. An overview excerpt from that earlier coverage follows:

Here’s what Wikipedia says about AI agents in its topic intro:

“In the context of generative artificial intelligence, AI agents (also referred to as compound AI systems or agentic AI) are a class of intelligent agents distinguished by their ability to operate autonomously in complex environments. Agentic AI tools prioritize decision-making over content creation and do not require human prompts or continuous oversight.”

And what about the aforementioned broader category of intelligent agents, of which AI agents are a subset? Glad you asked:

“In artificial intelligence, an intelligent agent is an entity that perceives its environment, takes actions autonomously to achieve goals, and may improve its performance through machine learning or by acquiring knowledge. AI textbooks define artificial intelligence as the “study and design of intelligent agents,” emphasizing that goal-directed behavior is central to intelligence. A specialized subset of intelligent agents, agentic AI (also known as an AI agent or simply agent), expands this concept by proactively pursuing goals, making decisions, and taking actions over extended periods.”

A recent post on Google’s Cloud Blog included what I thought was a concise summary of the aspiration:

“Agentic workflows represent the next logical step in AI, where models don’t just respond to a single prompt but execute complex, multi-step tasks. An AI agent might be asked to “plan a trip to Paris,” requiring it to perform dozens of interconnected operations: browsing for flights, checking hotel availability, comparing reviews, and mapping locations. Each of these steps is an inference operation, creating a cascade of requests that must be orchestrated across different systems.”
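
To make that “cascade of requests” concrete, here is a minimal Python sketch of the orchestration idea in the excerpt above. It is an illustration under stated assumptions, not how Google or any particular agent framework does it: the `Step` class, the `run_plan` function, the step names and the stubbed tool functions are all hypothetical, and each stub stands in for what would really be a model inference or an external API call.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical stand-ins: in a real agent, each of these would be a model
# inference or an external API call, not a hard-coded return value.
def find_flights(ctx: dict) -> dict:
    return {"flight": "stub flight to Paris"}


def find_hotels(ctx: dict) -> dict:
    return {"hotels": ["stub hotel A", "stub hotel B"]}


def compare_reviews(ctx: dict) -> dict:
    # Depends on the hotel search having already populated ctx["hotels"].
    return {"best_hotel": ctx["hotels"][0]}


def map_locations(ctx: dict) -> dict:
    # Depends on the review comparison having picked a hotel.
    return {"route": f"airport -> {ctx['best_hotel']}"}


@dataclass
class Step:
    name: str                      # label for the step (assumed naming)
    run: Callable[[dict], dict]    # reads shared context, returns new facts
    needs: tuple = ()              # names of steps that must finish first


def run_plan(steps: list) -> tuple:
    """Run steps in dependency order, sharing results through one context."""
    ctx: dict = {}
    done: set = set()
    log: list = []
    pending = list(steps)
    while pending:
        # A step is ready once every step it depends on has completed.
        ready = [s for s in pending if all(n in done for n in s.needs)]
        if not ready:
            raise ValueError("plan has unsatisfiable dependencies")
        for step in ready:
            ctx.update(step.run(ctx))
            done.add(step.name)
            log.append(step.name)
            pending.remove(step)
    return ctx, log


PARIS_TRIP = [
    Step("flights", find_flights),
    Step("hotels", find_hotels),
    Step("reviews", compare_reviews, needs=("hotels",)),
    Step("map", map_locations, needs=("flights", "reviews")),
]

if __name__ == "__main__":
    context, order = run_plan(PARIS_TRIP)
    print(order)
    print(context["route"])
```

The point of the sketch is only that a single request fans out into several dependent operations whose results feed one another. A production agent would add things this omits, such as letting the model choose the steps, retrying failures and running independent steps concurrently.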

I suggested there that last year’s rapid evolution of agentic AI technology and products based on it wasn’t a one-off; that the maturation and proliferation trends would undoubtedly continue in the coming year and beyond. We’re nearing the 2026 mid-point and, judging from what Google showcased at yesterday’s keynote, I wasn’t off base with my earlier perpetuation prediction.

But I’m getting ahead of myself…

Android Show: I/O Edition 2026

Google extended the trend it initiated last year by delivering a separate Android-specific showcase one week ahead of the main event.

Company representatives covered a lot of ground in only a bit more than a half hour, including pending enhancements to Android Auto and “Continue On”, an in-beta conceptual clone of Apple’s Handoff. But two other topics particularly caught my eye. First, Google is integrating Gemini Intelligence even more deeply than before into the core of both Android and its Chrome browser, including both anticipatory awareness of what you might need next and the agentic “chops” to (potentially) tackle such tasks independently on your behalf.

The central reason why I find this trend interesting is contextual in nature. Both Amazon (again) and OpenAI are reportedly working on smartphones based on brand new AI-based (specifically agentic, generative and personalized) operating systems. Going “clean slate” from a software standpoint does have some advantages, conceptually speaking at least, but it also tends to result in a “heavy lift” with respect to application development, internally and especially from a third-party standpoint. Conversely, Google’s building on a longstanding Android foundation.

Consider that contrast, too, in the context of the other key Android Show tidbit that I want to pass along today. Confirming a longstanding rumor, Google announced that it is seriously re-engaging in the tablet market with Android (where, to clarify, it remains a “player” today, primarily courtesy of its Samsung partnership, albeit on a limited basis versus Apple iPad alternatives), as well as expanding Android into computing form factors that were traditionally serviced by Chrome OS, all with a new operating system version code-named “Aluminum”.

The coexistence of the two operating systems had always been awkward at best. They’re both built on a Linux foundation, but that’s kind of like saying that a Trabant and a Ferrari both hail from a Ford Model T heritage. I’m not trying here to imply a vehicle-analogous comparison between the two operating systems with respect to “sleekness”, price or anything like that; I’m only suggesting that they’re notably dissimilar. Different code bases, different development teams and schedules…over time, Android and Chrome OS had increasingly diverged, to their shared detriment.

And what does “Aluminum” mean for Chrome OS fortunes long-term? Unclear; the latter’s only notable success has been in the education market, but it’s been a notable success there, so Google needs to be careful about how it hand-holds these key customers during the transition (which I’d suggest is a matter not of if, but when). Event-delivered reassurances included that support-timeframe schedules for existing Chrome OS-based products would continue to be honored in full, that new Chrome OS-based products were still in the development pipeline from partners, and that at least some existing Chrome OS-based hardware would be upgradeable to whatever the marketing moniker for “Aluminum” ends up being. That said, if new Chrome OS hardware is still being announced when the decade turns in a few years, I’ll be shocked.

Foundation AI evolutions

Now for the main event. AI has been front and center in Google I/O messaging for a while now, as The Verge and I joked about two years back:

@verge: “Pretty sure Google is focusing on AI at this year’s I/O.”

And it was more of the same this year. For those of you who’ve been wondering what the term “foundation model” (or variants of that name) means, I’ll start out with a Wikipedia-sourced definition:

In artificial intelligence, a foundation model (FM), also known as large x model (LxM, where “x” is a variable representing any text, image, sound, etc.), is a machine learning or deep learning model trained on vast datasets so that it can be applied across a wide range of use cases. Generative AI applications like large language models (LLM) are common examples of foundation models.

Building foundation models is often highly resource-intensive, with the most advanced models costing hundreds of millions of dollars to cover the expenses of acquiring, curating, and processing massive datasets, as well as the compute power required for training. These costs stem from the need for sophisticated infrastructure, extended training times, and advanced hardware, such as GPUs.

This is all in contrast to dataset- and application-specific models. Wikipedia again:

Adapting an existing foundation model for a specific task or using it directly is far less costly, as it leverages pre-trained capabilities and typically requires only fine-tuning on smaller, task-specific datasets.

Last year at I/O, Google shared updates on v2.5 of its Gemini model family (standard, Flash and Pro), which had been introduced a few months earlier. Gemini v3 subsequently arrived last November. And now we’re up to Gemini family v3.5. Commensurate with the update, another term is in circulation for us to sort out: “Frontier model”. NVIDIA with the definition this time:

Frontier models are the most advanced AI models available at a given moment, trained on massive datasets to deliver state-of-the-art performance across many tasks, representing the leading edge of AI capability. They typically power advanced reasoning, image and text generation, and agentic workflows.

Translation: a fancy way of saying “next generation”. Gotta love those marketeers.

More generally, snark aside, I admittedly was particularly gobsmacked by this subset of the event-opening keynote remarks by CEO Sundar Pichai:

These stories of how people are using AI are the best measure of progress. To understand the scale at which people are adopting AI, there is another great proxy — tokens, the fundamental units of data our models process, many representing a problem being solved.

Two years ago, we were processing 9.7 trillion tokens a month across our surfaces — a huge number. Last year at I/O, that grew to roughly 480 trillion tokens. Fast forward to today, that number jumped 7x to over 3.2 quadrillion per month. [Editor note: token maxxing? Likely, to a degree. Still…]

It tells an important story about our products and how others are building as well — especially developers and enterprises:

Over 8.5 million developers are now building new apps and experiences with our models monthly.
Our model APIs are now processing roughly 19 billion tokens per minute.
Over the past 12 months, over 375 Google Cloud customers each processed more than one trillion tokens, representing incredible demand for AI from across industries.

Today we have 13 products with over a billion users each. Five of those have more than 3 billion users. [Editor note: and they’re all AI-enhanced, if not AI-centric]
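
For a sense of scale, the token figures quoted above can be run through some quick arithmetic. The Python sketch below is only a back-of-the-envelope check, not anything Google published. It makes two assumptions of my own: a 30-day month, and that the per-minute API figure is measured comparably to the monthly “across our surfaces” total, which the remarks do not confirm. The variable and function names are purely illustrative.

```python
TRILLION = 1e12

# Monthly token volumes as quoted in the keynote remarks.
two_years_ago = 9.7 * TRILLION
last_year = 480 * TRILLION
today = 3.2e15  # "over 3.2 quadrillion", so treat this as a lower bound

# Model API throughput as quoted: roughly 19 billion tokens per minute.
api_tokens_per_minute = 19e9

# Assumption (not from the source): a 30-day month.
MINUTES_PER_MONTH = 60 * 24 * 30


def growth_multiple(new: float, old: float) -> float:
    """How many times larger `new` is than `old`."""
    return new / old


# Scale the per-minute API rate up to a monthly volume for comparison.
api_tokens_per_month = api_tokens_per_minute * MINUTES_PER_MONTH

# Assumption: both figures are measured the same way. Because `today` is a
# lower bound, this share is an upper bound on the API's slice of the total.
api_share = api_tokens_per_month / today

if __name__ == "__main__":
    print(f"Two years ago -> last year: {growth_multiple(last_year, two_years_ago):.1f}x")
    print(f"Last year -> today: {growth_multiple(today, last_year):.1f}x")
    print(f"API volume per month: {api_tokens_per_month / TRILLION:.1f} trillion tokens")
    print(f"API share of monthly total: at most about {api_share:.0%}")
```

Under those assumptions, the earlier year-over-year jump works out to roughly 49x and the latest one to about 6.7x, consistent with the quoted “7x” given that the current figure is stated as “over” 3.2 quadrillion. The API rate scales to roughly 821 trillion tokens per month, or about a quarter of the monthly total.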

Multimodal and agentic enhancements

Back in December 2024, within a broader attempt to forecast the year to come, I opined:

Large language models (LLMs), which I rightly showcased at the very top of my 2023 retrospective list, are increasingly impressive in their capabilities. But they’re also, admittedly somewhat simplistically speaking, “one-trick ponies”. As their name implies, they’re language-based from both input (typed) and output (displayed) standpoints. If you want to speak to one, you need to first run the audio through a separate speech-to-text model (or standalone algorithm); the same goes for spitting a response back at you through a set of speakers. Analogies to images and video clips, and other sensory and output data, are apt.

Granted, this approach is at least somewhat analogous to human beings’ cerebral cortexes, which are roughly subdivided into areas optimized for language, vision and other processing functions. Still, given that humans are fundamentally multisensory in both input and output schema, any AI model that undershoots this reality will be inherently limited. That’s where newer multimodal models come in. Vision language models (VLMs), for example, augment language with equally innate still and video image perception and generation capabilities. And large multimodal models (LMMs) are even more input- and output-diverse. Think of them as the deep learning analogies to the legacy sensor fusion techniques applied to traditional processing algorithms, which I ironically alluded to in my 2022 retrospective.

Enter the new Gemini Omni multimodal model:

Last year, Nano Banana brought Gemini’s intelligence to image generation and editing. Since then, it’s helped millions of people restore old photos, design from sketches and visualize ideas in ways that weren’t possible before. From the start we built Gemini to be natively multimodal from the ground up, and now we’re taking the next step.

We’re introducing Gemini Omni, where Gemini’s ability to reason meets the ability to create. Omni is our new model that can create anything from any input — starting with video. With Omni, you can combine images, audio, video and text as input and generate high-quality videos grounded in Gemini’s real-world knowledge. You can also easily edit your videos through conversation.

Today, we’re rolling out the first model in the Omni family: Gemini Omni Flash, to the Gemini app, Google Flow and YouTube Shorts. In time we will support output modalities like image and audio.

And what about burgeoning agentic AI assistants such as OpenClaw? How does that saying go? “Imitation is the sincerest form of flattery,” albeit this time with innate Google services and account-data access.

We’re also introducing Gemini Spark, a 24/7 personal AI agent that helps you navigate your digital life. Spark represents a big shift for Gemini, transforming it from an assistant that can answer your questions into an active partner that does real work on your behalf and under your direction.

Gemini Spark runs on Gemini 3.5 and uses the Antigravity harness. It’s deeply integrated with the Workspace tools you rely on daily, like Gmail, Docs, Slides and more. Even better, because it is a cloud-based agent, Spark keeps working in the background even when you close your laptop or lock your phone. That combination means Spark is ready to take complex tasks off your plate so you can be more present for what matters most.

“Intelligent Eyewear”

Last but not least, a few words about head-located wearables, including those with integrated displays. Google seems to be reluctant to refer to them as “smart glasses” (or VR headsets, for that matter). Gee, I wonder why? And why? Snark off (again). As regular readers may already recall, I’ve been following this market quite closely in recent years, even personally investing in a few trendsetting product examples. And in parallel, we’ve been hearing about (and I’ve been writing about) Google’s Android XR operating system and application suite for augmented, virtual and hybrid reality systems for a while now, too.

Well, the reality behind the hype is finally coming to market starting this fall. Supposedly. Conceptually, Google’s glasses sound a lot like Meta’s counterparts (albeit perhaps a bit sleeker), which I’d suggest have been meaningful from an implementation standpoint since at least the October 2023 unveiling of the second-generation AI Glasses. That said, Meta’s success has to date been held back by (among other factors) a dearth of third-party support. Here’s a reality calibration: even if Google and partners’ competitive devices are no better off in this regard, their inherent coordination with the aforementioned “Google services and account data” will still give them a “leg up”. More generally, you’ve got to admit this was one heck of a compelling live demo suite.

We shall see.

Wrapping up

There was plenty more interesting news released at the Tuesday keynote and more broadly across the two-day event (which is still underway as I type these words mid-day on Wednesday). Browse other writeups on the Google event portal page, along with coverage at 9to5Google and elsewhere. And then share your thoughts with me and your fellow readers in the comments!

The post Google I/O 2026: Agentic AI gets serious appeared first on EDN.
