
The modern data stack was built for humans asking questions. Google just rebuilt it for agents taking action.


Enterprise data stacks were built for humans running scheduled queries. As AI agents increasingly act autonomously on behalf of businesses around the clock, that architecture is breaking down — and vendors are racing to rebuild it. Google's answer, announced at Cloud Next on Wednesday, is the Agentic Data Cloud.

The architecture has three pillars:

  • Knowledge Catalog. Automates semantic metadata curation, inferring business logic from query logs without manual data steward intervention

  • Cross-cloud lakehouse. Lets BigQuery query Iceberg tables on AWS S3 via private network with no egress fees

  • Data Agent Kit. Drops MCP tools into VS Code, Claude Code and Gemini CLI so data engineers describe outcomes rather than write pipelines

"The data architecture has to change now," Andi Gutmans, VP and GM of Data Cloud at Google Cloud, told VentureBeat. "We're moving from human scale to agent scale."

From system of intelligence to system of action

The core premise behind Agentic Data Cloud is that enterprises are moving from human‑scale to agent‑scale operations.

Historically, data platforms have been optimized for reporting, dashboarding, and some forecasting — what Google characterizes as “reactive intelligence.” In that model, humans interpret data and decide what to do.

Now, with AI agents increasingly expected to take actions directly on behalf of the business, Gutmans argued that data platforms must evolve into systems of action. "We need to make sure that all of enterprise data can be activated with AI, that includes both structured and unstructured data," Gutmans said. "We need to make sure that there's the right level of trust, which also means it's not just about getting access to the data, but really understanding the data."

The Knowledge Catalog is Google's answer to that problem. It is an evolution of Dataplex, Google's existing data governance product, with a materially different architecture underneath. Where traditional data catalogs required data stewards to manually label tables, define business terms and build glossaries, the Knowledge Catalog automates that process using agents.

The practical implication for data engineering teams is that the Knowledge Catalog scales to the full data estate, not just the curated subset that a small team of data stewards can maintain by hand. The catalog covers BigQuery, Spanner, AlloyDB and Cloud SQL natively, and federates with third-party catalogs including Collibra, Atlan and DataHub. Zero-copy federation extends semantic context from SaaS applications including SAP, Salesforce Data360, ServiceNow and Workday without requiring data movement.
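
To make the query-log idea concrete, here is a minimal sketch of the kind of inference involved: mining JOIN predicates out of SQL logs to surface candidate relationships between tables. The log format, function names and regex are illustrative assumptions, not Google's actual Knowledge Catalog implementation.

```python
import re
from collections import Counter

# Hypothetical sketch: infer candidate foreign-key relationships by counting
# column pairs that appear in JOIN ... ON equality predicates in query logs.
JOIN_PATTERN = re.compile(
    r"JOIN\s+\S+(?:\s+(?:AS\s+)?\w+)?\s+ON\s+(\w+\.\w+)\s*=\s*(\w+\.\w+)",
    re.IGNORECASE,
)

def infer_join_keys(query_log):
    pairs = Counter()
    for sql in query_log:
        for left, right in JOIN_PATTERN.findall(sql):
            pairs[tuple(sorted((left, right)))] += 1
    return pairs

log = [
    "SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id",
    "SELECT c.region FROM orders o JOIN customers c ON o.customer_id = c.id",
]
# Frequently co-occurring pairs become candidate relationships the catalog
# could promote to semantic metadata without a human labeling them.
print(infer_join_keys(log).most_common(3))
```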

Google's lakehouse goes cross-cloud

Google has had a data lakehouse called BigLake since 2022. Initially it was limited to data inside Google Cloud, but in recent years it has gained limited federation capabilities that let enterprises query data stored in other locations.

Gutmans explained that the previous federation worked through query APIs, which limited the features and optimizations BigQuery could bring to bear on external data. The new approach is storage-based sharing via the open Apache Iceberg format. Whether the data sits in Amazon S3 or in Google Cloud, he argued, makes no difference. "This truly means we can bring all the goodness and all the AI capabilities to those third-party data sets," he said.

The practical result is that BigQuery can query Iceberg tables sitting on Amazon S3 via Google's Cross-Cloud Interconnect, a dedicated private networking layer, with no egress fees and price-performance Google says is comparable to native AWS warehouses. All BigQuery AI functions run against that cross-cloud data without modification. Bidirectional federation in preview extends to Databricks Unity Catalog on S3, Snowflake Polaris and the AWS Glue Data Catalog using the open Iceberg REST Catalog standard.
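
In practice, the cross-cloud path is meant to be invisible at query time. A minimal sketch with the google-cloud-bigquery Python client, assuming a BigLake Iceberg table has already been defined over data in an S3 bucket (the project, dataset and table names here are hypothetical):

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client(project="my-gcp-project")  # hypothetical project

# The table below is assumed to be a BigLake Iceberg table whose files live
# in Amazon S3; from SQL it reads like any native BigQuery table.
sql = """
SELECT region, COUNT(*) AS order_count
FROM `my-gcp-project.s3_lakehouse.orders_iceberg`
GROUP BY region
ORDER BY order_count DESC
"""

for row in client.query(sql).result():
    print(row.region, row.order_count)
```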

From writing pipelines to describing outcomes

The Knowledge Catalog and cross-cloud lakehouse solve the data access and context problems. The third pillar addresses what happens when a data engineer actually sits down to build something with all of it.

The Data Agent Kit ships as a portable set of skills, MCP tools and IDE extensions that drop into VS Code, Claude Code, Gemini CLI and Codex. It does not introduce a new interface.
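
To make that concrete, here is a toy MCP tool built with the open-source Python MCP SDK. The server name, the tool and its canned output are invented for illustration; this is a sketch of the pattern, not the actual Data Agent Kit.

```python
from mcp.server.fastmcp import FastMCP  # pip install "mcp[cli]"

server = FastMCP("data-agent-sketch")  # hypothetical server name

@server.tool()
def profile_table(dataset: str, table: str) -> str:
    """Return a short profile of a table (row count, null rates, and so on)."""
    # A real tool would query the warehouse here; this returns a canned stub.
    return f"{dataset}.{table}: 1.2M rows, 3% null rate in column email"

if __name__ == "__main__":
    # Serves the tool over stdio, the transport IDE agents typically use.
    server.run()
```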

The architectural shift it enables is a move from what Gutmans called a "prescriptive copilot experience" to intent-driven engineering. Rather than writing a Spark pipeline to move data from source A to destination B, a data engineer describes the outcome — a cleaned dataset ready for model training, a transformation that enforces a governance rule — and the agent selects whether to use BigQuery, the Lightning Engine for Apache Spark or Spanner to execute it, then generates production-ready code.
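
What "describing outcomes" might look like mechanically, as a purely illustrative sketch: the engineer supplies an intent, and routing logic standing in for the agent's judgment picks an engine before any code is generated. None of these names or thresholds come from Google's product.

```python
from dataclasses import dataclass

@dataclass
class Intent:
    outcome: str            # e.g. "cleaned dataset ready for model training"
    data_volume_gb: float   # how much data the job touches
    needs_transactions: bool

def select_engine(intent: Intent) -> str:
    """Toy stand-in for the agent's engine-selection step."""
    if intent.needs_transactions:
        return "Spanner"
    if intent.data_volume_gb > 10_000:
        return "Lightning Engine for Apache Spark"
    return "BigQuery"

intent = Intent("dedupe and null-fill the clickstream table", 250.0, False)
print(select_engine(intent))  # -> BigQuery; the agent would then emit code
```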

"Customers are kind of sick of building their own pipelines," Gutmans said. "They're truly more in the review kind of mode, than they are in the writing the code mode."

Where Google and its rivals diverge

The premise that agents require semantic context, not just data access, is shared across the market. 

Databricks has Unity Catalog, which provides governance and a semantic layer across its lakehouse. Snowflake has Cortex, its AI and semantic layer offering. Microsoft Fabric includes a semantic model layer built for business intelligence and, increasingly, agent grounding.

The dispute is not over whether semantics matter — everyone agrees they do. The dispute is over who builds and maintains them.

"Our goal is just to get all the semantics you can get," he explained, noting that Google will federate with third-party semantic models rather than require customers to start over.

Google is also positioning openness as a differentiator, with bidirectional federation into Databricks Unity Catalog and Snowflake Polaris via the open Iceberg REST Catalog standard.

What this means for enterprises

Google's argument — and one echoed across the data infrastructure market — is that enterprises are behind on three fronts:

Semantic context is becoming infrastructure. If your data catalog is still manually curated, it will not scale to agent workloads — and Gutmans argues that gap will only widen as agent query volumes increase.

Cross-cloud egress costs are a hidden tax on agentic AI. Storage-based federation via open Iceberg standards is emerging as the architectural answer across Google, Databricks and Snowflake. Enterprises locked into proprietary federation approaches should be stress-testing those costs at agent-scale query volumes; a rough calculation of that exposure follows below.

The pipeline-writing era is ending, Gutmans argues. Data engineers who move toward outcome-based orchestration now will have a significant head start.
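
The rough calculation promised above: a back-of-the-envelope stress test of egress exposure at agent-scale query volumes. Every number here is an assumption to be replaced with your own rates and workload, not a figure from Google or this article.

```python
EGRESS_PER_GB = 0.09            # assumed inter-cloud egress rate, $/GB
GB_SCANNED_PER_QUERY = 2.0      # assumed external data scanned per query
QUERIES_PER_DAY_HUMAN = 500     # a BI team's scheduled and ad-hoc queries
QUERIES_PER_DAY_AGENT = 50_000  # agents querying around the clock

def monthly_egress_cost(queries_per_day: int) -> float:
    return queries_per_day * 30 * GB_SCANNED_PER_QUERY * EGRESS_PER_GB

print(f"human scale: ${monthly_egress_cost(QUERIES_PER_DAY_HUMAN):,.0f}/month")
print(f"agent scale: ${monthly_egress_cost(QUERIES_PER_DAY_AGENT):,.0f}/month")
# human scale: $2,700/month; agent scale: $270,000/month
```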
