Google’s data lakehouse gains native Iceberg support and AI-driven data governance

Google Cloud has announced a set of significant upgrades to its Data Cloud platform, focused on strengthening the openness and intelligent governance capabilities of its data lakehouse architecture. The update adds native support for the open Apache Iceberg table format, integrates enterprise-grade cloud storage through the BigLake service, and pairs both with AI-assisted data governance, helping enterprises and development teams improve resilience and efficiency in data management, analytics, and application development.

At the center of the release is BigLake’s native support for Apache Iceberg, which combines Iceberg’s open-format data management with Google Cloud Storage. Enterprises can use BigLake tables to analyze Iceberg datasets efficiently while applying Google Cloud’s native storage management features and customer-managed encryption keys. With BigLake Metastore’s new API and REST Catalog, developers can more easily integrate Iceberg data from multiple sources and share it with BigQuery, AlloyDB for PostgreSQL, and third-party analytics engines, reducing ETL costs and making cross-platform data access more flexible (a sketch of connecting to such a REST catalog appears at the end of this article). Google has also released an automated migration tool to help enterprises move existing data environments such as Hadoop or Delta Lake to Iceberg.

The lakehouse upgrade is not limited to the analytics layer; it extends to operational databases and AI applications as well. BigQuery now supports advanced workloads on Iceberg data, including real-time queries, data reorganization, and multi-table transactions, so enterprises can run stream processing, machine learning, and multimodal analysis while retaining control of their data (a short example of querying Iceberg data through the BigQuery client library is included below). AlloyDB for PostgreSQL can also query Iceberg data managed by BigLake directly and supports semantic search and natural-language queries, bringing the operational and analytical data layers closer together and reducing the need for data replication and transformation.

The Dataplex Universal Catalog is another focus of this update. The service consolidates metadata from sources such as BigLake, BigQuery, Spanner, and Vertex AI to enable unified discovery, organization, and governance. Combined with the Gemini model, Dataplex can automatically analyze relationships between datasets, perform intelligent annotation, semantic search, and analysis recommendations, improving data inventory and governance efficiency while strengthening automated access control, data security, and regulatory compliance. Dataplex Universal Catalog also integrates with third-party governance platforms, making it easier for enterprises to build cross-cloud, multi-system governance mechanisms.

Google is also integrating Gemini into BigQuery Notebook, offering a combined SQL, Python, and Apache Spark development experience. Intelligent prompts, automatic generation of PySpark code, and error diagnosis lower the learning and operational barriers, and extensions for development environments such as JupyterLab and VS Code let users connect quickly to Google Cloud’s open lakehouse storage and compute resources, accelerating the path from development to deployment (a sketch of the kind of PySpark code such a notebook might produce follows below).
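The following is a minimal sketch of how a client might connect to an Iceberg REST Catalog such as the one exposed by BigLake Metastore, using the open-source PyIceberg library. The endpoint URI, warehouse location, and namespace and table names are illustrative placeholders rather than confirmed values, and the bearer-token authentication shown is an assumption; consult the BigLake documentation for the actual configuration.

import google.auth
import google.auth.transport.requests
from pyiceberg.catalog import load_catalog

# Obtain a Google Cloud access token (assumption: the catalog accepts a bearer token).
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())

catalog = load_catalog(
    "biglake",
    **{
        "type": "rest",
        # Hypothetical endpoint; replace with the URI published for BigLake Metastore.
        "uri": "https://biglake.googleapis.com/iceberg/v1/restcatalog",
        "warehouse": "gs://your-bucket/iceberg",  # placeholder warehouse location
        "token": credentials.token,
    },
)

# Browse the shared catalog and read Iceberg table metadata.
print(catalog.list_namespaces())
table = catalog.load_table(("analytics", "events"))  # hypothetical namespace and table
print(table.schema())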
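Because Iceberg data registered through BigLake can be queried from BigQuery without copying it, a standard BigQuery client call is enough to run analytics against it. The sketch below uses the google-cloud-bigquery Python client; the project, dataset, and table names are hypothetical and assume an Iceberg-backed BigLake table is already registered in BigQuery.

from google.cloud import bigquery

client = bigquery.Client()  # uses Application Default Credentials

# Hypothetical Iceberg-backed BigLake table registered in BigQuery.
query = """
    SELECT event_type, COUNT(*) AS events
    FROM `my-project.analytics.iceberg_events`
    GROUP BY event_type
    ORDER BY events DESC
"""

for row in client.query(query).result():
    print(row["event_type"], row["events"])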
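Finally, as an illustration of the kind of PySpark code a Gemini-assisted notebook might generate for reading Iceberg data, here is a hedged sketch that configures an Iceberg REST catalog in Spark. The catalog name, endpoint URI, and table identifier are assumptions, and the Iceberg Spark runtime package must be available on the cluster.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-lakehouse-example")
    # Enable Iceberg SQL extensions and register a REST-backed catalog named "biglake".
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.biglake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.biglake.type", "rest")
    .config("spark.sql.catalog.biglake.uri",
            "https://biglake.googleapis.com/iceberg/v1/restcatalog")  # placeholder URI
    .getOrCreate()
)

# Aggregate events from a hypothetical Iceberg table through Spark SQL.
df = spark.sql(
    "SELECT event_type, COUNT(*) AS events "
    "FROM biglake.analytics.events "
    "GROUP BY event_type"
)
df.show()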

