Why I think that Hive metastore is still unbeatable even by modern solutions like Unity or Polaris

Open table formats like Apache Iceberg and Delta are evolving rapidly today. Developers worldwide are creating both open-source and proprietary custom formats for specific tasks such as data streaming, graph data, and embeddings. Additionally, we have numerous legacy and highly specific data sources, such as logs in custom formats or collections of old Excel files. This diversity is precisely why I believe that extensibility, or the ability to implement custom input and output formats, is crucial. Unfortunately, this feature, which is present in Hive Metastore, is missing in modern data catalogs like Unity or Polaris.

October 22, 2024 · 7 min · Sem Sinchenko

Unitycatalog: the first look

Databricks recently open-sourced Unitycatlog, a unified data catalog that aims to provide a single source of truth for data discovery, governance, and access control across multiple systems. In this blog post, we take a first look at Unitycatlog and dive into the source code to explain which features from the announcement are actually present. We explore how Unitycatlog addresses the challenges of managing data in a complex data landscape and discuss its potential impact on simplifying data governance and improving data accessibility for organizations.

June 17, 2024 · 8 min · Sem Sinchenko