Apache Iceberg vs Delta Lake vs Apache Hudi in 2026: the user audiences compared
The lakehouse audience exists because warehouse economics broke at the high end. Companies whose Snowflake bills crossed a threshold, whose Databricks compute costs outgrew the simpler use cases, or whose data volume made closed formats untenable started routing the harder workloads through open table formats on object storage. Apache Iceberg, Delta Lake, and Apache Hudi each found a different slice of that audience. Vendors of compute engines, catalog services, and metadata management tools reach this audience through Sponsored Challenges scoped to the specific operational problem that the buyer's chosen open table format solves.
By DataDriven Partners Editorial. Researched against open-source project surfaces and observed buyer patterns.
Frequently asked
Which open table format is dominant in 2026?
Apache Iceberg. Cross-vendor support from Snowflake, BigQuery, Databricks, Dremio, Starburst, and most catalog services. Delta Lake remains dominant inside the Databricks ecosystem; Apache Hudi has the strongest merge-on-read support but smaller cross-vendor adoption.
How should I scope a Sponsored Challenge for lakehouse practitioners?
Scope to an operational problem specific to the open table format the customer runs. Examples include Iceberg partition evolution, Delta Lake protocol upgrades, Hudi merge-on-read semantics, and cross-engine catalog interoperability. The vendor partners with DataDriven Partners editorial to scope the problem against the format the vendor wants to reach.
Should I support Iceberg, Delta, and Hudi all equally?
Probably not in your first product version. Pick the format your customers actually use (typically Iceberg in 2026), implement deeply, add others when there is real demand. The Sponsored Challenge scope can be per format across multiple placement quarters.
Why does catalog choice matter so much?
The catalog service holds the metadata that determines which compute engines can read which tables. Catalog choice gates interoperability across engines, governance policies, and metadata portability. Vendors who skip catalog support in their marketing lose evaluations.
What is Subsurface?
The annual lakehouse-specific conference run by Dremio, and the largest in-person lakehouse practitioner event in 2026. It offers tiered sponsorship and editorial speaking slots; presence there reinforces a Sponsored Challenge running concurrently.
How does this audience differ from warehouse practitioners?
Lakehouse practitioners chose open table formats on object storage over closed warehouses. The choice is architectural; cost, vendor lock-in avoidance, data volume, or interoperability requirements drove it. Many practitioners run hybrid architectures (warehouse for easy cases, lakehouse for hard ones).
What about the Tabular acquisition by Databricks?
Tabular's acquisition by Databricks in 2024 changed the Iceberg ecosystem dynamics. Databricks now has both Delta Lake and a significant Iceberg position; the practical effect is that Databricks lakehouse marketing increasingly covers both formats. Plan for cross-format expectations.
How do I reach the Hudi subaudience?
The Hudi audience is smaller but technically concentrated. The Apache Hudi mailing list and the Onehouse community channels are the venues. Sponsored Challenges scoped to merge-on-read semantics or incremental-table patterns reach the audience inside their evaluation frame.
What evaluation criteria does the lakehouse audience apply?
Specification fidelity (does this implement the spec correctly), catalog interoperability (does this work with our catalog), honest open-versus-closed positioning (vendors who add quiet lock-in get caught), operational maturity at scale, and integration breadth across the heterogeneous lakehouse stack.
How long are lakehouse tool evaluation cycles?
Longer than warehouse-adjacent tool evaluations because the audience is making an architectural decision that affects multiple downstream systems. Six-month cycles are typical at Series B and later companies; cycles can extend longer at enterprises.
The three open table formats and who chose each
Apache Iceberg has become the dominant open table format in 2026 with cross-vendor
support from Snowflake, BigQuery, Databricks, Dremio, Starburst,
and most catalog services. The audience that chose Iceberg
typically did so because they wanted cross-engine interoperability
and vendor neutrality; the project originated at Netflix and is
governed by Apache. The Iceberg user subaudience is the largest
slice of the broader lakehouse audience in 2026.
Delta Lake remains dominant inside the Databricks ecosystem and
has strong Linux Foundation governance. The audience that chose
Delta typically did so because they were already running
Databricks at significant scale; Delta Lake's tight integration
with the Databricks platform was the deciding factor. Cross-vendor
Delta support is thinner than Iceberg's; the audience is
overwhelmingly Databricks-flavored.
Apache Hudi has the strongest support for merge-on-read and
incremental-table semantics but smaller cross-vendor adoption
than Iceberg or Delta. The audience that chose Hudi typically did
so because they have specific incremental-table requirements (CDC
ingestion, upsert-heavy workloads) that the other formats handle
awkwardly. Onehouse is the commercial entity around Hudi; the
community is smaller but technically concentrated.
Why specification fidelity is the evaluation gate
The lakehouse audience reads upstream Apache project
documentation in surprising depth. The Iceberg spec, Delta Lake
protocol, and Hudi specification are all reference material the
audience returns to. Vendors whose products integrate with these
formats are evaluated against specification fidelity, not against
feature lists alone. A vendor that claims Iceberg support but
implements snapshot isolation incorrectly gets caught on first
proof-of-concept; the audience tests against the spec the day
they sign up.
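The kind of check a proof-of-concept runs can be sketched in a few
lines. This is a toy in-memory model, not an Iceberg API: the
class and method names (ToyTable, commit_append, CommitConflict)
are hypothetical, and the only behavior assumed is that a commit
succeeds only when the writer's base snapshot is still current,
while readers pinned to an older snapshot keep seeing the same
files.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Immutable view of the table: a snapshot id plus its data files."""
    snapshot_id: int
    files: Tuple[str, ...]


class CommitConflict(Exception):
    """Raised when a writer's base snapshot is no longer the current one."""


class ToyTable:
    """Hypothetical in-memory table used only to illustrate the check."""

    def __init__(self) -> None:
        self._snapshots: Dict[int, Snapshot] = {0: Snapshot(0, ())}
        self._current_id = 0

    def current(self) -> Snapshot:
        return self._snapshots[self._current_id]

    def read(self, snapshot_id: int) -> Tuple[str, ...]:
        # Readers pin a snapshot id, so later commits never change what they see.
        return self._snapshots[snapshot_id].files

    def commit_append(self, base_id: int, new_files: Tuple[str, ...]) -> Snapshot:
        # Optimistic check: reject the commit if someone else committed
        # after this writer read its base snapshot.
        if base_id != self._current_id:
            raise CommitConflict(f"base {base_id} is stale; current is {self._current_id}")
        base = self._snapshots[base_id]
        snap = Snapshot(base_id + 1, base.files + new_files)
        self._snapshots[snap.snapshot_id] = snap
        self._current_id = snap.snapshot_id
        return snap


table = ToyTable()
pinned = table.current().snapshot_id            # a reader pins snapshot 0
base_a = base_b = table.current().snapshot_id   # two writers start from snapshot 0

table.commit_append(base_a, ("a.parquet",))     # writer A commits first
try:
    table.commit_append(base_b, ("b.parquet",))  # writer B's base is now stale
    conflict_detected = False
except CommitConflict:
    conflict_detected = True
    # Writer B retries on top of the latest snapshot.
    table.commit_append(table.current().snapshot_id, ("b.parquet",))
```

An implementation that let writer B's first commit through, or that
changed what the pinned reader sees, would fail this check. A real
evaluation would run the same shape of test against the vendor's
engine rather than a toy class.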
The implication for vendor marketing is that overstated spec
coverage is corrosive. Vendors who claim full Iceberg support
on top of a shallow implementation lose evaluations and lose
trust durably. Vendors who scope honestly (deep Iceberg support,
Delta Lake support next, Hudi support later if there is real
demand) earn the lasting trust this audience reserves for
engineering-honest vendors.
The catalog question, which is bigger than vendors expect
The catalog service holds the metadata that determines which
query engines can read which tables. In 2026 the main catalog
options are Tabular (Iceberg-focused, now part of Databricks
following the 2024 acquisition), Snowflake Polaris, Databricks
Unity Catalog, and the open-source Apache Iceberg REST catalog
spec that multiple vendors implement.
The choice is consequential. Catalog choice determines what
compute engines can interoperate, what governance policies apply,
and what metadata travels with the data. Vendors of compute or
query tools need to address catalog support explicitly; vendors
of catalog services need to interoperate with the engines the
audience already runs. Vendors who pretend the catalog question
is a minor implementation detail get caught in evaluation; the
audience knows the catalog is the linchpin and asks the catalog
question first.
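Why the catalog gates interoperability can be shown with a small
lookup. The sketch below is illustrative only: the engine names,
catalog names, and table names are placeholders, not claims about
any real product's support matrix. It assumes each table is
registered in exactly one catalog and each engine can talk to a
fixed set of catalogs.

```python
from typing import Dict, FrozenSet, List

# Hypothetical engines and the catalogs each one can talk to.
ENGINE_CATALOGS: Dict[str, FrozenSet[str]] = {
    "engine_a": frozenset({"rest", "catalog_x"}),
    "engine_b": frozenset({"rest"}),
    "engine_c": frozenset({"catalog_y"}),
}

# Hypothetical tables and the single catalog each is registered in.
TABLE_CATALOG: Dict[str, str] = {
    "sales.orders": "rest",
    "sales.refunds": "catalog_x",
    "ops.events": "catalog_y",
}


def engines_that_can_read(table: str) -> List[str]:
    """Engines whose supported catalogs include the table's catalog."""
    catalog = TABLE_CATALOG[table]
    return sorted(e for e, cats in ENGINE_CATALOGS.items() if catalog in cats)


def readable_by_all(engines: List[str]) -> List[str]:
    """Tables that every listed engine can reach through its catalogs."""
    return sorted(
        table
        for table, catalog in TABLE_CATALOG.items()
        if all(catalog in ENGINE_CATALOGS[e] for e in engines)
    )


orders_readers = engines_that_can_read("sales.orders")
shared_tables = readable_by_all(["engine_a", "engine_b"])
```

In this made-up setup only the table registered in the shared
catalog is readable by both engine_a and engine_b, which is the
question the audience is asking when it asks about catalogs first.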
Why a Sponsored Challenge reaches the lakehouse audience cleanly
A Sponsored Challenge on DataDriven.io scoped to an open-table-format
operational problem reaches the lakehouse audience inside the
architectural evaluation frame. Partition evolution under
concurrent writes against an Iceberg dataset. Snapshot isolation
testing across multiple writers. Schema evolution semantics
validation. Catalog interoperability across query engines. Each
of these problem shapes is something a lakehouse engineer
evaluates real products against; the placement reaches them in
the same mode they apply at work.
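As a rough picture of what a partition evolution problem
exercises, here is a toy model in which each data file remembers
the partition spec it was written under, and scan planning prunes
each file against its own spec. It is a simplified sketch, not
Iceberg code: ToyPartitionedTable, evolve_spec, and plan_scan are
hypothetical names, only two transforms are modeled, and the
concurrent-writer part of the problem is left out.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List

# Partition transforms: map a row's event date to a partition value.
TRANSFORMS: Dict[str, Callable[[date], str]] = {
    "month": lambda d: d.strftime("%Y-%m"),
    "day": lambda d: d.isoformat(),
}


@dataclass(frozen=True)
class DataFile:
    path: str
    spec_id: int      # the partition spec this file was written under
    partition: str    # the partition value under that spec


class ToyPartitionedTable:
    """Hypothetical table that keeps a history of partition specs."""

    def __init__(self, transform: str) -> None:
        self.specs: Dict[int, str] = {0: transform}
        self.current_spec_id = 0
        self.files: List[DataFile] = []

    def evolve_spec(self, transform: str) -> None:
        # Metadata-only change: existing files keep their old spec id.
        self.current_spec_id += 1
        self.specs[self.current_spec_id] = transform

    def append(self, path: str, event_date: date) -> None:
        transform = TRANSFORMS[self.specs[self.current_spec_id]]
        self.files.append(DataFile(path, self.current_spec_id, transform(event_date)))

    def plan_scan(self, event_date: date) -> List[str]:
        # Prune each file using the spec it was written under.
        return [
            f.path
            for f in self.files
            if TRANSFORMS[self.specs[f.spec_id]](event_date) == f.partition
        ]


table = ToyPartitionedTable("month")
table.append("f1.parquet", date(2026, 1, 15))   # written under the monthly spec
table.evolve_spec("day")
table.append("f2.parquet", date(2026, 1, 20))   # written under the daily spec
table.append("f3.parquet", date(2026, 1, 21))

jan_20_plan = table.plan_scan(date(2026, 1, 20))
```

A scan for 20 January still has to read the monthly file f1,
because it may contain rows for that day, but it can skip f3. A
challenge in this area checks that an engine gets that mixed-spec
planning right.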
The mechanics: the engineer browses the challenge catalog,
selects a problem on partition evolution against an Iceberg
dataset, attempts the solution for twenty to forty minutes, and
clicks through the UTM-tagged closing CTA to the vendor's
documentation on the technique. The engineer leaves with a
working operational mental model of the vendor's product against
the open table format they actually use; the placement reaches
the lakehouse audience through architectural depth, not through
marketing copy.
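The UTM tagging on the closing CTA is ordinary query-string work.
A minimal sketch, assuming the conventional utm_source, utm_medium,
and utm_campaign parameters; the documentation URL and the
parameter values below are made-up examples, not the values any
real placement uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_utm(url: str, source: str, medium: str, campaign: str) -> str:
    """Append UTM parameters while keeping any existing query string."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [
        ("utm_source", source),
        ("utm_medium", medium),
        ("utm_campaign", campaign),
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical vendor documentation link for the closing CTA.
cta_url = add_utm(
    "https://docs.example.com/iceberg/partition-evolution?lang=en",
    source="datadriven",
    medium="sponsored_challenge",
    campaign="iceberg_partition_evolution",
)
```

Keeping the campaign value tied to the format and problem lets the
vendor separate traffic from placements scoped to different
formats.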
Lakehouse vocabulary
The terms that come up in lakehouse-targeted placement scoping.
Lakehouse
An architecture that puts data in open table formats on object storage and routes different workloads to different compute engines. Distinct from a data warehouse (vendor-managed storage and compute, closed format) and a data lake (object storage, often raw files without table semantics).
Open table format
A specification for organizing files in object storage so they behave like tables (snapshot isolation, schema evolution, partition pruning). Apache Iceberg, Delta Lake, and Apache Hudi are the three primary options in 2026.
Apache Iceberg
The dominant open table format in 2026. Originated at Netflix, governed by Apache. Cross-vendor support from Snowflake, BigQuery, Databricks, Dremio, Trino, Starburst, and most catalog services.
Delta Lake
Databricks-origin open table format, now governed by the Linux Foundation. Strongest support inside the Databricks ecosystem; cross-vendor support thinner than Iceberg.
Apache Hudi
Uber-origin open table format with the strongest merge-on-read and incremental table semantics. Smaller cross-vendor adoption than Iceberg or Delta but distinct technical advantages for specific workloads.
Catalog service
The metadata layer that tracks which tables exist, where their files live, and what schemas they have. Determines compute-engine interoperability. Tabular, Snowflake Polaris, Databricks Unity Catalog, and Apache Iceberg REST catalog are the main options.
Sponsored Challenge scoped to lakehouse format
A placement on DataDriven.io scoped to the open-table-format problems the audience evaluates real products against. Partition evolution, schema evolution, catalog interoperability, snapshot isolation. Reaches the lakehouse audience inside the architectural evaluation frame.
What this page documents
Apache Iceberg has become the dominant open table format for lakehouse workloads in 2026, with Delta Lake (Databricks-aligned) and Apache Hudi (Uber-originated, smaller) holding meaningful shares. Snowflake, BigQuery, Databricks, Dremio, and Trino all support Iceberg natively as of 2026.
Apache Iceberg project, vendor public positioning · 2026-05 · Open-source project momentum
The lakehouse audience evaluates vendors on specification fidelity (does this implement the Iceberg or Delta protocol correctly), catalog interoperability (does this work with the catalog we run), and honest open-versus-closed positioning (vendors who quietly add lock-in get caught fast).
Industry pattern; audience evaluation framing · 2026-05 · Evaluation-criteria scoping
A Sponsored Challenge on DataDriven.io scoped to an open-table-format problem (partition evolution under concurrent writes, schema evolution semantics, catalog interoperability across query engines) reaches the lakehouse audience inside the architectural evaluation frame.
Catalog services (Tabular, Snowflake Polaris, Databricks Unity Catalog, Apache Iceberg REST catalog) have become a central decision point. Catalog choice determines metadata interoperability across query engines; vendors of compute or query tools must address catalog support explicitly.
Industry consensus on lakehouse architecture · 2026-05 · Architectural decision framing
Subsurface (Dremio's lakehouse conference) is the largest lakehouse-specific in-person event in 2026. Databricks Data + AI Summit covers the Delta Lake slice. Vendor-run Slacks (Tabular, Onehouse, Dremio, Starburst) cover the daily engagement layer.
Public conference and community surfaces · 2026-05 · Venue scope cross-reference
How vendor scope should match format scope
The lakehouse market consolidation around Iceberg, Delta, and
Hudi means vendors of compute, catalog, query, and metadata tools
need to scope their support honestly. A vendor that claims to
support all three formats but only one is production-grade gets
caught fast; the audience tests integrations on day one. The
honest scope: pick the format the customer base runs (typically
Iceberg in 2026), implement deeply, add the others when there is
real demand. The Sponsored Challenge scope follows: a vendor
with deep Iceberg support scopes the placement to an Iceberg
problem; a vendor with deep Delta Lake support scopes to a Delta
problem.
The Tabular acquisition by Databricks in 2024 changed the
Iceberg ecosystem dynamics meaningfully. Databricks now has both
Delta Lake and a significant Iceberg position; the practical effect
is that Databricks lakehouse marketing increasingly covers both
formats. Vendors targeting Databricks-ecosystem lakehouse
practitioners should plan for cross-format integration as the
audience increasingly expects both.
Catalog support in Sponsored Challenge scoping
The catalog question surfaces directly in Sponsored Challenge
scoping. A challenge on Iceberg partition evolution implicitly
assumes a catalog: the dataset has to live somewhere, and the
catalog determines which query engines can read it. Vendors scoping a
placement around their product's catalog support are scoping
exactly the question the audience asks first during evaluation.
The placement is the audience's first hands-on experience with
the vendor's catalog integration; the closing CTA points to
documentation on the catalog story.
Vendors of catalog services have a particularly clean
Sponsored Challenge story. The placement can be scoped to a
catalog interoperability problem across multiple query engines;
the engineer attempting the challenge experiences the catalog's
cross-engine behavior directly. Vendors of compute engines have
the inverse story: the placement can be scoped to a workload
that depends on catalog metadata, with the engineer experiencing
the compute engine's catalog integration through the challenge.
The three open table formats compared for vendor placement scoping
How vendor positioning should differ by subaudience.
The same vendor running lakehouse-adjacent tools can scope different Sponsored Challenges to different subaudiences across multiple placement quarters. The placement scope matches the format the customer runs.
One specific situation: a Series A query engine vendor's lakehouse playbook
A Series A query engine vendor targeting lakehouse workloads
has a clean playbook. Year one focus: ship deep Apache Iceberg
support (full spec compliance, not partial); add Delta Lake
support next; leave Hudi support for year two if there is real
demand. Scope a Sponsored Challenge to an Iceberg-specific
problem (partition evolution under concurrent writes against a
realistic dataset, snapshot isolation testing, hidden-partitioning
correctness). Participate substantively on the Apache Iceberg
user mailing list with named vendor engineers.
Sponsor Subsurface at a mid-tier with speaking-slot pursuit on
an Iceberg topic. Build presence in the Tabular Slack and the
Dremio community Slack with disclosed-affiliation engineers.
Address the catalog question explicitly in documentation: name
which catalogs the engine integrates with and how. Pair the
Sponsored Challenge with a Brand Slot on lakehouse-relevant
topic pages during the placement quarter for repeated brand
exposure.
The combination reaches the lakehouse audience through the
venues they actually read and the placement format that matches
their architectural evaluation frame. Pipeline conversion is
measured through multi-touch attribution; the Sponsored Challenge
consistently appears in first-touch position for lakehouse
customers who closed during the placement window.
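What "first-touch position" means in a multi-touch log can be
sketched as follows. The touch records here are invented purely
for illustration and say nothing about real pipeline results; the
function and channel names are hypothetical, and dates are assumed
to be ISO strings so they sort chronologically.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Invented (account, ISO date, channel) touch records for illustration.
TOUCHES = [
    ("acct_1", "2026-01-05", "sponsored_challenge"),
    ("acct_1", "2026-02-10", "conference"),
    ("acct_2", "2026-01-20", "conference"),
    ("acct_2", "2026-03-01", "sponsored_challenge"),
    ("acct_3", "2026-02-02", "sponsored_challenge"),
]


def first_touch_counts(touches: Iterable[Tuple[str, str, str]]) -> Counter:
    """Count, per channel, how many accounts had it as their earliest touch."""
    first: Dict[str, Tuple[str, str]] = {}
    for account, day, channel in touches:
        # ISO dates compare correctly as plain strings.
        if account not in first or day < first[account][0]:
            first[account] = (day, channel)
    return Counter(channel for _, channel in first.values())


counts = first_touch_counts(TOUCHES)
```

First-touch is only one view of a multi-touch model; the same log
can be re-cut by last touch or by weighted position without
changing how the touches are recorded.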
What does not work for this audience
Three patterns waste vendor effort on lakehouse practitioners.
First, format-agnostic positioning that does not name Iceberg,
Delta, or Hudi explicitly reads as either ignorance or hedging.
Second, overstated cross-format coverage, where the vendor claims
to support all three formats but only one is production-grade,
gets caught on the first proof-of-concept. Third, catalog
hand-waving, where the vendor's product depends on catalog choice
but the marketing pretends it does not, fails early: the audience
asks the catalog question first and rules out vendors who do not
have a clear answer.
The Sponsored Challenge scoping helps with each of these. A
placement scoped to a specific open-table-format problem names
the format and the problem directly; the closing CTA points to
documentation the engineer can validate against; the editorial
collaboration during scoping forces the vendor to be honest
about which formats the product handles well and which it
handles less well.
The long arc on lakehouse architecture decisions
Lakehouse practitioners are an architecture-driven audience,
and architectural choices have multi-year half-lives. Vendors who
establish presence in this audience in 2026 are positioning for
buying decisions that play out through 2028 and beyond. Year-one
investment in upstream contribution, conference presence, product
depth, and Sponsored Challenge placements compounds into trust
that survives the architectural shifts that will come. Iceberg
may not be the dominant format in 2030; the vendors who showed
up in the Iceberg community in 2026 are positioned to follow the
audience wherever it goes next.
Architecture
The lakehouse audience is defined by an architectural decision, not a job title. The decision converges on a similar set of evaluation criteria for tooling (specification fidelity, catalog interoperability, honest open-versus-closed positioning). The Sponsored Challenge format adapts cleanly to these criteria when scoped to an open-table-format operational problem; the placement reaches the audience inside the frame they evaluate in.
Reach lakehouse practitioners inside the architectural evaluation frame.
A Sponsored Challenge scoped to an open-table-format operational problem against an Iceberg or Delta Lake dataset reaches the lakehouse audience in evaluation mode. Apply with your operational story and the founder will scope the placement.