Iceberg Catalog - Apache Doris
Doris supports accessing Iceberg table data through various metadata services. In addition to reading data, Doris also supports writing to Iceberg tables, including INSERT, INSERT OVERWRITE, UPDATE, DELETE, and MERGE INTO.
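A minimal sketch of the write-back operations (the catalog, database, and table names below are illustrative, not from a real deployment):

```sql
-- Write data back to an Iceberg table from Doris
INSERT INTO iceberg_ctl.db1.sales SELECT * FROM internal.db1.sales_staging;

-- Row-level changes are also supported
UPDATE iceberg_ctl.db1.sales SET amount = amount * 2 WHERE id = 1;
DELETE FROM iceberg_ctl.db1.sales WHERE amount < 0;
```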
Quick start with Apache Doris and Apache Iceberg
tip
Users can access Iceberg tables using Hive Metastore as metadata through the Hive Catalog. However, it is still recommended to use the Iceberg Catalog directly to avoid some compatibility issues.
Applicable Scenarios

| Scenario | Description |
| --- | --- |
| Query Acceleration | Use Doris's distributed computing engine to directly access Iceberg data for faster queries. |
| Data Integration | Read Iceberg data and write it to internal Doris tables, or perform ZeroETL operations using the Doris computing engine. |
| Data Write-back | Process data from any Doris-supported data source and write it back to Iceberg table storage. |
Configuring Catalog

Syntax

```sql
CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = '<iceberg_catalog_type>',
    'warehouse' = '<warehouse>', -- optional
    {MetaStoreProperties},
    {StorageProperties},
    {IcebergProperties},
    {CommonProperties}
);
```
`iceberg.catalog.type`

The type of Iceberg Catalog, supporting the following options:

- `hms`: Uses Hive Metastore as the metadata service.
- `rest`: A metadata service compatible with the Iceberg REST Catalog interface.
- `hadoop`: Directly accesses metadata stored on the file system.
- `glue`: Uses AWS Glue as the metadata service.
- `dlf`: Uses Alibaba Cloud DLF as the metadata service.
- `s3tables`: Uses the AWS S3 Tables Catalog to access S3 Table Buckets.
`warehouse`

The warehouse path for Iceberg. This parameter must be specified when `iceberg.catalog.type` is `hadoop`.

The `warehouse` path must point to the level above the `Database` path. For example, if your table path is `s3://bucket/path/to/db1/table1`, then `warehouse` should be `s3://bucket/path/to/`.
{MetaStoreProperties}
The MetaStoreProperties section is for entering connection and authentication information for the Metastore metadata service. Refer to the section on [Supported Metadata Services].
{StorageProperties}
The StorageProperties section is for entering connection and authentication information related to the storage system. Refer to the section on [Supported Storage Systems].
{IcebergProperties}
The IcebergProperties section is used to fill in parameters specific to Iceberg Catalog.
`list-all-tables`

Only effective for Iceberg Catalogs that use Hive Metastore as the metadata service. Default is `true`: the `SHOW TABLES` operation lists all types of tables in the current Database (Hive Metastore may store non-Iceberg tables). This approach has the best performance.

If set to `false`, Doris checks the type of each table one by one and returns only Iceberg tables. This mode performs poorly when there are many tables.
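A hedged sketch of setting this property at catalog creation time (the catalog name and Metastore address are illustrative):

```sql
CREATE CATALOG iceberg_hms_ctl PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hms',
    'hive.metastore.uris' = 'thrift://127.0.0.1:9083',
    -- return only Iceberg tables from SHOW TABLES (slower when there are many tables)
    'list-all-tables' = 'false'
);
```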
{CommonProperties}
The CommonProperties section is for entering general properties. See the
Catalog Overview
for details on common properties.
Metadata Cache
To improve the performance of accessing external data sources, Apache Doris caches Iceberg metadata. Metadata includes table structure (Schema), table objects, view objects, and manifest details.
tip
For versions before Doris 4.1.x, metadata caching is mainly controlled globally by FE configuration items. For details, see
Metadata Cache
Starting from Doris 4.1.x, Iceberg Catalog's external metadata cache is configured using the unified
meta.cache.*
keys.
Cache Property Configuration (4.1.x+)

Each engine's cache entry uses a unified configuration key format: `meta.cache.<engine>.<module>.<property>`.

| Property | Example | Meaning |
| --- | --- | --- |
| `enable` | true/false | Whether to enable this cache module. |
| `ttl-second` | 600 | `0` means the cache is disabled (takes effect immediately, can be used to see the latest data); `-1` means entries never expire; other positive integers mean TTL in seconds based on access time. |
| `capacity` | 10000 | Maximum number of cache entries (by count). `0` means disabled. |

Effective Logic: a module's cache only takes effect when `enable=true`, `ttl-second != 0`, and `capacity > 0`.
Cache Modules

Iceberg Catalog includes the following cache modules:

| Module | Property Key Prefix | Cached Content and Impact |
| --- | --- | --- |
| schema | `meta.cache.iceberg.schema.` | Caches table structure. Impact: visibility of table column information. If disabled, the latest Schema is pulled for each query. |
| table | `meta.cache.iceberg.table.` | Caches Iceberg table metadata objects. Impact: reduces Catalog/Metastore round-trips. |
| view | `meta.cache.iceberg.view.` | Caches Iceberg View metadata objects. |
| manifest | `meta.cache.iceberg.manifest.` | Caches manifest details. Impact: reduces repeated manifest access overhead. Note: this module is disabled by default and must be enabled manually. |
Legacy Parameter Mapping and Conversion

In version 4.1.x and later, unified keys are recommended. The following is the mapping between legacy Catalog properties and 4.1.x+ unified keys:

| Legacy Property Key | 4.1.x+ Unified Key | Description |
| --- | --- | --- |
| `schema.cache.ttl-second` | `meta.cache.iceberg.schema.ttl-second` | Expiration time of table structure cache |
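A hedged sketch of migrating from the legacy key to the unified key on an existing catalog (the catalog name and TTL value are illustrative):

```sql
-- Set the schema cache TTL using the 4.1.x+ unified key
-- instead of the legacy schema.cache.ttl-second
ALTER CATALOG iceberg_ctl SET PROPERTIES (
    'meta.cache.iceberg.schema.ttl-second' = '600'
);
```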
Best Practices

Real-time access to the latest data: If you want each query to see the latest snapshots or schema changes for Iceberg tables, you can set the `ttl-second` of the `schema` or `table` module to `0`:

```sql
-- Disable table object cache to detect snapshot changes
ALTER CATALOG iceberg_ctl SET PROPERTIES (
    "meta.cache.iceberg.table.ttl-second" = "0"
);
```
Performance optimization: Enabling manifest cache can significantly speed up query planning for large tables:

```sql
ALTER CATALOG iceberg_ctl SET PROPERTIES (
    "meta.cache.iceberg.manifest.enable" = "true",
    "meta.cache.iceberg.manifest.ttl-second" = "600"
);
```
Changes via
ALTER CATALOG ... SET PROPERTIES
support hot-reload in Iceberg Catalog.
Observability

Cache metrics can be observed through the `information_schema.catalog_meta_cache_statistics` system table:

```sql
SELECT
    catalog_name,
    engine_name,
    entry_name,
    effective_enabled,
    ttl_second,
    capacity,
    estimated_size,
    hit_rate,
    load_failure_count,
    last_error
FROM information_schema.catalog_meta_cache_statistics
WHERE catalog_name = 'iceberg_ctl'
  AND engine_name = 'iceberg'
ORDER BY entry_name;
```
See the documentation for this system table:
catalog_meta_cache_statistics
Supported Iceberg Versions

| Doris Version | Iceberg SDK Version |
| --- | --- |
| 2.1 | 1.6.1 |
| 3.0 | 1.6.1 |
| 3.1 | 1.9.1 |
| 4.0 | 1.9.1 |
Supported Iceberg Formats

- Supports Iceberg V1/V2 formats.
- Supports reading Position Delete and Equality Delete.
- Supports reading Deletion Vector (since 4.1.0).
Supported Metadata Services

- Hive Metastore
- AWS Glue
- Aliyun DLF
- Iceberg Rest Catalog
- Iceberg JDBC Catalog (supported since version 4.1.0)
- FileSystem
Note: The service types and parameters supported by different Doris versions are slightly different. Please refer to the [Examples] section.
Iceberg Catalog Feature Support Matrix
Metadata Service
Table Query
View Query
DDL Operations
Data Writeback
System Tables (Supported since 3.1+)
Hive MetaStore
AWS Glue
Aliyun DLF
Iceberg Rest Catalog
Depends on the service provider
Depends on the service provider
Depends on the service provider
FileSystem
Supported Storage Systems

- HDFS
- AWS S3
- Google Cloud Storage
- Azure Blob
- Apache Ozone (supported since 4.0.4)
- Aliyun OSS
- Tencent COS
- Huawei OBS
- MinIO
Note: The service types and parameters supported by different Doris versions are slightly different. Please refer to the [Examples] section.
Supported Data Formats

- Parquet
- ORC
Column Type Mapping

| Iceberg Type | Doris Type | Comment |
| --- | --- | --- |
| boolean | boolean | |
| integer | int | |
| long | bigint | |
| float | float | |
| double | double | |
| decimal(P, S) | decimal(P, S) | |
| date | date | |
| timestamp (Timestamp without timezone) | datetime(6) | Mapped to datetime with precision 6. |
| timestamptz (Timestamp with timezone) | datetime(6)/timestamptz(6) | Mapped to datetime with precision 6. The `enable.mapping.timestamp_tz` property (supported since version 4.0.3) controls the mapping behavior. By default it is `false`, in which case the type is mapped to `datetime`. When set to `true`, it is mapped to the `timestamptz` type. |
| fixed(N) | char(N) | |
| string | string | |
| binary | string/varbinary | Controlled by the `enable.mapping.varbinary` property of the Catalog (supported since 4.0.2). The default is `false`, which maps to `string`; when `true`, it maps to the `varbinary` type. |
| uuid | string/varbinary | Versions < 4.0.2 map to `string`; version 4.0.2 maps to `varbinary`. For versions > 4.0.2, the mapping is controlled by the `enable.mapping.varbinary` property; the default is `false`, which maps to `string`. |
| struct | struct (supported from version 2.1.3) | |
| map | map (supported from version 2.1.3) | |
| list | array | |
| other | UNSUPPORTED | |
Note:

- By default, Doris does not expose `Timestamp` types with timezone: both `timestamp` and `timestamptz` are uniformly mapped to the `datetime(N)` type. However, during reading and writing, Doris correctly handles timezones based on the actual source type. For example, after specifying a timezone with `SET time_zone`, the reading and writing results of `timestamptz` columns are affected accordingly.
- You can check whether the source type carries timezone information in the Extra column of the `DESCRIBE table_name` statement. If it shows `WITH_TIMEZONE`, the source type is timezone-aware. (Supported since 3.1.0.)
- Since version 4.0.3, `timestamptz` (timestamp with time zone) can be mapped to the Doris `timestamptz` type.
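A hedged illustration of the session-timezone behavior described above (the catalog, table, and column names are hypothetical):

```sql
-- ts is a timestamptz column in an Iceberg table, mapped to datetime(6) by default
SET time_zone = 'Asia/Shanghai';
SELECT ts FROM iceberg_ctl.db1.events;  -- instants rendered in the session timezone

SET time_zone = 'UTC';
SELECT ts FROM iceberg_ctl.db1.events;  -- the same instants, shown shifted to UTC
```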
Namespace Mapping
Iceberg's metadata hierarchy is Catalog -> Namespace -> Table. Namespace can have multiple levels (Nested Namespace).
┌─────────┐
│ Catalog │
└────┬────┘
┌─────┴─────┐
┌──▼──┐ ┌──▼──┐
│ NS1 │ │ NS2 │
└──┬──┘ └──┬──┘
│ │
┌────▼───┐ ┌──▼──┐
│ Table1 │ │ NS3 │
└────────┘ └──┬──┘
┌──────┴───────┐
┌────▼───┐ ┌────▼───┐
│ Table2 │ │ Table3 │
└────────┘ └────────┘
Starting from version 3.1.2, for Iceberg Rest Catalog, Doris supports mapping of Nested Namespace.
In the above example, tables will be mapped to Doris metadata according to the following logic:

| Catalog | Database | Table |
| --- | --- | --- |
| Catalog | NS1 | Table1 |
| Catalog | NS2.NS3 | Table2 |
| Catalog | NS2.NS3 | Table3 |
Support for Nested Namespace needs to be explicitly enabled. For details, please refer to
Iceberg Rest Catalog
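A hedged sketch of querying a table under a nested namespace once the feature is enabled, following the mapping above (the catalog name is illustrative, and the backtick quoting assumes the multi-level namespace is addressed as a single Doris database identifier):

```sql
-- NS2.NS3 maps to one Doris database name; quote it as a single identifier
SELECT * FROM iceberg_rest_ctl.`NS2.NS3`.Table2;
```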
Examples
Hive Metastore
3.1+ Version
HDFS
S3
OSS
COS
OBS
GCS
Azure Blob
Ozone
MinIO
Access HMS and HDFS services without Kerberos authentication

```sql
CREATE CATALOG iceberg_hms_on_hdfs_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hms',
    'warehouse' = 'hdfs://127.0.0.1:8320/user/iceberg/warehouse/iceberg-hms-hdfs-warehouse',
    'hive.metastore.uris' = 'thrift://127.0.0.1:9383',
    'fs.defaultFS' = 'hdfs://127.0.0.1:8320',
    'hadoop.username' = 'doris'
);
```

Access HMS and HDFS services with Kerberos authentication enabled

```sql
CREATE CATALOG iceberg_hms_on_hdfs_kerberos_hdfs_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hms',
    'warehouse' = 'hdfs://127.0.0.1:8520/iceberg-hms-hdfs-warehouse',
    'hive.metastore.uris' = 'thrift://127.0.0.1:9583',
    'hive.metastore.client.principal' = 'hive/presto-master.docker.cluster@LABS.TERADATA.COM',
    'hive.metastore.client.keytab' = '/keytabs/hive-presto-master.keytab',
    'hive.metastore.service.principal' = 'hive/hadoop-master@LABS.TERADATA.COM',
    'hive.metastore.sasl.enabled' = 'true',
    'hive.metastore.authentication.type' = 'kerberos',
    'hadoop.security.auth_to_local' = 'RULE:[2:\$1@\$0](.*@LABS.TERADATA.COM)s/@.*//
RULE:[2:\$1@\$0](.*@OTHERLABS.TERADATA.COM)s/@.*//
RULE:[2:\$1@\$0](.*@OTHERREALM.COM)s/@.*//
DEFAULT',
    'fs.defaultFS' = 'hdfs://127.0.0.1:8520',
    'hadoop.security.authentication' = 'kerberos',
    'hadoop.kerberos.principal' = 'hive/presto-master.docker.cluster@LABS.TERADATA.COM',
    'hadoop.kerberos.keytab' = '/keytabs/hive-presto-master.keytab'
);
```
Access HMS without Kerberos authentication

```sql
CREATE CATALOG iceberg_hms_on_s3_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hms',
    'hive.metastore.uris' = 'thrift://127.0.0.1:9383',
    'warehouse' = 's3://selectdb-qa-datalake-test-hk/refactor-test/iceberg-hms-s3-warehouse',
    's3.region' = 'ap-east-1',
    's3.access_key' = '<ak>',
    's3.secret_key' = '<sk>'
);
```

Obtaining S3 access credentials using an IAM Assumed Role (3.1.2+)

```sql
CREATE CATALOG iceberg_hms_on_s3_iamrole PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hms',
    'hive.metastore.uris' = 'thrift://127.0.0.1:9383',
    'warehouse' = 's3://bucket/warehouse',
    's3.region' = 'us-east-1',
    's3.role_arn' = 'arn:aws:iam::543815668950:role/role'
);
```
Access HMS without Kerberos authentication

```sql
CREATE CATALOG iceberg_hms_on_oss_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hms',
    'warehouse' = 'oss://bucket/iceberg/iceberg-hms-warehouse',
    'hive.metastore.uris' = 'thrift://127.0.0.1:9083',
    'oss.region' = 'cn-beijing',
    'oss.endpoint' = 'oss-cn-beijing.aliyuncs.com',
    'oss.access_key' = '<ak>',
    'oss.secret_key' = '<sk>'
);
```

Access HMS with Kerberos authentication enabled

```sql
CREATE CATALOG iceberg_hms_on_oss_kerberos_new_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hms',
    'warehouse' = 'oss://bucket/iceberg/iceberg-hms-warehouse',
    'hive.metastore.uris' = 'thrift://127.0.0.1:9583',
    'hive.metastore.client.principal' = 'presto-server/presto-master.docker.cluster@LABS.TERADATA.COM',
    'hive.metastore.client.keytab' = '/keytabs/presto-server.keytab',
    'hive.metastore.authentication.type' = 'kerberos',
    'hive.metastore.service.principal' = 'hive/hadoop-master@LABS.TERADATA.COM',
    'hive.metastore.sasl.enabled' = 'true',
    'hadoop.security.auth_to_local' = 'RULE:[2:\$1@\$0](.*@LABS.TERADATA.COM)s/@.*//
RULE:[2:\$1@\$0](.*@OTHERLABS.TERADATA.COM)s/@.*//
RULE:[2:\$1@\$0](.*@OTHERREALM.COM)s/@.*//
DEFAULT',
    'oss.region' = 'cn-beijing',
    'oss.endpoint' = 'oss-cn-beijing.aliyuncs.com',
    'oss.access_key' = '<ak>',
    'oss.secret_key' = '<sk>'
);
```
Access HMS without Kerberos authentication

```sql
CREATE CATALOG iceberg_hms_on_cos_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hms',
    'warehouse' = 'cos://bucket/iceberg-hms-cos-warehouse',
    'hive.metastore.uris' = 'thrift://127.0.0.1:9383',
    'cos.region' = 'ap-beijing',
    'cos.endpoint' = 'cos.ap-beijing.myqcloud.com',
    'cos.secret_key' = '<sk>',
    'cos.access_key' = '<ak>'
);
```

Access HMS with Kerberos authentication enabled

```sql
CREATE CATALOG iceberg_hms_on_cos_kerberos_new_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hms',
    'warehouse' = 'cos://bucket/iceberg-hms-cos-warehouse',
    'hive.metastore.uris' = 'thrift://127.0.0.1:9583',
    'hive.metastore.service.principal' = 'hive/hadoop-master@LABS.TERADATA.COM',
    'hive.metastore.sasl.enabled' = 'true',
    'hive.metastore.client.principal' = 'hive/presto-master.docker.cluster@LABS.TERADATA.COM',
    'hive.metastore.client.keytab' = '/keytabs/keytabs/hive-presto-master.keytab',
    'hadoop.security.auth_to_local' = 'RULE:[2:$1@$0](.*@LABS.TERADATA.COM)s/@.*//
RULE:[2:$1@$0](.*@OTHERLABS.TERADATA.COM)s/@.*//
RULE:[2:$1@$0](.*@OTHERREALM.COM)s/@.*//
DEFAULT',
    'cos.region' = 'ap-beijing',
    'cos.endpoint' = 'cos.ap-beijing.myqcloud.com',
    'cos.secret_key' = '<sk>',
    'cos.access_key' = '<ak>'
);
```
Access HMS without Kerberos authentication

```sql
CREATE CATALOG test_iceberg_on_hms_obs_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hms',
    'warehouse' = 'obs://bucket/regression/iceberg',
    'hive.metastore.uris' = 'thrift://127.0.0.1:9383',
    'obs.region' = 'cn-north-4',
    'obs.endpoint' = 'obs.cn-north-4.myhuaweicloud.com',
    'obs.access_key' = '<ak>',
    'obs.secret_key' = '<sk>'
);
```

Access HMS with Kerberos authentication enabled

```sql
CREATE CATALOG obs PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hms',
    'warehouse' = 'obs://bucket/regression/iceberg1',
    'hive.metastore.uris' = 'thrift://127.0.0.1:9583',
    'hive.metastore.service.principal' = 'hive/hadoop-master@LABS.TERADATA.COM',
    'hive.metastore.sasl.enabled' = 'true',
    'hive.metastore.client.principal' = 'hive/presto-master.docker.cluster@LABS.TERADATA.COM',
    'hive.metastore.client.keytab' = '/keytabs/hive-presto-master.keytab',
    'hive.metastore.authentication.type' = 'kerberos',
    'hadoop.security.auth_to_local' = 'RULE:[2:\$1@\$0](.*@LABS.TERADATA.COM)s/@.*//
RULE:[2:\$1@\$0](.*@OTHERLABS.TERADATA.COM)s/@.*//
RULE:[2:\$1@\$0](.*@OTHERREALM.COM)s/@.*//
DEFAULT',
    'obs.region' = 'cn-north-4',
    'obs.endpoint' = 'obs.cn-north-4.myhuaweicloud.com',
    'obs.access_key' = '<ak>',
    'obs.secret_key' = '<sk>'
);
```
```sql
CREATE CATALOG iceberg_hms_on_gcs_kerberos_new_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hms',
    'hive.metastore.uris' = 'thrift://127.0.0.1:9383',
    'warehouse' = 'gs://bucket/iceberg_warehouse',
    'gs.access_key' = '<ak>',
    'gs.secret_key' = '<sk>',
    'fs.gcs.support' = 'true'
);
```
Supported since 3.1.3

```sql
CREATE CATALOG iceberg_fs_on_azure_blob_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hadoop',
    'warehouse' = 'abfs://<container>@<account_name>.dfs.core.windows.net/<path>',
    'azure.account_name' = '<account_name>',
    'azure.account_key' = '<account_key>',
    'azure.endpoint' = 'https://<account_name>.blob.core.windows.net',
    'fs.azure.support' = 'true'
);
```
Supported since 4.0.4

```sql
CREATE CATALOG iceberg_fs_on_ozone_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hadoop',
    'warehouse' = 's3a://bucket/iceberg_warehouse',
    'fs.ozone.support' = 'true',
    'ozone.endpoint' = 'http://ozone-s3g:9878',
    'ozone.access_key' = '<ak>',
    'ozone.secret_key' = '<sk>',
    'ozone.region' = 'us-east-1',
    'ozone.use_path_style' = 'true'
);
```
```sql
CREATE CATALOG test_iceberg_on_hms_minio_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hms',
    'warehouse' = 's3://warehouse/wh',
    'hive.metastore.uris' = 'thrift://127.0.0.1:9383',
    'fs.minio.support' = 'true',
    'minio.use_path_style' = 'true',
    'minio.endpoint' = 'http://127.0.0.1:19001',
    'minio.access_key' = '<ak>',
    'minio.secret_key' = '<sk>'
);
```
2.1 & 3.0 Version
HDFS
S3
OSS
COS
OBS
MinIO
Access HMS and HDFS services without Kerberos authentication

```sql
CREATE CATALOG iceberg_hms_on_hdfs_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hms',
    'hive.metastore.uris' = 'thrift://127.0.0.1:9383',
    'warehouse' = 'hdfs://127.0.0.1:8320/user/iceberg/warehouse/iceberg-hms-hdfs-warehouse',
    'hadoop.username' = 'doris',
    'fs.defaultFS' = 'hdfs://127.0.0.1:8320'
);
```

Access HMS and HDFS services with Kerberos authentication enabled

```sql
CREATE CATALOG iceberg_hms_on_hdfs_kerberos_hdfs_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hms',
    'warehouse' = 'hdfs://127.0.0.1:8520/iceberg-hms-hdfs-warehouse',
    'hive.metastore.uris' = 'thrift://127.0.0.1:9583',
    'hive.metastore.kerberos.principal' = 'hive/hadoop-master@LABS.TERADATA.COM',
    'hive.metastore.sasl.enabled' = 'true',
    'hive.metastore.authentication.type' = 'kerberos',
    'hadoop.security.auth_to_local' = 'RULE:[2:\$1@\$0](.*@LABS.TERADATA.COM)s/@.*//
RULE:[2:\$1@\$0](.*@OTHERLABS.TERADATA.COM)s/@.*//
RULE:[2:\$1@\$0](.*@OTHERREALM.COM)s/@.*//
DEFAULT',
    'fs.defaultFS' = 'hdfs://127.0.0.1:8520',
    'hadoop.security.authentication' = 'kerberos',
    'hadoop.kerberos.principal' = 'hive/presto-master.docker.cluster@LABS.TERADATA.COM',
    'hadoop.kerberos.keytab' = '/keytabs/hive-presto-master.keytab'
);
```
```sql
CREATE CATALOG iceberg_hms_on_s3_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hms',
    'hive.metastore.uris' = 'thrift://127.0.0.1:9383',
    'warehouse' = 's3://selectdb-qa-datalake-test-hk/refactor-test/iceberg-hms-s3-warehouse',
    's3.region' = 'ap-east-1',
    's3.access_key' = '<ak>',
    's3.secret_key' = '<sk>'
);
```

```sql
CREATE CATALOG iceberg_hms_on_oss_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hms',
    'hive.metastore.uris' = 'thrift://127.0.0.1:9083',
    'warehouse' = 'oss://bucket/iceberg/iceberg-hms-warehouse',
    'oss.region' = 'cn-beijing',
    'oss.endpoint' = 'oss-cn-beijing.aliyuncs.com',
    'oss.access_key' = '<ak>',
    'oss.secret_key' = '<sk>'
);
```

```sql
CREATE CATALOG iceberg_hms_on_cos_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hms',
    'warehouse' = 'cos://bucket/iceberg-hms-cos-warehouse',
    'hive.metastore.uris' = 'thrift://127.0.0.1:9383',
    'cos.region' = 'ap-beijing',
    'cos.endpoint' = 'cos.ap-beijing.myqcloud.com',
    'cos.secret_key' = '<sk>',
    'cos.access_key' = '<ak>'
);
```

```sql
CREATE CATALOG test_iceberg_on_hms_obs_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hms',
    'hive.metastore.uris' = 'thrift://127.0.0.1:9383',
    'warehouse' = 'obs://bucket/regression/iceberg',
    'obs.region' = 'cn-north-4',
    'obs.endpoint' = 'obs.cn-north-4.myhuaweicloud.com',
    'obs.access_key' = '<ak>',
    'obs.secret_key' = '<sk>'
);
```

```sql
CREATE CATALOG test_iceberg_on_hms_minio_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hms',
    'warehouse' = 's3://warehouse/iceberg',
    'hive.metastore.uris' = 'thrift://127.0.0.1:9383',
    's3.endpoint' = 'http://127.0.0.1:19001',
    's3.access_key' = '<ak>',
    's3.secret_key' = '<sk>'
);
```
AWS Glue
3.1+ Version
S3
AWS Glue and S3 storage services share the same authentication credentials.
```sql
CREATE CATALOG iceberg_glue_on_s3_catalog_ PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'glue',
    'warehouse' = 's3://bucket/iceberg-glue-s3-warehouse',
    'glue.region' = 'ap-northeast-1',
    'glue.endpoint' = 'https://glue.ap-northeast-1.amazonaws.com',
    'glue.access_key' = '<ak>',
    'glue.secret_key' = '<sk>'
);
```

When Glue service authentication credentials differ from S3 authentication credentials, you can specify S3 authentication credentials separately using the following method.

```sql
CREATE CATALOG iceberg_glue_on_s3_catalog_ PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'glue',
    'warehouse' = 's3://selectdb-qa-datalake-test/iceberg-glue-s3-warehouse',
    'glue.region' = 'ap-northeast-1',
    'glue.endpoint' = 'https://glue.ap-northeast-1.amazonaws.com',
    'glue.access_key' = '<ak>',
    'glue.secret_key' = '<sk>',
    's3.endpoint' = 's3.ap-northeast-1.amazonaws.com',
    's3.region' = 'ap-northeast-1',
    's3.access_key' = '<ak>',
    's3.secret_key' = '<sk>'
);
```

```sql
CREATE CATALOG glue_iceberg_iamrole PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'glue',
    'warehouse' = 's3://bucket/warehouse',
    'glue.region' = 'us-east-1',
    'glue.endpoint' = 'https://glue.us-east-1.amazonaws.com',
    'glue.role_arn' = '<role_arn>'
);
```
2.1 & 3.0 Version
S3
AWS Glue and S3 storage services share the same authentication credentials.
In non-EC2 environments, you need to use
aws configure
to configure Credentials information and generate a credentials file in the ~/.aws directory.
```sql
CREATE CATALOG glue PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'glue',
    'glue.endpoint' = 'https://glue.us-east-1.amazonaws.com',
    'glue.access_key' = '<ak>',
    'glue.secret_key' = '<sk>'
);
```
Aliyun DLF
4.1+ Version
DLF 2.5+
```sql
CREATE CATALOG iceberg_dlf2_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'rest',
    'iceberg.rest.uri' = 'http://<endpoint>',
    'warehouse' = '<warehouse>',
    'iceberg.rest.sigv4-enabled' = 'true',
    'iceberg.rest.signing-name' = 'DlfNext',
    'iceberg.rest.access-key-id' = '<ak>',
    'iceberg.rest.secret-access-key' = '<sk>',
    'iceberg.rest.signing-region' = '<region>',
    'iceberg.rest.vended-credentials-enabled' = 'true',
    'io-impl' = 'org.apache.iceberg.rest.DlfFileIO',
    'fs.oss.support' = 'true'
);
```
3.1+ Version
DLF 1.0
```sql
CREATE CATALOG iceberg_dlf_catalog_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'dlf',
    'warehouse' = 'oss://bucket/iceberg-dlf-oss-warehouse',
    'dlf.uid' = '203225413946383283',
    'dlf.catalog_id' = 'p2_regression_case',
    'dlf.region' = 'cn-beijing',
    'dlf.endpoint' = 'datalake.cn-beijing.aliyuncs.com',
    'dlf.access_key' = '<ak>',
    'dlf.secret_key' = '<sk>'
);
```
2.1 & 3.0 Version
DLF 1.0
```sql
CREATE CATALOG iceberg_dlf_catalog_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'dlf',
    'warehouse' = 'oss://bucket/iceberg-dlf-oss-warehouse',
    'dlf.uid' = '203225413946383283',
    'dlf.catalog.id' = 'catalog_id',
    'dlf.region' = 'cn-beijing',
    'dlf.access_key' = '<ak>',
    'dlf.secret_key' = '<sk>'
);
```
Iceberg Rest Catalog
3.1+ Version
S3
OSS
COS
OBS
GCS
HDFS
No Authentication

```sql
CREATE CATALOG iceberg_static_s3 PROPERTIES (
    'type' = 'iceberg',
    'warehouse' = 's3://warehouse',
    'iceberg.catalog.type' = 'rest',
    'iceberg.rest.uri' = 'http://127.0.0.1:8181',
    's3.endpoint' = 'https://s3.ap-east-1.amazonaws.com',
    's3.access_key' = '<ak>',
    's3.secret_key' = '<sk>',
    's3.region' = 'ap-east-1'
);
```

OAuth2 Authentication

```sql
CREATE CATALOG polaris_oauth2_credential_static_s3 PROPERTIES (
    'type' = 'iceberg',
    'warehouse' = 'doris_test',
    'iceberg.catalog.type' = 'rest',
    'iceberg.rest.uri' = 'http://127.0.0.1:8181/api/catalog',
    'iceberg.rest.security.type' = 'oauth2',
    'iceberg.rest.oauth2.credential' = 'root:secret123',
    'iceberg.rest.oauth2.server-uri' = 'http://127.0.0.1:8181/api/catalog/v1/oauth/tokens',
    'iceberg.rest.oauth2.scope' = 'PRINCIPAL_ROLE:ALL',
    's3.endpoint' = 'https://s3.us-west-2.amazonaws.com',
    's3.access_key' = '<ak>',
    's3.secret_key' = '<sk>',
    's3.region' = 'us-west-2'
);
```

Vended Credentials

```sql
CREATE CATALOG gravitino_vended_s3 PROPERTIES (
    'type' = 'iceberg',
    'warehouse' = 'warehouse',
    'iceberg.catalog.type' = 'rest',
    'iceberg.rest.uri' = 'http://127.0.0.1:9001/iceberg/',
    'iceberg.rest.vended-credentials-enabled' = 'true'
);
```
No Authentication

```sql
CREATE CATALOG iceberg_static_oss PROPERTIES (
    'type' = 'iceberg',
    'warehouse' = 's3://warehouse',
    'iceberg.catalog.type' = 'rest',
    'iceberg.rest.uri' = 'http://127.0.0.1:8181',
    'oss.endpoint' = 'https://oss-cn-beijing.aliyuncs.com',
    'oss.access_key' = '<ak>',
    'oss.secret_key' = '<sk>',
    'oss.region' = 'cn-beijing'
);
```

OAuth2 Authentication

```sql
CREATE CATALOG polaris_oauth2_credential_static_oss PROPERTIES (
    'type' = 'iceberg',
    'warehouse' = 'doris_test',
    'iceberg.catalog.type' = 'rest',
    'iceberg.rest.uri' = 'http://127.0.0.1:8181/api/catalog',
    'iceberg.rest.security.type' = 'oauth2',
    'iceberg.rest.oauth2.credential' = 'root:secret123',
    'iceberg.rest.oauth2.server-uri' = 'http://127.0.0.1:8181/api/catalog/v1/oauth/tokens',
    'iceberg.rest.oauth2.scope' = 'PRINCIPAL_ROLE:ALL',
    'oss.endpoint' = 'https://oss-cn-beijing.aliyuncs.com',
    'oss.access_key' = '<ak>',
    'oss.secret_key' = '<sk>',
    'oss.region' = 'cn-beijing'
);
```
No Authentication

```sql
CREATE CATALOG iceberg_static_cos PROPERTIES (
    'type' = 'iceberg',
    'warehouse' = 's3://warehouse',
    'iceberg.catalog.type' = 'rest',
    'iceberg.rest.uri' = 'http://127.0.0.1:8181',
    'cos.endpoint' = 'https://cos.ap-beijing.myqcloud.com',
    'cos.access_key' = '<ak>',
    'cos.secret_key' = '<sk>',
    'cos.region' = 'ap-beijing'
);
```

OAuth2 Authentication

```sql
CREATE CATALOG polaris_oauth2_credential_static_cos PROPERTIES (
    'type' = 'iceberg',
    'warehouse' = 'doris_test',
    'iceberg.catalog.type' = 'rest',
    'iceberg.rest.uri' = 'http://127.0.0.1:8181/api/catalog',
    'iceberg.rest.security.type' = 'oauth2',
    'iceberg.rest.oauth2.credential' = 'root:secret123',
    'iceberg.rest.oauth2.server-uri' = 'http://127.0.0.1:8181/api/catalog/v1/oauth/tokens',
    'iceberg.rest.oauth2.scope' = 'PRINCIPAL_ROLE:ALL',
    'cos.endpoint' = 'https://cos.ap-beijing.myqcloud.com',
    'cos.access_key' = '<ak>',
    'cos.secret_key' = '<sk>',
    'cos.region' = 'ap-beijing'
);
```
No Authentication

```sql
CREATE CATALOG iceberg_static_obs PROPERTIES (
    'type' = 'iceberg',
    'warehouse' = 's3://warehouse',
    'iceberg.catalog.type' = 'rest',
    'iceberg.rest.uri' = 'http://127.0.0.1:8181',
    'obs.endpoint' = 'obs.cn-north-4.myhuaweicloud.com',
    'obs.access_key' = '<ak>',
    'obs.secret_key' = '<sk>',
    'obs.region' = 'cn-north-4'
);
```

OAuth2 Authentication

```sql
CREATE CATALOG polaris_oauth2_credential_static_obs PROPERTIES (
    'type' = 'iceberg',
    'warehouse' = 'doris_test',
    'iceberg.catalog.type' = 'rest',
    'iceberg.rest.uri' = 'http://127.0.0.1:8181/api/catalog',
    'iceberg.rest.security.type' = 'oauth2',
    'iceberg.rest.oauth2.credential' = 'root:secret123',
    'iceberg.rest.oauth2.server-uri' = 'http://127.0.0.1:8181/api/catalog/v1/oauth/tokens',
    'iceberg.rest.oauth2.scope' = 'PRINCIPAL_ROLE:ALL',
    'obs.endpoint' = 'obs.cn-north-4.myhuaweicloud.com',
    'obs.access_key' = '<ak>',
    'obs.secret_key' = '<sk>',
    'obs.region' = 'cn-north-4'
);
```
No Authentication

```sql
CREATE CATALOG iceberg_static_gcs PROPERTIES (
    'type' = 'iceberg',
    'warehouse' = 'gs://warehouse',
    'iceberg.catalog.type' = 'rest',
    'iceberg.rest.uri' = 'http://127.0.0.1:8181',
    'gs.endpoint' = 'https://storage.googleapis.com',
    'gs.access_key' = '<ak>',
    'gs.secret_key' = '<sk>'
);
```

OAuth2 Authentication

```sql
CREATE CATALOG polaris_oauth2_credential_static_gcs PROPERTIES (
    'type' = 'iceberg',
    'warehouse' = 'doris_test',
    'iceberg.catalog.type' = 'rest',
    'iceberg.rest.uri' = 'http://127.0.0.1:8181/api/catalog',
    'iceberg.rest.security.type' = 'oauth2',
    'iceberg.rest.oauth2.credential' = 'root:secret123',
    'iceberg.rest.oauth2.server-uri' = 'http://127.0.0.1:8181/api/catalog/v1/oauth/tokens',
    'iceberg.rest.oauth2.scope' = 'PRINCIPAL_ROLE:ALL',
    'gs.endpoint' = 'https://storage.googleapis.com',
    'gs.access_key' = '<ak>',
    'gs.secret_key' = '<sk>'
);
```
```sql
CREATE CATALOG iceberg_static_hdfs PROPERTIES (
    'type' = 'iceberg',
    'warehouse' = 'hdfs://127.0.0.1:8020/user/iceberg/warehouse',
    'iceberg.catalog.type' = 'rest',
    'iceberg.rest.uri' = 'http://127.0.0.1:8181',
    'fs.defaultFS' = 'hdfs://127.0.0.1:8020'
);
```
2.1 & 3.0 Version
S3
OSS
COS
OBS
HDFS
```sql
CREATE CATALOG iceberg_static_s3 PROPERTIES (
    'type' = 'iceberg',
    'warehouse' = 's3://warehouse',
    'iceberg.catalog.type' = 'rest',
    'uri' = 'http://127.0.0.1:8181',
    's3.endpoint' = 'https://s3.ap-east-1.amazonaws.com',
    's3.access_key' = '<ak>',
    's3.secret_key' = '<sk>',
    's3.region' = 'ap-east-1'
);
```

```sql
CREATE CATALOG iceberg_static_oss PROPERTIES (
    'type' = 'iceberg',
    'warehouse' = 's3://warehouse',
    'iceberg.catalog.type' = 'rest',
    'uri' = 'http://127.0.0.1:8181',
    's3.endpoint' = 'https://oss-cn-beijing.aliyuncs.com',
    's3.access_key' = '<ak>',
    's3.secret_key' = '<sk>',
    's3.region' = 'cn-beijing'
);
```

```sql
CREATE CATALOG iceberg_static_cos PROPERTIES (
    'type' = 'iceberg',
    'warehouse' = 's3://warehouse',
    'iceberg.catalog.type' = 'rest',
    'uri' = 'http://127.0.0.1:8181',
    's3.endpoint' = 'https://cos.ap-beijing.myqcloud.com',
    's3.access_key' = '<ak>',
    's3.secret_key' = '<sk>',
    's3.region' = 'ap-beijing'
);
```

```sql
CREATE CATALOG iceberg_static_obs PROPERTIES (
    'type' = 'iceberg',
    'warehouse' = 's3://warehouse',
    'iceberg.catalog.type' = 'rest',
    'uri' = 'http://127.0.0.1:8181',
    's3.endpoint' = 'obs.cn-north-4.myhuaweicloud.com',
    's3.access_key' = '<ak>',
    's3.secret_key' = '<sk>',
    's3.region' = 'cn-north-4'
);
```

```sql
CREATE CATALOG iceberg_static_hdfs PROPERTIES (
    'type' = 'iceberg',
    'warehouse' = 'hdfs://127.0.0.1:8020/user/iceberg/warehouse',
    'iceberg.catalog.type' = 'rest',
    'uri' = 'http://127.0.0.1:8181',
    'fs.defaultFS' = 'hdfs://127.0.0.1:8020'
);
```
Iceberg JDBC Catalog
Note
This is an experimental feature, supported since version 4.1.0.
4.1+ Version
PostgreSQL
MySQL
SQLite
Using PostgreSQL as metadata storage with S3

```sql
CREATE CATALOG iceberg_jdbc_postgresql PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'jdbc',
    'iceberg.jdbc.uri' = 'jdbc:postgresql://127.0.0.1:5432/iceberg_db',
    'iceberg.jdbc.user' = 'iceberg_user',
    'iceberg.jdbc.password' = 'password',
    'iceberg.jdbc.init-catalog-tables' = 'true',
    'iceberg.jdbc.schema-version' = 'V1',
    'iceberg.jdbc.driver_class' = 'org.postgresql.Driver',
    'iceberg.jdbc.driver_url' = '<driver_url>',
    'warehouse' = 's3://bucket/warehouse',
    's3.access_key' = '<ak>',
    's3.secret_key' = '<sk>',
    's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
    's3.region' = 'us-east-1'
);
```

Using MySQL as metadata storage with S3

```sql
CREATE CATALOG iceberg_jdbc_mysql PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'jdbc',
    'iceberg.jdbc.uri' = 'jdbc:mysql://127.0.0.1:3306/iceberg_db',
    'iceberg.jdbc.user' = 'iceberg_user',
    'iceberg.jdbc.password' = 'password',
    'iceberg.jdbc.init-catalog-tables' = 'true',
    'iceberg.jdbc.schema-version' = 'V1',
    'iceberg.jdbc.driver_class' = 'com.mysql.cj.jdbc.Driver',
    'iceberg.jdbc.driver_url' = '<driver_url>',
    'warehouse' = 's3://bucket/warehouse',
    's3.access_key' = '<ak>',
    's3.secret_key' = '<sk>',
    's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
    's3.region' = 'us-east-1'
);
```

Using SQLite as metadata storage (suitable for testing environments)

```sql
CREATE CATALOG iceberg_jdbc_sqlite PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'jdbc',
    'iceberg.jdbc.uri' = 'jdbc:sqlite:/tmp/iceberg_catalog.db',
    'iceberg.jdbc.init-catalog-tables' = 'true',
    'iceberg.jdbc.schema-version' = 'V1',
    'iceberg.jdbc.driver_class' = 'org.sqlite.JDBC',
    'iceberg.jdbc.driver_url' = '<driver_url>',
    'warehouse' = 's3://bucket/warehouse',
    's3.access_key' = '<ak>',
    's3.secret_key' = '<sk>',
    's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
    's3.region' = 'us-east-1'
);
```
FileSystem
3.1+ Version
HDFS
S3
COS
OSS
OBS
GCS
MinIO
Access HDFS service without Kerberos authentication
CREATE
CATALOG iceberg_fs_on_hdfs_catalog PROPERTIES
'type'
'iceberg'
'iceberg.catalog.type'
'hadoop'
'warehouse'
'hdfs://127.0.0.1:8320/user/iceberg/warehouse/iceberg-fs-hdfs-warehouse'
'hadoop.username'
'doris'
'fs.defaultFS'
'hdfs://127.0.0.1:8320'
Access HDFS service with Kerberos authentication enabled
CREATE
CATALOG iceberg_fs_on_hdfs_kerberos_catalog PROPERTIES
'type'
'iceberg'
'iceberg.catalog.type'
'hadoop'
'warehouse'
'hdfs://127.0.0.1:8520/iceberg-fs-hdfs-warehouse'
'fs.defaultFS'
'hdfs://127.0.0.1:8520'
'hadoop.security.authentication'
'kerberos'
'hadoop.kerberos.principal'
'hive/presto-master.docker.cluster@LABS.TERADATA.COM'
'hadoop.kerberos.keytab'
'/keytabs/hive-presto-master.keytab'
CREATE
CATALOG iceberg_fs_on_glue_catalog PROPERTIES
'type'
'iceberg'
'iceberg.catalog.type'
'hadoop'
'warehouse'
's3://selectdb-qa-datalake-test-hk/refactor-test/iceberg-fs-s3-warehouse'
's3.region'
'ap-east-1'
's3.access_key'
'
's3.secret_key'
'
Obtaining S3 access credentials using an IAM Assumed Role (3.1.2+)
CREATE
CATALOG iceberg_fs_on_s3_iamrole PROPERTIES
'type'
'iceberg'
'iceberg.catalog.type'
'hadoop'
'warehouse'
's3://bucket/warehouse'
's3.region'
'us-east-1'
's3.role_arn'
'arn:aws:iam::543815668950:role/role'
CREATE
CATALOG iceberg_fs_on_cos_catalog PROPERTIES
'type'
'iceberg'
'iceberg.catalog.type'
'hadoop'
'warehouse'
'cos://doris-build-1308700295/iceberg-fs-cos-warehouse'
'cos.region'
'ap-beijing'
'cos.endpoint'
'cos.ap-beijing.myqcloud.com'
'cos.access_key'
'
'cos.secret_key'
'
Access warehouse stored on Alibaba Cloud OSS:

```sql
CREATE CATALOG iceberg_fs_on_oss_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hadoop',
    'warehouse' = 'oss://doris-regression-bj/refactor-test/iceberg-fs-oss-warehouse',
    'oss.region' = 'cn-beijing',
    'oss.endpoint' = 'oss-cn-beijing.aliyuncs.com',
    'oss.access_key' = '...',
    'oss.secret_key' = '...'
);
```
Access warehouse stored on Huawei Cloud OBS:

```sql
CREATE CATALOG test_fs_on_obs_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hadoop',
    'warehouse' = 'obs://bucket/regression/iceberg',
    'obs.region' = 'cn-north-4',
    'obs.endpoint' = 'obs.cn-north-4.myhuaweicloud.com',
    'obs.access_key' = '...',
    'obs.secret_key' = '...'
);
```
Access warehouse stored on GCS:

```sql
CREATE CATALOG iceberg_fs_on_gcs_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hadoop',
    'warehouse' = 'gs://bucket/iceberg_warehouse',
    'fs.gcs.support' = 'true',
    'gs.access_key' = '...',
    'gs.secret_key' = '...'
);
```
Access warehouse stored on MinIO:

```sql
CREATE CATALOG test_iceberg_fs_on_minio PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hadoop',
    'warehouse' = 's3://warehouse/wh',
    'fs.minio.support' = 'true',
    'minio.endpoint' = 'http://127.0.0.1:19001',
    'minio.access_key' = 'admin',
    'minio.secret_key' = 'password',
    'minio.use_path_style' = 'true'
);
```
2.1 & 3.0 Version

The following examples cover HDFS, S3, COS, OSS, GCS, and MinIO storage.
Access HDFS service without Kerberos authentication:

```sql
CREATE CATALOG iceberg_fs_on_hdfs_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hadoop',
    'warehouse' = 'hdfs://127.0.0.1:8320/user/iceberg/warehouse/iceberg-fs-hdfs-warehouse',
    'hadoop.username' = 'doris',
    'fs.defaultFS' = 'hdfs://127.0.0.1:8320'
);
```
Access HDFS service with Kerberos authentication enabled:

```sql
CREATE CATALOG iceberg_fs_on_hdfs_kerberos_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hadoop',
    'warehouse' = 'hdfs://127.0.0.1:8520/iceberg-fs-hdfs-warehouse',
    'fs.defaultFS' = 'hdfs://127.0.0.1:8520',
    'hadoop.security.authentication' = 'kerberos',
    'hadoop.kerberos.principal' = 'hive/presto-master.docker.cluster@LABS.TERADATA.COM',
    'hadoop.kerberos.keytab' = '/keytabs/hive-presto-master.keytab'
);
```
Access warehouse stored on S3:

```sql
CREATE CATALOG iceberg_fs_on_s3_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hadoop',
    'warehouse' = 's3://selectdb-qa-datalake-test-hk/refactor-test/iceberg-fs-s3-warehouse',
    's3.region' = 'ap-east-1',
    's3.endpoint' = 's3.ap-east-1.amazonaws.com',
    's3.access_key' = '...',
    's3.secret_key' = '...'
);
```
Access warehouse stored on Tencent Cloud COS:

```sql
CREATE CATALOG iceberg_fs_on_cos_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hadoop',
    'warehouse' = 'cos://doris-build-1308700295/iceberg-fs-cos-warehouse',
    'cos.region' = 'ap-beijing',
    'cos.endpoint' = 'cos.ap-beijing.myqcloud.com',
    'cos.access_key' = '...',
    'cos.secret_key' = '...'
);
```
Access warehouse stored on Alibaba Cloud OSS:

```sql
CREATE CATALOG iceberg_fs_on_oss_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hadoop',
    'warehouse' = 'oss://bucket/refactor-test/iceberg-fs-oss-warehouse',
    'oss.region' = 'cn-beijing',
    'oss.endpoint' = 'oss-cn-beijing.aliyuncs.com',
    'oss.access_key' = '...',
    'oss.secret_key' = '...'
);
```
Access warehouse stored on GCS:

```sql
CREATE CATALOG iceberg_fs_on_gcs_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hadoop',
    'warehouse' = 's3://bucket/iceberg_warehouse',
    'gs.endpoint' = 'storage.googleapis.com',
    'gs.access_key' = '...',
    'gs.secret_key' = '...'
);
```
Access warehouse stored on MinIO:

```sql
CREATE CATALOG test_iceberg_fs_on_minio PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hadoop',
    'warehouse' = 's3://warehouse/wh',
    's3.region' = 'ap-east-1',
    's3.endpoint' = 'http://minio:9000',
    's3.access_key' = '...',
    's3.secret_key' = '...'
);
```
AWS S3 Tables

3.1+ Version

Three catalog types are available: S3 Tables Rest Catalog, AWS Glue Rest Catalog, and FileSystem Catalog. Please refer to the S3 Tables Integration documentation.

S3 Tables Rest Catalog:
```sql
CREATE CATALOG aws_s3_tables PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'rest',
    'warehouse' = 'arn:aws:s3tables:us-east-1:...',
    'iceberg.rest.uri' = 'https://s3tables.us-east-1.amazonaws.com/iceberg',
    'iceberg.rest.sigv4-enabled' = 'true',
    'iceberg.rest.signing-name' = 's3tables',
    'iceberg.rest.signing-region' = 'us-east-1',
    'iceberg.rest.access-key-id' = '...',
    'iceberg.rest.secret-access-key' = '...'
);
```
AWS Glue Rest Catalog (see the S3 Tables Integration documentation):

```sql
CREATE CATALOG glue_s3 PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'rest',
    'iceberg.rest.uri' = 'https://glue.ap-east-1.amazonaws.com/iceberg',
    'warehouse' = '...',
    'iceberg.rest.sigv4-enabled' = 'true',
    'iceberg.rest.signing-name' = 'glue',
    'iceberg.rest.access-key-id' = '...',
    'iceberg.rest.secret-access-key' = '...',
    'iceberg.rest.signing-region' = 'ap-east-1'
);
```
FileSystem Catalog (see the S3 Tables Integration documentation):

```sql
CREATE CATALOG test_s3tables_write_insert PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 's3tables',
    'warehouse' = 'arn:aws:s3tables:ap-east-1:12345:bucket/doris-s3-table-bucket',
    's3.region' = 'ap-east-1',
    's3.endpoint' = 'https://s3.ap-east-1.amazonaws.com',
    's3.access_key' = '...',
    's3.secret_key' = '...'
);
```
3.0.6+ Version

FileSystem Catalog (see the S3 Tables Integration documentation):

```sql
CREATE CATALOG test_s3tables_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 's3tables',
    'warehouse' = 'arn:aws:s3tables:ap-east-1:9527:bucket/doris-s3-table-bucket',
    's3.region' = 'ap-east-1',
    's3.endpoint' = 'https://s3.ap-east-1.amazonaws.com',
    's3.access_key' = '...',
    's3.secret_key' = '...'
);
```
Query Operations
Basic Query
```sql
-- 1. switch to catalog, use database and query
SWITCH iceberg;
USE iceberg_db;
SELECT * FROM iceberg_tbl LIMIT 10;

-- 2. use iceberg database directly
USE iceberg.iceberg_db;
SELECT * FROM iceberg_tbl LIMIT 10;

-- 3. use full qualified name to query
SELECT * FROM iceberg.iceberg_db.iceberg_tbl LIMIT 10;
```
Time Travel
You can read a specific snapshot of an Iceberg table. By default, read requests access the latest snapshot version.

You can list the snapshots of an Iceberg table using the iceberg_meta() table function:

```sql
SELECT * FROM iceberg_meta(
    'table' = 'iceberg_ctl.iceberg_db.iceberg_tbl',
    'query_type' = 'snapshots'
)\G
```
```text
*************************** 1. row ***************************
 committed_at: 2024-11-28 11:07:29
  snapshot_id: 8903826400153112036
    parent_id:
    operation: append
manifest_list: oss://path/to/metadata/snap-8903826400153112036-1-3835e66d-9a18-4cb0-b9b0-9ec80527ad8d.avro
      summary: {"added-data-files":"2","added-records":"3","added-files-size":"2742","changed-partition-count":"2","total-records":"3","total-files-size":"2742","total-data-files":"2","total-delete-files":"0","total-position-deletes":"0","total-equality-deletes":"0"}
*************************** 2. row ***************************
 committed_at: 2024-11-28 11:10:11
  snapshot_id: 6099853805930794326
    parent_id: 8903826400153112036
    operation: append
manifest_list: oss://path/to/metadata/snap-6099853805930794326-1-dd46a1bd-219b-4fb0-bb46-ac441d8b3105.avro
      summary: {"added-data-files":"1","added-records":"1","added-files-size":"1367","changed-partition-count":"1","total-records":"4","total-files-size":"4109","total-data-files":"3","total-delete-files":"0","total-position-deletes":"0","total-equality-deletes":"0"}
```
You can use the FOR TIME AS OF and FOR VERSION AS OF clauses to read historical data based on the time a snapshot was created or its snapshot ID. Here are some examples:

```sql
-- Read data as of a specific timestamp
SELECT * FROM iceberg_table FOR TIME AS OF '2023-01-01 00:00:00';

-- Read data as of a specific snapshot ID
SELECT * FROM iceberg_table FOR VERSION AS OF 123456789;
```
Branch and Tag
Since 3.1.0.

For creating, dropping, and managing branches and tags, please refer to the Managing Branch & Tag section.

Reading specific branches and tags of Iceberg tables is supported. Multiple syntax forms are supported for compatibility with systems such as Spark and Trino.
```sql
-- BRANCH
SELECT * FROM iceberg_tbl@branch(branch1);
SELECT * FROM iceberg_tbl@branch("name" = "branch1");
SELECT * FROM iceberg_tbl FOR VERSION AS OF 'branch1';

-- TAG
SELECT * FROM iceberg_tbl@tag(tag1);
SELECT * FROM iceberg_tbl@tag("name" = "tag1");
SELECT * FROM iceberg_tbl FOR VERSION AS OF 'tag1';
```
For the FOR VERSION AS OF syntax, Doris automatically determines whether the parameter is a timestamp or a branch/tag name.
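For example, assuming a hypothetical branch named dev exists on iceberg_tbl, a quoted string is resolved as a branch or tag name while a numeric literal is resolved as a snapshot ID:

```sql
-- Resolved as a branch/tag name (string parameter); the branch name "dev" is illustrative
SELECT * FROM iceberg_tbl FOR VERSION AS OF 'dev';

-- Resolved as a snapshot ID (numeric parameter)
SELECT * FROM iceberg_tbl FOR VERSION AS OF 8903826400153112036;
```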
View
Since 3.1.0
Supports querying Iceberg views. View queries work the same way as regular table queries. Please note the following:

- Only the hms type Iceberg Catalog is supported.
- The view definition SQL must be compatible with the Doris SQL dialect; otherwise, parsing errors will occur. (Dialect conversion functionality will be provided in future versions.)
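Since a view is queried exactly like a table, a hypothetical view iceberg_view in an hms-type catalog would be read as follows (the catalog, database, and view names are illustrative):

```sql
SELECT * FROM iceberg.iceberg_db.iceberg_view LIMIT 10;
```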
Iceberg V3 Hidden Columns (Row Lineage)
tip
This is an experimental feature, supported since version 4.1.0.
Doris supports querying the Row Lineage hidden columns of Iceberg V3 tables. Iceberg V3 introduces system hidden columns that contain row lineage data, used to track data change records. This is well suited to incremental synchronization and data auditing:

- _row_id: a number that uniquely identifies a row (generated from the file and position where the data was written).
- _last_updated_sequence_number: the sequence number of the last modification to this row. When an UPDATE or MERGE INTO operation updates the data, this value automatically increments along with the version number.
How to query hidden columns:

By default, hidden columns are not visible in DESC or SELECT *. You can query them explicitly by specifying the column names directly:

```sql
SELECT id, _row_id, _last_updated_sequence_number FROM iceberg_tbl;
```
You can also make hidden columns visible in regular displays (such as DESC and SELECT *) by enabling the session variable show_hidden_columns:

```sql
SET show_hidden_columns = true;
DESC iceberg_tbl;
```
Notes:

- The Iceberg table's format version (format-version) must be set to 3 for these row lineage columns to be generated. Querying row lineage columns on V1 and V2 format tables may result in errors.
- Hidden columns are maintained by the system. Users cannot explicitly specify or write to the _row_id or _last_updated_sequence_number columns in write statements such as INSERT.
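Building on the above, a hedged sketch of an incremental read driven by _last_updated_sequence_number; the table name iceberg_tbl and the watermark value 5 (a sequence number recorded from a previous run) are assumptions for illustration:

```sql
-- Fetch only rows modified after the previously recorded sequence number (assumed watermark: 5)
SELECT id, _row_id, _last_updated_sequence_number
FROM iceberg_tbl
WHERE _last_updated_sequence_number > 5;
```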
System Tables
Since 3.1.0
Doris supports querying Iceberg system tables to retrieve metadata information about tables. You can use system tables to view snapshot history, manifest files, data files, partitions, and other metadata.
To access the metadata of an Iceberg table, append a $ symbol followed by the system table name to the table name:

```sql
SELECT * FROM iceberg_table$system_table_name;
```

For example, to view the table's history, execute:

```sql
SELECT * FROM iceberg_table$history;
```
- The all_manifests system table is supported starting from version 4.0.4.
- The position_deletes system table is not yet supported and is planned for a future version.
entries

Shows all manifest entries for the current snapshot of the table. all_entries and entries are similar; the difference is that all_entries contains entries from all snapshots, while entries only contains entries from the current snapshot.

```sql
SELECT * FROM iceberg_table$entries;
```

Result:
Result:
+--------+---------------------+-----------------+----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| status | snapshot_id | sequence_number | file_sequence_number | data_file | readable_metrics |
+--------+---------------------+-----------------+----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 2 | 4890031351138056789 | 1 | 1 | {"content":0, "file_path":"s3://.../iceberg_table/data/id=1/00000-16-79ef2fd7-9997-47eb-a91a-9f7af8201315-0-00001.parquet", "file_format":"PARQUET", "spec_id":0, "partition":{"id":1}, "record_count":1, "file_size_in_bytes":625, "column_sizes":{1:36, 2:41}, "value_counts":{1:1, 2:1}, "null_value_counts":{1:0, 2:0}, "nan_value_counts":{}, "lower_bounds":{1:" ", 2:"Alice"}, "upper_bounds":{1:" ", 2:"Alice"}, "key_metadata":null, "split_offsets":[4], "equality_ids":null, "sort_order_id":0, "first_row_id":null, "referenced_data_file":null, "content_offset":null, "content_size_in_bytes":null} | {"id":{"column_size":36, "value_count":1, "null_value_count":0, "nan_value_count":null, "lower_bound":1, "upper_bound":1}, "name":{"column_size":41, "value_count":1, "null_value_count":0, "nan_value_count":null, "lower_bound":"Alice", "upper_bound":"Alice"}} |
| 0 | 1851184769713369003 | 1 | 1 | {"content":0, "file_path":"s3://.../iceberg_table/data/id=2/00000-16-79ef2fd7-9997-47eb-a91a-9f7af8201315-0-00002.parquet", "file_format":"PARQUET", "spec_id":0, "partition":{"id":2}, "record_count":1, "file_size_in_bytes":611, "column_sizes":{1:36, 2:39}, "value_counts":{1:1, 2:1}, "null_value_counts":{1:0, 2:0}, "nan_value_counts":{}, "lower_bounds":{1:" ", 2:"Bob"}, "upper_bounds":{1:" ", 2:"Bob"}, "key_metadata":null, "split_offsets":[4], "equality_ids":null, "sort_order_id":0, "first_row_id":null, "referenced_data_file":null, "content_offset":null, "content_size_in_bytes":null} | {"id":{"column_size":36, "value_count":1, "null_value_count":0, "nan_value_count":null, "lower_bound":2, "upper_bound":2}, "name":{"column_size":39, "value_count":1, "null_value_count":0, "nan_value_count":null, "lower_bound":"Bob", "upper_bound":"Bob"}} |
+--------+---------------------+-----------------+----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
files

Shows the file list for the current snapshot of the table:

```sql
SELECT * FROM iceberg_table$files;
```

Result:
Result:
+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+---------+-----------+--------------+--------------------+--------------+--------------+-------------------+------------------+-----------------------+-----------------------+--------------+---------------+--------------+---------------+--------------+----------------------+----------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| content | file_path | file_format | spec_id | partition | record_count | file_size_in_bytes | column_sizes | value_counts | null_value_counts | nan_value_counts | lower_bounds | upper_bounds | key_metadata | split_offsets | equality_ids | sort_order_id | first_row_id | referenced_data_file | content_offset | content_size_in_bytes | readable_metrics |
+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+---------+-----------+--------------+--------------------+--------------+--------------+-------------------+------------------+-----------------------+-----------------------+--------------+---------------+--------------+---------------+--------------+----------------------+----------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 | s3://.../iceberg_table/data/id=2/00000-16-79ef2fd7-9997-47eb-a91a-9f7af8201315-0-00002.parquet | PARQUET | 0 | {"id":2} | 1 | 611 | {1:36, 2:39} | {1:1, 2:1} | {1:0, 2:0} | {} | {1:" ", 2:"Bob"} | {1:" ", 2:"Bob"} | NULL | [4] | NULL | 0 | NULL | NULL | NULL | NULL | {"id":{"column_size":36, "value_count":1, "null_value_count":0, "nan_value_count":null, "lower_bound":2, "upper_bound":2}, "name":{"column_size":39, "value_count":1, "null_value_count":0, "nan_value_count":null, "lower_bound":"Bob", "upper_bound":"Bob"}} |
| 0 | s3://.../iceberg_table/data/id=4/00000-16-79ef2fd7-9997-47eb-a91a-9f7af8201315-0-00004.parquet | PARQUET | 0 | {"id":4} | 1 | 618 | {1:36, 2:40} | {1:1, 2:1} | {1:0, 2:0} | {} | {1:" ", 2:"Dave"} | {1:" ", 2:"Dave"} | NULL | [4] | NULL | 0 | NULL | NULL | NULL | NULL | {"id":{"column_size":36, "value_count":1, "null_value_count":0, "nan_value_count":null, "lower_bound":4, "upper_bound":4}, "name":{"column_size":40, "value_count":1, "null_value_count":0, "nan_value_count":null, "lower_bound":"Dave", "upper_bound":"Dave"}} |
| 0 | s3://.../iceberg_table/data/id=6/00000-16-79ef2fd7-9997-47eb-a91a-9f7af8201315-0-00006.parquet | PARQUET | 0 | {"id":6} | 1 | 625 | {1:36, 2:41} | {1:1, 2:1} | {1:0, 2:0} | {} | {1:" ", 2:"Frank"} | {1:" ", 2:"Frank"} | NULL | [4] | NULL | 0 | NULL | NULL | NULL | NULL | {"id":{"column_size":36, "value_count":1, "null_value_count":0, "nan_value_count":null, "lower_bound":6, "upper_bound":6}, "name":{"column_size":41, "value_count":1, "null_value_count":0, "nan_value_count":null, "lower_bound":"Frank", "upper_bound":"Frank"}} |
| 0 | s3://.../iceberg_table/data/id=8/00000-16-79ef2fd7-9997-47eb-a91a-9f7af8201315-0-00008.parquet | PARQUET | 0 | {"id":8} | 1 | 625 | {1:36, 2:41} | {1:1, 2:1} | {1:0, 2:0} | {} | {1:" ", 2:"Heidi"} | {1:" ", 2:"Heidi"} | NULL | [4] | NULL | 0 | NULL | NULL | NULL | NULL | {"id":{"column_size":36, "value_count":1, "null_value_count":0, "nan_value_count":null, "lower_bound":8, "upper_bound":8}, "name":{"column_size":41, "value_count":1, "null_value_count":0, "nan_value_count":null, "lower_bound":"Heidi", "upper_bound":"Heidi"}} |
| 0 | s3://.../iceberg_table/data/id=10/00000-16-79ef2fd7-9997-47eb-a91a-9f7af8201315-0-00010.parquet | PARQUET | 0 | {"id":10} | 1 | 618 | {1:36, 2:40} | {1:1, 2:1} | {1:0, 2:0} | {} | {1:" ", 2:"Judy"} | {1:" ", 2:"Judy"} | NULL | [4] | NULL | 0 | NULL | NULL | NULL | NULL | {"id":{"column_size":36, "value_count":1, "null_value_count":0, "nan_value_count":null, "lower_bound":10, "upper_bound":10}, "name":{"column_size":40, "value_count":1, "null_value_count":0, "nan_value_count":null, "lower_bound":"Judy", "upper_bound":"Judy"}} |
+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+---------+-----------+--------------+--------------------+--------------+--------------+-------------------+------------------+-----------------------+-----------------------+--------------+---------------+--------------+---------------+--------------+----------------------+----------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Similar to the files system table, you can also query specific file type information through the following system tables:

```sql
-- Query data files for the current snapshot
SELECT * FROM iceberg_table$data_files;

-- Query delete files for the current snapshot
SELECT * FROM iceberg_table$delete_files;

-- Query all files (including data and delete files) from all snapshots
SELECT * FROM iceberg_table$all_files;

-- Query data files from all snapshots
SELECT * FROM iceberg_table$all_data_files;

-- Query delete files from all snapshots
SELECT * FROM iceberg_table$all_delete_files;
```
The result format of these tables is similar to the files system table, but each focuses specifically on data files or delete files. System tables with the all_ prefix contain files from all snapshots, not just the current snapshot.

Note: when a particular file type does not exist in the table (for example, querying delete_files when the table has no delete files), the query result may be empty.
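As a quick sanity check of the relationship between these tables, the counts can be compared; this sketch reuses the iceberg_table example from this section:

```sql
-- For the current snapshot, files = data_files + delete_files
SELECT count(*) FROM iceberg_table$files;
SELECT count(*) FROM iceberg_table$data_files;
SELECT count(*) FROM iceberg_table$delete_files;
```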
history

Shows the history of the table:

```sql
SELECT * FROM iceberg_table$history;
```

Result:
Result:
+----------------------------+---------------------+---------------------+---------------------+
| made_current_at | snapshot_id | parent_id | is_current_ancestor |
+----------------------------+---------------------+---------------------+---------------------+
| 2025-06-12 22:29:16.357000 | 1851184769713369003 | NULL | 1 |
| 2025-06-12 22:29:39.922000 | 4890031351138056789 | 1851184769713369003 | 1 |
+----------------------------+---------------------+---------------------+---------------------+
manifests

Shows manifest file info of the table:

```sql
SELECT * FROM iceberg_table$manifests;
```

Result:
Result:
+---------+------------------------------------------------------------------------------------------------------------------------------------------------+--------+-------------------+---------------------+------------------------+---------------------------+--------------------------+--------------------------+-----------------------------+----------------------------+--------------------------------------------------------------------------------+
| content | path | length | partition_spec_id | added_snapshot_id | added_data_files_count | existing_data_files_count | deleted_data_files_count | added_delete_files_count | existing_delete_files_count | deleted_delete_files_count | partition_summaries |
+---------+------------------------------------------------------------------------------------------------------------------------------------------------+--------+-------------------+---------------------+------------------------+---------------------------+--------------------------+--------------------------+-----------------------------+----------------------------+--------------------------------------------------------------------------------+
| 0 | s3://.../iceberg_table/metadata/3194eb8b-5ea4-4cbe-95ba-073229458e7b-m0.avro | 7138 | 0 | 4890031351138056789 | 0 | 5 | 5 | 0 | 0 | 0 | [{"contains_null":0, "contains_nan":0, "lower_bound":"1", "upper_bound":"10"}] |
+---------+------------------------------------------------------------------------------------------------------------------------------------------------+--------+-------------------+---------------------+------------------------+---------------------------+--------------------------+--------------------------+-----------------------------+----------------------------+--------------------------------------------------------------------------------+
all_manifests

This feature is supported starting from version 4.0.4.

Shows manifest file info for all valid snapshots of the table. all_manifests and manifests have the same structure; the difference is that all_manifests includes manifest files from all valid snapshots, while manifests only includes manifest files from the current snapshot.

```sql
SELECT * FROM iceberg_table$all_manifests;
```

The result format is the same as the manifests system table.
metadata_log_entries

Shows the metadata logs of the table:

```sql
SELECT * FROM iceberg_table$metadata_log_entries;
```

Result:
Result:
+----------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+------------------+------------------------+
| timestamp | file | latest_snapshot_id | latest_schema_id | latest_sequence_number |
+----------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+------------------+------------------------+
| 2025-06-12 22:29:06.948000 | s3://.../iceberg_table/metadata/00000-e373aa16-15f1-4e69-ae7d-5ed64199cf9a.metadata.json | NULL | NULL | NULL |
| 2025-06-12 22:29:16.357000 | s3://.../iceberg_table/metadata/00001-bbc8e244-e41c-4958-92f4-63b8c3ee1196.metadata.json | 1851184769713369003 | 0 | 1 |
| 2025-06-12 22:29:39.922000 | s3://.../iceberg_table/metadata/00002-7dc00d6a-6269-4200-9d28-5f8c1c6b9f99.metadata.json | 4890031351138056789 | 0 | 2 |
+----------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+------------------+------------------------+
partitions

Shows the partitions of the table:

```sql
SELECT * FROM iceberg_table$partitions;
```

Result:
Result:
+-----------+---------+--------------+------------+-------------------------------+------------------------------+----------------------------+------------------------------+----------------------------+----------------------------+--------------------------+
| partition | spec_id | record_count | file_count | total_data_file_size_in_bytes | position_delete_record_count | position_delete_file_count | equality_delete_record_count | equality_delete_file_count | last_updated_at | last_updated_snapshot_id |
+-----------+---------+--------------+------------+-------------------------------+------------------------------+----------------------------+------------------------------+----------------------------+----------------------------+--------------------------+
| {"id":8} | 0 | 1 | 1 | 625 | 0 | 0 | 0 | 0 | 2025-06-12 22:29:16.357000 | 1851184769713369003 |
| {"id":6} | 0 | 1 | 1 | 625 | 0 | 0 | 0 | 0 | 2025-06-12 22:29:16.357000 | 1851184769713369003 |
| {"id":10} | 0 | 1 | 1 | 618 | 0 | 0 | 0 | 0 | 2025-06-12 22:29:16.357000 | 1851184769713369003 |
| {"id":4} | 0 | 1 | 1 | 618 | 0 | 0 | 0 | 0 | 2025-06-12 22:29:16.357000 | 1851184769713369003 |
| {"id":2} | 0 | 1 | 1 | 611 | 0 | 0 | 0 | 0 | 2025-06-12 22:29:16.357000 | 1851184769713369003 |
+-----------+---------+--------------+------------+-------------------------------+------------------------------+----------------------------+------------------------------+----------------------------+----------------------------+--------------------------+
Note:

- For non-partitioned tables, the partitions table will not contain the partition and spec_id fields.
- The partitions table shows partitions that contain data files or delete files in the current snapshot. However, delete files are not applied, so in some cases a partition may still be displayed even if all of its data rows have been marked as deleted by delete files.
refs

Shows all known snapshot references (branches and tags) of the table:

```sql
SELECT * FROM iceberg_table$refs;
```

Result:
Result:
+------+--------+---------------------+-------------------------+-----------------------+------------------------+
| name | type | snapshot_id | max_reference_age_in_ms | min_snapshots_to_keep | max_snapshot_age_in_ms |
+------+--------+---------------------+-------------------------+-----------------------+------------------------+
| main | BRANCH | 4890031351138056789 | NULL | NULL | NULL |
+------+--------+---------------------+-------------------------+-----------------------+------------------------+
snapshots

Shows all snapshots of the table:

```sql
SELECT * FROM iceberg_table$snapshots;
```

Result:
Result:
+----------------------------+---------------------+---------------------+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| committed_at | snapshot_id | parent_id | operation | manifest_list | summary |
+----------------------------+---------------------+---------------------+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 2025-06-12 22:29:16.357000 | 1851184769713369003 | NULL | append | s3://.../iceberg_table/metadata/snap-1851184769713369003-1-82059f57-821a-4983-b083-002cc2cde313.avro | {"spark.app.id":"application_1738810850199_0472", "added-data-files":"10", "added-records":"10", "added-files-size":"6200", "changed-partition-count":"10", "total-records":"10", "total-files-size":"6200", "total-data-files":"10", "total-delete-files":"0", "total-position-deletes":"0", "total-equality-deletes":"0"} |
| 2025-06-12 22:29:39.922000 | 4890031351138056789 | 1851184769713369003 | overwrite | s3://.../iceberg_table/metadata/snap-4890031351138056789-1-3194eb8b-5ea4-4cbe-95ba-073229458e7b.avro | {"spark.app.id":"application_1738810850199_0472", "deleted-data-files":"5", "deleted-records":"5", "removed-files-size":"3103", "changed-partition-count":"5", "total-records":"5", "total-files-size":"3097", "total-data-files":"5", "total-delete-files":"0", "total-position-deletes":"0", "total-equality-deletes":"0"} |
+----------------------------+---------------------+---------------------+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Write Operations
INSERT INTO
The INSERT operation appends data to the target table.
For example:
```sql
INSERT INTO iceberg_tbl VALUES (val1, val2, val3, val4);

INSERT INTO iceberg.iceberg_db.iceberg_tbl SELECT col1, col2 FROM internal.db1.tbl1;

INSERT INTO iceberg_tbl(col1, col2) VALUES (val1, val2);

INSERT INTO iceberg_tbl(col1, col2, partition_col1, partition_col2) VALUES (val1, val2, 'beijing', '2023-12-12');
```
Since version 3.1.0, writing data to a specified branch is supported:

```sql
INSERT INTO iceberg_tbl@branch(b1) VALUES (val1, val2, val3, val4);

INSERT INTO iceberg_tbl@branch(b1) (col3, col4) VALUES (val3, val4);
```
INSERT OVERWRITE
The INSERT OVERWRITE operation completely replaces the existing data in the table with new data.
```sql
INSERT OVERWRITE TABLE iceberg_tbl VALUES (val1, val2, val3, val4);

INSERT OVERWRITE TABLE iceberg.iceberg_db.iceberg_tbl (col1, col2) SELECT col1, col2 FROM internal.db1.tbl1;
```
Since version 3.1.0, writing data to a specified branch is supported:
```sql
INSERT OVERWRITE TABLE iceberg_tbl@branch(b1) VALUES (val1, val2, val3, val4);

INSERT OVERWRITE TABLE iceberg_tbl@branch(b1) (col3, col4) VALUES (val3, val4);
```
Since version 4.1.0, writing to static partitions (or a hybrid of static and dynamic partitions) is supported:
```sql
-- Full static partition
INSERT OVERWRITE TABLE iceberg_tbl PARTITION (dt = '2025-01-25', region = 'bj')
SELECT id, name FROM source_table;

-- Hybrid partition mode: "dt" is static, "region" comes dynamically from the SELECT
INSERT OVERWRITE TABLE iceberg_tbl PARTITION (dt = '2025-01-25')
SELECT id, name, region FROM source_table;
```
CTAS
You can create an Iceberg table and write data using the CTAS (Create Table As Select) statement:
```sql
CREATE TABLE iceberg_ctas AS SELECT * FROM other_table;
```
CTAS supports specifying file formats, partitioning, and other properties:
```sql
CREATE TABLE iceberg_ctas
PARTITION BY LIST (pt1, pt2) ()
AS
SELECT col1, pt1, pt2 FROM part_ctas_src WHERE col1 > 0;

CREATE TABLE iceberg.iceberg_db.iceberg_ctas (col1, col2, pt1)
PARTITION BY LIST (pt1) ()
PROPERTIES (
    'write-format' = 'parquet',
    'compression-codec' = 'zstd'
)
AS
SELECT col1, pt1 AS col2, pt2 AS pt1 FROM test_ctas.part_ctas_src WHERE col1 > 0;
```
INSERT INTO BRANCH
Since 3.1.0
```sql
INSERT INTO iceberg_table@branch(b1) SELECT * FROM other_table;

INSERT OVERWRITE TABLE iceberg_table@branch(b1) SELECT * FROM other_table;
```
DELETE
tip
This is an experimental feature, supported since version 4.1.0.
The
DELETE
operation is used to delete rows of data in an Iceberg table that meet specified conditions. This operation is primarily based on the following underlying mechanisms: for V2 format tables, the system writes Position Delete files; for V3 format tables, the system writes Puffin-format Deletion Vectors files.
Prerequisites: the Iceberg table's format version (format-version) must be 2 or higher. V1 format tables do not support the DELETE operation.
Syntax:
```sql
DELETE FROM iceberg_tbl WHERE condition;
```
Examples:
```sql
-- Delete a single row
DELETE FROM iceberg_tbl WHERE id = 1;

-- Delete using multiple conditions
DELETE FROM iceberg_tbl WHERE age > 30 AND name = 'Bob';

-- Delete using a subquery
DELETE FROM iceberg_tbl WHERE id IN (SELECT id FROM other_table WHERE status = 'inactive');
```
The usage is identical for partitioned tables:
```sql
DELETE FROM iceberg_partition_tbl WHERE id < 10;
```
UPDATE
tip
This is an experimental feature, supported since version 4.1.0.
The UPDATE operation is used to modify rows of data in an Iceberg table that meet specified conditions. Internally, Doris first writes deletion tracking information (Position Delete files for V2, Puffin-format Deletion Vectors for V3) to mark the old rows as invalid, and then appends new data files containing the updated rows (on V3 format tables, the newly written rows' lineage field `_last_updated_sequence_number` advances automatically).
Prerequisites: the Iceberg table's format version (format-version) must be 2 or higher. V1 format tables do not support the UPDATE operation.
Syntax:
```sql
UPDATE iceberg_tbl SET column1 = value1, column2 = value2 WHERE condition;
```
Examples:
```sql
-- Update a single column
UPDATE iceberg_tbl SET name = 'Alice_new' WHERE id = 1;

-- Update multiple columns
UPDATE iceberg_tbl SET name = 'Updated', age = 28 WHERE id = 2;

-- Update using expressions
UPDATE iceberg_tbl SET age = age * 2 + 1, name = concat(name, '-modified') WHERE id = 3;

-- Update using a subquery as condition
UPDATE iceberg_tbl SET name = 'UpdatedViaSubquery' WHERE id IN (SELECT id FROM other_table);
```
The usage is identical for partitioned tables:
```sql
UPDATE iceberg_partition_tbl SET name = 'Updated' WHERE id < 10;
```
MERGE INTO
tip
This is an experimental feature, supported since version 4.1.0.
The
MERGE INTO
operation is used to match source data with a target Iceberg table and perform insert, update, or delete operations based on the matching results. This is suitable for scenarios such as data synchronization and incremental updates.
Prerequisites: the Iceberg table's format version (format-version) must be 2 or higher. V1 format tables do not support the MERGE INTO operation.
Syntax:
```sql
MERGE INTO target_table t
USING source s
ON merge_condition
WHEN MATCHED [AND condition] THEN DELETE
WHEN MATCHED [AND condition] THEN UPDATE SET column1 = value1 [, ...]
WHEN NOT MATCHED [AND condition] THEN INSERT (columns) VALUES (values);
```
Here, `source` can be a subquery or another table.
Examples:
Full match operation (includes update, delete, and insert):
```sql
MERGE INTO iceberg_tbl t
USING (
    SELECT 1 AS id, 'Alice_new' AS name, 26 AS age, 'U' AS flag
    UNION ALL
    SELECT 2, 'Bob', 30, 'D'
    UNION ALL
    SELECT 4, 'Dora', 28, 'I'
) s
ON t.id = s.id
WHEN MATCHED AND s.flag = 'D' THEN DELETE
WHEN MATCHED THEN UPDATE SET t.name = s.name, t.age = s.age
WHEN NOT MATCHED THEN INSERT (id, name, age) VALUES (s.id, s.name, s.age);
```
In the above example:
- The row with id = 2 is deleted because flag = 'D'.
- The row with id = 1 is updated with new name and age.
- The row with id = 4 is inserted because it does not exist in the target table.
Execute matching updates only:
```sql
MERGE INTO iceberg_tbl t
USING (
    SELECT 1 AS id, 'Alice_matched' AS name, 26 AS age
) s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.name = s.name, t.age = s.age;
```
Execute inserts only for unmatched rows:
```sql
MERGE INTO iceberg_tbl t
USING (
    SELECT 4 AS id, 'Dora' AS name, 40 AS age
) s
ON t.id = s.id
WHEN NOT MATCHED THEN INSERT (id, name, age) VALUES (s.id, s.name, s.age);
```
Use a subquery as the data source:
```sql
MERGE INTO iceberg_tbl t
USING (
    SELECT id FROM other_table WHERE id > 0
) s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.name = 'UpdatedViaSubquery';
```
MERGE INTO operation for partitioned tables:
```sql
MERGE INTO iceberg_partition_tbl t
USING (
    SELECT 1 AS id, 'Alice_new' AS name, 26 AS age, DATE '2024-01-01' AS dt, 'U' AS flag
    UNION ALL
    SELECT 2, 'Bob', 30, DATE '2024-01-02', 'D'
    UNION ALL
    SELECT 4, 'Dora', 28, DATE '2024-01-04', 'I'
) s
ON t.id = s.id
WHEN MATCHED AND s.flag = 'D' THEN DELETE
WHEN MATCHED THEN UPDATE SET t.name = s.name, t.age = s.age
WHEN NOT MATCHED THEN INSERT (id, name, age, dt) VALUES (s.id, s.name, s.age, s.dt);
```
Notes:
- The above three operations (DELETE, UPDATE, MERGE INTO) all require the Iceberg table's format version (format-version) to be 2 or higher. Attempting these operations on a V1 format table will result in the error: `must have format version 2 or higher for position deletes`. When executed on a V3 format table, the system not only uses Puffin-format Deletion Vectors in place of Position Delete files, but also automatically adjusts the row lineage attribute (`_last_updated_sequence_number`) of the written records when content updates are involved.
- These operations support both Parquet and ORC file formats.
- In concurrent writing scenarios, Iceberg provides an optimistic concurrency control mechanism. When write conflicts occur, the operation may throw a transaction conflict exception.
- The UPDATE operation, both standalone and within MERGE INTO, supports expressions and functions, such as `age = age * 2 + 1` or `name = concat(name, '-modified')`.
- These operations apply to both partitioned and non-partitioned tables.
Related Parameters
BE (Backend)
| Parameter Name | Default Value | Description |
| --- | --- | --- |
| iceberg_sink_max_file_size | 1GB | Maximum data file size. When the written data exceeds this size, the current file is closed and a new file is created to continue writing. |
| table_sink_partition_write_max_partition_nums_per_writer | 128 | Maximum number of partitions each instance can write to on a BE node. |
| table_sink_non_partition_write_scaling_data_processed_threshold | 25MB | Data threshold for starting scaling-write in non-partitioned tables. A new writer (instance) is used for every additional `table_sink_non_partition_write_scaling_data_processed_threshold` of data. This mechanism adjusts the number of writers based on data volume to enhance throughput while conserving resources and minimizing file numbers for smaller data volumes. |
| table_sink_partition_write_min_data_processed_rebalance_threshold | 25MB | Minimum data volume threshold to trigger rebalancing for partitioned tables. Rebalancing starts if `current accumulated data volume` - `data volume at the last rebalancing` >= `table_sink_partition_write_min_data_processed_rebalance_threshold`. Lowering this threshold can improve balance if file size differences are significant, but may increase rebalancing costs and impact performance. |
| table_sink_partition_write_min_partition_data_processed_rebalance_threshold | | Minimum partition data volume threshold to trigger rebalancing. Rebalancing starts if `current partition data volume` >= `threshold` * `number of tasks already allocated to the partition`. Lowering this threshold can improve balance if file size differences are significant, but may increase rebalancing costs and impact performance. |
Database and Table Management
Creating and Dropping Databases
You can switch to the desired catalog using the SWITCH statement and execute the CREATE DATABASE command:

```sql
SWITCH iceberg;
CREATE DATABASE IF NOT EXISTS iceberg_db;
```
You can also create a database using a fully qualified name or specify a location (currently, only HMS-type catalogs support specifying a location), such as:
```sql
CREATE DATABASE IF NOT EXISTS iceberg.iceberg_db;

CREATE DATABASE IF NOT EXISTS iceberg.iceberg_db
PROPERTIES ('location' = 'hdfs://172.21.16.47:4007/path/to/db/');
```
You can view the database's location information using the SHOW CREATE DATABASE command:

```sql
mysql> SHOW CREATE DATABASE iceberg_db;
+-------------+---------------------------------------------------------------------------------------------------+
| Database    | Create Database                                                                                   |
+-------------+---------------------------------------------------------------------------------------------------+
| iceberg_db  | CREATE DATABASE iceberg_db LOCATION 'hdfs://172.21.16.47:4007/usr/hive/warehouse/iceberg_db.db'   |
+-------------+---------------------------------------------------------------------------------------------------+
```
To drop a database:

```sql
DROP DATABASE IF EXISTS iceberg.iceberg_db;
```
caution
For an Iceberg Database, you must first drop all tables under the database before you can drop the database itself; otherwise, an error will occur.
Creating and Dropping Tables
Creating Tables
Doris supports creating both partitioned and non-partitioned tables in Iceberg.
For example:
```sql
-- Create unpartitioned iceberg table
CREATE TABLE unpartitioned_table (
    col1 BOOLEAN COMMENT 'col1',
    col2 INT COMMENT 'col2',
    col3 BIGINT COMMENT 'col3',
    col4 FLOAT COMMENT 'col4',
    col5 DOUBLE COMMENT 'col5',
    col6 DECIMAL(9,4) COMMENT 'col6',
    col7 STRING COMMENT 'col7',
    col8 DATE COMMENT 'col8',
    col9 DATETIME COMMENT 'col9'
)
PROPERTIES (
    'write-format' = 'parquet'
);
```
```sql
-- Create partitioned iceberg table
-- The partition columns must be in table's column definition list
CREATE TABLE partition_table (
    ts DATETIME COMMENT 'ts',
    col1 BOOLEAN COMMENT 'col1',
    col2 INT COMMENT 'col2',
    col3 BIGINT COMMENT 'col3',
    col4 FLOAT COMMENT 'col4',
    col5 DOUBLE COMMENT 'col5',
    col6 DECIMAL(9,4) COMMENT 'col6',
    col7 STRING COMMENT 'col7',
    col8 DATE COMMENT 'col8',
    col9 DATETIME COMMENT 'col9',
    pt1 STRING COMMENT 'pt1',
    pt2 STRING COMMENT 'pt2'
)
PARTITION BY LIST (day(ts), pt1, pt2) ()
PROPERTIES (
    'write-format' = 'orc',
    'compression-codec' = 'zlib'
);
```
Starting from version 4.1.0, Doris supports specifying sort columns when creating an Iceberg table. When writing data, the data will be sorted according to the specified sort columns to achieve better query performance.
```sql
CREATE TABLE ordered_table (
    id int NULL,
    name text NULL,
    score double NULL,
    create_time datetimev2 NULL
)
ORDER BY (id ASC NULLS FIRST, score DESC NULLS LAST)
PROPERTIES (
    "write-format" = "parquet",
    "write.parquet.compression-codec" = "zstd"
);
```
If no sort columns are specified, no sorting will be performed during writes.
The default sort order is ASC NULLS FIRST.
After creation, you can use the
SHOW CREATE TABLE
command to view the Iceberg table creation statement. For details about partition functions, see the
Partitioning
section.
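For instance, assuming a table named `iceberg_tbl` already exists in the current database, viewing its creation statement is a single command:

```sql
SHOW CREATE TABLE iceberg_tbl;
```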
Dropping Tables
You can drop an Iceberg table using the
DROP TABLE
statement. Dropping a table will also remove its data, including partition data.
For example:
```sql
DROP TABLE IF EXISTS iceberg_tbl;
```
Column Type Mapping
Refer to the
Column Type Mapping
section.
Partitioning
Partition types in Iceberg correspond to List partitions in Doris. Therefore, when creating an Iceberg partitioned table in Doris, you should use the List partitioning syntax, but you don't need to explicitly enumerate each partition. Doris will automatically create the corresponding Iceberg partitions based on the data values during data insertion.
Supports creating single-column or multi-column partitioned tables.
Supports partition transformation functions to enable Iceberg implicit partitioning and partition evolution. For the specific Iceberg partition transformation functions, see Iceberg partition transforms:
- year(ts) or years(ts)
- month(ts) or months(ts)
- day(ts) or days(ts) or date(ts)
- hour(ts) or hours(ts) or date_hour(ts)
- bucket(N, col)
- truncate(L, col)
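As a sketch of how these transforms combine (the `events` table and its columns are hypothetical, not from the examples above), a single partition spec can mix a time transform with a bucket transform:

```sql
-- hypothetical table: Iceberg partitions derived from day(ts) and bucket(16, user_id)
CREATE TABLE events (
    ts DATETIME COMMENT 'event time',
    user_id BIGINT COMMENT 'user id',
    payload STRING COMMENT 'payload'
)
PARTITION BY LIST (day(ts), bucket(16, user_id)) ()
PROPERTIES ('write-format' = 'parquet');
```

Doris derives the corresponding Iceberg partition spec from the transforms; partitions are created implicitly as data is inserted.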
File Formats
Parquet (default)
Note that for Iceberg tables created by Doris, the Datetime type corresponds to Iceberg's timestamp_ntz type. In versions after 3.1.0, Datetime values written to Parquet files use the physical type INT64 instead of INT96. For Iceberg tables created by other systems, both the timestamp and timestamp_ntz types map to the Doris Datetime type; when writing, Doris determines whether time-zone handling is needed based on the actual Iceberg type.
ORC
Compression Formats
Parquet: snappy, zstd (default), plain (no compression).
ORC: snappy, zlib (default), zstd, plain (no compression).
Storage Medium
HDFS
Object storage
Schema Change
Starting from version 3.1.0, Doris supports schema changes for Iceberg tables via the ALTER TABLE statement.
Supported schema change operations include:
Rename Column
Use the
RENAME COLUMN
clause to rename columns. Renaming columns within nested types is not supported.
```sql
ALTER TABLE iceberg_table RENAME COLUMN old_col_name TO new_col_name;
```
Add a Column
Use
ADD COLUMN
to add a new column. Adding new columns to nested types is not supported.
When adding a new column, you can specify nullable attributes, default values, comments, and column position.
```sql
ALTER TABLE iceberg_table ADD COLUMN col_name col_type [NULL | NOT NULL] [DEFAULT default_value] [COMMENT 'comment'] [FIRST | AFTER col_name];
```
Example:

```sql
ALTER TABLE iceberg_table ADD COLUMN new_col STRING NOT NULL DEFAULT 'default_value' COMMENT 'This is a new col' AFTER old_col;
```
Add Columns
You can also use
ADD COLUMN
to add multiple columns. The new columns will be added to the end of the table. Column positioning is not supported for multiple columns. Adding new columns to nested types is not supported.
The syntax for each column is the same as adding a single column.
```sql
ALTER TABLE iceberg_table ADD COLUMN (
    col_name1 col_type1 [NULL | NOT NULL] [DEFAULT default_value] [COMMENT 'comment'],
    col_name2 col_type2 [NULL | NOT NULL] [DEFAULT default_value] [COMMENT 'comment']
);
```
Drop Column
Use
DROP COLUMN
to drop columns. Dropping columns within nested types is not supported.
```sql
ALTER TABLE iceberg_table DROP COLUMN col_name;
```
Modify Column
Use the
MODIFY COLUMN
statement to modify column attributes, including type, nullable, default value, comment, and column position.
Since version 4.0.4, Doris supports modifying complex types (STRUCT, ARRAY, MAP), including safe type promotions and appending struct fields.
Safe type promotions supported in nested types:
INT -> BIGINT, LARGEINT
TINYINT -> SMALLINT, INT, BIGINT, LARGEINT
SMALLINT -> INT, BIGINT, LARGEINT
BIGINT -> LARGEINT
FLOAT -> DOUBLE
VARCHAR(n) -> VARCHAR(m) where m > n
Constraints for complex types:
All new nested fields must be nullable.
Cannot change optional to required.
Default values for complex types only support NULL.
Note: When modifying column attributes, all attributes that are not being modified should also be explicitly specified with their original values.
```sql
ALTER TABLE iceberg_table MODIFY COLUMN col_name col_type [NULL | NOT NULL] [DEFAULT default_value] [COMMENT 'comment'] [FIRST | AFTER col_name];
```
Example:
```sql
CREATE TABLE iceberg_table (
    id INT,
    name STRING
);

-- Modify the id column type to BIGINT, set as NOT NULL, default value to 0, and add comment
ALTER TABLE iceberg_table MODIFY COLUMN id BIGINT NOT NULL DEFAULT 0 COMMENT 'This is a modified id column';
```
Example of modifying complex types:
```sql
-- Create Iceberg table with complex types
CREATE TABLE iceberg_tbl (
    id BIGINT,
    user_info STRUCT<name:STRING, scores:ARRAY<INT>, age:INT>,
    dt STRING
);

-- Append a new field (email) to the STRUCT column
ALTER TABLE iceberg_tbl MODIFY COLUMN user_info STRUCT<name:STRING, scores:ARRAY<INT>, age:INT, email:STRING>;

-- Promote the nested ARRAY element type from INT to BIGINT
ALTER TABLE iceberg_tbl MODIFY COLUMN user_info STRUCT<name:STRING, scores:ARRAY<BIGINT>, age:INT, email:STRING>;
```
Reorder Columns
Use
ORDER BY
to reorder columns by specifying the new column order.
```sql
ALTER TABLE iceberg_table ORDER BY (col_name1, col_name2, ...);
```
Partition Evolution
Starting from version 4.0.2, Doris supports Partition Evolution for Iceberg tables through
ALTER
statements.
Supported partition transforms include:
| Transform | Syntax | Example |
| --- | --- | --- |
| bucket | bucket(N, column) | bucket(16, id) |
| truncate | truncate(N, column) | truncate(10, name) |
| year | year(column) | year(ts) |
| month | month(column) | month(ts) |
| day | day(column) | day(ts) |
| hour | hour(column) | hour(ts) |
| identity | column | category |
Supported operations include:
Add partition key
```sql
-- use optional AS keyword to specify a custom name for the partition field
ALTER TABLE table_name ADD PARTITION KEY partition_transform [AS key_name];

-- examples
ALTER TABLE prod.db.sample ADD PARTITION KEY bucket(16, id);
ALTER TABLE prod.db.sample ADD PARTITION KEY truncate(4, data);
ALTER TABLE prod.db.sample ADD PARTITION KEY year(ts);
-- use optional AS keyword to specify a custom name for the partition field
ALTER TABLE prod.db.sample ADD PARTITION KEY bucket(16, id) AS shard;
```
Drop partition key
```sql
ALTER TABLE table_name DROP PARTITION KEY partition_transform | key_name;

-- examples
ALTER TABLE prod.db.sample DROP PARTITION KEY category;
ALTER TABLE prod.db.sample DROP PARTITION KEY bucket(16, id);
ALTER TABLE prod.db.sample DROP PARTITION KEY truncate(4, data);
ALTER TABLE prod.db.sample DROP PARTITION KEY year(ts);
ALTER TABLE prod.db.sample DROP PARTITION KEY shard;
```
Replace partition key
```sql
-- use optional AS keyword to specify a custom name for the new partition field
ALTER TABLE table_name REPLACE PARTITION KEY key_name WITH partition_transform [AS new_key_name];

-- examples
ALTER TABLE prod.db.sample REPLACE PARTITION KEY ts_day WITH day(ts);
-- use optional AS keyword to specify a custom name for the new partition field
ALTER TABLE prod.db.sample REPLACE PARTITION KEY ts_day WITH day(ts) AS day_of_ts;
```
Managing Branch & Tag
Since 3.1.0
Create Branch
Syntax:
```sql
ALTER TABLE [catalog.][database.]table_name
CREATE [OR REPLACE] BRANCH [IF NOT EXISTS] branch_name
[AS OF VERSION snapshot_id]
[RETAIN num { DAYS | HOURS | MINUTES }]
[WITH SNAPSHOT RETENTION { snapshotKeep | timeKeep }]

snapshotKeep:
    num SNAPSHOTS [num { DAYS | HOURS | MINUTES }]

timeKeep:
    num { DAYS | HOURS | MINUTES }
```
Examples:
```sql
-- Create branch "b1".
ALTER TABLE tbl CREATE BRANCH b1;
ALTER TABLE tb1 CREATE BRANCH IF NOT EXISTS b1;

-- Create or replace branch "b1".
ALTER TABLE tb1 CREATE OR REPLACE BRANCH b1;

-- Create or replace branch "b1" based on snapshot "123456".
ALTER TABLE tb1 CREATE OR REPLACE BRANCH b1 AS OF VERSION 123456;

-- Create or replace branch "b1" based on snapshot "123456", branch retained for 1 day.
ALTER TABLE tb1 CREATE OR REPLACE BRANCH b1 AS OF VERSION 123456 RETAIN 1 DAYS;

-- Create branch "b1" based on snapshot "123456", branch retained for 30 days. Keep the latest 3 snapshots in the branch.
ALTER TABLE tb1 CREATE BRANCH b1 AS OF VERSION 123456 RETAIN 30 DAYS WITH SNAPSHOT RETENTION 3 SNAPSHOTS;

-- Create branch "b1" based on snapshot "123456", branch retained for 30 days. Snapshots in the branch are retained for at most 2 days.
ALTER TABLE tb1 CREATE BRANCH b1 AS OF VERSION 123456 RETAIN 30 DAYS WITH SNAPSHOT RETENTION 2 DAYS;

-- Create branch "b1" based on snapshot "123456", branch retained for 30 days. Keep the latest 3 snapshots, and snapshots in the branch are retained for at most 2 days.
ALTER TABLE tb1 CREATE BRANCH b1 AS OF VERSION 123456 RETAIN 30 DAYS WITH SNAPSHOT RETENTION 3 SNAPSHOTS 2 DAYS;
```
Drop Branch
Syntax:
```sql
ALTER TABLE [catalog.][database.]table_name
DROP BRANCH [IF EXISTS] branch_name;
```
Example:
```sql
ALTER TABLE tbl DROP BRANCH b1;
```
Create Tag
Syntax:
```sql
ALTER TABLE [catalog.][database.]table_name
CREATE [OR REPLACE] TAG [IF NOT EXISTS] tag_name
[AS OF VERSION snapshot_id]
[RETAIN num { DAYS | HOURS | MINUTES }];
```
Examples:
```sql
-- Create tag "t1".
ALTER TABLE tbl CREATE TAG t1;
ALTER TABLE tb1 CREATE TAG IF NOT EXISTS t1;

-- Create or replace tag "t1".
ALTER TABLE tb1 CREATE OR REPLACE TAG t1;

-- Create or replace tag "t1" based on snapshot "123456".
ALTER TABLE tb1 CREATE OR REPLACE TAG t1 AS OF VERSION 123456;

-- Create or replace tag "t1" based on snapshot "123456", tag retained for 1 day.
ALTER TABLE tb1 CREATE OR REPLACE TAG t1 AS OF VERSION 123456 RETAIN 1 DAYS;
```
Drop Tag
Syntax:
```sql
ALTER TABLE [catalog.][database.]table_name
DROP TAG [IF EXISTS] tag_name;
```
Example:
```sql
ALTER TABLE tbl DROP TAG t1;
```
Iceberg Table Actions
This feature is supported starting from version 4.0.2. See specific version requirements for each operation.
This is an experimental feature.
Doris supports executing specific Iceberg table actions through the
ALTER TABLE EXECUTE
syntax. This syntax provides a unified interface for executing various operations such as table optimization and snapshot management.
```sql
ALTER TABLE [catalog.][database.]table_name
EXECUTE action_name("key1" = "value1", "key2" = "value2", ...)
[WHERE condition];
```
- `action_name`: Name of the operation to execute.
- `("key1" = "value1", ...)`: Operation-related parameters provided as key-value pairs. Parameter names must be enclosed in double quotes.
- `[WHERE condition]`: Optional WHERE condition to specify the partition or data range for the operation (only supported by some operations).
cherrypick_snapshot
The
cherrypick_snapshot
operation merges changes from a specified snapshot into the current table state, creating a new snapshot without modifying or deleting the original snapshot.
Supported version: 4.0.2+
Syntax:
```sql
ALTER TABLE [catalog.][database.]table_name
EXECUTE cherrypick_snapshot("snapshot_id" = "<snapshot_id>");
```
Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| snapshot_id | Long | Yes | The snapshot ID to be merged |
Return Value:
Executing the cherrypick_snapshot operation returns a result set with the following 2 columns:

| Column Name | Type | Description |
| --- | --- | --- |
| source_snapshot_id | BIGINT | The merged snapshot ID |
| current_snapshot_id | BIGINT | The snapshot ID newly created and set as current after the merge operation |
Example:

```sql
-- Merge changes from snapshot 123456789 into the current table state
ALTER TABLE iceberg_db.iceberg_table
EXECUTE cherrypick_snapshot("snapshot_id" = "123456789");
```
Notes:
- This operation does not support WHERE conditions.
- The operation will fail if the specified snapshot does not exist.
- The merge operation creates a new snapshot and does not delete the original snapshot.
expire_snapshots
The
expire_snapshots
operation removes old snapshots from Iceberg tables to free up storage space and improve metadata performance. This operation follows the Apache Iceberg Spark procedure specification.
Supported version: 4.1.0+
Syntax:
```sql
ALTER TABLE [catalog.][database.]table_name
EXECUTE expire_snapshots("key1" = "value1", "key2" = "value2", ...);
```
Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| older_than | String | No | Timestamp threshold for snapshot expiration. Snapshots older than this will be removed. Supports ISO datetime format (e.g., 2024-01-01T00:00:00) or a milliseconds timestamp |
| retain_last | Integer | No | Number of ancestor snapshots to preserve. When specified alone, automatically sets older_than to the current time |
| snapshot_ids | String | No | Comma-separated list of specific snapshot IDs to expire |
| max_concurrent_deletes | Integer | No | Size of the thread pool for delete operations |
| clean_expired_metadata | Boolean | No | When set to true, cleans up unused partition specs and schemas |
Return Value:
Executing the expire_snapshots operation returns a result set with the following 6 columns:

| Column Name | Type | Description |
| --- | --- | --- |
| deleted_data_files_count | BIGINT | Number of deleted data files |
| deleted_position_delete_files_count | BIGINT | Number of deleted position delete files |
| deleted_equality_delete_files_count | BIGINT | Number of deleted equality delete files |
| deleted_manifest_files_count | BIGINT | Number of deleted manifest files |
| deleted_manifest_lists_count | BIGINT | Number of deleted manifest list files |
| deleted_statistics_files_count | BIGINT | Number of deleted statistics files |
Example:

```sql
-- Expire snapshots, keeping only the last 2
ALTER TABLE iceberg_db.iceberg_table
EXECUTE expire_snapshots("retain_last" = "2");

-- Expire snapshots older than a specific timestamp
ALTER TABLE iceberg_db.iceberg_table
EXECUTE expire_snapshots("older_than" = "2024-01-01T00:00:00");

-- Expire specific snapshots by ID
ALTER TABLE iceberg_db.iceberg_table
EXECUTE expire_snapshots("snapshot_ids" = "123456789,987654321");

-- Combine parameters: expire snapshots older than 2024-06-01 but keep at least the last 5
ALTER TABLE iceberg_db.iceberg_table
EXECUTE expire_snapshots("older_than" = "2024-06-01T00:00:00", "retain_last" = "5");
```
Notes:
- This operation does not support WHERE conditions.
- If both older_than and retain_last are specified, both conditions apply: only snapshots older than older_than AND not within the most recent retain_last snapshots will be deleted.
- snapshot_ids can be used alone to delete specific snapshots.
- This operation permanently deletes snapshots and their associated data files. Use with caution.
- It is recommended to query the $snapshots system table before execution to understand the table's snapshot information.
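For example, using the `$snapshots` metadata-table suffix shown earlier in this document (catalog, database, and table names are illustrative), a quick inspection before expiring anything might look like:

```sql
-- list snapshots with commit time and operation before deciding what to expire
SELECT snapshot_id, committed_at, operation
FROM iceberg_db.iceberg_table$snapshots
ORDER BY committed_at;
```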
fast_forward
The
fast_forward
operation quickly advances the current snapshot of one branch to the latest snapshot of another branch.
Supported version: 4.0.2+
Syntax:
```sql
ALTER TABLE [catalog.][database.]table_name
EXECUTE fast_forward("branch" = "<branch_name>", "to" = "<target_branch>");
```
Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| branch | String | Yes | Name of the branch to advance |
| to | String | Yes | Target branch name; branch will be advanced to the latest snapshot of this branch |
Return Value:
Executing the fast_forward operation returns a result set with the following 3 columns:

| Column Name | Type | Description |
| --- | --- | --- |
| branch_updated | STRING | The name of the branch that was advanced |
| previous_ref | BIGINT | The snapshot ID the branch pointed to before advancing (can be NULL) |
| updated_ref | BIGINT | The snapshot ID the branch points to after advancing |
Example:

```sql
-- Advance the feature branch to the latest snapshot of the main branch
ALTER TABLE iceberg_db.iceberg_table
EXECUTE fast_forward("branch" = "feature", "to" = "main");
```
Notes:
- This operation does not support WHERE conditions.
- The operation will fail if the specified branch does not exist.
- Only branches can be advanced, not tags.
publish_changes
The
publish_changes
operation is the "Publish" step in the Write-Audit-Publish (WAP) pattern. This process finds snapshots with a specific
wap.id
attribute and adds them to the current table state. This allows users to atomically make "staged" data visible after validation.
Supported version: 4.1.0+
Syntax:
```sql
ALTER TABLE [catalog.][database.]table_name
EXECUTE publish_changes("wap_id" = "<wap_id>");
```
Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| wap_id | String | Yes | The wap.id of the snapshot to publish |
Return Value:
Executing the publish_changes operation returns a result set with the following 2 columns:

| Column Name | Type | Description |
| --- | --- | --- |
| previous_snapshot_id | BIGINT | The snapshot ID before publishing |
| current_snapshot_id | BIGINT | The new snapshot ID after publishing |
Example:

```sql
ALTER TABLE iceberg_db.iceberg_table
EXECUTE publish_changes("wap_id" = "branch-123");
```
Notes:
- This operation does not support WHERE conditions.
- The operation will fail if the specified wap.id does not exist.
rewrite_data_files
The
rewrite_data_files
operation rewrites data files in an Iceberg table to achieve small file compaction, optimizing query performance and storage efficiency. This is a synchronous operation.
Supported version: 4.0.2+
Syntax:
```sql
ALTER TABLE [catalog.][database.]table_name
EXECUTE rewrite_data_files("key1" = "value1", "key2" = "value2", ...)
[WHERE condition];
```
Parameters:

| Parameter | Type | Default Value | Description |
| --- | --- | --- | --- |
| target-file-size-bytes | Long | 536870912 (512MB) | Target file size in bytes. The target size for output files |
| min-file-size-bytes | Long | target-file-size-bytes * 0.75 | Minimum file size in bytes. Files smaller than this will be rewritten. If not specified, defaults to 75% of the target file size |
| max-file-size-bytes | Long | target-file-size-bytes * 1.8 | Maximum file size in bytes. Files larger than this will be split. If not specified, defaults to 180% of the target file size |
| min-input-files | Integer | | Minimum number of input files to trigger rewriting. Rewriting will only execute when the number of files to rewrite reaches this threshold |
| rewrite-all | Boolean | false | Whether to rewrite all files regardless of file size |
| max-file-group-size-bytes | Long | 107374182400 (100GB) | Maximum size of a single file group in bytes. Used to limit the amount of data processed by a single rewrite task |
| delete-file-threshold | Integer | Integer.MAX_VALUE | Minimum number of delete files required to trigger rewriting |
| delete-ratio-threshold | Double | 0.3 | Minimum ratio of deleted records required to trigger rewriting (deleted records / total records). Range: 0.0 - 1.0 |
| output-spec-id | Long | | Partition spec ID for output files |
Return Value:
Executing the rewrite_data_files operation returns a result set with the following 4 columns:

| Column Name | Type | Description |
| --- | --- | --- |
| rewritten_data_files_count | INT | Number of data files rewritten |
| added_data_files_count | INT | Number of new data files added |
| rewritten_bytes_count | INT | Number of bytes rewritten |
| removed_delete_files_count | BIGINT | Number of delete files removed |
Examples:

```sql
-- Rewrite data files with default parameters
ALTER TABLE iceberg_db.iceberg_table EXECUTE rewrite_data_files();

-- Specify target file size and minimum input files
ALTER TABLE iceberg_db.iceberg_table
EXECUTE rewrite_data_files("target-file-size-bytes" = "134217728", "min-input-files" = "10");

-- Rewrite data in specific partitions using a WHERE condition
ALTER TABLE iceberg_db.iceberg_table
EXECUTE rewrite_data_files("target-file-size-bytes" = "268435456")
WHERE date_col = '2024-01-01';

-- Rewrite all files
ALTER TABLE iceberg_db.iceberg_table
EXECUTE rewrite_data_files("rewrite-all" = "true", "target-file-size-bytes" = "536870912");
```
Notes:
- The rewrite_data_files operation reads and rewrites data files, which incurs additional I/O and computing overhead. Please allocate cluster resources appropriately.
- Before execution, you can use SQL from the View Data File Distribution section to evaluate whether rewriting is necessary.
- WHERE conditions can be used to limit the partitions or data range for rewriting. The condition filters out files that don't contain data matching it, reducing the number of files and amount of data to rewrite.
- Before execution, you can use SQL from the Rewrite File Selection Logic section to calculate which files will be rewritten.
rewrite_manifests
The
rewrite_manifests
operation allows users to optimize Iceberg table metadata by rewriting manifest files, improving query performance and reducing metadata overhead.
This addresses manifest file optimization issues in large Iceberg tables, as a large number of small manifest files can impact query planning performance.
Supported version: 4.1.0+
Syntax:
```sql
ALTER TABLE [catalog.][database.]table_name
EXECUTE rewrite_manifests();
```
Parameters:
No parameters
Return Value:
Executing the rewrite_manifests operation returns a result set with the following 2 columns:

| Column Name | Type | Description |
| --- | --- | --- |
| rewritten_manifests_count | BIGINT | Number of manifest files rewritten |
| total_data_manifests_count | BIGINT | Total number of data manifest files |
Example:

```sql
ALTER TABLE iceberg_db.iceberg_table EXECUTE rewrite_manifests();
```
rollback_to_snapshot
The
rollback_to_snapshot
operation rolls back an Iceberg table to a specified snapshot.
Supported version: 4.0.2+
Syntax:
```sql
ALTER TABLE [catalog.][database.]table_name
EXECUTE rollback_to_snapshot("snapshot_id" = "<snapshot_id>");
```
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| snapshot_id | Long | Yes | The snapshot ID to roll back to |
Return Value:
Executing the rollback_to_snapshot operation returns a result set with the following 2 columns:

| Column Name | Type | Description |
|---|---|---|
| previous_snapshot_id | BIGINT | The ID of the current snapshot before rollback |
| current_snapshot_id | BIGINT | The snapshot ID set as current after rollback |
Example:
```sql
-- Roll back to snapshot 987654321
ALTER TABLE iceberg_db.iceberg_table
EXECUTE rollback_to_snapshot("snapshot_id" = "987654321");
```
Notes:
- This operation does not support WHERE conditions.
- The operation fails if the specified snapshot does not exist.
- If the current snapshot is already the target snapshot, the operation returns directly without creating a new snapshot.
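Before rolling back, you can look up candidate snapshot IDs in the snapshots system table (a sketch; iceberg_table is a placeholder for your actual table name):

```sql
-- List snapshots newest-first to pick a rollback target
SELECT snapshot_id, committed_at, operation
FROM iceberg_table$snapshots
ORDER BY committed_at DESC;
```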
rollback_to_timestamp
The
rollback_to_timestamp
operation rolls back an Iceberg table to the snapshot at a specified point in time.
Supported version: 4.0.2+
Syntax:
```sql
ALTER TABLE [catalog.][database.]table_name
EXECUTE rollback_to_timestamp("timestamp" = "<timestamp>");
```
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| timestamp | String | Yes | The point in time to roll back to. Supports two formats: 1. ISO datetime format yyyy-MM-dd HH:mm:ss.SSS (e.g., 2024-01-01 10:30:00.000); 2. Millisecond timestamp (e.g., 1704067200000) |
Return Value:
Executing the rollback_to_timestamp operation returns a result set with the following 2 columns:

| Column Name | Type | Description |
|---|---|---|
| previous_snapshot_id | BIGINT | The ID of the current snapshot before rollback |
| current_snapshot_id | BIGINT | The snapshot ID at the specified point in time, now set as current |
Examples:
```sql
-- Roll back to a specified point in time (ISO format)
ALTER TABLE iceberg_db.iceberg_table
EXECUTE rollback_to_timestamp("timestamp" = "2024-01-01 10:30:00.000");

-- Roll back to a specified point in time (millisecond timestamp)
ALTER TABLE iceberg_db.iceberg_table
EXECUTE rollback_to_timestamp("timestamp" = "1704067200000");
```
Notes:
- This operation does not support WHERE conditions.
- The timestamp parameter accepts the ISO datetime format (yyyy-MM-dd HH:mm:ss.SSS) or a millisecond timestamp.
- The operation fails if there is no corresponding snapshot at the specified point in time.
set_current_snapshot
The
set_current_snapshot
operation sets the current snapshot of an Iceberg table to a specified snapshot ID or reference (branch or tag).
Supported version: 4.0.2+
Syntax:
```sql
ALTER TABLE [catalog.][database.]table_name
EXECUTE set_current_snapshot("snapshot_id" = "<snapshot_id>");

-- or
ALTER TABLE [catalog.][database.]table_name
EXECUTE set_current_snapshot("ref" = "<ref>");
```
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| snapshot_id | Long | Either one | The snapshot ID to set as current |
| ref | String | Either one | The reference name (branch or tag) to set as current |
Return Value:
Executing the set_current_snapshot operation returns a result set with the following 2 columns:

| Column Name | Type | Description |
|---|---|---|
| previous_snapshot_id | BIGINT | The ID of the current snapshot before the operation |
| current_snapshot_id | BIGINT | The snapshot ID after the operation (from the snapshot_id parameter, or resolved from the ref parameter) |
Examples:
```sql
-- Set the current snapshot by snapshot ID
ALTER TABLE iceberg_db.iceberg_table
EXECUTE set_current_snapshot("snapshot_id" = "123456789");

-- Set the current snapshot by branch name
ALTER TABLE iceberg_db.iceberg_table
EXECUTE set_current_snapshot("ref" = "feature_branch");

-- Set the current snapshot by tag name
ALTER TABLE iceberg_db.iceberg_table
EXECUTE set_current_snapshot("ref" = "v1.0");
```
Notes:
- This operation does not support WHERE conditions.
- The snapshot_id and ref parameters are mutually exclusive; only one can be specified.
- The operation fails if the specified snapshot ID or reference does not exist.
- If the current snapshot is already the target snapshot, the operation returns directly without creating a new snapshot.
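To see which branches and tags are available before choosing a ref, you can query the refs system table described earlier (a sketch; iceberg_table is a placeholder for your actual table name):

```sql
-- Each row is a named reference (branch or tag) and the snapshot it points to
SELECT name, type, snapshot_id
FROM iceberg_table$refs;
```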
publish_changes
The
publish_changes
operation is used in the WAP (Write-Audit-Publish) mode to publish a snapshot with the specified
wap.id
as the current table state.
It locates the snapshot whose
wap.id
matches the given
wap_id
and cherry-picks it onto the current table, making the staged data visible to all read operations.
Syntax:
```sql
ALTER TABLE [catalog.][database.]table_name
EXECUTE publish_changes("wap_id" = "<wap_id>");
```
Parameters:

| Parameter Name | Type | Required | Description |
|---|---|---|---|
| wap_id | STRING | Yes | The WAP ID of the snapshot to be published |
Return Value:
Executing publish_changes returns a result set with the following 2 columns:

| Column Name | Type | Description |
|---|---|---|
| previous_snapshot_id | STRING | The ID of the current snapshot before the publish operation (NULL if none) |
| current_snapshot_id | STRING | The ID of the new snapshot created and set as current after publishing |
Examples:
```sql
-- Publish the snapshot whose WAP ID is test_wap_001
ALTER TABLE iceberg_db.iceberg_table
EXECUTE publish_changes("wap_id" = "test_wap_001");
```
Notes:
- This operation does not support a WHERE clause, nor PARTITION/PARTITIONS clauses.
- It is only meaningful for Iceberg tables with write.wap.enabled = true and WAP snapshots staged via wap.id.
- If no snapshot is found for the specified wap_id, the operation fails and throws an error.
- After publishing, the new snapshot becomes the current snapshot.
- If there was no snapshot before publishing, previous_snapshot_id may be NULL.
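To locate a staged WAP snapshot before publishing, one approach is to inspect snapshot summaries in the snapshots system table. Note this is an assumption to verify: whether the summary map (which carries wap.id in Iceberg snapshot metadata) is exposed as a column depends on your Doris version.

```sql
-- Hypothetical: look for the staged snapshot's wap.id in the snapshot summary
SELECT snapshot_id, committed_at, summary
FROM iceberg_table$snapshots
ORDER BY committed_at DESC;
```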
Iceberg Table Optimization
View Data File Distribution
You can use the following SQL to analyze the data distribution and delete file count of Iceberg tables, helping determine whether optimization like
rewrite_data_files
is needed.
Display data file size distribution. This can help identify if there are too many small files:
```sql
SELECT
    CASE
        WHEN file_size_in_bytes BETWEEN 0 AND 8 * 1024 * 1024 THEN '0-8M'
        WHEN file_size_in_bytes BETWEEN 8 * 1024 * 1024 AND 32 * 1024 * 1024 THEN '8-32M'
        WHEN file_size_in_bytes BETWEEN 32 * 1024 * 1024 AND 128 * 1024 * 1024 THEN '32-128M'
        WHEN file_size_in_bytes BETWEEN 128 * 1024 * 1024 AND 512 * 1024 * 1024 THEN '128-512M'
        WHEN file_size_in_bytes > 512 * 1024 * 1024 THEN '> 512M'
        ELSE 'Unknown'
    END AS SizeRange,
    COUNT(*) AS FileNum
FROM store_sales$data_files
GROUP BY SizeRange;
```

```
+-----------+---------+
| SizeRange | FileNum |
+-----------+---------+
| 0-8M      | 32      |
+-----------+---------+
```
Display the number of data files and delete files:
```sql
SELECT
    CASE
        WHEN content = 0 THEN 'DataFile'
        WHEN content = 1 THEN 'PositionDeleteFile'
        WHEN content = 2 THEN 'EqualityDeleteFile'
        ELSE 'Unknown'
    END AS ContentType,
    COUNT(*) AS FileNum,
    SUM(file_size_in_bytes) AS SizeInBytes,
    SUM(record_count) AS Records
FROM iceberg_table$files
GROUP BY ContentType;
```

```
+--------------------+---------+-------------+---------+
| ContentType        | FileNum | SizeInBytes | Records |
+--------------------+---------+-------------+---------+
| EqualityDeleteFile | 2787    | 1432518     | 27870   |
| DataFile           | 2787    | 4062416     | 38760   |
| PositionDeleteFile | 11      | 36608       | 10890   |
+--------------------+---------+-------------+---------+
```
Dangling Delete
In some cases, after executing the
rewrite_data_files
action, references to certain Position Deletes may not have been removed from the Snapshot metadata (Dangling Delete). If you directly use the row number information from the metadata in this situation, the result may be incorrect.
Therefore, by default, if a COUNT(*) query finds Position Delete files, COUNT pushdown optimization is disabled and the data files are read directly to obtain the actual COUNT(*) result, which is slower.
If you can guarantee there is no dangling delete issue, this check can be skipped with the Doris session variable ignore_iceberg_dangling_delete. It defaults to false; when set to true, the system returns the COUNT(*) result directly from the row-count information in the metadata, improving query efficiency.
This feature is supported from versions 3.1.4 and 4.0.3.
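Putting the variable to use might look like the following sketch; only set it when you are certain the table has no dangling deletes, since a wrong assumption can return an incorrect count (iceberg_db.iceberg_table is a placeholder):

```sql
SET ignore_iceberg_dangling_delete = true;
-- COUNT(*) is now answered from the row counts recorded in snapshot metadata
SELECT COUNT(*) FROM iceberg_db.iceberg_table;
```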
Appendix
rewrite_data_files File Selection Strategy
You can check this section to understand the logic for selecting files to be rewritten. You can also use the following SQL queries to analyze which files will be rewritten based on system table information. This helps you debug and understand rewrite behavior before executing actual rewrite operations.
The rewrite operation uses a two-stage filtering and grouping strategy:
Stage 1: File-level filtering
First, files are filtered based on file-level conditions. Only files that meet at least one of the following conditions will be considered for rewriting:
- File size out of range: the file is smaller than min-file-size-bytes or larger than max-file-size-bytes
- Too many delete files: the number of associated delete files >= delete-file-threshold
- High delete ratio: the delete record ratio >= delete-ratio-threshold
Note about delete file conditions:
Conditions related to delete files (
delete-file-threshold
and
delete-ratio-threshold
) affect file rewrite selection, but cannot be accurately calculated through system table SQL queries. This is because current system table information is insufficient to accurately determine which data files are affected by delete files. Delete files may only affect specific data files (especially Position Delete files), and the association between delete files and data files requires more detailed metadata that is not available in current system tables. Actual rewrite operations will evaluate these conditions based on more detailed metadata not exposed in system tables.
Stage 2: BinPack grouping and group-level filtering
After file-level filtering, files are grouped by partition, then within each partition, files are grouped using the
BinPack algorithm
based on file size and
max-file-group-size-bytes
. The BinPack algorithm intelligently packs files into groups to optimize rewrite operations.
After grouping, file groups are filtered based on group-level conditions. File groups will be rewritten when they meet at least one of the following conditions:
- Sufficient input files: the group contains more than one file and the file count >= min-input-files
- Sufficient content: the group contains more than one file and its total size > target-file-size-bytes
- Too much content: the group's total size > max-file-group-size-bytes
- Contains files with delete file issues: at least one file in the group has too many delete files or a high delete ratio
Note about BinPack grouping:
The BinPack algorithm used to group files within each partition is a complex optimization algorithm that cannot be accurately simulated through system table SQL queries. This algorithm considers file size and
max-file-group-size-bytes
to intelligently pack files into groups, which may result in different grouping results than simple partition-based grouping. Therefore, the SQL queries provided below can only identify files that meet file-level conditions, but cannot accurately calculate the final file groups that will be rewritten.
Here are some example query strategies:
Query 1: Identify files meeting file-level conditions
This query identifies files that meet file-level conditions (file size) and will be considered for rewriting:
```sql
-- Set parameters (adjust according to your rewrite parameters)
SET @min_file_size_bytes = 16 * 1024 * 1024;   -- 16 MB (default: target-file-size-bytes * 0.75)
SET @max_file_size_bytes = 768 * 1024 * 1024;  -- 768 MB (default: target-file-size-bytes * 1.8)

SELECT
    file_path,
    `partition`,
    file_size_in_bytes / 1024.0 / 1024.0 AS file_size_mb,
    record_count,
    CASE
        WHEN file_size_in_bytes < @min_file_size_bytes THEN 'Too small'
        WHEN file_size_in_bytes > @max_file_size_bytes THEN 'Too large'
    END AS size_issue
FROM iceberg_table$data_files
WHERE file_size_in_bytes < @min_file_size_bytes
   OR file_size_in_bytes > @max_file_size_bytes
ORDER BY `partition`, file_size_in_bytes DESC;
```
Sample output:
+----------------------------------------------------------------------------------------+---------------------------+-------------+--------------+------------+
| file_path | partition | file_size_mb | record_count | size_issue |
+----------------------------------------------------------------------------------------+---------------------------+-------------+--------------+------------+
| s3a://bucket/path/to/table/data/date_col=2024-01-01/00000-0-00000-00000-0.parquet | {"date_col":"2024-01-01"} | 0.00215 | 5 | Too small |
| s3a://bucket/path/to/table/data/date_col=2024-01-01/00001-0-00001-00001-0.parquet | {"date_col":"2024-01-01"} | 0.00198 | 5 | Too small |
| s3a://bucket/path/to/table/data/date_col=2024-01-02/00002-0-00002-00002-0.parquet | {"date_col":"2024-01-02"} | 0.00231 | 5 | Too small |
| s3a://bucket/path/to/table/data/date_col=2024-01-02/00003-0-00003-00003-0.parquet | {"date_col":"2024-01-02"} | 850.23456 | 1250000 | Too large |
| s3a://bucket/path/to/table/data/date_col=2024-01-03/00004-0-00004-00004-0.parquet | {"date_col":"2024-01-03"} | 0.00245 | 5 | Too small |
+----------------------------------------------------------------------------------------+---------------------------+-------------+--------------+------------+
5 rows in set
Note:
This query can only identify files that meet file-level conditions (file size). The actual file groups to be rewritten depend on the BinPack grouping algorithm, which cannot be accurately simulated through SQL queries. Files identified by this query may be grouped differently by the BinPack algorithm, and some groups may be filtered out based on group-level conditions.
Query 2: Overall statistics summary
This query provides overall statistics about files that meet file-level conditions:
```sql
-- Set parameters (adjust according to your rewrite parameters)
SET @min_file_size_bytes = 16 * 1024 * 1024;
SET @max_file_size_bytes = 768 * 1024 * 1024;

WITH file_analysis AS (
    SELECT
        `partition`,
        file_path,
        file_size_in_bytes,
        record_count,
        (file_size_in_bytes < @min_file_size_bytes
         OR file_size_in_bytes > @max_file_size_bytes) AS meets_file_level_conditions
    FROM iceberg_table$data_files
)
SELECT 'Total files' AS metric,
       COUNT(*) AS value
FROM file_analysis
UNION ALL
SELECT 'Files meeting file-level conditions',
       SUM(CASE WHEN meets_file_level_conditions THEN 1 ELSE 0 END)
FROM file_analysis
UNION ALL
SELECT 'Total size (GB)',
       ROUND(SUM(file_size_in_bytes) / 1024.0 / 1024.0 / 1024.0, 2)
FROM file_analysis
UNION ALL
SELECT 'Size meeting file-level conditions (GB)',
       ROUND(SUM(CASE WHEN meets_file_level_conditions THEN file_size_in_bytes ELSE 0 END) / 1024.0 / 1024.0 / 1024.0, 2)
FROM file_analysis
UNION ALL
SELECT 'Percentage meeting file-level conditions (%)',
       ROUND(SUM(CASE WHEN meets_file_level_conditions THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2)
FROM file_analysis;
```
Sample output:
+----------------------------------------------+--------+
| metric | value |
+----------------------------------------------+--------+
| Total files | 15.00 |
| Total size (GB) | 2.45 |
| Files meeting file-level conditions | 12.00 |
| Size meeting file-level conditions (GB) | 1.85 |
| Percentage meeting file-level conditions (%) | 80.00 |
+----------------------------------------------+--------+
5 rows in set
Notes:
- Replace iceberg_table in the examples with your actual table name.
- Adjust the parameter values (@min_file_size_bytes, @max_file_size_bytes) according to your actual rewrite parameters.
- The default values shown above correspond to the default rewrite parameters (min-file-size-bytes = target-file-size-bytes * 0.75, max-file-size-bytes = target-file-size-bytes * 1.8).
View snapshot and branch relationships:
```sql
SELECT
    refs_data.snapshot_id,
    snapshots.committed_at,
    snapshots.operation,
    ARRAY_SORT(refs_data.refs)
FROM (
    SELECT snapshot_id, ARRAY_AGG(CONCAT(type, ':', name)) AS refs
    FROM iceberg_table$refs
    GROUP BY snapshot_id
) AS refs_data
JOIN (
    SELECT snapshot_id, committed_at, operation
    FROM iceberg_table$snapshots
) AS snapshots
ON refs_data.snapshot_id = snapshots.snapshot_id
ORDER BY snapshots.committed_at;
```
```
+---------------------+----------------------------+-----------+-------------------------------------+
| snapshot_id         | committed_at               | operation | ARRAY_SORT(refs_data.refs)          |
+---------------------+----------------------------+-----------+-------------------------------------+
| 8272911997874079853 | 2025-07-10 15:27:07.177000 | append    | ["BRANCH:b1", "TAG:t1"]             |
| 1325777059626757917 | 2025-07-10 15:27:07.530000 | append    | ["BRANCH:b2", "TAG:t2"]             |
| 76492482642020578   | 2025-07-10 15:27:07.865000 | append    | ["BRANCH:b3", "TAG:t3"]             |
| 1788715857849070138 | 2025-07-12 04:15:19.626000 | append    | ["BRANCH:main", "TAG:t4", "TAG:t5"] |
+---------------------+----------------------------+-----------+-------------------------------------+
```