You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@iceberg.apache.org by bl...@apache.org on 2020/06/17 17:56:14 UTC
[iceberg] branch master updated: Docs: Add HadoopCatalog example
(#1095)
This is an automated email from the ASF dual-hosted git repository.
blue pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/iceberg.git
The following commit(s) were added to refs/heads/master by this push:
new f3c54a8 Docs: Add HadoopCatalog example (#1095)
f3c54a8 is described below
commit f3c54a810ad06ab2ad7c42ca63eb5d8607690d0a
Author: hzfanxinxin <78...@qq.com>
AuthorDate: Thu Jun 18 01:56:08 2020 +0800
Docs: Add HadoopCatalog example (#1095)
Co-authored-by: 范欣欣 <hz...@corp.netease.com>
---
site/docs/api-quickstart.md | 32 +++++++++++++++++++++++++++++++-
site/docs/java-api-quickstart.md | 30 +++++++++++++++++++++++++++++-
2 files changed, 60 insertions(+), 2 deletions(-)
diff --git a/site/docs/api-quickstart.md b/site/docs/api-quickstart.md
index 00f7f35..1926f4a 100644
--- a/site/docs/api-quickstart.md
+++ b/site/docs/api-quickstart.md
@@ -48,9 +48,39 @@ logsDF.write
The logs [schema](#create-a-schema) and [partition spec](#create-a-partition-spec) are created below.
+### Using a Hadoop catalog
+
+A Hadoop catalog doesn't need to connect to a Hive MetaStore, but can only be used with HDFS or similar file systems that support atomic rename. Concurrent writes with a Hadoop catalog are not safe with a local FS or S3. To create a Hadoop catalog:
+
+```scala
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.hadoop.HadoopCatalog;
+
+val conf = new Configuration();
+val warehousePath = "hdfs://host:8020/warehouse_path";
+val catalog = new HadoopCatalog(conf, warehousePath);
+```
+
+Like the Hive catalog, `HadoopCatalog` implements `Catalog`, so it also has methods for working with tables, like `createTable`, `loadTable`, and `dropTable`.
+
+This example creates a table with the Hadoop catalog:
+
+```scala
+val name = TableIdentifier.of("logging", "logs")
+val table = catalog.createTable(name, schema, spec)
+
+// write into the new logs table with Spark 2.4
+logsDF.write
+ .format("iceberg")
+ .mode("append")
+ .save("hdfs://host:8020/warehouse_path/logging.db/logs")
+```
+
+The logs [schema](#create-a-schema) and [partition spec](#create-a-partition-spec) are created below.
+
### Using Hadoop tables
-Iceberg also supports tables that are stored in a directory in HDFS or the local file system. Directory tables don't support all catalog operations, like rename, so they use the `Tables` interface instead of `Catalog`.
+Iceberg also supports tables that are stored in a directory in HDFS. Concurrent writes with a Hadoop tables are not safe when stored in the local FS or S3. Directory tables don't support all catalog operations, like rename, so they use the `Tables` interface instead of `Catalog`.
To create a table in HDFS, use `HadoopTables`:
diff --git a/site/docs/java-api-quickstart.md b/site/docs/java-api-quickstart.md
index 8bcf080..826db6f 100644
--- a/site/docs/java-api-quickstart.md
+++ b/site/docs/java-api-quickstart.md
@@ -46,9 +46,37 @@ Table table = catalog.createTable(name, schema, spec);
The logs [schema](#create-a-schema) and [partition spec](#create-a-partition-spec) are created below.
+### Using a Hadoop catalog
+
+A Hadoop catalog doesn't need to connect to a Hive MetaStore, but can only be used with HDFS or similar file systems that support atomic rename. Concurrent writes with a Hadoop catalog are not safe with a local FS or S3. To create a Hadoop catalog:
+
+```java
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.hadoop.HadoopCatalog;
+
+Configuration conf = new Configuration();
+String warehousePath = "hdfs://host:8020/warehouse_path";
+HadoopCatalog catalog = new HadoopCatalog(conf, warehousePath);
+```
+
+Like the Hive catalog, `HadoopCatalog` implements `Catalog`, so it also has methods for working with tables, like `createTable`, `loadTable`, and `dropTable`.
+
+This example creates a table with Hadoop catalog:
+
+```java
+import org.apache.iceberg.Table;
+import org.apache.iceberg.catalog.TableIdentifier;
+
+TableIdentifier name = TableIdentifier.of("logging", "logs");
+Table table = catalog.createTable(name, schema, spec);
+```
+
+The logs [schema](#create-a-schema) and [partition spec](#create-a-partition-spec) are created below.
+
+
### Using Hadoop tables
-Iceberg also supports tables that are stored in a directory in HDFS or the local file system. Directory tables don't support all catalog operations, like rename, so they use the `Tables` interface instead of `Catalog`.
+Iceberg also supports tables that are stored in a directory in HDFS. Concurrent writes with a Hadoop tables are not safe when stored in the local FS or S3. Directory tables don't support all catalog operations, like rename, so they use the `Tables` interface instead of `Catalog`.
To create a table in HDFS, use `HadoopTables`: