Posted to commits@iceberg.apache.org by bl...@apache.org on 2020/06/17 17:56:14 UTC

[iceberg] branch master updated: Docs: Add HadoopCatalog example (#1095)

This is an automated email from the ASF dual-hosted git repository.

blue pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/iceberg.git


The following commit(s) were added to refs/heads/master by this push:
     new f3c54a8  Docs: Add HadoopCatalog example (#1095)
f3c54a8 is described below

commit f3c54a810ad06ab2ad7c42ca63eb5d8607690d0a
Author: hzfanxinxin <78...@qq.com>
AuthorDate: Thu Jun 18 01:56:08 2020 +0800

    Docs: Add HadoopCatalog example (#1095)
    
    Co-authored-by: 范欣欣 <hz...@corp.netease.com>
---
 site/docs/api-quickstart.md      | 32 +++++++++++++++++++++++++++++++-
 site/docs/java-api-quickstart.md | 30 +++++++++++++++++++++++++++++-
 2 files changed, 60 insertions(+), 2 deletions(-)

diff --git a/site/docs/api-quickstart.md b/site/docs/api-quickstart.md
index 00f7f35..1926f4a 100644
--- a/site/docs/api-quickstart.md
+++ b/site/docs/api-quickstart.md
@@ -48,9 +48,39 @@ logsDF.write
 
 The logs [schema](#create-a-schema) and [partition spec](#create-a-partition-spec) are created below.
 
+### Using a Hadoop catalog
+
+A Hadoop catalog doesn't need to connect to a Hive MetaStore, but it can only be used with HDFS or similar file systems that support atomic rename, which the catalog relies on to commit metadata. Concurrent writes with a Hadoop catalog are not safe on a local FS or S3. To create a Hadoop catalog:
+
+```scala
+import org.apache.hadoop.conf.Configuration
+import org.apache.iceberg.hadoop.HadoopCatalog
+
+val conf = new Configuration()
+val warehousePath = "hdfs://host:8020/warehouse_path"
+val catalog = new HadoopCatalog(conf, warehousePath)
+```
+
+Like the Hive catalog, `HadoopCatalog` implements `Catalog`, so it also has methods for working with tables, like `createTable`, `loadTable`, and `dropTable`.
+
+This example creates a table with the Hadoop catalog:
+
+```scala
+import org.apache.iceberg.catalog.TableIdentifier
+
+val name = TableIdentifier.of("logging", "logs")
+val table = catalog.createTable(name, schema, spec)
+
+// write into the new logs table with Spark 2.4
+logsDF.write
+    .format("iceberg")
+    .mode("append")
+    .save("hdfs://host:8020/warehouse_path/logging.db/logs")
+```
+
+The logs [schema](#create-a-schema) and [partition spec](#create-a-partition-spec) are created below.
+
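+`HadoopCatalog` can also load the table back or drop it by the same identifier. A minimal sketch, reusing `name` from the example above:
+
+```scala
+// load the table by its identifier to read or update it
+val loaded = catalog.loadTable(name)
+
+// drop the table; passing true also purges its data and metadata files
+catalog.dropTable(name, true)
+```
+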
 ### Using Hadoop tables
 
-Iceberg also supports tables that are stored in a directory in HDFS or the local file system. Directory tables don't support all catalog operations, like rename, so they use the `Tables` interface instead of `Catalog`.
+Iceberg also supports tables that are stored in a directory in HDFS. Concurrent writes with Hadoop tables are not safe when the table is stored on a local FS or S3. Directory tables don't support all catalog operations, like rename, so they use the `Tables` interface instead of `Catalog`.
 
 To create a table in HDFS, use `HadoopTables`:
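
+A minimal sketch, reusing the `schema` and `spec` defined below and an illustrative HDFS location:
+
+```scala
+import org.apache.hadoop.conf.Configuration
+import org.apache.iceberg.hadoop.HadoopTables
+
+val conf = new Configuration()
+val tables = new HadoopTables(conf)
+
+// directory tables are identified by a location rather than a catalog identifier
+val table = tables.create(schema, spec, "hdfs://host:8020/warehouse_path/logs")
+```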
 
diff --git a/site/docs/java-api-quickstart.md b/site/docs/java-api-quickstart.md
index 8bcf080..826db6f 100644
--- a/site/docs/java-api-quickstart.md
+++ b/site/docs/java-api-quickstart.md
@@ -46,9 +46,37 @@ Table table = catalog.createTable(name, schema, spec);
 The logs [schema](#create-a-schema) and [partition spec](#create-a-partition-spec) are created below.
 
 
+### Using a Hadoop catalog
+
+A Hadoop catalog doesn't need to connect to a Hive MetaStore, but it can only be used with HDFS or similar file systems that support atomic rename, which the catalog relies on to commit metadata. Concurrent writes with a Hadoop catalog are not safe on a local FS or S3. To create a Hadoop catalog:
+
+```java
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.hadoop.HadoopCatalog;
+
+Configuration conf = new Configuration();
+String warehousePath = "hdfs://host:8020/warehouse_path";
+HadoopCatalog catalog = new HadoopCatalog(conf, warehousePath);
+```
+
+Like the Hive catalog, `HadoopCatalog` implements `Catalog`, so it also has methods for working with tables, like `createTable`, `loadTable`, and `dropTable`.
+
+This example creates a table with the Hadoop catalog:
+
+```java
+import org.apache.iceberg.Table;
+import org.apache.iceberg.catalog.TableIdentifier;
+
+TableIdentifier name = TableIdentifier.of("logging", "logs");
+Table table = catalog.createTable(name, schema, spec);
+```
+
+The logs [schema](#create-a-schema) and [partition spec](#create-a-partition-spec) are created below.
+
+
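+`HadoopCatalog` can also load the table back or drop it by the same identifier. A minimal sketch, reusing `name` from the example above:
+
+```java
+// load the table by its identifier to read or update it
+Table loaded = catalog.loadTable(name);
+
+// drop the table; passing true also purges its data and metadata files
+catalog.dropTable(name, true);
+```
+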
 ### Using Hadoop tables
 
-Iceberg also supports tables that are stored in a directory in HDFS or the local file system. Directory tables don't support all catalog operations, like rename, so they use the `Tables` interface instead of `Catalog`.
+Iceberg also supports tables that are stored in a directory in HDFS. Concurrent writes with Hadoop tables are not safe when the table is stored on a local FS or S3. Directory tables don't support all catalog operations, like rename, so they use the `Tables` interface instead of `Catalog`.
 
 To create a table in HDFS, use `HadoopTables`:
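
+A minimal sketch, reusing the `schema` and `spec` defined below and an illustrative HDFS location:
+
+```java
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.hadoop.HadoopTables;
+
+Configuration conf = new Configuration();
+HadoopTables tables = new HadoopTables(conf);
+
+// directory tables are identified by a location rather than a catalog identifier
+Table table = tables.create(schema, spec, "hdfs://host:8020/warehouse_path/logs");
+```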