You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@iceberg.apache.org by bl...@apache.org on 2019/06/27 18:32:53 UTC

[incubator-iceberg] branch master updated (b3d3ab9 -> a0738e5)

This is an automated email from the ASF dual-hosted git repository.

blue pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-iceberg.git.


    from b3d3ab9  Update ASF site with more docs.
     new 3a1aab0  Do not show permalink for h1.
     new a0738e5  Add API doc for Java.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 site/docs/api.md        | 188 ++++++++++++++++++++++++++++++++++++++++++++++++
 site/docs/css/extra.css |   5 --
 2 files changed, 188 insertions(+), 5 deletions(-)
 create mode 100644 site/docs/api.md

[incubator-iceberg] 02/02: Add API doc for Java.

Posted by bl...@apache.org.

This is an automated email from the ASF dual-hosted git repository.

blue pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-iceberg.git

commit a0738e5509fc2031a3a4bfb6ca344d08a1122c4e
Author: Ryan Blue <bl...@apache.org>
AuthorDate: Thu Jun 27 11:32:37 2019 -0700

    Add API doc for Java.
---
 site/docs/api.md | 188 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 188 insertions(+)

diff --git a/site/docs/api.md b/site/docs/api.md
new file mode 100644
index 0000000..78b5f05
--- /dev/null
+++ b/site/docs/api.md
@@ -0,0 +1,188 @@
+# Iceberg Java API
+
+## Tables
+
+The main purpose of the Iceberg API is to manage table metadata, like schema, partition spec, metadata, and data files that store table data.
+
+Table metadata and operations are accessed through the `Table` interface. This interface will return table information, 
+
+### Table metadata
+
+The [`Table` interface](/javadoc/master/index.html?org/apache/iceberg/Table.html) provides access table metadata:
+
+* `schema` returns the current table [schema](../schemas)
+* `spec` returns the current table partition spec
+* `properties` returns a map of key-value [properties](../configuration)
+* `currentSnapshot` returns the current table snapshot
+* `snapshots` returns all valid snapshots for the table
+* `snapshot(id)` returns a specific snapshot by ID
+* `location` returns the table's base location
+
+Tables also provide `refresh` to update the table to the latest version, and expose helpers:
+
+* `io` returns the `FileIO` used to read and write table files
+* `locationProvider` returns a `LocationProvider` used to create paths for data and metadata files
+
+
+### Scanning
+
+Iceberg table scans start by creating a `TableScan` object with `newScan`.
+
+```java
+TableScan scan = table.newScan();
+```
+
+To configure a scan, call `filter` and `select` on the `TableScan` to get a new `TableScan` with those changes.
+
+```java
+TableScan filteredScan = scan.filter(Expressions.equal("id", 5))
+```
+
+Calls to configuration methods create a new `TableScan` so that each `TableScan` is immutable and won't change unexpectedly if shared across threads.
+
+When a scan is configured, `planFiles`, `planTasks`, and `schema` are used to return files, tasks, and the read projection.
+
+```java
+TableScan scan = table.newScan()
+    .filter(Expressions.equal("id", 5))
+    .select("id", "data");
+
+Schema projection = scan.schema();
+Iterable<CombinedScanTask> tasks = scan.planTasks();
+```
+
+Use `asOfTime` or `useSnapshot` to configure the table snapshot for time travel queries.
+
+
+### Update operations
+
+`Table` also exposes operations that update the table. These operations use a builder pattern, [`PendingUpdate`](/javadoc/master/index.html?org/apache/iceberg/PendingUpdate.html), that commits when `PendingUpdate#commit` is called.
+
+For example, updating the table schema is done by calling `updateSchema`, adding updates to the builder, and finally calling `commit` to commit the pending changes to the table:
+
+```java
+table.updateSchema()
+    .addColumn("count", Types.LongType.get())
+    .commit();
+```
+
+Available operations to update a table are:
+
+* `updateSchema` -- update the table schema
+* `updateProperties` -- update table properties
+* `updateLocation` -- update the table's base location
+* `newAppend` -- used to append data files
+* `newFastAppend` -- used to append data files, will not compact metadata
+* `newOverwrite` -- used to append data files and remove files that are overwritten
+* `newDelete` -- used to delete data files
+* `newRewrite` -- used to rewrite data files; will replace existing files with new versions
+* `newTransaction` -- create a new table-level transaction
+* `rewriteManifests` -- rewrite manifest data by clustering files, for faster scan planning
+* `rollback` -- roll the table state back by pointing current to a specific snapshot
+
+### Transactions
+
+## Types
+
+Iceberg data types are located in the [`org.apache.iceberg.types` package](/javadoc/master/index.html?org/apache/iceberg/types/package-summary.html).
+
+### Primitives
+
+Primitive type instances are available from static methods in each type class. Types without parameters use `get`, and types like `decimal` use factory methods:
+
+```java
+Types.IntegerType.get()    // int
+Types.DoubleType.get()     // double
+Types.DecimalType.of(9, 2) // decimal(9, 2)
+```
+
+### Nested types
+
+Structs, maps, and lists are created using factory methods in type classes.
+
+Like struct fields, map keys or values and list elements are tracked as nested fields. Nested fields track [field IDs](../evolution#correctness) and nullability.
+
+Struct fields are created using `NestedField.optional` or `NestedField.required`. Map value and list element nullability is set in the map and list factory methods.
+
+```java
+// struct<1 id: int, 2 data: optional string>
+StructType struct = Struct.of(
+    Types.NestedField.required(1, "id", Types.IntegerType.get()),
+    Types.NestedField.optional(2, "data", Types.StringType.get())
+  )
+```
+```java
+// map<1 key: int, 2 value: optional string>
+MapType map = MapType.ofOptional(
+    1, Types.IntegerType.get(),
+    2, Types.StringType.get()
+  )
+```
+```java
+// array<1 element: int>
+ListType list = ListType.ofRequired(1, IntegerType.get());
+```
+
+
+## Expressions
+
+Iceberg's expressions are used to configure table scans. To create expressions, use the factory methods in [`Expressions`](/javadoc/master/index.html?org/apache/iceberg/expressions/Expressions.html).
+
+Supported predicate expressions are:
+
+* `isNull`
+* `notNull`
+* `equal`
+* `notEqual`
+* `lessThan`
+* `lessThanOrEqual`
+* `greaterThan`
+* `greaterThanOrEqual`
+
+Supported expression operations are:
+
+* `and`
+* `or`
+* `not`
+
+Constant expressions are:
+
+* `alwaysTrue`
+* `alwaysFalse`
+
+### Expression binding
+
+When created, expressions are unbound. Before an expression is used, it will be bound to a data type to find the field ID the expression name represents, and to convert predicate literals.
+
+For example, before using the expression `lessThan("x", 10)`, Iceberg needs to determine which column `"x"` refers to and convert `10` to that colunn's data type.
+
+If the expression could be bound to the type `struct<1 x: long, 2 y: long>` or to `struct<11 x: int, 12 y: int>`.
+
+### Expression example
+
+```java
+table.newScan()
+    .filter(Expressions.greaterThanOrEqual("x", 5))
+    .filter(Expressions.lessThan("x", 10))
+```
+
+
+## Modules
+
+Iceberg table support is organized in library modules:
+
+* `iceberg-common` contains utility classes used in other modules
+* `iceberg-api` contains the public Iceberg API, including expressions, types, tables, and operations
+* `iceberg-core` contains implementations of the Iceberg API and support for Avro data files, **this is what processing engines should depend on**
+* `iceberg-parquet` is an optional module for working with tables backed by Parquet files
+* `iceberg-orc` is an optional module for working with tables backed by ORC files (*experimental*)
+* `iceberg-hive` is an implementation of iceberg tables backed by hive metastore thrift client
+
+This project Iceberg also has modules for adding Iceberg support to processing engines:
+
+* `iceberg-spark` is an implementation of Spark's Datasource V2 API for Iceberg (use iceberg-runtime for a shaded version)
+* `iceberg-data` is a client library used to read Iceberg tables from JVM applications
+* `iceberg-pig` is an implementation of Pig's LoadFunc API for Iceberg
+* `iceberg-runtime` generates a shaded runtime jar for Spark to integrate with iceberg tables
+* `iceberg-presto-runtime` generates a shaded runtime jar that is used by presto to integrate with iceberg tables
+

[incubator-iceberg] 01/02: Do not show permalink for h1.

Posted by bl...@apache.org.

This is an automated email from the ASF dual-hosted git repository.

blue pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-iceberg.git

commit 3a1aab05ed004c0e03ab4a1fd22f4bd8937d4bcc
Author: Ryan Blue <bl...@apache.org>
AuthorDate: Thu Jun 27 10:09:46 2019 -0700

    Do not show permalink for h1.
---
 site/docs/css/extra.css | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/site/docs/css/extra.css b/site/docs/css/extra.css
index 2a398d1..ea1ac09 100644
--- a/site/docs/css/extra.css
+++ b/site/docs/css/extra.css
@@ -37,11 +37,6 @@
   opacity: 0;
 }
 
-h1:target .headerlink {
-  color: #008cba;
-  opacity: 1;
-}
-
 h2:target .headerlink {
   color: #008cba;
   opacity: 1;