Posted to commits@iceberg.apache.org by bl...@apache.org on 2021/01/29 01:19:28 UTC

[iceberg] branch master updated: Docs: Add to release notes for 0.11.0, misc fixes (#2178)

This is an automated email from the ASF dual-hosted git repository.

blue pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/iceberg.git


The following commit(s) were added to refs/heads/master by this push:
     new b4f73d2  Docs: Add to release notes for 0.11.0, misc fixes (#2178)
b4f73d2 is described below

commit b4f73d2ba89d0c3d861120b157f949896053f848
Author: Jack Ye <yz...@amazon.com>
AuthorDate: Thu Jan 28 17:19:17 2021 -0800

    Docs: Add to release notes for 0.11.0, misc fixes (#2178)
---
 site/docs/aws.md      |  2 ++
 site/docs/flink.md    | 14 +++++++++++++-
 site/docs/releases.md | 14 +++++++++-----
 site/docs/spark.md    |  4 +---
 4 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/site/docs/aws.md b/site/docs/aws.md
index 96828f0..24aa2c2 100644
--- a/site/docs/aws.md
+++ b/site/docs/aws.md
@@ -51,6 +51,8 @@ spark-sql --packages $DEPENDENCIES \
 
 As you can see, in the shell command we use `--packages` to specify the additional AWS bundle and HTTP client dependencies, with their version set to `2.15.40`.
 
+For integration with other engines such as Flink, see each engine's documentation page for instructions on loading a custom catalog.
+
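+For example, here is a minimal Flink SQL sketch of loading the Glue catalog this way. The catalog name, bucket, and warehouse path are placeholders; `GlueCatalog` and `S3FileIO` are the implementations shipped in the `iceberg-aws` module.
+
+```sql
+-- Sketch only: catalog name, bucket, and warehouse path are placeholders.
+CREATE CATALOG my_catalog WITH (
+  'type'='iceberg',
+  'catalog-impl'='org.apache.iceberg.aws.glue.GlueCatalog',
+  'io-impl'='org.apache.iceberg.aws.s3.S3FileIO',
+  'warehouse'='s3://my-bucket/my/warehouse/path'
+);
+```
+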
 ## Glue Catalog
 
 Iceberg enables the use of [AWS Glue](https://aws.amazon.com/glue) as the `Catalog` implementation.
diff --git a/site/docs/flink.md b/site/docs/flink.md
index 30279b7..17b50a1 100644
--- a/site/docs/flink.md
+++ b/site/docs/flink.md
@@ -144,6 +144,18 @@ CREATE CATALOG my_catalog WITH (
 );
 ```
 
+### Create through YAML config
+
+Catalogs can be registered in `sql-client-defaults.yaml` before starting the SQL client. Here is an example:
+
+```yaml
+catalogs: 
+  - name: my_catalog
+    type: iceberg
+    catalog-type: hadoop
+    warehouse: hdfs://nn:8020/warehouse/path
+```
+
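+Once the SQL client is started with this configuration, the registered catalog can be used like any other catalog. A small usage sketch, reusing the `my_catalog` name from the example above:
+
+```sql
+-- Switch to the catalog registered in sql-client-defaults.yaml
+USE CATALOG my_catalog;
+SHOW DATABASES;
+```
+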
 ## DDL commands
 
 ### `CREATE DATABASE`
@@ -420,4 +432,4 @@ There are some features that we do not yet support in the current flink iceberg
 * Don't support creating iceberg table with hidden partitioning. [Discussion](http://mail-archives.apache.org/mod_mbox/flink-dev/202008.mbox/%3cCABi+2jQCo3MsOa4+ywaxV5J-Z8TGKNZDX-pQLYB-dG+dVUMiMw@mail.gmail.com%3e) in flink mail list.
 * Don't support creating iceberg table with computed column.
 * Don't support creating iceberg table with watermark.
-* Don't support adding columns, removing columns, renaming columns, changing columns. [FLINK-19062](https://issues.apache.org/jira/browse/FLINK-19062) is tracking this.
\ No newline at end of file
+* Don't support adding columns, removing columns, renaming columns, changing columns. [FLINK-19062](https://issues.apache.org/jira/browse/FLINK-19062) is tracking this.
diff --git a/site/docs/releases.md b/site/docs/releases.md
index a073d07..d52a6be 100644
--- a/site/docs/releases.md
+++ b/site/docs/releases.md
@@ -74,11 +74,13 @@ High-level features:
 
 Important bug fixes:
 
-* [\#2091](https://github.com/apache/iceberg/pull/2091) fixes Parquet vectorized reads when column types are promoted
-* [\#1991](https://github.com/apache/iceberg/pull/1991) fixes Avro schema conversions to preserve field docs
-* [\#1981](https://github.com/apache/iceberg/pull/1981) fixes bug that date and timestamp transforms were producing incorrect values for negative dates and times
-* [\#1798](https://github.com/apache/iceberg/pull/1798) fixes read failure when encountering duplicate entries of data files
-* [\#1785](https://github.com/apache/iceberg/pull/1785) fixes invalidation of metadata tables in CachingCatalog
+* [\#1981](https://github.com/apache/iceberg/pull/1981) fixes a bug where date and timestamp transforms produced incorrect values for dates and times before 1970. Before the fix, negative values were transformed to 1 larger than the correct value; for example, `day(1969-12-31 10:00:00)` produced 0 instead of -1. The fix is backwards compatible, which means predicate projection can still work with the incorrectly transformed partitions writt [...]
+* [\#2091](https://github.com/apache/iceberg/pull/2091) fixes a `ClassCastException` for type promotion from `int` to `long` and from `float` to `double` during Parquet vectorized reads. The Arrow vector is now created from the Parquet file schema instead of the Iceberg schema for `int` and `float` fields.
+* [\#1998](https://github.com/apache/iceberg/pull/1998) fixes a bug in `HiveTableOperation` where `unlock` was not called if the new metadata could not be deleted. `unlock` is now guaranteed to always be called for Hive catalog users.
+* [\#1979](https://github.com/apache/iceberg/pull/1979) fixes a table listing failure in the Hadoop catalog when the user does not have permission to access some tables. Such tables are now skipped during listing.
+* [\#1798](https://github.com/apache/iceberg/pull/1798) fixes a scan task failure when duplicate data file entries are encountered. Spark and Flink readers now ignore duplicate data file entries for each scan task.
+* [\#1785](https://github.com/apache/iceberg/pull/1785) fixes invalidation of metadata tables in `CachingCatalog`. When a table is dropped, all the metadata tables associated with it are also invalidated in the cache.
+* [\#1960](https://github.com/apache/iceberg/pull/1960) fixes a bug where the ORC writer did not read the metrics config and always used the default. Customized metrics configs are now respected.
 
 Other notable changes:
 
@@ -87,8 +89,10 @@ Other notable changes:
 * Spark and Flink now support dynamically loading customized `Catalog` and `FileIO` implementations
 * Spark 2 now supports loading tables from other catalogs, like Spark 3
 * Spark 3 now supports catalog names in DataFrameReader when using Iceberg as a format
+* Flink now uses the number of Iceberg read splits as its job parallelism to improve performance and save resources.
 * Hive (experimental) now supports INSERT INTO, case insensitive query, projection pushdown, create DDL with schema and auto type conversion
 * ORC now supports reading tinyint, smallint, char, varchar types
+* Avro to Iceberg schema conversion now preserves field docs
 
 ## Past releases
 
diff --git a/site/docs/spark.md b/site/docs/spark.md
index 1c4c9d2..3615a76 100644
--- a/site/docs/spark.md
+++ b/site/docs/spark.md
@@ -26,7 +26,7 @@ Iceberg uses Apache Spark's DataSourceV2 API for data source and catalog impleme
 | [`CREATE TABLE`](#create-table)                | ✔️        |            |                                                |
 | [`CREATE TABLE AS`](#create-table-as-select)   | ✔️        |            |                                                |
 | [`REPLACE TABLE AS`](#replace-table-as-select) | ✔️        |            |                                                |
-| [`ALTER TABLE`](#alter-table)                  | ✔️        |            | ⚠ requires extensions enabled to update partition field and sort order |
+| [`ALTER TABLE`](#alter-table)                  | ✔️        |            | ⚠ Requires [SQL extensions](./spark-configuration.md#sql-extensions) enabled to update partition field and sort order |
 | [`DROP TABLE`](#drop-table)                    | ✔️        |            |                                                |
 | [`SELECT`](#querying-with-sql)                 | ✔️        |            |                                                |
 | [`INSERT INTO`](#insert-into)                  | ✔️        |            |                                                |
@@ -40,8 +40,6 @@ Iceberg uses Apache Spark's DataSourceV2 API for data source and catalog impleme
 | [DataFrame CTAS and RTAS](#creating-tables)      | ✔️        |            |                                                |
 | [Metadata tables](#inspecting-tables)            | ✔️        | ✔️          |                                                |
 
-To enable Iceberg SQL extensions, set Spark configuration `spark.sql.extensions` as `org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions`. 
-
 ## Configuring catalogs
 
 Spark 3.0 adds an API to plug in table catalogs that are used to load, create, and manage Iceberg tables. Spark catalogs are configured by setting [Spark properties](./configuration.md#catalogs) under `spark.sql.catalog`.
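+As a usage sketch, once a catalog (here the placeholder name `my_catalog`) is configured through `spark.sql.catalog.my_catalog` properties, its tables can be referenced with catalog-qualified names in Spark SQL:
+
+```sql
+-- my_catalog, db, and sample are placeholder names for illustration
+SELECT * FROM my_catalog.db.sample;
+```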