Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/08/07 18:11:00 UTC

[jira] [Commented] (IMPALA-7309) Prevent the addition of Avro partitions to non-Avro tables with incompatible schema

    [ https://issues.apache.org/jira/browse/IMPALA-7309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572075#comment-16572075 ] 

ASF subversion and git services commented on IMPALA-7309:
---------------------------------------------------------

Commit 4aec50484a51610efdea08db7af9e9737b3bc1c2 in impala's branch refs/heads/master from [~tlipcon]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=4aec504 ]

IMPALA-7308. Support Avro tables in LocalCatalog

This adds support for loading Avro-formatted tables in LocalCatalog. When
the table properties indicate that a table is Avro-formatted, the
semantics are identical to those of the existing catalog implementation:

- if an explicit Avro schema is specified, it overrides the schema
  provided by the HMS
- if no explicit Avro schema is specified, one is inferred, and then the
  inferred schema takes the place of the one provided by the HMS (thus
  promoting columns like TINYINT to INT)
- on COMPUTE STATS, if any discrepancy is discovered between the HMS
  schema and the inferred schema, an error is emitted.
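
The three rules above can be illustrated with a small Python sketch. This
is a hypothetical illustration, not Impala's actual code: the helper names
(resolve_avro_table_schema, check_schema_discrepancy, etc.), the subset of
HMS types covered, and the assumption that an explicit Avro schema only uses
plain primitive field types (no unions such as ["null", "int"]) are all
simplifications made for this example.

```python
from typing import Dict, List, Optional, Tuple

Column = Tuple[str, str]  # (column name, HMS type name)

# Hypothetical subset of HMS scalar types mapped to Avro primitive types.
HMS_TO_AVRO: Dict[str, str] = {
    "BOOLEAN": "boolean",
    "TINYINT": "int",   # Avro has no 8-bit integer: promoted to int
    "SMALLINT": "int",  # Avro has no 16-bit integer: promoted to int
    "INT": "int",
    "BIGINT": "long",
    "FLOAT": "float",
    "DOUBLE": "double",
    "STRING": "string",
}

# Reverse mapping used to express an Avro schema as table columns.
AVRO_TO_HMS: Dict[str, str] = {
    "boolean": "BOOLEAN",
    "int": "INT",
    "long": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "string": "STRING",
}


def columns_from_avro(avro_schema: dict) -> List[Column]:
    """Convert an Avro record schema (primitive field types only) to columns."""
    return [(f["name"], AVRO_TO_HMS[f["type"]]) for f in avro_schema["fields"]]


def infer_columns(hms_cols: List[Column]) -> List[Column]:
    """Infer Avro-compatible columns from the HMS columns, promoting types."""
    inferred = []
    for name, hms_type in hms_cols:
        avro_type = HMS_TO_AVRO.get(hms_type.upper())
        if avro_type is None:
            raise ValueError(f"{name}: {hms_type} is not handled in this sketch")
        inferred.append((name, AVRO_TO_HMS[avro_type]))
    return inferred


def resolve_avro_table_schema(hms_cols: List[Column],
                              explicit_schema: Optional[dict] = None) -> List[Column]:
    """Schema used for a table whose table-level format is Avro."""
    if explicit_schema is not None:
        return columns_from_avro(explicit_schema)  # explicit schema wins
    return infer_columns(hms_cols)  # inferred schema replaces the HMS one


def check_schema_discrepancy(hms_cols: List[Column], inferred: List[Column]) -> None:
    """Raise if the HMS and inferred schemas differ (e.g. during COMPUTE STATS)."""
    problems = [f"{h[0]}: HMS {h[1]} vs inferred {i[1]}"
                for h, i in zip(hms_cols, inferred) if h[1].upper() != i[1]]
    if len(hms_cols) != len(inferred):
        problems.append("column counts differ")
    if problems:
        raise ValueError("schema discrepancy: " + "; ".join(problems))
```

For HMS columns [("id", "INT"), ("flag", "TINYINT")] with no explicit
schema, the resolved schema is [("id", "INT"), ("flag", "INT")], and the
discrepancy check then reports the TINYINT-to-INT promotion of "flag".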

The semantics for LocalCatalog are slightly different in the case of
tables which have not been configured as Avro format on the table level:

The existing implementation has the behavior that, when a table is
loaded, all partitions are inspected, and, if any partition is
discovered with Avro format, the above rules are applied. This has some
very unexpected results, described in an earlier email to
dev@impala.apache.org [1]. To summarize that email thread, the existing
behavior was judged to be unintuitive and inconsistent with Hive.
Additionally, this behavior requires loading all partitions up-front,
which gets in the way of the goal of lazy/granular metadata loading in
LocalCatalog.

Thus, the LocalCatalog implementation differs as follows:

- the "schema override" behavior ONLY occurs if the Avro file format has
  been selected at a table level.

- if an Avro partition is added to a non-Avro table, and that partition
  has a schema that isn't compatible with the table's schema, an error
  will occur on read.
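
The difference in when the Avro schema rules apply can be sketched as two
predicates. This is a hypothetical Python illustration, not Impala code: the
function names and the idea of passing partition formats through a lazy
loader callable are assumptions chosen to show why the LocalCatalog rule
does not need any partition metadata.

```python
from typing import Callable, List

PartitionFormatLoader = Callable[[], List[str]]


def existing_catalog_uses_avro_rules(table_format: str,
                                     load_partition_formats: PartitionFormatLoader) -> bool:
    """Existing catalog: any Avro partition triggers the Avro schema rules."""
    if table_format == "AVRO":
        return True
    # Deciding requires the file format of every partition, so all partition
    # metadata has to be loaded before the table schema is known.
    return "AVRO" in load_partition_formats()


def local_catalog_uses_avro_rules(table_format: str) -> bool:
    """LocalCatalog: only the table-level format is consulted."""
    # No partition metadata is needed, which keeps lazy loading possible.
    return table_format == "AVRO"
```

With a text-format table that has one Avro partition, the first predicate
returns True only after invoking the loader, while the second returns False
without touching partitions at all.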

The thread additionally discusses adding an error message on "alter" to
prevent users from adding an Avro partition to a table with an
incompatible schema. To keep the scope of this patch minimal, that is
not yet implemented here. I filed IMPALA-7309 to change the behavior of
the existing catalog implementation to match.

A new test verifies the behavior; it is marked 'xfail' when run against
the existing catalog implementation.

[1] https://lists.apache.org/thread.html/fb68c54bd66a40982ee17f9f16f87a4112220a5df035a311bda310f1@%3Cdev.impala.apache.org%3E

Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Reviewed-on: http://gerrit.cloudera.org:8080/10970
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Vuk Ercegovac <ve...@cloudera.com>


> Prevent the addition of Avro partitions to non-Avro tables with incompatible schema
> -----------------------------------------------------------------------------------
>
>                 Key: IMPALA-7309
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7309
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog, Frontend
>            Reporter: Todd Lipcon
>            Priority: Major
>
> Per a recent [mailing list thread|https://lists.apache.org/thread.html/fb68c54bd66a40982ee17f9f16f87a4112220a5df035a311bda310f1@<dev.impala.apache.org>] the behavior of Avro partitions within non-Avro tables is inconsistent with Hive, and somewhat surprising. For example, adding a partition can change the output of "describe" on the table, but only after a refresh or invalidate. In the mailing list thread, we decided to change the behavior to:
> 1. Schema handling:
> - if a table's properties indicate it's an avro table, parse and adopt the
> external avro schema as the table schema, or infer an avro-compatible schema from the existing columns
> - if a table's properties indicate it's _not_ an avro table, but there is
> an external avro schema defined in the table properties, then parse the
> avro schema and include it in the TableDescriptor (for use by avro
> partitions) but *do not* adopt it as the table schema.
> 2. Handling incompatible schemas:
> - If the table-level format is non-Avro,
> - AND the table contains column types incompatible with Avro (e.g. TINYINT),
> - AND the table has an existing avro partition,
> - THEN the query will yield an error about incompatible types
> 3. Try to prevent shooting in the foot
> - If the table-level format is non-Avro,
> - AND the table contains column types incompatible with Avro (e.g. TINYINT),
> - THEN disallow changing the file format of an existing partition to Avro
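
Rules 2 and 3 above amount to two checks over the same condition. The
following Python sketch is hypothetical: AnalysisError, the function names,
and the exact set of incompatible types are assumptions. The issue names
TINYINT only as an example; SMALLINT is added here because Avro has no
16-bit integer type either. Rule 3 corresponds to statements of the form
ALTER TABLE t PARTITION (...) SET FILEFORMAT AVRO.

```python
from typing import List, Tuple

Column = Tuple[str, str]  # (column name, HMS type name)

# Assumption: HMS types with no exact Avro counterpart. TINYINT comes from the
# issue text; SMALLINT is added because Avro also lacks a 16-bit integer.
AVRO_INCOMPATIBLE_TYPES = {"TINYINT", "SMALLINT"}


class AnalysisError(Exception):
    """Hypothetical stand-in for the error a user would see."""


def avro_incompatible_columns(cols: List[Column]) -> List[str]:
    return [name for name, typ in cols if typ.upper() in AVRO_INCOMPATIBLE_TYPES]


def check_query(table_format: str, cols: List[Column],
                partition_formats: List[str]) -> None:
    """Rule 2: non-Avro table + incompatible columns + Avro partition -> error."""
    if table_format == "AVRO":
        return
    bad = avro_incompatible_columns(cols)
    if bad and "AVRO" in partition_formats:
        raise AnalysisError(f"incompatible column types for Avro partition: {bad}")


def check_alter_set_fileformat(table_format: str, cols: List[Column],
                               new_format: str) -> None:
    """Rule 3: reject switching a partition of such a table to Avro."""
    if table_format == "AVRO" or new_format != "AVRO":
        return
    bad = avro_incompatible_columns(cols)
    if bad:
        raise AnalysisError(f"cannot set partition file format to AVRO: {bad}")
```

Both checks share the "non-Avro table with incompatible columns" condition;
rule 2 reacts to an Avro partition that already exists, while rule 3 stops
one from being created through a file format change.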



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
