Posted to dev@hive.apache.org by Venki Korukanti <ve...@gmail.com> on 2013/10/23 23:29:21 UTC
Review Request 14890: Index creation on a skew table fails
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14890/
-----------------------------------------------------------
Review request for hive, Ashutosh Chauhan and Thejas Nair.
Bugs: HIVE-5631
https://issues.apache.org/jira/browse/HIVE-5631
Repository: hive-git
Description
-------
Repro steps:
CREATE DATABASE skewtest;
USE skewtest;
CREATE TABLE skew (id bigint, acct string) SKEWED BY (acct) ON ('CC','CH');
CREATE INDEX skew_indx ON TABLE skew (id) as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD;
The last DDL statement fails with the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidObjectException(message:Invalid skew column [acct])
When creating a table, Hive runs sanity checks to make sure the columns have proper names and that the skewed columns are a subset of the table columns. Here the check fails because the index table carries skewed column info: the index table's skewed columns are {acct}, while its columns are {id, _bucketname, _offsets}. Because the skewed column {acct} is not among the table columns, Hive throws the exception.
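The subset check described above can be illustrated with a small sketch. This is not Hive's code: the function name find_invalid_skew_columns and the way the message is formatted are assumptions made for illustration only.

```python
def find_invalid_skew_columns(table_columns, skewed_columns):
    """Return the skewed columns that are not among the table's columns.

    Hypothetical helper modelling the sanity check; not a Hive API.
    """
    known = set(table_columns)
    return [col for col in skewed_columns if col not in known]


# Index table from the repro: its columns come from the index definition,
# while the skewed column was inherited from the base table 'skew'.
index_columns = ["id", "_bucketname", "_offsets"]
inherited_skew = ["acct"]

invalid = find_invalid_skew_columns(index_columns, inherited_skew)
if invalid:
    # Mirrors the shape of the error text quoted in the repro.
    print("Invalid skew column [" + ", ".join(invalid) + "]")
```

For the base table 'skew' itself the same check passes, because acct is one of its columns.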
The index table ends up with skewed column info, even though its definition specifies none, because creating the index table makes a deep copy of the base table's StorageDescriptor (SD), in this case that of 'skew'. In the copied SD, index-specific parameters are set and unrelated parameters are reset. The skewed column info is not reset (a few other parameters are not reset either), which is why the index table contains it.
Fix: Instead of deep copying the base table's StorageDescriptor, create a new one from the gathered info. This keeps the index table from inheriting unnecessary SD properties from the base table.
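The difference between the two approaches can be sketched with a simplified model. The classes and functions below are stand-ins invented for this illustration; they are not the Thrift-generated StorageDescriptor or the code in Hive.java, and the field names and the "text" format values are assumptions.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class SkewedInfo:
    # Simplified stand-in for the skewed column info held by an SD.
    col_names: list = field(default_factory=list)
    col_values: list = field(default_factory=list)


@dataclass
class StorageDescriptor:
    # Simplified stand-in; the real SD has many more fields.
    cols: list = field(default_factory=list)
    input_format: str = ""
    output_format: str = ""
    skewed_info: SkewedInfo = field(default_factory=SkewedInfo)


def index_sd_by_deep_copy(base_sd, index_cols):
    """Old approach: copy everything, then override selected fields."""
    sd = copy.deepcopy(base_sd)
    sd.cols = list(index_cols)
    # skewed_info is never reset here, so it leaks into the index table.
    return sd


def index_sd_from_scratch(index_cols, input_format, output_format):
    """Approach of the fix: start empty and set only the gathered info."""
    return StorageDescriptor(
        cols=list(index_cols),
        input_format=input_format,
        output_format=output_format,
    )


base = StorageDescriptor(
    cols=["id", "acct"],
    input_format="text",
    output_format="text",
    skewed_info=SkewedInfo(["acct"], [["CC"], ["CH"]]),
)
index_cols = ["id", "_bucketname", "_offsets"]

copied = index_sd_by_deep_copy(base, index_cols)
fresh = index_sd_from_scratch(index_cols, "text", "text")
print(copied.skewed_info.col_names)  # ['acct']
print(fresh.skewed_info.col_names)   # []
```

Building the descriptor from scratch means any field that is not set explicitly keeps its empty default, so a property added to the base table later cannot silently reach the index table.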
Diffs
-----
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java b0f124b
ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHive.java d0cbed6
ql/src/test/queries/clientpositive/index_skewtable.q PRE-CREATION
ql/src/test/results/clientpositive/index_skewtable.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/14890/diff/
Testing
-------
Added a unit test and ran the index-related unit test queries.
Thanks,
Venki Korukanti
Re: Review Request 14890: Index creation on a skew table fails
Posted by Ashutosh Chauhan <ha...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14890/#review61395
-----------------------------------------------------------
Ship it!
- Ashutosh Chauhan
Re: Review Request 14890: Index creation on a skew table fails
Posted by Venki Korukanti <ve...@gmail.com>.
(Updated Nov. 14, 2014, 12:03 a.m.)
Review request for hive, Ashutosh Chauhan and Thejas Nair.
Changes
-------
Rebased on latest trunk.
Bugs: HIVE-5631
https://issues.apache.org/jira/browse/HIVE-5631
Repository: hive-git
Diffs (updated)
-----
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java b900627
ql/src/test/queries/clientpositive/index_skewtable.q PRE-CREATION
ql/src/test/results/clientpositive/index_skewtable.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/14890/diff/
Testing
-------
Added a unit test and ran the index-related unit test queries.
Thanks,
Venki Korukanti
Re: Review Request 14890: Index creation on a skew table fails
Posted by Venki Korukanti <ve...@gmail.com>.
(Updated Oct. 24, 2013, 6:34 p.m.)
Review request for hive, Ashutosh Chauhan and Thejas Nair.
Changes
-------
Initialize SerDeInfo object
Bugs: HIVE-5631
https://issues.apache.org/jira/browse/HIVE-5631
Repository: hive-git
Diffs (updated)
-----
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java b0f124b
ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHive.java d0cbed6
ql/src/test/queries/clientpositive/index_skewtable.q PRE-CREATION
ql/src/test/results/clientpositive/index_skewtable.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/14890/diff/
Testing
-------
Added a unit test and ran the index-related unit test queries.
Thanks,
Venki Korukanti