You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Hive QA (JIRA)" <ji...@apache.org> on 2017/06/19 08:12:00 UTC

[jira] [Commented] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ

    [ https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053594#comment-16053594 ] 

Hive QA commented on HIVE-16559:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12873245/HIVE-16559.04.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5674/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5674/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5674/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-06-19 08:11:14.979
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-5674/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-06-19 08:11:14.981
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at cb6bf88 HIVE-16902: investigate "failed to remove operation log" errors (Aihua Xu, reviewed by Yongzhi Chen)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at cb6bf88 HIVE-16902: investigate "failed to remove operation log" errors (Aihua Xu, reviewed by Yongzhi Chen)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-06-19 08:11:19.639
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch
error: patch failed: ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java:458
error: ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java: patch does not apply
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12873245 - PreCommit-HIVE-Build

> Parquet schema evolution for partitioned tables may break if table and partition serdes differ
> ----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16559
>                 URL: https://issues.apache.org/jira/browse/HIVE-16559
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Barna Zsombor Klara
>            Assignee: Barna Zsombor Klara
>             Fix For: 3.0.0
>
>         Attachments: HIVE-16559.01.patch, HIVE-16559.02.patch, HIVE-16559.03.patch, HIVE-16559.04.patch
>
>
> Parquet schema evolution should make it possible to have partitions/tables 
>  backed by files with different schemas. Hive should match the table columns with file columns based on the column name if possible.
> However if the serde for a table is missing columns from the serde of a partition Hive fails to match the columns together.
> Steps to reproduce:
> {code}
> CREATE TABLE myparquettable_parted
> (
>   name string,
>   favnumber int,
>   favcolor string,
>   age int,
>   favpet string
> )
> PARTITIONED BY (day string)
> STORED AS PARQUET;
> INSERT OVERWRITE TABLE myparquettable_parted
> PARTITION(day='2017-04-04')
> SELECT
>    'mary' as name,
>    5 AS favnumber,
>    'blue' AS favcolor,
>    35 AS age,
>    'dog' AS favpet;
> alter table myparquettable_parted
> REPLACE COLUMNS
> (
> favnumber int,
> age int
> );   <!--- No cascade option, so the partition will not be altered. 
> {code}
> {{SELECT * FROM myparquettable_parted where day='2017-04-04';}}
> will fail with:
> {{java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.IntWritable}}
> Hive should either match the columns together or prevent the user from dropping columns from the table.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)