Posted to issues@orc.apache.org by "László Bodor (Jira)" <ji...@apache.org> on 2019/10/09 12:53:00 UTC

[jira] [Commented] (ORC-558) ORC-540 changes result in orc_remove_col.q

    [ https://issues.apache.org/jira/browse/ORC-558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16947626#comment-16947626 ] 

László Bodor commented on ORC-558:
----------------------------------

It seems there is a missing check/restriction: the user can disable schema evolution and force positional evolution at the same time while using find columns, which is an invalid combination, e.g.:
{code}
SET hive.exec.schema.evolution=false;
SET orc.force.positional.evolution=true;
{code}
I think that when hive.exec.schema.evolution is set to false, this part should fall back to the file's schema:
https://github.com/apache/orc/commit/a9c0ca43cffab3acd73d19ac91297e1f507277c2#diff-a0861c5ad0775baf23793ac86da5fbb6R101-R106
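
As a rough illustration of the suggested fallback, the decision could look something like the sketch below. All names here (EvolutionModeSketch, Mode, resolve) are hypothetical and are not the actual ORC code; the BY_NAME branch is only an assumed default for the case where positional evolution is not forced.

```java
// Hypothetical sketch only: these names do not exist in ORC or Hive.
final class EvolutionModeSketch {

    // How the reader would map its columns onto the file's columns.
    enum Mode { FILE_SCHEMA, POSITIONAL, BY_NAME }

    private EvolutionModeSketch() {}

    /**
     * @param schemaEvolutionEnabled stands in for hive.exec.schema.evolution
     * @param forcePositional        stands in for orc.force.positional.evolution
     */
    static Mode resolve(boolean schemaEvolutionEnabled, boolean forcePositional) {
        // With schema evolution disabled, ignore the positional flag
        // and fall back to the file's own schema.
        if (!schemaEvolutionEnabled) {
            return Mode.FILE_SCHEMA;
        }
        // Assumption: name-based matching is the default when evolution is on.
        return forcePositional ? Mode.POSITIONAL : Mode.BY_NAME;
    }
}
```

With the two settings from the example above (evolution off, positional forced), resolve(false, true) yields FILE_SCHEMA, i.e. the positional flag would no longer take effect.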


> ORC-540 changes result in orc_remove_col.q
> ------------------------------------------
>
>                 Key: ORC-558
>                 URL: https://issues.apache.org/jira/browse/ORC-558
>             Project: ORC
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>         Attachments: orc_remove_cols_hive_logs.txt
>
>
> After ORC-540, all result rows disappear from Hive's orc_remove_cols.q qtest.
> Reproduction (locally):
> 1. reset ORC to 1.5.6 and cherry-pick ORC-540
> 2. build ORC, then build Hive (Hive depends on ORC 1.5.6, so no pom.xml change is needed)
> 3. run hive qtest: mvn test -Dtest.output.overwrite=true -pl itests/qtest -Dtest=TestCliDriver -Dqfile=orc_remove_cols.q
> {code}
> Client Execution succeeded but contained differences (error code = 1) after executing orc_remove_cols.q
> 62a63,72
> > -1073279343 today
> > -1073051226 today
> > -1072910839 today
> > -1072081801 today
> > -1072076362 today
> > -1071480828 today
> > -1071363017 today
> > -1070883071 today
> > -1070551679 today
> > -1069736047 today
> 72a83,92
> > -1073279343 tomorrow
> > -1073051226 tomorrow
> > -1072910839 tomorrow
> > -1072081801 tomorrow
> > -1072076362 tomorrow
> > -1071480828 tomorrow
> > -1071363017 tomorrow
> > -1070883071 tomorrow
> > -1070551679 tomorrow
> > -1069736047 tomorrow
> {code}
> qtest containing the selects:
> {code}
> --! qt:dataset:alltypesorc
> set hive.vectorized.execution.enabled=false;
> SET hive.exec.schema.evolution=false;
> set hive.fetch.task.conversion=more;
> set hive.mapred.mode=nonstrict;
> CREATE TABLE orc_partitioned(a INT, b STRING) partitioned by (ds string) STORED AS ORC;
> insert into table orc_partitioned partition (ds = 'today') select cint, cstring1 from alltypesorc where cint is not null order by cint limit 10;
> insert into table orc_partitioned partition (ds = 'tomorrow') select cint, cstring1 from alltypesorc where cint is not null order by cint limit 10;
> -- Use the old change the SERDE trick to avoid ORC DDL checks... and remove a column on the end.
> ALTER TABLE orc_partitioned SET SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe';
> ALTER TABLE orc_partitioned REPLACE COLUMNS (a int);
> ALTER TABLE orc_partitioned SET SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde';
> SELECT * FROM orc_partitioned WHERE ds = 'today';
> SELECT * FROM orc_partitioned WHERE ds = 'tomorrow';
> {code}


