Posted to issues@hive.apache.org by "Hive QA (JIRA)" <ji...@apache.org> on 2017/10/19 20:08:00 UTC
[jira] [Commented] (HIVE-17696) Vectorized reader does not seem to be pushing down projection columns in certain code paths

    [ https://issues.apache.org/jira/browse/HIVE-17696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16211650#comment-16211650 ] 

Hive QA commented on HIVE-17696:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12892984/HIVE-17696.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 11309 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] (batchId=47)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_1_23] (batchId=76)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[optimize_nullscan] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_partitioned_date_time] (batchId=163)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query16] (batchId=243)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query94] (batchId=243)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query14] (batchId=241)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query16] (batchId=241)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query94] (batchId=241)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=204)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=221)
org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerHighShuffleBytes (batchId=228)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7385/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7385/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7385/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12892984 - PreCommit-HIVE-Build

> Vectorized reader does not seem to be pushing down projection columns in certain code paths
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-17696
>                 URL: https://issues.apache.org/jira/browse/HIVE-17696
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Vihang Karajgaonkar
>            Assignee: Ferdinand Xu
>         Attachments: HIVE-17696.patch
>
>
> This is the code snippet from {{VectorizedParquetRecordReader.java}}
> {noformat}
> MessageType tableSchema;
>     if (indexAccess) {
>       List<Integer> indexSequence = new ArrayList<>();
>       // Generates a sequence list of indexes
>       for(int i = 0; i < columnNamesList.size(); i++) {
>         indexSequence.add(i);
>       }
>       tableSchema = DataWritableReadSupport.getSchemaByIndex(fileSchema, columnNamesList,
>         indexSequence);
>     } else {
>       tableSchema = DataWritableReadSupport.getSchemaByName(fileSchema, columnNamesList,
>         columnTypesList);
>     }
>     indexColumnsWanted = ColumnProjectionUtils.getReadColumnIDs(configuration);
>     if (!ColumnProjectionUtils.isReadAllColumns(configuration) && !indexColumnsWanted.isEmpty()) {
>       requestedSchema =
>         DataWritableReadSupport.getSchemaByIndex(tableSchema, columnNamesList, indexColumnsWanted);
>     } else {
>       requestedSchema = fileSchema;
>     }
>     this.reader = new ParquetFileReader(
>       configuration, footer.getFileMetaData(), file, blocks, requestedSchema.getColumns());
> {noformat}
> A couple of things to notice here:
> Most of this code is duplicated from the {{DataWritableReadSupport.init()}} method.
> The else branch assigns {{fileSchema}} to {{requestedSchema}} instead of {{tableSchema}}, which is what the {{DataWritableReadSupport.init()}} method uses. Does this cause projection columns to be missed when we read Parquet files? We should probably just reuse the {{ReadContext}} returned from the {{DataWritableReadSupport.init()}} method here.
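
To illustrate the question raised in the description, here is a minimal, self-contained sketch of how the two fallback choices can differ. It does not use Hive or Parquet classes: schemas are modelled as plain lists of column names, and {{ProjectionSketch}}, {{selectByName}}, {{selectByIndex}} and {{requestedSchema}} are hypothetical stand-ins rather than the real {{DataWritableReadSupport}} methods. It assumes a file that contains one column the table does not declare.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ProjectionSketch {

    // Hypothetical stand-in for building a table schema by name:
    // keep only the file columns that the table declares, in table order.
    static List<String> selectByName(List<String> fileSchema, List<String> tableColumns) {
        List<String> result = new ArrayList<>();
        for (String column : tableColumns) {
            if (fileSchema.contains(column)) {
                result.add(column);
            }
        }
        return result;
    }

    // Hypothetical stand-in for projecting a schema by column index.
    static List<String> selectByIndex(List<String> schema, List<Integer> wanted) {
        List<String> result = new ArrayList<>();
        for (int index : wanted) {
            if (index >= 0 && index < schema.size()) {
                result.add(schema.get(index));
            }
        }
        return result;
    }

    // Mirrors the shape of the if/else in the snippet above. The flag selects
    // which schema the else branch falls back to, so both variants can be compared.
    static List<String> requestedSchema(List<String> fileSchema, List<String> tableColumns,
                                        boolean readAllColumns, List<Integer> wanted,
                                        boolean fallBackToFileSchema) {
        List<String> tableSchema = selectByName(fileSchema, tableColumns);
        if (!readAllColumns && !wanted.isEmpty()) {
            return selectByIndex(tableSchema, wanted);
        }
        return fallBackToFileSchema ? fileSchema : tableSchema;
    }

    public static void main(String[] args) {
        // Assumption for illustration: the file has a column the table does not declare.
        List<String> fileSchema = Arrays.asList("a", "b", "c", "extra");
        List<String> tableColumns = Arrays.asList("a", "b", "c");
        List<Integer> noColumnsWanted = new ArrayList<>();

        // Else branch falling back to the file schema: the undeclared column is requested too.
        System.out.println(requestedSchema(fileSchema, tableColumns, false, noColumnsWanted, true));
        // Else branch falling back to the table schema: only declared columns are requested.
        System.out.println(requestedSchema(fileSchema, tableColumns, false, noColumnsWanted, false));
        // With an explicit projection, both variants agree.
        System.out.println(requestedSchema(fileSchema, tableColumns, false, Arrays.asList(0, 2), true));
    }
}
```

In this simplified model the two variants only diverge when the else branch is taken, that is, when all columns are read or the list of wanted column IDs is empty. Whether the real reader behaves the same way depends on details of the Hive code that this sketch does not capture.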



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)