Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/07/22 15:06:00 UTC

[jira] [Work logged] (HIVE-26350) IndexOutOfBoundsException when generating splits for external JDBC table with partition columns

     [ https://issues.apache.org/jira/browse/HIVE-26350?focusedWorklogId=794261&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794261 ]

ASF GitHub Bot logged work on HIVE-26350:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 22/Jul/22 15:05
            Start Date: 22/Jul/22 15:05
    Worklog Time Spent: 10m 
      Work Description: zabetak opened a new pull request, #3470:
URL: https://github.com/apache/hive/pull/3470

   ### What changes were proposed in this pull request and why?
   1. Introduce new API `DatabaseAccessor#getColumnTypes` to: i) allow fetching column types from the database; ii) align with the code using `DatabaseAccessor#getColumnNames`.
   2. Use the new API to find the type of the partition column in `JdbcInputFormat`, because the type information is not propagated correctly to `LIST_COLUMN_TYPES`, which leads to an IndexOutOfBoundsException (IOBE).
   3. Some refactoring in `GenericJdbcDatabaseAccessor` to avoid duplicate code with the introduction of the new API.
   4. Add test reproducing the IOBE problem, and tests for the new API. 
   5. Adapt the existing accessor and JDBC input format tests to these changes.
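
To illustrate items 1 and 2, here is a minimal Java sketch of how a positional type lookup can fail and how aligned name and type lists avoid the failure. It is not the code in this PR: the class and method names are hypothetical, and it assumes the type list only covered the projected column (`id`), which would be consistent with the `Index: 1, Size: 1` message reported in HIVE-26350.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Hive.
public class PartitionColumnTypeSketch {

    // Fragile lookup: the position comes from the full list of column names,
    // but it is used to index a type list that may contain fewer entries.
    public static String typeByPosition(List<String> allNames,
                                        List<String> projectedTypes,
                                        String partitionColumn) {
        int index = allNames.indexOf(partitionColumn);
        // Throws IndexOutOfBoundsException when projectedTypes is shorter than allNames.
        return projectedTypes.get(index);
    }

    // Safer lookup: names and types are assumed to come from the same source
    // (for example, both fetched from the database), so they stay aligned.
    public static String typeFromAlignedLists(List<String> names,
                                              List<String> types,
                                              String partitionColumn) {
        if (names.size() != types.size()) {
            throw new IllegalArgumentException("Column names and types are not aligned");
        }
        int index = names.indexOf(partitionColumn);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown partition column: " + partitionColumn);
        }
        return types.get(index);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("id", "name");
        List<String> allTypes = Arrays.asList("int", "varchar(20)");
        // Assumption: only the type of the projected column reaches the lookup.
        List<String> projectedTypes = Arrays.asList("int");

        try {
            typeByPosition(names, projectedTypes, "name");
        } catch (IndexOutOfBoundsException e) {
            System.out.println("Positional lookup failed: " + e.getMessage());
        }
        System.out.println("Aligned lookup: " + typeFromAlignedLists(names, allTypes, "name"));
    }
}
```

The design point is that the partition column type is resolved from a list guaranteed to have one entry per column name, instead of a list whose length depends on which columns the query projects.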
   
   ### Does this PR introduce _any_ user-facing change?
   Solves the IOBE problem described in HIVE-26350
   
   ### How was this patch tested?
   ```
   mvn test -pl itests/qtest -Pitests -Dtest=TestMiniLlapLocalCliDriver -Dqfile=jdbc_partition_table_pruned_pcolumn.q,external_jdbc_table_partition.q,external_jdbc_table_typeconversion.q
   mvn test -pl jdbc-handler -Dtest=TestJdbcInputFormat,TestGenericJdbcDatabaseAccessor
   ```




Issue Time Tracking
-------------------

    Worklog Id:     (was: 794261)
    Time Spent: 20m  (was: 10m)

> IndexOutOfBoundsException when generating splits for external JDBC table with partition columns
> -----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-26350
>                 URL: https://issues.apache.org/jira/browse/HIVE-26350
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO, JDBC storage handler
>            Reporter: Stamatis Zampetakis
>            Assignee: Soumyakanti Das
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: cbo_plan.txt, explain_plan.txt, jdbc_join_with_partition_table.q
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Create the following table in some JDBC database (e.g., Postgres).
> {code:sql}
> CREATE TABLE country
> (
>     id   int,
>     name varchar(20)
> );
> {code}
> Create the following tables in Hive ensuring that the external JDBC table has the {{hive.sql.partitionColumn}} table property set.
> {code:sql}
> CREATE TABLE city (id int);
> CREATE EXTERNAL TABLE country
> (
>     id int,
>     name varchar(20)
> )
> STORED BY                                          
> 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
> TBLPROPERTIES (                                    
>     "hive.sql.database.type" = "POSTGRES",
>     "hive.sql.jdbc.driver" = "org.postgresql.Driver",
>     "hive.sql.jdbc.url" = "jdbc:postgresql://localhost:5432/qtestDB",
>     "hive.sql.dbcp.username" = "qtestuser",
>     "hive.sql.dbcp.password" = "qtestpassword",
>     "hive.sql.table" = "country",
>     "hive.sql.partitionColumn" = "name",
>     "hive.sql.numPartitions" = "2"
> );
> {code}
> The query below fails with IndexOutOfBoundsException when the mapper scanning the JDBC table tries to generate the splits by exploiting the partitioning column.
> {code:sql}
> select country.id from country cross join city;
> {code}
> The full stack trace is given below.
> {noformat}
> java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
>         at java.util.ArrayList.rangeCheck(ArrayList.java:659) ~[?:1.8.0_261]
>         at java.util.ArrayList.get(ArrayList.java:435) ~[?:1.8.0_261]
>         at org.apache.hive.storage.jdbc.JdbcInputFormat.getSplits(JdbcInputFormat.java:102) [hive-jdbc-handler-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:564) [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:858) [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:263) [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>         at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:281) [tez-dag-0.10.1.jar:0.10.1]
>         at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:272) [tez-dag-0.10.1.jar:0.10.1]
>         at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_261]
>         at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_261]
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) [hadoop-common-3.1.0.jar:?]
>         at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:272) [tez-dag-0.10.1.jar:0.10.1]
>         at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:256) [tez-dag-0.10.1.jar:0.10.1]
>         at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108) [guava-19.0.jar:?]
>         at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41) [guava-19.0.jar:?]
>         at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77) [guava-19.0.jar:?]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_261]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_261]
>         at java.lang.Thread.run(Thread.java:748) [?:1.8.0_261]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)