You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Fucun Chu (Code Review)" <ge...@cloudera.org> on 2021/11/03 02:19:38 UTC

[Impala-ASF-CR] WIP IMPALA-5741: Support reading tiny RDBMS tables

Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17842


Change subject: WIP IMPALA-5741: Support reading tiny RDBMS tables
......................................................................

WIP IMPALA-5741: Support reading tiny RDBMS tables

This patch uses the "external data source" mechanism in Impala and
writes a data source for querying jdbc. It has some limitations:
- It is not distributed.
- Only support binary predicates with operators =, !=, <=, >=, <, > to
be pushed to RDBMS

In order to query the RDBMS tables, the following steps should be
followed (note that existing data source table will be rebuilt):
1. Make sure that the database driver package has been added to the
classpath and the minicluster cluster has been started.

2. Copy the data source library into HDFS.
${IMPALA_HOME}/testdata/bin/copy-data-sources.sh

3. Create an `alltypes` table in the postgres database.
${IMPALA_HOME}/testdata/bin/load-data-sources.sh

4. Create data sources table(alltypes_jdbc_datasource).
${IMPALA_HOME}/bin/impala-shell.sh -i ${IMPALAD} -f\
  ${IMPALA_HOME}/testdata/bin/create-data-source-table.sql

Testing:
- Ran core tests successfully.

Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
---
M fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
A java/ext-data-source/jdbc/pom.xml
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DB2DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessorFactory.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MsSqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MySqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/PostgresDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/exception/JdbcDatabaseAccessException.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
A java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java
A java/ext-data-source/jdbc/src/test/resources/log4j.properties
A java/ext-data-source/jdbc/src/test/resources/test_script.sql
M java/ext-data-source/pom.xml
M testdata/bin/copy-data-sources.sh
M testdata/bin/create-data-source-table.sql
M testdata/bin/create-load-data.sh
A testdata/bin/load-data-sources.sh
M testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test
28 files changed, 1,975 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/17842/5
-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] WIP IMPALA-5741: Support reading tiny RDBMS tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: WIP IMPALA-5741: Support reading tiny RDBMS tables
......................................................................


Patch Set 6:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/9714/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 6
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 03 Nov 2021 11:16:23 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 27:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17842/26/tests/query_test/test_ext_data_sources.py
File tests/query_test/test_ext_data_sources.py:

http://gerrit.cloudera.org:8080/#/c/17842/26/tests/query_test/test_ext_data_sources.py@63
PS26, Line 63: verify
> nit: verify
fixed



-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 27
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Oct 2023 04:16:00 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#21) to the change originally created by Fucun Chu. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................

IMPALA-5741: Initial support for reading tiny RDBMS tables

This patch uses the "external data source" mechanism in Impala to
implement data source for querying jdbc.
It has some limitations due to the restrictions of "external data
source":
  - It is not distributed.
  - Only support binary predicates with operators =, !=, <=, >=,
    <, > to be pushed to RDBMS.
  - Don't support following data types for predicates:
    DECIMAL, TIMESTAMP, DATE, and binary.
  - Don't support external tables with complex types of columns.
  - Only support following RDBMS:
    MySQL, Postgres, Oracle, MSSQL, H2, DB2, and JETHRO_DATA.

Source files under jdbc/conf, jdbc/dao and jdbc/exception are
replicated from Hive JDBC Storage Handler.

In order to query the RDBMS tables, the following steps should be
followed (note that existing data source table will be rebuilt):
1. Make sure the Impala cluster has been started.

2. Copy the jar files of jdbc drivers and the data source library into
HDFS.
${IMPALA_HOME}/testdata/bin/copy-data-sources.sh

3. Create an `alltypes` table in the postgres database.
${IMPALA_HOME}/testdata/bin/load-data-sources.sh

4. Create data sources table(alltypes_jdbc_datasource).
${IMPALA_HOME}/bin/impala-shell.sh -i ${IMPALAD} -f\
  ${IMPALA_HOME}/testdata/bin/create-data-source-table.sql

Testing:
 - Added unit-test for Postgres and ran unit-test with jdbc driver
   postgresql-42.5.1.jar.
 - Ran manual unit-test for MySql with jdbc driver
   mysql-connector-j-8.1.0.jar.
 - Ran core tests successfully.

Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
---
M bin/rat_exclude_files.txt
M fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
A java/ext-data-source/jdbc/pom.xml
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/README.md
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DB2DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessorFactory.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MsSqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MySqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/PostgresDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/exception/JdbcDatabaseAccessException.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
A java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java
A java/ext-data-source/jdbc/src/test/resources/log4j.properties
A java/ext-data-source/jdbc/src/test/resources/test_script.sql
M java/ext-data-source/pom.xml
M testdata/bin/copy-data-sources.sh
M testdata/bin/create-data-source-table.sql
M testdata/bin/create-load-data.sh
A testdata/bin/load-data-sources.sh
M testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test
30 files changed, 2,231 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/17842/21
-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 21
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 23:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17842/23/testdata/bin/copy-ext-data-sources.sh
File testdata/bin/copy-ext-data-sources.sh:

http://gerrit.cloudera.org:8080/#/c/17842/23/testdata/bin/copy-ext-data-sources.sh@39
PS23, Line 39: echo "Copied" ${IMPALA_HOME}/java/ext-data-source/test/target/impala-data-source-test-*.jar \
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/17842/23/testdata/bin/copy-ext-data-sources.sh@46
PS23, Line 46: echo "Copied" ${IMPALA_HOME}/java/ext-data-source/jdbc/target/impala-data-source-jdbc-*.jar \
line too long (93 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 23
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Oct 2023 03:11:03 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 31:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9800/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 31
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 09 Oct 2023 21:44:57 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#23) to the change originally created by Fucun Chu. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................

IMPALA-5741: Initial support for reading tiny RDBMS tables

This patch uses the "external data source" mechanism in Impala to
implement data source for querying jdbc.
It has some limitations due to the restrictions of "external data
source":
  - It is not distributed.
  - Don't support following data types for column data from external
    table: BINARY, CHAR, DATETIME, and COMPLEX.
  - Only support binary predicates with operators =, !=, <=, >=,
    <, > to be pushed to RDBMS.
  - Don't support following data types for predicates:
    DECIMAL, TIMESTAMP, DATE, and BINARY.
  - Don't support external tables with complex types of columns.
  - Only support following RDBMS:
    MySQL, Postgres, Oracle, MSSQL, H2, DB2, and JETHRO_DATA.
  - Don't support Catalog V2 (IMPALA-7131)
  - DataSource objects are not persistent (IMPALA-12375)

Source files under jdbc/conf, jdbc/dao and jdbc/exception are
replicated from Hive JDBC Storage Handler.

In order to query the RDBMS tables, the following steps should be
followed (note that existing data source table will be rebuilt):
1. Make sure the Impala cluster has been started.

2. Copy the jar files of jdbc drivers and the data source library into
HDFS.
${IMPALA_HOME}/testdata/bin/copy-data-sources.sh

3. Create an `alltypes` table in the postgres database.
${IMPALA_HOME}/testdata/bin/load-data-sources.sh

4. Create data sources table(alltypes_jdbc_datasource).
${IMPALA_HOME}/bin/impala-shell.sh -f\
  ${IMPALA_HOME}/testdata/bin/create-data-source-table.sql

Testing:
 - Added unit-test for Postgres and ran unit-test with jdbc driver
   postgresql-42.5.1.jar.
 - Ran manual unit-test for MySql with jdbc driver
   mysql-connector-j-8.1.0.jar.
 - Ran core tests successfully.

Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
---
M bin/rat_exclude_files.txt
M fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
A java/ext-data-source/jdbc/pom.xml
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/README.md
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DB2DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessorFactory.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MsSqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MySqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/PostgresDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/exception/JdbcDatabaseAccessException.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
A java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java
A java/ext-data-source/jdbc/src/test/resources/log4j.properties
A java/ext-data-source/jdbc/src/test/resources/test_script.sql
M java/ext-data-source/pom.xml
D testdata/bin/copy-data-sources.sh
A testdata/bin/copy-ext-data-sources.sh
D testdata/bin/create-data-source-table.sql
A testdata/bin/create-ext-data-source-table.sql
M testdata/bin/create-load-data.sh
A testdata/bin/load-ext-data-sources.sh
A testdata/workloads/functional-query/queries/QueryTest/jdbc-data-source.test
M tests/query_test/test_queries.py
33 files changed, 2,407 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/17842/23
-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 23
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Kurt Deschler (Code Review)" <ge...@cloudera.org>.
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 22:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java:

http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@273
PS22, Line 273:               .collect(Collectors.joining(", "));
Is quoting added somewhere? Postgres will convert unquoted capitals to lowercase.


http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java:

http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java@96
PS22, Line 96:       String countQuery = "SELECT COUNT(*) FROM (" + sql + ") tmptable";
Are we sure all of the target databases will flatten this view query? Otherwise could text replace the generated "select *"


http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java@285
PS22, Line 285:     props.put("maxActive", "3");
Are these database-specific?


http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java:

http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java@86
PS22, Line 86:       LOGGER.warn("hasNext() threw exception", se);
This should re-throw


http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java@175
PS22, Line 175:       LOGGER.warn("Caught exception while trying to close database objects", e);
probably should re-throw


http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java:

http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java@39
PS22, Line 39:     return "Select * from (" + sql + ") as \"tmp\" limit " + limit;
This may not always flatten well. consider text replacement instead.


http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java:

http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java@54
PS22, Line 54:         joiner.add(String.format("%s %s %s", name, op, value));
May need quoting here.


http://gerrit.cloudera.org:8080/#/c/17842/22/testdata/bin/create-data-source-table.sql
File testdata/bin/create-data-source-table.sql:

http://gerrit.cloudera.org:8080/#/c/17842/22/testdata/bin/create-data-source-table.sql@56
PS22, Line 56: CREATE TABLE alltypes_jdbc_datasource (
Add tests that require quoting



-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 22
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Oct 2023 21:17:50 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 20: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/9781/


-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 20
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sat, 30 Sep 2023 06:31:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Abhishek Rawat (Code Review)" <ge...@cloudera.org>.
Abhishek Rawat has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 21: Code-Review+1

(6 comments)

http://gerrit.cloudera.org:8080/#/c/17842/21//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17842/21//COMMIT_MSG@11
PS21, Line 11: It has some limitations due to the restrictions of "external data
There also seems to be restrictions on column data types being received from external table. Based on the `next` implementation in JdbcRecordIterator class following types aren't supported:
COMPLEX, BINARY, CHAR, DATETIME


http://gerrit.cloudera.org:8080/#/c/17842/21/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java:

http://gerrit.cloudera.org:8080/#/c/17842/21/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@148
PS21, Line 148:     // 1. Check init string
              :     if (!convertInitStringToConfiguration(params.getInit_string())) {
              :       return new TOpenResult(
              :           new TStatus(TErrorCode.INTERNAL_ERROR,
              :               Lists.newArrayList("Invalid init_string value")));
              :     }
This is the same check as one in prepare().
Is there a case where you could still call open() if prepare() failed with an error?


http://gerrit.cloudera.org:8080/#/c/17842/21/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java:

http://gerrit.cloudera.org:8080/#/c/17842/21/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java@95
PS21, Line 95:   private static boolean isEmptyString(String value) {
We don't seem to be calling this function anywhere?


http://gerrit.cloudera.org:8080/#/c/17842/21/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java:

http://gerrit.cloudera.org:8080/#/c/17842/21/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java@66
PS21, Line 66:   protected DataSource dbcpDataSource = null;
`dbcpDataSource_` for instance variables


http://gerrit.cloudera.org:8080/#/c/17842/21/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java@68
PS21, Line 68:   public static Cache<String, DataSource> dataSourceCache = CacheBuilder
naming convention: `dataSourceCache_`


http://gerrit.cloudera.org:8080/#/c/17842/21/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java:

http://gerrit.cloudera.org:8080/#/c/17842/21/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java@53
PS21, Line 53:   private final Connection conn;
proper naming convention for all instance variables.



-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 21
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Oct 2023 02:19:21 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 29:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/14162/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 29
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 09 Oct 2023 19:18:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 31: Code-Review+1

(2 comments)

carry +1 from Riza and Abhishek

http://gerrit.cloudera.org:8080/#/c/17842/30//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17842/30//COMMIT_MSG@13
PS30, Line 13:   - It is not distributed, e.g, fragment is unpartitioned. The queries
> nit: fragment
fixed


http://gerrit.cloudera.org:8080/#/c/17842/30//COMMIT_MSG@15
PS30, Line 15:   - Queries which read following data types from external JDBC tables
> JDBC
fixed



-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 31
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 09 Oct 2023 21:04:03 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 22:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/14122/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 22
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Oct 2023 06:21:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#26) to the change originally created by Fucun Chu. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................

IMPALA-5741: Initial support for reading tiny RDBMS tables

This patch uses the "external data source" mechanism in Impala to
implement data source for querying jdbc.
It has some limitations due to the restrictions of "external data
source":
  - It is not distributed.
  - Don't support following data types for column data from external
    table: BINARY, CHAR, DATETIME, and COMPLEX.
  - Only support binary predicates with operators =, !=, <=, >=,
    <, > to be pushed to RDBMS.
  - Don't support following data types for predicates:
    DECIMAL, TIMESTAMP, DATE, and BINARY.
  - Don't support external tables with complex types of columns.
  - Only support following RDBMS:
    MySQL, Postgres, Oracle, MSSQL, H2, DB2, and JETHRO_DATA.
  - Don't support Catalog V2 (IMPALA-7131)
  - DataSource objects are not persistent (IMPALA-12375)

Source files under jdbc/conf, jdbc/dao and jdbc/exception are
replicated from Hive JDBC Storage Handler.

In order to query the RDBMS tables, the following steps should be
followed (note that existing data source table will be rebuilt):
1. Make sure the Impala cluster has been started.

2. Copy the jar files of jdbc drivers and the data source library into
HDFS.
${IMPALA_HOME}/testdata/bin/copy-ext-data-sources.sh

3. Create an `alltypes` table in the postgres database.
${IMPALA_HOME}/testdata/bin/load-ext-data-sources.sh

4. Create data sources table(alltypes_jdbc_datasource).
${IMPALA_HOME}/bin/impala-shell.sh -f\
  ${IMPALA_HOME}/testdata/bin/create-ext-data-source-table.sql

Testing:
 - Added unit-test for Postgres and ran unit-test with jdbc driver
   postgresql-42.5.1.jar.
 - Ran manual unit-test for MySql with jdbc driver
   mysql-connector-j-8.1.0.jar.
 - Ran core tests successfully.

Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
---
M bin/rat_exclude_files.txt
M fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
A java/ext-data-source/jdbc/pom.xml
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/README.md
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DB2DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessorFactory.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MsSqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MySqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/PostgresDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/exception/JdbcDatabaseAccessException.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
A java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java
A java/ext-data-source/jdbc/src/test/resources/log4j.properties
A java/ext-data-source/jdbc/src/test/resources/test_script.sql
M java/ext-data-source/pom.xml
D testdata/bin/copy-data-sources.sh
A testdata/bin/copy-ext-data-sources.sh
D testdata/bin/create-data-source-table.sql
A testdata/bin/create-ext-data-source-table.sql
M testdata/bin/create-load-data.sh
A testdata/bin/load-ext-data-sources.sh
A testdata/workloads/functional-query/queries/QueryTest/jdbc-data-source.test
A tests/query_test/test_ext_data_sources.py
M tests/query_test/test_queries.py
34 files changed, 2,488 insertions(+), 86 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/17842/26
-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 26
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 22:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17842/22//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17842/22//COMMIT_MSG@33
PS22, Line 33: copy-data-sources.sh
nit: General comment about naming: instead of referring things with *data-source* in this patch, maybe *ext-data-source* or *jdbc-data-source* is more precise?

So the script names will be something like:
copy-ext-data-sources.sh
create-ext-data-source-table.sql
load-ext-data-sources.sh



-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 22
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Oct 2023 19:06:42 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5741: Support reading tiny RDBMS tables

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has uploaded a new patch set (#9) to the change originally created by Fucun Chu. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Support reading tiny RDBMS tables
......................................................................

IMPALA-5741: Support reading tiny RDBMS tables

This patch uses the "external data source" mechanism in Impala and
writes a data source for querying jdbc. It has some limitations:
- It is not distributed.
- Only support binary predicates with operators =, !=, <=, >=, <, > to
be pushed to RDBMS

In order to query the RDBMS tables, the following steps should be
followed (note that existing data source table will be rebuilt):
1. Make sure that the database driver package has been added to the
classpath and the minicluster cluster has been started.

2. Copy the data source library into HDFS.
${IMPALA_HOME}/testdata/bin/copy-data-sources.sh

3. Create an `alltypes` table in the postgres database.
${IMPALA_HOME}/testdata/bin/load-data-sources.sh

4. Create data sources table(alltypes_jdbc_datasource).
${IMPALA_HOME}/bin/impala-shell.sh -i ${IMPALAD} -f\
  ${IMPALA_HOME}/testdata/bin/create-data-source-table.sql

Testing:
- Ran core tests successfully.

Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
---
M fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
A java/ext-data-source/jdbc/pom.xml
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DB2DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessorFactory.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MsSqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MySqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/PostgresDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/exception/JdbcDatabaseAccessException.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
A java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java
A java/ext-data-source/jdbc/src/test/resources/log4j.properties
A java/ext-data-source/jdbc/src/test/resources/test_script.sql
M java/ext-data-source/pom.xml
M testdata/bin/copy-data-sources.sh
M testdata/bin/create-data-source-table.sql
M testdata/bin/create-load-data.sh
A testdata/bin/load-data-sources.sh
M testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test
28 files changed, 2,007 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/17842/9
-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 9
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-5741: Support reading tiny RDBMS tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Support reading tiny RDBMS tables
......................................................................


Patch Set 12:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/13758/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 12
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 15 Aug 2023 21:26:48 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 20:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/17842/19//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17842/19//COMMIT_MSG@11
PS19, Line 11: It has some limitations due to the restrictions of "external data
> There are likely other restrictions?
Added supported database types.
Also added more complex queries: inner join with non jdbc table, inner join with another jdbc table, cross join with order, group by.


http://gerrit.cloudera.org:8080/#/c/17842/19/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java:

http://gerrit.cloudera.org:8080/#/c/17842/19/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@83
PS19, Line 83: entify Jdbc
> What's the purpose of ScanHandle? It's a string and it's just set to some r
Handle to identify JdbcDataSource object. It's assigned with random UUID value in open() API, and verified in getNext() and close() APIs to make sure caller use valid JdbcDataSource object.
Added comments.


http://gerrit.cloudera.org:8080/#/c/17842/19/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@84
PS19, Line 84:   // It's assigned with random UUID value in open() API, and returned to the caller.
> What are we caching? Some comments here would be useful as this isn't obvio
added comments


http://gerrit.cloudera.org:8080/#/c/17842/19/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@86
PS19, Line 86:   // input parameters to make sure the object is valid.
> What exactly is a table Config? Comments would be good.
It's a map to save table properties. Added comments.


http://gerrit.cloudera.org:8080/#/c/17842/19/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@98
PS19, Line 98: 
> typo:
fixed



-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 20
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sat, 30 Sep 2023 00:23:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 23:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/14137/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 23
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Oct 2023 03:37:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 31: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 31
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 10 Oct 2023 02:13:57 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#31) to the change originally created by Fucun Chu. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................

IMPALA-5741: Initial support for reading tiny RDBMS tables

This patch uses the "external data source" mechanism in Impala to
implement data source for querying JDBC.
It has some limitations due to the restrictions of "external data
source":
  - It is not distributed, e.g, fragment is unpartitioned. The queries
    are executed on coordinator.
  - Queries which read following data types from external JDBC tables
    are not supported:
    BINARY, CHAR, DATETIME, and COMPLEX.
  - Only support binary predicates with operators =, !=, <=, >=,
    <, > to be pushed to RDBMS.
  - Following data types are not supported for predicates:
    DECIMAL, TIMESTAMP, DATE, and BINARY.
  - External tables with complex types of columns are not supported.
  - Support is limited to the following databases:
    MySQL, Postgres, Oracle, MSSQL, H2, DB2, and JETHRO_DATA.
  - Catalog V2 is not supported (IMPALA-7131).
  - DataSource objects are not persistent (IMPALA-12375).

Additional fixes are planned on top of this patch.

Source files under jdbc/conf, jdbc/dao and jdbc/exception are
replicated from Hive JDBC Storage Handler.

In order to query the RDBMS tables, the following steps should be
followed (note that existing data source table will be rebuilt):
1. Make sure the Impala cluster has been started.

2. Copy the jar files of JDBC drivers and the data source library into
HDFS.
${IMPALA_HOME}/testdata/bin/copy-ext-data-sources.sh

3. Create an `alltypes` table in the Postgres database.
${IMPALA_HOME}/testdata/bin/load-ext-data-sources.sh

4. Create data source tables (alltypes_jdbc_datasource and
alltypes_jdbc_datasource_2).
${IMPALA_HOME}/bin/impala-shell.sh -f\
  ${IMPALA_HOME}/testdata/bin/create-ext-data-source-table.sql

5. It's ready to run query to access data source tables created
in last step. Don't need to restart Impala cluster.

Testing:
 - Added unit-test for Postgres and ran unit-test with JDBC driver
   postgresql-42.5.1.jar.
 - Ran manual unit-test for MySql with JDBC driver
   mysql-connector-j-8.1.0.jar.
 - Ran core tests successfully.

Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
---
M bin/rat_exclude_files.txt
M fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
A java/ext-data-source/jdbc/pom.xml
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/README.md
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DB2DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessorFactory.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MsSqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MySqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/PostgresDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/exception/JdbcDatabaseAccessException.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
A java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java
A java/ext-data-source/jdbc/src/test/resources/log4j.properties
A java/ext-data-source/jdbc/src/test/resources/test_script.sql
M java/ext-data-source/pom.xml
D testdata/bin/copy-data-sources.sh
A testdata/bin/copy-ext-data-sources.sh
D testdata/bin/create-data-source-table.sql
A testdata/bin/create-ext-data-source-table.sql
M testdata/bin/create-load-data.sh
A testdata/bin/load-ext-data-sources.sh
A testdata/workloads/functional-query/queries/QueryTest/jdbc-data-source.test
A tests/query_test/test_ext_data_sources.py
M tests/query_test/test_queries.py
34 files changed, 2,488 insertions(+), 86 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/17842/31
-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 31
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Kurt Deschler (Code Review)" <ge...@cloudera.org>.
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 30: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17842/30//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17842/30//COMMIT_MSG@13
PS30, Line 13:   - It is not distributed, e.g, fregment is unpartitioned. The queries
nit: fragment


http://gerrit.cloudera.org:8080/#/c/17842/30//COMMIT_MSG@15
PS30, Line 15:   - Queries which read following data types from external jdbc tables
JDBC



-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 30
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 09 Oct 2023 20:52:13 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 19:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/14066/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 19
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 25 Sep 2023 01:09:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#15) to the change originally created by Fucun Chu. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................

IMPALA-5741: Initial support for reading tiny RDBMS tables

This patch uses the "external data source" mechanism in Impala to
implement data source for querying jdbc.
It has some limitations due to the restrictions of "external data
source":
  - It is not distributed.
  - Only support binary predicates with operators =, !=, <=, >=,
    <, > to be pushed to RDBMS.

Source files under jdbc/conf, jdbc/dao and jdbc/exception are
replicated from Hive JDBC Storage Handler.

In order to query the RDBMS tables, the following steps should be
followed (note that existing data source table will be rebuilt):
1. Make sure that the database driver package has been added to the
classpath and the minicluster cluster has been started.

2. Copy the data source library into HDFS.
${IMPALA_HOME}/testdata/bin/copy-data-sources.sh

3. Create an `alltypes` table in the postgres database.
${IMPALA_HOME}/testdata/bin/load-data-sources.sh

4. Create data sources table(alltypes_jdbc_datasource).
${IMPALA_HOME}/bin/impala-shell.sh -i ${IMPALAD} -f\
  ${IMPALA_HOME}/testdata/bin/create-data-source-table.sql

Testing:
 - Added unit-test for Postgres.
 - Ran core tests successfully.

Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
---
M bin/rat_exclude_files.txt
M fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
A java/ext-data-source/jdbc/pom.xml
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/README.md
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DB2DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessorFactory.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MsSqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MySqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/PostgresDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/exception/JdbcDatabaseAccessException.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
A java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java
A java/ext-data-source/jdbc/src/test/resources/log4j.properties
A java/ext-data-source/jdbc/src/test/resources/test_script.sql
M java/ext-data-source/pom.xml
M testdata/bin/copy-data-sources.sh
M testdata/bin/create-data-source-table.sql
M testdata/bin/create-load-data.sh
A testdata/bin/load-data-sources.sh
M testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test
30 files changed, 2,084 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/17842/15
-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 15
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-5741: Support reading tiny RDBMS tables

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Support reading tiny RDBMS tables
......................................................................


Patch Set 9:

(7 comments)

Looked through most of the code (but didn't went to the dao parts deeply). Thanks for the great work!

http://gerrit.cloudera.org:8080/#/c/17842/9/fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java
File fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17842/9/fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java@192
PS9, Line 192: debug
This should probably have a higher log level than debug, e.g. warning


http://gerrit.cloudera.org:8080/#/c/17842/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java:

http://gerrit.cloudera.org:8080/#/c/17842/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@85
PS9, Line 85: cacheClass
Can you use _ postfix for all members?


http://gerrit.cloudera.org:8080/#/c/17842/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@106
PS9, Line 106: prepare
Can the planner call this more than once? If not, then a Precondition could be added to check this. A state could be also added, e.g. PREPARED or PLANNER_PREPARED.


http://gerrit.cloudera.org:8080/#/c/17842/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@153
PS9, Line 153: Recorder
What does Recorder mean here? Shouldn't it be iterator?


http://gerrit.cloudera.org:8080/#/c/17842/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@190
PS9, Line 190:     if (tableConfig == null) 
Can the tableConfig be non-null here? I think that this will be only called in CREATED state.


http://gerrit.cloudera.org:8080/#/c/17842/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@259
PS9, Line 259:     totalNumberOfRecords = dbAccessor.getTotalNumberOfRecords(tableConfig);
The role of totalNumberOfRecords is not clear to me.
If I understand correctly getTotalNumberOfRecords() runs the count(*) for the query without applying LIMIT, and then totalNumberOfRecords is used in getNext() as an additional stop condition besides the iterator.

Why is it not enough to use the iterator.hasNext() as stop condition? Or maybe also use LIMIT as a sanity check in case it was not successfully pushed down.

I see two problem with calling count(*) first:
- without limit the call is potentially more expensive than the original query
- without transactions it is possible that the table is changed between the count(*) and the "real" query


http://gerrit.cloudera.org:8080/#/c/17842/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java:

http://gerrit.cloudera.org:8080/#/c/17842/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java@20
PS9, Line 20: public enum DatabaseType {
The classes in conf and dao directories seem to come from Hive, e.g.:

https://github.com/apache/hive/blob/master/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/conf/DatabaseType.java

I think that this should be documented somehow, e.g. by adding README.md or a comment to the files that come from Hive. A commit message could also contain some info about the relationship to Hive.

Separating code that is nearly the same as in Hive and Impala specific logic would also help with the review, as code from Hive doesn't need that deep inspection.



-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 9
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 13 Mar 2023 09:45:26 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#25) to the change originally created by Fucun Chu. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................

IMPALA-5741: Initial support for reading tiny RDBMS tables

This patch uses the "external data source" mechanism in Impala to
implement data source for querying jdbc.
It has some limitations due to the restrictions of "external data
source":
  - It is not distributed.
  - Don't support following data types for column data from external
    table: BINARY, CHAR, DATETIME, and COMPLEX.
  - Only support binary predicates with operators =, !=, <=, >=,
    <, > to be pushed to RDBMS.
  - Don't support following data types for predicates:
    DECIMAL, TIMESTAMP, DATE, and BINARY.
  - Don't support external tables with complex types of columns.
  - Only support following RDBMS:
    MySQL, Postgres, Oracle, MSSQL, H2, DB2, and JETHRO_DATA.
  - Don't support Catalog V2 (IMPALA-7131)
  - DataSource objects are not persistent (IMPALA-12375)

Source files under jdbc/conf, jdbc/dao and jdbc/exception are
replicated from Hive JDBC Storage Handler.

In order to query the RDBMS tables, the following steps should be
followed (note that existing data source table will be rebuilt):
1. Make sure the Impala cluster has been started.

2. Copy the jar files of jdbc drivers and the data source library into
HDFS.
${IMPALA_HOME}/testdata/bin/copy-ext-data-sources.sh

3. Create an `alltypes` table in the postgres database.
${IMPALA_HOME}/testdata/bin/load-ext-data-sources.sh

4. Create data sources table(alltypes_jdbc_datasource).
${IMPALA_HOME}/bin/impala-shell.sh -f\
  ${IMPALA_HOME}/testdata/bin/create-ext-data-source-table.sql

Testing:
 - Added unit-test for Postgres and ran unit-test with jdbc driver
   postgresql-42.5.1.jar.
 - Ran manual unit-test for MySql with jdbc driver
   mysql-connector-j-8.1.0.jar.
 - Ran core tests successfully.

Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
---
M bin/rat_exclude_files.txt
M fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
A java/ext-data-source/jdbc/pom.xml
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/README.md
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DB2DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessorFactory.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MsSqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MySqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/PostgresDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/exception/JdbcDatabaseAccessException.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
A java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java
A java/ext-data-source/jdbc/src/test/resources/log4j.properties
A java/ext-data-source/jdbc/src/test/resources/test_script.sql
M java/ext-data-source/pom.xml
D testdata/bin/copy-data-sources.sh
A testdata/bin/copy-ext-data-sources.sh
D testdata/bin/create-data-source-table.sql
A testdata/bin/create-ext-data-source-table.sql
M testdata/bin/create-load-data.sh
A testdata/bin/load-ext-data-sources.sh
A testdata/workloads/functional-query/queries/QueryTest/jdbc-data-source.test
A tests/query_test/test_ext_data_sources.py
M tests/query_test/test_queries.py
34 files changed, 2,489 insertions(+), 86 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/17842/25
-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 25
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 21:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9782/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 21
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sat, 30 Sep 2023 15:52:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#30) to the change originally created by Fucun Chu. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................

IMPALA-5741: Initial support for reading tiny RDBMS tables

This patch uses the "external data source" mechanism in Impala to
implement data source for querying jdbc.
It has some limitations due to the restrictions of "external data
source":
  - It is not distributed, e.g, fregment is unpartitioned. The queries
    are executed on coordinator.
  - Queries which read following data types from external jdbc tables
    are not supported:
    BINARY, CHAR, DATETIME, and COMPLEX.
  - Only support binary predicates with operators =, !=, <=, >=,
    <, > to be pushed to RDBMS.
  - Following data types are not supported for predicates:
    DECIMAL, TIMESTAMP, DATE, and BINARY.
  - External tables with complex types of columns are not supported.
  - Support is limited to the following databases:
    MySQL, Postgres, Oracle, MSSQL, H2, DB2, and JETHRO_DATA.
  - Catalog V2 is not supported (IMPALA-7131).
  - DataSource objects are not persistent (IMPALA-12375).

Additional fixes are planned on top of this patch.

Source files under jdbc/conf, jdbc/dao and jdbc/exception are
replicated from Hive JDBC Storage Handler.

In order to query the RDBMS tables, the following steps should be
followed (note that existing data source table will be rebuilt):
1. Make sure the Impala cluster has been started.

2. Copy the jar files of jdbc drivers and the data source library into
HDFS.
${IMPALA_HOME}/testdata/bin/copy-ext-data-sources.sh

3. Create an `alltypes` table in the postgres database.
${IMPALA_HOME}/testdata/bin/load-ext-data-sources.sh

4. Create data source tables (alltypes_jdbc_datasource and
alltypes_jdbc_datasource_2).
${IMPALA_HOME}/bin/impala-shell.sh -f\
  ${IMPALA_HOME}/testdata/bin/create-ext-data-source-table.sql

5. It's ready to run query to access data source tables created
in last step. Don't need to restart Impala cluster.

Testing:
 - Added unit-test for Postgres and ran unit-test with jdbc driver
   postgresql-42.5.1.jar.
 - Ran manual unit-test for MySql with jdbc driver
   mysql-connector-j-8.1.0.jar.
 - Ran core tests successfully.

Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
---
M bin/rat_exclude_files.txt
M fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
A java/ext-data-source/jdbc/pom.xml
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/README.md
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DB2DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessorFactory.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MsSqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MySqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/PostgresDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/exception/JdbcDatabaseAccessException.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
A java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java
A java/ext-data-source/jdbc/src/test/resources/log4j.properties
A java/ext-data-source/jdbc/src/test/resources/test_script.sql
M java/ext-data-source/pom.xml
D testdata/bin/copy-data-sources.sh
A testdata/bin/copy-ext-data-sources.sh
D testdata/bin/create-data-source-table.sql
A testdata/bin/create-ext-data-source-table.sql
M testdata/bin/create-load-data.sh
A testdata/bin/load-ext-data-sources.sh
A testdata/workloads/functional-query/queries/QueryTest/jdbc-data-source.test
A tests/query_test/test_ext_data_sources.py
M tests/query_test/test_queries.py
34 files changed, 2,488 insertions(+), 86 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/17842/30
-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 30
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 29:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/14163/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 29
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 09 Oct 2023 20:28:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 18:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/14065/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 18
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 25 Sep 2023 00:58:52 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 18:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/17842/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17842/16//COMMIT_MSG@13
PS16, Line 13:   - It is not distributed.
> Does this mean the execution happens just on coordinator? Also, do you know
Yes, it's single scan node and does not support partition. I could not find original design document. But from the comments in Jiras and discussion threads, it's technical/design limitation.


http://gerrit.cloudera.org:8080/#/c/17842/16//COMMIT_MSG@22
PS16, Line 22: 
             : In order 
Changed the code to load driver class from driver jar file so that driver jar files are not required to be in the classpath. Driver jar file has to be specified as the table property when creating data source table.


http://gerrit.cloudera.org:8080/#/c/17842/16//COMMIT_MSG@23
PS16, Line 23: In order to query the RDBMS tables, the following steps should be
> nit: 
fixed.


http://gerrit.cloudera.org:8080/#/c/17842/16/fe/src/test/java/org/apache/impala/service/FrontendTest.java
File fe/src/test/java/org/apache/impala/service/FrontendTest.java:

http://gerrit.cloudera.org:8080/#/c/17842/16/fe/src/test/java/org/apache/impala/service/FrontendTest.java@146
PS16, Line 146:     assertEquals(4, resp.rows.size());
> It's not obvious why the resp.rows.size() changed?
added alltypes_jdbc_datasource.


http://gerrit.cloudera.org:8080/#/c/17842/16/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java:

http://gerrit.cloudera.org:8080/#/c/17842/16/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@96
PS16, Line 96:   // are called in right order, e.g. state transitions must be in the below order:
> Comments here briefly describing the different states would be good. Also, 
updated comments


http://gerrit.cloudera.org:8080/#/c/17842/16/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java:

http://gerrit.cloudera.org:8080/#/c/17842/16/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java@22
PS16, Line 22:   // Table properties specified in the create table statement.
> Comments for each property would be good. For instance what does the QUERY 
Added comments for each property.


http://gerrit.cloudera.org:8080/#/c/17842/16/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java:

http://gerrit.cloudera.org:8080/#/c/17842/16/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java@96
PS16, Line 96:       if (type.types.get(0).getType() != TTypeNodeType.SCALAR) {
> This seems like another limitation which we should mention - "No support fo
Add the limitation is commit messages. We should allow user to query the scalar columns. Will address it in following patches.


http://gerrit.cloudera.org:8080/#/c/17842/16/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java:

http://gerrit.cloudera.org:8080/#/c/17842/16/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java@84
PS16, Line 84:     } else {
> We're missing a bunch of types here Date, TimeStamp, Binary, Decimal,...
I guess they were nor sure if the jdbc could could support Date, TimeStamp, Binary, Decimal data types for predicates at that time. Added TODO comment and declare the restrictions in commit messages.



-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 18
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 25 Sep 2023 00:31:59 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 22:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/17842/21//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17842/21//COMMIT_MSG@11
PS21, Line 11: It has some limitations due to the restrictions of "external data
> There also seems to be restrictions on column data types being received fro
Right. Updated commit messages to add restriction for column data type.


http://gerrit.cloudera.org:8080/#/c/17842/21/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java:

http://gerrit.cloudera.org:8080/#/c/17842/21/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@148
PS21, Line 148:     // 1. Check init string again because the call in prepare() was from
              :     // the frontend and used a different instance of this JdbcDataSource class.
              :     if (!convertInitStringToConfiguration(params.getInit_string())) {
              :       return new TOpenResult(
              :           new TStatus(TErrorCode.INTERNAL_ERROR,
              :      
> This is the same check as one in prepare().
This logic is copied from our sample test code AllTypesDataSource.java (https://github.com/apache/impala/blob/master/java/ext-data-source/test/src/main/java/org/apache/impala/extdatasource/AllTypesDataSource.java#L179-L181). There is a comment in AllTypesDataSource.open(), which says "Need to check again because the call in Prepare() was from the frontend and used a different instance of this data source class."
The prepare() API is called by Impala frontend, open() API is called by Impala backend. Frontend and backend create separate  JdbcDataSource objects. We duplicate checking in prepare() and open() for safety. I think we should change external data source mechanism to share JdbcDataSource objects between frontend and backend for the long term, then we can remove these duplicated code.


http://gerrit.cloudera.org:8080/#/c/17842/21/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java:

http://gerrit.cloudera.org:8080/#/c/17842/21/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java@95
PS21, Line 95:   private static boolean isEmptyString(String value) {
> We don't seem to be calling this function anywhere?
This class is replicated from Hive JDBC Storage Handler. I think we should keep the change to a minimum so that we can easily port from Hive again if there is any improvement or bug fixing in Hive JDBC Storage Handler.


http://gerrit.cloudera.org:8080/#/c/17842/21/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java:

http://gerrit.cloudera.org:8080/#/c/17842/21/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java@66
PS21, Line 66:   protected DataSource dbcpDataSource = null;
> `dbcpDataSource_` for instance variables
This class is replicated from Hive JDBC Storage Handler. I think we should keep the original naming convention.


http://gerrit.cloudera.org:8080/#/c/17842/21/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java@68
PS21, Line 68:   public static Cache<String, DataSource> dataSourceCache = CacheBuilder
> naming convention: `dataSourceCache_`
This class is replicated from Hive JDBC Storage Handler. I think we should keep the original naming convention.


http://gerrit.cloudera.org:8080/#/c/17842/21/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java:

http://gerrit.cloudera.org:8080/#/c/17842/21/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java@53
PS21, Line 53:   private final Connection conn;
> proper naming convention for all instance variables.
This class is replicated from Hive JDBC Storage Handler. I think we should keep the original naming convention.


http://gerrit.cloudera.org:8080/#/c/17842/21/testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test
File testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test:

http://gerrit.cloudera.org:8080/#/c/17842/21/testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test@132
PS21, Line 132: 
move tests for jdbc data source to a new file.



-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 22
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Oct 2023 05:56:29 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#22) to the change originally created by Fucun Chu. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................

IMPALA-5741: Initial support for reading tiny RDBMS tables

This patch uses the "external data source" mechanism in Impala to
implement data source for querying jdbc.
It has some limitations due to the restrictions of "external data
source":
  - It is not distributed.
  - Don't support following data types for column data from external
    table: BINARY, CHAR, DATETIME, and COMPLEX.
  - Only support binary predicates with operators =, !=, <=, >=,
    <, > to be pushed to RDBMS.
  - Don't support following data types for predicates:
    DECIMAL, TIMESTAMP, DATE, and BINARY.
  - Don't support external tables with complex types of columns.
  - Only support following RDBMS:
    MySQL, Postgres, Oracle, MSSQL, H2, DB2, and JETHRO_DATA.

Source files under jdbc/conf, jdbc/dao and jdbc/exception are
replicated from Hive JDBC Storage Handler.

In order to query the RDBMS tables, the following steps should be
followed (note that existing data source table will be rebuilt):
1. Make sure the Impala cluster has been started.

2. Copy the jar files of jdbc drivers and the data source library into
HDFS.
${IMPALA_HOME}/testdata/bin/copy-data-sources.sh

3. Create an `alltypes` table in the postgres database.
${IMPALA_HOME}/testdata/bin/load-data-sources.sh

4. Create data sources table(alltypes_jdbc_datasource).
${IMPALA_HOME}/bin/impala-shell.sh -i ${IMPALAD} -f\
  ${IMPALA_HOME}/testdata/bin/create-data-source-table.sql

Testing:
 - Added unit-test for Postgres and ran unit-test with jdbc driver
   postgresql-42.5.1.jar.
 - Ran manual unit-test for MySql with jdbc driver
   mysql-connector-j-8.1.0.jar.
 - Ran core tests successfully.

Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
---
M bin/rat_exclude_files.txt
M fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
A java/ext-data-source/jdbc/pom.xml
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/README.md
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DB2DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessorFactory.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MsSqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MySqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/PostgresDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/exception/JdbcDatabaseAccessException.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
A java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java
A java/ext-data-source/jdbc/src/test/resources/log4j.properties
A java/ext-data-source/jdbc/src/test/resources/test_script.sql
M java/ext-data-source/pom.xml
M testdata/bin/copy-data-sources.sh
M testdata/bin/create-data-source-table.sql
M testdata/bin/create-load-data.sh
A testdata/bin/load-data-sources.sh
A testdata/workloads/functional-query/queries/QueryTest/jdbc-data-source.test
M tests/query_test/test_queries.py
31 files changed, 2,237 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/17842/22
-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 22
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 20:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9781/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 20
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sat, 30 Sep 2023 02:07:32 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 22:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/17842/22//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17842/22//COMMIT_MSG@11
PS22, Line 11: It has some limitations due to the restrictions of "external data
             : source":
nit: I assume this patch is also still limited by IMPALA-7131 (does not work for Catalog V2)? If yes, can we mention it here?


http://gerrit.cloudera.org:8080/#/c/17842/22//COMMIT_MSG@39
PS22, Line 39: -i ${IMPALAD}
nit: This can be removed? impala-shell will connect to first impalad.


http://gerrit.cloudera.org:8080/#/c/17842/22/testdata/bin/copy-data-sources.sh
File testdata/bin/copy-data-sources.sh:

http://gerrit.cloudera.org:8080/#/c/17842/22/testdata/bin/copy-data-sources.sh@28
PS22, Line 28: ${FILESYSTEM_PREFIX}/test-warehouse/data-sources/
nit: Contain this path into its own bash variable and use that for the following commands?


http://gerrit.cloudera.org:8080/#/c/17842/22/testdata/bin/copy-data-sources.sh@40
PS22, Line 40: hadoop fs -put -f \
             :   ${IMPALA_HOME}/fe/target/dependency/postgresql-*.jar \
             :   ${FILESYSTEM_PREFIX}/test-warehouse/data-sources/jdbc-driver/postgresql-jdbc.jar
nit: print something at the end of this script to indicate successful run? Maybe like "Copied xxx.jar into <hdfs_path>.".


http://gerrit.cloudera.org:8080/#/c/17842/22/testdata/workloads/functional-query/queries/QueryTest/jdbc-data-source.test
File testdata/workloads/functional-query/queries/QueryTest/jdbc-data-source.test:

http://gerrit.cloudera.org:8080/#/c/17842/22/testdata/workloads/functional-query/queries/QueryTest/jdbc-data-source.test@28
PS22, Line 28: 11,false,1,1,1,10,1.100000023841858,10.1,'01/02/09','1',2009-01-02 00:11:00.450000000
             : 12,true,2,2,2,20,2.200000047683716,20.2,'01/02/09','2',2009-01-02 00:12:00.460000000
             : 13,false,3,3,3,30,3.299999952316284,30.3,'01/02/09','3',2009-01-02 00:13:00.480000000
             : 14,true,4,4,4,40,4.400000095367432,40.4,'01/02/09','4',2009-01-02 00:14:00.510000000
             : 20,true,0,0,0,0,0,0,'01/03/09','0',2009-01-03 00:20:00.900000000
Tried to run the test myself and I get following error:

-- 2023-10-02 11:22:20,076 INFO     MainThread: Started query b548955a0c538511:6d16651b00000000
-- executing against localhost:21000

select *
from alltypes_jdbc_datasource
where id > 10 and int_col< 5 limit 5;

-- 2023-10-02 11:22:20,324 INFO     MainThread: Started query 95474cb079f2d042:878687c800000000
-- 2023-10-02 11:22:20,439 ERROR    MainThread: Comparing QueryTestResults (expected vs actual):
11,false,1,1,1,10,1.100000023841858,10.1,'01/02/09','1',2009-01-02 00:11:00.450000000 != 11,false,1,1,1,10,1.100000023841858,10.1,'01/02/09','1',2009-01-02 08:11:00.450000000
12,true,2,2,2,20,2.200000047683716,20.2,'01/02/09','2',2009-01-02 00:12:00.460000000 != 12,true,2,2,2,20,2.200000047683716,20.2,'01/02/09','2',2009-01-02 08:12:00.460000000
13,false,3,3,3,30,3.299999952316284,30.3,'01/02/09','3',2009-01-02 00:13:00.480000000 != 13,false,3,3,3,30,3.299999952316284,30.3,'01/02/09','3',2009-01-02 08:13:00.480000000
14,true,4,4,4,40,4.400000095367432,40.4,'01/02/09','4',2009-01-02 00:14:00.510000000 != 14,true,4,4,4,40,4.400000095367432,40.4,'01/02/09','4',2009-01-02 08:14:00.510000000
20,true,0,0,0,0,0,0,'01/03/09','0',2009-01-03 00:20:00.900000000 != 20,true,0,0,0,0,0,0,'01/03/09','0',2009-01-03 08:20:00.900000000

Other test cases passed.
Is there something non-deterministic during data loading?



-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 22
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Oct 2023 18:40:24 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 28:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17842/27//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17842/27//COMMIT_MSG@13
PS27, Line 13:   - It is not distributed, e.g, fregment is unpartitioned. The queries
> Unclear what this actually means. Queries executed on the coordinator?
The fragment is unpartitioned, one instance for each query, and queries are executed on coordinator. The Plan is like:

F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
|  Per-Host Resources: mem-estimate=1.00GB mem-reservation=4.00MB thread-reservation=1
PLAN-ROOT SINK
|  output exprs: id, bool_col, Int_col
|  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
|
00:SCAN DATA SOURCE [functional.alltypes_jdbc_datasource]
   limit: 10


http://gerrit.cloudera.org:8080/#/c/17842/27//COMMIT_MSG@14
PS27, Line 14:     are executed on coordinator.
> Change wording:
fixed.


http://gerrit.cloudera.org:8080/#/c/17842/27//COMMIT_MSG@16
PS27, Line 16:     are not supported:
> Will between/in/like planned be supported in the future?
Yes


http://gerrit.cloudera.org:8080/#/c/17842/27//COMMIT_MSG@43
PS27, Line 43: 
> Does this require mincluster restart?
No



-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 28
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 09 Oct 2023 18:21:05 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Kurt Deschler (Code Review)" <ge...@cloudera.org>.
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 31: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 31
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 09 Oct 2023 21:14:55 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5741: Support reading tiny RDBMS tables

Posted by "Fucun Chu (Code Review)" <ge...@cloudera.org>.
Fucun Chu has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Support reading tiny RDBMS tables
......................................................................

IMPALA-5741: Support reading tiny RDBMS tables

This patch uses the "external data source" mechanism in Impala and
writes a data source for querying jdbc. It has some limitations:
- It is not distributed.
- Only support binary predicates with operators =, !=, <=, >=, <, > to
be pushed to RDBMS

In order to query the RDBMS tables, the following steps should be
followed (note that existing data source table will be rebuilt):
1. Make sure that the database driver package has been added to the
classpath and the minicluster cluster has been started.

2. Copy the data source library into HDFS.
${IMPALA_HOME}/testdata/bin/copy-data-sources.sh

3. Create an `alltypes` table in the postgres database.
${IMPALA_HOME}/testdata/bin/load-data-sources.sh

4. Create data sources table(alltypes_jdbc_datasource).
${IMPALA_HOME}/bin/impala-shell.sh -i ${IMPALAD} -f\
  ${IMPALA_HOME}/testdata/bin/create-data-source-table.sql

Testing:
- Ran core tests successfully.

Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
---
M fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
A java/ext-data-source/jdbc/pom.xml
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DB2DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessorFactory.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MsSqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MySqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/PostgresDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/exception/JdbcDatabaseAccessException.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
A java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java
A java/ext-data-source/jdbc/src/test/resources/log4j.properties
A java/ext-data-source/jdbc/src/test/resources/test_script.sql
M java/ext-data-source/pom.xml
M testdata/bin/copy-data-sources.sh
M testdata/bin/create-data-source-table.sql
M testdata/bin/create-load-data.sh
A testdata/bin/load-data-sources.sh
M testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test
28 files changed, 2,003 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/17842/7
-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 7
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-5741: Support reading tiny RDBMS tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Support reading tiny RDBMS tables
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9776/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 7
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sun, 14 Nov 2021 05:46:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5741: Support reading tiny RDBMS tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Support reading tiny RDBMS tables
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/12547/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 9
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 06 Mar 2023 09:27:16 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5741: Support reading tiny RDBMS tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Support reading tiny RDBMS tables
......................................................................


Patch Set 13:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13759/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 13
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 15 Aug 2023 22:19:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5741: Support reading tiny RDBMS tables

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Support reading tiny RDBMS tables
......................................................................


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17842/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java:

http://gerrit.cloudera.org:8080/#/c/17842/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@259
PS9, Line 259:     }
> The role of totalNumberOfRecords is not clear to me.
Changed code to use totalNumberOfRecords only when schema_.getColsSize() != 0.



-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 12
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 15 Aug 2023 21:02:09 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Kurt Deschler (Code Review)" <ge...@cloudera.org>.
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 27:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17842/29//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17842/29//COMMIT_MSG@20
PS29, Line 20:   - Don't support external tables with complex types of columns.
Please reword all of these the begin with Don't.


http://gerrit.cloudera.org:8080/#/c/17842/29//COMMIT_MSG@23
PS29, Line 23:   - Don't support Catalog V2 (IMPALA-7131)
Support is limited to the following databases:



-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 27
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 09 Oct 2023 20:09:57 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#28) to the change originally created by Fucun Chu. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................

IMPALA-5741: Initial support for reading tiny RDBMS tables

This patch uses the "external data source" mechanism in Impala to
implement data source for querying jdbc.
It has some limitations due to the restrictions of "external data
source":
  - It is not distributed, e.g, fregment is unpartitioned. The queries
    are executed on coordinator.
  - Queries which read following data types from external jdbc tables
    are not supported:
    BINARY, CHAR, DATETIME, and COMPLEX.
  - Only support binary predicates with operators =, !=, <=, >=,
    <, > to be pushed to RDBMS.
  - Don't support following data types for predicates:
    DECIMAL, TIMESTAMP, DATE, and BINARY.
  - Don't support external tables with complex types of columns.
  - Only support following RDBMS:
    MySQL, Postgres, Oracle, MSSQL, H2, DB2, and JETHRO_DATA.
  - Don't support Catalog V2 (IMPALA-7131)
  - DataSource objects are not persistent (IMPALA-12375)

Additional fixes are planned on top of this patch.

Source files under jdbc/conf, jdbc/dao and jdbc/exception are
replicated from Hive JDBC Storage Handler.

In order to query the RDBMS tables, the following steps should be
followed (note that existing data source table will be rebuilt):
1. Make sure the Impala cluster has been started.

2. Copy the jar files of jdbc drivers and the data source library into
HDFS.
${IMPALA_HOME}/testdata/bin/copy-ext-data-sources.sh

3. Create an `alltypes` table in the postgres database.
${IMPALA_HOME}/testdata/bin/load-ext-data-sources.sh

4. Create data source tables (alltypes_jdbc_datasource and
alltypes_jdbc_datasource_2).
${IMPALA_HOME}/bin/impala-shell.sh -f\
  ${IMPALA_HOME}/testdata/bin/create-ext-data-source-table.sql

5. It's ready to run query to access data source tables created
in last step. Don't need to restart Impala cluster.

Testing:
 - Added unit-test for Postgres and ran unit-test with jdbc driver
   postgresql-42.5.1.jar.
 - Ran manual unit-test for MySql with jdbc driver
   mysql-connector-j-8.1.0.jar.
 - Ran core tests successfully.

Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
---
M bin/rat_exclude_files.txt
M fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
A java/ext-data-source/jdbc/pom.xml
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/README.md
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DB2DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessorFactory.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MsSqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MySqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/PostgresDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/exception/JdbcDatabaseAccessException.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
A java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java
A java/ext-data-source/jdbc/src/test/resources/log4j.properties
A java/ext-data-source/jdbc/src/test/resources/test_script.sql
M java/ext-data-source/pom.xml
D testdata/bin/copy-data-sources.sh
A testdata/bin/copy-ext-data-sources.sh
D testdata/bin/create-data-source-table.sql
A testdata/bin/create-ext-data-source-table.sql
M testdata/bin/create-load-data.sh
A testdata/bin/load-ext-data-sources.sh
A testdata/workloads/functional-query/queries/QueryTest/jdbc-data-source.test
A tests/query_test/test_ext_data_sources.py
M tests/query_test/test_queries.py
34 files changed, 2,488 insertions(+), 86 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/17842/28
-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 28
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Kurt Deschler (Code Review)" <ge...@cloudera.org>.
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 27:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17842/27//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17842/27//COMMIT_MSG@13
PS27, Line 13:   - It is not distributed.
Unclear what this actually means. Queries executed on the coordinator?


http://gerrit.cloudera.org:8080/#/c/17842/27//COMMIT_MSG@14
PS27, Line 14:   - Don't support following data types for column data from external
Change wording:
From: Don't support queries... Don't support datatypes XYZ.
To: Queries... are not supported. ... XYZ datatypes are not supported.


http://gerrit.cloudera.org:8080/#/c/17842/27//COMMIT_MSG@16
PS27, Line 16:   - Only support binary predicates with operators =, !=, <=, >=,
Will between/in/like planned be supported in the future?


http://gerrit.cloudera.org:8080/#/c/17842/27//COMMIT_MSG@43
PS27, Line 43: 
Does this require mincluster restart?



-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 27
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 09 Oct 2023 15:51:09 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-5741: Support reading tiny RDBMS tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: WIP IMPALA-5741: Support reading tiny RDBMS tables
......................................................................


Patch Set 5:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/17842/5/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java:

http://gerrit.cloudera.org:8080/#/c/17842/5/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@249
PS5, Line 249:     sb.append(" FROM ").append(tableConfig.get(JdbcStorageConfig.TABLE.getPropertyName()));
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17842/5/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java:

http://gerrit.cloudera.org:8080/#/c/17842/5/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java@21
PS5, Line 21:  * JethroData specific data accessor. This is needed because JethroData JDBC drivers do not
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17842/5/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java:

http://gerrit.cloudera.org:8080/#/c/17842/5/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java@21
PS5, Line 21:  * Oracle specific data accessor. This is needed because Oracle JDBC drivers do not support
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17842/5/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java@37
PS5, Line 37:       // A simple ROWNUM > offset and ROWNUM <= (offset + limit) won't work, it will return nothing
line too long (99 > 90)


http://gerrit.cloudera.org:8080/#/c/17842/5/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java@40
PS5, Line 40:           + ROW_NUM_COLUMN_NAME + " >" + offset + " AND " + ROW_NUM_COLUMN_NAME + " <=" + (
line too long (91 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 03 Nov 2021 02:20:30 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 18:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17842/18/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java:

http://gerrit.cloudera.org:8080/#/c/17842/18/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java@23
PS18, Line 23:   // The database from which the external table comes, such as MySQL,ORACLE,POSTGRES,etc.
line too long (95 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 18
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 25 Sep 2023 00:32:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Abhishek Rawat (Code Review)" <ge...@cloudera.org>.
Abhishek Rawat has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 16:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/17842/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17842/16//COMMIT_MSG@13
PS16, Line 13:   - It is not distributed.
Does this mean the execution happens just on coordinator? Also, do you know why do we have this limitation? It sounds more like missing support rather than some technical/design limitation.


http://gerrit.cloudera.org:8080/#/c/17842/16//COMMIT_MSG@23
PS16, Line 23: classpath and the minicluster cluster has been started.
nit: 
"minicluster cluster" => minicluster
or 
"minicluster cluster" => Impala cluster 
?


http://gerrit.cloudera.org:8080/#/c/17842/16/fe/src/test/java/org/apache/impala/service/FrontendTest.java
File fe/src/test/java/org/apache/impala/service/FrontendTest.java:

http://gerrit.cloudera.org:8080/#/c/17842/16/fe/src/test/java/org/apache/impala/service/FrontendTest.java@146
PS16, Line 146:     assertEquals(4, resp.rows.size());
It's not obvious why the resp.rows.size() changed?


http://gerrit.cloudera.org:8080/#/c/17842/16/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java:

http://gerrit.cloudera.org:8080/#/c/17842/16/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@96
PS16, Line 96:     CREATED,
Comments here briefly describing the different states would be good. Also, the possible state transitions. Can we go from CREATED->CLOSED ?


http://gerrit.cloudera.org:8080/#/c/17842/16/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java:

http://gerrit.cloudera.org:8080/#/c/17842/16/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java@22
PS16, Line 22:   DATABASE_TYPE("database.type", true),
Comments for each property would be good. For instance what does the QUERY mean in this context? Or what's a COLUMN_MAPPING. Would be good to explain each property.


http://gerrit.cloudera.org:8080/#/c/17842/16/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java:

http://gerrit.cloudera.org:8080/#/c/17842/16/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java@96
PS16, Line 96:       if (type.types.get(0).getType() != TTypeNodeType.SCALAR) {
This seems like another limitation which we should mention - "No support for external tables with complex types". If an external table has both scalar and complex types and we want to only query the scalar columns would that use case still be supported?


http://gerrit.cloudera.org:8080/#/c/17842/16/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java:

http://gerrit.cloudera.org:8080/#/c/17842/16/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java@84
PS16, Line 84:     } else {
We're missing a bunch of types here Date, TimeStamp, Binary, Decimal,...
Not sure what the TODO means?
https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java#L139
```
// TODO: we support DECIMAL, TIMESTAMP and DATE but no way to specify it in SQL.
```



-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 16
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Sep 2023 20:16:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5741: Support reading tiny RDBMS tables

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Support reading tiny RDBMS tables
......................................................................


Patch Set 9:

(1 comment)

Fucun Chu, do you plan to continue working on this patch?

http://gerrit.cloudera.org:8080/#/c/17842/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java:

http://gerrit.cloudera.org:8080/#/c/17842/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@131
PS9, Line 131:  dbAccessor = DatabaseAccessorFactory.getAccessor(tableConfig);
Is it possible to create dbAccessor in prepare() API and get totalNumberOfRecords so that we can return totalNumberOfRecords as num_rows_estimate in prepare()?



-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 9
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 04 Aug 2023 00:03:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 23:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/17842/22//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17842/22//COMMIT_MSG@11
PS22, Line 11: It has some limitations due to the restrictions of "external data
             : source":
> nit: I assume this patch is also still limited by IMPALA-7131 (does not wor
mentioned IMPALA-7131 and IMPALA-12375


http://gerrit.cloudera.org:8080/#/c/17842/22//COMMIT_MSG@33
PS22, Line 33: c drivers and the da
> nit: General comment about naming: instead of referring things with *data-s
renamed the scripts as suggested.


http://gerrit.cloudera.org:8080/#/c/17842/22//COMMIT_MSG@39
PS22, Line 39: 
> nit: This can be removed? impala-shell will connect to first impalad.
removed


http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java:

http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@273
PS22, Line 273:     String project;
> Is quoting added somewhere? Postgres will convert unquoted capitals to lowe
The case sensitive column names must be provided in column mapping property in create table statement since Impala is case in-sensitive and column names are save as low case. The column names in column mapping property will be quoted with double quotes if the name is not quoted, and jdbc table name will be quoted if the column mapping is not empty.
See sample create table statement in testdata/bin/create-ext-data-source-table.sql


http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java:

http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java@96
PS22, Line 96:       // TODO: If a target database cannot flatten this view query, try to text
> Are we sure all of the target databases will flatten this view query? Other
This function is replicated from Hive JDBC Storage Handler. I think it had been tested. We did not find issue in Postgres and MySql so far. Add TODO comment and will revisit this code when finding issue on a target database.


http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java@285
PS22, Line 285:     // by calling BasicDataSource.setDriverClassLoader.
> Are these database-specific?
These are configurations for dbcp (Database Connection Pools) component, and not database-specific.


http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java:

http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java@86
PS22, Line 86:     try {
> This should re-throw
fixed


http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java@175
PS22, Line 175:   public void close() throws JdbcDatabaseAccessException {
> probably should re-throw
fixed


http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java:

http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java@39
PS22, Line 39:     return "Select * from (" + sql + ") as \"tmp\" limit " + limit;
> This may not always flatten well. consider text replacement instead.
This class is replicated from Hive JDBC Storage Handler. I guess it should be working. I will leave it as is until we have chance to run test again JethroData database.


http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java:

http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java@54
PS22, Line 54:         joiner.add(String.format("%s %s %s", name, op, value));
> May need quoting here.
jdbc column names in columnMapping are quoted.


http://gerrit.cloudera.org:8080/#/c/17842/22/testdata/bin/copy-data-sources.sh
File testdata/bin/copy-data-sources.sh:

http://gerrit.cloudera.org:8080/#/c/17842/22/testdata/bin/copy-data-sources.sh@28
PS22, Line 28: 
> nit: Contain this path into its own bash variable and use that for the foll
defined variables EXT_DATA_SOURCES_PATH and JDBC_DRIVERS_PATH


http://gerrit.cloudera.org:8080/#/c/17842/22/testdata/bin/copy-data-sources.sh@40
PS22, Line 40: 
             : 
             : 
> nit: print something at the end of this script to indicate successful run? 
Added printing messages as suggested


http://gerrit.cloudera.org:8080/#/c/17842/23/testdata/bin/copy-ext-data-sources.sh
File testdata/bin/copy-ext-data-sources.sh:

http://gerrit.cloudera.org:8080/#/c/17842/23/testdata/bin/copy-ext-data-sources.sh@39
PS23, Line 39: echo "Copied" ${IMPALA_HOME}/java/ext-data-source/test/target/impala-data-source-test-*.jar \
> line too long (93 > 90)
fixed


http://gerrit.cloudera.org:8080/#/c/17842/23/testdata/bin/copy-ext-data-sources.sh@46
PS23, Line 46: echo "Copied" ${IMPALA_HOME}/java/ext-data-source/jdbc/target/impala-data-source-jdbc-*.jar \
> line too long (93 > 90)
fixed


http://gerrit.cloudera.org:8080/#/c/17842/22/testdata/bin/create-data-source-table.sql
File testdata/bin/create-data-source-table.sql:

http://gerrit.cloudera.org:8080/#/c/17842/22/testdata/bin/create-data-source-table.sql@56
PS22, Line 56: 
> Add tests that require quoting
Added a Postgres table with case sensitive column names and table name. Made alltypes_jdbc_datasource_2 point to that new Postgres table. Added column.mapping table property to map columns between Impala and Postgres. Added unit-test to read the new Postgres's table.


http://gerrit.cloudera.org:8080/#/c/17842/22/testdata/workloads/functional-query/queries/QueryTest/jdbc-data-source.test
File testdata/workloads/functional-query/queries/QueryTest/jdbc-data-source.test:

http://gerrit.cloudera.org:8080/#/c/17842/22/testdata/workloads/functional-query/queries/QueryTest/jdbc-data-source.test@28
PS22, Line 28: 11,false,1,1,1,10,1.100000023841858,10.1,'01/02/09','1',2009-01-02 00:11:00.450000000
             : 12,true,2,2,2,20,2.200000047683716,20.2,'01/02/09','2',2009-01-02 00:12:00.460000000
             : 13,false,3,3,3,30,3.299999952316284,30.3,'01/02/09','3',2009-01-02 00:13:00.480000000
             : 14,true,4,4,4,40,4.400000095367432,40.4,'01/02/09','4',2009-01-02 00:14:00.510000000
             : 20,true,0,0,0,0,0,0,'01/03/09','0',2009-01-03 00:20:00.900000000
> Tried to run the test myself and I get following error:
This is time zone issue. jdbc driver use system time zone by default, but Impala use UTC by default. Fixed the issue when reading the value of timestamp column from RecordSet by calling ResultSet.getTimestamp() with UTC time zone.



-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 23
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Oct 2023 03:48:07 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#24) to the change originally created by Fucun Chu. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................

IMPALA-5741: Initial support for reading tiny RDBMS tables

This patch uses the "external data source" mechanism in Impala to
implement data source for querying jdbc.
It has some limitations due to the restrictions of "external data
source":
  - It is not distributed.
  - Don't support following data types for column data from external
    table: BINARY, CHAR, DATETIME, and COMPLEX.
  - Only support binary predicates with operators =, !=, <=, >=,
    <, > to be pushed to RDBMS.
  - Don't support following data types for predicates:
    DECIMAL, TIMESTAMP, DATE, and BINARY.
  - Don't support external tables with complex types of columns.
  - Only support following RDBMS:
    MySQL, Postgres, Oracle, MSSQL, H2, DB2, and JETHRO_DATA.
  - Don't support Catalog V2 (IMPALA-7131)
  - DataSource objects are not persistent (IMPALA-12375)

Source files under jdbc/conf, jdbc/dao and jdbc/exception are
replicated from Hive JDBC Storage Handler.

In order to query the RDBMS tables, the following steps should be
followed (note that existing data source table will be rebuilt):
1. Make sure the Impala cluster has been started.

2. Copy the jar files of jdbc drivers and the data source library into
HDFS.
${IMPALA_HOME}/testdata/bin/copy-data-sources.sh

3. Create an `alltypes` table in the postgres database.
${IMPALA_HOME}/testdata/bin/load-data-sources.sh

4. Create data sources table(alltypes_jdbc_datasource).
${IMPALA_HOME}/bin/impala-shell.sh -f\
  ${IMPALA_HOME}/testdata/bin/create-data-source-table.sql

Testing:
 - Added unit-test for Postgres and ran unit-test with jdbc driver
   postgresql-42.5.1.jar.
 - Ran manual unit-test for MySql with jdbc driver
   mysql-connector-j-8.1.0.jar.
 - Ran core tests successfully.

Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
---
M bin/rat_exclude_files.txt
M fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
A java/ext-data-source/jdbc/pom.xml
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/README.md
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DB2DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessorFactory.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MsSqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MySqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/PostgresDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/exception/JdbcDatabaseAccessException.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
A java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java
A java/ext-data-source/jdbc/src/test/resources/log4j.properties
A java/ext-data-source/jdbc/src/test/resources/test_script.sql
M java/ext-data-source/pom.xml
D testdata/bin/copy-data-sources.sh
A testdata/bin/copy-ext-data-sources.sh
D testdata/bin/create-data-source-table.sql
A testdata/bin/create-ext-data-source-table.sql
M testdata/bin/create-load-data.sh
A testdata/bin/load-ext-data-sources.sh
A testdata/workloads/functional-query/queries/QueryTest/jdbc-data-source.test
M tests/query_test/test_queries.py
33 files changed, 2,408 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/17842/24
-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 24
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 25:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/17842/24//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17842/24//COMMIT_MSG@35
PS24, Line 35: copy-ext-data-source
> nit: copy-ext-data-sources.sh
fixed


http://gerrit.cloudera.org:8080/#/c/17842/24//COMMIT_MSG@42
PS24, Line 42: create-ext-data-source-table
> nit: create-ext-data-source-table.sql
fixed


http://gerrit.cloudera.org:8080/#/c/17842/25/tests/query_test/test_ext_data_sources.py
File tests/query_test/test_ext_data_sources.py:

http://gerrit.cloudera.org:8080/#/c/17842/25/tests/query_test/test_ext_data_sources.py@61
PS25, Line 61:   
> line has trailing whitespace
fixed


http://gerrit.cloudera.org:8080/#/c/17842/25/tests/query_test/test_ext_data_sources.py@61
PS25, Line 61:   
> flake8: W293 blank line contains whitespace
fixed


http://gerrit.cloudera.org:8080/#/c/17842/25/tests/query_test/test_ext_data_sources.py@85
PS25, Line 85: 
> flake8: W391 blank line at end of file
fixed


http://gerrit.cloudera.org:8080/#/c/17842/24/tests/query_test/test_queries.py
File tests/query_test/test_queries.py:

http://gerrit.cloudera.org:8080/#/c/17842/24/tests/query_test/test_queries.py@243
PS24, Line 243:     # These results will vary slightly depending on how the values get split up
              :     # so only run with 1 node and on text.
> nit: Can we add metadata test verifying the values of table properties __IM
Added test case to verify table properties.
Moved data sources related test cases to a new file



-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 25
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Oct 2023 00:02:33 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#27) to the change originally created by Fucun Chu. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................

IMPALA-5741: Initial support for reading tiny RDBMS tables

This patch uses the "external data source" mechanism in Impala to
implement data source for querying jdbc.
It has some limitations due to the restrictions of "external data
source":
  - It is not distributed.
  - Don't support following data types for column data from external
    table: BINARY, CHAR, DATETIME, and COMPLEX.
  - Only support binary predicates with operators =, !=, <=, >=,
    <, > to be pushed to RDBMS.
  - Don't support following data types for predicates:
    DECIMAL, TIMESTAMP, DATE, and BINARY.
  - Don't support external tables with complex types of columns.
  - Only support following RDBMS:
    MySQL, Postgres, Oracle, MSSQL, H2, DB2, and JETHRO_DATA.
  - Don't support Catalog V2 (IMPALA-7131)
  - DataSource objects are not persistent (IMPALA-12375)

Source files under jdbc/conf, jdbc/dao and jdbc/exception are
replicated from Hive JDBC Storage Handler.

In order to query the RDBMS tables, the following steps should be
followed (note that existing data source table will be rebuilt):
1. Make sure the Impala cluster has been started.

2. Copy the jar files of jdbc drivers and the data source library into
HDFS.
${IMPALA_HOME}/testdata/bin/copy-ext-data-sources.sh

3. Create an `alltypes` table in the postgres database.
${IMPALA_HOME}/testdata/bin/load-ext-data-sources.sh

4. Create data sources table(alltypes_jdbc_datasource).
${IMPALA_HOME}/bin/impala-shell.sh -f\
  ${IMPALA_HOME}/testdata/bin/create-ext-data-source-table.sql

Testing:
 - Added unit-test for Postgres and ran unit-test with jdbc driver
   postgresql-42.5.1.jar.
 - Ran manual unit-test for MySql with jdbc driver
   mysql-connector-j-8.1.0.jar.
 - Ran core tests successfully.

Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
---
M bin/rat_exclude_files.txt
M fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
A java/ext-data-source/jdbc/pom.xml
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/README.md
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DB2DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessorFactory.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MsSqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MySqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/PostgresDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/exception/JdbcDatabaseAccessException.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
A java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java
A java/ext-data-source/jdbc/src/test/resources/log4j.properties
A java/ext-data-source/jdbc/src/test/resources/test_script.sql
M java/ext-data-source/pom.xml
D testdata/bin/copy-data-sources.sh
A testdata/bin/copy-ext-data-sources.sh
D testdata/bin/create-data-source-table.sql
A testdata/bin/create-ext-data-source-table.sql
M testdata/bin/create-load-data.sh
A testdata/bin/load-ext-data-sources.sh
A testdata/workloads/functional-query/queries/QueryTest/jdbc-data-source.test
A tests/query_test/test_ext_data_sources.py
M tests/query_test/test_queries.py
34 files changed, 2,488 insertions(+), 86 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/17842/27
-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 27
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-5741: Support reading tiny RDBMS tables

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#12) to the change originally created by Fucun Chu. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Support reading tiny RDBMS tables
......................................................................

IMPALA-5741: Support reading tiny RDBMS tables

This patch uses the "external data source" mechanism in Impala to
implement data source for querying jdbc.
It has some limitations due to the restrictions of "external data
source":
  - It is not distributed.
  - Only support binary predicates with operators =, !=, <=, >=,
    <, > to be pushed to RDBMS.

Source files under jdbc/conf, jdbc/dao and jdbc/exception are
replicated from Hive JDBC Storage Handler.

In order to query the RDBMS tables, the following steps should be
followed (note that existing data source table will be rebuilt):
1. Make sure that the database driver package has been added to the
classpath and the minicluster cluster has been started.

2. Copy the data source library into HDFS.
${IMPALA_HOME}/testdata/bin/copy-data-sources.sh

3. Create an `alltypes` table in the postgres database.
${IMPALA_HOME}/testdata/bin/load-data-sources.sh

4. Create data sources table(alltypes_jdbc_datasource).
${IMPALA_HOME}/bin/impala-shell.sh -i ${IMPALAD} -f\
  ${IMPALA_HOME}/testdata/bin/create-data-source-table.sql

Testing:
- Ran core tests successfully.

Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
---
M fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
A java/ext-data-source/jdbc/pom.xml
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/README.md
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DB2DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessorFactory.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MsSqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MySqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/PostgresDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/exception/JdbcDatabaseAccessException.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
A java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java
A java/ext-data-source/jdbc/src/test/resources/log4j.properties
A java/ext-data-source/jdbc/src/test/resources/test_script.sql
M java/ext-data-source/pom.xml
M testdata/bin/copy-data-sources.sh
M testdata/bin/create-data-source-table.sql
M testdata/bin/create-load-data.sh
A testdata/bin/load-data-sources.sh
M testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test
29 files changed, 2,083 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/17842/12
-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 12
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#29) to the change originally created by Fucun Chu. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................

IMPALA-5741: Initial support for reading tiny RDBMS tables

This patch uses the "external data source" mechanism in Impala to
implement data source for querying jdbc.
It has some limitations due to the restrictions of "external data
source":
  - It is not distributed, e.g, fregment is unpartitioned. The queries
    are executed on coordinator.
  - Queries which read following data types from external jdbc tables
    are not supported:
    BINARY, CHAR, DATETIME, and COMPLEX.
  - Only support binary predicates with operators =, !=, <=, >=,
    <, > to be pushed to RDBMS.
  - Don't support following data types for predicates:
    DECIMAL, TIMESTAMP, DATE, and BINARY.
  - Don't support external tables with complex types of columns.
  - Only support following RDBMS:
    MySQL, Postgres, Oracle, MSSQL, H2, DB2, and JETHRO_DATA.
  - Don't support Catalog V2 (IMPALA-7131)
  - DataSource objects are not persistent (IMPALA-12375)

Additional fixes are planned on top of this patch.

Source files under jdbc/conf, jdbc/dao and jdbc/exception are
replicated from Hive JDBC Storage Handler.

In order to query the RDBMS tables, the following steps should be
followed (note that existing data source table will be rebuilt):
1. Make sure the Impala cluster has been started.

2. Copy the jar files of jdbc drivers and the data source library into
HDFS.
${IMPALA_HOME}/testdata/bin/copy-ext-data-sources.sh

3. Create an `alltypes` table in the postgres database.
${IMPALA_HOME}/testdata/bin/load-ext-data-sources.sh

4. Create data source tables (alltypes_jdbc_datasource and
alltypes_jdbc_datasource_2).
${IMPALA_HOME}/bin/impala-shell.sh -f\
  ${IMPALA_HOME}/testdata/bin/create-ext-data-source-table.sql

5. It's ready to run query to access data source tables created
in last step. Don't need to restart Impala cluster.

Testing:
 - Added unit-test for Postgres and ran unit-test with jdbc driver
   postgresql-42.5.1.jar.
 - Ran manual unit-test for MySql with jdbc driver
   mysql-connector-j-8.1.0.jar.
 - Ran core tests successfully.

Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
---
M bin/rat_exclude_files.txt
M fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
A java/ext-data-source/jdbc/pom.xml
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/README.md
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DB2DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessorFactory.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MsSqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MySqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/PostgresDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/exception/JdbcDatabaseAccessException.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
A java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java
A java/ext-data-source/jdbc/src/test/resources/log4j.properties
A java/ext-data-source/jdbc/src/test/resources/test_script.sql
M java/ext-data-source/pom.xml
D testdata/bin/copy-data-sources.sh
A testdata/bin/copy-ext-data-sources.sh
D testdata/bin/create-data-source-table.sql
A testdata/bin/create-ext-data-source-table.sql
M testdata/bin/create-load-data.sh
A testdata/bin/load-ext-data-sources.sh
A testdata/workloads/functional-query/queries/QueryTest/jdbc-data-source.test
A tests/query_test/test_ext_data_sources.py
M tests/query_test/test_queries.py
34 files changed, 2,488 insertions(+), 86 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/17842/29
-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 29
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 30:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17842/29//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17842/29//COMMIT_MSG@20
PS29, Line 20:   - Following data types are not supported for predicates:
> Please reword all of these the begin with Don't.
fixed


http://gerrit.cloudera.org:8080/#/c/17842/29//COMMIT_MSG@23
PS29, Line 23:   - Support is limited to the following databases:
> Support is limited to the following databases:
fixed



-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 30
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 09 Oct 2023 20:34:51 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-5741: Support reading tiny RDBMS tables

Posted by "Fucun Chu (Code Review)" <ge...@cloudera.org>.
Fucun Chu has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/17842 )

Change subject: WIP IMPALA-5741: Support reading tiny RDBMS tables
......................................................................

WIP IMPALA-5741: Support reading tiny RDBMS tables

This patch uses the "external data source" mechanism in Impala and
writes a data source for querying jdbc. It has some limitations:
- It is not distributed.
- Only support binary predicates with operators =, !=, <=, >=, <, > to
be pushed to RDBMS

In order to query the RDBMS tables, the following steps should be
followed (note that existing data source table will be rebuilt):
1. Make sure that the database driver package has been added to the
classpath and the minicluster cluster has been started.

2. Copy the data source library into HDFS.
${IMPALA_HOME}/testdata/bin/copy-data-sources.sh

3. Create an `alltypes` table in the postgres database.
${IMPALA_HOME}/testdata/bin/load-data-sources.sh

4. Create data sources table(alltypes_jdbc_datasource).
${IMPALA_HOME}/bin/impala-shell.sh -i ${IMPALAD} -f\
  ${IMPALA_HOME}/testdata/bin/create-data-source-table.sql

Testing:
- Ran core tests successfully.

Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
---
M fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
A java/ext-data-source/jdbc/pom.xml
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DB2DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessorFactory.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MsSqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MySqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/PostgresDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/exception/JdbcDatabaseAccessException.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
A java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java
A java/ext-data-source/jdbc/src/test/resources/log4j.properties
A java/ext-data-source/jdbc/src/test/resources/test_script.sql
M java/ext-data-source/pom.xml
M testdata/bin/copy-data-sources.sh
M testdata/bin/create-data-source-table.sql
M testdata/bin/create-load-data.sh
A testdata/bin/load-data-sources.sh
M testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test
28 files changed, 1,977 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/17842/6
-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 6
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-5741: Support reading tiny RDBMS tables

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#13) to the change originally created by Fucun Chu. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Support reading tiny RDBMS tables
......................................................................

IMPALA-5741: Support reading tiny RDBMS tables

This patch uses the "external data source" mechanism in Impala to
implement data source for querying jdbc.
It has some limitations due to the restrictions of "external data
source":
  - It is not distributed.
  - Only support binary predicates with operators =, !=, <=, >=,
    <, > to be pushed to RDBMS.

Source files under jdbc/conf, jdbc/dao and jdbc/exception are
replicated from Hive JDBC Storage Handler.

In order to query the RDBMS tables, the following steps should be
followed (note that existing data source table will be rebuilt):
1. Make sure that the database driver package has been added to the
classpath and the minicluster cluster has been started.

2. Copy the data source library into HDFS.
${IMPALA_HOME}/testdata/bin/copy-data-sources.sh

3. Create an `alltypes` table in the postgres database.
${IMPALA_HOME}/testdata/bin/load-data-sources.sh

4. Create data sources table(alltypes_jdbc_datasource).
${IMPALA_HOME}/bin/impala-shell.sh -i ${IMPALAD} -f\
  ${IMPALA_HOME}/testdata/bin/create-data-source-table.sql

Testing:
- Ran core tests successfully.

Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
---
M bin/rat_exclude_files.txt
M fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
A java/ext-data-source/jdbc/pom.xml
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/README.md
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DB2DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessorFactory.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MsSqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MySqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/PostgresDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/exception/JdbcDatabaseAccessException.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
A java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java
A java/ext-data-source/jdbc/src/test/resources/log4j.properties
A java/ext-data-source/jdbc/src/test/resources/test_script.sql
M java/ext-data-source/pom.xml
M testdata/bin/copy-data-sources.sh
M testdata/bin/create-data-source-table.sql
M testdata/bin/create-load-data.sh
A testdata/bin/load-data-sources.sh
M testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test
30 files changed, 2,084 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/17842/13
-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 13
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 27:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/14141/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 27
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Oct 2023 04:42:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 24:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/14138/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 24
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Oct 2023 04:13:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 26:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/14140/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 26
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Oct 2023 00:30:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 31: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 31
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 09 Oct 2023 21:15:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#18) to the change originally created by Fucun Chu. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................

IMPALA-5741: Initial support for reading tiny RDBMS tables

This patch uses the "external data source" mechanism in Impala to
implement data source for querying jdbc.
It has some limitations due to the restrictions of "external data
source":
  - It is not distributed.
  - Only support binary predicates with operators =, !=, <=, >=,
    <, > to be pushed to RDBMS.
  - Don't support following data type for predicates:
    DECIMAL, TIMESTAMP, DATE, and binary.
  - Don't support external tables with complex types of columns.

Source files under jdbc/conf, jdbc/dao and jdbc/exception are
replicated from Hive JDBC Storage Handler.

In order to query the RDBMS tables, the following steps should be
followed (note that existing data source table will be rebuilt):
1. Make sure the Impala cluster has been started.

2. Copy the jar files of jdbc drivers and the data source library into
HDFS.
${IMPALA_HOME}/testdata/bin/copy-data-sources.sh

3. Create an `alltypes` table in the postgres database.
${IMPALA_HOME}/testdata/bin/load-data-sources.sh

4. Create data sources table(alltypes_jdbc_datasource).
${IMPALA_HOME}/bin/impala-shell.sh -i ${IMPALAD} -f\
  ${IMPALA_HOME}/testdata/bin/create-data-source-table.sql

Testing:
 - Added unit-test for Postgres and ran unit-test with jdbc driver
   postgresql-42.5.1.jar.
 - Ran manual unit-test for MySql with jdbc driver
   mysql-connector-j-8.1.0.jar.
 - Ran core tests successfully.

Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
---
M bin/rat_exclude_files.txt
M fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
A java/ext-data-source/jdbc/pom.xml
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/README.md
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DB2DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessorFactory.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MsSqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MySqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/PostgresDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/exception/JdbcDatabaseAccessException.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
A java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java
A java/ext-data-source/jdbc/src/test/resources/log4j.properties
A java/ext-data-source/jdbc/src/test/resources/test_script.sql
M java/ext-data-source/pom.xml
M testdata/bin/copy-data-sources.sh
M testdata/bin/create-data-source-table.sql
M testdata/bin/create-load-data.sh
A testdata/bin/load-data-sources.sh
M testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test
30 files changed, 2,155 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/17842/18
-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 18
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#19) to the change originally created by Fucun Chu. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................

IMPALA-5741: Initial support for reading tiny RDBMS tables

This patch uses the "external data source" mechanism in Impala to
implement data source for querying jdbc.
It has some limitations due to the restrictions of "external data
source":
  - It is not distributed.
  - Only support binary predicates with operators =, !=, <=, >=,
    <, > to be pushed to RDBMS.
  - Don't support following data type for predicates:
    DECIMAL, TIMESTAMP, DATE, and binary.
  - Don't support external tables with complex types of columns.

Source files under jdbc/conf, jdbc/dao and jdbc/exception are
replicated from Hive JDBC Storage Handler.

In order to query the RDBMS tables, the following steps should be
followed (note that existing data source table will be rebuilt):
1. Make sure the Impala cluster has been started.

2. Copy the jar files of jdbc drivers and the data source library into
HDFS.
${IMPALA_HOME}/testdata/bin/copy-data-sources.sh

3. Create an `alltypes` table in the postgres database.
${IMPALA_HOME}/testdata/bin/load-data-sources.sh

4. Create data sources table(alltypes_jdbc_datasource).
${IMPALA_HOME}/bin/impala-shell.sh -i ${IMPALAD} -f\
  ${IMPALA_HOME}/testdata/bin/create-data-source-table.sql

Testing:
 - Added unit-test for Postgres and ran unit-test with jdbc driver
   postgresql-42.5.1.jar.
 - Ran manual unit-test for MySql with jdbc driver
   mysql-connector-j-8.1.0.jar.
 - Ran core tests successfully.

Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
---
M bin/rat_exclude_files.txt
M fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
A java/ext-data-source/jdbc/pom.xml
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/README.md
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DB2DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessorFactory.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MsSqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MySqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/PostgresDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/exception/JdbcDatabaseAccessException.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
A java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java
A java/ext-data-source/jdbc/src/test/resources/log4j.properties
A java/ext-data-source/jdbc/src/test/resources/test_script.sql
M java/ext-data-source/pom.xml
M testdata/bin/copy-data-sources.sh
M testdata/bin/create-data-source-table.sql
M testdata/bin/create-load-data.sh
A testdata/bin/load-data-sources.sh
M testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test
30 files changed, 2,156 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/17842/19
-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 19
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Abhishek Rawat (Code Review)" <ge...@cloudera.org>.
Abhishek Rawat has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 19:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/17842/19//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17842/19//COMMIT_MSG@11
PS19, Line 11: It has some limitations due to the restrictions of "external data
There are likely other restrictions?
Can we run more complex queries where we perform JOIN/GROUP-BY/OLAP/SORT on external data source table with another external data source table or regular table? Do we support generating these kind of plans today?


http://gerrit.cloudera.org:8080/#/c/17842/19/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java:

http://gerrit.cloudera.org:8080/#/c/17842/19/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@83
PS19, Line 83: scanHandle_
What's the purpose of ScanHandle? It's a string and it's just set to some random UUID?


http://gerrit.cloudera.org:8080/#/c/17842/19/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@84
PS19, Line 84:   private boolean cacheClass_ = false;
What are we caching? Some comments here would be useful as this isn't obvious.


http://gerrit.cloudera.org:8080/#/c/17842/19/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@86
PS19, Line 86:   private Configuration tableConfig_;
What exactly is a table Config? Comments would be good.


http://gerrit.cloudera.org:8080/#/c/17842/19/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@98
PS19, Line 98: craete
typo:
'craete' -> 'create'



-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 19
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 29 Sep 2023 21:28:15 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#20) to the change originally created by Fucun Chu. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................

IMPALA-5741: Initial support for reading tiny RDBMS tables

This patch uses the "external data source" mechanism in Impala to
implement data source for querying jdbc.
It has some limitations due to the restrictions of "external data
source":
  - It is not distributed.
  - Only support binary predicates with operators =, !=, <=, >=,
    <, > to be pushed to RDBMS.
  - Don't support following data types for predicates:
    DECIMAL, TIMESTAMP, DATE, and binary.
  - Don't support external tables with complex types of columns.
  - Only support following RDBMS:
    MySQL, Postgres, Oracle, MSSQL, H2, DB2, and JETHRO_DATA.

Source files under jdbc/conf, jdbc/dao and jdbc/exception are
replicated from Hive JDBC Storage Handler.

In order to query the RDBMS tables, the following steps should be
followed (note that existing data source table will be rebuilt):
1. Make sure the Impala cluster has been started.

2. Copy the jar files of jdbc drivers and the data source library into
HDFS.
${IMPALA_HOME}/testdata/bin/copy-data-sources.sh

3. Create an `alltypes` table in the postgres database.
${IMPALA_HOME}/testdata/bin/load-data-sources.sh

4. Create data sources table(alltypes_jdbc_datasource).
${IMPALA_HOME}/bin/impala-shell.sh -i ${IMPALAD} -f\
  ${IMPALA_HOME}/testdata/bin/create-data-source-table.sql

Testing:
 - Added unit-test for Postgres and ran unit-test with jdbc driver
   postgresql-42.5.1.jar.
 - Ran manual unit-test for MySql with jdbc driver
   mysql-connector-j-8.1.0.jar.
 - Ran core tests successfully.

Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
---
M bin/rat_exclude_files.txt
M fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
A java/ext-data-source/jdbc/pom.xml
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/README.md
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DB2DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessorFactory.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MsSqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MySqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/PostgresDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/exception/JdbcDatabaseAccessException.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
A java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java
A java/ext-data-source/jdbc/src/test/resources/log4j.properties
A java/ext-data-source/jdbc/src/test/resources/test_script.sql
M java/ext-data-source/pom.xml
M testdata/bin/copy-data-sources.sh
M testdata/bin/create-data-source-table.sql
M testdata/bin/create-load-data.sh
A testdata/bin/load-data-sources.sh
M testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test
30 files changed, 2,229 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/17842/20
-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 20
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 20:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/14119/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 20
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sat, 30 Sep 2023 00:45:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 21:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/14121/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 21
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sat, 30 Sep 2023 16:13:48 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP IMPALA-5741: Support reading tiny RDBMS tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: WIP IMPALA-5741: Support reading tiny RDBMS tables
......................................................................


Patch Set 5:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/9712/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 03 Nov 2021 02:42:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5741: Support reading tiny RDBMS tables

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Support reading tiny RDBMS tables
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17842/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java:

http://gerrit.cloudera.org:8080/#/c/17842/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@131
PS9, Line 131:  dbAccessor = DatabaseAccessorFactory.getAccessor(tableConfig);
> Is it possible to create dbAccessor in prepare() API and get totalNumberOfR
Planner create ExternalDataSourceExecutor and JdbcDataSource instances on stack, then call prepare() API. Backend create separate instances of ExternalDataSourceExecutor and JdbcDataSource to call open(), getNext() and close() APIs. Settings of member variables in prepare() API are not passed to subsequent APIs. We could create dbAccessor in prepare(), but we have to create it again in open().



-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 9
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 04 Aug 2023 20:10:30 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................

IMPALA-5741: Initial support for reading tiny RDBMS tables

This patch uses the "external data source" mechanism in Impala to
implement data source for querying JDBC.
It has some limitations due to the restrictions of "external data
source":
  - It is not distributed, e.g, fragment is unpartitioned. The queries
    are executed on coordinator.
  - Queries which read following data types from external JDBC tables
    are not supported:
    BINARY, CHAR, DATETIME, and COMPLEX.
  - Only support binary predicates with operators =, !=, <=, >=,
    <, > to be pushed to RDBMS.
  - Following data types are not supported for predicates:
    DECIMAL, TIMESTAMP, DATE, and BINARY.
  - External tables with complex types of columns are not supported.
  - Support is limited to the following databases:
    MySQL, Postgres, Oracle, MSSQL, H2, DB2, and JETHRO_DATA.
  - Catalog V2 is not supported (IMPALA-7131).
  - DataSource objects are not persistent (IMPALA-12375).

Additional fixes are planned on top of this patch.

Source files under jdbc/conf, jdbc/dao and jdbc/exception are
replicated from Hive JDBC Storage Handler.

In order to query the RDBMS tables, the following steps should be
followed (note that existing data source table will be rebuilt):
1. Make sure the Impala cluster has been started.

2. Copy the jar files of JDBC drivers and the data source library into
HDFS.
${IMPALA_HOME}/testdata/bin/copy-ext-data-sources.sh

3. Create an `alltypes` table in the Postgres database.
${IMPALA_HOME}/testdata/bin/load-ext-data-sources.sh

4. Create data source tables (alltypes_jdbc_datasource and
alltypes_jdbc_datasource_2).
${IMPALA_HOME}/bin/impala-shell.sh -f\
  ${IMPALA_HOME}/testdata/bin/create-ext-data-source-table.sql

5. It's ready to run query to access data source tables created
in last step. Don't need to restart Impala cluster.

Testing:
 - Added unit-test for Postgres and ran unit-test with JDBC driver
   postgresql-42.5.1.jar.
 - Ran manual unit-test for MySql with JDBC driver
   mysql-connector-j-8.1.0.jar.
 - Ran core tests successfully.

Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Reviewed-on: http://gerrit.cloudera.org:8080/17842
Reviewed-by: Wenzhe Zhou <wz...@cloudera.com>
Reviewed-by: Kurt Deschler <kd...@cloudera.com>
Reviewed-by: Riza Suminto <ri...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M bin/rat_exclude_files.txt
M fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
A java/ext-data-source/jdbc/pom.xml
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/README.md
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DB2DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessorFactory.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MsSqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MySqlDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/PostgresDatabaseAccessor.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/exception/JdbcDatabaseAccessException.java
A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
A java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java
A java/ext-data-source/jdbc/src/test/resources/log4j.properties
A java/ext-data-source/jdbc/src/test/resources/test_script.sql
M java/ext-data-source/pom.xml
D testdata/bin/copy-data-sources.sh
A testdata/bin/copy-ext-data-sources.sh
D testdata/bin/create-data-source-table.sql
A testdata/bin/create-ext-data-source-table.sql
M testdata/bin/create-load-data.sh
A testdata/bin/load-ext-data-sources.sh
A testdata/workloads/functional-query/queries/QueryTest/jdbc-data-source.test
A tests/query_test/test_ext_data_sources.py
M tests/query_test/test_queries.py
34 files changed, 2,488 insertions(+), 86 deletions(-)

Approvals:
  Wenzhe Zhou: Looks good to me, but someone else must approve
  Kurt Deschler: Looks good to me, but someone else must approve
  Riza Suminto: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 32
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 28:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/14161/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 28
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 09 Oct 2023 18:33:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 19:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17842/18/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java
File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java:

http://gerrit.cloudera.org:8080/#/c/17842/18/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java@23
PS18, Line 23:   // The database from which the external table comes, such as MySQL, ORACLE, POSTGRES,
> line too long (95 > 90)
fixed



-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 19
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 25 Sep 2023 00:42:14 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 21: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 21
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sat, 30 Sep 2023 20:28:40 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 25:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/14139/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 25
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Oct 2023 00:25:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 26: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17842/26/tests/query_test/test_ext_data_sources.py
File tests/query_test/test_ext_data_sources.py:

http://gerrit.cloudera.org:8080/#/c/17842/26/tests/query_test/test_ext_data_sources.py@63
PS26, Line 63: varify
nit: verify



-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 26
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Oct 2023 03:51:14 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 27: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 27
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Oct 2023 15:15:42 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 24:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17842/24//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17842/24//COMMIT_MSG@35
PS24, Line 35: copy-data-sources.sh
nit: copy-ext-data-sources.sh


http://gerrit.cloudera.org:8080/#/c/17842/24//COMMIT_MSG@42
PS24, Line 42: create-data-source-table.sql
nit: create-ext-data-source-table.sql


http://gerrit.cloudera.org:8080/#/c/17842/22/testdata/workloads/functional-query/queries/QueryTest/jdbc-data-source.test
File testdata/workloads/functional-query/queries/QueryTest/jdbc-data-source.test:

http://gerrit.cloudera.org:8080/#/c/17842/22/testdata/workloads/functional-query/queries/QueryTest/jdbc-data-source.test@28
PS22, Line 28: 11,false,1,1,1,10,1.100000023841858,10.1,'01/02/09','1',2009-01-02 00:11:00.450000000
             : 12,true,2,2,2,20,2.200000047683716,20.2,'01/02/09','2',2009-01-02 00:12:00.460000000
             : 13,false,3,3,3,30,3.299999952316284,30.3,'01/02/09','3',2009-01-02 00:13:00.480000000
             : 14,true,4,4,4,40,4.400000095367432,40.4,'01/02/09','4',2009-01-02 00:14:00.510000000
             : 20,true,0,0,0,0,0,0,'01/03/09','0',2009-01-03 00:20:00.900000000
> This is time zone issue. jdbc driver use system time zone by default, but I
Done


http://gerrit.cloudera.org:8080/#/c/17842/24/tests/query_test/test_queries.py
File tests/query_test/test_queries.py:

http://gerrit.cloudera.org:8080/#/c/17842/24/tests/query_test/test_queries.py@243
PS24, Line 243:   @SkipIfCatalogV2.data_sources_unsupported()
              :   def test_jdbc_data_source(self, vector):
nit: Can we add metadata test verifying the values of table properties __IMPALA_DATA_SOURCE_*?

See _get_properties() and _get_tbl_properties() in tests/metadata/test_ddl_base.py



-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 24
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Oct 2023 17:51:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
......................................................................


Patch Set 25:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17842/25/tests/query_test/test_ext_data_sources.py
File tests/query_test/test_ext_data_sources.py:

http://gerrit.cloudera.org:8080/#/c/17842/25/tests/query_test/test_ext_data_sources.py@61
PS25, Line 61:   
flake8: W293 blank line contains whitespace


http://gerrit.cloudera.org:8080/#/c/17842/25/tests/query_test/test_ext_data_sources.py@61
PS25, Line 61:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17842/25/tests/query_test/test_ext_data_sources.py@85
PS25, Line 85: 
flake8: W391 blank line at end of file



-- 
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 25
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Oct 2023 23:58:05 +0000
Gerrit-HasComments: Yes