You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@asterixdb.apache.org by AsterixDB Code Review <do...@asterix-gerrit.ics.uci.edu> on 2021/07/08 17:49:03 UTC

Change in asterixdb[master]: [ASTERIXDB-2753][EXT] Support reading Parquet from S3

From Wael Alkowaileet <wa...@gmail.com>:

Hello Anon. E. Moose #1000171, Hussain Towaileb, Ali Alsuliman, Jenkins, Michael Blow, Dmitry Lychagin, Ian Maxon, 

I'd like you to reexamine a change. Please visit

    https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/8984

to look at the new patch set (#14).

Change subject: [ASTERIXDB-2753][EXT] Support reading Parquet from S3
......................................................................

[ASTERIXDB-2753][EXT] Support reading Parquet from S3

- user model changes: no
- storage format changes: no
- interface changes: yes
  - Adds IRecordReaderFactory#getReaderSupportedFormats

Details:
- Read Parquet files from S3
- Allow different implementations for the same adapter name
  * They can be distinguished using IRecordReaderFactory#getReaderSupportedFormats
- Fix HDFSUtils to get the correct partitions' locations
- Bump Parquet version to 1.12.0

Change-Id: I7a8f1a9dc31d8b4af508e521d010e2ed10feb7dd
---
A asterixdb/asterix-app/data/hdfs/parquet/id_age-string.json
A asterixdb/asterix-app/src/test/java/org/apache/asterix/test/external_dataset/BinaryFileConverterUtil.java
M asterixdb/asterix-app/src/test/java/org/apache/asterix/test/external_dataset/ExternalDatasetTestUtils.java
M asterixdb/asterix-app/src/test/java/org/apache/asterix/test/external_dataset/aws/AwsS3ExternalDatasetTest.java
M asterixdb/asterix-app/src/test/java/org/apache/asterix/test/runtime/HDFSCluster.java
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/expression-pushdown/expression-pushdown.01.ddl.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/expression-pushdown/expression-pushdown.02.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/expression-pushdown/expression-pushdown.03.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/expression-pushdown/expression-pushdown.04.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/expression-pushdown/expression-pushdown.05.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/expression-pushdown/expression-pushdown.06.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/expression-pushdown/expression-pushdown.07.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/expression-pushdown/expression-pushdown.08.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/expression-pushdown/expression-pushdown.09.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/expression-pushdown/expression-pushdown.10.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/expression-pushdown/expression-pushdown.11.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/expression-pushdown/expression-pushdown.12.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/expression-pushdown/expression-pushdown.13.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/expression-pushdown/expression-pushdown.14.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/expression-pushdown/expression-pushdown.15.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/expression-pushdown/expression-pushdown.16.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/expression-pushdown/expression-pushdown.17.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/expression-pushdown/expression-pushdown.18.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/missing-fields/missing-fields.1.ddl.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/missing-fields/missing-fields.2.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/missing-fields/missing-fields.3.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/multi-file-multi-schema/multi-file-multi-schema.1.ddl.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/multi-file-multi-schema/multi-file-multi-schema.2.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/multi-file-multi-schema/multi-file-multi-schema.3.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/multi-file-multi-schema/multi-file-multi-schema.4.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/object-concat/object-concat.1.ddl.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/object-concat/object-concat.2.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/object-concat/object-concat.3.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/object-concat/object-concat.4.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/object-concat/object-concat.5.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/select-all-fields/select-all-fields.1.ddl.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/select-all-fields/select-all-fields.2.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/select-all-fields/select-all-fields.3.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/select-count-one-field/select-count-one-field.1.ddl.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/select-count-one-field/select-count-one-field.2.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/select-count-one-field/select-count-one-field.3.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/select-count-one-field/select-count-one-field.4.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/select-count-one-field/select-count-one-field.5.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/string-standard-utf8/string-standard-utf8.1.ddl.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/string-standard-utf8/string-standard-utf8.2.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/common/parquet/string-standard-utf8/string-standard-utf8.3.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/expression-pushdown/expression-pushdown.02.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/expression-pushdown/expression-pushdown.03.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/expression-pushdown/expression-pushdown.04.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/expression-pushdown/expression-pushdown.05.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/expression-pushdown/expression-pushdown.06.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/expression-pushdown/expression-pushdown.07.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/expression-pushdown/expression-pushdown.08.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/expression-pushdown/expression-pushdown.09.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/expression-pushdown/expression-pushdown.10.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/expression-pushdown/expression-pushdown.11.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/expression-pushdown/expression-pushdown.12.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/expression-pushdown/expression-pushdown.13.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/expression-pushdown/expression-pushdown.14.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/expression-pushdown/expression-pushdown.15.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/expression-pushdown/expression-pushdown.16.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/expression-pushdown/expression-pushdown.17.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/expression-pushdown/expression-pushdown.18.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/missing-fields/missing-fields.2.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/missing-fields/missing-fields.3.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/multi-file-multi-schema/multi-file-multi-schema.2.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/multi-file-multi-schema/multi-file-multi-schema.3.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/multi-file-multi-schema/multi-file-multi-schema.4.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/object-concat/object-concat.2.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/object-concat/object-concat.3.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/object-concat/object-concat.4.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/object-concat/object-concat.5.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/select-all-fields/select-all-fields.2.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/select-all-fields/select-all-fields.3.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/select-count-one-field/select-count-one-field.2.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/select-count-one-field/select-count-one-field.3.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/select-count-one-field/select-count-one-field.4.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/select-count-one-field/select-count-one-field.5.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/string-standard-utf8/string-standard-utf8.2.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-dataset/common/parquet/string-standard-utf8/string-standard-utf8.3.adm
M asterixdb/asterix-app/src/test/resources/runtimets/testsuite_external_dataset_s3.xml
M asterixdb/asterix-external-data/pom.xml
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/api/IRecordReaderFactory.java
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/input/HDFSDataSourceFactory.java
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/input/record/reader/abstracts/AbstractExternalInputStreamFactory.java
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/input/record/reader/aws/AwsS3InputStreamFactory.java
A asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/input/record/reader/aws/parquet/AwsS3ParquetReaderFactory.java
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/input/record/reader/azure/AzureBlobInputStreamFactory.java
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/provider/DatasourceFactoryProvider.java
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/ExternalDataConstants.java
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/ExternalDataUtils.java
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/HDFSUtils.java
M asterixdb/asterix-external-data/src/main/resources/META-INF/services/org.apache.asterix.external.api.IRecordReaderFactory
M asterixdb/asterix-server/pom.xml
M asterixdb/pom.xml
M asterixdb/src/main/appended-resources/supplemental-models.xml
A asterixdb/src/main/licenses/content/raw.githubusercontent.com_amzn_ion-java_v1.0.2_NOTICE.txt
A asterixdb/src/main/licenses/content/raw.githubusercontent.com_aws_aws-sdk-java_1.12.1_NOTICE.txt
A asterixdb/src/main/licenses/content/raw.githubusercontent.com_luben_zstd-jni_v1.4.9-1_LICENSE.txt
M hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/config/AlgebricksConfig.java
100 files changed, 2,739 insertions(+), 313 deletions(-)


  git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb refs/changes/84/8984/14
-- 
To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/8984
To unsubscribe, or for help writing mail filters, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: I7a8f1a9dc31d8b4af508e521d010e2ed10feb7dd
Gerrit-Change-Number: 8984
Gerrit-PatchSet: 14
Gerrit-Owner: Wael Alkowaileet <wa...@gmail.com>
Gerrit-Reviewer: Ali Alsuliman <al...@gmail.com>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Dmitry Lychagin <dm...@couchbase.com>
Gerrit-Reviewer: Hussain Towaileb <hu...@gmail.com>
Gerrit-Reviewer: Ian Maxon <im...@uci.edu>
Gerrit-Reviewer: Jenkins <je...@fulliautomatix.ics.uci.edu>
Gerrit-Reviewer: Michael Blow <mb...@apache.org>
Gerrit-Reviewer: Wael Alkowaileet <wa...@gmail.com>
Gerrit-CC: Till Westmann <ti...@apache.org>
Gerrit-MessageType: newpatchset