Posted to reviews@impala.apache.org by "Anonymous Coward (Code Review)" <ge...@cloudera.org> on 2021/08/13 23:23:39 UTC

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

shikha.asrani10@gmail.com has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17771 )


Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................

WiP: IMPALA-10798 : Prototype for JSON reader

This prototype allows a user to create a table stored as jsonfile and
query it.
Steps to test:
- create a json table with schema specified using eligible datatypes
(int8/16/32/64/float/double/string/varchar/char)
- add your json file (with eligible datatypes and the same column names as
 the schema specified in the create command) to an hdfs location
- add this 'location' to your table
- run a select statement

Fix:
- arrow library is included wherever required
- json format is added to scan node base class.
- json scanner files are added, that implement methods to read the
 json file from the specified file location

Change-Id: If79364a421d862d0d837f9be694911e388d4d629
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/exec/CMakeLists.txt
A be/src/exec/hdfs-json-scanner.cc
A be/src/exec/hdfs-json-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
8 files changed, 523 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/17771/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 1
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 2:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/9296/ : Initial code review checks failed. See linked job for details on the failure.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 2
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sat, 14 Aug 2021 01:09:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Aman Sinha, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17771

to look at the new patch set (#2).

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................

WiP: IMPALA-10798 : Prototype for JSON reader

This prototype allows a user to create a table stored as jsonfile and
query it.
Steps to test:
- create a json table with schema specified using eligible datatypes
(int8/16/32/64/float/double/string/varchar/char)
- add your json file (with eligible datatypes and the same column names as
 the schema specified in the create command) to an hdfs location
- add this 'location' to your table
- run a select statement

Fix:
- arrow library is included wherever required
- json format is added to scan node base class.
- json scanner files are added, that implement methods to read the
 json file from the specified file location

Change-Id: If79364a421d862d0d837f9be694911e388d4d629
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/exec/CMakeLists.txt
A be/src/exec/hdfs-json-scanner.cc
A be/src/exec/hdfs-json-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindArrow.cmake
9 files changed, 606 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/17771/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 2
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 22:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8794/ DRY_RUN=true



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 10 Nov 2022 06:22:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Aman Sinha, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17771

to look at the new patch set (#5).

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................

WiP: IMPALA-10798 : Prototype for JSON reader

This prototype allows a user to create a table stored as jsonfile and
query it.
Steps to test:
- create a json table with schema specified using eligible datatypes
(int8/16/32/64/float/double/string/varchar/char)
- add your json file (with eligible datatypes and the same column names as
 the schema specified in the create command) to an hdfs location
- add this 'location' to your table
- run a select statement

Fix:
- arrow library is included wherever required
- json format is added to scan node base class.
- json scanner files are added, that implement methods to read the
 json file from the specified file location

Change-Id: If79364a421d862d0d837f9be694911e388d4d629
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/exec/CMakeLists.txt
A be/src/exec/hdfs-json-scanner.cc
A be/src/exec/hdfs-json-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindArrow.cmake
9 files changed, 613 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/17771/5

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17771/11/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/17771/11/bin/bootstrap_toolchain.py@469
PS11, Line 469: )
flake8: E501 line too long (91 > 90 characters)




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 11
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 13 Dec 2021 02:46:21 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 12:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc
File be/src/exec/hdfs-json-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc@187
PS12, Line 187:   std::shared_ptr<arrow::json::TableReader> reader;
There is a 'reader_' field but we never use it. I'm confused about the lifetime of arrow::json::TableReader, ArrowMemPool and arrow::Table. Not sure if they will be destroyed in the correct order.


http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc@190
PS12, Line 190: new ArrowMemPool
Will the destructor of arrow::json::TableReader destroy the mem pool? If not, we are leaking the ArrowMemPool object.


http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc@192
PS12, Line 192:   if (res.ok()) {
Shouldn't we handle non-ok status?
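
The ownership and error-handling concerns raised in these three comments can be illustrated with a minimal, self-contained sketch. It does not use the Arrow API: Status, MemPool, Reader, Scanner and EventLog are hypothetical stand-ins, and the sketch assumes the reader only borrows the pool it is given. Under that assumption, holding both objects as members makes the destruction order explicit, and a failed read is returned to the caller rather than skipped.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT the Arrow classes.
struct Status {
  bool ok_flag;
  std::string msg;
  bool ok() const { return ok_flag; }
  static Status OK() { return {true, ""}; }
  static Status Error(std::string m) { return {false, std::move(m)}; }
};

// Records destruction order so the lifetime can be inspected.
using EventLog = std::vector<std::string>;

struct MemPool {
  explicit MemPool(EventLog* log) : log_(log) {}
  ~MemPool() { log_->push_back("pool destroyed"); }
  EventLog* log_;
};

// Borrows the pool; assumed not to own or delete it.
struct Reader {
  Reader(MemPool* pool, EventLog* log) : pool_(pool), log_(log) {
    assert(pool != nullptr);
  }
  ~Reader() { log_->push_back("reader destroyed"); }
  Status Read(bool simulate_failure) {
    if (simulate_failure) return Status::Error("malformed JSON");
    return Status::OK();
  }
  MemPool* pool_;
  EventLog* log_;
};

class Scanner {
 public:
  explicit Scanner(EventLog* log)
    : pool_(std::make_unique<MemPool>(log)),
      reader_(std::make_unique<Reader>(pool_.get(), log)) {}

  // Propagates a failed read to the caller instead of silently skipping it.
  Status Open(bool simulate_failure) {
    Status s = reader_->Read(simulate_failure);
    if (!s.ok()) return s;
    return Status::OK();
  }

 private:
  // Members are destroyed in reverse declaration order, so reader_ is
  // destroyed first and never outlives the pool it borrows.
  std::unique_ptr<MemPool> pool_;
  std::unique_ptr<Reader> reader_;
};
```

Whether the real arrow::json::TableReader takes ownership of its pool is exactly the open question in the review, so the member layout above is one possible answer, not a statement about the patch.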




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 12
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 20 Dec 2021 13:35:21 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
shikha.asrani10@gmail.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc
File be/src/exec/hdfs-json-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc@187
PS12, Line 187:   std::shared_ptr<arrow::json::TableReader> reader;
> There is a 'reader_' field but we never use it. I'm confused about the life
Do you mean the 'reader' field? It is used on line 193.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 12
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 20 Dec 2021 23:50:39 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has uploaded a new patch set (#15) to the change originally created by shikha.asrani10@gmail.com. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................

WiP: IMPALA-10798 : Prototype for JSON reader

This prototype allows a user to create a table stored as jsonfile and
query it.
Steps to test:
- create a json table with schema specified using eligible datatypes
(int8/16/32/64/float/double/string/varchar/char/timestamp/boolean)
- add your json file (with eligible datatypes and the same column names as
 the schema specified in the create command) to an hdfs location
- add this 'location' to your table
- run a select statement

Fix:
- arrow library is included wherever required
- json format is added to scan node base class.
- json scanner files are added, that implement methods to read the
 json file from the specified file location

Change-Id: If79364a421d862d0d837f9be694911e388d4d629
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/exec/CMakeLists.txt
A be/src/exec/hdfs-json-scanner.cc
A be/src/exec/hdfs-json-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindArrow.cmake
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
10 files changed, 594 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/17771/15

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 15
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc
File be/src/exec/hdfs-json-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc@297
PS12, Line 297:         *(reinterpret_cast<int32_t*>(slot_val_ptr)) = int_arr.Value(i - start_pos_);
I think we are not handling null values. We should check int_arr.IsNull(i - start_pos_) somewhere or check int_arr.IsValid(i - start_pos_) before using int_arr.Value().
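
The null check suggested here can be sketched in isolation. The types below are hypothetical stand-ins, not arrow::Int32Array or Impala's tuple layout: Int32Column carries an explicit validity vector, and Slot carries an explicit null indicator. The point is only that IsNull() is consulted before Value() is read, so a NULL row never copies an undefined value into the slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a nullable int32 column; NOT arrow::Int32Array.
struct Int32Column {
  std::vector<int32_t> values;
  std::vector<bool> valid;  // valid[i] == false means row i is NULL
  bool IsNull(std::size_t i) const { return !valid[i]; }
  int32_t Value(std::size_t i) const { return values[i]; }
  std::size_t length() const { return values.size(); }
};

// Hypothetical destination slot with an explicit null indicator.
struct Slot {
  bool is_null = false;
  int32_t value = 0;
};

// Copies every row of the column into a slot, marking NULL rows instead of
// reading their (undefined) value.
std::vector<Slot> MaterializeInt32(const Int32Column& col) {
  assert(col.values.size() == col.valid.size());
  std::vector<Slot> out(col.length());
  for (std::size_t i = 0; i < col.length(); ++i) {
    if (col.IsNull(i)) {
      out[i].is_null = true;  // value stays at its default
      continue;
    }
    out[i].value = col.Value(i);
  }
  return out;
}
```

In the scanner itself the index would be the offset used in the quoted line (i - start_pos_), and the null indicator would be whatever the destination tuple provides; both are left out of this sketch.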




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 12
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 21 Dec 2021 01:04:47 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Aman Sinha, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17771

to look at the new patch set (#11).

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................

WiP: IMPALA-10798 : Prototype for JSON reader

This prototype allows a user to create a table stored as jsonfile and
query it.
Steps to test:
- create a json table with schema specified using eligible datatypes
(int8/16/32/64/float/double/string/varchar/char/timestamp/boolean)
- add your json file (with eligible datatypes and the same column names as
 the schema specified in the create command) to an hdfs location
- add this 'location' to your table
- run a select statement

Fix:
- arrow library is included wherever required
- json format is added to scan node base class.
- json scanner files are added, that implement methods to read the
 json file from the specified file location

Change-Id: If79364a421d862d0d837f9be694911e388d4d629
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/exec/CMakeLists.txt
A be/src/exec/hdfs-json-scanner.cc
A be/src/exec/hdfs-json-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindArrow.cmake
9 files changed, 579 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/17771/11

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 11
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 9:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/9833/ : Initial code review checks failed. See linked job for details on the failure.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 9
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 23 Nov 2021 01:40:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has uploaded a new patch set (#19) to the change originally created by shikha.asrani10@gmail.com. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................

WiP: IMPALA-10798 : Prototype for JSON reader

This prototype allows a user to create a table stored as jsonfile and
query it.
Steps to test:
- create a json table with schema specified using eligible datatypes
(int8/16/32/64/float/double/string/varchar/char/timestamp/boolean)
- add your json file (with eligible datatypes and the same column names as
 the schema specified in the create command) to an hdfs location
- add this 'location' to your table
- run a select statement

Fix:
- arrow library is included wherever required
- json format is added to scan node base class.
- json scanner files are added, that implement methods to read the
 json file from the specified file location

Change-Id: If79364a421d862d0d837f9be694911e388d4d629
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/exec/CMakeLists.txt
A be/src/exec/hdfs-json-scanner.cc
A be/src/exec/hdfs-json-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindArrow.cmake
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/bin/generate-schema-statements.py
M tests/common/test_dimensions.py
13 files changed, 681 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/17771/19

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 19
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Aman Sinha, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17771

to look at the new patch set (#3).

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................

WiP: IMPALA-10798 : Prototype for JSON reader

This prototype allows a user to create a table stored as jsonfile and
query it.
Steps to test:
- create a json table with schema specified using eligible datatypes
(int8/16/32/64/float/double/string/varchar/char)
- add your json file (with eligible datatypes and the same column names as
 the schema specified in the create command) to an hdfs location
- add this 'location' to your table
- run a select statement

Fix:
- arrow library is included wherever required
- json format is added to scan node base class.
- json scanner files are added, that implement methods to read the
 json file from the specified file location

Change-Id: If79364a421d862d0d837f9be694911e388d4d629
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/exec/CMakeLists.txt
A be/src/exec/hdfs-json-scanner.cc
A be/src/exec/hdfs-json-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindArrow.cmake
9 files changed, 606 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/17771/3

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc
File be/src/exec/hdfs-json-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc@187
PS12, Line 187:   std::shared_ptr<arrow::json::TableReader> reader;
> Do you mean the 'reader' field? It is used on line 193.
No, I mean 'reader_' in HdfsJsonScanner (line 139 in hdfs-json-scanner.h)




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 12
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 21 Dec 2021 00:02:29 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Aman Sinha, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17771

to look at the new patch set (#14).

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................

WiP: IMPALA-10798 : Prototype for JSON reader

This prototype allows a user to create a table stored as jsonfile and
query it.
Steps to test:
- create a json table with schema specified using eligible datatypes
(int8/16/32/64/float/double/string/varchar/char/timestamp/boolean)
- add your json file (with eligible datatypes and the same column names as
 the schema specified in the create command) to an hdfs location
- add this 'location' to your table
- run a select statement

Fix:
- arrow library is included wherever required
- json format is added to scan node base class.
- json scanner files are added, that implement methods to read the
 json file from the specified file location

Change-Id: If79364a421d862d0d837f9be694911e388d4d629
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/exec/CMakeLists.txt
A be/src/exec/hdfs-json-scanner.cc
A be/src/exec/hdfs-json-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindArrow.cmake
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
10 files changed, 590 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/17771/14

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 14
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17771/14/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/17771/14/bin/bootstrap_toolchain.py@469
PS14, Line 469: )
flake8: E501 line too long (91 > 90 characters)




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 14
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 22 Dec 2021 22:19:07 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 13:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/9951/ : Initial code review checks failed. See linked job for details on the failure.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 21 Dec 2021 16:57:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has uploaded a new patch set (#20) to the change originally created by shikha.asrani10@gmail.com. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................

WiP: IMPALA-10798 : Prototype for JSON reader

This prototype allows a user to create a table stored as jsonfile and
query it.
Steps to test:
- create a json table with schema specified using eligible datatypes
(int8/16/32/64/float/double/string/varchar/char/timestamp/boolean)
- add your json file (with eligible datatypes and the same column names as
 the schema specified in the create command) to an hdfs location
- add this 'location' to your table
- run a select statement

Fix:
- arrow library is included wherever required
- json format is added to scan node base class.
- json scanner files are added, that implement methods to read the
 json file from the specified file location

Change-Id: If79364a421d862d0d837f9be694911e388d4d629
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/exec/CMakeLists.txt
A be/src/exec/hdfs-json-scanner.cc
A be/src/exec/hdfs-json-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindArrow.cmake
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/bin/generate-schema-statements.py
M tests/common/test_dimensions.py
13 files changed, 756 insertions(+), 137 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/17771/20
-- 
To view, visit http://gerrit.cloudera.org:8080/17771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 20
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 22:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-json-scanner.cc
File be/src/exec/hdfs-json-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-json-scanner.cc@192
PS22, Line 192: new ArrowMemPool
Will the arrow lib delete this object at the end? Or is it the caller's duty to delete this object?
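
A minimal sketch of the ownership pattern this question is about, assuming the library only borrows the pool through a raw pointer and never deletes it (that is an assumption about the Arrow API, not something confirmed in this change). FakePool, FakeReader and LivePoolsAfterScan are hypothetical names used for illustration only; they are not Arrow or Impala classes.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a memory pool; counts live instances.
struct FakePool {
  static int live_count;
  FakePool() { ++live_count; }
  ~FakePool() { --live_count; }
};
int FakePool::live_count = 0;

// Hypothetical reader that borrows the pool and never deletes it.
class FakeReader {
 public:
  explicit FakeReader(FakePool* pool) : pool_(pool) {}
  bool has_pool() const { return pool_ != nullptr; }

 private:
  FakePool* pool_;  // Not owned.
};

// The caller owns the pool. unique_ptr releases it when the scope ends,
// so no explicit delete is needed and nothing leaks.
int LivePoolsAfterScan() {
  {
    std::unique_ptr<FakePool> pool(new FakePool());
    FakeReader reader(pool.get());
    assert(reader.has_pool());
  }
  return FakePool::live_count;
}
```

If the callee does not take ownership, holding the pool in a std::unique_ptr member of the scanner (rather than a bare "new") would make the lifetime explicit.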


http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-json-scanner.cc@201
PS22, Line 201:   res2 = reader->Read();
Please add a comment mentioning that this reads the entire JSON file and converts it to an Arrow Table.


http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-json-scanner.cc@218
PS22, Line 218:   vector<ColumnDescriptor> cv = tad->col_descs();
              :   vector<std::shared_ptr<arrow::Field>> fields_list = {};
              :   // convert impala tuple descriptor to arrow schema
              :   for (auto cvf : cv) {
              :     std::shared_ptr<arrow::Field> field_a =
              :         arrow::field(cvf.name(), ColumnType2ArrowType(cvf.type()));
              :     fields_list.push_back(field_a);
              :   }
This explicitly sets the schema for all columns of the table, since 'tad' is the TableDescriptor. We should be able to set the schema for only the required columns, i.e. use 'td', the TupleDescriptor, directly. This avoids internal arrow parse errors on unneeded columns, e.g. decimal parse errors due to scale/precision overflow.
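
A rough sketch of the projection idea, using only the standard library. ColumnDesc and BuildProjectedSchema are hypothetical stand-ins (not the Impala descriptors or the Arrow schema API), and the list of column positions is assumed to come from the slots of the tuple descriptor.

```cpp
#include <string>
#include <vector>

// Hypothetical column description: a name plus a type string.
struct ColumnDesc {
  std::string name;
  std::string type;
};

// Build a schema that contains only the columns referenced by the query.
// 'slot_col_positions' holds the table column position of each required slot.
std::vector<ColumnDesc> BuildProjectedSchema(
    const std::vector<ColumnDesc>& table_cols,
    const std::vector<int>& slot_col_positions) {
  std::vector<ColumnDesc> fields;
  fields.reserve(slot_col_positions.size());
  for (int pos : slot_col_positions) {
    // Skip positions that do not refer to a table column.
    if (pos < 0 || pos >= static_cast<int>(table_cols.size())) continue;
    fields.push_back(table_cols[pos]);
  }
  return fields;
}
```

With a table of (id, price, name) and required positions {0, 2}, the resulting schema holds only id and name, so a problematic decimal column such as price would never be handed to the parser.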


http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-json-scanner.cc@279
PS22, Line 279:       if (i == chunked_boundary_) {
I think this is only updated for the first column, and the prerequisite is that chunks in different columns are aligned. Is that always true?
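
One way to make that assumption explicit is a debug-time check that every column has the same chunk layout. The sketch below is standard-library only; ChunksAligned is a hypothetical helper, and the per-column chunk lengths are assumed to have been collected from the table beforehand.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if every column is split into chunks of the same lengths,
// in the same order. An empty table is trivially aligned.
bool ChunksAligned(
    const std::vector<std::vector<int64_t>>& chunk_lengths_per_column) {
  for (size_t c = 1; c < chunk_lengths_per_column.size(); ++c) {
    if (chunk_lengths_per_column[c] != chunk_lengths_per_column[0]) {
      return false;
    }
  }
  return true;
}
```

If such a check can fail, the chunk boundary would need to be tracked per column rather than taken from the first column only.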



-- 
To view, visit http://gerrit.cloudera.org:8080/17771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 16 Dec 2022 08:04:51 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has uploaded a new patch set (#25) to the change originally created by shikha.asrani10@gmail.com. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................

WiP: IMPALA-10798 : Prototype for JSON reader

This prototype allows users to create a table stored as jsonfile and
query it.
Steps to test:
- create a json table with schema specified using eligible datatypes
(int8/16/32/64/float/double/string/varchar/char/timestamp/boolean)
- add your json file (with eligible datatypes and same column names as
 schema specified in the create command) to hdfs location
- add this 'location' to your table
- run a select statement

Fix:
- arrow library is included wherever required
- json format is added to scan node base class.
- json scanner files are added, that implement methods to read the
 json file from the specified file location

Change-Id: If79364a421d862d0d837f9be694911e388d4d629
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/exec/CMakeLists.txt
A be/src/exec/hdfs-json-scanner.cc
A be/src/exec/hdfs-json-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scanner.cc
M be/src/service/client-request-state.cc
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindArrow.cmake
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/bin/generate-schema-statements.py
M testdata/bin/run-hive-server.sh
M testdata/workloads/functional-query/functional-query_core.csv
M testdata/workloads/functional-query/functional-query_dimensions.csv
M testdata/workloads/functional-query/functional-query_exhaustive.csv
M testdata/workloads/functional-query/functional-query_pairwise.csv
M testdata/workloads/tpcds/tpcds_core.csv
M testdata/workloads/tpcds/tpcds_exhaustive.csv
M testdata/workloads/tpcds/tpcds_pairwise.csv
M testdata/workloads/tpch/tpch_core.csv
M testdata/workloads/tpch/tpch_dimensions.csv
M testdata/workloads/tpch/tpch_exhaustive.csv
M testdata/workloads/tpch/tpch_pairwise.csv
M tests/common/test_dimensions.py
M tests/metadata/test_hms_integration.py
M tests/query_test/test_decimal_queries.py
M tests/query_test/test_tpch_queries.py
30 files changed, 818 insertions(+), 201 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/17771/25
-- 
To view, visit http://gerrit.cloudera.org:8080/17771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 25
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9299/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sat, 14 Aug 2021 02:53:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9704/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 02 Nov 2021 02:48:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Aman Sinha, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17771

to look at the new patch set (#6).

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................

WiP: IMPALA-10798 : Prototype for JSON reader

This prototype allows users to create a table stored as jsonfile and
query it.
Steps to test:
- create a json table with schema specified using eligible datatypes
(int8/16/32/64/float/double/string/varchar/char/timestamp)
- add your json file (with eligible datatypes and same column names as
 schema specified in the create command) to hdfs location
- add this 'location' to your table
- run a select statement

Fix:
- arrow library is included wherever required
- json format is added to scan node base class.
- json scanner files are added, that implement methods to read the
 json file from the specified file location

Change-Id: If79364a421d862d0d837f9be694911e388d4d629
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/exec/CMakeLists.txt
A be/src/exec/hdfs-json-scanner.cc
A be/src/exec/hdfs-json-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindArrow.cmake
9 files changed, 607 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/17771/6
-- 
To view, visit http://gerrit.cloudera.org:8080/17771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17771/9/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/17771/9/bin/bootstrap_toolchain.py@469
PS9, Line 469: )
flake8: E501 line too long (91 > 90 characters)



-- 
To view, visit http://gerrit.cloudera.org:8080/17771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 9
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 23 Nov 2021 01:18:33 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 12:

(26 comments)

http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc
File be/src/exec/hdfs-json-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc@283
PS12, Line 283:       chunked_boundary_ = start_pos_ + column->chunk(chunk_pos_)->length(); 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc@284
PS12, Line 284:       // VLOG_QUERY << "i:" << i << "num_rows_" << num_rows_ << "chunked boundary" << chunked_boundary_ << "start_pos_" << start_pos_;
line too long (134 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc@285
PS12, Line 285:       if(i == chunked_boundary_){ 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc@289
PS12, Line 289:         VLOG_QUERY << "start pos" << start_pos_ << "chunk pos"<< chunk_pos_ << "length" << column->chunk(chunk_pos_)->length();
line too long (127 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc@312
PS12, Line 312:         DCHECK(s_arr.IsValid(i - start_pos_)) << "length: " << s_arr.length() << ", offset: " << s_arr.offset();
line too long (112 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc@313
PS12, Line 313: 	int src_len = s_arr.value_length(i - start_pos_);
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc@315
PS12, Line 315: 	char* src_ptr;
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc@318
PS12, Line 318:         
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc@319
PS12, Line 319: 	// auto blob = s_arr.value_data();
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc@322
PS12, Line 322: 	char* blob_;
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc@323
PS12, Line 323: 	// VLOG_QUERY << blob->data() << "blob data and size" << blob->size();
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc@325
PS12, Line 325:         DCHECK(src_len>0)<< "i for size error" << i << "len" <<src_len << "length: " << s_arr.length() << ", offset: " << s_arr.offset(); 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc@325
PS12, Line 325:         DCHECK(src_len>0)<< "i for size error" << i << "len" <<src_len << "length: " << s_arr.length() << ", offset: " << s_arr.offset(); 
line too long (138 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc@326
PS12, Line 326: 	blob_ = reinterpret_cast<char*>(pool->TryAllocateUnaligned(src_len));
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc@327
PS12, Line 327: 	// memcpy(blob_, blob->data(), blob->size());
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc@329
PS12, Line 329: 	// char* src_ptr;
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc@332
PS12, Line 332: 	auto val_char = reinterpret_cast<const char*>(val);
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc@335
PS12, Line 335: 	//src_ptr = blob_ + (val - blob->data());
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc@336
PS12, Line 336: 	//src_ptr = s_arr.GetString(i);
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc@337
PS12, Line 337: 	// auto str = s_arr.GetString(i);
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc@338
PS12, Line 338: 	memcpy(blob_, val_char, src_len);
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc@339
PS12, Line 339: 	// VLOG_QUERY << "val" << val_char << "len" << src_len;
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc@340
PS12, Line 340: 	//src_ptr = reinterpret_cast<char*>(s_arr.GetView(i));
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc@341
PS12, Line 341: 	src_ptr = blob_;
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/17771/12/be/src/exec/hdfs-json-scanner.cc@342
PS12, Line 342: 	if (src_len == 0) {
tab used for whitespace


http://gerrit.cloudera.org:8080/#/c/17771/12/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/17771/12/bin/bootstrap_toolchain.py@469
PS12, Line 469: )
flake8: E501 line too long (91 > 90 characters)



-- 
To view, visit http://gerrit.cloudera.org:8080/17771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 12
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 20 Dec 2021 05:15:46 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 9:

(13 comments)

The prototype is in good shape. Let's roll it out!

http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/CMakeLists.txt
File be/src/exec/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/CMakeLists.txt@71
PS9, Line 71:  
nit: redundant whitespace


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.h
File be/src/exec/hdfs-json-scanner.h:

http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.h@38
PS9, Line 38: #include "arrow/buffer.h"
            : #include "arrow/io/type_fwd.h"
            : #include "arrow/result.h"
            : #include "arrow/type_fwd.h"
            : #include "arrow/util/macros.h"
            : #include "arrow/util/string_view.h"
            : #include "arrow/util/type_fwd.h"
nit: use <> for external includes


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.h@92
PS9, Line 92:   class ScanRangeInputStream : public arrow::io::InputStream {
I think we need the offset inside the file. When a large JSON file is split into several blocks, this results in several scan ranges and thus several scanner instances. Each scanner should read only the portion corresponding to its scan range.
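
A sketch of how per-range reading could work, assuming newline-delimited JSON (one record per line), which this change does not state. The convention used here is that a record belongs to the scan range containing its first byte: a scanner skips a leading partial record and reads past the end of its range to finish the last record it started. RecordsForRange is a hypothetical helper operating on an in-memory buffer rather than on HDFS.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the records whose first byte lies in [offset, offset + len).
std::vector<std::string> RecordsForRange(const std::string& file,
                                         size_t offset, size_t len) {
  std::vector<std::string> records;
  size_t pos = 0;
  if (offset > 0) {
    // Look for the newline that ends the previous record. Starting at
    // offset - 1 keeps a record that begins exactly at 'offset'.
    size_t nl = file.find('\n', offset - 1);
    if (nl == std::string::npos) return records;
    pos = nl + 1;
  }
  const size_t end = offset + len;
  while (pos < end && pos < file.size()) {
    size_t nl = file.find('\n', pos);
    if (nl == std::string::npos) nl = file.size();  // Last record, no newline.
    records.push_back(file.substr(pos, nl - pos));
    pos = nl + 1;
  }
  return records;
}
```

Under this convention, adjacent ranges neither drop nor duplicate a record that straddles a block boundary.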


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.h@132
PS9, Line 132: row_read
nit: rows_read_


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.h@133
PS9, Line 133: num_rows
nit: num_rows_


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.h@136
PS9, Line 136:   const char* filename() const { return metadata_range_->file(); }
nit: could you move this above to be together with the methods?


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.cc
File be/src/exec/hdfs-json-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.cc@19
PS9, Line 19: #include <arrow/api.h>
            : #include <arrow/array.h>
            : #include <arrow/buffer.h>
            : #include <arrow/config.h>
            : #include <arrow/io/interfaces.h>
            : #include <arrow/json/api.h>
            : #include <arrow/json/chunked_builder.h>
            : #include <arrow/json/chunker.h>
            : #include <arrow/json/converter.h>
            : #include <arrow/json/options.h>
            : #include <arrow/json/parser.h>
            : #include <arrow/json/reader.h>
            : #include <arrow/memory_pool.h>
            : #include <arrow/table.h>
            : #include "arrow/buffer.h"
            : #include "arrow/io/interfaces.h"
nit: could you please remove headers that are included in hdfs-json-scanner.h?


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.cc@179
PS9, Line 179: std::shared_ptr<arrow::DataType> convert_type(const ColumnType ct) {
nit: I think we need to mark this as "static". BTW, could you rename it to something like "ColumnType2ArrowType"?


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.cc@296
PS9, Line 296: for (auto column_ : columns_)
nit: for (const auto& column : table_->columns())

Our naming convention uses a trailing "_" for class field names. Local variable names should not end with "_".
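
A small illustration of both points, with hypothetical names (FakeTable is not an Impala or Arrow class). The field carries the trailing underscore, the loop variables do not, and iterating by const reference avoids copying each shared_ptr, which is visible in the reference count.

```cpp
#include <algorithm>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class FakeTable {
 public:
  explicit FakeTable(std::vector<std::shared_ptr<std::string>> columns)
      : columns_(std::move(columns)) {}

  // Iterating by value copies each shared_ptr, bumping its use count.
  long MaxUseCountByValue() const {
    long max_count = 0;
    for (auto column : columns_) {
      max_count = std::max(max_count, column.use_count());
    }
    return max_count;
  }

  // Iterating by const reference leaves the use count untouched.
  long MaxUseCountByRef() const {
    long max_count = 0;
    for (const auto& column : columns_) {
      max_count = std::max(max_count, column.use_count());
    }
    return max_count;
  }

 private:
  std::vector<std::shared_ptr<std::string>> columns_;  // Field: trailing "_".
};
```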


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.cc@305
PS9, Line 305:       auto ar = column_->chunk(0);
             :       auto ard = ar->data();
nit: could you rename these variables? I'm not sure what "ar" means. BTW, if "ar" is only used once, we don't need a variable for it.


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.cc@331
PS9, Line 331:         memcpy(blob_, blob->data(), blob->size());
Is this copying all the binary values of a column?
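
For comparison, a sketch of copying a single value instead of the whole data buffer. It relies on the Arrow variable-length binary layout, where value i occupies data[offsets[i], offsets[i + 1]). CopyValue is a hypothetical helper that works on plain standard-library containers, not on an Arrow array or an Impala MemPool.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Copies only the bytes of value 'i' out of a shared data buffer.
// 'offsets' has one more entry than there are values.
std::string CopyValue(const std::vector<int32_t>& offsets,
                      const std::string& data, size_t i) {
  const int32_t begin = offsets[i];
  const int32_t len = offsets[i + 1] - begin;
  std::string out(static_cast<size_t>(len), '\0');
  if (len > 0) {
    std::memcpy(&out[0], data.data() + begin, static_cast<size_t>(len));
  }
  return out;
}
```

Copying per value keeps the cost proportional to the rows actually materialized, and a zero-length value needs no copy at all.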


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.cc@336
PS9, Line 336:         if (src_len == 0) {
             :           tuple->SetNull(slot_desc->null_indicator_offset());
nit: can we move this to line 326?


http://gerrit.cloudera.org:8080/#/c/17771/9/cmake_modules/FindArrow.cmake
File cmake_modules/FindArrow.cmake:

http://gerrit.cloudera.org:8080/#/c/17771/9/cmake_modules/FindArrow.cmake@18
PS9, Line 18: # - Find Orc (headers and liborc.a) with ORC_ROOT hinting a location
            : # This module defines
            : #  ORC_INCLUDE_DIR, directory containing headers
            : #  ORC_STATIC_LIB, path to liborc.a
            : #  ORC_FOUND
nit: please update these comments



-- 
To view, visit http://gerrit.cloudera.org:8080/17771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 9
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sat, 27 Nov 2021 11:55:05 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 21:

(33 comments)

http://gerrit.cloudera.org:8080/#/c/17771/21/be/src/exec/hdfs-json-scanner.cc
File be/src/exec/hdfs-json-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17771/21/be/src/exec/hdfs-json-scanner.cc@171
PS21, Line 171:     // VLOG_QUERY << "decimal128" << arrow::decimal128(ct.precision, ct.scale)->ToString();
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/21/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/17771/21/be/src/exec/hdfs-scan-node-base.cc@70
PS21, Line 70:                                           "for all reads, regardless of whether the read is local or remote. By default, the "
line too long (126 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/21/be/src/exec/hdfs-scan-node-base.cc@71
PS21, Line 71:                                           "IO data cache is only used if the data is expected to be remote. Used by tests.");
line too long (125 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/21/be/src/exec/hdfs-scan-node-base.cc@79
PS21, Line 79:                                                        " across all Disk I/O threads in HDFS read operations.");
line too long (112 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/21/be/src/exec/hdfs-scan-node-base.cc@81
PS21, Line 81:                                                            " spent across all Disk I/O threads in HDFS open operations.");
line too long (122 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/21/be/src/exec/hdfs-scan-node-base.cc@84
PS21, Line 84:                              " while it is executing I/O operations on behalf of a scan.");
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/21/be/src/exec/hdfs-scan-node-base.cc@91
PS21, Line 91:                                                                   "disks accessed by HDFS scan. Each local disk is counted as a disk and each type of"
line too long (150 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/21/be/src/exec/hdfs-scan-node-base.cc@92
PS21, Line 92:                                                                   " remote filesystem (e.g. HDFS remote reads, S3) is counted as a distinct disk.");
line too long (148 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/21/be/src/exec/hdfs-scan-node-base.cc@94
PS21, Line 94:                                                                               " average number of HDFS read threads executing read operations on behalf of this "
line too long (161 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/21/be/src/exec/hdfs-scan-node-base.cc@95
PS21, Line 95:                                                                               "scan. Higher values (i.e. close to the aggregate number of I/O threads across "
line too long (158 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/21/be/src/exec/hdfs-scan-node-base.cc@96
PS21, Line 96:                                                                               "all disks accessed) show that this scan is using a larger proportion of the I/O "
line too long (160 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/21/be/src/exec/hdfs-scan-node-base.cc@97
PS21, Line 97:                                                                               "capacity of the system. Lower values show that either this scan is not I/O bound"
line too long (160 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/21/be/src/exec/hdfs-scan-node-base.cc@98
PS21, Line 98:                                                                               " or that it is getting a small share of the I/O capacity of the system.");
line too long (153 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/21/be/src/exec/hdfs-scan-node-base.cc@106
PS21, Line 106:                   "Use this to determine if the scan got all of the reservation it wanted. Does not "
line too long (101 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/21/be/src/exec/hdfs-scan-node-base.cc@107
PS21, Line 107:                   "include subsequent reservation increases done by scanner implementation "
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/21/be/src/exec/hdfs-scan-node-base.cc@127
PS21, Line 127:                                                     "threads spent waiting for I/O. This value can be compared to the value of "
line too long (128 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/21/be/src/exec/hdfs-scan-node-base.cc@128
PS21, Line 128:                                                     "ScannerThreadsTotalWallClockTime of MT_DOP = 0 scan nodes or otherwise compared "
line too long (134 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/21/be/src/exec/hdfs-scan-node-base.cc@129
PS21, Line 129:                                                     "to the total time reported for MT_DOP > 0 scan nodes. High values show that "
line too long (130 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/21/be/src/exec/hdfs-scan-node-base.cc@130
PS21, Line 130:                                                     "scanner threads are spending significant time waiting for I/O instead of "
line too long (127 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/21/be/src/exec/hdfs-scan-node-base.cc@131
PS21, Line 131:                                                     "processing data. Note that this includes the time when the thread is runnable "
line too long (132 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/21/be/src/exec/hdfs-scan-node-base.cc@135
PS21, Line 135:                   "Each sample in the counter is the size of a single column that is scanned by the "
line too long (101 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/21/be/src/exec/hdfs-scan-node-base.cc@139
PS21, Line 139:                   "Each sample in the counter is the size of a single column that is scanned by the "
line too long (101 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/21/be/src/exec/hdfs-scan-node-base.cc@747
PS21, Line 747:                  metadata->partition_id, FilterStats::FILES_KEY, filter_ctxs, file, state)) {
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/21/be/src/exec/hdfs-scan-node-base.cc@1231
PS21, Line 1231:                                                                  "Read $0 of data across network that was expected to be local. Block locality "
line too long (144 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/21/be/src/exec/hdfs-scan-node-base.cc@1232
PS21, Line 1232:                                                                  "metadata for table '$1.$2' may be stale. This only affects query performance "
line too long (144 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/21/be/src/exec/hdfs-scan-node-base.cc@1233
PS21, Line 1233:                                                                  "and not result correctness. One of the common causes for this warning is HDFS "
line too long (145 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/21/be/src/exec/hdfs-scan-node-base.cc@1234
PS21, Line 1234:                                                                  "rebalancer moving some of the file's blocks. If the issue persists, consider "
line too long (144 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/21/be/src/exec/hdfs-scan-node-base.cc@1235
PS21, Line 1235:                                                                  "running \"INVALIDATE METADATA `$1`.`$2`\".",
line too long (110 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/21/be/src/exec/hdfs-scan-node-base.cc@1236
PS21, Line 1236:                                                                  PrettyPrinter::Print(unexpected_remote_bytes_->value(), TUnit::BYTES),
line too long (135 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/21/be/src/exec/hdfs-scan-node-base.cc@1237
PS21, Line 1237:                                                                  hdfs_table_->database(), hdfs_table_->name())));
line too long (113 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/21/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/17771/21/bin/bootstrap_toolchain.py@489
PS21, Line 489: "
flake8: E501 line too long (98 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/17771/21/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/17771/21/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@475
PS21, Line 475:     
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/21/tests/query_test/test_tpch_queries.py
File tests/query_test/test_tpch_queries.py:

http://gerrit.cloudera.org:8080/#/c/17771/21/tests/query_test/test_tpch_queries.py@39
PS21, Line 39: s
flake8: E501 line too long (96 > 90 characters)




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 21
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 10 Nov 2022 06:15:55 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has uploaded a new patch set (#22) to the change originally created by shikha.asrani10@gmail.com. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................

WiP: IMPALA-10798 : Prototype for JSON reader

This prototype allows a user to create a table stored as jsonfile and
query it.
Steps to test:
- create a json table with schema specified using eligible datatypes
(int8/16/32/64/float/double/string/varchar/char/timestamp/boolean)
- add your json file (with eligible datatypes and the same column names as
 the schema specified in the create command) to an hdfs location
- add this 'location' to your table
- run a select statement

Fix:
- arrow library is included wherever required
- json format is added to scan node base class.
- json scanner files are added that implement methods to read the
 json file from the specified file location

Change-Id: If79364a421d862d0d837f9be694911e388d4d629
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/exec/CMakeLists.txt
A be/src/exec/hdfs-json-scanner.cc
A be/src/exec/hdfs-json-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindArrow.cmake
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/bin/generate-schema-statements.py
M testdata/workloads/functional-query/functional-query_core.csv
M testdata/workloads/functional-query/functional-query_dimensions.csv
M testdata/workloads/functional-query/functional-query_exhaustive.csv
M testdata/workloads/functional-query/functional-query_pairwise.csv
M testdata/workloads/tpcds/tpcds_core.csv
M testdata/workloads/tpcds/tpcds_exhaustive.csv
M testdata/workloads/tpcds/tpcds_pairwise.csv
M testdata/workloads/tpch/tpch_core.csv
M testdata/workloads/tpch/tpch_dimensions.csv
M testdata/workloads/tpch/tpch_exhaustive.csv
M testdata/workloads/tpch/tpch_pairwise.csv
M tests/common/test_dimensions.py
M tests/metadata/test_hms_integration.py
M tests/query_test/test_decimal_queries.py
M tests/query_test/test_tpch_queries.py
27 files changed, 769 insertions(+), 165 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/17771/22

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 4:

(38 comments)

http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.h
File be/src/exec/hdfs-json-scanner.h:

http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.h@70
PS4, Line 70:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.h@91
PS4, Line 91:  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.h@148
PS4, Line 148:    
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.h@151
PS4, Line 151:   inline Status AllocateTupleMem(RowBatch* row_batch) WARN_UNUSED_RESULT;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.h@153
PS4, Line 153:   int num_rows;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.h@154
PS4, Line 154:   uint8_t* tuple_mem_end_ = nullptr;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.h@155
PS4, Line 155:   const io::ScanRange* metadata_range_ = nullptr;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.h@156
PS4, Line 156:   const char *filename() const { return metadata_range_->file(); }  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.h@161
PS4, Line 161:   boost::scoped_ptr<MemPool> data_batch_pool_;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.cc
File be/src/exec/hdfs-json-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.cc@55
PS4, Line 55:      RETURN_IF_ERROR(scan_node->AddDiskIoRanges(file, EnqueueLocation::TAIL));  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.cc@77
PS4, Line 77:   }  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.cc@83
PS4, Line 83:   chunk_sizes_[*out] = size;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.cc@88
PS4, Line 88: arrow::Status HdfsJsonScanner::ArrowMemPool::Reallocate(int64_t old_size, int64_t new_size, uint8_t** ptr) {
line too long (108 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.cc@97
PS4, Line 97:        << GetStackTrace();  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.cc@107
PS4, Line 107: arrow::Result<int64_t> HdfsJsonScanner::ScanRangeInputStream::Read(int64_t nbytes, void* out){
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.cc@110
PS4, Line 110:   int64_t fl = file_desc_->file_length;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.cc@112
PS4, Line 112:     nbytes = fl - pos_;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.cc@143
PS4, Line 143: arrow::Result<std::shared_ptr<arrow::Buffer>> HdfsJsonScanner::ScanRangeInputStream::Read(int64_t nbytes){
line too long (106 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.cc@197
PS4, Line 197: arrow::Status HdfsJsonScanner::ReadTable(std::shared_ptr<arrow::io::InputStream> input_stream){
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.cc@199
PS4, Line 199:   ARROW_ASSIGN_OR_RAISE(reader_, arrow::json::TableReader::Make(arrow::default_memory_pool(),
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.cc@208
PS4, Line 208:   RETURN_IF_ERROR(HdfsScanner::Open(context));  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.cc@214
PS4, Line 214:   vector<std::shared_ptr<arrow::Field>> fields_list = {};  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.cc@217
PS4, Line 217:     std::shared_ptr<arrow::Field> field_a = arrow::field(cvf.name(), convert_type(cvf.type()));
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.cc@223
PS4, Line 223:   readOptions_ = arrow::json::ReadOptions::Defaults(); 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.cc@230
PS4, Line 230:   VLOG_QUERY << "args: " << table_->ToString() << std::endl << GetStackTrace();  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.cc@252
PS4, Line 252:     eos_ = true; 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.cc@261
PS4, Line 261:  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.cc@264
PS4, Line 264:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.cc@266
PS4, Line 266:   // reading rows from the previous point we left, tilll either end of table/capacity of rowbatch
line too long (97 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.cc@274
PS4, Line 274:    void* slot_val_ptr = tuple->GetSlot(slot_desc->tuple_offset());  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.cc@275
PS4, Line 275:    // helpful debug statements 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.cc@303
PS4, Line 303:      int dst_len = slot_desc->type().len;     
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.cc@312
PS4, Line 312:      src_ptr = blob_ + (val - blob->data());     
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.cc@325
PS4, Line 325:         StringValue* dst = reinterpret_cast<StringValue*>(slot_val_ptr);        
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.cc@329
PS4, Line 329:         } 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.cc@348
PS4, Line 348: Status s =  CommitRows(row_read, row_batch);  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/4/be/src/exec/hdfs-json-scanner.cc@357
PS4, Line 357:     row_batch->tuple_data_pool()->AcquireData(data_batch_pool_.get(), false);    
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/4/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/17771/4/bin/bootstrap_toolchain.py@443
PS4, Line 443: )
flake8: E501 line too long (91 > 90 characters)




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 4
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sat, 14 Aug 2021 01:49:12 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 24:

(49 comments)

http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-json-scanner.cc
File be/src/exec/hdfs-json-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-json-scanner.cc@206
PS24, Line 206:     // VLOG_QUERY << "decimal128" << arrow::decimal128(ct.precision, ct.scale)->ToString();
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-json-scanner.cc@237
PS24, Line 237:     // VLOG_QUERY << "decimal128" << arrow::decimal128(ct.precision, ct.scale)->ToString();
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-json-scanner.cc@279
PS24, Line 279:   VLOG_QUERY << " PrintTuple:::" << PrintTuple(template_tuple_, *scan_node_->tuple_desc());
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-json-scanner.cc@303
PS24, Line 303:     std::shared_ptr<arrow::Field> field_a = arrow::field(cvf.name(), ColumnType2ArrowType(cvf.type()));
line too long (103 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-json-scanner.cc@321
PS24, Line 321:   VLOG_QUERY<< "Filename:::"<< stream_->filename()<< " "<< "num_rows_openfunction():::" << num_rows_;
line too long (101 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-json-scanner.cc@348
PS24, Line 348:         VLOG_QUERY << " PrintTemplateTuple:::" << PrintTuple(template_tuple, *tuple_desc) << std::endl;
line too long (103 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-json-scanner.cc@350
PS24, Line 350:   VLOG_QUERY << " PrintTemplateTuple:::" << PrintTuple(template_tuple, *tuple_desc) << std::endl;
line too long (97 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-json-scanner.cc@371
PS24, Line 371:     //if (!EvalRuntimeFilters(reinterpret_cast<TupleRow*>(row)) || !ExecNode::EvalConjuncts(evals.data(), evals.size(), row)) {
line too long (127 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-json-scanner.cc@384
PS24, Line 384:       VLOG_QUERY<< "Column in GetNextInternal" << column->type()->ToString() << column->ToString();
line too long (99 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-json-scanner.cc@499
PS24, Line 499:     //if (EvalRuntimeFilters(reinterpret_cast<TupleRow*>(row)) && ExecNode::EvalConjuncts(evals.data(), evals.size(), row)) {
line too long (125 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-scan-node-base.cc@70
PS24, Line 70:                                           "for all reads, regardless of whether the read is local or remote. By default, the "
line too long (126 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-scan-node-base.cc@71
PS24, Line 71:                                           "IO data cache is only used if the data is expected to be remote. Used by tests.");
line too long (125 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-scan-node-base.cc@79
PS24, Line 79:                                                        " across all Disk I/O threads in HDFS read operations.");
line too long (112 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-scan-node-base.cc@81
PS24, Line 81:                                                            " spent across all Disk I/O threads in HDFS open operations.");
line too long (122 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-scan-node-base.cc@84
PS24, Line 84:                              " while it is executing I/O operations on behalf of a scan.");
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-scan-node-base.cc@91
PS24, Line 91:                                                                   "disks accessed by HDFS scan. Each local disk is counted as a disk and each type of"
line too long (150 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-scan-node-base.cc@92
PS24, Line 92:                                                                   " remote filesystem (e.g. HDFS remote reads, S3) is counted as a distinct disk.");
line too long (148 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-scan-node-base.cc@94
PS24, Line 94:                                                                               " average number of HDFS read threads executing read operations on behalf of this "
line too long (161 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-scan-node-base.cc@95
PS24, Line 95:                                                                               "scan. Higher values (i.e. close to the aggregate number of I/O threads across "
line too long (158 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-scan-node-base.cc@96
PS24, Line 96:                                                                               "all disks accessed) show that this scan is using a larger proportion of the I/O "
line too long (160 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-scan-node-base.cc@97
PS24, Line 97:                                                                               "capacity of the system. Lower values show that either this scan is not I/O bound"
line too long (160 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-scan-node-base.cc@98
PS24, Line 98:                                                                               " or that it is getting a small share of the I/O capacity of the system.");
line too long (153 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-scan-node-base.cc@106
PS24, Line 106:                   "Use this to determine if the scan got all of the reservation it wanted. Does not "
line too long (101 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-scan-node-base.cc@107
PS24, Line 107:                   "include subsequent reservation increases done by scanner implementation "
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-scan-node-base.cc@129
PS24, Line 129:                                                     "threads spent waiting for I/O. This value can be compared to the value of "
line too long (128 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-scan-node-base.cc@130
PS24, Line 130:                                                     "ScannerThreadsTotalWallClockTime of MT_DOP = 0 scan nodes or otherwise compared "
line too long (134 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-scan-node-base.cc@131
PS24, Line 131:                                                     "to the total time reported for MT_DOP > 0 scan nodes. High values show that "
line too long (130 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-scan-node-base.cc@132
PS24, Line 132:                                                     "scanner threads are spending significant time waiting for I/O instead of "
line too long (127 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-scan-node-base.cc@133
PS24, Line 133:                                                     "processing data. Note that this includes the time when the thread is runnable "
line too long (132 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-scan-node-base.cc@137
PS24, Line 137:                   "Each sample in the counter is the size of a single column that is scanned by the "
line too long (101 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-scan-node-base.cc@141
PS24, Line 141:                   "Each sample in the counter is the size of a single column that is scanned by the "
line too long (101 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-scan-node-base.cc@353
PS24, Line 353:   VLOG_QUERY<< hdfs_table_->partition_descriptors().size() << "   " << hdfs_table_->DebugString() << "    " << shared_state_.use_mt_scan_node_ << "    " << instance_ctx_pbs.size() << "  " << tnode_->hdfs_scan_node << GetStackTrace();
line too long (233 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-scan-node-base.cc@751
PS24, Line 751:                  metadata->partition_id, FilterStats::FILES_KEY, filter_ctxs, file, state)) {
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-scan-node-base.cc@1236
PS24, Line 1236:                                                                  "Read $0 of data across network that was expected to be local. Block locality "
line too long (144 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-scan-node-base.cc@1237
PS24, Line 1237:                                                                  "metadata for table '$1.$2' may be stale. This only affects query performance "
line too long (144 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-scan-node-base.cc@1238
PS24, Line 1238:                                                                  "and not result correctness. One of the common causes for this warning is HDFS "
line too long (145 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-scan-node-base.cc@1239
PS24, Line 1239:                                                                  "rebalancer moving some of the file's blocks. If the issue persists, consider "
line too long (144 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-scan-node-base.cc@1240
PS24, Line 1240:                                                                  "running \"INVALIDATE METADATA `$1`.`$2`\".",
line too long (110 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-scan-node-base.cc@1241
PS24, Line 1241:                                                                  PrettyPrinter::Print(unexpected_remote_bytes_->value(), TUnit::BYTES),
line too long (135 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/be/src/exec/hdfs-scan-node-base.cc@1242
PS24, Line 1242:                                                                  hdfs_table_->database(), hdfs_table_->name())));
line too long (113 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/17771/24/bin/bootstrap_toolchain.py@489
PS24, Line 489: "
flake8: E501 line too long (98 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/17771/24/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/17771/24/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@458
PS24, Line 458:     
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/24/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1420
PS24, Line 1420:                   0, fileDesc.getFileLength(), partition.getId(), fileDesc.getFileLength(),
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1421
PS24, Line 1421:                   fileDesc.getFileCompression().toThrift(), fileDesc.getModificationTime(),
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1425
PS24, Line 1425:                   currentOffset, currentLength, partition.getId(), fileDesc.getFileLength(),
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1426
PS24, Line 1426:                   fileDesc.getFileCompression().toThrift(), fileDesc.getModificationTime(),
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1443
PS24, Line 1443:         LOG.info(String.format("scanRangeLocations: %s, LargestScanRangeBytes: %d, FileMaxScanRangeBytes = %s, " +
line too long (114 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1444
PS24, Line 1444:                         "RemainingLength = %d , CurrentOffset = %d, currentLength = %d", scanRangeLocations, largestScanRangeBytes_,
line too long (132 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/24/tests/query_test/test_tpch_queries.py
File tests/query_test/test_tpch_queries.py:

http://gerrit.cloudera.org:8080/#/c/17771/24/tests/query_test/test_tpch_queries.py@39
PS24, Line 39: s
flake8: E501 line too long (96 > 90 characters)




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 24
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 21 Mar 2023 09:04:09 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has uploaded a new patch set (#24) to the change originally created by shikha.asrani10@gmail.com. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................

WiP: IMPALA-10798 : Prototype for JSON reader

This prototype allows a user to create a table stored as jsonfile and
query it.
Steps to test:
- create a json table with schema specified using eligible datatypes
(int8/16/32/64/float/double/string/varchar/char/timestamp/boolean)
- add your json file (with eligible datatypes and the same column names as
 the schema specified in the create command) to an hdfs location
- add this 'location' to your table
- run a select statement

Fix:
- arrow library is included wherever required
- json format is added to scan node base class.
- json scanner files are added that implement methods to read the
 json file from the specified file location

Change-Id: If79364a421d862d0d837f9be694911e388d4d629
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/exec/CMakeLists.txt
A be/src/exec/hdfs-json-scanner.cc
A be/src/exec/hdfs-json-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scanner.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/service/client-request-state.cc
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindArrow.cmake
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/bin/generate-schema-statements.py
M testdata/bin/run-hive-server.sh
M testdata/workloads/functional-query/functional-query_core.csv
M testdata/workloads/functional-query/functional-query_dimensions.csv
M testdata/workloads/functional-query/functional-query_exhaustive.csv
M testdata/workloads/functional-query/functional-query_pairwise.csv
M testdata/workloads/tpcds/tpcds_core.csv
M testdata/workloads/tpcds/tpcds_exhaustive.csv
M testdata/workloads/tpcds/tpcds_pairwise.csv
M testdata/workloads/tpch/tpch_core.csv
M testdata/workloads/tpch/tpch_dimensions.csv
M testdata/workloads/tpch/tpch_exhaustive.csv
M testdata/workloads/tpch/tpch_pairwise.csv
M tests/common/test_dimensions.py
M tests/metadata/test_hms_integration.py
M tests/query_test/test_decimal_queries.py
M tests/query_test/test_tpch_queries.py
31 files changed, 959 insertions(+), 201 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/17771/24
-- 
To view, visit http://gerrit.cloudera.org:8080/17771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 24
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 24:

> Uploaded patch set 24.

This patch set is only for my own reference, as it contains a lot of logging and debugging code. I'll upload a cleaner patch set soon.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 24
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 21 Mar 2023 09:04:09 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 14:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9955/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 14
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 22 Dec 2021 22:40:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17771/13/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/17771/13/bin/bootstrap_toolchain.py@469
PS13, Line 469: )
flake8: E501 line too long (91 > 90 characters)



-- 
To view, visit http://gerrit.cloudera.org:8080/17771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 21 Dec 2021 16:36:18 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17771/10/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/17771/10/bin/bootstrap_toolchain.py@469
PS10, Line 469: )
flake8: E501 line too long (91 > 90 characters)



-- 
To view, visit http://gerrit.cloudera.org:8080/17771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 30 Nov 2021 07:22:06 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 15:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10463/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 15
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 19 Apr 2022 09:14:40 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9756/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 8
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 10 Nov 2021 19:22:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Aman Sinha, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17771

to look at the new patch set (#9).

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................

WiP: IMPALA-10798 : Prototype for JSON reader

This prototype allows a user to create a table stored as jsonfile and
query it.
Steps to test:
- create a json table with schema specified using eligible datatypes
(int8/16/32/64/float/double/string/varchar/char/timestamp/boolean)
- add your json file (with eligible datatypes and the same column names
 as the schema specified in the create command) to an hdfs location
- add this 'location' to your table
- run a select statement

Fix:
- arrow library is included wherever required
- json format is added to scan node base class.
- json scanner files are added; they implement methods to read the
 json file from the specified file location

Change-Id: If79364a421d862d0d837f9be694911e388d4d629
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/exec/CMakeLists.txt
A be/src/exec/hdfs-json-scanner.cc
A be/src/exec/hdfs-json-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindArrow.cmake
9 files changed, 607 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/17771/9
-- 
To view, visit http://gerrit.cloudera.org:8080/17771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 9
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 11:

> Patch Set 11:
> 
> Build Failed 
> 
> https://jenkins.impala.io/job/gerrit-code-review-checks/9913/ : Initial code review checks failed. See linked job for details on the failure.

The build failure is https://jenkins.impala.io/job/clang-tidy-ub1604/17531

 /home/ubuntu/Impala/be/src/exec/hdfs-json-scanner.cc:135:7: warning: variable 'bytes_read' is used uninitialized whenever 'if' condition is false [clang-diagnostic-sometimes-uninitialized]
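
For reference, here is a minimal sketch of the pattern this diagnostic usually points at, and one common way to resolve it. The code at hdfs-json-scanner.cc:135 is not shown in this thread, so everything below except the variable name bytes_read is hypothetical (the function ReadSome, its parameters, and the copy logic are assumptions made for illustration only). The warning fires when a variable is assigned only inside an 'if' branch but read afterwards on every path; giving it an initial value at the declaration makes the fall-through path well defined.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not taken from the patch: copies at most `nbytes`
// bytes from `src` (which holds `available` bytes) into `out` and returns
// the number of bytes copied.
//
// The problematic shape would be `int64_t bytes_read;` with the only
// assignment inside the `if` below; the `return` would then read an
// indeterminate value whenever the condition is false.
int64_t ReadSome(const char* src, int64_t available, int64_t nbytes, char* out) {
  assert(out != nullptr);
  int64_t bytes_read = 0;  // Initialized so every path returns a defined value.
  if (available > 0 && nbytes > 0) {
    bytes_read = std::min(nbytes, available);
    std::memcpy(out, src, static_cast<std::size_t>(bytes_read));
  }
  return bytes_read;
}
```

With this shape, a call that has nothing to read (for example, available == 0) returns 0 instead of an indeterminate value. An alternative is to return early in the false branch, which also satisfies the check.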


-- 
To view, visit http://gerrit.cloudera.org:8080/17771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 11
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 14 Dec 2021 01:14:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17771/7/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/17771/7/bin/bootstrap_toolchain.py@469
PS7, Line 469: )
flake8: E501 line too long (91 > 90 characters)



-- 
To view, visit http://gerrit.cloudera.org:8080/17771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 02 Nov 2021 03:22:49 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 22:

(33 comments)

http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-json-scanner.cc
File be/src/exec/hdfs-json-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-json-scanner.cc@171
PS22, Line 171:     // VLOG_QUERY << "decimal128" << arrow::decimal128(ct.precision, ct.scale)->ToString();
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-scan-node-base.cc@70
PS22, Line 70:                                           "for all reads, regardless of whether the read is local or remote. By default, the "
line too long (126 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-scan-node-base.cc@71
PS22, Line 71:                                           "IO data cache is only used if the data is expected to be remote. Used by tests.");
line too long (125 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-scan-node-base.cc@79
PS22, Line 79:                                                        " across all Disk I/O threads in HDFS read operations.");
line too long (112 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-scan-node-base.cc@81
PS22, Line 81:                                                            " spent across all Disk I/O threads in HDFS open operations.");
line too long (122 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-scan-node-base.cc@84
PS22, Line 84:                              " while it is executing I/O operations on behalf of a scan.");
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-scan-node-base.cc@91
PS22, Line 91:                                                                   "disks accessed by HDFS scan. Each local disk is counted as a disk and each type of"
line too long (150 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-scan-node-base.cc@92
PS22, Line 92:                                                                   " remote filesystem (e.g. HDFS remote reads, S3) is counted as a distinct disk.");
line too long (148 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-scan-node-base.cc@94
PS22, Line 94:                                                                               " average number of HDFS read threads executing read operations on behalf of this "
line too long (161 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-scan-node-base.cc@95
PS22, Line 95:                                                                               "scan. Higher values (i.e. close to the aggregate number of I/O threads across "
line too long (158 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-scan-node-base.cc@96
PS22, Line 96:                                                                               "all disks accessed) show that this scan is using a larger proportion of the I/O "
line too long (160 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-scan-node-base.cc@97
PS22, Line 97:                                                                               "capacity of the system. Lower values show that either this scan is not I/O bound"
line too long (160 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-scan-node-base.cc@98
PS22, Line 98:                                                                               " or that it is getting a small share of the I/O capacity of the system.");
line too long (153 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-scan-node-base.cc@106
PS22, Line 106:                   "Use this to determine if the scan got all of the reservation it wanted. Does not "
line too long (101 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-scan-node-base.cc@107
PS22, Line 107:                   "include subsequent reservation increases done by scanner implementation "
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-scan-node-base.cc@127
PS22, Line 127:                                                     "threads spent waiting for I/O. This value can be compared to the value of "
line too long (128 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-scan-node-base.cc@128
PS22, Line 128:                                                     "ScannerThreadsTotalWallClockTime of MT_DOP = 0 scan nodes or otherwise compared "
line too long (134 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-scan-node-base.cc@129
PS22, Line 129:                                                     "to the total time reported for MT_DOP > 0 scan nodes. High values show that "
line too long (130 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-scan-node-base.cc@130
PS22, Line 130:                                                     "scanner threads are spending significant time waiting for I/O instead of "
line too long (127 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-scan-node-base.cc@131
PS22, Line 131:                                                     "processing data. Note that this includes the time when the thread is runnable "
line too long (132 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-scan-node-base.cc@135
PS22, Line 135:                   "Each sample in the counter is the size of a single column that is scanned by the "
line too long (101 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-scan-node-base.cc@139
PS22, Line 139:                   "Each sample in the counter is the size of a single column that is scanned by the "
line too long (101 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-scan-node-base.cc@747
PS22, Line 747:                  metadata->partition_id, FilterStats::FILES_KEY, filter_ctxs, file, state)) {
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-scan-node-base.cc@1231
PS22, Line 1231:                                                                  "Read $0 of data across network that was expected to be local. Block locality "
line too long (144 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-scan-node-base.cc@1232
PS22, Line 1232:                                                                  "metadata for table '$1.$2' may be stale. This only affects query performance "
line too long (144 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-scan-node-base.cc@1233
PS22, Line 1233:                                                                  "and not result correctness. One of the common causes for this warning is HDFS "
line too long (145 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-scan-node-base.cc@1234
PS22, Line 1234:                                                                  "rebalancer moving some of the file's blocks. If the issue persists, consider "
line too long (144 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-scan-node-base.cc@1235
PS22, Line 1235:                                                                  "running \"INVALIDATE METADATA `$1`.`$2`\".",
line too long (110 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-scan-node-base.cc@1236
PS22, Line 1236:                                                                  PrettyPrinter::Print(unexpected_remote_bytes_->value(), TUnit::BYTES),
line too long (135 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/22/be/src/exec/hdfs-scan-node-base.cc@1237
PS22, Line 1237:                                                                  hdfs_table_->database(), hdfs_table_->name())));
line too long (113 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/22/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/17771/22/bin/bootstrap_toolchain.py@489
PS22, Line 489: "
flake8: E501 line too long (98 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/17771/22/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/17771/22/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@475
PS22, Line 475:     
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/22/tests/query_test/test_tpch_queries.py
File tests/query_test/test_tpch_queries.py:

http://gerrit.cloudera.org:8080/#/c/17771/22/tests/query_test/test_tpch_queries.py@39
PS22, Line 39: s
flake8: E501 line too long (96 > 90 characters)



-- 
To view, visit http://gerrit.cloudera.org:8080/17771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 10 Nov 2022 06:18:08 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 20:

(32 comments)

http://gerrit.cloudera.org:8080/#/c/17771/20/be/src/exec/hdfs-json-scanner.cc
File be/src/exec/hdfs-json-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17771/20/be/src/exec/hdfs-json-scanner.cc@171
PS20, Line 171:     // VLOG_QUERY << "decimal128" << arrow::decimal128(ct.precision, ct.scale)->ToString();
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/20/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/17771/20/be/src/exec/hdfs-scan-node-base.cc@70
PS20, Line 70:                                           "for all reads, regardless of whether the read is local or remote. By default, the "
line too long (126 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/20/be/src/exec/hdfs-scan-node-base.cc@71
PS20, Line 71:                                           "IO data cache is only used if the data is expected to be remote. Used by tests.");
line too long (125 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/20/be/src/exec/hdfs-scan-node-base.cc@79
PS20, Line 79:                                                        " across all Disk I/O threads in HDFS read operations.");
line too long (112 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/20/be/src/exec/hdfs-scan-node-base.cc@81
PS20, Line 81:                                                            " spent across all Disk I/O threads in HDFS open operations.");
line too long (122 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/20/be/src/exec/hdfs-scan-node-base.cc@84
PS20, Line 84:                              " while it is executing I/O operations on behalf of a scan.");
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/20/be/src/exec/hdfs-scan-node-base.cc@91
PS20, Line 91:                                                                   "disks accessed by HDFS scan. Each local disk is counted as a disk and each type of"
line too long (150 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/20/be/src/exec/hdfs-scan-node-base.cc@92
PS20, Line 92:                                                                   " remote filesystem (e.g. HDFS remote reads, S3) is counted as a distinct disk.");
line too long (148 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/20/be/src/exec/hdfs-scan-node-base.cc@94
PS20, Line 94:                                                                               " average number of HDFS read threads executing read operations on behalf of this "
line too long (161 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/20/be/src/exec/hdfs-scan-node-base.cc@95
PS20, Line 95:                                                                               "scan. Higher values (i.e. close to the aggregate number of I/O threads across "
line too long (158 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/20/be/src/exec/hdfs-scan-node-base.cc@96
PS20, Line 96:                                                                               "all disks accessed) show that this scan is using a larger proportion of the I/O "
line too long (160 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/20/be/src/exec/hdfs-scan-node-base.cc@97
PS20, Line 97:                                                                               "capacity of the system. Lower values show that either this scan is not I/O bound"
line too long (160 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/20/be/src/exec/hdfs-scan-node-base.cc@98
PS20, Line 98:                                                                               " or that it is getting a small share of the I/O capacity of the system.");
line too long (153 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/20/be/src/exec/hdfs-scan-node-base.cc@106
PS20, Line 106:                   "Use this to determine if the scan got all of the reservation it wanted. Does not "
line too long (101 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/20/be/src/exec/hdfs-scan-node-base.cc@107
PS20, Line 107:                   "include subsequent reservation increases done by scanner implementation "
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/20/be/src/exec/hdfs-scan-node-base.cc@127
PS20, Line 127:                                                     "threads spent waiting for I/O. This value can be compared to the value of "
line too long (128 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/20/be/src/exec/hdfs-scan-node-base.cc@128
PS20, Line 128:                                                     "ScannerThreadsTotalWallClockTime of MT_DOP = 0 scan nodes or otherwise compared "
line too long (134 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/20/be/src/exec/hdfs-scan-node-base.cc@129
PS20, Line 129:                                                     "to the total time reported for MT_DOP > 0 scan nodes. High values show that "
line too long (130 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/20/be/src/exec/hdfs-scan-node-base.cc@130
PS20, Line 130:                                                     "scanner threads are spending significant time waiting for I/O instead of "
line too long (127 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/20/be/src/exec/hdfs-scan-node-base.cc@131
PS20, Line 131:                                                     "processing data. Note that this includes the time when the thread is runnable "
line too long (132 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/20/be/src/exec/hdfs-scan-node-base.cc@135
PS20, Line 135:                   "Each sample in the counter is the size of a single column that is scanned by the "
line too long (101 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/20/be/src/exec/hdfs-scan-node-base.cc@139
PS20, Line 139:                   "Each sample in the counter is the size of a single column that is scanned by the "
line too long (101 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/20/be/src/exec/hdfs-scan-node-base.cc@747
PS20, Line 747:                  metadata->partition_id, FilterStats::FILES_KEY, filter_ctxs, file, state)) {
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/20/be/src/exec/hdfs-scan-node-base.cc@1231
PS20, Line 1231:                                                                  "Read $0 of data across network that was expected to be local. Block locality "
line too long (144 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/20/be/src/exec/hdfs-scan-node-base.cc@1232
PS20, Line 1232:                                                                  "metadata for table '$1.$2' may be stale. This only affects query performance "
line too long (144 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/20/be/src/exec/hdfs-scan-node-base.cc@1233
PS20, Line 1233:                                                                  "and not result correctness. One of the common causes for this warning is HDFS "
line too long (145 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/20/be/src/exec/hdfs-scan-node-base.cc@1234
PS20, Line 1234:                                                                  "rebalancer moving some of the file's blocks. If the issue persists, consider "
line too long (144 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/20/be/src/exec/hdfs-scan-node-base.cc@1235
PS20, Line 1235:                                                                  "running \"INVALIDATE METADATA `$1`.`$2`\".",
line too long (110 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/20/be/src/exec/hdfs-scan-node-base.cc@1236
PS20, Line 1236:                                                                  PrettyPrinter::Print(unexpected_remote_bytes_->value(), TUnit::BYTES),
line too long (135 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/20/be/src/exec/hdfs-scan-node-base.cc@1237
PS20, Line 1237:                                                                  hdfs_table_->database(), hdfs_table_->name())));
line too long (113 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/20/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/17771/20/bin/bootstrap_toolchain.py@489
PS20, Line 489: "
flake8: E501 line too long (98 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/17771/20/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/17771/20/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@475
PS20, Line 475:     
line has trailing whitespace



-- 
To view, visit http://gerrit.cloudera.org:8080/17771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 20
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 04 Nov 2022 11:14:49 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 20:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11792/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 20
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 04 Nov 2022 11:34:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 2:

(38 comments)

http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.h
File be/src/exec/hdfs-json-scanner.h:

http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.h@70
PS2, Line 70:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.h@91
PS2, Line 91:  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.h@148
PS2, Line 148:    
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.h@151
PS2, Line 151:   inline Status AllocateTupleMem(RowBatch* row_batch) WARN_UNUSED_RESULT;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.h@153
PS2, Line 153:   int num_rows;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.h@154
PS2, Line 154:   uint8_t* tuple_mem_end_ = nullptr;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.h@155
PS2, Line 155:   const io::ScanRange* metadata_range_ = nullptr;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.h@156
PS2, Line 156:   const char *filename() const { return metadata_range_->file(); }  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.h@161
PS2, Line 161:   boost::scoped_ptr<MemPool> data_batch_pool_;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.cc
File be/src/exec/hdfs-json-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.cc@55
PS2, Line 55:      RETURN_IF_ERROR(scan_node->AddDiskIoRanges(file, EnqueueLocation::TAIL));  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.cc@77
PS2, Line 77:   }  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.cc@83
PS2, Line 83:   chunk_sizes_[*out] = size;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.cc@88
PS2, Line 88: arrow::Status HdfsJsonScanner::ArrowMemPool::Reallocate(int64_t old_size, int64_t new_size, uint8_t** ptr) {
line too long (108 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.cc@97
PS2, Line 97:        << GetStackTrace();  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.cc@107
PS2, Line 107: arrow::Result<int64_t> HdfsJsonScanner::ScanRangeInputStream::Read(int64_t nbytes, void* out){
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.cc@110
PS2, Line 110:   int64_t fl = file_desc_->file_length;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.cc@112
PS2, Line 112:     nbytes = fl - pos_;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.cc@143
PS2, Line 143: arrow::Result<std::shared_ptr<arrow::Buffer>> HdfsJsonScanner::ScanRangeInputStream::Read(int64_t nbytes){
line too long (106 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.cc@197
PS2, Line 197: arrow::Status HdfsJsonScanner::ReadTable(std::shared_ptr<arrow::io::InputStream> input_stream){
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.cc@199
PS2, Line 199:   ARROW_ASSIGN_OR_RAISE(reader_, arrow::json::TableReader::Make(arrow::default_memory_pool(),
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.cc@208
PS2, Line 208:   RETURN_IF_ERROR(HdfsScanner::Open(context));  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.cc@214
PS2, Line 214:   vector<std::shared_ptr<arrow::Field>> fields_list = {};  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.cc@217
PS2, Line 217:     std::shared_ptr<arrow::Field> field_a = arrow::field(cvf.name(), convert_type(cvf.type()));
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.cc@223
PS2, Line 223:   readOptions_ = arrow::json::ReadOptions::Defaults(); 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.cc@230
PS2, Line 230:   VLOG_QUERY << "args: " << table_->ToString() << std::endl << GetStackTrace();  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.cc@249
PS2, Line 249:     eos_ = true; 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.cc@258
PS2, Line 258:  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.cc@261
PS2, Line 261:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.cc@263
PS2, Line 263:   // reading rows from the previous point we left, tilll either end of table/capacity of rowbatch
line too long (97 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.cc@271
PS2, Line 271:    void* slot_val_ptr = tuple->GetSlot(slot_desc->tuple_offset());  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.cc@272
PS2, Line 272:    // helpful debug statements 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.cc@300
PS2, Line 300:      int dst_len = slot_desc->type().len;     
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.cc@309
PS2, Line 309:      src_ptr = blob_ + (val - blob->data());     
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.cc@322
PS2, Line 322:         StringValue* dst = reinterpret_cast<StringValue*>(slot_val_ptr);        
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.cc@326
PS2, Line 326:         } 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.cc@345
PS2, Line 345: Status s =  CommitRows(row_read, row_batch);  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/2/be/src/exec/hdfs-json-scanner.cc@354
PS2, Line 354:     row_batch->tuple_data_pool()->AcquireData(data_batch_pool_.get(), false);    
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/2/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/17771/2/bin/bootstrap_toolchain.py@443
PS2, Line 443: )
flake8: E501 line too long (91 > 90 characters)
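
The two checks reported in the comments above (trailing whitespace and a 90-character line limit) can be reproduced locally before uploading a new patch set. This is a minimal sketch, not the script that Impala Public Jenkins runs: the function name `lint_lines` is hypothetical, and it assumes line length is counted on the raw line, including any trailing whitespace.

```python
MAX_LINE_LENGTH = 90  # limit quoted in the "line too long (N > 90)" comments


def lint_lines(text, max_len=MAX_LINE_LENGTH):
    """Return (line_number, message) pairs for the two checks seen above.

    Hypothetical helper for illustration only; the real review bot may
    count characters or report messages differently.
    """
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        # A line that changes when right-stripped ends in spaces or tabs.
        if line != line.rstrip():
            findings.append((number, "line has trailing whitespace"))
        if len(line) > max_len:
            findings.append((number, f"line too long ({len(line)} > {max_len})"))
    return findings


# Sample input: one line with trailing spaces, one 94-character line, one clean line.
sample = "int num_rows;  \n" + "x" * 94 + "\nreturn Status::OK();\n"
for number, message in lint_lines(sample):
    print(f"Line {number}: {message}")
```

Running a check like this over be/src/exec/hdfs-json-scanner.cc and hdfs-json-scanner.h before pushing would likely catch most of the comments listed in this message.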



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 2
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sat, 14 Aug 2021 00:48:23 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 1:

(38 comments)

http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.h
File be/src/exec/hdfs-json-scanner.h:

http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.h@53
PS1, Line 53:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.h@74
PS1, Line 74:  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.h@131
PS1, Line 131:    
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.h@134
PS1, Line 134:   inline Status AllocateTupleMem(RowBatch* row_batch) WARN_UNUSED_RESULT;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.h@136
PS1, Line 136:   int num_rows;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.h@137
PS1, Line 137:   uint8_t* tuple_mem_end_ = nullptr;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.h@138
PS1, Line 138:   const io::ScanRange* metadata_range_ = nullptr;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.h@139
PS1, Line 139:   const char *filename() const { return metadata_range_->file(); }  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.h@144
PS1, Line 144:   boost::scoped_ptr<MemPool> data_batch_pool_;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.cc
File be/src/exec/hdfs-json-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.cc@38
PS1, Line 38:      RETURN_IF_ERROR(scan_node->AddDiskIoRanges(file, EnqueueLocation::TAIL));  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.cc@60
PS1, Line 60:   }  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.cc@66
PS1, Line 66:   chunk_sizes_[*out] = size;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.cc@71
PS1, Line 71: arrow::Status HdfsJsonScanner::ArrowMemPool::Reallocate(int64_t old_size, int64_t new_size, uint8_t** ptr) {
line too long (108 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.cc@80
PS1, Line 80:        << GetStackTrace();  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.cc@90
PS1, Line 90: arrow::Result<int64_t> HdfsJsonScanner::ScanRangeInputStream::Read(int64_t nbytes, void* out){
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.cc@93
PS1, Line 93:   int64_t fl = file_desc_->file_length;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.cc@95
PS1, Line 95:     nbytes = fl - pos_;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.cc@126
PS1, Line 126: arrow::Result<std::shared_ptr<arrow::Buffer>> HdfsJsonScanner::ScanRangeInputStream::Read(int64_t nbytes){
line too long (106 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.cc@180
PS1, Line 180: arrow::Status HdfsJsonScanner::ReadTable(std::shared_ptr<arrow::io::InputStream> input_stream){
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.cc@182
PS1, Line 182:   ARROW_ASSIGN_OR_RAISE(reader_, arrow::json::TableReader::Make(arrow::default_memory_pool(),
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.cc@191
PS1, Line 191:   RETURN_IF_ERROR(HdfsScanner::Open(context));  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.cc@197
PS1, Line 197:   vector<std::shared_ptr<arrow::Field>> fields_list = {};  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.cc@200
PS1, Line 200:     std::shared_ptr<arrow::Field> field_a = arrow::field(cvf.name(), convert_type(cvf.type()));
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.cc@206
PS1, Line 206:   readOptions_ = arrow::json::ReadOptions::Defaults(); 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.cc@213
PS1, Line 213:   VLOG_QUERY << "args: " << table_->ToString() << std::endl << GetStackTrace();  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.cc@232
PS1, Line 232:     eos_ = true; 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.cc@241
PS1, Line 241:  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.cc@244
PS1, Line 244:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.cc@246
PS1, Line 246:   // reading rows from the previous point we left, tilll either end of table/capacity of rowbatch
line too long (97 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.cc@254
PS1, Line 254:    void* slot_val_ptr = tuple->GetSlot(slot_desc->tuple_offset());  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.cc@255
PS1, Line 255:    // helpful debug statements 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.cc@283
PS1, Line 283:      int dst_len = slot_desc->type().len;     
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.cc@292
PS1, Line 292:      src_ptr = blob_ + (val - blob->data());     
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.cc@305
PS1, Line 305:         StringValue* dst = reinterpret_cast<StringValue*>(slot_val_ptr);        
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.cc@309
PS1, Line 309:         } 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.cc@328
PS1, Line 328: Status s =  CommitRows(row_read, row_batch);  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/1/be/src/exec/hdfs-json-scanner.cc@337
PS1, Line 337:     row_batch->tuple_data_pool()->AcquireData(data_batch_pool_.get(), false);    
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/1/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/17771/1/bin/bootstrap_toolchain.py@443
PS1, Line 443: )
flake8: E501 line too long (91 > 90 characters)



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 1
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 13 Aug 2021 23:24:24 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Aman Sinha, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17771

to look at the new patch set (#4).

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................

WiP: IMPALA-10798 : Prototype for JSON reader

This prototype allows a user to create a table stored as jsonfile and
query it.
Steps to test:
- create a json table with schema specified using eligible datatypes
(int8/16/32/64/float/double/string/varchar/char)
- add your json file (with eligible datatypes and the same column names
 as the schema specified in the create command) to an hdfs location
- add this 'location' to your table
- run a select statement
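
The "add your json file" step above can be prepared with a short script. This is a sketch under stated assumptions: the commit message does not describe the file layout, so it assumes the scanner accepts newline-delimited JSON objects (the layout Arrow's JSON reader works with), and the column names `id`, `name`, and `score` are hypothetical stand-ins for whatever schema the create command declares.

```python
import json

# Hypothetical rows; keys must match the column names used in the
# create command, and values should use eligible datatypes.
rows = [
    {"id": 1, "name": "alice", "score": 9.5},
    {"id": 2, "name": "bob", "score": 7.25},
]


def to_json_lines(records):
    """Serialize one JSON object per line (newline-delimited JSON)."""
    return "".join(json.dumps(record) + "\n" for record in records)


def column_names_match(payload, expected_columns):
    """Check that every record has exactly the expected column names."""
    expected = set(expected_columns)
    return all(set(json.loads(line)) == expected for line in payload.splitlines())


payload = to_json_lines(rows)
print(payload, end="")
print(column_names_match(payload, ["id", "name", "score"]))
```

The resulting text can be written to a local file and copied to the hdfs location that is later attached to the table.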

Fix:
- arrow library is included wherever required
- json format is added to scan node base class.
- json scanner files are added that implement methods to read the
 json file from the specified file location

Change-Id: If79364a421d862d0d837f9be694911e388d4d629
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/exec/CMakeLists.txt
A be/src/exec/hdfs-json-scanner.cc
A be/src/exec/hdfs-json-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindArrow.cmake
9 files changed, 609 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/17771/4
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 4
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 3:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/9297/ : Initial code review checks failed. See linked job for details on the failure.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sat, 14 Aug 2021 02:08:42 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 3:

(38 comments)

http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.h
File be/src/exec/hdfs-json-scanner.h:

http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.h@70
PS3, Line 70:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.h@91
PS3, Line 91:  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.h@148
PS3, Line 148:    
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.h@151
PS3, Line 151:   inline Status AllocateTupleMem(RowBatch* row_batch) WARN_UNUSED_RESULT;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.h@153
PS3, Line 153:   int num_rows;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.h@154
PS3, Line 154:   uint8_t* tuple_mem_end_ = nullptr;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.h@155
PS3, Line 155:   const io::ScanRange* metadata_range_ = nullptr;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.h@156
PS3, Line 156:   const char *filename() const { return metadata_range_->file(); }  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.h@161
PS3, Line 161:   boost::scoped_ptr<MemPool> data_batch_pool_;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.cc
File be/src/exec/hdfs-json-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.cc@55
PS3, Line 55:      RETURN_IF_ERROR(scan_node->AddDiskIoRanges(file, EnqueueLocation::TAIL));  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.cc@77
PS3, Line 77:   }  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.cc@83
PS3, Line 83:   chunk_sizes_[*out] = size;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.cc@88
PS3, Line 88: arrow::Status HdfsJsonScanner::ArrowMemPool::Reallocate(int64_t old_size, int64_t new_size, uint8_t** ptr) {
line too long (108 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.cc@97
PS3, Line 97:        << GetStackTrace();  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.cc@107
PS3, Line 107: arrow::Result<int64_t> HdfsJsonScanner::ScanRangeInputStream::Read(int64_t nbytes, void* out){
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.cc@110
PS3, Line 110:   int64_t fl = file_desc_->file_length;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.cc@112
PS3, Line 112:     nbytes = fl - pos_;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.cc@143
PS3, Line 143: arrow::Result<std::shared_ptr<arrow::Buffer>> HdfsJsonScanner::ScanRangeInputStream::Read(int64_t nbytes){
line too long (106 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.cc@197
PS3, Line 197: arrow::Status HdfsJsonScanner::ReadTable(std::shared_ptr<arrow::io::InputStream> input_stream){
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.cc@199
PS3, Line 199:   ARROW_ASSIGN_OR_RAISE(reader_, arrow::json::TableReader::Make(arrow::default_memory_pool(),
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.cc@208
PS3, Line 208:   RETURN_IF_ERROR(HdfsScanner::Open(context));  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.cc@214
PS3, Line 214:   vector<std::shared_ptr<arrow::Field>> fields_list = {};  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.cc@217
PS3, Line 217:     std::shared_ptr<arrow::Field> field_a = arrow::field(cvf.name(), convert_type(cvf.type()));
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.cc@223
PS3, Line 223:   readOptions_ = arrow::json::ReadOptions::Defaults(); 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.cc@230
PS3, Line 230:   VLOG_QUERY << "args: " << table_->ToString() << std::endl << GetStackTrace();  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.cc@249
PS3, Line 249:     eos_ = true; 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.cc@258
PS3, Line 258:  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.cc@261
PS3, Line 261:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.cc@263
PS3, Line 263:   // reading rows from the previous point we left, tilll either end of table/capacity of rowbatch
line too long (97 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.cc@271
PS3, Line 271:    void* slot_val_ptr = tuple->GetSlot(slot_desc->tuple_offset());  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.cc@272
PS3, Line 272:    // helpful debug statements 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.cc@300
PS3, Line 300:      int dst_len = slot_desc->type().len;     
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.cc@309
PS3, Line 309:      src_ptr = blob_ + (val - blob->data());     
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.cc@322
PS3, Line 322:         StringValue* dst = reinterpret_cast<StringValue*>(slot_val_ptr);        
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.cc@326
PS3, Line 326:         } 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.cc@345
PS3, Line 345: Status s =  CommitRows(row_read, row_batch);  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/3/be/src/exec/hdfs-json-scanner.cc@354
PS3, Line 354:     row_batch->tuple_data_pool()->AcquireData(data_batch_pool_.get(), false);    
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/3/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/17771/3/bin/bootstrap_toolchain.py@443
PS3, Line 443: )
flake8: E501 line too long (91 > 90 characters)



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sat, 14 Aug 2021 01:48:32 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Aman Sinha, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17771

to look at the new patch set (#12).

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................

WiP: IMPALA-10798 : Prototype for JSON reader

This prototype allows a user to create a table stored as jsonfile and
query it.
Steps to test:
- create a json table with schema specified using eligible datatypes
(int8/16/32/64/float/double/string/varchar/char/timestamp/boolean)
- add your json file (with eligible datatypes and the same column names
 as the schema specified in the create command) to an hdfs location
- add this 'location' to your table
- run a select statement

Fix:
- arrow library is included wherever required
- json format is added to scan node base class.
- json scanner files are added that implement methods to read the
 json file from the specified file location

Change-Id: If79364a421d862d0d837f9be694911e388d4d629
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/exec/CMakeLists.txt
A be/src/exec/hdfs-json-scanner.cc
A be/src/exec/hdfs-json-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindArrow.cmake
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
10 files changed, 617 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/17771/12
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 12
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 25:

(40 comments)

http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-json-scanner.cc
File be/src/exec/hdfs-json-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-json-scanner.cc@220
PS25, Line 220:     std::shared_ptr<arrow::Field> field_a = arrow::field(cvf.name(), ColumnType2ArrowType(cvf.type()));
line too long (103 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-json-scanner.cc@262
PS25, Line 262:         VLOG_QUERY << " PrintTemplateTuple:::" << PrintTuple(template_tuple, *tuple_desc) << std::endl;
line too long (103 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-json-scanner.cc@285
PS25, Line 285:       VLOG_QUERY<< "Column in GetNextInternal" << column->type()->ToString() << column->ToString();
line too long (99 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-scan-node-base.cc@70
PS25, Line 70:                                           "for all reads, regardless of whether the read is local or remote. By default, the "
line too long (126 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-scan-node-base.cc@71
PS25, Line 71:                                           "IO data cache is only used if the data is expected to be remote. Used by tests.");
line too long (125 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-scan-node-base.cc@79
PS25, Line 79:                                                        " across all Disk I/O threads in HDFS read operations.");
line too long (112 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-scan-node-base.cc@81
PS25, Line 81:                                                            " spent across all Disk I/O threads in HDFS open operations.");
line too long (122 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-scan-node-base.cc@84
PS25, Line 84:                              " while it is executing I/O operations on behalf of a scan.");
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-scan-node-base.cc@91
PS25, Line 91:                                                                   "disks accessed by HDFS scan. Each local disk is counted as a disk and each type of"
line too long (150 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-scan-node-base.cc@92
PS25, Line 92:                                                                   " remote filesystem (e.g. HDFS remote reads, S3) is counted as a distinct disk.");
line too long (148 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-scan-node-base.cc@94
PS25, Line 94:                                                                               " average number of HDFS read threads executing read operations on behalf of this "
line too long (161 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-scan-node-base.cc@95
PS25, Line 95:                                                                               "scan. Higher values (i.e. close to the aggregate number of I/O threads across "
line too long (158 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-scan-node-base.cc@96
PS25, Line 96:                                                                               "all disks accessed) show that this scan is using a larger proportion of the I/O "
line too long (160 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-scan-node-base.cc@97
PS25, Line 97:                                                                               "capacity of the system. Lower values show that either this scan is not I/O bound"
line too long (160 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-scan-node-base.cc@98
PS25, Line 98:                                                                               " or that it is getting a small share of the I/O capacity of the system.");
line too long (153 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-scan-node-base.cc@106
PS25, Line 106:                   "Use this to determine if the scan got all of the reservation it wanted. Does not "
line too long (101 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-scan-node-base.cc@107
PS25, Line 107:                   "include subsequent reservation increases done by scanner implementation "
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-scan-node-base.cc@129
PS25, Line 129:                                                     "threads spent waiting for I/O. This value can be compared to the value of "
line too long (128 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-scan-node-base.cc@130
PS25, Line 130:                                                     "ScannerThreadsTotalWallClockTime of MT_DOP = 0 scan nodes or otherwise compared "
line too long (134 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-scan-node-base.cc@131
PS25, Line 131:                                                     "to the total time reported for MT_DOP > 0 scan nodes. High values show that "
line too long (130 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-scan-node-base.cc@132
PS25, Line 132:                                                     "scanner threads are spending significant time waiting for I/O instead of "
line too long (127 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-scan-node-base.cc@133
PS25, Line 133:                                                     "processing data. Note that this includes the time when the thread is runnable "
line too long (132 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-scan-node-base.cc@137
PS25, Line 137:                   "Each sample in the counter is the size of a single column that is scanned by the "
line too long (101 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-scan-node-base.cc@141
PS25, Line 141:                   "Each sample in the counter is the size of a single column that is scanned by the "
line too long (101 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-scan-node-base.cc@353
PS25, Line 353:   VLOG_QUERY<< hdfs_table_->partition_descriptors().size() << "   " << hdfs_table_->DebugString() << "    " << shared_state_.use_mt_scan_node_ << "    " << instance_ctx_pbs.size() << "  " << tnode_->hdfs_scan_node << GetStackTrace();
line too long (233 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-scan-node-base.cc@751
PS25, Line 751:                  metadata->partition_id, FilterStats::FILES_KEY, filter_ctxs, file, state)) {
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-scan-node-base.cc@1236
PS25, Line 1236:                                                                  "Read $0 of data across network that was expected to be local. Block locality "
line too long (144 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-scan-node-base.cc@1237
PS25, Line 1237:                                                                  "metadata for table '$1.$2' may be stale. This only affects query performance "
line too long (144 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-scan-node-base.cc@1238
PS25, Line 1238:                                                                  "and not result correctness. One of the common causes for this warning is HDFS "
line too long (145 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-scan-node-base.cc@1239
PS25, Line 1239:                                                                  "rebalancer moving some of the file's blocks. If the issue persists, consider "
line too long (144 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-scan-node-base.cc@1240
PS25, Line 1240:                                                                  "running \"INVALIDATE METADATA `$1`.`$2`\".",
line too long (110 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-scan-node-base.cc@1241
PS25, Line 1241:                                                                  PrettyPrinter::Print(unexpected_remote_bytes_->value(), TUnit::BYTES),
line too long (135 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/be/src/exec/hdfs-scan-node-base.cc@1242
PS25, Line 1242:                                                                  hdfs_table_->database(), hdfs_table_->name())));
line too long (113 > 90)
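
Most of the length warnings above are on continuation lines of string literals that were indented far to the right. As a minimal sketch (the function name and the shortened message text are illustrative, not taken from the patch), C++ concatenates adjacent string literals at compile time, so a long message can be re-wrapped at a shallower indent without changing its value:

```cpp
#include <string>

// Hypothetical helper, not part of the change under review.
// Adjacent string literals are joined by the compiler, so the message
// can be split across short source lines and still form one string.
std::string RemoteReadWarning() {
  return "Read data across network that was expected to be local. "
         "Block locality metadata may be stale.";
}
```

Each source line here stays well under the 90-character limit quoted in the warnings, and the returned string is the same as if it had been written on one line.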


http://gerrit.cloudera.org:8080/#/c/17771/25/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/17771/25/bin/bootstrap_toolchain.py@489
PS25, Line 489: "
flake8: E501 line too long (98 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/17771/25/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/17771/25/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@458
PS25, Line 458:     
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/25/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1405
PS25, Line 1405:                   0, fileDesc.getFileLength(), partition.getId(), fileDesc.getFileLength(),
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1406
PS25, Line 1406:                   fileDesc.getFileCompression().toThrift(), fileDesc.getModificationTime(),
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1410
PS25, Line 1410:                   currentOffset, currentLength, partition.getId(), fileDesc.getFileLength(),
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1411
PS25, Line 1411:                   fileDesc.getFileCompression().toThrift(), fileDesc.getModificationTime(),
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/25/tests/query_test/test_tpch_queries.py
File tests/query_test/test_tpch_queries.py:

http://gerrit.cloudera.org:8080/#/c/17771/25/tests/query_test/test_tpch_queries.py@39
PS25, Line 39: s
flake8: E501 line too long (96 > 90 characters)



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 25
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sun, 26 Mar 2023 16:28:22 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 25:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/12694/ : Initial code review checks failed. See linked job for details on the failure.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 25
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sun, 26 Mar 2023 16:37:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 1:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/9295/ : Initial code review checks failed. See linked job for details on the failure.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 1
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sat, 14 Aug 2021 00:19:19 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
shikha.asrani10@gmail.com has removed Impala Public Jenkins from this change.  ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Removed reviewer Impala Public Jenkins.
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 1
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9705/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 02 Nov 2021 03:43:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Aman Sinha, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17771

to look at the new patch set (#7).

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................

WiP: IMPALA-10798 : Prototype for JSON reader

This prototype allows a user to create a table stored as jsonfile and
query it.
Steps to test:
- create a json table with a schema specified using eligible datatypes
(int8/16/32/64/float/double/string/varchar/char/timestamp/boolean)
- add your json file (with eligible datatypes and the same column names as
 the schema specified in the create command) to the hdfs location
- add this 'location' to your table
- run a select statement

Fix:
- arrow library is included wherever required
- json format is added to scan node base class.
- json scanner files are added, that implement methods to read the
 json file from the specified file location

Change-Id: If79364a421d862d0d837f9be694911e388d4d629
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/exec/CMakeLists.txt
A be/src/exec/hdfs-json-scanner.cc
A be/src/exec/hdfs-json-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindArrow.cmake
9 files changed, 612 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/17771/7
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
shikha.asrani10@gmail.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 10:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/CMakeLists.txt
File be/src/exec/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/CMakeLists.txt@71
PS9, Line 71: 
> nit: redundant whitespace
Done


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.h
File be/src/exec/hdfs-json-scanner.h:

http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.h@38
PS9, Line 38: #include <arrow/memory_pool.h>
            : #include <arrow/result.h>
            : #include <arrow/type_fwd.h>
            : #include <arrow/util/macros.h>
            : #include <arrow/util/string_view.h>
            : #include <arrow/util/type_fwd.h>
            : #include "exec/hdfs-scan-node.h"
> nit: use <> for external includes
Done


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.h@132
PS9, Line 132: row_read
> nit: rows_read_
Done


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.h@133
PS9, Line 133: num_rows
> nit: num_rows_
Done


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.h@136
PS9, Line 136:   std::shared_ptr<arrow::json::TableReader> reader_ = nullptr;
> nit: could you move this above to be together with the methods?
Done


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.cc
File be/src/exec/hdfs-json-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.cc@19
PS9, Line 19: #include "common/names.h"
            : #include "runtime/collection-value-builder.h"
            : #include "runtime/datetime-simple-date-format-parser.h"
            : #include "runtime/io/request-context.h"
            : #include "runtime/mem-tracker.h"
            : #include "runtime/row-batch.h"
            : #include "runtime/runtime-filter.inline.h"
            : #include "runtime/timestamp-value.h"
            : #include "runtime/timestamp-value.inline.h"
            : #include "runtime/tuple-row.h"
            : #include "util/decompress.h"
            : 
            : using namespace impala;
            : using namespace impala::io;
            : 
            : Status HdfsJsonScanner::IssueIni
> nit: could you please remove headers that are included in hdfs-json-scanner
Done


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.cc@296
PS9, Line 296:     *(reinterpret_cast<bool*>
> nit: for (const auto& column : table_->columns())
Done
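
A sketch of the range-based loop suggested in the nit above. `FakeTable` and `TotalNameLength` are hypothetical stand-ins introduced only for illustration; the actual scanner iterates the columns of an Arrow table, which is not reproduced here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for the scanner's table object (an assumption, not the real type).
struct FakeTable {
  std::vector<std::string> column_names;
  const std::vector<std::string>& columns() const { return column_names; }
};

// Sums the lengths of all column names. Binding each element as
// `const auto&` avoids copying it and removes manual index handling.
std::size_t TotalNameLength(const FakeTable& table) {
  std::size_t total = 0;
  for (const auto& column : table.columns()) {
    total += column.size();
  }
  return total;
}
```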


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.cc@305
PS9, Line 305:         memcpy(blob_, blob->data(), blob->size());
             :         char* src_ptr;
> nit: could you rename these variables? I'm confused in what "ar" means.. BT
Done


http://gerrit.cloudera.org:8080/#/c/17771/9/cmake_modules/FindArrow.cmake
File cmake_modules/FindArrow.cmake:

http://gerrit.cloudera.org:8080/#/c/17771/9/cmake_modules/FindArrow.cmake@18
PS9, Line 18: # - Find Arrow (headers and libarrow.a) with ARROW_ROOT hinting a location
            : # This module defines
            : #  ARROW_INCLUDE_DIR, directory containing headers
            : #  ARROW_STATIC_LIB, path to libarrow.a
            : #  ARROW_FOU
> nit: please update these comments
Done



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 01 Dec 2021 01:30:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 19:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/11738/ : Initial code review checks failed. See linked job for details on the failure.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 19
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sun, 30 Oct 2022 00:05:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 21:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11826/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 21
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 10 Nov 2022 06:35:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 5:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/17771/5/be/src/exec/hdfs-json-scanner.h
File be/src/exec/hdfs-json-scanner.h:

http://gerrit.cloudera.org:8080/#/c/17771/5/be/src/exec/hdfs-json-scanner.h@70
PS5, Line 70:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/5/be/src/exec/hdfs-json-scanner.h@91
PS5, Line 91:  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/5/be/src/exec/hdfs-json-scanner.h@153
PS5, Line 153:   uint8_t* tuple_mem_end_ = nullptr;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/5/be/src/exec/hdfs-json-scanner.cc
File be/src/exec/hdfs-json-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17771/5/be/src/exec/hdfs-json-scanner.cc@55
PS5, Line 55:      RETURN_IF_ERROR(scan_node->AddDiskIoRanges(file, EnqueueLocation::TAIL)); 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/5/be/src/exec/hdfs-json-scanner.cc@98
PS5, Line 98:        << GetStackTrace();  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/5/be/src/exec/hdfs-json-scanner.cc@145
PS5, Line 145: arrow::Result<std::shared_ptr<arrow::Buffer>> 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/5/be/src/exec/hdfs-json-scanner.cc@203
PS5, Line 203:   ARROW_ASSIGN_OR_RAISE(reader_, 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/5/be/src/exec/hdfs-json-scanner.cc@219
PS5, Line 219:   vector<std::shared_ptr<arrow::Field>> fields_list = {}; 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17771/5/be/src/exec/hdfs-json-scanner.cc@222
PS5, Line 222:     std::shared_ptr<arrow::Field> field_a = arrow::field(cvf.name(), convert_type(cvf.type()));
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/5/be/src/exec/hdfs-json-scanner.cc@271
PS5, Line 271:   // reading rows from the previous point we left, tilll either end of table/capacity of rowbatch
line too long (97 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/5/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/17771/5/bin/bootstrap_toolchain.py@443
PS5, Line 443: )
flake8: E501 line too long (91 > 90 characters)
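
The two kinds of findings in this report (a line longer than 90 characters, and trailing whitespace) can be approximated locally before uploading a patch set. The following is only a sketch: the helper names are made up for illustration, lengths are counted in bytes, and this is not the script that the Jenkins job runs.

```cpp
#include <cstddef>
#include <string>

// Limit quoted in the review comments ("line too long (N > 90)").
constexpr std::size_t kMaxLineLength = 90;

// True if the line exceeds the limit (counted in bytes, not code points).
bool IsTooLong(const std::string& line) {
  return line.size() > kMaxLineLength;
}

// True if the line ends in a space or a tab.
bool HasTrailingWhitespace(const std::string& line) {
  return !line.empty() && (line.back() == ' ' || line.back() == '\t');
}
```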



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sat, 14 Aug 2021 02:33:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9298/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 4
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sat, 14 Aug 2021 02:10:47 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 24:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/12654/ : Initial code review checks failed. See linked job for details on the failure.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 24
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 21 Mar 2023 09:13:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Aman Sinha, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17771

to look at the new patch set (#10).

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................

WiP: IMPALA-10798 : Prototype for JSON reader

This prototype allows a user to create a table stored as jsonfile and
query it.
Steps to test:
- create a json table with a schema specified using eligible datatypes
(int8/16/32/64/float/double/string/varchar/char/timestamp/boolean)
- add your json file (with eligible datatypes and the same column names as
 the schema specified in the create command) to the hdfs location
- add this 'location' to your table
- run a select statement

Fix:
- arrow library is included wherever required
- json format is added to scan node base class.
- json scanner files are added, that implement methods to read the
 json file from the specified file location

Change-Id: If79364a421d862d0d837f9be694911e388d4d629
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/exec/CMakeLists.txt
A be/src/exec/hdfs-json-scanner.cc
A be/src/exec/hdfs-json-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindArrow.cmake
9 files changed, 580 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/17771/10
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 10:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/9852/ : Initial code review checks failed. See linked job for details on the failure.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 30 Nov 2021 07:44:48 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 12:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/9948/ : Initial code review checks failed. See linked job for details on the failure.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 12
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 20 Dec 2021 05:36:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Aman Sinha, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17771

to look at the new patch set (#13).

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................

WiP: IMPALA-10798 : Prototype for JSON reader

This prototype allows a user to create a table stored as jsonfile and
query it.
Steps to test:
- create a json table with a schema specified using eligible datatypes
(int8/16/32/64/float/double/string/varchar/char/timestamp/boolean)
- add your json file (with eligible datatypes and the same column names as
 the schema specified in the create command) to the hdfs location
- add this 'location' to your table
- run a select statement
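
As a rough aid for the "add your json file" step above, the following Python sketch writes a small sample file. It assumes the prototype reads one JSON object per line, which is the layout Arrow's JSON reader expects; the commit message does not state the file layout, so treat this as an assumption. The column names (id, price, name, in_stock) and the file name sample.json are hypothetical and would need to match the schema in your own create command.

```python
import json
import os
import tempfile

# Hypothetical rows; the keys stand in for the column names of the table.
ROWS = [
    {"id": 1, "price": 9.5, "name": "apple", "in_stock": True},
    {"id": 2, "price": 0.25, "name": "pear", "in_stock": False},
]


def write_json_lines(rows, path):
    """Write one JSON object per line, requiring identical keys in every row."""
    expected_keys = set(rows[0])
    with open(path, "w") as out:
        for row in rows:
            if set(row) != expected_keys:
                raise ValueError("row keys differ from the first row: %r" % (row,))
            out.write(json.dumps(row) + "\n")
    return path


# Write the sample into a fresh temporary directory (assumed local staging area).
sample_path = write_json_lines(ROWS, os.path.join(tempfile.mkdtemp(), "sample.json"))
```

The resulting file would then be copied to the hdfs location used by the table; that copy step is not shown here.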

Fix:
- arrow library is included wherever required
- json format is added to scan node base class.
- json scanner files are added that implement methods to read the
 json file from the specified file location

Change-Id: If79364a421d862d0d837f9be694911e388d4d629
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/exec/CMakeLists.txt
A be/src/exec/hdfs-json-scanner.cc
A be/src/exec/hdfs-json-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindArrow.cmake
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
10 files changed, 587 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/17771/13
-- 
To view, visit http://gerrit.cloudera.org:8080/17771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17771/6/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/17771/6/bin/bootstrap_toolchain.py@469
PS6, Line 469: )
flake8: E501 line too long (91 > 90 characters)



-- 
To view, visit http://gerrit.cloudera.org:8080/17771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 02 Nov 2021 02:26:44 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Aman Sinha, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17771

to look at the new patch set (#8).

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................

WiP: IMPALA-10798 : Prototype for JSON reader

This prototype allows a user to create a table stored as jsonfile and
query it.
Steps to test:
- create a json table with a schema specified using eligible datatypes
(int8/16/32/64/float/double/string/varchar/char/timestamp/boolean)
- add your json file (with eligible datatypes and the same column names as
 the schema specified in the create command) to an hdfs location
- add this 'location' to your table
- run a select statement

Fix:
- arrow library is included wherever required
- json format is added to scan node base class.
- json scanner files are added that implement methods to read the
 json file from the specified file location

Change-Id: If79364a421d862d0d837f9be694911e388d4d629
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/exec/CMakeLists.txt
A be/src/exec/hdfs-json-scanner.cc
A be/src/exec/hdfs-json-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindArrow.cmake
9 files changed, 620 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/17771/8
-- 
To view, visit http://gerrit.cloudera.org:8080/17771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 8
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17771/8/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/17771/8/bin/bootstrap_toolchain.py@469
PS8, Line 469: )
flake8: E501 line too long (91 > 90 characters)



-- 
To view, visit http://gerrit.cloudera.org:8080/17771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 8
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 10 Nov 2021 19:02:21 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 11:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/9913/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/17771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 11
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 13 Dec 2021 03:08:40 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 19:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17771/19/be/src/exec/hdfs-json-scanner.cc
File be/src/exec/hdfs-json-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17771/19/be/src/exec/hdfs-json-scanner.cc@171
PS19, Line 171:     // VLOG_QUERY << "decimal128" << arrow::decimal128(ct.precision, ct.scale)->ToString();
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17771/19/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/17771/19/bin/bootstrap_toolchain.py@489
PS19, Line 489: "
flake8: E501 line too long (98 > 90 characters)



-- 
To view, visit http://gerrit.cloudera.org:8080/17771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 19
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sun, 30 Oct 2022 00:00:44 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 22:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11827/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 10 Nov 2022 06:38:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has uploaded a new patch set (#21) to the change originally created by shikha.asrani10@gmail.com. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................

WiP: IMPALA-10798 : Prototype for JSON reader

This prototype allows a user to create a table stored as jsonfile and
query it.
Steps to test:
- create a json table with a schema specified using eligible datatypes
(int8/16/32/64/float/double/string/varchar/char/timestamp/boolean)
- add your json file (with eligible datatypes and the same column names as
 the schema specified in the create command) to an hdfs location
- add this 'location' to your table
- run a select statement

Fix:
- arrow library is included wherever required
- json format is added to scan node base class.
- json scanner files are added that implement methods to read the
 json file from the specified file location

Change-Id: If79364a421d862d0d837f9be694911e388d4d629
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/exec/CMakeLists.txt
A be/src/exec/hdfs-json-scanner.cc
A be/src/exec/hdfs-json-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindArrow.cmake
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/bin/generate-schema-statements.py
M testdata/workloads/functional-query/functional-query_core.csv
M testdata/workloads/functional-query/functional-query_dimensions.csv
M testdata/workloads/functional-query/functional-query_exhaustive.csv
M testdata/workloads/functional-query/functional-query_pairwise.csv
M testdata/workloads/tpcds/tpcds_core.csv
M testdata/workloads/tpcds/tpcds_exhaustive.csv
M testdata/workloads/tpcds/tpcds_pairwise.csv
M testdata/workloads/tpch/tpch_core.csv
M testdata/workloads/tpch/tpch_dimensions.csv
M testdata/workloads/tpch/tpch_exhaustive.csv
M testdata/workloads/tpch/tpch_pairwise.csv
M tests/common/test_dimensions.py
M tests/query_test/test_decimal_queries.py
M tests/query_test/test_tpch_queries.py
26 files changed, 769 insertions(+), 141 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/17771/21
-- 
To view, visit http://gerrit.cloudera.org:8080/17771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 21
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 22: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8794/


-- 
To view, visit http://gerrit.cloudera.org:8080/17771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 10 Nov 2022 10:33:58 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 20:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17771/20/tests/metadata/test_hms_integration.py
File tests/metadata/test_hms_integration.py:

http://gerrit.cloudera.org:8080/#/c/17771/20/tests/metadata/test_hms_integration.py@123
PS20, Line 123:   def test_json_file_unsupported(self, unique_database):
We need to remove this test



-- 
To view, visit http://gerrit.cloudera.org:8080/17771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 20
Gerrit-Owner: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <sh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 08 Nov 2022 07:31:18 +0000
Gerrit-HasComments: Yes