You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Hanifi Gunes <ha...@gmail.com> on 2015/03/19 00:34:24 UTC

Review Request 32223: DRILL-2193: implement fast count / skip-all semantics for JSON reader

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32223/
-----------------------------------------------------------

Review request for drill, Aman Sinha and Parth Chandra.


Bugs: DRILL-2193
    https://issues.apache.org/jira/browse/DRILL-2193


Repository: drill-git


Description
-------

DRILL-2193: implement fast count / skip-all semantics for JSON reader

This patch introduces an abstraction for JSON processing and implements a efficient counting JSON reader if query is in skip-all mode(see DRILL-2358).


Diffs
-----

  exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/JSONRecordReader.java c343177a719b5f36f51bcb2f84d68518ba1ae02f 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/JsonProcessor.java PRE-CREATION 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/reader/BaseJsonProcessor.java PRE-CREATION 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/reader/CountingJsonReader.java PRE-CREATION 
  exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java cc5c8af63c6383eb8d2e28a409a3c055bf5cc737 

Diff: https://reviews.apache.org/r/32223/diff/


Testing
-------

unit + regression


Thanks,

Hanifi Gunes


Re: Review Request 32223: DRILL-2193: implement fast count / skip-all semantics for JSON reader

Posted by Aman Sinha <as...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32223/#review77092
-----------------------------------------------------------

Ship it!


Ship It!

- Aman Sinha


On March 19, 2015, 5:49 p.m., Hanifi Gunes wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32223/
> -----------------------------------------------------------
> 
> (Updated March 19, 2015, 5:49 p.m.)
> 
> 
> Review request for drill, Aman Sinha and Parth Chandra.
> 
> 
> Bugs: DRILL-2193
>     https://issues.apache.org/jira/browse/DRILL-2193
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> DRILL-2193: implement fast count / skip-all semantics for JSON reader
> 
> This patch introduces an abstraction for JSON processing and implements a efficient counting JSON reader if query is in skip-all mode(see DRILL-2358).
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/JSONRecordReader.java c343177a719b5f36f51bcb2f84d68518ba1ae02f 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/JsonProcessor.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/reader/BaseJsonProcessor.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/reader/CountingJsonReader.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java cc5c8af63c6383eb8d2e28a409a3c055bf5cc737 
> 
> Diff: https://reviews.apache.org/r/32223/diff/
> 
> 
> Testing
> -------
> 
> unit + regression
> 
> 
> Thanks,
> 
> Hanifi Gunes
> 
>


Re: Review Request 32223: DRILL-2193: implement fast count / skip-all semantics for JSON reader

Posted by Hanifi Gunes <ha...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32223/
-----------------------------------------------------------

(Updated March 19, 2015, 5:49 p.m.)


Review request for drill, Aman Sinha and Parth Chandra.


Changes
-------

Addressed Aman's comments.


Bugs: DRILL-2193
    https://issues.apache.org/jira/browse/DRILL-2193


Repository: drill-git


Description
-------

DRILL-2193: implement fast count / skip-all semantics for JSON reader

This patch introduces an abstraction for JSON processing and implements a efficient counting JSON reader if query is in skip-all mode(see DRILL-2358).


Diffs (updated)
-----

  exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/JSONRecordReader.java c343177a719b5f36f51bcb2f84d68518ba1ae02f 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/JsonProcessor.java PRE-CREATION 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/reader/BaseJsonProcessor.java PRE-CREATION 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/reader/CountingJsonReader.java PRE-CREATION 
  exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java cc5c8af63c6383eb8d2e28a409a3c055bf5cc737 

Diff: https://reviews.apache.org/r/32223/diff/


Testing
-------

unit + regression


Thanks,

Hanifi Gunes


Re: Review Request 32223: DRILL-2193: implement fast count / skip-all semantics for JSON reader

Posted by Parth Chandra <pc...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32223/#review76982
-----------------------------------------------------------

Ship it!


Ship It!

- Parth Chandra


On March 18, 2015, 11:34 p.m., Hanifi Gunes wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32223/
> -----------------------------------------------------------
> 
> (Updated March 18, 2015, 11:34 p.m.)
> 
> 
> Review request for drill, Aman Sinha and Parth Chandra.
> 
> 
> Bugs: DRILL-2193
>     https://issues.apache.org/jira/browse/DRILL-2193
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> DRILL-2193: implement fast count / skip-all semantics for JSON reader
> 
> This patch introduces an abstraction for JSON processing and implements a efficient counting JSON reader if query is in skip-all mode(see DRILL-2358).
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/JSONRecordReader.java c343177a719b5f36f51bcb2f84d68518ba1ae02f 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/JsonProcessor.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/reader/BaseJsonProcessor.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/reader/CountingJsonReader.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java cc5c8af63c6383eb8d2e28a409a3c055bf5cc737 
> 
> Diff: https://reviews.apache.org/r/32223/diff/
> 
> 
> Testing
> -------
> 
> unit + regression
> 
> 
> Thanks,
> 
> Hanifi Gunes
> 
>


Re: Review Request 32223: DRILL-2193: implement fast count / skip-all semantics for JSON reader

Posted by Hanifi Gunes <ha...@gmail.com>.

> On March 19, 2015, 7:40 a.m., Aman Sinha wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/reader/CountingJsonReader.java, line 36
> > <https://reviews.apache.org/r/32223/diff/1/?file=899462#file899462line36>
> >
> >     The default JsonReader (which is used when skip-all is false) has a initial while loop to iterate over the tokens; is that not needed here because you are expecting to be either at end-of-stream or at the beginning of a record ? I am wondering what happens where a single large record (with either many fields or a large string field) spans across batch boundary. (I am actually not completely sure if that is allowed, so let me know if that situation is not going to occur).
> 
> Hanifi Gunes wrote:
>     i) I am not sure why this initial loop in the original reader is useful.
>     
>     ii) I think parser works on entire json stream across batch boundaries. Wide records used to be a problem before auto reallocation came in now we do re-alloc as needed. Besides since we are not particularly interested in fields and just counting, footprint should be small.
> 
> Aman Sinha wrote:
>     It would be good to confirm this by working with QA to have a test case but it is not a blocker for the patch.

I think we have coverage on wide records. We can ask folks to include a count(*) query on the same data.


- Hanifi


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32223/#review77027
-----------------------------------------------------------


On March 19, 2015, 5:49 p.m., Hanifi Gunes wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32223/
> -----------------------------------------------------------
> 
> (Updated March 19, 2015, 5:49 p.m.)
> 
> 
> Review request for drill, Aman Sinha and Parth Chandra.
> 
> 
> Bugs: DRILL-2193
>     https://issues.apache.org/jira/browse/DRILL-2193
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> DRILL-2193: implement fast count / skip-all semantics for JSON reader
> 
> This patch introduces an abstraction for JSON processing and implements a efficient counting JSON reader if query is in skip-all mode(see DRILL-2358).
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/JSONRecordReader.java c343177a719b5f36f51bcb2f84d68518ba1ae02f 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/JsonProcessor.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/reader/BaseJsonProcessor.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/reader/CountingJsonReader.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java cc5c8af63c6383eb8d2e28a409a3c055bf5cc737 
> 
> Diff: https://reviews.apache.org/r/32223/diff/
> 
> 
> Testing
> -------
> 
> unit + regression
> 
> 
> Thanks,
> 
> Hanifi Gunes
> 
>


Re: Review Request 32223: DRILL-2193: implement fast count / skip-all semantics for JSON reader

Posted by Aman Sinha <as...@maprtech.com>.

> On March 19, 2015, 7:40 a.m., Aman Sinha wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/reader/CountingJsonReader.java, line 36
> > <https://reviews.apache.org/r/32223/diff/1/?file=899462#file899462line36>
> >
> >     The default JsonReader (which is used when skip-all is false) has a initial while loop to iterate over the tokens; is that not needed here because you are expecting to be either at end-of-stream or at the beginning of a record ? I am wondering what happens where a single large record (with either many fields or a large string field) spans across batch boundary. (I am actually not completely sure if that is allowed, so let me know if that situation is not going to occur).
> 
> Hanifi Gunes wrote:
>     i) I am not sure why this initial loop in the original reader is useful.
>     
>     ii) I think parser works on entire json stream across batch boundaries. Wide records used to be a problem before auto reallocation came in now we do re-alloc as needed. Besides since we are not particularly interested in fields and just counting, footprint should be small.

It would be good to confirm this by working with QA to have a test case but it is not a blocker for the patch.


- Aman


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32223/#review77027
-----------------------------------------------------------


On March 19, 2015, 5:49 p.m., Hanifi Gunes wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32223/
> -----------------------------------------------------------
> 
> (Updated March 19, 2015, 5:49 p.m.)
> 
> 
> Review request for drill, Aman Sinha and Parth Chandra.
> 
> 
> Bugs: DRILL-2193
>     https://issues.apache.org/jira/browse/DRILL-2193
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> DRILL-2193: implement fast count / skip-all semantics for JSON reader
> 
> This patch introduces an abstraction for JSON processing and implements a efficient counting JSON reader if query is in skip-all mode(see DRILL-2358).
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/JSONRecordReader.java c343177a719b5f36f51bcb2f84d68518ba1ae02f 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/JsonProcessor.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/reader/BaseJsonProcessor.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/reader/CountingJsonReader.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java cc5c8af63c6383eb8d2e28a409a3c055bf5cc737 
> 
> Diff: https://reviews.apache.org/r/32223/diff/
> 
> 
> Testing
> -------
> 
> unit + regression
> 
> 
> Thanks,
> 
> Hanifi Gunes
> 
>


Re: Review Request 32223: DRILL-2193: implement fast count / skip-all semantics for JSON reader

Posted by Hanifi Gunes <ha...@gmail.com>.

> On March 19, 2015, 7:40 a.m., Aman Sinha wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/JsonProcessor.java, line 37
> > <https://reviews.apache.org/r/32223/diff/1/?file=899460#file899460line37>
> >
> >     Is it necessary to have this method in the interface ?  The name suggests that implementors should ensure at least 1 field/column but the counting reader does not actually do that.

I would imagine this be useful for an empty json input. I will implement this in counting reader.


> On March 19, 2015, 7:40 a.m., Aman Sinha wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/reader/CountingJsonReader.java, line 36
> > <https://reviews.apache.org/r/32223/diff/1/?file=899462#file899462line36>
> >
> >     The default JsonReader (which is used when skip-all is false) has a initial while loop to iterate over the tokens; is that not needed here because you are expecting to be either at end-of-stream or at the beginning of a record ? I am wondering what happens where a single large record (with either many fields or a large string field) spans across batch boundary. (I am actually not completely sure if that is allowed, so let me know if that situation is not going to occur).

i) I am not sure why this initial loop in the original reader is useful.

ii) I think parser works on entire json stream across batch boundaries. Wide records used to be a problem before auto reallocation came in now we do re-alloc as needed. Besides since we are not particularly interested in fields and just counting, footprint should be small.


- Hanifi


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32223/#review77027
-----------------------------------------------------------


On March 18, 2015, 11:34 p.m., Hanifi Gunes wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32223/
> -----------------------------------------------------------
> 
> (Updated March 18, 2015, 11:34 p.m.)
> 
> 
> Review request for drill, Aman Sinha and Parth Chandra.
> 
> 
> Bugs: DRILL-2193
>     https://issues.apache.org/jira/browse/DRILL-2193
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> DRILL-2193: implement fast count / skip-all semantics for JSON reader
> 
> This patch introduces an abstraction for JSON processing and implements a efficient counting JSON reader if query is in skip-all mode(see DRILL-2358).
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/JSONRecordReader.java c343177a719b5f36f51bcb2f84d68518ba1ae02f 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/JsonProcessor.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/reader/BaseJsonProcessor.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/reader/CountingJsonReader.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java cc5c8af63c6383eb8d2e28a409a3c055bf5cc737 
> 
> Diff: https://reviews.apache.org/r/32223/diff/
> 
> 
> Testing
> -------
> 
> unit + regression
> 
> 
> Thanks,
> 
> Hanifi Gunes
> 
>


Re: Review Request 32223: DRILL-2193: implement fast count / skip-all semantics for JSON reader

Posted by Aman Sinha <as...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32223/#review77027
-----------------------------------------------------------



exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/JsonProcessor.java
<https://reviews.apache.org/r/32223/#comment124816>

    Is it necessary to have this method in the interface ?  The name suggests that implementors should ensure at least 1 field/column but the counting reader does not actually do that.



exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/reader/CountingJsonReader.java
<https://reviews.apache.org/r/32223/#comment124819>

    The default JsonReader (which is used when skip-all is false) has a initial while loop to iterate over the tokens; is that not needed here because you are expecting to be either at end-of-stream or at the beginning of a record ? I am wondering what happens where a single large record (with either many fields or a large string field) spans across batch boundary. (I am actually not completely sure if that is allowed, so let me know if that situation is not going to occur).


- Aman Sinha


On March 18, 2015, 11:34 p.m., Hanifi Gunes wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32223/
> -----------------------------------------------------------
> 
> (Updated March 18, 2015, 11:34 p.m.)
> 
> 
> Review request for drill, Aman Sinha and Parth Chandra.
> 
> 
> Bugs: DRILL-2193
>     https://issues.apache.org/jira/browse/DRILL-2193
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> DRILL-2193: implement fast count / skip-all semantics for JSON reader
> 
> This patch introduces an abstraction for JSON processing and implements a efficient counting JSON reader if query is in skip-all mode(see DRILL-2358).
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/JSONRecordReader.java c343177a719b5f36f51bcb2f84d68518ba1ae02f 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/JsonProcessor.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/reader/BaseJsonProcessor.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/reader/CountingJsonReader.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java cc5c8af63c6383eb8d2e28a409a3c055bf5cc737 
> 
> Diff: https://reviews.apache.org/r/32223/diff/
> 
> 
> Testing
> -------
> 
> unit + regression
> 
> 
> Thanks,
> 
> Hanifi Gunes
> 
>