Posted to dev@hive.apache.org by Kevin Wilfong <ke...@fb.com> on 2011/07/15 04:16:34 UTC

Review Request: Local mode needs to work well with block sampling

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1132/
-----------------------------------------------------------

Review request for hive and Siying Dong.


Summary
-------

A query should run in local mode when block sampling is used and the sample is small enough.  The size of the sample is currently estimated in the same way as when estimating the number of reducers.
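
As a rough illustration of the intent (not the code in the patch), the decision could be sketched in Java as below. The class name, the method names, and the byte threshold are hypothetical, and the sample size is assumed to be approximated as a percentage of the total input size.

```java
// Hypothetical sketch: names and threshold are illustrative, not Hive's actual API.
final class LocalModeSketch {

    /** Approximates the bytes read when sampling the given percentage of the input. */
    static long estimateSampledBytes(long totalInputBytes, double samplePercent) {
        // The negated range check also rejects NaN.
        if (totalInputBytes < 0 || !(samplePercent >= 0.0 && samplePercent <= 100.0)) {
            throw new IllegalArgumentException("invalid input size or sample percentage");
        }
        return (long) Math.ceil(totalInputBytes * samplePercent / 100.0);
    }

    /** Local mode is chosen only when the (estimated) input fits under the limit. */
    static boolean shouldRunLocally(long estimatedInputBytes, long maxLocalInputBytes) {
        return estimatedInputBytes <= maxLocalInputBytes;
    }

    public static void main(String[] args) {
        long totalInputBytes = 10L * 1024 * 1024 * 1024; // assumed table size: 10 GiB
        long maxLocalInputBytes = 128L * 1024 * 1024;    // assumed limit: 128 MiB
        long sampledBytes = estimateSampledBytes(totalInputBytes, 1.0);
        System.out.println("estimated sample bytes: " + sampledBytes);
        System.out.println("run locally: " + shouldRunLocally(sampledBytes, maxLocalInputBytes));
    }
}
```

The point of the sketch is that the comparison uses the estimated size of the sample rather than the full table size, so a small sample of a large table can still qualify for local mode.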


This addresses bug HIVE-2282.
    https://issues.apache.org/jira/browse/HIVE-2282


Diffs
-----

  ql/src/test/queries/clientpositive/sample_islocalmode_hook.q PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapRedTask.java 53769a0 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java cd3de76 
  ql/src/test/org/apache/hadoop/hive/ql/hooks/VerifyIsLocalModeHook.java PRE-CREATION 

Diff: https://reviews.apache.org/r/1132/diff


Testing
-------

TestCliDriver, TestNegativeCliDriver, manually tested


Thanks,

Kevin


Re: Review Request: Local mode needs to work well with block sampling

Posted by Siying Dong <si...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1132/#review1081
-----------------------------------------------------------



ql/src/test/org/apache/hadoop/hive/ql/hooks/VerifyIsLocalModeHook.java
<https://reviews.apache.org/r/1132/#comment2210>

    We need a header for licensing.
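
    For reference, the standard ASF source header used at the top of Java files in Apache projects is shown below. The class underneath it is only a hypothetical placeholder so the snippet compiles; it is not the hook from this patch.

```java
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

// Hypothetical placeholder class; the real file would declare the hook here.
final class LicensedPlaceholder {
    static String describe() {
        return "placeholder";
    }
}
```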


- Siying


On 2011-07-15 02:16:34, Kevin Wilfong wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/1132/
> -----------------------------------------------------------
> 
> (Updated 2011-07-15 02:16:34)
> 
> 
> Review request for hive and Siying Dong.
> 
> 
> Summary
> -------
> 
> A query should run in local mode when block sampling is used and the sample is small enough.  The size of the sample is currently estimated in the same way as when estimating the number of reducers.
> 
> 
> This addresses bug HIVE-2282.
>     https://issues.apache.org/jira/browse/HIVE-2282
> 
> 
> Diffs
> -----
> 
>   ql/src/test/queries/clientpositive/sample_islocalmode_hook.q PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MapRedTask.java 53769a0 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java cd3de76 
>   ql/src/test/org/apache/hadoop/hive/ql/hooks/VerifyIsLocalModeHook.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/1132/diff
> 
> 
> Testing
> -------
> 
> TestCliDriver, TestNegativeCliDriver, manually tested
> 
> 
> Thanks,
> 
> Kevin
> 
>


Re: Review Request: Local mode needs to work well with block sampling

Posted by Siying Dong <si...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1132/#review1084
-----------------------------------------------------------


I mean you can just change the function name to something like estimateInputSize().

- Siying


On 2011-07-15 20:48:38, Kevin Wilfong wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/1132/
> -----------------------------------------------------------
> 
> (Updated 2011-07-15 20:48:38)
> 
> 
> Review request for hive and Siying Dong.
> 
> 
> Summary
> -------
> 
> A query should run in local mode when block sampling is used and the sample is small enough.  The size of the sample is currently estimated in the same way as when estimating the number of reducers.
> 
> 
> This addresses bug HIVE-2282.
>     https://issues.apache.org/jira/browse/HIVE-2282
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java cd3de76 
>   ql/src/test/org/apache/hadoop/hive/ql/hooks/VerifyIsLocalModeHook.java PRE-CREATION 
>   ql/src/test/queries/clientpositive/sample_islocalmode_hook.q PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MapRedTask.java 53769a0 
> 
> Diff: https://reviews.apache.org/r/1132/diff
> 
> 
> Testing
> -------
> 
> TestCliDriver, TestNegativeCliDriver, manually tested
> 
> 
> Thanks,
> 
> Kevin
> 
>


Re: Review Request: Local mode needs to work well with block sampling

Posted by Kevin Wilfong <ke...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1132/
-----------------------------------------------------------

(Updated 2011-07-22 17:40:44.736466)


Review request for hive and Siying Dong.


Changes
-------

I added the q.out file for the new q file, which I had forgotten.

I also modified the test queries to select count(1) instead of selecting keys and values.


Summary
-------

A query should run in local mode when block sampling is used and the sample is small enough.  The size of the sample is currently estimated in the same way as when estimating the number of reducers.


This addresses bug HIVE-2282.
    https://issues.apache.org/jira/browse/HIVE-2282


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/MapRedTask.java 53769a0 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java cd3de76 
  ql/src/test/org/apache/hadoop/hive/ql/hooks/VerifyIsLocalModeHook.java PRE-CREATION 
  ql/src/test/queries/clientpositive/sample_islocalmode_hook.q PRE-CREATION 
  ql/src/test/results/clientpositive/sample_islocalmode_hook.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/1132/diff


Testing
-------

TestCliDriver, TestNegativeCliDriver, manually tested


Thanks,

Kevin


Re: Review Request: Local mode needs to work well with block sampling

Posted by Kevin Wilfong <ke...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1132/
-----------------------------------------------------------

(Updated 2011-07-15 21:45:16.168124)


Review request for hive and Siying Dong.


Changes
-------

That's a good point; sorry, I misunderstood it originally.

Renamed estimateSampledInputSize to estimateInputSize.


Summary
-------

A query should run in local mode when block sampling is used and the sample is small enough.  The size of the sample is currently estimated in the same way as when estimating the number of reducers.


This addresses bug HIVE-2282.
    https://issues.apache.org/jira/browse/HIVE-2282


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/MapRedTask.java 53769a0 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java cd3de76 
  ql/src/test/org/apache/hadoop/hive/ql/hooks/VerifyIsLocalModeHook.java PRE-CREATION 
  ql/src/test/queries/clientpositive/sample_islocalmode_hook.q PRE-CREATION 

Diff: https://reviews.apache.org/r/1132/diff


Testing
-------

TestCliDriver, TestNegativeCliDriver, manually tested


Thanks,

Kevin


Re: Review Request: Local mode needs to work well with block sampling

Posted by Kevin Wilfong <ke...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1132/
-----------------------------------------------------------

(Updated 2011-07-15 20:48:38.625544)


Review request for hive and Siying Dong.


Changes
-------

I added comments to the estimateSampledInputSize function.  The function does set the input size even when there is no sampling, but this means we do not need two separate cases everywhere we might use either an estimated or an actual input size.  Instead, we can just call the function (which only does significant work the first time it runs, thanks to a boolean flag) and the input size will be set to the appropriate value.  It only estimates the input size if sampling is used.
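
A minimal Java sketch of the pattern described above, for illustration only. This is not the MapRedTask code: the class, its fields, and the percentage-based estimate are assumptions, and only the behaviour (compute once behind a boolean flag, estimate only when sampling is used) follows the description.

```java
// Hypothetical sketch of a compute-once input size; not the actual MapRedTask code.
final class InputSizeSketch {
    private final long totalInputBytes;  // actual size of all inputs
    private final boolean sampled;       // true when block sampling is used
    private final double samplePercent;  // only meaningful when sampled is true

    private boolean inputSizeEstimated = false; // the boolean flag guarding the work
    private long inputSize;
    private int computations = 0;               // exposed for the demo only

    InputSizeSketch(long totalInputBytes, boolean sampled, double samplePercent) {
        this.totalInputBytes = totalInputBytes;
        this.sampled = sampled;
        this.samplePercent = samplePercent;
    }

    /** Sets the input size on the first call; later calls return the cached value. */
    long estimateInputSize() {
        if (inputSizeEstimated) {
            return inputSize;
        }
        computations++;
        if (sampled) {
            // Estimate: assume the sample reads roughly this share of the input.
            inputSize = (long) Math.ceil(totalInputBytes * samplePercent / 100.0);
        } else {
            // No sampling: the actual input size is used unchanged.
            inputSize = totalInputBytes;
        }
        inputSizeEstimated = true;
        return inputSize;
    }

    int computations() {
        return computations;
    }

    public static void main(String[] args) {
        InputSizeSketch withSampling = new InputSizeSketch(1000L, true, 25.0);
        InputSizeSketch withoutSampling = new InputSizeSketch(1000L, false, 0.0);
        System.out.println("sampled input size: " + withSampling.estimateInputSize());
        System.out.println("unsampled input size: " + withoutSampling.estimateInputSize());
    }
}
```

Callers never need to check whether sampling is in effect; they call estimateInputSize() and use the result, which is the single code path the rename to a sampling-neutral name is meant to reflect.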

I also added the header to VerifyIsLocalModeHook.java.


Summary
-------

A query should run in local mode when block sampling is used and the sample is small enough.  The size of the sample is currently estimated in the same way as when estimating the number of reducers.


This addresses bug HIVE-2282.
    https://issues.apache.org/jira/browse/HIVE-2282


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java cd3de76 
  ql/src/test/org/apache/hadoop/hive/ql/hooks/VerifyIsLocalModeHook.java PRE-CREATION 
  ql/src/test/queries/clientpositive/sample_islocalmode_hook.q PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapRedTask.java 53769a0 

Diff: https://reviews.apache.org/r/1132/diff


Testing
-------

TestCliDriver, TestNegativeCliDriver, manually tested


Thanks,

Kevin


Re: Review Request: Local mode needs to work well with block sampling

Posted by Siying Dong <si...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1132/#review1080
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/exec/MapRedTask.java
<https://reviews.apache.org/r/1132/#comment2209>

    This function name seems to be confusing. Looks like the input size is set even if there is no sampling, right? Also, can you add comments to this function?
    
    Other than that, the patch looks OK.


- Siying


On 2011-07-15 02:16:34, Kevin Wilfong wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/1132/
> -----------------------------------------------------------
> 
> (Updated 2011-07-15 02:16:34)
> 
> 
> Review request for hive and Siying Dong.
> 
> 
> Summary
> -------
> 
> A query should run in local mode when block sampling is used and the sample is small enough.  The size of the sample is currently estimated in the same way as when estimating the number of reducers.
> 
> 
> This addresses bug HIVE-2282.
>     https://issues.apache.org/jira/browse/HIVE-2282
> 
> 
> Diffs
> -----
> 
>   ql/src/test/queries/clientpositive/sample_islocalmode_hook.q PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MapRedTask.java 53769a0 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java cd3de76 
>   ql/src/test/org/apache/hadoop/hive/ql/hooks/VerifyIsLocalModeHook.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/1132/diff
> 
> 
> Testing
> -------
> 
> TestCliDriver, TestNegativeCliDriver, manually tested
> 
> 
> Thanks,
> 
> Kevin
> 
>