Posted to dev@hive.apache.org by Zoltan Haindrich <ki...@rxd.hu> on 2018/04/03 16:04:14 UTC

Review Request 66402: HIVE-19009

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66402/
-----------------------------------------------------------

Review request for hive and Ashutosh Chauhan.


Repository: hive-git


Description
-------

retain runtime stats at session level; some fixes to support runtime stats during vectorized execution; minor fixes and added some logging


Diffs
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java dd94d4db87514de9050ea5669f73027ff3dd6937 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java 58fa5f2287e1221f300e16a78aa76dc6fc23cf47 
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java 75f928b69d3d7b206564216d24be450848a1fe8a 
  ql/src/java/org/apache/hadoop/hive/ql/DriverFactory.java 0f6a80ef0d7b768371a932e5bda75348955d0369 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 13a2fc478fda244eca8632d12f2d8f46ae280a63 
  ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/EmptyStatsSource.java 57762edbdfbf4f77b051f42faf313db882878969 
  ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/SessionStatsSource.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/StatsSource.java a4cb6e977190c4f5fa275619a8e97755ad1f55d8 
  ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/StatsSources.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/reexec/IReExecutionPlugin.java 2b0d23c6f237915684d59856cb62001c01c8209c 
  ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java 93031712dc6fb60a6d618c8754e50def489a12af 
  ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecutionOverlayPlugin.java 4ee3c14b3988521b3f44d02f032163734ed9e4d9 
  ql/src/java/org/apache/hadoop/hive/ql/reexec/ReOptimizePlugin.java 707858700f4d06301fb873fb47f8d9d9b101d056 
  ql/src/java/org/apache/hadoop/hive/ql/reexec/SessionStatsPlugin.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 0071a9a4ebadd73103a6f6a5c97cc9992df7aa48 
  ql/src/test/org/apache/hadoop/hive/ql/plan/mapping/TestCounterMapping.java 9fe95e4c5684b4aa69dd7148a14c29ed9c2a0062 


Diff: https://reviews.apache.org/r/66402/diff/1/


Testing
-------


Thanks,

Zoltan Haindrich


Re: Review Request 66402: HIVE-19009 Retain and use runtime statistics throughout a session

Posted by Zoltan Haindrich <ki...@rxd.hu>.

> On April 11, 2018, 12:51 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/SessionStatsSource.java
> > Lines 35 (patched)
> > <https://reviews.apache.org/r/66402/diff/2/?file=1994925#file1994925line35>
> >
> >     There should be a singleton for this cache that can be called from anywhere (e.g., see MaterializationsInvalidationCache).

I will add that too, probably in the next patch. But this is the "session level only" implementation; a static cache would have HS2 lifetime, or an even wider scope.
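A minimal Java sketch of the two scopes being discussed: a cache instance owned by one session versus a process-wide singleton that lives as long as HS2, created lazily with the initialization-on-demand holder idiom. The class names RuntimeStatsCache and GlobalRuntimeStatsCache, the String operator-signature keys and the Long row counts are hypothetical placeholders, not Hive's API. MaterializationsInvalidationCache is only the pattern the reviewer pointed to; its actual implementation may differ.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stats cache: maps an operator signature (here just a String)
// to an observed output row count. Not Hive's real class.
class RuntimeStatsCache {
  private final Map<String, Long> rowCounts = new ConcurrentHashMap<>();

  void put(String opSignature, long rows) {
    rowCounts.put(opSignature, rows);
  }

  Long get(String opSignature) {
    return rowCounts.get(opSignature);
  }
}

// Hypothetical process-wide singleton: one instance for the lifetime of the
// JVM (i.e. of HS2), created lazily and thread-safely by the holder idiom.
final class GlobalRuntimeStatsCache {
  private GlobalRuntimeStatsCache() {}

  private static final class Holder {
    static final RuntimeStatsCache INSTANCE = new RuntimeStatsCache();
  }

  static RuntimeStatsCache get() {
    return Holder.INSTANCE;
  }
}

class ScopeDemo {
  public static void main(String[] args) {
    // Session level: each session owns its cache, so stats are not shared
    // and disappear with the session.
    RuntimeStatsCache sessionA = new RuntimeStatsCache();
    RuntimeStatsCache sessionB = new RuntimeStatsCache();
    sessionA.put("TS[lineitem]", 6_000_000L);
    System.out.println(sessionB.get("TS[lineitem]")); // null

    // HS2 level: every caller sees the same instance.
    GlobalRuntimeStatsCache.get().put("TS[lineitem]", 6_000_000L);
    System.out.println(GlobalRuntimeStatsCache.get().get("TS[lineitem]")); // 6000000
  }
}
```

The holder idiom defers creation until the first get() call and relies on class initialization for thread safety, so no explicit locking is needed.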


> On April 11, 2018, 12:51 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/StatsSources.java
> > Lines 34 (patched)
> > <https://reviews.apache.org/r/66402/diff/2/?file=1994927#file1994927line34>
> >
> >     Deprecated when it was first written :)
> >     Any reason for having it this way?

Yes... I use it as a marker that I have a pending rename to do; when I created the method, it was unclear what it would become.


> On April 11, 2018, 12:51 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/reexec/SessionStatsPlugin.java
> > Lines 36-40 (patched)
> > <https://reviews.apache.org/r/66402/diff/2/?file=1994932#file1994932line36>
> >
> >     I am not sure the SS stats source should be linked to the driver. Its lifecycle is not tied to the query, the driver, or SS. It should be initialized once per HS2 lifecycle.

I'll add HS2-lifecycle-level caching as a separate thing.
I think the session level will probably be useful for reproducing problems with it later.

What happens here is this: if the session doesn't yet have a SessionStatsSource, a new one is created. The creation could also be moved to SessionState, but I didn't want to use the specific implementation class there. This will probably change when runtime stats are retained for a longer period (in the metastore).
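The lazy creation described above can be sketched in Java as follows. The session holds only an interface type, and the plugin decides which concrete class to instantiate, which keeps the session class free of the implementation class. All names here (StatsSourceSketch, SessionStatsSourceSketch, SessionSketch, SessionStatsPluginSketch) are hypothetical stand-ins, not the signatures in the patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stats-source abstraction; the session depends only on this.
interface StatsSourceSketch {
  Optional<Long> lookup(String opSignature);
  void record(String opSignature, long rows);
}

// Hypothetical session-scoped implementation: a plain map, no eviction.
class SessionStatsSourceSketch implements StatsSourceSketch {
  private final Map<String, Long> stats = new HashMap<>();

  @Override
  public Optional<Long> lookup(String opSignature) {
    return Optional.ofNullable(stats.get(opSignature));
  }

  @Override
  public void record(String opSignature, long rows) {
    stats.put(opSignature, rows);
  }
}

// Hypothetical session state: stores the interface type only.
class SessionSketch {
  private StatsSourceSketch statsSource;

  StatsSourceSketch getStatsSource() { return statsSource; }
  void setStatsSource(StatsSourceSketch source) { this.statsSource = source; }
}

// Hypothetical plugin hook: the concrete class is chosen here, so the
// session code never references it.
class SessionStatsPluginSketch {
  StatsSourceSketch ensureStatsSource(SessionSketch session) {
    if (session.getStatsSource() == null) {
      session.setStatsSource(new SessionStatsSourceSketch());
    }
    return session.getStatsSource();
  }
}

class SessionStatsDemo {
  public static void main(String[] args) {
    SessionSketch session = new SessionSketch();
    SessionStatsPluginSketch plugin = new SessionStatsPluginSketch();
    // First query in the session: no source yet, so one is created.
    plugin.ensureStatsSource(session).record("FIL_4", 1_234L);
    // Later query in the same session: the existing source is reused.
    System.out.println(plugin.ensureStatsSource(session).lookup("FIL_4")); // Optional[1234]
  }
}
```

The check-then-set in ensureStatsSource is not synchronized; it assumes queries within one session do not race on it. A real implementation may need locking if that assumption does not hold.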


> On April 11, 2018, 12:51 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
> > Lines 123 (patched)
> > <https://reviews.apache.org/r/66402/diff/2/?file=1994933#file1994933line123>
> >
> >     Why is it stored in SessionState? SS is per query, whereas the stats cache is global.

SessionState is per session (at least, that is how it currently works). I will add the extension to the HS2/persisted level in a follow-up; I think it will be pretty straightforward.

But I will probably rethink how the caching level is configured.


- Zoltan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66402/#review200829
-----------------------------------------------------------




Re: Review Request 66402: HIVE-19009 Retain and use runtime statistics throughout a session

Posted by Ashutosh Chauhan <ha...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66402/#review200829
-----------------------------------------------------------




common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
Lines 4265 (patched)
<https://reviews.apache.org/r/66402/#comment281730>

    I think about 100MB of memory for this cache would be a good default. Assuming ~1KB (?) per entry, that comes to roughly 100K entries. Let's use that.
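The arithmetic behind the suggested default: 100 MB / 1 KB per entry is about 100,000 entries (102,400 with binary units), so the bound can be expressed as an entry count rather than in bytes. Below is a minimal Java sketch of such an entry-bounded LRU cache, using the standard LinkedHashMap access-order mode and its removeEldestEntry hook. The class name and the 1 KB per-entry estimate are assumptions, and the actual HiveConf property name is not shown in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical size-bounded LRU cache for runtime stats entries.
// The bound is in entries, not bytes: ~100 MB / ~1 KB per entry ≈ 100K entries
// (1 KB per entry is only the reviewer's rough guess).
class BoundedStatsCache<K, V> extends LinkedHashMap<K, V> {
  static final int DEFAULT_MAX_ENTRIES = 100_000;

  private final int maxEntries;

  BoundedStatsCache(int maxEntries) {
    // accessOrder = true: iteration runs from least to most recently
    // accessed, so the "eldest" entry is the LRU eviction candidate.
    super(16, 0.75f, true);
    this.maxEntries = maxEntries;
  }

  BoundedStatsCache() {
    this(DEFAULT_MAX_ENTRIES);
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    // Invoked after each insertion; evict once the bound is exceeded.
    return size() > maxEntries;
  }
}

class BoundedCacheDemo {
  public static void main(String[] args) {
    BoundedStatsCache<String, Long> cache = new BoundedStatsCache<>(2);
    cache.put("a", 1L);
    cache.put("b", 2L);
    cache.get("a");     // "a" becomes most recently used
    cache.put("c", 3L); // exceeds the bound; "b" (least recently used) is evicted
    System.out.println(cache.keySet()); // [a, c]
  }
}
```

LinkedHashMap is not thread-safe; a cache shared across HS2 sessions would need to be wrapped (for example with Collections.synchronizedMap) or replaced by a concurrent cache.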



ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java
Lines 4958 (patched)
<https://reviews.apache.org/r/66402/#comment281729>

    It will be good to leave a comment here about this.



ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/SessionStatsSource.java
Lines 35 (patched)
<https://reviews.apache.org/r/66402/#comment281767>

    There should be a singleton for this cache that can be called from anywhere (e.g., see MaterializationsInvalidationCache).



ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/StatsSources.java
Lines 34 (patched)
<https://reviews.apache.org/r/66402/#comment281731>

    Deprecated when it was first written :)
    Any reason for having it this way?



ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/StatsSources.java
Lines 64 (patched)
<https://reviews.apache.org/r/66402/#comment281768>

    LOG.debug



ql/src/java/org/apache/hadoop/hive/ql/reexec/SessionStatsPlugin.java
Lines 36-40 (patched)
<https://reviews.apache.org/r/66402/#comment281769>

    I am not sure the SS stats source should be linked to the driver. Its lifecycle is not tied to the query, the driver, or SS. It should be initialized once per HS2 lifecycle.



ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
Lines 123 (patched)
<https://reviews.apache.org/r/66402/#comment281766>

    Why is it stored in SessionState? SS is per query, whereas the stats cache is global.


- Ashutosh Chauhan




Re: Review Request 66402: HIVE-19009 Retain and use runtime statistics throughout a session

Posted by Zoltan Haindrich <ki...@rxd.hu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66402/
-----------------------------------------------------------

(Updated April 17, 2018, 9:43 a.m.)


Review request for hive and Ashutosh Chauhan.


Changes
-------

persist in a global map


Bugs: HIVE-19009
    https://issues.apache.org/jira/browse/HIVE-19009


Repository: hive-git


Description
-------

* retain runtime stats at session level
* some fixes to support runtime stats during vectorized execution
* make collection less strict; there are cases when the same op tree is evaluated multiple times in a query
* minor fixes and added some logging
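The bullet about making collection less strict can be illustrated with a hypothetical collector. In strict mode a second observation for the same operator signature is rejected; in lenient mode repeated observations (for example, an operator tree evaluated more than once in one query) are combined. Which combining rule the patch actually uses is not stated here, so this sketch takes the rule as a parameter, and the demo uses Long::max purely as an example. RuntimeStatsCollector and operator names like GBY_3 are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical collector of per-operator runtime row counts.
class RuntimeStatsCollector {
  private final Map<String, Long> rowsBySignature = new HashMap<>();
  private final boolean strict;
  private final BinaryOperator<Long> mergePolicy;

  RuntimeStatsCollector(boolean strict, BinaryOperator<Long> mergePolicy) {
    this.strict = strict;
    this.mergePolicy = mergePolicy;
  }

  void record(String opSignature, long rows) {
    if (strict && rowsBySignature.containsKey(opSignature)) {
      // Strict mode: a repeated observation of the same operator is an error.
      throw new IllegalStateException("duplicate stats for " + opSignature);
    }
    // Lenient mode (or first observation): insert, or combine with the
    // existing value using the supplied policy.
    rowsBySignature.merge(opSignature, rows, mergePolicy);
  }

  Long rows(String opSignature) {
    return rowsBySignature.get(opSignature);
  }
}

class CollectorDemo {
  public static void main(String[] args) {
    RuntimeStatsCollector lenient = new RuntimeStatsCollector(false, Long::max);
    lenient.record("GBY_3", 10L);
    lenient.record("GBY_3", 25L); // same operator observed again
    System.out.println(lenient.rows("GBY_3")); // 25
  }
}
```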


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java dd94d4db87514de9050ea5669f73027ff3dd6937 
  itests/src/test/resources/testconfiguration.properties f513fe5ff760a6fa7e3ce069f33ce26703965e33 
  itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 050f9d5765c1389829fbde80cb76c32df4bc3cc5 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java 58fa5f2287e1221f300e16a78aa76dc6fc23cf47 
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java 75f928b69d3d7b206564216d24be450848a1fe8a 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 13a2fc478fda244eca8632d12f2d8f46ae280a63 
  ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/CachingStatsSource.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/EmptyStatsSource.java 57762edbdfbf4f77b051f42faf313db882878969 
  ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/SimpleRuntimeStatsSource.java 6f340b8450a95569156b3b7eddf28fdd53fc9065 
  ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/StatsSource.java a4cb6e977190c4f5fa275619a8e97755ad1f55d8 
  ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/StatsSources.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/reexec/IReExecutionPlugin.java 2b0d23c6f237915684d59856cb62001c01c8209c 
  ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java 93031712dc6fb60a6d618c8754e50def489a12af 
  ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecutionOverlayPlugin.java 4ee3c14b3988521b3f44d02f032163734ed9e4d9 
  ql/src/java/org/apache/hadoop/hive/ql/reexec/ReOptimizePlugin.java 707858700f4d06301fb873fb47f8d9d9b101d056 
  ql/src/test/org/apache/hadoop/hive/ql/optimizer/signature/TestOperatorSignature.java 8c899e7fef1a1ebe06fdd9269f3c922f8baab84d 
  ql/src/test/org/apache/hadoop/hive/ql/plan/mapping/TestCounterMapping.java 9fe95e4c5684b4aa69dd7148a14c29ed9c2a0062 
  ql/src/test/queries/clientpositive/runtime_stats_hs2.q PRE-CREATION 
  ql/src/test/results/clientpositive/llap/runtime_stats_hs2.q.out PRE-CREATION 


Diff: https://reviews.apache.org/r/66402/diff/3/

Changes: https://reviews.apache.org/r/66402/diff/2-3/


Testing
-------


Thanks,

Zoltan Haindrich


Re: Review Request 66402: HIVE-19009 Retain and use runtime statistics throughout a session

Posted by Zoltan Haindrich <ki...@rxd.hu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66402/
-----------------------------------------------------------

(Updated April 10, 2018, 12:31 p.m.)


Review request for hive and Ashutosh Chauhan.


Changes
-------

update to patch #2


Summary (updated)
-----------------

HIVE-19009 Retain and use runtime statistics throughout a session


Bugs: HIVE-19009
    https://issues.apache.org/jira/browse/HIVE-19009


Repository: hive-git


Description (updated)
-------

* retain runtime stats at session level
* some fixes to support runtime stats during vectorized execution
* make collection less strict; there are cases when the same op tree is evaluated multiple times in a query
* minor fixes and added some logging


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java dd94d4db87514de9050ea5669f73027ff3dd6937 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java 58fa5f2287e1221f300e16a78aa76dc6fc23cf47 
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java 75f928b69d3d7b206564216d24be450848a1fe8a 
  ql/src/java/org/apache/hadoop/hive/ql/DriverFactory.java 0f6a80ef0d7b768371a932e5bda75348955d0369 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 13a2fc478fda244eca8632d12f2d8f46ae280a63 
  ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/EmptyStatsSource.java 57762edbdfbf4f77b051f42faf313db882878969 
  ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/SessionStatsSource.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/StatsSource.java a4cb6e977190c4f5fa275619a8e97755ad1f55d8 
  ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/StatsSources.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/reexec/IReExecutionPlugin.java 2b0d23c6f237915684d59856cb62001c01c8209c 
  ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java 93031712dc6fb60a6d618c8754e50def489a12af 
  ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecutionOverlayPlugin.java 4ee3c14b3988521b3f44d02f032163734ed9e4d9 
  ql/src/java/org/apache/hadoop/hive/ql/reexec/ReOptimizePlugin.java 707858700f4d06301fb873fb47f8d9d9b101d056 
  ql/src/java/org/apache/hadoop/hive/ql/reexec/SessionStatsPlugin.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 0071a9a4ebadd73103a6f6a5c97cc9992df7aa48 
  ql/src/test/org/apache/hadoop/hive/ql/plan/mapping/TestCounterMapping.java 9fe95e4c5684b4aa69dd7148a14c29ed9c2a0062 


Diff: https://reviews.apache.org/r/66402/diff/2/

Changes: https://reviews.apache.org/r/66402/diff/1-2/


Testing
-------


Thanks,

Zoltan Haindrich