Posted to dev@hive.apache.org by Zoltan Haindrich <ki...@rxd.hu> on 2018/04/03 16:04:14 UTC
Review Request 66402: HIVE-19009
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66402/
-----------------------------------------------------------
Review request for hive and Ashutosh Chauhan.
Repository: hive-git
Description
-------
retain runtime stats at session level; some fixes to support runtime stats during vectorized execution; minor fixes and some added logging
Diffs
-----
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java dd94d4db87514de9050ea5669f73027ff3dd6937
ql/src/java/org/apache/hadoop/hive/ql/Context.java 58fa5f2287e1221f300e16a78aa76dc6fc23cf47
ql/src/java/org/apache/hadoop/hive/ql/Driver.java 75f928b69d3d7b206564216d24be450848a1fe8a
ql/src/java/org/apache/hadoop/hive/ql/DriverFactory.java 0f6a80ef0d7b768371a932e5bda75348955d0369
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 13a2fc478fda244eca8632d12f2d8f46ae280a63
ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/EmptyStatsSource.java 57762edbdfbf4f77b051f42faf313db882878969
ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/SessionStatsSource.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/StatsSource.java a4cb6e977190c4f5fa275619a8e97755ad1f55d8
ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/StatsSources.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/reexec/IReExecutionPlugin.java 2b0d23c6f237915684d59856cb62001c01c8209c
ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java 93031712dc6fb60a6d618c8754e50def489a12af
ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecutionOverlayPlugin.java 4ee3c14b3988521b3f44d02f032163734ed9e4d9
ql/src/java/org/apache/hadoop/hive/ql/reexec/ReOptimizePlugin.java 707858700f4d06301fb873fb47f8d9d9b101d056
ql/src/java/org/apache/hadoop/hive/ql/reexec/SessionStatsPlugin.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 0071a9a4ebadd73103a6f6a5c97cc9992df7aa48
ql/src/test/org/apache/hadoop/hive/ql/plan/mapping/TestCounterMapping.java 9fe95e4c5684b4aa69dd7148a14c29ed9c2a0062
Diff: https://reviews.apache.org/r/66402/diff/1/
Testing
-------
Thanks,
Zoltan Haindrich
Re: Review Request 66402: HIVE-19009 Retain and use runtime
statistics thru out a session
Posted by Zoltan Haindrich <ki...@rxd.hu>.
> On April 11, 2018, 12:51 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/SessionStatsSource.java
> > Lines 35 (patched)
> > <https://reviews.apache.org/r/66402/diff/2/?file=1994925#file1994925line35>
> >
> > There should be a singleton for this cache which can be called from anywhere. e.g., see MaterializationsInvalidationCache
I will add that too, probably in the next patch; but this is the "session level only" implementation. A static cache would have HS2 lifetime, or an even wider scope.
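To make the scope difference concrete, here is a minimal sketch contrasting a per-session cache with a process-wide singleton. All class and method names below are hypothetical illustrations; this is not the code from the patch, and operator signatures are simplified to plain strings.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a runtime stats cache (not a class from the patch).
final class RuntimeStatsCache {
  private final Map<String, Long> rowCounts = new ConcurrentHashMap<>();

  void put(String operatorSignature, long rows) {
    rowCounts.put(operatorSignature, rows);
  }

  Optional<Long> lookup(String operatorSignature) {
    return Optional.ofNullable(rowCounts.get(operatorSignature));
  }
}

// Session scope: every session object owns its own cache instance,
// so stats recorded in one session are invisible to the others.
final class DemoSession {
  private final RuntimeStatsCache cache = new RuntimeStatsCache();

  RuntimeStatsCache statsCache() {
    return cache;
  }
}

// Process scope: one instance shared by everything in the JVM,
// created lazily via the initialization-on-demand holder idiom.
final class GlobalStatsCache {
  private GlobalStatsCache() {}

  private static final class Holder {
    static final RuntimeStatsCache INSTANCE = new RuntimeStatsCache();
  }

  static RuntimeStatsCache get() {
    return Holder.INSTANCE;
  }
}
```

With the session-scoped variant the cache disappears when the session object does; the singleton lives as long as the server process.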
> On April 11, 2018, 12:51 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/StatsSources.java
> > Lines 34 (patched)
> > <https://reviews.apache.org/r/66402/diff/2/?file=1994927#file1994927line34>
> >
> > Deprecated when it was first written :)
> > Any reason for having it this way?
Yes... I use it as a marker that I have pending renaming to do; when I created the method it was unclear what will be
> On April 11, 2018, 12:51 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/reexec/SessionStatsPlugin.java
> > Lines 36-40 (patched)
> > <https://reviews.apache.org/r/66402/diff/2/?file=1994932#file1994932line36>
> >
> > I am not sure SS stats source should be linked to driver. Its lifecycle is not tied to query or driver or SS. This should be initialized once in a HS2 lifecycle.
I'll add the HS2 lifecycle level caching as a separate thing.
I think the session level will probably be useful for reproducing problems later.
What happens right here is this: if the session doesn't yet have a SessionStatsSource, a new one is created. The creation could also be moved to SessionState, but I didn't want to use the specific implementation class there. This will probably change when runtime stats are retained for a longer period (metastore).
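The lazy creation described above could look roughly like the following sketch. The names are hypothetical and only illustrate the idea that the session holds an interface-typed reference while the plugin picks the concrete class; the actual patch may be structured differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface: the only stats type the session knows about.
interface DemoStatsSource {
  void record(String operatorSignature, long rows);

  Long lookup(String operatorSignature); // null when nothing was recorded
}

// Hypothetical concrete implementation, referenced only by the plugin.
final class MapBackedStatsSource implements DemoStatsSource {
  private final Map<String, Long> stats = new HashMap<>();

  @Override
  public void record(String operatorSignature, long rows) {
    stats.put(operatorSignature, rows);
  }

  @Override
  public Long lookup(String operatorSignature) {
    return stats.get(operatorSignature);
  }
}

// Hypothetical session object: stores a source but never constructs one.
final class DemoSessionState {
  private DemoStatsSource statsSource;

  DemoStatsSource getStatsSource() {
    return statsSource;
  }

  void setStatsSource(DemoStatsSource statsSource) {
    this.statsSource = statsSource;
  }
}

// Hypothetical plugin: installs a source on first use, reuses it afterwards.
final class DemoSessionStatsPlugin {
  DemoStatsSource sourceFor(DemoSessionState session) {
    DemoStatsSource existing = session.getStatsSource();
    if (existing == null) {
      existing = new MapBackedStatsSource();
      session.setStatsSource(existing);
    }
    return existing;
  }
}
```

Swapping in a longer-lived backing store would then only touch the plugin, not the session class.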
> On April 11, 2018, 12:51 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
> > Lines 123 (patched)
> > <https://reviews.apache.org/r/66402/diff/2/?file=1994933#file1994933line123>
> >
> > Why is it stored in SessionState? SS is per query, whereas the stats cache is global.
SessionState is per session, at least that is how it currently works. I will add the extension to the HS2/persisted level in a follow-up; I think it will be pretty straightforward,
but I will probably rethink how the caching level is configured.
- Zoltan
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66402/#review200829
-----------------------------------------------------------
On April 10, 2018, 12:31 p.m., Zoltan Haindrich wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66402/
> -----------------------------------------------------------
>
> (Updated April 10, 2018, 12:31 p.m.)
>
>
> Review request for hive and Ashutosh Chauhan.
>
>
> Bugs: HIVE-19009
> https://issues.apache.org/jira/browse/HIVE-19009
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> * retain runtime stats at session level
> * some fixes to support runtime stats during vectorized execution
> * make collection less strict; there are cases when the same op tree is evaluated multiple times in a query
> * minor fixes and added some logging
>
>
> Diffs
> -----
>
> common/src/java/org/apache/hadoop/hive/conf/HiveConf.java dd94d4db87514de9050ea5669f73027ff3dd6937
> ql/src/java/org/apache/hadoop/hive/ql/Context.java 58fa5f2287e1221f300e16a78aa76dc6fc23cf47
> ql/src/java/org/apache/hadoop/hive/ql/Driver.java 75f928b69d3d7b206564216d24be450848a1fe8a
> ql/src/java/org/apache/hadoop/hive/ql/DriverFactory.java 0f6a80ef0d7b768371a932e5bda75348955d0369
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 13a2fc478fda244eca8632d12f2d8f46ae280a63
> ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/EmptyStatsSource.java 57762edbdfbf4f77b051f42faf313db882878969
> ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/SessionStatsSource.java PRE-CREATION
> ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/StatsSource.java a4cb6e977190c4f5fa275619a8e97755ad1f55d8
> ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/StatsSources.java PRE-CREATION
> ql/src/java/org/apache/hadoop/hive/ql/reexec/IReExecutionPlugin.java 2b0d23c6f237915684d59856cb62001c01c8209c
> ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java 93031712dc6fb60a6d618c8754e50def489a12af
> ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecutionOverlayPlugin.java 4ee3c14b3988521b3f44d02f032163734ed9e4d9
> ql/src/java/org/apache/hadoop/hive/ql/reexec/ReOptimizePlugin.java 707858700f4d06301fb873fb47f8d9d9b101d056
> ql/src/java/org/apache/hadoop/hive/ql/reexec/SessionStatsPlugin.java PRE-CREATION
> ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 0071a9a4ebadd73103a6f6a5c97cc9992df7aa48
> ql/src/test/org/apache/hadoop/hive/ql/plan/mapping/TestCounterMapping.java 9fe95e4c5684b4aa69dd7148a14c29ed9c2a0062
>
>
> Diff: https://reviews.apache.org/r/66402/diff/2/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Zoltan Haindrich
>
>
Re: Review Request 66402: HIVE-19009 Retain and use runtime
statistics thru out a session
Posted by Ashutosh Chauhan <ha...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66402/#review200829
-----------------------------------------------------------
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
Lines 4265 (patched)
<https://reviews.apache.org/r/66402/#comment281730>
I think memory consumption of ~100MB in the cache will be a good default. Assuming 1KB (?) for each entry, that will be 100K entries. Let's use that.
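As a rough illustration of that sizing, the sketch below derives an entry limit from a memory budget and enforces it with an access-ordered LinkedHashMap. The 1KB per-entry figure is the unverified estimate from the comment above, the class names are hypothetical, and LRU eviction is an assumption rather than what the patch does.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns a memory budget into an entry-count limit.
final class CacheSizing {
  private CacheSizing() {}

  static long maxEntries(long budgetBytes, long bytesPerEntry) {
    return budgetBytes / bytesPerEntry;
  }
}

// Hypothetical bounded cache: evicts the least recently used entry
// once the configured number of entries is exceeded.
final class BoundedStatsCache<K, V> extends LinkedHashMap<K, V> {
  private final int maxEntries;

  BoundedStatsCache(int maxEntries) {
    super(16, 0.75f, true); // true = iterate in access order, needed for LRU
    this.maxEntries = maxEntries;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    return size() > maxEntries;
  }
}
```

With decimal units, CacheSizing.maxEntries(100_000_000L, 1_000L) gives the 100K entries mentioned above. Note that LinkedHashMap is not thread-safe, so a shared cache would need external synchronization or a concurrent cache library.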
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java
Lines 4958 (patched)
<https://reviews.apache.org/r/66402/#comment281729>
It will be good to leave a comment here about this.
ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/SessionStatsSource.java
Lines 35 (patched)
<https://reviews.apache.org/r/66402/#comment281767>
There should be a singleton for this cache which can be called from anywhere. e.g., see MaterializationsInvalidationCache
ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/StatsSources.java
Lines 34 (patched)
<https://reviews.apache.org/r/66402/#comment281731>
Deprecated when it was first written :)
Any reason for having it this way?
ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/StatsSources.java
Lines 64 (patched)
<https://reviews.apache.org/r/66402/#comment281768>
LOG.debug
ql/src/java/org/apache/hadoop/hive/ql/reexec/SessionStatsPlugin.java
Lines 36-40 (patched)
<https://reviews.apache.org/r/66402/#comment281769>
I am not sure SS stats source should be linked to driver. Its lifecycle is not tied to query or driver or SS. This should be initialized once in a HS2 lifecycle.
ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
Lines 123 (patched)
<https://reviews.apache.org/r/66402/#comment281766>
Why is it stored in SessionState? SS is per query, whereas the stats cache is global.
- Ashutosh Chauhan
Re: Review Request 66402: HIVE-19009 Retain and use runtime
statistics thru out a session
Posted by Zoltan Haindrich <ki...@rxd.hu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66402/
-----------------------------------------------------------
(Updated April 17, 2018, 9:43 a.m.)
Review request for hive and Ashutosh Chauhan.
Changes
-------
persist in a global map
Bugs: HIVE-19009
https://issues.apache.org/jira/browse/HIVE-19009
Repository: hive-git
Description
-------
* retain runtime stats at session level
* some fixes to support runtime stats during vectorized execution
* make collection less strict; there are cases when the same op tree is evaluated multiple times in a query
* minor fixes and added some logging
Diffs (updated)
-----
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java dd94d4db87514de9050ea5669f73027ff3dd6937
itests/src/test/resources/testconfiguration.properties f513fe5ff760a6fa7e3ce069f33ce26703965e33
itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 050f9d5765c1389829fbde80cb76c32df4bc3cc5
ql/src/java/org/apache/hadoop/hive/ql/Context.java 58fa5f2287e1221f300e16a78aa76dc6fc23cf47
ql/src/java/org/apache/hadoop/hive/ql/Driver.java 75f928b69d3d7b206564216d24be450848a1fe8a
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 13a2fc478fda244eca8632d12f2d8f46ae280a63
ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/CachingStatsSource.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/EmptyStatsSource.java 57762edbdfbf4f77b051f42faf313db882878969
ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/SimpleRuntimeStatsSource.java 6f340b8450a95569156b3b7eddf28fdd53fc9065
ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/StatsSource.java a4cb6e977190c4f5fa275619a8e97755ad1f55d8
ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/StatsSources.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/reexec/IReExecutionPlugin.java 2b0d23c6f237915684d59856cb62001c01c8209c
ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java 93031712dc6fb60a6d618c8754e50def489a12af
ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecutionOverlayPlugin.java 4ee3c14b3988521b3f44d02f032163734ed9e4d9
ql/src/java/org/apache/hadoop/hive/ql/reexec/ReOptimizePlugin.java 707858700f4d06301fb873fb47f8d9d9b101d056
ql/src/test/org/apache/hadoop/hive/ql/optimizer/signature/TestOperatorSignature.java 8c899e7fef1a1ebe06fdd9269f3c922f8baab84d
ql/src/test/org/apache/hadoop/hive/ql/plan/mapping/TestCounterMapping.java 9fe95e4c5684b4aa69dd7148a14c29ed9c2a0062
ql/src/test/queries/clientpositive/runtime_stats_hs2.q PRE-CREATION
ql/src/test/results/clientpositive/llap/runtime_stats_hs2.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/66402/diff/3/
Changes: https://reviews.apache.org/r/66402/diff/2-3/
Testing
-------
Thanks,
Zoltan Haindrich
Re: Review Request 66402: HIVE-19009 Retain and use runtime
statistics thru out a session
Posted by Zoltan Haindrich <ki...@rxd.hu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66402/
-----------------------------------------------------------
(Updated April 10, 2018, 12:31 p.m.)
Review request for hive and Ashutosh Chauhan.
Changes
-------
update to patch#2
Summary (updated)
-----------------
HIVE-19009 Retain and use runtime statistics thru out a session
Bugs: HIVE-19009
https://issues.apache.org/jira/browse/HIVE-19009
Repository: hive-git
Description (updated)
-------
* retain runtime stats at session level
* some fixes to support runtime stats during vectorized execution
* make collection less strict; there are cases when the same op tree is evaluated multiple times in a query
* minor fixes and added some logging
Diffs (updated)
-----
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java dd94d4db87514de9050ea5669f73027ff3dd6937
ql/src/java/org/apache/hadoop/hive/ql/Context.java 58fa5f2287e1221f300e16a78aa76dc6fc23cf47
ql/src/java/org/apache/hadoop/hive/ql/Driver.java 75f928b69d3d7b206564216d24be450848a1fe8a
ql/src/java/org/apache/hadoop/hive/ql/DriverFactory.java 0f6a80ef0d7b768371a932e5bda75348955d0369
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 13a2fc478fda244eca8632d12f2d8f46ae280a63
ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/EmptyStatsSource.java 57762edbdfbf4f77b051f42faf313db882878969
ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/SessionStatsSource.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/StatsSource.java a4cb6e977190c4f5fa275619a8e97755ad1f55d8
ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/StatsSources.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/reexec/IReExecutionPlugin.java 2b0d23c6f237915684d59856cb62001c01c8209c
ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java 93031712dc6fb60a6d618c8754e50def489a12af
ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecutionOverlayPlugin.java 4ee3c14b3988521b3f44d02f032163734ed9e4d9
ql/src/java/org/apache/hadoop/hive/ql/reexec/ReOptimizePlugin.java 707858700f4d06301fb873fb47f8d9d9b101d056
ql/src/java/org/apache/hadoop/hive/ql/reexec/SessionStatsPlugin.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 0071a9a4ebadd73103a6f6a5c97cc9992df7aa48
ql/src/test/org/apache/hadoop/hive/ql/plan/mapping/TestCounterMapping.java 9fe95e4c5684b4aa69dd7148a14c29ed9c2a0062
Diff: https://reviews.apache.org/r/66402/diff/2/
Changes: https://reviews.apache.org/r/66402/diff/1-2/
Testing
-------
Thanks,
Zoltan Haindrich