You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by chengxiang li <ch...@intel.com> on 2014/11/07 06:27:35 UTC

Review Request 27720: HIVE-8777 should only register used operator counters.[Spark Branch]

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27720/
-----------------------------------------------------------

Review request for hive and Xuefu Zhang.


Bugs: HIVE-8777
    https://issues.apache.org/jira/browse/HIVE-8777


Repository: hive-git


Description
-------

Currently we register all hive operator counters in SparkCounters, while actually not all hive operators are used in SparkTask, we should iterate SparkTask's operators, and only register conuters required.


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java e955da3 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 46b04bc 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/counter/SparkCounters.java bb3597a 
  ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java 66fd6b6 

Diff: https://reviews.apache.org/r/27720/diff/


Testing
-------


Thanks,

chengxiang li


Re: Review Request 27720: HIVE-8777 should only register used operator counters.[Spark Branch]

Posted by Xuefu Zhang <xz...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27720/#review60333
-----------------------------------------------------------

Ship it!


Got it. Thanks for the explanation.

I'm going to create a separate JIRA requesting to have some tests with stats collection with counter.

- Xuefu Zhang


On Nov. 7, 2014, 5:27 a.m., chengxiang li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27720/
> -----------------------------------------------------------
> 
> (Updated Nov. 7, 2014, 5:27 a.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-8777
>     https://issues.apache.org/jira/browse/HIVE-8777
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Currently we register all hive operator counters in SparkCounters, while actually not all hive operators are used in SparkTask, we should iterate SparkTask's operators, and only register conuters required.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java e955da3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 46b04bc 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/counter/SparkCounters.java bb3597a 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java 66fd6b6 
> 
> Diff: https://reviews.apache.org/r/27720/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> chengxiang li
> 
>


Re: Review Request 27720: HIVE-8777 should only register used operator counters.[Spark Branch]

Posted by Xuefu Zhang <xz...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27720/#review60343
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java
<https://reviews.apache.org/r/27720/#comment101712>

    I'm wondering, if there are multiple instance of an operator, say FileSinkOperator, what's the effect on counter registration.


- Xuefu Zhang


On Nov. 7, 2014, 5:27 a.m., chengxiang li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27720/
> -----------------------------------------------------------
> 
> (Updated Nov. 7, 2014, 5:27 a.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-8777
>     https://issues.apache.org/jira/browse/HIVE-8777
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Currently we register all hive operator counters in SparkCounters, while actually not all hive operators are used in SparkTask, we should iterate SparkTask's operators, and only register conuters required.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java e955da3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 46b04bc 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/counter/SparkCounters.java bb3597a 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java 66fd6b6 
> 
> Diff: https://reviews.apache.org/r/27720/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> chengxiang li
> 
>


Re: Review Request 27720: HIVE-8777 should only register used operator counters.[Spark Branch]

Posted by Xuefu Zhang <xz...@cloudera.com>.

> On Nov. 7, 2014, 2:36 p.m., Xuefu Zhang wrote:
> > 1. Looking at the diff, I was sure where we are removing unnecessary counter registrations.
> > 2. It would be great if we can have some tests that are enabled with counter statistics collection, so that we know what kind of output we are expecting and avoid future breakage.
> 
> chengxiang li wrote:
>     Thanks,xuefu.the hive operator level counters is used during spark job execution,so our current tests should have covered this patch change.but for table statistic collection, as you know that we use fs as default counter storage for test. i run qtests locally before,but yes related change would not reflect in automatic test result.

Actually, for #1, I meant "I was NOT sure..."

Can we have some tests running wiht counter?


- Xuefu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27720/#review60316
-----------------------------------------------------------


On Nov. 7, 2014, 5:27 a.m., chengxiang li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27720/
> -----------------------------------------------------------
> 
> (Updated Nov. 7, 2014, 5:27 a.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-8777
>     https://issues.apache.org/jira/browse/HIVE-8777
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Currently we register all hive operator counters in SparkCounters, while actually not all hive operators are used in SparkTask, we should iterate SparkTask's operators, and only register conuters required.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java e955da3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 46b04bc 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/counter/SparkCounters.java bb3597a 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java 66fd6b6 
> 
> Diff: https://reviews.apache.org/r/27720/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> chengxiang li
> 
>


Re: Review Request 27720: HIVE-8777 should only register used operator counters.[Spark Branch]

Posted by chengxiang li <ch...@intel.com>.

> On 十一月 7, 2014, 2:36 p.m., Xuefu Zhang wrote:
> > 1. Looking at the diff, I was sure where we are removing unnecessary counter registrations.
> > 2. It would be great if we can have some tests that are enabled with counter statistics collection, so that we know what kind of output we are expecting and avoid future breakage.
> 
> chengxiang li wrote:
>     Thanks,xuefu.the hive operator level counters is used during spark job execution,so our current tests should have covered this patch change.but for table statistic collection, as you know that we use fs as default counter storage for test. i run qtests locally before,but yes related change would not reflect in automatic test result.
> 
> Xuefu Zhang wrote:
>     Actually, for #1, I meant "I was NOT sure..."
>     
>     Can we have some tests running wiht counter?

Oh, previously we register all possible counters in SparkCounters, in this patch, we get only required counters with SparkTask::getOperatorCounters, and register them in SparkClient.Maybe i'm not descirbe this clearly, this patch only include hive operator counter register related change,as all queries would collect hive operator statistic,so all qtests would go through the changed logic. table statistic collection is not related with this patch.


- chengxiang


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27720/#review60316
-----------------------------------------------------------


On 十一月 7, 2014, 5:27 a.m., chengxiang li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27720/
> -----------------------------------------------------------
> 
> (Updated 十一月 7, 2014, 5:27 a.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-8777
>     https://issues.apache.org/jira/browse/HIVE-8777
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Currently we register all hive operator counters in SparkCounters, while actually not all hive operators are used in SparkTask, we should iterate SparkTask's operators, and only register conuters required.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java e955da3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 46b04bc 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/counter/SparkCounters.java bb3597a 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java 66fd6b6 
> 
> Diff: https://reviews.apache.org/r/27720/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> chengxiang li
> 
>


Re: Review Request 27720: HIVE-8777 should only register used operator counters.[Spark Branch]

Posted by chengxiang li <ch...@intel.com>.

> On 十一月 7, 2014, 2:36 p.m., Xuefu Zhang wrote:
> > 1. Looking at the diff, I was sure where we are removing unnecessary counter registrations.
> > 2. It would be great if we can have some tests that are enabled with counter statistics collection, so that we know what kind of output we are expecting and avoid future breakage.

Thanks,xuefu.the hive operator level counters is used during spark job execution,so our current tests should have covered this patch change.but for table statistic collection, as you know that we use fs as default counter storage for test. i run qtests locally before,but yes related change would not reflect in automatic test result.


- chengxiang


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27720/#review60316
-----------------------------------------------------------


On 十一月 7, 2014, 5:27 a.m., chengxiang li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27720/
> -----------------------------------------------------------
> 
> (Updated 十一月 7, 2014, 5:27 a.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-8777
>     https://issues.apache.org/jira/browse/HIVE-8777
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Currently we register all hive operator counters in SparkCounters, while actually not all hive operators are used in SparkTask, we should iterate SparkTask's operators, and only register conuters required.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java e955da3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 46b04bc 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/counter/SparkCounters.java bb3597a 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java 66fd6b6 
> 
> Diff: https://reviews.apache.org/r/27720/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> chengxiang li
> 
>


Re: Review Request 27720: HIVE-8777 should only register used operator counters.[Spark Branch]

Posted by Xuefu Zhang <xz...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27720/#review60316
-----------------------------------------------------------


1. Looking at the diff, I was sure where we are removing unnecessary counter registrations.
2. It would be great if we can have some tests that are enabled with counter statistics collection, so that we know what kind of output we are expecting and avoid future breakage.

- Xuefu Zhang


On Nov. 7, 2014, 5:27 a.m., chengxiang li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27720/
> -----------------------------------------------------------
> 
> (Updated Nov. 7, 2014, 5:27 a.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-8777
>     https://issues.apache.org/jira/browse/HIVE-8777
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Currently we register all hive operator counters in SparkCounters, while actually not all hive operators are used in SparkTask, we should iterate SparkTask's operators, and only register conuters required.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java e955da3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 46b04bc 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/counter/SparkCounters.java bb3597a 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java 66fd6b6 
> 
> Diff: https://reviews.apache.org/r/27720/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> chengxiang li
> 
>