You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Jimmy Xiang <jx...@cloudera.com> on 2015/05/15 23:53:11 UTC

Review Request 34293: HIVE-10721 SparkSessionManagerImpl leaks SparkSessions [Spark Branch]

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34293/
-----------------------------------------------------------

Review request for hive and Xuefu Zhang.


Bugs: HIVE-10721
    https://issues.apache.org/jira/browse/HIVE-10721


Repository: hive-git


Description
-------

Add a SparkSession to createdSessions only after the session is opened properly if doOpen is specified.


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java 7e33a3f 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java bae30f3 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java 603f1ca 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java ad012b6 
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 1bcd221 

Diff: https://reviews.apache.org/r/34293/diff/


Testing
-------


Thanks,

Jimmy Xiang


Re: Review Request 34293: HIVE-10721 SparkSessionManagerImpl leaks SparkSessions [Spark Branch]

Posted by Jimmy Xiang <jx...@cloudera.com>.

> On May 18, 2015, 2:26 a.m., chengxiang li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java, line 96
> > <https://reviews.apache.org/r/34293/diff/1/?file=961679#file961679line96>
> >
> >     SparkClientFactory.initialize would be invoked only once, which means RpcServer would be initialized once inside either, so while we update spark client rpc related paramters, RpcServer does not really updated. This should be another issue, i just list here as it's found while read the code.

Are you saying the RpcServer should be restated too, because some configuration used by RpcServer could be changed? We may need to track those related properties separately. This could complicate the code however. Of course, I agree with you this is indeed an issue.


- Jimmy


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34293/#review84094
-----------------------------------------------------------


On May 15, 2015, 9:53 p.m., Jimmy Xiang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34293/
> -----------------------------------------------------------
> 
> (Updated May 15, 2015, 9:53 p.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-10721
>     https://issues.apache.org/jira/browse/HIVE-10721
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Add a SparkSession to createdSessions only after the session is opened properly if doOpen is specified.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java 7e33a3f 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java bae30f3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java 603f1ca 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java ad012b6 
>   spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 1bcd221 
> 
> Diff: https://reviews.apache.org/r/34293/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jimmy Xiang
> 
>


Re: Review Request 34293: HIVE-10721 SparkSessionManagerImpl leaks SparkSessions [Spark Branch]

Posted by chengxiang li <ch...@intel.com>.

> On 五月 18, 2015, 2:26 a.m., chengxiang li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java, line 96
> > <https://reviews.apache.org/r/34293/diff/1/?file=961679#file961679line96>
> >
> >     SparkClientFactory.initialize would be invoked only once, which means RpcServer would be initialized once inside either, so while we update spark client rpc related paramters, RpcServer does not really updated. This should be another issue, i just list here as it's found while read the code.
> 
> Jimmy Xiang wrote:
>     Are you saying the RpcServer should be restated too, because some configuration used by RpcServer could be changed? We may need to track those related properties separately. This could complicate the code however. Of course, I agree with you this is indeed an issue.

Yes, if we want to make these rpc configurations dynamically effectual, RpcServer should be restarted as well.


- chengxiang


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34293/#review84094
-----------------------------------------------------------


On 五月 15, 2015, 9:53 p.m., Jimmy Xiang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34293/
> -----------------------------------------------------------
> 
> (Updated 五月 15, 2015, 9:53 p.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-10721
>     https://issues.apache.org/jira/browse/HIVE-10721
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Add a SparkSession to createdSessions only after the session is opened properly if doOpen is specified.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java 7e33a3f 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java bae30f3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java 603f1ca 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java ad012b6 
>   spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 1bcd221 
> 
> Diff: https://reviews.apache.org/r/34293/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jimmy Xiang
> 
>


Re: Review Request 34293: HIVE-10721 SparkSessionManagerImpl leaks SparkSessions [Spark Branch]

Posted by chengxiang li <ch...@intel.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34293/#review84094
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java
<https://reviews.apache.org/r/34293/#comment135202>

    SparkClientFactory.initialize would be invoked only once, which means RpcServer would be initialized once inside either, so while we update spark client rpc related paramters, RpcServer does not really updated. This should be another issue, i just list here as it's found while read the code.


- chengxiang li


On 五月 15, 2015, 9:53 p.m., Jimmy Xiang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34293/
> -----------------------------------------------------------
> 
> (Updated 五月 15, 2015, 9:53 p.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-10721
>     https://issues.apache.org/jira/browse/HIVE-10721
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Add a SparkSession to createdSessions only after the session is opened properly if doOpen is specified.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java 7e33a3f 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java bae30f3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java 603f1ca 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java ad012b6 
>   spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 1bcd221 
> 
> Diff: https://reviews.apache.org/r/34293/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jimmy Xiang
> 
>


Re: Review Request 34293: HIVE-10721 SparkSessionManagerImpl leaks SparkSessions [Spark Branch]

Posted by chengxiang li <ch...@intel.com>.

> On 五月 18, 2015, 2:37 a.m., chengxiang li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java, line 88
> > <https://reviews.apache.org/r/34293/diff/1/?file=961679#file961679line88>
> >
> >     Just curious, it looks to me that AtomaticBoolean works here either, is that possible 2 threads executed this block both?
> 
> Jimmy Xiang wrote:
>     If several sessions connect to the same HS2, they might execute this block concurrently. One issue with AtomaticBoolean instead of synchonized here is that we have to make sure the SparkClientFactory is properly initialized. Sometimes, we see it throws an exception, in which case, we may need to initialize it again.

Ok, I see, thanks for explaination.


- chengxiang


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34293/#review84096
-----------------------------------------------------------


On 五月 15, 2015, 9:53 p.m., Jimmy Xiang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34293/
> -----------------------------------------------------------
> 
> (Updated 五月 15, 2015, 9:53 p.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-10721
>     https://issues.apache.org/jira/browse/HIVE-10721
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Add a SparkSession to createdSessions only after the session is opened properly if doOpen is specified.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java 7e33a3f 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java bae30f3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java 603f1ca 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java ad012b6 
>   spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 1bcd221 
> 
> Diff: https://reviews.apache.org/r/34293/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jimmy Xiang
> 
>


Re: Review Request 34293: HIVE-10721 SparkSessionManagerImpl leaks SparkSessions [Spark Branch]

Posted by Jimmy Xiang <jx...@cloudera.com>.

> On May 18, 2015, 2:37 a.m., chengxiang li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java, line 88
> > <https://reviews.apache.org/r/34293/diff/1/?file=961679#file961679line88>
> >
> >     Just curious, it looks to me that AtomaticBoolean works here either, is that possible 2 threads executed this block both?

If several sessions connect to the same HS2, they might execute this block concurrently. One issue with AtomaticBoolean instead of synchonized here is that we have to make sure the SparkClientFactory is properly initialized. Sometimes, we see it throws an exception, in which case, we may need to initialize it again.


- Jimmy


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34293/#review84096
-----------------------------------------------------------


On May 15, 2015, 9:53 p.m., Jimmy Xiang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34293/
> -----------------------------------------------------------
> 
> (Updated May 15, 2015, 9:53 p.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-10721
>     https://issues.apache.org/jira/browse/HIVE-10721
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Add a SparkSession to createdSessions only after the session is opened properly if doOpen is specified.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java 7e33a3f 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java bae30f3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java 603f1ca 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java ad012b6 
>   spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 1bcd221 
> 
> Diff: https://reviews.apache.org/r/34293/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jimmy Xiang
> 
>


Re: Review Request 34293: HIVE-10721 SparkSessionManagerImpl leaks SparkSessions [Spark Branch]

Posted by chengxiang li <ch...@intel.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34293/#review84096
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java
<https://reviews.apache.org/r/34293/#comment135204>

    Just curious, it looks to me that AtomaticBoolean works here either, is that possible 2 threads executed this block both?


- chengxiang li


On 五月 15, 2015, 9:53 p.m., Jimmy Xiang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34293/
> -----------------------------------------------------------
> 
> (Updated 五月 15, 2015, 9:53 p.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-10721
>     https://issues.apache.org/jira/browse/HIVE-10721
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Add a SparkSession to createdSessions only after the session is opened properly if doOpen is specified.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java 7e33a3f 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java bae30f3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java 603f1ca 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java ad012b6 
>   spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 1bcd221 
> 
> Diff: https://reviews.apache.org/r/34293/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jimmy Xiang
> 
>


Re: Review Request 34293: HIVE-10721 SparkSessionManagerImpl leaks SparkSessions [Spark Branch]

Posted by Jimmy Xiang <jx...@cloudera.com>.

> On May 15, 2015, 10:19 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java, line 169
> > <https://reviews.apache.org/r/34293/diff/1/?file=961679#file961679line169>
> >
> >     Any concurrency issue for this, or it doesn't matter since it's in shutdown()?

This is fine. Two reasons: 1. inited is volatile, 2, it is in shutdown(). Probably we can remove this line since it is in shutdown().


- Jimmy


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34293/#review83990
-----------------------------------------------------------


On May 15, 2015, 9:53 p.m., Jimmy Xiang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34293/
> -----------------------------------------------------------
> 
> (Updated May 15, 2015, 9:53 p.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-10721
>     https://issues.apache.org/jira/browse/HIVE-10721
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Add a SparkSession to createdSessions only after the session is opened properly if doOpen is specified.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java 7e33a3f 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java bae30f3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java 603f1ca 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java ad012b6 
>   spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 1bcd221 
> 
> Diff: https://reviews.apache.org/r/34293/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jimmy Xiang
> 
>


Re: Review Request 34293: HIVE-10721 SparkSessionManagerImpl leaks SparkSessions [Spark Branch]

Posted by Xuefu Zhang <xz...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34293/#review83990
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java
<https://reviews.apache.org/r/34293/#comment135091>

    Any concurrency issue for this, or it doesn't matter since it's in shutdown()?


- Xuefu Zhang


On May 15, 2015, 9:53 p.m., Jimmy Xiang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34293/
> -----------------------------------------------------------
> 
> (Updated May 15, 2015, 9:53 p.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-10721
>     https://issues.apache.org/jira/browse/HIVE-10721
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Add a SparkSession to createdSessions only after the session is opened properly if doOpen is specified.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java 7e33a3f 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java bae30f3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java 603f1ca 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java ad012b6 
>   spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 1bcd221 
> 
> Diff: https://reviews.apache.org/r/34293/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jimmy Xiang
> 
>