You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Jimmy Xiang <jx...@cloudera.com> on 2015/05/15 23:53:11 UTC
Review Request 34293: HIVE-10721 SparkSessionManagerImpl leaks
SparkSessions [Spark Branch]
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34293/
-----------------------------------------------------------
Review request for hive and Xuefu Zhang.
Bugs: HIVE-10721
https://issues.apache.org/jira/browse/HIVE-10721
Repository: hive-git
Description
-------
Add a SparkSession to createdSessions only after the session is opened properly if doOpen is specified.
Diffs
-----
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java 7e33a3f
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java bae30f3
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java 603f1ca
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java ad012b6
spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 1bcd221
Diff: https://reviews.apache.org/r/34293/diff/
Testing
-------
Thanks,
Jimmy Xiang
Re: Review Request 34293: HIVE-10721 SparkSessionManagerImpl leaks
SparkSessions [Spark Branch]
Posted by Jimmy Xiang <jx...@cloudera.com>.
> On May 18, 2015, 2:26 a.m., chengxiang li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java, line 96
> > <https://reviews.apache.org/r/34293/diff/1/?file=961679#file961679line96>
> >
> > SparkClientFactory.initialize would be invoked only once, which means RpcServer would be initialized once inside either, so while we update spark client rpc related paramters, RpcServer does not really updated. This should be another issue, i just list here as it's found while read the code.
Are you saying the RpcServer should be restated too, because some configuration used by RpcServer could be changed? We may need to track those related properties separately. This could complicate the code however. Of course, I agree with you this is indeed an issue.
- Jimmy
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34293/#review84094
-----------------------------------------------------------
On May 15, 2015, 9:53 p.m., Jimmy Xiang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34293/
> -----------------------------------------------------------
>
> (Updated May 15, 2015, 9:53 p.m.)
>
>
> Review request for hive and Xuefu Zhang.
>
>
> Bugs: HIVE-10721
> https://issues.apache.org/jira/browse/HIVE-10721
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Add a SparkSession to createdSessions only after the session is opened properly if doOpen is specified.
>
>
> Diffs
> -----
>
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java 7e33a3f
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java bae30f3
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java 603f1ca
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java ad012b6
> spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 1bcd221
>
> Diff: https://reviews.apache.org/r/34293/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Jimmy Xiang
>
>
Re: Review Request 34293: HIVE-10721 SparkSessionManagerImpl leaks
SparkSessions [Spark Branch]
Posted by chengxiang li <ch...@intel.com>.
> On 五月 18, 2015, 2:26 a.m., chengxiang li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java, line 96
> > <https://reviews.apache.org/r/34293/diff/1/?file=961679#file961679line96>
> >
> > SparkClientFactory.initialize would be invoked only once, which means RpcServer would be initialized once inside either, so while we update spark client rpc related paramters, RpcServer does not really updated. This should be another issue, i just list here as it's found while read the code.
>
> Jimmy Xiang wrote:
> Are you saying the RpcServer should be restated too, because some configuration used by RpcServer could be changed? We may need to track those related properties separately. This could complicate the code however. Of course, I agree with you this is indeed an issue.
Yes, if we want to make these rpc configurations dynamically effectual, RpcServer should be restarted as well.
- chengxiang
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34293/#review84094
-----------------------------------------------------------
On 五月 15, 2015, 9:53 p.m., Jimmy Xiang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34293/
> -----------------------------------------------------------
>
> (Updated 五月 15, 2015, 9:53 p.m.)
>
>
> Review request for hive and Xuefu Zhang.
>
>
> Bugs: HIVE-10721
> https://issues.apache.org/jira/browse/HIVE-10721
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Add a SparkSession to createdSessions only after the session is opened properly if doOpen is specified.
>
>
> Diffs
> -----
>
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java 7e33a3f
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java bae30f3
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java 603f1ca
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java ad012b6
> spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 1bcd221
>
> Diff: https://reviews.apache.org/r/34293/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Jimmy Xiang
>
>
Re: Review Request 34293: HIVE-10721 SparkSessionManagerImpl leaks
SparkSessions [Spark Branch]
Posted by chengxiang li <ch...@intel.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34293/#review84094
-----------------------------------------------------------
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java
<https://reviews.apache.org/r/34293/#comment135202>
SparkClientFactory.initialize would be invoked only once, which means RpcServer would be initialized once inside either, so while we update spark client rpc related paramters, RpcServer does not really updated. This should be another issue, i just list here as it's found while read the code.
- chengxiang li
On 五月 15, 2015, 9:53 p.m., Jimmy Xiang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34293/
> -----------------------------------------------------------
>
> (Updated 五月 15, 2015, 9:53 p.m.)
>
>
> Review request for hive and Xuefu Zhang.
>
>
> Bugs: HIVE-10721
> https://issues.apache.org/jira/browse/HIVE-10721
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Add a SparkSession to createdSessions only after the session is opened properly if doOpen is specified.
>
>
> Diffs
> -----
>
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java 7e33a3f
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java bae30f3
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java 603f1ca
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java ad012b6
> spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 1bcd221
>
> Diff: https://reviews.apache.org/r/34293/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Jimmy Xiang
>
>
Re: Review Request 34293: HIVE-10721 SparkSessionManagerImpl leaks
SparkSessions [Spark Branch]
Posted by chengxiang li <ch...@intel.com>.
> On 五月 18, 2015, 2:37 a.m., chengxiang li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java, line 88
> > <https://reviews.apache.org/r/34293/diff/1/?file=961679#file961679line88>
> >
> > Just curious, it looks to me that AtomaticBoolean works here either, is that possible 2 threads executed this block both?
>
> Jimmy Xiang wrote:
> If several sessions connect to the same HS2, they might execute this block concurrently. One issue with AtomaticBoolean instead of synchonized here is that we have to make sure the SparkClientFactory is properly initialized. Sometimes, we see it throws an exception, in which case, we may need to initialize it again.
Ok, I see, thanks for explaination.
- chengxiang
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34293/#review84096
-----------------------------------------------------------
On 五月 15, 2015, 9:53 p.m., Jimmy Xiang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34293/
> -----------------------------------------------------------
>
> (Updated 五月 15, 2015, 9:53 p.m.)
>
>
> Review request for hive and Xuefu Zhang.
>
>
> Bugs: HIVE-10721
> https://issues.apache.org/jira/browse/HIVE-10721
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Add a SparkSession to createdSessions only after the session is opened properly if doOpen is specified.
>
>
> Diffs
> -----
>
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java 7e33a3f
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java bae30f3
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java 603f1ca
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java ad012b6
> spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 1bcd221
>
> Diff: https://reviews.apache.org/r/34293/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Jimmy Xiang
>
>
Re: Review Request 34293: HIVE-10721 SparkSessionManagerImpl leaks
SparkSessions [Spark Branch]
Posted by Jimmy Xiang <jx...@cloudera.com>.
> On May 18, 2015, 2:37 a.m., chengxiang li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java, line 88
> > <https://reviews.apache.org/r/34293/diff/1/?file=961679#file961679line88>
> >
> > Just curious, it looks to me that AtomaticBoolean works here either, is that possible 2 threads executed this block both?
If several sessions connect to the same HS2, they might execute this block concurrently. One issue with AtomaticBoolean instead of synchonized here is that we have to make sure the SparkClientFactory is properly initialized. Sometimes, we see it throws an exception, in which case, we may need to initialize it again.
- Jimmy
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34293/#review84096
-----------------------------------------------------------
On May 15, 2015, 9:53 p.m., Jimmy Xiang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34293/
> -----------------------------------------------------------
>
> (Updated May 15, 2015, 9:53 p.m.)
>
>
> Review request for hive and Xuefu Zhang.
>
>
> Bugs: HIVE-10721
> https://issues.apache.org/jira/browse/HIVE-10721
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Add a SparkSession to createdSessions only after the session is opened properly if doOpen is specified.
>
>
> Diffs
> -----
>
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java 7e33a3f
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java bae30f3
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java 603f1ca
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java ad012b6
> spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 1bcd221
>
> Diff: https://reviews.apache.org/r/34293/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Jimmy Xiang
>
>
Re: Review Request 34293: HIVE-10721 SparkSessionManagerImpl leaks
SparkSessions [Spark Branch]
Posted by chengxiang li <ch...@intel.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34293/#review84096
-----------------------------------------------------------
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java
<https://reviews.apache.org/r/34293/#comment135204>
Just curious, it looks to me that AtomaticBoolean works here either, is that possible 2 threads executed this block both?
- chengxiang li
On 五月 15, 2015, 9:53 p.m., Jimmy Xiang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34293/
> -----------------------------------------------------------
>
> (Updated 五月 15, 2015, 9:53 p.m.)
>
>
> Review request for hive and Xuefu Zhang.
>
>
> Bugs: HIVE-10721
> https://issues.apache.org/jira/browse/HIVE-10721
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Add a SparkSession to createdSessions only after the session is opened properly if doOpen is specified.
>
>
> Diffs
> -----
>
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java 7e33a3f
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java bae30f3
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java 603f1ca
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java ad012b6
> spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 1bcd221
>
> Diff: https://reviews.apache.org/r/34293/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Jimmy Xiang
>
>
Re: Review Request 34293: HIVE-10721 SparkSessionManagerImpl leaks
SparkSessions [Spark Branch]
Posted by Jimmy Xiang <jx...@cloudera.com>.
> On May 15, 2015, 10:19 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java, line 169
> > <https://reviews.apache.org/r/34293/diff/1/?file=961679#file961679line169>
> >
> > Any concurrency issue for this, or it doesn't matter since it's in shutdown()?
This is fine. Two reasons: 1. inited is volatile, 2, it is in shutdown(). Probably we can remove this line since it is in shutdown().
- Jimmy
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34293/#review83990
-----------------------------------------------------------
On May 15, 2015, 9:53 p.m., Jimmy Xiang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34293/
> -----------------------------------------------------------
>
> (Updated May 15, 2015, 9:53 p.m.)
>
>
> Review request for hive and Xuefu Zhang.
>
>
> Bugs: HIVE-10721
> https://issues.apache.org/jira/browse/HIVE-10721
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Add a SparkSession to createdSessions only after the session is opened properly if doOpen is specified.
>
>
> Diffs
> -----
>
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java 7e33a3f
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java bae30f3
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java 603f1ca
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java ad012b6
> spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 1bcd221
>
> Diff: https://reviews.apache.org/r/34293/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Jimmy Xiang
>
>
Re: Review Request 34293: HIVE-10721 SparkSessionManagerImpl leaks
SparkSessions [Spark Branch]
Posted by Xuefu Zhang <xz...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34293/#review83990
-----------------------------------------------------------
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java
<https://reviews.apache.org/r/34293/#comment135091>
Any concurrency issue for this, or it doesn't matter since it's in shutdown()?
- Xuefu Zhang
On May 15, 2015, 9:53 p.m., Jimmy Xiang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34293/
> -----------------------------------------------------------
>
> (Updated May 15, 2015, 9:53 p.m.)
>
>
> Review request for hive and Xuefu Zhang.
>
>
> Bugs: HIVE-10721
> https://issues.apache.org/jira/browse/HIVE-10721
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Add a SparkSession to createdSessions only after the session is opened properly if doOpen is specified.
>
>
> Diffs
> -----
>
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java 7e33a3f
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java bae30f3
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java 603f1ca
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionManagerImpl.java ad012b6
> spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 1bcd221
>
> Diff: https://reviews.apache.org/r/34293/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Jimmy Xiang
>
>