Posted to reviews@spark.apache.org by zuotingbing <gi...@git.apache.org> on 2017/12/15 10:24:28 UTC

[GitHub] spark pull request #19989: [SPARK-22793][SQL]Memory leak in Spark Thrift Ser...

GitHub user zuotingbing opened a pull request:

    https://github.com/apache/spark/pull/19989

    [SPARK-22793][SQL]Memory leak in Spark Thrift Server

    ## What changes were proposed in this pull request?
    
    1. Start HiveThriftServer2.
    2. Connect to the Thrift server through beeline.
    3. Close the beeline connection.
    4. Repeat steps 2 and 3 several times; memory leaks.
    
    We found that many directories under `hive.exec.local.scratchdir` and `hive.exec.scratchdir` are never dropped. Since each scratch directory is registered with `deleteOnExit` when it is created, the FileSystem `deleteOnExit` cache keeps growing until the JVM terminates.
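The following is a minimal Java sketch of why the cache grows, assuming a simplified registry that behaves like FileSystem `deleteOnExit`. It is not Hadoop's actual class: `DeleteOnExitRegistry`, `openAndCloseSession`, and the scratch path are hypothetical names. Registered paths stay in memory until the JVM exits unless the registration is cancelled when the session closes.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model of a per-FileSystem deleteOnExit registry (NOT Hadoop's
// actual FileSystem class): every registered path is held in memory until
// the JVM exits, unless the registration is explicitly cancelled.
public class DeleteOnExitLeakSketch {

    static final class DeleteOnExitRegistry {
        private final Set<String> paths = new LinkedHashSet<>();

        // Models the idea of FileSystem.deleteOnExit(Path): remember the path.
        void deleteOnExit(String path) { paths.add(path); }

        // Models the idea of FileSystem.cancelDeleteOnExit(Path): forget it.
        boolean cancelDeleteOnExit(String path) { return paths.remove(path); }

        int size() { return paths.size(); }
    }

    // Simulates one beeline connect/close cycle that creates a scratch dir.
    // If cleanUpOnClose is false, the registration is never cancelled.
    static void openAndCloseSession(DeleteOnExitRegistry registry, int sessionId,
                                    boolean cleanUpOnClose) {
        String scratchDir = "/tmp/hive/scratch/session-" + sessionId; // hypothetical path
        registry.deleteOnExit(scratchDir);
        if (cleanUpOnClose) {
            registry.cancelDeleteOnExit(scratchDir);
        }
    }

    public static void main(String[] args) {
        DeleteOnExitRegistry leaky = new DeleteOnExitRegistry();
        DeleteOnExitRegistry clean = new DeleteOnExitRegistry();
        for (int i = 0; i < 1000; i++) {
            openAndCloseSession(leaky, i, false);
            openAndCloseSession(clean, i, true);
        }
        System.out.println("leaky registry size: " + leaky.size()); // 1000
        System.out.println("clean registry size: " + clean.size()); // 0
    }
}
```

Hadoop's `FileSystem` does expose `cancelDeleteOnExit(Path)`, but whether cancelling registrations is the right fix for the Thrift server is not settled in this thread; the sketch only shows how the cache grows with each unclosed session.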
    
    In addition, we used `jmap -histo:live [PID]` to print a histogram of live objects in the HiveThriftServer2 process. The instance counts of `org.apache.spark.sql.hive.client.HiveClientImpl` and `org.apache.hadoop.hive.ql.session.SessionState` keep increasing even after all beeline connections are closed, which causes the memory leak.
    
    ## How was this patch tested?
    
    manual tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zuotingbing/spark branch-2.0

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19989.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19989
    

---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19989: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

Posted by zuotingbing <gi...@git.apache.org>.
Github user zuotingbing commented on the issue:

    https://github.com/apache/spark/pull/19989
  
    As the screenshot shows, the cache size of FileSystem `deleteOnExit` keeps increasing.
    ![mshot](https://user-images.githubusercontent.com/24823338/34095036-d1fcc408-e40a-11e7-9599-2acdd96da2d9.png)


---



[GitHub] spark issue #19989: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

Posted by zuotingbing <gi...@git.apache.org>.
Github user zuotingbing commented on the issue:

    https://github.com/apache/spark/pull/19989
  
    OK, please move to https://github.com/apache/spark/pull/20029. Thanks, all.


---



[GitHub] spark issue #19989: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/19989
  
    Actually, I am not sure about this change. With it, all users would share the same `metadataHive`, which might also cause concurrency issues. Did you experience an OOM error due to the memory leak?


---



[GitHub] spark pull request #19989: [SPARK-22793][SQL][BACKPORT-2.0]Memory leak in Sp...

Posted by zuotingbing <gi...@git.apache.org>.
Github user zuotingbing closed the pull request at:

    https://github.com/apache/spark/pull/19989


---



[GitHub] spark issue #19989: [SPARK-22793][SQL][BACKPORT-2.0]Memory leak in Spark Thr...

Posted by zuotingbing <gi...@git.apache.org>.
Github user zuotingbing commented on the issue:

    https://github.com/apache/spark/pull/19989
  
    OK, got it. Thanks!


---



[GitHub] spark issue #19989: [SPARK-22793][SQL][BACKPORT-2.0]Memory leak in Spark Thr...

Posted by zuotingbing <gi...@git.apache.org>.
Github user zuotingbing commented on the issue:

    https://github.com/apache/spark/pull/19989
  
    Could you please check this PR or suggest how to fix it? It seems to be a critical bug. Thanks! @cloud-fan @rxin


---



[GitHub] spark issue #19989: [SPARK-22793][SQL][BACKPORT-2.0]Memory leak in Spark Thr...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19989
  
    This is not a backport, as this patch has not been merged to master yet. Let's move the discussion to the primary PR against the master branch.


---



[GitHub] spark issue #19989: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19989
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #19989: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

Posted by zuotingbing <gi...@git.apache.org>.
Github user zuotingbing commented on the issue:

    https://github.com/apache/spark/pull/19989
  
    While debugging, I found that every time I connect to the Thrift server through beeline, `SessionState.start(state)` is called twice: once in `HiveSessionImpl:open`, and **once more in `HiveClientImpl` for the SQL `use default`**.
    SessionManager.java#L151 and `HiveSessionImpl:close` only clean up the first one.
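To illustrate the double-start problem described above, here is a minimal Java sketch, assuming each connection starts two session states and close only releases the first. `FakeSessionState`, `start`, `connectAndClose`, and `liveCount` are hypothetical stand-ins, not Hive's `SessionState` API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the behavior described above (NOT Hive's real
// SessionState API): each connection starts two session states, but only
// the one created by HiveSessionImpl.open is closed.
public class DoubleStartLeakSketch {

    // Stand-in for a started SessionState; 'live' is false once closed.
    static final class FakeSessionState {
        final String owner;
        boolean live = true;
        FakeSessionState(String owner) { this.owner = owner; }
        void close() { live = false; }
    }

    // Every state ever started, so we can count how many are still live.
    static final List<FakeSessionState> started = new ArrayList<>();

    static FakeSessionState start(String owner) {
        FakeSessionState s = new FakeSessionState(owner);
        started.add(s);
        return s;
    }

    // One beeline connection: open (two starts), then close.
    static void connectAndClose(boolean closeClientState) {
        FakeSessionState sessionState = start("HiveSessionImpl.open");
        FakeSessionState clientState = start("HiveClientImpl (use default)");
        sessionState.close();        // the only state cleaned up in the leaky case
        if (closeClientState) {
            clientState.close();     // the second state must be closed as well
        }
    }

    static long liveCount() {
        return started.stream().filter(s -> s.live).count();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) connectAndClose(false);
        System.out.println("live states, only first closed: " + liveCount()); // 10
        started.clear();
        for (int i = 0; i < 10; i++) connectAndClose(true);
        System.out.println("live states, both closed: " + liveCount()); // 0
    }
}
```

In this model one state survives per connection, matching the steadily growing `SessionState` count reported from `jmap`; it does not show which component should own the second state's cleanup.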


---



[GitHub] spark issue #19989: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19989
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #19989: [SPARK-22793][SQL][BACKPORT-2.0]Memory leak in Spark Thr...

Posted by zuotingbing <gi...@git.apache.org>.
Github user zuotingbing commented on the issue:

    https://github.com/apache/spark/pull/19989
  
    @gatorsmile @liufengdb Could you please also check this PR? It is the [BACKPORT-2.0] of the master/2.3 fix in [#20029](https://github.com/apache/spark/pull/20029).


---



[GitHub] spark issue #19989: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19989
  
    cc @liufengdb 


---



[GitHub] spark issue #19989: [SPARK-22793][SQL][BACKPORT-2.0]Memory leak in Spark Thr...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19989
  
    @zuotingbing No new 2.0 release is planned. Thus, we do not backport it to 2.0.


---



[GitHub] spark issue #19989: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

Posted by liufengdb <gi...@git.apache.org>.
Github user liufengdb commented on the issue:

    https://github.com/apache/spark/pull/19989
  
    I think this method takes care of resource cleanup automatically: https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/SessionManager.java#L151
    
    Could you take a heap dump and find out why the sessions are not cleaned up?


---
