Posted to dev@phoenix.apache.org by "Karan Mehta (JIRA)" <ji...@apache.org> on 2018/01/08 04:10:00 UTC

[jira] [Commented] (PHOENIX-4503) Phoenix-Spark plugin doesn't release zookeeper connections

    [ https://issues.apache.org/jira/browse/PHOENIX-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16315649#comment-16315649 ] 

Karan Mehta commented on PHOENIX-4503:
--------------------------------------

The cause might be PHOENIX-4489. Can you try an experiment: pause your code midway, trigger a GC, and check whether the number of connections decreases?

Each call to the {{read()}} method essentially creates an {{HConnection}}, which contains a {{ZKConnection}}. It should be garbage collected, since it goes out of scope as soon as the {{PhoenixInputFormat#generateSplits()}} method completes.
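
A minimal sketch of that experiment, assuming a Linux host with {{netstat}} on the PATH. The class and method names ({{ZkConnectionCheck}}, {{countEstablished}}, {{reportAroundGc}}) are made up for illustration and are not part of Phoenix. The idea is to call {{reportAroundGc(2181)}} from the driver right after the read loop. {{System.gc()}} is only a request to the JVM, so an unchanged count is a hint rather than proof.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Phoenix or Spark.
class ZkConnectionCheck {

    /** Counts ESTABLISHED TCP rows of `netstat -an` output that touch the given port. */
    static long countEstablished(List<String> netstatLines, int port) {
        String suffix = ":" + port;
        long count = 0;
        for (String line : netstatLines) {
            String[] cols = line.trim().split("\\s+");
            // Assumed Linux layout: Proto Recv-Q Send-Q Local Foreign State ...
            if (cols.length < 6 || !cols[0].startsWith("tcp")) continue;
            if (!"ESTABLISHED".equals(cols[5])) continue;
            // Match either end, like `grep 2181 | grep EST` in the report.
            if (cols[3].endsWith(suffix) || cols[4].endsWith(suffix)) count++;
        }
        return count;
    }

    /** Runs `netstat -an` and returns its output lines. */
    static List<String> netstat() throws IOException, InterruptedException {
        Process p = new ProcessBuilder("netstat", "-an").redirectErrorStream(true).start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) lines.add(line);
        }
        p.waitFor();
        return lines;
    }

    /** Counts connections, requests a GC, waits briefly, and counts again. */
    static void reportAroundGc(int port) throws IOException, InterruptedException {
        long before = countEstablished(netstat(), port);
        System.gc();        // a request only; the JVM is free to ignore it
        Thread.sleep(5000); // give any closed sockets time to leave ESTABLISHED
        long after = countEstablished(netstat(), port);
        System.out.println("ESTABLISHED on :" + port
                + " before GC=" + before + ", after GC=" + after);
    }
}
```

If the count after the GC request drops back toward the baseline, that would be consistent with the connections being held only by unreachable objects.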

[~snalapure@dataken.net]

> Phoenix-Spark plugin doesn't release zookeeper connections
> ----------------------------------------------------------
>
>                 Key: PHOENIX-4503
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4503
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.11.0
>         Environment: HBase 1.2 on Linux (Ubuntu, CentOS)
>            Reporter: Suhas Nalapure
>
> *1. Phoenix-Spark plugin doesn't release zookeeper connections*
> Example: 
> 		
> {code:java}
> for (int i = 0; i < 50; i++) {
>     Dataset<Row> df = sqlContext.read().format("org.apache.phoenix.spark")
>             .option("table", "\"Sales\"").option("zkUrl", "localhost:2181")
>             .load();
>     df.show(2);
> }
> Thread.sleep(1000 * 60);
> {code}
>    
>  When the above snippet is executed, the number of connections to port 2181 keeps increasing, and they are not released until the main thread wakes up from sleep and the program ends, as can be seen below (14 is the number of connections before the program starts to run):
> netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 14
> 16:52:05
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 22
> 16:52:15
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 38
> 16:52:18
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 68
> 16:52:23
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 100
> 16:52:27
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 116
> 16:52:32
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 116
> 16:52:38
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 116
> 16:52:52
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 116
> 16:53:00
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 116
> 16:53:24
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 14
> 16:53:32
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 14
> 16:53:34
> root@user1 ~ $
> *2. If the "jdbc" format is used instead to create the Spark DataFrame, the connection count doesn't shoot up*
> Example:
> 		
> {code:java}
> for (int i = 0; i < 50; i++) {
>     Dataset<Row> df = sqlContext.read().format("jdbc")
>             .option("url", "jdbc:phoenix:localhost:2181")
>             .option("dbtable", "\"Sales\"")
>             .option("driver", "org.apache.phoenix.jdbc.PhoenixDriver")
>             .load();
>     df.show(2);
> }
> Thread.sleep(1000 * 60);
> {code}
> 		
> Connection counts during program execution (14 being the count before execution starts):
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 14
> 17:00:42
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 14
> 17:00:43
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:00:46
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:00:50
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:00:55
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:01:12
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:01:18
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:01:28
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:01:34
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:01:37
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:01:39
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 14
> 17:02:07


