Posted to user@spark.apache.org by Amit Singh Hora <ho...@gmail.com> on 2015/10/20 09:32:20 UTC

Spark opening too many connections with zookeeper

Hi All,

My Spark job started reporting ZooKeeper errors. After looking at the zkdump
from the HBase master, I realized that a large number of connections are being
made from the nodes where the Spark workers run. I believe the connections are
somehow not getting closed, and that is what is leading to the errors.

Please find the code below:

import com.typesafe.config.ConfigFactory
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapred.TableOutputFormat
import org.apache.hadoop.mapred.JobConf

val conf = ConfigFactory.load("connection.conf").getConfig("connection")

// HBase / ZooKeeper client configuration
val hconf = HBaseConfiguration.create()
hconf.set("zookeeper.session.timeout", conf.getString("hbase.zookeepertimeout"))
hconf.set("hbase.client.retries.number", Integer.toString(1))
hconf.set("zookeeper.recovery.retry", Integer.toString(1))
hconf.set("hbase.master", conf.getString("hbase.hbase_master"))
hconf.set("hbase.zookeeper.quorum", conf.getString("hbase.hbase_zkquorum")) // zkquorum consists of 5 nodes
hconf.set("zookeeper.znode.parent", "/hbase-unsecure")
hconf.set("hbase.zookeeper.property.clientPort", conf.getString("hbase.hbase_zk_port"))

// Job configuration for TableOutputFormat (old mapred API, used by saveAsHadoopDataset)
val jobConfig: JobConf = new JobConf(hconf, this.getClass)
jobConfig.set("mapreduce.output.fileoutputformat.outputdir", "/user/user01/out")
jobConfig.setOutputFormat(classOf[TableOutputFormat])
jobConfig.set(TableOutputFormat.OUTPUT_TABLE, conf.getString("hbase.tablename"))

rdd.map(convertToPut).saveAsHadoopDataset(jobConfig)

The method convertToPut does nothing but convert the incoming JSON records into
HBase Put objects.
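
For reference, it is roughly shaped like this (a simplified sketch; the row key,
column family "d" and qualifier "payload" below are placeholders, not my real schema):

import com.fasterxml.jackson.databind.ObjectMapper
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.util.Bytes

object JsonToPut extends Serializable {
  private val mapper = new ObjectMapper()

  // TableOutputFormat expects (key, Put) pairs; the key is ignored on write
  def convertToPut(json: String): (ImmutableBytesWritable, Put) = {
    val node = mapper.readTree(json)
    // row key, column family and qualifier are placeholders
    val put = new Put(Bytes.toBytes(node.get("id").asText()))
    put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes(json))
    (new ImmutableBytesWritable(put.getRow), put)
  }
}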

After I killed the application/driver, the number of connections decreased
drastically.

Kindly help me understand and resolve this issue.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-opening-to-many-connection-with-zookeeper-tp25137.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


RE: Spark opening too many connections with zookeeper

Posted by Amit Hora <ho...@gmail.com>.
Please do share if you come across any hints.


RE: Spark opening too many connections with zookeeper

Posted by Amit Hora <ho...@gmail.com>.
Hi All,

I am using HBase 1.1.1, and I came across a post describing an hbase-spark module included in HBase core.

I am trying to use HBaseContext but can't find the appropriate library; when I try to add the following to my pom I get a missing-artifact error:

<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase</artifactId>
  <version>1.1.1</version>
</dependency>
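
What I was hoping to write with it is roughly the following, going by the
hbase-spark examples I came across (sc is the SparkContext; the table name,
row key and column layout are placeholders):

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.hadoop.hbase.util.Bytes

val hbaseContext = new HBaseContext(sc, HBaseConfiguration.create())

// bulkPut handles the HBase connection lifecycle for each partition
hbaseContext.bulkPut[String](
  rdd,
  TableName.valueOf("my_table"),   // placeholder table name
  (json: String) => {
    val put = new Put(Bytes.toBytes(json.hashCode.toString))
    put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes(json))
    put
  })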


Re: Spark opening too many connections with zookeeper

Posted by Ted Yu <yu...@gmail.com>.
I need to dig deeper into saveAsHadoopDataset to see what might have caused
the effect you observed.

Cheers


RE: Spark opening too many connections with zookeeper

Posted by Amit Hora <ho...@gmail.com>.
Hi Ted,

I made a mistake last time. Yes, the connections are very much under control when I use put directly: I iterate over the RDD and, within each partition, open a connection and execute a single put with the list of Puts for HBase.

But why were so many connections being opened when I used the HBase configuration with the saveAsHadoopDataset method?
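
Roughly, the working version looks like this (a simplified sketch; the table
name, row key and column layout are placeholders):

import scala.collection.JavaConverters._
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

rdd.foreachPartition { partition =>
  // one connection (and therefore one ZooKeeper session) per partition
  val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
  val table = connection.getTable(TableName.valueOf("my_table"))   // placeholder table name
  try {
    val puts = partition.map { json =>
      val put = new Put(Bytes.toBytes(json.hashCode.toString))
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes(json))
      put
    }.toList
    table.put(puts.asJava)   // one batched put per partition
  } finally {
    table.close()
    connection.close()       // releases the ZooKeeper session
  }
}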


RE: Spark opening too many connections with zookeeper

Posted by Amit Hora <ho...@gmail.com>.
I used that approach as well, but the number of connections kept increasing: it started from 10 and went up to 299.
Then I changed my ZooKeeper configuration to limit the maximum client connections to just 30 and restarted the job.
Now the connections have stayed between 18 and 24 for the last 2 hours.

I am unable to understand this behaviour.
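
For reference, this is the setting I changed in zoo.cfg on the ZooKeeper nodes
(the limit applies per client IP):

# zoo.cfg: cap concurrent connections from a single client IP
maxClientCnxns=30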


Re: Spark opening too many connections with zookeeper

Posted by Ted Yu <yu...@gmail.com>.
Can you take a look at example 37 on page 225 of:
http://hbase.apache.org/apache_hbase_reference_guide.pdf

You can use the following method of Table:

  void put(List<Put> puts) throws IOException;

After the put() returns, the connection is closed.
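
In outline, the pattern looks roughly like this (an untested sketch; the table
name, row key and column values are placeholders):

import scala.collection.JavaConverters._
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

// placeholder batch of Puts built from your records
val puts = Seq(new Put(Bytes.toBytes("row1"))
  .addColumn(Bytes.toBytes("d"), Bytes.toBytes("q"), Bytes.toBytes("value")))

val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
val table = connection.getTable(TableName.valueOf("your_table"))   // placeholder table name
try {
  table.put(puts.asJava)   // one round trip for the whole batch
} finally {
  table.close()            // close the table ...
  connection.close()       // ... and the connection, which tears down the ZooKeeper session
}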

Cheers


RE: Spark opening too many connections with zookeeper

Posted by Amit Hora <ho...@gmail.com>.
One region 


Re: Spark opening too many connections with zookeeper

Posted by Ted Yu <yu...@gmail.com>.
How many regions does your table have?

Which HBase release do you use?

Cheers
