Posted to user@spark.apache.org by Antony Mayi <an...@yahoo.com.INVALID> on 2014/12/24 13:49:00 UTC

saveAsNewAPIHadoopDataset against hbase hanging in pyspark 1.2.0

Hi,
I have been using this without any issues on Spark 1.1.0, but after upgrading to 1.2.0, saving an RDD from pyspark into HBase using saveAsNewAPIHadoopDataset just hangs - even when testing with the example from the stock hbase_outputformat.py.
Is anyone seeing the same issue? (And were you able to solve it?)
I am using HBase 0.98.6 and yarn-client mode.
thanks,
Antony.

Re: saveAsNewAPIHadoopDataset against hbase hanging in pyspark 1.2.0

Posted by Antony Mayi <an...@yahoo.com.INVALID>.
Also, HBase itself works OK:

hbase(main):006:0> scan 'test'
ROW        COLUMN+CELL
 key1      column=f1:asd, timestamp=1419463092904, value=456
1 row(s) in 0.0250 seconds

hbase(main):007:0> put 'test', 'testkey', 'f1:testqual', 'testval'
0 row(s) in 0.0170 seconds

hbase(main):008:0> scan 'test'
ROW        COLUMN+CELL
 key1      column=f1:asd, timestamp=1419463092904, value=456
 testkey   column=f1:testqual, timestamp=1419487275905, value=testval
2 row(s) in 0.0270 seconds
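
For completeness, the same sanity check can be scripted from Python; here is a hypothetical happybase sketch, assuming an HBase Thrift server is running on the default port 9090 (happybase speaks Thrift, so this exercises HBase itself, not the Spark write path that is hanging):

import happybase

# Connect through the HBase Thrift gateway (assumed at localhost:9090).
conn = happybase.Connection("localhost")
table = conn.table("test")
# Same put as in the shell session above.
table.put(b"testkey", {b"f1:testqual": b"testval"})
# Scan the table back to confirm both rows are visible.
for key, data in table.scan():
    print(key, data)
conn.close()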
 


Re: saveAsNewAPIHadoopDataset against hbase hanging in pyspark 1.2.0

Posted by Antony Mayi <an...@yahoo.com.INVALID>.
I am running it in yarn-client mode, and I believe hbase-client is part of the spark-examples-1.2.0-cdh5.3.0-hadoop2.5.0-cdh5.3.0.jar, which I am submitting at launch.
Adding another jstack taken during the hang - http://pastebin.com/QDQrBw70 - this one is of the CoarseGrainedExecutorBackend process, and it does reference hbase and zookeeper.
thanks,
Antony.
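
For reference, a jstack like the one linked above can be captured with a small sketch along these lines (jps and jstack are standard JDK tools; this wrapper itself is just a convenience, not something from the thread, and process names may differ per cluster):

import subprocess

# List running JVMs with their main classes and pick the Spark executor.
jps_out = subprocess.run(["jps", "-l"], capture_output=True, text=True).stdout
for line in jps_out.splitlines():
    pid, _, name = line.partition(" ")
    if "CoarseGrainedExecutorBackend" in name:
        # Dump that JVM's thread stacks, ready for pastebin.
        dump = subprocess.run(["jstack", pid], capture_output=True, text=True).stdout
        print(dump)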


Re: saveAsNewAPIHadoopDataset against hbase hanging in pyspark 1.2.0

Posted by Ted Yu <yu...@gmail.com>.
bq. "hbase.zookeeper.quorum": "localhost"

Are you running the hbase cluster in standalone mode?
Is the hbase-client jar on the classpath?

Cheers
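
A misconfigured or unreachable quorum would make the HBase client block while retrying ZooKeeper, which matches a silent hang. One quick check, sketched here under the assumption of ZooKeeper's default client port 2181: a healthy server answers the four-letter command 'ruok' with 'imok'.

import socket

def zk_ruok(host="localhost", port=2181, timeout=5.0):
    # ZooKeeper's 'ruok' four-letter command; a healthy server replies b'imok'.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"ruok")
        return sock.recv(16)

print(zk_ruok())  # expect b'imok' if the quorum host in the job conf is reachable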

Re: saveAsNewAPIHadoopDataset against hbase hanging in pyspark 1.2.0

Posted by Antony Mayi <an...@yahoo.com.INVALID>.
I just ran it by hand from the pyspark shell. Here are the steps:

pyspark --jars /usr/lib/spark/lib/spark-examples-1.2.0-cdh5.3.0-hadoop2.5.0-cdh5.3.0.jar

>>> conf = {"hbase.zookeeper.quorum": "localhost",
...         "hbase.mapred.outputtable": "test",
...         "mapreduce.outputformat.class": "org.apache.hadoop.hbase.mapreduce.TableOutputFormat",
...         "mapreduce.job.output.key.class": "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
...         "mapreduce.job.output.value.class": "org.apache.hadoop.io.Writable"}
>>> keyConv = "org.apache.spark.examples.pythonconverters.StringToImmutableBytesWritableConverter"
>>> valueConv = "org.apache.spark.examples.pythonconverters.StringListToPutConverter"
>>> sc.parallelize([['testkey', 'f1', 'testqual', 'testval']], 1).map(lambda x: (x[0], x)).saveAsNewAPIHadoopDataset(
...         conf=conf,
...         keyConverter=keyConv,
...         valueConverter=valueConv)

Then it prints a few INFO-level messages about submitting a task etc., but after that it just hangs. The very same code runs OK on Spark 1.1.0 - the records get stored in HBase.
thanks,
Antony.
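
A complementary read-back sketch, modelled on the stock hbase_inputformat.py that ships in the same examples jar (converter class names as in the Spark 1.2 examples; the table name and quorum reuse the values above), can confirm from the same pyspark shell whether the write ever landed:

# Read the 'test' table back through TableInputFormat.
conf = {"hbase.zookeeper.quorum": "localhost",
        "hbase.mapreduce.inputtable": "test"}
rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter="org.apache.spark.examples.pythonconverters."
                 "ImmutableBytesWritableToStringConverter",
    valueConverter="org.apache.spark.examples.pythonconverters."
                   "HBaseResultToStringConverter",
    conf=conf)
print(rdd.collect())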

 


Re: saveAsNewAPIHadoopDataset against hbase hanging in pyspark 1.2.0

Posted by Ted Yu <yu...@gmail.com>.
I went over the jstack but didn't find any calls related to hbase or
zookeeper.
Do you find anything important in the logs?

It looks like the container launcher was waiting for the launched script to
return some result:


   at org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:715)
   at org.apache.hadoop.util.Shell.runCommand(Shell.java:524)
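
As an illustrative analogue only (not the actual Hadoop code): those frames correspond to a parent process blocked on the output of a child script it launched, roughly like this:

import subprocess

# Stand-in for a container launch script that never returns a result;
# the real script should exit quickly.
proc = subprocess.Popen(["/bin/sh", "-c", "sleep 3600"],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()  # blocks here, as Shell.runCommand does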



Re: saveAsNewAPIHadoopDataset against hbase hanging in pyspark 1.2.0

Posted by Antony Mayi <an...@yahoo.com.INVALID>.
This is it (a jstack of the particular yarn container) -> http://pastebin.com/eAdiUYKK
thanks,
Antony.


Re: saveAsNewAPIHadoopDataset against hbase hanging in pyspark 1.2.0

Posted by Ted Yu <yu...@gmail.com>.
bq. even when testing with the example from the stock hbase_outputformat.py

Can you take a jstack of the above and pastebin it?

Thanks
