Posted to user@spark.apache.org by ayan guha <gu...@gmail.com> on 2017/06/23 12:46:42 UTC

HDP 2.5 - Python - Spark-On-Hbase

Hi

Is it possible to use SHC from Hortonworks with pyspark? If so, any working
code sample available?

Also, I ran into the following issue while running the samples with Spark 2.0:

"Caused by: java.lang.ClassNotFoundException: org.apache.spark.Logging"

Any workaround?

Thanks in advance....

-- 
Best Regards,
Ayan Guha

Re: HDP 2.5 - Python - Spark-On-Hbase

Posted by Debabrata Ghosh <ma...@gmail.com>.
Ayan,

Did you get the HBase connection working through PySpark as well? I have the
Spark to HBase connection working in Scala (via HBaseContext), but I
eventually want to get this working from PySpark code. Would you have some
suitable code snippets, or an approach for calling a Scala class from within
PySpark?
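
One route that is sometimes used for exactly this is to reach into the JVM
through the py4j gateway that PySpark exposes on the SparkContext. A minimal
sketch, assuming the relevant jar was shipped with --jars/--packages; the
package path and constructor below are illustrative guesses, not a confirmed
API:

```python
def make_hbase_context(spark):
    """Instantiate a Scala/Java class from PySpark via the py4j gateway.

    Sketch only: assumes the jar holding the class is on the driver
    classpath, and that the fully qualified class name below matches
    the one in your build (illustrative, not a confirmed API).
    """
    sc = spark.sparkContext
    jvm = sc._jvm  # py4j view of the driver JVM
    # Any class on the driver classpath can be reached by its full name:
    j_conf = jvm.org.apache.hadoop.hbase.HBaseConfiguration.create()
    # Pass the underlying Java SparkContext to the Scala constructor:
    return jvm.org.apache.hadoop.hbase.spark.HBaseContext(sc._jsc.sc(), j_conf)
```

Anything returned this way is a py4j JavaObject handle, so this works best
when the Scala side does the heavy lifting and hands back something simple.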

Thanks,
Debu


Re: HDP 2.5 - Python - Spark-On-Hbase

Posted by ayan guha <gu...@gmail.com>.
Hi

Thanks to all of you; I got the HBase connector working. There are still
some details around namespaces pending, but overall it is working well.

Now I would like to apply the same approach to Structured Streaming. Is there
a similar way to use writeStream.format with an HBase writer, or any other
way to write continuous data to HBase?
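
For context, Structured Streaming has no built-in HBase sink, so
writeStream.format alone will not reach HBase. One common pattern, though
only available from Spark 2.4 onward (newer than the releases discussed in
this thread), is foreachBatch, which hands each micro-batch to the ordinary
batch writer. The format string is SHC's documented data source name; the
catalog, table, and function names here are illustrative:

```python
import json

# Illustrative SHC catalog mapping HBase cells to DataFrame columns.
CATALOG = json.dumps({
    "table": {"namespace": "default", "name": "events"},
    "rowkey": "key",
    "columns": {
        "id":    {"cf": "rowkey", "col": "key",   "type": "string"},
        "value": {"cf": "cf1",    "col": "value", "type": "string"},
    },
})

def write_micro_batch(batch_df, batch_id):
    # Each micro-batch arrives as an ordinary DataFrame, so the normal
    # SHC batch writer can be reused here unchanged.
    (batch_df.write
        .format("org.apache.spark.sql.execution.datasources.hbase")
        .options(catalog=CATALOG, newtable="5")
        .save())

def start_hbase_stream(stream_df):
    # Requires Spark 2.4+ for foreachBatch, plus SHC on the classpath
    # and a reachable HBase cluster at run time.
    return (stream_df.writeStream
        .foreachBatch(write_micro_batch)
        .outputMode("update")
        .start())
```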

best
Ayan

-- 
Best Regards,
Ayan Guha

Re: HDP 2.5 - Python - Spark-On-Hbase

Posted by Weiqing Yang <ya...@gmail.com>.
For SHC documentation, please refer to the README in the SHC GitHub
repository, which is kept up to date.


Re: HDP 2.5 - Python - Spark-On-Hbase

Posted by ayan guha <gu...@gmail.com>.
Thanks all, I have found the correct version of the package. The HDP
documentation is probably a little behind.

Best
Ayan

-- 
Best Regards,
Ayan Guha

RE: HDP 2.5 - Python - Spark-On-Hbase

Posted by Mahesh Sawaiker <ma...@persistent.com>.
Ayan,
The semi-private org.apache.spark.Logging class that existed in Spark 1.6 was removed in Spark 2.0.
It looks like you are trying to run 1.6 code on 2.0. I have ported some code like this before: if you have access to the source, you can recompile it by replacing references to the Logging class with direct use of the slf4j Logger class. Most such code tends to be easily portable.

The following is from the Spark 2.0 release notes:

Removals, Behavior Changes and Deprecations
Removals
The following features have been removed in Spark 2.0:

  *   Bagel
  *   Support for Hadoop 2.1 and earlier
  *   The ability to configure closure serializer
  *   HTTPBroadcast
  *   TTL-based metadata cleaning
  *   Semi-private class org.apache.spark.Logging. We suggest you use slf4j directly.
  *   SparkContext.metricsSystem
Thanks,
Mahesh


DISCLAIMER
==========
This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.

Re: HDP 2.5 - Python - Spark-On-Hbase

Posted by ayan guha <gu...@gmail.com>.
Hi

I am using the following:

--packages com.hortonworks:shc:1.0.0-1.6-s_2.10 --repositories
http://repo.hortonworks.com/content/groups/public/

Is it compatible with Spark 2.X? I would like to use it....

Best
Ayan



-- 
Best Regards,
Ayan Guha

Re: HDP 2.5 - Python - Spark-On-Hbase

Posted by Weiqing Yang <ya...@gmail.com>.
Yes.
What SHC version were you using?
If you hit any issues, you can post them in the SHC GitHub issues; there are
already some threads about this.
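
To make that "yes" concrete, a PySpark read through SHC is typically wired
up as below. The format string is SHC's documented data source name; the
catalog fields, table, and column names are illustrative, and the call
requires SHC on the classpath (e.g. via --packages/--repositories as in
this thread) plus a reachable HBase cluster:

```python
import json

# The catalog tells SHC how to map HBase column families/qualifiers
# onto DataFrame columns. Table and column names are illustrative.
catalog = json.dumps({
    "table": {"namespace": "default", "name": "test_table"},
    "rowkey": "key",
    "columns": {
        "id":   {"cf": "rowkey", "col": "key",  "type": "string"},
        "name": {"cf": "cf1",    "col": "name", "type": "string"},
    },
})

def read_from_hbase(spark):
    # 'spark' is a SparkSession, e.g. the one predefined in the
    # pyspark shell; SHC resolves the catalog at load time.
    return (spark.read
        .format("org.apache.spark.sql.execution.datasources.hbase")
        .options(catalog=catalog)
        .load())
```

Launched from the pyspark shell, the predefined session can be passed
straight in: df = read_from_hbase(spark).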
