Posted to user@hbase.apache.org by Eric Wilson <er...@aver.io> on 2017/04/18 17:58:09 UTC

Writing to HBase from Spark

I've spent a few days trying to figure out how to write to HBase
from a Spark RDD -- using Python.

I recently found that Spark has removed some of its Python converters,
because (as this commit
<https://github.com/apache/spark/commit/a680562a6f87a03a00f71bad1c424267ae75c641>
shows):
"HBase has had Spark bindings for a while and is even including them in
the HBase distribution in the next version, making the examples obsolete.
The same applies to Cassandra, which seems to have a proper Spark binding
library already."

Can anyone point me to documentation of Spark/HBase/Python integration?

Thanks.


--

<http://aver.io>
Eric Wilson
Software Engineer

-- 

------------------------------
This email, including attachments, may contain information that is 
privileged, confidential or is exempt from disclosure under applicable law 
(including, but not limited to, protected health information). It is not 
intended for transmission to, or receipt by, any unauthorized persons. If 
the reader of this message is not the intended recipient, or the employee 
or agent responsible for delivering the message to the intended recipient, 
you are hereby notified that any dissemination, distribution or copying of 
this communication is strictly prohibited. If you believe this email was 
sent to you in error, do not read it. Please notify the sender immediately 
informing them of the error and delete all copies and attachments of the 
message from your system. Thank you.

Re: Writing to HBase from Spark

Posted by Sean Busbey <bu...@apache.org>.
PySpark just runs a normal Python interpreter, right? (It doesn't use e.g.
Jython?) If that's the case then the hbase-spark module as it currently
stands isn't going to buy folks anything, because it is all JVM-based.

AFAIK, your best bet is to run one or more Thrift proxy services for
HBase and then rely on the Python bindings for that Thrift RPC to talk
to HBase.

-busbey
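
The Thrift route suggested above could look roughly like the sketch below.
It is not from the thread: it assumes the third-party happybase package as
the Python Thrift binding, an HBase Thrift server that is already running
and reachable from the Spark executors (9090 is the usual default port),
and a host name and table name that are made up for illustration. Each
Spark partition opens its own connection and writes its rows in batches.

```python
from typing import Callable, Dict, Iterable, Tuple

# (row key, {b"family:qualifier": value}), the shape happybase's put() expects.
Row = Tuple[bytes, Dict[bytes, bytes]]


def open_thrift_connection(host: str = "thrift-gateway.example.com", port: int = 9090):
    """Open a connection to an HBase Thrift server (hypothetical host name)."""
    # Third-party package; imported lazily so it is only needed where rows are written.
    import happybase

    return happybase.Connection(host, port=port)


def write_partition(
    rows: Iterable[Row],
    table_name: str = "example_table",  # hypothetical table name
    connect: Callable[[], object] = open_thrift_connection,
    batch_size: int = 1000,
) -> int:
    """Write one partition's rows through a single Thrift connection."""
    connection = connect()
    written = 0
    try:
        table = connection.table(table_name)
        # The batch sends its puts every `batch_size` rows and once more on exit.
        with table.batch(batch_size=batch_size) as batch:
            for row_key, columns in rows:
                batch.put(row_key, columns)
                written += 1
    finally:
        connection.close()
    return written


# In a PySpark job, with `rdd` holding Row-shaped tuples:
#     rdd.foreachPartition(write_partition)
```

Opening the connection inside the partition function avoids trying to
serialize a socket from the driver to the executors; the `connect`
parameter is only there so a different host or a test double can be
swapped in.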

Re: Writing to HBase from Spark

Posted by Ted Yu <yu...@gmail.com>.
Not that I know of.

Re: Writing to HBase from Spark

Posted by Eric Wilson <er...@aver.io>.
So if I'm using PySpark 2.1 and HBase 1.2.5 there is no way for them to
communicate with each other?

Eric

Re: Writing to HBase from Spark

Posted by Ted Yu <yu...@gmail.com>.
There is some documentation here (slightly outdated):
http://hbase.apache.org/book.html#spark

Currently the hbase-spark module in the master branch only supports Spark 1.6.x.

See HBASE-16179 for ongoing work on Spark 2.y support.

FYI
