Posted to user@phoenix.apache.org by Neelesh <ne...@gmail.com> on 2016/04/10 07:21:00 UTC

Spark & Phoenix data load

Hi,
  Does phoenix-spark's saveToPhoenix use the JDBC driver internally, or
does it do something similar to CSVBulkLoader using HFiles?

Thanks!

Re: Spark & Phoenix data load

Posted by Neelesh <ne...@gmail.com>.
Thanks Josh. I looked at the code as well and you are right.  It would've
been great to disconnect the core bulkloader logic from CSV. That would
make more direct bulkload integrations possible. Hopefully I'll get to that
one of these days.
On Apr 10, 2016 11:52 AM, "Josh Mahonin" <jm...@gmail.com> wrote:

Hi Neelesh,

The saveToPhoenix method uses the MapReduce PhoenixOutputFormat under the
hood, which is a wrapper over the JDBC driver. It's likely not as efficient
as the CSVBulkLoader, although there are performance improvements over a
simple JDBC client as the writes are spread across multiple Spark workers
(depending on the number of partitions in the RDD/DataFrame).

Regards,

Josh

On Sun, Apr 10, 2016 at 1:21 AM, Neelesh <ne...@gmail.com> wrote:

> Hi ,
>   Does phoenix-spark's saveToPhoenix use the JDBC driver internally, or
> does it do something similar to CSVBulkLoader using HFiles?
>
> Thanks!
>
>

Re: Spark & Phoenix data load

Posted by Josh Mahonin <jm...@gmail.com>.
Hi Neelesh,

The saveToPhoenix method uses the MapReduce PhoenixOutputFormat under the
hood, which is a wrapper over the JDBC driver. It's likely not as efficient
as the CSVBulkLoader, although there are performance improvements over a
simple JDBC client as the writes are spread across multiple Spark workers
(depending on the number of partitions in the RDD/DataFrame).
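
For reference, a minimal sketch of what that looks like from Spark, assuming
phoenix-spark is on the classpath, the target table has already been created,
and a hypothetical ZooKeeper quorum at localhost:2181 (the table name, columns,
and quorum here are illustrative, not from the thread):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.phoenix.spark._ // adds saveToPhoenix to RDDs of tuples

object SaveToPhoenixExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("phoenix-save"))

    // Assumes: CREATE TABLE OUTPUT_TEST (ID BIGINT PRIMARY KEY, COL1 VARCHAR)
    val data = sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c")), numSlices = 4)

    // Each of the 4 partitions gets its own PhoenixOutputFormat writer, so the
    // UPSERTs run in parallel across workers -- but they are still ordinary
    // writes through the region servers, not HFile bulk loads.
    data.saveToPhoenix(
      "OUTPUT_TEST",
      Seq("ID", "COL1"),
      zkUrl = Some("localhost:2181") // hypothetical ZooKeeper quorum
    )

    sc.stop()
  }
}
```

Since one writer is opened per partition, repartitioning the RDD before the
save is the main knob for write parallelism.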

Regards,

Josh

On Sun, Apr 10, 2016 at 1:21 AM, Neelesh <ne...@gmail.com> wrote:

> Hi ,
>   Does phoenix-spark's saveToPhoenix use the JDBC driver internally, or
> does it do something similar to CSVBulkLoader using HFiles?
>
> Thanks!
>
>