Posted to user@spark.apache.org by Eugen Cepoi <ce...@gmail.com> on 2013/10/11 17:53:21 UTC

Write to HBase from spark job

Hi there,

I have a few questions on how best to write to HBase from a Spark job.

- If we want to write using TableOutputFormat, are we supposed to use
saveAsNewAPIHadoopFile?
- Or should we do it by hand (without TableOutputFormat), in a foreach loop
for example?
- Or should we use HFileOutputFormat with saveAsNewAPIHadoopFile?

Thanks,
Eugen

Re: Write to HBase from spark job

Posted by Eugen Cepoi <ce...@gmail.com>.
Hi Matei,

Ok, thanks, I will try it. Indeed, using saveAsNewAPIHadoopFile was not
working: TableOutputFormat implements Configurable, and its setConf
method was never called.

BTW, you have done a great job with Spark; it combines so nicely with Scala,
the API is clean, and it is really easy to work with. I am impressed =)

Eugen


2013/10/12 Matei Zaharia <ma...@gmail.com>

> Hi Eugen,
>
> You should use saveAsHadoopDataset, to which you pass a JobConf object
> that you've configured with TableOutputFormat the same way you would for a
> MapReduce job. The saveAsHadoopFile methods are specifically for output
> formats that go to a filesystem (e.g. HDFS), but HBase isn't a filesystem.
>
> Matei
>
> On Oct 11, 2013, at 8:53 AM, Eugen Cepoi <ce...@gmail.com> wrote:
>
> > Hi there,
> >
> > I have a few questions on how best to write to HBase from a Spark
> job.
> >
> > - If we want to write using TableOutputFormat, are we supposed to use
> saveAsNewAPIHadoopFile?
> > - Or should we do it by hand (without TableOutputFormat), in a foreach
> loop for example?
> > - Or should we use HFileOutputFormat with saveAsNewAPIHadoopFile?
> >
> > Thanks,
> > Eugen
>
>

Re: Write to HBase from spark job

Posted by Matei Zaharia <ma...@gmail.com>.
Hi Eugen,

You should use saveAsHadoopDataset, to which you pass a JobConf object that you've configured with TableOutputFormat the same way you would for a MapReduce job. The saveAsHadoopFile methods are specifically for output formats that go to a filesystem (e.g. HDFS), but HBase isn't a filesystem.
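The approach above can be sketched in Scala roughly as follows. This is a minimal sketch against the classic (mapred) Hadoop API that saveAsHadoopDataset expects, using the 2013-era Spark and HBase APIs; the table name "my_table" and the column family/qualifier "cf"/"col" are hypothetical, and it assumes a reachable HBase cluster configured via hbase-site.xml:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapred.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapred.JobConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object HBaseWrite {
  def main(args: Array[String]) {
    val sc = new SparkContext("local", "hbase-write")

    // Configure the job the same way you would for a MapReduce job:
    // put TableOutputFormat and the target table on a JobConf.
    val jobConf = new JobConf(HBaseConfiguration.create())
    jobConf.setOutputFormat(classOf[TableOutputFormat])
    jobConf.set(TableOutputFormat.OUTPUT_TABLE, "my_table") // hypothetical table

    // Build an RDD of (ImmutableBytesWritable, Put) pairs, the
    // key/value types TableOutputFormat expects.
    val rows = sc.parallelize(Seq("row1" -> "v1", "row2" -> "v2"))
    val puts = rows.map { case (key, value) =>
      val put = new Put(Bytes.toBytes(key))
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value))
      (new ImmutableBytesWritable(Bytes.toBytes(key)), put)
    }

    // saveAsHadoopDataset writes through the OutputFormat on the JobConf;
    // no filesystem path is involved.
    puts.saveAsHadoopDataset(jobConf)
  }
}
```

Note that this goes through the old-API org.apache.hadoop.hbase.mapred.TableOutputFormat, which matches the JobConf-based saveAsHadoopDataset; the new-API org.apache.hadoop.hbase.mapreduce.TableOutputFormat would instead pair with a Configuration-based save.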

Matei

On Oct 11, 2013, at 8:53 AM, Eugen Cepoi <ce...@gmail.com> wrote:

> Hi there,
> 
> I have a few questions on how best to write to HBase from a Spark job.
> 
> - If we want to write using TableOutputFormat, are we supposed to use saveAsNewAPIHadoopFile?
> - Or should we do it by hand (without TableOutputFormat), in a foreach loop for example?
> - Or should we use HFileOutputFormat with saveAsNewAPIHadoopFile?
> 
> Thanks,
> Eugen