Posted to user@spark.apache.org by Eugen Cepoi <ce...@gmail.com> on 2013/10/11 17:53:21 UTC
Write to HBase from spark job
Hi there,
I have a few questions on how best to write to HBase from a Spark job.
- If we want to write using TableOutputFormat, are we supposed to use
saveAsNewAPIHadoopFile?
- Or should we do it by hand (without TableOutputFormat) in a foreach loop,
for example?
- Or should we use HFileOutputFormat with saveAsNewAPIHadoopFile?
Thanks,
Eugen
Re: Write to HBase from spark job
Posted by Eugen Cepoi <ce...@gmail.com>.
Hi Matei,
Ok, thanks, I will try it. Indeed, using saveAsNewAPIHadoopFile was not
working, as TableOutputFormat implements Configurable and its setConf
method was never called.
BTW, you have done a great job with Spark; it combines so nicely with Scala,
the API is clean, and it is really easy to work with. I am impressed =)
Eugen
2013/10/12 Matei Zaharia <ma...@gmail.com>
> Hi Eugen,
>
> You should use saveAsHadoopDataset, to which you pass a JobConf object
> that you've configured with TableOutputFormat the same way you would for a
> MapReduce job. The saveAsHadoopFile methods are specifically for output
> formats that go to a filesystem (e.g. HDFS), but HBase isn't a filesystem.
>
> Matei
>
> On Oct 11, 2013, at 8:53 AM, Eugen Cepoi <ce...@gmail.com> wrote:
>
> > Hi there,
> >
> > I have got a few questions on how best to write to HBase from a spark
> job.
> >
> > - If we want to write using TableOutputFormat are we supposed to use
> saveAsNewAPIHadoopFile?
> > - Or should we do it by hand (without TableOutputFormat) in a foreach
> loop for example?
> > - Or should use HFileOutputFormat with saveAsNewAPIHadoopFile?
> >
> > Thanks,
> > Eugen
>
>
Re: Write to HBase from spark job
Posted by Matei Zaharia <ma...@gmail.com>.
Hi Eugen,
You should use saveAsHadoopDataset, to which you pass a JobConf object that you've configured with TableOutputFormat the same way you would for a MapReduce job. The saveAsHadoopFile methods are specifically for output formats that go to a filesystem (e.g. HDFS), but HBase isn't a filesystem.
Matei
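The approach Matei describes can be sketched as follows. This is a minimal, untested sketch against the old ("mapred") Hadoop API of that era: it assumes an HBase 0.94-style client (`Put.add`, the `org.apache.hadoop.hbase.mapred.TableOutputFormat` class), an existing `SparkContext` named `sc`, and a hypothetical table `"my_table"` with a column family `"cf"` — adjust names to your setup.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapred.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapred.JobConf

// Configure a JobConf with TableOutputFormat, just as for a MapReduce job.
val jobConf = new JobConf(HBaseConfiguration.create())
jobConf.setOutputFormat(classOf[TableOutputFormat])
jobConf.set(TableOutputFormat.OUTPUT_TABLE, "my_table") // hypothetical table name

// Turn each record into the (key, Put) pair TableOutputFormat expects.
val puts = sc.parallelize(Seq("a" -> "1", "b" -> "2")).map { case (key, value) =>
  val put = new Put(Bytes.toBytes(key))
  put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value))
  (new ImmutableBytesWritable(Bytes.toBytes(key)), put)
}

// Write to HBase via the configured JobConf, not to a filesystem path.
puts.saveAsHadoopDataset(jobConf)
```

Because saveAsHadoopDataset takes the whole JobConf rather than an output path, TableOutputFormat gets its configuration the normal Hadoop way, avoiding the setConf problem seen with saveAsNewAPIHadoopFile.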
On Oct 11, 2013, at 8:53 AM, Eugen Cepoi <ce...@gmail.com> wrote:
> Hi there,
>
> I have got a few questions on how best to write to HBase from a spark job.
>
> - If we want to write using TableOutputFormat are we supposed to use saveAsNewAPIHadoopFile?
> - Or should we do it by hand (without TableOutputFormat) in a foreach loop for example?
> - Or should use HFileOutputFormat with saveAsNewAPIHadoopFile?
>
> Thanks,
> Eugen