Posted to user@phoenix.apache.org by "Riesland, Zack" <Za...@sensus.com> on 2016/02/24 13:44:19 UTC

leveraging hive.hbase.generatehfiles

We continue to have issues getting large amounts of data from Hive into Phoenix.

BulkLoading is very slow and often fails for very large data sets.

I stumbled upon this article that seems to present an interesting alternative:

https://community.hortonworks.com/articles/2745/creating-hbase-hfiles-from-an-existing-hive-table.html
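
If I'm reading it right, the gist of the article is roughly the
following (the table, column family, and paths here are just
placeholders I made up, not our real schema):

  SET hive.hbase.generatehfiles=true;
  SET hfile.family.path=/tmp/example_hfiles/cf;

  -- Hive table backed by the HBase storage handler
  CREATE TABLE example_hbase (rowkey STRING, val STRING)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:val');

  -- With the properties above set, this writes HFiles under
  -- /tmp/example_hfiles instead of doing live puts; the data
  -- needs to be sorted by row key
  INSERT OVERWRITE TABLE example_hbase
  SELECT rowkey, val FROM source_table ORDER BY rowkey;

The resulting HFiles then get handed to HBase's bulk load step
(completebulkload / LoadIncrementalHFiles).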

What would it take to adapt this approach to Phoenix? I'm guessing it would primarily just be a matter of also updating some system tables (?)

Thoughts?

Thanks!


Re: leveraging hive.hbase.generatehfiles

Posted by Gabriel Reid <ga...@gmail.com>.
Hi Zack,

If bulk loading is currently slow or error prone, I don't think that
this approach would improve the situation.

From what I understand from that link, this is a way to copy the
contents of a Hive table into HFiles. Hive operates via MapReduce
jobs, so this is technically a MapReduce job that reads from an input
(probably some kind of text file, but it could be anything) and
creates HFiles. The current Phoenix bulk loader is exactly the same
thing: a MapReduce job that reads from text files and creates HFiles.
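
(Just to make sure we're talking about the same thing: by "the
current Phoenix bulk loader" I mean the CsvBulkLoadTool MapReduce
job, i.e. something along these lines, with the table name and input
path below being placeholders:

  hadoop jar phoenix-<version>-client.jar \
      org.apache.phoenix.mapreduce.CsvBulkLoadTool \
      --table EXAMPLE_TABLE \
      --input /data/example.csv

It writes out HFiles for the Phoenix table and then hands them to
HBase's bulk load step, so conceptually it's doing the same work as
the Hive approach in that article.)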

That being said, it's not good that you're currently (still) having
issues with performance and/or stability.

There is currently some work underway to improve the performance of
bulk loading (PHOENIX-1973), as well as an important bug fix for
memory usage (PHOENIX-2649).

Are the specific issues that you're seeing currently logged in Jira?
If not, could you give some specifics here, or even better in Jira,
on what you're seeing in terms of performance and stability issues?
The performance that you are expecting to get (i.e. how fast it would
need to be for you not to consider it really slow) would also be very
useful.

- Gabriel

