Posted to user@hive.apache.org by venkatesh b <ve...@gmail.com> on 2015/08/06 21:20:50 UTC

Is it worth using the ORC format in my case? Can I replace Hive with HBase?

Hi, I have two questions.
FIRST:
In our project we use Hive.
We receive new data daily. We need to process this new data only once and
send the processed data to an RDBMS. The processing relies mainly on
complex queries with joins, WHERE conditions, and grouping functions.
Around 50 intermediate tables are generated during processing. Until now
we have used the text format for storage. We came across the ORC file
format. Since each table is queried only once, I would like to know
whether it is worth storing the data in ORC format.
We do full table scans.

SECOND:
I have learned about HBase, which is said to be faster.
Can I replace Hive with HBase to make the daily processing faster?
It currently takes 15 hours daily with Hive.

We have two use cases:
1. Daily incremental data of around 5-10 million records.
2. Processing of 2 billion records.

Please let me know if any other information is needed.

Thanks & Regards
Venkatesh