Posted to common-user@hadoop.apache.org by Divya Gehlot <di...@gmail.com> on 2016/01/18 06:02:44 UTC

Data Storage for Joins and ACID transactions + Hadoop Cluster

Hi,
Which data store is best for running multiple joins at run time in Hadoop?
I tried Hive, but performance is poor.
Any pointers or guidance would be appreciated.


Thanks,
Regards,
Divya

Re: Data Storage for Joins and ACID transactions + Hadoop Cluster

Posted by "mohit.kaushik" <mo...@orkash.com>.
Hive provides SQL-like functionality over Hadoop, but NoSQL stores do not
provide all SQL capabilities very well. As the number of joins increases,
performance decreases. Instead, you should try to model your data in one
table to avoid joins. You can try Apache Accumulo, which gives you full
control over the data structure, and you don't have to define column
families in advance as you do in HBase. It is a fast, scalable,
well-tested datastore that runs on top of Hadoop.
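
To illustrate the "one table instead of joins" idea, here is a minimal
sketch in plain Python. It does not use the Accumulo or HBase client APIs;
the dictionaries only mimic a sorted key/value layout, and every name in it
(CUSTOMERS, ORDERS, denormalize, orders_for_customer, the row-key format and
the "family:qualifier" column names) is an assumption made up for the
example. The customer attributes are copied into each order row at write
time, so a read becomes a prefix scan on the row key rather than a join.

```python
# Hypothetical sample data; in a real cluster these would be two source tables.
CUSTOMERS = {
    1: {"name": "Asha", "city": "Pune"},
    2: {"name": "Ravi", "city": "Delhi"},
}
ORDERS = [
    {"order_id": 100, "customer_id": 1, "amount": 250.0},
    {"order_id": 101, "customer_id": 2, "amount": 75.5},
    {"order_id": 102, "customer_id": 1, "amount": 40.0},
]


def denormalize(orders, customers):
    """Build one wide table keyed by '<customer_id>_<order_id>'.

    The join is done once, at write time: each row carries the customer
    columns alongside the order columns.
    """
    wide_rows = {}
    for order in orders:
        customer = customers[order["customer_id"]]
        # Zero-padded ids keep the keys in numeric order when sorted as text.
        row_key = f"{order['customer_id']:08d}_{order['order_id']:08d}"
        wide_rows[row_key] = {
            "customer:name": customer["name"],
            "customer:city": customer["city"],
            "order:amount": order["amount"],
        }
    return wide_rows


def orders_for_customer(wide_rows, customer_id):
    """Emulate a prefix scan: return all rows whose key starts with the id."""
    prefix = f"{customer_id:08d}_"
    return [row for key, row in sorted(wide_rows.items()) if key.startswith(prefix)]


wide = denormalize(ORDERS, CUSTOMERS)
rows_for_1 = orders_for_customer(wide, 1)
```

The trade-off is duplicated customer data and more work on update, in
exchange for reads that touch a single table.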

-Mohit Kaushik

On 01/18/2016 10:32 AM, Divya Gehlot wrote:
> Hi,
> Which data store is best for running multiple joins at run time in Hadoop?
> I tried Hive, but performance is poor.
> Any pointers or guidance would be appreciated.
>
>
> Thanks,
> Regards,
> Divya
