Posted to user@hadoop.apache.org by Divya Gehlot <di...@gmail.com> on 2016/01/18 06:02:44 UTC
Data Storage for Joins and ACID transactions + Hadoop Cluster
Hi,
Which data store is best for running multiple joins at run time in Hadoop?
I tried Hive, but performance is poor.
Pointers/Guidance appreciated.
Thanks,
Regards,
Divya
Re: Data Storage for Joins and ACID transactions + Hadoop Cluster
Posted by "mohit.kaushik" <mo...@orkash.com>.
Hive provides SQL-like functionality over Hadoop, but NoSQL stores do not
support all SQL capabilities well. As the number of joins increases,
performance decreases. Instead, you should try to model your data in one
table to avoid joins. You can try Apache Accumulo, which gives you full
control over the data structure; unlike in HBase, you do not have to
define column families in advance. It is fast, it is one of the most
scalable datastores tested, and it is built on top of Hadoop.
-Mohit Kaushik
On 01/18/2016 10:32 AM, Divya Gehlot wrote:
> Hi,
> Which Data storage is best for multiple joins on the run time in Hadoop.
> Tried Hive but performance is poor.
> Pointers/Guidance appreciated.
>
>
> Thanks,
> Regards,
> Divya
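
The advice in the reply above, modeling the data in one table so that no join is needed at read time, can be sketched roughly as follows. This is only an illustration in plain Python, not Accumulo or HBase client code: the customers and orders data, the denormalize function, and the (family, qualifier) cell layout are all hypothetical and do not come from the thread.

```python
# Hypothetical sample data; none of these names come from the thread.
customers = {
    "c1": {"name": "Asha", "city": "Pune"},
    "c2": {"name": "Ravi", "city": "Delhi"},
}
orders = [
    {"order_id": "o1", "customer_id": "c1", "amount": 250},
    {"order_id": "o2", "customer_id": "c2", "amount": 400},
    {"order_id": "o3", "customer_id": "c1", "amount": 75},
]


def denormalize(orders, customers):
    """Build one wide table keyed by order id.

    Each row maps (column family, column qualifier) -> value, loosely
    mimicking the cell layout of a wide-column store. The customer
    fields are copied into every order row at write time.
    """
    table = {}
    for order in orders:
        customer = customers[order["customer_id"]]
        table[order["order_id"]] = {
            ("order", "amount"): order["amount"],
            ("customer", "id"): order["customer_id"],
            ("customer", "name"): customer["name"],
            ("customer", "city"): customer["city"],
        }
    return table


wide_table = denormalize(orders, customers)

# A read is now a single row lookup; no join against customers is needed.
row = wide_table["o2"]
print(row[("customer", "name")], row[("order", "amount")])
```

The trade-off is duplication: the customer fields are stored once per order, so changing a customer's details means rewriting every row that carries a copy.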