Posted to mapreduce-user@hadoop.apache.org by Fengjiao Jiang <gr...@gmail.com> on 2014/06/18 17:26:30 UTC

Use Hadoop and other Apache products for SQL query manipulations

Hi,

We have a large data set originally stored in MS SQL, and for intensive data
aggregation we’re currently using Vertica. The problem is that the data is
very large, and a complex “SELECT” or “INSERT” query can take as long as
10 minutes to return correct results (the database is roughly 2GB).

So we’re wondering whether we can use Hadoop together with other Apache
products built on Hadoop to make these queries faster.
For example, could we use Hadoop, HBase and ZooKeeper and write MR
functions for these “SELECT”, “INSERT” or similarly complex queries to
improve query speed?
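To show what hand-writing MR functions for a query involves, here is a minimal in-memory Python sketch (it does not use the Hadoop API) of how a query such as `SELECT region, SUM(amount) FROM sales GROUP BY region` decomposes into map, shuffle and reduce phases. The `sales` table, its columns and the sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical rows of a "sales" table: (region, amount)
Row = tuple[str, float]


def map_phase(rows: Iterable[Row]) -> Iterator[tuple[str, float]]:
    # Map: emit (GROUP BY key, value to aggregate) for every input row
    for region, amount in rows:
        yield region, amount


def shuffle(pairs: Iterable[tuple[str, float]]) -> dict[str, list[float]]:
    # Shuffle/sort: collect all values for the same key, as Hadoop does
    # before handing each key's values to a single reducer call
    groups: dict[str, list[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups: dict[str, list[float]]) -> dict[str, float]:
    # Reduce: apply the aggregate function (SUM) to each key's values
    return {key: sum(values) for key, values in groups.items()}


def select_sum_group_by(rows: Iterable[Row]) -> dict[str, float]:
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    sales = [("east", 10.0), ("west", 5.0), ("east", 2.5)]
    print(select_sum_group_by(sales))  # {'east': 12.5, 'west': 5.0}
```

Every SQL aggregate needs its own map and reduce logic when written this way, which is why query layers on top of Hadoop are usually considered before hand-written MR code.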

Also, I don’t know whether the combination I listed above is a good one:
should I use Hadoop, HBase and ZooKeeper, or Hadoop, Pig and Hive?
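On the Pig/Hive side, both compile queries (Pig Latin or HiveQL) into MapReduce jobs, so the map and reduce functions do not have to be written by hand. A common strategy for running a JOIN on MapReduce is the reduce-side join: mappers tag each row with its source table and emit it under the join key, and the reducer combines rows that share a key. Below is a hypothetical in-memory Python sketch of that technique; the `customers` and `orders` tables are made up, and no Hadoop, Pig or Hive API is used.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator

# Hypothetical tables: customers(id, name) and orders(customer_id, amount)
Customer = tuple[int, str]
Order = tuple[int, float]
Tagged = tuple[str, object]


def map_customers(rows: Iterable[Customer]) -> Iterator[tuple[int, Tagged]]:
    # Key by the join column and tag the value with its source table ("C")
    for cust_id, name in rows:
        yield cust_id, ("C", name)


def map_orders(rows: Iterable[Order]) -> Iterator[tuple[int, Tagged]]:
    # Same join key, tagged as coming from the orders table ("O")
    for cust_id, amount in rows:
        yield cust_id, ("O", amount)


def reduce_join(key: int, values: list[Tagged]) -> Iterator[tuple[int, object, object]]:
    # Separate values by tag, then emit every matching pair (inner join)
    names = [v for tag, v in values if tag == "C"]
    amounts = [v for tag, v in values if tag == "O"]
    for name in names:
        for amount in amounts:
            yield key, name, amount


def join(customers: Iterable[Customer], orders: Iterable[Order]) -> list[tuple[int, object, object]]:
    # Shuffle: group the tagged output of both mappers by join key
    groups: dict[int, list[Tagged]] = defaultdict(list)
    for key, value in chain(map_customers(customers), map_orders(orders)):
        groups[key].append(value)
    result: list[tuple[int, object, object]] = []
    for key in sorted(groups):
        result.extend(reduce_join(key, groups[key]))
    return result


if __name__ == "__main__":
    print(join([(1, "Ann"), (2, "Bo")], [(1, 9.5), (1, 3.0), (3, 7.0)]))
    # [(1, 'Ann', 9.5), (1, 'Ann', 3.0)]
```

Keys that appear in only one table (customer 2, order key 3) produce no output, matching INNER JOIN semantics.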

My question is mainly a “SQL-on-Hadoop” one. Could you please tell me
whether it’s possible and, if so, give me some suggestions? I’d appreciate
it a lot!


Thanks.

Best
Judy

Re: Use Hadoop and other Apache products for SQL query manipulations

Posted by Cristóbal Giadach <cr...@gmail.com>.
Try Impala or HAWQ
(http://www.gopivotal.com/sites/default/files/Hawq_WP_042313_FINAL.pdf); in
my opinion they are the best choices for SQL-on-Hadoop.


