Posted to user@hive.apache.org by Sreenath Menon <sr...@gmail.com> on 2012/06/04 11:12:12 UTC

Front end visualization tool with Hive (when using as a warehouse)

Hi all

I am new to Hive and am working on analysis of Twitter data with Hive and
Hadoop on a 27-node cluster.
At present I am using Microsoft PowerPivot as the visualization tool for
the analysis done using Hive. I have got some really good results and am
stunned by the scalability of the Hadoop system.

As a next step, I would like to evaluate the warehousing capabilities of
Hive for business data.
Any insights into this are welcome. I am also unsure how to divide the work
between Hive and PowerPivot, as PowerPivot itself can act as a warehouse
tool. Suggestions for other good visualization tools to use with Hive are
also welcome.

For analyzing Twitter data, I just ran complex Hive queries for each
analysis. But for a warehouse, this does not sound like a good
solution.

Any help is greatly appreciated.

Thanks.

Re: Front end visualization tool with Hive (when using as a warehouse)

Posted by Jagat <ja...@gmail.com>.
Hello Sreenath,

Besides the tools mentioned by Bejoy, you can also look at Pentaho; Pentaho
and Hive play well together.

Regards,

Jagat Singh

Re: Front end visualization tool with Hive (when using as a warehouse)

Posted by Sreenath Menon <sr...@gmail.com>.
Thanks all
All help is greatly appreciated. Please feel free to post whatever comes to
your mind.
I learned a lot from this conversation.
Please post any findings on this topic: Hive as a warehouse - limitations

Thanks

Re: Front end visualization tool with Hive (when using as a warehouse)

Posted by Mark Grover <mg...@oanda.com>.
Not sure if this falls within the line of discussion, but I thought I'd mention it anyway.

A lot of people who run Hive as a warehouse load their aggregations and results into a traditional RDBMS (using something like Sqoop) and then use one of the visualization tools on those aggregations, without having to wait for Hive-like query times.

Mark
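
To make the pattern Mark describes concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: SQLite stands in for the traditional RDBMS, the rows are hard-coded where a real setup would export them from Hive (for example with Sqoop), and the table name `daily_hashtag_counts`, its columns, and the sample numbers are all hypothetical.

```python
import sqlite3

# Hypothetical pre-aggregated rows, standing in for the output of a Hive
# batch job (daily tweet counts per hashtag). In a real deployment a tool
# such as Sqoop would export these rows into the RDBMS instead.
hive_aggregates = [
    ("2012-06-01", "#hadoop", 1200),
    ("2012-06-01", "#hive", 300),
    ("2012-06-02", "#hadoop", 1500),
    ("2012-06-02", "#hive", 450),
]


def load_aggregates(conn, rows):
    """Load the pre-aggregated rows into a small serving table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_hashtag_counts ("
        "day TEXT, hashtag TEXT, tweets INTEGER, "
        "PRIMARY KEY (day, hashtag))"
    )
    # INSERT OR REPLACE lets a re-run of the batch job refresh a day's numbers
    # without creating duplicate rows.
    conn.executemany(
        "INSERT OR REPLACE INTO daily_hashtag_counts VALUES (?, ?, ?)", rows
    )
    conn.commit()


def totals_by_hashtag(conn):
    """The kind of quick query a dashboard would issue against the RDBMS."""
    cur = conn.execute(
        "SELECT hashtag, SUM(tweets) FROM daily_hashtag_counts "
        "GROUP BY hashtag ORDER BY hashtag"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
load_aggregates(conn, hive_aggregates)
print(totals_by_hashtag(conn))
```

The point of the split is that the expensive aggregation runs once as a Hive batch job, while the visualization tool only ever touches the small result table.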

RE: Front end visualization tool with Hive (when using as a warehouse)

Posted by "Ladda, Anand" <la...@microstrategy.com>.
I agree with Bejoy's assessment - Hive is good for processing large volumes of data in a batch manner. But for real-time or any complex SQL-based analysis you would typically want to have some type of RDBMS in the mix along with Hadoop/Hive.

In terms of what's missing in Hive today: on the query side, Hive doesn't yet support all flavors of subqueries (correlated subqueries, to be specific; there are potential workarounds for the non-correlated ones). It also doesn't support inserting data from a stream, i.e. INSERT INTO TABLE VALUES (...) type syntax. Hive's query optimizer is mostly rule-based at this time, although there's a push to move towards a cost-based one. On the administration side, there's no workload management/job prioritization scheme like in a typical RDBMS, and Hive Server isn't thread-safe and doesn't yet have any kind of security/authentication scheme.
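
The message does not say which workarounds it has in mind for non-correlated subqueries. One common rewrite, shown here purely as an assumption, is to turn a `WHERE ... IN (SELECT ...)` filter into a join against a de-duplicated derived table in the FROM clause. The sketch below uses Python with SQLite only to demonstrate that the two forms return the same rows; the tables `tweets` and `verified_users` and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tweets (user_id INTEGER, text TEXT);
    CREATE TABLE verified_users (user_id INTEGER);
    INSERT INTO tweets VALUES (1, 'a'), (2, 'b'), (3, 'c'), (1, 'd');
    INSERT INTO verified_users VALUES (1), (3), (3);
""")

# Original form: a non-correlated subquery in the WHERE clause.
with_subquery = """
    SELECT t.user_id, t.text FROM tweets t
    WHERE t.user_id IN (SELECT user_id FROM verified_users)
    ORDER BY t.user_id, t.text
"""

# Rewrite: join against a derived table. DISTINCT matters, otherwise the
# duplicate user_id 3 in verified_users would duplicate that user's tweets.
as_join = """
    SELECT t.user_id, t.text FROM tweets t
    JOIN (SELECT DISTINCT user_id FROM verified_users) v
      ON t.user_id = v.user_id
    ORDER BY t.user_id, t.text
"""

rows_subquery = conn.execute(with_subquery).fetchall()
rows_join = conn.execute(as_join).fetchall()
print(rows_subquery == rows_join, rows_join)
```

Hive also has a LEFT SEMI JOIN construct aimed at this kind of filter; the plain join is used here because SQLite does not support that syntax.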



Re: Front end visualization tool with Hive (when using as a warehouse)

Posted by Bejoy Ks <be...@yahoo.com>.
Hi Sreenath

First of all, don't treat Hive like an RDBMS while designing your solution. It is an awesome tool when it comes to processing huge volumes of data in non-real-time mode. If any of your use cases involves 'updates' on rows, that is not supported in Hive. A workaround for updates is pretty expensive as well (you can implement it by overwriting data at the partition level, using the most granular partition possible, but it is still expensive).

By the way, I'm not a DWH guy; maybe others can add their experience on these points.

Regards
Bejoy KS
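
To illustrate why the partition-overwrite workaround is expensive, here is a small Python model of it. This is a sketch, not Hive code: the table is just a dictionary from partition key to rows, and the `(user_id, status)` schema, the dates, and the helper names are hypothetical. The comment shows the general shape of the Hive statement being emulated.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, str]          # (user_id, status), a hypothetical schema
Table = Dict[str, List[Row]]   # partition key (a date string) -> rows


def overwrite_partition(table: Table, partition: str,
                        transform: Callable[[Row], Row]) -> int:
    """Emulate a row 'update' by rewriting one whole partition.

    This mirrors a Hive statement of the general form
      INSERT OVERWRITE TABLE t PARTITION (dt='...')
      SELECT ... FROM t WHERE dt='...'
    Every row in the partition is read and written again, even if only
    one row changes. Returns the number of rows rewritten.
    """
    rewritten = [transform(row) for row in table[partition]]
    table[partition] = rewritten   # replace the partition wholesale
    return len(rewritten)


table: Table = {
    "2012-06-03": [(1, "active"), (2, "active")],
    "2012-06-04": [(3, "active"), (4, "active"), (5, "active")],
}


def deactivate_user_4(row: Row) -> Row:
    """Change the status of a single user; leave all other rows as they are."""
    user_id, status = row
    return (user_id, "inactive") if user_id == 4 else row


cost = overwrite_partition(table, "2012-06-04", deactivate_user_4)
print(cost, table["2012-06-04"])
```

Changing one row costs a rewrite of all three rows in its partition, while the other partition is untouched. That is why finer-grained partitions keep the workaround cheaper, and why it still does not scale to frequent updates.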


Re: Front end visualization tool with Hive (when using as a warehouse)

Posted by Sreenath Menon <sr...@gmail.com>.
Hi Bejoy

I am not looking for just a UI for queries (even though at first, when
working on Twitter data, that was my interest). But now I am planning
on using Hive as a warehouse with a front-end in-memory processing engine.
MicroStrategy or Tableau would be a good choice.

Now, further refining the problem, I would ask what the warehousing power
of Hive is when compared to a traditional warehouse. Can Hive perform all
operations performed/required in a warehouse? If not, where are the
shortcomings which I need to deal with?

Always thankful for your apt assistance.

Re: Front end visualization tool with Hive (when using as a warehouse)

Posted by Bejoy Ks <be...@yahoo.com>.
Hi Sreenath

     If you are looking for a UI for queries, then Cloudera's Hue is the best choice. There are also ODBC connectors that integrate BI tools like MicroStrategy, Tableau etc. with Hive.

Regards
Bejoy KS

