Posted to user@hive.apache.org by Umesh Prasad <um...@flipkart.com> on 2016/07/21 02:34:29 UTC

Question: Is there an Automated Hive Database/Schema Designer?

Hi All,
   Does Hive have an automated database designer, or has anyone tried
building one? I am looking for something equivalent to Vertica's Database
Designer (DBD) and Microsoft SQL Server's Automated Partitioning Design in
Parallel Databases.

References:
1. Automated Partitioning Design in Parallel Database Systems
   https://cs.brown.edu/courses/cs227/archives/2012/papers/partitioning/p1137-nehme.pdf

2. DBDesigner: A Customizable Physical Design Tool for Vertica Analytic Database
   http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6816725

Hive tuning guides recommend pre-sorting tables on filter columns (for better
predicate pushdown and joins), partitioning/clustering on join and group-by
columns, using a higher replication factor for dimension tables, and so on.
However, I couldn't find any tool or library that suggests a physical layout
for a given set of Hive queries.
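As a rough illustration of the kind of tool being asked about, the sketch below applies the tuning tips above greedily: for each table, it suggests sorting on the most heavily used filter column and bucketing on the most heavily used join/group-by column. This is a hypothetical toy in Python, not Vertica's DBD or the algorithm from either referenced paper. `QueryProfile` and `suggest_layout` are made-up names, and the sketch assumes each query has already been summarized into per-table column usage (it does not parse HiveQL). Its suggestions would correspond to Hive's `CLUSTERED BY (...) SORTED BY (...) INTO N BUCKETS` clause.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class QueryProfile:
    """Hypothetical summary of how one query uses the columns of one table."""
    table: str
    filter_cols: list[str] = field(default_factory=list)
    join_cols: list[str] = field(default_factory=list)
    group_cols: list[str] = field(default_factory=list)
    weight: int = 1  # assumed: how often the query runs, e.g. per day


def suggest_layout(workload):
    """Greedy heuristic: per table, sort on the most heavily used filter column
    and bucket on the most heavily used join/group-by column."""
    filter_use = defaultdict(Counter)
    key_use = defaultdict(Counter)
    for q in workload:
        for col in q.filter_cols:
            filter_use[q.table][col] += q.weight
        for col in q.join_cols + q.group_cols:
            key_use[q.table][col] += q.weight

    def top(counter):
        # Most heavily weighted column, or None if the table has no such usage.
        return counter.most_common(1)[0][0] if counter else None

    return {
        table: {"sort_by": top(filter_use[table]), "bucket_by": top(key_use[table])}
        for table in set(filter_use) | set(key_use)
    }


# Hypothetical example workload.
workload = [
    QueryProfile("orders", filter_cols=["order_date"], join_cols=["customer_id"], weight=10),
    QueryProfile("orders", filter_cols=["status"], group_cols=["seller_id"], weight=3),
    QueryProfile("customers", join_cols=["customer_id"], weight=10),
]
for table, spec in sorted(suggest_layout(workload).items()):
    print(table, spec)
```

Because it picks each column independently and ignores data size, skew, and bucket counts, this only shows the shape of the problem; a real designer would need a cost model over the whole workload.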

Manually designing the physical layout doesn't scale, especially when the
producers and consumers of tables (data) are multiple different teams.
Different queries have conflicting optimization requirements, and a globally
optimal design can be very different from a locally optimal one.
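A tiny worked example of the local-vs-global conflict: two hypothetical queries on the same table, costed under three candidate layouts. Each query's locally best layout is different, and the layout that minimizes total weighted cost is neither of them. The cost numbers, weights, and layout names are made-up assumptions for illustration, not measurements.

```python
# Made-up relative costs (lower is better) of each query under each candidate
# physical layout of one table. Illustrative numbers only, not measurements.
costs = {
    "q1": {"bucket(customer_id)": 1.0, "bucket(seller_id)": 6.0, "partition(order_date)": 2.0},
    "q2": {"bucket(customer_id)": 6.0, "bucket(seller_id)": 1.0, "partition(order_date)": 2.0},
}
# How often each query runs (hypothetical).
weights = {"q1": 1, "q2": 1}


def local_optima(costs):
    """Best layout for each query considered on its own."""
    return {q: min(per_layout, key=per_layout.get) for q, per_layout in costs.items()}


def global_optimum(costs, weights):
    """Single layout minimizing total weighted cost over the whole workload.
    Assumes every query is costed against the same set of candidate layouts."""
    candidates = next(iter(costs.values())).keys()
    return min(candidates, key=lambda layout: sum(weights[q] * costs[q][layout] for q in costs))


print(local_optima(costs))             # {'q1': 'bucket(customer_id)', 'q2': 'bucket(seller_id)'}
print(global_optimum(costs, weights))  # partition(order_date)
```

With equal weights the totals are 7, 7, and 4, so the compromise layout wins; if q1 ran far more often than q2, the global choice would shift back toward q1's local optimum.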

If anyone in the community has worked on this or can give pointers, it would
be extremely useful for us.


Thanks & Regards
Umesh Prasad

Team Lead, Flipkart

Re: Question: Is there an Automated Hive Database/Schema Designer?

Posted by Umesh Prasad <um...@flipkart.com>.
Reposting...

Thanks & Regards
Umesh Prasad

