You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@hive.apache.org by Raj Hadoop <ha...@yahoo.com> on 2013/12/28 18:56:59 UTC

Dynamic columns in Hive Table - Best Design for the problem

Dear All Hive Group Members,

I have the following requirement.

Input:

Ticket#|Date of booking|Price
100|20-Oct-13|54

100|21-Oct-13|56
100|22-Oct-13|54
100|23-Oct-13|55
100|27-Oct-13|60
100|30-Oct-13|47

101|10-Sep-13|12
101|13-Sep-13|14
101|20-Oct-13|6


Expected Output:

Ticket#|Initial|Delta1|Delta2|Delta3|Delta4|Delta5
100|20-Oct-13,54|21-Oct-13,2|22-Oct-13,0|23-Oct-3,1|27-Oct-13,6|30-Oct-13,-7
101|10-Sep-13,12|13-Sep-13,2|20-Oct-13,-6|||

The number of columns in the expected output is a dynamic list depending on the number of price changes of a ticket.

1) What is the best design to solve the above problem in Hive? 
2) How do we implement it?

Please advise.

Regards,
Raj

Re: Dynamic columns in Hive Table - Best Design for the problem

Posted by Edward Capriolo <ed...@gmail.com>.

Basically when you have data like this, it is best to treat the all the
columns as a single string and write a tool to break the entire row apart.
You could use a UDF or a UDTF actually. Look at something like parseUrl...

select myRow(row) as id string, events List<String> ....

A UDTF allows you to produce columns and or rows.

The other way is you write a UDF that returns a struct.


On Sun, Dec 29, 2013 at 10:17 AM, Raj Hadoop <ha...@yahoo.com> wrote:

> Matt,
>
> Thanks for the suggestion. Can you please provide more details on what
> type of UDAF should I develop ? I have never worked on a UDAF earlier. But
> would like to explore it. Any tips on how to proceed.
>
> Thanks,
> Raj
>
>
>   On Saturday, December 28, 2013 2:47 PM, Matt Tucker <
> matthewtckr@gmail.com> wrote:
>  It looks like you're essentially doing a pivot function. Your best bet
> is to write a custom UDAF or look at the windowing functions available in
> recent releases.
> Matt
> On Dec 28, 2013 12:57 PM, "Raj Hadoop" <ha...@yahoo.com> wrote:
>
> Dear All Hive Group Members,
>
> I have the following requirement.
>
> Input:
>
> Ticket#|Date of booking|Price
> 100|20-Oct-13|54
> 100|21-Oct-13|56
> 100|22-Oct-13|54
> 100|23-Oct-13|55
> 100|27-Oct-13|60
> 100|30-Oct-13|47
>
> 101|10-Sep-13|12
> 101|13-Sep-13|14
> 101|20-Oct-13|6
>
>
> Expected Output:
>
> Ticket#|Initial|Delta1|Delta2|Delta3|Delta4|Delta5
> 100|20-Oct-13,54|
> 21-Oct-13,2|22-Oct-13,0|23-Oct-3,1|27-Oct-13,6|30-Oct-13,-7
> 101|10-Sep-13,12|13-Sep-13,2|20-Oct-13,-6|||
>
> The number of columns in the expected output is a dynamic list depending
> on the number of price changes of a ticket.
>
> 1) What is the best design to solve the above problem in Hive?
> 2) How do we implement it?
>
> Please advise.
>
> Regards,
> Raj
>
>
>
>
>
>
>
>

Re: Dynamic columns in Hive Table - Best Design for the problem

Posted by Raj Hadoop <ha...@yahoo.com>.

Matt,

Thanks for the suggestion. Can you please provide more details on what type of UDAF should I develop ? I have never worked on a UDAF earlier. But would like to explore it. Any tips on how to proceed.

Thanks,
Raj

On Saturday, December 28, 2013 2:47 PM, Matt Tucker <ma...@gmail.com> wrote:

It looks like you're essentially doing a pivot function. Your best bet is to write a custom UDAF or look at the windowing functions available in recent releases.
Matt
On Dec 28, 2013 12:57 PM, "Raj Hadoop" <ha...@yahoo.com> wrote:

Dear All Hive Group Members,
>
>
>I have the following requirement.
>
>
>Input:
>
>
>Ticket#|Date of booking|Price
>100|20-Oct-13|54
>
>100|21-Oct-13|56
>100|22-Oct-13|54
>100|23-Oct-13|55
>100|27-Oct-13|60
>100|30-Oct-13|47
>
>
>101|10-Sep-13|12
>101|13-Sep-13|14
>101|20-Oct-13|6
>
>
>
>
>Expected Output:
>
>
>Ticket#|Initial|Delta1|Delta2|Delta3|Delta4|Delta5
>100|20-Oct-13,54|21-Oct-13,2|22-Oct-13,0|23-Oct-3,1|27-Oct-13,6|30-Oct-13,-7
>101|10-Sep-13,12|13-Sep-13,2|20-Oct-13,-6|||
>
>
>The number of columns in the expected output is a dynamic list depending on the number of price changes of a ticket.
>
>
>1) What is the best design to solve the above problem in Hive? 
>2) How do we implement it?
>
>
>Please advise.
>
>
>Regards,
>Raj
>
>
>
>
>
>
>
>
>
>

Re: Dynamic columns in Hive Table - Best Design for the problem

Posted by Matt Tucker <ma...@gmail.com>.

It looks like you're essentially doing a pivot function. Your best bet is
to write a custom UDAF or look at the windowing functions available in
recent releases.

Matt
On Dec 28, 2013 12:57 PM, "Raj Hadoop" <ha...@yahoo.com> wrote:

> Dear All Hive Group Members,
>
> I have the following requirement.
>
> Input:
>
> Ticket#|Date of booking|Price
> 100|20-Oct-13|54
> 100|21-Oct-13|56
> 100|22-Oct-13|54
> 100|23-Oct-13|55
> 100|27-Oct-13|60
> 100|30-Oct-13|47
>
> 101|10-Sep-13|12
> 101|13-Sep-13|14
> 101|20-Oct-13|6
>
>
> Expected Output:
>
> Ticket#|Initial|Delta1|Delta2|Delta3|Delta4|Delta5
> 100|20-Oct-13,54|
> 21-Oct-13,2|22-Oct-13,0|23-Oct-3,1|27-Oct-13,6|30-Oct-13,-7
> 101|10-Sep-13,12|13-Sep-13,2|20-Oct-13,-6|||
>
> The number of columns in the expected output is a dynamic list depending
> on the number of price changes of a ticket.
>
> 1) What is the best design to solve the above problem in Hive?
> 2) How do we implement it?
>
> Please advise.
>
> Regards,
> Raj
>
>
>
>
>
>