You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by ri...@nokia.com on 2012/08/07 22:57:59 UTC

Converting rows into dynamic colums in Hive

Hi All,

One of my Query output looks like-

Alpha                    Beta                       Gamma
123                         xyz                         1.0
123                         abc                         0.5
123                         pqr                         1.3
123                         def                         2.1
456                         xyz                         0.1
456                         abc                         0.6
456                         pqr                         1.9
456                         def                         3.2
456                         lmn                        1.1
456                         sdf                          1.2

I want the output for the data visualization purpose to look like (basically taking rows from one table and making it column in another table)

Alpha                    xyz         abc         pqr         def         lmn        sdf
123                         1.0          0.5          1.3          2.1
456                         0.1          0.6          1.9          3.2          1.1          1.2

Question - Can it be done in Hive? If not, any suggestions.

Thanks,
Richin



RE: Converting rows into dynamic colums in Hive

Posted by "Raghunath, Ranjith" <Ra...@usaa.com>.
You can do this using case statements and summing the values. The only item to remember here is that the values in the grid need to be numeric for the sum to work.

Thanks,
Ranjith

From: richin.jain@nokia.com [mailto:richin.jain@nokia.com]
Sent: Tuesday, August 07, 2012 3:58 PM
To: user@hive.apache.org
Subject: Converting rows into dynamic colums in Hive

Hi All,

One of my Query output looks like-

Alpha                    Beta                       Gamma
123                         xyz                         1.0
123                         abc                         0.5
123                         pqr                         1.3
123                         def                         2.1
456                         xyz                         0.1
456                         abc                         0.6
456                         pqr                         1.9
456                         def                         3.2
456                         lmn                        1.1
456                         sdf                          1.2

I want the output for the data visualization purpose to look like (basically taking rows from one table and making it column in another table)

Alpha                    xyz         abc         pqr         def         lmn        sdf
123                         1.0          0.5          1.3          2.1
456                         0.1          0.6          1.9          3.2          1.1          1.2

Question - Can it be done in Hive? If not, any suggestions.

Thanks,
Richin



RE: Converting rows into dynamic colums in Hive

Posted by ri...@nokia.com.
You could do it using Pivot table in MS Excel. It's under the Insert tab, first option on the left.

Richin

-----Original Message-----
From: Jain Richin (Nokia-LC/Boston) 
Sent: Thursday, August 09, 2012 4:16 PM
To: user@hive.apache.org
Subject: RE: Converting rows into dynamic colums in Hive

Thanks John. 
Is there a way to do this is excel since I am outputting the table data in csv format (using macros or something else)?

Richin

-----Original Message-----
From: ext John Meagher [mailto:john.meagher@gmail.com]
Sent: Thursday, August 09, 2012 10:11 AM
To: user@hive.apache.org
Subject: Re: Converting rows into dynamic colums in Hive

R is another programming language geared to statistical analysis.  It has some features that make it easy to write this sort of data manipulation.  http://www.r-project.org/

Here's a sample R function that does something similar to what you need (note: I'm not a real R programmer, so this might be ugly, but it
works):

abCountsToMatrix <- function(csvFilename, outputmatrixfile){

   data <- read.csv(csvFilename, as.is=T, check.names=F, header=F)
   cat("Loaded", nrow(data), "rows.\n")
   idrows <- sort(unique(c(data[ , 1])))
   idcols <- sort(unique(c(data[ , 2])))

   cat("Found", length(idrows), " X ", length(idcols), ".")
   output <- array(0, c(length(idrows), length(idcols)))
   rownames(output) <- idrows
   colnames(output) <- idcols

   for(i in 1:nrow(data)){

     if (i %% 100 == 0) {
       cat("On ", i, " of ", nrow(data), "\n")
     }

     rowidx <- which(idrows == data[i, 1])
     colidx <- which(idcols == data[i, 2])

     count <- data[i,3]

     output[rowidx, colidx] <- count
   }


   write.csv(output, file=outputmatrixfile, quote=F)

}



On Wed, Aug 8, 2012 at 8:58 PM,  <ri...@nokia.com> wrote:
> John,
>
> What is R?
>
> -----Original Message-----
> From: ext John Meagher [mailto:john.meagher@gmail.com]
> Sent: Wednesday, August 08, 2012 4:34 PM
> To: user@hive.apache.org
> Subject: Re: Converting rows into dynamic colums in Hive
>
> I don't think having dynamic columns is possible in Hive.  I've always output from Hive a structure like your query output and used R to convert it into a dynamic column structure.
>
>
> On Wed, Aug 8, 2012 at 3:56 PM,  <ri...@nokia.com> wrote:
>> Thanks Ashish, that gives an idea.
>>
>> But I am not sure about the outer select loop, I have to know all the 
>> values in Beta column beforehand to do a max on each value.
>>
>> Is there a better way?
>>
>>
>>
>> Richin
>>
>>
>>
>> From: ext Ashish Thusoo [mailto:athusoo@qubole.com]
>> Sent: Tuesday, August 07, 2012 5:05 PM
>> To: user@hive.apache.org
>> Subject: Re: Converting rows into dynamic colums in Hive
>>
>>
>>
>> you should be able to do this in hive using a group by on alpha and 
>> then using a combination of the max and if statement... something on 
>> the following lines
>>
>> select alpha, max(abc), max(pqr), ...
>> (
>>   select alpha, if (beta == 'abc', Gamma, NULL) as abc, if (beta == 
>> 'pqr', Gamma, NUL) as pqr, ....
>>   from table
>> )
>> group by alpha
>>
>> something on those lines...
>>
>> Ashish
>>
>> On Tue, Aug 7, 2012 at 1:57 PM, <ri...@nokia.com> wrote:
>>
>> Hi All,
>>
>>
>>
>> One of my Query output looks like-
>>
>>
>>
>> Alpha                    Beta                       Gamma
>>
>> 123                         xyz                         1.0
>>
>> 123                         abc                         0.5
>>
>> 123                         pqr                         1.3
>>
>> 123                         def                         2.1
>>
>> 456                         xyz                         0.1
>>
>> 456                         abc                         0.6
>>
>> 456                         pqr                         1.9
>>
>> 456                         def                         3.2
>>
>> 456                         lmn                        1.1
>>
>> 456                         sdf                          1.2
>>
>>
>>
>> I want the output for the data visualization purpose to look like 
>> (basically taking rows from one table and making it column in another
>> table)
>>
>>
>>
>> Alpha                    xyz         abc         pqr         def         lmn
>> sdf
>>
>> 123                         1.0          0.5          1.3          2.1
>>
>> 456                         0.1          0.6          1.9          3.2
>> 1.1          1.2
>>
>>
>>
>> Question - Can it be done in Hive? If not, any suggestions.
>>
>>
>>
>> Thanks,
>>
>> Richin
>>
>>
>>
>>
>>
>>

RE: Converting rows into dynamic colums in Hive

Posted by ri...@nokia.com.
Thanks John. 
Is there a way to do this is excel since I am outputting the table data in csv format (using macros or something else)?

Richin

-----Original Message-----
From: ext John Meagher [mailto:john.meagher@gmail.com] 
Sent: Thursday, August 09, 2012 10:11 AM
To: user@hive.apache.org
Subject: Re: Converting rows into dynamic colums in Hive

R is another programming language geared to statistical analysis.  It has some features that make it easy to write this sort of data manipulation.  http://www.r-project.org/

Here's a sample R function that does something similar to what you need (note: I'm not a real R programmer, so this might be ugly, but it
works):

abCountsToMatrix <- function(csvFilename, outputmatrixfile){

   data <- read.csv(csvFilename, as.is=T, check.names=F, header=F)
   cat("Loaded", nrow(data), "rows.\n")
   idrows <- sort(unique(c(data[ , 1])))
   idcols <- sort(unique(c(data[ , 2])))

   cat("Found", length(idrows), " X ", length(idcols), ".")
   output <- array(0, c(length(idrows), length(idcols)))
   rownames(output) <- idrows
   colnames(output) <- idcols

   for(i in 1:nrow(data)){

     if (i %% 100 == 0) {
       cat("On ", i, " of ", nrow(data), "\n")
     }

     rowidx <- which(idrows == data[i, 1])
     colidx <- which(idcols == data[i, 2])

     count <- data[i,3]

     output[rowidx, colidx] <- count
   }


   write.csv(output, file=outputmatrixfile, quote=F)

}



On Wed, Aug 8, 2012 at 8:58 PM,  <ri...@nokia.com> wrote:
> John,
>
> What is R?
>
> -----Original Message-----
> From: ext John Meagher [mailto:john.meagher@gmail.com]
> Sent: Wednesday, August 08, 2012 4:34 PM
> To: user@hive.apache.org
> Subject: Re: Converting rows into dynamic colums in Hive
>
> I don't think having dynamic columns is possible in Hive.  I've always output from Hive a structure like your query output and used R to convert it into a dynamic column structure.
>
>
> On Wed, Aug 8, 2012 at 3:56 PM,  <ri...@nokia.com> wrote:
>> Thanks Ashish, that gives an idea.
>>
>> But I am not sure about the outer select loop, I have to know all the 
>> values in Beta column beforehand to do a max on each value.
>>
>> Is there a better way?
>>
>>
>>
>> Richin
>>
>>
>>
>> From: ext Ashish Thusoo [mailto:athusoo@qubole.com]
>> Sent: Tuesday, August 07, 2012 5:05 PM
>> To: user@hive.apache.org
>> Subject: Re: Converting rows into dynamic colums in Hive
>>
>>
>>
>> you should be able to do this in hive using a group by on alpha and 
>> then using a combination of the max and if statement... something on 
>> the following lines
>>
>> select alpha, max(abc), max(pqr), ...
>> (
>>   select alpha, if (beta == 'abc', Gamma, NULL) as abc, if (beta == 
>> 'pqr', Gamma, NUL) as pqr, ....
>>   from table
>> )
>> group by alpha
>>
>> something on those lines...
>>
>> Ashish
>>
>> On Tue, Aug 7, 2012 at 1:57 PM, <ri...@nokia.com> wrote:
>>
>> Hi All,
>>
>>
>>
>> One of my Query output looks like-
>>
>>
>>
>> Alpha                    Beta                       Gamma
>>
>> 123                         xyz                         1.0
>>
>> 123                         abc                         0.5
>>
>> 123                         pqr                         1.3
>>
>> 123                         def                         2.1
>>
>> 456                         xyz                         0.1
>>
>> 456                         abc                         0.6
>>
>> 456                         pqr                         1.9
>>
>> 456                         def                         3.2
>>
>> 456                         lmn                        1.1
>>
>> 456                         sdf                          1.2
>>
>>
>>
>> I want the output for the data visualization purpose to look like 
>> (basically taking rows from one table and making it column in another
>> table)
>>
>>
>>
>> Alpha                    xyz         abc         pqr         def         lmn
>> sdf
>>
>> 123                         1.0          0.5          1.3          2.1
>>
>> 456                         0.1          0.6          1.9          3.2
>> 1.1          1.2
>>
>>
>>
>> Question - Can it be done in Hive? If not, any suggestions.
>>
>>
>>
>> Thanks,
>>
>> Richin
>>
>>
>>
>>
>>
>>

Re: Converting rows into dynamic colums in Hive

Posted by John Meagher <jo...@gmail.com>.
R is another programming language geared to statistical analysis.  It
has some features that make it easy to write this sort of data
manipulation.  http://www.r-project.org/

Here's a sample R function that does something similar to what you
need (note: I'm not a real R programmer, so this might be ugly, but it
works):

abCountsToMatrix <- function(csvFilename, outputmatrixfile){

   data <- read.csv(csvFilename, as.is=T, check.names=F, header=F)
   cat("Loaded", nrow(data), "rows.\n")
   idrows <- sort(unique(c(data[ , 1])))
   idcols <- sort(unique(c(data[ , 2])))

   cat("Found", length(idrows), " X ", length(idcols), ".")
   output <- array(0, c(length(idrows), length(idcols)))
   rownames(output) <- idrows
   colnames(output) <- idcols

   for(i in 1:nrow(data)){

     if (i %% 100 == 0) {
       cat("On ", i, " of ", nrow(data), "\n")
     }

     rowidx <- which(idrows == data[i, 1])
     colidx <- which(idcols == data[i, 2])

     count <- data[i,3]

     output[rowidx, colidx] <- count
   }


   write.csv(output, file=outputmatrixfile, quote=F)

}



On Wed, Aug 8, 2012 at 8:58 PM,  <ri...@nokia.com> wrote:
> John,
>
> What is R?
>
> -----Original Message-----
> From: ext John Meagher [mailto:john.meagher@gmail.com]
> Sent: Wednesday, August 08, 2012 4:34 PM
> To: user@hive.apache.org
> Subject: Re: Converting rows into dynamic colums in Hive
>
> I don't think having dynamic columns is possible in Hive.  I've always output from Hive a structure like your query output and used R to convert it into a dynamic column structure.
>
>
> On Wed, Aug 8, 2012 at 3:56 PM,  <ri...@nokia.com> wrote:
>> Thanks Ashish, that gives an idea.
>>
>> But I am not sure about the outer select loop, I have to know all the
>> values in Beta column beforehand to do a max on each value.
>>
>> Is there a better way?
>>
>>
>>
>> Richin
>>
>>
>>
>> From: ext Ashish Thusoo [mailto:athusoo@qubole.com]
>> Sent: Tuesday, August 07, 2012 5:05 PM
>> To: user@hive.apache.org
>> Subject: Re: Converting rows into dynamic colums in Hive
>>
>>
>>
>> you should be able to do this in hive using a group by on alpha and
>> then using a combination of the max and if statement... something on
>> the following lines
>>
>> select alpha, max(abc), max(pqr), ...
>> (
>>   select alpha, if (beta == 'abc', Gamma, NULL) as abc, if (beta ==
>> 'pqr', Gamma, NUL) as pqr, ....
>>   from table
>> )
>> group by alpha
>>
>> something on those lines...
>>
>> Ashish
>>
>> On Tue, Aug 7, 2012 at 1:57 PM, <ri...@nokia.com> wrote:
>>
>> Hi All,
>>
>>
>>
>> One of my Query output looks like-
>>
>>
>>
>> Alpha                    Beta                       Gamma
>>
>> 123                         xyz                         1.0
>>
>> 123                         abc                         0.5
>>
>> 123                         pqr                         1.3
>>
>> 123                         def                         2.1
>>
>> 456                         xyz                         0.1
>>
>> 456                         abc                         0.6
>>
>> 456                         pqr                         1.9
>>
>> 456                         def                         3.2
>>
>> 456                         lmn                        1.1
>>
>> 456                         sdf                          1.2
>>
>>
>>
>> I want the output for the data visualization purpose to look like
>> (basically taking rows from one table and making it column in another
>> table)
>>
>>
>>
>> Alpha                    xyz         abc         pqr         def         lmn
>> sdf
>>
>> 123                         1.0          0.5          1.3          2.1
>>
>> 456                         0.1          0.6          1.9          3.2
>> 1.1          1.2
>>
>>
>>
>> Question - Can it be done in Hive? If not, any suggestions.
>>
>>
>>
>> Thanks,
>>
>> Richin
>>
>>
>>
>>
>>
>>

RE: Converting rows into dynamic colums in Hive

Posted by ri...@nokia.com.
John,

What is R?

-----Original Message-----
From: ext John Meagher [mailto:john.meagher@gmail.com] 
Sent: Wednesday, August 08, 2012 4:34 PM
To: user@hive.apache.org
Subject: Re: Converting rows into dynamic colums in Hive

I don't think having dynamic columns is possible in Hive.  I've always output from Hive a structure like your query output and used R to convert it into a dynamic column structure.


On Wed, Aug 8, 2012 at 3:56 PM,  <ri...@nokia.com> wrote:
> Thanks Ashish, that gives an idea.
>
> But I am not sure about the outer select loop, I have to know all the 
> values in Beta column beforehand to do a max on each value.
>
> Is there a better way?
>
>
>
> Richin
>
>
>
> From: ext Ashish Thusoo [mailto:athusoo@qubole.com]
> Sent: Tuesday, August 07, 2012 5:05 PM
> To: user@hive.apache.org
> Subject: Re: Converting rows into dynamic colums in Hive
>
>
>
> you should be able to do this in hive using a group by on alpha and 
> then using a combination of the max and if statement... something on 
> the following lines
>
> select alpha, max(abc), max(pqr), ...
> (
>   select alpha, if (beta == 'abc', Gamma, NULL) as abc, if (beta == 
> 'pqr', Gamma, NUL) as pqr, ....
>   from table
> )
> group by alpha
>
> something on those lines...
>
> Ashish
>
> On Tue, Aug 7, 2012 at 1:57 PM, <ri...@nokia.com> wrote:
>
> Hi All,
>
>
>
> One of my Query output looks like-
>
>
>
> Alpha                    Beta                       Gamma
>
> 123                         xyz                         1.0
>
> 123                         abc                         0.5
>
> 123                         pqr                         1.3
>
> 123                         def                         2.1
>
> 456                         xyz                         0.1
>
> 456                         abc                         0.6
>
> 456                         pqr                         1.9
>
> 456                         def                         3.2
>
> 456                         lmn                        1.1
>
> 456                         sdf                          1.2
>
>
>
> I want the output for the data visualization purpose to look like 
> (basically taking rows from one table and making it column in another 
> table)
>
>
>
> Alpha                    xyz         abc         pqr         def         lmn
> sdf
>
> 123                         1.0          0.5          1.3          2.1
>
> 456                         0.1          0.6          1.9          3.2
> 1.1          1.2
>
>
>
> Question - Can it be done in Hive? If not, any suggestions.
>
>
>
> Thanks,
>
> Richin
>
>
>
>
>
>

Re: Converting rows into dynamic colums in Hive

Posted by John Meagher <jo...@gmail.com>.
I don't think having dynamic columns is possible in Hive.  I've always
output from Hive a structure like your query output and used R to
convert it into a dynamic column structure.


On Wed, Aug 8, 2012 at 3:56 PM,  <ri...@nokia.com> wrote:
> Thanks Ashish, that gives an idea.
>
> But I am not sure about the outer select loop, I have to know all the values
> in Beta column beforehand to do a max on each value.
>
> Is there a better way?
>
>
>
> Richin
>
>
>
> From: ext Ashish Thusoo [mailto:athusoo@qubole.com]
> Sent: Tuesday, August 07, 2012 5:05 PM
> To: user@hive.apache.org
> Subject: Re: Converting rows into dynamic colums in Hive
>
>
>
> you should be able to do this in hive using a group by on alpha and then
> using a combination of the max and if statement... something on the
> following lines
>
> select alpha, max(abc), max(pqr), ...
> (
>   select alpha, if (beta == 'abc', Gamma, NULL) as abc, if (beta == 'pqr',
> Gamma, NUL) as pqr, ....
>   from table
> )
> group by alpha
>
> something on those lines...
>
> Ashish
>
> On Tue, Aug 7, 2012 at 1:57 PM, <ri...@nokia.com> wrote:
>
> Hi All,
>
>
>
> One of my Query output looks like-
>
>
>
> Alpha                    Beta                       Gamma
>
> 123                         xyz                         1.0
>
> 123                         abc                         0.5
>
> 123                         pqr                         1.3
>
> 123                         def                         2.1
>
> 456                         xyz                         0.1
>
> 456                         abc                         0.6
>
> 456                         pqr                         1.9
>
> 456                         def                         3.2
>
> 456                         lmn                        1.1
>
> 456                         sdf                          1.2
>
>
>
> I want the output for the data visualization purpose to look like (basically
> taking rows from one table and making it column in another table)
>
>
>
> Alpha                    xyz         abc         pqr         def         lmn
> sdf
>
> 123                         1.0          0.5          1.3          2.1
>
> 456                         0.1          0.6          1.9          3.2
> 1.1          1.2
>
>
>
> Question – Can it be done in Hive? If not, any suggestions.
>
>
>
> Thanks,
>
> Richin
>
>
>
>
>
>

RE: Converting rows into dynamic colums in Hive

Posted by ri...@nokia.com.
Thanks Ashish, that gives an idea.
But I am not sure about the outer select loop, I have to know all the values in Beta column beforehand to do a max on each value.
Is there a better way?

Richin

From: ext Ashish Thusoo [mailto:athusoo@qubole.com]
Sent: Tuesday, August 07, 2012 5:05 PM
To: user@hive.apache.org
Subject: Re: Converting rows into dynamic colums in Hive

you should be able to do this in hive using a group by on alpha and then using a combination of the max and if statement... something on the following lines

select alpha, max(abc), max(pqr), ...
(
  select alpha, if (beta == 'abc', Gamma, NULL) as abc, if (beta == 'pqr', Gamma, NUL) as pqr, ....
  from table
)
group by alpha

something on those lines...

Ashish
On Tue, Aug 7, 2012 at 1:57 PM, <ri...@nokia.com>> wrote:
Hi All,

One of my Query output looks like-

Alpha                    Beta                       Gamma
123                         xyz                         1.0
123                         abc                         0.5
123                         pqr                         1.3
123                         def                         2.1
456                         xyz                         0.1
456                         abc                         0.6
456                         pqr                         1.9
456                         def                         3.2
456                         lmn                        1.1
456                         sdf                          1.2

I want the output for the data visualization purpose to look like (basically taking rows from one table and making it column in another table)

Alpha                    xyz         abc         pqr         def         lmn        sdf
123                         1.0          0.5          1.3          2.1
456                         0.1          0.6          1.9          3.2          1.1          1.2

Question - Can it be done in Hive? If not, any suggestions.

Thanks,
Richin




Re: Converting rows into dynamic colums in Hive

Posted by Ashish Thusoo <at...@qubole.com>.
you should be able to do this in hive using a group by on alpha and then
using a combination of the max and if statement... something on the
following lines

select alpha, max(abc), max(pqr), ...
(
  select alpha, if (beta == 'abc', Gamma, NULL) as abc, if (beta == 'pqr',
Gamma, NUL) as pqr, ....
  from table
)
group by alpha

something on those lines...

Ashish

On Tue, Aug 7, 2012 at 1:57 PM, <ri...@nokia.com> wrote:

>  Hi All,****
>
> ** **
>
> One of my Query output looks like-****
>
> ** **
>
> *Alpha                    Beta                       Gamma*
>
> 123                         xyz                         1.0****
>
> 123                         abc                         0.5****
>
> 123                         pqr                         1.3****
>
> 123                         def                         2.1****
>
> *456                         xyz                         0.1*
>
> *456                         abc                         0.6*
>
> *456                         pqr                         1.9*
>
> *456                         def                         3.2*
>
> *456                         lmn                        1.1*
>
> *456                         sdf                          1.2*
>
> ** **
>
> I want the output for the data visualization purpose to look like
> (basically taking rows from one table and making it column in another table)
> ****
>
> ** **
>
> *Alpha                    xyz         abc         pqr         def
> lmn        sdf          *
>
> 123                         1.0          0.5          1.3
> 2.1                                          ****
>
> 456                         0.1          0.6          1.9
> 3.2          1.1          1.2****
>
> ** **
>
> Question – Can it be done in Hive? If not, any suggestions.****
>
> ** **
>
> Thanks,****
>
> Richin****
>
> ** **
>
> ** **
>