You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by ri...@nokia.com on 2012/08/07 22:57:59 UTC
Converting rows into dynamic colums in Hive
Hi All,
One of my Query output looks like-
Alpha Beta Gamma
123 xyz 1.0
123 abc 0.5
123 pqr 1.3
123 def 2.1
456 xyz 0.1
456 abc 0.6
456 pqr 1.9
456 def 3.2
456 lmn 1.1
456 sdf 1.2
I want the output for the data visualization purpose to look like (basically taking rows from one table and making it column in another table)
Alpha xyz abc pqr def lmn sdf
123 1.0 0.5 1.3 2.1
456 0.1 0.6 1.9 3.2 1.1 1.2
Question - Can it be done in Hive? If not, any suggestions.
Thanks,
Richin
RE: Converting rows into dynamic colums in Hive
Posted by "Raghunath, Ranjith" <Ra...@usaa.com>.
You can do this using case statements and summing the values. The only item to remember here is that the values in the grid need to be numeric for the sum to work.
Thanks,
Ranjith
From: richin.jain@nokia.com [mailto:richin.jain@nokia.com]
Sent: Tuesday, August 07, 2012 3:58 PM
To: user@hive.apache.org
Subject: Converting rows into dynamic colums in Hive
Hi All,
One of my Query output looks like-
Alpha Beta Gamma
123 xyz 1.0
123 abc 0.5
123 pqr 1.3
123 def 2.1
456 xyz 0.1
456 abc 0.6
456 pqr 1.9
456 def 3.2
456 lmn 1.1
456 sdf 1.2
I want the output for the data visualization purpose to look like (basically taking rows from one table and making it column in another table)
Alpha xyz abc pqr def lmn sdf
123 1.0 0.5 1.3 2.1
456 0.1 0.6 1.9 3.2 1.1 1.2
Question - Can it be done in Hive? If not, any suggestions.
Thanks,
Richin
RE: Converting rows into dynamic colums in Hive
Posted by ri...@nokia.com.
You could do it using Pivot table in MS Excel. It's under the Insert tab, first option on the left.
Richin
-----Original Message-----
From: Jain Richin (Nokia-LC/Boston)
Sent: Thursday, August 09, 2012 4:16 PM
To: user@hive.apache.org
Subject: RE: Converting rows into dynamic colums in Hive
Thanks John.
Is there a way to do this is excel since I am outputting the table data in csv format (using macros or something else)?
Richin
-----Original Message-----
From: ext John Meagher [mailto:john.meagher@gmail.com]
Sent: Thursday, August 09, 2012 10:11 AM
To: user@hive.apache.org
Subject: Re: Converting rows into dynamic colums in Hive
R is another programming language geared to statistical analysis. It has some features that make it easy to write this sort of data manipulation. http://www.r-project.org/
Here's a sample R function that does something similar to what you need (note: I'm not a real R programmer, so this might be ugly, but it
works):
abCountsToMatrix <- function(csvFilename, outputmatrixfile){
data <- read.csv(csvFilename, as.is=T, check.names=F, header=F)
cat("Loaded", nrow(data), "rows.\n")
idrows <- sort(unique(c(data[ , 1])))
idcols <- sort(unique(c(data[ , 2])))
cat("Found", length(idrows), " X ", length(idcols), ".")
output <- array(0, c(length(idrows), length(idcols)))
rownames(output) <- idrows
colnames(output) <- idcols
for(i in 1:nrow(data)){
if (i %% 100 == 0) {
cat("On ", i, " of ", nrow(data), "\n")
}
rowidx <- which(idrows == data[i, 1])
colidx <- which(idcols == data[i, 2])
count <- data[i,3]
output[rowidx, colidx] <- count
}
write.csv(output, file=outputmatrixfile, quote=F)
}
On Wed, Aug 8, 2012 at 8:58 PM, <ri...@nokia.com> wrote:
> John,
>
> What is R?
>
> -----Original Message-----
> From: ext John Meagher [mailto:john.meagher@gmail.com]
> Sent: Wednesday, August 08, 2012 4:34 PM
> To: user@hive.apache.org
> Subject: Re: Converting rows into dynamic colums in Hive
>
> I don't think having dynamic columns is possible in Hive. I've always output from Hive a structure like your query output and used R to convert it into a dynamic column structure.
>
>
> On Wed, Aug 8, 2012 at 3:56 PM, <ri...@nokia.com> wrote:
>> Thanks Ashish, that gives an idea.
>>
>> But I am not sure about the outer select loop, I have to know all the
>> values in Beta column beforehand to do a max on each value.
>>
>> Is there a better way?
>>
>>
>>
>> Richin
>>
>>
>>
>> From: ext Ashish Thusoo [mailto:athusoo@qubole.com]
>> Sent: Tuesday, August 07, 2012 5:05 PM
>> To: user@hive.apache.org
>> Subject: Re: Converting rows into dynamic colums in Hive
>>
>>
>>
>> you should be able to do this in hive using a group by on alpha and
>> then using a combination of the max and if statement... something on
>> the following lines
>>
>> select alpha, max(abc), max(pqr), ...
>> (
>> select alpha, if (beta == 'abc', Gamma, NULL) as abc, if (beta ==
>> 'pqr', Gamma, NUL) as pqr, ....
>> from table
>> )
>> group by alpha
>>
>> something on those lines...
>>
>> Ashish
>>
>> On Tue, Aug 7, 2012 at 1:57 PM, <ri...@nokia.com> wrote:
>>
>> Hi All,
>>
>>
>>
>> One of my Query output looks like-
>>
>>
>>
>> Alpha Beta Gamma
>>
>> 123 xyz 1.0
>>
>> 123 abc 0.5
>>
>> 123 pqr 1.3
>>
>> 123 def 2.1
>>
>> 456 xyz 0.1
>>
>> 456 abc 0.6
>>
>> 456 pqr 1.9
>>
>> 456 def 3.2
>>
>> 456 lmn 1.1
>>
>> 456 sdf 1.2
>>
>>
>>
>> I want the output for the data visualization purpose to look like
>> (basically taking rows from one table and making it column in another
>> table)
>>
>>
>>
>> Alpha xyz abc pqr def lmn
>> sdf
>>
>> 123 1.0 0.5 1.3 2.1
>>
>> 456 0.1 0.6 1.9 3.2
>> 1.1 1.2
>>
>>
>>
>> Question - Can it be done in Hive? If not, any suggestions.
>>
>>
>>
>> Thanks,
>>
>> Richin
>>
>>
>>
>>
>>
>>
RE: Converting rows into dynamic colums in Hive
Posted by ri...@nokia.com.
Thanks John.
Is there a way to do this is excel since I am outputting the table data in csv format (using macros or something else)?
Richin
-----Original Message-----
From: ext John Meagher [mailto:john.meagher@gmail.com]
Sent: Thursday, August 09, 2012 10:11 AM
To: user@hive.apache.org
Subject: Re: Converting rows into dynamic colums in Hive
R is another programming language geared to statistical analysis. It has some features that make it easy to write this sort of data manipulation. http://www.r-project.org/
Here's a sample R function that does something similar to what you need (note: I'm not a real R programmer, so this might be ugly, but it
works):
abCountsToMatrix <- function(csvFilename, outputmatrixfile){
data <- read.csv(csvFilename, as.is=T, check.names=F, header=F)
cat("Loaded", nrow(data), "rows.\n")
idrows <- sort(unique(c(data[ , 1])))
idcols <- sort(unique(c(data[ , 2])))
cat("Found", length(idrows), " X ", length(idcols), ".")
output <- array(0, c(length(idrows), length(idcols)))
rownames(output) <- idrows
colnames(output) <- idcols
for(i in 1:nrow(data)){
if (i %% 100 == 0) {
cat("On ", i, " of ", nrow(data), "\n")
}
rowidx <- which(idrows == data[i, 1])
colidx <- which(idcols == data[i, 2])
count <- data[i,3]
output[rowidx, colidx] <- count
}
write.csv(output, file=outputmatrixfile, quote=F)
}
On Wed, Aug 8, 2012 at 8:58 PM, <ri...@nokia.com> wrote:
> John,
>
> What is R?
>
> -----Original Message-----
> From: ext John Meagher [mailto:john.meagher@gmail.com]
> Sent: Wednesday, August 08, 2012 4:34 PM
> To: user@hive.apache.org
> Subject: Re: Converting rows into dynamic colums in Hive
>
> I don't think having dynamic columns is possible in Hive. I've always output from Hive a structure like your query output and used R to convert it into a dynamic column structure.
>
>
> On Wed, Aug 8, 2012 at 3:56 PM, <ri...@nokia.com> wrote:
>> Thanks Ashish, that gives an idea.
>>
>> But I am not sure about the outer select loop, I have to know all the
>> values in Beta column beforehand to do a max on each value.
>>
>> Is there a better way?
>>
>>
>>
>> Richin
>>
>>
>>
>> From: ext Ashish Thusoo [mailto:athusoo@qubole.com]
>> Sent: Tuesday, August 07, 2012 5:05 PM
>> To: user@hive.apache.org
>> Subject: Re: Converting rows into dynamic colums in Hive
>>
>>
>>
>> you should be able to do this in hive using a group by on alpha and
>> then using a combination of the max and if statement... something on
>> the following lines
>>
>> select alpha, max(abc), max(pqr), ...
>> (
>> select alpha, if (beta == 'abc', Gamma, NULL) as abc, if (beta ==
>> 'pqr', Gamma, NUL) as pqr, ....
>> from table
>> )
>> group by alpha
>>
>> something on those lines...
>>
>> Ashish
>>
>> On Tue, Aug 7, 2012 at 1:57 PM, <ri...@nokia.com> wrote:
>>
>> Hi All,
>>
>>
>>
>> One of my Query output looks like-
>>
>>
>>
>> Alpha Beta Gamma
>>
>> 123 xyz 1.0
>>
>> 123 abc 0.5
>>
>> 123 pqr 1.3
>>
>> 123 def 2.1
>>
>> 456 xyz 0.1
>>
>> 456 abc 0.6
>>
>> 456 pqr 1.9
>>
>> 456 def 3.2
>>
>> 456 lmn 1.1
>>
>> 456 sdf 1.2
>>
>>
>>
>> I want the output for the data visualization purpose to look like
>> (basically taking rows from one table and making it column in another
>> table)
>>
>>
>>
>> Alpha xyz abc pqr def lmn
>> sdf
>>
>> 123 1.0 0.5 1.3 2.1
>>
>> 456 0.1 0.6 1.9 3.2
>> 1.1 1.2
>>
>>
>>
>> Question - Can it be done in Hive? If not, any suggestions.
>>
>>
>>
>> Thanks,
>>
>> Richin
>>
>>
>>
>>
>>
>>
Re: Converting rows into dynamic colums in Hive
Posted by John Meagher <jo...@gmail.com>.
R is another programming language geared to statistical analysis. It
has some features that make it easy to write this sort of data
manipulation. http://www.r-project.org/
Here's a sample R function that does something similar to what you
need (note: I'm not a real R programmer, so this might be ugly, but it
works):
abCountsToMatrix <- function(csvFilename, outputmatrixfile){
data <- read.csv(csvFilename, as.is=T, check.names=F, header=F)
cat("Loaded", nrow(data), "rows.\n")
idrows <- sort(unique(c(data[ , 1])))
idcols <- sort(unique(c(data[ , 2])))
cat("Found", length(idrows), " X ", length(idcols), ".")
output <- array(0, c(length(idrows), length(idcols)))
rownames(output) <- idrows
colnames(output) <- idcols
for(i in 1:nrow(data)){
if (i %% 100 == 0) {
cat("On ", i, " of ", nrow(data), "\n")
}
rowidx <- which(idrows == data[i, 1])
colidx <- which(idcols == data[i, 2])
count <- data[i,3]
output[rowidx, colidx] <- count
}
write.csv(output, file=outputmatrixfile, quote=F)
}
On Wed, Aug 8, 2012 at 8:58 PM, <ri...@nokia.com> wrote:
> John,
>
> What is R?
>
> -----Original Message-----
> From: ext John Meagher [mailto:john.meagher@gmail.com]
> Sent: Wednesday, August 08, 2012 4:34 PM
> To: user@hive.apache.org
> Subject: Re: Converting rows into dynamic colums in Hive
>
> I don't think having dynamic columns is possible in Hive. I've always output from Hive a structure like your query output and used R to convert it into a dynamic column structure.
>
>
> On Wed, Aug 8, 2012 at 3:56 PM, <ri...@nokia.com> wrote:
>> Thanks Ashish, that gives an idea.
>>
>> But I am not sure about the outer select loop, I have to know all the
>> values in Beta column beforehand to do a max on each value.
>>
>> Is there a better way?
>>
>>
>>
>> Richin
>>
>>
>>
>> From: ext Ashish Thusoo [mailto:athusoo@qubole.com]
>> Sent: Tuesday, August 07, 2012 5:05 PM
>> To: user@hive.apache.org
>> Subject: Re: Converting rows into dynamic colums in Hive
>>
>>
>>
>> you should be able to do this in hive using a group by on alpha and
>> then using a combination of the max and if statement... something on
>> the following lines
>>
>> select alpha, max(abc), max(pqr), ...
>> (
>> select alpha, if (beta == 'abc', Gamma, NULL) as abc, if (beta ==
>> 'pqr', Gamma, NUL) as pqr, ....
>> from table
>> )
>> group by alpha
>>
>> something on those lines...
>>
>> Ashish
>>
>> On Tue, Aug 7, 2012 at 1:57 PM, <ri...@nokia.com> wrote:
>>
>> Hi All,
>>
>>
>>
>> One of my Query output looks like-
>>
>>
>>
>> Alpha Beta Gamma
>>
>> 123 xyz 1.0
>>
>> 123 abc 0.5
>>
>> 123 pqr 1.3
>>
>> 123 def 2.1
>>
>> 456 xyz 0.1
>>
>> 456 abc 0.6
>>
>> 456 pqr 1.9
>>
>> 456 def 3.2
>>
>> 456 lmn 1.1
>>
>> 456 sdf 1.2
>>
>>
>>
>> I want the output for the data visualization purpose to look like
>> (basically taking rows from one table and making it column in another
>> table)
>>
>>
>>
>> Alpha xyz abc pqr def lmn
>> sdf
>>
>> 123 1.0 0.5 1.3 2.1
>>
>> 456 0.1 0.6 1.9 3.2
>> 1.1 1.2
>>
>>
>>
>> Question - Can it be done in Hive? If not, any suggestions.
>>
>>
>>
>> Thanks,
>>
>> Richin
>>
>>
>>
>>
>>
>>
RE: Converting rows into dynamic colums in Hive
Posted by ri...@nokia.com.
John,
What is R?
-----Original Message-----
From: ext John Meagher [mailto:john.meagher@gmail.com]
Sent: Wednesday, August 08, 2012 4:34 PM
To: user@hive.apache.org
Subject: Re: Converting rows into dynamic colums in Hive
I don't think having dynamic columns is possible in Hive. I've always output from Hive a structure like your query output and used R to convert it into a dynamic column structure.
On Wed, Aug 8, 2012 at 3:56 PM, <ri...@nokia.com> wrote:
> Thanks Ashish, that gives an idea.
>
> But I am not sure about the outer select loop, I have to know all the
> values in Beta column beforehand to do a max on each value.
>
> Is there a better way?
>
>
>
> Richin
>
>
>
> From: ext Ashish Thusoo [mailto:athusoo@qubole.com]
> Sent: Tuesday, August 07, 2012 5:05 PM
> To: user@hive.apache.org
> Subject: Re: Converting rows into dynamic colums in Hive
>
>
>
> you should be able to do this in hive using a group by on alpha and
> then using a combination of the max and if statement... something on
> the following lines
>
> select alpha, max(abc), max(pqr), ...
> (
> select alpha, if (beta == 'abc', Gamma, NULL) as abc, if (beta ==
> 'pqr', Gamma, NUL) as pqr, ....
> from table
> )
> group by alpha
>
> something on those lines...
>
> Ashish
>
> On Tue, Aug 7, 2012 at 1:57 PM, <ri...@nokia.com> wrote:
>
> Hi All,
>
>
>
> One of my Query output looks like-
>
>
>
> Alpha Beta Gamma
>
> 123 xyz 1.0
>
> 123 abc 0.5
>
> 123 pqr 1.3
>
> 123 def 2.1
>
> 456 xyz 0.1
>
> 456 abc 0.6
>
> 456 pqr 1.9
>
> 456 def 3.2
>
> 456 lmn 1.1
>
> 456 sdf 1.2
>
>
>
> I want the output for the data visualization purpose to look like
> (basically taking rows from one table and making it column in another
> table)
>
>
>
> Alpha xyz abc pqr def lmn
> sdf
>
> 123 1.0 0.5 1.3 2.1
>
> 456 0.1 0.6 1.9 3.2
> 1.1 1.2
>
>
>
> Question - Can it be done in Hive? If not, any suggestions.
>
>
>
> Thanks,
>
> Richin
>
>
>
>
>
>
Re: Converting rows into dynamic colums in Hive
Posted by John Meagher <jo...@gmail.com>.
I don't think having dynamic columns is possible in Hive. I've always
output from Hive a structure like your query output and used R to
convert it into a dynamic column structure.
On Wed, Aug 8, 2012 at 3:56 PM, <ri...@nokia.com> wrote:
> Thanks Ashish, that gives an idea.
>
> But I am not sure about the outer select loop, I have to know all the values
> in Beta column beforehand to do a max on each value.
>
> Is there a better way?
>
>
>
> Richin
>
>
>
> From: ext Ashish Thusoo [mailto:athusoo@qubole.com]
> Sent: Tuesday, August 07, 2012 5:05 PM
> To: user@hive.apache.org
> Subject: Re: Converting rows into dynamic colums in Hive
>
>
>
> you should be able to do this in hive using a group by on alpha and then
> using a combination of the max and if statement... something on the
> following lines
>
> select alpha, max(abc), max(pqr), ...
> (
> select alpha, if (beta == 'abc', Gamma, NULL) as abc, if (beta == 'pqr',
> Gamma, NUL) as pqr, ....
> from table
> )
> group by alpha
>
> something on those lines...
>
> Ashish
>
> On Tue, Aug 7, 2012 at 1:57 PM, <ri...@nokia.com> wrote:
>
> Hi All,
>
>
>
> One of my Query output looks like-
>
>
>
> Alpha Beta Gamma
>
> 123 xyz 1.0
>
> 123 abc 0.5
>
> 123 pqr 1.3
>
> 123 def 2.1
>
> 456 xyz 0.1
>
> 456 abc 0.6
>
> 456 pqr 1.9
>
> 456 def 3.2
>
> 456 lmn 1.1
>
> 456 sdf 1.2
>
>
>
> I want the output for the data visualization purpose to look like (basically
> taking rows from one table and making it column in another table)
>
>
>
> Alpha xyz abc pqr def lmn
> sdf
>
> 123 1.0 0.5 1.3 2.1
>
> 456 0.1 0.6 1.9 3.2
> 1.1 1.2
>
>
>
> Question – Can it be done in Hive? If not, any suggestions.
>
>
>
> Thanks,
>
> Richin
>
>
>
>
>
>
RE: Converting rows into dynamic colums in Hive
Posted by ri...@nokia.com.
Thanks Ashish, that gives an idea.
But I am not sure about the outer select loop, I have to know all the values in Beta column beforehand to do a max on each value.
Is there a better way?
Richin
From: ext Ashish Thusoo [mailto:athusoo@qubole.com]
Sent: Tuesday, August 07, 2012 5:05 PM
To: user@hive.apache.org
Subject: Re: Converting rows into dynamic colums in Hive
you should be able to do this in hive using a group by on alpha and then using a combination of the max and if statement... something on the following lines
select alpha, max(abc), max(pqr), ...
(
select alpha, if (beta == 'abc', Gamma, NULL) as abc, if (beta == 'pqr', Gamma, NUL) as pqr, ....
from table
)
group by alpha
something on those lines...
Ashish
On Tue, Aug 7, 2012 at 1:57 PM, <ri...@nokia.com>> wrote:
Hi All,
One of my Query output looks like-
Alpha Beta Gamma
123 xyz 1.0
123 abc 0.5
123 pqr 1.3
123 def 2.1
456 xyz 0.1
456 abc 0.6
456 pqr 1.9
456 def 3.2
456 lmn 1.1
456 sdf 1.2
I want the output for the data visualization purpose to look like (basically taking rows from one table and making it column in another table)
Alpha xyz abc pqr def lmn sdf
123 1.0 0.5 1.3 2.1
456 0.1 0.6 1.9 3.2 1.1 1.2
Question - Can it be done in Hive? If not, any suggestions.
Thanks,
Richin
Re: Converting rows into dynamic colums in Hive
Posted by Ashish Thusoo <at...@qubole.com>.
you should be able to do this in hive using a group by on alpha and then
using a combination of the max and if statement... something on the
following lines
select alpha, max(abc), max(pqr), ...
(
select alpha, if (beta == 'abc', Gamma, NULL) as abc, if (beta == 'pqr',
Gamma, NUL) as pqr, ....
from table
)
group by alpha
something on those lines...
Ashish
On Tue, Aug 7, 2012 at 1:57 PM, <ri...@nokia.com> wrote:
> Hi All,****
>
> ** **
>
> One of my Query output looks like-****
>
> ** **
>
> *Alpha Beta Gamma*
>
> 123 xyz 1.0****
>
> 123 abc 0.5****
>
> 123 pqr 1.3****
>
> 123 def 2.1****
>
> *456 xyz 0.1*
>
> *456 abc 0.6*
>
> *456 pqr 1.9*
>
> *456 def 3.2*
>
> *456 lmn 1.1*
>
> *456 sdf 1.2*
>
> ** **
>
> I want the output for the data visualization purpose to look like
> (basically taking rows from one table and making it column in another table)
> ****
>
> ** **
>
> *Alpha xyz abc pqr def
> lmn sdf *
>
> 123 1.0 0.5 1.3
> 2.1 ****
>
> 456 0.1 0.6 1.9
> 3.2 1.1 1.2****
>
> ** **
>
> Question – Can it be done in Hive? If not, any suggestions.****
>
> ** **
>
> Thanks,****
>
> Richin****
>
> ** **
>
> ** **
>