Posted to issues@hive.apache.org by "Simanchal Das (JIRA)" <ji...@apache.org> on 2016/09/04 17:23:20 UTC

[jira] [Updated] (HIVE-14159) sorting of tuple array using multiple field[s]

     [ https://issues.apache.org/jira/browse/HIVE-14159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simanchal Das updated HIVE-14159:
---------------------------------
    Status: Open  (was: Patch Available)

> sorting of tuple array using multiple field[s]
> ----------------------------------------------
>
>                 Key: HIVE-14159
>                 URL: https://issues.apache.org/jira/browse/HIVE-14159
>             Project: Hive
>          Issue Type: Improvement
>          Components: UDF
>            Reporter: Simanchal Das
>            Assignee: Simanchal Das
>              Labels: patch
>         Attachments: HIVE-14159.1.patch, HIVE-14159.2.patch, HIVE-14159.3.patch
>
>
> Problem Statement:
> When working with complex data structures such as Avro, we frequently encounter arrays that contain multiple tuples, where each tuple has a struct schema.
> Suppose the struct schema is as follows:
> {noformat}
> {
> 	"name": "employee",
> 	"type": [{
> 		"type": "record",
> 		"name": "Employee",
> 		"namespace": "com.company.Employee",
> 		"fields": [{
> 			"name": "empId",
> 			"type": "int"
> 		}, {
> 			"name": "empName",
> 			"type": "string"
> 		}, {
> 			"name": "age",
> 			"type": "int"
> 		}, {
> 			"name": "salary",
> 			"type": "double"
> 		}]
> 	}]
> }
> {noformat}
> When we run a Hive query, this complex field appears as an array of employee objects:
> {noformat}
> Example: 
> 	//(array<struct<empId,empName,age,salary>>)
> 	Array[Employee(100,Foo,20,20990),Employee(500,Boo,30,50990),Employee(700,Harry,25,40990),Employee(100,Tom,35,70990)]
> {noformat}
> In day-to-day business use cases we often need to sort such a tuple array by specific field[s], such as empId, empName or salary, in ASC or DESC order.
> Proposal:
> I have developed a UDF, 'sort_array_by', which sorts a tuple array by one or more user-specified fields in ASC or DESC order. The order argument is optional and defaults to ascending.
> {noformat}
> Example:
> 	1.Select sort_array_by(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"salary","ASC");
> 	output: array[struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(500,Boo,30,50990),struct(100,Tom,35,70990)]
> 	
> 	2.Select sort_array_by(array[struct(100,Foo,20,20990),struct(500,Boo,30,80990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"empName","salary","ASC");
> 	output: array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
> 	
> 	3.Select sort_array_by(array[struct(100,Foo,20,20990),struct(500,Boo,30,80990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"empName","salary","age","ASC");
> 	output: array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
> {noformat}
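The ordering semantics described in the proposal can be sketched in plain Java, without Hive's GenericUDF/ObjectInspector machinery. This is a minimal sketch under stated assumptions, not the code in the attached patches. Each struct is modeled as an Object[] aligned with a hypothetical FIELDS list that mirrors the Employee schema. Field names are matched case-insensitively (an assumption). A single optional trailing ASC/DESC applies to all fields. The names SortArrayBySketch, employee, indexOfField and sortArrayBy are illustrative only. Multi-field ordering is built by chaining comparators with Comparator.thenComparing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

class SortArrayBySketch {

    // Hypothetical stand-in for struct<empId:int, empName:string, age:int, salary:double>:
    // field names in declaration order; each element is an Object[] of values.
    static final List<String> FIELDS = Arrays.asList("empId", "empName", "age", "salary");

    static Object[] employee(int empId, String empName, int age, double salary) {
        return new Object[] {empId, empName, age, salary};
    }

    // Resolves a field name to its position (assumption: case-insensitive match).
    static int indexOfField(String name) {
        for (int i = 0; i < FIELDS.size(); i++) {
            if (FIELDS.get(i).equalsIgnoreCase(name)) {
                return i;
            }
        }
        throw new IllegalArgumentException("unknown field: " + name);
    }

    // Sorts a copy of rows by the given fields in priority order. A trailing
    // "ASC" or "DESC" (any case) sets the order for all fields; default is ASC.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static List<Object[]> sortArrayBy(List<Object[]> rows, String... args) {
        List<String> keys = new ArrayList<>(Arrays.asList(args));
        boolean descending = false;
        if (!keys.isEmpty()) {
            String last = keys.get(keys.size() - 1).toUpperCase(Locale.ROOT);
            if (last.equals("ASC") || last.equals("DESC")) {
                descending = last.equals("DESC");
                keys.remove(keys.size() - 1);
            }
        }
        if (keys.isEmpty()) {
            throw new IllegalArgumentException("at least one field name is required");
        }

        // Comparator chain: later fields only break ties on earlier ones.
        Comparator<Object[]> cmp = null;
        for (String key : keys) {
            int idx = indexOfField(key);
            Comparator<Object[]> byField = (a, b) -> ((Comparable) a[idx]).compareTo(b[idx]);
            cmp = (cmp == null) ? byField : cmp.thenComparing(byField);
        }
        if (descending) {
            cmp = cmp.reversed();
        }

        List<Object[]> sorted = new ArrayList<>(rows);
        sorted.sort(cmp); // List.sort is stable, so complete ties keep input order
        return sorted;
    }

    public static void main(String[] argv) {
        // Input of example 2 above.
        List<Object[]> rows = Arrays.asList(
                employee(100, "Foo", 20, 20990),
                employee(500, "Boo", 30, 80990),
                employee(500, "Boo", 30, 50990),
                employee(700, "Harry", 25, 40990),
                employee(100, "Tom", 35, 70990));
        for (Object[] r : sortArrayBy(rows, "empName", "salary", "ASC")) {
            System.out.println(Arrays.toString(r));
        }
    }
}
```

Running main prints the two Boo rows first, ordered by salary (50990.0 before 80990.0), followed by Foo, Harry and Tom, matching the output of example 2. Sorting a copy leaves the caller's array untouched.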



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)