Posted to dev@hive.apache.org by "Simanchal Das (JIRA)" <ji...@apache.org> on 2016/07/05 04:39:10 UTC

[jira] [Created] (HIVE-14159) sorting of tuple array using multiple field[s]

Simanchal Das created HIVE-14159:
------------------------------------

             Summary: sorting of tuple array using multiple field[s]
                 Key: HIVE-14159
                 URL: https://issues.apache.org/jira/browse/HIVE-14159
             Project: Hive
          Issue Type: Improvement
          Components: UDF
            Reporter: Simanchal Das
            Assignee: Simanchal Das


Problem Statement:

When working with complex data structures such as Avro, we often encounter arrays that contain multiple tuples, where each tuple has a struct schema.

Suppose the struct schema is as follows:
{noformat}
{
	"name": "employee",
	"type": [{
		"type": "record",
		"name": "Employee",
		"namespace": "com.company.Employee",
		"fields": [{
			"name": "empId",
			"type": "int"
		}, {
			"name": "empName",
			"type": "string"
		}, {
			"name": "age",
			"type": "int"
		}, {
			"name": "salary",
			"type": "double"
		}]
	}]
}

{noformat}
Then, when we run a Hive query, the complex array looks like an array of employee objects.
{noformat}
Example: 
	//(array<struct<empId,empName,age,salary>>)
	Array[Employee(100,Foo,20,20990),Employee(500,Boo,30,50990),Employee(700,Harry,25,40990),Employee(100,Tom,35,70990)]

{noformat}
When implementing day-to-day business use cases, we run into problems such as sorting a tuple array by specific field[s] like empId, name, salary, etc.


Proposal:

I have developed a UDF, 'sort_array_field', which sorts a tuple array by one or more fields in natural order.
{noformat}
Example:
	1.Select sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Salary");
	output: array[struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(500,Boo,30,50990),struct(100,Tom,35,70990)]
	
	2.Select sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,80990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary");
	output: array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]

	3.Select sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","Age");
	output: array[struct(500,Boo,30,50990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
{noformat}
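
The multi-field ordering described above can be sketched in plain Java as a chain of comparators, where earlier field names take priority and later ones break ties. This is only an illustration of the intended semantics, not the UDF implementation attached to this issue: the class name SortArrayFieldSketch, the helper employee(), and the choice to model each struct as a Map are all assumptions made for the example. It also uses the field names from the schema (empName, salary) and assumes every field value is non-null and Comparable.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the sort semantics; not the actual Hive UDF code.
class SortArrayFieldSketch {

    // Builds one "struct" as an ordered field-name -> value map (assumed model).
    static Map<String, Object> employee(int empId, String empName, int age, double salary) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("empId", empId);
        row.put("empName", empName);
        row.put("age", age);
        row.put("salary", salary);
        return row;
    }

    // Compares two field values in natural order.
    // Assumes both values are non-null and of the same Comparable type.
    @SuppressWarnings("unchecked")
    static int compareValues(Object a, Object b) {
        return ((Comparable<Object>) a).compareTo(b);
    }

    // Returns a new list sorted by the given fields; the input list is not modified.
    static List<Map<String, Object>> sortArrayField(List<Map<String, Object>> rows,
                                                    String... fields) {
        if (fields.length == 0) {
            throw new IllegalArgumentException("at least one field name is required");
        }
        Comparator<Map<String, Object>> cmp = null;
        for (String field : fields) {
            Comparator<Map<String, Object>> byField =
                (x, y) -> compareValues(x.get(field), y.get(field));
            // The first field is the primary key; each later field only breaks ties.
            cmp = (cmp == null) ? byField : cmp.thenComparing(byField);
        }
        List<Map<String, Object>> sorted = new ArrayList<>(rows);
        sorted.sort(cmp);
        return sorted;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = List.of(
            employee(100, "Foo", 20, 20990),
            employee(500, "Boo", 30, 80990),
            employee(500, "Boo", 30, 50990),
            employee(700, "Harry", 25, 40990),
            employee(100, "Tom", 35, 70990));
        // Sort by name first, then by salary within equal names.
        for (Map<String, Object> row : sortArrayField(rows, "empName", "salary")) {
            System.out.println(row);
        }
    }
}
```

With the data of example 2, sorting by empName and then salary places the two Boo entries first, the lower salary ahead of the higher one, followed by Foo, Harry and Tom.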


