Posted to dev@hive.apache.org by Simanchal Das <si...@outlook.com> on 2016/07/05 05:03:10 UTC
Review Request 49619: sorting of tuple array using multiple fields
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49619/
-----------------------------------------------------------
Review request for hive and Carl Steinbach.
Repository: hive-git
Description
-------
Problem Statement:
When working with complex data structures such as Avro, we often encounter arrays that contain multiple tuples, where each tuple has a struct schema.
Suppose the struct schema is as follows:
{
"name": "employee",
"type": [{
"type": "record",
"name": "Employee",
"namespace": "com.company.Employee",
"fields": [{
"name": "empId",
"type": "int"
}, {
"name": "empName",
"type": "string"
}, {
"name": "age",
"type": "int"
}, {
"name": "salary",
"type": "double"
}]
}]
}
Then, when we run a Hive query, the complex array looks like an array of employee objects.
Example:
//(array<struct<empId,empName,age,salary>>)
Array[Employee(100,Foo,20,20990),Employee(500,Boo,30,50990),Employee(700,Harry,25,40990),Employee(100,Tom,35,70990)]
When implementing day-to-day business use cases, we run into problems such as sorting a tuple array by specific field(s), e.g. empId, salary, etc.
Proposal:
I have developed a UDF 'sort_array_field' that sorts a tuple array by one or more fields in natural order.
Example:
1.Select sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Salary");
output: array[struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(500,Boo,30,50990),struct(100,Tom,35,70990)]
2.Select sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,80990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary");
output: array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
3.Select sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","Age");
output: array[struct(500,Boo,30,50990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
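The multi-field ordering described in the proposal can be illustrated in plain Java. This is only a minimal sketch of the intended semantics (compare by the first field, break ties with the next field, and so on, each in natural order); it is not the code of GenericUDFSortArrayField. The Employee class and the byField and sortByFields helpers are hypothetical names introduced here for illustration, and the field names follow the Avro schema above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SortArrayFieldSketch {

    // Hypothetical stand-in for one struct<empId,empName,age,salary> element.
    static final class Employee {
        final int empId;
        final String empName;
        final int age;
        final double salary;

        Employee(int empId, String empName, int age, double salary) {
            this.empId = empId;
            this.empName = empName;
            this.age = age;
            this.salary = salary;
        }

        @Override
        public String toString() {
            return "struct(" + empId + "," + empName + "," + age + "," + salary + ")";
        }
    }

    // Maps a field name to a natural-order comparator on that field.
    // Case-insensitive matching is an assumption of this sketch.
    static Comparator<Employee> byField(String field) {
        switch (field.toLowerCase()) {
            case "empid":
                return Comparator.comparingInt((Employee e) -> e.empId);
            case "empname":
                return Comparator.comparing((Employee e) -> e.empName);
            case "age":
                return Comparator.comparingInt((Employee e) -> e.age);
            case "salary":
                return Comparator.comparingDouble((Employee e) -> e.salary);
            default:
                throw new IllegalArgumentException("Unknown field: " + field);
        }
    }

    // Returns a sorted copy: the first field is the primary key,
    // each later field only breaks ties left by the earlier ones.
    static List<Employee> sortByFields(List<Employee> input, String... fields) {
        if (fields.length == 0) {
            throw new IllegalArgumentException("At least one field is required");
        }
        Comparator<Employee> cmp = byField(fields[0]);
        for (int i = 1; i < fields.length; i++) {
            cmp = cmp.thenComparing(byField(fields[i]));
        }
        List<Employee> sorted = new ArrayList<>(input);
        sorted.sort(cmp);
        return sorted;
    }

    public static void main(String[] args) {
        List<Employee> employees = Arrays.asList(
                new Employee(100, "Foo", 20, 20990),
                new Employee(500, "Boo", 30, 80990),
                new Employee(500, "Boo", 30, 50990),
                new Employee(700, "Harry", 25, 40990),
                new Employee(100, "Tom", 35, 70990));
        // Sort by name first, then by salary among equal names.
        System.out.println(sortByFields(employees, "empName", "salary"));
    }
}
```

Chaining comparators with thenComparing keeps each field's ordering independent, so supporting an additional sort field only requires one more entry in byField.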
Diffs
-----
itests/src/test/resources/testconfiguration.properties 1ab914d
ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 2f4a94c
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArrayField.java PRE-CREATION
ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFSortArrayField.java PRE-CREATION
ql/src/test/queries/clientnegative/udf_sort_array_field_wrong1.q PRE-CREATION
ql/src/test/queries/clientnegative/udf_sort_array_field_wrong2.q PRE-CREATION
ql/src/test/queries/clientpositive/udf_sort_array_field.q PRE-CREATION
ql/src/test/results/clientnegative/udf_sort_array_field_wrong1.q.out PRE-CREATION
ql/src/test/results/clientnegative/udf_sort_array_field_wrong2.q.out PRE-CREATION
ql/src/test/results/clientpositive/udf_sort_array_field.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/49619/diff/
Testing
-------
JUnit test cases and .q query files are attached.
Thanks,
Simanchal Das
Re: Review Request 49619: sorting of tuple array using multiple fields
Posted by Simanchal Das <si...@outlook.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49619/
-----------------------------------------------------------
(Updated July 7, 2016, 5:03 a.m.)
Review request for hive and Carl Steinbach.
Changes
-------
Added the UDF name to the show_functions.q.out file.
Repository: hive-git
Description
-------
Problem Statement:
When working with complex data structures such as Avro, we often encounter arrays that contain multiple tuples, where each tuple has a struct schema.
Suppose the struct schema is as follows:
{
"name": "employee",
"type": [{
"type": "record",
"name": "Employee",
"namespace": "com.company.Employee",
"fields": [{
"name": "empId",
"type": "int"
}, {
"name": "empName",
"type": "string"
}, {
"name": "age",
"type": "int"
}, {
"name": "salary",
"type": "double"
}]
}]
}
Then, when we run a Hive query, the complex array looks like an array of employee objects.
Example:
//(array<struct<empId,empName,age,salary>>)
Array[Employee(100,Foo,20,20990),Employee(500,Boo,30,50990),Employee(700,Harry,25,40990),Employee(100,Tom,35,70990)]
When implementing day-to-day business use cases, we run into problems such as sorting a tuple array by specific field(s), e.g. empId, salary, etc.
Proposal:
I have developed a UDF 'sort_array_field' that sorts a tuple array by one or more fields in natural order.
Example:
1.Select sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Salary");
output: array[struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(500,Boo,30,50990),struct(100,Tom,35,70990)]
2.Select sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,80990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary");
output: array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
3.Select sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","Age");
output: array[struct(500,Boo,30,50990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
Diffs (updated)
-----
itests/src/test/resources/testconfiguration.properties 1ab914d
ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 2f4a94c
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArrayField.java PRE-CREATION
ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFSortArrayField.java PRE-CREATION
ql/src/test/queries/clientnegative/udf_sort_array_field_wrong1.q PRE-CREATION
ql/src/test/queries/clientnegative/udf_sort_array_field_wrong2.q PRE-CREATION
ql/src/test/queries/clientpositive/udf_sort_array_field.q PRE-CREATION
ql/src/test/results/beelinepositive/show_functions.q.out 4f3ec40
ql/src/test/results/clientnegative/udf_sort_array_field_wrong1.q.out PRE-CREATION
ql/src/test/results/clientnegative/udf_sort_array_field_wrong2.q.out PRE-CREATION
ql/src/test/results/clientpositive/udf_sort_array_field.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/49619/diff/
Testing
-------
JUnit test cases and .q query files are attached.
Thanks,
Simanchal Das