Posted to dev@hive.apache.org by "Namit Jain (JIRA)" <ji...@apache.org> on 2013/02/11 10:57:12 UTC
[jira] [Created] (HIVE-4006) Add local sort operator
Namit Jain created HIVE-4006:
--------------------------------
Summary: Add local sort operator
Key: HIVE-4006
URL: https://issues.apache.org/jira/browse/HIVE-4006
Project: Hive
Issue Type: New Feature
Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
We've seen in the past that sorting data on a specific column can greatly improve the compression of data. The problem is that sorting data is expensive and requires a reduce phase.
One way around this is to add a local sort (either as an operator or between serialization and output). It could take chunks of rows and sort each chunk in memory. This would be much faster than a full sort, but it would need to be very memory-efficient in order to fit the maximum number of rows in a chunk (and hence get the maximum benefit).
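A minimal sketch of the chunked idea, for illustration only. It does not use Hive's operator API: the class name LocalChunkSorter, the chunkSize parameter, and the downstream callback are all hypothetical. It buffers rows in a plain ArrayList, so it makes no attempt at the memory efficiency the proposal calls for.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical chunked local sorter (not a Hive class).
 * Rows are buffered up to chunkSize, sorted in memory, and forwarded.
 * The output is sorted within each chunk only, not globally.
 */
final class LocalChunkSorter<T> {
    private final int chunkSize;
    private final Comparator<? super T> comparator;
    private final Consumer<? super T> downstream;
    private final List<T> buffer;

    LocalChunkSorter(int chunkSize,
                     Comparator<? super T> comparator,
                     Consumer<? super T> downstream) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
        this.comparator = comparator;
        this.downstream = downstream;
        this.buffer = new ArrayList<>(chunkSize);
    }

    /** Buffers one row; sorts and emits the chunk once it is full. */
    void process(T row) {
        buffer.add(row);
        if (buffer.size() >= chunkSize) {
            flush();
        }
    }

    /** Sorts and emits whatever is buffered; call once at end of input. */
    void flush() {
        buffer.sort(comparator);
        for (T row : buffer) {
            downstream.accept(row);
        }
        buffer.clear();
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        LocalChunkSorter<String> sorter =
            new LocalChunkSorter<>(3, Comparator.<String>naturalOrder(), out::add);
        for (String s : Arrays.asList("d", "a", "c", "b", "f", "e", "g")) {
            sorter.process(s);
        }
        sorter.flush();
        System.out.println(out); // prints [a, c, d, b, e, f, g]
    }
}
```

Because no data crosses task boundaries, this needs no reduce phase. Larger chunks give longer sorted runs, which is why the number of rows that fit in memory determines how much of the compression benefit is recovered.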
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira