Posted to issues@hbase.apache.org by "Heng Chen (JIRA)" <ji...@apache.org> on 2015/11/27 10:47:11 UTC

[jira] [Commented] (HBASE-7743) Replace *SortReducers with Hadoop Secondary Sort

    [ https://issues.apache.org/jira/browse/HBASE-7743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15029690#comment-15029690 ] 

Heng Chen commented on HBASE-7743:
----------------------------------

Any progress?  
IMO we can at least remove the TreeSet to save memory.  
As for very large rows, we cannot avoid OOM entirely, because all cells of a row must be grouped together.  
Besides, such wide rows are discouraged when designing a table.  

> Replace *SortReducers with Hadoop Secondary Sort
> ------------------------------------------------
>
>                 Key: HBASE-7743
>                 URL: https://issues.apache.org/jira/browse/HBASE-7743
>             Project: HBase
>          Issue Type: Sub-task
>          Components: mapreduce, Performance
>            Reporter: Nick Dimiduk
>             Fix For: 2.0.0
>
>
> The mapreduce package provides two Reducer implementations, KeyValueSortReducer and PutSortReducer, which are used by Import, ImportTsv, and WALPlayer in conjunction with the HFileOutputFormat. Both of these implementations use a TreeSet to sort the values matching a key. These reducers will OOM when rows are large.
> A better solution would be to implement a secondary sort of the values. That way Hadoop sorts the records, spilling to disk when necessary.
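The secondary-sort pattern proposed above can be sketched in plain Java (all class and method names here are hypothetical, not HBase or Hadoop API): the shuffle sorts a composite (row, cell) key, while grouping considers only the row, so each reduce call receives a row's cells already ordered and never needs an in-memory TreeSet. In real Hadoop this split is expressed via `Job.setSortComparatorClass` and `Job.setGroupingComparatorClass`.

```java
import java.util.*;

// Sketch of secondary sort (hypothetical names, not the HBase/Hadoop API):
// sort by the full composite key, group by the natural key only.
public class SecondarySortSketch {

    // Composite key: natural key (row) plus secondary key (cell column).
    static final class CompositeKey implements Comparable<CompositeKey> {
        final String row;
        final String cell;

        CompositeKey(String row, String cell) {
            this.row = row;
            this.cell = cell;
        }

        // Full ordering: row first, then cell. This is what the shuffle
        // would sort by (Hadoop: the sort comparator).
        @Override
        public int compareTo(CompositeKey o) {
            int c = row.compareTo(o.row);
            return c != 0 ? c : cell.compareTo(o.cell);
        }
    }

    // Simulate one reducer partition: sort all composite keys, then group
    // consecutive keys by row only (Hadoop: the grouping comparator).
    // Cells arrive at each "reduce call" pre-sorted, with no per-row TreeSet.
    static Map<String, List<String>> groupSorted(List<CompositeKey> keys) {
        keys.sort(Comparator.naturalOrder());
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (CompositeKey k : keys) {
            grouped.computeIfAbsent(k.row, r -> new ArrayList<>()).add(k.cell);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<CompositeKey> keys = new ArrayList<>(List.of(
                new CompositeKey("row2", "cf:b"),
                new CompositeKey("row1", "cf:z"),
                new CompositeKey("row1", "cf:a"),
                new CompositeKey("row2", "cf:a")));
        for (Map.Entry<String, List<String>> e : groupSorted(keys).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

The key point is that the expensive total ordering is pushed into the framework's merge sort, which spills to disk, so memory use no longer grows with the number of cells in a row.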



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)