Posted to dev@hbase.apache.org by "Bharath Vissapragada (Jira)" <ji...@apache.org> on 2020/11/03 16:06:00 UTC

[jira] [Resolved] (HBASE-24859) Optimize in-memory representation of mapreduce TableSplit objects

     [ https://issues.apache.org/jira/browse/HBASE-24859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bharath Vissapragada resolved HBASE-24859.
------------------------------------------
    Fix Version/s: 2.3.4
                   2.2.7
                   2.4.0
                   3.0.0-alpha-1
       Resolution: Fixed

> Optimize in-memory representation of mapreduce TableSplit objects
> -----------------------------------------------------------------
>
>                 Key: HBASE-24859
>                 URL: https://issues.apache.org/jira/browse/HBASE-24859
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 3.0.0-alpha-1, 2.3.3, 1.7.0, 2.4.0, 2.2.7
>            Reporter: Sandeep Pal
>            Assignee: Sandeep Pal
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 1.7.0, 2.4.0, 2.2.7, 2.3.4
>
>         Attachments: Screen Shot 2020-08-26 at 8.44.34 AM.png, hbase-24859.png
>
>
> When a table has a very large number of regions, MR jobs consume a lot of memory in the client. This is because region-level information is kept in memory, and the memory-heavy object is TableSplit, because each TableSplit carries a Scan object.
> However, it looks like TableInputFormat for a single table doesn't need to store the Scan object in the TableSplit: we do not use it there, and all the splits are expected to have the exact same Scan object. In TableInputFormat we use the Scan object directly from the MR conf.
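
The idea in the description can be illustrated with a small, self-contained sketch. This is not the HBase code or the actual patch; every name here (SharedScanSketch, LightSplit, SCAN_KEY, buildSplits, describeRead) is hypothetical, and a plain Map stands in for the MR conf. It only shows the shape of the optimization: each split keeps its row boundaries and location, while the one scan definition shared by all splits is looked up from the configuration when a split is read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SharedScanSketch {
    // Hypothetical configuration key holding the single, shared scan definition.
    static final String SCAN_KEY = "example.scan";

    // Hypothetical lightweight split: row boundaries and location only, no scan copy.
    static final class LightSplit {
        final String startRow;
        final String endRow;
        final String location;

        LightSplit(String startRow, String endRow, String location) {
            this.startRow = startRow;
            this.endRow = endRow;
            this.location = location;
        }
    }

    // Builds one split per adjacent pair of region boundaries.
    static List<LightSplit> buildSplits(List<String> boundaries, String location) {
        List<LightSplit> splits = new ArrayList<>();
        for (int i = 0; i + 1 < boundaries.size(); i++) {
            splits.add(new LightSplit(boundaries.get(i), boundaries.get(i + 1), location));
        }
        return splits;
    }

    // Combines the shared scan from the configuration with one split's row range.
    static String describeRead(Map<String, String> conf, LightSplit split) {
        String sharedScan = conf.get(SCAN_KEY);
        return sharedScan + " rows [" + split.startRow + ", " + split.endRow + ")";
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(SCAN_KEY, "scan(cf:q)"); // stored once, however many splits exist

        List<LightSplit> splits = buildSplits(Arrays.asList("a", "h", "p", "z"), "host1");
        for (LightSplit split : splits) {
            System.out.println(describeRead(conf, split));
        }
    }
}
```

In this sketch the per-split memory cost no longer depends on the size of the scan definition, which is the effect the description aims for when the number of regions is large.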


