Posted to issues@hbase.apache.org by "Lukas Nalezenec (JIRA)" <ji...@apache.org> on 2014/02/03 17:24:09 UTC
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889593#comment-13889593 ]
Lukas Nalezenec commented on HBASE-10413:
-----------------------------------------
I made big changes in the code.
You can check them and discuss them at https://github.com/apache/hbase/pull/8/files .
I have to write unit tests before making the patch.
- I need help with the unit tests. Is there a simple unit test helper/utility I can use? I need to create a table with some regions and then work with their sizes. It should run locally and offer some level of abstraction.
- I have added a configuration option for disabling this feature:
Is there a policy about new configuration options?
Should I move the configuration key constant somewhere else?
Should the feature be disabled or enabled by default?
- Computing region sizes might be slow. We might need some parallelization.
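
The parallelization idea in the last bullet could be sketched roughly as follows. This is only an illustration with a plain thread pool, not code from the pull request: `ParallelRegionSizes`, `computeSizes` and the `sizeOf` function are hypothetical names, and in a real job the size lookup would query HDFS or the cluster instead of a lambda.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToLongFunction;

// Hypothetical helper: computes one size per region name on a fixed thread pool.
final class ParallelRegionSizes {

    static Map<String, Long> computeSizes(List<String> regionNames,
                                          ToLongFunction<String> sizeOf,
                                          int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per region so that slow lookups overlap.
            Map<String, Future<Long>> pending = new LinkedHashMap<>();
            for (String region : regionNames) {
                Callable<Long> task = () -> sizeOf.applyAsLong(region);
                pending.put(region, pool.submit(task));
            }
            // Collect the results in the original region order.
            Map<String, Long> sizes = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Long>> entry : pending.entrySet()) {
                sizes.put(entry.getKey(), entry.getValue().get());
            }
            return sizes;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while computing region sizes", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("region size lookup failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Assumed size function for the demo: derive a fake size from the name.
        Map<String, Long> sizes = computeSizes(
                Arrays.asList("region-a", "region-bb"),
                name -> name.length() * 1024L,
                2);
        System.out.println(sizes);
    }
}
```

The pool size would probably need to be configurable, since too many concurrent lookups could put extra load on the NameNode.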
From the mail:
> + public void setLength(long length) {
> This method in TableSplit can be package private.
I think a lot of people use TableSplit in their custom input formats. IMHO this method should be part of the API.
> Tablesplit.getLength returns 0
> ------------------------------
>
> Key: HBASE-10413
> URL: https://issues.apache.org/jira/browse/HBASE-10413
> Project: HBase
> Issue Type: Bug
> Components: Client, mapreduce
> Affects Versions: 0.96.1.1
> Reporter: Lukas Nalezenec
> Assignee: Lukas Nalezenec
>
> InputSplits should be sorted by length, but TableSplit does not contain a real getLength implementation:
>   @Override
>   public long getLength() {
>     // Not clear how to obtain this... seems to be used only for sorting splits
>     return 0;
>   }
> This is causing us problems with scheduling: we have jobs that are supposed to finish in a limited time, but they often get stuck in the last mapper, which is working on a large region.
> Can we implement this method?
> What is the best way?
> We were thinking about estimating the size from the size of the files on HDFS.
> We would like to get the Scanner from TableSplit, use startRow, stopRow and the column families to find the corresponding region, and then compute the size on HDFS for the given region and column family.
> Update:
> This ticket was about a production issue. I talked with the guy who worked on it, and he said our production issue was probably not directly caused by getLength() returning 0.
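
The estimation idea from the description (sum the file sizes for the region's column families, store the result in the split, then sort splits by length) can be sketched like this. It is an illustration under stated assumptions, not the implementation from the patch: it walks a local directory with java.nio instead of HDFS, assumes one directory per column family under the region directory, and `Split` is a hypothetical stand-in for TableSplit.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

final class RegionSizeSketch {

    // Hypothetical stand-in for TableSplit: a region name plus a settable length.
    static final class Split {
        final String region;
        private long length; // stays 0 until setLength is called

        Split(String region) { this.region = region; }
        void setLength(long length) { this.length = length; }
        long getLength() { return length; }
    }

    // Sums the sizes of all regular files under dir, recursively.
    static long directorySize(Path dir) {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                        })
                        .sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Counts only the requested column families (assumed layout: regionDir/family/files).
    static long estimateRegionSize(Path regionDir, List<String> families) {
        long total = 0;
        for (String family : families) {
            Path familyDir = regionDir.resolve(family);
            if (Files.isDirectory(familyDir)) {
                total += directorySize(familyDir);
            }
        }
        return total;
    }

    // Orders splits largest first, so the biggest region is not left for the end.
    static void sortLargestFirst(List<Split> splits) {
        splits.sort((a, b) -> Long.compare(b.getLength(), a.getLength()));
    }
}
```

File sizes on disk are only a proxy for the amount of data a mapper will read (compression and memstore contents are ignored), but a rough ordering is all that sorting needs.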