Posted to issues@hbase.apache.org by "Ted Yu (JIRA)" <ji...@apache.org> on 2012/12/12 23:59:21 UTC
[jira] [Commented] (HBASE-7342) Split operation without split key
incorrectly finds the middle key in off-by-one error
[ https://issues.apache.org/jira/browse/HBASE-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530463#comment-13530463 ]
Ted Yu commented on HBASE-7342:
-------------------------------
{code}
+ System.out.println("Original table has: " + loadedTableCount + " rows");
{code}
Please use the LOG variable instead of System.out.println for the above.
{code}
+ Thread.currentThread();
{code}
Does the above statement have any effect?
{code}
+ Thread.sleep(1000);
{code}
Can the sleep duration be shorter?
{code}
+ } catch (InterruptedException e) {
+ e.printStackTrace();
{code}
Throw InterruptedIOException from the catch block instead of printing the stack trace.
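For illustration, a minimal sketch of that pattern follows. The class and method names (SleepSketch, sleepOrThrow) are hypothetical and not taken from the patch; restoring the interrupt flag is an extra design choice in this sketch, not something requested in the review.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

public class SleepSketch {
    // Hypothetical helper: sleeps for the given time and converts an
    // interruption into an InterruptedIOException so the caller sees it.
    static void sleepOrThrow(long millis) throws IOException {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            // Assumption of this sketch: restore the interrupt flag so code
            // further up the stack can still observe the interruption.
            Thread.currentThread().interrupt();
            InterruptedIOException iioe =
                new InterruptedIOException("Interrupted while waiting for split");
            iioe.initCause(e);
            throw iioe;
        }
    }
}
```

With this shape the test method only needs to declare "throws IOException", and an interruption fails the test rather than being silently logged.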
{code}
+ return;
+
{code}
nit: remove the empty line.
{code}
+ throw new Exception("Split did not increase the number of regions");
{code}
nit: use fail() instead of throwing a generic Exception.
> Split operation without split key incorrectly finds the middle key in off-by-one error
> --------------------------------------------------------------------------------------
>
> Key: HBASE-7342
> URL: https://issues.apache.org/jira/browse/HBASE-7342
> Project: HBase
> Issue Type: Bug
> Components: HFile, io
> Affects Versions: 0.94.1, 0.94.2, 0.94.3, 0.96.0
> Reporter: Aleksandr Shulman
> Assignee: Aleksandr Shulman
> Priority: Minor
> Fix For: 0.96.0, 0.94.4
>
> Attachments: HBASE-7342-v1.patch
>
>
> I took a deeper look into issues I was having using region splitting when specifying a region (but not a key for splitting).
> The midkey calculation is off by one: when there are 2 rows, it picks the 0th one. This causes the firstkey to be the same as midkey, and the split fails. Removing the -1 causes it to work correctly, as per the test I've added.
> Looking into the code here is what goes on:
> 1. Split takes the largest storefile
> 2. It puts all the keys into a 2-dimensional array called blockKeys[][]. Key i resides at blockKeys[i]
> 3. Getting the middle root-level index should yield the key in the middle of the storefile
> 4. In step 3, we see that there is a possibly erroneous (-1) to adjust for the 0-offset indexing.
> 5. In a case where there are only 2 blockKeys, this yields the 0th block key.
> 6. Unfortunately, this is the same block key that 'firstKey' will be.
> 7. This yields the result in HStore.java:1873 ("cannot split because midkey is the same as first or last row")
> 8. Removing the -1 solves the problem (in this case).
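The arithmetic in steps 4 to 8 can be sketched as below. This is an illustration only, not the HBase code: it assumes the middle index is computed as (count - 1) / 2 before the change and count / 2 after it, and the names MidKeySketch, midIndexWithMinusOne, midIndexWithoutMinusOne and canSplit are hypothetical. The check in step 7 also compares against the last row, which this sketch leaves out.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MidKeySketch {
    // Assumed form of the index calculation with the "-1" adjustment.
    static int midIndexWithMinusOne(int rootCount) {
        return (rootCount - 1) / 2;
    }

    // Assumed form of the index calculation after removing the "-1".
    static int midIndexWithoutMinusOne(int rootCount) {
        return rootCount / 2;
    }

    // Simplified stand-in for the step 7 check: the split is rejected
    // when the chosen midkey equals the first key.
    static boolean canSplit(byte[][] blockKeys, int midIndex) {
        return !Arrays.equals(blockKeys[0], blockKeys[midIndex]);
    }

    public static void main(String[] args) {
        // Two block keys, as in the failing case described in step 5.
        byte[][] blockKeys = {
            "row1".getBytes(StandardCharsets.UTF_8),
            "row2".getBytes(StandardCharsets.UTF_8)
        };
        int before = midIndexWithMinusOne(blockKeys.length);
        int after = midIndexWithoutMinusOne(blockKeys.length);
        System.out.println("with -1: index " + before
            + ", splittable=" + canSplit(blockKeys, before));
        System.out.println("without -1: index " + after
            + ", splittable=" + canSplit(blockKeys, after));
    }
}
```

For two keys, integer division gives (2 - 1) / 2 = 0 and 2 / 2 = 1, so only the second form selects a key that differs from the first key.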