Posted to issues@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2013/10/15 06:10:42 UTC

[jira] [Commented] (HBASE-9759) IntegrationTestBulkLoad random number collision

    [ https://issues.apache.org/jira/browse/HBASE-9759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794859#comment-13794859 ] 

stack commented on HBASE-9759:
------------------------------

+1 on trying the patch.  How does it prevent the collision (I did not review closely)?

If you do a select on row 0, does it have more versions than the other rows?

What is to prevent our clashing randomly on another row?  Is it because our random generation is within a fixed range per iteration?
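
For a rough sense of those odds, here is a back-of-the-envelope birthday-bound sketch. It assumes each chain draws uniformly random 64-bit row keys, which may not match the generator the test actually uses.

{code}
// Rough birthday-bound estimate for a random row-key collision.
// Assumption (not from the test code): keys are uniform over all 2^64 long values.
public class CollisionOdds {
  public static void main(String[] args) {
    double keySpace = Math.pow(2, 64);           // ~1.84e19 possible values
    double rowsPerIteration = 50 * 500_000;      // 50 mappers x 500K rows = 25M keys
    // Birthday bound: P(collision) ~= n^2 / (2 * keySpace) for n << keySpace.
    double perIteration = rowsPerIteration * rowsPerIteration / (2 * keySpace);
    System.out.printf("per iteration:     ~%.1e%n", perIteration);       // ~1.7e-05
    System.out.printf("over 5 iterations: ~%.1e%n", 5 * perIteration);   // ~8.5e-05
  }
}
{code}

Under that assumption a random clash on some other row is rare but not impossible, so repeated harness runs could eventually surface one.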



> IntegrationTestBulkLoad random number collision
> -----------------------------------------------
>
>                 Key: HBASE-9759
>                 URL: https://issues.apache.org/jira/browse/HBASE-9759
>             Project: HBase
>          Issue Type: Bug
>          Components: test
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 0.98.0, 0.96.1
>
>         Attachments: hbase-9759_v1.patch
>
>
> ITBL failed recently in our test harness. Inspecting the failure led me to believe that the only way that particular failure could have happened is a collision in the random longs generated by the test.
> The test creates 50 mappers by default, and each mapper writes 500K random rows, starting with row = 0. By default there are 5 iterations.
> The check job outputs these counters: 
> {code}
> 2013-10-13 07:48:01,134 Map input records=124999751
> 2013-10-13 07:48:01,134 Map output records=124999999
> {code}
> The number of input records seems fine because
> {code}
> 124999751 = 1 + 5 * (0.5M - 1) * 50
> {code}
> 5 = num iterations, 0.5M = num rows, 50 = num mappers, and 1 is for row = 0, which every chain writes to.
> Output records should be 125M; however, we see one cell missing. Since the map input records match the expected number of distinct rows, I suspect that row = 0 had a collision.
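> As a quick sanity check on those numbers, a minimal sketch of the counter arithmetic, using the default settings quoted above (5 iterations, 500K rows per mapper, 50 mappers) rather than values read from any job configuration:
> {code}
> // Expected check-job counters from the defaults quoted above.
> public class ExpectedCounters {
>   public static void main(String[] args) {
>     long iterations = 5, rowsPerMapper = 500_000L, mappers = 50;
>     // Row 0 is shared by every chain, so it is counted only once.
>     long expectedInput = 1 + iterations * (rowsPerMapper - 1) * mappers;
>     long expectedOutput = iterations * rowsPerMapper * mappers;
>     System.out.println(expectedInput);   // 124999751 -> matches Map input records
>     System.out.println(expectedOutput);  // 125000000 -> one more than Map output records
>   }
> }
> {code}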
> In one of the generate jobs, we can see that the reducer output count does not match the reducer input count. Given that we are using KVSortReducer, this confirms that there is a collision in the KeyValues received by this task.
> {code}
> 2013-10-13 06:48:12,738 Reduce input records=75000000
> 2013-10-13 06:48:12,738 Reduce output records=74999997
> {code}
> The count is off by 3 because we are writing 3 columns per row. 
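> As an illustration of why the gap is exactly 3, a minimal sketch under the assumption that exactly one pair of generated row keys collided and the duplicate row's cells collapsed in the sort:
> {code}
> // With 3 columns per row, the 75M reduce input cells correspond to 25M
> // generated row keys; if exactly one pair of keys collided, only 25M - 1
> // distinct rows (x 3 cells) come out of the sort.
> public class OffByThree {
>   public static void main(String[] args) {
>     long reduceInput = 75_000_000L;                 // reducer input cells, from the counters above
>     long cellsPerRow = 3;                           // three columns written per row
>     long generatedRows = reduceInput / cellsPerRow; // 25,000,000 generated row keys
>     long distinctRows = generatedRows - 1;          // assume exactly one pair collided
>     System.out.println(distinctRows * cellsPerRow); // 74999997 -> matches Reduce output records
>   }
> }
> {code}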
> My only theory for explaining this is that we had a collision in chainIds, or that one of the chains reused row = 0 as the next row.
> This is similar to HBASE-8700; however, in this case the probability is much, much lower.


