Posted to issues@hbase.apache.org by "Appy (JIRA)" <ji...@apache.org> on 2017/03/25 02:33:42 UTC

[jira] [Commented] (HBASE-16775) Flakey test with TestExportSnapshot#testExportRetry and TestMobExportSnapshot#testExportRetry

    [ https://issues.apache.org/jira/browse/HBASE-16775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941526#comment-15941526 ] 

Appy commented on HBASE-16775:
------------------------------

Things tried so far:
Dumped the configuration to the logs to make sure that the MR job is correctly getting mapreduce.map.maxattempts.

Changed the code to use a MiniMapReduce cluster. By default it spawns 2 servers.
Looking at the minicluster logs, I see retries happening.

At this point I can't think of a way to make it work. Summarizing everything:
What this test was trying to verify is: if a mapper fails and we have retries enabled, then the overall job should still pass.
To do so, it was previously throwing an exception from the mapper based on a probability, which is crazy and highly flaky.
What I was trying to do instead is set retries to Y and throw exceptions X times, where X < Y. Initially X is 0, and it is incremented on every injected failure. The issue is that, since mapper runs are isolated, I can't find a way to maintain the state of X across mapper attempts. As a result, even the 4th retry of a mapper will see X = 0 initially.
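
To illustrate the isolation problem described above, here is a minimal plain-Java sketch. It does not use Hadoop or the ExportSnapshot code: RetryModel, runJob, LocalCounterAttempt and AttemptNumberAttempt are hypothetical names invented for this illustration, and the "framework" is just a loop that hands every attempt a fresh object. The second variant shows one possible alternative, deciding on the attempt number supplied by the runner instead of on a local counter. Whether that approach is usable in the real test is an assumption and has not been tried here.

```java
import java.util.function.Supplier;

/** Plain-Java model of a task that is retried in isolated attempts (no Hadoop involved). */
public class RetryModel {

    /** One attempt of a task; throwing signals a failed attempt. */
    public interface Attempt {
        void run(int attemptNumber) throws Exception;
    }

    /**
     * Runs up to maxAttempts (Y) attempts. Every attempt gets a FRESH Attempt
     * object from the supplier, which models the isolation between mapper runs.
     * Returns true as soon as one attempt succeeds, false if all of them fail.
     */
    public static boolean runJob(int maxAttempts, Supplier<Attempt> freshAttempt) {
        for (int n = 0; n < maxAttempts; n++) {
            try {
                freshAttempt.get().run(n);
                return true;
            } catch (Exception e) {
                // Attempt n failed; the loop moves on to the next attempt.
            }
        }
        return false;
    }

    /** Keeps X in an instance field, so every new attempt starts again at X = 0. */
    public static class LocalCounterAttempt implements Attempt {
        private final int failuresToInject;
        private int injected = 0; // X, lost when the attempt object is discarded

        public LocalCounterAttempt(int failuresToInject) {
            this.failuresToInject = failuresToInject;
        }

        @Override
        public void run(int attemptNumber) throws Exception {
            if (injected < failuresToInject) {
                injected++;
                throw new Exception("injected failure");
            }
        }
    }

    /** Hypothetical alternative: fail only while the attempt number is below X. */
    public static class AttemptNumberAttempt implements Attempt {
        private final int failuresToInject;

        public AttemptNumberAttempt(int failuresToInject) {
            this.failuresToInject = failuresToInject;
        }

        @Override
        public void run(int attemptNumber) throws Exception {
            if (attemptNumber < failuresToInject) {
                throw new Exception("injected failure on attempt " + attemptNumber);
            }
        }
    }

    public static void main(String[] args) {
        int maxAttempts = 4; // Y
        int failures = 2;    // X, with X < Y

        // Local counter: every fresh attempt sees X = 0 and fails, so the job fails.
        System.out.println(runJob(maxAttempts, () -> new LocalCounterAttempt(failures)));  // prints false

        // Attempt-number based: attempts 0 and 1 fail, attempt 2 succeeds.
        System.out.println(runJob(maxAttempts, () -> new AttemptNumberAttempt(failures))); // prints true
    }
}
```

In this model the local-counter variant fails on every attempt for any X > 0, which matches the behaviour described above where even the 4th retry sees X = 0.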

Now I am thinking that my initial line of thought (in [this|https://issues.apache.org/jira/browse/HBASE-16775?focusedCommentId=15553215&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15553215] comment above) was right: this test is testing the internals of MapReduce, i.e. if mapreduce.map.maxattempts is set, the MR framework should retry.
[~huaxiang], [~jmhsieh].

> Flakey test with TestExportSnapshot#testExportRetry and TestMobExportSnapshot#testExportRetry 
> ----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-16775
>                 URL: https://issues.apache.org/jira/browse/HBASE-16775
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: huaxiang sun
>            Assignee: huaxiang sun
>         Attachments: disable.patch, HBASE-16775.master.001.patch, HBASE-16775.master.002.patch, HBASE-16775.master.003.patch
>
>
> The root cause is that conf.setInt("mapreduce.map.maxattempts", 10) is not picked up by the mapper job, so the effective number of retries is actually 0. Debugging to see why this is the case.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)