Posted to issues@hbase.apache.org by "Xing Shi (JIRA)" <ji...@apache.org> on 2012/06/11 05:13:42 UTC

[jira] [Commented] (HBASE-6195) Increment data will lost when the memstore flushed

    [ https://issues.apache.org/jira/browse/HBASE-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292625#comment-13292625 ] 

Xing Shi commented on HBASE-6195:
---------------------------------

Here is the data:
I deleted the row first, then used 2000 threads to increment that single row (--threadNum 2000 --inc 1000). After all threads finished, I read the row's value. I repeated this 11 times:

for i in `seq 0 10`
do
    /home/shubao.sx/hadoop-0.20.2-cdh3u3/bin/hadoop --config /home/shubao.sx/0.90-hadoop-config jar /home/shubao.sx/inc-no-delete/inc.jar com.taobao.hbase.MultiThreadsIncrement --threadNum 2000 --inc 1000 >/home/shubao.sx/inc-no-delete/inc.$i.log
done

And the results:

inc.0.log  : return 199838
inc.1.log  : return 399729
inc.2.log  : return 599579
inc.3.log  : return 799441
inc.4.log  : return 999305
inc.5.log  : return 1199173
inc.6.log  : return 1399037
inc.7.log  : return 1598939
inc.8.log  : return 1798804
inc.9.log  : return 1998708
inc.10.log : return 2198637
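
Because the row is deleted only once, before the loop, each returned value appears to be a running total across runs. The small Java sketch below (the class and method names are hypothetical, for illustration only) differences consecutive values to get the gain from each run. Every run adds slightly less than 200,000: between 199,838 and 199,929.

```java
/** Per-run gain computed from the cumulative values reported above (illustrative only). */
public class IncrementDeltas {
    static final long[] RETURNED = {
        199838, 399729, 599579, 799441, 999305, 1199173,
        1399037, 1598939, 1798804, 1998708, 2198637
    };

    /** Difference between consecutive cumulative readings; the first run starts from 0. */
    static long[] perRunGain(long[] cumulative) {
        long[] gains = new long[cumulative.length];
        long previous = 0;
        for (int i = 0; i < cumulative.length; i++) {
            gains[i] = cumulative[i] - previous;
            previous = cumulative[i];
        }
        return gains;
    }

    public static void main(String[] args) {
        long[] gains = perRunGain(RETURNED);
        for (int i = 0; i < gains.length; i++) {
            System.out.printf("inc.%d.log : +%d%n", i, gains[i]);
        }
    }
}
```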

Memstore flushes occur often in this test because I set these HLog parameters:
  <property>
    <name>hbase.regionserver.logroll.multiplier</name>
    <value>0.005</value>
  </property>
  <property>
    <name>hbase.regionserver.maxlogs</name>
    <value>3</value>
  </property>
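
Rough arithmetic for why these settings force frequent flushes, assuming 0.90-era HLog behavior: the log rolls once it reaches the HLog block size (by default the HDFS block size) times hbase.regionserver.logroll.multiplier, and when more than hbase.regionserver.maxlogs logs are outstanding, the region server asks the regions holding the oldest edits to flush. The 64 MB block size below is an assumption, not a value from this test's configuration, and the class name is hypothetical.

```java
/** Back-of-the-envelope numbers for the HLog settings above (illustrative only). */
public class LogRollMath {
    /** Roll threshold in bytes: block size times the roll multiplier. */
    static long rollSize(long blockSize, double multiplier) {
        return (long) (blockSize * multiplier);
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;  // ASSUMPTION: 64 MB HLog/HDFS block size
        double multiplier = 0.005;           // hbase.regionserver.logroll.multiplier
        int maxLogs = 3;                     // hbase.regionserver.maxlogs

        long roll = rollSize(blockSize, multiplier);
        System.out.println("HLog rolls after about " + roll + " bytes");
        // Approximate: flushes are requested once the log count exceeds maxLogs.
        System.out.println("forced flushes start once more than " + maxLogs
                + " logs exist, i.e. after roughly " + (maxLogs * roll) + " bytes of edits");
    }
}
```

Under that assumption the log rolls after about 335,544 bytes, so only around 1 MB of edits can accumulate before flushes are forced.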
                
> Increment data will lost when the memstore flushed
> --------------------------------------------------
>
>                 Key: HBASE-6195
>                 URL: https://issues.apache.org/jira/browse/HBASE-6195
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: Xing Shi
>
> There are two problems in increment() now:
> First:
> The timestamp (the variable now) in HRegion's increment() is generated before the row lock is acquired, so when multiple threads increment the same row, a thread that generated its timestamp earlier may acquire the lock later. Because increment stores only one version, the result is still correct up to this point.
> While the region is flushing, however, an increment reads the KeyValue from both the snapshot and the memstore, takes the one with the larger timestamp, and writes the new value back to the memstore. If the snapshot's timestamp is larger than the memstore's, the increment reads the old value and increments that, which is wrong.
> Second:
> There is also a risk in increment: it writes to the memstore first and then to the HLog, so if the HLog write fails, clients will still read the incremented value.
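
The first problem can be reproduced deterministically with a simplified single-cell model. This is a hypothetical sketch, not HBase code: the KV record, the one-version memstore, and the "larger timestamp wins" read follow the description above, and the thread interleaving is replayed by hand. Thread A takes timestamp 100 and thread B takes 101, but B gets the lock first. A flush then moves B's value into the snapshot, A writes its result to the memstore with the older timestamp 100, and the next increment (C, timestamp 102) picks the snapshot's stale value because 101 > 100.

```java
/** Hypothetical single-cell model of the flush race described in the issue (not HBase code). */
public class IncrementFlushRace {
    /** One version of the counter cell: timestamp plus value. */
    record KV(long ts, long value) {}

    private KV memstore;  // only one version is kept, as the issue describes
    private KV snapshot;  // frozen copy of the memstore while a flush runs

    /** Begin a flush: the current memstore contents move into the snapshot. */
    void startFlush() {
        snapshot = memstore;
        memstore = null;
    }

    /** Return the value of whichever version (snapshot or memstore) has the larger timestamp. */
    long readLatest() {
        KV best = null;
        for (KV kv : new KV[] {snapshot, memstore}) {
            if (kv != null && (best == null || kv.ts() > best.ts())) {
                best = kv;
            }
        }
        return best == null ? 0 : best.value();
    }

    /** Increment while holding the row lock, stamping the result with a timestamp chosen earlier. */
    void increment(long ts, long amount) {
        memstore = new KV(ts, readLatest() + amount);
    }

    public static void main(String[] args) {
        // Timestamps taken before the lock: A gets 100, B gets 101, but B locks first.
        IncrementFlushRace racy = new IncrementFlushRace();
        racy.increment(101, 1);  // B: memstore = (101, 1)
        racy.startFlush();       // snapshot = (101, 1), memstore empty
        racy.increment(100, 1);  // A: reads 1 from snapshot, memstore = (100, 2)
        racy.increment(102, 1);  // C: snapshot ts 101 > memstore ts 100, reads stale 1
        System.out.println("out-of-order timestamps: " + racy.readLatest() + " (3 increments)");

        // Same sequence with timestamps that increase in lock order.
        IncrementFlushRace ordered = new IncrementFlushRace();
        ordered.increment(100, 1);
        ordered.startFlush();
        ordered.increment(101, 1);
        ordered.increment(102, 1);
        System.out.println("in-order timestamps:     " + ordered.readLatest() + " (3 increments)");
    }
}
```

The out-of-order run ends at 2 after three increments, while the run whose timestamps follow lock order ends at 3. Taking the timestamp after acquiring the lock is the ordering the description implies; the model does not show how the issue was actually fixed.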

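The second problem is about ordering. In the hypothetical model below (IncrementWalOrder, memstoreValue and durableValue are illustrative names, not HBase code), the HLog append is simulated to fail. With the memstore-first order described in the issue, readers already see the new value even though nothing durable was written. The log-first variant is the usual write-ahead-logging order and is shown only for contrast; it is not necessarily how this issue was resolved.

```java
import java.io.IOException;

/** Hypothetical model of write ordering in increment (not HBase code). */
public class IncrementWalOrder {
    private long memstoreValue = 0;  // what readers see
    private long durableValue = 0;   // what would survive a crash (replayed from the log)
    private final boolean walFails;  // simulate an HLog append failure

    IncrementWalOrder(boolean walFails) { this.walFails = walFails; }

    private void appendToWal(long newValue) throws IOException {
        if (walFails) throw new IOException("simulated HLog append failure");
        durableValue = newValue;
    }

    /** Order described in the issue: memstore first, then HLog. */
    void incrementMemstoreFirst(long amount) throws IOException {
        long newValue = memstoreValue + amount;
        memstoreValue = newValue;  // visible to readers immediately
        appendToWal(newValue);     // may fail after the value is already visible
    }

    /** Write-ahead order: log first, publish only after the append succeeds. */
    void incrementWalFirst(long amount) throws IOException {
        long newValue = memstoreValue + amount;
        appendToWal(newValue);     // throws before anything becomes visible
        memstoreValue = newValue;
    }

    long read() { return memstoreValue; }
    long durable() { return durableValue; }

    public static void main(String[] args) {
        IncrementWalOrder a = new IncrementWalOrder(true);
        try { a.incrementMemstoreFirst(1); } catch (IOException e) { /* client sees an error */ }
        System.out.println("memstore-first: read=" + a.read() + ", durable=" + a.durable());

        IncrementWalOrder b = new IncrementWalOrder(true);
        try { b.incrementWalFirst(1); } catch (IOException e) { /* client sees an error */ }
        System.out.println("log-first:      read=" + b.read() + ", durable=" + b.durable());
    }
}
```

With a failing log, the memstore-first order leaves read=1 and durable=0, so a client can observe an increment that would be lost on a crash; the log-first order leaves both at 0.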