You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Eric Newton (Created) (JIRA)" <ji...@apache.org> on 2012/04/09 23:17:16 UTC

[jira] [Created] (HBASE-5754) data lost with gora continuous ingest test (goraci)

data lost with gora continuous ingest test (goraci)
---------------------------------------------------

                 Key: HBASE-5754
                 URL: https://issues.apache.org/jira/browse/HBASE-5754
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.92.1
         Environment: 10 node test cluster
            Reporter: Eric Newton


Keith Turner re-wrote the accumulo continuous ingest test using gora, which has both hbase and accumulo back-ends.

I put a billion entries into HBase, and ran the Verify map/reduce job.  The verification failed because about 21K entries were missing.  The goraci [README|https://github.com/keith-turner/goraci] explains the test, and how it detects missing data.

I re-ran the test with 100 million entries, and it verified successfully.  
Both of the times I tested using a billion entries, the verification failed.
If I run the verification step twice, the results are consistent, so the problem is
probably not on the verify step.

Here's the versions of the various packages:

||package||version||
|hadoop|0.20.205.0|
|hbase|0.92.1|
|gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
|goraci|https://github.com/ericnewton/goraci  tagged 2012-04-08|

The change I made to goraci was to configure it for hbase and to allow it to build properly.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)

Posted by "Eric Newton (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251173#comment-13251173 ] 

Eric Newton commented on HBASE-5754:
------------------------------------

@stack

Save yourself a mountain of time during the Verify step by taking a look at GORA-117.

When I pre-split the data, my 100M runs were clean.  When I leaned on the split button during ingest, I had problems even at 100M.  Could be I just got (un)lucky, but I hope it will reduce your debugging time.


                
> data lost with gora continuous ingest test (goraci)
> ---------------------------------------------------
>
>                 Key: HBASE-5754
>                 URL: https://issues.apache.org/jira/browse/HBASE-5754
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>         Environment: 10 node test cluster
>            Reporter: Eric Newton
>            Assignee: stack
>
> Keith Turner re-wrote the accumulo continuous ingest test using gora, which has both hbase and accumulo back-ends.
> I put a billion entries into HBase, and ran the Verify map/reduce job.  The verification failed because about 21K entries were missing.  The goraci [README|https://github.com/keith-turner/goraci] explains the test, and how it detects missing data.
> I re-ran the test with 100 million entries, and it verified successfully.  
> Both of the times I tested using a billion entries, the verification failed.
> If I run the verification step twice, the results are consistent, so the problem is
> probably not on the verify step.
> Here's the versions of the various packages:
> ||package||version||
> |hadoop|0.20.205.0|
> |hbase|0.92.1|
> |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
> |goraci|https://github.com/ericnewton/goraci  tagged 2012-04-08|
> The change I made to goraci was to configure it for hbase and to allow it to build properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250195#comment-13250195 ] 

stack commented on HBASE-5754:
------------------------------

Let me take this one.  I've been playing w/ goraci.  I did 25M for my first test and that seemed to work.  Let me do the bigger numbers.  Thanks Eric.
                
> data lost with gora continuous ingest test (goraci)
> ---------------------------------------------------
>
>                 Key: HBASE-5754
>                 URL: https://issues.apache.org/jira/browse/HBASE-5754
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>         Environment: 10 node test cluster
>            Reporter: Eric Newton
>            Assignee: stack
>
> Keith Turner re-wrote the accumulo continuous ingest test using gora, which has both hbase and accumulo back-ends.
> I put a billion entries into HBase, and ran the Verify map/reduce job.  The verification failed because about 21K entries were missing.  The goraci [README|https://github.com/keith-turner/goraci] explains the test, and how it detects missing data.
> I re-ran the test with 100 million entries, and it verified successfully.  
> Both of the times I tested using a billion entries, the verification failed.
> If I run the verification step twice, the results are consistent, so the problem is
> probably not on the verify step.
> Here's the versions of the various packages:
> ||package||version||
> |hadoop|0.20.205.0|
> |hbase|0.92.1|
> |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
> |goraci|https://github.com/ericnewton/goraci  tagged 2012-04-08|
> The change I made to goraci was to configure it for hbase and to allow it to build properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)

Posted by "Eric Newton (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252454#comment-13252454 ] 

Eric Newton commented on HBASE-5754:
------------------------------------

Yes, that looks like a clean run.

My run of 1B completed correctly yesterday.

However, I just did a run of 100M.  I suspect split/balancing is causing the data loss.  So, during the 100M run, I pushed the split button on the master web page, every 5 seconds until I had several hundred tablets.  Then I left it alone.

{noformat}
12/04/12 14:17:01 INFO mapred.JobClient:   goraci.Verify$Counts
12/04/12 14:17:01 INFO mapred.JobClient:     UNDEFINED=37099
12/04/12 14:17:01 INFO mapred.JobClient:     REFERENCED=89961385
12/04/12 14:17:01 INFO mapred.JobClient:     UNREFERENCED=10001516
{noformat}

Perhaps if you do this you can replicate the problem.
                
> data lost with gora continuous ingest test (goraci)
> ---------------------------------------------------
>
>                 Key: HBASE-5754
>                 URL: https://issues.apache.org/jira/browse/HBASE-5754
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>         Environment: 10 node test cluster
>            Reporter: Eric Newton
>            Assignee: stack
>
> Keith Turner re-wrote the accumulo continuous ingest test using gora, which has both hbase and accumulo back-ends.
> I put a billion entries into HBase, and ran the Verify map/reduce job.  The verification failed because about 21K entries were missing.  The goraci [README|https://github.com/keith-turner/goraci] explains the test, and how it detects missing data.
> I re-ran the test with 100 million entries, and it verified successfully.  
> Both of the times I tested using a billion entries, the verification failed.
> If I run the verification step twice, the results are consistent, so the problem is
> probably not on the verify step.
> Here's the versions of the various packages:
> ||package||version||
> |hadoop|0.20.205.0|
> |hbase|0.92.1|
> |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
> |goraci|https://github.com/ericnewton/goraci  tagged 2012-04-08|
> The change I made to goraci was to configure it for hbase and to allow it to build properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251043#comment-13251043 ] 

stack commented on HBASE-5754:
------------------------------

@Keith Thanks for the headsup.
                
> data lost with gora continuous ingest test (goraci)
> ---------------------------------------------------
>
>                 Key: HBASE-5754
>                 URL: https://issues.apache.org/jira/browse/HBASE-5754
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>         Environment: 10 node test cluster
>            Reporter: Eric Newton
>            Assignee: stack
>
> Keith Turner re-wrote the accumulo continuous ingest test using gora, which has both hbase and accumulo back-ends.
> I put a billion entries into HBase, and ran the Verify map/reduce job.  The verification failed because about 21K entries were missing.  The goraci [README|https://github.com/keith-turner/goraci] explains the test, and how it detects missing data.
> I re-ran the test with 100 million entries, and it verified successfully.  
> Both of the times I tested using a billion entries, the verification failed.
> If I run the verification step twice, the results are consistent, so the problem is
> probably not on the verify step.
> Here's the versions of the various packages:
> ||package||version||
> |hadoop|0.20.205.0|
> |hbase|0.92.1|
> |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
> |goraci|https://github.com/ericnewton/goraci  tagged 2012-04-08|
> The change I made to goraci was to configure it for hbase and to allow it to build properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252178#comment-13252178 ] 

stack commented on HBASE-5754:
------------------------------

I managed to get things to work after a few detours and figuring the tool (I'm a little slow).  Here is output of a verify run after uploading 1B rows using the Generator tool (I have a five node cluster):

{code}
12/04/12 03:53:54 INFO mapred.JobClient:  map 100% reduce 99%
12/04/12 03:54:06 INFO mapred.JobClient:  map 100% reduce 100%
12/04/12 03:54:11 INFO mapred.JobClient: Job complete: job_201204092039_0040
12/04/12 03:54:11 INFO mapred.JobClient: Counters: 31
12/04/12 03:54:11 INFO mapred.JobClient:   Job Counters
12/04/12 03:54:11 INFO mapred.JobClient:     Launched reduce tasks=103
12/04/12 03:54:11 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=43090396
12/04/12 03:54:11 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/04/12 03:54:11 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/04/12 03:54:11 INFO mapred.JobClient:     Rack-local map tasks=75
12/04/12 03:54:11 INFO mapred.JobClient:     Launched map tasks=256
12/04/12 03:54:11 INFO mapred.JobClient:     Data-local map tasks=181
12/04/12 03:54:11 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=8911236
12/04/12 03:54:11 INFO mapred.JobClient:   goraci.Verify$Counts
12/04/12 03:54:11 INFO mapred.JobClient:     REFERENCED=564459547
12/04/12 03:54:11 INFO mapred.JobClient:     UNREFERENCED=1040000000
12/04/12 03:54:11 INFO mapred.JobClient:   File Output Format Counters
12/04/12 03:54:11 INFO mapred.JobClient:     Bytes Written=0
12/04/12 03:54:11 INFO mapred.JobClient:   FileSystemCounters
12/04/12 03:54:11 INFO mapred.JobClient:     FILE_BYTES_READ=80913119406
12/04/12 03:54:11 INFO mapred.JobClient:     HDFS_BYTES_READ=156449
12/04/12 03:54:11 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=107202716633
12/04/12 03:54:11 INFO mapred.JobClient:   File Input Format Counters
12/04/12 03:54:11 INFO mapred.JobClient:     Bytes Read=0
12/04/12 03:54:11 INFO mapred.JobClient:   Map-Reduce Framework
12/04/12 03:54:11 INFO mapred.JobClient:     Map output materialized bytes=28369514665
12/04/12 03:54:11 INFO mapred.JobClient:     Map input records=1604459547
12/04/12 03:54:11 INFO mapred.JobClient:     Reduce shuffle bytes=28259732158
12/04/12 03:54:11 INFO mapred.JobClient:     Spilled Records=8195463443
12/04/12 03:54:11 INFO mapred.JobClient:     Map output bytes=24031522877
12/04/12 03:54:11 INFO mapred.JobClient:     CPU time spent (ms)=20730410
12/04/12 03:54:11 INFO mapred.JobClient:     Total committed heap usage (bytes)=150411739136
12/04/12 03:54:11 INFO mapred.JobClient:     Combine input records=0
12/04/12 03:54:11 INFO mapred.JobClient:     SPLIT_RAW_BYTES=156449
12/04/12 03:54:11 INFO mapred.JobClient:     Reduce input records=2168919094
12/04/12 03:54:11 INFO mapred.JobClient:     Reduce input groups=1604459547
12/04/12 03:54:11 INFO mapred.JobClient:     Combine output records=0
12/04/12 03:54:11 INFO mapred.JobClient:     Physical memory (bytes) snapshot=144318976000
12/04/12 03:54:11 INFO mapred.JobClient:     Reduce output records=0
12/04/12 03:54:11 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=522892115968
12/04/12 03:54:11 INFO mapred.JobClient:     Map output records=2168919094
{code}

Going by the README, it would seem we're basically working.  Should I close this issue or would you like me to look at something else?  Thanks.

I like this goraci tool.  Will play some more with it.
                
> data lost with gora continuous ingest test (goraci)
> ---------------------------------------------------
>
>                 Key: HBASE-5754
>                 URL: https://issues.apache.org/jira/browse/HBASE-5754
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>         Environment: 10 node test cluster
>            Reporter: Eric Newton
>            Assignee: stack
>
> Keith Turner re-wrote the accumulo continuous ingest test using gora, which has both hbase and accumulo back-ends.
> I put a billion entries into HBase, and ran the Verify map/reduce job.  The verification failed because about 21K entries were missing.  The goraci [README|https://github.com/keith-turner/goraci] explains the test, and how it detects missing data.
> I re-ran the test with 100 million entries, and it verified successfully.  
> Both of the times I tested using a billion entries, the verification failed.
> If I run the verification step twice, the results are consistent, so the problem is
> probably not on the verify step.
> Here's the versions of the various packages:
> ||package||version||
> |hadoop|0.20.205.0|
> |hbase|0.92.1|
> |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
> |goraci|https://github.com/ericnewton/goraci  tagged 2012-04-08|
> The change I made to goraci was to configure it for hbase and to allow it to build properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (HBASE-5754) data lost with gora continuous ingest test (goraci)

Posted by "stack (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack reassigned HBASE-5754:
----------------------------

    Assignee: stack
    
> data lost with gora continuous ingest test (goraci)
> ---------------------------------------------------
>
>                 Key: HBASE-5754
>                 URL: https://issues.apache.org/jira/browse/HBASE-5754
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>         Environment: 10 node test cluster
>            Reporter: Eric Newton
>            Assignee: stack
>
> Keith Turner re-wrote the accumulo continuous ingest test using gora, which has both hbase and accumulo back-ends.
> I put a billion entries into HBase, and ran the Verify map/reduce job.  The verification failed because about 21K entries were missing.  The goraci [README|https://github.com/keith-turner/goraci] explains the test, and how it detects missing data.
> I re-ran the test with 100 million entries, and it verified successfully.  
> Both of the times I tested using a billion entries, the verification failed.
> If I run the verification step twice, the results are consistent, so the problem is
> probably not on the verify step.
> Here's the versions of the various packages:
> ||package||version||
> |hadoop|0.20.205.0|
> |hbase|0.92.1|
> |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
> |goraci|https://github.com/ericnewton/goraci  tagged 2012-04-08|
> The change I made to goraci was to configure it for hbase and to allow it to build properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)

Posted by "Enis Soztutar (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250316#comment-13250316 ] 

Enis Soztutar commented on HBASE-5754:
--------------------------------------

@Stack, if you need any help with Gora, I can help with that. 
                
> data lost with gora continuous ingest test (goraci)
> ---------------------------------------------------
>
>                 Key: HBASE-5754
>                 URL: https://issues.apache.org/jira/browse/HBASE-5754
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>         Environment: 10 node test cluster
>            Reporter: Eric Newton
>            Assignee: stack
>
> Keith Turner re-wrote the accumulo continuous ingest test using gora, which has both hbase and accumulo back-ends.
> I put a billion entries into HBase, and ran the Verify map/reduce job.  The verification failed because about 21K entries were missing.  The goraci [README|https://github.com/keith-turner/goraci] explains the test, and how it detects missing data.
> I re-ran the test with 100 million entries, and it verified successfully.  
> Both of the times I tested using a billion entries, the verification failed.
> If I run the verification step twice, the results are consistent, so the problem is
> probably not on the verify step.
> Here's the versions of the various packages:
> ||package||version||
> |hadoop|0.20.205.0|
> |hbase|0.92.1|
> |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
> |goraci|https://github.com/ericnewton/goraci  tagged 2012-04-08|
> The change I made to goraci was to configure it for hbase and to allow it to build properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)

Posted by "Enis Soztutar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416003#comment-13416003 ] 

Enis Soztutar commented on HBASE-5754:
--------------------------------------

@Lars, 
We have been running this for a while as nightlies, and apart from the reported HBASE-5986, HBASE-6060, and HBASE-6160, we did not run into more issues. All of them can be considered META issues w/o actual data loss. Let's see what Eric would say. 
                
> data lost with gora continuous ingest test (goraci)
> ---------------------------------------------------
>
>                 Key: HBASE-5754
>                 URL: https://issues.apache.org/jira/browse/HBASE-5754
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>         Environment: 10 node test cluster
>            Reporter: Eric Newton
>            Assignee: stack
>
> Keith Turner re-wrote the accumulo continuous ingest test using gora, which has both hbase and accumulo back-ends.
> I put a billion entries into HBase, and ran the Verify map/reduce job.  The verification failed because about 21K entries were missing.  The goraci [README|https://github.com/keith-turner/goraci] explains the test, and how it detects missing data.
> I re-ran the test with 100 million entries, and it verified successfully.  
> Both of the times I tested using a billion entries, the verification failed.
> If I run the verification step twice, the results are consistent, so the problem is
> probably not on the verify step.
> Here's the versions of the various packages:
> ||package||version||
> |hadoop|0.20.205.0|
> |hbase|0.92.1|
> |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
> |goraci|https://github.com/ericnewton/goraci  tagged 2012-04-08|
> The change I made to goraci was to configure it for hbase and to allow it to build properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416188#comment-13416188 ] 

nkeywal commented on HBASE-5754:
--------------------------------

@Eric
When you kill servers, do you stop the whole hardware, just the accumulo processes, or both hdfs & accumulo?
                
> data lost with gora continuous ingest test (goraci)
> ---------------------------------------------------
>
>                 Key: HBASE-5754
>                 URL: https://issues.apache.org/jira/browse/HBASE-5754
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>         Environment: 10 node test cluster
>            Reporter: Eric Newton
>            Assignee: stack
>
> Keith Turner re-wrote the accumulo continuous ingest test using gora, which has both hbase and accumulo back-ends.
> I put a billion entries into HBase, and ran the Verify map/reduce job.  The verification failed because about 21K entries were missing.  The goraci [README|https://github.com/keith-turner/goraci] explains the test, and how it detects missing data.
> I re-ran the test with 100 million entries, and it verified successfully.  
> Both of the times I tested using a billion entries, the verification failed.
> If I run the verification step twice, the results are consistent, so the problem is
> probably not on the verify step.
> Here's the versions of the various packages:
> ||package||version||
> |hadoop|0.20.205.0|
> |hbase|0.92.1|
> |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
> |goraci|https://github.com/ericnewton/goraci  tagged 2012-04-08|
> The change I made to goraci was to configure it for hbase and to allow it to build properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)

Posted by "Enis Soztutar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271036#comment-13271036 ] 

Enis Soztutar commented on HBASE-5754:
--------------------------------------

In one of my 0.92.x tests on a 10 node cluster, 250M inserts, I did manage to get the verify to fail: 
{code}
12/05/08 11:11:18 INFO mapred.JobClient:   goraci.Verify$Counts
12/05/08 11:11:18 INFO mapred.JobClient:     UNDEFINED=972506
12/05/08 11:11:18 INFO mapred.JobClient:     REFERENCED=248051318
12/05/08 11:11:18 INFO mapred.JobClient:     UNREFERENCED=972506
12/05/08 11:11:18 INFO mapred.JobClient:   Map-Reduce Framework
12/05/08 11:11:18 INFO mapred.JobClient:     Map input records=249023824
{code}

Notice that map input records is 1M less that 250M, which indicates that the inputformat did not provide all records in the table. The missing rows all belong to the single region. I rerun the test again after a couple of hours, and it passed. But the failed test created 244 maps, instead of 246, which is the current region count, so I am suspecting there is something wrong in the split calculation or in the supposed transactional behavior for split/balance operations in the meta table. I am still inspecting the code and the logs, but any pointers are welcome. 
                
> data lost with gora continuous ingest test (goraci)
> ---------------------------------------------------
>
>                 Key: HBASE-5754
>                 URL: https://issues.apache.org/jira/browse/HBASE-5754
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>         Environment: 10 node test cluster
>            Reporter: Eric Newton
>            Assignee: stack
>
> Keith Turner re-wrote the accumulo continuous ingest test using gora, which has both hbase and accumulo back-ends.
> I put a billion entries into HBase, and ran the Verify map/reduce job.  The verification failed because about 21K entries were missing.  The goraci [README|https://github.com/keith-turner/goraci] explains the test, and how it detects missing data.
> I re-ran the test with 100 million entries, and it verified successfully.  
> Both of the times I tested using a billion entries, the verification failed.
> If I run the verification step twice, the results are consistent, so the problem is
> probably not on the verify step.
> Here's the versions of the various packages:
> ||package||version||
> |hadoop|0.20.205.0|
> |hbase|0.92.1|
> |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
> |goraci|https://github.com/ericnewton/goraci  tagged 2012-04-08|
> The change I made to goraci was to configure it for hbase and to allow it to build properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)

Posted by "Keith Turner (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252502#comment-13252502 ] 

Keith Turner commented on HBASE-5754:
-------------------------------------

When Eric ran 10 generators each adding 100M against HBase and no data was lost, he saw 1B referenced, 0 unref, and 0 undef.
                
> data lost with gora continuous ingest test (goraci)
> ---------------------------------------------------
>
>                 Key: HBASE-5754
>                 URL: https://issues.apache.org/jira/browse/HBASE-5754
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>         Environment: 10 node test cluster
>            Reporter: Eric Newton
>            Assignee: stack
>
> Keith Turner re-wrote the accumulo continuous ingest test using gora, which has both hbase and accumulo back-ends.
> I put a billion entries into HBase, and ran the Verify map/reduce job.  The verification failed because about 21K entries were missing.  The goraci [README|https://github.com/keith-turner/goraci] explains the test, and how it detects missing data.
> I re-ran the test with 100 million entries, and it verified successfully.  
> Both of the times I tested using a billion entries, the verification failed.
> If I run the verification step twice, the results are consistent, so the problem is
> probably not on the verify step.
> Here's the versions of the various packages:
> ||package||version||
> |hadoop|0.20.205.0|
> |hbase|0.92.1|
> |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
> |goraci|https://github.com/ericnewton/goraci  tagged 2012-04-08|
> The change I made to goraci was to configure it for hbase and to allow it to build properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)

Posted by "Keith Turner (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251017#comment-13251017 ] 

Keith Turner commented on HBASE-5754:
-------------------------------------

You may run into GORA-116, a bug in the gora-hbase store.
                
> data lost with gora continuous ingest test (goraci)
> ---------------------------------------------------
>
>                 Key: HBASE-5754
>                 URL: https://issues.apache.org/jira/browse/HBASE-5754
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>         Environment: 10 node test cluster
>            Reporter: Eric Newton
>            Assignee: stack
>
> Keith Turner re-wrote the accumulo continuous ingest test using gora, which has both hbase and accumulo back-ends.
> I put a billion entries into HBase, and ran the Verify map/reduce job.  The verification failed because about 21K entries were missing.  The goraci [README|https://github.com/keith-turner/goraci] explains the test, and how it detects missing data.
> I re-ran the test with 100 million entries, and it verified successfully.  
> Both of the times I tested using a billion entries, the verification failed.
> If I run the verification step twice, the results are consistent, so the problem is
> probably not on the verify step.
> Here's the versions of the various packages:
> ||package||version||
> |hadoop|0.20.205.0|
> |hbase|0.92.1|
> |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
> |goraci|https://github.com/ericnewton/goraci  tagged 2012-04-08|
> The change I made to goraci was to configure it for hbase and to allow it to build properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416246#comment-13416246 ] 

nkeywal commented on HBASE-5754:
--------------------------------

@keith: thanks for the info. It will be interesting to see the results. fyi, I've just created HBASE-6401 on an hdfs linked issue.

                
> data lost with gora continuous ingest test (goraci)
> ---------------------------------------------------
>
>                 Key: HBASE-5754
>                 URL: https://issues.apache.org/jira/browse/HBASE-5754
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>         Environment: 10 node test cluster
>            Reporter: Eric Newton
>            Assignee: stack
>
> Keith Turner re-wrote the accumulo continuous ingest test using gora, which has both hbase and accumulo back-ends.
> I put a billion entries into HBase, and ran the Verify map/reduce job.  The verification failed because about 21K entries were missing.  The goraci [README|https://github.com/keith-turner/goraci] explains the test, and how it detects missing data.
> I re-ran the test with 100 million entries, and it verified successfully.  
> Both of the times I tested using a billion entries, the verification failed.
> If I run the verification step twice, the results are consistent, so the problem is
> probably not on the verify step.
> Here's the versions of the various packages:
> ||package||version||
> |hadoop|0.20.205.0|
> |hbase|0.92.1|
> |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
> |goraci|https://github.com/ericnewton/goraci  tagged 2012-04-08|
> The change I made to goraci was to configure it for hbase and to allow it to build properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252504#comment-13252504 ] 

stack commented on HBASE-5754:
------------------------------

Let me do the same.  I did not match generator map tasks to verify reducers.  Then let me recreate the split issue Eric describes above.  Thanks lads.
                
> data lost with gora continuous ingest test (goraci)
> ---------------------------------------------------
>
>                 Key: HBASE-5754
>                 URL: https://issues.apache.org/jira/browse/HBASE-5754
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>         Environment: 10 node test cluster
>            Reporter: Eric Newton
>            Assignee: stack
>
> Keith Turner re-wrote the accumulo continuous ingest test using gora, which has both hbase and accumulo back-ends.
> I put a billion entries into HBase, and ran the Verify map/reduce job.  The verification failed because about 21K entries were missing.  The goraci [README|https://github.com/keith-turner/goraci] explains the test, and how it detects missing data.
> I re-ran the test with 100 million entries, and it verified successfully.  
> Both of the times I tested using a billion entries, the verification failed.
> If I run the verification step twice, the results are consistent, so the problem is
> probably not on the verify step.
> Here's the versions of the various packages:
> ||package||version||
> |hadoop|0.20.205.0|
> |hbase|0.92.1|
> |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
> |goraci|https://github.com/ericnewton/goraci  tagged 2012-04-08|
> The change I made to goraci was to configure it for hbase and to allow it to build properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253135#comment-13253135 ] 

stack commented on HBASE-5754:
------------------------------

I ran w/ 10 generators and 10 slots for the verify step and got the below which doesn't prints out only a REFERENCED count.

Running these recent tests I let it do its natural splitting so it grew from zero to 260odd regions so maybe the issue you see Eric comes of manual splits coming out of the UI.  Let me try that next.

Thanks lads.

{code}
12/04/13 05:16:23 INFO mapred.JobClient:  map 100% reduce 99%
12/04/13 05:16:54 INFO mapred.JobClient:  map 100% reduce 100%
12/04/13 05:16:59 INFO mapred.JobClient: Job complete: job_201204092039_0046
12/04/13 05:16:59 INFO mapred.JobClient: Counters: 30
12/04/13 05:16:59 INFO mapred.JobClient:   Job Counters
12/04/13 05:16:59 INFO mapred.JobClient:     Launched reduce tasks=10
12/04/13 05:16:59 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=30125694
12/04/13 05:16:59 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/04/13 05:16:59 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/04/13 05:16:59 INFO mapred.JobClient:     Rack-local map tasks=6
12/04/13 05:16:59 INFO mapred.JobClient:     Launched map tasks=256
12/04/13 05:16:59 INFO mapred.JobClient:     Data-local map tasks=250
12/04/13 05:16:59 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=5832198
12/04/13 05:16:59 INFO mapred.JobClient:   goraci.Verify$Counts
12/04/13 05:16:59 INFO mapred.JobClient:     REFERENCED=1000000000
12/04/13 05:16:59 INFO mapred.JobClient:   File Output Format Counters
12/04/13 05:16:59 INFO mapred.JobClient:     Bytes Written=0
12/04/13 05:16:59 INFO mapred.JobClient:   FileSystemCounters
12/04/13 05:16:59 INFO mapred.JobClient:     FILE_BYTES_READ=83022967343
12/04/13 05:16:59 INFO mapred.JobClient:     HDFS_BYTES_READ=156414
12/04/13 05:16:59 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=112881560332
12/04/13 05:16:59 INFO mapred.JobClient:   File Input Format Counters
12/04/13 05:16:59 INFO mapred.JobClient:     Bytes Read=0
12/04/13 05:16:59 INFO mapred.JobClient:   Map-Reduce Framework
12/04/13 05:16:59 INFO mapred.JobClient:     Map output materialized bytes=29992170602
12/04/13 05:16:59 INFO mapred.JobClient:     Map input records=1000000000
12/04/13 05:16:59 INFO mapred.JobClient:     Reduce shuffle bytes=29874879887
12/04/13 05:16:59 INFO mapred.JobClient:     Spilled Records=7527086436
12/04/13 05:16:59 INFO mapred.JobClient:     Map output bytes=25992155242
12/04/13 05:16:59 INFO mapred.JobClient:     CPU time spent (ms)=20182570
12/04/13 05:16:59 INFO mapred.JobClient:     Total committed heap usage (bytes)=99953082368
12/04/13 05:16:59 INFO mapred.JobClient:     Combine input records=0
12/04/13 05:16:59 INFO mapred.JobClient:     SPLIT_RAW_BYTES=156414
12/04/13 05:16:59 INFO mapred.JobClient:     Reduce input records=2000000000
12/04/13 05:16:59 INFO mapred.JobClient:     Reduce input groups=1000000000
12/04/13 05:16:59 INFO mapred.JobClient:     Combine output records=0
12/04/13 05:16:59 INFO mapred.JobClient:     Physical memory (bytes) snapshot=91762372608
12/04/13 05:16:59 INFO mapred.JobClient:     Reduce output records=0
12/04/13 05:16:59 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=391126540288
12/04/13 05:16:59 INFO mapred.JobClient:     Map output records=2000000000
{code}
                
> data lost with gora continuous ingest test (goraci)
> ---------------------------------------------------
>
>                 Key: HBASE-5754
>                 URL: https://issues.apache.org/jira/browse/HBASE-5754
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>         Environment: 10 node test cluster
>            Reporter: Eric Newton
>            Assignee: stack
>
> Keith Turner re-wrote the accumulo continuous ingest test using gora, which has both hbase and accumulo back-ends.
> I put a billion entries into HBase, and ran the Verify map/reduce job.  The verification failed because about 21K entries were missing.  The goraci [README|https://github.com/keith-turner/goraci] explains the test, and how it detects missing data.
> I re-ran the test with 100 million entries, and it verified successfully.  
> Both of the times I tested using a billion entries, the verification failed.
> If I run the verification step twice, the results are consistent, so the problem is
> probably not on the verify step.
> Here's the versions of the various packages:
> ||package||version||
> |hadoop|0.20.205.0|
> |hbase|0.92.1|
> |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
> |goraci|https://github.com/ericnewton/goraci  tagged 2012-04-08|
> The change I made to goraci was to configure it for hbase and to allow it to build properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251181#comment-13251181 ] 

stack commented on HBASE-5754:
------------------------------

@Eric No worries.  Let me make sure of your 1B test regardless (I messed up and loaded many billions ... need to start over).  Thanks for the heads up.
                
> data lost with gora continuous ingest test (goraci)
> ---------------------------------------------------
>
>                 Key: HBASE-5754
>                 URL: https://issues.apache.org/jira/browse/HBASE-5754
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>         Environment: 10 node test cluster
>            Reporter: Eric Newton
>            Assignee: stack
>
> Keith Turner re-wrote the accumulo continuous ingest test using gora, which has both hbase and accumulo back-ends.
> I put a billion entries into HBase, and ran the Verify map/reduce job.  The verification failed because about 21K entries were missing.  The goraci [README|https://github.com/keith-turner/goraci] explains the test, and how it detects missing data.
> I re-ran the test with 100 million entries, and it verified successfully.  
> Both of the times I tested using a billion entries, the verification failed.
> If I run the verification step twice, the results are consistent, so the problem is
> probably not on the verify step.
> Here's the versions of the various packages:
> ||package||version||
> |hadoop|0.20.205.0|
> |hbase|0.92.1|
> |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
> |goraci|https://github.com/ericnewton/goraci  tagged 2012-04-08|
> The change I made to goraci was to configure it for hbase and to allow it to build properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)

Posted by "Keith Turner (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416207#comment-13416207 ] 

Keith Turner commented on HBASE-5754:
-------------------------------------

In the past we just killed Accumulo processes, the tablet servers, loggers, and master process(es).  We were not looking to test HDFS.  For the next release, Accumulo has moved its write ahead logging to HDFS.  So for the next release we plan to start killing HDFS processes as part of our testing.  See ACCUMULO-578, ACCUMULO-623, and ACCUMULO-636.
                
> data lost with gora continuous ingest test (goraci)
> ---------------------------------------------------
>
>                 Key: HBASE-5754
>                 URL: https://issues.apache.org/jira/browse/HBASE-5754
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>         Environment: 10 node test cluster
>            Reporter: Eric Newton
>            Assignee: stack
>
> Keith Turner re-wrote the accumulo continuous ingest test using gora, which has both hbase and accumulo back-ends.
> I put a billion entries into HBase, and ran the Verify map/reduce job.  The verification failed because about 21K entries were missing.  The goraci [README|https://github.com/keith-turner/goraci] explains the test, and how it detects missing data.
> I re-ran the test with 100 million entries, and it verified successfully.  
> Both of the times I tested using a billion entries, the verification failed.
> If I run the verification step twice, the results are consistent, so the problem is
> probably not on the verify step.
> Here's the versions of the various packages:
> ||package||version||
> |hadoop|0.20.205.0|
> |hbase|0.92.1|
> |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
> |goraci|https://github.com/ericnewton/goraci  tagged 2012-04-08|
> The change I made to goraci was to configure it for hbase and to allow it to build properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415889#comment-13415889 ] 

Lars Hofhansl commented on HBASE-5754:
--------------------------------------

Trying to figure out whether this is something to worry about or not.
Is this a permanent data loss issue, or is it "just" .META. being out of whack temporarily (or at least fixable with hbck)?

                
> data lost with gora continuous ingest test (goraci)
> ---------------------------------------------------
>
>                 Key: HBASE-5754
>                 URL: https://issues.apache.org/jira/browse/HBASE-5754
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>         Environment: 10 node test cluster
>            Reporter: Eric Newton
>            Assignee: stack
>
> Keith Turner re-wrote the accumulo continuous ingest test using gora, which has both hbase and accumulo back-ends.
> I put a billion entries into HBase, and ran the Verify map/reduce job.  The verification failed because about 21K entries were missing.  The goraci [README|https://github.com/keith-turner/goraci] explains the test, and how it detects missing data.
> I re-ran the test with 100 million entries, and it verified successfully.  
> Both of the times I tested using a billion entries, the verification failed.
> If I run the verification step twice, the results are consistent, so the problem is
> probably not on the verify step.
> Here's the versions of the various packages:
> ||package||version||
> |hadoop|0.20.205.0|
> |hbase|0.92.1|
> |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
> |goraci|https://github.com/ericnewton/goraci  tagged 2012-04-08|
> The change I made to goraci was to configure it for hbase and to allow it to build properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250940#comment-13250940 ] 

stack commented on HBASE-5754:
------------------------------

@Enis Gora seems to work just fine.  Thanks for the offer.
                
> data lost with gora continuous ingest test (goraci)
> ---------------------------------------------------
>
>                 Key: HBASE-5754
>                 URL: https://issues.apache.org/jira/browse/HBASE-5754
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>         Environment: 10 node test cluster
>            Reporter: Eric Newton
>            Assignee: stack
>
> Keith Turner re-wrote the accumulo continuous ingest test using gora, which has both hbase and accumulo back-ends.
> I put a billion entries into HBase, and ran the Verify map/reduce job.  The verification failed because about 21K entries were missing.  The goraci [README|https://github.com/keith-turner/goraci] explains the test, and how it detects missing data.
> I re-ran the test with 100 million entries, and it verified successfully.  
> Both of the times I tested using a billion entries, the verification failed.
> If I run the verification step twice, the results are consistent, so the problem is
> probably not on the verify step.
> Here's the versions of the various packages:
> ||package||version||
> |hadoop|0.20.205.0|
> |hbase|0.92.1|
> |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
> |goraci|https://github.com/ericnewton/goraci  tagged 2012-04-08|
> The change I made to goraci was to configure it for hbase and to allow it to build properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)

Posted by "Eric Newton (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416177#comment-13416177 ] 

Eric Newton commented on HBASE-5754:
------------------------------------

I will try to reproduce the bug with 0.94 and hadoop 1.0.3 sometime in the next week.

We normally run this test against Accumulo for 24 hours, both with and without agitation (random killing of servers).  I would be concerned if there was apparent data loss, even if it was transient.

                
> data lost with gora continuous ingest test (goraci)
> ---------------------------------------------------
>
>                 Key: HBASE-5754
>                 URL: https://issues.apache.org/jira/browse/HBASE-5754
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>         Environment: 10 node test cluster
>            Reporter: Eric Newton
>            Assignee: stack
>
> Keith Turner re-wrote the accumulo continuous ingest test using gora, which has both hbase and accumulo back-ends.
> I put a billion entries into HBase, and ran the Verify map/reduce job.  The verification failed because about 21K entries were missing.  The goraci [README|https://github.com/keith-turner/goraci] explains the test, and how it detects missing data.
> I re-ran the test with 100 million entries, and it verified successfully.  
> Both of the times I tested using a billion entries, the verification failed.
> If I run the verification step twice, the results are consistent, so the problem is
> probably not on the verify step.
> Here's the versions of the various packages:
> ||package||version||
> |hadoop|0.20.205.0|
> |hbase|0.92.1|
> |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
> |goraci|https://github.com/ericnewton/goraci  tagged 2012-04-08|
> The change I made to goraci was to configure it for hbase and to allow it to build properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)

Posted by "Enis Soztutar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416243#comment-13416243 ] 

Enis Soztutar commented on HBASE-5754:
--------------------------------------

Just FYI in case, I've been working on adding "long running ingestion tests while randomly killing of servers", and other types of integration tests over at HBASE-6241, HBASE-6201. Feel free to chime in. 
                
> data lost with gora continuous ingest test (goraci)
> ---------------------------------------------------
>
>                 Key: HBASE-5754
>                 URL: https://issues.apache.org/jira/browse/HBASE-5754
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>         Environment: 10 node test cluster
>            Reporter: Eric Newton
>            Assignee: stack
>
> Keith Turner re-wrote the accumulo continuous ingest test using gora, which has both hbase and accumulo back-ends.
> I put a billion entries into HBase, and ran the Verify map/reduce job.  The verification failed because about 21K entries were missing.  The goraci [README|https://github.com/keith-turner/goraci] explains the test, and how it detects missing data.
> I re-ran the test with 100 million entries, and it verified successfully.  
> Both of the times I tested using a billion entries, the verification failed.
> If I run the verification step twice, the results are consistent, so the problem is
> probably not on the verify step.
> Here's the versions of the various packages:
> ||package||version||
> |hadoop|0.20.205.0|
> |hbase|0.92.1|
> |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
> |goraci|https://github.com/ericnewton/goraci  tagged 2012-04-08|
> The change I made to goraci was to configure it for hbase and to allow it to build properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)

Posted by "Keith Turner (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252491#comment-13252491 ] 

Keith Turner commented on HBASE-5754:
-------------------------------------

The counts for the 1B run seem odd to me , but maybe thats just an artifact of how many map task you ran for the generator and how much data each task generated.  If a map task does not does not generate a multiple of 25,000,000 then it will leave some unreferenced.  It generates a circular linked list every 25M.   

{noformat}
12/04/12 03:54:11 INFO mapred.JobClient:     REFERENCED=564459547
12/04/12 03:54:11 INFO mapred.JobClient:     UNREFERENCED=1040000000
{noformat}

If you were to run 10 map task each generating 100M, then this should generate 1B with all nodes referenced.  Minimizing the number of unreferenced is ideal, because the test can not detect the loss of unreferenced nodes.  I should probably add this info to the readme.

                
> data lost with gora continuous ingest test (goraci)
> ---------------------------------------------------
>
>                 Key: HBASE-5754
>                 URL: https://issues.apache.org/jira/browse/HBASE-5754
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>         Environment: 10 node test cluster
>            Reporter: Eric Newton
>            Assignee: stack
>
> Keith Turner re-wrote the accumulo continuous ingest test using gora, which has both hbase and accumulo back-ends.
> I put a billion entries into HBase, and ran the Verify map/reduce job.  The verification failed because about 21K entries were missing.  The goraci [README|https://github.com/keith-turner/goraci] explains the test, and how it detects missing data.
> I re-ran the test with 100 million entries, and it verified successfully.  
> Both of the times I tested using a billion entries, the verification failed.
> If I run the verification step twice, the results are consistent, so the problem is
> probably not on the verify step.
> Here's the versions of the various packages:
> ||package||version||
> |hadoop|0.20.205.0|
> |hbase|0.92.1|
> |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
> |goraci|https://github.com/ericnewton/goraci  tagged 2012-04-08|
> The change I made to goraci was to configure it for hbase and to allow it to build properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira