Posted to dev@hbase.apache.org by Jean-Daniel Cryans <jd...@apache.org> on 2009/06/14 18:59:26 UTC

Re: scanner is returning everything in parent region plus one of the daughters?

Andrew,

+1 I think it's a great idea.

Building on that, I think we should have system-level tests to make
sure we don't break performance and reliability. For example, an
intensive and simultaneous read/write test of a couple of millions of
rows. We could even think of killing a region server or two during
that test (and a master of course). Currently, I don't think it's
easily doable on Hudson so someone would have to host it on a small
cluster.

J-D

On Sun, Jun 14, 2009 at 12:52 PM, Andrew Purtell<ap...@apache.org> wrote:
> This possibly belongs in one of the open issues filed over the
> past few days:
>
> Insert 1000 rows with random row keys, and induce a split (see test.rb
> attached to HBASE-1500). I would expect that no more than 1000 rows should
> be returned from a row count. However, the following is a series of row
> counts obtained after running the test, with total reinitialization in
> between, 5 times:
>
>    1516
>    1492
>    1497
>    1509
>    1501
>
> Also the shell provides an additional clue:
>
>    Current count: 1000, row: ffdcee2a75742697b375edef62fa4b75
>
>    1516 row(s) in 2.9530 seconds
>
> Looks like the parent region is fully iterated first, then in addition
> one of the daughters?
>
> Also, as these issues come up, kindly consider adding test cases to the
> test suite to catch these regressions. It seems the current coverage for
> scanners is letting big issues pass unnoticed.
>
> One thing we could do right away is commit my 'test.rb' reimplemented
> as Java/JUnit into the suite, with some additional logic to test that
> the scanners return the count of unique row keys inserted. If no -1 I
> will go ahead and do that.
>
>  - Andy
>
>
>
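
A note on the numbers above: with random row keys, each daughter of a
split holds roughly half of the parent's rows, so counts near 1500 are
consistent with Andy's guess of 1000 parent rows plus about 500 from one
daughter. The check he proposes for the JUnit port (scanners return the
count of unique row keys inserted) might look like the minimal sketch
below. It is only an illustration: ScanVerifier and Report are
hypothetical names, not HBase classes, and it assumes the test has already
collected the inserted and scanned row keys as strings.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper (not part of HBase): compares scanned row keys to inserted ones. */
final class ScanVerifier {

    /** Outcome of one comparison. */
    static final class Report {
        final int uniqueInserted;      // distinct row keys written
        final int returned;            // rows the scanner handed back
        final Set<String> duplicates;  // keys returned more than once
        final Set<String> missing;     // keys written but never returned

        Report(int uniqueInserted, int returned,
               Set<String> duplicates, Set<String> missing) {
            this.uniqueInserted = uniqueInserted;
            this.returned = returned;
            this.duplicates = duplicates;
            this.missing = missing;
        }

        /** True only if every inserted key came back exactly once and nothing else did. */
        boolean ok() {
            return duplicates.isEmpty() && missing.isEmpty() && returned == uniqueInserted;
        }
    }

    static Report verify(Collection<String> inserted, List<String> scanned) {
        Set<String> expected = new HashSet<>(inserted);
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new TreeSet<>();
        for (String key : scanned) {
            // add() returns false when the key was already seen, i.e. a duplicate row
            if (!seen.add(key)) {
                duplicates.add(key);
            }
        }
        Set<String> missing = new TreeSet<>(expected);
        missing.removeAll(seen);
        return new Report(expected.size(), scanned.size(), duplicates, missing);
    }
}
```

In a JUnit test this would reduce to a single assertion on
verify(inserted, scanned).ok(). On failure, the duplicates set shows which
keys were returned twice, for example once from the parent region and once
from a daughter.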

Re: scanner is returning everything in parent region plus one of the daughters?

Posted by Andrew Purtell <ap...@apache.org>.

Hi Ryan,

In case you missed it, I want to turn that 'test.rb' into a Java/JUnit test and add it to the test suite. I'm still waiting to see if a -1 comes in on that idea, but nothing so far.

Also, I agree that all active devs have clusters of various sizes that can be used for testing, at least for now. But the project has no dedicated shared resource; it depends on the availability of resources at pset, or su, or tm, etc., which are currently used in an ad hoc manner. I know it's not exactly top of the list right now, but I do think an automated suite for running repeatable performance and reliability/fault-tolerance tests at some reasonable scale, deployable onto EC2 via a script, is something to at least consider. I'm happy to contribute the first "application".

   - Andy

________________________________
From: Ryan Rawson <ry...@gmail.com>
To: hbase-dev@hadoop.apache.org
Sent: Sunday, June 14, 2009 6:54:24 PM
Subject: Re: scanner is returning everything in parent region plus one of the  daughters?

Hey,

Yes, 1304 has revealed weaknesses in the automated tests.  It would be nice
if they were fully covering all edge cases and concurrent scenarios, but
such as it goes.

I'm not sure we need to be renting EC2 time... I have clusters, and so do
pset folks, and we do run tests and verification on them.  It's just that
1304 hit, just in time to have to prep a hundred slides and 3 talks.  It was
hoped there were few bugs, but 1304 really caused some neato bugs.

I appreciate test.rb - but moving forward I think all tests should remain in
Java.  Dynamic scripting languages on the JVM are very difficult to debug
top to bottom.  JUnit is best really :-)

-ryan


Re: scanner is returning everything in parent region plus one of the daughters?

Posted by Ryan Rawson <ry...@gmail.com>.

Hey,

Yes, 1304 has revealed weaknesses in the automated tests.  It would be nice
if they were fully covering all edge cases and concurrent scenarios, but
such as it goes.

I'm not sure we need to be renting EC2 time... I have clusters, and so do
pset folks, and we do run tests and verification on them.  It's just that
1304 hit, just in time to have to prep a hundred slides and 3 talks.  It was
hoped there were few bugs, but 1304 really caused some neato bugs.

I appreciate test.rb - but moving forward I think all tests should remain in
Java.  Dynamic scripting languages on the JVM are very difficult to debug
top to bottom.  JUnit is best really :-)

-ryan

On Sun, Jun 14, 2009 at 10:54 AM, Andrew Purtell <ap...@apache.org>wrote:

> Hi J-D,
>
> I agree on all your points. Regarding test hosting, I wonder if anyone
> has resources available to dedicate on a long term basis. I have a 4 node
> testbed which could conceivably run some suite once per day and generate
> some automated report, but I can't guarantee the availability of it. We
> might also consider EC2, as long as the tests are all self contained, all
> I/O between instances only, no data in/out or S3 charges. Using the usage
> calculator (http://calculator.s3.amazonaws.com/calc5.html), it seems that
> 5 extra large instances running for 5 hours once per day will cost $140/
> month. 10 of them would cost $280, etc. That is not a large figure.
>
> Further, this 'test.rb' thing is a distillation of some of the HBase usage
> of my crawler application, the write path. I may also simulate some of the
> scan/read path, the document processing bits. It would be great if we can
> get other contributions of test cases that simulate real world
> applications. Maybe there are examples to draw on from stuff running at
> Powerset, Streamy, Openspaces, etc.
>
>   - Andy

Re: scanner is returning everything in parent region plus one of the daughters?

Posted by stack <st...@duboce.net>.

On Sun, Jun 14, 2009 at 10:54 AM, Andrew Purtell <ap...@apache.org>wrote:

> Hi J-D,
>
> I agree on all your points. Regarding test hosting, I wonder if anyone
> has resources available to dedicate on a long term basis. I have a 4 node
> testbed which could conceivably run some suite once per day and generate
> some automated report, but I can't guarantee the availability of it. We
> might also consider EC2, as long as the tests are all self contained, all
> I/O between instances only, no data in/out or S3 charges. Using the usage
> calculator (http://calculator.s3.amazonaws.com/calc5.html), it seems that
> 5 extra large instances running for 5 hours once per day will cost $140/
> month. 10 of them would cost $280, etc. That is not a large figure.



IIRC, Amazon donated time to the Hadoop project.  Let me see if I can find out
more about the state of this resource and whether we can get in on it.

Yes to what J-D says.  Let's do some thinking and dev around testing.  The
bulk of our unit tests start up mini clusters and try stuff.
Often they are susceptible to failure when run on different hardware.  The
pattern should be more testing of individual components.  We need to work on
mock objects to help make testing easier.

Also, our unit tests are crusty.  The bulk were written in another time
for earlier interfaces.  They have been carried down through time but their
effectiveness wanes.

I'd like to suggest that we develop testing tiers: unit tests that are run
on every checkin and up on Hudson, and integration tests that are run on big
checkins and before releases (these can be done as unit tests if it makes
sense, but maybe we need to work out some kind of scripting framework).  The
latter we might run periodically up on EC2 or so, as Andrew suggests.



>
>
> Further, this 'test.rb' thing is a distillation of some of the HBase usage
> of my crawler application, the write path. I may also simulate some of the
> scan/read path, the document processing bits. It would be great if we can
> get other contributions of test cases that simulate real world
> applications. Maybe there are examples to draw on from stuff running at
> Powerset, Streamy, Openspaces, etc.



Agreed.

Have you looked at TestSplit in the regionserver package?  Is it very
different from the test.rb content (I suppose the latter is run from the
client side)?

St.Ack
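
To illustrate the point about testing individual components against mock
objects rather than a mini cluster, here is a minimal sketch. Everything in
it is hypothetical: RowSource, FakeRowSource and RowCounter are made-up
names for illustration, not HBase interfaces. The idea is only that a
component written against a narrow interface can be exercised with a
hand-written fake, with no cluster startup and no dependence on the
hardware it runs on.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

/** Hypothetical narrow interface: the only thing the component needs from a table. */
interface RowSource {
    Iterator<String> scan();
}

/** Component under test: counts the distinct row keys a source returns. */
final class RowCounter {
    private final RowSource source;

    RowCounter(RowSource source) {
        this.source = source;
    }

    int countDistinct() {
        Set<String> seen = new HashSet<>();
        for (Iterator<String> it = source.scan(); it.hasNext(); ) {
            seen.add(it.next());
        }
        return seen.size();
    }
}

/** Hand-written fake: serves canned rows from memory instead of a cluster. */
final class FakeRowSource implements RowSource {
    private final List<String> rows;

    FakeRowSource(String... rows) {
        this.rows = Arrays.asList(rows);
    }

    @Override
    public Iterator<String> scan() {
        return rows.iterator();
    }
}
```

A unit test would then construct new RowCounter(new FakeRowSource("r1",
"r2", "r2")) and assert that countDistinct() returns 2, all in memory.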





Re: scanner is returning everything in parent region plus one of the daughters?

Posted by Andrew Purtell <ap...@apache.org>.

Hi J-D,

I agree on all your points. Regarding test hosting, I wonder if anyone
has resources available to dedicate on a long term basis. I have a 4 node
testbed which could conceivably run some suite once per day and generate
some automated report, but I can't guarantee the availability of it. We
might also consider EC2, as long as the tests are all self contained, all
I/O between instances only, no data in/out or S3 charges. Using the usage
calculator (http://calculator.s3.amazonaws.com/calc5.html), it seems that
5 extra large instances running for 5 hours once per day will cost $140/
month. 10 of them would cost $280, etc. That is not a large figure. 

Further, this 'test.rb' thing is a distillation of some of the HBase usage
of my crawler application, the write path. I may also simulate some of the
scan/read path, the document processing bits. It would be great if we can
get other contributions of test cases that simulate real world
applications. Maybe there are examples to draw on from stuff running at 
Powerset, Streamy, Openspaces, etc. 

   - Andy
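
The scaling in the estimate above (5 instances for $140/month, 10 for
$280) is linear in instance-hours. The sketch below works that out. Two
things in it are assumptions, not figures from the calculator: a 30-day
month, and the per-instance-hour rate, which is simply backed out of the
quoted $140 and is not an Amazon list price. TestClusterCost is a
hypothetical name.

```java
/** Hypothetical helper: back-of-the-envelope cost of a scheduled test cluster. */
final class TestClusterCost {

    /** Instance-hours used per month by a cluster that runs once a day. */
    static long instanceHours(int instances, int hoursPerDay, int daysPerMonth) {
        return (long) instances * hoursPerDay * daysPerMonth;
    }

    /** Rate per instance-hour implied by a known monthly bill. */
    static double impliedRate(double monthlyCost, long instanceHours) {
        return monthlyCost / instanceHours;
    }

    /** Monthly cost at a given rate; linear in the number of instances. */
    static double monthlyCost(int instances, int hoursPerDay, int daysPerMonth, double rate) {
        return instanceHours(instances, hoursPerDay, daysPerMonth) * rate;
    }
}
```

With these assumptions, 5 instances x 5 hours x 30 days is 750
instance-hours, so the quoted $140 implies roughly $0.19 per
instance-hour, and doubling to 10 instances doubles the bill to the quoted
$280.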

________________________________
From: Jean-Daniel Cryans <jd...@apache.org>
To: hbase-dev@hadoop.apache.org
Sent: Sunday, June 14, 2009 9:59:26 AM
Subject: Re: scanner is returning everything in parent region plus one of the  daughters?

Andrew,

+1 I think it's a great idea.

Building on that, I think we should have system-level tests to make
sure we don't break performance and reliability. For example, an
intensive and simultaneous read/write test of a couple of millions of
rows. We could even think of killing a region server or two during
that test (and a master of course). Currently, I don't think it's
easily doable on Hudson so someone would have to host it on a small
cluster.

J-D
