Posted to derby-dev@db.apache.org by siddharth srivastava <ak...@gmail.com> on 2012/04/06 18:43:15 UTC

GSoC 2012 Proposal

Hi

Following is my proposal for GSoC 2012 for improving code coverage in Derby.

https://docs.google.com/document/d/1_-ANrTY0wyU9o6bhu5aAQTGZPL1w3llMWH1TdMPkPeU/edit


Please provide your feedback.

Thanks

-- 
Regards
Siddharth Srivastava

Re: GSoC 2012 Proposal

Posted by siddharth srivastava <ak...@gmail.com>.
Hi

>
>  I would like to hear your thoughts on how, given a specific method for
> which you plan to improve coverage, how would you go about identifying a
> functional test case which will cover the method.  This will of course be
> the tricky bit of all of this.


To determine a functional test case for a method, I would first establish
what the method is supposed to do (from the documentation for its class).
Then I would analyze which flows of code exercise the method, and write a
prototype to observe what the method actually does when executed (some
bugs may well be found at this stage).
Once this is determined, the next step would be to look at the expected
input and output for the method and write a test case for the scenario in
which the method is invoked (something like a black-box check that things
work for the expected input).
Next, within the method there may be many control flows that also need to
be exercised, depending on the input parameters and the path by which the
method has been invoked. Test cases would then be added to exercise these
cases as well.
Finally, negative tests also need to be written, since testing must also
verify that things fail when they are supposed to fail.
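The steps above can be sketched as a small self-contained example. Everything here is hypothetical (clampPercent stands in for a real Derby method); it just shows the black-box check, the branch coverage, and the negative test side by side:

```java
// Toy illustration of the test-design steps above. clampPercent is a
// hypothetical method assumed to clamp a value into 0..100 and to reject
// one unsupported input; it is not Derby code.
public class ClampTest {

    static int clampPercent(int value) {
        if (value == Integer.MIN_VALUE) {
            // Input the method is specified not to handle: fail loudly.
            throw new IllegalArgumentException("unsupported value");
        }
        if (value < 0) return 0;      // branch 1
        if (value > 100) return 100;  // branch 2
        return value;                 // branch 3
    }

    public static void main(String[] args) {
        // Black-box test: expected input gives expected output.
        if (clampPercent(42) != 42) throw new AssertionError();
        // Branch coverage: exercise each control flow in the method.
        if (clampPercent(-7) != 0) throw new AssertionError();
        if (clampPercent(250) != 100) throw new AssertionError();
        // Negative test: verify it fails when it is supposed to fail.
        boolean threw = false;
        try {
            clampPercent(Integer.MIN_VALUE);
        } catch (IllegalArgumentException e) {
            threw = true;
        }
        if (!threw) throw new AssertionError();
        System.out.println("all checks passed");
    }
}
```

Coverage tools such as Emma would report all three branches of clampPercent as covered by these four calls.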

Regards
Siddharth Srivastava

Re: GSoC 2012 Proposal

Posted by Bryan Pendleton <bp...@gmail.com>.
> Also, the fail log shows that these tests failed with emma:
>
> 1) Embedded_30: BasicSetup, Changes10_2, Changes10_3, Changes10_4, Changes10_5, Changes10_6,Changes10_7,Changes10_9
> 2) Embedded_40: Changes10_2, Changes10_3, Changes10_4, Changes10_5, Changes10_6, Changes10_7, Changes10_9, DatabaseMetaDataTest
>
> I am looking at the stacktrace of each of them:
>   For example, Embedded_30\BasicSetup\testDERBY5120NumRowsInSydependsForTrigger

Can you tell, from the timestamps on the various files, and the information
in the reports, which was the *very first* test that failed?

If you can, why did that test fail?

Sometimes, subsequent tests fail because there are inter-test dependencies,
so it can be simplest to analyze the very first failure.
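As a toy illustration (not Derby code) of such an inter-test dependency: a first test leaks state through a shared static field, and a later test that assumes a clean environment fails only when run after it:

```java
// Toy illustration of an inter-test dependency: testA leaks state through
// a shared static field, so testB's result depends on whether testA ran
// first. This is why the very first failure is the one worth analyzing.
public class LeakyTests {
    static int sharedCounter = 0; // state shared between "tests"

    static void testA() {
        sharedCounter++; // passes, but forgets to reset the shared state
        if (sharedCounter < 1) throw new AssertionError("testA failed");
    }

    static boolean testB() {
        // Assumes a clean environment; only true if testA did not run first.
        return sharedCounter == 0;
    }

    public static void main(String[] args) {
        System.out.println("testB alone: " + testB());        // prints true
        testA();
        System.out.println("testB after testA: " + testB());  // prints false
    }
}
```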

thanks,

bryan

Re: GSoC 2012 Proposal

Posted by siddharth srivastava <ak...@gmail.com>.
> The reports published on the site also use sane builds. It looks like

> the coverage is poorer in the replication code. The replication tests
> spawn many processes, so I'm wondering if there might be a problem with
> storing the coverage data from sub-processes; sometimes causing the
> report generation to fail, sometimes making the report lack some data.
>
>
Thanks Knut. I have tried emma-all with an insane build as well, but the
report generation failed partway through.
I have uploaded the results at:
http://liveaired.com/derby/emma/junit_insane/
They seem to be quite different from the sane build.

Also, the fail log shows that these tests failed with emma:

1) Embedded_30: BasicSetup, Changes10_2, Changes10_3, Changes10_4,
Changes10_5, Changes10_6,Changes10_7,Changes10_9
2) Embedded_40: Changes10_2, Changes10_3, Changes10_4, Changes10_5,
Changes10_6, Changes10_7, Changes10_9, DatabaseMetaDataTest

I am looking at the stacktrace of each of them:
 For example,
Embedded_30\BasicSetup\testDERBY5120NumRowsInSydependsForTrigger
failed due to: ERROR 42802: The number of values assigned is not the same
as the number of specified or implied columns.

There are also a number of strange failure reasons, such as the boot
password not being available for an encrypted database.
Some tests also required the derby.database property to be set up.

Also there is: AUTHORIZATIONID not valid for SYSIBM expected:<[DBA]> but
was:<[APP]>

So, are these errors due to the specific JVM or environment, and are they
expected?

The second point I wanted to ask is: where can we look for the emma logs?
I am pretty sure my console showed that report generation failed at some
point, but I am unable to find where the logs for that are
(serverConsoleOutput doesn't seem to contain them either).


Thanks

-- 
Regards
Siddharth Srivastava

Re: GSoC 2012 Proposal

Posted by Knut Anders Hatlen <kn...@oracle.com>.
siddharth srivastava <ak...@gmail.com> writes:

> The results with the sane build seem better than the one on the site
> [2]. Is it due to the
> fact that insane build has debug information and other exception
> related code as well
> which makes the code coverage results for it numerically weaker than
> the sane build ?

The reports published on the site also use sane builds. It looks like
the coverage is poorer in the replication code. The replication tests
spawn many processes, so I'm wondering if there might be a problem with
storing the coverage data from sub-processes; sometimes causing the
report generation to fail, sometimes making the report lack some data.
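As a toy model (nothing to do with Emma's actual data format) of why lost sub-process data weakens the numbers: merged coverage is the union of each process's hits, so dropping one process's file makes its lines look uncovered:

```java
import java.util.*;

public class CoverageMerge {
    // Toy model: each process records which "lines" it executed, and the
    // report is the union of all per-process hit sets.
    static Set<Integer> merge(List<Set<Integer>> perProcessHits) {
        Set<Integer> merged = new TreeSet<>();
        for (Set<Integer> hits : perProcessHits) merged.addAll(hits);
        return merged;
    }

    public static void main(String[] args) {
        Set<Integer> mainProc = new HashSet<>(Arrays.asList(1, 2, 3));
        Set<Integer> replicaProc = new HashSet<>(Arrays.asList(4, 5));

        // With both data files the merged coverage is complete...
        System.out.println(merge(Arrays.asList(mainProc, replicaProc)));
        // ...but if the sub-process file is lost, lines 4-5 look uncovered.
        System.out.println(merge(Arrays.asList(mainProc)));
    }
}
```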

-- 
Knut Anders

Re: GSoC 2012 Proposal

Posted by siddharth srivastava <ak...@gmail.com>.
Hi

I have been working through the suggestions given on the list and finally
got suites.All running with emma.

Bryan's Comment:

> If you were using numbers published on the web site, have you tried to
> run the tests with code coverage instrumentation yourself? I am interested
> to know how closely the coverage numbers that you observe in your test runs
> match the ones published at the Derby web site.


The figures in the proposal were from the ones published on the site.
I have since executed suites.All with emma and now have a coverage report
up at [1]. These are the results from a sane build; report generation for
the insane build is currently running on my PC.
I have updated my proposal to reflect this.

The results with the sane build seem better than the ones on the site [2].
Is that because the insane build contains debug information and extra
exception-related code, which makes its coverage numbers numerically
weaker than the sane build's?

Tiago wrote:

> Your proposal looks good to me, however I would like to see a bit more of
> background. Why are unit tests important? I offered to mentor this proposal
> but it would be nice to see that you grasp the importance of having such
> unit tests. When are they important? Why do we bother writing them? Are we
> wasting our time or will these tests actually be beneficial for Derby in
> the future? Try to answer these questions and sell me the idea of why we
> need to raise our test coverage :-)


Thanks Tiago for your suggestion. I have updated the proposal on Melange
to reflect my views on unit testing and code coverage and their subsequent
advantages. Hopefully I am a good salesperson :) (I'll keep improving as I
go)

Kathey Wrote:

> I think also allocating some time in the proposal for perhaps updating and
> expanding our limited documentation on running and analyzing code coverage
> would be worthwhile.
> http://wiki.apache.org/db-derby/CodeCoverageWithEMMA


Since I have already set up Emma and run the tests, I think I will be able
to dedicate time to improving the documentation as well. Until now I have
been following DerbyCodeCoverageUsingEmma [3], but after some strange
errors when adding flags to the argument list, and a few exceptions, I
switched to the ant build. Though it was easier with the ant build, I
understood things better from the initial documentation.
So I will probably focus on explaining how it works, and especially on
documenting the various parameters that can be passed in the arguments
when running emma with ant.

I have also updated the proposal to contain all the above.

> Also, I would like to hear your thoughts on how, given a specific method
> for which you plan to improve coverage, how would you go about identifying
> a functional test case which will cover the method.  This will of course be
> the tricky bit of all of this.


I will follow this mail with another one explaining my understanding in
this regard.

Thanks

Regards
Siddharth Srivastava

[1]:http://liveaired.com/derby/emma/junit_sane/
[2] http://dbtg.foundry.sun.com/derby/test/coverage/
[3]  http://db.apache.org/derby/binaries/DerbyCodeCoverageUsingEmma.pdf

Re: GSoC 2012 Proposal

Posted by Katherine Marsden <km...@sbcglobal.net>.
On 4/11/2012 4:25 AM, Tiago Espinha wrote:
>
> Also from talking with you on IRC, I see that the results you 
> presented are based on Derby's automated reports. If you could get 
> your Emma run setup before April 16 (the deadline for the student 
> ranking) and you mention this in your application, it would make your 
> proposal much stronger.
>
I too think it would be good to go through running the code coverage, and
for the improvements, for a summer project it would be good to focus on
method coverage and on a few specific packages.

I think also allocating some time in the proposal for perhaps updating and
expanding our limited documentation on running and analyzing code coverage
would be worthwhile.
http://wiki.apache.org/db-derby/CodeCoverageWithEMMA

Also, I would like to hear your thoughts on how, given a specific method 
for which you plan to improve coverage, how would you go about 
identifying a functional test case which will cover the method.  This 
will of course be the tricky bit of all of this.

Kathey

Re: GSoC 2012 Proposal

Posted by Tiago Espinha <ti...@espinha.net>.
Hi Siddharth,

Your proposal looks good to me, however I would like to see a bit more of
background. Why are unit tests important? I offered to mentor this proposal
but it would be nice to see that you grasp the importance of having such
unit tests. When are they important? Why do we bother writing them? Are we
wasting our time or will these tests actually be beneficial for Derby in
the future? Try to answer these questions and sell me the idea of why we
need to raise our test coverage :-)

In your proposal you also say "Main focus would be to maximise the code
coverage". The researcher in me tends to disagree :-) I'd be MUCH happier
with an overall code coverage of 80% on Derby, than having a few random
classes with 100% and another bunch with, say, 40%. It's been (more or
less) proven that anything above 80% is overkill but if we achieve 80%
along all of Derby's codebase (n.b. not an average number, but at least 80%
on all of Derby's codebase), we can in principle have a very good code
coverage.

Also from talking with you on IRC, I see that the results you presented are
based on Derby's automated reports. If you could get your Emma run setup
before April 16 (the deadline for the student ranking) and you mention this
in your application, it would make your proposal much stronger.

Eventually, if both you and Nufail get your proposals accepted, we also
need to make sure your work does not overlap. I'm sure there are enough
tests to be written by two students, but this is something we can think
about at a later stage. For now, stick with the classes/packages that look
most interesting to you.

Regards,
Tiago

On Wed, Apr 11, 2012 at 5:05 AM, Bryan Pendleton <bpendleton.derby@gmail.com
> wrote:

> On 04/06/2012 09:43 AM, siddharth srivastava wrote:
>
>> Hi
>>
>> Following is my proposal for GSoC 2012 for improving code coverage in
>> Derby.
>>
>> https://docs.google.com/document/d/1_-ANrTY0wyU9o6bhu5aAQTGZPL1w3llMWH1TdMPkPeU/edit
>>
>
> Hello Siddharth,
>
> I have been reading your proposal and I think it is quite good. I am
> pleased to
> see that you have been continuing to work with Derby and are interested in
> learning more about it.
>
> I think that those code packages are good ones to focus on.
>
> I suspect that it will be quite interesting and challenging to improve
> the code coverage in some of these areas, since we will have to uncover
> new ways to exercise Derby and cause it to take code paths that are not
> currently well-explored.
>
> It could be that it is hard to achieve 100% coverage, but even a moderate
> improvement over our current code coverage levels will enhance the quality
> of the system and, I am sure, uncover interesting new bugs to be logged
> and fixed.
>
> Did you get the code coverage numbers in your proposal by running the test
> suites yourself? Or were you reading the code coverage numbers published
> by one of the automated test runs on the Derby web site?
>
> If you were using numbers published on the web site, have you tried to
> run the tests with code coverage instrumentation yourself? I am interested
> to know how closely the coverage numbers that you observe in your test runs
> match the ones published at the Derby web site.
>
> I hope your proposal is accepted by Google, and I hope that other members
> of
> the community will also offer their suggestions about additional
> improvements
> you could make to your proposal.
>
> thanks,
>
> bryan
>

Re: GSoC 2012 Proposal

Posted by Bryan Pendleton <bp...@gmail.com>.
On 04/06/2012 09:43 AM, siddharth srivastava wrote:
> Hi
>
> Following is my proposal for GSoC 2012 for improving code coverage in Derby.
>
> https://docs.google.com/document/d/1_-ANrTY0wyU9o6bhu5aAQTGZPL1w3llMWH1TdMPkPeU/edit

Hello Siddharth,

I have been reading your proposal and I think it is quite good. I am pleased to
see that you have been continuing to work with Derby and are interested in
learning more about it.

I think that those code packages are good ones to focus on.

I suspect that it will be quite interesting and challenging to improve
the code coverage in some of these areas, since we will have to uncover
new ways to exercise Derby and cause it to take code paths that are not
currently well-explored.

It could be that it is hard to achieve 100% coverage, but even a moderate
improvement over our current code coverage levels will enhance the quality
of the system and, I am sure, uncover interesting new bugs to be logged and fixed.

Did you get the code coverage numbers in your proposal by running the test
suites yourself? Or were you reading the code coverage numbers published
by one of the automated test runs on the Derby web site?

If you were using numbers published on the web site, have you tried to
run the tests with code coverage instrumentation yourself? I am interested
to know how closely the coverage numbers that you observe in your test runs
match the ones published at the Derby web site.

I hope your proposal is accepted by Google, and I hope that other members of
the community will also offer their suggestions about additional improvements
you could make to your proposal.

thanks,

bryan