You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@nutch.apache.org by Dawid Weiss <da...@cs.put.poznan.pl> on 2006/04/07 09:03:50 UTC

Re: PMD integration

Hi Piotr,

 > that right now it is checking only main code (without plugins?).

Yes, that's correct -- I forgot to mention that. PMD target is hooked up 
with tests and stops the build if something fails. I thought the core 
code should be this strict; for plugins we can have more relaxed rules 
(in another target or even in the same one).

That's again up to you guys.

Dawid

P.S. Tom Copeland has already fixed the bug I mentioned in the patch. 
Quite impressive bugfix turnaround, isn't it. :)



Piotr Kosiorowski wrote:
> P.
> 
> 
> Dawid Weiss wrote:
>>
>> All right, I though I'd give it a go since I have a spare few minutes. 
>> Jura is off, so I made the patches available here --
>>
>> http://ophelia.cs.put.poznan.pl/~dweiss/nutch/
>>
>> pmd.patch is the build file patch and libraries (binaries are in a 
>> separate zip file pmd-ext.zip).
>>
>> pmd-fixes.patch fixes the current core code to go through pmd 
>> smoothly. I removed obvious unused code, but left FIXME comments where 
>> I wasn't sure if the removal can cause side effects (in these places 
>> PMD warnings are suppressed with NOPMD comments).
>>
>> I also discovered a bug in PMD... eh... nothing's perfect.
>>
>> https://sourceforge.net/tracker/?func=detail&atid=479921&aid=1465574&group_id=56262 
>>
>>
>> D.
>>
>>
>> Piotr Kosiorowski wrote:
>>> +1 - I offer my help - we can coordinate it and I can do a part of 
>>> work. I
>>> will also try to commit your patches quickly.
>>> Piotr
>>>
>>> On 4/6/06, Dawid Weiss <da...@cs.put.poznan.pl> wrote:
>>>>
>>>>> Other options (raised on the Hadoop list) are Checkstyle:
>>>> PMD seems to be the best choice for an Apache project and they all seem
>>>> to perform at a similar level.
>>>>
>>>>> Anything that generates a lot of false positives is bad: it either
>>>>> causes us to skip analysis of lots of files, or ignore the warnings.
>>>>> Skipping the JavaCC-generated classes is reasonable, but I'm wary of
>>>>> skipping much else.
>>>> I thought a bit about this. The warnings PMD may actually make sense to
>>>> fix. Take a look at maxDoc here:
>>>>
>>>> class LuceneQueryOptimizer {
>>>>
>>>>    private static class LimitExceeded extends RuntimeException {
>>>>      private int maxDoc;
>>>>      public LimitExceeded(int maxDoc) { this.maxDoc = maxDoc; }
>>>>    }
>>>> ...
>>>>
>>>> maxDoc is accessed from LuceneQueryOptimizer which requires a synthetic
>>>> accessor in LimitExceeded. It also may look confusing because you
>>>> declare a field private to a class, but use it from the outside...
>>>> changing declarations to something like this:
>>>>
>>>> class LuceneQueryOptimizer {
>>>>
>>>>    private static class LimitExceeded extends RuntimeException {
>>>>      final int maxDoc;
>>>>      public LimitExceeded(int maxDoc) { this.maxDoc = maxDoc; }
>>>>    }
>>>> ...
>>>>
>>>> removes the warning and also seems to make more sense (note that 
>>>> package
>>>> scope of maxDoc doesn't really expose it much more than before because
>>>> the entire class is private).
>>>>
>>>> So... if you agree to change existing warnings as shown above (there's
>>>> not that many) then integrating PMD with a set of sensible rules may
>>>> help detecting bad smells in the future (I couldn't resist -- it really
>>>> is called like this in software engineering :). I only used dead code
>>>> detection ruleset for now, other rulesets can be checked and we will 
>>>> see
>>>> if they help or quite the contrary.
>>>>
>>>> If developers agree to the above I'll create a patch together with what
>>>> needs to be fixed to cleanly compile. Otherwise I see little sense in
>>>> integrating PMD.
>>>>
>>>> D.
>>>>
>>>>
>>>>
>>>>
>>>
>>
> 

Re: PMD integration

Posted by Jérôme Charron <je...@gmail.com>.
> > Piotr, please keep oro-2.0.8 in pmd-ext
> I do not agree here - we are going to make a new release next week and
> releasing with two versions of oro does not look nice. oro is quite
> stable product and changes are in fact minimal:
> http://svn.apache.org/repos/asf/jakarta/oro/trunk/CHANGES

OK for me.
But we cannot make a release without minimal tests.
(I will made some tests for removing oro from nutch's regex for post 0.8release)

Jérôme

Re: PMD integration

Posted by Piotr Kosiorowski <pk...@gmail.com>.
Jérôme Charron wrote:
> 2) We do have oro 2-0.7 in dependencies (I think urlfilter and similar
>> things). PMD requires oro - 2.0.8. Do you think we can upgrade (as far
>> as I know 2.0.7 and 2.0.8 should be compatible)? We would have only one
>> oro jar than.
> 
> Piotr, please keep oro-2.0.8 in pmd-ext
> I think we can plan to replace oro regex by java ones (as in RegexUrlFilter)
> in the whole nutch code (and then remove oro-2.0.7 from lib):
> src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java
> src/plugin/parse-js/src/java/org/apache/nutch/parse/js/JSParseFilter.java
> src/java/org/apache/nutch/parse/OutlinkExtractor.java
> src/java/org/apache/nutch/net/RegexUrlNormalizer.java
> src/java/org/apache/nutch/net/BasicUrlNormalizer.java
> 

I do not agree here - we are going to make a new release next week and 
releasing with two versions of oro does not look nice. oro is quite 
stable product and changes are in fact minimal:
http://svn.apache.org/repos/asf/jakarta/oro/trunk/CHANGES
I would like to upgrade to 2.0.8 (as no interface changes were made it 
would be trivial) before 0.8 release.
What do others think?
Regards
Piotr


Re: PMD integration

Posted by Jérôme Charron <je...@gmail.com>.
> 1) Should we check test sources with pmd?

I don't think so.

2) We do have oro 2-0.7 in dependencies (I think urlfilter and similar
> things). PMD requires oro - 2.0.8. Do you think we can upgrade (as far
> as I know 2.0.7 and 2.0.8 should be compatible)? We would have only one
> oro jar than.

Piotr, please keep oro-2.0.8 in pmd-ext
I think we can plan to replace oro regex by java ones (as in RegexUrlFilter)
in the whole nutch code (and then remove oro-2.0.7 from lib):
src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java
src/plugin/parse-js/src/java/org/apache/nutch/parse/js/JSParseFilter.java
src/java/org/apache/nutch/parse/OutlinkExtractor.java
src/java/org/apache/nutch/net/RegexUrlNormalizer.java
src/java/org/apache/nutch/net/BasicUrlNormalizer.java


> So happy PMD-ing,

Thanks ...   ;-)

Jérôme

Re: PMD integration

Posted by Piotr Kosiorowski <pk...@gmail.com>.
Committed.
One can run the pmd checks by 'ant pmd'. It produces file with html 
report in build directory. It covers core nutch and plugins.
Currently it uses unusedcode ruleset checks only but one can uncomment 
other rulesets in build.xml (or add another ones according to pmd 
documentation).

I would like to  add cross-referenced source so report is easier to read 
in near feature.
I have two additional questions for developers:
1) Should we check test sources with pmd?
2) We do have oro 2-0.7 in dependencies (I think urlfilter and similar 
things). PMD requires oro - 2.0.8. Do you think we can upgrade (as far 
as I know 2.0.7 and 2.0.8 should be compatible)? We would have only one 
oro jar than.

So happy PMD-ing,
Piotr





Doug Cutting wrote:
> Piotr Kosiorowski wrote:
>>>> I will make it totally separate target (so test do not
>>>> depend on it).
>>>
>>> That was actually Doug's idea (and I agree with it) to stop the build
>>> file if PMD complains about something. It's similar to testing -- if
>>> your tests fail, the entire build file fails.
>>
>> I totally agree with it - but I want to switch it on for others to
>> play first, and when we agree on
>> rules we want to use make it obligatory.
> 
> So we start out comitting it as an independent target, and then add it 
> to the "test" target?  Is that the plan?  If so, +1.
> 
> Doug
> 


Re: PMD integration

Posted by Piotr Kosiorowski <pk...@gmail.com>.
Doug Cutting wrote:

> So we start out comitting it as an independent target, and then add it 
> to the "test" target?  Is that the plan?  If so, +1.
Exactly - I will do it over the weekend.
P.

Re: PMD integration

Posted by Doug Cutting <cu...@apache.org>.
Piotr Kosiorowski wrote:
>>>I will make it totally separate target (so test do not
>>>depend on it).
>>
>>That was actually Doug's idea (and I agree with it) to stop the build
>>file if PMD complains about something. It's similar to testing -- if
>>your tests fail, the entire build file fails.
> 
> I totally agree with it - but I want to switch it on for others to
> play first, and when we agree on
> rules we want to use make it obligatory.

So we start out comitting it as an independent target, and then add it 
to the "test" target?  Is that the plan?  If so, +1.

Doug

Re: PMD integration

Posted by Piotr Kosiorowski <pk...@gmail.com>.
>
>
> > I will make it totally separate target (so test do not
> > depend on it).
>
> That was actually Doug's idea (and I agree with it) to stop the build
> file if PMD complains about something. It's similar to testing -- if
> your tests fail, the entire build file fails.
>
I totally agree with it - but I want to switch it on for others to
play first, and when we agree on
rules we want to use make it obligatory.
Piotr

Re: PMD integration

Posted by Piotr Kosiorowski <pk...@gmail.com>.
Hello,
  I was looking at the cross-referenced code generation but it looks 
like the package I found mentioned in PMD context is JXR - and this is 
the part of maven as I suspect. As we are using ant for builds I would 
not like to mix these two systems. Do you know any other source 
cross-referencing package that is compatible with PMD?
Piotr

Re: PMD integration

Posted by Dawid Weiss <da...@cs.put.poznan.pl>.
> I do agree with Jarome  - plugins should be checked too.

This basically means modifying the fileset in the pmd task. Shouldn't be 
too difficult to include all plugin sources with a single <include> 
statement.

> I will make it totally separate target (so test do not
> depend on it).

That was actually Doug's idea (and I agree with it) to stop the build 
file if PMD complains about something. It's similar to testing -- if 
your tests fail, the entire build file fails.

> The goal is to allow other developers to play with pmd easily but at the
> same time I do not want the build to be affected.

Maybe in the initial phase, but I'd strongly recommend integrating it in 
the main build. PMD is quite fast and doesn't add much delay to the process.

> I would like also to look at possibility to generate crossreferenced HTML
> code from Nutch sources as it looks like pmd can use it and violation
> reports would be much easier to read.

Yes, it's possible but I didn't play with it. I can't do it today, but 
maybe during the weekend. If you're faster than me, Piotr, let me know 
so that I don't waste the time. Thanks.

D.

Re: PMD integration

Posted by Jérôme Charron <je...@gmail.com>.
> I will make it totally separate target (so test do not
> depend on it).

+1


> The goal is to allow other developers to play with pmd easily but at the
> same time I do not want the build to be affected.

+1


> I would like also to look at possibility to generate crossreferenced HTML
> code from Nutch sources as it looks like pmd can use it and violation
> reports would be much easier to read.

+1

Thanks Piotr (and Dawid too of course)

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/

Re: PMD integration

Posted by Piotr Kosiorowski <pk...@gmail.com>.
I do agree with Jarome  - plugins should be checked too.
I would like to integrate PMD for core and plugins over the weekend based on
the Dawid's work - I will make it totally separate target (so test do not
depend on it).
The goal is to allow other developers to play with pmd easily but at the
same time I do not want the build to be affected.
I would like also to look at possibility to generate crossreferenced HTML
code from Nutch sources as it looks like pmd can use it and violation
reports would be much easier to read.
P,


On 4/7/06, Jérôme Charron <je...@gmail.com> wrote:
>
> > > that right now it is checking only main code (without plugins?).
> > Yes, that's correct -- I forgot to mention that. PMD target is hooked up
> > with tests and stops the build if something fails. I thought the core
> > code should be this strict; for plugins we can have more relaxed rules
>
> -1
> Since plugins provides a lot of Nutch functionalities (without any plugin,
> Nutch provides no service), I think that plugins code should be as strict
> as
> the core code.
>
> Thanks
>
> Jérôme
>
> --
> http://motrech.free.fr/
> http://www.frutch.org/
>
>

Re: PMD integration

Posted by Jérôme Charron <je...@gmail.com>.
> > that right now it is checking only main code (without plugins?).
> Yes, that's correct -- I forgot to mention that. PMD target is hooked up
> with tests and stops the build if something fails. I thought the core
> code should be this strict; for plugins we can have more relaxed rules

-1
Since plugins provides a lot of Nutch functionalities (without any plugin,
Nutch provides no service), I think that plugins code should be as strict as
the core code.

Thanks

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/