
Posted to dev@creadur.apache.org by Robert Burrell Donkin <ro...@blueyonder.co.uk> on 2013/07/08 21:46:45 UTC

[GSOC] Rat: Past, Present and Future

The Past
--------
(Here's my perspective on the history of Rat, as I recall it now. 
Hopefully it isn't too controversial. Please feel free to jump in with 
clarifications...)

Rat arose from an itch of mine, and I coded the core of Rat as an 
experimental project, playing around with some unconventional 
architectural ideas. With hindsight, once other people wanted to start 
using it too, I really should have just sat down and completely 
rewritten the core code. By not doing so, I inflicted a world of 
craziness and pain on the community and ecology which sprang up around 
Rat. But by then, it had become hard to fix as higher quality peripheral 
code sprang up around it. Apologies.

The Present
-----------
Thanks to Manuel Suárez Sánchez and GSOC, we have an opportunity to 
adopt a more sane and sensible core design with good test coverage that 
will be easier and more enjoyable to maintain and comprehend going forward.

AIUI Google likes to be able to access a copy of the GSOC code, so I've 
suggested that Manuel codes on GitHub 
(https://github.com/elnuma/creadur-rat). Hopefully, this should allow 
people with GitHub forks (mine is 
https://github.com/itstechupnorth/creadur-rat) to pull in Manuel's code 
and give encouragement and advice. Apache has recorded an ICLA for 
Manuel, so when we're ready we should be able to start patching in pull 
requests.

The Future
----------
(Bit of a strawman - hopefully the community - including Manuel - will 
dive in with suggestions and we'll be able to gain consensus on a design 
direction...)

...

I wonder whether it would be simpler and more conventional to factor out 
three phases:

1. scan the source, building a strongly-typed, immutable domain model
2. analyse this model against policies, building a strongly-typed, 
immutable report model
3. use the report to output descriptive text or XML, or errors and warnings
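
As a rough illustration of the three phases listed above, a minimal Java sketch might look like this. Every name in it (Document, ReportEntry, ThreePhaseSketch) is hypothetical and not taken from the Rat codebase. Scanning is faked with an in-memory map, and the "policy" is just an assumed check for the string "Apache License".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; none of these types come from Rat itself.

/** Phase 1 output: an immutable description of one scanned document. */
final class Document {
    private final String name;
    private final String header;

    Document(String name, String header) {
        this.name = name;
        this.header = header;
    }

    String name() { return name; }
    String header() { return header; }
}

/** Phase 2 output: an immutable verdict for one document. */
final class ReportEntry {
    private final String documentName;
    private final boolean approved;

    ReportEntry(String documentName, boolean approved) {
        this.documentName = documentName;
        this.approved = approved;
    }

    String documentName() { return documentName; }
    boolean approved() { return approved; }
}

final class ThreePhaseSketch {

    /** Phase 1: scan the source (an in-memory map stands in for the file system). */
    static List<Document> scan(Map<String, String> source) {
        List<Document> model = new ArrayList<>();
        for (Map.Entry<String, String> e : source.entrySet()) {
            model.add(new Document(e.getKey(), e.getValue()));
        }
        return Collections.unmodifiableList(model);
    }

    /** Phase 2: analyse the model against a toy policy. */
    static List<ReportEntry> analyse(List<Document> model) {
        List<ReportEntry> report = new ArrayList<>();
        for (Document d : model) {
            report.add(new ReportEntry(d.name(), d.header().contains("Apache License")));
        }
        return Collections.unmodifiableList(report);
    }

    /** Phase 3: render the report as descriptive text, one line per document. */
    static String render(List<ReportEntry> report) {
        StringBuilder out = new StringBuilder();
        for (ReportEntry e : report) {
            out.append(e.approved() ? "OK " : "UNKNOWN ")
               .append(e.documentName())
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> source = new LinkedHashMap<>();
        source.put("Good.java", "Licensed under the Apache License, Version 2.0");
        source.put("Bad.java", "no header here");
        System.out.print(render(analyse(scan(source))));
    }
}
```

Each phase only sees the immutable output of the previous one, which is the point of the split. The cost is that the whole model is held in memory before analysis starts.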

I also think that Rat would benefit from

* using more conventional dependency injection (see, for example, 
http://www.martinfowler.com/articles/injection.html) replacing the 
static methods that litter the code
* immutable domain objects with builders
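
To make the two bullets above concrete, here is a small Java sketch of constructor injection combined with an immutable object and its builder. The names (DocumentInfo, HeaderMatcher, Analyser) are invented for illustration and are not Rat's actual classes. The matcher stands in for logic that would otherwise sit in a static method.

```java
import java.util.Objects;

// Hypothetical names; a sketch of the two ideas, not Rat's actual classes.

/** Immutable domain object that can only be created through its builder. */
final class DocumentInfo {
    private final String name;
    private final String licenseFamily;

    private DocumentInfo(Builder b) {
        this.name = Objects.requireNonNull(b.name, "name");
        this.licenseFamily = b.licenseFamily;
    }

    String name() { return name; }
    String licenseFamily() { return licenseFamily; }

    static Builder builder() { return new Builder(); }

    static final class Builder {
        private String name;
        private String licenseFamily = "UNKNOWN";

        Builder name(String name) { this.name = name; return this; }
        Builder licenseFamily(String family) { this.licenseFamily = family; return this; }
        DocumentInfo build() { return new DocumentInfo(this); }
    }
}

/** The collaborator that would otherwise be a static utility method. */
interface HeaderMatcher {
    String familyOf(String header);
}

/** Receives its matcher through the constructor (constructor injection). */
final class Analyser {
    private final HeaderMatcher matcher;

    Analyser(HeaderMatcher matcher) {
        this.matcher = Objects.requireNonNull(matcher, "matcher");
    }

    DocumentInfo analyse(String name, String header) {
        return DocumentInfo.builder()
                .name(name)
                .licenseFamily(matcher.familyOf(header))
                .build();
    }
}

final class InjectionSketch {
    public static void main(String[] args) {
        // A test could pass a stub here instead of a real matcher.
        HeaderMatcher matcher =
                header -> header.contains("Apache License") ? "AL" : "UNKNOWN";
        Analyser analyser = new Analyser(matcher);
        System.out.println(analyser.analyse("Foo.java", "Apache License 2.0").licenseFamily());
    }
}
```

Because Analyser depends only on the HeaderMatcher interface, a different policy or a test stub can be swapped in without touching the analyser.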

Opinions...? Improvements...? Objections...? Alternatives...?

Robert

Re: [GSOC] Rat: Past, Present and Future

Posted by Manuel Suárez Sánchez <ss...@gmail.com>.

Hi Everyone.

This topic was created about two months ago. At that time I was new to
the project and didn't know much about it, but since then I have been
working on the project and learning more about it.

My objective is to complete this task:
https://issues.apache.org/jira/browse/RAT-131
I think that I have made a lot of changes and improvements, and cleaned up
bad code in the project. My fork of the project is here:
https://github.com/elnuma/creadur-rat/tree/gsoc
This is an open source project, so I would like the community to review it
and I would like to receive feedback. (I know that I'm new to this world, so
I will do both good and bad things; for me the most important thing is
learning from the mistakes.)
Apache-Rat-Core:
             Before:   After:
Coverage     75%       96%
[image: inline image 1]
Refactoring changes:

- Deleted unused variables, classes and methods.
- Fixed incorrect uses of Java.
- Improved performance.
- Added test classes and test methods.
- Applied PMD changes.
- Formatted the code.
- Added Javadoc.

I still have two weeks to work on the project in the GSOC timeline. In
that time I would like to improve the project, so I would like to work on
one task (I need the whole community to help find the weak points of the
project). All this time I have been working alone because I thought that I
didn't have time to finish it, but I understand that it is open source and
we need to work together. The community has made this project grow, and
that is the great thing about open source projects.

Manuel.


2013/7/11 Robert Burrell Donkin <ro...@blueyonder.co.uk>

> On 07/10/13 23:49, Manuel Suárez Sánchez wrote:
>
>>
>>> 1. scan the source, building a strongly-typed, immutable domain model
>>>
>>
>>
>> This point is basic to improve the project because now there aren´t a good
>> domain model and it´s very confused.
>>
>
> I think that the question comes down to granularity.
>
> Here's one way that the two contrasting approach might work...
>
> With the full model approach, the source would be scanned completed into a
> model before the document contents were analysed. Once the analysis was
> complete, then the reporting would start. The process flow would be
> course-grained. This would cut across the grain of the current Rat design.
>
> With a message oriented architecture, the scanner would send each document
> to enrichment as soon as it was created. The enricher would take a look at
> the contents and add document-level meta-data, then pass on the enriched
> object as soon as it was created. Aggregate analysers would then build up
> the report. This would be sympathetic to the current Rat design.
>
> Retaining a streaming/messaging architecture means modelling at the
> message level (rather than more complete structures)
>
> <snip>
>
>
>  However, I think that the current streaming design isn't particularly
>>
>>> intuitive or obvious. I would be happy to retain an improved streaming
>>> design.
>>>
>>
>>
>> I think that apache rat is a release audit tool, focused on licenses. In
>> the project you analyse a file(audio) and you get the license of the
>> file. Why
>> do you try to use streaming/message driven architecture?
>>
>
> Performance at small memory footprint
>
> Robert
>

Re: [GSOC] Rat: Past, Present and Future

Posted by Robert Burrell Donkin <ro...@blueyonder.co.uk>.

On 07/10/13 23:49, Manuel Suárez Sánchez wrote:
>>
>> 1. scan the source, building a strongly-typed, immutable domain model
>
>
> This point is basic to improve the project because now there aren´t a good
> domain model and it´s very confused.

I think that the question comes down to granularity.

Here's one way that the two contrasting approaches might work...

With the full model approach, the source would be scanned completely into 
a model before the document contents were analysed. Once the analysis 
was complete, then the reporting would start. The process flow would be 
coarse-grained. This would cut across the grain of the current Rat design.

With a message oriented architecture, the scanner would send each 
document to enrichment as soon as it was created. The enricher would 
take a look at the contents and add document-level meta-data, then pass 
on the enriched object as soon as it was created. Aggregate analysers 
would then build up the report. This would be sympathetic to the current 
Rat design.

Retaining a streaming/messaging architecture means modelling at the 
message level (rather than more complete structures)
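
The message-oriented flow described above could be sketched in Java roughly as follows. All names (DocumentMessage, SourceScanner, Enricher, AggregateAnalyser) are hypothetical and do not come from the Rat codebase. The enrichment rule is an assumed placeholder, and the stages are chained with plain java.util.function.Consumer callbacks rather than any messaging library.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical names; a sketch of the message-oriented flow, not Rat's code.

/** The message: one document plus the metadata added by enrichment. */
final class DocumentMessage {
    final String name;
    final String header;
    final String licenseFamily; // null until enriched

    DocumentMessage(String name, String header, String licenseFamily) {
        this.name = name;
        this.header = header;
        this.licenseFamily = licenseFamily;
    }
}

/** Sends each document downstream as soon as it is found. */
final class SourceScanner {
    void scan(List<String[]> files, Consumer<DocumentMessage> next) {
        for (String[] f : files) {
            next.accept(new DocumentMessage(f[0], f[1], null));
        }
    }
}

/** Adds document-level metadata, then passes the enriched message on. */
final class Enricher implements Consumer<DocumentMessage> {
    private final Consumer<DocumentMessage> next;

    Enricher(Consumer<DocumentMessage> next) {
        this.next = next;
    }

    @Override
    public void accept(DocumentMessage m) {
        String family = m.header.contains("Apache License") ? "AL" : "UNKNOWN";
        next.accept(new DocumentMessage(m.name, m.header, family));
    }
}

/** Builds up the report; it keeps only counts, so each message can be dropped once seen. */
final class AggregateAnalyser implements Consumer<DocumentMessage> {
    private final Map<String, Integer> countsByFamily = new LinkedHashMap<>();

    @Override
    public void accept(DocumentMessage m) {
        countsByFamily.merge(m.licenseFamily, 1, Integer::sum);
    }

    Map<String, Integer> counts() {
        return countsByFamily;
    }
}

final class StreamingSketch {
    public static void main(String[] args) {
        AggregateAnalyser report = new AggregateAnalyser();
        new SourceScanner().scan(
                Arrays.asList(
                        new String[] {"A.java", "Apache License 2.0"},
                        new String[] {"B.java", "no header"}),
                new Enricher(report));
        System.out.println(report.counts());
    }
}
```

Only one DocumentMessage is live at a time in this sketch, which is where the small memory footprint would come from. The aggregate state is limited to the counts.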

<snip>

> However, I think that the current streaming design isn't particularly
>> intuitive or obvious. I would be happy to retain an improved streaming
>> design.
>
>
> I think that apache rat is a release audit tool, focused on licenses. In
> the project you analyse a file(audio) and you get the license of the file. Why
> do you try to use streaming/message driven architecture?

Performance at small memory footprint

Robert

Re: [GSOC] Rat: Past, Present and Future

Posted by Manuel Suárez Sánchez <ss...@gmail.com>.

>
> 1. scan the source, building a strongly-typed, immutable domain model

This point is fundamental to improving the project, because at the moment
there isn't a good domain model and it's very confusing.

> Scanning could be done multithreadedly with a status output on console

This is a very good improvement, but maybe it will make the project more
complex. Apache-rat-core is the base, and it needs to be clear and robust.

> * using more conventional dependency injection (see, for example,
> http://www.martinfowler.com/articles/injection.html)
> replacing the static methods that litter the code
> * immutable domain objects with builders

Static is dead: in today's world all development is oriented towards
injection. One of the most important Java frameworks is Spring, where all
the functionality can be injected and changed at runtime.

> However, I think that the current streaming design isn't particularly
> intuitive or obvious. I would be happy to retain an improved streaming
> design.

I think that Apache Rat is a release audit tool, focused on licenses. In
the project you analyse (audit) a file and you get the license of the file.
Why do you want to use a streaming/message-driven architecture?

Manuel.

2013/7/9 Robert Burrell Donkin <ro...@blueyonder.co.uk>

> On 07/08/13 21:41, sebb wrote:
>
>> On 8 July 2013 20:46, Robert Burrell Donkin
>> <ro...@blueyonder.co.uk>>
>> wrote:
>>
>
> <snip>
>
>
>  2. analyse this model against policies, building a strongly-typed,
>>> immutable
>>> report model
>>>
>>
>> Won't that require lots of memory?
>>
>
> Not sure about lots (the state required should be relatively small) but
> yes, more
>
>
>  At present the source can be forgotten as soon as a match occurs.
>>
>
> Perhaps
>
> An architecture where each document flowed through the system is likely to
> be more efficient and easier to parallelism.
>
> However, I think that the current streaming design isn't particularly
> intuitive or obvious. I would be happy to retain an improved streaming
> design.
>
> Robert
>

Re: [GSOC] Rat: Past, Present and Future

Posted by Robert Burrell Donkin <ro...@blueyonder.co.uk>.

On 07/08/13 21:41, sebb wrote:
> On 8 July 2013 20:46, Robert Burrell Donkin
> <ro...@blueyonder.co.uk> wrote:

<snip>

>> 2. analyse this model against policies, building a strongly-typed, immutable
>> report model
>
> Won't that require lots of memory?

Not sure about lots (the state required should be relatively small) but 
yes, more

> At present the source can be forgotten as soon as a match occurs.

Perhaps

An architecture where each document flowed through the system is likely 
to be more efficient and easier to parallelise.

However, I think that the current streaming design isn't particularly 
intuitive or obvious. I would be happy to retain an improved streaming 
design.

Robert

Re: [GSOC] Rat: Past, Present and Future

Posted by sebb <se...@gmail.com>.

On 8 July 2013 20:46, Robert Burrell Donkin
<ro...@blueyonder.co.uk> wrote:
> The Past
> --------
> (Here's my perspective on the history of Rat, as I recall it now. Hopefully
> it isn't too controversial. Please feel free to jump in with
> clarifications...)
>
> Rat arose from an itch of mine, and I coded the core of Rat as an
> experimental project, playing around with some unconventional architectural
> ideas. With hindsight, once other people wanted to start using it too, I
> really should have just sat down and completely rewritten the core code. By
> not doing so, I inflicted a world of craziness and pain on the community and
> ecology which sprang up around Rat. But by then, it had become hard to fix
> as higher quality peripheral code sprang up around it. Apologies.
>
> The Present
> -----------
> Thanks to Manuel Suárez Sánchez and GSOC, we have an opportunity to adopt a
> more sane and sensible core design with good test coverage that will be
> easier and more enjoyable to maintain and comprehend going forward.
>
> AIUI Google likes to be able to access a copy of the GSOC code, so I've
> suggested that Manuel codes on GitHub
> (https://github.com/elnuma/creadur-rat). Hopefully, this should allow people
> with GitHub forks (mine is https://github.com/itstechupnorth/creadur-rat) to
> pull in Manuel's code and give encouragement and advice. Apache has recorded
> an ICLA for Manuel, so when we're ready we should be able to start patching
> in pull requests.
>
> The Future
> ----------
> (Bit of a strawman - hopefully the community - including Manuel - will dive
> in with suggestions and we'll be able to gain consensus on a design
> direction...)
>
> ...
>
> I wonder whether it would be simpler and more conventional to factor out
> three phases:
>
> 1. scan the source, building a strongly-typed, immutable domain model
> 2. analyse this model against policies, building a strongly-typed, immutable
> report model

Won't that require lots of memory?
At present the source can be forgotten as soon as a match occurs.

> 3. use the report to output descriptive text or XML, or errors and warnings
>
> I also think that Rat would benefit from
>
> * using more conventional dependency injection (see, for example,
> http://www.martinfowler.com/articles/injection.html) replacing the static
> methods that litter the code
> * immutable domain objects with builders
>
> Opinions...? Improvements...? Objections...? Alternatives...?
>
> Robert

Re: RAT: Configuration [WAS Re: [GSOC] Rat: Past, Present and Future]

Posted by Robert Burrell Donkin <ro...@blueyonder.co.uk>.

On 07/19/13 06:43, P. Ottlinger wrote:
> Hi *.
>
> Am 15.07.2013 23:01, schrieb Robert Burrell Donkin:
>>> Apart from the stuff you mentioned I'd prefer to inject the
>>> configuration as well to not pollute pom.xml files with that - currently
>>> it's quite a pain to use the tool since you have to configure rat twice.
>>
>> So, some sort of descriptor...? Perhaps in the project...?
>
> I thought of a very short desciptor (JSON or XML) that just defines the
> target licence.
>
> Each licence needs to have a key/implementation class pair so that a
> configuration may look like that:
>
> rat-config.json
> {
>      "rat-config": {
>          "licence": "GPL3",
>          "implementation": "org.apache.foo.GPL3Licence.java",
>          "level": "ERROR"
>      }
> }
>
> level could be ERROR / MESSAGE / REPORT meaning that either a message is
> just printed, a report file is generated or the build is broken.
>
> One could try to make implementation optional and guess the correct
> implementation by matching it magically to the list of available
> licences (currently the static variables, that may be changed into
> enumerations of all supported licences).
>
> Just a sketch without thinking about implemenation details ;-)

:-)

So, we're looking to introduce some sort of pluggable strategy for policy...

Sounds good

Robert

Re: RAT: Configuration [WAS Re: [GSOC] Rat: Past, Present and Future]

Posted by "P. Ottlinger" <po...@aiki-it.de>.

Hi *.

Am 15.07.2013 23:01, schrieb Robert Burrell Donkin:
>> Apart from the stuff you mentioned I'd prefer to inject the
>> configuration as well to not pollute pom.xml files with that - currently
>> it's quite a pain to use the tool since you have to configure rat twice.
> 
> So, some sort of descriptor...? Perhaps in the project...?

I thought of a very short descriptor (JSON or XML) that just defines the
target licence.

Each licence needs to have a key/implementation class pair so that a
configuration may look like that:

rat-config.json
{
    "rat-config": {
        "licence": "GPL3",
        "implementation": "org.apache.foo.GPL3Licence.java",
        "level": "ERROR"
    }
}

level could be ERROR / MESSAGE / REPORT meaning, respectively, that the
build is broken, a message is just printed, or a report file is generated.

One could try to make implementation optional and guess the correct
implementation by matching it magically to the list of available
licences (currently the static variables, that may be changed into
enumerations of all supported licences).

Just a sketch without thinking about implementation details ;-)
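
The idea of turning the supported licences into an enumeration and guessing the implementation from the configured key might be sketched in Java like this. The enum constants, their keys and the fromKey helper are all assumptions made for illustration; only the three level names come from the configuration sketch above. Reading the JSON file itself is left out.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical names; only the ERROR / MESSAGE / REPORT levels come from the thread.

/** Mirrors the "level" values from the configuration sketch. */
enum Level { ERROR, MESSAGE, REPORT }

/** Illustrative list only; the keys are assumptions, not Rat's real identifiers. */
enum SupportedLicence {
    APACHE2("APACHE2"),
    GPL3("GPL3"),
    MIT("MIT");

    private final String key;

    SupportedLicence(String key) {
        this.key = key;
    }

    String key() {
        return key;
    }

    /** Guesses the licence from the configured key, ignoring case and padding. */
    static Optional<SupportedLicence> fromKey(String configured) {
        String wanted = configured.trim().toUpperCase(Locale.ROOT);
        for (SupportedLicence licence : values()) {
            if (licence.key.equals(wanted)) {
                return Optional.of(licence);
            }
        }
        return Optional.empty();
    }
}

final class ConfigSketch {
    public static void main(String[] args) {
        // Values as they might be read from rat-config.json.
        String licence = "GPL3";
        String level = "ERROR";

        Optional<SupportedLicence> guessed = SupportedLicence.fromKey(licence);
        System.out.println(guessed.isPresent()
                ? "matched " + guessed.get()
                : "unknown licence: " + licence);
        System.out.println("level " + Level.valueOf(level));
    }
}
```

With a lookup like this, the "implementation" entry could become optional: an explicit class name would only be needed for licences that are not in the enumeration.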

Phil

RAT: Configuration [WAS Re: [GSOC] Rat: Past, Present and Future]

Posted by Robert Burrell Donkin <ro...@blueyonder.co.uk>.

On 07/08/13 21:10, P. Ottlinger wrote:

<snip>

> Apart from the stuff you mentioned I'd prefer to inject the
> configuration as well to not pollute pom.xml files with that - currently
> it's quite a pain to use the tool since you have to configure rat twice.

So, some sort of descriptor...? Perhaps in the project...?

Or did you have something else in mind...?

Any other ideas...?

Alternatives...?

Robert

Re: [GSOC] Rat: Past, Present and Future

Posted by Robert Burrell Donkin <ro...@blueyonder.co.uk>.

On 07/08/13 21:10, P. Ottlinger wrote:

<snip>

> I would prefer more configuration options to use Rat on projects that
> are not Apache2-licensed only. This could be done when all configuration
> objects have interfaces and user-specific implementations can be
> injected or chosen as defaults.

+1

> Apart from the stuff you mentioned I'd prefer to inject the
> configuration as well to not pollute pom.xml files with that - currently
> it's quite a pain to use the tool since you have to configure rat twice.

+1

Robert

RAT: Beyond Apache2 [WAS Re: [GSOC] Rat: Past, Present and Future]

Posted by Robert Burrell Donkin <ro...@blueyonder.co.uk>.

On 07/08/13 21:10, P. Ottlinger wrote:

<snip>

> I would prefer more configuration options to use Rat on projects that
> are not Apache2-licensed only. This could be done when all configuration
> objects have interfaces and user-specific implementations can be
> injected or chosen as defaults.

So, design work needs to focus on more clarity around configuration, 
perhaps...?

In terms of injecting pluggable extensions, does anyone have any user 
stories or use cases which might help us to understand how best to 
improve the design in this area...?

Robert

Re: [GSOC] Rat: Past, Present and Future

Posted by "P. Ottlinger" <po...@aiki-it.de>.

Dear *,

Am 08.07.2013 21:46, schrieb Robert Burrell Donkin:
> I wonder whether it would be simpler and more conventional to factor out
> three phases:
> 
> 1. scan the source, building a strongly-typed, immutable domain model

+1
Scanning could be done multithreadedly with a status output on console
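
A minimal Java sketch of multithreaded scanning with console status output could look like the following. The class name, the thread-pool size and the "scan" of a file (a check on the file name) are all placeholders assumed for illustration; nothing here reflects how Rat actually walks a source tree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical names; the "scan" of a file is faked by checking its name.

final class ParallelScanSketch {

    /** Scans all names on a fixed thread pool and returns how many were flagged. */
    static int scan(List<String> fileNames, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger done = new AtomicInteger();
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (String name : fileNames) {
                results.add(pool.submit(() -> {
                    boolean flagged = !name.endsWith(".java"); // stand-in for real analysis
                    // Status output; the counter is safe to update from any thread.
                    System.out.printf("scanned %d/%d%n",
                            done.incrementAndGet(), fileNames.size());
                    return flagged;
                }));
            }
            int flaggedCount = 0;
            for (Future<Boolean> result : results) {
                try {
                    if (result.get()) {
                        flaggedCount++;
                    }
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException("scan failed", e);
                }
            }
            return flaggedCount;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int flagged = scan(Arrays.asList("A.java", "notes.bin", "B.java"), 2);
        System.out.println("flagged: " + flagged);
    }
}
```

The order of the status lines depends on thread scheduling, but the final count does not, because results are collected through the futures.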

> 2. analyse this model against policies, building a strongly-typed,
> immutable report model
> 3. use the report to output descriptive text or XML, or errors and warnings

+1
I don't know enough about maven restrictions - but maybe it's possible
to generate a special report that makes it easier to integrate a
rat-check in CI/Jenkins to only see files that are incorrect and the
cause, but not the whole mvn output on console.

> I also think that Rat would benefit from
> 
> * using more conventional dependency injection (see, for example,
> http://www.martinfowler.com/articles/injection.html) replacing the
> static methods that litter the code
> * immutable domain objects with builders

+1

I would prefer more configuration options to use Rat on projects that
are not Apache2-licensed only. This could be done when all configuration
objects have interfaces and user-specific implementations can be
injected or chosen as defaults.

Apart from the stuff you mentioned I'd prefer to inject the
configuration as well to not pollute pom.xml files with that - currently
it's quite a pain to use the tool since you have to configure rat twice.

Cheers
Phil