Posted to dev@manifoldcf.apache.org by Karl Wright <kw...@metacarta.com> on 2010/02/01 14:15:13 UTC

Order of development?

Please note that I've entered a number of tickets for work that will need to be done immediately after the Apache IP committee
is done with the LCF grant and the software hits Apache's SVN.

It seems to me that the highest priority of these is getting an initial set of Ant build scripts created.  That way,
committers will be in a position to not break the world too badly when we start de-MetaCarta-ifying the code itself.  I think it
would also be a good idea to settle on a directory structure at this time, so that we all know where we are heading, no?

The current structure of the granted code is as follows:

- mcdoc contains documentation
- mcsqa contains example tests
- upstream-diffs contains instructions for how to include or modify upstream packages
- products/connectors contains the framework itself, and all individual connectors, each in its own subdirectory underneath
- products/libapache-mod-authz-annotate contains the code for the Apache2 mod-aa module that does the search-side security piece
- products/postgres-config contains a package that should be an example of how to set up PostgreSQL properly (on Debian systems,
anyway)
- products/java-environment contains a package that manages class paths for all the moving parts, on Debian systems

Inside the products/connectors/framework directory, there are three overall sections: a "core functionality" section (in
java-common), which contains both UI common functionality and crawler common functionality; an "agent framework" section,
in java-agents, which defines the concept of output connectors, and provides functionality for handling document
ingestion into and removal from an output connection; and a "crawler agent" section, under "crawler", which does everything else.
The UI component is lumped together with the "crawler agent" section right now, but is probably logically separable.  The
authority service and web application are also lumped into this bin, but should also be separated, IMHO.

I'd therefore like to propose that the guts of the "framework" directory be rejiggered as follows:

- Create a new "crawler-ui" directory, consisting of the stuff from java-common/ui, and the crawler UI JSP code
- Create a new "authority-webapp" directory, consisting of the authority service web application servlet code
- Move the framework/crawler/pullagent directory to framework/pull-agent
- If there aren't any interdependencies (I don't think there are, but I can't be sure yet), create a new "authority-service"
directory, containing the stuff currently under framework/crawler/pullagent/com/metacarta/authorities.  This will introduce a
new .jar file, but I think that's appropriate.
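
The moves listed above can be sketched as a small path-mapping table.  This is only an illustration, not a migration script
from the grant: it covers just the source paths named in this message, and it assumes that the new directories sit directly
under framework/ and that the authority code keeps its Java package path.  The file names in the usage lines are hypothetical.

```python
from pathlib import PurePosixPath

# Old prefix -> proposed new prefix, relative to products/connectors.
# Only source paths named in the message are listed. Assumptions: the new
# directories live directly under framework/, and the authorities code keeps
# its com/metacarta/authorities package path inside authority-service.
PROPOSED_MOVES = {
    "framework/java-common/ui": "framework/crawler-ui",
    "framework/crawler/pullagent": "framework/pull-agent",
    "framework/crawler/pullagent/com/metacarta/authorities":
        "framework/authority-service/com/metacarta/authorities",
}


def relocate(path: str) -> str:
    """Return the proposed location of path, or path itself if nothing moves.

    The longest matching old prefix wins, so the authorities subtree goes to
    authority-service even though it currently sits inside pullagent.
    """
    p = PurePosixPath(path)
    best = None
    for old in PROPOSED_MOVES:
        old_p = PurePosixPath(old)
        if p == old_p or old_p in p.parents:
            if best is None or len(old_p.parts) > len(best.parts):
                best = old_p
    if best is None:
        return path
    return str(PurePosixPath(PROPOSED_MOVES[str(best)]) / p.relative_to(best))


# Hypothetical file names, for illustration only.
print(relocate("framework/crawler/pullagent/com/metacarta/authorities/Foo.java"))
print(relocate("framework/crawler/pullagent/build.xml"))
```

Matching on the longest prefix is what keeps the last two bullets from conflicting: anything under the authorities package is
split out first, and the rest of pullagent simply follows the rename.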

There may also be some rejiggering of individual connector directory hierarchies, but I haven't thought as much about that yet.

Thoughts?  Comments?


Karl

Re: Order of development?

Posted by Karl Wright <kw...@metacarta.com>.
Robert Muir wrote:
> i saved this email, but maybe even better would be to commit a short
> README.txt into svn itself with this table of contents you just listed.
> 
> might make it easier for folks to dive in.
> 

Ah.  That raises the question, "where in svn"?  The grant has not yet been committed, because it's still waiting on Apache IP.
Also, I note that there's no "trunk" area, so I don't even really know *where* it's going to go when it's ready.  I am sure
Grant has something in mind.

Or, did you mean adding this info to site pages?

Karl

Re: Order of development?

Posted by Robert Muir <rc...@gmail.com>.
I saved this email, but maybe even better would be to commit a short
README.txt into svn itself with this table of contents you just listed.

It might make it easier for folks to dive in.

On Mon, Feb 1, 2010 at 8:15 AM, Karl Wright <kw...@metacarta.com> wrote:

>
> Once the Apache IP committee is done with the LCF grant, please note I've
> entered a number of tickets for work that would need to be done immediately
> after the software hits Apache's SVN.
>
> It seems to me that the highest priority of these is getting some set of
> starting ant build scripts created.  That way, committers will be in a
> position to not break the world too badly when we start de-MetaCarta-ifying
> the code itself.  I think it would also be a good idea to settle on a
> directory structure at this time, so that we all know where we are heading,
> no?
>
> The current structure of the granted code is as follows:
>
> - mcdoc contains documentation
> - mcsqa contains example tests
> - upstream-diffs contains instructions for how to include or modify
> upstream packages
> - products/connectors contains the framework itself, and all individual
> connectors, each with their own subdirectory underneath
> - products/libapache-mod-authz-annotate contains the code for the Apache2
> mod-aa module that does the search-side security piece
> - products/postgres-config contains a package that should be an example of
> how to set up postgresql properly (on debian systems, anyway)
> - products/java-environment contains a package that manages class paths for
> all the moving parts, on debian systems
>
> Inside the products/connectors/framework directory, there are three overall
> sections: a "core functionality" section (in java-common), which contains
> both UI common functionality as well as crawler common functionality; an
> "agent framework" section, in java-agents, which defines the concept of
> output connectors, and provides functionality concerned with handling
> document ingestion and removal from an output connection; and a "crawler
> agent" section, under "crawler", which does everything else. The UI
> component is lumped together with the "crawler agent" section right now, but
> probably is logically separable.  The authority service and web application
> are also lumped in this bin, but should also be separated, IMHO.
>
> I'd like therefore to propose that the guts of the "framework" directory be
> rejiggered as follows:
>
> - Create a new "crawler-ui" directory, consisting of the stuff from
> java-common/ui, and the crawler UI jsp code
> - Create a new "authority-webapp" directory, consisting of the authority
> service web application servlet code
> - Move the framework/crawler/pullagent directory to framework/pull-agent
> - If there aren't any interdependencies (I don't think there are, but I
> can't be sure yet), create a new "authority-service" directory, containing
> the stuff currently under
> framework/crawler/pullagent/com/metacarta/authorities.  This will introduce
> a new .jar file, but I think that's appropriate.
>
> There may also be some rejiggering of individual connector directory
> hierarchies, but I haven't thought as much about that yet.
>
> Thoughts?  Comments?
>
>
> Karl
>



-- 
Robert Muir
rcmuir@gmail.com

Re: Order of development?

Posted by Karl Wright <kw...@metacarta.com>.
Gianugo Rabellino wrote:
> On Mon, Feb 1, 2010 at 2:15 PM, Karl Wright <kw...@metacarta.com> wrote:
>> Once the Apache IP committee is done with the LCF grant, please note I've
>> entered a number of tickets for work that would need to be done immediately
>> after the software hits Apache's SVN.
>>
>> It seems to me that the highest priority of these is getting some set of
>> starting ant build scripts created.
> 
> I haven't seen the code yet, but does this mean there is no build
> system in place at all?
> 

There was a build system in place, for building Debian packages, but one of the engineers at MetaCarta objected to including the
existing Debian package structure, other than the makefiles and control files.  It should be possible to reconstruct the former
Debian packaging, if that were desired, with some effort.  Also, it was ruled that we could not legally include some things that
were necessary for a build to complete, such as SharePoint wsdls.  Finally, there were required upstream changes to certain
Apache packages (libcommons-httpclient and xerces2-java) for which we could do nothing except grant the appropriate diff files.

So, we effectively need to put together a build environment as job one, within Apache.

Karl

Re: Order of development?

Posted by Gianugo Rabellino <gi...@gmail.com>.
On Mon, Feb 1, 2010 at 2:15 PM, Karl Wright <kw...@metacarta.com> wrote:
>
> Once the Apache IP committee is done with the LCF grant, please note I've
> entered a number of tickets for work that would need to be done immediately
> after the software hits Apache's SVN.
>
> It seems to me that the highest priority of these is getting some set of
> starting ant build scripts created.

I haven't seen the code yet, but does this mean there is no build
system in place at all?

>  That way, committers will be in a
> position to not break the world too badly when we start de-MetaCarta-ifying
> the code itself.  I think it would also be a good idea to settle on a
> directory structure at this time, so that we all know where we are heading,
> no?

Indeed. And while I don't want to start (and couldn't really care less
about) an ant/maven/buildr/whatever flame fest, I think it might be
useful to rely on the multi-module directory layout provided by maven
(again, regardless of the build tool), as this is a somewhat friendlier
solution for newcomers, who will feel a bit more at home when browsing
the source code.
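
As a rough sketch of what that layout would look like here: the
directories below are the standard maven source directories, while the
module names are borrowed from the proposal earlier in this thread.
Treating each of them as a maven-style module, and two of them as web
applications, is purely an assumption for illustration.

```python
from pathlib import PurePosixPath

# Standard Maven source directories for a module.
MAVEN_DIRS = [
    "src/main/java",
    "src/main/resources",
    "src/test/java",
    "src/test/resources",
]

# Module names come from the proposal in this thread; treating each one as a
# Maven-style module is an assumption.
MODULES = ["crawler-ui", "authority-webapp", "pull-agent", "authority-service"]

# Assumed to be web applications, which conventionally add src/main/webapp.
WEBAPP_MODULES = {"crawler-ui", "authority-webapp"}


def layout(root: str = "framework") -> list:
    """List the directories a Maven-style multi-module tree would contain."""
    paths = []
    for module in MODULES:
        dirs = list(MAVEN_DIRS)
        if module in WEBAPP_MODULES:
            dirs.append("src/main/webapp")
        paths.extend(str(PurePosixPath(root, module, d)) for d in dirs)
    return paths


for path in layout():
    print(path)
```

The point is only that every module has the same shape, whichever build
tool ends up driving it.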

-- 
Gianugo Rabellino
Sourcesense, making sense of Open Source: http://www.sourcesense.com
(blogging at http://www.rabellino.it/blog/)