Posted to dev@manifoldcf.apache.org by Piergiorgio Lucidi <pi...@apache.org> on 2011/11/03 19:28:26 UTC

Proposal about a roadmap for connectors

Hi guys,

I would like to discuss with all of you some new potential
components that we could add to the project:

1. an Alfresco Repository Connector
2. an Alfresco Authority Connector
3. an ElasticSearch Output Connector

While chatting with Karl some days ago, we talked about a new
Alfresco Connector to cover all the older Alfresco installations that
don't yet support the CMIS protocol.
I think it would be very useful to add an Alfresco-specific
Repository Connector to ManifoldCF. I'm sure that many users would
appreciate it, and it would give the project a very wide range of
connectors.

For the Alfresco Authority Connector I have some doubts, because I
can't find a viable approach: we have the same limitation as the CMIS
protocol, which cannot return the ACLs for a given user.
By default you can retrieve ACLs only for a specific piece of content;
this is standard behaviour for many ECM products.

But thanks to Karl, who met Andy Hind at the Lucene EuroConference in
Barcelona, I'm now in touch with Andy to understand whether we can find
a way to implement an Alfresco Authority Connector using some
undocumented Alfresco APIs.

For the ElasticSearch Output Connector I would like to have some help:
it would be nice to find someone who could write the connector or help
me to achieve the goal. I may have found someone who can help, but I
would like to have an official confirmation before starting to write
the code.

What do you think?

I would like to have your feedback so that we can start a good
discussion and understand together how we could proceed.
Thank you all for any feedback.

Cheers,
Piergiorgio

-- 
Piergiorgio Lucidi
http://about.me/piergiorgiolucidi

Re: Proposal about a roadmap for connectors

Posted by Piergiorgio Lucidi <pi...@gmail.com>.
Hi Karl,

2011/11/20 Karl Wright <da...@gmail.com>:
> Hi Piergiorgio,
>
> Some comments.
>
> (1) The new jars in the lib directory.  First, is there any reason to
> have more than one version of jetty?  What's checked in now is a
> version that was patched by the Solr team to fix an encoding problem,
> but you've added a new jar which may or may not have that patch in it.
>  We should figure this out and only include ONE jetty jar, in my

I only added the jetty-plus and jetty-naming libraries to manage
runtime datasources.
Oops... jetty-naming still has to be added to the lib directory.

I didn't add a different version of the Jetty server; I added some
tools for the same version of Jetty that is used in ManifoldCF.

> opinion.  Second, the rest of the new jars:
>
> A    lib\mail.jar
> A    lib\wss4j-1.5.4-patched.jar
> A    lib\alfresco-web-service-client-4.0.b.jar
> A    lib\jetty-plus-6.1.26.jar
> A    lib\opensaml-1.0.1.jar
> A    lib\h2-1.3.158.jar
> A    lib\xmlsec-1.4.1.jar
>
> ... all have to have approved licenses for us to include them.  Can
> you describe the licenses of each in an email?  I know you already
> discussed h2 a little, but under what terms is it currently licensed?
> Apache 2.0?  GPL?  LGPL? etc.
>
> There are ways to handle jars that do not meet Apache spec, but first
> we need to know the details.  They also have to be described in
> LICENSE.txt in detail, so any links you can provide will help.

OK, I have to check all these licenses. Anyway, all these dependencies
are shipped in the Alfresco SDK and in the Alfresco product, so I
think we probably won't have any problem using them in an
Apache project. But it is better to check!
I'll let you know about this soon.

>
> (2) I think we're going to want to be a bit more detailed about how an
> alfresco connector user picks documents they want to include for
> indexing.  Is there any kind of structure in Alfresco such as
> directories or folders?  How about document types?

Alfresco supports directories and documents: spaces are similar to
directories, and you can define different types of documents and
spaces. In a space you can run rules, and each rule can trigger
one or more actions.

Alfresco was born as a very generic JCR repository and it can be
easily managed and customized using actions, rules, and content
modeling to define new types of docs, spaces, aspects and associations
for docs and spaces.

For now the connector supports only a Lucene query to define the scope
of the job for ManifoldCF. This is the same approach you saw for the
CMIS connector, except that the CMIS connector uses the CMIS Query
Language (based on SQL-92).

Lucene in Alfresco allows you to execute a query using properties,
types, aspects and spaces. For example, take the following query:

PATH:"/app:company_home/*" AND @cm\:name:"JohnDoe" AND
TYPE:"cm:folder" AND ASPECT:"custom:toIndex"

Here we are searching for a child of the Company Home space (the root
node for the current versions of content) that is named "JohnDoe",
is a space (in Alfresco a space is defined as the cm:folder type)
and has the custom:toIndex aspect applied. An Alfresco aspect can be
used to tag the content that must be indexed using ManifoldCF; this is
just an example of modeling. An aspect is a model definition that
allows you to define in Alfresco a group of properties and/or
associations that can be applied to or removed from content instances
dynamically.

Another example worth mentioning is a full-text search query such as
the following:

PATH:"/app:company_home/cm:ourCompanyDocuments//*" AND
TYPE:"cm:content" AND TEXT:"this content must be indexed"

Here we are defining a scope that covers all the nodes below the space
Company Home/ourCompanyDocuments that are generic documents (nodes of
this type have an associated binary content), and we want to index
only the documents that contain a specific sentence in the body of
the content.

So ManifoldCF users could test the query using Alfresco and then they
could use ManifoldCF to run jobs.

This is just a first implementation. I'm thinking about improvements
that would give users a dynamic form that builds the query, so that
they don't need to know anything about the Lucene query language.
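
To make the idea a bit more concrete, here is a minimal sketch of what
such a form could do behind the scenes: collect the field values and
join them into an Alfresco Lucene query like the two shown above. The
class LuceneQuerySketch and all of its method names are hypothetical
and are not part of the connector; the quoting rule (backslash before
quotes and backslashes) is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of the connector): turns form fields
 * into an Alfresco Lucene query by joining the clauses with AND.
 */
final class LuceneQuerySketch {
    private final List<String> clauses = new ArrayList<>();

    /** Wraps a value in double quotes, escaping backslashes and quotes (assumed rule). */
    private static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    LuceneQuerySketch path(String path) {
        clauses.add("PATH:" + quote(path));
        return this;
    }

    LuceneQuerySketch type(String qname) {
        clauses.add("TYPE:" + quote(qname));
        return this;
    }

    LuceneQuerySketch aspect(String qname) {
        clauses.add("ASPECT:" + quote(qname));
        return this;
    }

    LuceneQuerySketch text(String phrase) {
        clauses.add("TEXT:" + quote(phrase));
        return this;
    }

    /** Property clause such as @cm\:name:"JohnDoe"; the colon after the prefix is escaped. */
    LuceneQuerySketch property(String prefix, String localName, String value) {
        clauses.add("@" + prefix + "\\:" + localName + ":" + quote(value));
        return this;
    }

    String build() {
        return String.join(" AND ", clauses);
    }

    public static void main(String[] args) {
        // Rebuilds the full-text example from this message.
        System.out.println(new LuceneQuerySketch()
                .path("/app:company_home/cm:ourCompanyDocuments//*")
                .type("cm:content")
                .text("this content must be indexed")
                .build());
    }
}
```

Running main prints the full-text query from the second example on a
single line. The user would still be able to paste the generated
string into Alfresco to test it before running a job.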

The same improvement can be made for the CMIS Connector, because we
can get the content model from the repository and then dynamically
create forms to define CMIS queries for custom models.
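
For the CMIS side, a rough sketch of the query such a form could
generate is below. It only builds the query string; reading the
content model from the repository is left out. CmisQuerySketch and
its parameters are hypothetical names, and the literal escaping
(backslash before single quotes and backslashes) is my reading of the
CMIS query rules, so treat it as an assumption. How the CONTAINS()
expression is interpreted can also differ between repositories.

```java
/**
 * Hypothetical helper (not part of the CMIS connector): builds a CMIS
 * query string from a type id, an optional name and optional full text.
 */
final class CmisQuerySketch {

    /** Single-quoted string literal with backslash escaping (assumed rule). */
    static String literal(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    /**
     * @param typeId     type to select from, e.g. cmis:document or a custom type
     * @param nameEquals value for a cmis:name equality test, or null to skip it
     * @param fullText   expression for CONTAINS(), or null to skip it
     */
    static String build(String typeId, String nameEquals, String fullText) {
        StringBuilder query = new StringBuilder("SELECT * FROM ").append(typeId);
        String separator = " WHERE ";
        if (nameEquals != null) {
            query.append(separator).append("cmis:name = ").append(literal(nameEquals));
            separator = " AND ";
        }
        if (fullText != null) {
            query.append(separator).append("CONTAINS(").append(literal(fullText)).append(")");
        }
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("cmis:folder", "JohnDoe", null));
        System.out.println(build("cmis:document", null, "indexed"));
    }
}
```

A custom type id taken from the repository's content model could be
passed as typeId in the same way as the standard cmis:document and
cmis:folder types.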

>
> (3) Let's hold off on pulling this up into trunk until at least the
> copyright issues are solved, and we've got a good story for document
> security.  Have you managed to get in touch with the Alfresco
> engineers on this point?

OK, we have to check the licenses, but I would also like to
finish/debug/enhance this implementation before pulling it into trunk :-P
I would like to work on this for another week, including the license
check of course.

Then I have to write the site documentation for this new connector.

>
> Otherwise, looks like a lot of great work!

Thank you!

^__^

>
> Karl
>
> On Sun, Nov 20, 2011 at 8:23 AM, Piergiorgio Lucidi
> <pi...@gmail.com> wrote:
>> Just an update about these two new potential connectors.
>>
>> I released a first initial version of the Alfresco Repository
>> Connector in the branch CONNECTORS-278.
>> This new connector is compatible with Alfresco 2.x, 3.x and 4.x, but
>> probably it should be also compatible with Alfresco 1.x. I have to
>> finish some new potential tests on the older version of Alfresco.
>>
>> I would like to solve the following issues during the next week:
>>
>> - Maven tests executions: to build the specific Alfresco WAR I added a
>> new Maven submodule created by the Maven Alfresco Archetype, and here
>> I need to understand how to configure Maven to run this module before
>> the tests module.
>>
>> - the integration tests implementation was tested on Alfresco 2.1.0,
>> 3.4 and 4.0.b and this means that probably this connector should also
>> work on Alfresco 1.x!!! :D
>>
>> - I have to add a new dependency to support the H2 database to run
>> Alfresco during the execution of the integration tests created by
>> Carlo Sciolla [1]. This is the unique way to run an Alfresco
>> repository with an embedded database. But probably we need to ask to
>> Carlo to add a specific license to allow us to use it in our Apache
>> project.
>>
>> For the ElasticSearch Output Connector we have a potential
>> contribution that could be made by one of my collegues that has
>> started to think about an initial implementation of the connector, now
>> we don't have the code, but it could arrive soon.
>>
>> I also started a discussion on the ElasticSearch forum [2] and two
>> guys are interested to contribute and they would like to be involved
>> in the development of this task: Michael Kelleher and Lukas Vicek!!!!
>>
>> I think that they could help us to consolidate not only this new
>> connector but all the project.
>> WDYT?
>>
>> Thank you for any feedback.
>> Piergiorgio
>>
>>
>> [1] https://github.com/skuro/alfresco-h2-support
>> [2] https://groups.google.com/group/elasticsearch/msg/3f651ad3062ff172
>>
>> --
>> Piergiorgio Lucidi
>> http://about.me/piergiorgiolucidi
>>
>



-- 
Piergiorgio Lucidi
http://about.me/piergiorgiolucidi

Re: Proposal about a roadmap for connectors

Posted by Karl Wright <da...@gmail.com>.
Hi Piergiorgio,

Some comments.

(1) The new jars in the lib directory.  First, is there any reason to
have more than one version of jetty?  What's checked in now is a
version that was patched by the Solr team to fix an encoding problem,
but you've added a new jar which may or may not have that patch in it.
 We should figure this out and only include ONE jetty jar, in my
opinion.  Second, the rest of the new jars:

A    lib\mail.jar
A    lib\wss4j-1.5.4-patched.jar
A    lib\alfresco-web-service-client-4.0.b.jar
A    lib\jetty-plus-6.1.26.jar
A    lib\opensaml-1.0.1.jar
A    lib\h2-1.3.158.jar
A    lib\xmlsec-1.4.1.jar

... all have to have approved licenses for us to include them.  Can
you describe the licenses of each in an email?  I know you already
discussed h2 a little, but under what terms is it currently licensed?
Apache 2.0?  GPL?  LGPL? etc.

There are ways to handle jars that do not meet Apache spec, but first
we need to know the details.  They also have to be described in
LICENSE.txt in detail, so any links you can provide will help.

(2) I think we're going to want to be a bit more detailed about how an
alfresco connector user picks documents they want to include for
indexing.  Is there any kind of structure in Alfresco such as
directories or folders?  How about document types?

(3) Let's hold off on pulling this up into trunk until at least the
copyright issues are solved, and we've got a good story for document
security.  Have you managed to get in touch with the Alfresco
engineers on this point?

Otherwise, looks like a lot of great work!

Karl

On Sun, Nov 20, 2011 at 8:23 AM, Piergiorgio Lucidi
<pi...@gmail.com> wrote:
> Just an update about these two new potential connectors.
>
> I released a first initial version of the Alfresco Repository
> Connector in the branch CONNECTORS-278.
> This new connector is compatible with Alfresco 2.x, 3.x and 4.x, but
> probably it should be also compatible with Alfresco 1.x. I have to
> finish some new potential tests on the older version of Alfresco.
>
> I would like to solve the following issues during the next week:
>
> - Maven tests executions: to build the specific Alfresco WAR I added a
> new Maven submodule created by the Maven Alfresco Archetype, and here
> I need to understand how to configure Maven to run this module before
> the tests module.
>
> - the integration tests implementation was tested on Alfresco 2.1.0,
> 3.4 and 4.0.b and this means that probably this connector should also
> work on Alfresco 1.x!!! :D
>
> - I have to add a new dependency to support the H2 database to run
> Alfresco during the execution of the integration tests created by
> Carlo Sciolla [1]. This is the unique way to run an Alfresco
> repository with an embedded database. But probably we need to ask to
> Carlo to add a specific license to allow us to use it in our Apache
> project.
>
> For the ElasticSearch Output Connector we have a potential
> contribution that could be made by one of my collegues that has
> started to think about an initial implementation of the connector, now
> we don't have the code, but it could arrive soon.
>
> I also started a discussion on the ElasticSearch forum [2] and two
> guys are interested to contribute and they would like to be involved
> in the development of this task: Michael Kelleher and Lukas Vicek!!!!
>
> I think that they could help us to consolidate not only this new
> connector but all the project.
> WDYT?
>
> Thank you for any feedback.
> Piergiorgio
>
>
> [1] https://github.com/skuro/alfresco-h2-support
> [2] https://groups.google.com/group/elasticsearch/msg/3f651ad3062ff172
>
> --
> Piergiorgio Lucidi
> http://about.me/piergiorgiolucidi
>

Re: Proposal about a roadmap for connectors

Posted by Piergiorgio Lucidi <pi...@gmail.com>.
Just an update about these two new potential connectors.

I released a first version of the Alfresco Repository
Connector in the branch CONNECTORS-278.
This new connector is compatible with Alfresco 2.x, 3.x and 4.x, and
it is probably also compatible with Alfresco 1.x. I still have to
finish some tests on the older versions of Alfresco.

I would like to solve the following issues during the next week:

- Maven test execution: to build the specific Alfresco WAR I added a
new Maven submodule created with the Maven Alfresco Archetype, and
I need to understand how to configure Maven to build this module before
the tests module.

- the integration tests were run against Alfresco 2.1.0,
3.4 and 4.0.b, which suggests that this connector should probably also
work on Alfresco 1.x!!! :D

- I have to add a new dependency, created by Carlo Sciolla [1], that
supports the H2 database so that Alfresco can run during the
integration tests. This is the only way to run an Alfresco
repository with an embedded database. But we probably need to ask
Carlo to add a specific license so that we can use it in our Apache
project.

For the ElasticSearch Output Connector we have a potential
contribution from one of my colleagues, who has
started to think about an initial implementation of the connector. We
don't have the code yet, but it could arrive soon.

I also started a discussion on the ElasticSearch forum [2], and two
guys are interested in contributing and would like to be involved
in the development of this task: Michael Kelleher and Lukas Vicek!!!!

I think that they could help us to consolidate not only this new
connector but the whole project.
WDYT?

Thank you for any feedback.
Piergiorgio


[1] https://github.com/skuro/alfresco-h2-support
[2] https://groups.google.com/group/elasticsearch/msg/3f651ad3062ff172


2011/11/3 Karl Wright <da...@gmail.com>:
> The Alfresco connector sounds like a "go" to me, as long as the
> Alfresco folks support us.  And it sounds like they are doing that.
>
> The ElasticSearch output connector I don't know enough about to have
> an opinion on.  Maybe I'll get lucky and run into somebody from
> ElasticSearch at ApacheCon next week.  As long as I don't give them my
> flu germs, they might be willing to point us in the right direction.
> ;-)
>
> Karl
>
>



-- 
Piergiorgio Lucidi
http://about.me/piergiorgiolucidi

Re: Proposal about a roadmap for connectors

Posted by Karl Wright <da...@gmail.com>.
The Alfresco connector sounds like a "go" to me, as long as the
Alfresco folks support us.  And it sounds like they are doing that.

The ElasticSearch output connector I don't know enough about to have
an opinion on.  Maybe I'll get lucky and run into somebody from
ElasticSearch at ApacheCon next week.  As long as I don't give them my
flu germs, they might be willing to point us in the right direction.
;-)

Karl


On Thu, Nov 3, 2011 at 2:28 PM, Piergiorgio Lucidi
<pi...@apache.org> wrote:
> Hi guys,
>
> I would like to discuss with all of you about some new potential
> components that we could add to the project:
>
> 1. an Alfresco Repository Connector
> 2. an Alfresco Authority Connector
> 3. an ElasticSearch Output Connector
>
> Discussing with Karl in chat some days ago we talked about a new
> Alfresco Connector to cover all the older Alfresco installations that
> don't yet support the CMIS protocol.
> I think that it could be very useful to add an Alfresco specific
> Repository Connector to ManifoldCF, I'm sure that many users will
> appreciate it and the project could have a very wide range of
> connectors in this way.
>
> For the Alfresco Authority Connector I have some doubts because I
> don't find any way to follow, because we have the same limitation that
> has the CMIS protocol about taking ACL per user.
> By default you can take ACLs only for a specific content, this is a
> standard behaviour for many ECM products.
>
> But thanks to Karl that met Andy Hind at the Lucene EuroConference in
> Barcelona, now I'm in touch with Andy to understand if we find a way
> to implement an Alfresco Authority Connector using some undocumented
> APIs of Alfresco.
>
> For the ElasticSearch Output Connector I would like to have some help
> or anyway it could be nice to find someone that could write the
> connector or help me to achieve the goal. Maybe I found someone that
> can help me but I would like to have an official confirmation before
> starting to write the code.
>
> What do you think?
>
> I would like to have your feedbacks to start a good discussion to
> understand together how we could proceed.
> Thank you all for any feedback.
>
> Cheers,
> Piergiorgio
>
> --
> Piergiorgio Lucidi
> http://about.me/piergiorgiolucidi
>