Posted to dev@chemistry.apache.org by Piergiorgio Lucidi <pi...@apache.org> on 2017/12/12 10:10:59 UTC

[ManifoldCF] - Content migration - Issue on OpenCMIS Server

Hi guys,

We are adding a new feature to Apache ManifoldCF for migrating content
from any repository supported by the framework (we have connectors for
CMIS, Alfresco, Documentum, SharePoint, FileNet, and so on) to any
CMIS-compliant repository.

We are finalizing this implementation, but while running the integration
tests we found some strange behavior in the OpenCMIS Server [1].

After upgrading the OpenCMIS libraries to the latest version (we were on
0.13.0), it seems that when we remove content from the repository, queries
keep returning it. As a result, ManifoldCF can't identify which content
should be removed from the target repository, because the query returns the
same results as before.
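
To make the failure mode concrete: an incremental crawler can infer deletions by comparing the object IDs a query returned on the previous run with the IDs it returns now. The class below is a minimal, hypothetical sketch of that idea (DeletionDetector is an illustrative name, not ManifoldCF's actual implementation). If the repository keeps returning a removed document, the two sets are equal and nothing is flagged for removal in the target.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not ManifoldCF code: infer deletions as
// "returned by the query last time, but not this time".
final class DeletionDetector {

    private DeletionDetector() {}

    /** Returns the IDs present in the previous run but missing from the current one. */
    static Set<String> findDeleted(Set<String> previousIds, Set<String> currentIds) {
        Set<String> deleted = new HashSet<>(previousIds); // copy so the inputs stay untouched
        deleted.removeAll(currentIds);                    // set difference: previous minus current
        return deleted;
    }
}
```

With stale query results, `currentIds` still contains the removed ID, so `findDeleted` returns an empty set and the target repository is never cleaned up, which matches the symptom described above.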

Have you changed the removal behavior in the OpenCMIS Server?
Am I missing something?

I'm wondering whether there is a flag for identifying deleted content, or
whether I need to deal with a caching strategy. I think I need your help to
solve this problem.

Could you please help me?
Thank you.

Cheers,
PJ

[1] - https://issues.apache.org/jira/browse/CONNECTORS-1356

-- 
Piergiorgio Lucidi
Open Source Evangelist and Enterprise Information Management Specialist
Mentor / PMC Member / Committer @ Apache Software Foundation
Community Star / Wiki Gardener / Global Forum Moderator @ Alfresco
Author and Technical Reviewer @ Packt Publishing
Technical Advisory Group Member @ Microsoft
Top Community Contributor @ Crafter
Project Leader / Committer @ JBoss
https://www.open4dev.com

Re: [ManifoldCF] - Content migration - Issue on OpenCMIS Server

Posted by Piergiorgio Lucidi <pi...@apache.org>.
2017-12-13 9:46 GMT+01:00 Huebel, Jens <j....@sap.com>:

> Hi Piergiorgio,
>
> It is hard to find the root cause of this behaviour. There have been no
> significant changes, either on the client side or in the InMemory server,
> in the last release that could explain it. Could this be a
> threading/parallelism issue somehow? Are you sure that the InMemory server
> instance is not restarted (that would reset the whole state, as there is
> no persistence)? Do you use the same Java version?
>
> I would recommend testing this with a different CMIS server. Can you run
> your tests against the File Share repository or some other
> production-ready server (e.g. Alfresco)?
>

Sorry, I forgot to mention that with Alfresco the connector works as
expected and removes content correctly.
So the problem is specific to the OpenCMIS InMemory Repository used in this
integration test.
I wanted to use the OpenCMIS InMemory Repository for both the source and
the target repository, because it is a smart, lightweight way to stay
compliant with the OpenCMIS libraries.

If we can't find a solution, I will have to change the integration test to
use a real CMIS-compliant repository such as Alfresco.


> Another option would be to set the log level of the OpenCMIS client
> package to DEBUG. That way we can trace the requests, which might help
> the investigation.
>

OK, I'll let you know about any updates.

Thank you.

Cheers,
PJ


>
> Best regards
> Jens
>

Re: [ManifoldCF] - Content migration - Issue on OpenCMIS Server

Posted by "Huebel, Jens" <j....@sap.com>.
Hi Piergiorgio,

It is hard to find the root cause of this behaviour. There have been no significant changes, either on the client side or in the InMemory server, in the last release that could explain it. Could this be a threading/parallelism issue somehow? Are you sure that the InMemory server instance is not restarted (that would reset the whole state, as there is no persistence)? Do you use the same Java version?

I would recommend testing this with a different CMIS server. Can you run your tests against the File Share repository or some other production-ready server (e.g. Alfresco)?
Another option would be to set the log level of the OpenCMIS client package to DEBUG. That way we can trace the requests, which might help the investigation.
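
One way to probe the threading/parallelism hypothesis is to re-run the query a few times after a delete: a race would eventually clear the removed ID, while genuinely stale results would not. The helper below is a hypothetical sketch (StaleResultProbe is an illustrative name, not part of OpenCMIS or ManifoldCF); the Supplier stands in for whatever code actually executes the CMIS query and collects object IDs.

```java
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical debugging helper; not part of OpenCMIS or ManifoldCF.
final class StaleResultProbe {

    private StaleResultProbe() {}

    /**
     * Re-runs the query until deletedId no longer appears or maxAttempts is reached.
     * Returns the attempt on which the ID disappeared, or -1 if it never did.
     */
    static int attemptsUntilGone(Supplier<Set<String>> runQuery, String deletedId,
                                 int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (!runQuery.get().contains(deletedId)) {
                return attempt; // the deletion is now visible to the query
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis); // give concurrent work time to finish
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt status
                    throw new IllegalStateException("Interrupted while polling", e);
                }
            }
        }
        return -1; // still returned on every attempt: looks like stale results, not a race
    }
}
```

A positive result suggests timing between the delete and the next job run; -1 after generous retries points back at the repository itself.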

Best regards
Jens


On 13.12.17, 00:36, "Piergiorgio Lucidi" <pi...@apache.org> wrote:



Re: [ManifoldCF] - Content migration - Issue on OpenCMIS Server

Posted by Piergiorgio Lucidi <pi...@apache.org>.
Hi Florian,

2017-12-12 13:20 GMT+01:00 Florian Müller <fm...@apache.org>:

> Hi Piergiorgio,
>
> OpenCMIS does not cache queries or query results, neither on the client
> nor on the server side.
> All queries are sent to the repository and all query results are
> directly converted to Java objects.
> I cannot explain the behavior that you are seeing.
>
> Can you describe your setup? Binding, repository vendor, query, etc.
> Is there a (simple) way to reproduce this?
>

We are currently using the OpenCMIS InMemory Repository [7] in the
integration test for both the source and the target repository; this
implementation is in the branch CONNECTORS-1356-2.7.1 [1].

The test starts two separate instances of the OpenCMIS InMemory Repository,
prepares a test area with some sample content, and then starts ManifoldCF.
During the test we add, change, and finally remove content, restarting the
job after each of these steps.

The connector can be configured for either binding, but the integration
tests use the REST binding by default.
You can take a look at the integration tests in the CMIS connectors
module [2].

This module includes three different CMIS connectors:
- CMIS Repository Connector [3] (reads content and executes queries)
- CMIS Authority Connector (access tokens)
- the new CMIS Output Connector [4] (ingests content)

After upgrading to the latest OpenCMIS library (1.1.0), the integration
tests for both the CMIS Repository Connector [5] and the CMIS Output
Connector [6] throw an exception during the last step, which exercises the
removeDocument method of the CMIS Output Connector.

We need to upgrade OpenCMIS because we want to use the existsPath and
deleteByPath methods introduced in the new version, but I'm wondering
whether these methods are fully supported by the OpenCMIS InMemory
Repository.
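
Since the failure shows up in the removal step, a tolerant delete is one way to keep a migration job from failing when the target object is already gone. The sketch below uses a hypothetical PathRepository interface that only mirrors the method names mentioned above; it is not the OpenCMIS Session interface, whose real signatures should be checked in the OpenCMIS javadoc. An in-memory fake is included so the helper can be exercised without a CMIS server.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in named after the existsPath/deleteByPath methods above;
// NOT the OpenCMIS Session interface.
interface PathRepository {
    boolean existsPath(String path);
    void deleteByPath(String path);
}

final class IdempotentRemover {

    private IdempotentRemover() {}

    /** Deletes the object at path if present; returns true when something was deleted. */
    static boolean removeIfPresent(PathRepository repo, String path) {
        if (!repo.existsPath(path)) {
            return false; // already gone: treat as success instead of failing the job
        }
        repo.deleteByPath(path);
        return true;
    }
}

// Simple in-memory fake used to exercise the helper without a CMIS server.
final class InMemoryPathRepository implements PathRepository {
    private final Set<String> paths = new HashSet<>();

    void add(String path) {
        paths.add(path);
    }

    @Override
    public boolean existsPath(String path) {
        return paths.contains(path);
    }

    @Override
    public void deleteByPath(String path) {
        if (!paths.remove(path)) {
            throw new IllegalArgumentException("Object not found: " + path);
        }
    }
}
```

Check-then-delete is not atomic: against a real server another client could remove the object between the two calls, so also catching the server's not-found error (in OpenCMIS, CmisObjectNotFoundException) would be a sensible complement.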


> The best way to get information about deleted objects is getContentChanges
> (see CMIS spec 1.1, section 2.1.15 "Change Log").
> Unfortunately, only a few repositories support it, and some repositories
> have it deactivated by default. You will always need your query solution
> as a backup.


I should definitely take a look at this, but I'm wondering whether there is
a way to get the changes related to a specific query rather than to the
whole repository, which is what we do in ManifoldCF: you can configure
different jobs for indexing or migrating content using a standard CMIS
query, and the framework tracks all the changes.
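
Because the change log is repository-wide, one possible way to scope it to a job is to remember which IDs each job's query returned and intersect them with the IDs the change log reports as deleted. This is a hypothetical sketch (JobScopedDeletions and the job IDs are illustrative), not a CMIS or ManifoldCF feature.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: route repository-wide deletions to the jobs that tracked those objects.
final class JobScopedDeletions {

    private JobScopedDeletions() {}

    /** Maps each job ID to the deleted object IDs that job's query had previously returned. */
    static Map<String, Set<String>> deletionsPerJob(Map<String, Set<String>> idsByJob,
                                                    Set<String> deletedIds) {
        Map<String, Set<String>> result = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : idsByJob.entrySet()) {
            Set<String> hits = new HashSet<>(entry.getValue());
            hits.retainAll(deletedIds); // keep only IDs this job actually tracked
            if (!hits.isEmpty()) {
                result.put(entry.getKey(), hits);
            }
        }
        return result;
    }
}
```

This only covers real deletions: an object that merely stops matching a job's query (for example after a property change) is not reported as deleted, so the query-based comparison is still needed as a fallback.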

To reproduce this issue, follow these steps:
1. Check out the branch [1]
2. From the root of the project, run: ant make-core-deps
3. From the root of the project, run: ant make-deps
4. From /connectors/cmis, run: mvn clean install

Any feedback is welcome, and thank you again for your support.

Cheers,
PJ

[1] -
https://svn.apache.org/repos/asf/manifoldcf/branches/CONNECTORS-1356-2.7.1/

[2] -
https://svn.apache.org/repos/asf/manifoldcf/branches/CONNECTORS-1356-2.7.1/connectors/cmis/

[3] -
https://svn.apache.org/repos/asf/manifoldcf/branches/CONNECTORS-1356-2.7.1/connectors/cmis/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/cmis/CmisRepositoryConnector.java

[4] -
https://svn.apache.org/repos/asf/manifoldcf/branches/CONNECTORS-1356-2.7.1/connectors/cmis/connector/src/main/java/org/apache/manifoldcf/agents/output/cmisoutput/CmisOutputConnector.java

[5] -
https://svn.apache.org/repos/asf/manifoldcf/branches/CONNECTORS-1356-2.7.1/connectors/cmis/connector/src/test/java/org/apache/manifoldcf/crawler/connectors/cmis/tests/APISanityHSQLDBIT.java

[6] -
https://svn.apache.org/repos/asf/manifoldcf/branches/CONNECTORS-1356-2.7.1/connectors/cmis/connector/src/test/java/org/apache/manifoldcf/agents/output/cmisoutput/tests/APISanityHSQLDBIT.java

[7] - http://chemistry.apache.org/opencmis-inmemory-repository.html




Re: [ManifoldCF] - Content migration - Issue on OpenCMIS Server

Posted by Florian Müller <fm...@apache.org>.
Hi Piergiorgio,

OpenCMIS does not cache queries or query results, neither on the client
nor on the server side.
All queries are sent to the repository and all query results are
directly converted to Java objects.
I cannot explain the behavior that you are seeing.

Can you describe your setup? Binding, repository vendor, query, etc.
Is there a (simple) way to reproduce this?

The best way to get information about deleted objects is
getContentChanges (see CMIS spec 1.1, section 2.1.15 "Change Log").
Unfortunately, only a few repositories support it, and some repositories
have it deactivated by default. You will always need your query solution
as a backup.
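
The change log is consumed incrementally: the client keeps a change log token, asks for changes after it, pages while more items are available, and stores the latest token for the next run. The sketch below models that loop with hypothetical types loosely based on the CMIS concepts (token, hasMoreItems, created/updated/deleted/security change types); they are not the OpenCMIS client classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical types loosely modelled on the CMIS getContentChanges service;
// NOT the OpenCMIS API.
enum ChangeKind { CREATED, UPDATED, DELETED, SECURITY }

final class ChangeEntry {
    final String objectId;
    final ChangeKind kind;

    ChangeEntry(String objectId, ChangeKind kind) {
        this.objectId = objectId;
        this.kind = kind;
    }
}

final class ChangePage {
    final List<ChangeEntry> entries;
    final boolean hasMoreItems;
    final String latestToken; // token to resume from on the next call

    ChangePage(List<ChangeEntry> entries, boolean hasMoreItems, String latestToken) {
        this.entries = entries;
        this.hasMoreItems = hasMoreItems;
        this.latestToken = latestToken;
    }
}

interface ChangeLog {
    /** Returns up to maxItems changes recorded after the given token. */
    ChangePage changesSince(String token, int maxItems);
}

final class DeletionScan {
    final List<String> deletedIds;
    final String nextToken; // persist this and pass it to the next scan

    DeletionScan(List<String> deletedIds, String nextToken) {
        this.deletedIds = deletedIds;
        this.nextToken = nextToken;
    }
}

final class DeletedObjectCollector {

    private DeletedObjectCollector() {}

    /** Pages through the change log from startToken and collects DELETED object IDs. */
    static DeletionScan scan(ChangeLog log, String startToken, int pageSize) {
        List<String> deleted = new ArrayList<>();
        String current = startToken;
        ChangePage page;
        do {
            page = log.changesSince(current, pageSize);
            for (ChangeEntry entry : page.entries) {
                if (entry.kind == ChangeKind.DELETED) {
                    deleted.add(entry.objectId);
                }
            }
            if (page.hasMoreItems && Objects.equals(page.latestToken, current)) {
                // Guard against a log that never advances, which would loop forever.
                throw new IllegalStateException("Change log token did not advance");
            }
            current = page.latestToken;
        } while (page.hasMoreItems);
        return new DeletionScan(deleted, current);
    }
}
```

Persisting `nextToken` between crawls means each run only reads changes since the previous one; when a repository does not support the change log, the query comparison remains the fallback, as noted above.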


- Florian



