Posted to user@manifoldcf.apache.org by pf...@funnelback.com on 2015/10/27 18:20:14 UTC

Manifold Config - Removing Associated Docs

Hi all,

I am wondering if anyone can advise on exactly what happens when someone clicks the "Remove All Associated Records/Documents" button in the 'Output Connections' area?

Is there a way I can carry out the same operations programmatically?

Ultimately, I may need to run a crawl of a data source on a 'full crawl' basis only. In other words, I do not want Manifold to care about what documents have been crawled previously. I just want it to pick up and send all documents all of the time. The only way I can see of doing this is to replicate what happens when I click on that button (which obviously triggers a full crawl each time).

Perhaps this is already a config option I may have missed?

Just in case it is of interest, this is an Alfresco crawl.

Cheers


Re: Manifold Config - Removing Associated Docs

Posted by pf...@funnelback.com.
So it is. I didn't even know there was an API. Sorry for not checking first. 

Cheers Karl

-----Original Message-----
From: "Karl Wright" <da...@gmail.com>
Sent: Tuesday, October 27, 2015 1:24pm
To: "user@manifoldcf.apache.org" <us...@manifoldcf.apache.org>
Subject: Re: Manifold Config - Removing Associated Docs

Hi Paul,

The functionality is present in the REST API.

Karl
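
For anyone following along, here is a minimal sketch of what a call to that API might look like from Python. The base URL and the `reset/outputconnections` resource name are assumptions on my part rather than something stated in this thread, so please check them against the programmatic API documentation for your ManifoldCF version. Connection names containing special characters may also need ManifoldCF's own name encoding, which this sketch does not attempt.

```python
import json
import urllib.parse
import urllib.request

# Assumptions, not taken from this thread: the API base URL and the
# "reset/outputconnections" resource name. Verify both against the
# programmatic API documentation for your ManifoldCF version.
API_BASE = "http://localhost:8345/mcf-api-service/json"
RESET_RESOURCE = "reset/outputconnections"


def reset_url(connection_name, api_base=API_BASE):
    """Build the URL for the assumed 'reset output connection' resource."""
    encoded = urllib.parse.quote(connection_name, safe="")
    return f"{api_base}/{RESET_RESOURCE}/{encoded}"


def reset_output_connection(connection_name, api_base=API_BASE):
    """PUT an empty JSON object to the reset resource and return the reply."""
    request = urllib.request.Request(
        reset_url(connection_name, api_base),
        data=json.dumps({}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        body = response.read().decode("utf-8")
    # An empty body is treated as an empty JSON object.
    return json.loads(body) if body.strip() else {}
```

If the assumptions hold, calling `reset_output_connection("My Solr")` (a hypothetical connection name) before each job run should approximate clicking the button, so that every document is sent again on the next crawl.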




Re: Alfresco WebScript Connector - Testing Question

Posted by Karl Wright <da...@gmail.com>.
Hi Deanna,

For the CMIS connector, I created CONNECTORS-1248 to cover the version info
issue you describe.

Karl


On Wed, Oct 28, 2015 at 8:08 AM, Delapasse, Deanna <
ddelapasse@oceaneering.com> wrote:

> Hi Paul,
>
> I haven't read the entire thread, so I apologize if this is way off base...
>
> When I worked with the CMIS connector I had to modify the logic to append
> document.getLastModificationDate().getTimeInMillis() to the versionString
> for it to pick up changes.  The Alfresco document version won't update when
> you modify metadata.  My memory is terrible, but I believe that even
> modifying content may not do it unless you have the proper 'versioning'
> aspect applied.
>
> Check inside Alfresco and see if your "version" is actually incrementing
> as you expect. I was using an older Alfresco version and was not able to
> run with the Alfresco connector, but the CMIS connector worked great for us!
>
> Good luck!
> Deanna

Re: Alfresco WebScript Connector - Testing Question

Posted by "Delapasse, Deanna" <dd...@oceaneering.com>.
Hi Paul,

I haven't read the entire thread, so I apologize if this is way off base...

When I worked with the CMIS connector I had to modify the logic to append
document.getLastModificationDate().getTimeInMillis() to the versionString
for it to pick up changes.  The Alfresco document version won't update when
you modify metadata.  My memory is terrible, but I believe that even
modifying content may not do it unless you have the proper 'versioning'
aspect applied.
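
The idea above, sketched in Python rather than in the connector's Java purely for illustration: fold the last-modified timestamp into the version string so that a metadata-only edit still produces a new value. The function name, the `label:millis` layout, and the sample values are hypothetical and are not the actual connector code.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def version_string(version_label, last_modified):
    """Combine a version label with the modification time in milliseconds."""
    millis = (last_modified - EPOCH) // timedelta(milliseconds=1)
    return f"{version_label}:{millis}"


# Hypothetical values: same version label, metadata edited one minute later.
before = version_string("1.0", datetime(2015, 10, 28, 9, 56, tzinfo=timezone.utc))
after = version_string("1.0", datetime(2015, 10, 28, 9, 57, tzinfo=timezone.utc))

# The label is unchanged, but the timestamp part makes the strings differ,
# which is what lets a crawler notice the edit.
changed = before != after
```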

Check inside Alfresco and see if your "version" is actually incrementing as
you expect. I was using an older Alfresco version and was not able to run
with the Alfresco connector, but the CMIS connector worked great for us!

Good luck!
Deanna




On Wed, Oct 28, 2015 at 6:07 AM, Paul Farrell <pf...@funnelback.com>
wrote:

> The Alfresco log snippet doesn’t really shed any more light. It simply
> doesn’t think that the document content has changed.
>
> 09:56:42,059 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl]
> [http-apr-8080-exec-5] [getNodesByTransactionId] On Store
> workspace://SpacesStore
> 09:56:42,065 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl]
> [http-apr-8080-exec-5] [getLastTransactionID]
> 09:56:42,065 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl]
> [http-apr-8080-exec-5] [getNodesByAclChangesetId] On Store
> workspace://SpacesStore
> 09:56:42,070 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl]
> [http-apr-8080-exec-5] [getLastAclChangeSetID]
> 09:56:42,070 DEBUG
> [com.github.maoo.indexer.webscripts.NodeChangesWebScript]
> [http-apr-8080-exec-5] Attaching 0 nodes to the WebScript template
> 09:56:42,079 DEBUG
> [com.github.maoo.indexer.webscripts.NodeChangesWebScript]
> [http-apr-8080-exec-9] Invoking Changes Webscript, using the following
> params
> lastTxnId: 352
> lastAclChangesetId: 13
> storeId: SpacesStore
> storeProtocol: workspace
> indexingFilters:
> {"aspectFilters":[],"metadataFilters":{},"mimetypeFilters":[],"siteFilters":["Finance"],"typeFilters":[]}
>
> 09:56:42,079 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl]
> [http-apr-8080-exec-9] [getNodesByTransactionId] On Store
> workspace://SpacesStore
> 09:56:42,082 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl]
> [http-apr-8080-exec-9] [getLastTransactionID]
> 09:56:42,082 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl]
> [http-apr-8080-exec-9] [getNodesByAclChangesetId] On Store
> workspace://SpacesStore
> 09:56:42,087 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl]
> [http-apr-8080-exec-9] [getLastAclChangeSetID]
> 09:56:42,087 DEBUG
> [com.github.maoo.indexer.webscripts.NodeChangesWebScript]
> [http-apr-8080-exec-9] Attaching 0 nodes to the WebScript template
>
> *Paul Farrell*
> Senior Search Consultant
>
> 109-123 Clifton Street, London EC2A 4LD
> *T* +44 (0) 207 183 6865 | funnelback.com <http://www.funnelback.com/>
>
> *UNITED KINGDOM* | AUSTRALIA | NEW ZEALAND | POLAND | UNITED STATES
>
> Connect with us: LinkedIn <http://www.linkedin.com/company/funnelback> -
> Twitter <https://twitter.com/funnelback>
>
> Funnelback UK Ltd is a limited liability company registered in England &
> Wales. Registered address: Zetland House 109-123, Clifton Street, London.
> EC2A 4LD. Company registration number: 07004264.
>
> On 28 Oct 2015, at 10:50, Rafa Haro <rh...@gmail.com> wrote:
>
> You’re welcome Paul. Just in case, could you check the Alfresco logs to
> see if there is something informative there?
>
> Cheers,
> Rafa
>
>
>
>
> On Wed, Oct 28, 2015 at 11:47 AM, Paul Farrell <pf...@funnelback.com>
> wrote:
>
>> I see. That makes sense.
>>
>> No problem. Thanks for the feedback Rafa. Much appreciated.
>>
>> On 28 Oct 2015, at 10:45, Rafa Haro <rh...@gmail.com> wrote:
>>
>> Hi Paul,
>>
>> Before contributing the Alfresco connector, we performed several tests
>> similar to yours using an Alfresco 4.x version. Therefore, my initial
>> guess is that the Webscript is not behaving correctly for Alfresco 5
>> instances. I’m including Maurizio Pillitu (Alfresco Indexer main
>> developer) in the email thread. He may be able to provide some feedback
>> about this or just confirm my suspicions.
>>
>> Cheers,
>> Rafa
>>
>>
>>
>>
>> On Wed, Oct 28, 2015 at 11:33 AM, Paul Farrell <pf...@funnelback.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> Following up on my recent email (below), I thought I would share my
>>> findings with the ‘Alfresco Indexer’ connector (
>>> https://github.com/maoo/alfresco-indexer) in case someone is able to
>>> advise on its usage.
>>>
>>> I turned to it because neither of the packaged Manifold Alfresco
>>> connectors (AtomPub or WebService) detects changes. I need a crawl that
>>> runs each night and picks up every change made to the documents in the
>>> previous 24 hours, which is a common scenario.
>>>
>>> Unfortunately, I have yet to achieve this.
>>>
>>> Having built and installed both the AMP and JAR files needed for the new
>>> connector, changes are still not coming through. In fact, I have two
>>> observations so far:
>>>
>>> 1. Changes to document content or properties do not cause the same
>>> document to be picked up by the Alfresco connector on the next run
>>> 2. Adding ‘Filter Configuration’ seems to do very little to change what
>>> is picked up
>>>
>>> *IN DETAIL*
>>> *1. Failing to pick up modified content*
>>>
>>> Looking at the log files (which are set to debug) I can see that, upon
>>> the first crawl of Alfresco, Manifold sends the following requests:
>>>
>>> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - Executing request
>>> GET
>>> /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9
>>> HTTP/1.1
>>> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >>
>>> GET
>>> /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9
>>> HTTP/1.1
>>> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >>
>>> "GET
>>> /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9
>>> HTTP/1.1[\r][\n]"
>>> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - Executing request
>>> GET
>>> /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9
>>> HTTP/1.1
>>> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >>
>>> GET
>>> /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9
>>> HTTP/1.1
>>> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >>
>>> "GET
>>> /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9
>>> HTTP/1.1[\r][\n]"
>>> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - Executing request
>>> GET
>>> /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content
>>> HTTP/1.1
>>> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >>
>>> GET
>>> /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content
>>> HTTP/1.1
>>> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >>
>>> "GET
>>> /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content
>>> HTTP/1.1[\r][\n]"
>>> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - Executing request
>>> GET
>>> /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a
>>> HTTP/1.1
>>> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >>
>>> GET
>>> /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a
>>> HTTP/1.1
>>> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >>
>>> "GET
>>> /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a
>>> HTTP/1.1[\r][\n]"
>>>
>>> This picks up all of the content e.g. documents.
>>>
>>> Running a second crawl, without any other actions being done, results in
>>> the following requests:
>>>
>>> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - Executing request GET
>>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
>>> HTTP/1.1
>>> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >>
>>> GET
>>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
>>> HTTP/1.1
>>> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >>
>>> "GET
>>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
>>> HTTP/1.1[\r][\n]"
>>>
>>> So I can see that, in the first instance, we are targeting content
>>> directly while, in the second, we are asking for changes. The problem is
>>> that no changes are returned from the second set of requests. The response
>>> from these calls is:
>>>
>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "
>>>  "totalNodes" : "0", [\r][\n]"
>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "
>>>  "elapsedTime" : "8",[\r][\n]"
>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "
>>>  "docs" : [[\r][\n]"
>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "
>>>  ],[\r][\n]"
>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "
>>>    "last_txn_id" : "352",[\r][\n]"
>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "
>>>    "last_acl_changeset_id" : "13",[\r][\n]"
>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "
>>>  "store_id" : "SpacesStore",[\r][\n]"
>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "
>>>  "store_protocol" : "workspace"[\r][\n]"
>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "}"
>>>
>>> Regardless of what changes I make to a document that I have been using
>>> for testing, the document is not updated. The response from the calls for
>>> changes (totalNodes) is always ‘0’.
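
A small sketch of how the response quoted above could be inspected outside Manifold. The field names and values are copied from the wire log; the assembled JSON object is a reconstruction from those log lines, and `summarize_changes` and its `requested_txn_id` parameter are hypothetical names used only for this illustration.

```python
import json

# Reconstructed from the wire log lines above; values are as logged.
SAMPLE_RESPONSE = """{
  "totalNodes": "0",
  "elapsedTime": "8",
  "docs": [],
  "last_txn_id": "352",
  "last_acl_changeset_id": "13",
  "store_id": "SpacesStore",
  "store_protocol": "workspace"
}"""


def summarize_changes(body, requested_txn_id):
    """Report how many nodes came back and whether the txn cursor moved."""
    data = json.loads(body)
    last_txn_id = int(data["last_txn_id"])
    return {
        "total_nodes": int(data["totalNodes"]),
        "docs_returned": len(data["docs"]),
        "last_txn_id": last_txn_id,
        "cursor_advanced": last_txn_id > requested_txn_id,
    }


# 333 is the lastTxnId seen in the earlier request log.
summary = summarize_changes(SAMPLE_RESPONSE, requested_txn_id=333)
```

If `cursor_advanced` is true while `docs_returned` stays at 0, transactions were committed after the requested id but none of them yielded nodes for the given filters. That narrows things down, although on its own it does not say whether the filter or the webscript is at fault.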
>>>
>>>
>>> *2. Adding ‘Filter Configuration’ seems to do very little to change what
>>> is picked up*
>>>
>>> Within my test Alfresco environment I have one site set up (Finance).
>>> Within the Finance doc library I have three test docs. No other changes
>>> have been made to the Alfresco instance.
>>> Running a crawl with no filter configurations set returns 81 items. This
>>> is via the URL in a browser.
>>> If I then set the Site Filter configuration to ‘Finance’ and apply, I
>>> still get 81 items when I re-run the crawl.
>>> I can see that the term ‘Finance’ is being added to the URL but this
>>> does not seem to change the behaviour.
>>>
>>>
>>> I am happy to spend time diagnosing this if there is anyone available to
>>> assist.
>>>
>>> Thanks
>>>
>>> Paul
>>>
>>>
>>>
>>> On 27 Oct 2015, at 18:14, pfarrell@funnelback.com wrote:
>>>
>>> Hi all,
>>>
>>> This is a question regarding the relatively new Alfresco Webscript
>>> connector.
>>>
>>> SETUP
>>> I have a vanilla Alfresco Community 5.0 installation
>>> One site has been created called 'Finance'
>>> A handful of documents have been created in 'Finance' Doc Library.
>>> I have cloned and packaged up the 'alfresco-indexer' (
>>> https://github.com/maoo/alfresco-indexer) and have applied the AMP and
>>> CLIENT packages to their respective environments.
>>>
>>>
>>> ISSUE
>>> The issue is that the default API call used by Manifold is returning
>>> nothing. The full API call used by Manifold, based on my config, is:
>>>
>>>
>>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
>>>
>>>
>>> TESTS
>>> I have identified two streamlined URLs. The first one returns the
>>> documents that exist in the doc library of the 'Finance' site. This URL is:
>>>
>>>
>>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%7D
>>>
>>> The second URL simply adds the site restriction. This URL returns
>>> nothing:
>>>
>>>
>>> http://52.23.225.233:8080/alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%7D
>>>
>>>
>>>
>>> Can anyone explain why the documents do not return when only the
>>> containing site is named in the API URL?
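
To make the test URLs above easier to compare, here is a sketch that builds the query string from a plain filter dictionary and decodes it back again. The path and parameter names are the ones quoted in this message; the helper names `changes_url` and `decode_filters` are hypothetical.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

CHANGES_PATH = "/alfresco/service/node/changes/workspace/SpacesStore"


def changes_url(last_txn_id, last_acl_changeset_id, filters):
    """Build the changes URL with indexingFilters as URL-encoded JSON."""
    query = urlencode({
        "lastTxnId": last_txn_id,
        "lastAclChangesetId": last_acl_changeset_id,
        # Compact separators keep the JSON free of spaces, as in the thread.
        "indexingFilters": json.dumps(filters, separators=(",", ":")),
    })
    return f"{CHANGES_PATH}?{query}"


def decode_filters(url):
    """Recover the indexingFilters dictionary from a changes URL."""
    query = parse_qs(urlsplit(url).query)
    return json.loads(query["indexingFilters"][0])


unfiltered = changes_url(0, 0, {})
finance_only = changes_url(0, 0, {"siteFilters": ["Finance"]})
```

Decoding both URLs with `decode_filters` shows that the only difference between the request that returns documents and the one that returns nothing is the `siteFilters` entry.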
>>>
>>> Cheers
>>>
>>> Paul
>>>
>>>
>>>
>>>
>>
>>
>
>

Re: Alfresco WebScript Connector - Testing Question

Posted by Paul Farrell <pf...@funnelback.com>.
The Alfresco log snippet doesn’t really shed any more light. It simply doesn’t think that the document content has changed.

09:56:42,059 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl] [http-apr-8080-exec-5] [getNodesByTransactionId] On Store workspace://SpacesStore
09:56:42,065 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl] [http-apr-8080-exec-5] [getLastTransactionID]
09:56:42,065 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl] [http-apr-8080-exec-5] [getNodesByAclChangesetId] On Store workspace://SpacesStore
09:56:42,070 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl] [http-apr-8080-exec-5] [getLastAclChangeSetID]
09:56:42,070 DEBUG [com.github.maoo.indexer.webscripts.NodeChangesWebScript] [http-apr-8080-exec-5] Attaching 0 nodes to the WebScript template
09:56:42,079 DEBUG [com.github.maoo.indexer.webscripts.NodeChangesWebScript] [http-apr-8080-exec-9] Invoking Changes Webscript, using the following params
lastTxnId: 352
lastAclChangesetId: 13
storeId: SpacesStore
storeProtocol: workspace
indexingFilters: {"aspectFilters":[],"metadataFilters":{},"mimetypeFilters":[],"siteFilters":["Finance"],"typeFilters":[]}

09:56:42,079 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl] [http-apr-8080-exec-9] [getNodesByTransactionId] On Store workspace://SpacesStore
09:56:42,082 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl] [http-apr-8080-exec-9] [getLastTransactionID]
09:56:42,082 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl] [http-apr-8080-exec-9] [getNodesByAclChangesetId] On Store workspace://SpacesStore
09:56:42,087 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl] [http-apr-8080-exec-9] [getLastAclChangeSetID]
09:56:42,087 DEBUG [com.github.maoo.indexer.webscripts.NodeChangesWebScript] [http-apr-8080-exec-9] Attaching 0 nodes to the WebScript template
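
For anyone wanting to replay this request outside Manifold, the following is a minimal Python sketch of how the changes URL in the logs can be rebuilt and decoded. The path and the parameter names (lastTxnId, lastAclChangesetId, indexingFilters) are taken from the log lines in this thread; the helper names build_changes_path and decode_filters are hypothetical and exist only for illustration. It assumes the webscript accepts the filter as compact, URL-encoded JSON, as the logged requests suggest.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs


def build_changes_path(last_txn_id, last_acl_changeset_id, filters,
                       store_protocol="workspace", store_id="SpacesStore"):
    """Build the /node/changes request path in the form seen in the logs."""
    query = urlencode({
        "lastTxnId": last_txn_id,
        "lastAclChangesetId": last_acl_changeset_id,
        # Compact separators so the encoded JSON has no spaces,
        # matching the logged requests.
        "indexingFilters": json.dumps(filters, separators=(",", ":")),
    })
    return f"/alfresco/service/node/changes/{store_protocol}/{store_id}?{query}"


def decode_filters(path):
    """Recover the indexingFilters JSON object from a logged request path."""
    query = parse_qs(urlsplit(path).query)
    return json.loads(query["indexingFilters"][0])


# Parameters as reported by the NodeChangesWebScript debug output above.
path = build_changes_path(352, 13, {"siteFilters": ["Finance"]})
print(path)
print(decode_filters(path))
```

Pasting the printed path after the Alfresco host and port into a browser should reproduce the request that the webscript logs, which makes it easier to vary lastTxnId or the filter by hand and compare the number of nodes returned.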

Paul Farrell
Senior Search Consultant
 
109-123 Clifton Street, London EC2A 4LD
T +44 (0) 207 183 6865 | funnelback.com <http://www.funnelback.com/>

UNITED KINGDOM | AUSTRALIA | NEW ZEALAND | POLAND | UNITED STATES

Connect with us: LinkedIn <http://www.linkedin.com/company/funnelback> - Twitter <https://twitter.com/funnelback>

Funnelback UK Ltd is a limited liability company registered in England & Wales. Registered address: Zetland House 109-123, Clifton Street, London. EC2A 4LD. Company registration number: 07004264.

> On 28 Oct 2015, at 10:50, Rafa Haro <rh...@gmail.com> wrote:
> 
> You’re welcome Paul. Just in case, could you check the Alfresco logs to see if there is something informative there?
> 
> Cheers,
> Rafa
> 
> 
> 
> 
> On Wed, Oct 28, 2015 at 11:47 AM, Paul Farrell <pfarrell@funnelback.com <ma...@funnelback.com>> wrote:
> 
> I see. That makes sense. 
> 
> No problem. Thanks for the feedback Rafa. Much appreciated. 
> 
> 
> 
> 
>> On 28 Oct 2015, at 10:45, Rafa Haro <rharoapache@gmail.com <ma...@gmail.com>> wrote:
>> 
>> Hi Paul, 
>> 
>> Before contributing the Alfresco connector, we performed several tests similar to yours using an Alfresco 4.x version. My initial guess, therefore, is that the Webscript is not behaving correctly for Alfresco 5 instances. I’m including Maurizio Pillitu (Alfresco Indexer main developer) in the email thread. He may be able to provide some feedback about this or just confirm my suspicions. 
>> 
>> Cheers,
>> Rafa
>> 
>> 
>> 
>> 
>> On Wed, Oct 28, 2015 at 11:33 AM, Paul Farrell <pfarrell@funnelback.com <ma...@funnelback.com>> wrote:
>> 
>> Hi all,
>> 
>> In follow up to my recent email (below) I thought I would share my findings with the ‘Alfresco Indexer’ connector (https://github.com/maoo/alfresco-indexer) in case someone may be able to advise on its usage. 
>> 
>> I turned to this connector because neither of the packaged Manifold Alfresco connectors (AtomPub or WebService) offers change detection. I needed a method whereby the crawl runs each night and picks up any and all changes to the documents from the previous 24 hours. A common scenario.
>> 
>> Unfortunately, I have yet to achieve this. 
>> 
>> Having built and installed both the AMP and JAR files needed for the new connector, changes are still not coming through. In fact, I have two observations so far:
>> 
>> 1. Changes to document content or properties do not cause the same document to be picked up by the Alfresco connector on the next run
>> 2. Adding ‘Filter Configuration’ seems to do very little to change what is picked up
>> 
>> IN DETAIL
>> 1. Failing to pick up modified content
>> 
>> Looking at the log files (which are set to debug) I can see that, upon the first crawl of Alfresco, Manifold sends the following requests:
>> 
>> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - Executing request GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
>> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >> GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
>> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >> "GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1[\r][\n]"
>> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - Executing request GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
>> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >> GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
>> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >> "GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1[\r][\n]"
>> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - Executing request GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1
>> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >> GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1
>> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >> "GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1[\r][\n]"
>> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - Executing request GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1
>> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >> GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1
>> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >> "GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1[\r][\n]"
>> 
>> This picks up all of the content e.g. documents. 
>> 
>> Running a second crawl, without any other actions being done, results in the following requests:
>> 
>> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - Executing request GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1
>> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >> GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1
>> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >> "GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1[\r][\n]"
>> 
>> So I can see that, in the first instance, we are targeting content directly while, in the second, we are asking for changes. The problem is that no changes are returned from the second set of requests. The response from these calls is:
>> 
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "totalNodes" : "0", [\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "elapsedTime" : "8",[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "docs" : [[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  ],[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "    "last_txn_id" : "352",[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "    "last_acl_changeset_id" : "13",[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "store_id" : "SpacesStore",[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "store_protocol" : "workspace"[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "}"
>> 
>> Regardless of what changes I make to a document that I have been using for testing, the document is not updated. The response from the calls for changes (totalNodes) is always ‘0’.
>> 
>> 
>> 2. Adding ‘Filter Configuration’ seems to do very little to change what is picked up
>> 
>> Within my test Alfresco environment I have one site set up (Finance). Within the Finance doc library I have three test docs. No other changes have been made to the Alfresco instance. 
>> Running a crawl with no filter configurations set returns 81 items. This is via the URL in a browser.
>> If I then set the Site Filter configuration to ‘Finance’ and apply, I still get 81 items when I re-run the crawl. 
>> I can see that the term ‘Finance’ is being added to the URL but this does not seem to change the behaviour. 
>> 
>> 
>> I am happy to spend time diagnosing this if there is anyone available to assist. 
>> 
>> Thanks
>> 
>> Paul
>> 
>> 
>> 
>>> On 27 Oct 2015, at 18:14, pfarrell@funnelback.com <ma...@funnelback.com> wrote:
>>> 
>>> Hi all,
>>> 
>>> This is a question regarding the relatively new Alfresco Webscript connector. 
>>> 
>>> SETUP
>>> I have a vanilla Alfresco Community 5.0 installation
>>> One site has been created called 'Finance'
>>> A handful of documents have been created in 'Finance' Doc Library.
>>> I have cloned and packaged up the 'alfresco-indexer' (https://github.com/maoo/alfresco-indexer) and have applied the AMP and CLIENT packages to their respective environments. 
>>> 
>>> 
>>> ISSUE
>>> The issue is that the default API call used by Manifold is returning nothing. The full API call used by Manifold, based on my config, is:
>>> 
>>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
>>> 
>>> 
>>> TESTS
>>> I have identified two streamlined URLs. The first one returns the documents that exist in the doc library of the 'Finance' site. This URL is:
>>> 
>>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%7D
>>> 
>>> The second URL simply adds the site restriction. This URL returns nothing:
>>> 
>>> http://52.23.225.233:8080/alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%7D
>>> 
>>> 
>>> 
>>> Can anyone explain why the documents do not return when only the containing site is named in the API URL?
>>> 
>>> Cheers
>>> 
>>> Paul
>>> 
>>> 
>> 
>> 
> 
> 


Re: Alfresco WebScript Connector - Testing Question

Posted by Rafa Haro <rh...@gmail.com>.
You’re welcome Paul. Just in case, could you check the Alfresco logs to see if there is something informative there?


Cheers,

Rafa

On Wed, Oct 28, 2015 at 11:47 AM, Paul Farrell <pf...@funnelback.com>
wrote:

> I see. That makes sense. 
> No problem. Thanks for the feedback Rafa. Much appreciated. 

Re: Alfresco WebScript Connector - Testing Question

Posted by Paul Farrell <pf...@funnelback.com>.
I see. That makes sense. 

No problem. Thanks for the feedback Rafa. Much appreciated. 






Re: Alfresco WebScript Connector - Testing Question

Posted by Rafa Haro <rh...@gmail.com>.
Hi Paul, 




Before contributing the Alfresco connector, we performed several tests similar to yours using an Alfresco 4.x version. My initial guess, therefore, is that the Webscript is not behaving correctly for Alfresco 5 instances. I’m including Maurizio Pillitu (Alfresco Indexer main developer) in the email thread. He may be able to provide some feedback about this or just confirm my suspicions. 




Cheers,

Rafa

On Wed, Oct 28, 2015 at 11:33 AM, Paul Farrell <pf...@funnelback.com>
wrote:

> Hi all,
> As a follow-up to my recent email (below), I thought I would share my findings with the ‘Alfresco Indexer’ connector (https://github.com/maoo/alfresco-indexer) in case someone is able to advise on its usage.
> I turned to this connector because neither of the packaged Manifold Alfresco connectors (AtomPub or WebService) detects changes. I need the crawl to run each night and pick up any and all changes made to the documents in the previous 24 hours, which is a common scenario.
> Unfortunately, I have yet to achieve this.
> Having built and installed both the AMP and JAR files needed for the new connector, changes are still not coming through. In fact, I have two observations so far:
> 1. Changes to document content or properties do not cause the same document to be picked up by the Alfresco connector on the next run
> 2. Adding ‘Filter Configuration’ seems to do very little to change what is picked up
> IN DETAIL
> 1. Failing to pick up modified content
> Looking at the log files (which are set to debug) I can see that, upon the first crawl of Alfresco, Manifold sends the following requests:
> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - Executing request GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >> GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >> "GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1[\r][\n]"
> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - Executing request GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >> GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >> "GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1[\r][\n]"
> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - Executing request GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1
> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >> GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1
> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >> "GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1[\r][\n]"
> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - Executing request GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1
> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >> GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1
> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >> "GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1[\r][\n]"
> This picks up all of the content e.g. documents. 
> Running a second crawl, without any other actions being done, results in the following requests:
> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - Executing request GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1
> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >> GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1
> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >> "GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1[\r][\n]"
> So I can see that, in the first instance, we are targeting content directly while, in the second, we are asking for changes. The problem is that no changes are returned from the second set of requests. The response from these calls is:
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "totalNodes" : "0", [\r][\n]"
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "elapsedTime" : "8",[\r][\n]"
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "docs" : [[\r][\n]"
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  ],[\r][\n]"
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "    "last_txn_id" : "352",[\r][\n]"
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "    "last_acl_changeset_id" : "13",[\r][\n]"
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "store_id" : "SpacesStore",[\r][\n]"
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "store_protocol" : "workspace"[\r][\n]"
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "}"
> Regardless of what changes I make to a document that I have been using for testing, the document is not updated. The response from the calls for changes (totalNodes) is always ‘0’.
> 2. Adding ‘Filter Configuration’ seems to do very little to change what is picked up
> Within my test Alfresco environment I have one site set up (Finance). Within the Finance doc library I have three test docs. No other changes have been made to the Alfresco instance. 
> Running a crawl with no filter configurations set returns 81 items. This is via the URL in a browser.
> If I then set the Site Filter configuration to ‘Finance’ and apply, I still get 81 items when I re-run the crawl. 
> I can see that the term ‘Finance’ is being added to the URL but this does not seem to change the behaviour. 
> I am happy to spend time diagnosing this if there is anyone available to assist.
> Thanks
> Paul
>> On 27 Oct 2015, at 18:14, pfarrell@funnelback.com wrote:
>> 
>> Hi all,
>> 
>> This is a question regarding the relatively new Alfresco Webscript connector. 
>> 
>> SETUP
>> I have a vanilla Alfresco Community 5.0 installation
>> One site has been created called 'Finance'
>> A handful of documents have been created in 'Finance' Doc Library.
>> I have cloned and packaged up the 'alfresco-indexer' (https://github.com/maoo/alfresco-indexer) and have applied the AMP and CLIENT packages to their respective environments. 
>> 
>> 
>> ISSUE
>> The issue is that the default API call used by Manifold is returning nothing. The full API call used by Manifold, based on my config, is:
>> 
>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
>> 
>> 
>> TESTS
>> I have identified two streamlined URLs. The first one returns the documents that exist in the doc library of the 'Finance' site. This URL is:
>> 
>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%7D
>> 
>> The second URL simply adds the site restriction. This URL returns nothing:
>> 
>> http://52.23.225.233:8080/alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%7D
>> 
>> 
>> 
>> Can anyone explain why the documents do not return when only the containing site is named in the API URL?
>> 
>> Cheers
>> 
>> Paul
>> 
>> 

Re: Alfresco WebScript Connector - Testing Question

Posted by Paul Farrell <pf...@funnelback.com>.
Hi all,

As a follow-up to my recent email (below), I thought I would share my findings with the ‘Alfresco Indexer’ connector (https://github.com/maoo/alfresco-indexer) in case someone is able to advise on its usage.

I turned to this connector because neither of the packaged Manifold Alfresco connectors (AtomPub or WebService) detects changes. I need the crawl to run each night and pick up any and all changes made to the documents in the previous 24 hours, which is a common scenario.

Unfortunately, I have yet to achieve this.

Having built and installed both the AMP and JAR files needed for the new connector, changes are still not coming through. In fact, I have two observations so far:

1. Changes to document content or properties do not cause the same document to be picked up by the Alfresco connector on the next run
2. Adding ‘Filter Configuration’ seems to do very little to change what is picked up

IN DETAIL
1. Failing to pick up modified content

Looking at the log files (which are set to debug) I can see that, upon the first crawl of Alfresco, Manifold sends the following requests:

DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - Executing request GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >> GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >> "GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1[\r][\n]"
DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - Executing request GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >> GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >> "GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1[\r][\n]"
DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - Executing request GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1
DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >> GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1
DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >> "GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1[\r][\n]"
DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - Executing request GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1
DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >> GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1
DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >> "GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1[\r][\n]"

This picks up all of the content (e.g. documents).
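
To make the per-node pattern in that log easier to see, here is a minimal Python sketch. The order (actions, then details, then content) is only what I infer from the excerpt above, and the names NODE_REQUESTS and node_request_paths are my own, not anything from ManifoldCF or alfresco-indexer.

```python
# Request paths seen in the debug log for a single node, in logged order.
# The {uuid} placeholder stands for the node identifier.
NODE_REQUESTS = (
    "/alfresco/service/node/actions/workspace/SpacesStore/{uuid}",
    "/alfresco/service/node/details/workspace/SpacesStore/{uuid}",
    "/alfresco/service/api/node/workspace/SpacesStore/{uuid}/content",
)


def node_request_paths(uuid):
    """Return the request paths the log shows for one node UUID."""
    return [template.format(uuid=uuid) for template in NODE_REQUESTS]


if __name__ == "__main__":
    # UUID taken from the first node in the log excerpt.
    for path in node_request_paths("267839b2-f466-42c5-9a35-cb3e41281bb9"):
        print("GET", path)
```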

Running a second crawl, without any other actions being done, results in the following requests:

DEBUG 2015-10-28 05:26:31,854 (Startup thread) - Executing request GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1
DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >> GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1
DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >> "GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1[\r][\n]"

So I can see that, in the first instance, we are targeting content directly while, in the second, we are asking for changes. The problem is that no changes are returned from the second set of requests. The response from these calls is:

DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "totalNodes" : "0", [\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "elapsedTime" : "8",[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "docs" : [[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  ],[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "    "last_txn_id" : "352",[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "    "last_acl_changeset_id" : "13",[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "store_id" : "SpacesStore",[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "store_protocol" : "workspace"[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "}"

Regardless of what changes I make to a document that I have been using for testing, the document is not updated. The response from the calls for changes (totalNodes) is always ‘0’.
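
For anyone wanting to pull the interesting values out of a wire log like the one above, here is a rough Python sketch. It is not a JSON parser (the excerpt is not a complete JSON document); it just looks for "name" : "value" pairs on the incoming ('<<') lines. The regular expression and the function name response_fields are my own assumptions about the log layout, based only on the lines shown here.

```python
import re

# Incoming wire-log lines copied from the excerpt above.
WIRE_LOG = r'''
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "totalNodes" : "0", [\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "elapsedTime" : "8",[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "docs" : [[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  ],[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "    "last_txn_id" : "352",[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "    "last_acl_changeset_id" : "13",[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "store_id" : "SpacesStore",[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "store_protocol" : "workspace"[\r][\n]"
'''

# Matches a quoted name, ' : ', and a quoted string value.
FIELD = re.compile(r'"(\w+)" : "([^"]*)"')


def response_fields(log_text):
    """Collect the string-valued fields found on incoming ('<<') log lines."""
    fields = {}
    for line in log_text.splitlines():
        # Only inspect the payload that follows the '<< ' marker.
        _, marker, payload = line.partition("<< ")
        match = FIELD.search(payload) if marker else None
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


if __name__ == "__main__":
    fields = response_fields(WIRE_LOG)
    print("totalNodes:", fields["totalNodes"])
    print("last_txn_id:", fields["last_txn_id"])
```

On this excerpt the sketch reports totalNodes as 0 and last_txn_id as 352, which can be compared by hand with the lastTxnId=333 sent in the request logged earlier.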


2. Adding ‘Filter Configuration’ seems to do very little to change what is picked up

Within my test Alfresco environment I have one site set up (Finance). Within the Finance doc library I have three test docs. No other changes have been made to the Alfresco instance. 
Running a crawl with no filter configurations set returns 81 items. This is via the URL in a browser.
If I then set the Site Filter configuration to ‘Finance’ and apply, I still get 81 items when I re-run the crawl. 
I can see that the term ‘Finance’ is being added to the URL but this does not seem to change the behaviour. 


I am happy to spend time diagnosing this if there is anyone available to assist.

Thanks

Paul



> On 27 Oct 2015, at 18:14, pfarrell@funnelback.com wrote:
> 
> Hi all,
> 
> This is a question regarding the relatively new Alfresco Webscript connector. 
> 
> SETUP
> I have a vanilla Alfresco Community 5.0 installation
> One site has been created called 'Finance'
> A handful of documents have been created in 'Finance' Doc Library.
> I have cloned and packaged up the 'alfresco-indexer' (https://github.com/maoo/alfresco-indexer) and have applied the AMP and CLIENT packages to their respective environments. 
> 
> 
> ISSUE
> The issue is that the default API call used by Manifold is returning nothing. The full API call used by Manifold, based on my config, is:
> 
> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
> 
> 
> TESTS
> I have identified two streamlined URLs. The first one returns the documents that exist in the doc library of the 'Finance' site. This URL is:
> 
> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%7D
> 
> The second URL simply adds the site restriction. This URL returns nothing:
> 
> http://52.23.225.233:8080/alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%7D
> 
> 
> 
> Can anyone explain why the documents do not return when only the containing site is named in the API URL?
> 
> Cheers
> 
> Paul
> 
> 


Alfresco WebScript Connector - Testing Question

Posted by pf...@funnelback.com.
Hi all,

This is a question regarding the relatively new Alfresco Webscript connector. 

SETUP
I have a vanilla Alfresco Community 5.0 installation
One site has been created called 'Finance'
A handful of documents have been created in 'Finance' Doc Library.
I have cloned and packaged up the 'alfresco-indexer' (https://github.com/maoo/alfresco-indexer) and have applied the AMP and CLIENT packages to their respective environments. 


ISSUE
The issue is that the default API call used by Manifold is returning nothing. The full API call used by Manifold, based on my config, is:

/alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
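
Since the indexingFilters value is hard to read in its percent-encoded form, here is a small Python sketch that decodes it. It uses only the standard library and the URL quoted above; the function name decode_indexing_filters is my own and is not part of Manifold or alfresco-indexer.

```python
import json
from urllib.parse import urlsplit, parse_qs

# The request quoted above, copied verbatim (path and query only).
CHANGES_URL = (
    "/alfresco/service/node/changes/workspace/SpacesStore"
    "?lastTxnId=0&lastAclChangesetId=0"
    "&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C"
    "%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C"
    "%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D"
)


def decode_indexing_filters(url):
    """Return the indexingFilters query parameter as a Python dict."""
    query = parse_qs(urlsplit(url).query)  # parse_qs also percent-decodes
    return json.loads(query["indexingFilters"][0])


if __name__ == "__main__":
    print(json.dumps(decode_indexing_filters(CHANGES_URL), indent=2))
```

Decoded, the filter is a JSON object with a siteFilters list containing "Finance" and empty typeFilters, mimetypeFilters, aspectFilters and metadataFilters entries.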


TESTS
I have identified two streamlined URLs. The first one returns the documents that exist in the doc library of the 'Finance' site. This URL is:

/alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%7D

The second URL simply adds the site restriction. This URL returns nothing:

http://52.23.225.233:8080/alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%7D
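
In case it helps anyone reproduce the two test requests, this is roughly how they can be built in Python. It is only a sketch: the helper name changes_url and the BASE_PATH constant are mine, and the compact JSON separators are an assumption chosen so that the output matches the URLs above character for character.

```python
import json
from urllib.parse import urlencode

# Path of the 'changes' WebScript as it appears in the URLs above.
BASE_PATH = "/alfresco/service/node/changes/workspace/SpacesStore"


def changes_url(filters, last_txn_id=0, last_acl_changeset_id=0):
    """Build a 'changes' request path with percent-encoded JSON filters."""
    params = {
        "lastTxnId": last_txn_id,
        "lastAclChangesetId": last_acl_changeset_id,
        # Compact separators keep spaces out of the encoded JSON.
        "indexingFilters": json.dumps(filters, separators=(",", ":")),
    }
    return BASE_PATH + "?" + urlencode(params)


unfiltered = changes_url({})
finance_only = changes_url({"siteFilters": ["Finance"]})

if __name__ == "__main__":
    print(unfiltered)
    print(finance_only)
```

Building the URLs this way makes it easy to vary one filter at a time and compare the responses.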



Can anyone explain why the documents are not returned when only the containing site is named in the API URL?

Cheers

Paul




Re: Manifold Config - Removing Associated Docs

Posted by Karl Wright <da...@gmail.com>.
Hi Paul,

The functionality is present in the REST API.

Karl


On Tue, Oct 27, 2015 at 1:20 PM, <pf...@funnelback.com> wrote:

> Hi all,
>
> I am wondering if anyone can advise on exactly what happens when someone
> clicks on the "Remove All Associated Records/Documents" button on the
> 'Output Connections' area?
>
> Is there a way I can do whatever operations are carried out
> programmatically?
>
> Ultimately, I may need to run a crawl of a data source on a 'full crawl'
> basis only. In other words, I do not want Manifold to care about what
> documents have been crawled previously. I just want it to pick up and send
> all documents all of the time. The only way I can see of doing this is to
> replicate what happens when I click on that button (which obviously
> triggers a full crawl each time).
>
> Perhaps this is already a config option I may have missed?
>
> Just in case it is of interest, this is an Alfresco crawl
>
> Cheers
>
>