Posted to user@manifoldcf.apache.org by "Sivakoti, Nikhilesh" <ni...@capgemini.com> on 2018/11/30 09:42:28 UTC

Manifold crawler issue | Alfresco | Not able to crawl large set of data

Hi Team,

We are migrating our indexes from GSA to Elasticsearch, using the ManifoldCF crawler with the Alfresco Webscript connector.
The crawler is able to crawl a small number of indexes, but it fails to crawl the indexes in our QA environment.
QA holds more than 80 million transactions, which makes the indexes hard to crawl.

Is there a way to do the migration in phases? Alternatively, is there Lucene support for querying the content? We need to query only the content under a specific path, but the current crawler uses SQL queries, which makes querying under a path difficult.

Kindly help me with this.

Thanks,
Nikhilesh

This message contains information that may be privileged or confidential and is the property of the Capgemini Group. It is intended only for the person to whom it is addressed. If you are not the intended recipient, you are not authorized to read, print, retain, copy, disseminate, distribute, or use this message or any part thereof. If you receive this message in error, please notify the sender immediately and delete all copies of this message.

Re: Manifold crawler issue | Alfresco | Not able to crawl large set of data

Posted by Karl Wright <da...@gmail.com>.

I need better information before I can help you, specifically:
(1) How are you doing this?  What is your setup?  Is this based on one of
the shipping examples, and if so, which one?  What database?
(2) How is it "failing"?  What happens?  What do you see that is wrong?

Thanks,
Karl




On Fri, Nov 30, 2018 at 11:25 AM Sivakoti, Nikhilesh <
nikhilesh.sivakoti@capgemini.com> wrote:

> Hi Karl,
>
> The ManifoldCF crawler is failing to crawl all the transactions on the
> Alfresco server. The Alfresco DB has around 81 million transactions.
>
> Can we tune the performance of the ManifoldCF server to handle these
> transactions?

Re: Manifold crawler issue | Alfresco | Not able to crawl large set of data

Posted by Karl Wright <da...@gmail.com>.

Can you check the ManifoldCF log for errors?  It sounds to me like the
agents process went down catastrophically, perhaps because of insufficient
memory.

Karl
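
A minimal sketch of the log check suggested above, assuming a plain-text log file. The path `logs/manifoldcf.log` and the patterns are assumptions for illustration (typical JVM and log4j markers), not a ManifoldCF-specific list; adjust both to your installation.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed patterns: generic JVM / log4j markers, not taken from ManifoldCF itself.
PATTERNS = {
    "out_of_memory": re.compile(r"java\.lang\.OutOfMemoryError"),
    "error": re.compile(r"\b(ERROR|FATAL)\b"),
}


def scan_log_lines(lines, max_samples=20):
    """Count lines matching each pattern and keep the first few matches."""
    counts = Counter()
    samples = []
    for number, line in enumerate(lines, start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                if len(samples) < max_samples:
                    samples.append((number, name, line.rstrip()))
    return counts, samples


def scan_log_file(path):
    """Scan a log file on disk; undecodable bytes are replaced, not fatal."""
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        return scan_log_lines(handle)
```

Calling `scan_log_file("logs/manifoldcf.log")` (hypothetical path) returns the per-pattern counts and the first matching lines with their line numbers. A non-zero `out_of_memory` count would be consistent with the agents process running out of memory.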


On Sat, Dec 1, 2018 at 1:46 AM Sivakoti, Nikhilesh <
nikhilesh.sivakoti@capgemini.com> wrote:

> Yes Karl, the job started and ran for some time, and it showed documents
> in the Documents and Active columns on the job status and management
> page (screenshot attached). But it never starts processing them.
>
> We are using a MySQL database with the multi-process setup only.

RE: Manifold crawler issue | Alfresco | Not able to crawl large set of data

Posted by "Sivakoti, Nikhilesh" <ni...@capgemini.com>.

Yes Karl, the job started and ran for some time, and it showed documents in the Documents and Active columns on the job status and management page (screenshot attached). But it never starts processing them.

We are using a MySQL database with the multi-process setup only.

Re: Manifold crawler issue | Alfresco | Not able to crawl large set of data

Posted by Karl Wright <da...@gmail.com>.

' and the job is hanging to process those many documents'

So the basic problem is that you aren't seeing the job advance?  Does the
job start?  Does it make some progress and then there is no further change?

What database are you using?  If this is the single process example without
any modification, the database is HSQLDB and it puts everything in memory,
so I wouldn't recommend it for anything at that kind of scale.

Karl
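
One way to answer the "does it make progress and then stop?" question is to note the job status counters at intervals and compare them. The sketch below assumes you record the counters by hand or by some other means; the `JobSnapshot` type, its field names, and the `is_stalled` helper are hypothetical and are not part of ManifoldCF.

```python
from typing import NamedTuple, Sequence


class JobSnapshot(NamedTuple):
    """Counters copied from the job status page at one point in time (assumed names)."""
    documents: int
    active: int
    processed: int


def is_stalled(snapshots: Sequence[JobSnapshot], min_samples: int = 3) -> bool:
    """Return True when the last `min_samples` snapshots are identical
    and there is still active work queued."""
    if len(snapshots) < min_samples:
        return False  # not enough observations to judge
    recent = snapshots[-min_samples:]
    first = recent[0]
    unchanged = all(snapshot == first for snapshot in recent)
    return unchanged and first.active > 0
```

Three identical readings with a non-zero Active count would suggest the job is no longer advancing, which is the case where the log is worth checking.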


On Sat, Dec 1, 2018 at 1:03 AM Sivakoti, Nikhilesh <
nikhilesh.sivakoti@capgemini.com> wrote:

> Hi Rafa,
>
> Yes. It deals with the Alfresco REST API. But that REST API fetches the
> nodes through the Hibernate configuration, so the API queries the database
> tables (e.g. alf_node, alf_node_properties) based on the transaction ID.
>
> We have more than 81 million transactions in the database, and the job
> hangs while trying to process that many documents.

RE: Manifold crawler issue | Alfresco | Not able to crawl large set of data

Posted by "Sivakoti, Nikhilesh" <ni...@capgemini.com>.

Hi Rafa,

Yes. It deals with the Alfresco REST API. But that REST API fetches the nodes through the Hibernate configuration, so the API queries the database tables (e.g. alf_node, alf_node_properties) based on the transaction ID.

We have more than 81 million transactions in the database, and the job hangs while trying to process that many documents.

Re: Manifold crawler issue | Alfresco | Not able to crawl large set of data

Posted by Rafa Haro <rh...@apache.org>.

Hi, you said before that you were using the Alfresco Webscript connector.
That connector deals directly with content REST APIs; it has nothing to do
with database transactions. Can you clarify that, please?

Cheers,
Rafa

On Fri, Nov 30, 2018 at 5:25 PM Sivakoti, Nikhilesh <
nikhilesh.sivakoti@capgemini.com> wrote:

> Hi Karl,
>
> The ManifoldCF crawler is failing to crawl all the transactions on the
> Alfresco server. The Alfresco DB has around 81 million transactions.
>
> Can we tune the performance of the ManifoldCF server to handle these
> transactions?

RE: Manifold crawler issue | Alfresco | Not able to crawl large set of data

Posted by "Sivakoti, Nikhilesh" <ni...@capgemini.com>.

Hi Karl,

The Manifold crawler is failing to crawl all the transactions on the Alfresco server. The Alfresco DB has more than 81 million transactions.
Can we tune the performance of the Manifold server to handle this many transactions?

Re: Manifold crawler issue | Alfresco | Not able to crawl large set of data

Posted by Karl Wright <da...@gmail.com>.

I'm sorry, you'll need to provide more details about what exactly you are
running into trouble with.

Specifically, this: "But the current crawler using the SQL queries which
is hard to query under a path."

Karl
