Posted to user@manifoldcf.apache.org by Luca Alicata <al...@gmail.com> on 2016/05/06 08:32:26 UTC

Job with Generic Connector stops working

Hi,
I'm using ManifoldCF 2.2 with a multi-process configuration in a JBoss
instance on Windows Server 2012, and I have a set of jobs that work with
either SharePoint (SP) or the Generic Connector (GC), which gets files from
a database.
With SP I have no problems, but with GC and a lot of documents (one job with
47k and another with 60k), the seeding process sometimes does not finish,
because the agents seem to stop (although the Java process is still alive).
After this, if I try to start any other job, it does not start, as if the
agents were stopped.

Other times these jobs work correctly, and one time they even worked
correctly together, running at the same moment.

For information:

   - On JBoss there are only the ManifoldCF and Generic Repository
   applications.


   - On the same virtual server there is another JBoss instance, with a Solr
   instance and a web application.


   - I've checked whether it was some kind of memory problem, but that is not
   the case.


   - GC with roughly 23k seeds always works, at least in the tests I've done.


   - On a local JBoss instance with the ManifoldCF and Generic Repository
   applications, I have not seen this problem.

This is the only recurrent information that I've seen in manifold.log:
---------------
Connection 0.0.0.0:62755<-><ip-address>:<port> shut down
Releasing connection
org.apache.http.impl.conn.ManagedClientConnectionImpl@6c98c1bd

---------------

Thanks,
L. Alicata

Re: Job with Generic Connector stops working

Posted by Karl Wright <da...@gmail.com>.
I've created CONNECTORS-1313 for this.

Karl


On Fri, May 6, 2016 at 10:02 AM, Karl Wright <da...@gmail.com> wrote:

> Hi Luca,
>
> This approach causes each document's binary data to be read more than
> once.  I think that is expensive, especially if there are a lot of values.
> for a row.
>
> Instead I think something more like ACLs will be needed -- that is, a
> separate query for each multi-valued field.  This is more work but it would
> work much better.
>
> I will create a ticket to add this to the JDBC connector, but it won't
> happen for a while.
>
> Karl
>
>
> On Fri, May 6, 2016 at 9:40 AM, Luca Alicata <al...@gmail.com>
> wrote:
>
>> I've decompile java connector and modified the code in this way:
>>
>> in process document, i see that just currently arrive all row of query
>> result (also multi values row), but in the cycle that parse document, after
>> first document with an ID, all the other with the same are skipped.
>> So i removed the control that not permits to check other document with
>> the same ID and i modified the method that store metadata, to permit to
>> store multi value data as array in metadata mapping.
>>
>> I attached the code in this e-mail. You can find a comment that start
>> with "---", that i insert know for you.
>>
>> Thanks,
>> L. Alicata
>>
>> 2016-05-06 15:25 GMT+02:00 Karl Wright <da...@gmail.com>:
>>
>>> Ok, it's now clear what you are looking for, but it is still not clear
>>> how we'd integrate that in the JDBC connector.  How did you do this when
>>> you modified the connector for 1.8?
>>>
>>> Karl
>>>
>>>
>>> On Fri, May 6, 2016 at 9:21 AM, Luca Alicata <al...@gmail.com>
>>> wrote:
>>>
>>>> Hi Karl,
>>>> sorry for my english :).
>>>> I mean the fact that i've to extract value from query with a join
>>>> between two table with a relationship of one-to-many, the dataset returned
>>>> from Connector is only one pair from the two table.
>>>>
>>>> For example:
>>>> Table A with persons
>>>> Table B with eyes
>>>>
>>>> As result of join, i aspect have two row like:
>>>> person 1, eye left
>>>> person 1, eye right
>>>>
>>>> but the connector returns only one row:
>>>> person 1, eye left
>>>>
>>>> I hope now it's more clear.
>>>>
>>>> Ps. i report the phrase on Manifold documentation that explain that (
>>>> https://manifoldcf.apache.org/release/release-2.3/en_US/end-user-documentation.html#jdbcrepository
>>>> ):
>>>> ------
>>>> There is currently no support in the JDBC connection type for natively
>>>> handling multi-valued metadata.
>>>> ------
>>>>
>>>> Thanks,
>>>> L. Alicata
>>>>
>>>>
>>>> 2016-05-06 15:10 GMT+02:00 Karl Wright <da...@gmail.com>:
>>>>
>>>>> Hi Luca,
>>>>>
>>>>> It is not clear what you mean by "multi value extraction" using the
>>>>> JDBC connector.  The JDBC connector allows collection of primary binary
>>>>> content as well as metadata from a database row.  So maybe if you can
>>>>> explain what you need beyond that it would help.
>>>>>
>>>>> Thanks,
>>>>> Karl
>>>>>
>>>>>
>>>>> On Fri, May 6, 2016 at 9:04 AM, Luca Alicata <al...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Karl,
>>>>>> thanks for information, fortunately in other jboss instance i have a
>>>>>> old Manifold configuration with single process, that i've dismissed. But in
>>>>>> this moment, i start to test this jobs with that and if it work fine, i can
>>>>>> use it only for this job and use it also in production. Maybe after, if i
>>>>>> can, i try to check the possible problem that stop the agent.
>>>>>>
>>>>>> I Take advantage of this discussion to ask you, if multi-value
>>>>>> extraction from db is consider as possible future work or no. Because i've
>>>>>> used this generi connector to resolve this lack of JDBC Connector. In fact
>>>>>> with Manifold 1.8 i've modified the connector to support this behavior (in
>>>>>> addiction to parse blob file), but upgrade Manifold Version, to not rewrite
>>>>>> the new connector i decide to use Generic Connector with application that
>>>>>> do the work of extraction data from DB.
>>>>>>
>>>>>> Thanks,
>>>>>> L. Alicata
>>>>>>
>>>>>> 2016-05-06 14:42 GMT+02:00 Karl Wright <da...@gmail.com>:
>>>>>>
>>>>>>> Hi Luca,
>>>>>>>
>>>>>>> If you do a lock clean and the process still stops, then the locks
>>>>>>> are not the problem.
>>>>>>>
>>>>>>> One way we can drill down into the problem is to get a thread dump
>>>>>>> of the agents process after it stops.  The thread dump must be of the
>>>>>>> agents process, not any of the others.
>>>>>>>
>>>>>>> FWIW, the generic connector is not well supported; the person who
>>>>>>> wrote it is still a committer but is not actively involved in MCF
>>>>>>> development at this time.  I suspect that the problem may have to do with
>>>>>>> how that connector deals with exceptions or errors, but I am not sure.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>> On Fri, May 6, 2016 at 8:38 AM, Luca Alicata <al...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Karl,
>>>>>>>> I've just tried with lock-clean after agents stop to work,
>>>>>>>> obviously after stopping process. After this, job start correctly, but just
>>>>>>>> second time that i start a job with a lot of data (or sometimes the third
>>>>>>>> time), agent stop again.
>>>>>>>>
>>>>>>>> Unfortunately, it's difficult start, for the moment, to using
>>>>>>>> Zookeeper in this environment, but this can resolve the fact that during
>>>>>>>> working agents stop to work? or help only for cleaning lock agent when i
>>>>>>>> restart the process?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> L. Alicata
>>>>>>>>
>>>>>>>> 2016-05-06 14:15 GMT+02:00 Karl Wright <da...@gmail.com>:
>>>>>>>>
>>>>>>>>> Hi Luca,
>>>>>>>>>
>>>>>>>>> With file-based synchronization, if you kill any of the processes
>>>>>>>>> involved, you will need to execute the lock-clean procedure to make sure
>>>>>>>>> you have no dangling locks in the file system.
>>>>>>>>>
>>>>>>>>> - shut down all MCF processes (except the database)
>>>>>>>>> - run the lock-clean script
>>>>>>>>> - start your MCF processes back up
>>>>>>>>>
>>>>>>>>> I suspect what you are seeing is related to this.
>>>>>>>>>
>>>>>>>>> Also, please consider using Zookeeper instead, since it is more
>>>>>>>>> robust about cleaning out dangling locks.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, May 6, 2016 at 8:06 AM, Luca Alicata <
>>>>>>>>> alicataluca@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Karl,
>>>>>>>>>> thanks for help.
>>>>>>>>>> In my case i've only one instance of MCF running, with both type
>>>>>>>>>> of job (SP and Generic), and so i have only one properties files (that i
>>>>>>>>>> have attached).
>>>>>>>>>> For information i used (multiprocess-file configuration) with
>>>>>>>>>> postgres.
>>>>>>>>>>
>>>>>>>>>> Do you have other suggestions? do you need more information, that
>>>>>>>>>> i can give you?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> L.Alicata
>>>>>>>>>>
>>>>>>>>>> 2016-05-06 12:55 GMT+02:00 Karl Wright <da...@gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> Hi Luca,
>>>>>>>>>>>
>>>>>>>>>>> Do you have multiple independent MCF clusters running at the
>>>>>>>>>>> same time?  It sounds like you do: you have SP on one, and Generic on
>>>>>>>>>>> another.  If so, you will need to be sure that the synchronization you are
>>>>>>>>>>> using (either zookeeper or file-based) does not overlap.  Each cluster
>>>>>>>>>>> needs its own synchronization.  If there is overlap, then doing things with
>>>>>>>>>>> one cluster may cause the other cluster to hang.  This also means you have
>>>>>>>>>>> to have different properties files for the two clusters, of course.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, May 6, 2016 at 4:32 AM, Luca Alicata <
>>>>>>>>>>> alicataluca@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> i'm using Manifold 2.2 with multi-process configuration in
>>>>>>>>>>>> Jboss instance inside a Windows Server 2012 and i've a set of job that work
>>>>>>>>>>>> with Sharepoint (SP) or Generic Connector (GC), that get file from a db.
>>>>>>>>>>>> With SP i've no problem, while with GC with a lot of document
>>>>>>>>>>>> (one with 47k and another with 60k), the Seed taking process, sometimes,
>>>>>>>>>>>> not finish, because the agents seem to stop (although java process is still
>>>>>>>>>>>> alive).
>>>>>>>>>>>> After this, if i try to start any other job, that not start,
>>>>>>>>>>>> like the agents are stopped.
>>>>>>>>>>>>
>>>>>>>>>>>> Other times, this jobs work correctly and one time together
>>>>>>>>>>>> work correctly, running in the same moment.
>>>>>>>>>>>>
>>>>>>>>>>>> For information:
>>>>>>>>>>>>
>>>>>>>>>>>>    - On Jboss there are only Manifold and Generic Repository
>>>>>>>>>>>>    application.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>    - On the same Virtual Server, there is another Jboss
>>>>>>>>>>>>    istance, with solr istance and a web application.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>    - I've check if it was a type of memory problem, but it's
>>>>>>>>>>>>    not the case.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>    - GC with almost 23k seed work always, at least in test
>>>>>>>>>>>>    that i've done.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>    - In local instance of Jboss with Manifold and Generic
>>>>>>>>>>>>    Rpository Application, i've not keep this problem.
>>>>>>>>>>>>
>>>>>>>>>>>> This is the only recurrent information that i've seen on
>>>>>>>>>>>> manifold.log:
>>>>>>>>>>>> ---------------
>>>>>>>>>>>> Connection 0.0.0.0:62755<-><ip-address>:<port> shut down
>>>>>>>>>>>> Releasing connection
>>>>>>>>>>>> org.apache.http.impl.conn.ManagedClientConnectionImpl@6c98c1bd
>>>>>>>>>>>>
>>>>>>>>>>>> ---------------
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> L. Alicata
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Job with Generic Connector stops working

Posted by Luca Alicata <al...@gmail.com>.
Hi Karl,
unfortunately I'm busy these days, but I'll try to test it and let you know.

Thanks,
L. Alicata

2016-05-09 18:04 GMT+02:00 Karl Wright <da...@gmail.com>:

> Hi Luca,
>
> I've put together code that should allow multivalued attributes to be
> crawled.  In order to try it, you will need to check out the
> CONNECTORS-1313 branch:
>
> svn checkout
> https://svn.apache.org/repos/asf/manifoldcf/branches/CONNECTORS-1313
>
> Then, build:
>
> ant make-core-deps
> ant build
>
> Please give this a try and see if it works for you.
>
> Thanks,
> Karl
>
>
> On Fri, May 6, 2016 at 10:15 AM, Luca Alicata <al...@gmail.com>
> wrote:
>
>> Hi Karl,
>> I can confirm that it is a little expensive, but at that time, i haven't
>> much time, and i stop to work after found the solution.
>> Thanks for the creation of the ticket, for the moment, i try to use
>> generic connector.
>>
>> An other question, there is another connector that can use an application
>> to receive data? Like GenericConnector?
>>
>> Thanks,
>> L. Alicata
>>
>> 2016-05-06 16:02 GMT+02:00 Karl Wright <da...@gmail.com>:
>>
>>> Hi Luca,
>>>
>>> This approach causes each document's binary data to be read more than
>>> once.  I think that is expensive, especially if there are a lot of values.
>>> for a row.
>>>
>>> Instead I think something more like ACLs will be needed -- that is, a
>>> separate query for each multi-valued field.  This is more work but it would
>>> work much better.
>>>
>>> I will create a ticket to add this to the JDBC connector, but it won't
>>> happen for a while.
>>>
>>> Karl
>>>
>>>
>>> On Fri, May 6, 2016 at 9:40 AM, Luca Alicata <al...@gmail.com>
>>> wrote:
>>>
>>>> I've decompile java connector and modified the code in this way:
>>>>
>>>> in process document, i see that just currently arrive all row of query
>>>> result (also multi values row), but in the cycle that parse document, after
>>>> first document with an ID, all the other with the same are skipped.
>>>> So i removed the control that not permits to check other document with
>>>> the same ID and i modified the method that store metadata, to permit to
>>>> store multi value data as array in metadata mapping.
>>>>
>>>> I attached the code in this e-mail. You can find a comment that start
>>>> with "---", that i insert know for you.
>>>>
>>>> Thanks,
>>>> L. Alicata
>>>>
>>>> 2016-05-06 15:25 GMT+02:00 Karl Wright <da...@gmail.com>:
>>>>
>>>>> Ok, it's now clear what you are looking for, but it is still not clear
>>>>> how we'd integrate that in the JDBC connector.  How did you do this when
>>>>> you modified the connector for 1.8?
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>> On Fri, May 6, 2016 at 9:21 AM, Luca Alicata <al...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Karl,
>>>>>> sorry for my english :).
>>>>>> I mean the fact that i've to extract value from query with a join
>>>>>> between two table with a relationship of one-to-many, the dataset returned
>>>>>> from Connector is only one pair from the two table.
>>>>>>
>>>>>> For example:
>>>>>> Table A with persons
>>>>>> Table B with eyes
>>>>>>
>>>>>> As result of join, i aspect have two row like:
>>>>>> person 1, eye left
>>>>>> person 1, eye right
>>>>>>
>>>>>> but the connector returns only one row:
>>>>>> person 1, eye left
>>>>>>
>>>>>> I hope now it's more clear.
>>>>>>
>>>>>> Ps. i report the phrase on Manifold documentation that explain that (
>>>>>> https://manifoldcf.apache.org/release/release-2.3/en_US/end-user-documentation.html#jdbcrepository
>>>>>> ):
>>>>>> ------
>>>>>> There is currently no support in the JDBC connection type for
>>>>>> natively handling multi-valued metadata.
>>>>>> ------
>>>>>>
>>>>>> Thanks,
>>>>>> L. Alicata
>>>>>>
>>>>>>
>>>>>> 2016-05-06 15:10 GMT+02:00 Karl Wright <da...@gmail.com>:
>>>>>>
>>>>>>> Hi Luca,
>>>>>>>
>>>>>>> It is not clear what you mean by "multi value extraction" using the
>>>>>>> JDBC connector.  The JDBC connector allows collection of primary binary
>>>>>>> content as well as metadata from a database row.  So maybe if you can
>>>>>>> explain what you need beyond that it would help.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>> On Fri, May 6, 2016 at 9:04 AM, Luca Alicata <al...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Karl,
>>>>>>>> thanks for information, fortunately in other jboss instance i have
>>>>>>>> a old Manifold configuration with single process, that i've dismissed. But
>>>>>>>> in this moment, i start to test this jobs with that and if it work fine, i
>>>>>>>> can use it only for this job and use it also in production. Maybe after, if
>>>>>>>> i can, i try to check the possible problem that stop the agent.
>>>>>>>>
>>>>>>>> I Take advantage of this discussion to ask you, if multi-value
>>>>>>>> extraction from db is consider as possible future work or no. Because i've
>>>>>>>> used this generi connector to resolve this lack of JDBC Connector. In fact
>>>>>>>> with Manifold 1.8 i've modified the connector to support this behavior (in
>>>>>>>> addiction to parse blob file), but upgrade Manifold Version, to not rewrite
>>>>>>>> the new connector i decide to use Generic Connector with application that
>>>>>>>> do the work of extraction data from DB.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> L. Alicata
>>>>>>>>
>>>>>>>> 2016-05-06 14:42 GMT+02:00 Karl Wright <da...@gmail.com>:
>>>>>>>>
>>>>>>>>> Hi Luca,
>>>>>>>>>
>>>>>>>>> If you do a lock clean and the process still stops, then the locks
>>>>>>>>> are not the problem.
>>>>>>>>>
>>>>>>>>> One way we can drill down into the problem is to get a thread dump
>>>>>>>>> of the agents process after it stops.  The thread dump must be of the
>>>>>>>>> agents process, not any of the others.
>>>>>>>>>
>>>>>>>>> FWIW, the generic connector is not well supported; the person who
>>>>>>>>> wrote it is still a committer but is not actively involved in MCF
>>>>>>>>> development at this time.  I suspect that the problem may have to do with
>>>>>>>>> how that connector deals with exceptions or errors, but I am not sure.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, May 6, 2016 at 8:38 AM, Luca Alicata <
>>>>>>>>> alicataluca@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Karl,
>>>>>>>>>> I've just tried with lock-clean after agents stop to work,
>>>>>>>>>> obviously after stopping process. After this, job start correctly, but just
>>>>>>>>>> second time that i start a job with a lot of data (or sometimes the third
>>>>>>>>>> time), agent stop again.
>>>>>>>>>>
>>>>>>>>>> Unfortunately, it's difficult start, for the moment, to using
>>>>>>>>>> Zookeeper in this environment, but this can resolve the fact that during
>>>>>>>>>> working agents stop to work? or help only for cleaning lock agent when i
>>>>>>>>>> restart the process?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> L. Alicata
>>>>>>>>>>
>>>>>>>>>> 2016-05-06 14:15 GMT+02:00 Karl Wright <da...@gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> Hi Luca,
>>>>>>>>>>>
>>>>>>>>>>> With file-based synchronization, if you kill any of the
>>>>>>>>>>> processes involved, you will need to execute the lock-clean procedure to
>>>>>>>>>>> make sure you have no dangling locks in the file system.
>>>>>>>>>>>
>>>>>>>>>>> - shut down all MCF processes (except the database)
>>>>>>>>>>> - run the lock-clean script
>>>>>>>>>>> - start your MCF processes back up
>>>>>>>>>>>
>>>>>>>>>>> I suspect what you are seeing is related to this.
>>>>>>>>>>>
>>>>>>>>>>> Also, please consider using Zookeeper instead, since it is more
>>>>>>>>>>> robust about cleaning out dangling locks.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, May 6, 2016 at 8:06 AM, Luca Alicata <
>>>>>>>>>>> alicataluca@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>> thanks for help.
>>>>>>>>>>>> In my case i've only one instance of MCF running, with both
>>>>>>>>>>>> type of job (SP and Generic), and so i have only one properties files (that
>>>>>>>>>>>> i have attached).
>>>>>>>>>>>> For information i used (multiprocess-file configuration) with
>>>>>>>>>>>> postgres.
>>>>>>>>>>>>
>>>>>>>>>>>> Do you have other suggestions? do you need more information,
>>>>>>>>>>>> that i can give you?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> L.Alicata
>>>>>>>>>>>>
>>>>>>>>>>>> 2016-05-06 12:55 GMT+02:00 Karl Wright <da...@gmail.com>:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Luca,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do you have multiple independent MCF clusters running at the
>>>>>>>>>>>>> same time?  It sounds like you do: you have SP on one, and Generic on
>>>>>>>>>>>>> another.  If so, you will need to be sure that the synchronization you are
>>>>>>>>>>>>> using (either zookeeper or file-based) does not overlap.  Each cluster
>>>>>>>>>>>>> needs its own synchronization.  If there is overlap, then doing things with
>>>>>>>>>>>>> one cluster may cause the other cluster to hang.  This also means you have
>>>>>>>>>>>>> to have different properties files for the two clusters, of course.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, May 6, 2016 at 4:32 AM, Luca Alicata <
>>>>>>>>>>>>> alicataluca@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> i'm using Manifold 2.2 with multi-process configuration in
>>>>>>>>>>>>>> Jboss instance inside a Windows Server 2012 and i've a set of job that work
>>>>>>>>>>>>>> with Sharepoint (SP) or Generic Connector (GC), that get file from a db.
>>>>>>>>>>>>>> With SP i've no problem, while with GC with a lot of document
>>>>>>>>>>>>>> (one with 47k and another with 60k), the Seed taking process, sometimes,
>>>>>>>>>>>>>> not finish, because the agents seem to stop (although java process is still
>>>>>>>>>>>>>> alive).
>>>>>>>>>>>>>> After this, if i try to start any other job, that not start,
>>>>>>>>>>>>>> like the agents are stopped.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Other times, this jobs work correctly and one time together
>>>>>>>>>>>>>> work correctly, running in the same moment.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For information:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    - On Jboss there are only Manifold and Generic Repository
>>>>>>>>>>>>>>    application.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    - On the same Virtual Server, there is another Jboss
>>>>>>>>>>>>>>    istance, with solr istance and a web application.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    - I've check if it was a type of memory problem, but it's
>>>>>>>>>>>>>>    not the case.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    - GC with almost 23k seed work always, at least in test
>>>>>>>>>>>>>>    that i've done.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    - In local instance of Jboss with Manifold and Generic
>>>>>>>>>>>>>>    Rpository Application, i've not keep this problem.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is the only recurrent information that i've seen on
>>>>>>>>>>>>>> manifold.log:
>>>>>>>>>>>>>> ---------------
>>>>>>>>>>>>>> Connection 0.0.0.0:62755<-><ip-address>:<port> shut down
>>>>>>>>>>>>>> Releasing connection
>>>>>>>>>>>>>> org.apache.http.impl.conn.ManagedClientConnectionImpl@6c98c1bd
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ---------------
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> L. Alicata
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Job with Generic Connector stops working

Posted by Karl Wright <da...@gmail.com>.
Hi Luca,

I've put together code that should allow multivalued attributes to be
crawled.  In order to try it, you will need to check out the
CONNECTORS-1313 branch:

svn checkout https://svn.apache.org/repos/asf/manifoldcf/branches/CONNECTORS-1313

Then, build:

ant make-core-deps
ant build

Please give this a try and see if it works for you.

Thanks,
Karl


On Fri, May 6, 2016 at 10:15 AM, Luca Alicata <al...@gmail.com> wrote:

> Hi Karl,
> I can confirm that it is a little expensive, but at that time, i haven't
> much time, and i stop to work after found the solution.
> Thanks for the creation of the ticket, for the moment, i try to use
> generic connector.
>
> An other question, there is another connector that can use an application
> to receive data? Like GenericConnector?
>
> Thanks,
> L. Alicata
>
> 2016-05-06 16:02 GMT+02:00 Karl Wright <da...@gmail.com>:
>
>> Hi Luca,
>>
>> This approach causes each document's binary data to be read more than
>> once.  I think that is expensive, especially if there are a lot of values.
>> for a row.
>>
>> Instead I think something more like ACLs will be needed -- that is, a
>> separate query for each multi-valued field.  This is more work but it would
>> work much better.
>>
>> I will create a ticket to add this to the JDBC connector, but it won't
>> happen for a while.
>>
>> Karl
>>
>>
>> On Fri, May 6, 2016 at 9:40 AM, Luca Alicata <al...@gmail.com>
>> wrote:
>>
>>> I've decompile java connector and modified the code in this way:
>>>
>>> in process document, i see that just currently arrive all row of query
>>> result (also multi values row), but in the cycle that parse document, after
>>> first document with an ID, all the other with the same are skipped.
>>> So i removed the control that not permits to check other document with
>>> the same ID and i modified the method that store metadata, to permit to
>>> store multi value data as array in metadata mapping.
>>>
>>> I attached the code in this e-mail. You can find a comment that start
>>> with "---", that i insert know for you.
>>>
>>> Thanks,
>>> L. Alicata
>>>
>>> 2016-05-06 15:25 GMT+02:00 Karl Wright <da...@gmail.com>:
>>>
>>>> Ok, it's now clear what you are looking for, but it is still not clear
>>>> how we'd integrate that in the JDBC connector.  How did you do this when
>>>> you modified the connector for 1.8?
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Fri, May 6, 2016 at 9:21 AM, Luca Alicata <al...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Karl,
>>>>> sorry for my english :).
>>>>> I mean the fact that i've to extract value from query with a join
>>>>> between two table with a relationship of one-to-many, the dataset returned
>>>>> from Connector is only one pair from the two table.
>>>>>
>>>>> For example:
>>>>> Table A with persons
>>>>> Table B with eyes
>>>>>
>>>>> As result of join, i aspect have two row like:
>>>>> person 1, eye left
>>>>> person 1, eye right
>>>>>
>>>>> but the connector returns only one row:
>>>>> person 1, eye left
>>>>>
>>>>> I hope now it's more clear.
>>>>>
>>>>> Ps. i report the phrase on Manifold documentation that explain that (
>>>>> https://manifoldcf.apache.org/release/release-2.3/en_US/end-user-documentation.html#jdbcrepository
>>>>> ):
>>>>> ------
>>>>> There is currently no support in the JDBC connection type for natively
>>>>> handling multi-valued metadata.
>>>>> ------
>>>>>
>>>>> Thanks,
>>>>> L. Alicata
>>>>>
>>>>>
>>>>> 2016-05-06 15:10 GMT+02:00 Karl Wright <da...@gmail.com>:
>>>>>
>>>>>> Hi Luca,
>>>>>>
>>>>>> It is not clear what you mean by "multi value extraction" using the
>>>>>> JDBC connector.  The JDBC connector allows collection of primary binary
>>>>>> content as well as metadata from a database row.  So maybe if you can
>>>>>> explain what you need beyond that it would help.
>>>>>>
>>>>>> Thanks,
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>> On Fri, May 6, 2016 at 9:04 AM, Luca Alicata <al...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Karl,
>>>>>>> thanks for information, fortunately in other jboss instance i have a
>>>>>>> old Manifold configuration with single process, that i've dismissed. But in
>>>>>>> this moment, i start to test this jobs with that and if it work fine, i can
>>>>>>> use it only for this job and use it also in production. Maybe after, if i
>>>>>>> can, i try to check the possible problem that stop the agent.
>>>>>>>
>>>>>>> I Take advantage of this discussion to ask you, if multi-value
>>>>>>> extraction from db is consider as possible future work or no. Because i've
>>>>>>> used this generi connector to resolve this lack of JDBC Connector. In fact
>>>>>>> with Manifold 1.8 i've modified the connector to support this behavior (in
>>>>>>> addiction to parse blob file), but upgrade Manifold Version, to not rewrite
>>>>>>> the new connector i decide to use Generic Connector with application that
>>>>>>> do the work of extraction data from DB.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> L. Alicata
>>>>>>>
>>>>>>> 2016-05-06 14:42 GMT+02:00 Karl Wright <da...@gmail.com>:
>>>>>>>
>>>>>>>> Hi Luca,
>>>>>>>>
>>>>>>>> If you do a lock clean and the process still stops, then the locks
>>>>>>>> are not the problem.
>>>>>>>>
>>>>>>>> One way we can drill down into the problem is to get a thread dump
>>>>>>>> of the agents process after it stops.  The thread dump must be of the
>>>>>>>> agents process, not any of the others.
>>>>>>>>
>>>>>>>> FWIW, the generic connector is not well supported; the person who
>>>>>>>> wrote it is still a committer but is not actively involved in MCF
>>>>>>>> development at this time.  I suspect that the problem may have to do with
>>>>>>>> how that connector deals with exceptions or errors, but I am not sure.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, May 6, 2016 at 8:38 AM, Luca Alicata <alicataluca@gmail.com
>>>>>>>> > wrote:
>>>>>>>>
>>>>>>>>> Hi Karl,
>>>>>>>>> I've just tried with lock-clean after agents stop to work,
>>>>>>>>> obviously after stopping process. After this, job start correctly, but just
>>>>>>>>> second time that i start a job with a lot of data (or sometimes the third
>>>>>>>>> time), agent stop again.
>>>>>>>>>
>>>>>>>>> Unfortunately, it's difficult start, for the moment, to using
>>>>>>>>> Zookeeper in this environment, but this can resolve the fact that during
>>>>>>>>> working agents stop to work? or help only for cleaning lock agent when i
>>>>>>>>> restart the process?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> L. Alicata
>>>>>>>>>
>>>>>>>>> 2016-05-06 14:15 GMT+02:00 Karl Wright <da...@gmail.com>:
>>>>>>>>>
>>>>>>>>>> Hi Luca,
>>>>>>>>>>
>>>>>>>>>> With file-based synchronization, if you kill any of the processes
>>>>>>>>>> involved, you will need to execute the lock-clean procedure to make sure
>>>>>>>>>> you have no dangling locks in the file system.
>>>>>>>>>>
>>>>>>>>>> - shut down all MCF processes (except the database)
>>>>>>>>>> - run the lock-clean script
>>>>>>>>>> - start your MCF processes back up
>>>>>>>>>>
>>>>>>>>>> I suspect what you are seeing is related to this.
>>>>>>>>>>
>>>>>>>>>> Also, please consider using Zookeeper instead, since it is more
>>>>>>>>>> robust about cleaning out dangling locks.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, May 6, 2016 at 8:06 AM, Luca Alicata <
>>>>>>>>>> alicataluca@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Karl,
>>>>>>>>>>> thanks for help.
>>>>>>>>>>> In my case i've only one instance of MCF running, with both type
>>>>>>>>>>> of job (SP and Generic), and so i have only one properties files (that i
>>>>>>>>>>> have attached).
>>>>>>>>>>> For information i used (multiprocess-file configuration) with
>>>>>>>>>>> postgres.
>>>>>>>>>>>
>>>>>>>>>>> Do you have other suggestions? do you need more information,
>>>>>>>>>>> that i can give you?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> L.Alicata
>>>>>>>>>>>
>>>>>>>>>>> 2016-05-06 12:55 GMT+02:00 Karl Wright <da...@gmail.com>:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Luca,
>>>>>>>>>>>>
>>>>>>>>>>>> Do you have multiple independent MCF clusters running at the
>>>>>>>>>>>> same time?  It sounds like you do: you have SP on one, and Generic on
>>>>>>>>>>>> another.  If so, you will need to be sure that the synchronization you are
>>>>>>>>>>>> using (either zookeeper or file-based) does not overlap.  Each cluster
>>>>>>>>>>>> needs its own synchronization.  If there is overlap, then doing things with
>>>>>>>>>>>> one cluster may cause the other cluster to hang.  This also means you have
>>>>>>>>>>>> to have different properties files for the two clusters, of course.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, May 6, 2016 at 4:32 AM, Luca Alicata <
>>>>>>>>>>>> alicataluca@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> i'm using Manifold 2.2 with multi-process configuration in
>>>>>>>>>>>>> Jboss instance inside a Windows Server 2012 and i've a set of job that work
>>>>>>>>>>>>> with Sharepoint (SP) or Generic Connector (GC), that get file from a db.
>>>>>>>>>>>>> With SP i've no problem, while with GC with a lot of document
>>>>>>>>>>>>> (one with 47k and another with 60k), the Seed taking process, sometimes,
>>>>>>>>>>>>> not finish, because the agents seem to stop (although java process is still
>>>>>>>>>>>>> alive).
>>>>>>>>>>>>> After this, if i try to start any other job, that not start,
>>>>>>>>>>>>> like the agents are stopped.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Other times, this jobs work correctly and one time together
>>>>>>>>>>>>> work correctly, running in the same moment.
>>>>>>>>>>>>>
>>>>>>>>>>>>> For information:
>>>>>>>>>>>>>
>>>>>>>>>>>>>    - On Jboss there are only Manifold and Generic Repository
>>>>>>>>>>>>>    application.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>    - On the same Virtual Server, there is another Jboss
>>>>>>>>>>>>>    istance, with solr istance and a web application.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>    - I've check if it was a type of memory problem, but it's
>>>>>>>>>>>>>    not the case.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>    - GC with almost 23k seed work always, at least in test
>>>>>>>>>>>>>    that i've done.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>    - In local instance of Jboss with Manifold and Generic
>>>>>>>>>>>>>    Rpository Application, i've not keep this problem.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is the only recurrent information that i've seen on
>>>>>>>>>>>>> manifold.log:
>>>>>>>>>>>>> ---------------
>>>>>>>>>>>>> Connection 0.0.0.0:62755<-><ip-address>:<port> shut down
>>>>>>>>>>>>> Releasing connection
>>>>>>>>>>>>> org.apache.http.impl.conn.ManagedClientConnectionImpl@6c98c1bd
>>>>>>>>>>>>>
>>>>>>>>>>>>> ---------------
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> L. Alicata
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Job with Generic Connector stops working

Posted by Luca Alicata <al...@gmail.com>.
Hi Karl,
I can confirm that it is a little expensive, but at the time I didn't have
much time, and I stopped working on it once I found that solution.
Thanks for creating the ticket; for the moment, I'll try to use the Generic
Connector.

Another question: is there another connector that can use an application
to receive data, like the Generic Connector?

Thanks,
L. Alicata

2016-05-06 16:02 GMT+02:00 Karl Wright <da...@gmail.com>:

> Hi Luca,
>
> This approach causes each document's binary data to be read more than
> once.  I think that is expensive, especially if there are a lot of values.
> for a row.
>
> Instead I think something more like ACLs will be needed -- that is, a
> separate query for each multi-valued field.  This is more work but it would
> work much better.
>
> I will create a ticket to add this to the JDBC connector, but it won't
> happen for a while.
>
> Karl
>
>
> On Fri, May 6, 2016 at 9:40 AM, Luca Alicata <al...@gmail.com>
> wrote:
>
>> I've decompile java connector and modified the code in this way:
>>
>> in process document, i see that just currently arrive all row of query
>> result (also multi values row), but in the cycle that parse document, after
>> first document with an ID, all the other with the same are skipped.
>> So i removed the control that not permits to check other document with
>> the same ID and i modified the method that store metadata, to permit to
>> store multi value data as array in metadata mapping.
>>
>> I attached the code in this e-mail. You can find a comment that start
>> with "---", that i insert know for you.
>>
>> Thanks,
>> L. Alicata
>>
>> 2016-05-06 15:25 GMT+02:00 Karl Wright <da...@gmail.com>:
>>
>>> Ok, it's now clear what you are looking for, but it is still not clear
>>> how we'd integrate that in the JDBC connector.  How did you do this when
>>> you modified the connector for 1.8?
>>>
>>> Karl
>>>
>>>
>>> On Fri, May 6, 2016 at 9:21 AM, Luca Alicata <al...@gmail.com>
>>> wrote:
>>>
>>>> Hi Karl,
>>>> sorry for my english :).
>>>> I mean the fact that i've to extract value from query with a join
>>>> between two table with a relationship of one-to-many, the dataset returned
>>>> from Connector is only one pair from the two table.
>>>>
>>>> For example:
>>>> Table A with persons
>>>> Table B with eyes
>>>>
>>>> As result of join, i aspect have two row like:
>>>> person 1, eye left
>>>> person 1, eye right
>>>>
>>>> but the connector returns only one row:
>>>> person 1, eye left
>>>>
>>>> I hope now it's more clear.
>>>>
>>>> Ps. i report the phrase on Manifold documentation that explain that (
>>>> https://manifoldcf.apache.org/release/release-2.3/en_US/end-user-documentation.html#jdbcrepository
>>>> ):
>>>> ------
>>>> There is currently no support in the JDBC connection type for natively
>>>> handling multi-valued metadata.
>>>> ------
>>>>
>>>> Thanks,
>>>> L. Alicata
>>>>
>>>>
>>>> 2016-05-06 15:10 GMT+02:00 Karl Wright <da...@gmail.com>:
>>>>
>>>>> Hi Luca,
>>>>>
>>>>> It is not clear what you mean by "multi value extraction" using the
>>>>> JDBC connector.  The JDBC connector allows collection of primary binary
>>>>> content as well as metadata from a database row.  So maybe if you can
>>>>> explain what you need beyond that it would help.
>>>>>
>>>>> Thanks,
>>>>> Karl
>>>>>
>>>>>
>>>>> On Fri, May 6, 2016 at 9:04 AM, Luca Alicata <al...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Karl,
>>>>>> thanks for information, fortunately in other jboss instance i have a
>>>>>> old Manifold configuration with single process, that i've dismissed. But in
>>>>>> this moment, i start to test this jobs with that and if it work fine, i can
>>>>>> use it only for this job and use it also in production. Maybe after, if i
>>>>>> can, i try to check the possible problem that stop the agent.
>>>>>>
>>>>>> I Take advantage of this discussion to ask you, if multi-value
>>>>>> extraction from db is consider as possible future work or no. Because i've
>>>>>> used this generi connector to resolve this lack of JDBC Connector. In fact
>>>>>> with Manifold 1.8 i've modified the connector to support this behavior (in
>>>>>> addiction to parse blob file), but upgrade Manifold Version, to not rewrite
>>>>>> the new connector i decide to use Generic Connector with application that
>>>>>> do the work of extraction data from DB.
>>>>>>
>>>>>> Thanks,
>>>>>> L. Alicata
>>>>>>
>>>>>> 2016-05-06 14:42 GMT+02:00 Karl Wright <da...@gmail.com>:
>>>>>>
>>>>>>> Hi Luca,
>>>>>>>
>>>>>>> If you do a lock clean and the process still stops, then the locks
>>>>>>> are not the problem.
>>>>>>>
>>>>>>> One way we can drill down into the problem is to get a thread dump
>>>>>>> of the agents process after it stops.  The thread dump must be of the
>>>>>>> agents process, not any of the others.
>>>>>>>
>>>>>>> FWIW, the generic connector is not well supported; the person who
>>>>>>> wrote it is still a committer but is not actively involved in MCF
>>>>>>> development at this time.  I suspect that the problem may have to do with
>>>>>>> how that connector deals with exceptions or errors, but I am not sure.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>> On Fri, May 6, 2016 at 8:38 AM, Luca Alicata <al...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Karl,
>>>>>>>> I've just tried with lock-clean after agents stop to work,
>>>>>>>> obviously after stopping process. After this, job start correctly, but just
>>>>>>>> second time that i start a job with a lot of data (or sometimes the third
>>>>>>>> time), agent stop again.
>>>>>>>>
>>>>>>>> Unfortunately, it's difficult start, for the moment, to using
>>>>>>>> Zookeeper in this environment, but this can resolve the fact that during
>>>>>>>> working agents stop to work? or help only for cleaning lock agent when i
>>>>>>>> restart the process?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> L. Alicata
>>>>>>>>
>>>>>>>> 2016-05-06 14:15 GMT+02:00 Karl Wright <da...@gmail.com>:
>>>>>>>>
>>>>>>>>> Hi Luca,
>>>>>>>>>
>>>>>>>>> With file-based synchronization, if you kill any of the processes
>>>>>>>>> involved, you will need to execute the lock-clean procedure to make sure
>>>>>>>>> you have no dangling locks in the file system.
>>>>>>>>>
>>>>>>>>> - shut down all MCF processes (except the database)
>>>>>>>>> - run the lock-clean script
>>>>>>>>> - start your MCF processes back up
>>>>>>>>>
>>>>>>>>> I suspect what you are seeing is related to this.
>>>>>>>>>
>>>>>>>>> Also, please consider using Zookeeper instead, since it is more
>>>>>>>>> robust about cleaning out dangling locks.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, May 6, 2016 at 8:06 AM, Luca Alicata <
>>>>>>>>> alicataluca@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Karl,
>>>>>>>>>> thanks for help.
>>>>>>>>>> In my case i've only one instance of MCF running, with both type
>>>>>>>>>> of job (SP and Generic), and so i have only one properties files (that i
>>>>>>>>>> have attached).
>>>>>>>>>> For information i used (multiprocess-file configuration) with
>>>>>>>>>> postgres.
>>>>>>>>>>
>>>>>>>>>> Do you have other suggestions? do you need more information, that
>>>>>>>>>> i can give you?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> L.Alicata
>>>>>>>>>>
>>>>>>>>>> 2016-05-06 12:55 GMT+02:00 Karl Wright <da...@gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> Hi Luca,
>>>>>>>>>>>
>>>>>>>>>>> Do you have multiple independent MCF clusters running at the
>>>>>>>>>>> same time?  It sounds like you do: you have SP on one, and Generic on
>>>>>>>>>>> another.  If so, you will need to be sure that the synchronization you are
>>>>>>>>>>> using (either zookeeper or file-based) does not overlap.  Each cluster
>>>>>>>>>>> needs its own synchronization.  If there is overlap, then doing things with
>>>>>>>>>>> one cluster may cause the other cluster to hang.  This also means you have
>>>>>>>>>>> to have different properties files for the two clusters, of course.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, May 6, 2016 at 4:32 AM, Luca Alicata <
>>>>>>>>>>> alicataluca@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> i'm using Manifold 2.2 with multi-process configuration in
>>>>>>>>>>>> Jboss instance inside a Windows Server 2012 and i've a set of job that work
>>>>>>>>>>>> with Sharepoint (SP) or Generic Connector (GC), that get file from a db.
>>>>>>>>>>>> With SP i've no problem, while with GC with a lot of document
>>>>>>>>>>>> (one with 47k and another with 60k), the Seed taking process, sometimes,
>>>>>>>>>>>> not finish, because the agents seem to stop (although java process is still
>>>>>>>>>>>> alive).
>>>>>>>>>>>> After this, if i try to start any other job, that not start,
>>>>>>>>>>>> like the agents are stopped.
>>>>>>>>>>>>
>>>>>>>>>>>> Other times, this jobs work correctly and one time together
>>>>>>>>>>>> work correctly, running in the same moment.
>>>>>>>>>>>>
>>>>>>>>>>>> For information:
>>>>>>>>>>>>
>>>>>>>>>>>>    - On Jboss there are only Manifold and Generic Repository
>>>>>>>>>>>>    application.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>    - On the same Virtual Server, there is another Jboss
>>>>>>>>>>>>    istance, with solr istance and a web application.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>    - I've check if it was a type of memory problem, but it's
>>>>>>>>>>>>    not the case.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>    - GC with almost 23k seed work always, at least in test
>>>>>>>>>>>>    that i've done.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>    - In local instance of Jboss with Manifold and Generic
>>>>>>>>>>>>    Rpository Application, i've not keep this problem.
>>>>>>>>>>>>
>>>>>>>>>>>> This is the only recurrent information that i've seen on
>>>>>>>>>>>> manifold.log:
>>>>>>>>>>>> ---------------
>>>>>>>>>>>> Connection 0.0.0.0:62755<-><ip-address>:<port> shut down
>>>>>>>>>>>> Releasing connection
>>>>>>>>>>>> org.apache.http.impl.conn.ManagedClientConnectionImpl@6c98c1bd
>>>>>>>>>>>>
>>>>>>>>>>>> ---------------
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> L. Alicata
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Job with Generic Connector stops working

Posted by Karl Wright <da...@gmail.com>.
Hi Luca,

This approach causes each document's binary data to be read more than
once.  I think that is expensive, especially if there are a lot of values
for a row.

Instead I think something more like ACLs will be needed -- that is, a
separate query for each multi-valued field.  This is more work but it would
work much better.
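
As a rough sketch only (not the connector's actual code), such a per-field
query could be issued once per document in plain JDBC, reusing the
person/eyes example from earlier in the thread; the class, table, and column
names here are hypothetical:
---------------
import java.sql.*;
import java.util.*;

public class MultiValueFieldSketch {
  // Hypothetical illustration: one extra query per multi-valued field,
  // fetching every value for a given document ID, instead of joining it
  // into the main query and re-reading the binary data for each row.
  public static List<String> fetchEyeValues(Connection conn, String personId)
      throws SQLException {
    List<String> values = new ArrayList<>();
    String sql = "SELECT eye FROM eyes WHERE person_id = ?";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
      ps.setString(1, personId);
      try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
          values.add(rs.getString(1));
        }
      }
    }
    return values;
  }
}
---------------
The returned list would then become one multi-valued metadata field for that
document, while the binary content is still read only once from the main
query.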

I will create a ticket to add this to the JDBC connector, but it won't
happen for a while.

Karl


On Fri, May 6, 2016 at 9:40 AM, Luca Alicata <al...@gmail.com> wrote:

> I've decompile java connector and modified the code in this way:
>
> in process document, i see that just currently arrive all row of query
> result (also multi values row), but in the cycle that parse document, after
> first document with an ID, all the other with the same are skipped.
> So i removed the control that not permits to check other document with the
> same ID and i modified the method that store metadata, to permit to store
> multi value data as array in metadata mapping.
>
> I attached the code in this e-mail. You can find a comment that start with
> "---", that i insert know for you.
>
> Thanks,
> L. Alicata
>
> 2016-05-06 15:25 GMT+02:00 Karl Wright <da...@gmail.com>:
>
>> Ok, it's now clear what you are looking for, but it is still not clear
>> how we'd integrate that in the JDBC connector.  How did you do this when
>> you modified the connector for 1.8?
>>
>> Karl
>>
>>
>> On Fri, May 6, 2016 at 9:21 AM, Luca Alicata <al...@gmail.com>
>> wrote:
>>
>>> Hi Karl,
>>> sorry for my english :).
>>> I mean the fact that i've to extract value from query with a join
>>> between two table with a relationship of one-to-many, the dataset returned
>>> from Connector is only one pair from the two table.
>>>
>>> For example:
>>> Table A with persons
>>> Table B with eyes
>>>
>>> As result of join, i aspect have two row like:
>>> person 1, eye left
>>> person 1, eye right
>>>
>>> but the connector returns only one row:
>>> person 1, eye left
>>>
>>> I hope now it's more clear.
>>>
>>> Ps. i report the phrase on Manifold documentation that explain that (
>>> https://manifoldcf.apache.org/release/release-2.3/en_US/end-user-documentation.html#jdbcrepository
>>> ):
>>> ------
>>> There is currently no support in the JDBC connection type for natively
>>> handling multi-valued metadata.
>>> ------
>>>
>>> Thanks,
>>> L. Alicata
>>>
>>>
>>> 2016-05-06 15:10 GMT+02:00 Karl Wright <da...@gmail.com>:
>>>
>>>> Hi Luca,
>>>>
>>>> It is not clear what you mean by "multi value extraction" using the
>>>> JDBC connector.  The JDBC connector allows collection of primary binary
>>>> content as well as metadata from a database row.  So maybe if you can
>>>> explain what you need beyond that it would help.
>>>>
>>>> Thanks,
>>>> Karl
>>>>
>>>>
>>>> On Fri, May 6, 2016 at 9:04 AM, Luca Alicata <al...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Karl,
>>>>> thanks for information, fortunately in other jboss instance i have a
>>>>> old Manifold configuration with single process, that i've dismissed. But in
>>>>> this moment, i start to test this jobs with that and if it work fine, i can
>>>>> use it only for this job and use it also in production. Maybe after, if i
>>>>> can, i try to check the possible problem that stop the agent.
>>>>>
>>>>> I Take advantage of this discussion to ask you, if multi-value
>>>>> extraction from db is consider as possible future work or no. Because i've
>>>>> used this generi connector to resolve this lack of JDBC Connector. In fact
>>>>> with Manifold 1.8 i've modified the connector to support this behavior (in
>>>>> addiction to parse blob file), but upgrade Manifold Version, to not rewrite
>>>>> the new connector i decide to use Generic Connector with application that
>>>>> do the work of extraction data from DB.
>>>>>
>>>>> Thanks,
>>>>> L. Alicata
>>>>>
>>>>> 2016-05-06 14:42 GMT+02:00 Karl Wright <da...@gmail.com>:
>>>>>
>>>>>> Hi Luca,
>>>>>>
>>>>>> If you do a lock clean and the process still stops, then the locks
>>>>>> are not the problem.
>>>>>>
>>>>>> One way we can drill down into the problem is to get a thread dump of
>>>>>> the agents process after it stops.  The thread dump must be of the agents
>>>>>> process, not any of the others.
>>>>>>
>>>>>> FWIW, the generic connector is not well supported; the person who
>>>>>> wrote it is still a committer but is not actively involved in MCF
>>>>>> development at this time.  I suspect that the problem may have to do with
>>>>>> how that connector deals with exceptions or errors, but I am not sure.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>> On Fri, May 6, 2016 at 8:38 AM, Luca Alicata <al...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Karl,
>>>>>>> I've just tried with lock-clean after agents stop to work, obviously
>>>>>>> after stopping process. After this, job start correctly, but just second
>>>>>>> time that i start a job with a lot of data (or sometimes the third time),
>>>>>>> agent stop again.
>>>>>>>
>>>>>>> Unfortunately, it's difficult start, for the moment, to using
>>>>>>> Zookeeper in this environment, but this can resolve the fact that during
>>>>>>> working agents stop to work? or help only for cleaning lock agent when i
>>>>>>> restart the process?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> L. Alicata
>>>>>>>
>>>>>>> 2016-05-06 14:15 GMT+02:00 Karl Wright <da...@gmail.com>:
>>>>>>>
>>>>>>>> Hi Luca,
>>>>>>>>
>>>>>>>> With file-based synchronization, if you kill any of the processes
>>>>>>>> involved, you will need to execute the lock-clean procedure to make sure
>>>>>>>> you have no dangling locks in the file system.
>>>>>>>>
>>>>>>>> - shut down all MCF processes (except the database)
>>>>>>>> - run the lock-clean script
>>>>>>>> - start your MCF processes back up
>>>>>>>>
>>>>>>>> I suspect what you are seeing is related to this.
>>>>>>>>
>>>>>>>> Also, please consider using Zookeeper instead, since it is more
>>>>>>>> robust about cleaning out dangling locks.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, May 6, 2016 at 8:06 AM, Luca Alicata <alicataluca@gmail.com
>>>>>>>> > wrote:
>>>>>>>>
>>>>>>>>> Hi Karl,
>>>>>>>>> thanks for help.
>>>>>>>>> In my case i've only one instance of MCF running, with both type
>>>>>>>>> of job (SP and Generic), and so i have only one properties files (that i
>>>>>>>>> have attached).
>>>>>>>>> For information i used (multiprocess-file configuration) with
>>>>>>>>> postgres.
>>>>>>>>>
>>>>>>>>> Do you have other suggestions? do you need more information, that
>>>>>>>>> i can give you?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> L.Alicata
>>>>>>>>>
>>>>>>>>> 2016-05-06 12:55 GMT+02:00 Karl Wright <da...@gmail.com>:
>>>>>>>>>
>>>>>>>>>> Hi Luca,
>>>>>>>>>>
>>>>>>>>>> Do you have multiple independent MCF clusters running at the same
>>>>>>>>>> time?  It sounds like you do: you have SP on one, and Generic on another.
>>>>>>>>>> If so, you will need to be sure that the synchronization you are using
>>>>>>>>>> (either zookeeper or file-based) does not overlap.  Each cluster needs its
>>>>>>>>>> own synchronization.  If there is overlap, then doing things with one
>>>>>>>>>> cluster may cause the other cluster to hang.  This also means you have to
>>>>>>>>>> have different properties files for the two clusters, of course.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, May 6, 2016 at 4:32 AM, Luca Alicata <
>>>>>>>>>> alicataluca@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>> i'm using Manifold 2.2 with multi-process configuration in Jboss
>>>>>>>>>>> instance inside a Windows Server 2012 and i've a set of job that work with
>>>>>>>>>>> Sharepoint (SP) or Generic Connector (GC), that get file from a db.
>>>>>>>>>>> With SP i've no problem, while with GC with a lot of document
>>>>>>>>>>> (one with 47k and another with 60k), the Seed taking process, sometimes,
>>>>>>>>>>> not finish, because the agents seem to stop (although java process is still
>>>>>>>>>>> alive).
>>>>>>>>>>> After this, if i try to start any other job, that not start,
>>>>>>>>>>> like the agents are stopped.
>>>>>>>>>>>
>>>>>>>>>>> Other times, this jobs work correctly and one time together work
>>>>>>>>>>> correctly, running in the same moment.
>>>>>>>>>>>
>>>>>>>>>>> For information:
>>>>>>>>>>>
>>>>>>>>>>>    - On Jboss there are only Manifold and Generic Repository
>>>>>>>>>>>    application.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>    - On the same Virtual Server, there is another Jboss
>>>>>>>>>>>    istance, with solr istance and a web application.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>    - I've check if it was a type of memory problem, but it's
>>>>>>>>>>>    not the case.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>    - GC with almost 23k seed work always, at least in test that
>>>>>>>>>>>    i've done.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>    - In local instance of Jboss with Manifold and Generic
>>>>>>>>>>>    Rpository Application, i've not keep this problem.
>>>>>>>>>>>
>>>>>>>>>>> This is the only recurrent information that i've seen on
>>>>>>>>>>> manifold.log:
>>>>>>>>>>> ---------------
>>>>>>>>>>> Connection 0.0.0.0:62755<-><ip-address>:<port> shut down
>>>>>>>>>>> Releasing connection
>>>>>>>>>>> org.apache.http.impl.conn.ManagedClientConnectionImpl@6c98c1bd
>>>>>>>>>>>
>>>>>>>>>>> ---------------
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> L. Alicata
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Job with Generic Connector stops working

Posted by Luca Alicata <al...@gmail.com>.
I've decompiled the Java connector and modified the code in this way:

In the document-processing code, I see that all rows of the query result
already arrive (including the multi-value rows), but in the loop that parses
the documents, after the first document with a given ID, all the others with
the same ID are skipped.
So I removed the check that prevents processing further rows with the same
ID, and I modified the method that stores metadata so that multi-value data
can be stored as an array in the metadata mapping.
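
As a rough sketch of the first change (the attached file has the actual
code), grouping all rows with the same document ID instead of skipping them
could look like this; the class and method names are made up for
illustration:
---------------
import java.util.*;

public class GroupRowsSketch {
  // Hypothetical sketch: collect every metadata value for a document ID
  // instead of skipping rows whose ID has already been seen.
  // Each input row is assumed to be {documentId, metadataValue}.
  public static Map<String, List<String>> groupByDocumentId(List<String[]> rows) {
    Map<String, List<String>> valuesById = new LinkedHashMap<>();
    for (String[] row : rows) {
      valuesById.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
    }
    return valuesById;
  }
}
---------------
Each resulting list is then turned into an array and stored as the value of
the corresponding metadata field, which is what makes it multi-valued in the
metadata mapping.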

I attached the code to this e-mail. You can find the comments that start
with "---", which I have inserted just now for you.

Thanks,
L. Alicata

Re: Job with Generic Connector stop to work

Posted by Karl Wright <da...@gmail.com>.
Ok, it's now clear what you are looking for, but it is still not clear how
we'd integrate that in the JDBC connector.  How did you do this when you
modified the connector for 1.8?

Karl


Re: Job with Generic Connector stop to work

Posted by Luca Alicata <al...@gmail.com>.
Hi Karl,
sorry for my English :).
I mean that when I have to extract values with a query that joins two
tables that have a one-to-many relationship, the dataset returned by the
connector contains only one pair from the two tables.

For example:
Table A with persons
Table B with eyes

As the result of the join, I expect to get two rows like:
person 1, eye left
person 1, eye right

but the connector returns only one row:
person 1, eye left

I hope it's clearer now; there is a small sketch below.
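
For illustration only (made-up schema), plain JDBC does return both rows
of such a join when the query is run directly; it is the connector that
keeps only the first value per ID:

import java.sql.*;

public class JoinExample {
  // Hypothetical tables "persons" and "eyes", invented for this example.
  public static void printJoin(Connection conn) throws SQLException {
    String sql =
        "SELECT p.id AS id, e.side AS side "
      + "FROM persons p JOIN eyes e ON e.person_id = p.id";
    try (Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery(sql)) {
      while (rs.next()) {
        // Prints both "person 1, eye left" and "person 1, eye right".
        System.out.println("person " + rs.getString("id")
            + ", eye " + rs.getString("side"));
      }
    }
  }
}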

P.S. I quote the sentence from the Manifold documentation that explains
this (
https://manifoldcf.apache.org/release/release-2.3/en_US/end-user-documentation.html#jdbcrepository
):
------
There is currently no support in the JDBC connection type for natively
handling multi-valued metadata.
------

Thanks,
L. Alicata


Re: Job with Generic Connector stop to work

Posted by Karl Wright <da...@gmail.com>.
Hi Luca,

It is not clear what you mean by "multi value extraction" using the JDBC
connector.  The JDBC connector allows collection of primary binary content
as well as metadata from a database row.  So if you can explain what you
need beyond that, it would help.
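
As an illustration of that model (plain JDBC with a made-up table, not the
connector's own code), "one document per row" means something like:

import java.io.InputStream;
import java.sql.*;

public class SingleRowDocument {
  // Hypothetical "docs" table with a binary content column and a couple of
  // single-valued metadata columns.
  public static void readOne(Connection conn, String docId) throws SQLException {
    String sql = "SELECT title, author, content FROM docs WHERE id = ?";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
      ps.setString(1, docId);
      try (ResultSet rs = ps.executeQuery()) {
        if (rs.next()) {
          String title  = rs.getString("title");              // metadata
          String author = rs.getString("author");             // metadata
          InputStream content = rs.getBinaryStream("content"); // primary binary content
          // ... hand title/author/content to the indexing pipeline ...
        }
      }
    }
  }
}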

Thanks,
Karl


Re: Job with Generic Connector stop to work

Posted by Luca Alicata <al...@gmail.com>.
Hi Karl,
thanks for the information. Fortunately, on another JBoss instance I have
an old single-process Manifold configuration that I had dismissed. Right
now I'm starting to test these jobs with it, and if it works fine I can
use it just for these jobs, also in production. Maybe later, if I can,
I'll try to track down the problem that stops the agents.

I'll take advantage of this discussion to ask whether multi-valued
extraction from a database is being considered as possible future work,
because I used the Generic Connector to work around this limitation of the
JDBC Connector. In fact, with Manifold 1.8 I had modified the connector to
support this behavior (in addition to parsing BLOB files), but when
upgrading the Manifold version, rather than rewriting the connector again,
I decided to use the Generic Connector together with an application that
does the work of extracting the data from the DB.

Thanks,
L. Alicata

Re: Job with Generic Connector stop to work

Posted by Karl Wright <da...@gmail.com>.
Hi Luca,

If you do a lock clean and the process still stops, then the locks are not
the problem.

One way we can drill down into the problem is to get a thread dump of the
agents process after it stops.  The thread dump must be of the agents
process, not any of the others.

FWIW, the generic connector is not well supported; the person who wrote it
is still a committer but is not actively involved in MCF development at
this time.  I suspect that the problem may have to do with how that
connector deals with exceptions or errors, but I am not sure.

Thanks,

Karl


Re: Job with Generic Connector stop to work

Posted by Luca Alicata <al...@gmail.com>.
Hi Karl,
I have just tried lock-clean after the agents stopped working, obviously
after stopping the processes first. After that, jobs start correctly, but
the second time I start a job with a lot of data (or sometimes the third
time), the agents stop again.

Unfortunately, for the moment it's difficult to start using Zookeeper in
this environment. Would it fix the fact that the agents stop while they
are working, or would it only help with cleaning up dangling locks when I
restart the processes?

Thanks,
L. Alicata

Re: Job with Generic Connector stop to work

Posted by Karl Wright <da...@gmail.com>.
Hi Luca,

With file-based synchronization, if you kill any of the processes involved,
you will need to execute the lock-clean procedure to make sure you have no
dangling locks in the file system.

- shut down all MCF processes (except the database)
- run the lock-clean script
- start your MCF processes back up

I suspect what you are seeing is related to this.

Also, please consider using Zookeeper instead, since it is more robust
about cleaning out dangling locks.

Thanks,
Karl


Re: Job with Generic Connector stop to work

Posted by Luca Alicata <al...@gmail.com>.
Hi Karl,
thanks for the help.
In my case I have only one instance of MCF running, with both types of job
(SP and Generic), and so I have only one properties file (which I have
attached).
For information, I'm using the multiprocess-file configuration with
Postgres.

Do you have other suggestions? Do you need any more information that I can
give you?

Thanks,

L.Alicata

Re: Job with Generic Connector stop to work

Posted by Karl Wright <da...@gmail.com>.
Hi Luca,

Do you have multiple independent MCF clusters running at the same time?  It
sounds like you do: you have SP on one, and Generic on another.  If so, you
will need to be sure that the synchronization you are using (either
zookeeper or file-based) does not overlap.  Each cluster needs its own
synchronization.  If there is overlap, then doing things with one cluster
may cause the other cluster to hang.  This also means you have to have
different properties files for the two clusters, of course.

Thanks,
Karl



