Posted to user@manifoldcf.apache.org by Karl Wright <da...@gmail.com> on 2012/02/14 21:23:54 UTC

Re: Need Help on setting up ManifoldCF

Hi Anupam,

Please post emails like this directly to
connectors-user@incubator.apache.org.  See below for responses.

On Tue, Feb 14, 2012 at 3:07 PM, Anupam Bhattacharya
<an...@gmail.com> wrote:
>
> Hello Karl,
>
> I am a software programmer at DuPont in Gurgaon, India. Recently, due to the
> economic instability all over the world, the company has decided to move to
> cheaper search engine applications. Thus we are getting rid of many costly
> proprietary search applications and will be replacing them with FAST.
>
> However, I recently came across the Solr search engine and the ManifoldCF
> connector framework. I am currently driving this effort within my company, as I
> am a big supporter of open source technologies. I started my career with
> Alfresco CMS and am now working on search technologies.
>
> Currently I am facing a lot of initial build/deploy/install issues.
> I have already consulted the page at
> http://incubator.apache.org/connectors/en_US/how-to-build-and-deploy.html
> and read it multiple times, but I still face many issues. I downloaded the
> latest 0.4 version, and it seems the documentation at the above link is not
> up to date.
>

The online documentation applies to trunk.  The documentation you
want to use is included in the 0.4-incubating release.  Go to
dist/doc and you will see it there.

> A few issues that took me a long time to resolve, and that could be added to
> the ManifoldCF wiki as lessons for others, are listed below:
> a. Not a single example is given of running executecommand.bat with proper
> arguments. Only a list of commands with their parameters is given.

I'm not entirely sure I get this.  Do you just want an example in the
documentation?

> b. Where to set the manifoldcf.configfile property, and which file it should
> point to, when deploying the war on Tomcat with a PostgreSQL database.

The documentation already tells you that you need to add an
appropriate -D to your tomcat invocation to point to your
properties.xml file.  Tomcat documentation differs from version to
version and platform to platform on how best to do that, and if you
run under Windows there's even a service wrapper with a configuration
UI that allows you to set these parameters.  So it's way beyond
ManifoldCF's mission to describe all that, I think.
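
As a rough illustration of the -D setting discussed here, the sketch below
builds an environment for launching Tomcat in which CATALINA_OPTS (the
environment variable Tomcat's startup scripts pass to the JVM) carries the
system property. This is an editorial sketch, not something from the release
documentation. The property name org.apache.manifoldcf.configfile is an
assumption about what ManifoldCF reads, the helper name tomcat_env and all
paths are hypothetical, and the docs under dist/doc in your release remain the
authority.

```python
import os

# Assumption: ManifoldCF reads its configuration file location from this
# JVM system property. Verify the name against the docs in your release.
CONFIG_PROPERTY = "org.apache.manifoldcf.configfile"


def tomcat_env(properties_xml, base_env=None):
    """Return a copy of the environment with CATALINA_OPTS extended by -D."""
    env = dict(os.environ if base_env is None else base_env)
    flag = "-D{}={}".format(CONFIG_PROPERTY, properties_xml)
    existing = env.get("CATALINA_OPTS", "").strip()
    # Append to any options already present instead of overwriting them.
    env["CATALINA_OPTS"] = (existing + " " + flag).strip()
    return env


if __name__ == "__main__":
    # Hypothetical path; substitute your own installation directory.
    env = tomcat_env("/opt/manifoldcf/properties.xml")
    print(env["CATALINA_OPTS"])
    # To launch Tomcat with it (hypothetical Tomcat location):
    # subprocess.call(["/opt/tomcat/bin/catalina.sh", "run"], env=env)
```

The helper does no quoting, so a path containing spaces would need extra
care. On Windows with the service wrapper, the same -D option would go into
the wrapper's Java options instead.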

> c. I am trying to build the Documentum connector, but I found out that some
> additional environment variables need to be set for "DOCUMENTUM".
> Additionally, the latest version of Documentum uses a dfc.properties file, while
> run.bat looks for a dmcl.ini file.

Could you open a ticket in Jira for this issue?
https://issues.apache.org/jira. It should not be a problem if you
modify the script temporarily, but we can readily make the script look
for either of these.
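
The "look for either of these" idea can be sketched as follows. This is an
editorial illustration in Python, not the actual run.bat logic. It uses the
two file names discussed in this thread (dfc.properties and dmcl.ini), and it
assumes, without confirmation, that the DOCUMENTUM environment variable points
at the directory containing the file. The function name find_dfc_config is
hypothetical.

```python
import os

# File names as discussed in this thread: newer DFC releases use
# dfc.properties, older ones use dmcl.ini. The newer name is tried first.
CANDIDATES = ("dfc.properties", "dmcl.ini")


def find_dfc_config(documentum_dir):
    """Return the path of the first candidate file that exists, else None."""
    for name in CANDIDATES:
        path = os.path.join(documentum_dir, name)
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    # Assumption: DOCUMENTUM names the directory that holds the file.
    root = os.environ.get("DOCUMENTUM")
    found = find_dfc_config(root) if root else None
    print(found or "no DFC configuration file found")
```

Trying the newer file name first means an installation that happens to carry
both files would be treated as a newer DFC, which is a design choice rather
than documented behavior.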

> d. The PostgreSQL driver is JDBC3, so it causes problems on JVM 6 or above.

We use JDK 6 all the time without problems, so I don't know what you
are talking about here.

> e. I was getting errors during the ant build when it tries to delete jar
> files from the lib directory. I don't have the source code with me right now,
> so I can't provide the full path.

It sounds like you were trying to run ant while you still had
ManifoldCF processes running from the same tree.

> f. The documentation advises setting MCF_Home for the
> example_multiprocess project, but it seems the build of the Documentum
> connector refers to this property differently from run.bat.

Yes, this was noticed and fixed on trunk recently.

>
> Could you please update the Apache ManifoldCF website with the latest
> installation procedures? Also, it would be very kind of you if, in the
> meantime, you could send me a few notes to get a head start on configuring
> ManifoldCF with Solr and the Documentum connector.
>

The documentation online has been updated to be consistent with trunk,
so if you want to use the trunk version this might be a good
opportunity to help clarify the documentation.  Either that or you
will need to stick with the 0.4-incubating release and the
0.4-incubating documentation that is part of it; we cannot at this
time update documentation that has already been released.

Thanks,
Karl

> Looking forward to your help.
>
> Thanks & Regards
> Anupam Bhattacharya
>
>
>

Re: Need Help on setting up ManifoldCF

Posted by Karl Wright <da...@gmail.com>.
By all means, please go ahead.  Solr has a tutorial - maybe something
like that would be appropriate?

Karl

On Tue, Feb 14, 2012 at 6:54 PM, Hitoshi Ozawa
<Oz...@ogis-ri.co.jp> wrote:
> Hi,
>
> I agree with Anupam about the difficulty of getting started with ManifoldCF.
> I'm thinking of writing up a simple quick-start guide, because many people are
> having trouble. I think it would help others if there were a simple example
> using ManifoldCF + Solr + local files + JSP that crawls some files in a local
> directory (the ManifoldCF documentation in PDF?) and then searches and
> displays the results.
>
> H.Ozawa

Re: Need Help on setting up ManifoldCF

Posted by Hitoshi Ozawa <Oz...@ogis-ri.co.jp>.
Hi,

I agree with Anupam about the difficulty of getting started with ManifoldCF.
I'm thinking of writing up a simple quick-start guide, because many people are
having trouble. I think it would help others if there were a simple example
using ManifoldCF + Solr + local files + JSP that crawls some files in a local
directory (the ManifoldCF documentation in PDF?) and then searches and
displays the results.

H.Ozawa


Re: Need Help on setting up ManifoldCF

Posted by Karl Wright <da...@gmail.com>.
Hi Anupam,

The Documentum Connector indexes binary documents, as well as the
metadata you select.  If you are not seeing the binary documents get
indexed, you will need to determine whether the problem is in Solr or
in ManifoldCF.  The best way to do that is to look at the Simple
History report in the ManifoldCF UI.  Look for the "document ingest"
event for one or more documents you have crawled.  If the size
reported is greater than zero, then the document was sent to Solr.
You should then look at the Solr standard output to see whether the
Extracting Update Handler has noted that a document was received.  I
believe that it also logs its size.

Thanks,
Karl


On Thu, Feb 23, 2012 at 2:41 PM, Anupam Bhattacharya
<an...@gmail.com> wrote:
> Thanks Karl,
>
> I was just curious: can the Documentum connector in ManifoldCF also index
> binary documents, in addition to the document types defined in the content
> model and their metadata?
>
> Otherwise, configuring the Documentum repository connection in ManifoldCF for
> crawling, and then again in Solr to fetch the actual document, would mean
> repeating the work of fetching the metadata for each document.
>
> Regards
> Anupam
>
>
> On Fri, Feb 24, 2012 at 12:44 AM, Karl Wright <da...@gmail.com> wrote:
>>
>> Glad it is working for you!
>>
>> Solr is almost infinitely flexible, so you have many options.
>>
>> In my opinion the best way to convert binary documents to indexable
>> text is indeed to use Solr Cell.  Solr Cell is built on Tika, so
>> you won't need to bring in Tika for this because it should already be
>> there. Tika has a pipeline architecture which should suit your use
>> case well.   It should thus be possible to configure the existing
>> update handler to use Solr Cell, and configure Solr Cell's Tika
>> instance to perform whatever transformations you need.
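
To make the XML question in this thread concrete, here is a small editorial
sketch of the tag-to-field mapping being asked about, written in plain Python
and independent of Solr, Solr Cell, and Tika. It is not how Solr Cell or
ExtractingRequestHandler works internally. It only shows the transformation
that any such pipeline would need to produce for the sample document quoted
in this thread. The function name xml_to_fields is hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample document copied from the question quoted in this thread.
SAMPLE = """<doc>
<object_id>111</object_id>
<abstract>Abstract Text</abstract>
<citation>Citation Text</citation>
<publication>News Source</publication>
</doc>"""


def xml_to_fields(xml_text):
    """Map each direct child tag of the root element to its text value."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


if __name__ == "__main__":
    print(xml_to_fields(SAMPLE))
```

Each key in the resulting dictionary would correspond to one index field.
Nested elements and repeated tags are ignored by this sketch and would need a
deliberate mapping decision.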
>>
>> Hope this helps.  For further Solr questions, you can always ask on
>> the Solr user list.  A Tika user list is also available.
>>
>> Thanks,
>> Karl
>>
>> On Thu, Feb 23, 2012 at 2:04 PM, Anupam Bhattacharya
>> <an...@gmail.com> wrote:
>> > Hello Karl,
>> >
>> > Finally, I was able to index all the metadata for the defined document
>> > types with different content types. Everything went well.
>> > However, I was not able to index the full-text content of the files (such
>> > as PDF and XML). I read about Solr Cell, where documents can be uploaded
>> > using curl, but unfortunately our XML files contain tags and values that
>> > also need to be indexed.
>> > For example, an XML structure like this:
>> >
>> > <doc>
>> > <object_id>111</object_id>
>> > <abstract>Abstract Text</abstract>
>> > <citation>Citation Text</citation>
>> > <publication>News Source</publication>
>> > </doc>
>> >
>> > I found that in Solr, if we add a new request handler that extends
>> > ExtractingRequestHandler, we can parse the documents, fetch the
>> > information, and add it as index fields in the Solr index.
>> >
>> > What is the ideal approach for indexing tag values from XML in Lucene,
>> > going from ManifoldCF to Solr? Is it necessary to integrate Tika for this?
>> > I found a good post here: https://community.emc.com/docs/DOC-6520
>> >
>> > Appreciate your advice on this.
>> >
>> > Regards
>> > Anupam
>> >
>> >
>> >
>> >
>> > On Thu, Feb 16, 2012 at 12:17 AM, Karl Wright <da...@gmail.com>
>> > wrote:
>> >>
>> >> On Wed, Feb 15, 2012 at 1:13 PM, Anupam Bhattacharya
>> >> <an...@gmail.com> wrote:
>> >> > Hello Karl,
>> >> >
>> >> > Thanks for adding this to the JIRA system.
>> >> >
>> >> > dfc.properties was introduced in Documentum 6.0, and according to the
>> >> > ManifoldCF connector documentation
>> >> > (http://incubator.apache.org/connectors/en_US/included-connectors.html)
>> >> > the out-of-the-box connector classes were tested against DFC 5.3 SP5,
>> >> > which needed the dmcl.ini file. Thus run.bat must have been configured
>> >> > for dmcl.ini.
>> >>
>> >> Right - so does DFC 6.0 on Windows require the DOCUMENTUM environment
>> >> variable to be set to point at the directory where dfc.properties is
>> >> found?  Or perhaps it doesn't require the DOCUMENTUM environment
>> >> variable at all anymore?
>> >>
>> >> >
>> >> > As I am trying to connect to DFC 6.5 SP3, I need to look for the
>> >> > dfc.properties file. I hope the out-of-the-box Documentum connector
>> >> > will work with version 6.5.
>> >>
>> >> It was tried and worked.  The script was developed later with only the
>> >> 5.3 version available.
>> >>
>> >> >
>> >> > I am confused: why do we have client and server versions for every
>> >> > connector? Could you please explain?
>> >> >
>> >>
>> >> Do you mean "why is there a documentum-connector-server" process?  If
>> >> that's the question, it was created for two reasons:
>> >> (1) We had problems with stability of DFC.  It segfaults occasionally,
>> >> somewhere in its native code.  We did not want that to bring down
>> >> ManifoldCF, and we wanted to be able to restart the part of the
>> >> connector that depended on DFC transparently when it crashed.
>> >> (2) DFC has dependencies on many older open-source jars that conflict
>> >> with the rest of ManifoldCF.  If (1) was not a problem we might have
>> >> used a classloader to fix this, but since we had to fix both we
>> >> created a separate process.
>> >>
>> >> FWIW, we do the same thing for FileNet because of its dependency on
>> >> Wasp.
>> >>
>> >> Karl
>> >>
>> >> > Again, Thanks for all the help.
>> >> >
>> >> > Regards
>> >> > Anupam
>> >> >
>> >> >
>> >> > On Wed, Feb 15, 2012 at 8:42 PM, Karl Wright <da...@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> Hi Anupam,
>> >> >>
>> >> >> I did not see a ticket from you about the DOCUMENTUM environment
>> >> >> variable and the dmcl.ini vs. dfc.properties file.  I've created an
>> >> >> issue at https://issues.apache.org/jira/browse/CONNECTORS-410 to track
>> >> >> this problem.  It would be great if you could confirm that: (a) the
>> >> >> DOCUMENTUM environment variable is still needed at all by DFC, and (b)
>> >> >> when it is set properly, the file dfc.properties can be found at
>> >> >> $DOCUMENTUM\dfc.properties (on Windows, at least).
>> >> >>
>> >> >> Thanks,
>> >> >> Karl
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Thanks & Regards
>> >> > Anupam Bhattacharya
>> >> >
>> >> >
>> >
>> >
>> >
>> >
>> > --
>> > Thanks & Regards
>> > Anupam Bhattacharya
>> >
>> >
>
>
>
>
> --
> Thanks & Regards
> Anupam Bhattacharya
>
>

Re: Need Help on setting up ManifoldCF

Posted by Anupam Bhattacharya <an...@gmail.com>.
Thanks Karl,

I was just curious: can the Documentum connector in ManifoldCF also index
binary documents, in addition to the document types defined in the content
model and their metadata?

Otherwise, configuring the Documentum repository connection in ManifoldCF for
crawling, and then again in Solr to fetch the actual document, would mean
repeating the work of fetching the metadata for each document.

Regards
Anupam

On Fri, Feb 24, 2012 at 12:44 AM, Karl Wright <da...@gmail.com> wrote:

> Glad it is working for you!
>
> Solr is almost infinitely flexible, so you have many options.
>
> In my opinion the best way you convert binary documents to indexable
> text is indeed to use Solr Cell.  Solr Cell is constructed on Tika, so
> you won't need to bring in Tika for this because it should already be
> there. Tika has a pipeline architecture which should suit your use
> case well.   It should thus be possible to configure the existing
> update handler to use Solr Cell, and configure Solr Cell's Tika
> instance to perform whatever transformations you need.
>
> Hope this helps.  For further Solr questions, you can always ask on
> the Solr user list.  A Tika user list is also available.
>
> Thanks,
> Karl
>
> On Thu, Feb 23, 2012 at 2:04 PM, Anupam Bhattacharya
> <an...@gmail.com> wrote:
> > Hello Karl,
> >
> > Finally, I was able to index all the metadata for the defined document
> types
> > with different content types. Everything went well.
> > Although I was not able to index the file full text content. (like PDF,
> > XML). I read about SOLR Cell where using CURL we can upload documents but
> > unfortunately our XML files structure contains Tag & values which also
> needs
> > to be indexed.
> > e.g, some XML structure..
> >
> > <doc>
> > <object_id>111</object_id>
> > <abstract>Abstract Text</abstract>
> > <citation>Citation Text</citation>
> > <publication>News Source</publication>
> > </doc>
> >
> > I found that in SOLR if we add a new RequestHandler Code extending the
> > ExtractingRequestHandler we can parse the documents fetch information and
> > add it as index field in the SOLR index.
> >
> > What is the ideal approach for indexing tag values from XML in lucene
> from
> > ManifoldCF to SOLR ? Is it necessary to integrate TIKA for this ?
> > I found a good post over here.. https://community.emc.com/docs/DOC-6520
> >
> > Appreciate your advice on this.
> >
> > Regards
> > Anupam
> >
> >
> >
> >
> > On Thu, Feb 16, 2012 at 12:17 AM, Karl Wright <da...@gmail.com>
> wrote:
> >>
> >> On Wed, Feb 15, 2012 at 1:13 PM, Anupam Bhattacharya
> >> <an...@gmail.com> wrote:
> >> > Hello Karl,
> >> >
> >> > Thanks for adding this to the JIRA system.
> >> >
> >> > The dfc.properties was introduced from Documentum 6.0 version onwards
> &
> >> > as
> >> > per manifoldcf connector documentation
> >> > (
> http://incubator.apache.org/connectors/en_US/included-connectors.html)
> >> > the
> >> > out-of the box connector classes were tested against DFC 5.3 SP5 which
> >> > needed the dmcl.ini file. Thus run.bat must have been configured
> >> > properly
> >> > for that dmcl.ini.
> >>
> >> Right - so does DFC 6.0 on Windows require the DOCUMENTUM environment
> >> variable to be set to point at the directory where dfc.properties is
> >> found?  Or perhaps it doesn't require the DOCUMENTUM environment
> >> variable at all anymore?
> >>
> >> >
> >> > As I am trying to connect to DFC 6.5 SP3 version i need to look for
> >> > dfc.properties file. I hope the out-of the box documentum connector
> will
> >> > work with 6.5 version.
> >>
> >> It was tried and worked.  The script was developed later with only the
> >> 5.3 version available.
> >>
> >> >
> >> > I am confused, why for all connector we have Client & Server version ?
> >> > Can
> >> > you please explain.
> >> >
> >>
> >> Do you mean "why is there a documentum-connector-server" process?  If
> >> that's the question, it was created for two reasons:
> >> (1) We had problems with stability of DFC.  It segfaults occasionally,
> >> somewhere in its native code.  We did not want that to bring down
> >> ManifoldCF, and we wanted to be able to restart the part of the
> >> connector that depended on DFC transparently when it crashed.
> >> (2) DFC has dependencies on many older open-source jars that conflict
> >> with the rest of ManifoldCF.  If (1) was not a problem we might have
> >> used a classloader to fix this, but since we had to fix both we
> >> created a separate process.
> >>
> >> FWIW, we do the same thing for FileNet because of its dependency on
> Wasp.
> >>
> >> Karl
> >>
> >> > Again, Thanks for all the help.
> >> >
> >> > Regards
> >> > Anupam
> >> >
> >> >
> >> > --
> >> > Thanks & Regards
> >> > Anupam Bhattacharya
> >> >
> >> >
> >
> >
> >
> >
> > --
> > Thanks & Regards
> > Anupam Bhattacharya
> >
> >
>



-- 
Thanks & Regards
Anupam Bhattacharya

Re: Need Help on setting up ManifoldCF

Posted by Karl Wright <da...@gmail.com>.
Glad it is working for you!

Solr is almost infinitely flexible, so you have many options.

In my opinion the best way to convert binary documents to indexable
text is indeed to use Solr Cell.  Solr Cell is built on Tika, so you
won't need to bring in Tika separately; it should already be there.
Tika has a pipeline architecture which should suit your use case
well.  It should thus be possible to configure the existing update
handler to use Solr Cell, and to configure Solr Cell's Tika instance
to perform whatever transformations you need.
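
As a rough illustration of the Solr Cell route (a minimal sketch, not
taken from the ManifoldCF or Solr documentation), the Python snippet
below builds a request URL for the extracting handler.  The handler
path /update/extract is the one conventionally mapped in Solr's example
configuration and may differ in your solrconfig.xml; the base URL and
the function name build_extract_url are hypothetical.  The
literal.<field> parameters attach metadata values to the document
alongside the text that Tika extracts.

```python
from urllib.parse import urlencode


def build_extract_url(solr_base, doc_id, literals=None, commit=False):
    """Build a Solr Cell request URL (assumes the handler is at /update/extract).

    solr_base is a hypothetical Solr URL such as http://localhost:8983/solr.
    Each entry in `literals` becomes a literal.<field> parameter.
    """
    params = [("literal.id", doc_id)]
    # Sort for a stable, predictable parameter order.
    for field, value in sorted((literals or {}).items()):
        params.append(("literal." + field, value))
    if commit:
        params.append(("commit", "true"))
    return solr_base.rstrip("/") + "/update/extract?" + urlencode(params)


url = build_extract_url(
    "http://localhost:8983/solr",
    "111",
    {"publication": "News Source"},
    commit=True,
)
print(url)
```

The document body itself would then be sent as the POST payload to
that URL, for example with curl or any HTTP client.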

Hope this helps.  For further Solr questions, you can always ask on
the Solr user list.  A Tika user list is also available.

Thanks,
Karl

On Thu, Feb 23, 2012 at 2:04 PM, Anupam Bhattacharya
<an...@gmail.com> wrote:
> Hello Karl,
>
> Finally, I was able to index all the metadata for the defined document types
> with different content types. Everything went well.
> However, I was not able to index the full-text content of the files (PDF,
> XML, etc.). I read about Solr Cell, where documents can be uploaded using
> curl, but unfortunately our XML files contain tags whose values also need
> to be indexed.
> For example:
>
> <doc>
> <object_id>111</object_id>
> <abstract>Abstract Text</abstract>
> <citation>Citation Text</citation>
> <publication>News Source</publication>
> </doc>
>
> I found that in Solr, if we add a new request handler that extends
> ExtractingRequestHandler, we can parse the documents, fetch information, and
> add it as index fields in the Solr index.
>
> What is the ideal approach for indexing tag values from XML when sending
> documents from ManifoldCF to Solr? Is it necessary to integrate Tika for this?
> I found a good post here: https://community.emc.com/docs/DOC-6520
>
> Appreciate your advice on this.
>
> Regards
> Anupam
>
>
>
>
> --
> Thanks & Regards
> Anupam Bhattacharya
>
>

Re: Need Help on setting up ManifoldCF

Posted by Anupam Bhattacharya <an...@gmail.com>.
Hello Karl,

Finally, I was able to index all the metadata for the defined document
types with different content types. Everything went well.
However, I was not able to index the full-text content of the files (PDF,
XML, etc.). I read about Solr Cell, where documents can be uploaded using
curl, but unfortunately our XML files contain tags whose values also need
to be indexed.
For example:

<doc>
<object_id>111</object_id>
<abstract>Abstract Text</abstract>
<citation>Citation Text</citation>
<publication>News Source</publication>
</doc>

I found that in Solr, if we add a new request handler that extends
ExtractingRequestHandler, we can parse the documents, fetch information, and
add it as index fields in the Solr index.

What is the ideal approach for indexing tag values from XML when sending
documents from ManifoldCF to Solr? Is it necessary to integrate Tika for this?
I found a good post here: https://community.emc.com/docs/DOC-6520
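
To make the goal concrete, here is a minimal sketch (in Python, purely
for illustration; it is not ManifoldCF or Solr code) of turning the tags
of the sample XML above into field/value pairs of the kind that would be
added to the index. The function name xml_to_fields is hypothetical, and
the sketch assumes a flat structure in which every child of <doc> holds
plain text.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<doc>
<object_id>111</object_id>
<abstract>Abstract Text</abstract>
<citation>Citation Text</citation>
<publication>News Source</publication>
</doc>"""


def xml_to_fields(xml_text):
    """Map each direct child tag of the root element to its text value."""
    root = ET.fromstring(xml_text)
    # Empty elements have text None, so fall back to an empty string.
    return {child.tag: (child.text or "").strip() for child in root}


fields = xml_to_fields(SAMPLE)
for name, value in fields.items():
    print(name, "=", value)
```

Nested or repeated tags would need extra handling (for example,
multi-valued fields), which this sketch deliberately leaves out.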

Appreciate your advice on this.

Regards
Anupam







-- 
Thanks & Regards
Anupam Bhattacharya

Re: Need Help on setting up ManifoldCF

Posted by Karl Wright <da...@gmail.com>.
On Wed, Feb 15, 2012 at 1:13 PM, Anupam Bhattacharya
<an...@gmail.com> wrote:
> Hello Karl,
>
> Thanks for adding this to the JIRA system.
>
> dfc.properties was introduced in Documentum 6.0. According to the ManifoldCF
> connector documentation
> (http://incubator.apache.org/connectors/en_US/included-connectors.html), the
> out-of-the-box connector classes were tested against DFC 5.3 SP5, which
> needed the dmcl.ini file. Thus run.bat must have been configured for
> dmcl.ini.

Right - so does DFC 6.0 on Windows require the DOCUMENTUM environment
variable to be set to point at the directory where dfc.properties is
found?  Or perhaps it doesn't require the DOCUMENTUM environment
variable at all anymore?

>
> Since I am trying to connect with DFC 6.5 SP3, I need to look for the
> dfc.properties file. I hope the out-of-the-box Documentum connector will
> work with version 6.5.

The connector was tried with 6.5 and worked.  The script was developed
later, with only the 5.3 version available.

>
> I am confused: why do we have client and server versions for every
> connector? Can you please explain?
>

Do you mean "why is there a documentum-connector-server process"?  If
that's the question, it was created for two reasons:
(1) We had problems with the stability of DFC.  It segfaults occasionally,
somewhere in its native code.  We did not want that to bring down
ManifoldCF, and we wanted to be able to restart the part of the
connector that depended on DFC transparently when it crashed.
(2) DFC has dependencies on many older open-source jars that conflict
with the rest of ManifoldCF.  If (1) had not been a problem we might have
used a classloader to fix this, but since we had to fix both we
created a separate process.
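
The crash-isolation idea in (1) can be sketched generically as a small
supervisor loop.  This is only an illustration in Python under stated
assumptions: it is not how ManifoldCF or the documentum-connector-server
scripts are implemented, and the name run_with_restart and the
always-failing child command are hypothetical stand-ins.

```python
import subprocess
import sys


def run_with_restart(cmd, max_restarts=3):
    """Run cmd, restarting it after a non-zero exit, up to max_restarts times.

    Returns the number of times the command was started.
    """
    starts = 0
    while True:
        starts += 1
        result = subprocess.run(cmd)
        # A crash (including a segfault) shows up as a non-zero return code;
        # the supervising process itself is unaffected.
        if result.returncode == 0 or starts > max_restarts:
            return starts


# Stand-in for a crashing helper process: a child that always exits with an error.
crashing = [sys.executable, "-c", "import sys; sys.exit(1)"]
starts = run_with_restart(crashing, max_restarts=2)
print("child was started", starts, "times")
```

Because the unstable code lives in the child, its failure never takes
the parent down, and running it in its own JVM would also keep its jars
off the parent's classpath, which is the point of (2).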

FWIW, we do the same thing for FileNet because of its dependency on Wasp.

Karl

> Again, Thanks for all the help.
>
> Regards
> Anupam
>
>
> --
> Thanks & Regards
> Anupam Bhattacharya
>
>

Re: Need Help on setting up ManifoldCF

Posted by Karl Wright <da...@gmail.com>.
Hi Anupam,

I did not see a ticket from you about the DOCUMENTUM environment
variable and the dmcl.ini vs. dfc.properties file.  I've created an
issue at https://issues.apache.org/jira/browse/CONNECTORS-410 to track
this problem.  It would be great if you could confirm (a) that the
DOCUMENTUM environment variable is still needed at all by DFC, and (b)
that, when it is set properly, the file dfc.properties can be found at
$DOCUMENTUM\dfc.properties (on Windows, at least).
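
For anyone who wants to check point (b) on their own machine, here is a
minimal sketch in Python.  It only assumes what is described above: that
an environment variable named DOCUMENTUM may be set and that
dfc.properties may sit directly inside the directory it names.  The
function name find_dfc_properties is hypothetical and is not part of
ManifoldCF or DFC.

```python
import os
from pathlib import Path


def find_dfc_properties(env=os.environ):
    """Return the path to dfc.properties directly under the DOCUMENTUM
    directory, or None if the variable is unset or the file is absent.

    Hypothetical helper for illustration; not part of ManifoldCF or DFC.
    """
    root = env.get("DOCUMENTUM")
    if not root:
        # Variable is unset or empty, so there is nowhere to look.
        return None
    candidate = Path(root) / "dfc.properties"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    found = find_dfc_properties()
    if found is None:
        print("DOCUMENTUM is unset, or dfc.properties was not found under it")
    else:
        print(f"Found {found}")
```

Running the script reports whether the file exists at that location.  It
says nothing about whether DFC itself still reads the variable, which is
point (a).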

Thanks,
Karl

On Tue, Feb 14, 2012 at 3:23 PM, Karl Wright <da...@gmail.com> wrote:
> Hi Anupam,
>
> Please post emails like this directly to
> connectors-user@incubator.apache.org.  See below for responses.
>
> On Tue, Feb 14, 2012 at 3:07 PM, Anupam Bhattacharya
> <an...@gmail.com> wrote:
>>
>> Hello Karl,
>>
>> I am a software programmer in DuPont, Gurgaon, India. Recently, due to the
>> economic instability all over the world the company has decided to go for
>> cheaper Search Engine Applications. Thus we are getting rid of many costly
>> proprietary Search Applications and will be replacing with FAST.
>>
>> Although, I recently came across SOLR search engine & ManiFoldCF Connector
>> framework. Thus, I am currently driving this effort within my company as i
>> am a big supporter of open source technologies. I started my career in
>> Alfresco CMS and now working on Search Technologies.
>>
>> Currently I am facing lots of initial building/deploying/installing issues.
>> I have already referred the url
>> http://incubator.apache.org/connectors/en_US/how-to-build-and-deploy.html
>> Read it multiple times but still face many issues. I downloaded the latest
>> 0.4 version and it seems the documentation is not up to date on the above
>> link.
>>
>
> The online documentation is pertinent to trunk.  The documentation you
> want to use is contained within the 0.4-incubating release.  Go to
> dist/doc and you will see it there.
>
>> Few issues which took me a long time to resolve which can be added in
>> ManifoldCF wiki as learnings for others are listed below:
>> a. No single example is given for running the executecommand.bat with proper
>> arguments. Only list of commands given with parameter defined.
>
> I'm not entirely sure I get this.  Do you just want an example in the
> documentation?
>
>> b. Setting where and which file for the property manifoldcf.configfile for deploying the war on tomcat with Postgresql database.
>
> The documentation already tells you that you need to add an
> appropriate -D to your tomcat invocation to point to your
> properties.xml file.  Tomcat documentation differs from version to
> version and platform to platform on how best to do that, and if you
> run under Windows there's even a service wrapper with a configuration
> UI that allows you to set these parameters.  So it's way beyond
> ManifoldCF's mission to describe all that, I think.
>
>> c. I am trying to build the Documentum Connector but came to know that some
>> additional environment variables need to be added for "DOCUMENTUM".
>> Additionally, the latest version of Documentum uses a dfc.properties file,
>> while run.bat looks for a dmcl.ini file.
>
> Could you open a ticket in Jira for this issue?
> https://issues.apache.org/jira. It should not be a problem if you
> modify the script temporarily, but we can readily make the script look
> for either of these.
>
>> d. postgresql driver is jdbc3 thus it creates problem with JVM6 or above.
>
> We use JDK 6 all the time without problems, so I don't know what you
> are talking about here.
>
>> e. I was getting errors during  the ant build which tries to delete jar
>> files from lib directory. Don't have the source code right now with me thus
>> cant provide the full path.
>
> It sounds like you were trying to run ant while you still had
> ManifoldCF processes running from the same tree.
>
>> f. It was advised in the documentation to set MCF_Home for
>> example_multiprocess project but it seems the build of documentum connector
>> refers to this property differently from run.bat.
>
> Yes, this was noticed and fixed on trunk recently.
>
>>
>> Can you please update the Apache ManifoldCF website with the latest
>> installation procedures. Also, It will be very kind of you in the meanwhile
>> if you can send few notes for me to head start the configuration of
>> ManifoldCF, with SOLR & Documentum connector.
>>
>
> The documentation online has been updated to be consistent with trunk,
> so if you want to use the trunk version this might be a good
> opportunity to help clarify the documentation.  Either that or you
> will need to stick with the 0.4-incubating release and the
> 0.4-incubating documentation that is part of it; we cannot at this
> time update documentation that has already been released.
>
> Thanks,
> Karl
>
>> Looking forward for your help.
>>
>> Thanks & Regards
>> Anupam Bhattacharya
>>
>>
>>