Posted to user@manifoldcf.apache.org by Karl Wright <da...@gmail.com> on 2012/01/24 22:52:56 UTC

Re: ManifoldCF's dist/shapoint-integration dir

I have not seen this exact problem before.

The "Bad envelope tag: HTML" indicates that the SOAP request the
SharePoint connector is attempting to perform is, in fact, returning
an HTML response.  This usually indicates that the server or path
parameters you've used to set up the connection are not set correctly,
and SharePoint is not actually being engaged.
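
As a rough illustration of that failure mode (this is not how Axis itself
is implemented), the following sketch checks whether a response body has a
SOAP 1.1 envelope as its root element. The function name and the sample
bodies are hypothetical.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def looks_like_soap(body):
    """Return True if the body's root element is a SOAP 1.1 Envelope."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Not well-formed XML, which is typical for an HTML error page
        return False
    return root.tag == f"{{{SOAP_NS}}}Envelope"

# Hypothetical response bodies, for illustration only
soap_body = (
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    "<soap:Body/></soap:Envelope>"
)
html_body = "<HTML><BODY>Access denied</BODY></HTML>"

print(looks_like_soap(soap_body))
print(looks_like_soap(html_body))
```

If the second kind of body comes back, the request most likely never
reached a SharePoint web service at all.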

When that happens, though, I don't recall a ConfigurationException
being logged, unless that is what Axis does in response to the HTML.

The best thing to do at this point is to turn on HttpClient wire
logging, restart ManifoldCF, and view the connection.  The log will
then contain a record of the exact SOAP requests and the responses,
and we can see what's wrong.  The technique is described here:

https://cwiki.apache.org/confluence/display/CONNECTORS/Debugging+Connections

You can also confirm that the right SharePoint web services are
functioning on the machine in question by trying to access them
directly.  For the Lists web service, which sounds like the one it
was complaining about, try using IE (not Firefox etc., because you
want NTLM support) to go to the URL where you think the web service
lives.  This will be http: or https:, plus the server, plus the port,
plus the path, plus "_vti_bin/Lists.asmx".  You should see an
unequivocal SharePoint response.  For an example from the Microsoft
demo service, try http://www.wssdemo.com/_vti_bin/Lists.asmx.
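
The URL assembly described above can be sketched as follows. This is a
minimal illustration, not connector code; the function name and the
example server, port, and path are assumptions.

```python
def lists_service_url(protocol, server, port, site_path=""):
    """Build the URL where the Lists web service should answer."""
    # Drop leading/trailing slashes so the pieces join cleanly
    path = site_path.strip("/")
    base = f"{protocol}://{server}:{port}"
    if path:
        base = f"{base}/{path}"
    return f"{base}/_vti_bin/Lists.asmx"

# Hypothetical server, port, and site path
print(lists_service_url("http", "myserver", 81, "/sites/somewhere"))
# An empty site path addresses the root site
print(lists_service_url("https", "myserver", 443))
```

Pasting the resulting URL into the browser is then the direct check
described above.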

Please let me know how it goes, and cc the dev list (as I have) so a
record of what you're encountering can be made available to others.

Thanks!
Karl




On Tue, Jan 24, 2012 at 1:52 PM, Silvia, Daniel [USA]
<Si...@bah.com> wrote:
> Hi Karl
>
> I have downloaded the newest version of ManifoldCF (0.4), run the necessary ant scripts to download the dependencies, and built the entire project. I have also had the SharePoint web service MetCarta.SharePoint.MCPermissionsService.wsp deployed on the SharePoint instance, because we are running version 3 of SharePoint (SharePoint 2007). When I try to create a Repository Connection and select "Save", the ManifoldCF front end shows the message "org.xml.sax.SAXException Bad envelope tag: HTML". When I look at the log file I see the error message "org.apache.axis.ConfigurationException: No service named ListsSoap is available".
>
> Can you tell me if you have seen this issue before and what may be causing it?
>
> Thanks for your help.
>
> Dan
> ________________________________________
> From: Karl Wright [daddywri@gmail.com]
> Sent: Friday, January 20, 2012 7:31 AM
> To: Silvia, Daniel [USA]
> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>
> Hi Daniel,
>
> In order for the SharePoint connector to build, you need to have the
> wsdls in place in the right area.  We cannot ship those because of
> potential copyright issues.  The easiest way to obtain the right
> dependencies is:
>
> ant download-dependencies
>
> Then, just build normally:
>
> ant build
>
> This will only work for ManifoldCF-0.4-incubating, or trunk.
> 0.4-incubating is still in the process of being signed off by the
> incubator, but you can find the release candidate here:
>
> http://people.apache.org/~kwright
>
> Thanks,
> Karl
>
>
>
> On Fri, Jan 20, 2012 at 7:02 AM, Silvia, Daniel [USA]
> <Si...@bah.com> wrote:
>> Hi Karl
>>
>> I work with Matt Parker and we are in the process of developing a pipeline
>> that uses ManifoldCF at the beginning. I just subscribed to the
>> connectors-user-subscribe@incubator.apache.org
>> group yesterday and submitted an e-mail question to the group. Can you help
>> us with the below issue?
>>
>> I downloaded MCF and started playing with the default setup under Jetty and
>> Derby. It starts up without any issue. I am trying to configure a SharePoint
>> connector, connecting to SharePoint Service 3. I have been following the
>> instructions and I am at the point of deploying the custom SharePoint web
>> service to the SharePoint instance. The instructions indicate that I should
>> get the web service from dist/sharepoint-integration after building MCF.
>> However, after looking through the entire directory structure, I am unable
>> to find the service to deploy.
>>
>> Can someone tell me where to find this service?
>>
>> Thanks for your help.
>>
>> Daniel Silvia

RE: ManifoldCF's dist/shapoint-integration dir

Posted by "Silvia, Daniel [USA]" <Si...@bah.com>.
Thanks Karl, I will try your suggestions.


RE: ManifoldCF's dist/shapoint-integration dir

Posted by "Silvia, Daniel [USA]" <Si...@bah.com>.
Hi Karl

Ok, I added a number of paths, using File, Site, and Library rules, for the main site and for the sub site under it. I looked at the log file and it appears I am getting an Axis configuration exception:

No service named UserGroupSoap is available and No service named http://....../GetUserCollectionFromGroup is available.

My site admin for the SharePoint Portal has given me Full Control access, so there shouldn't be any issue with authentication to the SharePoint services.

Also, I went to the properties.xml file to modify the org.apache.manifoldcf.connectors property, but this property didn't exist, so I added it. It looks like this: <property name="org.apache.manifoldcf.connectors" value="DEBUG" />
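
One way to double-check that the property was picked up as intended is to
parse the file and look it up. The sketch below works on an inline sample
rather than the real file; the <configuration> root element and the helper
name are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical properties.xml content; the <configuration> root element
# is an assumption, only the <property> line comes from the message above.
SAMPLE = """<configuration>
  <property name="org.apache.manifoldcf.connectors" value="DEBUG" />
</configuration>"""

def get_property(xml_text, name):
    """Return the value of the named <property>, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.get("name") == name:
            return prop.get("value")
    return None

print(get_property(SAMPLE, "org.apache.manifoldcf.connectors"))
```

A result of None would mean the property is missing or misspelled.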

Thanks

________________________________________
From: Karl Wright [daddywri@gmail.com]
Sent: Tuesday, January 31, 2012 10:52 AM
To: Silvia, Daniel [USA]
Cc: connectors-user@incubator.apache.org
Subject: Re: ManifoldCF's dist/shapoint-integration dir

It's been a while since I've set up a SharePoint job but I think what
you are missing is a file rule (instead of just a library rule).
Here's what the end-user documentation says on the matter:

"Each rule consists of a path, a rule type, and an action. The actions
are "Include" and "Exclude". The rule type tells the connection what
kind of SharePoint entity it is allowed to exactly match. For example,
a "File" rule will only exactly match SharePoint paths that represent
files - it cannot exactly match sites or libraries. The path itself is
just a sequence of characters, where the "*" character has the special
meaning of being able to match any number of any kind of characters,
and the "?" character matches exactly one character of any kind.

The rule matcher extends strict, exact matching by introducing a
concept of implicit inclusion rules. If your rule action is "Include",
and you specify (say) a "File" rule, the matcher presumes implicit
inclusion rules for the corresponding site and library. So, if you
create an "Include File" rule that matches (for example)
"/MySite/MyLibrary/MyFile", there is an implied "Site Include" rule
for "/MySite", and an implied "Library Include" rule for
"/MySite/MyLibrary". Similarly, if you create a "Library Include"
rule, there is an implied "Site Include" rule that corresponds to it.
Note that these shortcuts only apply to "Include" rules - there are
no corresponding implied "Exclude" rules."

What this means is that you should probably be declaring file rules
with "*" as the file name for each library, rather than a library
rule.  You might want to just try this.  If you still have trouble,
you can try setting the "org.apache.manifoldcf.connectors" property to
"DEBUG" in the properties.xml file and restarting ManifoldCF before
your next crawl.  The manifoldcf.log file will then have output
describing the decisions the SharePoint connector made about each
site, library, file, or folder it encountered.
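
To make the quoted matching rules concrete, here is a small sketch of the
"*" and "?" semantics and of the implied site and library rules. It is an
illustration only, not the connector's matcher; the function names are
hypothetical, and splitting the path on "/" to find the site and library
is a simplifying assumption.

```python
import re

def rule_to_regex(pattern):
    """Translate a path rule: '*' matches any run of characters, '?' exactly one."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def matches(pattern, path):
    """Exact match of a whole path against a rule pattern."""
    return rule_to_regex(pattern).fullmatch(path) is not None

def implied_includes(file_rule_path):
    """Assumed simplification: the last segment is the file, the one before
    it the library, and everything before that the site."""
    segments = file_rule_path.strip("/").split("/")
    return {
        "site": "/" + "/".join(segments[:-2]),
        "library": "/" + "/".join(segments[:-1]),
    }

# A file rule with "*" as the file name, as suggested above
print(matches("/Shared Documents/*", "/Shared Documents/report.doc"))
print(implied_includes("/MySite/MyLibrary/MyFile"))
```

Under these assumptions, a single "Include File" rule per library is
enough, because the site and library rules follow from it.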

Thanks,
Karl

On Tue, Jan 31, 2012 at 10:27 AM, Silvia, Daniel [USA]
<Si...@bah.com> wrote:
> Hi Karl
>
> The Path Rules are :
>
> Path Match: /Shared Documents
> Type: library
> Action: include
>
> Path Match: /IDD/Shared Documents
> Type: library
> Action: include
>
> Path Match: /IDD/Documents
> Type: library
> Action: include
>
> Path Match: /manifoldcf/Shared Documents
> Type: library
> Action: include
>
> I hope this helps.
>
> I really appreciate your help.
>
>
>
> ________________________________________
> From: Karl Wright [daddywri@gmail.com]
> Sent: Tuesday, January 31, 2012 10:01 AM
> To: Silvia, Daniel [USA]
> Cc: connectors-user@incubator.apache.org
> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>
> "When I select only the fetch activity, I don't see anything in the
> events, when I select the Document Ingest activity, I don't see
> anything in the events."
>
> So either you've already run the job and the documents were accessed
> the first time (and won't be accessed again until they change), or the
> problem is likely that your SharePoint Path Rules are not including
> any documents.  It would be very helpful at this point to include a
> screen shot of the job you've created.  Since you are not on the net,
> perhaps you can jot down your SharePoint path rules for me to have a
> look at, as they are displayed when you view the job.
>
> Thanks,
> Karl
>
> On Tue, Jan 31, 2012 at 9:44 AM, Silvia, Daniel [USA]
> <Si...@bah.com> wrote:
>> Hi Karl
>>
>> Ok, I have created a new job and ran the job and went to the Simple History Report.
>>
>> I see the events. If all the activities in the Simple History Report (Document Deletion (SolrPipeline), Document Ingest (SolrPipeline), and Fetch) are selected, I see a job start and a job end event. When I get to the Simple History Report I can select the "Connection", but I don't have an option to select the activities until I run the report first.
>> When I select only the fetch activity, I don't see anything in the events, when I select the Document Ingest activity, I don't see anything in the events.
>>
>> My solr output connection has the following information:
>> Protocol: http
>> Server: "the server name"
>> Port:8080 (we are running solr on Jboss port 8080)
>> Web Application Name: solr
>> Core Name: collection1
>> Update Handler: update/extract
>> Remove Handler: /update
>> Status Handler: /admin/ping
>>
>>
>>
>> ________________________________________
>> From: Karl Wright [daddywri@gmail.com]
>> Sent: Tuesday, January 31, 2012 9:00 AM
>> To: Silvia, Daniel [USA]; connectors-user@incubator.apache.org
>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>
>> Ok, let's do one thing at a time.
>>
>> First:
>>
>> "For the Path tab where there are Path Rules, are these the paths we
>> want ManifoldCF to follow? Each site, and each Library like Documents
>> and Shared Documents. And in the Metadata tab, this is the tab where
>> you indicate for each "Site" and "Library" you want to include
>> specific metadata or include all metadata?"
>>
>> For SharePoint, there are Path Rules and Metadata Rules.  The Path
>> Rules describe what documents you want to include or exclude.  The
>> Metadata Rules describe what metadata you want to include or exclude.
>> For right now I would ignore the Metadata Rules and just make sure you
>> have Path Rules that mean that you have included documents.
>>
>> "As I run the report, I see "Documents", "Active", and "Processed"
>> where the numbers change under the "Active" column as well as the
>> "Document" and "Processed" column (these just get larger, where Active
>> changes). "
>>
>> This "report" we actually call the Job Status screen.  The fact that
>> the numbers get larger and the job doesn't just end indicates that you
>> are successfully crawling your SharePoint, and you have set up the job
>> to include at least some documents.  This is good news.  However, this
>> is NOT the "Simple History" report I was alluding to earlier.  To get
>> to that report, click on the "Simple History" link on the left-hand
>> navigation area.  This report will show the events of your choice
>> (default - ALL recorded events) over a given time window (default: the
>> last hour).  If you've done this right you should at least see a "Job
>> start" event.  The events you are most interested in are the "fetch"
>> (which describes all attempts to fetch documents from SharePoint) and
>> "document ingest", which describe attempts to get documents into Solr.
>>  You can refresh the displayed events by clicking the "Go" button in
>> the middle of the screen whenever you wish.
>>
>> I'd like you to delete your job, create it again, and start it.  Then,
>> while it is running, I'd like you to go to the "Simple History"
>> screen, and select the appropriate connection (your SharePoint
>> repository connection), and click the "Go" button.  So as not to skip
>> anything basic:
>>
>> (1) What event types do you see?
>> (2) Are there "fetch" events?
>> (3) Are there "document ingest" events?
>>
>> If you see no "fetch" events, that implies you have either not
>> specified any documents to include in your job, OR your Solr
>> connection is configured to reject too many document types so they are
>> all getting filtered out.
>>
>> If you see "document ingest" events, but those have errors, it implies
>> that the configuration of your Solr connection is incorrect and does
>> not match the way your Solr is configured.  If you send me a specific
>> error code and/or text I can help you figure out what is happening.
>>
>> If you see "document ingest" events with NO errors, but the Solr
>> instance is not getting documents, you are describing an impossible
>> situation.  While your Solr instance may not be configured to have the
>> Extracting Update Handler active, or it may be at a different URL than
>> what you pointed at, that would definitely yield errors or
>> notifications in the Simple History.
>>
>> Please let me know what you actually see.
>> Karl
>>
>>
>>
>> On Tue, Jan 31, 2012 at 7:53 AM, Silvia, Daniel [USA]
>> <Si...@bah.com> wrote:
>>> Hi Karl
>>>
>>> I am trying to figure out why I can't see anything being indexed into our Solr index. I was looking at another post where you were working with "Martijn", who was also unable to see info getting into Solr. In the report that I have set up, I have included all metadata associated with each site, Shared Documents, and Documents. In the Solr Field Mapping, I am associating metadata fields that are indicated in the Metadata tab with fields that exist in our Solr index.
>>>
>>> For the Path tab where there are Path Rules, are these the paths we want ManifoldCF to follow? Each site, and each Library like Documents and Shared Documents. And in the Metadata tab, this is the tab where you indicate for each "Site" and "Library" you want to include specific metadata or include all metadata?
>>>
>>> As I run the report, I see "Documents", "Active", and "Processed", where the numbers change under the "Active" column as well as the "Documents" and "Processed" columns (these just get larger, whereas Active changes). While I was researching why I may not be seeing something on the Solr side, I saw your communication with another individual indicating that I should see something like literal.xxx=yyy in the Solr log. This is an older post, so there may be something else I should see. But the only thing I see when I look at the Solr log is "[ ] webapp=/solr path=/update/extract params={commit=true} status=0 QTime=0".
>>>
>>> Any ideas?
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>>>
>>> ________________________________________
>>> From: Karl Wright [daddywri@gmail.com]
>>> Sent: Monday, January 30, 2012 10:40 AM
>>> To: Silvia, Daniel [USA]
>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>
>>> The default time range for the Simple History is the last hour.  I
>>> suspect you are unaware of that.  If you want a different time range
>>> you will have to modify the start and end time pulldowns accordingly.
>>>
>>> Karl
>>>
>>> On Mon, Jan 30, 2012 at 10:34 AM, Silvia, Daniel [USA]
>>> <Si...@bah.com> wrote:
>>>> Hi Karl
>>>>
>>>> I am looking at the Simple History in the UI and there isn't much to see, unless I am not getting what I am supposed to. I see "Start Time", "Activity", "Identifier", "Bytes", and "Time", but I don't get anything for "Result Code" or "Result Description". I looked in the documentation and we should be getting something in those fields, I believe.
>>>>
>>>> Anyway, I will look through the mail list to see what I can find.
>>>>
>>>> Thanks for the help.
>>>>
>>>> Dan
>>>>
>>>> ________________________________________
>>>> From: Karl Wright [daddywri@gmail.com]
>>>> Sent: Monday, January 30, 2012 8:24 AM
>>>> To: Silvia, Daniel [USA]
>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>
>>>> So just to be clear, I'm NOT talking about the ManifoldCF logging.
>>>> For the Solr connector you probably won't need to turn that on; it's
>>>> pretty simple and you can look at the Simple History in the UI to see
>>>> what the request and response look like from Solr.  I was talking
>>>> instead about Solr logging - when you run the Solr Webapp, by default
>>>> all requests against the Extracting Update Handler are logged to
>>>> standard error, so you will see them appear in the process window in
>>>> which Solr is running.
>>>>
>>>> My suggestion to you is to first have a look at the Simple History for
>>>> the job you are trying to run.  If you are getting back 500 errors
>>>> from Solr, that means you have not set up Solr properly to work with
>>>> ManifoldCF.  In recent versions of Solr, the example works fine out of
>>>> the box, but when you try to deploy any other way you are often
>>>> missing the jar that contains the extracting update handler, so of
>>>> course nothing works.  Several people on the connectors-user list have
>>>> run into this and if you search the list (go to the ManifoldCF site
>>>> and click through to the mailing list page and there are links at the
>>>> bottom for this purpose) you will find posts that describe exactly
>>>> what is wrong and how to fix it.
>>>>
>>>> Hope this helps.
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Sun, Jan 29, 2012 at 2:30 PM, Silvia, Daniel [USA]
>>>> <Si...@bah.com> wrote:
>>>>> Yes, but for some reason the logging isn't coming through. The logging is set to INFO, and I will have to change the logging level to DEBUG.
>>>>>
>>>>> Thanks again for your help.
>>>>>
>>>>>
>>>>> ________________________________________
>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>> Sent: Friday, January 27, 2012 5:06 PM
>>>>> To: Silvia, Daniel [USA]
>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>
>>>>> Actually, the best thing for debugging the Solr connection is looking
>>>>> at standard-output on the Solr instance.  You will see all the posts
>>>>> that are made and what the arguments were.  Also, this is the kind of
>>>>> question you'd get a lot of benefit from posting to the list.  The
>>>>> end-user documentation I pointed you at before describes some of this
>>>>> but the Solr connector has grown beyond the doc to some extent at this
>>>>> point.
>>>>>
>>>>> Karl
>>>>>
>>>>> On Fri, Jan 27, 2012 at 9:51 AM, Silvia, Daniel [USA]
>>>>> <Si...@bah.com> wrote:
>>>>>> Hi Karl
>>>>>>
>>>>>> Is there a log level other than wire-level debugging for viewing log statements about sending output to a Solr instance in the Jobs List/Creation section? We are having an issue getting content to Solr. Is there a document anywhere that defines the fields in the Jobs section for the Solr Field Mapping tab and the Paths and Metadata tabs?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>> ________________________________________
>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>> Sent: Thursday, January 26, 2012 10:44 AM
>>>>>> To: Silvia, Daniel [USA]
>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>
>>>>>> I am afraid I don't know the answer to that.  I'm sure it's infinitely
>>>>>> configurable but it's not clear what the SharePoint web services need
>>>>>> to do under the hood, so anything I tell you would be just a guess.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>> On Thu, Jan 26, 2012 at 10:43 AM, Silvia, Daniel [USA]
>>>>>> <Si...@bah.com> wrote:
>>>>>>> Hi Karl
>>>>>>>
>>>>>>> One more question. Do you know the minimum permissions needed to crawl the SharePoint instance and all sites under it? The individual who set my permissions set me up as the "site collection admin" for the topmost site. Is there a specific admin role, other than "Farm Admin", that the user crawling the SharePoint instance can be given?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> ________________________________________
>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>> Sent: Thursday, January 26, 2012 9:53 AM
>>>>>>> To: Silvia, Daniel [USA]
>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>
>>>>>>> Good news!  Please keep in touch; we'd like to hear how things work
>>>>>>> for you (it helps keep the software fresh ;-) ).
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>> On Thu, Jan 26, 2012 at 9:48 AM, Silvia, Daniel [USA]
>>>>>>> <Si...@bah.com> wrote:
>>>>>>>> Hey Karl
>>>>>>>>
>>>>>>>> (1) was the issue. When requesting access to the SharePoint instance I indicated that I needed to be able to crawl SharePoint, I guess the problem was on my end indicating that I also needed privileges to crawl the site.
>>>>>>>>
>>>>>>>> Anyway, thank you for your help. When I change the SharePoint version to v 3 I get a message indicating "Connection Working".
>>>>>>>>
>>>>>>>> Appreciate the help.
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>> ________________________________________
>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>> Sent: Thursday, January 26, 2012 9:19 AM
>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>
>>>>>>>> The error message "axisFault=Server, detail=Server was unable to
>>>>>>>> process request --> Requested Registry access is not allowed" is Axis
>>>>>>>> interpreting an error message from SharePoint.  What it is saying is
>>>>>>>> that the user you are trying to crawl with is unable to read the
>>>>>>>> SharePoint machine's registry but needs to.  There are two possible
>>>>>>>> causes for this:
>>>>>>>>
>>>>>>>> (1) The user you gave doesn't have enough permissions to crawl SharePoint
>>>>>>>> (2) When you installed the SharePoint MCPermissions plugin, you
>>>>>>>> installed it logged in as a user that did not have enough permissions
>>>>>>>> to do what it needs to do.
>>>>>>>>
>>>>>>>> You can tell the difference between the two by selecting "SharePoint
>>>>>>>> 2.0" in the SharePoint version pulldown.  If a connection saved in
>>>>>>>> this way says "Connection working", it means that the MCPermissions
>>>>>>>> plugin has the permission problem, not your user.
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>> On Thu, Jan 26, 2012 at 9:14 AM, Silvia, Daniel [USA]
>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>> Hi Karl
>>>>>>>>>
>>>>>>>>> When I try to use option (1) and don't put anything in the Site field, I get an error message "axisFault=Server, detail=Server was unable to process request --> Requested Registry access is not allowed", and when I put a "/" in the Site field I get a GUI error indicating that the Site field can't end with a "/".
>>>>>>>>>
>>>>>>>>> Anyway, do you have any ideas? Or maybe the SharePoint instance is not configured properly for us to crawl?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ________________________________________
>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>> Sent: Thursday, January 26, 2012 8:52 AM
>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>
>>>>>>>>> SharePoint has two kinds of site:
>>>>>>>>>
>>>>>>>>> (1) the root site, which can be reached by the path http://server:port
>>>>>>>>> (2) a number of sites under the 'virtual path', with URLs of the form:
>>>>>>>>>
>>>>>>>>> http://server:port/something/sitename
>>>>>>>>>
>>>>>>>>> The "something" is, by default, the string "sites", so
>>>>>>>>> http://server:port/sites/xyz might be the URL of one such virtual site.
>>>>>>>>>
>>>>>>>>> The form of the "site" field in the SharePoint connection for the
>>>>>>>>> first is either blank or "/" (can't remember which right now), and the
>>>>>>>>> form of the "site" field for the second is "/sites/xyz".  On no account
>>>>>>>>> does the connector expect to see default.aspx attached to that path,
>>>>>>>>> so you should not do this; it cannot work.
>>>>>>>>>
>>>>>>>>> FWIW, my recommendation to try setting the connection type to
>>>>>>>>> "SharePoint 2.0" was to rule out any possible installation issue with
>>>>>>>>> the ManifoldCF sharepoint plugin.  The connection check for 2.0 does
>>>>>>>>> not look for it; only the connection check for 3.0 does.
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>> On Thu, Jan 26, 2012 at 8:41 AM, Silvia, Daniel [USA]
>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>> Hey Karl
>>>>>>>>>>
>>>>>>>>>> I am also getting an "HTTP Error 401.2: Unauthorized: Access is denied due to server configuration" when setting the Site field to /default.aspx. Do most SharePoint instances have their URLs set to something like http://server:port/sites/...... instead of http://server:port/? When I use "/default.aspx", I see in the log files that ManifoldCF is trying to reach the Lists.asmx service at the URL http://server:port/default.aspx/_vti_bin/Lists.asmx, where nothing is found.
>>>>>>>>>>
>>>>>>>>>> As you can tell I am not much of a SharePoint user or installer.
>>>>>>>>>>
>>>>>>>>>> Also, I don't think the issue is with the connector in ManifoldCF, I am just trying to
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ________________________________________
>>>>>>>>>> From: Silvia, Daniel [USA]
>>>>>>>>>> Sent: Thursday, January 26, 2012 7:23 AM
>>>>>>>>>> To: Karl Wright
>>>>>>>>>> Subject: RE: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>
>>>>>>>>>> Hey Karl
>>>>>>>>>>
>>>>>>>>>> The issue I am having is that the SharePoint instance URL is something like http://server:port/default.aspx. If I don't put anything in the Site field I get a message indicating "Requested Registry Access is not allowed". I was putting "/default.aspx" as my Site field, which I believe may have been the issue. However, what do you put in your Site field when the site is the topmost site, as in http://server:port/default.aspx?
>>>>>>>>>>
>>>>>>>>>> I would love to send you the log messages, but I am working on a network which is not connected to the outside.
>>>>>>>>>>
>>>>>>>>>> Thanks for your help.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ________________________________________
>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>> Sent: Wednesday, January 25, 2012 6:12 PM
>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>
>>>>>>>>>> Daniel,
>>>>>>>>>>
>>>>>>>>>> FWIW, I can help you diagnose the issue, but to do so you really need
>>>>>>>>>> to give me some concrete data.  I'm happy to grovel over the whole
>>>>>>>>>> wire log if you feel you can send it to me; something that may not
>>>>>>>>>> seem important to you will likely stand out strongly to me.  I can,
>>>>>>>>>> for example, see whether you are getting back HTML because of an
>>>>>>>>>> authentication error.  And if you ARE getting back valid
>>>>>>>>>> SOAP, I would then be sure that something was wrong with the Axis
>>>>>>>>>> client configuration, and I could pursue that here with the data
>>>>>>>>>> provided.  The problem with software like SharePoint running on IIS is
>>>>>>>>>> that it can be configured a nearly infinite number of ways, so
>>>>>>>>>> diagnosis is more of an art than a science.  I strongly suspect that
>>>>>>>>>> you're laboring under a pretty straightforward misconception which is
>>>>>>>>>> likely blocking progress, rather than there being an issue with the
>>>>>>>>>> SharePoint connector itself.  But I can't tell that without more
>>>>>>>>>> detailed communication.
>>>>>>>>>>
>>>>>>>>>> Also, you mentioned that the Lists.asmx service was right where you
>>>>>>>>>> expected it to be.  Have you read the SharePoint Connector part of the
>>>>>>>>>> end-user documentation?  To wit:
>>>>>>>>>>
>>>>>>>>>> "Select the server protocol, and enter the server name and port, based
>>>>>>>>>> on what you recorded from the URL for your SharePoint site. For the
>>>>>>>>>> "Site path" field, type in the portion of the root site URL that
>>>>>>>>>> includes everything after the server and port, except for the final
>>>>>>>>>> "aspx" file. For example, if the SharePoint URL is
>>>>>>>>>> "http://myserver:81/sites/somewhere/index.asp", the site path would be
>>>>>>>>>> "/sites/somewhere"."  The Lists.asmx service in this example would be
>>>>>>>>>> expected to be found at
>>>>>>>>>> "http://myserver:81/sites/somewhere/_vti_bin/Lists.asmx".  And the URL
>>>>>>>>>> you would start with would be the URL you see in the browser when you
>>>>>>>>>> log into the SharePoint web client and go to the site you wish to
>>>>>>>>>> crawl.  Is this what you are doing?
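
The rule in the quoted documentation can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not code from ManifoldCF: the function names are hypothetical, and treating any final path segment that contains a dot as the page file to drop is an assumption.

```python
from urllib.parse import urlsplit
import posixpath


def split_sharepoint_url(page_url):
    """Split a browser URL into (protocol, server, port, site path).

    Assumption: a final path segment containing a dot (index.asp,
    default.aspx, ...) is the page file and is not part of the site path.
    """
    parts = urlsplit(page_url)
    path = parts.path
    if "." in posixpath.basename(path):
        path = posixpath.dirname(path)
    # A root site ends up with an empty site path, never a bare "/".
    site_path = path.rstrip("/")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.scheme, parts.hostname, port, site_path


def lists_service_url(page_url):
    """Build the URL where the Lists web service is expected to live."""
    scheme, host, port, site_path = split_sharepoint_url(page_url)
    return f"{scheme}://{host}:{port}{site_path}/_vti_bin/Lists.asmx"


print(split_sharepoint_url("http://myserver:81/sites/somewhere/index.asp"))
print(lists_service_url("http://myserver:81/sites/somewhere/index.asp"))
```

For a root site such as http://server:8080/default.aspx, this sketch yields an empty site path, and the Lists service URL is directly under the server and port.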
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks again,
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Jan 25, 2012 at 12:33 PM, Karl Wright <da...@gmail.com> wrote:
>>>>>>>>>>> The code that parses the SOAP response is Apache Axis.  This hasn't
>>>>>>>>>>> changed in several years.
>>>>>>>>>>>
>>>>>>>>>>> Can you answer the following questions:
>>>>>>>>>>>
>>>>>>>>>>> (1) When the SharePoint connector makes a request to SharePoint, is
>>>>>>>>>>> the response HTML, or is it XML?  Does it have an XML header which
>>>>>>>>>>> describes a Microsoft XML namespace?  It sure sounds like it is
>>>>>>>>>>> responding with HTML.  The SharePoint connector is expecting to
>>>>>>>>>>> communicate using SOAP.  Is the response valid SOAP?
>>>>>>>>>>>
>>>>>>>>>>> (2) What version of SharePoint are you trying to connect to?  Is it
>>>>>>>>>>> SharePoint 2007?  SharePoint 2010?
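
Question (1) can be answered mechanically once a response body has been copied out of the wire log. The following is a rough sketch, not part of ManifoldCF or Axis: the function name is hypothetical, and it only recognises a SOAP 1.1 envelope (an assumption about what the client expects).

```python
import xml.etree.ElementTree as ET

# Namespace of a SOAP 1.1 envelope.
SOAP_11_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def classify_response(body):
    """Return 'soap', 'html', or 'other' for a response body string."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # HTML pages are often not well-formed XML, so fall back to a
        # plain text check on the start of the body.
        head = body.lstrip()[:200].lower()
        if head.startswith("<!doctype html") or "<html" in head:
            return "html"
        return "other"
    if root.tag == "{%s}Envelope" % SOAP_11_NS:
        return "soap"
    if root.tag.lower().endswith("html"):
        return "html"
    return "other"


envelope = (
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    "<soap:Body/></soap:Envelope>"
)
print(classify_response(envelope))
print(classify_response("<html><body>Login required</body></html>"))
```

An "html" result corresponds to the "Bad envelope tag: HTML" symptom: the server answered with a page (for example a login or error page) rather than with SOAP.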
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jan 25, 2012 at 12:26 PM, Silvia, Daniel [USA]
>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>
>>>>>>>>>>>> I have added the specific log4j lines for Http Client wire logging and I restarted the ManifoldCF instance. I was also able to see the web service Lists.asmx through IE. When reviewing the log files I was able to see some of the content that resides in the SharePoint instance in the content coming back from the request. However, I am still seeing the error messages in the ManifoldCF GUI as well as in the log file indicating "Bad Envelope: HTML", "No service named ListsSoap is available" and "No service named http://schemas.microsoft.com/sharepoint/soap/GetListCollection is available".
>>>>>>>>>>>>
>>>>>>>>>>>> Could there be something going on with the way the services are being built on the client side?
>>>>>>>>>>>>
>>>>>>>>>>>> Appreciate your help.
>>>>>>>>>>>>
>>>>>>>>>>>> Dan
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>>> Sent: Tuesday, January 24, 2012 4:52 PM
>>>>>>>>>>>> To: Silvia, Daniel [USA]; connectors-user@incubator.apache.org
>>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>
>>>>>>>>>>>> I have not seen this exact problem before.
>>>>>>>>>>>>
>>>>>>>>>>>> The "Bad envelope tag: HTML" indicates that the SOAP request the
>>>>>>>>>>>> SharePoint connector is attempting to perform is, in fact, returning
>>>>>>>>>>>> an HTML response.  This usually indicates that the server or path
>>>>>>>>>>>> parameters you've used to set up the connection are not set correctly,
>>>>>>>>>>>> and SharePoint is not actually being engaged.
>>>>>>>>>>>>
>>>>>>>>>>>> But usually when that happens I don't recall a ConfigurationException
>>>>>>>>>>>> logged, unless it's what Axis does in response to the HTML.
>>>>>>>>>>>>
>>>>>>>>>>>> The best thing to do at this point is turn on Http Client wire
>>>>>>>>>>>> logging, restart ManifoldCF, and view the connection.  The log will
>>>>>>>>>>>> then contain a record of the exact SOAP requests and the responses,
>>>>>>>>>>>> and we can see what's wrong.  The technique is described here:
>>>>>>>>>>>>
>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/CONNECTORS/Debugging+Connections
>>>>>>>>>>>>
>>>>>>>>>>>> You can also confirm that the right SharePoint web services are
>>>>>>>>>>>> functioning on the machine in question by trying to access them
>>>>>>>>>>>> directly.  For the Lists web service, which is the one it sounds like
>>>>>>>>>>>> it was complaining about, try using IE (not Firefox etc because you
>>>>>>>>>>>> want NTLM support) to go to the url where you think the web service
>>>>>>>>>>>> lives.  This will be http: or https:, plus the server, plus the port,
>>>>>>>>>>>> plus the path, plus "_vti_bin/Lists.asmx".  You should see an
>>>>>>>>>>>> unequivocal SharePoint response.  For an example from the Microsoft
>>>>>>>>>>>> demo service, try http://www.wssdemo.com/_vti_bin/Lists.asmx.
>>>>>>>>>>>>
>>>>>>>>>>>> Please let me know how it goes, and cc the dev list (as I have) so a
>>>>>>>>>>>> record of what you're encountering can be made available to others.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jan 24, 2012 at 1:52 PM, Silvia, Daniel [USA]
>>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have downloaded the newest version of ManifoldCF, v0.4, and have run the necessary ant scripts to download dependencies and then built the entire project. I have also had the SharePoint web service MetaCarta.SharePoint.MCPermissionsService.wsp deployed on the SharePoint instance due to running version 3 of SharePoint (SharePoint 2007). When I try to create a Repository Connection and select "Save" I get a message on the ManifoldCF front end of "org.xml.sax.SAXException Bad envelope tag: HTML". When I look at the log file I see an error message "org.apache.axis.ConfigurationException: No service named ListsSoap is available".
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you tell me if you have seen this issue before and what may be causing this issue?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dan
>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>>>> Sent: Friday, January 20, 2012 7:31 AM
>>>>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>>
>>>>>>>>>>>>> In order for the SharePoint connector to build, you need to have the
>>>>>>>>>>>>> wsdls in place in the right area.  We cannot ship those because of
>>>>>>>>>>>>> potential copyright issues.  The easiest way to obtain the right
>>>>>>>>>>>>> dependencies is:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ant download-dependencies
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then, just build normally:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ant build
>>>>>>>>>>>>>
>>>>>>>>>>>>> This will only work for ManifoldCF-0.4-incubating, or trunk.
>>>>>>>>>>>>> 0.4-incubating is still in the process of being signed off by the
>>>>>>>>>>>>> incubator, but you can find the release candidate here:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://people.apache.org/~kwright
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jan 20, 2012 at 7:02 AM, Silvia, Daniel [USA]
>>>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I work with Matt Parker and we are in the process of developing a pipeline
>>>>>>>>>>>>>> that uses ManifoldCF at the beginning. I just subscribed to the
>>>>>>>>>>>>>> connectors-user-subscribe@incubator.apache.org
>>>>>>>>>>>>>> group yesterday and submitted an e-mail question to the group. Can you help
>>>>>>>>>>>>>> us with the below issue?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I downloaded MCF and started playing with the default setup under Jetty and
>>>>>>>>>>>>>> Derby. It starts up without any issue. I am trying to configure a SharePoint
>>>>>>>>>>>>>> connector, connecting to SharePoint Service 3. I have been following the
>>>>>>>>>>>>>> instructions and I am at the point of deploying the custom SharePoint web
>>>>>>>>>>>>>> service to the SharePoint instance. The instructions indicate that I should
>>>>>>>>>>>>>> get the web service from dist/sharepoint-integration after building MCF.
>>>>>>>>>>>>>> However, after looking through the entire directory structure, I am unable
>>>>>>>>>>>>>> to find the service to deploy.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can someone tell me where to find this service?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Daniel Silvia

RE: ManifoldCF's dist/shapoint-integration dir

Posted by "Silvia, Daniel [USA]" <Si...@bah.com>.
Thanks for replying. I will work on it and let you know what I find out.
________________________________________
From: Karl Wright [daddywri@gmail.com]
Sent: Thursday, February 23, 2012 8:17 PM
To: Silvia, Daniel [USA]
Cc: connectors-user@incubator.apache.org
Subject: Re: ManifoldCF's dist/shapoint-integration dir

Hi Daniel,

I have not personally tried ManifoldCF on JBoss, but since both Jetty
and Tomcat work without modification I would wonder if there is a
JBoss classloader option you might be setting incorrectly.  This is
likely because the web container specification is pretty
clear about the hierarchical order of resolution of classes for web
applications, and it is this characteristic which will determine
whether JDBC DriverManager registration works properly or not.  Jetty
has two possible settings, for instance - one that makes it conform to
the spec, and one that is useful for single-process deployments.

Perhaps other users on this list might have some hints?

Karl


On Thu, Feb 23, 2012 at 7:47 PM, Silvia, Daniel [USA]
<Si...@bah.com> wrote:
> Hi Karl
>
> I have been trying to configure ManifoldCF to run on JBoss. When I run ManifoldCF on JBoss the connection pool can't be created. Do we need to set the datasource through the web console of JBoss? I believe the code is in the DatabaseFactory.
>
> Thanks
> Dan
>
> ________________________________________
> From: Karl Wright [daddywri@gmail.com]
> Sent: Monday, February 13, 2012 10:10 AM
> To: Silvia, Daniel [USA]
> Cc: connectors-user@incubator.apache.org
> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>
> The SharePoint connector only looks at documents within libraries, and
> documents within folders in those libraries.  I don't know how
> SharePoint is structuring your Wiki content, though.  If it is
> individual documents within libraries, it should be accessible by the
> SharePoint Connector.  If it is some other construct, then it likely
> won't be found by that connector.
>
> The Simple History is going to list the URLs that the SharePoint
> connector fetches.  If you know the URL of a piece of Wiki content and
> that URL does not appear in the Simple History, it's not being
> fetched.  Similarly, if the URL of that piece of Wiki content has no
> library name in the path, it's not something the SharePoint Connector
> will be able to index.
>
> If the SharePoint connector is not going to do it for you, and your
> wiki content is being rendered in a manner that supports standard Wiki
> API calls, you can use the Wiki Connector to index it.  If that too
> isn't going to work, then we should analyze exactly what SharePoint is
> presenting with a view towards extending the SharePoint connector.
>
> Karl
>
> On Mon, Feb 13, 2012 at 9:51 AM, Silvia, Daniel [USA]
> <Si...@bah.com> wrote:
>> Hi Karl
>>
>> Does the SharePoint connector only pull files from the SharePoint instance, and not content like Wiki content? As mentioned in the previous e-mail I am able to see the XML content in the log file for the wikis with the element similar to <someWiki><someNameWiki_row>some other elements<WikiField>content.....</WikiField></someNameWiki_row></someWiki>. However, I do not see information in the Simple History Report pulling Wiki information or the .aspx pages. Does this report only produce information on files and not content pulled from SharePoint?
>>
>> I am just trying to figure out if I need to configure another connector to pull content from SharePoint other than the SharePoint connector.
>>
>>
>> Thanks
>>
>> Dan
>> ________________________________________
>> From: Karl Wright [daddywri@gmail.com]
>> Sent: Sunday, February 12, 2012 12:08 PM
>> To: Silvia, Daniel [USA]
>> Cc: connectors-user@incubator.apache.org
>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>
>> Hi Daniel,
>>
>> If you are seeing fetches in the Simple History that include the wiki
>> URLs you are trying to capture, the SharePoint job is likely correct.
>> Are you seeing "Document ingest" activities for the same documents?
>> If so, they are being sent to Solr, and you'd have to look into the
>> Solr configuration to figure out why they aren't being indexed.
>>
>> Thanks,
>>
>>
>> On Sun, Feb 12, 2012 at 11:37 AM, Silvia, Daniel [USA]
>> <Si...@bah.com> wrote:
>>> Hi Karl
>>>
>>> Quick question regarding SharePoint Wikis and ingesting them into Solr.
>>>
>>> I have been trying to get the Wikis, created in SharePoint, to be ingested into Solr. I am able to see the Wikis in the logging where the SharePoint Connector pulls everything from the site; however, I do not see the Wiki content in the Solr instance. When creating a job to run, do I need to indicate a path similar to "*Wiki*" for the entire site, or do I need to configure the Solr metadata in the job to capture the "WikiField" element in the XML being passed to the Solr connector?
>>>
>>> Thanks for your help.
>>>
>>> Dan
>>> ________________________________________
>>> From: Karl Wright [daddywri@gmail.com]
>>> Sent: Tuesday, January 31, 2012 10:52 AM
>>> To: Silvia, Daniel [USA]
>>> Cc: connectors-user@incubator.apache.org
>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>
>>> It's been a while since I've set up a SharePoint job but I think what
>>> you are missing is a file rule (instead of just a library rule).
>>> Here's what the end-user documentation says on the matter:
>>>
>>> "Each rule consists of a path, a rule type, and an action. The actions
>>> are "Include" and "Exclude". The rule type tells the connection what
>>> kind of SharePoint entity it is allowed to exactly match. For example,
>>> a "File" rule will only exactly match SharePoint paths that represent
>>> files - it cannot exactly match sites or libraries. The path itself is
>>> just a sequence of characters, where the "*" character has the special
>>> meaning of being able to match any number of any kind of characters,
>>> and the "?" character matches exactly one character of any kind.
>>>
>>> The rule matcher extends strict, exact matching by introducing a
>>> concept of implicit inclusion rules. If your rule action is "Include",
>>> and you specify (say) a "File" rule, the matcher presumes implicit
>>> inclusion rules for the corresponding site and library. So, if you
>>> create an "Include File" rule that matches (for example)
>>> "/MySite/MyLibrary/MyFile", there is an implied "Site Include" rule
>>> for "/MySite", and an implied "Library Include" rule for
>>> "/MySite/MyLibrary". Similarly, if you create a "Library Include"
>>> rule, there is an implied "Site Include" rule that corresponds to it.
>>> Note that these shortcuts only apply to "Include" rules - there are
>>> no corresponding implied "Exclude" rules."
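
The matching behaviour described in the quoted documentation can be illustrated with a short Python sketch. This is not the connector's actual implementation: the function names are hypothetical, and deriving the implied rules by trimming trailing path segments from the rule path is an assumption based only on the "/MySite/MyLibrary/MyFile" example above.

```python
import re


def rule_to_regex(path_pattern):
    """Translate a rule path into a regex: '*' matches any run of
    characters, '?' matches exactly one character, all else is literal."""
    out = []
    for ch in path_pattern:
        if ch == "*":
            out.append(".*")
        elif ch == "?":
            out.append(".")
        else:
            out.append(re.escape(ch))
    return re.compile("".join(out) + r"\Z", re.DOTALL)


def matches(path_pattern, path):
    """True if the whole path is matched by the rule path."""
    return rule_to_regex(path_pattern).match(path) is not None


def implied_includes(path_pattern, rule_type):
    """Return the (type, path) Include rules implied by one Include rule.

    Assumption: the implied paths are the rule path with one or two
    trailing segments removed.
    """
    segments = path_pattern.rstrip("/").split("/")
    if rule_type == "file":
        return [("library", "/".join(segments[:-1])),
                ("site", "/".join(segments[:-2]))]
    if rule_type == "library":
        return [("site", "/".join(segments[:-1]))]
    return []


print(matches("/Shared Documents/*", "/Shared Documents/report.doc"))
print(implied_includes("/MySite/MyLibrary/MyFile", "file"))
```

Under these assumptions, a file rule such as "/Shared Documents/*" both matches the files in that library and implies the library and (root) site includes, which a library-only rule does not do for files.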
>>>
>>> What this means is that you should probably be declaring file rules
>>> with "*" as the file name for each library, rather than a library
>>> rule.  You might want to just try this.  If you still have trouble,
>>> you can try setting the "org.apache.manifoldcf.connectors" property to
>>> "DEBUG" in the properties.xml file and restarting ManifoldCF before
>>> your next crawl.  The manifoldcf.log file will then have output
>>> describing the decisions the SharePoint connector made about each
>>> site, library, file, or folder it encountered.
>>>
>>> Thanks,
>>> Karl
>>>
>>> On Tue, Jan 31, 2012 at 10:27 AM, Silvia, Daniel [USA]
>>> <Si...@bah.com> wrote:
>>>> Hi Karl
>>>>
>>>> The Path Rules are :
>>>>
>>>> Path Match: /Shared Documents
>>>> Type: library
>>>> Action: include
>>>>
>>>> Path Match: /IDD/Shared Documents
>>>> Type: library
>>>> Action: include
>>>>
>>>> Path Match: /IDD/Documents
>>>> Type: library
>>>> Action: include
>>>>
>>>> Path Match: /manifoldcf/Shared Documents
>>>> Type: library
>>>> Action: include
>>>>
>>>> I hope this helps.
>>>>
>>>> I really appreciate your help.
>>>>
>>>>
>>>>
>>>> ________________________________________
>>>> From: Karl Wright [daddywri@gmail.com]
>>>> Sent: Tuesday, January 31, 2012 10:01 AM
>>>> To: Silvia, Daniel [USA]
>>>> Cc: connectors-user@incubator.apache.org
>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>
>>>> "When I select only the fetch activity, I don't see anything in the
>>>> events, when I select the Document Ingest activity, I don't see
>>>> anything in the events."
>>>>
>>>> So either you've already run the job and the documents were accessed
>>>> the first time (and won't be accessed again until they change), or the
>>>> problem is likely that your SharePoint Path Rules are not including
>>>> any documents.  It would be very helpful at this point to include a
>>>> screen shot of the job you've created.  Since you are not on the net,
>>>> perhaps you can jot down your SharePoint path rules for me to have a
>>>> look at, as they are displayed when you view the job.
>>>>
>>>> Thanks,
>>>> Karl
>>>>
>>>> On Tue, Jan 31, 2012 at 9:44 AM, Silvia, Daniel [USA]
>>>> <Si...@bah.com> wrote:
>>>>> Hi Karl
>>>>>
>>>>> Ok, I have created a new job and ran the job and went to the Simple History Report.
>>>>>
>>>>> I see the Events. If all the Activities in the Simple History Report (Document Deletion (SolrPipeline), Document Ingest (SolrPipeline), and Fetch) are selected, I see a start job and an end job event. When I get to the Simple History Report I can select the "Connection", but I don't have an option to select the Activities until I run the report first.
>>>>> When I select only the fetch activity, I don't see anything in the events, when I select the Document Ingest activity, I don't see anything in the events.
>>>>>
>>>>> My solr output connection has the following information:
>>>>> Protocol: http
>>>>> Server: "the server name"
>>>>> Port: 8080 (we are running Solr on JBoss, port 8080)
>>>>> Web Application Name: solr
>>>>> Core Name: collection1
>>>>> Update Handler: update/extract
>>>>> Remove Handler: /update
>>>>> Status Handler: /admin/ping
>>>>>
>>>>>
>>>>>
>>>>> ________________________________________
>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>> Sent: Tuesday, January 31, 2012 9:00 AM
>>>>> To: Silvia, Daniel [USA]; connectors-user@incubator.apache.org
>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>
>>>>> Ok, let's do one thing at a time.
>>>>>
>>>>> First:
>>>>>
>>>>> "For the Path tab where there are Path Rules, are these the paths we
>>>>> want ManifoldCF to follow? Each site, and each Library like Documents
>>>>> and Shared Documents. And in the Metadata tab, this is the tab where
>>>>> you indicate for each "Site" and "Library" you want to include
>>>>> specific metadata or include all metadata?"
>>>>>
>>>>> For SharePoint, there are Path Rules and Metadata Rules.  The Path
>>>>> Rules describe what documents you want to include or exclude.  The
>>>>> Metadata Rules describe what metadata you want to include or exclude.
>>>>> For right now I would ignore the Metadata Rules and just make sure you
>>>>> have Path Rules that mean that you have included documents.
>>>>>
>>>>> "As I run the report, I see "Documents", "Active", and "Processed"
>>>>> where the numbers change under the "Active" column as well as the
>>>>> "Document" and "Processed" column (these just get larger, where Active
>>>>> changes). "
>>>>>
>>>>> This "report" we actually call the Job Status screen.  The fact that
>>>>> the numbers get larger and the job doesn't just end indicates that you
>>>>> are successfully crawling your SharePoint, and you have set up the job
>>>>> to include at least some documents.  This is good news.  However, this
>>>>> is NOT the "Simple History" report I was alluding to earlier.  To get
>>>>> to that report, click on the "Simple History" link on the left-hand
>>>>> navigation area.  This report will show the events of your choice
>>>>> (default - ALL recorded events) over a given time window (default: the
>>>>> last hour).  If you've done this right you should at least see a "Job
>>>>> start" event.  The events you are most interested in are the "fetch"
>>>>> (which describes all attempts to fetch documents from SharePoint) and
>>>>> "document ingest", which describe attempts to get documents into Solr.
>>>>>  You can refresh the displayed events by clicking the "Go" button in
>>>>> the middle of the screen whenever you wish.
>>>>>
>>>>> I'd like you to delete your job, create it again, and start it.  Then,
>>>>> while it is running, I'd like you to go to the "Simple History"
>>>>> screen, and select the appropriate connection (your SharePoint
>>>>> repository connection), and click the "Go" button.  So as not to skip
>>>>> anything basic:
>>>>>
>>>>> (1) What event types do you see?
>>>>> (2) Are there "fetch" events?
>>>>> (3) Are there "document ingest" events?
>>>>>
>>>>> If you see no "fetch" events, that implies you have either not
>>>>> specified any documents to include in your job, OR your Solr
>>>>> connection is configured to reject too many document types so they are
>>>>> all getting filtered out.
>>>>>
>>>>> If you see "document ingest" events, but those have errors, it implies
>>>>> that the configuration of your Solr connection is incorrect and does
>>>>> not match the way your Solr is configured.  If you send me a specific
>>>>> error code and/or text I can help you figure out what is happening.
>>>>>
>>>>> If you see "document ingest" events with NO errors, but the Solr
>>>>> instance is not getting documents, you are describing an impossible
>>>>> situation.  While your Solr instance may not be configured to have the
>>>>> Extracting Update Handler active, or it may be at a different URL than
>>>>> what you pointed at, that would definitely yield errors or
>>>>> notifications in the Simple History.
>>>>>
>>>>> Please let me know what you actually see.
>>>>> Karl
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Jan 31, 2012 at 7:53 AM, Silvia, Daniel [USA]
>>>>> <Si...@bah.com> wrote:
>>>>>> Hi Karl
>>>>>>
>>>>>> I am trying to figure out why I can't see anything being indexed into our Solr index. I was looking at another post where you were working with "Martijn" and that individual was not able to see info getting into Solr. In the report that I have set up, I have included all metadata associated to each site, Shared Documents, and Documents. In the Solr Field Mapping, I am associating metadata fields that are indicated in the MetaData tab to fields that exist in our Solr index.
>>>>>>
>>>>>> For the Path tab where there are Path Rules, are these the paths we want ManifoldCF to follow? Each site, and each Library like Documents and Shared Documents. And in the Metadata tab, this is the tab where you indicate for each "Site" and "Library" you want to include specific metadata or include all metadata?
>>>>>>
>>>>>> As I run the report, I see "Documents", "Active", and "Processed" where the numbers change under the "Active" column as well as the "Document" and "Processed" column (these just get larger, where Active changes). While I was researching why I may not be seeing something over on the Solr side, I saw your communication with another individual indicating that I should see something like literal.xxx=yyy in the Solr log. This is an older post so there may be something else I should see. But the only thing I see when I look at the Solr log is "[ ] webapp=/solr path=/update/extract params={commit=true} status=0 QTime=0".
>>>>>>
>>>>>> Any ideas?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ________________________________________
>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>> Sent: Monday, January 30, 2012 10:40 AM
>>>>>> To: Silvia, Daniel [USA]
>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>
>>>>>> The default time range for the Simple History is the last hour.  I
>>>>>> suspect you are unaware of that.  If you want a different time range
>>>>>> you will have to modify the start and end time pulldowns accordingly.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>> On Mon, Jan 30, 2012 at 10:34 AM, Silvia, Daniel [USA]
>>>>>> <Si...@bah.com> wrote:
>>>>>>> Hi Karl
>>>>>>>
>>>>>>> I am looking at the Simple History in the UI and there isn't much to see, unless I am not getting what I am supposed to.  I see Start Time, Activity, Identifier, Bytes, and Time, but I don't get anything for Result Code or Result Description. I looked in the documentation and we should be getting something in those fields, I believe.
>>>>>>>
>>>>>>> Anyway, I will look through the mail list to see what I can find.
>>>>>>>
>>>>>>> Thanks for the help.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>> ________________________________________
>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>> Sent: Monday, January 30, 2012 8:24 AM
>>>>>>> To: Silvia, Daniel [USA]
>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>
>>>>>>> So just to be clear, I'm NOT talking about the ManifoldCF logging.
>>>>>>> For the Solr connector you probably won't need to turn that on; it's
>>>>>>> pretty simple and you can look at the Simple History in the UI to see
>>>>>>> what the request and response look like from Solr.  I was talking
>>>>>>> instead about Solr logging - when you run the Solr Webapp, by default
>>>>>>> all requests against the Extracting Update Handler are logged to
>>>>>>> standard error, so you will see them appear in the process window in
>>>>>>> which Solr is running.
>>>>>>>
>>>>>>> My suggestion to you is to first have a look at the Simple History for
>>>>>>> the job you are trying to run.  If you are getting back 500 errors
>>>>>>> from Solr, that means you have not set up Solr properly to work with
>>>>>>> ManifoldCF.  In recent versions of Solr, the example works fine out of
>>>>>>> the box, but when you try to deploy any other way you are often
>>>>>>> missing the jar that contains the extracting update handler, so of
>>>>>>> course nothing works.  Several people on the connectors-user list have
>>>>>>> run into this and if you search the list (go to the ManifoldCF site
>>>>>>> and click through to the mailing list page and there are links at the
>>>>>>> bottom for this purpose) you will find posts that describe exactly
>>>>>>> what is wrong and how to fix it.
>>>>>>>
>>>>>>> Hope this helps.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Jan 29, 2012 at 2:30 PM, Silvia, Daniel [USA]
>>>>>>> <Si...@bah.com> wrote:
>>>>>>>> Yeah, but for some reason the logging isn't coming through. The logging is set for INFO and I will have to change the logging level to DEBUG.
>>>>>>>>
>>>>>>>> Thanks again for your help.
>>>>>>>>
>>>>>>>>
>>>>>>>> ________________________________________
>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>> Sent: Friday, January 27, 2012 5:06 PM
>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>
>>>>>>>> Actually, the best thing for debugging the Solr connection is looking
>>>>>>>> at standard-output on the Solr instance.  You will see all the posts
>>>>>>>> that are made and what the arguments were.  Also, this is the kind of
>>>>>>>> question you'd get a lot of benefit from posting to the list.  The
>>>>>>>> end-user documentation I pointed you at before describes some of this
>>>>>>>> but the Solr connector has grown beyond the doc to some extent at this
>>>>>>>> point.
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>> On Fri, Jan 27, 2012 at 9:51 AM, Silvia, Daniel [USA]
>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>> Hi Karl
>>>>>>>>>
>>>>>>>>> Is there a log level other than wire-level debugging to view log statements for trying to send output to a Solr instance in the Jobs List/Creation section? We are having an issue getting content to Solr. Is there a document anywhere which defines the fields for the Jobs sections for the Solr Field Mapping tab and the Paths and MetaData tabs?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>> ________________________________________
>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>> Sent: Thursday, January 26, 2012 10:44 AM
>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>
>>>>>>>>> I am afraid I don't know the answer to that.  I'm sure it's infinitely
>>>>>>>>> configurable but it's not clear what the SharePoint web services need
>>>>>>>>> to do under the hood, so anything I tell you would be just a guess.
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>> On Thu, Jan 26, 2012 at 10:43 AM, Silvia, Daniel [USA]
>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>> Hi Karl
>>>>>>>>>>
>>>>>>>>>> One more question. Do you know the minimum permissions needed to crawl the SharePoint instance and all sites under the instance? The individual who set my permissions set me up as the "site collection admin" for the topmost site. Is there a specific admin role, other than "Farm Admin", that the user crawling the SharePoint instance should be given?
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>> ________________________________________
>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>> Sent: Thursday, January 26, 2012 9:53 AM
>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>
>>>>>>>>>> Good news!  Please keep in touch; we'd like to hear how things work
>>>>>>>>>> for you (it helps keep the software fresh ;-) ).
>>>>>>>>>>
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>> On Thu, Jan 26, 2012 at 9:48 AM, Silvia, Daniel [USA]
>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>> Hey Karl
>>>>>>>>>>>
>>>>>>>>>>> (1) was the issue. When requesting access to the SharePoint instance I indicated that I needed to be able to crawl SharePoint, I guess the problem was on my end indicating that I also needed privileges to crawl the site.
>>>>>>>>>>>
>>>>>>>>>>> Anyway, thank you for your help. When I change the SharePoint version to v 3 I get a message indicating "Connection Working".
>>>>>>>>>>>
>>>>>>>>>>> Appreciate the help.
>>>>>>>>>>>
>>>>>>>>>>> Dan
>>>>>>>>>>>
>>>>>>>>>>> ________________________________________
>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>> Sent: Thursday, January 26, 2012 9:19 AM
>>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>
>>>>>>>>>>> The error message "axisFault=Server, detail=Server was unable to
>>>>>>>>>>> process request --> Requested Registry access is not allowed" is Axis
>>>>>>>>>>> interpreting an error message from SharePoint.  What it is saying is
>>>>>>>>>>> that the user you are trying to crawl with is unable to read the
>>>>>>>>>>> SharePoint machine's registry but needs to.  There are two possible
>>>>>>>>>>> causes for this:
>>>>>>>>>>>
>>>>>>>>>>> (1) The user you gave doesn't have enough permissions to crawl SharePoint
>>>>>>>>>>> (2) When you installed the SharePoint MCPermissions plugin, you
>>>>>>>>>>> installed it logged in as a user that did not have enough permissions
>>>>>>>>>>> to do what it needs to do.
>>>>>>>>>>>
>>>>>>>>>>> You can tell the difference between the two by selecting "SharePoint
>>>>>>>>>>> 2.0" in the SharePoint version pulldown.  If a connection saved in
>>>>>>>>>>> this way says "Connection working", it means that the MCPermissions
>>>>>>>>>>> plugin has the permission problem, not your user.
>>>>>>>>>>>
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jan 26, 2012 at 9:14 AM, Silvia, Daniel [USA]
>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>
>>>>>>>>>>>> When I try to use option (1) and don't put anything in the Site field, I get an error message "axisFault=Server, detail=Server was unable to process request --> Requested Registry access is not allowed", and when I put a "/" in the Site field I get a GUI error indicating that the Site field can't end with a "/".
>>>>>>>>>>>>
>>>>>>>>>>>> Anyway, do you have any ideas? Or maybe the SharePoint instance is not configured properly for us to crawl?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>>> Sent: Thursday, January 26, 2012 8:52 AM
>>>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>
>>>>>>>>>>>> SharePoint has two kinds of site:
>>>>>>>>>>>>
>>>>>>>>>>>> (1) the root site, which can be reached by the path http://server:port
>>>>>>>>>>>> (2) a number of sites under the 'virtual path', with URLs of the form:
>>>>>>>>>>>>
>>>>>>>>>>>> http://server:port/something/sitename
>>>>>>>>>>>>
>>>>>>>>>>>> The "something" is, by default, the string "sites", so
>>>>>>>>>>>> http://server:port/sites/xyz might be the URL of one such virtual site.
>>>>>>>>>>>>
>>>>>>>>>>>> The form of the "site" field in the SharePoint connection for the
>>>>>>>>>>>> first is either blank or "/" (can't remember which right now), and the
>>>>>>>>>>>> form of the "site" field for the second is "/sites/xyz".  On no account
>>>>>>>>>>>> does the connector expect to see default.aspx attached to that path,
>>>>>>>>>>>> so you should not do this; it cannot work.
>>>>>>>>>>>>
>>>>>>>>>>>> FWIW, my recommendation to try setting the connection type to
>>>>>>>>>>>> "SharePoint 2.0" was to rule out any possible installation issue with
>>>>>>>>>>>> the ManifoldCF sharepoint plugin.  The connection check for 2.0 does
>>>>>>>>>>>> not look for it; only the connection check for 3.0 does.
>>>>>>>>>>>>
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jan 26, 2012 at 8:41 AM, Silvia, Daniel [USA]
>>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>>> Hey Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am also getting an "HTTP Error 401.2: Unauthorized: Access is denied due to server configuration" when setting the Site field to /default.aspx. Do most SharePoint instances have their URLs set to something like http://server:port/sites/...... instead of http://server:port/? When I use "/default.aspx" I see in the log files that ManifoldCF is trying to reach the Lists.asmx service at the URL http://server:port/default.aspx/_vti_bin/Lists.asmx, where nothing is found.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As you can tell I am not much of a SharePoint user or installer.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also, I don't think the issue is with the connector in ManifoldCF, I am just trying to
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>> From: Silvia, Daniel [USA]
>>>>>>>>>>>>> Sent: Thursday, January 26, 2012 7:23 AM
>>>>>>>>>>>>> To: Karl Wright
>>>>>>>>>>>>> Subject: RE: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hey Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>> The issue I am having is that the SharePoint instance URL is something like http://server:port/default.aspx. If I don't put anything in the Site field I get a message indicating "Requested Registry Access is not allowed". I was putting "/default.aspx" as my Site field, which I believe may have been the issue. However, what do you put in your Site field when the site is the topmost site, as in http://server:port/default.aspx?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would love to send you the log messages, but I am working on a network which is not connected to the outside.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>>>> Sent: Wednesday, January 25, 2012 6:12 PM
>>>>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>>
>>>>>>>>>>>>> Daniel,
>>>>>>>>>>>>>
>>>>>>>>>>>>> FWIW, I can help you diagnose the issue, but to do so you really need
>>>>>>>>>>>>> to give me some concrete data.  I'm happy to grovel over the whole
>>>>>>>>>>>>> wire log if you feel you can send it to me; something that may not
>>>>>>>>>>>>> seem important to you will likely stand out strongly to me.  I can,
>>>>>>>>>>>>> for example, see whether you are getting back HTML because of an
>>>>>>>>>>>>> authentication error.  And if you ARE getting back valid
>>>>>>>>>>>>> SOAP, I would then be sure that something was wrong with the Axis
>>>>>>>>>>>>> client configuration, and I could pursue that here with the data
>>>>>>>>>>>>> provided.  The problem with software like SharePoint running on IIS is
>>>>>>>>>>>>> that it can be configured a nearly infinite number of ways, so
>>>>>>>>>>>>> diagnosis is more of an art than a science.  I strongly suspect that
>>>>>>>>>>>>> you're laboring under a pretty straightforward misconception which is
>>>>>>>>>>>>> likely blocking progress, rather than there being an issue with the
>>>>>>>>>>>>> SharePoint connector itself.  But I can't tell that without more
>>>>>>>>>>>>> detailed communication.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also, you mentioned that the Lists.asmx service was right where you
>>>>>>>>>>>>> expected it to be.  Have you read the SharePoint Connector part of the
>>>>>>>>>>>>> end-user documentation?  To wit:
>>>>>>>>>>>>>
>>>>>>>>>>>>> "Select the server protocol, and enter the server name and port, based
>>>>>>>>>>>>> on what you recorded from the URL for your SharePoint site. For the
>>>>>>>>>>>>> "Site path" field, type in the portion of the root site URL that
>>>>>>>>>>>>> includes everything after the server and port, except for the final
>>>>>>>>>>>>> "aspx" file. For example, if the SharePoint URL is
>>>>>>>>>>>>> "http://myserver:81/sites/somewhere/index.asp", the site path would be
>>>>>>>>>>>>> "/sites/somewhere"."  The Lists.asmx service in this example would be
>>>>>>>>>>>>> expected to be found at
>>>>>>>>>>>>> "http://myserver:81/sites/somewhere/_vti_bin/Lists.asmx".  And the URL
>>>>>>>>>>>>> you would start with would be the URL you see in the browser when you
>>>>>>>>>>>>> log into the SharePoint web client and go to the site you wish to
>>>>>>>>>>>>> crawl.  Is this what you are doing?
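
The URL arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the rule quoted from the documentation, not the connector's own code: the function names are made up for this example, and the "last segment contains a dot" test for spotting the final page file is an assumption. A root site yields an empty site path here, which matches the blank Site field discussed elsewhere in this thread.

```python
from urllib.parse import urlsplit


def site_path_from_url(browser_url):
    """Derive the "Site path" field from the URL shown in the browser.

    Hypothetical helper: drops the final page file (e.g. default.aspx).
    Assumption: a last path segment containing "." is the page file.
    """
    segments = [s for s in urlsplit(browser_url).path.split("/") if s]
    if segments and "." in segments[-1]:
        segments.pop()
    # A root site has no remaining segments, so the site path is blank.
    return "/" + "/".join(segments) if segments else ""


def lists_service_url(protocol, server, port, site_path):
    """Build the URL where the Lists web service is expected to live."""
    site_path = site_path.rstrip("/")  # the site path must not end in "/"
    return f"{protocol}://{server}:{port}{site_path}/_vti_bin/Lists.asmx"


if __name__ == "__main__":
    path = site_path_from_url("http://myserver:81/sites/somewhere/index.asp")
    print(path)
    print(lists_service_url("http", "myserver", 81, path))
```

Opening the resulting URL in a browser that supports NTLM is then a quick way to confirm the service is where the connection settings say it is.
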
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks again,
>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Jan 25, 2012 at 12:33 PM, Karl Wright <da...@gmail.com> wrote:
>>>>>>>>>>>>>> The code that parses the SOAP response is Apache Axis.  This hasn't
>>>>>>>>>>>>>> changed in several years.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can you answer the following questions:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (1) When the SharePoint connector makes a request to SharePoint, is
>>>>>>>>>>>>>> the response HTML, or is it XML?  Does it have an XML header which
>>>>>>>>>>>>>> describes a Microsoft XML namespace?  It sure sounds like it is
>>>>>>>>>>>>>> responding with HTML.  The SharePoint connector is expecting to
>>>>>>>>>>>>>> communicate using SOAP.  Is the response valid SOAP?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (2) What version of SharePoint are you trying to connect to?  Is it
>>>>>>>>>>>>>> SharePoint 2007?  SharePoint 2010?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Jan 25, 2012 at 12:26 PM, Silvia, Daniel [USA]
>>>>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have added the specific log4j lines for Http Client wire logging and restarted the ManifoldCF instance. I was also able to see the web service Lists.asmx through IE. When reviewing the log files I was able to see some of the content that resides in the SharePoint instance in the content coming back from the request. However, I am still seeing the error messages in the ManifoldCF GUI as well as in the log file indicating "Bad Envelope: HTML", "No service named ListsSoap is available" and "No service named http://schemas.microsoft.com/sharepoint/soap/GetListCollection is available".
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Could there be something going on with the way the services are being built on the client side?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Appreciate your help.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>>>>>> Sent: Tuesday, January 24, 2012 4:52 PM
>>>>>>>>>>>>>>> To: Silvia, Daniel [USA]; connectors-user@incubator.apache.org
>>>>>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have not seen this exact problem before.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The "Bad envelope tag: HTML" indicates that the SOAP request the
>>>>>>>>>>>>>>> SharePoint connector is attempting to perform is, in fact, returning
>>>>>>>>>>>>>>> an HTML response.  This usually indicates that the server or path
>>>>>>>>>>>>>>> parameters you've used to set up the connection are not set correctly,
>>>>>>>>>>>>>>> and SharePoint is not actually being engaged.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> But usually when that happens I don't recall a ConfigurationException
>>>>>>>>>>>>>>> logged, unless it's what Axis does in response to the HTML.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The best thing to do at this point is turn on Http Client wire
>>>>>>>>>>>>>>> logging, restart ManifoldCF, and view the connection.  The log will
>>>>>>>>>>>>>>> then contain a record of the exact SOAP requests and the responses,
>>>>>>>>>>>>>>> and we can see what's wrong.  The technique is described here:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/CONNECTORS/Debugging+Connections
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You can also confirm that the right SharePoint web services are
>>>>>>>>>>>>>>> functioning on the machine in question by trying to access them
>>>>>>>>>>>>>>> directly.  For the Lists web service, which is the one it sounds like
>>>>>>>>>>>>>>> it was complaining about, try using IE (not Firefox etc because you
>>>>>>>>>>>>>>> want NTLM support) to go to the url where you think the web service
>>>>>>>>>>>>>>> lives.  This will be http: or https:, plus the server, plus the port,
>>>>>>>>>>>>>>> plus the path, plus "_vti_bin/Lists.asmx".  You should see an
>>>>>>>>>>>>>>> unequivocal SharePoint response.  For an example from the Microsoft
>>>>>>>>>>>>>>> demo service, try http://www.wssdemo.com/_vti_bin/Lists.asmx.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please let me know how it goes, and cc the dev list (as I have) so a
>>>>>>>>>>>>>>> record of what you're encountering can be made available to others.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Jan 24, 2012 at 1:52 PM, Silvia, Daniel [USA]
>>>>>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I have downloaded the newest version of ManifoldCF, v0.4, and have run the necessary ant scripts to download dependencies and then built the entire project. I have also had the SharePoint web service MetaCarta.SharePoint.MCPermissionsService.wsp deployed on the SharePoint instance, because we are running version 3 of SharePoint (SharePoint 2007). When I try to create a Repository Connection and select "Save" I get a message on the ManifoldCF front end of "org.xml.sax.SAXException Bad envelope tag: HTML". When I look at the log file I see an error message "org.apache.axis.ConfigurationException: No service named ListsSoap is available".
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Can you tell me if you have seen this issue before and what may be causing this issue?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>>>>>>> Sent: Friday, January 20, 2012 7:31 AM
>>>>>>>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In order for the SharePoint connector to build, you need to have the
>>>>>>>>>>>>>>>> wsdls in place in the right area.  We cannot ship those because of
>>>>>>>>>>>>>>>> potential copyright issues.  The easiest way to obtain the right
>>>>>>>>>>>>>>>> dependencies is:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ant download-dependencies
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Then, just build normally:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ant build
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This will only work for ManifoldCF-0.4-incubating, or trunk.
>>>>>>>>>>>>>>>> 0.4-incubating is still in the process of being signed off by the
>>>>>>>>>>>>>>>> incubator, but you can find the release candidate here:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> http://people.apache.org/~kwright
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Jan 20, 2012 at 7:02 AM, Silvia, Daniel [USA]
>>>>>>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I work with Matt Parker and we are in the process of developing a pipeline
>>>>>>>>>>>>>>>>> that uses ManifoldCF at the beginning. I just subscribed to the
>>>>>>>>>>>>>>>>> connectors-user-subscribe@incubator.apache.org
>>>>>>>>>>>>>>>>> group yesterday and submitted an e-mail question to the group. Can you help
>>>>>>>>>>>>>>>>> us with the below issue?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I downloaded MCF and started playing with the default setup under Jetty and
>>>>>>>>>>>>>>>>> Derby. It starts up without any issue. I am trying to configure a SharePoint
>>>>>>>>>>>>>>>>> connector, connecting to SharePoint Service 3. I have been following the
>>>>>>>>>>>>>>>>> instructions and I am at the point of deploying the custom SharePoint web
>>>>>>>>>>>>>>>>> service to the SharePoint instance. The instructions indicate that I should
>>>>>>>>>>>>>>>>> get the web service from dist/sharepoint-integration after building MCF.
>>>>>>>>>>>>>>>>> However, after looking through the entire directory structure, I am unable
>>>>>>>>>>>>>>>>> to find the service to deploy.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Can someone tell me where to find this service?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Daniel Silvia

Re: ManifoldCF's dist/shapoint-integration dir

Posted by Karl Wright <da...@gmail.com>.
Hi Daniel,

I have not personally tried ManifoldCF on JBoss, but since both Jetty
and Tomcat work without modification I would wonder if there is a
JBoss classloader option you might be setting incorrectly.  This seems
likely because the web container specification is pretty
clear about the hierarchical order of resolution of classes for web
applications, and it is this characteristic which will determine
whether JDBC DriverManager registration works properly or not.  Jetty
has two possible settings, for instance - one that makes it conform to
the spec, and one that is useful for single-process deployments.

Perhaps other users on this list might have some hints?

Karl


On Thu, Feb 23, 2012 at 7:47 PM, Silvia, Daniel [USA]
<Si...@bah.com> wrote:
> Hi Karl
>
> I have been trying to configure ManifoldCF to run on JBoss. When I run ManifoldCF on JBoss, the connection pool can't be created. Do we need to set the datasource through the JBoss web console? I believe the code is in the DatabaseFactory.
>
> Thanks
> Dan
>
> ________________________________________
> From: Karl Wright [daddywri@gmail.com]
> Sent: Monday, February 13, 2012 10:10 AM
> To: Silvia, Daniel [USA]
> Cc: connectors-user@incubator.apache.org
> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>
> The SharePoint connector only looks at documents within libraries, and
> documents within folders in those libraries.  I don't know how
> SharePoint is structuring your Wiki content, though.  If it is
> individual documents within libraries, it should be accessible by the
> SharePoint Connector.  If it is some other construct, then it likely
> won't be found by that connector.
>
> The Simple History is going to list the URLs that the SharePoint
> connector fetches.  If you know the URL of a piece of Wiki content and
> that URL does not appear in the Simple History, it's not being
> fetched.  Similarly, if the URL of that piece of Wiki content has no
> library name in the path, it's not something the SharePoint Connector
> will be able to index.
>
> If the SharePoint connector is not going to do it for you, and your
> wiki content is being rendered in a manner that supports standard Wiki
> API calls, you can use the Wiki Connector to index it.  If that too
> isn't going to work, then we should analyze exactly what SharePoint is
> presenting with a view towards extending the SharePoint connector.
>
> Karl
>
> On Mon, Feb 13, 2012 at 9:51 AM, Silvia, Daniel [USA]
> <Si...@bah.com> wrote:
>> Hi Karl
>>
>> Does the SharePoint connector only pull files from the SharePoint instance, and not content like Wiki content? As mentioned in the previous e-mail, I am able to see the XML content in the log file for the wikis, with elements similar to <someWiki><someNameWiki_row>some other elements<WikiField>content.....</WikiField></someNameWiki_row></someWiki>. However, I do not see information in the Simple History Report about pulling Wiki information or the .aspx pages. Does this report only produce information on files and not on content pulled from SharePoint?
>>
>> I am just trying to figure out if I need to configure another connector to pull content from SharePoint other than the SharePoint connector.
>>
>>
>> Thanks
>>
>> Dan
>> ________________________________________
>> From: Karl Wright [daddywri@gmail.com]
>> Sent: Sunday, February 12, 2012 12:08 PM
>> To: Silvia, Daniel [USA]
>> Cc: connectors-user@incubator.apache.org
>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>
>> Hi Daniel,
>>
>> If you are seeing fetches in the Simple History that include the wiki
>> URLs you are trying to capture, the SharePoint job is likely correct.
>> Are you seeing "Document ingest" activities for the same documents?
>> If so, they are being sent to Solr, and you'd have to look into the
>> Solr configuration to figure out why they aren't being indexed.
>>
>> Thanks,
>>
>>
>> On Sun, Feb 12, 2012 at 11:37 AM, Silvia, Daniel [USA]
>> <Si...@bah.com> wrote:
>>> Hi Karl
>>>
>>> Quick question regarding SharePoint Wikis and ingesting them into Solr.
>>>
>>> I have been trying to get the Wikis, created in SharePoint, to be ingested into Solr. I am able to see the Wikis in the logging where the SharePoint Connector pulls everything from the site; however, I do not see the Wiki content in the Solr instance. When creating a job to run, do I need to indicate a path similar to "*Wiki*" for the entire site, or do I need to configure the Solr metadata in the job to capture the "WikiField" element in the XML being passed to the Solr connector?
>>>
>>> Thanks for your help.
>>>
>>> Dan
>>> ________________________________________
>>> From: Karl Wright [daddywri@gmail.com]
>>> Sent: Tuesday, January 31, 2012 10:52 AM
>>> To: Silvia, Daniel [USA]
>>> Cc: connectors-user@incubator.apache.org
>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>
>>> It's been a while since I've set up a SharePoint job but I think what
>>> you are missing is a file rule (instead of just a library rule).
>>> Here's what the end-user documentation says on the matter:
>>>
>>> "Each rule consists of a path, a rule type, and an action. The actions
>>> are "Include" and "Exclude". The rule type tells the connection what
>>> kind of SharePoint entity it is allowed to exactly match. For example,
>>> a "File" rule will only exactly match SharePoint paths that represent
>>> files - it cannot exactly match sites or libraries. The path itself is
>>> just a sequence of characters, where the "*" character has the special
>>> meaning of being able to match any number of any kind of characters,
>>> and the "?" character matches exactly one character of any kind.
>>>
>>> The rule matcher extends strict, exact matching by introducing a
>>> concept of implicit inclusion rules. If your rule action is "Include",
>>> and you specify (say) a "File" rule, the matcher presumes implicit
>>> inclusion rules for the corresponding site and library. So, if you
>>> create an "Include File" rule that matches (for example)
>>> "/MySite/MyLibrary/MyFile", there is an implied "Site Include" rule
>>> for "/MySite", and an implied "Library Include" rule for
>>> "/MySite/MyLibrary". Similarly, if you create a "Library Include"
>>> rule, there is an implied "Site Include" rule that corresponds to it.
>>> Note that these shortcuts only apply to "Include" rules - there are
>>> no corresponding implied "Exclude" rules."
>>>
>>> What this means is that you should probably be declaring file rules
>>> with "*" as the file name for each library, rather than a library
>>> rule.  You might want to just try this.  If you still have trouble,
>>> you can try setting the "org.apache.manifoldcf.connectors" property to
>>> "DEBUG" in the properties.xml file and restarting ManifoldCF before
>>> your next crawl.  The manifoldcf.log file will then have output
>>> describing the decisions the SharePoint connector made about each
>>> site, library, file, or folder it encountered.
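
To make the wildcard part of the quoted rule description concrete, here is a minimal Python sketch of "*" and "?" matching as the documentation describes it. It is not the connector's implementation: the function names are invented for this example, and it ignores rule types and implied inclusion rules entirely. It only shows why a path pattern naming a library does not exactly match a file inside that library, while the same pattern with "/*" appended does.

```python
import re


def rule_to_regex(pattern):
    """Translate a path rule pattern into a compiled regular expression.

    "*" matches any number of any characters, "?" matches exactly one
    character, and everything else is taken literally.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern, path):
    """Return True only if the pattern matches the whole path exactly."""
    return rule_to_regex(pattern).fullmatch(path) is not None


if __name__ == "__main__":
    # Hypothetical file name under one of the libraries from this thread.
    doc = "/Shared Documents/report.doc"
    print(matches("/Shared Documents", doc))    # library path alone
    print(matches("/Shared Documents/*", doc))  # file pattern with "*"
```
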
>>>
>>> Thanks,
>>> Karl
>>>
>>> On Tue, Jan 31, 2012 at 10:27 AM, Silvia, Daniel [USA]
>>> <Si...@bah.com> wrote:
>>>> Hi Karl
>>>>
>>>> The Path Rules are :
>>>>
>>>> Path Match: /Shared Documents
>>>> Type: library
>>>> Action: include
>>>>
>>>> Path Match: /IDD/Shared Documents
>>>> Type: library
>>>> Action: include
>>>>
>>>> Path Match: /IDD/Documents
>>>> Type: library
>>>> Action: include
>>>>
>>>> Path Match: /manifoldcf/Shared Documents
>>>> Type: library
>>>> Action: include
>>>>
>>>> I hope this helps.
>>>>
>>>> I really appreciate your help.
>>>>
>>>>
>>>>
>>>> ________________________________________
>>>> From: Karl Wright [daddywri@gmail.com]
>>>> Sent: Tuesday, January 31, 2012 10:01 AM
>>>> To: Silvia, Daniel [USA]
>>>> Cc: connectors-user@incubator.apache.org
>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>
>>>> "When I select only the fetch activity, I don't see anything in the
>>>> events, when I select the Document Ingest activity, I don't see
>>>> anything in the events."
>>>>
>>>> So either you've already run the job and the documents were accessed
>>>> the first time (and won't be accessed again until they change), or the
>>>> problem is likely that your SharePoint Path Rules are not including
>>>> any documents.  It would be very helpful at this point to include a
>>>> screen shot of the job you've created.  Since you are not on the net,
>>>> perhaps you can jot down your SharePoint path rules for me to have a
>>>> look at, as they are displayed when you view the job.
>>>>
>>>> Thanks,
>>>> Karl
>>>>
>>>> On Tue, Jan 31, 2012 at 9:44 AM, Silvia, Daniel [USA]
>>>> <Si...@bah.com> wrote:
>>>>> Hi Karl
>>>>>
>>>>> Ok, I have created a new job and ran the job and went to the Simple History Report.
>>>>>
>>>>> I see the events. If all the Activities in the Simple History Report (Document Deletion (SolrPipeline), Document Ingest (SolrPipeline), and Fetch) are selected, I see a start job and an end job event. When I get to the Simple History Report I can select the "Connection", but I don't have an option to select the Activities until I run the report first.
>>>>> When I select only the fetch activity, I don't see anything in the events, when I select the Document Ingest activity, I don't see anything in the events.
>>>>>
>>>>> My solr output connection has the following information:
>>>>> Protocol: http
>>>>> Server: "the server name"
>>>>> Port: 8080 (we are running Solr on JBoss, port 8080)
>>>>> Web Application Name: solr
>>>>> Core Name: collection1
>>>>> Update Handler: update/extract
>>>>> Remove Handler: /update
>>>>> Status Handler: /admin/ping
>>>>>
>>>>>
>>>>>
>>>>> ________________________________________
>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>> Sent: Tuesday, January 31, 2012 9:00 AM
>>>>> To: Silvia, Daniel [USA]; connectors-user@incubator.apache.org
>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>
>>>>> Ok, let's do one thing at a time.
>>>>>
>>>>> First:
>>>>>
>>>>> "For the Path tab where there are Path Rules, are these the paths we
>>>>> want ManifoldCF to follow? Each site, and each Library like Documents
>>>>> and Shared Documents. And in the Metadata tab, this is the tab where
>>>>> you indicate for each "Site" and "Library" you want to include
>>>>> specific metadata or include all metadata?"
>>>>>
>>>>> For SharePoint, there are Path Rules and Metadata Rules.  The Path
>>>>> Rules describe what documents you want to include or exclude.  The
>>>>> Metadata Rules describe what metadata you want to include or exclude.
>>>>> For right now I would ignore the Metadata Rules and just make sure you
>>>>> have Path Rules that mean that you have included documents.
>>>>>
>>>>> "As I run the report, I see "Documents", "Active", and "Processed"
>>>>> where the numbers change under the "Active" column as well as the
>>>>> "Document" and "Processed" column (these just get larger, where Active
>>>>> changes). "
>>>>>
>>>>> This "report" we actually call the Job Status screen.  The fact that
>>>>> the numbers get larger and the job doesn't just end indicates that you
>>>>> are successfully crawling your SharePoint, and you have set up the job
>>>>> to include at least some documents.  This is good news.  However, this
>>>>> is NOT the "Simple History" report I was alluding to earlier.  To get
>>>>> to that report, click on the "Simple History" link on the left-hand
>>>>> navigation area.  This report will show the events of your choice
>>>>> (default - ALL recorded events) over a given time window (default: the
>>>>> last hour).  If you've done this right you should at least see a "Job
>>>>> start" event.  The events you are most interested in are "fetch"
>>>>> (which describes all attempts to fetch documents from SharePoint) and
>>>>> "document ingest" (which describes attempts to get documents into Solr).
>>>>>  You can refresh the displayed events by clicking the "Go" button in
>>>>> the middle of the screen whenever you wish.
>>>>>
>>>>> I'd like you to delete your job, create it again, and start it.  Then,
>>>>> while it is running, I'd like you to go to the "Simple History"
>>>>> screen, and select the appropriate connection (your SharePoint
>>>>> repository connection), and click the "Go" button.  So as not to skip
>>>>> anything basic:
>>>>>
>>>>> (1) What event types do you see?
>>>>> (2) Are there "fetch" events?
>>>>> (3) Are there "document ingest" events?
>>>>>
>>>>> If you see no "fetch" events, that implies you have either not
>>>>> specified any documents to include in your job, OR your Solr
>>>>> connection is configured to reject too many document types so they are
>>>>> all getting filtered out.
>>>>>
>>>>> If you see "document ingest" events, but those have errors, it implies
>>>>> that the configuration of your Solr connection is incorrect and does
>>>>> not match the way your Solr is configured.  If you send me a specific
>>>>> error code and/or text I can help you figure out what is happening.
>>>>>
>>>>> If you see "document ingest" events with NO errors, but the Solr
>>>>> instance is not getting documents, you are describing an impossible
>>>>> situation.  While your Solr instance may not be configured to have the
>>>>> Extracting Update Handler active, or it may be at a different URL than
>>>>> what you pointed at, that would definitely yield errors or
>>>>> notifications in the Simple History.
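
The three cases above amount to a small decision procedure, which can be written down as a Python sketch. This is just a restatement of the reasoning in this message, not anything ManifoldCF provides: the function name and the lower-case activity labels are placeholders chosen for the example, and the "fetch but no ingest" branch is simply marked as not covered, because the message does not discuss it.

```python
def diagnose(activities_seen, ingest_errors_seen):
    """Map what the Simple History shows to the area worth checking next.

    activities_seen: set of activity labels observed (placeholder names).
    ingest_errors_seen: True if any "document ingest" event had an error.
    """
    if "fetch" not in activities_seen:
        return ("no fetches: check that the path rules include documents, "
                "or that the Solr connection is not filtering them all out")
    if "document ingest" not in activities_seen:
        return "fetches but no ingest attempts: not covered by the cases above"
    if ingest_errors_seen:
        return ("ingest errors: the Solr connection settings do not match "
                "the way Solr is configured")
    return "ingest without errors: documents should be reaching Solr"


if __name__ == "__main__":
    # Only job start/end events were visible in the report.
    print(diagnose({"job start", "job end"}, False))
```
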
>>>>>
>>>>> Please let me know what you actually see.
>>>>> Karl
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Jan 31, 2012 at 7:53 AM, Silvia, Daniel [USA]
>>>>> <Si...@bah.com> wrote:
>>>>>> Hi Karl
>>>>>>
>>>>>> I am trying to figure out why I can't see anything being indexed into our Solr index. I was looking at another post where you were working with "Martijn", and that individual was not able to see info getting into Solr. In the report that I have set up, I have included all metadata associated with each site, Shared Documents, and Documents. In the Solr Field Mapping, I am associating metadata fields that are indicated in the Metadata tab with fields that exist in our Solr index.
>>>>>>
>>>>>> For the Path tab where there are Path Rules, are these the paths we want ManifoldCF to follow? Each site, and each Library like Documents and Shared Documents. And in the Metadata tab, this is the tab where you indicate for each "Site" and "Library" you want to include specific metadata or include all metadata?
>>>>>>
>>>>>> As I run the report, I see "Documents", "Active", and "Processed", where the numbers change under the "Active" column as well as the "Documents" and "Processed" columns (these just get larger, where Active changes). While I was researching why I may not be seeing something on the Solr side, I saw your communication with another individual indicating that I should see something like literal.xxx=yyy in the Solr log. This is an older post, so there may be something else I should see. But the only thing I see when I look at the Solr log is "[ ] webapp=/solr path=/update/extract params={commit=true} status=0 QTime=0".
>>>>>>
>>>>>> Any ideas?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ________________________________________
>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>> Sent: Monday, January 30, 2012 10:40 AM
>>>>>> To: Silvia, Daniel [USA]
>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>
>>>>>> The default time range for the Simple History is the last hour.  I
>>>>>> suspect you are unaware of that.  If you want a different time range
>>>>>> you will have to modify the start and end time pulldowns accordingly.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>> On Mon, Jan 30, 2012 at 10:34 AM, Silvia, Daniel [USA]
>>>>>> <Si...@bah.com> wrote:
>>>>>>> Hi Karl
>>>>>>>
>>>>>>> I am looking at the Simple History in the UI and there isn't much to see, unless I am not getting what I am supposed to. I see "Start Time", "Activity", "Identifier", "Bytes", and "Time", but I don't get anything for Result Code or Result Description. I looked in the documentation and we should be getting something in those fields, I believe.
>>>>>>>
>>>>>>> Anyway, I will look through the mail list to see what I can find.
>>>>>>>
>>>>>>> Thanks for the help.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>> ________________________________________
>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>> Sent: Monday, January 30, 2012 8:24 AM
>>>>>>> To: Silvia, Daniel [USA]
>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>
>>>>>>> So just to be clear, I'm NOT talking about the ManifoldCF logging.
>>>>>>> For the Solr connector you probably won't need to turn that on; it's
>>>>>>> pretty simple and you can look at the Simple History in the UI to see
>>>>>>> what the request and response look like from Solr.  I was talking
>>>>>>> instead about Solr logging - when you run the Solr Webapp, by default
>>>>>>> all requests against the Extracting Update Handler are logged to
>>>>>>> standard error, so you will see them appear in the process window in
>>>>>>> which Solr is running.
>>>>>>>
>>>>>>> My suggestion to you is to first have a look at the Simple History for
>>>>>>> the job you are trying to run.  If you are getting back 500 errors
>>>>>>> from Solr, that means you have not set up Solr properly to work with
>>>>>>> ManifoldCF.  In recent versions of Solr, the example works fine out of
>>>>>>> the box, but when you try to deploy any other way you are often
>>>>>>> missing the jar that contains the extracting update handler, so of
>>>>>>> course nothing works.  Several people on the connectors-user list have
>>>>>>> run into this and if you search the list (go to the ManifoldCF site
>>>>>>> and click through to the mailing list page and there are links at the
>>>>>>> bottom for this purpose) you will find posts that describe exactly
>>>>>>> what is wrong and how to fix it.
>>>>>>>
>>>>>>> Hope this helps.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Jan 29, 2012 at 2:30 PM, Silvia, Daniel [USA]
>>>>>>> <Si...@bah.com> wrote:
>>>>>>>> Yes, but for some reason the logging isn't coming through. The logging level is set to INFO, so I will have to change it to DEBUG.
>>>>>>>>
>>>>>>>> Thanks again for your help.
>>>>>>>>
>>>>>>>>
>>>>>>>> ________________________________________
>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>> Sent: Friday, January 27, 2012 5:06 PM
>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>
>>>>>>>> Actually, the best thing for debugging the Solr connection is looking
>>>>>>>> at standard-output on the Solr instance.  You will see all the posts
>>>>>>>> that are made and what the arguments were.  Also, this is the kind of
>>>>>>>> question you'd get a lot of benefit from posting to the list.  The
>>>>>>>> end-user documentation I pointed you at before describes some of this
>>>>>>>> but the Solr connector has grown beyond the doc to some extent at this
>>>>>>>> point.
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>> On Fri, Jan 27, 2012 at 9:51 AM, Silvia, Daniel [USA]
>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>> Hi Karl
>>>>>>>>>
>>>>>>>>> Is there a log level other than wire-level debugging for viewing log statements when trying to send output to a Solr instance from the Jobs List/Creation section? We are having an issue getting content to Solr. Is there a document anywhere that defines the fields in the Jobs section for the Solr Field Mapping tab and the Paths and Metadata tabs?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>> ________________________________________
>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>> Sent: Thursday, January 26, 2012 10:44 AM
>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>
>>>>>>>>> I am afraid I don't know the answer to that.  I'm sure it's infinitely
>>>>>>>>> configurable but it's not clear what the SharePoint web services need
>>>>>>>>> to do under the hood, so anything I tell you would be just a guess.
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>> On Thu, Jan 26, 2012 at 10:43 AM, Silvia, Daniel [USA]
>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>> Hi Karl
>>>>>>>>>>
>>>>>>>>>> One more question. Do you know the minimum permissions needed to crawl the SharePoint instance and all sites under the instance? The individual who set my permissions set me up as the "site collection admin" for the topmost site. Is there a specific admin role, other than "Farm Admin", that can be assigned to the user crawling the SharePoint instance?
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>> ________________________________________
>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>> Sent: Thursday, January 26, 2012 9:53 AM
>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>
>>>>>>>>>> Good news!  Please keep in touch; we'd like to hear how things work
>>>>>>>>>> for you (it helps keep the software fresh ;-) ).
>>>>>>>>>>
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>> On Thu, Jan 26, 2012 at 9:48 AM, Silvia, Daniel [USA]
>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>> Hey Karl
>>>>>>>>>>>
>>>>>>>>>>> (1) was the issue. When requesting access to the SharePoint instance I indicated that I needed to be able to crawl SharePoint, I guess the problem was on my end indicating that I also needed privileges to crawl the site.
>>>>>>>>>>>
>>>>>>>>>>> Anyway, thank you for your help. When I change the SharePoint version to v 3 I get a message indicating "Connection Working".
>>>>>>>>>>>
>>>>>>>>>>> Appreciate the help.
>>>>>>>>>>>
>>>>>>>>>>> Dan
>>>>>>>>>>>
>>>>>>>>>>> ________________________________________
>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>> Sent: Thursday, January 26, 2012 9:19 AM
>>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>
>>>>>>>>>>> The error message "axisFault=Server, detail=Server was unable to
>>>>>>>>>>> process request --> Requested Registry access is not allowed" is Axis
>>>>>>>>>>> interpreting an error message from SharePoint.  What it is saying is
>>>>>>>>>>> that the user you are trying to crawl with is unable to read the
>>>>>>>>>>> SharePoint machine's registry but needs to.  There are two possible
>>>>>>>>>>> causes for this:
>>>>>>>>>>>
>>>>>>>>>>> (1) The user you gave doesn't have enough permissions to crawl SharePoint
>>>>>>>>>>> (2) When you installed the SharePoint MCPermissions plugin, you
>>>>>>>>>>> installed it logged in as a user that did not have enough permissions
>>>>>>>>>>> to do what it needs to do.
>>>>>>>>>>>
>>>>>>>>>>> You can tell the difference between the two by selecting "SharePoint
>>>>>>>>>>> 2.0" in the SharePoint version pulldown.  If a connection saved in
>>>>>>>>>>> this way says "Connection working", it means that the MCPermissions
>>>>>>>>>>> plugin has the permission problem, not your user.
>>>>>>>>>>>
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jan 26, 2012 at 9:14 AM, Silvia, Daniel [USA]
>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>
>>>>>>>>>>>> When I try to use option (1) and don't put anything in the Site field, I get an error message "axisFault=Server, detail=Server was unable to process request --> Requested Registry access is not allowed", and when I put a "/" in the Site field I get a GUI error indicating that the Site field can't end with a "/".
>>>>>>>>>>>>
>>>>>>>>>>>> Anyway, do you have any ideas? Or maybe the SharePoint instance is not configured properly for us to crawl?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>>> Sent: Thursday, January 26, 2012 8:52 AM
>>>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>
>>>>>>>>>>>> SharePoint has two kinds of site:
>>>>>>>>>>>>
>>>>>>>>>>>> (1) the root site, which can be reached by the path http://server:port
>>>>>>>>>>>> (2) a number of sites under the 'virtual path', with URLs of the form:
>>>>>>>>>>>>
>>>>>>>>>>>> http://server:port/something/sitename
>>>>>>>>>>>>
>>>>>>>>>>>> The "something" is, by default, the string "site", so
>>>>>>>>>>>> http://server:port/site/xyz might be the URL of one such virtual site.
>>>>>>>>>>>>
>>>>>>>>>>>> The form of the "site" field in the SharePoint connection for the
>>>>>>>>>>>> first is either blank or "/" (can't remember which right now), and the
>>>>>>>>>>>> form of the "site" field for the second is "/site/xyz".  On no account
>>>>>>>>>>>> does the connector expect to see default.aspx attached to that path,
>>>>>>>>>>>> so you should not do this; it cannot work.
>>>>>>>>>>>>
>>>>>>>>>>>> FWIW, my recommendation to try setting the connection type to
>>>>>>>>>>>> "SharePoint 2.0" was to rule out any possible installation issue with
>>>>>>>>>>>> the ManifoldCF sharepoint plugin.  The connection check for 2.0 does
>>>>>>>>>>>> not look for it; only the connection check for 3.0 does.
>>>>>>>>>>>>
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jan 26, 2012 at 8:41 AM, Silvia, Daniel [USA]
>>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>>> Hey Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am also getting an "HTTP Error 401.2: Unauthorized: Access is denied due to server configuration" when setting the Site field to /default.aspx. Do most SharePoint instances have the URLs set to something like http://server:port/sites/...... instead of http://server:port/? When I use "/default.aspx" I see in the log files that ManifoldCF is trying to go to the Lists.asmx service with the URL http://server:port/default.aspx/_vti_bin/Lists.asmx, where nothing is found.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As you can tell I am not much of a SharePoint user or installer.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also, I don't think the issue is with the connector in ManifoldCF, I am just trying to
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>> From: Silvia, Daniel [USA]
>>>>>>>>>>>>> Sent: Thursday, January 26, 2012 7:23 AM
>>>>>>>>>>>>> To: Karl Wright
>>>>>>>>>>>>> Subject: RE: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hey Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>> The issue I am having is that the SharePoint instance URL is something like http://server:port/default.aspx. If I don't put anything in the Site field I get a message indicating "Requested Registry Access is not allowed". I was putting "/default.aspx" as my Site field, which I believe may have been the issue. However, what do you put in your Site field when the site is the topmost site, as in http://server:port/default.aspx?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would love to send you the log messages, but I am working on a network which is not connected to the outside.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>>>> Sent: Wednesday, January 25, 2012 6:12 PM
>>>>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>>
>>>>>>>>>>>>> Daniel,
>>>>>>>>>>>>>
>>>>>>>>>>>>> FWIW, I can help you diagnose the issue, but to do so you really need
>>>>>>>>>>>>> to give me some concrete data.  I'm happy to grovel over the whole
>>>>>>>>>>>>> wire log if you feel you can send it to me; something that may not
>>>>>>>>>>>>> seem important to you will likely stand out strongly to me.  I can,
>>>>>>>>>>>>> for example, see whether you are getting back HTML because of an
>>>>>>>>>>>>> authentication error.  And if you ARE getting back valid
>>>>>>>>>>>>> SOAP, I would then be sure that something was wrong with the Axis
>>>>>>>>>>>>> client configuration, and I could pursue that here with the data
>>>>>>>>>>>>> provided.  The problem with software like SharePoint running on IIS is
>>>>>>>>>>>>> that it can be configured a nearly infinite number of ways, so
>>>>>>>>>>>>> diagnosis is more of an art than a science.  I strongly suspect that
>>>>>>>>>>>>> you're laboring under a pretty straightforward misconception which is
>>>>>>>>>>>>> likely blocking progress, rather than there being an issue with the
>>>>>>>>>>>>> SharePoint connector itself.  But I can't tell that without more
>>>>>>>>>>>>> detailed communication.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also, you mentioned that the Lists.asmx service was right where you
>>>>>>>>>>>>> expected it to be.  Have you read the SharePoint Connector part of the
>>>>>>>>>>>>> end-user documentation?  To wit:
>>>>>>>>>>>>>
>>>>>>>>>>>>> "Select the server protocol, and enter the server name and port, based
>>>>>>>>>>>>> on what you recorded from the URL for your SharePoint site. For the
>>>>>>>>>>>>> "Site path" field, type in the portion of the root site URL that
>>>>>>>>>>>>> includes everything after the server and port, except for the final
>>>>>>>>>>>>> "aspx" file. For example, if the SharePoint URL is
>>>>>>>>>>>>> "http://myserver:81/sites/somewhere/index.asp", the site path would be
>>>>>>>>>>>>> "/sites/somewhere"."  The Lists.asmx service in this example would be
>>>>>>>>>>>>> expected to be found at
>>>>>>>>>>>>> "http://myserver:81/sites/somewhere/_vti_bin/Lists.asmx".  And the URL
>>>>>>>>>>>>> you would start with would be the URL you see in the browser when you
>>>>>>>>>>>>> log into the SharePoint web client and go to the site you wish to
>>>>>>>>>>>>> crawl.  Is this what you are doing?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks again,
>>>>>>>>>>>>> Karl
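
The site-path rule quoted above from the end-user documentation can be sketched in Python. This is only an illustration: the helper names site_path_from_url and lists_service_url are hypothetical and are not part of ManifoldCF, and treating any final path segment that contains a dot as the page file to drop is an assumption.

```python
from urllib.parse import urlsplit


def site_path_from_url(url):
    """Return everything after server:port, minus a trailing page file.

    Assumption: a final path segment containing a dot (index.asp,
    default.aspx, ...) is the page file and is not part of the site path.
    """
    path = urlsplit(url).path
    head, _, last = path.rpartition("/")
    if "." in last:
        path = head
    # The Site field must not end with "/", so the root site becomes "".
    return path.rstrip("/")


def lists_service_url(url):
    """Build the URL where the Lists web service would be expected."""
    parts = urlsplit(url)
    site_path = site_path_from_url(url)
    return f"{parts.scheme}://{parts.netloc}{site_path}/_vti_bin/Lists.asmx"


if __name__ == "__main__":
    example = "http://myserver:81/sites/somewhere/index.asp"
    print(site_path_from_url(example))
    print(lists_service_url(example))
```

For the documentation's example URL this yields the site path "/sites/somewhere", and for a root site such as http://server:port/default.aspx it yields an empty site path, which matches the blank Site field discussed later in the thread.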
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Jan 25, 2012 at 12:33 PM, Karl Wright <da...@gmail.com> wrote:
>>>>>>>>>>>>>> The code that parses the SOAP response is Apache Axis.  This hasn't
>>>>>>>>>>>>>> changed in several years.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can you answer the following questions:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (1) When the SharePoint connector makes a request to SharePoint, is
>>>>>>>>>>>>>> the response HTML, or is it XML?  Does it have an XML header which
>>>>>>>>>>>>>> describes a Microsoft XML namespace?  It sure sounds like it is
>>>>>>>>>>>>>> responding with HTML.  The SharePoint connector is expecting to
>>>>>>>>>>>>>> communicate using SOAP.  Is the response valid SOAP?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (2) What version of SharePoint are you trying to connect to?  Is it
>>>>>>>>>>>>>> SharePoint 2007?  SharePoint 2010?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Jan 25, 2012 at 12:26 PM, Silvia, Daniel [USA]
>>>>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have added the specific log4j lines for Http Client wire logging and restarted the ManifoldCF instance. I was also able to see the web service Lists.asmx through IE. When reviewing the log files I was able to see some of the content that resides in the SharePoint instance in the content coming back from the request. However, I am still seeing the error messages in the ManifoldCF GUI as well as in the log file indicating "Bad Envelope: HTML", "No service named ListsSoap is available" and "No service named http://schemas.microsoft.com/sharepoint/soap/GetListCollection is available".
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Could there be something going on with the way the services are being built on the client side?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Appreciate your help.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>>>>>> Sent: Tuesday, January 24, 2012 4:52 PM
>>>>>>>>>>>>>>> To: Silvia, Daniel [USA]; connectors-user@incubator.apache.org
>>>>>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have not seen this exact problem before.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The "Bad envelope tag: HTML" indicates that the SOAP request the
>>>>>>>>>>>>>>> SharePoint connector is attempting to perform is, in fact, returning
>>>>>>>>>>>>>>> an HTML response.  This usually indicates that the server or path
>>>>>>>>>>>>>>> parameters you've used to set up the connection are not set correctly,
>>>>>>>>>>>>>>> and SharePoint is not actually being engaged.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> But usually when that happens I don't recall a ConfigurationException
>>>>>>>>>>>>>>> logged, unless it's what Axis does in response to the HTML.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The best thing to do at this point is turn on Http Client wire
>>>>>>>>>>>>>>> logging, restart ManifoldCF, and view the connection.  The log will
>>>>>>>>>>>>>>> then contain a record of the exact SOAP requests and the responses,
>>>>>>>>>>>>>>> and we can see what's wrong.  The technique is described here:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/CONNECTORS/Debugging+Connections
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You can also confirm that the right SharePoint web services are
>>>>>>>>>>>>>>> functioning on the machine in question by trying to access them
>>>>>>>>>>>>>>> directly.  For the Lists web service, which is the one it sounds like
>>>>>>>>>>>>>>> it was complaining about, try using IE (not Firefox etc because you
>>>>>>>>>>>>>>> want NTLM support) to go to the url where you think the web service
>>>>>>>>>>>>>>> lives.  This will be http: or https:, plus the server, plus the port,
>>>>>>>>>>>>>>> plus the path, plus "_vti_bin/Lists.asmx".  You should see an
>>>>>>>>>>>>>>> unequivocal SharePoint response.  For an example from the Microsoft
>>>>>>>>>>>>>>> demo service, try http://www.wssdemo.com/_vti_bin/Lists.asmx.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please let me know how it goes, and cc the dev list (as I have) so a
>>>>>>>>>>>>>>> record of what you're encountering can be made available to others.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Jan 24, 2012 at 1:52 PM, Silvia, Daniel [USA]
>>>>>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I have downloaded the newest version of ManifoldCF, v0.4, and have run the necessary ant scripts to download dependencies and then built the entire project. I have also had the SharePoint web service MetaCarta.SharePoint.MCPermissionsService.wsp deployed on the SharePoint instance, since it is running version 3 of SharePoint (SharePoint 2007). When I try to create a Repository Connection and select "Save" I get a message on the ManifoldCF front end of "org.xml.sax.SAXException Bad envelope tag: HTML". When I look at the log file I see an error message "org.apache.axis.ConfigurationException: No service named ListsSoap is available".
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Can you tell me if you have seen this issue before and what may be causing this issue?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>>>>>>> Sent: Friday, January 20, 2012 7:31 AM
>>>>>>>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In order for the SharePoint connector to build, you need to have the
>>>>>>>>>>>>>>>> wsdls in place in the right area.  We cannot ship those because of
>>>>>>>>>>>>>>>> potential copyright issues.  The easiest way to obtain the right
>>>>>>>>>>>>>>>> dependencies is:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ant download-dependencies
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Then, just build normally:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ant build
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This will only work for ManifoldCF-0.4-incubating, or trunk.
>>>>>>>>>>>>>>>> 0.4-incubating is still in the process of being signed off by the
>>>>>>>>>>>>>>>> incubator, but you can find the release candidate here:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> http://people.apache.org/~kwright
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Jan 20, 2012 at 7:02 AM, Silvia, Daniel [USA]
>>>>>>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I work with Matt Parker and we are in the process of developing a pipeline
>>>>>>>>>>>>>>>>> that uses ManifoldCF at the beginning. I just subscribed to the
>>>>>>>>>>>>>>>>> connectors-user-subscribe@incubator.apache.org
>>>>>>>>>>>>>>>>> group yesterday and submitted an e-mail question to the group. Can you help
>>>>>>>>>>>>>>>>> us with the below issue?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I downloaded MCF and started playing with the default setup under Jetty and
>>>>>>>>>>>>>>>>> Derby. It starts up without any issue. I am trying to configure a SharePoint
>>>>>>>>>>>>>>>>> connector, connecting to SharePoint Service 3. I have been following the
>>>>>>>>>>>>>>>>> instructions and I am at the point of deploying the custom SharePoint web
>>>>>>>>>>>>>>>>> service to the SharePoint instance. The instructions indicate that I should
>>>>>>>>>>>>>>>>> get the web service from dist/sharepoint-integration after building MCF.
>>>>>>>>>>>>>>>>> However, after looking through the entire directory structure, I am unable
>>>>>>>>>>>>>>>>> to find the service to deploy.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Can someone tell me where to find this service?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Daniel Silvia

RE: ManifoldCF's dist/shapoint-integration dir

Posted by "Silvia, Daniel [USA]" <Si...@bah.com>.
Hi Karl

I have been trying to configure ManifoldCF to run on JBoss. When I run ManifoldCF on JBoss, the connection pool can't be created. Do we need to set up the datasource through the JBoss web console? I believe the relevant code is in the DatabaseFactory.

Thanks
Dan

________________________________________
From: Karl Wright [daddywri@gmail.com]
Sent: Monday, February 13, 2012 10:10 AM
To: Silvia, Daniel [USA]
Cc: connectors-user@incubator.apache.org
Subject: Re: ManifoldCF's dist/shapoint-integration dir

The SharePoint connector only looks at documents within libraries, and
documents within folders in those libraries.  I don't know how
SharePoint is structuring your Wiki content, though.  If it is
individual documents within libraries, it should be accessible by the
SharePoint Connector.  If it is some other construct, then it likely
won't be found by that connector.

The Simple History is going to list the URLs that the SharePoint
connector fetches.  If you know the URL of a piece of Wiki content and
that URL does not appear in the Simple History, it's not being
fetched.  Similarly, if the URL of that piece of Wiki content has no
library name in the path, it's not something the SharePoint Connector
will be able to index.

If the SharePoint connector is not going to do it for you, and your
wiki content is being rendered in a manner that supports standard Wiki
API calls, you can use the Wiki Connector to index it.  If that too
isn't going to work, then we should analyze exactly what SharePoint is
presenting with a view towards extending the SharePoint connector.

Karl

On Mon, Feb 13, 2012 at 9:51 AM, Silvia, Daniel [USA]
<Si...@bah.com> wrote:
> Hi Karl
>
> Does the SharePoint connector only pull files from the SharePoint instance, and not content like Wiki content? As mentioned in the previous e-mail, I am able to see the XML content in the log file for the wikis, with elements similar to <someWiki><someNameWiki_row>some other elements<WikiField>content.....</WikiField></someNameWiki_row></someWiki>. However, I do not see information in the Simple History Report about pulling Wiki information or the .aspx pages. Does this report only produce information on files and not content pulled from SharePoint?
>
> I am just trying to figure out if I need to configure another connector to pull content from SharePoint other than the SharePoint connector.
>
>
> Thanks
>
> Dan
> ________________________________________
> From: Karl Wright [daddywri@gmail.com]
> Sent: Sunday, February 12, 2012 12:08 PM
> To: Silvia, Daniel [USA]
> Cc: connectors-user@incubator.apache.org
> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>
> Hi Daniel,
>
> If you are seeing fetches in the Simple History that include the wiki
> URLs you are trying to capture, the SharePoint job is likely correct.
> Are you seeing "Document ingest" activities for the same documents?
> If so, they are being sent to Solr, and you'd have to look into the
> Solr configuration to figure out why they aren't being indexed.
>
> Thanks,
>
>
> On Sun, Feb 12, 2012 at 11:37 AM, Silvia, Daniel [USA]
> <Si...@bah.com> wrote:
>> Hi Karl
>>
>> Quick question regarding SharePoint Wikis and ingesting them into Solr.
>>
>> I have been trying to get the Wikis created in SharePoint ingested into Solr. I am able to see the Wikis in the logging where the SharePoint Connector pulls everything from the site; however, I do not see the Wiki content in the Solr instance. When creating a job to run, do I need to indicate a path similar to "*Wiki*" for the entire site, or do I need to configure the Solr metadata in the job to capture the "WikiField" element in the XML being passed to the Solr connector?
>>
>> Thanks for your help.
>>
>> Dan
>> ________________________________________
>> From: Karl Wright [daddywri@gmail.com]
>> Sent: Tuesday, January 31, 2012 10:52 AM
>> To: Silvia, Daniel [USA]
>> Cc: connectors-user@incubator.apache.org
>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>
>> It's been a while since I've set up a SharePoint job but I think what
>> you are missing is a file rule (instead of just a library rule).
>> Here's what the end-user documentation says on the matter:
>>
>> "Each rule consists of a path, a rule type, and an action. The actions
>> are "Include" and "Exclude". The rule type tells the connection what
>> kind of SharePoint entity it is allowed to exactly match. For example,
>> a "File" rule will only exactly match SharePoint paths that represent
>> files - it cannot exactly match sites or libraries. The path itself is
>> just a sequence of characters, where the "*" character has the special
>> meaning of being able to match any number of any kind of characters,
>> and the "?" character matches exactly one character of any kind.
>>
>> The rule matcher extends strict, exact matching by introducing a
>> concept of implicit inclusion rules. If your rule action is "Include",
>> and you specify (say) a "File" rule, the matcher presumes implicit
>> inclusion rules for the corresponding site and library. So, if you
>> create an "Include File" rule that matches (for example)
>> "/MySite/MyLibrary/MyFile", there is an implied "Site Include" rule
>> for "/MySite", and an implied "Library Include" rule for
>> "/MySite/MyLibrary". Similarly, if you create a "Library Include"
>> rule, there is an implied "Site Include" rule that corresponds to it.
>> Note that these shortcuts only apply to "Include" rules - there are
>> no corresponding implied "Exclude" rules."
>>
>> What this means is that you should probably be declaring file rules
>> with "*" as the file name for each library, rather than a library
>> rule.  You might want to just try this.  If you still have trouble,
>> you can try setting the "org.apache.manifoldcf.connectors" property to
>> "DEBUG" in the properties.xml file and restarting ManifoldCF before
>> your next crawl.  The manifoldcf.log file will then have output
>> describing the decisions the SharePoint connector made about each
>> site, library, file, or folder it encountered.
>>
>> Thanks,
>> Karl
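
The wildcard semantics quoted above from the end-user documentation ("*" matches any number of any characters, "?" matches exactly one) can be illustrated with a short Python sketch. It shows only the pattern semantics; it is not the connector's actual matcher, the function names rule_to_regex and matches are hypothetical, and the implied site and library rules are not modeled.

```python
import re


def rule_to_regex(path_pattern):
    """Translate a path-rule pattern into a compiled regular expression.

    '*' matches any number of any characters (including '/'),
    '?' matches exactly one character, everything else is literal.
    """
    parts = []
    for ch in path_pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matches(path_pattern, path):
    """Return True if the whole path matches the pattern exactly."""
    return rule_to_regex(path_pattern).fullmatch(path) is not None


if __name__ == "__main__":
    # A library-style path on its own does not match a file beneath it ...
    print(matches("/Shared Documents", "/Shared Documents/report.doc"))
    # ... whereas a pattern ending in "*" does.
    print(matches("/Shared Documents/*", "/Shared Documents/report.doc"))
```

Under these semantics a pattern such as "/IDD/Shared Documents/*" covers every file path under that library, which is the kind of file rule suggested above in place of a bare library rule.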
>>
>> On Tue, Jan 31, 2012 at 10:27 AM, Silvia, Daniel [USA]
>> <Si...@bah.com> wrote:
>>> Hi Karl
>>>
>>> The Path Rules are :
>>>
>>> Path Match: /Shared Documents
>>> Type: library
>>> Action: include
>>>
>>> Path Match: /IDD/Shared Documents
>>> Type: library
>>> Action: include
>>>
>>> Path Match: /IDD/Documents
>>> Type: library
>>> Action: include
>>>
>>> Path Match: /manifoldcf/Shared Documents
>>> Type: library
>>> Action: include
>>>
>>> I hope this helps.
>>>
>>> I really appreciate your help.
>>>
>>>
>>>
>>> ________________________________________
>>> From: Karl Wright [daddywri@gmail.com]
>>> Sent: Tuesday, January 31, 2012 10:01 AM
>>> To: Silvia, Daniel [USA]
>>> Cc: connectors-user@incubator.apache.org
>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>
>>> "When I select only the fetch activity, I don't see anything in the
>>> events, when I select the Document Ingest activity, I don't see
>>> anything in the events."
>>>
>>> So either you've already run the job and the documents were accessed
>>> the first time (and won't be accessed again until they change), or the
>>> problem is likely that your SharePoint Path Rules are not including
>>> any documents.  It would be very helpful at this point to include a
>>> screen shot of the job you've created.  Since you are not on the net,
>>> perhaps you can jot down your SharePoint path rules for me to have a
>>> look at, as they are displayed when you view the job.
>>>
>>> Thanks,
>>> Karl
>>>
>>> On Tue, Jan 31, 2012 at 9:44 AM, Silvia, Daniel [USA]
>>> <Si...@bah.com> wrote:
>>>> Hi Karl
>>>>
>>>> Ok, I have created a new job and ran the job and went to the Simple History Report.
>>>>
>>>> I see the Events. If all the Activities in the Simple History Report (Document Deletion (SolrPipeline), Document Ingest (SolrPipeline), and Fetch) are selected, I see a job start and a job end event. When I get to the Simple History Report I can select the "Connection", but I don't have an option to select the Activities until I run the report first.
>>>> When I select only the fetch activity, I don't see anything in the events, when I select the Document Ingest activity, I don't see anything in the events.
>>>>
>>>> My Solr output connection has the following information:
>>>> Protocol: http
>>>> Server: "the server name"
>>>> Port: 8080 (we are running Solr on JBoss, port 8080)
>>>> Web Application Name: solr
>>>> Core Name: collection1
>>>> Update Handler: update/extract
>>>> Remove Handler: /update
>>>> Status Handler: /admin/ping
>>>>
>>>>
>>>>
>>>> ________________________________________
>>>> From: Karl Wright [daddywri@gmail.com]
>>>> Sent: Tuesday, January 31, 2012 9:00 AM
>>>> To: Silvia, Daniel [USA]; connectors-user@incubator.apache.org
>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>
>>>> Ok, let's do one thing at a time.
>>>>
>>>> First:
>>>>
>>>> "For the Path tab where there are Path Rules, are these the paths we
>>>> want ManifoldCF to follow? Each site, and each Library like Documents
>>>> and Shared Documents. And in the Metadata tab, this is the tab where
>>>> you indicate for each "Site" and "Library" you want to include
>>>> specific metadata or include all metadata?"
>>>>
>>>> For SharePoint, there are Path Rules and Metadata Rules.  The Path
>>>> Rules describe what documents you want to include or exclude.  The
>>>> Metadata Rules describe what metadata you want to include or exclude.
>>>> For right now I would ignore the Metadata Rules and just make sure you
>>>> have Path Rules that mean that you have included documents.
>>>>
>>>> "As I run the report, I see "Documents", "Active", and "Processed"
>>>> where the numbers change under the "Active" column as well as the
>>>> "Document" and "Processed" column (these just get larger, where Active
>>>> changes). "
>>>>
>>>> This "report" we actually call the Job Status screen.  The fact that
>>>> the numbers get larger and the job doesn't just end indicates that you
>>>> are successfully crawling your SharePoint, and you have set up the job
>>>> to include at least some documents.  This is good news.  However, this
>>>> is NOT the "Simple History" report I was alluding to earlier.  To get
>>>> to that report, click on the "Simple History" link on the left-hand
>>>> navigation area.  This report will show the events of your choice
>>>> (default - ALL recorded events) over a given time window (default: the
>>>> last hour).  If you've done this right you should at least see a "Job
>>>> start" event.  The events you are most interested in are "fetch"
>>>> (which describes all attempts to fetch documents from SharePoint) and
>>>> "document ingest" (which describes attempts to get documents into Solr).
>>>> You can refresh the displayed events by clicking the "Go" button in
>>>> the middle of the screen whenever you wish.
>>>>
>>>> I'd like you to delete your job, create it again, and start it.  Then,
>>>> while it is running, I'd like you to go to the "Simple History"
>>>> screen, and select the appropriate connection (your SharePoint
>>>> repository connection), and click the "Go" button.  So as not to skip
>>>> anything basic:
>>>>
>>>> (1) What event types do you see?
>>>> (2) Are there "fetch" events?
>>>> (3) Are there "document ingest" events?
>>>>
>>>> If you see no "fetch" events, that implies you have either not
>>>> specified any documents to include in your job, OR your Solr
>>>> connection is configured to reject too many document types so they are
>>>> all getting filtered out.
>>>>
>>>> If you see "document ingest" events, but those have errors, it implies
>>>> that the configuration of your Solr connection is incorrect and does
>>>> not match the way your Solr is configured.  If you send me a specific
>>>> error code and/or text I can help you figure out what is happening.
>>>>
>>>> If you see "document ingest" events with NO errors, but the Solr
>>>> instance is not getting documents, you are describing an impossible
>>>> situation.  While your Solr instance may not be configured to have the
>>>> Extracting Update Handler active, or it may be at a different URL than
>>>> what you pointed at, that would definitely yield errors or
>>>> notifications in the Simple History.
>>>>
>>>> Please let me know what you actually see.
>>>> Karl
>>>>
>>>>
>>>>
>>>> On Tue, Jan 31, 2012 at 7:53 AM, Silvia, Daniel [USA]
>>>> <Si...@bah.com> wrote:
>>>>> Hi Karl
>>>>>
>>>>> I am trying to figure out why I can't see anything being indexed into our Solr index. I was looking at another post where you were working with "Martijn" and that individual was not able to see info getting into Solr. In the report that I have set up, I have included all metadata associated with each site, Shared Documents, and Documents. In the Solr Field Mapping, I am associating metadata fields that are indicated in the Metadata tab with fields that exist in our Solr index.
>>>>>
>>>>> For the Path tab where there are Path Rules, are these the paths we want ManifoldCF to follow? Each site, and each Library like Documents and Shared Documents. And in the Metadata tab, this is the tab where you indicate for each "Site" and "Library" you want to include specific metadata or include all metadata?
>>>>>
>>>>> As I run the report, I see "Documents", "Active", and "Processed" where the numbers change under the "Active" column as well as the "Document" and "Processed" column (these just get larger, where Active changes). While I was researching why I may not be seeing something over on the Solr side, I saw your communication with another individual indicating that I should see something like literal.xxx=yyy in the Solr log. This is an older post so there may be something else I should see. But the only thing I see when I look at the Solr log is "[ ] webapp=/solr path=/update/extract params={commit=true} status=0 QTime=0".
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ________________________________________
>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>> Sent: Monday, January 30, 2012 10:40 AM
>>>>> To: Silvia, Daniel [USA]
>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>
>>>>> The default time range for the Simple History is the last hour.  I
>>>>> suspect you are unaware of that.  If you want a different time range
>>>>> you will have to modify the start and end time pulldowns accordingly.
>>>>>
>>>>> Karl
>>>>>
>>>>> On Mon, Jan 30, 2012 at 10:34 AM, Silvia, Daniel [USA]
>>>>> <Si...@bah.com> wrote:
>>>>>> Hi Karl
>>>>>>
>>>>>> I am looking at the Simple History in the UI and there isn't much to see, unless I am not getting what I am supposed to.  I see "Start Time", "Activity", "Identifier", "Bytes", and "Time", but I don't get anything for Result Code or Result Description. I looked in the documentation and we should be getting something in those fields, I believe.
>>>>>>
>>>>>> Anyway, I will look through the mail list to see what I can find.
>>>>>>
>>>>>> Thanks for the help.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>> ________________________________________
>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>> Sent: Monday, January 30, 2012 8:24 AM
>>>>>> To: Silvia, Daniel [USA]
>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>
>>>>>> So just to be clear, I'm NOT talking about the ManifoldCF logging.
>>>>>> For the Solr connector you probably won't need to turn that on; it's
>>>>>> pretty simple and you can look at the Simple History in the UI to see
>>>>>> what the request and response look like from Solr.  I was talking
>>>>>> instead about Solr logging - when you run the Solr Webapp, by default
>>>>>> all requests against the Extracting Update Handler are logged to
>>>>>> standard error, so you will see them appear in the process window in
>>>>>> which Solr is running.
>>>>>>
>>>>>> My suggestion to you is to first have a look at the Simple History for
>>>>>> the job you are trying to run.  If you are getting back 500 errors
>>>>>> from Solr, that means you have not set up Solr properly to work with
>>>>>> ManifoldCF.  In recent versions of Solr, the example works fine out of
>>>>>> the box, but when you try to deploy any other way you are often
>>>>>> missing the jar that contains the extracting update handler, so of
>>>>>> course nothing works.  Several people on the connectors-user list have
>>>>>> run into this and if you search the list (go to the ManifoldCF site
>>>>>> and click through to the mailing list page and there are links at the
>>>>>> bottom for this purpose) you will find posts that describe exactly
>>>>>> what is wrong and how to fix it.
>>>>>>
>>>>>> Hope this helps.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>> On Sun, Jan 29, 2012 at 2:30 PM, Silvia, Daniel [USA]
>>>>>> <Si...@bah.com> wrote:
>>>>>>> Yes, but for some reason the logging isn't coming through. The logging is set to INFO and I will have to change the logging level to DEBUG.
>>>>>>>
>>>>>>> Thanks again for your help.
>>>>>>>
>>>>>>>
>>>>>>> ________________________________________
>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>> Sent: Friday, January 27, 2012 5:06 PM
>>>>>>> To: Silvia, Daniel [USA]
>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>
>>>>>>> Actually, the best thing for debugging the Solr connection is looking
>>>>>>> at standard-output on the Solr instance.  You will see all the posts
>>>>>>> that are made and what the arguments were.  Also, this is the kind of
>>>>>>> question you'd get a lot of benefit from posting to the list.  The
>>>>>>> end-user documentation I pointed you at before describes some of this
>>>>>>> but the Solr connector has grown beyond the doc to some extent at this
>>>>>>> point.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>> On Fri, Jan 27, 2012 at 9:51 AM, Silvia, Daniel [USA]
>>>>>>> <Si...@bah.com> wrote:
>>>>>>>> Hi Karl
>>>>>>>>
>>>>>>>> Is there a log level other than wire-level debugging to view log statements for trying to send output to a Solr instance in the Jobs List/Creation section? We are having an issue getting content to Solr. Is there a document anywhere which defines the fields for the Jobs sections for the Solr Field Mapping tab and the Paths and Metadata tabs?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>> ________________________________________
>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>> Sent: Thursday, January 26, 2012 10:44 AM
>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>
>>>>>>>> I am afraid I don't know the answer to that.  I'm sure it's infinitely
>>>>>>>> configurable but it's not clear what the SharePoint web services need
>>>>>>>> to do under the hood, so anything I tell you would be just a guess.
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>> On Thu, Jan 26, 2012 at 10:43 AM, Silvia, Daniel [USA]
>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>> Hi Karl
>>>>>>>>>
>>>>>>>>> One more question. Do you know the minimum permissions needed to crawl the SharePoint instance and all sites under the instance? The individual who set my permissions set me up as the "site collection admin" for the topmost site. Is there a specific admin role, other than "Farm Admin", that the user crawling the SharePoint instance can be given?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> ________________________________________
>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>> Sent: Thursday, January 26, 2012 9:53 AM
>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>
>>>>>>>>> Good news!  Please keep in touch; we'd like to hear how things work
>>>>>>>>> for you (it helps keep the software fresh ;-) ).
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>> On Thu, Jan 26, 2012 at 9:48 AM, Silvia, Daniel [USA]
>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>> Hey Karl
>>>>>>>>>>
>>>>>>>>>> (1) was the issue. When requesting access to the SharePoint instance I indicated that I needed to be able to crawl SharePoint, I guess the problem was on my end indicating that I also needed privileges to crawl the site.
>>>>>>>>>>
>>>>>>>>>> Anyway, thank you for your help. When I change the SharePoint version to v 3 I get a message indicating "Connection Working".
>>>>>>>>>>
>>>>>>>>>> Appreciate the help.
>>>>>>>>>>
>>>>>>>>>> Dan
>>>>>>>>>>
>>>>>>>>>> ________________________________________
>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>> Sent: Thursday, January 26, 2012 9:19 AM
>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>
>>>>>>>>>> The error message "axisFault=Server, detail=Server was unable to
>>>>>>>>>> process request --> Requested Registry access is not allowed" is Axis
>>>>>>>>>> interpreting an error message from SharePoint.  What it is saying is
>>>>>>>>>> that the user you are trying to crawl with is unable to read the
>>>>>>>>>> SharePoint machine's registry but needs to.  There are two possible
>>>>>>>>>> causes for this:
>>>>>>>>>>
>>>>>>>>>> (1) The user you gave doesn't have enough permissions to crawl SharePoint
>>>>>>>>>> (2) When you installed the SharePoint MCPermissions plugin, you
>>>>>>>>>> installed it logged in as a user that did not have enough permissions
>>>>>>>>>> to do what it needs to do.
>>>>>>>>>>
>>>>>>>>>> You can tell the difference between the two by selecting "SharePoint
>>>>>>>>>> 2.0" in the sharepoint version pulldown.  If a connection saved in
>>>>>>>>>> this way says "Connection working", it means that the MCPermissions
>>>>>>>>>> plugin has the permission problem, not your user.
>>>>>>>>>>
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>> On Thu, Jan 26, 2012 at 9:14 AM, Silvia, Daniel [USA]
>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>> Hi Karl
>>>>>>>>>>>
>>>>>>>>>>> When I try to use option (1) and don't put anything in the Site field, I get an error message "axisFault=Server, detail=Server was unable to process request --> Requested Registry access is not allowed" and when I put a "/" in the Site field I get a GUI error indicating that the Site field can't end with a "/".
>>>>>>>>>>>
>>>>>>>>>>> Anyway, do you have any ideas. Or maybe the Sharepoint instance is not configured properly for us to crawl?
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ________________________________________
>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>> Sent: Thursday, January 26, 2012 8:52 AM
>>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>
>>>>>>>>>>> SharePoint has two kinds of site:
>>>>>>>>>>>
>>>>>>>>>>> (1) the root site, which can be reached by the path http://server:port
>>>>>>>>>>> (2) a number of sites under the 'virtual path', with URLs of the form:
>>>>>>>>>>>
>>>>>>>>>>> http://server:port/something/sitename
>>>>>>>>>>>
>>>>>>>>>>> The "something" is, by default, the string "site", so
>>>>>>>>>>> http://server:port/site/xyz might be the URL of one such virtual site.
>>>>>>>>>>>
>>>>>>>>>>> The form of the "site" field in the SharePoint connection for the
>>>>>>>>>>> first is either blank or "/" (can't remember which right now), and the
>>>>>>>>>>> form of the "site" field for the second is "/site/xyz".  On no account
>>>>>>>>>>> does the connector expect to see default.aspx attached to that path,
>>>>>>>>>>> so you should not do this; it cannot work.
>>>>>>>>>>>
>>>>>>>>>>> FWIW, my recommendation to try setting the connection type to
>>>>>>>>>>> "SharePoint 2.0" was to rule out any possible installation issue with
>>>>>>>>>>> the ManifoldCF sharepoint plugin.  The connection check for 2.0 does
>>>>>>>>>>> not look for it; only the connection check for 3.0 does.
>>>>>>>>>>>
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jan 26, 2012 at 8:41 AM, Silvia, Daniel [USA]
>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>> Hey Karl
>>>>>>>>>>>>
>>>>>>>>>>>> I am also getting an "HTTP Error 401.2: Unauthorized: Access is denied due to server configuration" when setting the Site field to /default.aspx. Do most Sharepoint instances have the urls set to something like http://server:port/sites/...... instead of http://server:port/? When I use the "/default.aspx" I see in the log files that ManifoldCF is trying to go to the Lists.asmx service with the url http://server:port/default.aspx/_vti_bin/Lists.asmx, where nothing is found.
>>>>>>>>>>>>
>>>>>>>>>>>> As you can tell I am not much of a SharePoint user or installer.
>>>>>>>>>>>>
>>>>>>>>>>>> Also, I don't think the issue is with the connector in ManifoldCF, I am just trying to
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>> From: Silvia, Daniel [USA]
>>>>>>>>>>>> Sent: Thursday, January 26, 2012 7:23 AM
>>>>>>>>>>>> To: Karl Wright
>>>>>>>>>>>> Subject: RE: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>
>>>>>>>>>>>> Hey Karl
>>>>>>>>>>>>
>>>>>>>>>>>> The issue I am having is that the SharePoint instance URL is something like http://server:port/default.aspx. If I don't put anything in the Site field I get a message indicating "Requested Registry Access is not allowed". I was putting "/default.aspx" as my Site field which I believe may have been the issue. However, what do you put in your Site field when the site is the topmost site, as in http://server:port/default.aspx?
>>>>>>>>>>>>
>>>>>>>>>>>> I would love to send you the log messages, but I am working on a network which is not connected to the outside.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>>> Sent: Wednesday, January 25, 2012 6:12 PM
>>>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>
>>>>>>>>>>>> Daniel,
>>>>>>>>>>>>
>>>>>>>>>>>> FWIW, I can help you diagnose the issue, but to do so you really need
>>>>>>>>>>>> to give me some concrete data.  I'm happy to grovel over the whole
>>>>>>>>>>>> wire log if you feel you can send it to me; something that may not
>>>>>>>>>>>> seem important to you will likely stand out strongly to me.  I can,
>>>>>>>>>>>> for example, see whether you are getting back HTML because of an
>>>>>>>>>>>> authentication error.  And if you ARE getting back valid
>>>>>>>>>>>> SOAP, I would then be sure that something was wrong with the Axis
>>>>>>>>>>>> client configuration, and I could pursue that here with the data
>>>>>>>>>>>> provided.  The problem with software like SharePoint running on IIS is
>>>>>>>>>>>> that it can be configured a nearly infinite number of ways, so
>>>>>>>>>>>> diagnosis is more of an art than a science.  I strongly suspect that
>>>>>>>>>>>> you're laboring under a pretty straightforward misconception which is
>>>>>>>>>>>> likely blocking progress, rather than there being an issue with the
>>>>>>>>>>>> SharePoint connector itself.  But I can't tell that without more
>>>>>>>>>>>> detailed communication.
>>>>>>>>>>>>
>>>>>>>>>>>> Also, you mentioned that the Lists.asmx service was right where you
>>>>>>>>>>>> expected it to be.  Have you read the SharePoint Connector part of the
>>>>>>>>>>>> end-user documentation?  To wit:
>>>>>>>>>>>>
>>>>>>>>>>>> "Select the server protocol, and enter the server name and port, based
>>>>>>>>>>>> on what you recorded from the URL for your SharePoint site. For the
>>>>>>>>>>>> "Site path" field, type in the portion of the root site URL that
>>>>>>>>>>>> includes everything after the server and port, except for the final
>>>>>>>>>>>> "aspx" file. For example, if the SharePoint URL is
>>>>>>>>>>>> "http://myserver:81/sites/somewhere/index.asp", the site path would be
>>>>>>>>>>>> "/sites/somewhere"."  The Lists.asmx service in this example would be
>>>>>>>>>>>> expected to be found at
>>>>>>>>>>>> "http://myserver:81/sites/somewhere/_vti_bin/Lists.asmx".  And the URL
>>>>>>>>>>>> you would start with would be the URL you see in the browser when you
>>>>>>>>>>>> log into the SharePoint web client and go to the site you wish to
>>>>>>>>>>>> crawl.  Is this what you are doing?
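
The rule quoted above (drop the final "aspx" file from the URL to get the site path, then append "_vti_bin/Lists.asmx") can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the documentation excerpt, not code from the connector; the function names site_path and lists_service_url are made up for this example.

```python
from urllib.parse import urlsplit


def site_path(sharepoint_url):
    """Return the 'Site path' value: everything after server:port,
    minus a trailing .asp/.aspx page (hypothetical helper)."""
    segments = [s for s in urlsplit(sharepoint_url).path.split("/") if s]
    # Drop the final page name, e.g. default.aspx or index.asp.
    if segments and segments[-1].lower().endswith((".aspx", ".asp")):
        segments.pop()
    # The root site yields an empty site path.
    return "/" + "/".join(segments) if segments else ""


def lists_service_url(sharepoint_url):
    """Return the URL where the Lists web service is expected to live."""
    parts = urlsplit(sharepoint_url)
    return f"{parts.scheme}://{parts.netloc}{site_path(sharepoint_url)}/_vti_bin/Lists.asmx"


print(lists_service_url("http://myserver:81/sites/somewhere/index.asp"))
```

Opening the printed URL in IE is then the quick check that the web service answers where the connection expects it.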
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks again,
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jan 25, 2012 at 12:33 PM, Karl Wright <da...@gmail.com> wrote:
>>>>>>>>>>>>> The code that parses the SOAP response is Apache Axis.  This hasn't
>>>>>>>>>>>>> changed in several years.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you answer the following questions:
>>>>>>>>>>>>>
>>>>>>>>>>>>> (1) When the SharePoint connector makes a request to SharePoint, is
>>>>>>>>>>>>> the response HTML, or is it XML?  Does it have an XML header which
>>>>>>>>>>>>> describes a Microsoft XML namespace?  It sure sounds like it is
>>>>>>>>>>>>> responding with HTML.  The SharePoint connector is expecting to
>>>>>>>>>>>>> communicate using SOAP.  Is the response valid SOAP?
>>>>>>>>>>>>>
>>>>>>>>>>>>> (2) What version of SharePoint are you trying to connect to?  Is it
>>>>>>>>>>>>> SharePoint 2007?  SharePoint 2010?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Jan 25, 2012 at 12:26 PM, Silvia, Daniel [USA]
>>>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have added the specific log4j lines for Http Client wire logging and restarted the ManifoldCF instance. I was also able to see the web service Lists.asmx through IE. When reviewing the log files I was able to see some of the content that resides in the SharePoint instance in the content coming back from the request. However, I am still seeing the error messages in the ManifoldCF GUI as well as in the log file indicating "Bad Envelope: HTML", "No service named ListsSoap is available" and "No service named http://schemas.microsoft.com/sharepoint/soap/GetListCollection is available".
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Could there be something going on with the way the services are being built on the client side?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Appreciate your help.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>>>>> Sent: Tuesday, January 24, 2012 4:52 PM
>>>>>>>>>>>>>> To: Silvia, Daniel [USA]; connectors-user@incubator.apache.org
>>>>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have not seen this exact problem before.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The "Bad envelope tag: HTML" indicates that the SOAP request the
>>>>>>>>>>>>>> SharePoint connector is attempting to perform is, in fact, returning
>>>>>>>>>>>>>> an HTML response.  This usually indicates that the server or path
>>>>>>>>>>>>>> parameters you've used to set up the connection are not set correctly,
>>>>>>>>>>>>>> and SharePoint is not actually being engaged.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But usually when that happens I don't recall a ConfigurationException
>>>>>>>>>>>>>> logged, unless it's what Axis does in response to the HTML.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The best thing to do at this point is turn on Http Client wire
>>>>>>>>>>>>>> logging, restart ManifoldCF, and view the connection.  The log will
>>>>>>>>>>>>>> then contain a record of the exact SOAP requests and the responses,
>>>>>>>>>>>>>> and we can see what's wrong.  The technique is described here:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/CONNECTORS/Debugging+Connections
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You can also confirm that the right SharePoint web services are
>>>>>>>>>>>>>> functioning on the machine in question by trying to access them
>>>>>>>>>>>>>> directly.  For the Lists web service, which is the one it sounds like
>>>>>>>>>>>>>> it was complaining about, try using IE (not Firefox etc because you
>>>>>>>>>>>>>> want NTLM support) to go to the url where you think the web service
>>>>>>>>>>>>>> lives.  This will be http: or https:, plus the server, plus the port,
>>>>>>>>>>>>>> plus the path, plus "_vti_bin/Lists.asmx".  You should see an
>>>>>>>>>>>>>> unequivocal SharePoint response.  For an example from the Microsoft
>>>>>>>>>>>>>> demo service, try http://www.wssdemo.com/_vti_bin/Lists.asmx.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please let me know how it goes, and cc the dev list (as I have) so a
>>>>>>>>>>>>>> record of what you're encountering can be made available to others.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jan 24, 2012 at 1:52 PM, Silvia, Daniel [USA]
>>>>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have downloaded the newest version of ManifoldCF, v0.4, and have run the necessary ant scripts to download dependencies and then built the entire project. I have also had the SharePoint web service MetaCarta.SharePoint.MCPermissionsService.wsp deployed on the SharePoint instance because we are running version 3 of SharePoint (SharePoint 2007). When I try to create a Repository Connection and select "Save" I get a message on the ManifoldCF front end of "org.xml.sax.SAXException Bad envelope tag: HTML". When I look at the log file I see an error message "org.apache.axis.ConfigurationException: No service named ListsSoap is available".
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can you tell me if you have seen this issue before and what may be causing this issue?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>>>>>> Sent: Friday, January 20, 2012 7:31 AM
>>>>>>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In order for the SharePoint connector to build, you need to have the
>>>>>>>>>>>>>>> wsdls in place in the right area.  We cannot ship those because of
>>>>>>>>>>>>>>> potential copyright issues.  The easiest way to obtain the right
>>>>>>>>>>>>>>> dependencies is:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ant download-dependencies
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Then, just build normally:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ant build
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This will only work for ManifoldCF-0.4-incubating, or trunk.
>>>>>>>>>>>>>>> 0.4-incubating is still in the process of being signed off by the
>>>>>>>>>>>>>>> incubator, but you can find the release candidate here:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://people.apache.org/~kwright
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Jan 20, 2012 at 7:02 AM, Silvia, Daniel [USA]
>>>>>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I work with Matt Parker and we are in the process of developing a pipeline
>>>>>>>>>>>>>>>> that uses ManifoldCF at the beginning. I just subscribed to the
>>>>>>>>>>>>>>>> connectors-user-subscribe@incubator.apache.org
>>>>>>>>>>>>>>>> group yesterday and submitted an e-mail question to the group. Can you help
>>>>>>>>>>>>>>>> us with the below issue?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I downloaded MCF and started playing with the default setup under Jetty and
>>>>>>>>>>>>>>>> Derby. It starts up without any issue. I am trying to configure a SharePoint
>>>>>>>>>>>>>>>> connector, connecting to SharePoint Service 3. I have been following the
>>>>>>>>>>>>>>>> instructions and I am at the point of deploying the custom SharePoint web
>>>>>>>>>>>>>>>> service to the SharePoint instance. The instructions indicate that I should
>>>>>>>>>>>>>>>> get the web service from dist/sharepoint-integration after building MCF.
>>>>>>>>>>>>>>>> However, after looking through the entire directory structure, I am unable
>>>>>>>>>>>>>>>> to find the service to deploy.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Can someone tell me where to find this service?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Daniel Silvia

Re: ManifoldCF's dist/shapoint-integration dir

Posted by Karl Wright <da...@gmail.com>.
The SharePoint connector only looks at documents within libraries, and
documents within folders in those libraries.  I don't know how
SharePoint is structuring your Wiki content, though.  If it is
individual documents within libraries, it should be accessible by the
SharePoint Connector.  If it is some other construct, then it likely
won't be found by that connector.

The Simple History is going to list the URLs that the SharePoint
connector fetches.  If you know the URL of a piece of Wiki content and
that URL does not appear in the Simple History, it's not being
fetched.  Similarly, if the URL of that piece of Wiki content has no
library name in the path, it's not something the SharePoint Connector
will be able to index.
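
As a rough sketch of the two checks just described, the following Python function takes a wiki URL, the set of URLs copied out of the Simple History, and the library names you know exist on the site. Everything here (the function name diagnose, its arguments, the returned strings) is hypothetical and only mirrors the reasoning above; it is not part of ManifoldCF.

```python
from urllib.parse import urlsplit, unquote


def diagnose(wiki_url, fetched_urls, library_names):
    """Classify a URL using the two checks described above (illustrative only)."""
    # Check 1: does the URL show up among the fetches in the Simple History?
    if wiki_url in fetched_urls:
        return "fetched"
    # Check 2: is any path segment of the URL a known library name?
    segments = [unquote(s) for s in urlsplit(wiki_url).path.split("/") if s]
    if not any(segment in library_names for segment in segments):
        return "no library in path"
    return "not fetched"


fetched = {"http://server/IDD/Shared Documents/a.doc"}
libraries = {"Shared Documents", "Documents"}
print(diagnose("http://server/IDD/Wiki%20Pages/Home.aspx", fetched, libraries))
```

A result of "no library in path" corresponds to the case where the SharePoint connector would not be expected to index the content at all.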

If the SharePoint connector is not going to do it for you, and your
wiki content is being rendered in a manner that supports standard Wiki
API calls, you can use the Wiki Connector to index it.  If that too
isn't going to work, then we should analyze exactly what SharePoint is
presenting with a view towards extending the SharePoint connector.

Karl

On Mon, Feb 13, 2012 at 9:51 AM, Silvia, Daniel [USA]
<Si...@bah.com> wrote:
> Hi Karl
>
> Does the SharePoint connector only pull files from the SharePoint instance and not content like Wiki content? As mentioned in the previous e-mail I am able to see the xml content in the log file for the wikis with the element similar to <someWiki><someNameWiki_row>some other elements<WikiField>content.....</WikiField></someNameWiki_row></someWiki>. However, I do not see information in the Simple History Report pulling Wiki information or the .aspx pages. Does this report only produce information on files and not content pulled from SharePoint?
>
> I am just trying to figure out if I need to configure another connector to pull content from SharePoint other than the SharePoint connector.
>
>
> Thanks
>
> Dan
> ________________________________________
> From: Karl Wright [daddywri@gmail.com]
> Sent: Sunday, February 12, 2012 12:08 PM
> To: Silvia, Daniel [USA]
> Cc: connectors-user@incubator.apache.org
> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>
> Hi Daniel,
>
> If you are seeing fetches in the Simple History that include the wiki
> URLs you are trying to capture, the SharePoint job is likely correct.
> Are you seeing "Document ingest" activities for the same documents?
> If so, they are being sent to Solr, and you'd have to look into the
> Solr configuration to figure out why they aren't being indexed.
>
> Thanks,
>
>
> On Sun, Feb 12, 2012 at 11:37 AM, Silvia, Daniel [USA]
> <Si...@bah.com> wrote:
>> Hi Karl
>>
>> Quick question regarding SharePoint Wikis and ingesting them into Solr.
>>
>> I have been trying to get the Wikis, created in SharePoint, to be ingested into Solr. I am able to see the Wikis in the logging where the SharePoint Connector pulls everything from the site; however, I do not see the Wiki content in the Solr instance. When creating a job to run, do I need to indicate a path similar to "*Wiki*" for the entire site or do I need to configure the Solr metadata in the job to capture the "WikiField" element in the xml being passed to the Solr connector?
>>
>> Thanks for your help.
>>
>> Dan
>> ________________________________________
>> From: Karl Wright [daddywri@gmail.com]
>> Sent: Tuesday, January 31, 2012 10:52 AM
>> To: Silvia, Daniel [USA]
>> Cc: connectors-user@incubator.apache.org
>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>
>> It's been a while since I've set up a SharePoint job but I think what
>> you are missing is a file rule (instead of just a library rule).
>> Here's what the end-user documentation says on the matter:
>>
>> "Each rule consists of a path, a rule type, and an action. The actions
>> are "Include" and "Exclude". The rule type tells the connection what
>> kind of SharePoint entity it is allowed to exactly match. For example,
>> a "File" rule will only exactly match SharePoint paths that represent
>> files - it cannot exactly match sites or libraries. The path itself is
>> just a sequence of characters, where the "*" character has the special
>> meaning of being able to match any number of any kind of characters,
>> and the "?" character matches exactly one character of any kind.
>>
>> The rule matcher extends strict, exact matching by introducing a
>> concept of implicit inclusion rules. If your rule action is "Include",
>> and you specify (say) a "File" rule, the matcher presumes implicit
>> inclusion rules for the corresponding site and library. So, if you
>> create an "Include File" rule that matches (for example)
>> "/MySite/MyLibrary/MyFile", there is an implied "Site Include" rule
>> for "/MySite", and an implied "Library Include" rule for
>> "/MySite/MyLibrary". Similarly, if you create a "Library Include"
>> rule, there is an implied "Site Include" rule that corresponds to it.
>> Note that these shortcuts only apply to "Include" rules - there are
>> no corresponding implied "Exclude" rules."
>>
>> What this means is that you should probably be declaring file rules
>> with "*" as the file name for each library, rather than a library
>> rule.  You might want to just try this.  If you still have trouble,
>> you can try setting the "org.apache.manifoldcf.connectors" property to
>> "DEBUG" in the properties.xml file and restarting ManifoldCF before
>> your next crawl.  The manifoldcf.log file will then have output
>> describing the decisions the SharePoint connector made about each
>> site, library, file, or folder it encountered.
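
To make the wildcard semantics in the documentation excerpt concrete, here is a minimal Python sketch of exact path matching where "*" matches any run of characters and "?" matches exactly one. It assumes the whole path must match the rule, as the excerpt describes; it does not model the implied Site and Library include rules, and the names rule_to_regex and rule_matches are invented for this example rather than taken from the connector.

```python
import re


def rule_to_regex(pattern):
    """Translate a path rule into a regular expression (illustrative only)."""
    pieces = []
    for ch in pattern:
        if ch == "*":
            pieces.append(".*")   # any number of any characters
        elif ch == "?":
            pieces.append(".")    # exactly one character of any kind
        else:
            pieces.append(re.escape(ch))
    return re.compile("".join(pieces), re.DOTALL)


def rule_matches(pattern, path):
    """True if the rule exactly matches the entire path."""
    return rule_to_regex(pattern).fullmatch(path) is not None


# A bare library path does not exactly match a file inside the library...
print(rule_matches("/Shared Documents", "/Shared Documents/report.doc"))
# ...while a rule with "*" as the file name does.
print(rule_matches("/Shared Documents/*", "/Shared Documents/report.doc"))
```

Under these assumptions the first call prints False and the second True, which is the difference between a library path on its own and a file rule ending in "*".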
>>
>> Thanks,
>> Karl
>>
>> On Tue, Jan 31, 2012 at 10:27 AM, Silvia, Daniel [USA]
>> <Si...@bah.com> wrote:
>>> Hi Karl
>>>
>>> The Path Rules are :
>>>
>>> Path Match: /Shared Documents
>>> Type: library
>>> Action: include
>>>
>>> Path Match: /IDD/Shared Documents
>>> Type: library
>>> Action: include
>>>
>>> Path Match: /IDD/Documents
>>> Type: library
>>> Action: include
>>>
>>> Path Match: /manifoldcf/Shared Documents
>>> Type: library
>>> Action: include
>>>
>>> I hope this helps.
>>>
>>> I really appreciate your help.
>>>
>>>
>>>
>>> ________________________________________
>>> From: Karl Wright [daddywri@gmail.com]
>>> Sent: Tuesday, January 31, 2012 10:01 AM
>>> To: Silvia, Daniel [USA]
>>> Cc: connectors-user@incubator.apache.org
>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>
>>> "When I select only the fetch activity, I don't see anything in the
>>> events, when I select the Document Ingest activity, I don't see
>>> anything in the events."
>>>
>>> So either you've already run the job and the documents were accessed
>>> the first time (and won't be accessed again until they change), or the
>>> problem is likely that your SharePoint Path Rules are not including
>>> any documents.  It would be very helpful at this point to include a
>>> screen shot of the job you've created.  Since you are not on the net,
>>> perhaps you can jot down your SharePoint path rules for me to have a
>>> look at, as they are displayed when you view the job.
>>>
>>> Thanks,
>>> Karl
>>>
>>> On Tue, Jan 31, 2012 at 9:44 AM, Silvia, Daniel [USA]
>>> <Si...@bah.com> wrote:
>>>> Hi Karl
>>>>
>>>> Ok, I have created a new job and ran the job and went to the Simple History Report.
>>>>
>>>> I see the Events. If all the Activities in the Simple History Report, Document Deletion(SolrPipeline), Document Ingest(SolrPipeline), and Fetch, are selected, I see a start job and an end job event. When I get to the Simple History Report I can select the "Connection", but I don't have an option to select the Activities until I run the report first.
>>>> When I select only the fetch activity, I don't see anything in the events, when I select the Document Ingest activity, I don't see anything in the events.
>>>>
>>>> My solr output connection has the following information:
>>>> Protocol: http
>>>> Server: "the server name"
>>>> Port:8080 (we are running solr on Jboss port 8080)
>>>> Web Application Name: solr
>>>> Core Name: collection1
>>>> Update Handler: update/extract
>>>> Remove Handler: /update
>>>> Status Handler: /admin/ping
>>>>
>>>>
>>>>
>>>> ________________________________________
>>>> From: Karl Wright [daddywri@gmail.com]
>>>> Sent: Tuesday, January 31, 2012 9:00 AM
>>>> To: Silvia, Daniel [USA]; connectors-user@incubator.apache.org
>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>
>>>> Ok, let's do one thing at a time.
>>>>
>>>> First:
>>>>
>>>> "For the Path tab where there are Path Rules, are these the paths we
>>>> want ManifoldCF to follow? Each site, and each Library like Documents
>>>> and Shared Documents. And in the Metadata tab, this is the tab where
>>>> you indicate for each "Site" and "Library" you want to include
>>>> specific metadata or include all metadata?"
>>>>
>>>> For SharePoint, there are Path Rules and Metadata Rules.  The Path
>>>> Rules describe what documents you want to include or exclude.  The
>>>> Metadata Rules describe what metadata you want to include or exclude.
>>>> For right now I would ignore the Metadata Rules and just make sure you
>>>> have Path Rules that mean that you have included documents.
>>>>
>>>> "As I run the report, I see "Documents", "Active", and "Processed"
>>>> where the numbers change under the "Active" column as well as the
>>>> "Document" and "Processed" column (these just get larger, where Active
>>>> changes). "
>>>>
>>>> This "report" we actually call the Job Status screen.  The fact that
>>>> the numbers get larger and the job doesn't just end indicates that you
>>>> are successfully crawling your SharePoint, and you have set up the job
>>>> to include at least some documents.  This is good news.  However, this
>>>> is NOT the "Simple History" report I was alluding to earlier.  To get
>>>> to that report, click on the "Simple History" link on the left-hand
>>>> navigation area.  This report will show the events of your choice
>>>> (default - ALL recorded events) over a given time window (default: the
>>>> last hour).  If you've done this right you should at least see a "Job
>>>> start" event.  The events you are most interested in are the "fetch"
>>>> (which describes all attempts to fetch documents from SharePoint) and
>>>> "document ingest" (which describes attempts to get documents into Solr).
>>>>  You can refresh the displayed events by clicking the "Go" button in
>>>> the middle of the screen whenever you wish.
>>>>
>>>> I'd like you to delete your job, create it again, and start it.  Then,
>>>> while it is running, I'd like you to go to the "Simple History"
>>>> screen, and select the appropriate connection (your SharePoint
>>>> repository connection), and click the "Go" button.  So as not to skip
>>>> anything basic:
>>>>
>>>> (1) What event types do you see?
>>>> (2) Are there "fetch" events?
>>>> (3) Are there "document ingest" events?
>>>>
>>>> If you see no "fetch" events, that implies you have either not
>>>> specified any documents to include in your job, OR your Solr
>>>> connection is configured to reject too many document types so they are
>>>> all getting filtered out.
>>>>
>>>> If you see "document ingest" events, but those have errors, it implies
>>>> that the configuration of your Solr connection is incorrect and does
>>>> not match the way your Solr is configured.  If you send me a specific
>>>> error code and/or text I can help you figure out what is happening.
>>>>
>>>> If you see "document ingest" events with NO errors, but the Solr
>>>> instance is not getting documents, you are describing an impossible
>>>> situation.  While your Solr instance may not be configured to have the
>>>> Extracting Update Handler active, or it may be at a different URL than
>>>> what you pointed at, that would definitely yield errors or
>>>> notifications in the Simple History.
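
The three cases above can be written down as a small triage helper. This is a sketch only: the (activity, result code) pairs are assumed to be copied by hand from the Simple History report, and treating "OK" and "200" as success codes is an assumption, not something the report guarantees.

```python
def diagnose(events, ok_codes=("OK", "200")):
    """Map Simple History rows to the advice given above.

    events: list of (activity, result_code) pairs copied from the report.
    ok_codes: assumed markers of a successful row (hypothetical default).
    """
    rows = [(activity.lower(), str(code)) for activity, code in events]
    fetches = [r for r in rows if r[0].startswith("fetch")]
    ingests = [r for r in rows if r[0].startswith("document ingest")]
    failed = [r for r in ingests if r[1] not in ok_codes]

    if not fetches:
        return ("no fetch events: check the Path Rules and the Solr "
                "connection's document filters")
    if failed:
        return ("ingest errors: the Solr connection settings do not match "
                "the Solr setup")
    if ingests:
        return ("ingests reported without errors: documents should be "
                "reaching Solr")
    return "fetches but no ingest events: not covered by the advice above"
```

For example, diagnose([("Job start", "")]) falls into the first case, which matches a report that shows only job start and job end events.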
>>>>
>>>> Please let me know what you actually see.
>>>> Karl
>>>>
>>>>
>>>>
>>>> On Tue, Jan 31, 2012 at 7:53 AM, Silvia, Daniel [USA]
>>>> <Si...@bah.com> wrote:
>>>>> Hi Karl
>>>>>
>>>>> I am trying to figure out why I can't see anything being indexed into our Solr index. I was looking at another post where you were working with "Martijn" and that individual was not able to see info getting into Solr. In the report that I have set up, I have included all metadata associated with each site, Shared Documents, and Documents. In the Solr Field Mapping, I am associating metadata fields that are indicated in the Metadata tab with fields that exist in our Solr index.
>>>>>
>>>>> For the Path tab where there are Path Rules, are these the paths we want ManifoldCF to follow? Each site, and each Library like Documents and Shared Documents. And in the Metadata tab, this is the tab where you indicate for each "Site" and "Library" you want to include specific metadata or include all metadata?
>>>>>
>>>>> As I run the report, I see "Documents", "Active", and "Processed" where the numbers change under the "Active" column as well as the "Documents" and "Processed" columns (these just get larger, where Active changes). While I was researching why I may not be seeing something over on the Solr side, I saw your communication with another individual indicating that I should see something like literal.xxx=yyy in the Solr log. This is an older post, so there may be something else I should see. But the only thing I see when I look at the Solr log is "[ ] webapp=/solr path=/update/extract params={commit=true} status=0 QTime=0".
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ________________________________________
>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>> Sent: Monday, January 30, 2012 10:40 AM
>>>>> To: Silvia, Daniel [USA]
>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>
>>>>> The default time range for the Simple History is the last hour.  I
>>>>> suspect you are unaware of that.  If you want a different time range
>>>>> you will have to modify the start and end time pulldowns accordingly.
>>>>>
>>>>> Karl
>>>>>
>>>>> On Mon, Jan 30, 2012 at 10:34 AM, Silvia, Daniel [USA]
>>>>> <Si...@bah.com> wrote:
>>>>>> Hi Karl
>>>>>>
>>>>>> I am looking at the Simple History in the UI and there isn't much to see, unless I am not getting what I am supposed to. I see the Start Time, Activity, Identifier, Bytes, and Time columns, but I don't get anything for Result Code or Result Description. I looked in the documentation and we should be getting something in those fields, I believe.
>>>>>>
>>>>>> Anyway, I will look through the mail list to see what I can find.
>>>>>>
>>>>>> Thanks for the help.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>> ________________________________________
>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>> Sent: Monday, January 30, 2012 8:24 AM
>>>>>> To: Silvia, Daniel [USA]
>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>
>>>>>> So just to be clear, I'm NOT talking about the ManifoldCF logging.
>>>>>> For the Solr connector you probably won't need to turn that on; it's
>>>>>> pretty simple and you can look at the Simple History in the UI to see
>>>>>> what the request and response look like from Solr.  I was talking
>>>>>> instead about Solr logging - when you run the Solr Webapp, by default
>>>>>> all requests against the Extracting Update Handler are logged to
>>>>>> standard error, so you will see them appear in the process window in
>>>>>> which Solr is running.
>>>>>>
>>>>>> My suggestion to you is to first have a look at the Simple History for
>>>>>> the job you are trying to run.  If you are getting back 500 errors
>>>>>> from Solr, that means you have not set up Solr properly to work with
>>>>>> ManifoldCF.  In recent versions of Solr, the example works fine out of
>>>>>> the box, but when you try to deploy any other way you are often
>>>>>> missing the jar that contains the extracting update handler, so of
>>>>>> course nothing works.  Several people on the connectors-user list have
>>>>>> run into this and if you search the list (go to the ManifoldCF site
>>>>>> and click through to the mailing list page and there are links at the
>>>>>> bottom for this purpose) you will find posts that describe exactly
>>>>>> what is wrong and how to fix it.
>>>>>>
>>>>>> Hope this helps.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>> On Sun, Jan 29, 2012 at 2:30 PM, Silvia, Daniel [USA]
>>>>>> <Si...@bah.com> wrote:
>>>>>>> Yeah, but for some reason the logging isn't coming through. The logging is set to INFO, so I will have to change the logging level to DEBUG.
>>>>>>>
>>>>>>> Thanks again for your help.
>>>>>>>
>>>>>>>
>>>>>>> ________________________________________
>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>> Sent: Friday, January 27, 2012 5:06 PM
>>>>>>> To: Silvia, Daniel [USA]
>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>
>>>>>>> Actually, the best thing for debugging the Solr connection is looking
>>>>>>> at standard-output on the Solr instance.  You will see all the posts
>>>>>>> that are made and what the arguments were.  Also, this is the kind of
>>>>>>> question you'd get a lot of benefit from posting to the list.  The
>>>>>>> end-user documentation I pointed you at before describes some of this
>>>>>>> but the Solr connector has grown beyond the doc to some extent at this
>>>>>>> point.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>> On Fri, Jan 27, 2012 at 9:51 AM, Silvia, Daniel [USA]
>>>>>>> <Si...@bah.com> wrote:
>>>>>>>> Hi Karl
>>>>>>>>
>>>>>>>> Is there a log level other than wire-level debugging to view log statements for trying to send output to a Solr instance in the Jobs List/Creation section? We are having an issue getting content to Solr. Is there a document anywhere which defines the fields for the Jobs sections for the Solr Field Mapping tab and the Paths and Metadata tabs?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>> ________________________________________
>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>> Sent: Thursday, January 26, 2012 10:44 AM
>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>
>>>>>>>> I am afraid I don't know the answer to that.  I'm sure it's infinitely
>>>>>>>> configurable but it's not clear what the SharePoint web services need
>>>>>>>> to do under the hood, so anything I tell you would be just a guess.
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>> On Thu, Jan 26, 2012 at 10:43 AM, Silvia, Daniel [USA]
>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>> Hi Karl
>>>>>>>>>
>>>>>>>>> One more question. Do you know the minimum permissions needed to crawl the SharePoint instance and all sites under the instance? The individual who set my permissions set me up as the "site collection admin" for the topmost site. Is there a specific admin role, other than "Farm Admin", that can be given to the user crawling the SharePoint instance?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> ________________________________________
>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>> Sent: Thursday, January 26, 2012 9:53 AM
>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>
>>>>>>>>> Good news!  Please keep in touch; we'd like to hear how things work
>>>>>>>>> for you (it helps keep the software fresh ;-) ).
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>> On Thu, Jan 26, 2012 at 9:48 AM, Silvia, Daniel [USA]
>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>> Hey Karl
>>>>>>>>>>
>>>>>>>>>> (1) was the issue. When requesting access to the SharePoint instance I indicated that I needed to be able to crawl SharePoint, I guess the problem was on my end indicating that I also needed privileges to crawl the site.
>>>>>>>>>>
>>>>>>>>>> Anyway, thank you for your help. When I change the SharePoint version to v 3 I get a message indicating "Connection Working".
>>>>>>>>>>
>>>>>>>>>> Appreciate the help.
>>>>>>>>>>
>>>>>>>>>> Dan
>>>>>>>>>>
>>>>>>>>>> ________________________________________
>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>> Sent: Thursday, January 26, 2012 9:19 AM
>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>
>>>>>>>>>> The error message "axisFault=Server, detail=Server was unable to
>>>>>>>>>> process request --> Requested Registry access is not allowed" is Axis
>>>>>>>>>> interpreting an error message from SharePoint.  What it is saying is
>>>>>>>>>> that the user you are trying to crawl with is unable to read the
>>>>>>>>>> SharePoint machine's registry but needs to.  There are two possible
>>>>>>>>>> causes for this:
>>>>>>>>>>
>>>>>>>>>> (1) The user you gave doesn't have enough permissions to crawl SharePoint
>>>>>>>>>> (2) When you installed the SharePoint MCPermissions plugin, you
>>>>>>>>>> installed it logged in as a user that did not have enough permissions
>>>>>>>>>> to do what it needs to do.
>>>>>>>>>>
>>>>>>>>>> You can tell the difference between the two by selecting "SharePoint
>>>>>>>>>> 2.0" in the SharePoint version pulldown.  If a connection saved in
>>>>>>>>>> this way says "Connection working", it means that the MCPermissions
>>>>>>>>>> plugin has the permission problem, not your user.
>>>>>>>>>>
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>> On Thu, Jan 26, 2012 at 9:14 AM, Silvia, Daniel [USA]
>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>> Hi Karl
>>>>>>>>>>>
>>>>>>>>>>> When I try to use option (1) and don't put anything in the Site field, I get an error message "axisFault=Server, detail=Server was unable to process request --> Requested Registry access is not allowed", and when I put a "/" in the Site field I get a GUI error indicating that the site field can't end with a "/".
>>>>>>>>>>>
>>>>>>>>>>> Anyway, do you have any ideas? Or maybe the SharePoint instance is not configured properly for us to crawl?
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ________________________________________
>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>> Sent: Thursday, January 26, 2012 8:52 AM
>>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>
>>>>>>>>>>> SharePoint has two kinds of site:
>>>>>>>>>>>
>>>>>>>>>>> (1) the root site, which can be reached by the path http://server:port
>>>>>>>>>>> (2) a number of sites under the 'virtual path', with URLs of the form:
>>>>>>>>>>>
>>>>>>>>>>> http://server:port/something/sitename
>>>>>>>>>>>
>>>>>>>>>>> The "something" is, by default, the string "sites", so
>>>>>>>>>>> http://server:port/sites/xyz might be the URL of one such virtual site.
>>>>>>>>>>>
>>>>>>>>>>> The form of the "site" field in the SharePoint connection for the
>>>>>>>>>>> first is either blank or "/" (can't remember which right now), and the
>>>>>>>>>>> form of the "site" field for the second is "/sites/xyz".  On no account
>>>>>>>>>>> does the connector expect to see default.aspx attached to that path,
>>>>>>>>>>> so you should not do this; it cannot work.
>>>>>>>>>>>
>>>>>>>>>>> FWIW, my recommendation to try setting the connection type to
>>>>>>>>>>> "SharePoint 2.0" was to rule out any possible installation issue with
>>>>>>>>>>> the ManifoldCF sharepoint plugin.  The connection check for 2.0 does
>>>>>>>>>>> not look for it; only the connection check for 3.0 does.
>>>>>>>>>>>
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jan 26, 2012 at 8:41 AM, Silvia, Daniel [USA]
>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>> Hey Karl
>>>>>>>>>>>>
>>>>>>>>>>>> I am also getting an "HTTP Error 401.2: Unauthorized: Access is denied due to server configuration" when setting the Site field to /default.aspx. Do most SharePoint instances have the URLs set to something like http://server:port/sites/...... instead of http://server:port/? When I use the "/default.aspx" I see in the log files that ManifoldCF is trying to go to the Lists.asmx service with the URL http://server:port/default.aspx/_vti_bin/Lists.asmx, where nothing is found.
>>>>>>>>>>>>
>>>>>>>>>>>> As you can tell I am not much of a SharePoint user or installer.
>>>>>>>>>>>>
>>>>>>>>>>>> Also, I don't think the issue is with the connector in ManifoldCF, I am just trying to
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>> From: Silvia, Daniel [USA]
>>>>>>>>>>>> Sent: Thursday, January 26, 2012 7:23 AM
>>>>>>>>>>>> To: Karl Wright
>>>>>>>>>>>> Subject: RE: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>
>>>>>>>>>>>> Hey Karl
>>>>>>>>>>>>
>>>>>>>>>>>> The issue I am having is that the SharePoint instance URL is something like http://server:port/default.aspx. If I don't put anything in the Site field I get a message indicating "Requested Registry Access is not allowed". I was putting "/default.aspx" as my Site field, which I believe may have been the issue. However, what do you put in your Site field when the site is the topmost site, as in http://server:port/default.aspx?
>>>>>>>>>>>>
>>>>>>>>>>>> I would love to send you the log messages, but I am working on a network which is not connected to the outside.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>>> Sent: Wednesday, January 25, 2012 6:12 PM
>>>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>
>>>>>>>>>>>> Daniel,
>>>>>>>>>>>>
>>>>>>>>>>>> FWIW, I can help you diagnose the issue, but to do so you really need
>>>>>>>>>>>> to give me some concrete data.  I'm happy to grovel over the whole
>>>>>>>>>>>> wire log if you feel you can send it to me; something that may not
>>>>>>>>>>>> seem important to you will likely stand out strongly to me.  I can,
>>>>>>>>>>>> for example, see whether you are getting back HTML because of an
>>>>>>>>>>>> authentication error.  And if you ARE getting back valid
>>>>>>>>>>>> SOAP, I would then be sure that something was wrong with the Axis
>>>>>>>>>>>> client configuration, and I could pursue that here with the data
>>>>>>>>>>>> provided.  The problem with software like SharePoint running on IIS is
>>>>>>>>>>>> that it can be configured a nearly infinite number of ways, so
>>>>>>>>>>>> diagnosis is more of an art than a science.  I strongly suspect that
>>>>>>>>>>>> you're laboring under a pretty straightforward misconception which is
>>>>>>>>>>>> likely blocking progress, rather than there being an issue with the
>>>>>>>>>>>> SharePoint connector itself.  But I can't tell that without more
>>>>>>>>>>>> detailed communication.
>>>>>>>>>>>>
>>>>>>>>>>>> Also, you mentioned that the Lists.asmx service was right where you
>>>>>>>>>>>> expected it to be.  Have you read the SharePoint Connector part of the
>>>>>>>>>>>> end-user documentation?  To wit:
>>>>>>>>>>>>
>>>>>>>>>>>> "Select the server protocol, and enter the server name and port, based
>>>>>>>>>>>> on what you recorded from the URL for your SharePoint site. For the
>>>>>>>>>>>> "Site path" field, type in the portion of the root site URL that
>>>>>>>>>>>> includes everything after the server and port, except for the final
>>>>>>>>>>>> "aspx" file. For example, if the SharePoint URL is
>>>>>>>>>>>> "http://myserver:81/sites/somewhere/index.asp", the site path would be
>>>>>>>>>>>> "/sites/somewhere"."  The Lists.asmx service in this example would be
>>>>>>>>>>>> expected to be found at
>>>>>>>>>>>> "http://myserver:81/sites/somewhere/_vti_bin/Lists.asmx".  And the URL
>>>>>>>>>>>> you would start with would be the URL you see in the browser when you
>>>>>>>>>>>> log into the SharePoint web client and go to the site you wish to
>>>>>>>>>>>> crawl.  Is this what you are doing?
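
The rule quoted above (keep everything after the server and port, drop the final page file) can be sketched in a few lines. The function names are hypothetical, and treating any last path segment ending in ".asp" or ".aspx" as the page to drop is an assumption made for this illustration.

```python
from urllib.parse import urlsplit

def site_path(browser_url):
    """Derive the connection's site path from the URL seen in the browser."""
    path = urlsplit(browser_url).path
    head, _, last = path.rpartition("/")
    # Drop a trailing page such as default.aspx or index.asp (assumed suffixes).
    if last.lower().endswith((".aspx", ".asp")):
        path = head
    return path.rstrip("/")

def lists_service_url(browser_url):
    """Where the Lists web service would be expected for that site."""
    parts = urlsplit(browser_url)
    return (f"{parts.scheme}://{parts.netloc}"
            f"{site_path(browser_url)}/_vti_bin/Lists.asmx")
```

With the documentation's example, site_path("http://myserver:81/sites/somewhere/index.asp") gives "/sites/somewhere", and a root-site URL ending in default.aspx gives an empty site path.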
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks again,
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jan 25, 2012 at 12:33 PM, Karl Wright <da...@gmail.com> wrote:
>>>>>>>>>>>>> The code that parses the SOAP response is Apache Axis.  This hasn't
>>>>>>>>>>>>> changed in several years.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you answer the following questions:
>>>>>>>>>>>>>
>>>>>>>>>>>>> (1) When the SharePoint connector makes a request to SharePoint, is
>>>>>>>>>>>>> the response HTML, or is it XML?  Does it have an XML header which
>>>>>>>>>>>>> describes a Microsoft XML namespace?  It sure sounds like it is
>>>>>>>>>>>>> responding with HTML.  The SharePoint connector is expecting to
>>>>>>>>>>>>> communicate using SOAP.  Is the response valid SOAP?
>>>>>>>>>>>>>
>>>>>>>>>>>>> (2) What version of SharePoint are you trying to connect to?  Is it
>>>>>>>>>>>>> SharePoint 2007?  SharePoint 2010?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Jan 25, 2012 at 12:26 PM, Silvia, Daniel [USA]
>>>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have added the specific log4j lines for Http Client wire logging and restarted the ManifoldCF instance. I was also able to see the web service Lists.asmx through IE. When reviewing the log files I was able to see some of the content that resides in the SharePoint instance in the content coming back from the request. However, I am still seeing error messages in the ManifoldCF GUI as well as in the log file indicating "Bad Envelope: HTML", "No service named ListsSoap is available" and "No service named http://schemas.microsoft.com/sharepoint/soap/GetListCollection is available".
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Could there be something going on with the way the services are being built on the client side?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Appreciate your help.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>>>>> Sent: Tuesday, January 24, 2012 4:52 PM
>>>>>>>>>>>>>> To: Silvia, Daniel [USA]; connectors-user@incubator.apache.org
>>>>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have not seen this exact problem before.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The "Bad envelope tag: HTML" indicates that the SOAP request the
>>>>>>>>>>>>>> SharePoint connector is attempting to perform is, in fact, returning
>>>>>>>>>>>>>> an HTML response.  This usually indicates that the server or path
>>>>>>>>>>>>>> parameters you've used to set up the connection are not set correctly,
>>>>>>>>>>>>>> and SharePoint is not actually being engaged.
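
A quick way to tell the two kinds of response apart when reading a wire log is to look at the root element of the body. The helper below is a hypothetical illustration, not what Axis does internally; it treats a body as SOAP only if it parses as XML and its root element is named Envelope.

```python
import xml.etree.ElementTree as ET

def looks_like_soap(body):
    """Return True if the response body has a SOAP-style Envelope root."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Typical HTML error pages are not well-formed XML.
        return False
    # Strip the "{namespace}" part that ElementTree puts in front of the tag.
    local_name = root.tag.rsplit("}", 1)[-1]
    return local_name == "Envelope"

soap_body = ('<soap:Envelope '
             'xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
             '<soap:Body/></soap:Envelope>')
html_body = "<HTML><HEAD><TITLE>401</TITLE></HEAD><BODY>Denied</BODY></HTML>"
```

An HTML login or error page fails this check, which is consistent with the "Bad envelope tag: HTML" message.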
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But usually when that happens I don't recall a ConfigurationException
>>>>>>>>>>>>>> logged, unless it's what Axis does in response to the HTML.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The best thing to do at this point is turn on Http Client wire
>>>>>>>>>>>>>> logging, restart ManifoldCF, and view the connection.  The log will
>>>>>>>>>>>>>> then contain a record of the exact SOAP requests and the responses,
>>>>>>>>>>>>>> and we can see what's wrong.  The technique is described here:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/CONNECTORS/Debugging+Connections
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You can also confirm that the right SharePoint web services are
>>>>>>>>>>>>>> functioning on the machine in question by trying to access them
>>>>>>>>>>>>>> directly.  For the Lists web service, which is the one it sounds like
>>>>>>>>>>>>>> it was complaining about, try using IE (not Firefox etc because you
>>>>>>>>>>>>>> want NTLM support) to go to the url where you think the web service
>>>>>>>>>>>>>> lives.  This will be http: or https:, plus the server, plus the port,
>>>>>>>>>>>>>> plus the path, plus "_vti_bin/Lists.asmx".  You should see an
>>>>>>>>>>>>>> unequivocal SharePoint response.  For an example from the Microsoft
>>>>>>>>>>>>>> demo service, try http://www.wssdemo.com/_vti_bin/Lists.asmx.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please let me know how it goes, and cc the dev list (as I have) so a
>>>>>>>>>>>>>> record of what you're encountering can be made available to others.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jan 24, 2012 at 1:52 PM, Silvia, Daniel [USA]
>>>>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have downloaded the newest version of ManifoldCF, v0.4, and have run the necessary ant scripts to download dependencies and then built the entire project. I have also had the SharePoint web service MetaCarta.SharePoint.MCPermissionsService.wsp deployed on the SharePoint instance, since we are running version 3 of SharePoint (SharePoint 2007). When I try to create a Repository Connection and select "Save" I get a message on the ManifoldCF front end of "org.xml.sax.SAXException Bad envelope tag: HTML". When I look at the log file I see an error message "org.apache.axis.ConfigurationException: No service named ListsSoap is available".
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can you tell me if you have seen this issue before and what may be causing this issue?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>>>>>> Sent: Friday, January 20, 2012 7:31 AM
>>>>>>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In order for the SharePoint connector to build, you need to have the
>>>>>>>>>>>>>>> wsdls in place in the right area.  We cannot ship those because of
>>>>>>>>>>>>>>> potential copyright issues.  The easiest way to obtain the right
>>>>>>>>>>>>>>> dependencies is:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ant download-dependencies
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Then, just build normally:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ant build
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This will only work for ManifoldCF-0.4-incubating, or trunk.
>>>>>>>>>>>>>>> 0.4-incubating is still in the process of being signed off by the
>>>>>>>>>>>>>>> incubator, but you can find the release candidate here:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://people.apache.org/~kwright
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Jan 20, 2012 at 7:02 AM, Silvia, Daniel [USA]
>>>>>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I work with Matt Parker and we are in the process of developing a pipeline
>>>>>>>>>>>>>>>> that uses ManifoldCF at the beginning. I just subscribed to the
>>>>>>>>>>>>>>>> connectors-user-subscribe@incubator.apache.org
>>>>>>>>>>>>>>>> group yesterday and submitted an e-mail question to the group. Can you help
>>>>>>>>>>>>>>>> us with the below issue?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I downloaded MCF and started playing with the default setup under Jetty and
>>>>>>>>>>>>>>>> Derby. It starts up without any issue. I am trying to configure a SharePoint
>>>>>>>>>>>>>>>> connector, connecting to SharePoint Service 3. I have been following the
>>>>>>>>>>>>>>>> instructions and I am at the point of deploying the custom SharePoint web
>>>>>>>>>>>>>>>> service to the SharePoint instance. The instructions indicate that I should
>>>>>>>>>>>>>>>> get the web service from dist/sharepoint-integration after building MCF.
>>>>>>>>>>>>>>>> However, after looking through the entire directory structure, I am unable
>>>>>>>>>>>>>>>> to find the service to deploy.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Can someone tell me where to find this service?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Daniel Silvia

RE: ManifoldCF's dist/shapoint-integration dir

Posted by "Silvia, Daniel [USA]" <Si...@bah.com>.
Hi Karl

Does the SharePoint connector only pull files from the SharePoint instance, and not content like Wiki content? As mentioned in the previous e-mail, I am able to see the XML content in the log file for the wikis, with elements similar to <someWiki><someNameWiki_row>some other elements<WikiField>content.....</WikiField></someNameWiki_row></someWiki>. However, I do not see anything in the Simple History Report about pulling Wiki information or the .aspx pages. Does this report only produce information on files and not on other content pulled from SharePoint?

I am just trying to figure out if I need to configure another connector to pull content from SharePoint other than the SharePoint connector.


Thanks

Dan
________________________________________
From: Karl Wright [daddywri@gmail.com]
Sent: Sunday, February 12, 2012 12:08 PM
To: Silvia, Daniel [USA]
Cc: connectors-user@incubator.apache.org
Subject: Re: ManifoldCF's dist/shapoint-integration dir

Hi Daniel,

If you are seeing fetches in the Simple History that include the wiki
URLs you are trying to capture, the SharePoint job is likely correct.
Are you seeing "Document ingest" activities for the same documents?
If so, they are being sent to Solr, and you'd have to look into the
Solr configuration to figure out why they aren't being indexed.

Thanks,


On Sun, Feb 12, 2012 at 11:37 AM, Silvia, Daniel [USA]
<Si...@bah.com> wrote:
> Hi Karl
>
> Quick question regarding SharePoint Wikis and ingesting them into Solr.
>
> I have been trying to get the Wikis, created in SharePoint, to be ingested into Solr. I am able to see the Wikis in the logging where the SharePoint Connector pulls everything from the site; however, I do not see the Wiki content in the Solr instance. When creating a job to run, do I need to indicate a path similar to "*Wiki*" for the entire site, or do I need to configure the Solr metadata in the job to capture the "WikiField" element in the XML being passed to the Solr connector?
>
> Thanks for your help.
>
> Dan
> ________________________________________
> From: Karl Wright [daddywri@gmail.com]
> Sent: Tuesday, January 31, 2012 10:52 AM
> To: Silvia, Daniel [USA]
> Cc: connectors-user@incubator.apache.org
> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>
> It's been a while since I've set up a SharePoint job but I think what
> you are missing is a file rule (instead of just a library rule).
> Here's what the end-user documentation says on the matter:
>
> "Each rule consists of a path, a rule type, and an action. The actions
> are "Include" and "Exclude". The rule type tells the connection what
> kind of SharePoint entity it is allowed to exactly match. For example,
> a "File" rule will only exactly match SharePoint paths that represent
> files - it cannot exactly match sites or libraries. The path itself is
> just a sequence of characters, where the "*" character has the special
> meaning of being able to match any number of any kind of characters,
> and the "?" character matches exactly one character of any kind.
>
> The rule matcher extends strict, exact matching by introducing a
> concept of implicit inclusion rules. If your rule action is "Include",
> and you specify (say) a "File" rule, the matcher presumes implicit
> inclusion rules for the corresponding site and library. So, if you
> create an "Include File" rule that matches (for example)
> "/MySite/MyLibrary/MyFile", there is an implied "Site Include" rule
> for "/MySite", and an implied "Library Include" rule for
> "/MySite/MyLibrary". Similarly, if you create a "Library Include"
> rule, there is an implied "Site Include" rule that corresponds to it.
> Note that these shortcuts only apply to "Include" rules - there are
> no corresponding implied "Exclude" rules."
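A rough sketch of the matching described in the documentation excerpt above, in Python. This is not the connector's actual code: the function names are made up for illustration, and deriving the implied rules by splitting the path on "/" is a simplifying assumption that only holds when the last segment is the file and the one before it is the library.

```python
import re

def wildcard_to_regex(pattern):
    # "*" matches any run of characters, "?" matches exactly one character;
    # every other character is treated literally.
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts) + r"\Z", re.DOTALL)

def rule_matches(rule_path, candidate):
    # True only if the whole candidate path matches the rule path.
    return wildcard_to_regex(rule_path).match(candidate) is not None

def implied_include_rules(path, rule_type):
    # Hypothetical helper: list the (type, path) include rules implied by
    # an "Include" rule of the given type. Exclude rules imply nothing.
    segments = path.strip("/").split("/")
    implied = []
    if rule_type == "file":
        if len(segments) >= 2:
            implied.append(("library", "/" + "/".join(segments[:-1])))
        if len(segments) >= 3:
            implied.append(("site", "/" + "/".join(segments[:-2])))
    elif rule_type == "library":
        if len(segments) >= 2:
            implied.append(("site", "/" + "/".join(segments[:-1])))
    return implied
```

Under these assumptions, a file rule of "/IDD/Shared Documents/*" matches "/IDD/Shared Documents/report.doc", while the bare path "/IDD/Shared Documents" does not match that file path.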
>
> What this means is that you should probably be declaring file rules
> with "*" as the file name for each library, rather than a library
> rule.  You might want to just try this.  If you still have trouble,
> you can try setting the "org.apache.manifoldcf.connectors" property to
> "DEBUG" in the properties.xml file and restarting ManifoldCF before
> your next crawl.  The manifoldcf.log file will then have output
> describing the decisions the SharePoint connector made about each
> site, library, file, or folder it encountered.
>
> Thanks,
> Karl
>
> On Tue, Jan 31, 2012 at 10:27 AM, Silvia, Daniel [USA]
> <Si...@bah.com> wrote:
>> Hi Karl
>>
>> The Path Rules are :
>>
>> Path Match: /Shared Documents
>> Type: library
>> Action: include
>>
>> Path Match: /IDD/Shared Documents
>> Type: library
>> Action: include
>>
>> Path Match: /IDD/Documents
>> Type: library
>> Action: include
>>
>> Path Match: /manifoldcf/Shared Documents
>> Type: library
>> Action: include
>>
>> I hope this helps.
>>
>> I really appreciate your help.
>>
>>
>>
>> ________________________________________
>> From: Karl Wright [daddywri@gmail.com]
>> Sent: Tuesday, January 31, 2012 10:01 AM
>> To: Silvia, Daniel [USA]
>> Cc: connectors-user@incubator.apache.org
>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>
>> "When I select only the fetch activity, I don't see anything in the
>> events, when I select the Document Ingest activity, I don't see
>> anything in the events."
>>
>> So either you've already run the job and the documents were accessed
>> the first time (and won't be accessed again until they change), or the
>> problem is likely that your SharePoint Path Rules are not including
>> any documents.  It would be very helpful at this point to include a
>> screen shot of the job you've created.  Since you are not on the net,
>> perhaps you can jot down your SharePoint path rules for me to have a
>> look at, as they are displayed when you view the job.
>>
>> Thanks,
>> Karl
>>
>> On Tue, Jan 31, 2012 at 9:44 AM, Silvia, Daniel [USA]
>> <Si...@bah.com> wrote:
>>> Hi Karl
>>>
>>> Ok, I have created a new job and ran the job and went to the Simple History Report.
>>>
>>> I see the events. If all the activities in the Simple History Report (Document Deletion(SolrPipeline), Document Ingest(SolrPipeline), and Fetch) are selected, I see a job start and a job end event. When I get to the Simple History Report I can select the "Connection", but I don't have an option to select the Activities until I run the report first.
>>> When I select only the fetch activity, I don't see anything in the events, when I select the Document Ingest activity, I don't see anything in the events.
>>>
>>> My solr output connection has the following information:
>>> Protocol: http
>>> Server: "the server name"
>>> Port:8080 (we are running solr on Jboss port 8080)
>>> Web Application Name: solr
>>> Core Name: collection1
>>> Update Handler: update/extract
>>> Remove Handler: /update
>>> Status Handler: /admin/ping
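For checking a configuration like the one above by hand, it can help to write out the URLs the settings appear to describe. The sketch below simply joins the fields in order; that ordering is an assumption rather than something confirmed from the Solr connector's code, and "solrhost" is a placeholder for the real server name.

```python
def solr_handler_url(protocol, server, port, webapp, core, handler):
    # Join webapp, core and handler with single slashes, ignoring empty
    # fields and any leading or trailing "/" the user typed.
    parts = [p.strip("/") for p in (webapp, core, handler) if p and p.strip("/")]
    return f"{protocol}://{server}:{port}/" + "/".join(parts)

update_url = solr_handler_url("http", "solrhost", 8080, "solr", "collection1", "update/extract")
status_url = solr_handler_url("http", "solrhost", 8080, "solr", "collection1", "/admin/ping")
print(update_url)
print(status_url)
```

If the URLs produced this way do not respond when requested directly, the output connection settings are a likely place to look.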
>>>
>>>
>>>
>>> ________________________________________
>>> From: Karl Wright [daddywri@gmail.com]
>>> Sent: Tuesday, January 31, 2012 9:00 AM
>>> To: Silvia, Daniel [USA]; connectors-user@incubator.apache.org
>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>
>>> Ok, let's do one thing at a time.
>>>
>>> First:
>>>
>>> "For the Path tab where there are Path Rules, are these the paths we
>>> want ManifoldCF to follow? Each site, and each Library like Documents
>>> and Shared Documents. And in the Metadata tab, this is the tab where
>>> you indicate for each "Site" and "Library" you want to include
>>> specific metadata or include all metadata?"
>>>
>>> For SharePoint, there are Path Rules and Metadata Rules.  The Path
>>> Rules describe what documents you want to include or exclude.  The
>>> Metadata Rules describe what metadata you want to include or exclude.
>>> For right now I would ignore the Metadata Rules and just make sure you
>>> have Path Rules that mean that you have included documents.
>>>
>>> "As I run the report, I see "Documents", "Active, and "Processed"
>>> where the numbers change under the "Active" column as well as the
>>> "Document" and "Processed" column (these just get larger, where Active
>>> changes). "
>>>
>>> This "report" we actually call the Job Status screen.  The fact that
>>> the numbers get larger and the job doesn't just end indicates that you
>>> are successfully crawling your SharePoint, and you have set up the job
>>> to include at least some documents.  This is good news.  However, this
>>> is NOT the "Simple History" report I was alluding to earlier.  To get
>>> to that report, click on the "Simple History" link on the left-hand
>>> navigation area.  This report will show the events of your choice
>>> (default - ALL recorded events) over a given time window (default: the
>>> last hour).  If you've done this right you should at least see a "Job
>>> start" event.  The events you are most interested in are the "fetch"
>>> (which describes all attempts to fetch documents from SharePoint) and
>>> "document ingest", which describe attempts to get documents into Solr.
>>>  You can refresh the displayed events by clicking the "Go" button in
>>> the middle of the screen whenever you wish.
>>>
>>> I'd like you to delete your job, create it again, and start it.  Then,
>>> while it is running, I'd like you to go to the "Simple History"
>>> screen, and select the appropriate connection (your SharePoint
>>> repository connection), and click the "Go" button.  So as not to skip
>>> anything basic:
>>>
>>> (1) What event types do you see?
>>> (2) Are there "fetch" events?
>>> (3) Are there "document ingest" events?
>>>
>>> If you see no "fetch" events, that implies you have either not
>>> specified any documents to include in your job, OR your Solr
>>> connection is configured to reject too many document types so they are
>>> all getting filtered out.
>>>
>>> If you see "document ingest" events, but those have errors, it implies
>>> that the configuration of your Solr connection is incorrect and does
>>> not match the way your Solr is configured.  If you send me a specific
>>> error code and/or text I can help you figure out what is happening.
>>>
>>> If you see "document ingest" events with NO errors, but the Solr
>>> instance is not getting documents, you are describing an impossible
>>> situation.  While your Solr instance may not be configured to have the
>>> Extracting Update Handler active, or it may be at a different URL than
>>> what you pointed at, that would definitely yield errors or
>>> notifications in the Simple History.
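The checks above amount to a small decision procedure, sketched here in Python. The event representation (a dict with an "activity" name and an "ok" flag) is hypothetical and is not how ManifoldCF stores history records; the case of fetches without any ingest events is not covered in the message above and is only flagged.

```python
def triage_simple_history(events):
    # events: list of dicts such as {"activity": "fetch", "ok": True}
    fetches = [e for e in events if e["activity"].lower() == "fetch"]
    ingests = [e for e in events
               if e["activity"].lower().startswith("document ingest")]
    if not fetches:
        # No documents included by the job, or the Solr connection
        # filters out every document type.
        return "check path rules and Solr connection filters"
    if not ingests:
        return "fetched but never ingested (not covered above)"
    if any(not e["ok"] for e in ingests):
        # Ingest attempts with errors point at the Solr connection setup.
        return "check Solr connection configuration"
    # Ingests without errors: the documents were handed to Solr.
    return "look at the Solr side"
```

For example, a history containing only job start and job end events falls into the first branch.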
>>>
>>> Please let me know what you actually see.
>>> Karl
>>>
>>>
>>>
>>> On Tue, Jan 31, 2012 at 7:53 AM, Silvia, Daniel [USA]
>>> <Si...@bah.com> wrote:
>>>> Hi Karl
>>>>
>>>> I am trying to figure out why I can't see anything being indexed into our Solr index. I was looking at another post where you were working with "Martijn" and that individual was not able to see info getting into Solr. In the job that I have set up, I have included all metadata associated with each site, Shared Documents, and Documents. In the Solr Field Mapping, I am associating metadata fields that are indicated in the Metadata tab with fields that exist in our Solr index.
>>>>
>>>> For the Path tab where there are Path Rules, are these the paths we want ManifoldCF to follow? Each site, and each Library like Documents and Shared Documents. And in the Metadata tab, this is the tab where you indicate for each "Site" and "Library" you want to include specific metadata or include all metadata?
>>>>
>>>> As I run the report, I see "Documents", "Active", and "Processed", where the numbers change under the "Active" column as well as the "Documents" and "Processed" columns (these just get larger, while Active changes). While I was researching why I may not be seeing anything on the Solr side, I saw your communication with another individual indicating that I should see something like literal.xxx=yyy in the Solr log. This is an older post, so there may be something else I should see. But the only thing I see when I look at the Solr log is "[ ] webapp=/solr path=/update/extract params={commit=true} status=0 QTime=0".
>>>>
>>>> Any ideas?
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ________________________________________
>>>> From: Karl Wright [daddywri@gmail.com]
>>>> Sent: Monday, January 30, 2012 10:40 AM
>>>> To: Silvia, Daniel [USA]
>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>
>>>> The default time range for the Simple History is the last hour.  I
>>>> suspect you are unaware of that.  If you want a different time range
>>>> you will have to modify the start and end time pulldowns accordingly.
>>>>
>>>> Karl
>>>>
>>>> On Mon, Jan 30, 2012 at 10:34 AM, Silvia, Daniel [USA]
>>>> <Si...@bah.com> wrote:
>>>>> Hi Karl
>>>>>
>>>>> I am looking at the Simple History in the UI and there isn't much to see, unless I am not getting what I am supposed to. I see "Start Time", "Activity", "Identifier", "Bytes", and "Time", but I don't get anything for Result Code or Result Description. I looked in the documentation and we should be getting something in those fields, I believe.
>>>>>
>>>>> Anyway, I will look through the mail list to see what I can find.
>>>>>
>>>>> Thanks for the help.
>>>>>
>>>>> Dan
>>>>>
>>>>> ________________________________________
>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>> Sent: Monday, January 30, 2012 8:24 AM
>>>>> To: Silvia, Daniel [USA]
>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>
>>>>> So just to be clear, I'm NOT talking about the ManifoldCF logging.
>>>>> For the Solr connector you probably won't need to turn that on; it's
>>>>> pretty simple and you can look at the Simple History in the UI to see
>>>>> what the request and response look like from Solr.  I was talking
>>>>> instead about Solr logging - when you run the Solr Webapp, by default
>>>>> all requests against the Extracting Update Handler are logged to
>>>>> standard error, so you will see them appear in the process window in
>>>>> which Solr is running.
>>>>>
>>>>> My suggestion to you is to first have a look at the Simple History for
>>>>> the job you are trying to run.  If you are getting back 500 errors
>>>>> from Solr, that means you have not set up Solr properly to work with
>>>>> ManifoldCF.  In recent versions of Solr, the example works fine out of
>>>>> the box, but when you try to deploy any other way you are often
>>>>> missing the jar that contains the extracting update handler, so of
>>>>> course nothing works.  Several people on the connectors-user list have
>>>>> run into this and if you search the list (go to the ManifoldCF site
>>>>> and click through to the mailing list page and there are links at the
>>>>> bottom for this purpose) you will find posts that describe exactly
>>>>> what is wrong and how to fix it.
>>>>>
>>>>> Hope this helps.
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>> On Sun, Jan 29, 2012 at 2:30 PM, Silvia, Daniel [USA]
>>>>> <Si...@bah.com> wrote:
>>>>>> Yeah, but for some reason the logging isn't coming through. The logging is set to INFO, so I will have to change the logging level to DEBUG.
>>>>>>
>>>>>> Thanks again for your help.
>>>>>>
>>>>>>
>>>>>> ________________________________________
>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>> Sent: Friday, January 27, 2012 5:06 PM
>>>>>> To: Silvia, Daniel [USA]
>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>
>>>>>> Actually, the best thing for debugging the Solr connection is looking
>>>>>> at standard-output on the Solr instance.  You will see all the posts
>>>>>> that are made and what the arguments were.  Also, this is the kind of
>>>>>> question you'd get a lot of benefit from posting to the list.  The
>>>>>> end-user documentation I pointed you at before describes some of this
>>>>>> but the Solr connector has grown beyond the doc to some extent at this
>>>>>> point.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>> On Fri, Jan 27, 2012 at 9:51 AM, Silvia, Daniel [USA]
>>>>>> <Si...@bah.com> wrote:
>>>>>>> Hi Karl
>>>>>>>
>>>>>>> Is there a log level other than wire-level debugging for viewing log statements about sending output to a Solr instance in the Jobs List/Creation section? We are having an issue getting content to Solr. Is there a document anywhere which defines the fields in the Jobs section for the Solr Field Mapping tab and the Paths and Metadata tabs?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>> ________________________________________
>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>> Sent: Thursday, January 26, 2012 10:44 AM
>>>>>>> To: Silvia, Daniel [USA]
>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>
>>>>>>> I am afraid I don't know the answer to that.  I'm sure it's infinitely
>>>>>>> configurable but it's not clear what the SharePoint web services need
>>>>>>> to do under the hood, so anything I tell you would be just a guess.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>> On Thu, Jan 26, 2012 at 10:43 AM, Silvia, Daniel [USA]
>>>>>>> <Si...@bah.com> wrote:
>>>>>>>> Hi Karl
>>>>>>>>
>>>>>>>> One more question. Do you know the minimum permissions needed to crawl the SharePoint instance and all sites under the instance? The individual who set my permissions set me up as the "site collection admin" for the topmost site. Is there a specific admin role, other than "Farm Admin", that can be given to the user crawling the SharePoint instance?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> ________________________________________
>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>> Sent: Thursday, January 26, 2012 9:53 AM
>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>
>>>>>>>> Good news!  Please keep in touch; we'd like to hear how things work
>>>>>>>> for you (it helps keep the software fresh ;-) ).
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>> On Thu, Jan 26, 2012 at 9:48 AM, Silvia, Daniel [USA]
>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>> Hey Karl
>>>>>>>>>
>>>>>>>>> (1) was the issue. When requesting access to the SharePoint instance I indicated that I needed to be able to crawl SharePoint, I guess the problem was on my end indicating that I also needed privileges to crawl the site.
>>>>>>>>>
>>>>>>>>> Anyway, thank you for your help. When I change the SharePoint version to v 3 I get a message indicating "Connection Working".
>>>>>>>>>
>>>>>>>>> Appreciate the help.
>>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>> ________________________________________
>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>> Sent: Thursday, January 26, 2012 9:19 AM
>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>
>>>>>>>>> The error message "axisFault=Server, detail=Server was unable to
>>>>>>>>> process request --> Requested Registry access is not allowed" is Axis
>>>>>>>>> interpreting an error message from SharePoint.  What it is saying is
>>>>>>>>> that the user you are trying to crawl with is unable to read the
>>>>>>>>> SharePoint machine's registry but needs to.  There are two possible
>>>>>>>>> causes for this:
>>>>>>>>>
>>>>>>>>> (1) The user you gave doesn't have enough permissions to crawl SharePoint
>>>>>>>>> (2) When you installed the SharePoint MCPermissions plugin, you
>>>>>>>>> installed it logged in as a user that did not have enough permissions to do
>>>>>>>>> what it needs to do.
>>>>>>>>>
>>>>>>>>> You can tell the difference between the two by selecting "SharePoint
>>>>>>>>> 2.0" in the sharepoint version pulldown.  If a connection saved in
>>>>>>>>> this way says "Connection working", it means that the MCPermissions
>>>>>>>>> plugin has the permission problem, not your user.
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>> On Thu, Jan 26, 2012 at 9:14 AM, Silvia, Daniel [USA]
>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>> Hi Karl
>>>>>>>>>>
>>>>>>>>>> When I try to use option (1) and don't put anything in the Site field, I get an error message "axisFault=Server, detail=Server was unable to process request --> Requested Registry access is not allowed", and when I put a "/" in the Site field I get a GUI error indicating that the Site field can't end with a "/".
>>>>>>>>>>
>>>>>>>>>> Anyway, do you have any ideas. Or maybe the Sharepoint instance is not configured properly for us to crawl?
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ________________________________________
>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>> Sent: Thursday, January 26, 2012 8:52 AM
>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>
>>>>>>>>>> SharePoint has two kinds of site:
>>>>>>>>>>
>>>>>>>>>> (1) the root site, which can be reached by the path http://server:port
>>>>>>>>>> (2) a number of sites under the 'virtual path', with URLs of the form:
>>>>>>>>>>
>>>>>>>>>> http://server:port/something/sitename
>>>>>>>>>>
>>>>>>>>>> The "something" is, by default, the string "site", so
>>>>>>>>>> http://server:port/site/xyz might be the URL of one such virtual site.
>>>>>>>>>>
>>>>>>>>>> The form of the "site" field in the SharePoint connection for the
>>>>>>>>>> first is either blank or "/" (can't remember which right now), and the
>>>>>>>>>> form of the "site" field for the second is "/site/xyz".  On no account
>>>>>>>>>> does the connector expect to see default.aspx attached to that path,
>>>>>>>>>> so you should not do this; it cannot work.
>>>>>>>>>>
>>>>>>>>>> FWIW, my recommendation to try setting the connection type to
>>>>>>>>>> "SharePoint 2.0" was to rule out any possible installation issue with
>>>>>>>>>> the ManifoldCF sharepoint plugin.  The connection check for 2.0 does
>>>>>>>>>> not look for it; only the connection check for 3.0 does.
>>>>>>>>>>
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>> On Thu, Jan 26, 2012 at 8:41 AM, Silvia, Daniel [USA]
>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>> Hey Karl
>>>>>>>>>>>
>>>>>>>>>>> I am also getting an "HTTP Error 401.2: Unauthorized: Access is denied due to server configuration" when setting the Site field to /default.aspx. Do most Sharepoint instances have the urls set to something like http://server:port/sites/...... instead of http://server:port/? When I use the "/default.aspx" I see in the log files that ManifoldCF is trying to go to the Lists.asmx service with the url http://server:port/default.aspx/_vti_bin/Lists.asmx, where nothing is found.
>>>>>>>>>>>
>>>>>>>>>>> As you can tell I am not much of a SharePoint user or installer.
>>>>>>>>>>>
>>>>>>>>>>> Also, I don't think the issue is with the connector in ManifoldCF, I am just trying to
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ________________________________________
>>>>>>>>>>> From: Silvia, Daniel [USA]
>>>>>>>>>>> Sent: Thursday, January 26, 2012 7:23 AM
>>>>>>>>>>> To: Karl Wright
>>>>>>>>>>> Subject: RE: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>
>>>>>>>>>>> Hey Karl
>>>>>>>>>>>
>>>>>>>>>>> The issue I am having is that the SharePoint instance URL is something like http://server:port/default.aspx. If I don't put anything in the Site field I get a message indicating "Requested Registry Access is not allowed". I was putting "/default.aspx" as my Site field, which I believe may have been the issue. However, what do you put in your Site field when the site is the topmost site, as in http://server:port/default.aspx?
>>>>>>>>>>>
>>>>>>>>>>> I would love to send you the log messages, but I am working on a network which is not connected to the outside.
>>>>>>>>>>>
>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ________________________________________
>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>> Sent: Wednesday, January 25, 2012 6:12 PM
>>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>
>>>>>>>>>>> Daniel,
>>>>>>>>>>>
>>>>>>>>>>> FWIW, I can help you diagnose the issue, but to do so you really need
>>>>>>>>>>> to give me some concrete data.  I'm happy to grovel over the whole
>>>>>>>>>>> wire log if you feel you can send it to me; something that may not
>>>>>>>>>>> seem important to you will likely stand out strongly to me.  I can,
>>>>>>>>>>> for example, see whether you are getting back HTML because of an
>>>>>>>>>>> authentication error, for instance.  And if you ARE getting back valid
>>>>>>>>>>> SOAP, I would then be sure that something was wrong with the Axis
>>>>>>>>>>> client configuration, and I could pursue that here with the data
>>>>>>>>>>> provided.  The problem with software like SharePoint running on IIS is
>>>>>>>>>>> that it can be configured a nearly infinite number of ways, so
>>>>>>>>>>> diagnosis is more of an art than a science.  I strongly suspect that
>>>>>>>>>>> you're laboring under a pretty straightforward misconception which is
>>>>>>>>>>> likely blocking progress, rather than there being an issue with the
>>>>>>>>>>> SharePoint connector itself.  But I can't tell that without more
>>>>>>>>>>> detailed communication.
>>>>>>>>>>>
>>>>>>>>>>> Also, you mentioned that the Lists.asmx service was right where you
>>>>>>>>>>> expected it to be.  Have you read the SharePoint Connector part of the
>>>>>>>>>>> end-user documentation?  To wit:
>>>>>>>>>>>
>>>>>>>>>>> "Select the server protocol, and enter the server name and port, based
>>>>>>>>>>> on what you recorded from the URL for your SharePoint site. For the
>>>>>>>>>>> "Site path" field, type in the portion of the root site URL that
>>>>>>>>>>> includes everything after the server and port, except for the final
>>>>>>>>>>> "aspx" file. For example, if the SharePoint URL is
>>>>>>>>>>> "http://myserver:81/sites/somewhere/index.asp", the site path would be
>>>>>>>>>>> "/sites/somewhere"."  The Lists.asmx service in this example would be
>>>>>>>>>>> expected to be found at
>>>>>>>>>>> "http://myserver:81/sites/somewhere/_vti_bin/Lists.asmx".  And the URL
>>>>>>>>>>> you would start with would be the URL you see in the browser when you
>>>>>>>>>>> log into the SharePoint web client and go to the site you wish to
>>>>>>>>>>> crawl.  Is this what you are doing?
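The rule quoted above can be written down as a couple of small Python helpers. These are illustrative only: the function names are invented, and treating any final path segment that contains a "." as the page file to drop is an assumption that would misfire on a site name containing a dot.

```python
from urllib.parse import urlsplit

def site_path_from_url(browser_url):
    # Everything after server and port, minus a trailing page file
    # such as "index.asp" or "default.aspx".
    path = urlsplit(browser_url).path
    head, _, last = path.rpartition("/")
    if "." in last:
        path = head
    return path.rstrip("/")

def lists_service_url(browser_url):
    # Where the Lists web service would be expected for that site.
    parts = urlsplit(browser_url)
    site_path = site_path_from_url(browser_url)
    return f"{parts.scheme}://{parts.netloc}{site_path}/_vti_bin/Lists.asmx"

print(site_path_from_url("http://myserver:81/sites/somewhere/index.asp"))
print(lists_service_url("http://myserver:81/sites/somewhere/index.asp"))
```

For a root site reached at a URL of the form http://server:8080/default.aspx, the derived site path is empty and the service URL comes out as http://server:8080/_vti_bin/Lists.asmx.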
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks again,
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jan 25, 2012 at 12:33 PM, Karl Wright <da...@gmail.com> wrote:
>>>>>>>>>>>> The code that parses the SOAP response is Apache Axis.  This hasn't
>>>>>>>>>>>> changed in several years.
>>>>>>>>>>>>
>>>>>>>>>>>> Can you answer the following questions:
>>>>>>>>>>>>
>>>>>>>>>>>> (1) When the SharePoint connector makes a request to SharePoint, is
>>>>>>>>>>>> the response HTML, or is it XML?  Does it have an XML header which
>>>>>>>>>>>> describes a Microsoft XML namespace?  It sure sounds like it is
>>>>>>>>>>>> responding with HTML.  The SharePoint connector is expecting to
>>>>>>>>>>>> communicate using SOAP.  Is the response valid SOAP?
>>>>>>>>>>>>
>>>>>>>>>>>> (2) What version of SharePoint are you trying to connect to?  Is it
>>>>>>>>>>>> SharePoint 2007?  SharePoint 2010?
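One way to answer question (1) from a response body copied out of the wire log is to look at its root element. The sketch below is a standalone check, not anything the connector or Axis does; it assumes a SOAP 1.1 envelope namespace and falls back to a plain substring test when the body is not well-formed XML, which is common for HTML error pages.

```python
import xml.etree.ElementTree as ET

SOAP_11_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def classify_response(body):
    # Returns "soap", "html", "other-xml" or "unknown" for a response body.
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Not well-formed XML; HTML error pages often end up here.
        return "html" if "<html" in body.lower() else "unknown"
    if root.tag == "{%s}Envelope" % SOAP_11_NS:
        return "soap"
    local_name = root.tag.rsplit("}", 1)[-1].lower()
    return "html" if local_name == "html" else "other-xml"
```

A body whose root element is HTML is consistent with the "Bad envelope tag: HTML" message; a body classified as "soap" would point toward the client side instead.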
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jan 25, 2012 at 12:26 PM, Silvia, Daniel [USA]
>>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have added the specific log4j lines for Http Client wire logging and restarted the ManifoldCF instance. I was also able to see the web service Lists.asmx through IE. When reviewing the log files I was able to see some of the content that resides in the SharePoint instance in the content coming back from the request. However, I am still seeing the error messages in the ManifoldCF GUI as well as in the log file indicating "Bad Envelope: HTML", "No service named ListsSoap is available" and "No service named http://schemas.microsoft.com/sharepoint/soap/GetListCollection is available".
>>>>>>>>>>>>>
>>>>>>>>>>>>> Could there be something going on with the way the services are being built on the client side?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Appreciate your help.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>>>> Sent: Tuesday, January 24, 2012 4:52 PM
>>>>>>>>>>>>> To: Silvia, Daniel [USA]; connectors-user@incubator.apache.org
>>>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have not seen this exact problem before.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The "Bad envelope tag: HTML" indicates that the SOAP request the
>>>>>>>>>>>>> SharePoint connector is attempting to perform is, in fact, returning
>>>>>>>>>>>>> an HTML response.  This usually indicates that the server or path
>>>>>>>>>>>>> parameters you've used to set up the connection are not set correctly,
>>>>>>>>>>>>> and SharePoint is not actually being engaged.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But usually when that happens I don't recall a ConfigurationException
>>>>>>>>>>>>> logged, unless it's what Axis does in response to the HTML.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The best thing to do at this point is turn on Http Client wire
>>>>>>>>>>>>> logging, restart ManifoldCF, and view the connection.  The log will
>>>>>>>>>>>>> then contain a record of the exact SOAP requests and the responses,
>>>>>>>>>>>>> and we can see what's wrong.  The technique is described here:
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/CONNECTORS/Debugging+Connections
>>>>>>>>>>>>>
>>>>>>>>>>>>> You can also confirm that the right SharePoint web services are
>>>>>>>>>>>>> functioning on the machine in question by trying to access them
>>>>>>>>>>>>> directly.  For the Lists web service, which is the one it sounds like
>>>>>>>>>>>>> it was complaining about, try using IE (not Firefox etc because you
>>>>>>>>>>>>> want NTLM support) to go to the url where you think the web service
>>>>>>>>>>>>> lives.  This will be http: or https:, plus the server, plus the port,
>>>>>>>>>>>>> plus the path, plus "_vti_bin/Lists.asmx".  You should see an
>>>>>>>>>>>>> unequivocal SharePoint response.  For an example from the Microsoft
>>>>>>>>>>>>> demo service, try http://www.wssdemo.com/_vti_bin/Lists.asmx.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please let me know how it goes, and cc the dev list (as I have) so a
>>>>>>>>>>>>> record of what you're encountering can be made available to others.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jan 24, 2012 at 1:52 PM, Silvia, Daniel [USA]
>>>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have downloaded the newest version of ManifoldCF, v0.4, and have run the necessary ant scripts to download dependencies and then built the entire project. I have also had the SharePoint web service MetCarta.SharePoint.MCPermissionsService.wsp deployed on the SharePoint instance, since we are running version 3 of SharePoint (SharePoint 2007). When I try to create a Repository Connection and select "Save" I get a message on the ManifoldCF front end of "org.xml.sax.SAXException Bad envelope tag: HTML". When I look at the log file I see an error message "org.apache.axis.ConfigurationException: No service named ListsSoap is available".
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can you tell me if you have seen this issue before and what may be causing this issue?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>>>>> Sent: Friday, January 20, 2012 7:31 AM
>>>>>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In order for the SharePoint connector to build, you need to have the
>>>>>>>>>>>>>> wsdls in place in the right area.  We cannot ship those because of
>>>>>>>>>>>>>> potential copyright issues.  The easiest way to obtain the right
>>>>>>>>>>>>>> dependencies is:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ant download-dependencies
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Then, just build normally:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ant build
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This will only work for ManifoldCF-0.4-incubating, or trunk.
>>>>>>>>>>>>>> 0.4-incubating is still in the process of being signed off by the
>>>>>>>>>>>>>> incubator, but you can find the release candidate here:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://people.apache.org/~kwright
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Jan 20, 2012 at 7:02 AM, Silvia, Daniel [USA]
>>>>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I work with Matt Parker and we are in the process of developing a pipeline
>>>>>>>>>>>>>>> that uses ManifoldCF at the beginning. I just subscribed to the
>>>>>>>>>>>>>>> connectors-user-subscribe@incubator.apache.org
>>>>>>>>>>>>>>> group yesterday and submitted an e-mail question to the group. Can you help
>>>>>>>>>>>>>>> us with the below issue?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I downloaded MCF and started playing with the default setup under Jetty and
>>>>>>>>>>>>>>> Derby. It starts up without any issue. I am trying to configure a SharePoint
>>>>>>>>>>>>>>> connector, connecting to SharePoint Service 3. I have been following the
>>>>>>>>>>>>>>> instructions and I am at the point of deploying the custom SharePoint web
>>>>>>>>>>>>>>> service to the SharePoint instance. The instructions indicate that I should
>>>>>>>>>>>>>>> get the web service from dist/sharepoint-integration after building MCF.
>>>>>>>>>>>>>>> However, after looking through the entire directory structure, I am unable
>>>>>>>>>>>>>>> to find the service to deploy.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can someone tell me where to find this service?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Daniel Silvia

Re: ManifoldCF's dist/shapoint-integration dir

Posted by Karl Wright <da...@gmail.com>.
Hi Daniel,

If you are seeing fetches in the Simple History that include the wiki
URLs you are trying to capture, the SharePoint job is likely correct.
Are you seeing "Document ingest" activities for the same documents?
If so, they are being sent to Solr, and you'd have to look into the
Solr configuration to figure out why they aren't being indexed.
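
One quick way to confirm whether anything reached the index is to ask Solr for a document count. The following is only a minimal sketch: it assumes the standard "select" handler is enabled and reuses the port, web application name, and core name from the output connection described earlier in this thread; "solrhost" is a placeholder, and both helper names are made up for illustration.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def solr_select_url(server, port=8080, webapp="solr", core="collection1",
                    query="*:*", protocol="http"):
    """Build a select URL that asks Solr only for the number of matches."""
    # Skip empty path parts so a single-core setup (core="") still works.
    path = "/".join(quote(part) for part in (webapp, core, "select") if part)
    params = urlencode({"q": query, "rows": 0, "wt": "json"})
    return f"{protocol}://{server}:{port}/{path}?{params}"


def count_documents(url):
    """Fetch the URL and return numFound from Solr's JSON response."""
    with urlopen(url) as response:
        return json.load(response)["response"]["numFound"]


# "solrhost" is a placeholder for the real server name.
url = solr_select_url("solrhost")
print(url)
```

If count_documents(url) stays at zero after a crawl that shows "Document ingest" activities without errors, the Solr side (update handler, schema, commit settings) is the place to look.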

Thanks,


On Sun, Feb 12, 2012 at 11:37 AM, Silvia, Daniel [USA]
<Si...@bah.com> wrote:
> Hi Karl
>
> Quick question regarding SharePoint Wikis and ingesting them into Solr.
>
> I have been trying to get the Wikis created in SharePoint ingested into Solr. I can see the Wikis in the logging where the SharePoint connector pulls everything from the site; however, I do not see the Wiki content in the Solr instance. When creating a job, do I need to specify a path similar to "*Wiki*" for the entire site, or do I need to configure the Solr metadata in the job to capture the "WikiField" element in the XML being passed to the Solr connector?
>
> Thanks for your help.
>
> Dan
> ________________________________________
> From: Karl Wright [daddywri@gmail.com]
> Sent: Tuesday, January 31, 2012 10:52 AM
> To: Silvia, Daniel [USA]
> Cc: connectors-user@incubator.apache.org
> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>
> It's been a while since I've set up a SharePoint job but I think what
> you are missing is a file rule (instead of just a library rule).
> Here's what the end-user documentation says on the matter:
>
> "Each rule consists of a path, a rule type, and an action. The actions
> are "Include" and "Exclude". The rule type tells the connection what
> kind of SharePoint entity it is allowed to exactly match. For example,
> a "File" rule will only exactly match SharePoint paths that represent
> files - it cannot exactly match sites or libraries. The path itself is
> just a sequence of characters, where the "*" character has the special
> meaning of being able to match any number of any kind of characters,
> and the "?" character matches exactly one character of any kind.
>
> The rule matcher extends strict, exact matching by introducing a
> concept of implicit inclusion rules. If your rule action is "Include",
> and you specify (say) a "File" rule, the matcher presumes implicit
> inclusion rules for the corresponding site and library. So, if you
> create an "Include File" rule that matches (for example)
> "/MySite/MyLibrary/MyFile", there is an implied "Site Include" rule
> for "/MySite", and an implied "Library Include" rule for
> "/MySite/MyLibrary". Similarly, if you create a "Library Include"
> rule, there is an implied "Site Include" rule that corresponds to it.
> Note that these shortcuts only apply to "Include" rules - there are
> no corresponding implied "Exclude" rules."
>
> What this means is that you should probably be declaring file rules
> with "*" as the file name for each library, rather than a library
> rule.  You might want to just try this.  If you still have trouble,
> you can try setting the "org.apache.manifoldcf.connectors" property to
> "DEBUG" in the properties.xml file and restarting ManifoldCF before
> your next crawl.  The manifoldcf.log file will then have output
> describing the decisions the SharePoint connector made about each
> site, library, file, or folder it encountered.
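
To make the wildcard semantics quoted above concrete, here is a minimal sketch of the "*" and "?" matching, written independently of the connector's actual code. The helper names and the file name "report.doc" are hypothetical, and the sketch does not model the implied Site and Library rules.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a path pattern into a regex: '*' is any run, '?' is one char."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def path_matches(pattern, path):
    """Return True only when the whole path matches the pattern exactly."""
    return wildcard_to_regex(pattern).fullmatch(path) is not None


# A library-style path does not exactly match a file inside the library...
print(path_matches("/Shared Documents", "/Shared Documents/report.doc"))
# ...but a path with "*" as the file name does.
print(path_matches("/Shared Documents/*", "/Shared Documents/report.doc"))
```

Under these assumptions, a path such as "/Shared Documents/*" with rule type "File" is the kind of rule suggested above for each library.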
>
> Thanks,
> Karl
>
> On Tue, Jan 31, 2012 at 10:27 AM, Silvia, Daniel [USA]
> <Si...@bah.com> wrote:
>> Hi Karl
>>
>> The Path Rules are :
>>
>> Path Match: /Shared Documents
>> Type: library
>> Action: include
>>
>> Path Match: /IDD/Shared Documents
>> Type: library
>> Action: include
>>
>> Path Match: /IDD/Documents
>> Type: library
>> Action: include
>>
>> Path Match: /manifoldcf/Shared Documents
>> Type: library
>> Action: include
>>
>> I hope this helps.
>>
>> I really appreciate your help.
>>
>>
>>
>> ________________________________________
>> From: Karl Wright [daddywri@gmail.com]
>> Sent: Tuesday, January 31, 2012 10:01 AM
>> To: Silvia, Daniel [USA]
>> Cc: connectors-user@incubator.apache.org
>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>
>> "When I select only the fetch activity, I don't see anything in the
>> events, when I select the Document Ingest activity, I don't see
>> anything in the events."
>>
>> So either you've already run the job and the documents were accessed
>> the first time (and won't be accessed again until they change), or the
>> problem is likely that your SharePoint Path Rules are not including
>> any documents.  It would be very helpful at this point to include a
>> screen shot of the job you've created.  Since you are not on the net,
>> perhaps you can jot down your SharePoint path rules for me to have a
>> look at, as they are displayed when you view the job.
>>
>> Thanks,
>> Karl
>>
>> On Tue, Jan 31, 2012 at 9:44 AM, Silvia, Daniel [USA]
>> <Si...@bah.com> wrote:
>>> Hi Karl
>>>
>>> Ok, I have created a new job and ran the job and went to the Simple History Report.
>>>
>>> I see the Events. If all the Activities in the Simple History Report (Document Deletion (SolrPipeline), Document Ingest (SolrPipeline), and Fetch) are selected, I see a job start and a job end event. When I get to the Simple History Report I can select the "Connection", but I don't have an option to select the Activities until I run the report first.
>>> When I select only the fetch activity, I don't see anything in the events, when I select the Document Ingest activity, I don't see anything in the events.
>>>
>>> My solr output connection has the following information:
>>> Protocol: http
>>> Server: "the server name"
>>> Port:8080 (we are running solr on Jboss port 8080)
>>> Web Application Name: solr
>>> Core Name: collection1
>>> Update Handler: update/extract
>>> Remove Handler: /update
>>> Status Handler: /admin/ping
>>>
>>>
>>>
>>> ________________________________________
>>> From: Karl Wright [daddywri@gmail.com]
>>> Sent: Tuesday, January 31, 2012 9:00 AM
>>> To: Silvia, Daniel [USA]; connectors-user@incubator.apache.org
>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>
>>> Ok, let's do one thing at a time.
>>>
>>> First:
>>>
>>> "For the Path tab where there are Path Rules, are these the paths we
>>> want ManifoldCF to follow? Each site, and each Library like Documents
>>> and Shared Documents. And in the Metadata tab, this is the tab where
>>> you indicate for each "Site" and "Library" you want to include
>>> specific metadata or include all metadata?"
>>>
>>> For SharePoint, there are Path Rules and Metadata Rules.  The Path
>>> Rules describe what documents you want to include or exclude.  The
>>> Metadata Rules describe what metadata you want to include or exclude.
>>> For right now I would ignore the Metadata Rules and just make sure you
>>> have Path Rules that mean that you have included documents.
>>>
>>> "As I run the report, I see "Documents", "Active, and "Processed"
>>> where the numbers change under the "Active" column as well as the
>>> "Document" and "Processed" column (these just get larger, where Active
>>> changes). "
>>>
>>> This "report" we actually call the Job Status screen.  The fact that
>>> the numbers get larger and the job doesn't just end indicates that you
>>> are successfully crawling your SharePoint, and you have set up the job
>>> to include at least some documents.  This is good news.  However, this
>>> is NOT the "Simple History" report I was alluding to earlier.  To get
>>> to that report, click on the "Simple History" link on the left-hand
>>> navigation area.  This report will show the events of your choice
>>> (default - ALL recorded events) over a given time window (default: the
>>> last hour).  If you've done this right you should at least see a "Job
>>> start" event.  The events you are most interested in are the "fetch"
>>> (which describes all attempts to fetch documents from SharePoint) and
>>> "document ingest", which describe attempts to get documents into Solr.
>>>  You can refresh the displayed events by clicking the "Go" button in
>>> the middle of the screen whenever you wish.
>>>
>>> I'd like you to delete your job, create it again, and start it.  Then,
>>> while it is running, I'd like you to go to the "Simple History"
>>> screen, and select the appropriate connection (your SharePoint
>>> repository connection), and click the "Go" button.  So as not to skip
>>> anything basic:
>>>
>>> (1) What event types do you see?
>>> (2) Are there "fetch" events?
>>> (3) Are there "document ingest" events?
>>>
>>> If you see no "fetch" events, that implies you have either not
>>> specified any documents to include in your job, OR your Solr
>>> connection is configured to reject too many document types so they are
>>> all getting filtered out.
>>>
>>> If you see "document ingest" events, but those have errors, it implies
>>> that the configuration of your Solr connection is incorrect and does
>>> not match the way your Solr is configured.  If you send me a specific
>>> error code and/or text I can help you figure out what is happening.
>>>
>>> If you see "document ingest" events with NO errors, but the Solr
>>> instance is not getting documents, you are describing an impossible
>>> situation.  While your Solr instance may not be configured to have the
>>> Extracting Update Handler active, or it may be at a different URL than
>>> what you pointed at, that would definitely yield errors or
>>> notifications in the Simple History.
>>>
>>> Please let me know what you actually see.
>>> Karl
>>>
>>>
>>>
>>> On Tue, Jan 31, 2012 at 7:53 AM, Silvia, Daniel [USA]
>>> <Si...@bah.com> wrote:
>>>> Hi Karl
>>>>
>>>> I am trying to figure out why I can't see anything being indexed into our Solr index. I was looking at another post where you were working with "Martijn", and that individual was not able to see info getting into Solr. In the report that I have set up, I have included all metadata associated with each site, Shared Documents, and Documents. In the Solr Field Mapping, I am associating metadata fields that are indicated in the Metadata tab with fields that exist in our Solr index.
>>>>
>>>> For the Path tab where there are Path Rules, are these the paths we want ManifoldCF to follow? Each site, and each Library like Documents and Shared Documents. And in the Metadata tab, this is the tab where you indicate for each "Site" and "Library" you want to include specific metadata or include all metadata?
>>>>
>>>> As I run the report, I see "Documents", "Active, and "Processed" where the numbers change under the "Active" column as well as the "Document" and "Processed" column (these just get larger, where Active changes). While I was researching why I may not be seeing something over on the Solr side, I saw your communication with another individual indicating that I should see something like literal.xxx=yyy in the Solr log. This is an older post so there maybe something else I should see. But the only thing I see when I look at the Solr log is "[ ] webapp=/solr path=/update/extract params={commit=true} status=0 QTime=0".
>>>>
>>>> Any ideas.
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ________________________________________
>>>> From: Karl Wright [daddywri@gmail.com]
>>>> Sent: Monday, January 30, 2012 10:40 AM
>>>> To: Silvia, Daniel [USA]
>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>
>>>> The default time range for the Simple History is the last hour.  I
>>>> suspect you are unaware of that.  If you want a different time range
>>>> you will have to modify the start and end time pulldowns accordingly.
>>>>
>>>> Karl
>>>>
>>>> On Mon, Jan 30, 2012 at 10:34 AM, Silvia, Daniel [USA]
>>>> <Si...@bah.com> wrote:
>>>>> Hi Karl
>>>>>
>>>>> I am looking at the Simple History in the UI and there isn't much to see, unless I am not getting what I am supposed to. I see the Start Time, Activity, Identifier, Bytes, and Time columns, but I don't get anything for Result Code or Result Description. I looked in the documentation and I believe we should be getting something in those fields.
>>>>>
>>>>> Anyway, I will look through the mail list to see what I can find.
>>>>>
>>>>> Thanks for the help.
>>>>>
>>>>> Dan
>>>>>
>>>>> ________________________________________
>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>> Sent: Monday, January 30, 2012 8:24 AM
>>>>> To: Silvia, Daniel [USA]
>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>
>>>>> So just to be clear, I'm NOT talking about the ManifoldCF logging.
>>>>> For the Solr connector you probably won't need to turn that on; it's
>>>>> pretty simple and you can look at the Simple History in the UI to see
>>>>> what the request and response look like from Solr.  I was talking
>>>>> instead about Solr logging - when you run the Solr Webapp, by default
>>>>> all requests against the Extracting Update Handler are logged to
>>>>> standard error, so you will see them appear in the process window in
>>>>> which Solr is running.
>>>>>
>>>>> My suggestion to you is to first have a look at the Simple History for
>>>>> the job you are trying to run.  If you are getting back 500 errors
>>>>> from Solr, that means you have not set up Solr properly to work with
>>>>> ManifoldCF.  In recent versions of Solr, the example works fine out of
>>>>> the box, but when you try to deploy any other way you are often
>>>>> missing the jar that contains the extracting update handler, so of
>>>>> course nothing works.  Several people on the connectors-user list have
>>>>> run into this and if you search the list (go to the ManifoldCF site
>>>>> and click through to the mailing list page and there are links at the
>>>>> bottom for this purpose) you will find posts that describe exactly
>>>>> what is wrong and how to fix it.
>>>>>
>>>>> Hope this helps.
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>> On Sun, Jan 29, 2012 at 2:30 PM, Silvia, Daniel [USA]
>>>>> <Si...@bah.com> wrote:
>>>>>> Yes, but for some reason the logging isn't coming through. The logging is set to INFO, so I will have to change the logging level to DEBUG.
>>>>>>
>>>>>> Thanks again for your help.
>>>>>>
>>>>>>
>>>>>> ________________________________________
>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>> Sent: Friday, January 27, 2012 5:06 PM
>>>>>> To: Silvia, Daniel [USA]
>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>
>>>>>> Actually, the best thing for debugging the Solr connection is looking
>>>>>> at standard-output on the Solr instance.  You will see all the posts
>>>>>> that are made and what the arguments were.  Also, this is the kind of
>>>>>> question you'd get a lot of benefit from posting to the list.  The
>>>>>> end-user documentation I pointed you at before describes some of this
>>>>>> but the Solr connector has grown beyond the doc to some extent at this
>>>>>> point.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>> On Fri, Jan 27, 2012 at 9:51 AM, Silvia, Daniel [USA]
>>>>>> <Si...@bah.com> wrote:
>>>>>>> Hi Karl
>>>>>>>
>>>>>>> Is there a log level other than wire-level debugging for viewing log statements about sending output to a Solr instance from the Jobs List/Creation section? We are having an issue getting content to Solr. Is there a document anywhere that defines the fields in the Jobs section for the Solr Field Mapping tab and the Paths and Metadata tabs?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>> ________________________________________
>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>> Sent: Thursday, January 26, 2012 10:44 AM
>>>>>>> To: Silvia, Daniel [USA]
>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>
>>>>>>> I am afraid I don't know the answer to that.  I'm sure it's infinitely
>>>>>>> configurable but it's not clear what the SharePoint web services need
>>>>>>> to do under the hood, so anything I tell you would be just a guess.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>> On Thu, Jan 26, 2012 at 10:43 AM, Silvia, Daniel [USA]
>>>>>>> <Si...@bah.com> wrote:
>>>>>>>> Hi Karl
>>>>>>>>
>>>>>>>> One more question. Do you know the minimum permissions needed to crawl the SharePoint instance and all sites under the instance? The individual who set my permissions set me up as the "site collection admin" for the topmost site. Is there a specific admin role, other than "Farm Admin", that the user crawling the SharePoint instance should be given?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> ________________________________________
>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>> Sent: Thursday, January 26, 2012 9:53 AM
>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>
>>>>>>>> Good news!  Please keep in touch; we'd like to hear how things work
>>>>>>>> for you (it helps keep the software fresh ;-) ).
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>> On Thu, Jan 26, 2012 at 9:48 AM, Silvia, Daniel [USA]
>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>> Hey Karl
>>>>>>>>>
>>>>>>>>> (1) was the issue. When requesting access to the SharePoint instance I indicated that I needed to be able to crawl SharePoint. I guess the problem was on my end, in not indicating that I also needed privileges to crawl the site.
>>>>>>>>>
>>>>>>>>> Anyway, thank you for your help. When I change the SharePoint version to v 3 I get a message indicating "Connection Working".
>>>>>>>>>
>>>>>>>>> Appreciate the help.
>>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>> ________________________________________
>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>> Sent: Thursday, January 26, 2012 9:19 AM
>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>
>>>>>>>>> The error message "axisFault=Server, detail=Server was unable to
>>>>>>>>> process request --> Requested Registry access is not allowed" is Axis
>>>>>>>>> interpreting an error message from SharePoint.  What it is saying is
>>>>>>>>> that the user you are trying to crawl with is unable to read the
>>>>>>>>> SharePoint machine's registry but needs to.  There are two possible
>>>>>>>>> causes for this:
>>>>>>>>>
>>>>>>>>> (1) The user you gave doesn't have enough permissions to crawl SharePoint
>>>>>>>>> (2) When you installed the SharePoint MCPermissions plugin, you
>>>>>>>>> installed it logged in as a user that did not have enough permissions to do
>>>>>>>>> what it needs to do.
>>>>>>>>>
>>>>>>>>> You can tell the difference between the two by selecting "SharePoint
>>>>>>>>> 2.0" in the sharepoint version pulldown.  If a connection saved in
>>>>>>>>> this way says "Connection working", it means that the MCPermissions
>>>>>>>>> plugin has the permission problem, not your user.
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>> On Thu, Jan 26, 2012 at 9:14 AM, Silvia, Daniel [USA]
>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>> Hi Karl
>>>>>>>>>>
>>>>>>>>>> When I try to use option (1) and don't put anything in the Site field, I get an error message "axisFault=Server, detail=Server was unable to process request --> Requested Registry access is not allowed", and when I put a "/" in the Site field I get a GUI error indicating that the site field can't end with a "/".
>>>>>>>>>>
>>>>>>>>>> Anyway, do you have any ideas? Or maybe the SharePoint instance is not configured properly for us to crawl?
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ________________________________________
>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>> Sent: Thursday, January 26, 2012 8:52 AM
>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>
>>>>>>>>>> SharePoint has two kinds of site:
>>>>>>>>>>
>>>>>>>>>> (1) the root site, which can be reached by the path http://server:port
>>>>>>>>>> (2) a number of sites under the 'virtual path', with URLs of the form:
>>>>>>>>>>
>>>>>>>>>> http://server:port/something/sitename
>>>>>>>>>>
>>>>>>>>>> The "something" is, by default, the string "site", so
>>>>>>>>>> http://server:port/site/xyz might be the URL of one such virtual site.
>>>>>>>>>>
>>>>>>>>>> The form of the "site" field in the SharePoint connection for the
>>>>>>>>>> first is either blank or "/" (can't remember which right now), and the
>>>>>>>>>> form of the "site" field for the second is "/site/xyz".  On no account
>>>>>>>>>> does the connector expect to see default.aspx attached to that path,
>>>>>>>>>> so you should not do this; it cannot work.
>>>>>>>>>>
>>>>>>>>>> FWIW, my recommendation to try setting the connection type to
>>>>>>>>>> "SharePoint 2.0" was to rule out any possible installation issue with
>>>>>>>>>> the ManifoldCF sharepoint plugin.  The connection check for 2.0 does
>>>>>>>>>> not look for it; only the connection check for 3.0 does.
>>>>>>>>>>
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>> On Thu, Jan 26, 2012 at 8:41 AM, Silvia, Daniel [USA]
>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>> Hey Karl
>>>>>>>>>>>
>>>>>>>>>>> I am also getting an "HTTP Error 401.2: Unauthorized: Access is denied due to server configuration" when setting the Site field to /default.aspx. Do most Sharepoint instances have the urls set to something like http://server:port/sites/...... instead of http://server:port/? When I use the "/default.aspx" I see in the log files that ManifoldCF is trying to go to the Lists.asmx service with the url http://server:port/default.aspx/_vti_bin/Lists.asmx, where nothing is found.
>>>>>>>>>>>
>>>>>>>>>>> As you can tell I am not much of a SharePoint user or installer.
>>>>>>>>>>>
>>>>>>>>>>> Also, I don't think the issue is with the connector in ManifoldCF, I am just trying to
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ________________________________________
>>>>>>>>>>> From: Silvia, Daniel [USA]
>>>>>>>>>>> Sent: Thursday, January 26, 2012 7:23 AM
>>>>>>>>>>> To: Karl Wright
>>>>>>>>>>> Subject: RE: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>
>>>>>>>>>>> Hey Karl
>>>>>>>>>>>
>>>>>>>>>>> The issue I am having is that the SharePoint instance URL is something like http://server:port/default.aspx. If I don't put anything in the Site field I get a message indicating "Requested Registry Access is not allowed". I was putting "/default.aspx" as my Site field, which I believe may have been the issue. However, what do you put in your Site field when the site is the topmost site, as in http://server:port/default.aspx?
>>>>>>>>>>>
>>>>>>>>>>> I would love to send you the log messages, but I am working on a network which is not connected to the outside.
>>>>>>>>>>>
>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ________________________________________
>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>> Sent: Wednesday, January 25, 2012 6:12 PM
>>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>
>>>>>>>>>>> Daniel,
>>>>>>>>>>>
>>>>>>>>>>> FWIW, I can help you diagnose the issue, but to do so you really need
>>>>>>>>>>> to give me some concrete data.  I'm happy to grovel over the whole
>>>>>>>>>>> wire log if you feel you can send it to me; something that may not
>>>>>>>>>>> seem important to you will likely stand out strongly to me.  I can,
>>>>>>>>>>> for example, see whether you are getting back HTML because of an
>>>>>>>>>>> authentication error, for instance.  And if you ARE getting back valid
>>>>>>>>>>> SOAP, I would then be sure that something was wrong with the Axis
>>>>>>>>>>> client configuration, and I could pursue that here with the data
>>>>>>>>>>> provided.  The problem with software like SharePoint running on IIS is
>>>>>>>>>>> that it can be configured a nearly infinite number of ways, so
>>>>>>>>>>> diagnosis is more of an art than a science.  I strongly suspect that
>>>>>>>>>>> you're laboring under a pretty straightforward misconception which is
>>>>>>>>>>> likely blocking progress, rather than there being an issue with the
>>>>>>>>>>> SharePoint connector itself.  But I can't tell that without more
>>>>>>>>>>> detailed communication.
>>>>>>>>>>>
>>>>>>>>>>> Also, you mentioned that the Lists.asmx service was right where you
>>>>>>>>>>> expected it to be.  Have you read the SharePoint Connector part of the
>>>>>>>>>>> end-user documentation?  To wit:
>>>>>>>>>>>
>>>>>>>>>>> "Select the server protocol, and enter the server name and port, based
>>>>>>>>>>> on what you recorded from the URL for your SharePoint site. For the
>>>>>>>>>>> "Site path" field, type in the portion of the root site URL that
>>>>>>>>>>> includes everything after the server and port, except for the final
>>>>>>>>>>> "aspx" file. For example, if the SharePoint URL is
>>>>>>>>>>> "http://myserver:81/sites/somewhere/index.asp", the site path would be
>>>>>>>>>>> "/sites/somewhere"."  The Lists.asmx service in this example would be
>>>>>>>>>>> expected to be found at
>>>>>>>>>>> "http://myserver:81/sites/somewhere/_vti_bin/Lists.asmx".  And the URL
>>>>>>>>>>> you would start with would be the URL you see in the browser when you
>>>>>>>>>>> log into the SharePoint web client and go to the site you wish to
>>>>>>>>>>> crawl.  Is this what you are doing?
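
The derivation described above can be sketched as follows. This is only an illustration of the rule in the documentation, not connector code; the function name is hypothetical, and treating any final path segment that contains a "." as the page file to drop is an assumption.

```python
from urllib.parse import urlsplit


def site_path_and_lists_url(browser_url):
    """Derive the connection's site path and the expected Lists.asmx URL."""
    parts = urlsplit(browser_url)
    path = parts.path
    last_segment = path.rsplit("/", 1)[-1]
    # Assumption: a final segment with a dot (default.aspx, index.asp)
    # is the page file, which is not part of the site path.
    if "." in last_segment:
        path = path[: len(path) - len(last_segment)]
    site_path = path.rstrip("/")  # the root site yields an empty site path
    lists_url = f"{parts.scheme}://{parts.netloc}{site_path}/_vti_bin/Lists.asmx"
    return site_path, lists_url


print(site_path_and_lists_url("http://myserver:81/sites/somewhere/index.asp"))
print(site_path_and_lists_url("http://server:8080/default.aspx"))
```

For a root site the sketch returns an empty site path, so the page name never ends up in front of "_vti_bin".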
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks again,
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jan 25, 2012 at 12:33 PM, Karl Wright <da...@gmail.com> wrote:
>>>>>>>>>>>> The code that parses the SOAP response is Apache Axis.  This hasn't
>>>>>>>>>>>> changed in several years.
>>>>>>>>>>>>
>>>>>>>>>>>> Can you answer the following questions:
>>>>>>>>>>>>
>>>>>>>>>>>> (1) When the SharePoint connector makes a request to SharePoint, is
>>>>>>>>>>>> the response HTML, or is it XML?  Does it have an XML header which
>>>>>>>>>>>> describes a Microsoft XML namespace?  It sure sounds like it is
>>>>>>>>>>>> responding with HTML.  The SharePoint connector is expecting to
>>>>>>>>>>>> communicate using SOAP.  Is the response valid SOAP?
>>>>>>>>>>>>
>>>>>>>>>>>> (2) What version of SharePoint are you trying to connect to?  Is it
>>>>>>>>>>>> SharePoint 2007?  SharePoint 2010?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jan 25, 2012 at 12:26 PM, Silvia, Daniel [USA]
>>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have added the specific log4j lines for Http Client wire logging and restarted the ManifoldCF instance. I was also able to see the web service Lists.asmx through IE. When reviewing the log files I was able to see some of the content that resides in the SharePoint instance in the content coming back from the request. However, I am still seeing error messages in the ManifoldCF GUI as well as in the log file indicating "Bad Envelope: HTML", "No service named ListsSoap is available" and "No service named http://schemas.microsoft.com/sharepoint/soap/GetListCollection is available".
>>>>>>>>>>>>>
>>>>>>>>>>>>> Could there be something going on with the way the services are being built on the client side?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Appreciate your help.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>>>> Sent: Tuesday, January 24, 2012 4:52 PM
>>>>>>>>>>>>> To: Silvia, Daniel [USA]; connectors-user@incubator.apache.org
>>>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have not seen this exact problem before.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The "Bad envelope tag: HTML" indicates that the SOAP request the
>>>>>>>>>>>>> SharePoint connector is attempting to perform is, in fact, returning
>>>>>>>>>>>>> an HTML response.  This usually indicates that the server or path
>>>>>>>>>>>>> parameters you've used to set up the connection are not set correctly,
>>>>>>>>>>>>> and SharePoint is not actually being engaged.
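
When reading a wire log, a rough way to tell the two cases apart is to look at the root element of each response body. The sketch below is an illustration only and does not reproduce what Axis does internally; the function name and the sample bodies are made up, and the two namespace strings are the standard SOAP 1.1 and SOAP 1.2 envelope namespaces.

```python
import xml.etree.ElementTree as ET

# Standard envelope namespaces for SOAP 1.1 and SOAP 1.2.
SOAP_ENVELOPE_NAMESPACES = {
    "http://schemas.xmlsoap.org/soap/envelope/",
    "http://www.w3.org/2003/05/soap-envelope",
}


def classify_response(body):
    """Return 'soap', 'html', or 'unknown' for a response body string."""
    text = body.lstrip()
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        # Real-world HTML is often not well-formed XML, so fall back
        # to a crude look at the start of the body.
        return "html" if "<html" in text[:1000].lower() else "unknown"
    namespace, _, local_name = root.tag.rpartition("}")
    if local_name == "Envelope" and namespace.lstrip("{") in SOAP_ENVELOPE_NAMESPACES:
        return "soap"
    if local_name.lower() == "html":
        return "html"
    return "unknown"


soap_body = ('<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
             '<soap:Body/></soap:Envelope>')
error_page = "<html><head><title>401</title></head><body>Unauthorized</body></html>"
print(classify_response(soap_body), classify_response(error_page))
```

An "html" result for a request to Lists.asmx would be consistent with the "Bad envelope tag: HTML" message: a login page, error page, or ordinary site page came back instead of a SOAP envelope.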
>>>>>>>>>>>>>
>>>>>>>>>>>>> But usually when that happens I don't recall a ConfigurationException
>>>>>>>>>>>>> logged, unless it's what Axis does in response to the HTML.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The best thing to do at this point is turn on Http Client wire
>>>>>>>>>>>>> logging, restart ManifoldCF, and view the connection.  The log will
>>>>>>>>>>>>> then contain a record of the exact SOAP requests and the responses,
>>>>>>>>>>>>> and we can see what's wrong.  The technique is described here:
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/CONNECTORS/Debugging+Connections
>>>>>>>>>>>>>
>>>>>>>>>>>>> You can also confirm that the right SharePoint web services are
>>>>>>>>>>>>> functioning on the machine in question by trying to access them
>>>>>>>>>>>>> directly.  For the Lists web service, which is the one it sounds like
>>>>>>>>>>>>> it was complaining about, try using IE (not Firefox etc because you
>>>>>>>>>>>>> want NTLM support) to go to the url where you think the web service
>>>>>>>>>>>>> lives.  This will be http: or https:, plus the server, plus the port,
>>>>>>>>>>>>> plus the path, plus "_vti_bin/Lists.asmx".  You should see an
>>>>>>>>>>>>> unequivocal SharePoint response.  For an example from the Microsoft
>>>>>>>>>>>>> demo service, try http://www.wssdemo.com/_vti_bin/Lists.asmx.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please let me know how it goes, and cc the dev list (as I have) so a
>>>>>>>>>>>>> record of what you're encountering can be made available to others.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jan 24, 2012 at 1:52 PM, Silvia, Daniel [USA]
>>>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have downloaded the newest version of ManifoldCF (v0.4), run the necessary ant scripts to download dependencies, and then built the entire project. I have also had the SharePoint web service MetaCarta.SharePoint.MCPermissionsService.wsp deployed on the SharePoint instance, since we are running version 3 of SharePoint (SharePoint 2007). When I try to create a Repository Connection and select "Save", I get a message on the ManifoldCF front end of "org.xml.sax.SAXException Bad envelope tag: HTML". When I look at the log file I see an error message "org.apache.axis.ConfigurationException: No service named ListsSoap is available".
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can you tell me if you have seen this issue before and what may be causing this issue?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>>>>> Sent: Friday, January 20, 2012 7:31 AM
>>>>>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In order for the SharePoint connector to build, you need to have the
>>>>>>>>>>>>>> wsdls in place in the right area.  We cannot ship those because of
>>>>>>>>>>>>>> potential copyright issues.  The easiest way to obtain the right
>>>>>>>>>>>>>> dependencies is:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ant download-dependencies
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Then, just build normally:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ant build
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This will only work for ManifoldCF-0.4-incubating, or trunk.
>>>>>>>>>>>>>> 0.4-incubating is still in the process of being signed off by the
>>>>>>>>>>>>>> incubator, but you can find the release candidate here:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://people.apache.org/~kwright
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Jan 20, 2012 at 7:02 AM, Silvia, Daniel [USA]
>>>>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I work with Matt Parker and we are in the process of developing a pipeline
>>>>>>>>>>>>>>> that uses ManifoldCF at the beginning. I just subscribed to the
>>>>>>>>>>>>>>> connectors-user-subscribe@incubator.apache.org
>>>>>>>>>>>>>>> group yesterday and submitted an e-mail question to the group. Can you help
>>>>>>>>>>>>>>> us with the below issue?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I downloaded MCF and started playing with the default setup under Jetty and
>>>>>>>>>>>>>>> Derby. It starts up without any issue. I am trying to configure a SharePoint
>>>>>>>>>>>>>>> connector, connecting to SharePoint Service 3. I have been following the
>>>>>>>>>>>>>>> instructions and I am at the point of deploying the custom SharePoint web
>>>>>>>>>>>>>>> service to the SharePoint instance. The instructions indicate that I should
>>>>>>>>>>>>>>> get the web service from dist/sharepoint-integration after building MCF.
>>>>>>>>>>>>>>> However, after looking through the entire directory structure, I am unable
>>>>>>>>>>>>>>> to find the service to deploy.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can someone tell me where to find this service?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Daniel Silvia

RE: ManifoldCF's dist/shapoint-integration dir

Posted by "Silvia, Daniel [USA]" <Si...@bah.com>.
Hi Karl

Quick question regarding SharePoint Wikis and ingesting them into Solr. 

I have been trying to get the Wikis, created in SharePoint, to be ingested into Solr. I am able to see the Wikis in the logging where the SharePoint Connector pulls everything from the site; however, I do not see the Wiki content in the Solr instance. When creating a job to run, do I need to indicate a path similar to "*Wiki*" for the entire site, or do I need to configure the Solr metadata in the job to capture the "WikiField" element in the XML being passed to the Solr connector?

Thanks for your help.

Dan
________________________________________
From: Karl Wright [daddywri@gmail.com]
Sent: Tuesday, January 31, 2012 10:52 AM
To: Silvia, Daniel [USA]
Cc: connectors-user@incubator.apache.org
Subject: Re: ManifoldCF's dist/shapoint-integration dir

It's been a while since I've set up a SharePoint job but I think what
you are missing is a file rule (instead of just a library rule).
Here's what the end-user documentation says on the matter:

"Each rule consists of a path, a rule type, and an action. The actions
are "Include" and "Exclude". The rule type tells the connection what
kind of SharePoint entity it is allowed to exactly match. For example,
a "File" rule will only exactly match SharePoint paths that represent
files - it cannot exactly match sites or libraries. The path itself is
just a sequence of characters, where the "*" character has the special
meaning of being able to match any number of any kind of characters,
and the "?" character matches exactly one character of any kind.

The rule matcher extends strict, exact matching by introducing a
concept of implicit inclusion rules. If your rule action is "Include",
and you specify (say) a "File" rule, the matcher presumes implicit
inclusion rules for the corresponding site and library. So, if you
create an "Include File" rule that matches (for example)
"/MySite/MyLibrary/MyFile", there is an implied "Site Include" rule
for "/MySite", and an implied "Library Include" rule for
"/MySite/MyLibrary". Similarly, if you create a "Library Include"
rule, there is an implied "Site Include" rule that corresponds to it.
Note that these shortcuts only apply to "Include" rules - there are
no corresponding implied "Exclude" rules."

What this means is that you should probably be declaring file rules
with "*" as the file name for each library, rather than a library
rule.  You might want to just try this.  If you still have trouble,
you can try setting the "org.apache.manifoldcf.connectors" property to
"DEBUG" in the properties.xml file and restarting ManifoldCF before
your next crawl.  The manifoldcf.log file will then have output
describing the decisions the SharePoint connector made about each
site, library, file, or folder it encountered.

Thanks,
Karl
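
The wildcard and rule-type behavior described in the documentation excerpt above can be sketched roughly as follows. This is an illustration only, not the connector's actual code: the function names, the tuple layout for rules, the first-match-wins ordering, and the example file path are all assumptions, and the implied "Include" rules are left out for brevity.

```python
import re


def wildcard_match(pattern, path):
    """Return True if path matches pattern exactly, where '*' matches any
    run of characters and '?' matches exactly one character."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, path, flags=re.DOTALL) is not None


def first_action(rules, path, entity_type):
    """Return the action of the first rule whose type and pattern both
    match, or None if no rule applies (ordering is an assumption)."""
    for pattern, rule_type, action in rules:
        if rule_type == entity_type and wildcard_match(pattern, path):
            return action
    return None


# A library rule like the ones listed in this thread, and the suggested
# file rule with "*" as the file name.
library_rules = [("/Shared Documents", "library", "include")]
file_rules = [("/Shared Documents/*", "file", "include")]

doc = "/Shared Documents/report.doc"  # hypothetical file path

print(first_action(library_rules, doc, "file"))  # no file rule applies
print(first_action(file_rules, doc, "file"))     # the file rule applies
```

Under these assumptions, the library-only rules never match a file path, which would be consistent with a crawl that finds libraries but fetches no documents.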

On Tue, Jan 31, 2012 at 10:27 AM, Silvia, Daniel [USA]
<Si...@bah.com> wrote:
> Hi Karl
>
> The Path Rules are :
>
> Path Match: /Shared Documents
> Type: library
> Action: include
>
> Path Match: /IDD/Shared Documents
> Type: library
> Action: include
>
> Path Match: /IDD/Documents
> Type: library
> Action: include
>
> Path Match: /manifoldcf/Shared Documents
> Type: library
> Action: include
>
> I hope this helps.
>
> I really appreciate your help.
>
>
>
> ________________________________________
> From: Karl Wright [daddywri@gmail.com]
> Sent: Tuesday, January 31, 2012 10:01 AM
> To: Silvia, Daniel [USA]
> Cc: connectors-user@incubator.apache.org
> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>
> "When I select only the fetch activity, I don't see anything in the
> events, when I select the Document Ingest activity, I don't see
> anything in the events."
>
> So either you've already run the job and the documents were accessed
> the first time (and won't be accessed again until they change), or the
> problem is likely that your SharePoint Path Rules are not including
> any documents.  It would be very helpful at this point to include a
> screen shot of the job you've created.  Since you are not on the net,
> perhaps you can jot down your SharePoint path rules for me to have a
> look at, as they are displayed when you view the job.
>
> Thanks,
> Karl
>
> On Tue, Jan 31, 2012 at 9:44 AM, Silvia, Daniel [USA]
> <Si...@bah.com> wrote:
>> Hi Karl
>>
>> Ok, I have created a new job and ran the job and went to the Simple History Report.
>>
>> I see the Events. If all the Activities in the Simple History Report (Document Deletion(SolrPipeline), Document Ingest(SolrPipeline), and Fetch) are selected, I see a start job and end job for events. When I get to the Simple History Report I can select the "Connection", but I don't have an option to select the Activities until I run the report first.
>> When I select only the fetch activity, I don't see anything in the events, when I select the Document Ingest activity, I don't see anything in the events.
>>
>> My solr output connection has the following information:
>> Protocol: http
>> Server: "the server name"
>> Port:8080 (we are running solr on Jboss port 8080)
>> Web Application Name: solr
>> Core Name: collection1
>> Update Handler: update/extract
>> Remove Handler: /update
>> Status Handler: /admin/ping
>>
>>
>>
>> ________________________________________
>> From: Karl Wright [daddywri@gmail.com]
>> Sent: Tuesday, January 31, 2012 9:00 AM
>> To: Silvia, Daniel [USA]; connectors-user@incubator.apache.org
>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>
>> Ok, let's do one thing at a time.
>>
>> First:
>>
>> "For the Path tab where there are Path Rules, are these the paths we
>> want ManifoldCF to follow? Each site, and each Library like Documents
>> and Shared Documents. And in the Metadata tab, this is the tab where
>> you indicate for each "Site" and "Library" you want to include
>> specific metadata or include all metadata?"
>>
>> For SharePoint, there are Path Rules and Metadata Rules.  The Path
>> Rules describe what documents you want to include or exclude.  The
>> Metadata Rules describe what metadata you want to include or exclude.
>> For right now I would ignore the Metadata Rules and just make sure you
>> have Path Rules that mean that you have included documents.
>>
>> "As I run the report, I see "Documents", "Active, and "Processed"
>> where the numbers change under the "Active" column as well as the
>> "Document" and "Processed" column (these just get larger, where Active
>> changes). "
>>
>> This "report" we actually call the Job Status screen.  The fact that
>> the numbers get larger and the job doesn't just end indicates that you
>> are successfully crawling your SharePoint, and you have set up the job
>> to include at least some documents.  This is good news.  However, this
>> is NOT the "Simple History" report I was alluding to earlier.  To get
>> to that report, click on the "Simple History" link on the left-hand
>> navigation area.  This report will show the events of your choice
>> (default - ALL recorded events) over a given time window (default: the
>> last hour).  If you've done this right you should at least see a "Job
>> start" event.  The events you are most interested in are the "fetch"
>> (which describes all attempts to fetch documents from SharePoint) and
>> "document ingest", which describe attempts to get documents into Solr.
>>  You can refresh the displayed events by clicking the "Go" button in
>> the middle of the screen whenever you wish.
>>
>> I'd like you to delete your job, create it again, and start it.  Then,
>> while it is running, I'd like you to go to the "Simple History"
>> screen, and select the appropriate connection (your SharePoint
>> repository connection), and click the "Go" button.  So as not to skip
>> anything basic:
>>
>> (1) What event types do you see?
>> (2) Are there "fetch" events?
>> (3) Are there "document ingest" events?
>>
>> If you see no "fetch" events, that implies you have either not
>> specified any documents to include in your job, OR your Solr
>> connection is configured to reject too many document types so they are
>> all getting filtered out.
>>
>> If you see "document ingest" events, but those have errors, it implies
>> that the configuration of your Solr connection is incorrect and does
>> not match the way your Solr is configured.  If you send me a specific
>> error code and/or text I can help you figure out what is happening.
>>
>> If you see "document ingest" events with NO errors, but the Solr
>> instance is not getting documents, you are describing an impossible
>> situation.  While your Solr instance may not be configured to have the
>> Extracting Update Handler active, or it may be at a different URL than
>> what you pointed at, that would definitely yield errors or
>> notifications in the Simple History.
>>
>> Please let me know what you actually see.
>> Karl
>>
>>
>>
>> On Tue, Jan 31, 2012 at 7:53 AM, Silvia, Daniel [USA]
>> <Si...@bah.com> wrote:
>>> Hi Karl
>>>
>>> I am trying to figure out why I can't see anything being indexed into our Solr index. I was looking at another post where you were working with "Martijn" and that individual was not able to see info getting into Solr. In the report that I have set up, I have included all metadata associated to each site, Shared Documents, and Documents. In the Solr Field Mapping, I am associating metadata fields that are indicated in the MetaData tab to fields that exist in our Solr index.
>>>
>>> For the Path tab where there are Path Rules, are these the paths we want ManifoldCF to follow? Each site, and each Library like Documents and Shared Documents. And in the Metadata tab, this is the tab where you indicate for each "Site" and "Library" you want to include specific metadata or include all metadata?
>>>
>>> As I run the report, I see "Documents", "Active", and "Processed", where the numbers change under the "Active" column as well as the "Document" and "Processed" columns (these just get larger, where Active changes). While I was researching why I may not be seeing something over on the Solr side, I saw your communication with another individual indicating that I should see something like literal.xxx=yyy in the Solr log. This is an older post so there may be something else I should see. But the only thing I see when I look at the Solr log is "[ ] webapp=/solr path=/update/extract params={commit=true} status=0 QTime=0".
>>>
>>> Any ideas?
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>>>
>>> ________________________________________
>>> From: Karl Wright [daddywri@gmail.com]
>>> Sent: Monday, January 30, 2012 10:40 AM
>>> To: Silvia, Daniel [USA]
>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>
>>> The default time range for the Simple History is the last hour.  I
>>> suspect you are unaware of that.  If you want a different time range
>>> you will have to modify the start and end time pulldowns accordingly.
>>>
>>> Karl
>>>
>>> On Mon, Jan 30, 2012 at 10:34 AM, Silvia, Daniel [USA]
>>> <Si...@bah.com> wrote:
>>>> Hi Karl
>>>>
>>>> I am looking at the Simple History in the UI and there isn't much to see, unless I am not getting what I am supposed to. I see the "Start Time", "Activity", "Identifier", "Bytes", and "Time" columns, but I don't get anything for Result Code or Result Description. I looked in the documentation and we should be getting something in those fields, I believe.
>>>>
>>>> Anyway, I will look through the mail list to see what I can find.
>>>>
>>>> Thanks for the help.
>>>>
>>>> Dan
>>>>
>>>> ________________________________________
>>>> From: Karl Wright [daddywri@gmail.com]
>>>> Sent: Monday, January 30, 2012 8:24 AM
>>>> To: Silvia, Daniel [USA]
>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>
>>>> So just to be clear, I'm NOT talking about the ManifoldCF logging.
>>>> For the Solr connector you probably won't need to turn that on; it's
>>>> pretty simple and you can look at the Simple History in the UI to see
>>>> what the request and response look like from Solr.  I was talking
>>>> instead about Solr logging - when you run the Solr Webapp, by default
>>>> all requests against the Extracting Update Handler are logged to
>>>> standard error, so you will see them appear in the process window in
>>>> which Solr is running.
>>>>
>>>> My suggestion to you is to first have a look at the Simple History for
>>>> the job you are trying to run.  If you are getting back 500 errors
>>>> from Solr, that means you have not set up Solr properly to work with
>>>> ManifoldCF.  In recent versions of Solr, the example works fine out of
>>>> the box, but when you try to deploy any other way you are often
>>>> missing the jar that contains the extracting update handler, so of
>>>> course nothing works.  Several people on the connectors-user list have
>>>> run into this and if you search the list (go to the ManifoldCF site
>>>> and click through to the mailing list page and there are links at the
>>>> bottom for this purpose) you will find posts that describe exactly
>>>> what is wrong and how to fix it.
>>>>
>>>> Hope this helps.
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Sun, Jan 29, 2012 at 2:30 PM, Silvia, Daniel [USA]
>>>> <Si...@bah.com> wrote:
>>>>> Yeah, but for some reason the logging isn't coming through. The logging is set for info and I will have to change the logging level to DEBUG.
>>>>>
>>>>> Thanks again for your help.
>>>>>
>>>>>
>>>>> ________________________________________
>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>> Sent: Friday, January 27, 2012 5:06 PM
>>>>> To: Silvia, Daniel [USA]
>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>
>>>>> Actually, the best thing for debugging the Solr connection is looking
>>>>> at standard-output on the Solr instance.  You will see all the posts
>>>>> that are made and what the arguments were.  Also, this is the kind of
>>>>> question you'd get a lot of benefit from posting to the list.  The
>>>>> end-user documentation I pointed you at before describes some of this
>>>>> but the Solr connector has grown beyond the doc to some extent at this
>>>>> point.
>>>>>
>>>>> Karl
>>>>>
>>>>> On Fri, Jan 27, 2012 at 9:51 AM, Silvia, Daniel [USA]
>>>>> <Si...@bah.com> wrote:
>>>>>> Hi Karl
>>>>>>
>>>>>> Is there a log level other than wire-level debugging to view log statements for trying to send output to a Solr instance in the Jobs List/Creation section? We are having an issue getting content to Solr. Is there a document anywhere which defines the fields for the Jobs sections for the Solr Field Mapping tab and the Paths and MetaData tabs?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>> ________________________________________
>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>> Sent: Thursday, January 26, 2012 10:44 AM
>>>>>> To: Silvia, Daniel [USA]
>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>
>>>>>> I am afraid I don't know the answer to that.  I'm sure it's infinitely
>>>>>> configurable but it's not clear what the SharePoint web services need
>>>>>> to do under the hood, so anything I tell you would be just a guess.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>> On Thu, Jan 26, 2012 at 10:43 AM, Silvia, Daniel [USA]
>>>>>> <Si...@bah.com> wrote:
>>>>>>> Hi Karl
>>>>>>>
>>>>>>> One more question. Do you know the minimum permissions needed to crawl the SharePoint instance and all sites under the instance? The individual who set my permissions set me up as the "site collection admin" for the topmost site. Is there a specific admin role, other than "Farm Admin", that can be given to the user crawling the SharePoint instance?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> ________________________________________
>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>> Sent: Thursday, January 26, 2012 9:53 AM
>>>>>>> To: Silvia, Daniel [USA]
>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>
>>>>>>> Good news!  Please keep in touch; we'd like to hear how things work
>>>>>>> for you (it helps keep the software fresh ;-) ).
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>> On Thu, Jan 26, 2012 at 9:48 AM, Silvia, Daniel [USA]
>>>>>>> <Si...@bah.com> wrote:
>>>>>>>> Hey Karl
>>>>>>>>
>>>>>>>> (1) was the issue. When requesting access to the SharePoint instance I indicated that I needed to be able to crawl SharePoint, I guess the problem was on my end indicating that I also needed privileges to crawl the site.
>>>>>>>>
>>>>>>>> Anyway, thank you for your help. When I change the SharePoint version to v 3 I get a message indicating "Connection Working".
>>>>>>>>
>>>>>>>> Appreciate the help.
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>> ________________________________________
>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>> Sent: Thursday, January 26, 2012 9:19 AM
>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>
>>>>>>>> The error message "axisFault=Server, detail=Server was unable to
>>>>>>>> process request --> Requested Registry access is not allowed" is Axis
>>>>>>>> interpreting an error message from SharePoint.  What it is saying is
>>>>>>>> that the user you are trying to crawl with is unable to read the
>>>>>>>> SharePoint machine's registry but needs to.  There are two possible
>>>>>>>> causes for this:
>>>>>>>>
>>>>>>>> (1) The user you gave doesn't have enough permissions to crawl SharePoint
>>>>>>>> (2) When you installed the SharePoint MCPermissions plugin, you
>>>>>>>> installed it logged in as a user that did not have enough permissions
>>>>>>>> to do what it needs to do.
>>>>>>>>
>>>>>>>> You can tell the difference between the two by selecting "SharePoint
>>>>>>>> 2.0" in the sharepoint version pulldown.  If a connection saved in
>>>>>>>> this way says "Connection working", it means that the MCPermissions
>>>>>>>> plugin has the permission problem, not your user.
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>> On Thu, Jan 26, 2012 at 9:14 AM, Silvia, Daniel [USA]
>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>> Hi Karl
>>>>>>>>>
>>>>>>>>> When I try to use option (1) and don't put anything in the Site field, I get an error message "axisFault=Server, detail=Server was unable to process request --> Requested Registry access is not allowed", and when I put a "/" in the Site field I get a GUI error indicating that the site field can't end with a "/".
>>>>>>>>>
>>>>>>>>> Anyway, do you have any ideas. Or maybe the Sharepoint instance is not configured properly for us to crawl?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ________________________________________
>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>> Sent: Thursday, January 26, 2012 8:52 AM
>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>
>>>>>>>>> SharePoint has two kinds of site:
>>>>>>>>>
>>>>>>>>> (1) the root site, which can be reached by the path http://server:port
>>>>>>>>> (2) a number of sites under the 'virtual path', with URLs of the form:
>>>>>>>>>
>>>>>>>>> http://server:port/something/sitename
>>>>>>>>>
>>>>>>>>> The "something" is, by default, the string "site", so
>>>>>>>>> http://server:port/site/xyz might be the URL of one such virtual site.
>>>>>>>>>
>>>>>>>>> The form of the "site" field in the SharePoint connection for the
>>>>>>>>> first is either blank or "/" (can't remember which right now), and the
>>>>>>>>> form of the "site" field for the second is "/site/xyz".  On no account
>>>>>>>>> does the connector expect to see default.aspx attached to that path,
>>>>>>>>> so you should not do this; it cannot work.
>>>>>>>>>
>>>>>>>>> FWIW, my recommendation to try setting the connection type to
>>>>>>>>> "SharePoint 2.0" was to rule out any possible installation issue with
>>>>>>>>> the ManifoldCF sharepoint plugin.  The connection check for 2.0 does
>>>>>>>>> not look for it; only the connection check for 3.0 does.
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>> On Thu, Jan 26, 2012 at 8:41 AM, Silvia, Daniel [USA]
>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>> Hey Karl
>>>>>>>>>>
>>>>>>>>>> I am also getting an "HTTP Error 401.2: Unauthorized: Access is denied due to server configuration" when setting the Site field to /default.aspx. Do most Sharepoint instances have the urls set to something like http://server:port/sites/...... instead of http://server:port/? When I use the "/default.aspx" I see in the log files that ManifoldCF is trying to go to the Lists.asmx service with the url http://server:port/default.aspx/_vti_bin/Lists.asmx, where nothing is found.
>>>>>>>>>>
>>>>>>>>>> As you can tell I am not much of a SharePoint user or installer.
>>>>>>>>>>
>>>>>>>>>> Also, I don't think the issue is with the connector in ManifoldCF, I am just trying to
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ________________________________________
>>>>>>>>>> From: Silvia, Daniel [USA]
>>>>>>>>>> Sent: Thursday, January 26, 2012 7:23 AM
>>>>>>>>>> To: Karl Wright
>>>>>>>>>> Subject: RE: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>
>>>>>>>>>> Hey Karl
>>>>>>>>>>
>>>>>>>>>> The issue I am having is that the SharePoint instance URL is something like http://server:port/default.aspx. If I don't put anything in the Site field I get a message indicating "Requested Registry Access is not allowed". I was putting "/default.aspx" as my Site field, which I believe may have been the issue. However, what do you put in your Site field when the site is the topmost site, as in http://server:port/default.aspx?
>>>>>>>>>>
>>>>>>>>>> I would love to send you the log messages, but I am working on a network which is not connected to the outside.
>>>>>>>>>>
>>>>>>>>>> Thanks for your help.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ________________________________________
>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>> Sent: Wednesday, January 25, 2012 6:12 PM
>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>
>>>>>>>>>> Daniel,
>>>>>>>>>>
>>>>>>>>>> FWIW, I can help you diagnose the issue, but to do so you really need
>>>>>>>>>> to give me some concrete data.  I'm happy to grovel over the whole
>>>>>>>>>> wire log if you feel you can send it to me; something that may not
>>>>>>>>>> seem important to you will likely stand out strongly to me.  I can,
>>>>>>>>>> for example, see whether you are getting back HTML because of an
>>>>>>>>>> authentication error, for instance.  And if you ARE getting back valid
>>>>>>>>>> SOAP, I would then be sure that something was wrong with the Axis
>>>>>>>>>> client configuration, and I could pursue that here with the data
>>>>>>>>>> provided.  The problem with software like SharePoint running on IIS is
>>>>>>>>>> that it can be configured a nearly infinite number of ways, so
>>>>>>>>>> diagnosis is more of an art than a science.  I strongly suspect that
>>>>>>>>>> you're laboring under a pretty straightforward misconception which is
>>>>>>>>>> likely blocking progress, rather than there being an issue with the
>>>>>>>>>> SharePoint connector itself.  But I can't tell that without more
>>>>>>>>>> detailed communication.
>>>>>>>>>>
>>>>>>>>>> Also, you mentioned that the Lists.asmx service was right where you
>>>>>>>>>> expected it to be.  Have you read the SharePoint Connector part of the
>>>>>>>>>> end-user documentation?  To wit:
>>>>>>>>>>
>>>>>>>>>> "Select the server protocol, and enter the server name and port, based
>>>>>>>>>> on what you recorded from the URL for your SharePoint site. For the
>>>>>>>>>> "Site path" field, type in the portion of the root site URL that
>>>>>>>>>> includes everything after the server and port, except for the final
>>>>>>>>>> "aspx" file. For example, if the SharePoint URL is
>>>>>>>>>> "http://myserver:81/sites/somewhere/index.asp", the site path would be
>>>>>>>>>> "/sites/somewhere"."  The Lists.asmx service in this example would be
>>>>>>>>>> expected to be found at
>>>>>>>>>> "http://myserver:81/sites/somewhere/_vti_bin/Lists.asmx".  And the URL
>>>>>>>>>> you would start with would be the URL you see in the browser when you
>>>>>>>>>> log into the SharePoint web client and go to the site you wish to
>>>>>>>>>> crawl.  Is this what you are doing?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks again,
>>>>>>>>>> Karl
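
A minimal sketch of the site-path rule quoted above, for readers who want to check their own values. The helper names are hypothetical, and treating the last path segment as a page file whenever it contains a "." is an assumption made for this illustration, not something the connector is known to do.

```python
from urllib.parse import urlsplit


def site_path(sharepoint_url):
    """Derive the "Site path" value: everything after server and port,
    minus a trailing page file such as default.aspx (assumed heuristic)."""
    path = urlsplit(sharepoint_url).path
    head, _, last = path.rpartition("/")
    if "." in last:  # assume a final segment with a dot is a page file
        path = head
    return path.rstrip("/")


def lists_service_url(sharepoint_url):
    """Build the URL where the Lists web service would be expected."""
    parts = urlsplit(sharepoint_url)
    return (
        f"{parts.scheme}://{parts.netloc}"
        f"{site_path(sharepoint_url)}/_vti_bin/Lists.asmx"
    )


# The example from the end-user documentation quoted above.
url = "http://myserver:81/sites/somewhere/index.asp"
print(site_path(url))
print(lists_service_url(url))
```

For a root site such as http://server:port/default.aspx, this sketch yields an empty site path, which corresponds to leaving the field blank.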
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Jan 25, 2012 at 12:33 PM, Karl Wright <da...@gmail.com> wrote:
>>>>>>>>>>> The code that parses the SOAP response is Apache Axis.  This hasn't
>>>>>>>>>>> changed in several years.
>>>>>>>>>>>
>>>>>>>>>>> Can you answer the following questions:
>>>>>>>>>>>
>>>>>>>>>>> (1) When the SharePoint connector makes a request to SharePoint, is
>>>>>>>>>>> the response HTML, or is it XML?  Does it have an XML header which
>>>>>>>>>>> describes a Microsoft XML namespace?  It sure sounds like it is
>>>>>>>>>>> responding with HTML.  The SharePoint connector is expecting to
>>>>>>>>>>> communicate using SOAP.  Is the response valid SOAP?
>>>>>>>>>>>
>>>>>>>>>>> (2) What version of SharePoint are you trying to connect to?  Is it
>>>>>>>>>>> SharePoint 2007?  SharePoint 2010?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jan 25, 2012 at 12:26 PM, Silvia, Daniel [USA]
>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>
>>>>>>>>>>>> I have added the specific log4j lines for Http Client wire logging and restarted the ManifoldCF instance. I was also able to see the web service Lists.asmx through IE. When reviewing the log files I was able to see some of the content that resides in the SharePoint instance in the content coming back from the request. However, I am still seeing the error messages in the ManifoldCF GUI as well as in the log file indicating "Bad Envelope: HTML", "No service named ListsSoap is available" and "No service named http://schemas.microsoft.com/sharepoint/soap/GetListCollection is available".
>>>>>>>>>>>>
>>>>>>>>>>>> Could there be something going on with the way the services are being built on the client side?
>>>>>>>>>>>>
>>>>>>>>>>>> Appreciate your help.
>>>>>>>>>>>>
>>>>>>>>>>>> Dan
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>>> Sent: Tuesday, January 24, 2012 4:52 PM
>>>>>>>>>>>> To: Silvia, Daniel [USA]; connectors-user@incubator.apache.org
>>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>
>>>>>>>>>>>> I have not seen this exact problem before.
>>>>>>>>>>>>
>>>>>>>>>>>> The "Bad envelope tag: HTML" indicates that the SOAP request the
>>>>>>>>>>>> SharePoint connector is attempting to perform is, in fact, returning
>>>>>>>>>>>> an HTML response.  This usually indicates that the server or path
>>>>>>>>>>>> parameters you've used to set up the connection are not set correctly,
>>>>>>>>>>>> and SharePoint is not actually being engaged.
>>>>>>>>>>>>
>>>>>>>>>>>> But usually when that happens I don't recall a ConfigurationException
>>>>>>>>>>>> logged, unless it's what Axis does in response to the HTML.
>>>>>>>>>>>>
>>>>>>>>>>>> The best thing to do at this point is turn on Http Client wire
>>>>>>>>>>>> logging, restart ManifoldCF, and view the connection.  The log will
>>>>>>>>>>>> then contain a record of the exact SOAP requests and the responses,
>>>>>>>>>>>> and we can see what's wrong.  The technique is described here:
>>>>>>>>>>>>
>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/CONNECTORS/Debugging+Connections
>>>>>>>>>>>>
>>>>>>>>>>>> You can also confirm that the right SharePoint web services are
>>>>>>>>>>>> functioning on the machine in question by trying to access them
>>>>>>>>>>>> directly.  For the Lists web service, which is the one it sounds like
>>>>>>>>>>>> it was complaining about, try using IE (not Firefox etc because you
>>>>>>>>>>>> want NTLM support) to go to the url where you think the web service
>>>>>>>>>>>> lives.  This will be http: or https:, plus the server, plus the port,
>>>>>>>>>>>> plus the path, plus "_vti_bin/Lists.asmx".  You should see an
>>>>>>>>>>>> unequivocal SharePoint response.  For an example from the Microsoft
>>>>>>>>>>>> demo service, try http://www.wssdemo.com/_vti_bin/Lists.asmx.
>>>>>>>>>>>>
>>>>>>>>>>>> Please let me know how it goes, and cc the dev list (as I have) so a
>>>>>>>>>>>> record of what you're encountering can be made available to others.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jan 24, 2012 at 1:52 PM, Silvia, Daniel [USA]
>>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have downloaded the newest version of ManifoldCF, v0.4, and have run the necessary ant scripts to download dependencies and then built the entire project. I have also had the SharePoint web service MetaCarta.SharePoint.MCPermissionsService.wsp deployed on the SharePoint instance, because we are running version 3 of SharePoint (SharePoint 2007). When I try to create a Repository Connection and select "Save", I get a message on the ManifoldCF front end of "org.xml.sax.SAXException Bad envelope tag: HTML". When I look at the log file I see an error message "org.apache.axis.ConfigurationException: No service named ListsSoap is available".
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you tell me if you have seen this issue before and what may be causing this issue?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dan
>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>>>> Sent: Friday, January 20, 2012 7:31 AM
>>>>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>>
>>>>>>>>>>>>> In order for the SharePoint connector to build, you need to have the
>>>>>>>>>>>>> wsdls in place in the right area.  We cannot ship those because of
>>>>>>>>>>>>> potential copyright issues.  The easiest way to obtain the right
>>>>>>>>>>>>> dependencies is:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ant download-dependencies
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then, just build normally:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ant build
>>>>>>>>>>>>>
>>>>>>>>>>>>> This will only work for ManifoldCF-0.4-incubating, or trunk.
>>>>>>>>>>>>> 0.4-incubating is still in the process of being signed off by the
>>>>>>>>>>>>> incubator, but you can find the release candidate here:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://people.apache.org/~kwright
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jan 20, 2012 at 7:02 AM, Silvia, Daniel [USA]
>>>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I work with Matt Parker and we are in the process of developing a pipeline
>>>>>>>>>>>>>> that uses ManifoldCF at the beginning. I just subscribed to the
>>>>>>>>>>>>>> connectors-user-subscribe@incubator.apache.org
>>>>>>>>>>>>>> group yesterday and submitted an e-mail question to the group. Can you help
>>>>>>>>>>>>>> us with the below issue?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I downloaded MCF and started playing with the default setup under Jetty and
>>>>>>>>>>>>>> Derby. It starts up without any issue. I am trying to configure a SharePoint
>>>>>>>>>>>>>> connector, connecting to SharePoint Service 3. I have been following the
>>>>>>>>>>>>>> instructions and I am at the point of deploying the custom SharePoint web
>>>>>>>>>>>>>> service to the SharePoint instance. The instructions indicate that I should
>>>>>>>>>>>>>> get the web service from dist/sharepoint-integration after building MCF.
>>>>>>>>>>>>>> However, after looking through the entire directory structure, I am unable
>>>>>>>>>>>>>> to find the service to deploy.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can someone tell me where to find this service?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Daniel Silvia

Re: ManifoldCF's dist/shapoint-integration dir

Posted by Karl Wright <da...@gmail.com>.
It's been a while since I've set up a SharePoint job but I think what
you are missing is a file rule (instead of just a library rule).
Here's what the end-user documentation says on the matter:

"Each rule consists of a path, a rule type, and an action. The actions
are "Include" and "Exclude". The rule type tells the connection what
kind of SharePoint entity it is allowed to exactly match. For example,
a "File" rule will only exactly match SharePoint paths that represent
files - it cannot exactly match sites or libraries. The path itself is
just a sequence of characters, where the "*" character has the special
meaning of being able to match any number of any kind of characters,
and the "?" character matches exactly one character of any kind.

The rule matcher extends strict, exact matching by introducing a
concept of implicit inclusion rules. If your rule action is "Include",
and you specify (say) a "File" rule, the matcher presumes implicit
inclusion rules for the corresponding site and library. So, if you
create an "Include File" rule that matches (for example)
"/MySite/MyLibrary/MyFile", there is an implied "Site Include" rule
for "/MySite", and an implied "Library Include" rule for
"/MySite/MyLibrary". Similarly, if you create a "Library Include"
rule, there is an implied "Site Include" rule that corresponds to it.
Note that these shortcuts only apply to "Include" rules - there are
no corresponding implied "Exclude" rules."
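
As a rough illustration of the semantics quoted above, here is a minimal
Python sketch of the wildcard matching and the implied "Include" rules.
It is not the connector's actual code. The function names are
hypothetical, and it assumes a path has the simple shape
/site/library/file, with the last segment of a "File" rule being the file
and the last segment of a "Library" rule being the library.

```python
import re

def wildcard_to_regex(pattern):
    # "*" matches any run of characters, "?" matches exactly one character;
    # every other character is matched literally.
    pieces = []
    for ch in pattern:
        if ch == "*":
            pieces.append(".*")
        elif ch == "?":
            pieces.append(".")
        else:
            pieces.append(re.escape(ch))
    return re.compile("".join(pieces), re.DOTALL)

def path_matches(pattern, path):
    # A rule matches only if the whole path matches the pattern.
    return wildcard_to_regex(pattern).fullmatch(path) is not None

def implied_include_rules(path, rule_type):
    # Assumption: the site is everything before the library segment.
    segments = [s for s in path.split("/") if s]
    implied = []
    if rule_type == "file":
        implied.append(("site", "/" + "/".join(segments[:-2])))
        implied.append(("library", "/" + "/".join(segments[:-1])))
    elif rule_type == "library":
        implied.append(("site", "/" + "/".join(segments[:-1])))
    return implied

# A library path alone does not match a file inside the library ...
print(path_matches("/Shared Documents", "/Shared Documents/report.doc"))
# ... but a pattern ending in "/*" does.
print(path_matches("/Shared Documents/*", "/Shared Documents/report.doc"))
print(implied_include_rules("/MySite/MyLibrary/MyFile", "file"))
```

The sketch only models exact matching plus the implied parent rules; how
the real connector splits a path into site, library, and file is decided
by SharePoint itself, not by counting segments.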

What this means is that you should probably be declaring file rules
with "*" as the file name for each library, rather than a library
rule.  You might want to just try this.  If you still have trouble,
you can try setting the "org.apache.manifoldcf.connectors" property to
"DEBUG" in the properties.xml file and restarting ManifoldCF before
your next crawl.  The manifoldcf.log file will then have output
describing the decisions the SharePoint connector made about each
site, library, file, or folder it encountered.
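
To make the suggested change concrete, the following small Python sketch
rewrites the four library rules from the message quoted below into file
rules that use "*" as the file name. The tuple layout and the function
name are assumptions made for illustration only; in practice the rules
are entered by hand on the job's Paths tab.

```python
def to_file_rule(rule):
    # rule is a (path, type, action) tuple; only library rules are rewritten.
    path, rule_type, action = rule
    if rule_type != "library":
        return rule
    return (path.rstrip("/") + "/*", "file", action)

# The library rules as listed in the quoted message.
library_rules = [
    ("/Shared Documents", "library", "include"),
    ("/IDD/Shared Documents", "library", "include"),
    ("/IDD/Documents", "library", "include"),
    ("/manifoldcf/Shared Documents", "library", "include"),
]

file_rules = [to_file_rule(r) for r in library_rules]
for path, rule_type, action in file_rules:
    print("Path Match: %s  Type: %s  Action: %s" % (path, rule_type, action))
```

Because an "Include File" rule implies the matching site and library
rules, the separate library rules should no longer be needed once the
file rules are in place.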

Thanks,
Karl

On Tue, Jan 31, 2012 at 10:27 AM, Silvia, Daniel [USA]
<Si...@bah.com> wrote:
> Hi Karl
>
> The Path Rules are :
>
> Path Match: /Shared Documents
> Type: library
> Action: include
>
> Path Match: /IDD/Shared Documents
> Type: library
> Action: include
>
> Path Match: /IDD/Documents
> Type: library
> Action: include
>
> Path Match: /manifoldcf/Shared Documents
> Type: library
> Action: include
>
> I hope this helps.
>
> I really appreciate your help.
>
>
>
> ________________________________________
> From: Karl Wright [daddywri@gmail.com]
> Sent: Tuesday, January 31, 2012 10:01 AM
> To: Silvia, Daniel [USA]
> Cc: connectors-user@incubator.apache.org
> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>
> "When I select only the fetch activity, I don't see anything in the
> events, when I select the Document Ingest activity, I don't see
> anything in the events."
>
> So either you've already run the job and the documents were accessed
> the first time (and won't be accessed again until they change), or the
> problem is likely that your SharePoint Path Rules are not including
> any documents.  It would be very helpful at this point to include a
> screen shot of the job you've created.  Since you are not on the net,
> perhaps you can jot down your SharePoint path rules for me to have a
> look at, as they are displayed when you view the job.
>
> Thanks,
> Karl
>
> On Tue, Jan 31, 2012 at 9:44 AM, Silvia, Daniel [USA]
> <Si...@bah.com> wrote:
>> Hi Karl
>>
>> Ok, I have created a new job and ran the job and went to the Simple History Report.
>>
>> I see the Events. If all the Activities in the Simple History Report (Document Deletion(SolrPipeline), Document Ingest(SolrPipeline), and Fetch) are selected, I see a start job and an end job event. When I get to the Simple History Report I can select the "Connection", but I don't have an option to select the Activities until I run the report first.
>> When I select only the fetch activity, I don't see anything in the events, when I select the Document Ingest activity, I don't see anything in the events.
>>
>> My solr output connection has the following information:
>> Protocol: http
>> Server: "the server name"
>> Port:8080 (we are running solr on Jboss port 8080)
>> Web Application Name: solr
>> Core Name: collection1
>> Update Handler: update/extract
>> Remove Handler: /update
>> Status Handler: /admin/ping
>>
>>
>>
>> ________________________________________
>> From: Karl Wright [daddywri@gmail.com]
>> Sent: Tuesday, January 31, 2012 9:00 AM
>> To: Silvia, Daniel [USA]; connectors-user@incubator.apache.org
>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>
>> Ok, let's do one thing at a time.
>>
>> First:
>>
>> "For the Path tab where there are Path Rules, are these the paths we
>> want ManifoldCF to follow? Each site, and each Library like Documents
>> and Shared Documents. And in the Metadata tab, this is the tab where
>> you indicate for each "Site" and "Library" you want to include
>> specific metadata or include all metadata?"
>>
>> For SharePoint, there are Path Rules and Metadata Rules.  The Path
>> Rules describe what documents you want to include or exclude.  The
>> Metadata Rules describe what metadata you want to include or exclude.
>> For right now I would ignore the Metadata Rules and just make sure you
>> have Path Rules that mean that you have included documents.
>>
>> "As I run the report, I see "Documents", "Active, and "Processed"
>> where the numbers change under the "Active" column as well as the
>> "Document" and "Processed" column (these just get larger, where Active
>> changes). "
>>
>> This "report" we actually call the Job Status screen.  The fact that
>> the numbers get larger and the job doesn't just end indicates that you
>> are successfully crawling your SharePoint, and you have set up the job
>> to include at least some documents.  This is good news.  However, this
>> is NOT the "Simple History" report I was alluding to earlier.  To get
>> to that report, click on the "Simple History" link on the left-hand
>> navigation area.  This report will show the events of your choice
>> (default - ALL recorded events) over a given time window (default: the
>> last hour).  If you've done this right you should at least see a "Job
>> start" event.  The events you are most interested in are the "fetch"
>> (which describes all attempts to fetch documents from SharePoint) and
>> "document ingest", which describe attempts to get documents into Solr.
>>  You can refresh the displayed events by clicking the "Go" button in
>> the middle of the screen whenever you wish.
>>
>> I'd like you to delete your job, create it again, and start it.  Then,
>> while it is running, I'd like you to go to the "Simple History"
>> screen, and select the appropriate connection (your SharePoint
>> repository connection), and click the "Go" button.  So as not to skip
>> anything basic:
>>
>> (1) What event types do you see?
>> (2) Are there "fetch" events?
>> (3) Are there "document ingest" events?
>>
>> If you see no "fetch" events, that implies you have either not
>> specified any documents to include in your job, OR your Solr
>> connection is configured to reject too many document types so they are
>> all getting filtered out.
>>
>> If you see "document ingest" events, but those have errors, it implies
>> that the configuration of your Solr connection is incorrect and does
>> not match the way your Solr is configured.  If you send me a specific
>> error code and/or text I can help you figure out what is happening.
>>
>> If you see "document ingest" events with NO errors, but the Solr
>> instance is not getting documents, you are describing an impossible
>> situation.  While your Solr instance may not be configured to have the
>> Extracting Update Handler active, or it may be at a different URL than
>> what you pointed at, that would definitely yield errors or
>> notifications in the Simple History.
>>
>> Please let me know what you actually see.
>> Karl
>>
>>
>>
>> On Tue, Jan 31, 2012 at 7:53 AM, Silvia, Daniel [USA]
>> <Si...@bah.com> wrote:
>>> Hi Karl
>>>
>>> I am trying to figure out why I can't see anything being indexed into our Solr index. I was looking at another post where you were working with "Martijn" and that individual was not able to see info getting into Solr. In the report that I have set up, I have included all metadata associated to each site, Shared Documents, and Documents. In the Solr Field Mapping, I am associating metadata fields that are indicated in the MetaData tab to fields that exist in our Solr index.
>>>
>>> For the Path tab where there are Path Rules, are these the paths we want ManifoldCF to follow? Each site, and each Library like Documents and Shared Documents. And in the Metadata tab, this is the tab where you indicate for each "Site" and "Library" you want to include specific metadata or include all metadata?
>>>
>>> As I run the report, I see "Documents", "Active", and "Processed" where the numbers change under the "Active" column as well as the "Document" and "Processed" column (these just get larger, where Active changes). While I was researching why I may not be seeing something over on the Solr side, I saw your communication with another individual indicating that I should see something like literal.xxx=yyy in the Solr log. This is an older post so there may be something else I should see. But the only thing I see when I look at the Solr log is "[ ] webapp=/solr path=/update/extract params={commit=true} status=0 QTime=0".
>>>
>>> Any ideas?
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>>>
>>> ________________________________________
>>> From: Karl Wright [daddywri@gmail.com]
>>> Sent: Monday, January 30, 2012 10:40 AM
>>> To: Silvia, Daniel [USA]
>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>
>>> The default time range for the Simple History is the last hour.  I
>>> suspect you are unaware of that.  If you want a different time range
>>> you will have to modify the start and end time pulldowns accordingly.
>>>
>>> Karl
>>>
>>> On Mon, Jan 30, 2012 at 10:34 AM, Silvia, Daniel [USA]
>>> <Si...@bah.com> wrote:
>>>> Hi Karl
>>>>
>>>> I am looking at the Simple History in the UI and there isn't much to see, unless I am not getting what I am supposed to. I see the "Start Time", "Activity", "Identifier", "Bytes", and "Time" columns, but I don't get anything for Result Code or Result Description. I looked in the documentation and we should be getting something in those fields, I believe.
>>>>
>>>> Anyway, I will look through the mail list to see what I can find.
>>>>
>>>> Thanks for the help.
>>>>
>>>> Dan
>>>>
>>>> ________________________________________
>>>> From: Karl Wright [daddywri@gmail.com]
>>>> Sent: Monday, January 30, 2012 8:24 AM
>>>> To: Silvia, Daniel [USA]
>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>
>>>> So just to be clear, I'm NOT talking about the ManifoldCF logging.
>>>> For the Solr connector you probably won't need to turn that on; it's
>>>> pretty simple and you can look at the Simple History in the UI to see
>>>> what the request and response look like from Solr.  I was talking
>>>> instead about Solr logging - when you run the Solr Webapp, by default
>>>> all requests against the Extracting Update Handler are logged to
>>>> standard error, so you will see them appear in the process window in
>>>> which Solr is running.
>>>>
>>>> My suggestion to you is to first have a look at the Simple History for
>>>> the job you are trying to run.  If you are getting back 500 errors
>>>> from Solr, that means you have not set up Solr properly to work with
>>>> ManifoldCF.  In recent versions of Solr, the example works fine out of
>>>> the box, but when you try to deploy any other way you are often
>>>> missing the jar that contains the extracting update handler, so of
>>>> course nothing works.  Several people on the connectors-user list have
>>>> run into this and if you search the list (go to the ManifoldCF site
>>>> and click through to the mailing list page and there are links at the
>>>> bottom for this purpose) you will find posts that describe exactly
>>>> what is wrong and how to fix it.
>>>>
>>>> Hope this helps.
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Sun, Jan 29, 2012 at 2:30 PM, Silvia, Daniel [USA]
>>>> <Si...@bah.com> wrote:
>>>>> Yeah, but for some reason the logging isn't coming through. The logging is set to INFO, so I will have to change the logging level to DEBUG.
>>>>>
>>>>> Thanks again for your help.
>>>>>
>>>>>
>>>>> ________________________________________
>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>> Sent: Friday, January 27, 2012 5:06 PM
>>>>> To: Silvia, Daniel [USA]
>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>
>>>>> Actually, the best thing for debugging the Solr connection is looking
>>>>> at standard-output on the Solr instance.  You will see all the posts
>>>>> that are made and what the arguments were.  Also, this is the kind of
>>>>> question you'd get a lot of benefit from posting to the list.  The
>>>>> end-user documentation I pointed you at before describes some of this
>>>>> but the Solr connector has grown beyond the doc to some extent at this
>>>>> point.
>>>>>
>>>>> Karl
>>>>>
>>>>> On Fri, Jan 27, 2012 at 9:51 AM, Silvia, Daniel [USA]
>>>>> <Si...@bah.com> wrote:
>>>>>> Hi Karl
>>>>>>
>>>>>> Is there a log level other than wire-level debugging to view log statements for trying to send output to a Solr instance in the Jobs List/Creation section? We are having an issue getting content to Solr. Is there a document anywhere which defines the fields for the Jobs sections for the Solr Field Mapping tab and the Paths and MetaData tabs?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>> ________________________________________
>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>> Sent: Thursday, January 26, 2012 10:44 AM
>>>>>> To: Silvia, Daniel [USA]
>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>
>>>>>> I am afraid I don't know the answer to that.  I'm sure it's infinitely
>>>>>> configurable but it's not clear what the SharePoint web services need
>>>>>> to do under the hood, so anything I tell you would be just a guess.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>> On Thu, Jan 26, 2012 at 10:43 AM, Silvia, Daniel [USA]
>>>>>> <Si...@bah.com> wrote:
>>>>>>> Hi Karl
>>>>>>>
>>>>>>> One more question. Do you know the minimum permissions needed to crawl the SharePoint instance and all sites under the instance? The individual who set my permissions set me up as the "site collection admin" for the topmost site. Is there a specific admin role we can assign to the user crawling the SharePoint instance, other than "Farm Admin"?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> ________________________________________
>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>> Sent: Thursday, January 26, 2012 9:53 AM
>>>>>>> To: Silvia, Daniel [USA]
>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>
>>>>>>> Good news!  Please keep in touch; we'd like to hear how things work
>>>>>>> for you (it helps keep the software fresh ;-) ).
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>> On Thu, Jan 26, 2012 at 9:48 AM, Silvia, Daniel [USA]
>>>>>>> <Si...@bah.com> wrote:
>>>>>>>> Hey Karl
>>>>>>>>
>>>>>>>> (1) was the issue. When requesting access to the SharePoint instance I indicated that I needed to be able to crawl SharePoint, I guess the problem was on my end indicating that I also needed privileges to crawl the site.
>>>>>>>>
>>>>>>>> Anyway, thank you for your help. When I change the SharePoint version to v 3 I get a message indicating "Connection Working".
>>>>>>>>
>>>>>>>> Appreciate the help.
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>> ________________________________________
>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>> Sent: Thursday, January 26, 2012 9:19 AM
>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>
>>>>>>>> The error message "axisFault=Server, detail=Server was unable to
>>>>>>>> process request --> Requested Registry access is not allowed" is Axis
>>>>>>>> interpreting an error message from SharePoint.  What it is saying is
>>>>>>>> that the user you are trying to crawl with is unable to read the
>>>>>>>> SharePoint machine's registry but needs to.  There are two possible
>>>>>>>> causes for this:
>>>>>>>>
>>>>>>>> (1) The user you gave doesn't have enough permissions to crawl SharePoint
>>>>>>>> (2) When you installed the SharePoint MCPermissions plugin, you
>>>>>>>> installed it logged in as a user that did not have enough permissions
>>>>>>>> to do what it needs to do.
>>>>>>>>
>>>>>>>> You can tell the difference between the two by selecting "SharePoint
>>>>>>>> 2.0" in the sharepoint version pulldown.  If a connection saved in
>>>>>>>> this way says "Connection working", it means that the MCPermissions
>>>>>>>> plugin has the permission problem, not your user.
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>> On Thu, Jan 26, 2012 at 9:14 AM, Silvia, Daniel [USA]
>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>> Hi Karl
>>>>>>>>>
>>>>>>>>> When I try to use option (1) and don't put anything in the Site field, I get an error message "axisFault=Server, detail=Server was unable to process request --> Requested Registry access is not allowed", and when I put a "/" in the Site field I get a GUI error indicating that the Site field can't end with a "/".
>>>>>>>>>
>>>>>>>>> Anyway, do you have any ideas? Or maybe the SharePoint instance is not configured properly for us to crawl?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ________________________________________
>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>> Sent: Thursday, January 26, 2012 8:52 AM
>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>
>>>>>>>>> SharePoint has two kinds of site:
>>>>>>>>>
>>>>>>>>> (1) the root site, which can be reached by the path http://server:port
>>>>>>>>> (2) a number of sites under the 'virtual path', with URLs of the form:
>>>>>>>>>
>>>>>>>>> http://server:port/something/sitename
>>>>>>>>>
>>>>>>>>> The "something" is, by default, the string "site", so
>>>>>>>>> http://server:port/site/xyz might be the URL of one such virtual site.
>>>>>>>>>
>>>>>>>>> The form of the "site" field in the SharePoint connection for the
>>>>>>>>> first is either blank or "/" (can't remember which right now), and the
>>>>>>>>> form of the "site" field for the second is "/site/xyz".  On no account
>>>>>>>>> does the connector expect to see default.aspx attached to that path,
>>>>>>>>> so you should not do this; it cannot work.
>>>>>>>>>
>>>>>>>>> FWIW, my recommendation to try setting the connection type to
>>>>>>>>> "SharePoint 2.0" was to rule out any possible installation issue with
>>>>>>>>> the ManifoldCF sharepoint plugin.  The connection check for 2.0 does
>>>>>>>>> not look for it; only the connection check for 3.0 does.
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>> On Thu, Jan 26, 2012 at 8:41 AM, Silvia, Daniel [USA]
>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>> Hey Karl
>>>>>>>>>>
>>>>>>>>>> I am also getting an "HTTP Error 401.2: Unauthorized: Access is denied due to server configuration" when setting the Site field to /default.aspx. Do most SharePoint instances have the URLs set to something like http://server:port/sites/...... instead of http://server:port/? When I use "/default.aspx" I see in the log files that ManifoldCF is trying to go to the Lists.asmx service with the URL http://server:port/default.aspx/_vti_bin/Lists.asmx, where nothing is found.
>>>>>>>>>>
>>>>>>>>>> As you can tell I am not much of a SharePoint user or installer.
>>>>>>>>>>
>>>>>>>>>> Also, I don't think the issue is with the connector in ManifoldCF, I am just trying to
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ________________________________________
>>>>>>>>>> From: Silvia, Daniel [USA]
>>>>>>>>>> Sent: Thursday, January 26, 2012 7:23 AM
>>>>>>>>>> To: Karl Wright
>>>>>>>>>> Subject: RE: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>
>>>>>>>>>> Hey Karl
>>>>>>>>>>
>>>>>>>>>> The issue I am having is that the SharePoint instance URL is something like http://server:port/default.aspx. If I don't put anything in the Site field I get a message indicating "Requested Registry Access is not allowed". I was putting "/default.aspx" as my Site field, which I believe may have been the issue. However, what do you put in your Site field when the site is the topmost site, as in http://server:port/default.aspx?
>>>>>>>>>>
>>>>>>>>>> I would love to send you the log messages, but I am working on a network which is not connected to the outside.
>>>>>>>>>>
>>>>>>>>>> Thanks for your help.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ________________________________________
>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>> Sent: Wednesday, January 25, 2012 6:12 PM
>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>
>>>>>>>>>> Daniel,
>>>>>>>>>>
>>>>>>>>>> FWIW, I can help you diagnose the issue, but to do so you really need
>>>>>>>>>> to give me some concrete data.  I'm happy to grovel over the whole
>>>>>>>>>> wire log if you feel you can send it to me; something that may not
>>>>>>>>>> seem important to you will likely stand out strongly to me.  I can,
>>>>>>>>>> for example, see whether you are getting back HTML because of an
>>>>>>>>>> authentication error, for instance.  And if you ARE getting back valid
>>>>>>>>>> SOAP, I would then be sure that something was wrong with the Axis
>>>>>>>>>> client configuration, and I could pursue that here with the data
>>>>>>>>>> provided.  The problem with software like SharePoint running on IIS is
>>>>>>>>>> that it can be configured a nearly infinite number of ways, so
>>>>>>>>>> diagnosis is more of an art than a science.  I strongly suspect that
>>>>>>>>>> you're laboring under a pretty straightforward misconception which is
>>>>>>>>>> likely blocking progress, rather than there being an issue with the
>>>>>>>>>> SharePoint connector itself.  But I can't tell that without more
>>>>>>>>>> detailed communication.
>>>>>>>>>>
>>>>>>>>>> Also, you mentioned that the Lists.asmx service was right where you
>>>>>>>>>> expected it to be.  Have you read the SharePoint Connector part of the
>>>>>>>>>> end-user documentation?  To wit:
>>>>>>>>>>
>>>>>>>>>> "Select the server protocol, and enter the server name and port, based
>>>>>>>>>> on what you recorded from the URL for your SharePoint site. For the
>>>>>>>>>> "Site path" field, type in the portion of the root site URL that
>>>>>>>>>> includes everything after the server and port, except for the final
>>>>>>>>>> "aspx" file. For example, if the SharePoint URL is
>>>>>>>>>> "http://myserver:81/sites/somewhere/index.asp", the site path would be
>>>>>>>>>> "/sites/somewhere"."  The Lists.asmx service in this example would be
>>>>>>>>>> expected to be found at
>>>>>>>>>> "http://myserver:81/sites/somewhere/_vti_bin/Lists.asmx".  And the URL
>>>>>>>>>> you would start with would be the URL you see in the browser when you
>>>>>>>>>> log into the SharePoint web client and go to the site you wish to
>>>>>>>>>> crawl.  Is this what you are doing?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks again,
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Jan 25, 2012 at 12:33 PM, Karl Wright <da...@gmail.com> wrote:
>>>>>>>>>>> The code that parses the SOAP response is Apache Axis.  This hasn't
>>>>>>>>>>> changed in several years.
>>>>>>>>>>>
>>>>>>>>>>> Can you answer the following questions:
>>>>>>>>>>>
>>>>>>>>>>> (1) When the SharePoint connector makes a request to SharePoint, is
>>>>>>>>>>> the response HTML, or is it XML?  Does it have an XML header which
>>>>>>>>>>> describes a Microsoft XML namespace?  It sure sounds like it is
>>>>>>>>>>> responding with HTML.  The SharePoint connector is expecting to
>>>>>>>>>>> communicate using SOAP.  Is the response valid SOAP?
>>>>>>>>>>>
>>>>>>>>>>> (2) What version of SharePoint are you trying to connect to?  Is it
>>>>>>>>>>> SharePoint 2007?  SharePoint 2010?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jan 25, 2012 at 12:26 PM, Silvia, Daniel [USA]
>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>
>>>>>>>>>>>> I have added the specific log4j lines for Http Client wire logging and restarted the ManifoldCF instance. I was also able to see the web service Lists.asmx through IE. When reviewing the log files I was able to see some of the content that resides in the SharePoint instance in the content coming back from the request. However, I am still seeing the error messages in the ManifoldCF GUI as well as in the log file indicating "Bad Envelope: HTML", "No service named ListsSoap is available" and "No service named http://schemas.microsoft.com/sharepoint/soap/GetListCollection is available".
>>>>>>>>>>>>
>>>>>>>>>>>> Could there be something going on with the way the services are being built on the client side?
>>>>>>>>>>>>
>>>>>>>>>>>> Appreciate your help.
>>>>>>>>>>>>
>>>>>>>>>>>> Dan
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>>> Sent: Tuesday, January 24, 2012 4:52 PM
>>>>>>>>>>>> To: Silvia, Daniel [USA]; connectors-user@incubator.apache.org
>>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>
>>>>>>>>>>>> I have not seen this exact problem before.
>>>>>>>>>>>>
>>>>>>>>>>>> The "Bad envelope tag: HTML" indicates that the SOAP request the
>>>>>>>>>>>> SharePoint connector is attempting to perform is, in fact, returning
>>>>>>>>>>>> an HTML response.  This usually indicates that the server or path
>>>>>>>>>>>> parameters you've used to set up the connection are not set correctly,
>>>>>>>>>>>> and SharePoint is not actually being engaged.
>>>>>>>>>>>>
>>>>>>>>>>>> But usually when that happens I don't recall a ConfigurationException
>>>>>>>>>>>> logged, unless it's what Axis does in response to the HTML.
>>>>>>>>>>>>
>>>>>>>>>>>> The best thing to do at this point is turn on Http Client wire
>>>>>>>>>>>> logging, restart ManifoldCF, and view the connection.  The log will
>>>>>>>>>>>> then contain a record of the exact SOAP requests and the responses,
>>>>>>>>>>>> and we can see what's wrong.  The technique is described here:
>>>>>>>>>>>>
>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/CONNECTORS/Debugging+Connections
>>>>>>>>>>>>
>>>>>>>>>>>> You can also confirm that the right SharePoint web services are
>>>>>>>>>>>> functioning on the machine in question by trying to access them
>>>>>>>>>>>> directly.  For the Lists web service, which is the one it sounds like
>>>>>>>>>>>> it was complaining about, try using IE (not Firefox etc because you
>>>>>>>>>>>> want NTLM support) to go to the url where you think the web service
>>>>>>>>>>>> lives.  This will be http: or https:, plus the server, plus the port,
>>>>>>>>>>>> plus the path, plus "_vti_bin/Lists.asmx".  You should see an
>>>>>>>>>>>> unequivocal SharePoint response.  For an example from the Microsoft
>>>>>>>>>>>> demo service, try http://www.wssdemo.com/_vti_bin/Lists.asmx.
>>>>>>>>>>>>
>>>>>>>>>>>> Please let me know how it goes, and cc the dev list (as I have) so a
>>>>>>>>>>>> record of what you're encountering can be made available to others.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jan 24, 2012 at 1:52 PM, Silvia, Daniel [USA]
>>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have downloaded the newest version of ManifoldCF (v0.4) and have run the necessary ant scripts to download dependencies and then built the entire project. I have also had the SharePoint web service MetaCarta.SharePoint.MCPermissionsService.wsp deployed on the SharePoint instance, since we are running version 3 of SharePoint (SharePoint 2007). When I try to create a Repository Connection and select "Save", I get a message on the ManifoldCF front end of "org.xml.sax.SAXException Bad envelope tag: HTML". When I look at the log file I see the error message "org.apache.axis.ConfigurationException: No service named ListsSoap is available".
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you tell me if you have seen this issue before and what may be causing this issue?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dan
>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>>>> Sent: Friday, January 20, 2012 7:31 AM
>>>>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>>
>>>>>>>>>>>>> In order for the SharePoint connector to build, you need to have the
>>>>>>>>>>>>> wsdls in place in the right area.  We cannot ship those because of
>>>>>>>>>>>>> potential copyright issues.  The easiest way to obtain the right
>>>>>>>>>>>>> dependencies is:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ant download-dependencies
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then, just build normally:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ant build
>>>>>>>>>>>>>
>>>>>>>>>>>>> This will only work for ManifoldCF-0.4-incubating, or trunk.
>>>>>>>>>>>>> 0.4-incubating is still in the process of being signed off by the
>>>>>>>>>>>>> incubator, but you can find the release candidate here:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://people.apache.org/~kwright
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jan 20, 2012 at 7:02 AM, Silvia, Daniel [USA]
>>>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I work with Matt Parker and we are in the process of developing a pipeline
>>>>>>>>>>>>>> that uses ManifoldCF at the beginning. I just subscribed to the
>>>>>>>>>>>>>> connectors-user-subscribe@incubator.apache.org
>>>>>>>>>>>>>> group yesterday and submitted an e-mail question to the group. Can you help
>>>>>>>>>>>>>> us with the below issue?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I downloaded MCF and started playing with the default setup under Jetty and
>>>>>>>>>>>>>> Derby. It starts up without any issue. I am trying to configure a SharePoint
>>>>>>>>>>>>>> connector, connecting to SharePoint Service 3. I have been following the
>>>>>>>>>>>>>> instructions and I am at the point of deploying the custom SharePoint web
>>>>>>>>>>>>>> service to the SharePoint instance. The instructions indicate that I should
>>>>>>>>>>>>>> get the web service from dist/sharepoint-integration after building MCF.
>>>>>>>>>>>>>> However, after looking through the entire directory structure, I am unable
>>>>>>>>>>>>>> to find the service to deploy.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can someone tell me where to find this service?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Daniel Silvia

RE: ManifoldCF's dist/shapoint-integration dir

Posted by "Silvia, Daniel [USA]" <Si...@bah.com>.
Hi Karl

The Path Rules are:

Path Match: /Shared Documents
Type: library
Action: include

Path Match: /IDD/Shared Documents
Type: library
Action: include

Path Match: /IDD/Documents
Type: library
Action: include

Path Match: /manifoldcf/Shared Documents
Type: library
Action: include

I hope this helps.

I really appreciate your help.
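
For readers comparing notes, the rules listed in this message can be written down as structured data. The following is only a minimal sketch and not part of ManifoldCF: `PathRule` and `parse_path_rules` are hypothetical names, and the parser understands nothing beyond the three-line "Path Match / Type / Action" layout used above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PathRule:
    """One rule as displayed when viewing the job (hypothetical record type)."""
    path_match: str
    rule_type: str
    action: str


# The four rules exactly as written in the message above.
RULES_TEXT = """\
Path Match: /Shared Documents
Type: library
Action: include

Path Match: /IDD/Shared Documents
Type: library
Action: include

Path Match: /IDD/Documents
Type: library
Action: include

Path Match: /manifoldcf/Shared Documents
Type: library
Action: include
"""


def parse_path_rules(text):
    """Parse blocks of 'Path Match / Type / Action' lines into PathRule records."""
    rules = []
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
        if len(fields) == 3:  # a complete rule has been collected
            rules.append(PathRule(fields["Path Match"], fields["Type"], fields["Action"]))
            fields = {}
    return rules


rules = parse_path_rules(RULES_TEXT)
type_counts = Counter(rule.rule_type for rule in rules)
print(type_counts)
```

Counting by type makes it easy to see at a glance that every rule listed here is a "library" rule with the "include" action.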



________________________________________
From: Karl Wright [daddywri@gmail.com]
Sent: Tuesday, January 31, 2012 10:01 AM
To: Silvia, Daniel [USA]
Cc: connectors-user@incubator.apache.org
Subject: Re: ManifoldCF's dist/shapoint-integration dir

"When I select only the fetch activity, I don't see anything in the
events, when I select the Document Ingest activity, I don't see
anything in the events."

So either you've already run the job and the documents were accessed
the first time (and won't be accessed again until they change), or the
problem is likely that your SharePoint Path Rules are not including
any documents.  It would be very helpful at this point to include a
screen shot of the job you've created.  Since you are not on the net,
perhaps you can jot down your SharePoint path rules for me to have a
look at, as they are displayed when you view the job.

Thanks,
Karl

On Tue, Jan 31, 2012 at 9:44 AM, Silvia, Daniel [USA]
<Si...@bah.com> wrote:
> Hi Karl
>
> Ok, I have created a new job and ran the job and went to the Simple History Report.
>
> I see the events. If all the activities in the Simple History Report (Document Deletion (SolrPipeline), Document Ingest (SolrPipeline), and Fetch) are selected, I see a job start and a job end event. When I get to the Simple History Report I can select the "Connection", but I don't have an option to select the activities until I run the report first.
> When I select only the fetch activity, I don't see anything in the events, when I select the Document Ingest activity, I don't see anything in the events.
>
> My solr output connection has the following information:
> Protocol: http
> Server: "the server name"
> Port:8080 (we are running solr on Jboss port 8080)
> Web Application Name: solr
> Core Name: collection1
> Update Handler: update/extract
> Remove Handler: /update
> Status Handler: /admin/ping
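
The settings above determine the URL that documents are posted to. As a rough sketch, and assuming the pieces are simply joined in the order shown (the thread does not say how the connector joins them), the update URL could be assembled like this. The function name `solr_update_url` and the host name `solrhost` are placeholders, since the real server name is not given.

```python
def solr_update_url(protocol, server, port, webapp, core, handler):
    """Join the connection fields into one URL, tolerating stray slashes."""
    # Strip leading/trailing slashes so "update/extract" and "/update/extract"
    # produce the same result; whether the real connector does this is not
    # stated in the thread.
    path = "/".join(part.strip("/") for part in (webapp, core, handler) if part)
    return f"{protocol}://{server}:{port}/{path}"


# "solrhost" stands in for the elided server name.
update_url = solr_update_url("http", "solrhost", 8080, "solr", "collection1", "update/extract")
same_url = solr_update_url("http", "solrhost", 8080, "solr", "collection1", "/update/extract")
print(update_url)
```

Opening the resulting URL's host, port, and web application in a browser is a quick way to confirm that the values match the running Solr instance. Note that the Update Handler above is written without a leading slash while the other handlers have one; it may be worth checking whether that matters in your version.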
>
>
>
> ________________________________________
> From: Karl Wright [daddywri@gmail.com]
> Sent: Tuesday, January 31, 2012 9:00 AM
> To: Silvia, Daniel [USA]; connectors-user@incubator.apache.org
> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>
> Ok, let's do one thing at a time.
>
> First:
>
> "For the Path tab where there are Path Rules, are these the paths we
> want ManifoldCF to follow? Each site, and each Library like Documents
> and Shared Documents. And in the Metadata tab, this is the tab where
> you indicate for each "Site" and "Library" you want to include
> specific metadata or include all metadata?"
>
> For SharePoint, there are Path Rules and Metadata Rules.  The Path
> Rules describe what documents you want to include or exclude.  The
> Metadata Rules describe what metadata you want to include or exclude.
> For right now I would ignore the Metadata Rules and just make sure you
> have Path Rules that mean that you have included documents.
>
> "As I run the report, I see "Documents", "Active, and "Processed"
> where the numbers change under the "Active" column as well as the
> "Document" and "Processed" column (these just get larger, where Active
> changes). "
>
> This "report" we actually call the Job Status screen.  The fact that
> the numbers get larger and the job doesn't just end indicates that you
> are successfully crawling your SharePoint, and you have set up the job
> to include at least some documents.  This is good news.  However, this
> is NOT the "Simple History" report I was alluding to earlier.  To get
> to that report, click on the "Simple History" link on the left-hand
> navigation area.  This report will show the events of your choice
> (default - ALL recorded events) over a given time window (default: the
> last hour).  If you've done this right you should at least see a "Job
> start" event.  The events you are most interested in are the "fetch"
> (which describes all attempts to fetch documents from SharePoint) and
> "document ingest", which describe attempts to get documents into Solr.
>  You can refresh the displayed events by clicking the "Go" button in
> the middle of the screen whenever you wish.
>
> I'd like you to delete your job, create it again, and start it.  Then,
> while it is running, I'd like you to go to the "Simple History"
> screen, and select the appropriate connection (your SharePoint
> repository connection), and click the "Go" button.  So as not to skip
> anything basic:
>
> (1) What event types do you see?
> (2) Are there "fetch" events?
> (3) Are there "document ingest" events?
>
> If you see no "fetch" events, that implies you have either not
> specified any documents to include in your job, OR your Solr
> connection is configured to reject too many document types so they are
> all getting filtered out.
>
> If you see "document ingest" events, but those have errors, it implies
> that the configuration of your Solr connection is incorrect and does
> not match the way your Solr is configured.  If you send me a specific
> error code and/or text I can help you figure out what is happening.
>
> If you see "document ingest" events with NO errors, but the Solr
> instance is not getting documents, you are describing an impossible
> situation.  While your Solr instance may not be configured to have the
> Extracting Update Handler active, or it may be at a different URL than
> what you pointed at, that would definitely yield errors or
> notifications in the Simple History.
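
The three cases described above can be summarized as a small decision function. This is only a restatement of the reasoning in this message as a sketch; the function name `triage` and its parameters are made up for illustration and do not correspond to anything in ManifoldCF.

```python
def triage(fetch_events, ingest_events, ingest_errors, documents_in_solr):
    """Map what the Simple History shows to the explanation given in the message."""
    if fetch_events == 0:
        # No fetches: nothing was selected, or everything was filtered out.
        return "check Path Rules and Solr document-type filtering"
    if ingest_events > 0 and ingest_errors > 0:
        return "Solr connection settings do not match the Solr setup"
    if ingest_events > 0 and not documents_in_solr:
        return "should not happen: expect errors in Simple History"
    if ingest_events > 0:
        return "documents are reaching Solr"
    # Fetches without any ingest attempts are not discussed in the message.
    return "case not covered in the message"


print(triage(fetch_events=0, ingest_events=0, ingest_errors=0, documents_in_solr=False))
```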
>
> Please let me know what you actually see.
> Karl
>
>
>
> On Tue, Jan 31, 2012 at 7:53 AM, Silvia, Daniel [USA]
> <Si...@bah.com> wrote:
>> Hi Karl
>>
>> I am trying to figure out why I can't see anything being indexed into our Solr index. I was looking at another post where you were working with "Martijn" and that individual was not able to see info getting into Solr. In the report that I have set up, I have included all metadata associated with each site, Shared Documents, and Documents. In the Solr Field Mapping, I am associating metadata fields that are indicated in the Metadata tab with fields that exist in our Solr index.
>>
>> For the Path tab where there are Path Rules, are these the paths we want ManifoldCF to follow? Each site, and each Library like Documents and Shared Documents. And in the Metadata tab, this is the tab where you indicate for each "Site" and "Library" you want to include specific metadata or include all metadata?
>>
>> As I run the report, I see "Documents", "Active", and "Processed" where the numbers change under the "Active" column as well as the "Document" and "Processed" column (these just get larger, where Active changes). While I was researching why I may not be seeing something over on the Solr side, I saw your communication with another individual indicating that I should see something like literal.xxx=yyy in the Solr log. This is an older post so there may be something else I should see. But the only thing I see when I look at the Solr log is "[ ] webapp=/solr path=/update/extract params={commit=true} status=0 QTime=0".
>>
>> Any ideas.
>>
>> Thanks
>>
>>
>>
>>
>>
>> ________________________________________
>> From: Karl Wright [daddywri@gmail.com]
>> Sent: Monday, January 30, 2012 10:40 AM
>> To: Silvia, Daniel [USA]
>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>
>> The default time range for the Simple History is the last hour.  I
>> suspect you are unaware of that.  If you want a different time range
>> you will have to modify the start and end time pulldowns accordingly.
>>
>> Karl
>>
>> On Mon, Jan 30, 2012 at 10:34 AM, Silvia, Daniel [USA]
>> <Si...@bah.com> wrote:
>>> Hi Karl
>>>
>>> I am looking at the Simple History in the UI and there isn't much to see, unless I am not getting what I am supposed to.  I see the Start Time, Activity, Identifier, Bytes, and Time, but I don't get anything for Result Code or Result Description. I looked in the documentation and we should be getting something in those fields, I believe.
>>>
>>> Anyway, I will look through the mail list to see what I can find.
>>>
>>> Thanks for the help.
>>>
>>> Dan
>>>
>>> ________________________________________
>>> From: Karl Wright [daddywri@gmail.com]
>>> Sent: Monday, January 30, 2012 8:24 AM
>>> To: Silvia, Daniel [USA]
>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>
>>> So just to be clear, I'm NOT talking about the ManifoldCF logging.
>>> For the Solr connector you probably won't need to turn that on; it's
>>> pretty simple and you can look at the Simple History in the UI to see
>>> what the request and response look like from Solr.  I was talking
>>> instead about Solr logging - when you run the Solr Webapp, by default
>>> all requests against the Extracting Update Handler are logged to
>>> standard error, so you will see them appear in the process window in
>>> which Solr is running.
>>>
>>> My suggestion to you is to first have a look at the Simple History for
>>> the job you are trying to run.  If you are getting back 500 errors
>>> from Solr, that means you have not set up Solr properly to work with
>>> ManifoldCF.  In recent versions of Solr, the example works fine out of
>>> the box, but when you try to deploy any other way you are often
>>> missing the jar that contains the extracting update handler, so of
>>> course nothing works.  Several people on the connectors-user list have
>>> run into this and if you search the list (go to the ManifoldCF site
>>> and click through to the mailing list page and there are links at the
>>> bottom for this purpose) you will find posts that describe exactly
>>> what is wrong and how to fix it.
>>>
>>> Hope this helps.
>>>
>>> Karl
>>>
>>>
>>> On Sun, Jan 29, 2012 at 2:30 PM, Silvia, Daniel [USA]
>>> <Si...@bah.com> wrote:
>>>> Yes, but for some reason the logging isn't coming through. The logging is set to INFO and I will have to change the logging level to DEBUG.
>>>>
>>>> Thanks again for your help.
>>>>
>>>>
>>>> ________________________________________
>>>> From: Karl Wright [daddywri@gmail.com]
>>>> Sent: Friday, January 27, 2012 5:06 PM
>>>> To: Silvia, Daniel [USA]
>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>
>>>> Actually, the best thing for debugging the Solr connection is looking
>>>> at standard-output on the Solr instance.  You will see all the posts
>>>> that are made and what the arguments were.  Also, this is the kind of
>>>> question you'd get a lot of benefit from posting to the list.  The
>>>> end-user documentation I pointed you at before describes some of this
>>>> but the Solr connector has grown beyond the doc to some extent at this
>>>> point.
>>>>
>>>> Karl
>>>>
>>>> On Fri, Jan 27, 2012 at 9:51 AM, Silvia, Daniel [USA]
>>>> <Si...@bah.com> wrote:
>>>>> Hi Karl
>>>>>
>>>>> Is there a log level other than wire-level debugging to view log statements for trying to send output to a Solr instance in the Jobs List/Creation section? We are having an issue getting content to Solr. Is there a document anywhere which defines the fields for the Jobs sections for the Solr Field Mapping tab and the Paths and Metadata tabs?
>>>>>
>>>>> Thanks
>>>>>
>>>>> Dan
>>>>>
>>>>> ________________________________________
>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>> Sent: Thursday, January 26, 2012 10:44 AM
>>>>> To: Silvia, Daniel [USA]
>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>
>>>>> I am afraid I don't know the answer to that.  I'm sure it's infinitely
>>>>> configurable but it's not clear what the SharePoint web services need
>>>>> to do under the hood, so anything I tell you would be just a guess.
>>>>>
>>>>> Karl
>>>>>
>>>>> On Thu, Jan 26, 2012 at 10:43 AM, Silvia, Daniel [USA]
>>>>> <Si...@bah.com> wrote:
>>>>>> Hi Karl
>>>>>>
>>>>>> One more question. Do you know the minimum permissions needed to crawl the SharePoint instance and all sites under the instance? The individual who set my permissions set me up as the "site collection admin" for the topmost site. Is there a specific admin role, other than "Farm Admin", that could be given to the user crawling the SharePoint instance?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> ________________________________________
>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>> Sent: Thursday, January 26, 2012 9:53 AM
>>>>>> To: Silvia, Daniel [USA]
>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>
>>>>>> Good news!  Please keep in touch; we'd like to hear how things work
>>>>>> for you (it helps keep the software fresh ;-) ).
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>> On Thu, Jan 26, 2012 at 9:48 AM, Silvia, Daniel [USA]
>>>>>> <Si...@bah.com> wrote:
>>>>>>> Hey Karl
>>>>>>>
>>>>>>> (1) was the issue. When requesting access to the SharePoint instance I indicated that I needed to be able to crawl SharePoint, I guess the problem was on my end indicating that I also needed privileges to crawl the site.
>>>>>>>
>>>>>>> Anyway, thank you for your help. When I change the SharePoint version to v 3 I get a message indicating "Connection Working".
>>>>>>>
>>>>>>> Appreciate the help.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>> ________________________________________
>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>> Sent: Thursday, January 26, 2012 9:19 AM
>>>>>>> To: Silvia, Daniel [USA]
>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>
>>>>>>> The error message "axisFault=Server, detail=Server was unable to
>>>>>>> process request --> Requested Registry access is not allowed" is Axis
>>>>>>> interpreting an error message from SharePoint.  What it is saying is
>>>>>>> that the user you are trying to crawl with is unable to read the
>>>>>>> SharePoint machine's registry but needs to.  There are two possible
>>>>>>> causes for this:
>>>>>>>
>>>>>>> (1) The user you gave doesn't have enough permissions to crawl SharePoint
>>>>>>> (2) When you installed the SharePoint MCPermissions plugin, you
>>>>>>> installed it logged in as a user that did not have enough permissions to do
>>>>>>> what it needs to do.
>>>>>>>
>>>>>>> You can tell the difference between the two by selecting "SharePoint
>>>>>>> 2.0" in the sharepoint version pulldown.  If a connection saved in
>>>>>>> this way says "Connection working", it means that the MCPermissions
>>>>>>> plugin has the permission problem, not your user.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>> On Thu, Jan 26, 2012 at 9:14 AM, Silvia, Daniel [USA]
>>>>>>> <Si...@bah.com> wrote:
>>>>>>>> Hi Karl
>>>>>>>>
>>>>>>>> When I try to use option (1) and don't put anything in the Site field, I get an error message "axisFault=Server, detail=Server was unable to process request --> Requested Registry access is not allowed" and when I put a "/" in the Site field I get a GUI error indicating that the Site field can't end with a "/".
>>>>>>>>
>>>>>>>> Anyway, do you have any ideas. Or maybe the Sharepoint instance is not configured properly for us to crawl?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ________________________________________
>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>> Sent: Thursday, January 26, 2012 8:52 AM
>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>
>>>>>>>> SharePoint has two kinds of site:
>>>>>>>>
>>>>>>>> (1) the root site, which can be reached by the path http://server:port
>>>>>>>> (2) a number of sites under the 'virtual path', with URLs of the form:
>>>>>>>>
>>>>>>>> http://server:port/something/sitename
>>>>>>>>
>>>>>>>> The "something" is, by default, the string "site", so
>>>>>>>> http://server:port/site/xyz might be the URL of one such virtual site.
>>>>>>>>
>>>>>>>> The form of the "site" field in the SharePoint connection for the
>>>>>>>> first is either blank or "/" (can't remember which right now), and the
>>>>>>>> form of the "site" field for the second is "/site/xyz".  On no account
>>>>>>>> does the connector expect to see default.aspx attached to that path,
>>>>>>>> so you should not do this; it cannot work.
>>>>>>>>
>>>>>>>> FWIW, my recommendation to try setting the connection type to
>>>>>>>> "SharePoint 2.0" was to rule out any possible installation issue with
>>>>>>>> the ManifoldCF sharepoint plugin.  The connection check for 2.0 does
>>>>>>>> not look for it; only the connection check for 3.0 does.
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>> On Thu, Jan 26, 2012 at 8:41 AM, Silvia, Daniel [USA]
>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>> Hey Karl
>>>>>>>>>
>>>>>>>>> I am also getting an "HTTP Error 401.2: Unauthorized: Access is denied due to server configuration" when setting the Site field to /default.aspx. Do most Sharepoint instances have the urls set to something like http://server:port/sites/...... instead of http://server:port/? When I use the "/default.aspx" I see in the log files that ManifoldCF is trying to go to the Lists.asmx service with the url http://server:port/default.aspx/_vti_bin/Lists.asmx, where nothing is found.
>>>>>>>>>
>>>>>>>>> As you can tell I am not much of a SharePoint user or installer.
>>>>>>>>>
>>>>>>>>> Also, I don't think the issue is with the connector in ManifoldCF, I am just trying to
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ________________________________________
>>>>>>>>> From: Silvia, Daniel [USA]
>>>>>>>>> Sent: Thursday, January 26, 2012 7:23 AM
>>>>>>>>> To: Karl Wright
>>>>>>>>> Subject: RE: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>
>>>>>>>>> Hey Karl
>>>>>>>>>
>>>>>>>>> The issue I am having is that the SharePoint instance URL is something like http://server:port/default.aspx. If I don't put anything in the Site field I get a message indicating "Requested Registry Access is not allowed". I was putting "/default.aspx" as my Site field which I believe may have been the issue. However, what do you put in your Site field when the site is the topmost site, as in http://server:port/default.aspx?
>>>>>>>>>
>>>>>>>>> I would love to send you the log messages, but I am working on a network which is not connected to the outside.
>>>>>>>>>
>>>>>>>>> Thanks for your help.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ________________________________________
>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>> Sent: Wednesday, January 25, 2012 6:12 PM
>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>
>>>>>>>>> Daniel,
>>>>>>>>>
>>>>>>>>> FWIW, I can help you diagnose the issue, but to do so you really need
>>>>>>>>> to give me some concrete data.  I'm happy to grovel over the whole
>>>>>>>>> wire log if you feel you can send it to me; something that may not
>>>>>>>>> seem important to you will likely stand out strongly to me.  I can,
>>>>>>>>> for example, see whether you are getting back HTML because of an
>>>>>>>>> authentication error, for instance.  And if you ARE getting back valid
>>>>>>>>> SOAP, I would then be sure that something was wrong with the Axis
>>>>>>>>> client configuration, and I could pursue that here with the data
>>>>>>>>> provided.  The problem with software like SharePoint running on IIS is
>>>>>>>>> that it can be configured a nearly infinite number of ways, so
>>>>>>>>> diagnosis is more of an art than a science.  I strongly suspect that
>>>>>>>>> you're laboring under a pretty straightforward misconception which is
>>>>>>>>> likely blocking progress, rather than there being an issue with the
>>>>>>>>> SharePoint connector itself.  But I can't tell that without more
>>>>>>>>> detailed communication.
>>>>>>>>>
>>>>>>>>> Also, you mentioned that the Lists.asmx service was right where you
>>>>>>>>> expected it to be.  Have you read the SharePoint Connector part of the
>>>>>>>>> end-user documentation?  To wit:
>>>>>>>>>
>>>>>>>>> "Select the server protocol, and enter the server name and port, based
>>>>>>>>> on what you recorded from the URL for your SharePoint site. For the
>>>>>>>>> "Site path" field, type in the portion of the root site URL that
>>>>>>>>> includes everything after the server and port, except for the final
>>>>>>>>> "aspx" file. For example, if the SharePoint URL is
>>>>>>>>> "http://myserver:81/sites/somewhere/index.asp", the site path would be
>>>>>>>>> "/sites/somewhere"."  The Lists.asmx service in this example would be
>>>>>>>>> expected to be found at
>>>>>>>>> "http://myserver:81/sites/somewhere/_vti_bin/Lists.asmx".  And the URL
>>>>>>>>> you would start with would be the URL you see in the browser when you
>>>>>>>>> log into the SharePoint web client and go to the site you wish to
>>>>>>>>> crawl.  Is this what you are doing?
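
The rule quoted above (take everything after the server and port, drop the final page file, then append "_vti_bin/Lists.asmx") can be sketched in a few lines. The helper names `site_path_from_url` and `lists_service_url` are hypothetical, and treating any final path segment that contains a dot as the page file is an assumption made for this sketch, not something the documentation states.

```python
import posixpath
from urllib.parse import urlsplit


def site_path_from_url(site_url):
    """Return the 'Site path' value for a SharePoint URL seen in the browser."""
    path = urlsplit(site_url).path
    last_segment = posixpath.basename(path)
    if "." in last_segment:
        # Assumption: a final segment with an extension (default.aspx,
        # index.asp) is the page file and is not part of the site path.
        path = posixpath.dirname(path)
    # The root site ends up as an empty string rather than "/".
    return path.rstrip("/")


def lists_service_url(site_url):
    """Return the URL where the Lists web service would be expected."""
    parts = urlsplit(site_url)
    return f"{parts.scheme}://{parts.netloc}{site_path_from_url(site_url)}/_vti_bin/Lists.asmx"


print(site_path_from_url("http://myserver:81/sites/somewhere/index.asp"))
print(lists_service_url("http://myserver:81/sites/somewhere/index.asp"))
```

For a root site such as http://myserver:81/default.aspx, the same sketch yields an empty site path, which lines up with the later observation in this thread that the Site field is rejected when it ends with "/".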
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks again,
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jan 25, 2012 at 12:33 PM, Karl Wright <da...@gmail.com> wrote:
>>>>>>>>>> The code that parses the SOAP response is Apache Axis.  This hasn't
>>>>>>>>>> changed in several years.
>>>>>>>>>>
>>>>>>>>>> Can you answer the following questions:
>>>>>>>>>>
>>>>>>>>>> (1) When the SharePoint connector makes a request to SharePoint, is
>>>>>>>>>> the response HTML, or is it XML?  Does it have an XML header which
>>>>>>>>>> describes a Microsoft XML namespace?  It sure sounds like it is
>>>>>>>>>> responding with HTML.  The SharePoint connector is expecting to
>>>>>>>>>> communicate using SOAP.  Is the response valid SOAP?
>>>>>>>>>>
>>>>>>>>>> (2) What version of SharePoint are you trying to connect to?  Is it
>>>>>>>>>> SharePoint 2007?  SharePoint 2010?
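
Question (1) above can be answered mechanically once a response body has been copied out of the wire log. The following is a minimal sketch using only the Python standard library; `classify_response` is a hypothetical helper, and it assumes the service replies with a SOAP 1.1 envelope, which is an assumption about this setup rather than something confirmed in the thread.

```python
import xml.etree.ElementTree as ET

# Namespace of SOAP 1.1 envelopes (assumed to be what the service uses).
SOAP_11_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def classify_response(body):
    """Say whether a response body looks like SOAP, HTML, or something else."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Real-world HTML is often not well-formed XML at all.
        return "not well-formed XML (possibly HTML)"
    if root.tag == f"{{{SOAP_11_NS}}}Envelope":
        return "SOAP envelope"
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name.lower() == "html":
        return "HTML page"
    return f"other XML: {local_name}"


soap_body = (
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    "<soap:Body/></soap:Envelope>"
)
html_body = "<html><body><p>Access is denied</p></body></html>"
print(classify_response(soap_body))
print(classify_response(html_body))
```

An "HTML page" result corresponds to the situation behind the "Bad envelope tag: HTML" message: the client expected an Envelope element as the root and found an HTML element instead.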
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>> On Wed, Jan 25, 2012 at 12:26 PM, Silvia, Daniel [USA]
>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>> Hi Karl
>>>>>>>>>>>
>>>>>>>>>>> I have added the specific log4j lines for Http Client wire logging and I restarted the ManifoldCF instance. I was also able to see the web service Lists.asmx through IE. When reviewing the log files I was able to see some of the content that resides in the SharePoint instance in the content coming back from the request. However, I am still seeing the error messages in the ManifoldCF GUI as well as in the log file indicating "Bad Envelope: HTML", "No service named ListsSoap is available" and "No service named http://schemas.microsoft.com/sharepoint/soap/GetListCollection is available".
>>>>>>>>>>>
>>>>>>>>>>> Could there be something going on with the way the services are being built on the client side?
>>>>>>>>>>>
>>>>>>>>>>> Appreciate your help.
>>>>>>>>>>>
>>>>>>>>>>> Dan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ________________________________________
>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>> Sent: Tuesday, January 24, 2012 4:52 PM
>>>>>>>>>>> To: Silvia, Daniel [USA]; connectors-user@incubator.apache.org
>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>
>>>>>>>>>>> I have not seen this exact problem before.
>>>>>>>>>>>
>>>>>>>>>>> The "Bad envelope tag: HTML" indicates that the SOAP request the
>>>>>>>>>>> SharePoint connector is attempting to perform is, in fact, returning
>>>>>>>>>>> an HTML response.  This usually indicates that the server or path
>>>>>>>>>>> parameters you've used to set up the connection are not set correctly,
>>>>>>>>>>> and SharePoint is not actually being engaged.
>>>>>>>>>>>
>>>>>>>>>>> But usually when that happens I don't recall a ConfigurationException
>>>>>>>>>>> logged, unless it's what Axis does in response to the HTML.
>>>>>>>>>>>
>>>>>>>>>>> The best thing to do at this point is turn on Http Client wire
>>>>>>>>>>> logging, restart ManifoldCF, and view the connection.  The log will
>>>>>>>>>>> then contain a record of the exact SOAP requests and the responses,
>>>>>>>>>>> and we can see what's wrong.  The technique is described here:
>>>>>>>>>>>
>>>>>>>>>>> https://cwiki.apache.org/confluence/display/CONNECTORS/Debugging+Connections
>>>>>>>>>>>
>>>>>>>>>>> You can also confirm that the right SharePoint web services are
>>>>>>>>>>> functioning on the machine in question by trying to access them
>>>>>>>>>>> directly.  For the Lists web service, which is the one it sounds like
>>>>>>>>>>> it was complaining about, try using IE (not Firefox etc because you
>>>>>>>>>>> want NTLM support) to go to the url where you think the web service
>>>>>>>>>>> lives.  This will be http: or https:, plus the server, plus the port,
>>>>>>>>>>> plus the path, plus "_vti_bin/Lists.asmx".  You should see an
>>>>>>>>>>> unequivocal SharePoint response.  For an example from the Microsoft
>>>>>>>>>>> demo service, try http://www.wssdemo.com/_vti_bin/Lists.asmx.
>>>>>>>>>>>
>>>>>>>>>>> Please let me know how it goes, and cc the dev list (as I have) so a
>>>>>>>>>>> record of what you're encountering can be made available to others.
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 24, 2012 at 1:52 PM, Silvia, Daniel [USA]
>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>
>>>>>>>>>>>> I have downloaded the newest version of ManifoldCF, v0.4, and have run the necessary ant scripts to download dependencies and then built the entire project. I have also had the SharePoint web service MetaCarta.SharePoint.MCPermissionsService.wsp deployed on the SharePoint instance because we are running version 3 of SharePoint (SharePoint 2007). When I try to create a Repository Connection and select "Save" I get a message on the ManifoldCF front end of "org.xml.sax.SAXException Bad envelope tag: HTML". When I look at the log file I see an error message "org.apache.axis.ConfigurationException: No service named ListsSoap is available".
>>>>>>>>>>>>
>>>>>>>>>>>> Can you tell me if you have seen this issue before and what may be causing this issue?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>
>>>>>>>>>>>> Dan
>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>>> Sent: Friday, January 20, 2012 7:31 AM
>>>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>
>>>>>>>>>>>> In order for the SharePoint connector to build, you need to have the
>>>>>>>>>>>> wsdls in place in the right area.  We cannot ship those because of
>>>>>>>>>>>> potential copyright issues.  The easiest way to obtain the right
>>>>>>>>>>>> dependencies is:
>>>>>>>>>>>>
>>>>>>>>>>>> ant download-dependencies
>>>>>>>>>>>>
>>>>>>>>>>>> Then, just build normally:
>>>>>>>>>>>>
>>>>>>>>>>>> ant build
>>>>>>>>>>>>
>>>>>>>>>>>> This will only work for ManifoldCF-0.4-incubating, or trunk.
>>>>>>>>>>>> 0.4-incubating is still in the process of being signed off by the
>>>>>>>>>>>> incubator, but you can find the release candidate here:
>>>>>>>>>>>>
>>>>>>>>>>>> http://people.apache.org/~kwright
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jan 20, 2012 at 7:02 AM, Silvia, Daniel [USA]
>>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>> I work with Matt Parker and we are in the process of developing a pipeline
>>>>>>>>>>>>> that uses ManifoldCF at the beginning. I just subscribed to the
>>>>>>>>>>>>> connectors-user-subscribe@incubator.apache.org
>>>>>>>>>>>>> group yesterday and submitted an e-mail question to the group. Can you help
>>>>>>>>>>>>> us with the below issue?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I downloaded MCF and started playing with the default setup under Jetty and
>>>>>>>>>>>>> Derby. It starts up without any issue. I am trying to configure a SharePoint
>>>>>>>>>>>>> connector, connecting to SharePoint Service 3. I have been following the
>>>>>>>>>>>>> instructions and I am at the point of deploying the custom SharePoint web
>>>>>>>>>>>>> service to the SharePoint instance. The instructions indicate that I should
>>>>>>>>>>>>> get the web service from dist/sharepoint-integration after building MCF.
>>>>>>>>>>>>> However, after looking through the entire directory structure, I am unable
>>>>>>>>>>>>> to find the service to deploy.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can someone tell me where to find this service?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Daniel Silvia

Re: ManifoldCF's dist/shapoint-integration dir

Posted by Karl Wright <da...@gmail.com>.
"When I select only the fetch activity, I don't see anything in the
events, when I select the Document Ingest activity, I don't see
anything in the events."

So either you've already run the job and the documents were accessed
the first time (and won't be accessed again until they change), or the
problem is likely that your SharePoint Path Rules are not including
any documents.  It would be very helpful at this point to include a
screen shot of the job you've created.  Since you are not on the net,
perhaps you can jot down your SharePoint path rules for me to have a
look at, as they are displayed when you view the job.

Thanks,
Karl
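
The first possibility mentioned above, that documents already seen are not accessed again until they change, can be illustrated with a toy model. This is not ManifoldCF's actual bookkeeping; the function `documents_to_fetch`, the version strings, and the document paths are all invented for the sketch.

```python
def documents_to_fetch(seen_versions, current_versions):
    """Return ids whose version is new or changed, recording them as seen."""
    to_fetch = []
    for doc_id, version in current_versions.items():
        if seen_versions.get(doc_id) != version:
            to_fetch.append(doc_id)
            seen_versions[doc_id] = version  # remember what was fetched
    return to_fetch


seen = {}
repository = {"/Shared Documents/a.doc": "v1", "/Shared Documents/b.doc": "v1"}

first_run = documents_to_fetch(seen, repository)   # everything is new
second_run = documents_to_fetch(seen, repository)  # nothing has changed
repository["/Shared Documents/b.doc"] = "v2"       # one document is edited
third_run = documents_to_fetch(seen, repository)   # only the edited one

print(first_run, second_run, third_run)
```

In this model a second run over an unchanged repository produces no fetches at all, which would look the same in a history report as a job whose rules match nothing. That is why recreating the job, as suggested elsewhere in this thread, helps tell the two cases apart.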

On Tue, Jan 31, 2012 at 9:44 AM, Silvia, Daniel [USA]
<Si...@bah.com> wrote:
> Hi Karl
>
> Ok, I have created a new job and ran the job and went to the Simple History Report.
>
> I see the events. If all the activities in the Simple History Report (Document Deletion (SolrPipeline), Document Ingest (SolrPipeline), and Fetch) are selected, I see a job start and a job end event. When I get to the Simple History Report I can select the "Connection", but I don't have an option to select the activities until I run the report first.
> When I select only the fetch activity, I don't see anything in the events, when I select the Document Ingest activity, I don't see anything in the events.
>
> My solr output connection has the following information:
> Protocol: http
> Server: "the server name"
> Port:8080 (we are running solr on Jboss port 8080)
> Web Application Name: solr
> Core Name: collection1
> Update Handler: update/extract
> Remove Handler: /update
> Status Handler: /admin/ping
>
>
>
> ________________________________________
> From: Karl Wright [daddywri@gmail.com]
> Sent: Tuesday, January 31, 2012 9:00 AM
> To: Silvia, Daniel [USA]; connectors-user@incubator.apache.org
> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>
> Ok, let's do one thing at a time.
>
> First:
>
> "For the Path tab where there are Path Rules, are these the paths we
> want ManifoldCF to follow? Each site, and each Library like Documents
> and Shared Documents. And in the Metadata tab, this is the tab where
> you indicate for each "Site" and "Library" you want to include
> specific metadata or include all metadata?"
>
> For SharePoint, there are Path Rules and Metadata Rules.  The Path
> Rules describe what documents you want to include or exclude.  The
> Metadata Rules describe what metadata you want to include or exclude.
> For right now I would ignore the Metadata Rules and just make sure you
> have Path Rules that mean that you have included documents.
>
> "As I run the report, I see "Documents", "Active, and "Processed"
> where the numbers change under the "Active" column as well as the
> "Document" and "Processed" column (these just get larger, where Active
> changes). "
>
> This "report" we actually call the Job Status screen.  The fact that
> the numbers get larger and the job doesn't just end indicates that you
> are successfully crawling your SharePoint, and you have set up the job
> to include at least some documents.  This is good news.  However, this
> is NOT the "Simple History" report I was alluding to earlier.  To get
> to that report, click on the "Simple History" link on the left-hand
> navigation area.  This report will show the events of your choice
> (default - ALL recorded events) over a given time window (default: the
> last hour).  If you've done this right you should at least see a "Job
> start" event.  The events you are most interested in are the "fetch"
> (which describes all attempts to fetch documents from SharePoint) and
> "document ingest", which describe attempts to get documents into Solr.
>  You can refresh the displayed events by clicking the "Go" button in
> the middle of the screen whenever you wish.
>
> I'd like you to delete your job, create it again, and start it.  Then,
> while it is running, I'd like you to go to the "Simple History"
> screen, and select the appropriate connection (your SharePoint
> repository connection), and click the "Go" button.  So as not to skip
> anything basic:
>
> (1) What event types do you see?
> (2) Are there "fetch" events?
> (3) Are there "document ingest" events?
>
> If you see no "fetch" events, that implies you have either not
> specified any documents to include in your job, OR your Solr
> connection is configured to reject too many document types so they are
> all getting filtered out.
>
> If you see "document ingest" events, but those have errors, it implies
> that the configuration of your Solr connection is incorrect and does
> not match the way your Solr is configured.  If you send me a specific
> error code and/or text I can help you figure out what is happening.
>
> If you see "document ingest" events with NO errors, but the Solr
> instance is not getting documents, you are describing an impossible
> situation.  While your Solr instance may not be configured to have the
> Extracting Update Handler active, or it may be at a different URL than
> what you pointed at, that would definitely yield errors or
> notifications in the Simple History.
>
> Please let me know what you actually see.
> Karl
>
>
>
> On Tue, Jan 31, 2012 at 7:53 AM, Silvia, Daniel [USA]
> <Si...@bah.com> wrote:
>> Hi Karl
>>
>> I am trying to figure out why I can't see anything being indexed into our Solr index. I was looking at another post where you were working with "Martijn" and that individual was not able to see info getting into Solr. In the report  that I have set up, I have included all metadata associated to each site, Share Documents, and Documents. In the Solr Field Mapping, I am associating metadata fields that are indicated in the MetaData tab to fields that exist in our solr index.
>>
>> For the Path tab where there are Path Rules, are these the paths we want ManifoldCF to follow? Each site, and each Library like Documents and Shared Documents. And in the Metadata tab, this is the tab where you indicate for each "Site" and "Library" you want to include specific metadata or include all metadata?
>>
>> As I run the report, I see "Documents", "Active, and "Processed" where the numbers change under the "Active" column as well as the "Document" and "Processed" column (these just get larger, where Active changes). While I was researching why I may not be seeing something over on the Solr side, I saw your communication with another individual indicating that I should see something like literal.xxx=yyy in the Solr log. This is an older post so there maybe something else I should see. But the only thing I see when I look at the Solr log is "[ ] webapp=/solr path=/update/extract params={commit=true} status=0 QTime=0".
>>
>> Any ideas.
>>
>> Thanks
>>
>>
>>
>>
>>
>> ________________________________________
>> From: Karl Wright [daddywri@gmail.com]
>> Sent: Monday, January 30, 2012 10:40 AM
>> To: Silvia, Daniel [USA]
>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>
>> The default time range for the Simple History is the last hour.  I
>> suspect you are unaware of that.  If you want a different time range
>> you will have to modify the start and end time pulldowns accordingly.
>>
>> Karl
>>
>> On Mon, Jan 30, 2012 at 10:34 AM, Silvia, Daniel [USA]
>> <Si...@bah.com> wrote:
>>> Hi Karl
>>>
>>> I am looking at the Simple History in the UI and there isn't much to see, unless I am not getting what I am suppose to.  I see the "Start Time, Activity, Identifier, Bytes, and Time, I don't get anything for Result Code or Result Description. I looked in the documentation and we should be getting something in those fields, I believe.
>>>
>>> Anyway, I will look through the mail list to see what I can find.
>>>
>>> Thanks for the help.
>>>
>>> Dan
>>>
>>> ________________________________________
>>> From: Karl Wright [daddywri@gmail.com]
>>> Sent: Monday, January 30, 2012 8:24 AM
>>> To: Silvia, Daniel [USA]
>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>
>>> So just to be clear, I'm NOT talking about the ManifoldCF logging.
>>> For the Solr connector you probably won't need to turn that on; it's
>>> pretty simple and you can look at the Simple History in the UI to see
>>> what the request and response look like from Solr.  I was talking
>>> instead about Solr logging - when you run the Solr Webapp, by default
>>> all requests against the Extracting Update Handler are logged to
>>> standard error, so you will see them appear in the process window in
>>> which Solr is running.
>>>
>>> My suggestion to you is to first have a look at the Simple History for
>>> the job you are trying to run.  If you are getting back 500 errors
>>> from Solr, that means you have not set up Solr properly to work with
>>> ManifoldCF.  In recent versions of Solr, the example works fine out of
>>> the box, but when you try to deploy any other way you are often
>>> missing the jar that contains the extracting update handler, so of
>>> course nothing works.  Several people on the connectors-user list have
>>> run into this and if you search the list (go to the ManifoldCF site
>>> and click through to the mailing list page and there are links at the
>>> bottom for this purpose) you will find posts that describe exactly
>>> what is wrong and how to fix it.
>>>
>>> Hope this helps.
>>>
>>> Karl
>>>
>>>
>>> On Sun, Jan 29, 2012 at 2:30 PM, Silvia, Daniel [USA]
>>> <Si...@bah.com> wrote:
>>>> Yea,but for some reason the logging isn't coming through. The logging is set for info and I will have to change the logging level to DEBUG.
>>>>
>>>> Thanks again for your help.
>>>>
>>>>
>>>> ________________________________________
>>>> From: Karl Wright [daddywri@gmail.com]
>>>> Sent: Friday, January 27, 2012 5:06 PM
>>>> To: Silvia, Daniel [USA]
>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>
>>>> Actually, the best thing for debugging the Solr connection is looking
>>>> at standard-output on the Solr instance.  You will see all the posts
>>>> that are made and what the arguments were.  Also, this is the kind of
>>>> question you'd get a lot of benefit from posting to the list.  The
>>>> end-user documentation I pointed you at before describes some of this
>>>> but the Solr connector has grown beyond the doc to some extent at this
>>>> point.
>>>>
>>>> Karl
>>>>
>>>> On Fri, Jan 27, 2012 at 9:51 AM, Silvia, Daniel [USA]
>>>> <Si...@bah.com> wrote:
>>>>> Hi Karl
>>>>>
>>>>> Is there a log level other than  Wire-level debugging to view log staements for trying to send output to a Solr instance in the Jobs List/Creation section? We are having an issue getting content to Solr. Is there a document anywhere which defines the fields for the Jobs sections for the Solr Field Mapping tab and the Paths and MetaData tabs?
>>>>>
>>>>> Thanks
>>>>>
>>>>> Dan
>>>>>
>>>>> ________________________________________
>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>> Sent: Thursday, January 26, 2012 10:44 AM
>>>>> To: Silvia, Daniel [USA]
>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>
>>>>> I am afraid I don't know the answer to that.  I'm sure it's infinitely
>>>>> configurable but it's not clear what the SharePoint web services need
>>>>> to do under the hood, so anything I tell you would be just a guess.
>>>>>
>>>>> Karl
>>>>>
>>>>> On Thu, Jan 26, 2012 at 10:43 AM, Silvia, Daniel [USA]
>>>>> <Si...@bah.com> wrote:
>>>>>> Hi Karl
>>>>>>
>>>>>> One more question. Do you know the minimum permissions needed to crawl the Sharepoint instance and all sites under the instance? The individual who set my permissions set me up as the "site collection admin" for the top most site. Is there a specific admin role without setting the user crawling the sharpoint instance other than "Farm Admin"?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> ________________________________________
>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>> Sent: Thursday, January 26, 2012 9:53 AM
>>>>>> To: Silvia, Daniel [USA]
>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>
>>>>>> Good news!  Please keep in touch; we'd like to hear how things work
>>>>>> for you (it helps keep the software fresh ;-) ).
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>> On Thu, Jan 26, 2012 at 9:48 AM, Silvia, Daniel [USA]
>>>>>> <Si...@bah.com> wrote:
>>>>>>> Hey Karl
>>>>>>>
>>>>>>> (1) was the issue. When requesting access to the SharePoint instance I indicated that I needed to be able to crawl SharePoint, I guess the problem was on my end indicating that I also needed privileges to crawl the site.
>>>>>>>
>>>>>>> Anyway, thank you for your help. When I change the SharePoint version to v 3 I get a message indicating "Connection Working".
>>>>>>>
>>>>>>> Appreciate the help.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>> ________________________________________
>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>> Sent: Thursday, January 26, 2012 9:19 AM
>>>>>>> To: Silvia, Daniel [USA]
>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>
>>>>>>> The error message "axisFault=Server, detail=Server was unable to
>>>>>>> process request --> Requested Registry access is not allowed" is Axis
>>>>>>> interpreting an error message from SharePoint.  What it is saying is
>>>>>>> that the user you are trying to crawl with is unable to read the
>>>>>>> SharePoint machine's registry but needs to.  There are two possible
>>>>>>> causes for this:
>>>>>>>
>>>>>>> (1) The user you gave doesn't have enough permissions to crawl SharePoint
>>>>>>> (2) When you installed the SharePoint MCPermissions plugin, you
>>>>>>> installed it logged in as a user that did not enough permissions to do
>>>>>>> what it needs to do.
>>>>>>>
>>>>>>> You can tell the difference between the two by selecting "SharePoint
>>>>>>> 2.0" in the sharepoint version pulldown.  If a connection saved in
>>>>>>> this way says "Connection working", it means that the MCPermissions
>>>>>>> plugin has the permission problem, not your user.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>> On Thu, Jan 26, 2012 at 9:14 AM, Silvia, Daniel [USA]
>>>>>>> <Si...@bah.com> wrote:
>>>>>>>> Hi Karl
>>>>>>>>
>>>>>>>> When I try to use option (1) and don't put anything in the Site field, I get an error message "axisFault=Server, detail=Server was unable to process request --> Requested Registry access is not allowed" and when I put a "/" in the site filed I get  a GUI error indicating that the site field can't end with a "/".
>>>>>>>>
>>>>>>>> Anyway, do you have any ideas. Or maybe the Sharepoint instance is not configured properly for us to crawl?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ________________________________________
>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>> Sent: Thursday, January 26, 2012 8:52 AM
>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>
>>>>>>>> SharePoint has two kinds of site:
>>>>>>>>
>>>>>>>> (1) the root site, which can be reached by the path http://server:port
>>>>>>>> (2) a number of sites under the 'virtual path', with URLs of the form:
>>>>>>>>
>>>>>>>> http://server:port/something/sitename
>>>>>>>>
>>>>>>>> The "something" is, by default, the string "site", so
>>>>>>>> http://server:port/site/xyz might be the URL of one such virtual site.
>>>>>>>>
>>>>>>>> The form of the "site" field in the SharePoint connection for the
>>>>>>>> first is either blank or "/" (can't remember which right now), and the
>>>>>>>> form of the "site" field for the second is "/site/xyz".  On no account
>>>>>>>> does the connector expect to see default.aspx attached to that path,
>>>>>>>> so you should not do this; it cannot work.
>>>>>>>>
>>>>>>>> FWIW, my recommendation to try setting the connection type to
>>>>>>>> "SharePoint 2.0" was to rule out any possible installation issue with
>>>>>>>> the ManifoldCF sharepoint plugin.  The connection check for 2.0 does
>>>>>>>> not look for it; only the connection check for 3.0 does.
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>> On Thu, Jan 26, 2012 at 8:41 AM, Silvia, Daniel [USA]
>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>> Hey Karl
>>>>>>>>>
>>>>>>>>> I am also getting an "HTTP Error 401.2: Unauthorized: Access is denied due to server configuration" when setting the Site field to /default.aspx. Do most Sharepoint instances have the urls set to something like http://server:port/sites/...... instead of http://server:port/? When I use the "/default.aspx" I see in the log files that ManifoldCF is trying to go to the Lists.asmx service with the url http://server:port/default.aspx/_vti_bin/Lists.asmx, where nothing is found.
>>>>>>>>>
>>>>>>>>> As you can tell I am not much of a SharePoint user or installer.
>>>>>>>>>
>>>>>>>>> Also, I don't think the issue is with the connector in ManifoldCF, I am just trying to
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ________________________________________
>>>>>>>>> From: Silvia, Daniel [USA]
>>>>>>>>> Sent: Thursday, January 26, 2012 7:23 AM
>>>>>>>>> To: Karl Wright
>>>>>>>>> Subject: RE: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>
>>>>>>>>> Hey Karl
>>>>>>>>>
>>>>>>>>> The issue I am having is that the Sharepoint instance url is something like http://server:port/default.aspx. If I don't put anything in the site field I get a message indicating "Requested Registry Access is not allowed". I was putting "/default.apsx" as my Site field which I believe may have been the issue. However, what do you put in your Site field when the site is the top most site, as in http://server:port/default.aspx?
>>>>>>>>>
>>>>>>>>> I would love to send you the log messages, but I am working on a network which is not connected to the outside.
>>>>>>>>>
>>>>>>>>> Thanks for your help.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ________________________________________
>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>> Sent: Wednesday, January 25, 2012 6:12 PM
>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>
>>>>>>>>> Daniel,
>>>>>>>>>
>>>>>>>>> FWIW, I can help you diagnose the issue, but to do so you really need
>>>>>>>>> to give me some concrete data.  I'm happy to grovel over the whole
>>>>>>>>> wire log if you feel you can send it to me; something that may not
>>>>>>>>> seem important to you will likely stand out strongly to me.  I can,
>>>>>>>>> for example, see whether you are getting back HTML because of an
>>>>>>>>> authentication error, for instance.  And if you ARE getting back valid
>>>>>>>>> SOAP, I would then be sure that something was wrong with the Axis
>>>>>>>>> client configuration, and I could pursue that here with the data
>>>>>>>>> provided.  The problem with software like SharePoint running on IIS is
>>>>>>>>> that it can be configured a nearly infinite number of ways, so
>>>>>>>>> diagnosis is more of an art than a science.  I strongly suspect that
>>>>>>>>> you're laboring under a pretty straightforward misconception which is
>>>>>>>>> likely blocking progress, rather than there being an issue with the
>>>>>>>>> SharePoint connector itself.  But I can't tell that without more
>>>>>>>>> detailed communication.
>>>>>>>>>
>>>>>>>>> Also, you mentioned that the Lists.asmx service was right where you
>>>>>>>>> expected it to be.  Have you read the SharePoint Connector part of the
>>>>>>>>> end-user documentation?  To whit:
>>>>>>>>>
>>>>>>>>> "Select the server protocol, and enter the server name and port, based
>>>>>>>>> on what you recorded from the URL for your SharePoint site. For the
>>>>>>>>> "Site path" field, type in the portion of the root site URL that
>>>>>>>>> includes everything after the server and port, except for the final
>>>>>>>>> "aspx" file. For example, if the SharePoint URL is
>>>>>>>>> "http://myserver:81/sites/somewhere/index.asp", the site path would be
>>>>>>>>> "/sites/somewhere"."  The Lists.asmx service in this example would be
>>>>>>>>> expected to be found at
>>>>>>>>> "http://myserver:81/sites/somewhere/_vti_bin/Lists.asmx".  And the URL
>>>>>>>>> you would start with would be the URL you see in the browser when you
>>>>>>>>> log into the SharePoint web client and go to the site you wish to
>>>>>>>>> crawl.  Is this what you are doing?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks again,
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jan 25, 2012 at 12:33 PM, Karl Wright <da...@gmail.com> wrote:
>>>>>>>>>> The code that parses the SOAP response is Apache Axis.  This hasn't
>>>>>>>>>> changed in several years.
>>>>>>>>>>
>>>>>>>>>> Can you answer the following questions:
>>>>>>>>>>
>>>>>>>>>> (1) When the SharePoint connector makes a request to SharePoint, is
>>>>>>>>>> the response HTML, or is it XML?  Does it have an XML header which
>>>>>>>>>> describes a Microsoft XML namespace?  It sure sounds like it is
>>>>>>>>>> responding with HTML.  The SharePoint connector is expecting to
>>>>>>>>>> communicate using SOAP.  Is the response valid SOAP?
>>>>>>>>>>
>>>>>>>>>> (2) What version of SharePoint are you trying to connect to?  Is the
>>>>>>>>>> SharePoint 2007?  SharePoint 2010?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>> On Wed, Jan 25, 2012 at 12:26 PM, Silvia, Daniel [USA]
>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>> Hi Karl
>>>>>>>>>>>
>>>>>>>>>>> I have added the specific log4j lines for Http Client wire and I restarted the ManifoldCF instance. I was also see the webservice Lists.asmx through IE. When reviewing the log files I was able to see some of the content that resides in the Sharepoint instance in the content coming back from the request. However, I am still seeing the error messages in the ManifoldCF GUI as well as in the log file indicating  "Bad Envelope: HTML" ,"No service named ListsSoap is available" and "No service named http://schemas.microsoft.com/sharepoint/soap/GetListCollection is available".
>>>>>>>>>>>
>>>>>>>>>>> Could there be something going on with the way the services are being built on the client side?
>>>>>>>>>>>
>>>>>>>>>>> Appreciate your help.
>>>>>>>>>>>
>>>>>>>>>>> Dan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ________________________________________
>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>> Sent: Tuesday, January 24, 2012 4:52 PM
>>>>>>>>>>> To: Silvia, Daniel [USA]; connectors-user@incubator.apache.org
>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>
>>>>>>>>>>> I have not seen this exact problem before.
>>>>>>>>>>>
>>>>>>>>>>> The "Bad envelope tag: HTML" indicates that the SOAP request the
>>>>>>>>>>> SharePoint connector is attempting to perform is, in fact, returning
>>>>>>>>>>> an HTML response.  This usually indicates that the server or path
>>>>>>>>>>> parameters you've used to set up the connection are not set correctly,
>>>>>>>>>>> and SharePoint is not actually being engaged.
>>>>>>>>>>>
>>>>>>>>>>> But usually when that happens I don't recall a ConfigurationException
>>>>>>>>>>> logged, unless it's what Axis does in response to the HTML.
>>>>>>>>>>>
>>>>>>>>>>> The best thing to do at this point is turn on Http Client wire
>>>>>>>>>>> logging, restart ManifoldCF, and view the connection.  The log will
>>>>>>>>>>> then contain a record of the exact SOAP requests and the responses,
>>>>>>>>>>> and we can see what's wrong.  The technique is described here:
>>>>>>>>>>>
>>>>>>>>>>> https://cwiki.apache.org/confluence/display/CONNECTORS/Debugging+Connections
>>>>>>>>>>>
>>>>>>>>>>> You can also confirm that the right SharePoint web services are
>>>>>>>>>>> functioning on the machine in question by trying to access them
>>>>>>>>>>> directly.  For the Lists web service, which is the one it sounds like
>>>>>>>>>>> it was complaining about, try using IE (not Firefox etc because you
>>>>>>>>>>> want NTLM support) to go to the url where you think the web service
>>>>>>>>>>> lives.  This will be http: or https:, plus the server, plus the port,
>>>>>>>>>>> plus the path, plus "_vti_bin/Lists.asmx".  You should see an
>>>>>>>>>>> unequivocable SharePoint response.  For an example from the Microsoft
>>>>>>>>>>> demo service, try http://www.wssdemo.com/_vti_bin/Lists.asmx.
>>>>>>>>>>>
>>>>>>>>>>> Please let me know how it goes, and cc the dev list (as I have) so a
>>>>>>>>>>> record of what you're encountering can be made available to others.
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 24, 2012 at 1:52 PM, Silvia, Daniel [USA]
>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>
>>>>>>>>>>>> I have downloaded the newest version of ManifoldCF v .4 and have run the necessary ant scripts to download dependencies and then built the entire project. I have also had the ShrePoint webservice MetCarta.SharePoint.MCPermissionsService.wsp deployed on the SharePoint instance due to running version 3 of SharePoint (SharePoint 2007). When I try to create a Repository Connection and select "Save" I get a message on the ManifoldCF front end of "org.xml.sax.SAXException Bad envelope tag: HTML". When I look at the log file I see an error message " org.apache.axis.ConfigurationException: No service named ListsSoap is available".
>>>>>>>>>>>>
>>>>>>>>>>>> Can you tell me if you have seen this issue before and what may be causing this issue?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>
>>>>>>>>>>>> Dan
>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>>> Sent: Friday, January 20, 2012 7:31 AM
>>>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>
>>>>>>>>>>>> In order for the SharePoint connector to build, you need to have the
>>>>>>>>>>>> wsdls in place in the right area.  We cannot ship those because of
>>>>>>>>>>>> potential copyright issues.  The easiest way to obtain the right
>>>>>>>>>>>> dependencies is:
>>>>>>>>>>>>
>>>>>>>>>>>> ant download-dependencies
>>>>>>>>>>>>
>>>>>>>>>>>> Then, just build normally:
>>>>>>>>>>>>
>>>>>>>>>>>> ant build
>>>>>>>>>>>>
>>>>>>>>>>>> This will only work for ManifoldCF-0.4-incubating, or trunk.
>>>>>>>>>>>> 0.4-incubating is still in the process of being signed off by the
>>>>>>>>>>>> incubator, but you can find the release candidate here:
>>>>>>>>>>>>
>>>>>>>>>>>> http://people.apache.org/~kwright
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jan 20, 2012 at 7:02 AM, Silvia, Daniel [USA]
>>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>> I work with Matt Parker and we are in the process of developing a pipeline
>>>>>>>>>>>>> that uses ManifoldCF at the beginning. I just subscribed to the
>>>>>>>>>>>>> connectors-user-subscribe@incubator.apache.org
>>>>>>>>>>>>> group yesterday and submitted an e-mail question to the group. Can you help
>>>>>>>>>>>>> us with the below issue?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I downloaded MCF and started playing with the default setup under Jetty and
>>>>>>>>>>>>> Derby. It starts up without any issue. I am trying to configure a SharePoint
>>>>>>>>>>>>> connector, connecting to SharePoint Service 3. I have been following the
>>>>>>>>>>>>> instructions and I am at the point of deploying the custom SharePoint web
>>>>>>>>>>>>> service to the SharePoint instance. The instructions indicate that I should
>>>>>>>>>>>>> get the web service from dist/sharepoint-integration after building MCF.
>>>>>>>>>>>>> However, after looking through the entire directory structure, I am unable
>>>>>>>>>>>>> to find the service to deploy.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can someone tell me where to find this service?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Daniel Silvia

RE: ManifoldCF's dist/shapoint-integration dir

Posted by "Silvia, Daniel [USA]" <Si...@bah.com>.
Hi Karl

OK, I created a new job, ran it, and went to the Simple History report.

I see the events. With all of the activities in the Simple History report selected (Document Deletion (SolrPipeline), Document Ingest (SolrPipeline), and Fetch), I see a job start and a job end event. When I first get to the Simple History report I can select the connection, but I don't have an option to select the activities until I run the report once.
When I select only the Fetch activity, I don't see any events, and when I select only the Document Ingest activity, I don't see any events either.

My Solr output connection has the following settings:
Protocol: http
Server: "the server name"
Port: 8080 (we are running Solr on JBoss, port 8080)
Web Application Name: solr
Core Name: collection1
Update Handler: update/extract
Remove Handler: /update
Status Handler: /admin/ping
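
For reference, here is a minimal sketch of how the settings above would combine into the URLs that get posted to. It assumes the connector simply joins protocol, server, port, web application name, core name, and handler path with single slashes; that is an assumption about the connector's behavior, not something confirmed in this thread. The function name `solr_handler_url` and the host name `solrhost` are hypothetical placeholders.

```python
def solr_handler_url(protocol, server, port, webapp, core, handler):
    """Join Solr output connection fields into one handler URL.

    Assumption: parts are joined with single slashes, so a leading or
    trailing slash on any part (e.g. "/update" vs "update/extract")
    makes no difference, and an empty core name is simply skipped.
    """
    parts = [webapp, core, handler]
    path = "/".join(p.strip("/") for p in parts if p and p.strip("/"))
    return f"{protocol}://{server}:{port}/{path}"


# Values taken from the settings listed above; "solrhost" stands in for
# the real server name, which is not given in the message.
for handler in ("update/extract", "/update", "/admin/ping"):
    print(solr_handler_url("http", "solrhost", 8080, "solr", "collection1", handler))
```

Opening the resulting status URL in a browser, or comparing the update URL with the `webapp=` and `path=` values that Solr writes to its own log, is one way to check that each handler really lives where the connection points.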



________________________________________
From: Karl Wright [daddywri@gmail.com]
Sent: Tuesday, January 31, 2012 9:00 AM
To: Silvia, Daniel [USA]; connectors-user@incubator.apache.org
Subject: Re: ManifoldCF's dist/shapoint-integration dir

Ok, let's do one thing at a time.

First:

"For the Path tab where there are Path Rules, are these the paths we
want ManifoldCF to follow? Each site, and each Library like Documents
and Shared Documents. And in the Metadata tab, this is the tab where
you indicate for each "Site" and "Library" you want to include
specific metadata or include all metadata?"

For SharePoint, there are Path Rules and Metadata Rules.  The Path
Rules describe what documents you want to include or exclude.  The
Metadata Rules describe what metadata you want to include or exclude.
For right now I would ignore the Metadata Rules and just make sure you
have Path Rules that actually include some documents.

"As I run the report, I see "Documents", "Active, and "Processed"
where the numbers change under the "Active" column as well as the
"Document" and "Processed" column (these just get larger, where Active
changes). "

This "report" we actually call the Job Status screen.  The fact that
the numbers get larger and the job doesn't just end indicates that you
are successfully crawling your SharePoint, and you have set up the job
to include at least some documents.  This is good news.  However, this
is NOT the "Simple History" report I was alluding to earlier.  To get
to that report, click on the "Simple History" link on the left-hand
navigation area.  This report will show the events of your choice
(default - ALL recorded events) over a given time window (default: the
last hour).  If you've done this right you should at least see a "Job
start" event.  The events you are most interested in are "fetch"
(which describes all attempts to fetch documents from SharePoint) and
"document ingest" (which describes attempts to get documents into Solr).
You can refresh the displayed events by clicking the "Go" button in
the middle of the screen whenever you wish.

I'd like you to delete your job, create it again, and start it.  Then,
while it is running, I'd like you to go to the "Simple History"
screen, and select the appropriate connection (your SharePoint
repository connection), and click the "Go" button.  So as not to skip
anything basic:

(1) What event types do you see?
(2) Are there "fetch" events?
(3) Are there "document ingest" events?

If you see no "fetch" events, that implies you have either not
specified any documents to include in your job, OR your Solr
connection is configured to reject too many document types so they are
all getting filtered out.

If you see "document ingest" events, but those have errors, it implies
that the configuration of your Solr connection is incorrect and does
not match the way your Solr is configured.  If you send me a specific
error code and/or text I can help you figure out what is happening.

If you see "document ingest" events with NO errors, but the Solr
instance is not getting documents, you are describing an impossible
situation.  While your Solr instance may not be configured to have the
Extracting Update Handler active, or it may be at a different URL than
what you pointed at, that would definitely yield errors or
notifications in the Simple History.
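
The three cases above can be written down as a small decision function. This is only a sketch of the checklist, not anything ManifoldCF provides: the record layout (`activity`, `result_code`), the function name `triage`, and the set of result codes treated as success are all assumptions made for illustration.

```python
NO_FETCH = "no fetch events: no documents included by the job, or all filtered out"
INGEST_ERRORS = "ingest errors: Solr connection settings do not match the Solr setup"
INGEST_OK = "ingest without errors: Solr should be receiving documents"
NOT_COVERED = "fetch events but no ingest events: not covered by the checklist"


def triage(events, ok_codes=("OK", "200")):
    """Classify Simple History rows according to the checklist above.

    `events` is a list of dicts with an "activity" key and, optionally,
    a "result_code" key. Which codes count as success is an assumption;
    adjust `ok_codes` to whatever the report actually shows.
    """
    fetches = [e for e in events if e["activity"] == "fetch"]
    ingests = [e for e in events if e["activity"] == "document ingest"]
    if not fetches:
        return NO_FETCH
    if not ingests:
        return NOT_COVERED
    if any(e.get("result_code") not in ok_codes for e in ingests):
        return INGEST_ERRORS
    return INGEST_OK


# Made-up rows: a history that shows only the job start and job end.
print(triage([{"activity": "job start"}, {"activity": "job end"}]))
```

A history containing nothing but job start and job end rows falls into the first case, which points at the job's Path Rules or the output connection's document filtering rather than at Solr itself.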

Please let me know what you actually see.
Karl



On Tue, Jan 31, 2012 at 7:53 AM, Silvia, Daniel [USA]
<Si...@bah.com> wrote:
> Hi Karl
>
> I am trying to figure out why I can't see anything being indexed into our Solr index. I was looking at another post where you were working with "Martijn" and that individual was not able to see info getting into Solr. In the report  that I have set up, I have included all metadata associated to each site, Share Documents, and Documents. In the Solr Field Mapping, I am associating metadata fields that are indicated in the MetaData tab to fields that exist in our solr index.
>
> For the Path tab where there are Path Rules, are these the paths we want ManifoldCF to follow? Each site, and each Library like Documents and Shared Documents. And in the Metadata tab, this is the tab where you indicate for each "Site" and "Library" you want to include specific metadata or include all metadata?
>
> As I run the report, I see "Documents", "Active, and "Processed" where the numbers change under the "Active" column as well as the "Document" and "Processed" column (these just get larger, where Active changes). While I was researching why I may not be seeing something over on the Solr side, I saw your communication with another individual indicating that I should see something like literal.xxx=yyy in the Solr log. This is an older post so there maybe something else I should see. But the only thing I see when I look at the Solr log is "[ ] webapp=/solr path=/update/extract params={commit=true} status=0 QTime=0".
>
> Any ideas.
>
> Thanks
>
>
>
>
>
> ________________________________________
> From: Karl Wright [daddywri@gmail.com]
> Sent: Monday, January 30, 2012 10:40 AM
> To: Silvia, Daniel [USA]
> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>
> The default time range for the Simple History is the last hour.  I
> suspect you are unaware of that.  If you want a different time range
> you will have to modify the start and end time pulldowns accordingly.
>
> Karl
>
> On Mon, Jan 30, 2012 at 10:34 AM, Silvia, Daniel [USA]
> <Si...@bah.com> wrote:
>> Hi Karl
>>
>> I am looking at the Simple History in the UI and there isn't much to see, unless I am not getting what I am suppose to.  I see the "Start Time, Activity, Identifier, Bytes, and Time, I don't get anything for Result Code or Result Description. I looked in the documentation and we should be getting something in those fields, I believe.
>>
>> Anyway, I will look through the mail list to see what I can find.
>>
>> Thanks for the help.
>>
>> Dan
>>
>> ________________________________________
>> From: Karl Wright [daddywri@gmail.com]
>> Sent: Monday, January 30, 2012 8:24 AM
>> To: Silvia, Daniel [USA]
>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>
>> So just to be clear, I'm NOT talking about the ManifoldCF logging.
>> For the Solr connector you probably won't need to turn that on; it's
>> pretty simple and you can look at the Simple History in the UI to see
>> what the request and response look like from Solr.  I was talking
>> instead about Solr logging - when you run the Solr Webapp, by default
>> all requests against the Extracting Update Handler are logged to
>> standard error, so you will see them appear in the process window in
>> which Solr is running.
>>
>> My suggestion to you is to first have a look at the Simple History for
>> the job you are trying to run.  If you are getting back 500 errors
>> from Solr, that means you have not set up Solr properly to work with
>> ManifoldCF.  In recent versions of Solr, the example works fine out of
>> the box, but when you try to deploy any other way you are often
>> missing the jar that contains the extracting update handler, so of
>> course nothing works.  Several people on the connectors-user list have
>> run into this and if you search the list (go to the ManifoldCF site
>> and click through to the mailing list page and there are links at the
>> bottom for this purpose) you will find posts that describe exactly
>> what is wrong and how to fix it.
>>
>> Hope this helps.
>>
>> Karl
>>
>>
>> On Sun, Jan 29, 2012 at 2:30 PM, Silvia, Daniel [USA]
>> <Si...@bah.com> wrote:
>>> Yeah, but for some reason the logging isn't coming through. The logging is set to INFO, and I will have to change the logging level to DEBUG.
>>>
>>> Thanks again for your help.
>>>
>>>
>>> ________________________________________
>>> From: Karl Wright [daddywri@gmail.com]
>>> Sent: Friday, January 27, 2012 5:06 PM
>>> To: Silvia, Daniel [USA]
>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>
>>> Actually, the best thing for debugging the Solr connection is looking
>>> at standard-output on the Solr instance.  You will see all the posts
>>> that are made and what the arguments were.  Also, this is the kind of
>>> question you'd get a lot of benefit from posting to the list.  The
>>> end-user documentation I pointed you at before describes some of this
>>> but the Solr connector has grown beyond the doc to some extent at this
>>> point.
>>>
>>> Karl
>>>
>>> On Fri, Jan 27, 2012 at 9:51 AM, Silvia, Daniel [USA]
>>> <Si...@bah.com> wrote:
>>>> Hi Karl
>>>>
>>>> Is there a log level other than wire-level debugging to view log statements for trying to send output to a Solr instance in the Jobs List/Creation section? We are having an issue getting content to Solr. Is there a document anywhere which defines the fields for the Jobs sections for the Solr Field Mapping tab and the Paths and Metadata tabs?
>>>>
>>>> Thanks
>>>>
>>>> Dan
>>>>
>>>> ________________________________________
>>>> From: Karl Wright [daddywri@gmail.com]
>>>> Sent: Thursday, January 26, 2012 10:44 AM
>>>> To: Silvia, Daniel [USA]
>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>
>>>> I am afraid I don't know the answer to that.  I'm sure it's infinitely
>>>> configurable but it's not clear what the SharePoint web services need
>>>> to do under the hood, so anything I tell you would be just a guess.
>>>>
>>>> Karl
>>>>
>>>> On Thu, Jan 26, 2012 at 10:43 AM, Silvia, Daniel [USA]
>>>> <Si...@bah.com> wrote:
>>>>> Hi Karl
>>>>>
>>>>> One more question. Do you know the minimum permissions needed to crawl the SharePoint instance and all sites under the instance? The individual who set my permissions set me up as the "site collection admin" for the topmost site. Is there a specific admin role that would work without making the user crawling the SharePoint instance a "Farm Admin"?
>>>>>
>>>>> Thanks
>>>>>
>>>>> ________________________________________
>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>> Sent: Thursday, January 26, 2012 9:53 AM
>>>>> To: Silvia, Daniel [USA]
>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>
>>>>> Good news!  Please keep in touch; we'd like to hear how things work
>>>>> for you (it helps keep the software fresh ;-) ).
>>>>>
>>>>> Karl
>>>>>
>>>>> On Thu, Jan 26, 2012 at 9:48 AM, Silvia, Daniel [USA]
>>>>> <Si...@bah.com> wrote:
>>>>>> Hey Karl
>>>>>>
>>>>>> (1) was the issue. When requesting access to the SharePoint instance I indicated that I needed to be able to crawl SharePoint, I guess the problem was on my end indicating that I also needed privileges to crawl the site.
>>>>>>
>>>>>> Anyway, thank you for your help. When I change the SharePoint version to v 3 I get a message indicating "Connection Working".
>>>>>>
>>>>>> Appreciate the help.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>> ________________________________________
>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>> Sent: Thursday, January 26, 2012 9:19 AM
>>>>>> To: Silvia, Daniel [USA]
>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>
>>>>>> The error message "axisFault=Server, detail=Server was unable to
>>>>>> process request --> Requested Registry access is not allowed" is Axis
>>>>>> interpreting an error message from SharePoint.  What it is saying is
>>>>>> that the user you are trying to crawl with is unable to read the
>>>>>> SharePoint machine's registry but needs to.  There are two possible
>>>>>> causes for this:
>>>>>>
>>>>>> (1) The user you gave doesn't have enough permissions to crawl SharePoint
>>>>>> (2) When you installed the SharePoint MCPermissions plugin, you
>>>>>> installed it logged in as a user that did not have enough permissions to do
>>>>>> what it needs to do.
>>>>>>
>>>>>> You can tell the difference between the two by selecting "SharePoint
>>>>>> 2.0" in the sharepoint version pulldown.  If a connection saved in
>>>>>> this way says "Connection working", it means that the MCPermissions
>>>>>> plugin has the permission problem, not your user.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>> On Thu, Jan 26, 2012 at 9:14 AM, Silvia, Daniel [USA]
>>>>>> <Si...@bah.com> wrote:
>>>>>>> Hi Karl
>>>>>>>
>>>>>>> When I try to use option (1) and don't put anything in the Site field, I get an error message "axisFault=Server, detail=Server was unable to process request --> Requested Registry access is not allowed", and when I put a "/" in the Site field I get a GUI error indicating that the Site field can't end with a "/".
>>>>>>>
>>>>>>> Anyway, do you have any ideas. Or maybe the Sharepoint instance is not configured properly for us to crawl?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ________________________________________
>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>> Sent: Thursday, January 26, 2012 8:52 AM
>>>>>>> To: Silvia, Daniel [USA]
>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>
>>>>>>> SharePoint has two kinds of site:
>>>>>>>
>>>>>>> (1) the root site, which can be reached by the path http://server:port
>>>>>>> (2) a number of sites under the 'virtual path', with URLs of the form:
>>>>>>>
>>>>>>> http://server:port/something/sitename
>>>>>>>
>>>>>>> The "something" is, by default, the string "sites", so
>>>>>>> http://server:port/sites/xyz might be the URL of one such virtual site.
>>>>>>>
>>>>>>> The form of the "site" field in the SharePoint connection for the
>>>>>>> first is either blank or "/" (can't remember which right now), and the
>>>>>>> form of the "site" field for the second is "/sites/xyz".  On no account
>>>>>>> does the connector expect to see default.aspx attached to that path,
>>>>>>> so you should not do this; it cannot work.
>>>>>>>
>>>>>>> FWIW, my recommendation to try setting the connection type to
>>>>>>> "SharePoint 2.0" was to rule out any possible installation issue with
>>>>>>> the ManifoldCF sharepoint plugin.  The connection check for 2.0 does
>>>>>>> not look for it; only the connection check for 3.0 does.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>> On Thu, Jan 26, 2012 at 8:41 AM, Silvia, Daniel [USA]
>>>>>>> <Si...@bah.com> wrote:
>>>>>>>> Hey Karl
>>>>>>>>
>>>>>>>> I am also getting an "HTTP Error 401.2: Unauthorized: Access is denied due to server configuration" when setting the Site field to /default.aspx. Do most Sharepoint instances have the urls set to something like http://server:port/sites/...... instead of http://server:port/? When I use the "/default.aspx" I see in the log files that ManifoldCF is trying to go to the Lists.asmx service with the url http://server:port/default.aspx/_vti_bin/Lists.asmx, where nothing is found.
>>>>>>>>
>>>>>>>> As you can tell I am not much of a SharePoint user or installer.
>>>>>>>>
>>>>>>>> Also, I don't think the issue is with the connector in ManifoldCF, I am just trying to
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ________________________________________
>>>>>>>> From: Silvia, Daniel [USA]
>>>>>>>> Sent: Thursday, January 26, 2012 7:23 AM
>>>>>>>> To: Karl Wright
>>>>>>>> Subject: RE: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>
>>>>>>>> Hey Karl
>>>>>>>>
>>>>>>>> The issue I am having is that the SharePoint instance URL is something like http://server:port/default.aspx. If I don't put anything in the Site field I get a message indicating "Requested Registry Access is not allowed". I was putting "/default.aspx" as my Site field, which I believe may have been the issue. However, what do you put in your Site field when the site is the topmost site, as in http://server:port/default.aspx?
>>>>>>>>
>>>>>>>> I would love to send you the log messages, but I am working on a network which is not connected to the outside.
>>>>>>>>
>>>>>>>> Thanks for your help.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ________________________________________
>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>> Sent: Wednesday, January 25, 2012 6:12 PM
>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>
>>>>>>>> Daniel,
>>>>>>>>
>>>>>>>> FWIW, I can help you diagnose the issue, but to do so you really need
>>>>>>>> to give me some concrete data.  I'm happy to grovel over the whole
>>>>>>>> wire log if you feel you can send it to me; something that may not
>>>>>>>> seem important to you will likely stand out strongly to me.  I can,
>>>>>>>> for example, see whether you are getting back HTML because of an
>>>>>>>> authentication error, for instance.  And if you ARE getting back valid
>>>>>>>> SOAP, I would then be sure that something was wrong with the Axis
>>>>>>>> client configuration, and I could pursue that here with the data
>>>>>>>> provided.  The problem with software like SharePoint running on IIS is
>>>>>>>> that it can be configured a nearly infinite number of ways, so
>>>>>>>> diagnosis is more of an art than a science.  I strongly suspect that
>>>>>>>> you're laboring under a pretty straightforward misconception which is
>>>>>>>> likely blocking progress, rather than there being an issue with the
>>>>>>>> SharePoint connector itself.  But I can't tell that without more
>>>>>>>> detailed communication.
>>>>>>>>
>>>>>>>> Also, you mentioned that the Lists.asmx service was right where you
>>>>>>>> expected it to be.  Have you read the SharePoint Connector part of the
>>>>>>>> end-user documentation?  To wit:
>>>>>>>>
>>>>>>>> "Select the server protocol, and enter the server name and port, based
>>>>>>>> on what you recorded from the URL for your SharePoint site. For the
>>>>>>>> "Site path" field, type in the portion of the root site URL that
>>>>>>>> includes everything after the server and port, except for the final
>>>>>>>> "aspx" file. For example, if the SharePoint URL is
>>>>>>>> "http://myserver:81/sites/somewhere/index.asp", the site path would be
>>>>>>>> "/sites/somewhere"."  The Lists.asmx service in this example would be
>>>>>>>> expected to be found at
>>>>>>>> "http://myserver:81/sites/somewhere/_vti_bin/Lists.asmx".  And the URL
>>>>>>>> you would start with would be the URL you see in the browser when you
>>>>>>>> log into the SharePoint web client and go to the site you wish to
>>>>>>>> crawl.  Is this what you are doing?
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks again,
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jan 25, 2012 at 12:33 PM, Karl Wright <da...@gmail.com> wrote:
>>>>>>>>> The code that parses the SOAP response is Apache Axis.  This hasn't
>>>>>>>>> changed in several years.
>>>>>>>>>
>>>>>>>>> Can you answer the following questions:
>>>>>>>>>
>>>>>>>>> (1) When the SharePoint connector makes a request to SharePoint, is
>>>>>>>>> the response HTML, or is it XML?  Does it have an XML header which
>>>>>>>>> describes a Microsoft XML namespace?  It sure sounds like it is
>>>>>>>>> responding with HTML.  The SharePoint connector is expecting to
>>>>>>>>> communicate using SOAP.  Is the response valid SOAP?
>>>>>>>>>
>>>>>>>>> (2) What version of SharePoint are you trying to connect to?  Is the
>>>>>>>>> SharePoint 2007?  SharePoint 2010?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>> On Wed, Jan 25, 2012 at 12:26 PM, Silvia, Daniel [USA]
>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>> Hi Karl
>>>>>>>>>>
>>>>>>>>>> I have added the specific log4j lines for Http Client wire logging and restarted the ManifoldCF instance. I was also able to see the web service Lists.asmx through IE. When reviewing the log files I was able to see some of the content that resides in the SharePoint instance in the content coming back from the request. However, I am still seeing the error messages in the ManifoldCF GUI as well as in the log file indicating "Bad Envelope: HTML", "No service named ListsSoap is available" and "No service named http://schemas.microsoft.com/sharepoint/soap/GetListCollection is available".
>>>>>>>>>>
>>>>>>>>>> Could there be something going on with the way the services are being built on the client side?
>>>>>>>>>>
>>>>>>>>>> Appreciate your help.
>>>>>>>>>>
>>>>>>>>>> Dan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ________________________________________
>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>> Sent: Tuesday, January 24, 2012 4:52 PM
>>>>>>>>>> To: Silvia, Daniel [USA]; connectors-user@incubator.apache.org
>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>
>>>>>>>>>> I have not seen this exact problem before.
>>>>>>>>>>
>>>>>>>>>> The "Bad envelope tag: HTML" indicates that the SOAP request the
>>>>>>>>>> SharePoint connector is attempting to perform is, in fact, returning
>>>>>>>>>> an HTML response.  This usually indicates that the server or path
>>>>>>>>>> parameters you've used to set up the connection are not set correctly,
>>>>>>>>>> and SharePoint is not actually being engaged.
>>>>>>>>>>
>>>>>>>>>> But usually when that happens I don't recall a ConfigurationException
>>>>>>>>>> logged, unless it's what Axis does in response to the HTML.
>>>>>>>>>>
>>>>>>>>>> The best thing to do at this point is turn on Http Client wire
>>>>>>>>>> logging, restart ManifoldCF, and view the connection.  The log will
>>>>>>>>>> then contain a record of the exact SOAP requests and the responses,
>>>>>>>>>> and we can see what's wrong.  The technique is described here:
>>>>>>>>>>
>>>>>>>>>> https://cwiki.apache.org/confluence/display/CONNECTORS/Debugging+Connections
>>>>>>>>>>
>>>>>>>>>> You can also confirm that the right SharePoint web services are
>>>>>>>>>> functioning on the machine in question by trying to access them
>>>>>>>>>> directly.  For the Lists web service, which is the one it sounds like
>>>>>>>>>> it was complaining about, try using IE (not Firefox etc because you
>>>>>>>>>> want NTLM support) to go to the url where you think the web service
>>>>>>>>>> lives.  This will be http: or https:, plus the server, plus the port,
>>>>>>>>>> plus the path, plus "_vti_bin/Lists.asmx".  You should see an
>>>>>>>>>> unequivocal SharePoint response.  For an example from the Microsoft
>>>>>>>>>> demo service, try http://www.wssdemo.com/_vti_bin/Lists.asmx.
>>>>>>>>>>
>>>>>>>>>> Please let me know how it goes, and cc the dev list (as I have) so a
>>>>>>>>>> record of what you're encountering can be made available to others.
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 24, 2012 at 1:52 PM, Silvia, Daniel [USA]
>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>> Hi Karl
>>>>>>>>>>>
>>>>>>>>>>> I have downloaded the newest version of ManifoldCF (v0.4) and have run the necessary ant scripts to download dependencies and then built the entire project. I have also had the SharePoint web service MetaCarta.SharePoint.MCPermissionsService.wsp deployed on the SharePoint instance, since we are running version 3 of SharePoint (SharePoint 2007). When I try to create a Repository Connection and select "Save", I get a message on the ManifoldCF front end of "org.xml.sax.SAXException Bad envelope tag: HTML". When I look at the log file I see an error message "org.apache.axis.ConfigurationException: No service named ListsSoap is available".
>>>>>>>>>>>
>>>>>>>>>>> Can you tell me if you have seen this issue before and what may be causing this issue?
>>>>>>>>>>>
>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>
>>>>>>>>>>> Dan
>>>>>>>>>>> ________________________________________
>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>> Sent: Friday, January 20, 2012 7:31 AM
>>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>
>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>
>>>>>>>>>>> In order for the SharePoint connector to build, you need to have the
>>>>>>>>>>> wsdls in place in the right area.  We cannot ship those because of
>>>>>>>>>>> potential copyright issues.  The easiest way to obtain the right
>>>>>>>>>>> dependencies is:
>>>>>>>>>>>
>>>>>>>>>>> ant download-dependencies
>>>>>>>>>>>
>>>>>>>>>>> Then, just build normally:
>>>>>>>>>>>
>>>>>>>>>>> ant build
>>>>>>>>>>>
>>>>>>>>>>> This will only work for ManifoldCF-0.4-incubating, or trunk.
>>>>>>>>>>> 0.4-incubating is still in the process of being signed off by the
>>>>>>>>>>> incubator, but you can find the release candidate here:
>>>>>>>>>>>
>>>>>>>>>>> http://people.apache.org/~kwright
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jan 20, 2012 at 7:02 AM, Silvia, Daniel [USA]
>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>
>>>>>>>>>>>> I work with Matt Parker and we are in the process of developing a pipeline
>>>>>>>>>>>> that uses ManifoldCF at the beginning. I just subscribed to the
>>>>>>>>>>>> connectors-user-subscribe@incubator.apache.org
>>>>>>>>>>>> group yesterday and submitted an e-mail question to the group. Can you help
>>>>>>>>>>>> us with the below issue?
>>>>>>>>>>>>
>>>>>>>>>>>> I downloaded MCF and started playing with the default setup under Jetty and
>>>>>>>>>>>> Derby. It starts up without any issue. I am trying to configure a SharePoint
>>>>>>>>>>>> connector, connecting to SharePoint Service 3. I have been following the
>>>>>>>>>>>> instructions and I am at the point of deploying the custom SharePoint web
>>>>>>>>>>>> service to the SharePoint instance. The instructions indicate that I should
>>>>>>>>>>>> get the web service from dist/sharepoint-integration after building MCF.
>>>>>>>>>>>> However, after looking through the entire directory structure, I am unable
>>>>>>>>>>>> to find the service to deploy.
>>>>>>>>>>>>
>>>>>>>>>>>> Can someone tell me where to find this service?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>
>>>>>>>>>>>> Daniel Silvia

RE: ManifoldCF's dist/shapoint-integration dir

Posted by "Silvia, Daniel [USA]" <Si...@bah.com>.
Ok, I deleted all the jobs and created a new one. I added some paths to include everything and ran the job. However, I didn't see anything different.

Looking at my solrconfig.xml file, the requestHandler has the following:

name="/update/extract"
startup="lazy"
class="solr.extraction.ExtractingRequestHandler"

<lst name="defaults">
   <str name="fmap.content">text</str>
   <str name="lowernames">true</str>
   <str name="uprefix">ignored_</str>
   <str name="lowernames">true</str>
   <str name="fmap.a">link</str>
   <str name="fmap.div">ignored_</str>
</lst>
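
A quick way to spot repeated entries in a defaults block like the one above (lowernames appears twice) is to parse it and count the names. The following is only a sketch: the enclosing requestHandler element is reconstructed from the three attributes listed in the message, and duplicate_defaults is a hypothetical helper name, not part of Solr or ManifoldCF.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Handler fragment reconstructed from the attributes and defaults quoted in
# the message; the wrapping <requestHandler> element is an assumption.
SNIPPET = """
<requestHandler name="/update/extract" startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">text</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <str name="lowernames">true</str>
    <str name="fmap.a">link</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>
"""


def duplicate_defaults(handler_xml):
    """Return the sorted names that occur more than once in the defaults list."""
    root = ET.fromstring(handler_xml)
    # Collect the name attribute of every child of <lst name="defaults">.
    names = [el.get("name") for el in root.findall("./lst[@name='defaults']/*")]
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


print(duplicate_defaults(SNIPPET))
```

A repeated entry with the same value is probably harmless, but a check like this makes it easier to notice when two entries for the same parameter disagree.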

________________________________________
From: Karl Wright [daddywri@gmail.com]
Sent: Tuesday, January 31, 2012 9:13 AM
To: Silvia, Daniel [USA]; connectors-user@incubator.apache.org
Subject: Re: ManifoldCF's dist/shapoint-integration dir

I should clarify that the reason for deleting and recreating the job
is because ManifoldCF crawls incrementally.  If you just run a job a
second time you may well not get any documents if none have changed
from the first time the job was run.

Thanks,
Karl
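
The incremental behaviour described above can be pictured with a toy model. This is an illustration only, not ManifoldCF's actual implementation: it assumes each document carries a version string, and documents_to_index and the sample document names are hypothetical.

```python
def documents_to_index(seen_versions, current_versions):
    """Return ids that are new or changed since the last run.

    seen_versions is the job's remembered state (doc id -> version) and is
    updated in place; current_versions is what the repository reports now.
    """
    changed = [
        doc_id
        for doc_id, version in current_versions.items()
        if seen_versions.get(doc_id) != version
    ]
    seen_versions.update(current_versions)
    return changed


job_state = {}                                 # state kept by an existing job
repository = {"a.doc": "v1", "b.doc": "v1"}    # hypothetical documents

first_run = documents_to_index(job_state, repository)   # everything is new
second_run = documents_to_index(job_state, repository)  # nothing has changed
recreated = documents_to_index({}, repository)          # fresh job, empty state

print(first_run, second_run, recreated)
```

In this model the second run hands nothing to the output, while a recreated job starts from empty state and sends every document again, which is the reason for deleting and recreating the job while debugging.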

On Tue, Jan 31, 2012 at 9:00 AM, Karl Wright <da...@gmail.com> wrote:
> Ok, let's do one thing at a time.
>
> First:
>
> "For the Path tab where there are Path Rules, are these the paths we
> want ManifoldCF to follow? Each site, and each Library like Documents
> and Shared Documents. And in the Metadata tab, this is the tab where
> you indicate for each "Site" and "Library" you want to include
> specific metadata or include all metadata?"
>
> For SharePoint, there are Path Rules and Metadata Rules.  The Path
> Rules describe what documents you want to include or exclude.  The
> Metadata Rules describe what metadata you want to include or exclude.
> For right now I would ignore the Metadata Rules and just make sure you
> have Path Rules that mean that you have included documents.
>
> "As I run the report, I see "Documents", "Active", and "Processed"
> where the numbers change under the "Active" column as well as the
> "Document" and "Processed" column (these just get larger, where Active
> changes). "
>
> This "report" we actually call the Job Status screen.  The fact that
> the numbers get larger and the job doesn't just end indicates that you
> are successfully crawling your SharePoint, and you have set up the job
> to include at least some documents.  This is good news.  However, this
> is NOT the "Simple History" report I was alluding to earlier.  To get
> to that report, click on the "Simple History" link on the left-hand
> navigation area.  This report will show the events of your choice
> (default - ALL recorded events) over a given time window (default: the
> last hour).  If you've done this right you should at least see a "Job
> start" event.  The events you are most interested in are "fetch"
> (which describes all attempts to fetch documents from SharePoint) and
> "document ingest" (which describes attempts to get documents into Solr).
> You can refresh the displayed events by clicking the "Go" button in
> the middle of the screen whenever you wish.
>
> I'd like you to delete your job, create it again, and start it.  Then,
> while it is running, I'd like you to go to the "Simple History"
> screen, and select the appropriate connection (your SharePoint
> repository connection), and click the "Go" button.  So as not to skip
> anything basic:
>
> (1) What event types do you see?
> (2) Are there "fetch" events?
> (3) Are there "document ingest" events?
>
> If you see no "fetch" events, that implies you have either not
> specified any documents to include in your job, OR your Solr
> connection is configured to reject too many document types so they are
> all getting filtered out.
>
> If you see "document ingest" events, but those have errors, it implies
> that the configuration of your Solr connection is incorrect and does
> not match the way your Solr is configured.  If you send me a specific
> error code and/or text I can help you figure out what is happening.
>
> If you see "document ingest" events with NO errors, but the Solr
> instance is not getting documents, you are describing an impossible
> situation.  While your Solr instance may not be configured to have the
> Extracting Update Handler active, or it may be at a different URL than
> what you pointed at, that would definitely yield errors or
> notifications in the Simple History.
>
> Please let me know what you actually see.
> Karl
>
>
>
> On Tue, Jan 31, 2012 at 7:53 AM, Silvia, Daniel [USA]
> <Si...@bah.com> wrote:
>> Hi Karl
>>
>> I am trying to figure out why I can't see anything being indexed into our Solr index. I was looking at another post where you were working with "Martijn" and that individual was not able to see info getting into Solr. In the report  that I have set up, I have included all metadata associated to each site, Share Documents, and Documents. In the Solr Field Mapping, I am associating metadata fields that are indicated in the MetaData tab to fields that exist in our solr index.
>>
>> For the Path tab where there are Path Rules, are these the paths we want ManifoldCF to follow? Each site, and each Library like Documents and Shared Documents. And in the Metadata tab, this is the tab where you indicate for each "Site" and "Library" you want to include specific metadata or include all metadata?
>>
>> As I run the report, I see "Documents", "Active", and "Processed", where the numbers change under the "Active" column as well as the "Documents" and "Processed" columns (these just get larger, whereas "Active" changes). While I was researching why I may not be seeing anything on the Solr side, I saw your communication with another individual indicating that I should see something like literal.xxx=yyy in the Solr log. This is an older post, so there may be something else I should see. But the only thing I see when I look at the Solr log is "[ ] webapp=/solr path=/update/extract params={commit=true} status=0 QTime=0".
>>
>> Any ideas?
>>
>> Thanks
>>
>>
>>
>>
>>
>> ________________________________________
>> From: Karl Wright [daddywri@gmail.com]
>> Sent: Monday, January 30, 2012 10:40 AM
>> To: Silvia, Daniel [USA]
>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>
>> The default time range for the Simple History is the last hour.  I
>> suspect you are unaware of that.  If you want a different time range
>> you will have to modify the start and end time pulldowns accordingly.
>>
>> Karl
>>
>> On Mon, Jan 30, 2012 at 10:34 AM, Silvia, Daniel [USA]
>> <Si...@bah.com> wrote:
>>> Hi Karl
>>>
>>> I am looking at the Simple History in the UI and there isn't much to see, unless I am not getting what I am supposed to. I see "Start Time", "Activity", "Identifier", "Bytes", and "Time", but I don't get anything for "Result Code" or "Result Description". I looked in the documentation and we should be getting something in those fields, I believe.
>>>
>>> Anyway, I will look through the mail list to see what I can find.
>>>
>>> Thanks for the help.
>>>
>>> Dan
>>>
>>> ________________________________________
>>> From: Karl Wright [daddywri@gmail.com]
>>> Sent: Monday, January 30, 2012 8:24 AM
>>> To: Silvia, Daniel [USA]
>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>
>>> So just to be clear, I'm NOT talking about the ManifoldCF logging.
>>> For the Solr connector you probably won't need to turn that on; it's
>>> pretty simple and you can look at the Simple History in the UI to see
>>> what the request and response look like from Solr.  I was talking
>>> instead about Solr logging - when you run the Solr Webapp, by default
>>> all requests against the Extracting Update Handler are logged to
>>> standard error, so you will see them appear in the process window in
>>> which Solr is running.
>>>
>>> My suggestion to you is to first have a look at the Simple History for
>>> the job you are trying to run.  If you are getting back 500 errors
>>> from Solr, that means you have not set up Solr properly to work with
>>> ManifoldCF.  In recent versions of Solr, the example works fine out of
>>> the box, but when you try to deploy any other way you are often
>>> missing the jar that contains the extracting update handler, so of
>>> course nothing works.  Several people on the connectors-user list have
>>> run into this and if you search the list (go to the ManifoldCF site
>>> and click through to the mailing list page and there are links at the
>>> bottom for this purpose) you will find posts that describe exactly
>>> what is wrong and how to fix it.
>>>
>>> Hope this helps.
>>>
>>> Karl
>>>
>>>
>>> On Sun, Jan 29, 2012 at 2:30 PM, Silvia, Daniel [USA]
>>> <Si...@bah.com> wrote:
>>>> Yeah, but for some reason the logging isn't coming through. The logging is set to INFO, and I will have to change the logging level to DEBUG.
>>>>
>>>> Thanks again for your help.
>>>>
>>>>
>>>> ________________________________________
>>>> From: Karl Wright [daddywri@gmail.com]
>>>> Sent: Friday, January 27, 2012 5:06 PM
>>>> To: Silvia, Daniel [USA]
>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>
>>>> Actually, the best thing for debugging the Solr connection is looking
>>>> at standard-output on the Solr instance.  You will see all the posts
>>>> that are made and what the arguments were.  Also, this is the kind of
>>>> question you'd get a lot of benefit from posting to the list.  The
>>>> end-user documentation I pointed you at before describes some of this
>>>> but the Solr connector has grown beyond the doc to some extent at this
>>>> point.
>>>>
>>>> Karl
>>>>
>>>> On Fri, Jan 27, 2012 at 9:51 AM, Silvia, Daniel [USA]
>>>> <Si...@bah.com> wrote:
>>>>> Hi Karl
>>>>>
>>>>> Is there a log level other than wire-level debugging to view log statements for trying to send output to a Solr instance in the Jobs List/Creation section? We are having an issue getting content to Solr. Is there a document anywhere which defines the fields for the Jobs sections for the Solr Field Mapping tab and the Paths and Metadata tabs?
>>>>>
>>>>> Thanks
>>>>>
>>>>> Dan
>>>>>
>>>>> ________________________________________
>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>> Sent: Thursday, January 26, 2012 10:44 AM
>>>>> To: Silvia, Daniel [USA]
>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>
>>>>> I am afraid I don't know the answer to that.  I'm sure it's infinitely
>>>>> configurable but it's not clear what the SharePoint web services need
>>>>> to do under the hood, so anything I tell you would be just a guess.
>>>>>
>>>>> Karl
>>>>>
>>>>> On Thu, Jan 26, 2012 at 10:43 AM, Silvia, Daniel [USA]
>>>>> <Si...@bah.com> wrote:
>>>>>> Hi Karl
>>>>>>
>>>>>> One more question. Do you know the minimum permissions needed to crawl the SharePoint instance and all sites under the instance? The individual who set my permissions set me up as the "site collection admin" for the topmost site. Is there a specific admin role that would work without making the user crawling the SharePoint instance a "Farm Admin"?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> ________________________________________
>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>> Sent: Thursday, January 26, 2012 9:53 AM
>>>>>> To: Silvia, Daniel [USA]
>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>
>>>>>> Good news!  Please keep in touch; we'd like to hear how things work
>>>>>> for you (it helps keep the software fresh ;-) ).
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>> On Thu, Jan 26, 2012 at 9:48 AM, Silvia, Daniel [USA]
>>>>>> <Si...@bah.com> wrote:
>>>>>>> Hey Karl
>>>>>>>
>>>>>>> (1) was the issue. When requesting access to the SharePoint instance I indicated that I needed to be able to crawl SharePoint, I guess the problem was on my end indicating that I also needed privileges to crawl the site.
>>>>>>>
>>>>>>> Anyway, thank you for your help. When I change the SharePoint version to v 3 I get a message indicating "Connection Working".
>>>>>>>
>>>>>>> Appreciate the help.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>> ________________________________________
>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>> Sent: Thursday, January 26, 2012 9:19 AM
>>>>>>> To: Silvia, Daniel [USA]
>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>
>>>>>>> The error message "axisFault=Server, detail=Server was unable to
>>>>>>> process request --> Requested Registry access is not allowed" is Axis
>>>>>>> interpreting an error message from SharePoint.  What it is saying is
>>>>>>> that the user you are trying to crawl with is unable to read the
>>>>>>> SharePoint machine's registry but needs to.  There are two possible
>>>>>>> causes for this:
>>>>>>>
>>>>>>> (1) The user you gave doesn't have enough permissions to crawl SharePoint
>>>>>>> (2) When you installed the SharePoint MCPermissions plugin, you
>>>>>>> installed it logged in as a user that did not have enough permissions
>>>>>>> to do what it needs to do.
>>>>>>>
>>>>>>> You can tell the difference between the two by selecting "SharePoint
>>>>>>> 2.0" in the SharePoint version pulldown.  If a connection saved in
>>>>>>> this way says "Connection working", it means that the MCPermissions
>>>>>>> plugin has the permission problem, not your user.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>> On Thu, Jan 26, 2012 at 9:14 AM, Silvia, Daniel [USA]
>>>>>>> <Si...@bah.com> wrote:
>>>>>>>> Hi Karl
>>>>>>>>
>>>>>>>> When I try to use option (1) and don't put anything in the Site field, I get an error message "axisFault=Server, detail=Server was unable to process request --> Requested Registry access is not allowed", and when I put a "/" in the Site field I get a GUI error indicating that the Site field can't end with a "/".
>>>>>>>>
>>>>>>>> Anyway, do you have any ideas? Or maybe the SharePoint instance is not configured properly for us to crawl?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ________________________________________
>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>> Sent: Thursday, January 26, 2012 8:52 AM
>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>
>>>>>>>> SharePoint has two kinds of site:
>>>>>>>>
>>>>>>>> (1) the root site, which can be reached by the path http://server:port
>>>>>>>> (2) a number of sites under the 'virtual path', with URLs of the form:
>>>>>>>>
>>>>>>>> http://server:port/something/sitename
>>>>>>>>
>>>>>>>> The "something" is, by default, the string "sites", so
>>>>>>>> http://server:port/sites/xyz might be the URL of one such virtual site.
>>>>>>>>
>>>>>>>> The form of the "site" field in the SharePoint connection for the
>>>>>>>> first is either blank or "/" (can't remember which right now), and the
>>>>>>>> form of the "site" field for the second is "/sites/xyz".  On no account
>>>>>>>> does the connector expect to see default.aspx attached to that path,
>>>>>>>> so you should not do this; it cannot work.
>>>>>>>>
>>>>>>>> FWIW, my recommendation to try setting the connection type to
>>>>>>>> "SharePoint 2.0" was to rule out any possible installation issue with
>>>>>>>> the ManifoldCF sharepoint plugin.  The connection check for 2.0 does
>>>>>>>> not look for it; only the connection check for 3.0 does.
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>> On Thu, Jan 26, 2012 at 8:41 AM, Silvia, Daniel [USA]
>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>> Hey Karl
>>>>>>>>>
>>>>>>>>> I am also getting an "HTTP Error 401.2: Unauthorized: Access is denied due to server configuration" when setting the Site field to /default.aspx. Do most SharePoint instances have the URLs set to something like http://server:port/sites/...... instead of http://server:port/? When I use "/default.aspx" I see in the log files that ManifoldCF is trying to go to the Lists.asmx service with the URL http://server:port/default.aspx/_vti_bin/Lists.asmx, where nothing is found.
>>>>>>>>>
>>>>>>>>> As you can tell I am not much of a SharePoint user or installer.
>>>>>>>>>
>>>>>>>>> Also, I don't think the issue is with the connector in ManifoldCF, I am just trying to
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ________________________________________
>>>>>>>>> From: Silvia, Daniel [USA]
>>>>>>>>> Sent: Thursday, January 26, 2012 7:23 AM
>>>>>>>>> To: Karl Wright
>>>>>>>>> Subject: RE: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>
>>>>>>>>> Hey Karl
>>>>>>>>>
>>>>>>>>> The issue I am having is that the SharePoint instance URL is something like http://server:port/default.aspx. If I don't put anything in the Site field I get a message indicating "Requested Registry Access is not allowed". I was putting "/default.aspx" as my Site field, which I believe may have been the issue. However, what do you put in your Site field when the site is the topmost site, as in http://server:port/default.aspx?
>>>>>>>>>
>>>>>>>>> I would love to send you the log messages, but I am working on a network which is not connected to the outside.
>>>>>>>>>
>>>>>>>>> Thanks for your help.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ________________________________________
>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>> Sent: Wednesday, January 25, 2012 6:12 PM
>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>
>>>>>>>>> Daniel,
>>>>>>>>>
>>>>>>>>> FWIW, I can help you diagnose the issue, but to do so you really need
>>>>>>>>> to give me some concrete data.  I'm happy to grovel over the whole
>>>>>>>>> wire log if you feel you can send it to me; something that may not
>>>>>>>>> seem important to you will likely stand out strongly to me.  I can,
>>>>>>>>> for example, see whether you are getting back HTML because of an
>>>>>>>>> authentication error.  And if you ARE getting back valid
>>>>>>>>> SOAP, I would then be sure that something was wrong with the Axis
>>>>>>>>> client configuration, and I could pursue that here with the data
>>>>>>>>> provided.  The problem with software like SharePoint running on IIS is
>>>>>>>>> that it can be configured a nearly infinite number of ways, so
>>>>>>>>> diagnosis is more of an art than a science.  I strongly suspect that
>>>>>>>>> you're laboring under a pretty straightforward misconception which is
>>>>>>>>> likely blocking progress, rather than there being an issue with the
>>>>>>>>> SharePoint connector itself.  But I can't tell that without more
>>>>>>>>> detailed communication.
>>>>>>>>>
>>>>>>>>> Also, you mentioned that the Lists.asmx service was right where you
>>>>>>>>> expected it to be.  Have you read the SharePoint Connector part of the
>>>>>>>>> end-user documentation?  To wit:
>>>>>>>>>
>>>>>>>>> "Select the server protocol, and enter the server name and port, based
>>>>>>>>> on what you recorded from the URL for your SharePoint site. For the
>>>>>>>>> "Site path" field, type in the portion of the root site URL that
>>>>>>>>> includes everything after the server and port, except for the final
>>>>>>>>> "aspx" file. For example, if the SharePoint URL is
>>>>>>>>> "http://myserver:81/sites/somewhere/index.asp", the site path would be
>>>>>>>>> "/sites/somewhere"."  The Lists.asmx service in this example would be
>>>>>>>>> expected to be found at
>>>>>>>>> "http://myserver:81/sites/somewhere/_vti_bin/Lists.asmx".  And the URL
>>>>>>>>> you would start with would be the URL you see in the browser when you
>>>>>>>>> log into the SharePoint web client and go to the site you wish to
>>>>>>>>> crawl.  Is this what you are doing?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks again,
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jan 25, 2012 at 12:33 PM, Karl Wright <da...@gmail.com> wrote:
>>>>>>>>>> The code that parses the SOAP response is Apache Axis.  This hasn't
>>>>>>>>>> changed in several years.
>>>>>>>>>>
>>>>>>>>>> Can you answer the following questions:
>>>>>>>>>>
>>>>>>>>>> (1) When the SharePoint connector makes a request to SharePoint, is
>>>>>>>>>> the response HTML, or is it XML?  Does it have an XML header which
>>>>>>>>>> describes a Microsoft XML namespace?  It sure sounds like it is
>>>>>>>>>> responding with HTML.  The SharePoint connector is expecting to
>>>>>>>>>> communicate using SOAP.  Is the response valid SOAP?
>>>>>>>>>>
>>>>>>>>>> (2) What version of SharePoint are you trying to connect to?  Is it
>>>>>>>>>> SharePoint 2007?  SharePoint 2010?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>> On Wed, Jan 25, 2012 at 12:26 PM, Silvia, Daniel [USA]
>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>> Hi Karl
>>>>>>>>>>>
>>>>>>>>>>> I have added the specific log4j lines for Http Client wire logging and restarted the ManifoldCF instance. I was also able to see the web service Lists.asmx through IE. When reviewing the log files I was able to see some of the content that resides in the SharePoint instance in the content coming back from the request. However, I am still seeing the error messages in the ManifoldCF GUI as well as in the log file indicating "Bad Envelope: HTML", "No service named ListsSoap is available" and "No service named http://schemas.microsoft.com/sharepoint/soap/GetListCollection is available".
>>>>>>>>>>>
>>>>>>>>>>> Could there be something going on with the way the services are being built on the client side?
>>>>>>>>>>>
>>>>>>>>>>> Appreciate your help.
>>>>>>>>>>>
>>>>>>>>>>> Dan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ________________________________________
>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>> Sent: Tuesday, January 24, 2012 4:52 PM
>>>>>>>>>>> To: Silvia, Daniel [USA]; connectors-user@incubator.apache.org
>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>
>>>>>>>>>>> I have not seen this exact problem before.
>>>>>>>>>>>
>>>>>>>>>>> The "Bad envelope tag: HTML" indicates that the SOAP request the
>>>>>>>>>>> SharePoint connector is attempting to perform is, in fact, returning
>>>>>>>>>>> an HTML response.  This usually indicates that the server or path
>>>>>>>>>>> parameters you've used to set up the connection are not set correctly,
>>>>>>>>>>> and SharePoint is not actually being engaged.
>>>>>>>>>>>
>>>>>>>>>>> But usually when that happens I don't recall a ConfigurationException
>>>>>>>>>>> logged, unless it's what Axis does in response to the HTML.
>>>>>>>>>>>
>>>>>>>>>>> The best thing to do at this point is turn on Http Client wire
>>>>>>>>>>> logging, restart ManifoldCF, and view the connection.  The log will
>>>>>>>>>>> then contain a record of the exact SOAP requests and the responses,
>>>>>>>>>>> and we can see what's wrong.  The technique is described here:
>>>>>>>>>>>
>>>>>>>>>>> https://cwiki.apache.org/confluence/display/CONNECTORS/Debugging+Connections
>>>>>>>>>>>
>>>>>>>>>>> You can also confirm that the right SharePoint web services are
>>>>>>>>>>> functioning on the machine in question by trying to access them
>>>>>>>>>>> directly.  For the Lists web service, which is the one it sounds like
>>>>>>>>>>> it was complaining about, try using IE (not Firefox etc because you
>>>>>>>>>>> want NTLM support) to go to the url where you think the web service
>>>>>>>>>>> lives.  This will be http: or https:, plus the server, plus the port,
>>>>>>>>>>> plus the path, plus "_vti_bin/Lists.asmx".  You should see an
>>>>>>>>>>> unequivocal SharePoint response.  For an example from the Microsoft
>>>>>>>>>>> demo service, try http://www.wssdemo.com/_vti_bin/Lists.asmx.
>>>>>>>>>>>
>>>>>>>>>>> Please let me know how it goes, and cc the dev list (as I have) so a
>>>>>>>>>>> record of what you're encountering can be made available to others.
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 24, 2012 at 1:52 PM, Silvia, Daniel [USA]
>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>
>>>>>>>>>>>> I have downloaded the newest version of ManifoldCF (v0.4) and have run the necessary ant scripts to download dependencies and then built the entire project. I have also had the SharePoint web service MetaCarta.SharePoint.MCPermissionsService.wsp deployed on the SharePoint instance, since it is running version 3 of SharePoint (SharePoint 2007). When I try to create a Repository Connection and select "Save" I get a message on the ManifoldCF front end of "org.xml.sax.SAXException Bad envelope tag: HTML". When I look at the log file I see an error message "org.apache.axis.ConfigurationException: No service named ListsSoap is available".
>>>>>>>>>>>>
>>>>>>>>>>>> Can you tell me if you have seen this issue before and what may be causing this issue?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>
>>>>>>>>>>>> Dan
>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>>> Sent: Friday, January 20, 2012 7:31 AM
>>>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>
>>>>>>>>>>>> In order for the SharePoint connector to build, you need to have the
>>>>>>>>>>>> wsdls in place in the right area.  We cannot ship those because of
>>>>>>>>>>>> potential copyright issues.  The easiest way to obtain the right
>>>>>>>>>>>> dependencies is:
>>>>>>>>>>>>
>>>>>>>>>>>> ant download-dependencies
>>>>>>>>>>>>
>>>>>>>>>>>> Then, just build normally:
>>>>>>>>>>>>
>>>>>>>>>>>> ant build
>>>>>>>>>>>>
>>>>>>>>>>>> This will only work for ManifoldCF-0.4-incubating, or trunk.
>>>>>>>>>>>> 0.4-incubating is still in the process of being signed off by the
>>>>>>>>>>>> incubator, but you can find the release candidate here:
>>>>>>>>>>>>
>>>>>>>>>>>> http://people.apache.org/~kwright
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jan 20, 2012 at 7:02 AM, Silvia, Daniel [USA]
>>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>> I work with Matt Parker and we are in the process of developing a pipeline
>>>>>>>>>>>>> that uses ManifoldCF at the beginning. I just subscribed to the
>>>>>>>>>>>>> connectors-user-subscribe@incubator.apache.org
>>>>>>>>>>>>> group yesterday and submitted an e-mail question to the group. Can you help
>>>>>>>>>>>>> us with the below issue?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I downloaded MCF and started playing with the default setup under Jetty and
>>>>>>>>>>>>> Derby. It starts up without any issue. I am trying to configure a SharePoint
>>>>>>>>>>>>> connector, connecting to SharePoint Service 3. I have been following the
>>>>>>>>>>>>> instructions and I am at the point of deploying the custom SharePoint web
>>>>>>>>>>>>> service to the SharePoint instance. The instructions indicate that I should
>>>>>>>>>>>>> get the web service from dist/sharepoint-integration after building MCF.
>>>>>>>>>>>>> However, after looking through the entire directory structure, I am unable
>>>>>>>>>>>>> to find the service to deploy.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can someone tell me where to find this service?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Daniel Silvia

Re: ManifoldCF's dist/shapoint-integration dir

Posted by Karl Wright <da...@gmail.com>.
I should clarify that the reason for deleting and recreating the job
is because ManifoldCF crawls incrementally.  If you just run a job a
second time you may well not get any documents if none have changed
from the first time the job was run.
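
To illustrate the idea, here is a toy sketch in Python.  It is NOT
ManifoldCF's actual code; the function name, the version strings and
the document paths are all made up for the example.  It only shows why
a second run over unchanged documents finds nothing to do, while a
recreated job (which starts with no remembered state) processes
everything again.

```python
def incremental_crawl(documents, seen_versions):
    """Return the ids that would be (re)indexed on this run.

    documents: mapping of document id -> current version string
    seen_versions: versions remembered from earlier runs of the same
                   job (hypothetical state, updated in place)
    """
    to_index = []
    for doc_id, version in documents.items():
        # Only documents that are new or whose version changed are processed.
        if seen_versions.get(doc_id) != version:
            to_index.append(doc_id)
            seen_versions[doc_id] = version
    return to_index


docs = {"/Shared Documents/a.doc": "v1", "/Shared Documents/b.doc": "v1"}

job_state = {}                                     # state kept by one job
first_run = incremental_crawl(docs, job_state)     # both documents
second_run = incremental_crawl(docs, job_state)    # nothing has changed
fresh_run = incremental_crawl(docs, {})            # "deleted and recreated" job
```

In this sketch the second run returns an empty list, and the fresh run
returns the same documents as the first run, which is the behavior I
was describing.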

Thanks,
Karl

On Tue, Jan 31, 2012 at 9:00 AM, Karl Wright <da...@gmail.com> wrote:
> Ok, let's do one thing at a time.
>
> First:
>
> "For the Path tab where there are Path Rules, are these the paths we
> want ManifoldCF to follow? Each site, and each Library like Documents
> and Shared Documents. And in the Metadata tab, this is the tab where
> you indicate for each "Site" and "Library" you want to include
> specific metadata or include all metadata?"
>
> For SharePoint, there are Path Rules and Metadata Rules.  The Path
> Rules describe what documents you want to include or exclude.  The
> Metadata Rules describe what metadata you want to include or exclude.
> For right now I would ignore the Metadata Rules and just make sure you
> have Path Rules that mean that you have included documents.
>
> "As I run the report, I see "Documents", "Active, and "Processed"
> where the numbers change under the "Active" column as well as the
> "Document" and "Processed" column (these just get larger, where Active
> changes). "
>
> This "report" we actually call the Job Status screen.  The fact that
> the numbers get larger and the job doesn't just end indicates that you
> are successfully crawling your SharePoint, and you have set up the job
> to include at least some documents.  This is good news.  However, this
> is NOT the "Simple History" report I was alluding to earlier.  To get
> to that report, click on the "Simple History" link on the left-hand
> navigation area.  This report will show the events of your choice
> (default - ALL recorded events) over a given time window (default: the
> last hour).  If you've done this right you should at least see a "Job
> start" event.  The events you are most interested in are "fetch"
> (which describes all attempts to fetch documents from SharePoint) and
> "document ingest" (which describes attempts to get documents into Solr).
> You can refresh the displayed events by clicking the "Go" button in
> the middle of the screen whenever you wish.
>
> I'd like you to delete your job, create it again, and start it.  Then,
> while it is running, I'd like you to go to the "Simple History"
> screen, and select the appropriate connection (your SharePoint
> repository connection), and click the "Go" button.  So as not to skip
> anything basic:
>
> (1) What event types do you see?
> (2) Are there "fetch" events?
> (3) Are there "document ingest" events?
>
> If you see no "fetch" events, that implies you have either not
> specified any documents to include in your job, OR your Solr
> connection is configured to reject too many document types so they are
> all getting filtered out.
>
> If you see "document ingest" events, but those have errors, it implies
> that the configuration of your Solr connection is incorrect and does
> not match the way your Solr is configured.  If you send me a specific
> error code and/or text I can help you figure out what is happening.
>
> If you see "document ingest" events with NO errors, but the Solr
> instance is not getting documents, you are describing an impossible
> situation.  While your Solr instance may not be configured to have the
> Extracting Update Handler active, or it may be at a different URL than
> what you pointed at, that would definitely yield errors or
> notifications in the Simple History.
>
> Please let me know what you actually see.
> Karl
>
>
>
> On Tue, Jan 31, 2012 at 7:53 AM, Silvia, Daniel [USA]
> <Si...@bah.com> wrote:
>> Hi Karl
>>
>> I am trying to figure out why I can't see anything being indexed into our Solr index. I was looking at another post where you were working with "Martijn", and that individual was not able to see info getting into Solr. In the report that I have set up, I have included all metadata associated with each site, Shared Documents, and Documents. In the Solr Field Mapping, I am associating metadata fields that are indicated in the Metadata tab with fields that exist in our Solr index.
>>
>> For the Path tab where there are Path Rules, are these the paths we want ManifoldCF to follow? Each site, and each Library like Documents and Shared Documents. And in the Metadata tab, this is the tab where you indicate for each "Site" and "Library" you want to include specific metadata or include all metadata?
>>
>> As I run the report, I see "Documents", "Active", and "Processed", where the numbers change under the "Active" column as well as the "Documents" and "Processed" columns (these just get larger, where Active changes). While I was researching why I may not be seeing something over on the Solr side, I saw your communication with another individual indicating that I should see something like literal.xxx=yyy in the Solr log. This is an older post, so there may be something else I should see. But the only thing I see when I look at the Solr log is "[ ] webapp=/solr path=/update/extract params={commit=true} status=0 QTime=0".
>>
>> Any ideas?
>>
>> Thanks
>>
>>
>>
>>
>>
>> ________________________________________
>> From: Karl Wright [daddywri@gmail.com]
>> Sent: Monday, January 30, 2012 10:40 AM
>> To: Silvia, Daniel [USA]
>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>
>> The default time range for the Simple History is the last hour.  I
>> suspect you are unaware of that.  If you want a different time range
>> you will have to modify the start and end time pulldowns accordingly.
>>
>> Karl
>>
>> On Mon, Jan 30, 2012 at 10:34 AM, Silvia, Daniel [USA]
>> <Si...@bah.com> wrote:
>>> Hi Karl
>>>
>>> I am looking at the Simple History in the UI and there isn't much to see, unless I am not getting what I am supposed to.  I see the Start Time, Activity, Identifier, Bytes, and Time columns, but I don't get anything for Result Code or Result Description. I looked in the documentation and we should be getting something in those fields, I believe.
>>>
>>> Anyway, I will look through the mail list to see what I can find.
>>>
>>> Thanks for the help.
>>>
>>> Dan
>>>
>>> ________________________________________
>>> From: Karl Wright [daddywri@gmail.com]
>>> Sent: Monday, January 30, 2012 8:24 AM
>>> To: Silvia, Daniel [USA]
>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>
>>> So just to be clear, I'm NOT talking about the ManifoldCF logging.
>>> For the Solr connector you probably won't need to turn that on; it's
>>> pretty simple and you can look at the Simple History in the UI to see
>>> what the request and response look like from Solr.  I was talking
>>> instead about Solr logging - when you run the Solr Webapp, by default
>>> all requests against the Extracting Update Handler are logged to
>>> standard error, so you will see them appear in the process window in
>>> which Solr is running.
>>>
>>> My suggestion to you is to first have a look at the Simple History for
>>> the job you are trying to run.  If you are getting back 500 errors
>>> from Solr, that means you have not set up Solr properly to work with
>>> ManifoldCF.  In recent versions of Solr, the example works fine out of
>>> the box, but when you try to deploy any other way you are often
>>> missing the jar that contains the extracting update handler, so of
>>> course nothing works.  Several people on the connectors-user list have
>>> run into this and if you search the list (go to the ManifoldCF site
>>> and click through to the mailing list page and there are links at the
>>> bottom for this purpose) you will find posts that describe exactly
>>> what is wrong and how to fix it.
>>>
>>> Hope this helps.
>>>
>>> Karl
>>>
>>>
>>> On Sun, Jan 29, 2012 at 2:30 PM, Silvia, Daniel [USA]
>>> <Si...@bah.com> wrote:
>>>> Yeah, but for some reason the logging isn't coming through. The logging is set to INFO and I will have to change the logging level to DEBUG.
>>>>
>>>> Thanks again for your help.
>>>>
>>>>
>>>> ________________________________________
>>>> From: Karl Wright [daddywri@gmail.com]
>>>> Sent: Friday, January 27, 2012 5:06 PM
>>>> To: Silvia, Daniel [USA]
>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>
>>>> Actually, the best thing for debugging the Solr connection is looking
>>>> at standard-output on the Solr instance.  You will see all the posts
>>>> that are made and what the arguments were.  Also, this is the kind of
>>>> question you'd get a lot of benefit from posting to the list.  The
>>>> end-user documentation I pointed you at before describes some of this
>>>> but the Solr connector has grown beyond the doc to some extent at this
>>>> point.
>>>>
>>>> Karl
>>>>
>>>> On Fri, Jan 27, 2012 at 9:51 AM, Silvia, Daniel [USA]
>>>> <Si...@bah.com> wrote:
>>>>> Hi Karl
>>>>>
>>>>> Is there a log level other than wire-level debugging to view log statements for trying to send output to a Solr instance in the Jobs List/Creation section? We are having an issue getting content to Solr. Is there a document anywhere which defines the fields for the Jobs sections for the Solr Field Mapping tab and the Paths and Metadata tabs?
>>>>>
>>>>> Thanks
>>>>>
>>>>> Dan
>>>>>
>>>>> ________________________________________
>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>> Sent: Thursday, January 26, 2012 10:44 AM
>>>>> To: Silvia, Daniel [USA]
>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>
>>>>> I am afraid I don't know the answer to that.  I'm sure it's infinitely
>>>>> configurable but it's not clear what the SharePoint web services need
>>>>> to do under the hood, so anything I tell you would be just a guess.
>>>>>
>>>>> Karl
>>>>>
>>>>> On Thu, Jan 26, 2012 at 10:43 AM, Silvia, Daniel [USA]
>>>>> <Si...@bah.com> wrote:
>>>>>> Hi Karl
>>>>>>
>>>>>> One more question. Do you know the minimum permissions needed to crawl the SharePoint instance and all sites under the instance? The individual who set my permissions set me up as the "site collection admin" for the topmost site. Is there a specific admin role that would work without making the user crawling the SharePoint instance a "Farm Admin"?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> ________________________________________
>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>> Sent: Thursday, January 26, 2012 9:53 AM
>>>>>> To: Silvia, Daniel [USA]
>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>
>>>>>> Good news!  Please keep in touch; we'd like to hear how things work
>>>>>> for you (it helps keep the software fresh ;-) ).
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>> On Thu, Jan 26, 2012 at 9:48 AM, Silvia, Daniel [USA]
>>>>>> <Si...@bah.com> wrote:
>>>>>>> Hey Karl
>>>>>>>
>>>>>>> (1) was the issue. When requesting access to the SharePoint instance I indicated that I needed to be able to crawl SharePoint, I guess the problem was on my end indicating that I also needed privileges to crawl the site.
>>>>>>>
>>>>>>> Anyway, thank you for your help. When I change the SharePoint version to v 3 I get a message indicating "Connection Working".
>>>>>>>
>>>>>>> Appreciate the help.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>> ________________________________________
>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>> Sent: Thursday, January 26, 2012 9:19 AM
>>>>>>> To: Silvia, Daniel [USA]
>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>
>>>>>>> The error message "axisFault=Server, detail=Server was unable to
>>>>>>> process request --> Requested Registry access is not allowed" is Axis
>>>>>>> interpreting an error message from SharePoint.  What it is saying is
>>>>>>> that the user you are trying to crawl with is unable to read the
>>>>>>> SharePoint machine's registry but needs to.  There are two possible
>>>>>>> causes for this:
>>>>>>>
>>>>>>> (1) The user you gave doesn't have enough permissions to crawl SharePoint
>>>>>>> (2) When you installed the SharePoint MCPermissions plugin, you
>>>>>>> installed it logged in as a user that did not have enough permissions
>>>>>>> to do what it needs to do.
>>>>>>>
>>>>>>> You can tell the difference between the two by selecting "SharePoint
>>>>>>> 2.0" in the SharePoint version pulldown.  If a connection saved in
>>>>>>> this way says "Connection working", it means that the MCPermissions
>>>>>>> plugin has the permission problem, not your user.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>> On Thu, Jan 26, 2012 at 9:14 AM, Silvia, Daniel [USA]
>>>>>>> <Si...@bah.com> wrote:
>>>>>>>> Hi Karl
>>>>>>>>
>>>>>>>> When I try to use option (1) and don't put anything in the Site field, I get an error message "axisFault=Server, detail=Server was unable to process request --> Requested Registry access is not allowed", and when I put a "/" in the Site field I get a GUI error indicating that the Site field can't end with a "/".
>>>>>>>>
>>>>>>>> Anyway, do you have any ideas? Or maybe the SharePoint instance is not configured properly for us to crawl?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ________________________________________
>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>> Sent: Thursday, January 26, 2012 8:52 AM
>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>
>>>>>>>> SharePoint has two kinds of site:
>>>>>>>>
>>>>>>>> (1) the root site, which can be reached by the path http://server:port
>>>>>>>> (2) a number of sites under the 'virtual path', with URLs of the form:
>>>>>>>>
>>>>>>>> http://server:port/something/sitename
>>>>>>>>
>>>>>>>> The "something" is, by default, the string "sites", so
>>>>>>>> http://server:port/sites/xyz might be the URL of one such virtual site.
>>>>>>>>
>>>>>>>> The form of the "site" field in the SharePoint connection for the
>>>>>>>> first is either blank or "/" (can't remember which right now), and the
>>>>>>>> form of the "site" field for the second is "/sites/xyz".  On no account
>>>>>>>> does the connector expect to see default.aspx attached to that path,
>>>>>>>> so you should not do this; it cannot work.
>>>>>>>>
>>>>>>>> FWIW, my recommendation to try setting the connection type to
>>>>>>>> "SharePoint 2.0" was to rule out any possible installation issue with
>>>>>>>> the ManifoldCF sharepoint plugin.  The connection check for 2.0 does
>>>>>>>> not look for it; only the connection check for 3.0 does.
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>> On Thu, Jan 26, 2012 at 8:41 AM, Silvia, Daniel [USA]
>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>> Hey Karl
>>>>>>>>>
>>>>>>>>> I am also getting an "HTTP Error 401.2: Unauthorized: Access is denied due to server configuration" when setting the Site field to /default.aspx. Do most SharePoint instances have the URLs set to something like http://server:port/sites/...... instead of http://server:port/? When I use "/default.aspx" I see in the log files that ManifoldCF is trying to go to the Lists.asmx service with the URL http://server:port/default.aspx/_vti_bin/Lists.asmx, where nothing is found.
>>>>>>>>>
>>>>>>>>> As you can tell I am not much of a SharePoint user or installer.
>>>>>>>>>
>>>>>>>>> Also, I don't think the issue is with the connector in ManifoldCF, I am just trying to
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ________________________________________
>>>>>>>>> From: Silvia, Daniel [USA]
>>>>>>>>> Sent: Thursday, January 26, 2012 7:23 AM
>>>>>>>>> To: Karl Wright
>>>>>>>>> Subject: RE: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>
>>>>>>>>> Hey Karl
>>>>>>>>>
>>>>>>>>> The issue I am having is that the SharePoint instance URL is something like http://server:port/default.aspx. If I don't put anything in the Site field I get a message indicating "Requested Registry Access is not allowed". I was putting "/default.aspx" as my Site field, which I believe may have been the issue. However, what do you put in your Site field when the site is the topmost site, as in http://server:port/default.aspx?
>>>>>>>>>
>>>>>>>>> I would love to send you the log messages, but I am working on a network which is not connected to the outside.
>>>>>>>>>
>>>>>>>>> Thanks for your help.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ________________________________________
>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>> Sent: Wednesday, January 25, 2012 6:12 PM
>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>
>>>>>>>>> Daniel,
>>>>>>>>>
>>>>>>>>> FWIW, I can help you diagnose the issue, but to do so you really need
>>>>>>>>> to give me some concrete data.  I'm happy to grovel over the whole
>>>>>>>>> wire log if you feel you can send it to me; something that may not
>>>>>>>>> seem important to you will likely stand out strongly to me.  I can,
>>>>>>>>> for example, see whether you are getting back HTML because of an
>>>>>>>>> authentication error.  And if you ARE getting back valid
>>>>>>>>> SOAP, I would then be sure that something was wrong with the Axis
>>>>>>>>> client configuration, and I could pursue that here with the data
>>>>>>>>> provided.  The problem with software like SharePoint running on IIS is
>>>>>>>>> that it can be configured a nearly infinite number of ways, so
>>>>>>>>> diagnosis is more of an art than a science.  I strongly suspect that
>>>>>>>>> you're laboring under a pretty straightforward misconception which is
>>>>>>>>> likely blocking progress, rather than there being an issue with the
>>>>>>>>> SharePoint connector itself.  But I can't tell that without more
>>>>>>>>> detailed communication.
>>>>>>>>>
>>>>>>>>> Also, you mentioned that the Lists.asmx service was right where you
>>>>>>>>> expected it to be.  Have you read the SharePoint Connector part of the
>>>>>>>>> end-user documentation?  To wit:
>>>>>>>>>
>>>>>>>>> "Select the server protocol, and enter the server name and port, based
>>>>>>>>> on what you recorded from the URL for your SharePoint site. For the
>>>>>>>>> "Site path" field, type in the portion of the root site URL that
>>>>>>>>> includes everything after the server and port, except for the final
>>>>>>>>> "aspx" file. For example, if the SharePoint URL is
>>>>>>>>> "http://myserver:81/sites/somewhere/index.asp", the site path would be
>>>>>>>>> "/sites/somewhere"."  The Lists.asmx service in this example would be
>>>>>>>>> expected to be found at
>>>>>>>>> "http://myserver:81/sites/somewhere/_vti_bin/Lists.asmx".  And the URL
>>>>>>>>> you would start with would be the URL you see in the browser when you
>>>>>>>>> log into the SharePoint web client and go to the site you wish to
>>>>>>>>> crawl.  Is this what you are doing?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks again,
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jan 25, 2012 at 12:33 PM, Karl Wright <da...@gmail.com> wrote:
>>>>>>>>>> The code that parses the SOAP response is Apache Axis.  This hasn't
>>>>>>>>>> changed in several years.
>>>>>>>>>>
>>>>>>>>>> Can you answer the following questions:
>>>>>>>>>>
>>>>>>>>>> (1) When the SharePoint connector makes a request to SharePoint, is
>>>>>>>>>> the response HTML, or is it XML?  Does it have an XML header which
>>>>>>>>>> describes a Microsoft XML namespace?  It sure sounds like it is
>>>>>>>>>> responding with HTML.  The SharePoint connector is expecting to
>>>>>>>>>> communicate using SOAP.  Is the response valid SOAP?
>>>>>>>>>>
>>>>>>>>>> (2) What version of SharePoint are you trying to connect to?  Is it
>>>>>>>>>> SharePoint 2007?  SharePoint 2010?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>> On Wed, Jan 25, 2012 at 12:26 PM, Silvia, Daniel [USA]
>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>> Hi Karl
>>>>>>>>>>>
>>>>>>>>>>> I have added the specific log4j lines for Http Client wire and I restarted the ManifoldCF instance. I was also able to see the webservice Lists.asmx through IE. When reviewing the log files I was able to see some of the content that resides in the SharePoint instance in the content coming back from the request. However, I am still seeing the error messages in the ManifoldCF GUI as well as in the log file indicating "Bad Envelope: HTML", "No service named ListsSoap is available" and "No service named http://schemas.microsoft.com/sharepoint/soap/GetListCollection is available".
>>>>>>>>>>>
>>>>>>>>>>> Could there be something going on with the way the services are being built on the client side?
>>>>>>>>>>>
>>>>>>>>>>> Appreciate your help.
>>>>>>>>>>>
>>>>>>>>>>> Dan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ________________________________________
>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>> Sent: Tuesday, January 24, 2012 4:52 PM
>>>>>>>>>>> To: Silvia, Daniel [USA]; connectors-user@incubator.apache.org
>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>
>>>>>>>>>>> I have not seen this exact problem before.
>>>>>>>>>>>
>>>>>>>>>>> The "Bad envelope tag: HTML" indicates that the SOAP request the
>>>>>>>>>>> SharePoint connector is attempting to perform is, in fact, returning
>>>>>>>>>>> an HTML response.  This usually indicates that the server or path
>>>>>>>>>>> parameters you've used to set up the connection are not set correctly,
>>>>>>>>>>> and SharePoint is not actually being engaged.
>>>>>>>>>>>
>>>>>>>>>>> But usually when that happens I don't recall a ConfigurationException
>>>>>>>>>>> logged, unless it's what Axis does in response to the HTML.
>>>>>>>>>>>
>>>>>>>>>>> The best thing to do at this point is turn on Http Client wire
>>>>>>>>>>> logging, restart ManifoldCF, and view the connection.  The log will
>>>>>>>>>>> then contain a record of the exact SOAP requests and the responses,
>>>>>>>>>>> and we can see what's wrong.  The technique is described here:
>>>>>>>>>>>
>>>>>>>>>>> https://cwiki.apache.org/confluence/display/CONNECTORS/Debugging+Connections
>>>>>>>>>>>
>>>>>>>>>>> You can also confirm that the right SharePoint web services are
>>>>>>>>>>> functioning on the machine in question by trying to access them
>>>>>>>>>>> directly.  For the Lists web service, which is the one it sounds like
>>>>>>>>>>> it was complaining about, try using IE (not Firefox etc because you
>>>>>>>>>>> want NTLM support) to go to the url where you think the web service
>>>>>>>>>>> lives.  This will be http: or https:, plus the server, plus the port,
>>>>>>>>>>> plus the path, plus "_vti_bin/Lists.asmx".  You should see an
>>>>>>>>>>> unequivocal SharePoint response.  For an example from the Microsoft
>>>>>>>>>>> demo service, try http://www.wssdemo.com/_vti_bin/Lists.asmx.
>>>>>>>>>>>
>>>>>>>>>>> Please let me know how it goes, and cc the dev list (as I have) so a
>>>>>>>>>>> record of what you're encountering can be made available to others.
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 24, 2012 at 1:52 PM, Silvia, Daniel [USA]
>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>
>>>>>>>>>>>> I have downloaded the newest version of ManifoldCF, v0.4, and have run the necessary ant scripts to download dependencies and then built the entire project. I have also had the SharePoint web service MetaCarta.SharePoint.MCPermissionsService.wsp deployed on the SharePoint instance, since we are running version 3 of SharePoint (SharePoint 2007). When I try to create a Repository Connection and select "Save" I get a message on the ManifoldCF front end of "org.xml.sax.SAXException Bad envelope tag: HTML". When I look at the log file I see an error message "org.apache.axis.ConfigurationException: No service named ListsSoap is available".
>>>>>>>>>>>>
>>>>>>>>>>>> Can you tell me if you have seen this issue before and what may be causing this issue?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>
>>>>>>>>>>>> Dan
>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>>> Sent: Friday, January 20, 2012 7:31 AM
>>>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>
>>>>>>>>>>>> In order for the SharePoint connector to build, you need to have the
>>>>>>>>>>>> wsdls in place in the right area.  We cannot ship those because of
>>>>>>>>>>>> potential copyright issues.  The easiest way to obtain the right
>>>>>>>>>>>> dependencies is:
>>>>>>>>>>>>
>>>>>>>>>>>> ant download-dependencies
>>>>>>>>>>>>
>>>>>>>>>>>> Then, just build normally:
>>>>>>>>>>>>
>>>>>>>>>>>> ant build
>>>>>>>>>>>>
>>>>>>>>>>>> This will only work for ManifoldCF-0.4-incubating, or trunk.
>>>>>>>>>>>> 0.4-incubating is still in the process of being signed off by the
>>>>>>>>>>>> incubator, but you can find the release candidate here:
>>>>>>>>>>>>
>>>>>>>>>>>> http://people.apache.org/~kwright
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jan 20, 2012 at 7:02 AM, Silvia, Daniel [USA]
>>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>> I work with Matt Parker and we are in the process of developing a pipeline
>>>>>>>>>>>>> that uses ManifoldCF at the beginning. I just subscribed to the
>>>>>>>>>>>>> connectors-user-subscribe@incubator.apache.org
>>>>>>>>>>>>> group yesterday and submitted an e-mail question to the group. Can you help
>>>>>>>>>>>>> us with the below issue?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I downloaded MCF and started playing with the default setup under Jetty and
>>>>>>>>>>>>> Derby. It starts up without any issue. I am trying to configure a SharePoint
>>>>>>>>>>>>> connector, connecting to SharePoint Service 3. I have been following the
>>>>>>>>>>>>> instructions and I am at the point of deploying the custom SharePoint web
>>>>>>>>>>>>> service to the SharePoint instance. The instructions indicate that I should
>>>>>>>>>>>>> get the web service from dist/sharepoint-integration after building MCF.
>>>>>>>>>>>>> However, after looking through the entire directory structure, I am unable
>>>>>>>>>>>>> to find the service to deploy.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can someone tell me where to find this service?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Daniel Silvia

Re: ManifoldCF's dist/shapoint-integration dir

Posted by Karl Wright <da...@gmail.com>.
Ok, let's do one thing at a time.

First:

"For the Path tab where there are Path Rules, are these the paths we
want ManifoldCF to follow? Each site, and each Library like Documents
and Shared Documents. And in the Metadata tab, this is the tab where
you indicate for each "Site" and "Library" you want to include
specific metadata or include all metadata?"

For SharePoint, there are Path Rules and Metadata Rules.  The Path
Rules describe what documents you want to include or exclude.  The
Metadata Rules describe what metadata you want to include or exclude.
For right now I would ignore the Metadata Rules and just make sure you
have Path Rules that mean that you have included documents.

"As I run the report, I see "Documents", "Active", and "Processed"
where the numbers change under the "Active" column as well as the
"Documents" and "Processed" columns (these just get larger, where Active
changes)."

This "report" we actually call the Job Status screen.  The fact that
the numbers get larger and the job doesn't just end indicates that you
are successfully crawling your SharePoint, and you have set up the job
to include at least some documents.  This is good news.  However, this
is NOT the "Simple History" report I was alluding to earlier.  To get
to that report, click on the "Simple History" link on the left-hand
navigation area.  This report will show the events of your choice
(default: ALL recorded events) over a given time window (default: the
last hour).  If you've done this right you should at least see a "Job
start" event.  The events you are most interested in are "fetch",
which describes every attempt to fetch a document from SharePoint, and
"document ingest", which describes every attempt to get a document
into Solr.  You can refresh the displayed events by clicking the "Go"
button in the middle of the screen whenever you wish.

I'd like you to delete your job, create it again, and start it.  Then,
while it is running, I'd like you to go to the "Simple History"
screen, and select the appropriate connection (your SharePoint
repository connection), and click the "Go" button.  So as not to skip
anything basic:

(1) What event types do you see?
(2) Are there "fetch" events?
(3) Are there "document ingest" events?

If you see no "fetch" events, that implies you have either not
specified any documents to include in your job, OR your Solr
connection is configured to reject too many document types so they are
all getting filtered out.

If you see "document ingest" events, but those have errors, it implies
that the configuration of your Solr connection is incorrect and does
not match the way your Solr is configured.  If you send me a specific
error code and/or text I can help you figure out what is happening.

If you see "document ingest" events with NO errors, but the Solr
instance is not getting documents, you are describing an impossible
situation.  While your Solr instance may not be configured to have the
Extracting Update Handler active, or it may be at a different URL than
what you pointed at, that would definitely yield errors or
notifications in the Simple History.
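
To make the reasoning above concrete, here is a rough sketch of the same
triage as a small Python function.  It is only an illustration, not
ManifoldCF code: the function name, the (activity, had_error) tuples, and
the idea of copying events by hand from the Simple History screen are all
assumptions made for the example.

```python
def diagnose(events):
    """Map Simple History observations to the likely cause.

    `events` is a hypothetical list of (activity, had_error) tuples noted
    down by hand from the Simple History screen; ManifoldCF itself does
    not expose this structure.
    """
    fetches = [e for e in events if e[0] == "fetch"]
    ingests = [e for e in events if e[0] == "document ingest"]

    if not fetches:
        # Case 1: nothing was fetched from SharePoint at all.
        return "job includes no documents, or everything is being filtered out"
    if any(had_error for _, had_error in ingests):
        # Case 2: Solr was contacted but rejected the request.
        return "Solr connection settings do not match how Solr is configured"
    if ingests:
        # Case 3: every ingest succeeded, so a wrong handler or URL is ruled
        # out; those would have shown up as errors.
        return "ingestion succeeded; check which Solr instance you are querying"
    # Fetches without any ingest events are not covered in this message.
    return "fetches but no ingests: not covered here, send details"


print(diagnose([("Job start", False), ("fetch", False),
                ("document ingest", True)]))
```

The order of the checks mirrors the three paragraphs above: fetch events
first, then ingest errors, then the clean-ingest case.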

Please let me know what you actually see.
Karl



On Tue, Jan 31, 2012 at 7:53 AM, Silvia, Daniel [USA]
<Si...@bah.com> wrote:
> Hi Karl
>
> I am trying to figure out why I can't see anything being indexed into our Solr index. I was looking at another post where you were working with "Martijn" and that individual was not able to see info getting into Solr. In the report that I have set up, I have included all metadata associated to each site, Shared Documents, and Documents. In the Solr Field Mapping, I am associating metadata fields that are indicated in the Metadata tab to fields that exist in our Solr index.
>
> For the Path tab where there are Path Rules, are these the paths we want ManifoldCF to follow? Each site, and each Library like Documents and Shared Documents. And in the Metadata tab, this is the tab where you indicate for each "Site" and "Library" you want to include specific metadata or include all metadata?
>
> As I run the report, I see "Documents", "Active", and "Processed", where the numbers change under the "Active" column as well as the "Documents" and "Processed" columns (these just get larger, where Active changes). While I was researching why I may not be seeing something over on the Solr side, I saw your communication with another individual indicating that I should see something like literal.xxx=yyy in the Solr log. This is an older post, so there may be something else I should see. But the only thing I see when I look at the Solr log is "[ ] webapp=/solr path=/update/extract params={commit=true} status=0 QTime=0".
>
> Any ideas?
>
> Thanks
>
>
>
>
>
> ________________________________________
> From: Karl Wright [daddywri@gmail.com]
> Sent: Monday, January 30, 2012 10:40 AM
> To: Silvia, Daniel [USA]
> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>
> The default time range for the Simple History is the last hour.  I
> suspect you are unaware of that.  If you want a different time range
> you will have to modify the start and end time pulldowns accordingly.
>
> Karl
>
> On Mon, Jan 30, 2012 at 10:34 AM, Silvia, Daniel [USA]
> <Si...@bah.com> wrote:
>> Hi Karl
>>
>> I am looking at the Simple History in the UI and there isn't much to see, unless I am not getting what I am supposed to. I see "Start Time", "Activity", "Identifier", "Bytes", and "Time", but I don't get anything for "Result Code" or "Result Description". I looked in the documentation and we should be getting something in those fields, I believe.
>>
>> Anyway, I will look through the mail list to see what I can find.
>>
>> Thanks for the help.
>>
>> Dan
>>
>> ________________________________________
>> From: Karl Wright [daddywri@gmail.com]
>> Sent: Monday, January 30, 2012 8:24 AM
>> To: Silvia, Daniel [USA]
>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>
>> So just to be clear, I'm NOT talking about the ManifoldCF logging.
>> For the Solr connector you probably won't need to turn that on; it's
>> pretty simple and you can look at the Simple History in the UI to see
>> what the request and response look like from Solr.  I was talking
>> instead about Solr logging - when you run the Solr Webapp, by default
>> all requests against the Extracting Update Handler are logged to
>> standard error, so you will see them appear in the process window in
>> which Solr is running.
>>
>> My suggestion to you is to first have a look at the Simple History for
>> the job you are trying to run.  If you are getting back 500 errors
>> from Solr, that means you have not set up Solr properly to work with
>> ManifoldCF.  In recent versions of Solr, the example works fine out of
>> the box, but when you try to deploy any other way you are often
>> missing the jar that contains the extracting update handler, so of
>> course nothing works.  Several people on the connectors-user list have
>> run into this and if you search the list (go to the ManifoldCF site
>> and click through to the mailing list page and there are links at the
>> bottom for this purpose) you will find posts that describe exactly
>> what is wrong and how to fix it.
>>
>> Hope this helps.
>>
>> Karl
>>
>>
>> On Sun, Jan 29, 2012 at 2:30 PM, Silvia, Daniel [USA]
>> <Si...@bah.com> wrote:
>>> Yeah, but for some reason the logging isn't coming through. The logging is set to INFO, and I will have to change the logging level to DEBUG.
>>>
>>> Thanks again for your help.
>>>
>>>
>>> ________________________________________
>>> From: Karl Wright [daddywri@gmail.com]
>>> Sent: Friday, January 27, 2012 5:06 PM
>>> To: Silvia, Daniel [USA]
>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>
>>> Actually, the best thing for debugging the Solr connection is looking
>>> at standard-output on the Solr instance.  You will see all the posts
>>> that are made and what the arguments were.  Also, this is the kind of
>>> question you'd get a lot of benefit from posting to the list.  The
>>> end-user documentation I pointed you at before describes some of this
>>> but the Solr connector has grown beyond the doc to some extent at this
>>> point.
>>>
>>> Karl
>>>
>>> On Fri, Jan 27, 2012 at 9:51 AM, Silvia, Daniel [USA]
>>> <Si...@bah.com> wrote:
>>>> Hi Karl
>>>>
>>>> Is there a log level other than wire-level debugging to view log statements for trying to send output to a Solr instance in the Jobs List/Creation section? We are having an issue getting content to Solr. Is there a document anywhere which defines the fields for the Jobs sections for the Solr Field Mapping tab and the Paths and Metadata tabs?
>>>>
>>>> Thanks
>>>>
>>>> Dan
>>>>
>>>> ________________________________________
>>>> From: Karl Wright [daddywri@gmail.com]
>>>> Sent: Thursday, January 26, 2012 10:44 AM
>>>> To: Silvia, Daniel [USA]
>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>
>>>> I am afraid I don't know the answer to that.  I'm sure it's infinitely
>>>> configurable but it's not clear what the SharePoint web services need
>>>> to do under the hood, so anything I tell you would be just a guess.
>>>>
>>>> Karl
>>>>
>>>> On Thu, Jan 26, 2012 at 10:43 AM, Silvia, Daniel [USA]
>>>> <Si...@bah.com> wrote:
>>>>> Hi Karl
>>>>>
>>>>> One more question. Do you know the minimum permissions needed to crawl the SharePoint instance and all sites under the instance? The individual who set my permissions set me up as the "site collection admin" for the top most site. Is there a specific admin role, other than "Farm Admin", that can be given to the user crawling the SharePoint instance?
>>>>>
>>>>> Thanks
>>>>>
>>>>> ________________________________________
>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>> Sent: Thursday, January 26, 2012 9:53 AM
>>>>> To: Silvia, Daniel [USA]
>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>
>>>>> Good news!  Please keep in touch; we'd like to hear how things work
>>>>> for you (it helps keep the software fresh ;-) ).
>>>>>
>>>>> Karl
>>>>>
>>>>> On Thu, Jan 26, 2012 at 9:48 AM, Silvia, Daniel [USA]
>>>>> <Si...@bah.com> wrote:
>>>>>> Hey Karl
>>>>>>
>>>>>> (1) was the issue. When requesting access to the SharePoint instance I indicated that I needed to be able to crawl SharePoint, I guess the problem was on my end indicating that I also needed privileges to crawl the site.
>>>>>>
>>>>>> Anyway, thank you for your help. When I change the SharePoint version to v 3 I get a message indicating "Connection Working".
>>>>>>
>>>>>> Appreciate the help.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>> ________________________________________
>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>> Sent: Thursday, January 26, 2012 9:19 AM
>>>>>> To: Silvia, Daniel [USA]
>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>
>>>>>> The error message "axisFault=Server, detail=Server was unable to
>>>>>> process request --> Requested Registry access is not allowed" is Axis
>>>>>> interpreting an error message from SharePoint.  What it is saying is
>>>>>> that the user you are trying to crawl with is unable to read the
>>>>>> SharePoint machine's registry but needs to.  There are two possible
>>>>>> causes for this:
>>>>>>
>>>>>> (1) The user you gave doesn't have enough permissions to crawl SharePoint
>>>>>> (2) When you installed the SharePoint MCPermissions plugin, you
>>>>>> installed it logged in as a user that did not have enough permissions to do
>>>>>> what it needs to do.
>>>>>>
>>>>>> You can tell the difference between the two by selecting "SharePoint
>>>>>> 2.0" in the sharepoint version pulldown.  If a connection saved in
>>>>>> this way says "Connection working", it means that the MCPermissions
>>>>>> plugin has the permission problem, not your user.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>> On Thu, Jan 26, 2012 at 9:14 AM, Silvia, Daniel [USA]
>>>>>> <Si...@bah.com> wrote:
>>>>>>> Hi Karl
>>>>>>>
>>>>>>> When I try to use option (1) and don't put anything in the Site field, I get an error message "axisFault=Server, detail=Server was unable to process request --> Requested Registry access is not allowed" and when I put a "/" in the site field I get a GUI error indicating that the site field can't end with a "/".
>>>>>>>
>>>>>>> Anyway, do you have any ideas. Or maybe the Sharepoint instance is not configured properly for us to crawl?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ________________________________________
>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>> Sent: Thursday, January 26, 2012 8:52 AM
>>>>>>> To: Silvia, Daniel [USA]
>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>
>>>>>>> SharePoint has two kinds of site:
>>>>>>>
>>>>>>> (1) the root site, which can be reached by the path http://server:port
>>>>>>> (2) a number of sites under the 'virtual path', with URLs of the form:
>>>>>>>
>>>>>>> http://server:port/something/sitename
>>>>>>>
>>>>>>> The "something" is, by default, the string "site", so
>>>>>>> http://server:port/site/xyz might be the URL of one such virtual site.
>>>>>>>
>>>>>>> The form of the "site" field in the SharePoint connection for the
>>>>>>> first is either blank or "/" (can't remember which right now), and the
>>>>>>> form of the "site" field for the second is "/site/xyz".  On no account
>>>>>>> does the connector expect to see default.aspx attached to that path,
>>>>>>> so you should not do this; it cannot work.
>>>>>>>
>>>>>>> FWIW, my recommendation to try setting the connection type to
>>>>>>> "SharePoint 2.0" was to rule out any possible installation issue with
>>>>>>> the ManifoldCF sharepoint plugin.  The connection check for 2.0 does
>>>>>>> not look for it; only the connection check for 3.0 does.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>> On Thu, Jan 26, 2012 at 8:41 AM, Silvia, Daniel [USA]
>>>>>>> <Si...@bah.com> wrote:
>>>>>>>> Hey Karl
>>>>>>>>
>>>>>>>> I am also getting an "HTTP Error 401.2: Unauthorized: Access is denied due to server configuration" when setting the Site field to /default.aspx. Do most Sharepoint instances have the urls set to something like http://server:port/sites/...... instead of http://server:port/? When I use the "/default.aspx" I see in the log files that ManifoldCF is trying to go to the Lists.asmx service with the url http://server:port/default.aspx/_vti_bin/Lists.asmx, where nothing is found.
>>>>>>>>
>>>>>>>> As you can tell I am not much of a SharePoint user or installer.
>>>>>>>>
>>>>>>>> Also, I don't think the issue is with the connector in ManifoldCF, I am just trying to
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ________________________________________
>>>>>>>> From: Silvia, Daniel [USA]
>>>>>>>> Sent: Thursday, January 26, 2012 7:23 AM
>>>>>>>> To: Karl Wright
>>>>>>>> Subject: RE: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>
>>>>>>>> Hey Karl
>>>>>>>>
>>>>>>>> The issue I am having is that the SharePoint instance url is something like http://server:port/default.aspx. If I don't put anything in the site field I get a message indicating "Requested Registry Access is not allowed". I was putting "/default.aspx" as my Site field which I believe may have been the issue. However, what do you put in your Site field when the site is the top most site, as in http://server:port/default.aspx?
>>>>>>>>
>>>>>>>> I would love to send you the log messages, but I am working on a network which is not connected to the outside.
>>>>>>>>
>>>>>>>> Thanks for your help.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ________________________________________
>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>> Sent: Wednesday, January 25, 2012 6:12 PM
>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>
>>>>>>>> Daniel,
>>>>>>>>
>>>>>>>> FWIW, I can help you diagnose the issue, but to do so you really need
>>>>>>>> to give me some concrete data.  I'm happy to grovel over the whole
>>>>>>>> wire log if you feel you can send it to me; something that may not
>>>>>>>> seem important to you will likely stand out strongly to me.  I can,
>>>>>>>> for example, see whether you are getting back HTML because of an
>>>>>>>> authentication error.  And if you ARE getting back valid
>>>>>>>> SOAP, I would then be sure that something was wrong with the Axis
>>>>>>>> client configuration, and I could pursue that here with the data
>>>>>>>> provided.  The problem with software like SharePoint running on IIS is
>>>>>>>> that it can be configured a nearly infinite number of ways, so
>>>>>>>> diagnosis is more of an art than a science.  I strongly suspect that
>>>>>>>> you're laboring under a pretty straightforward misconception which is
>>>>>>>> likely blocking progress, rather than there being an issue with the
>>>>>>>> SharePoint connector itself.  But I can't tell that without more
>>>>>>>> detailed communication.
>>>>>>>>
>>>>>>>> Also, you mentioned that the Lists.asmx service was right where you
>>>>>>>> expected it to be.  Have you read the SharePoint Connector part of the
>>>>>>>> end-user documentation?  To wit:
>>>>>>>>
>>>>>>>> "Select the server protocol, and enter the server name and port, based
>>>>>>>> on what you recorded from the URL for your SharePoint site. For the
>>>>>>>> "Site path" field, type in the portion of the root site URL that
>>>>>>>> includes everything after the server and port, except for the final
>>>>>>>> "aspx" file. For example, if the SharePoint URL is
>>>>>>>> "http://myserver:81/sites/somewhere/index.asp", the site path would be
>>>>>>>> "/sites/somewhere"."  The Lists.asmx service in this example would be
>>>>>>>> expected to be found at
>>>>>>>> "http://myserver:81/sites/somewhere/_vti_bin/Lists.asmx".  And the URL
>>>>>>>> you would start with would be the URL you see in the browser when you
>>>>>>>> log into the SharePoint web client and go to the site you wish to
>>>>>>>> crawl.  Is this what you are doing?
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks again,
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jan 25, 2012 at 12:33 PM, Karl Wright <da...@gmail.com> wrote:
>>>>>>>>> The code that parses the SOAP response is Apache Axis.  This hasn't
>>>>>>>>> changed in several years.
>>>>>>>>>
>>>>>>>>> Can you answer the following questions:
>>>>>>>>>
>>>>>>>>> (1) When the SharePoint connector makes a request to SharePoint, is
>>>>>>>>> the response HTML, or is it XML?  Does it have an XML header which
>>>>>>>>> describes a Microsoft XML namespace?  It sure sounds like it is
>>>>>>>>> responding with HTML.  The SharePoint connector is expecting to
>>>>>>>>> communicate using SOAP.  Is the response valid SOAP?
>>>>>>>>>
>>>>>>>>> (2) What version of SharePoint are you trying to connect to?  Is it
>>>>>>>>> SharePoint 2007?  SharePoint 2010?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>> On Wed, Jan 25, 2012 at 12:26 PM, Silvia, Daniel [USA]
>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>> Hi Karl
>>>>>>>>>>
>>>>>>>>>> I have added the specific log4j lines for Http Client wire and I restarted the ManifoldCF instance. I was also able to see the webservice Lists.asmx through IE. When reviewing the log files I was able to see some of the content that resides in the SharePoint instance in the content coming back from the request. However, I am still seeing the error messages in the ManifoldCF GUI as well as in the log file indicating "Bad Envelope: HTML", "No service named ListsSoap is available" and "No service named http://schemas.microsoft.com/sharepoint/soap/GetListCollection is available".
>>>>>>>>>>
>>>>>>>>>> Could there be something going on with the way the services are being built on the client side?
>>>>>>>>>>
>>>>>>>>>> Appreciate your help.
>>>>>>>>>>
>>>>>>>>>> Dan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ________________________________________
>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>> Sent: Tuesday, January 24, 2012 4:52 PM
>>>>>>>>>> To: Silvia, Daniel [USA]; connectors-user@incubator.apache.org
>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>
>>>>>>>>>> I have not seen this exact problem before.
>>>>>>>>>>
>>>>>>>>>> The "Bad envelope tag: HTML" indicates that the SOAP request the
>>>>>>>>>> SharePoint connector is attempting to perform is, in fact, returning
>>>>>>>>>> an HTML response.  This usually indicates that the server or path
>>>>>>>>>> parameters you've used to set up the connection are not set correctly,
>>>>>>>>>> and SharePoint is not actually being engaged.
>>>>>>>>>>
>>>>>>>>>> But usually when that happens I don't recall a ConfigurationException
>>>>>>>>>> logged, unless it's what Axis does in response to the HTML.
>>>>>>>>>>
>>>>>>>>>> The best thing to do at this point is turn on Http Client wire
>>>>>>>>>> logging, restart ManifoldCF, and view the connection.  The log will
>>>>>>>>>> then contain a record of the exact SOAP requests and the responses,
>>>>>>>>>> and we can see what's wrong.  The technique is described here:
>>>>>>>>>>
>>>>>>>>>> https://cwiki.apache.org/confluence/display/CONNECTORS/Debugging+Connections
>>>>>>>>>>
>>>>>>>>>> You can also confirm that the right SharePoint web services are
>>>>>>>>>> functioning on the machine in question by trying to access them
>>>>>>>>>> directly.  For the Lists web service, which is the one it sounds like
>>>>>>>>>> it was complaining about, try using IE (not Firefox etc because you
>>>>>>>>>> want NTLM support) to go to the url where you think the web service
>>>>>>>>>> lives.  This will be http: or https:, plus the server, plus the port,
>>>>>>>>>> plus the path, plus "_vti_bin/Lists.asmx".  You should see an
>>>>>>>>>> unequivocal SharePoint response.  For an example from the Microsoft
>>>>>>>>>> demo service, try http://www.wssdemo.com/_vti_bin/Lists.asmx.
>>>>>>>>>>
>>>>>>>>>> Please let me know how it goes, and cc the dev list (as I have) so a
>>>>>>>>>> record of what you're encountering can be made available to others.
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 24, 2012 at 1:52 PM, Silvia, Daniel [USA]
>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>> Hi Karl
>>>>>>>>>>>
>>>>>>>>>>> I have downloaded the newest version of ManifoldCF, v0.4, and have run the necessary ant scripts to download dependencies and then built the entire project. I have also had the SharePoint web service MetaCarta.SharePoint.MCPermissionsService.wsp deployed on the SharePoint instance, since we are running version 3 of SharePoint (SharePoint 2007). When I try to create a Repository Connection and select "Save" I get a message on the ManifoldCF front end of "org.xml.sax.SAXException Bad envelope tag: HTML". When I look at the log file I see an error message "org.apache.axis.ConfigurationException: No service named ListsSoap is available".
>>>>>>>>>>>
>>>>>>>>>>> Can you tell me if you have seen this issue before and what may be causing this issue?
>>>>>>>>>>>
>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>
>>>>>>>>>>> Dan
>>>>>>>>>>> ________________________________________
>>>>>>>>>>> From: Karl Wright [daddywri@gmail.com]
>>>>>>>>>>> Sent: Friday, January 20, 2012 7:31 AM
>>>>>>>>>>> To: Silvia, Daniel [USA]
>>>>>>>>>>> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>>>>>>>>>>>
>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>
>>>>>>>>>>> In order for the SharePoint connector to build, you need to have the
>>>>>>>>>>> wsdls in place in the right area.  We cannot ship those because of
>>>>>>>>>>> potential copyright issues.  The easiest way to obtain the right
>>>>>>>>>>> dependencies is:
>>>>>>>>>>>
>>>>>>>>>>> ant download-dependencies
>>>>>>>>>>>
>>>>>>>>>>> Then, just build normally:
>>>>>>>>>>>
>>>>>>>>>>> ant build
>>>>>>>>>>>
>>>>>>>>>>> This will only work for ManifoldCF-0.4-incubating, or trunk.
>>>>>>>>>>> 0.4-incubating is still in the process of being signed off by the
>>>>>>>>>>> incubator, but you can find the release candidate here:
>>>>>>>>>>>
>>>>>>>>>>> http://people.apache.org/~kwright
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jan 20, 2012 at 7:02 AM, Silvia, Daniel [USA]
>>>>>>>>>>> <Si...@bah.com> wrote:
>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>
>>>>>>>>>>>> I work with Matt Parker and we are in the process of developing a pipeline
>>>>>>>>>>>> that uses ManifoldCF at the beginning. I just subscribed to the
>>>>>>>>>>>> connectors-user-subscribe@incubator.apache.org
>>>>>>>>>>>> group yesterday and submitted an e-mail question to the group. Can you help
>>>>>>>>>>>> us with the below issue?
>>>>>>>>>>>>
>>>>>>>>>>>> I downloaded MCF and started playing with the default setup under Jetty and
>>>>>>>>>>>> Derby. It starts up without any issue. I am trying to configure a SharePoint
>>>>>>>>>>>> connector, connecting to SharePoint Service 3. I have been following the
>>>>>>>>>>>> instructions and I am at the point of deploying the custom SharePoint web
>>>>>>>>>>>> service to the SharePoint instance. The instructions indicate that I should
>>>>>>>>>>>> get the web service from dist/sharepoint-integration after building MCF.
>>>>>>>>>>>> However, after looking through the entire directory structure, I am unable
>>>>>>>>>>>> to find the service to deploy.
>>>>>>>>>>>>
>>>>>>>>>>>> Can someone tell me where to find this service?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>
>>>>>>>>>>>> Daniel Silvia

RE: ManifoldCF's dist/shapoint-integration dir

Posted by "Silvia, Daniel [USA]" <Si...@bah.com>.
Hi Karl

I have added the specific log4j lines for Http Client wire logging and restarted the ManifoldCF instance. I was also able to see the web service Lists.asmx through IE. When reviewing the log files, I could see some of the content that resides in the SharePoint instance coming back in the response to the request. However, I am still seeing error messages in the ManifoldCF GUI as well as in the log file: "Bad Envelope: HTML", "No service named ListsSoap is available", and "No service named http://schemas.microsoft.com/sharepoint/soap/GetListCollection is available".

Could there be something going on with the way the services are being built on the client side?

Appreciate your help.

Dan



________________________________________
From: Karl Wright [daddywri@gmail.com]
Sent: Tuesday, January 24, 2012 4:52 PM
To: Silvia, Daniel [USA]; connectors-user@incubator.apache.org
Subject: Re: ManifoldCF's dist/shapoint-integration dir

I have not seen this exact problem before.

The "Bad envelope tag: HTML" indicates that the SOAP request the
SharePoint connector is attempting to perform is, in fact, returning
an HTML response.  This usually indicates that the server or path
parameters you've used to set up the connection are not set correctly,
and SharePoint is not actually being engaged.
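
One way to check this by hand is to take a response body captured in the wire log and look at its root element. The following is only a rough sketch, not part of ManifoldCF or Axis: the function name classify_response is hypothetical, and it assumes the service answers with a SOAP 1.1 envelope.

```python
import xml.etree.ElementTree as ET

# Assumption: the web service replies with a SOAP 1.1 envelope.
SOAP_11_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def classify_response(body: str) -> str:
    """Return 'soap', 'html', or 'other' based on the root element of a response body."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Real-world HTML is often not well-formed XML, so fall back to a crude prefix check.
        head = body.lstrip().lower()
        return "html" if head.startswith(("<!doctype html", "<html")) else "other"
    if root.tag == f"{{{SOAP_11_NS}}}Envelope":
        return "soap"
    # Drop any namespace prefix ("{ns}name" -> "name") before comparing.
    local_name = root.tag.rsplit("}", 1)[-1].lower()
    return "html" if local_name == "html" else "other"
```

If a body pasted from the log classifies as "html", the request most likely reached a login page, an error page, or the wrong path rather than the web service itself.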

But I don't recall a ConfigurationException being logged when that
happens, unless that is how Axis reacts to the HTML.

The best thing to do at this point is turn on Http Client wire
logging, restart ManifoldCF, and view the connection.  The log will
then contain a record of the exact SOAP requests and the responses,
and we can see what's wrong.  The technique is described here:

https://cwiki.apache.org/confluence/display/CONNECTORS/Debugging+Connections

You can also confirm that the right SharePoint web services are
functioning on the machine in question by trying to access them
directly.  For the Lists web service, which is the one it sounds like
it was complaining about, try using IE (not Firefox etc because you
want NTLM support) to go to the url where you think the web service
lives.  This will be http: or https:, plus the server, plus the port,
plus the path, plus "_vti_bin/Lists.asmx".  You should see an
unequivocal SharePoint response.  For an example from the Microsoft
demo service, try http://www.wssdemo.com/_vti_bin/Lists.asmx.
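
The URL assembly described above can be sketched as a small helper. This is only an illustration: the function name lists_service_url and its parameters are hypothetical and do not correspond to anything in ManifoldCF.

```python
from typing import Optional


def lists_service_url(scheme: str, server: str, port: Optional[int] = None, site_path: str = "") -> str:
    """Build scheme://server[:port]/[site_path/]_vti_bin/Lists.asmx."""
    if scheme not in ("http", "https"):
        raise ValueError("scheme must be 'http' or 'https'")
    host = server if port is None else f"{server}:{port}"
    # Ignore empty segments so leading or trailing slashes in site_path do no harm.
    segments = [s for s in site_path.split("/") if s]
    segments.append("_vti_bin/Lists.asmx")
    return f"{scheme}://{host}/" + "/".join(segments)
```

For the demo service mentioned above, lists_service_url("http", "www.wssdemo.com") gives http://www.wssdemo.com/_vti_bin/Lists.asmx. The result should match what you would get by combining the server, port, and path fields entered for the repository connection.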

Please let me know how it goes, and cc the dev list (as I have) so a
record of what you're encountering can be made available to others.

Thanks!
Karl




On Tue, Jan 24, 2012 at 1:52 PM, Silvia, Daniel [USA]
<Si...@bah.com> wrote:
> Hi Karl
>
> I have downloaded the newest version of ManifoldCF, v0.4, and have run the necessary ant scripts to download dependencies and then built the entire project. I have also had the SharePoint web service MetCarta.SharePoint.MCPermissionsService.wsp deployed on the SharePoint instance, because we are running version 3 of SharePoint (SharePoint 2007). When I try to create a Repository Connection and select "Save", I get a message on the ManifoldCF front end of "org.xml.sax.SAXException Bad envelope tag: HTML". When I look at the log file I see the error message "org.apache.axis.ConfigurationException: No service named ListsSoap is available".
>
> Can you tell me if you have seen this issue before and what may be causing this issue?
>
> Thanks for your help.
>
> Dan
> ________________________________________
> From: Karl Wright [daddywri@gmail.com]
> Sent: Friday, January 20, 2012 7:31 AM
> To: Silvia, Daniel [USA]
> Subject: Re: ManifoldCF's dist/shapoint-integration dir
>
> Hi Daniel,
>
> In order for the SharePoint connector to build, you need to have the
> wsdls in place in the right area.  We cannot ship those because of
> potential copyright issues.  The easiest way to obtain the right
> dependencies is:
>
> ant download-dependencies
>
> Then, just build normally:
>
> ant build
>
> This will only work for ManifoldCF-0.4-incubating, or trunk.
> 0.4-incubating is still in the process of being signed off by the
> incubator, but you can find the release candidate here:
>
> http://people.apache.org/~kwright
>
> Thanks,
> Karl
>
>
>
> On Fri, Jan 20, 2012 at 7:02 AM, Silvia, Daniel [USA]
> <Si...@bah.com> wrote:
>> Hi Karl
>>
>> I work with Matt Parker and we are in the process of developing a pipeline
>> that uses ManifoldCF at the beginning. I just subscribed to the
>> connectors-user-subscribe@incubator.apache.org
>> group yesterday and submitted an e-mail question to the group. Can you help
>> us with the below issue?
>>
>> I downloaded MCF and started playing with the default setup under Jetty and
>> Derby. It starts up without any issue. I am trying to configure a SharePoint
>> connector, connecting to SharePoint Service 3. I have been following the
>> instructions and I am at the point of deploying the custom SharePoint web
>> service to the SharePoint instance. The instructions indicate that I should
>> get the web service from dist/sharepoint-integration after building MCF.
>> However, after looking through the entire directory structure, I am unable
>> to find the service to deploy.
>>
>> Can someone tell me where to find this service?
>>
>> Thanks for your help.
>>
>> Daniel Silvia