Posted to user@manifoldcf.apache.org by Mark Libucha <ml...@gmail.com> on 2013/10/10 18:04:10 UTC

Crawling SharePoint Lists

I have successfully crawled documents with my SharePoint
RepositoryConnector (MCF 1.3), so I know that things are set up correctly.

But I can't crawl a list.

I'm using a Filesystem output connector, and the job appears to run
successfully (2 of 2 documents), but I see nothing in the output directory
when the job is finished. In fact, the output directory is not even created.

As far as I can determine, I followed the instructions under
"Example: How to index SharePoint 2010 Lists" exactly. And I am using
SharePoint 2010.

My "Greg" list shows up in the MCF UI under Lists, and I choose it.

My job looks like this:
Path rules:
  Path match   Rule type   Action
  /Greg        list        include

Metadata:
  Path match   Action    All metadata?   Fields
  /Greg/*      include   true

Suggestions?

Thanks,

Mark

Re: Crawling SharePoint Lists

Posted by Mark Libucha <ml...@gmail.com>.
trunk is no better.

When I turn on debug logging, I see the SharePoint processDocument() calls
all end in exceptions that look something like this:

org.apache.axis.ConfigurationException: No service named
http://schemas.microsoft.com/sharepoint/soap/directory/GetUserCollectionFromGroup
is available

Those URLs 404, but does that matter? Any other suggestions?

Thanks,

Mark




Re: Crawling SharePoint Lists

Posted by Mark Libucha <ml...@gmail.com>.
Thanks, guys. I'll give these both a try and let you know what I discover.
Appreciate the help.



Re: Crawling SharePoint Lists

Posted by Karl Wright <da...@gmail.com>.
Hi Mark,



There are some issues with list crawling that we found in 1.3.  I suggest
that you try trunk; it will likely work better for you.



Karl





Re: Crawling SharePoint Lists

Posted by Karl Wright <da...@gmail.com>.
I've opened a ticket for this: CONNECTORS-787.

Karl



Re: Crawling SharePoint Lists

Posted by Karl Wright <da...@gmail.com>.
Confirmed: the Items member is risky to use in large lists because there is
no paging (so it can cause the SharePoint instance to run out of memory):

"The Items property returns all the files in a document library, including
files in subfolders, but not the folders themselves. In a document library,
folders are not considered items.

When you call the Items property, it returns an instance of an
SPListItemCollection
<http://msdn.microsoft.com/en-us/library/sharepoint/microsoft.sharepoint.splistitemcollection%28v=office.14%29.aspx>
object that does not contain any data, but on first access to an item from
the collection, the entire collection object is filled with data.
Consequently, to improve performance it is recommended that you assign the
items returned by Items to an SPListItemCollection
<http://msdn.microsoft.com/en-us/library/sharepoint/microsoft.sharepoint.splistitemcollection%28v=office.14%29.aspx>
object if you must iterate the entire collection, as seen in the example.
It is best practice is to use one of the GetItem* methods of SPList
<http://msdn.microsoft.com/en-us/library/sharepoint/microsoft.sharepoint.splist%28v=office.14%29.aspx>
to return a filtered collection of items."


So that's why we haven't been doing it that way.  We need the proper CAML
expression which will allow full return of the discussion board contents.

Nevertheless I'll open a ticket for this functionality; no idea how to
complete it though.

Karl





Re: Crawling SharePoint Lists

Posted by Karl Wright <da...@gmail.com>.
For discussion boards, then, the SharePoint C# API must not be working
properly, or we are using it incorrectly.  SharePoint API bugs are way
beyond my pay grade to fix.  If you think we are using it improperly, you
may have other resources than I have, which is basically just the web page
here:

http://msdn.microsoft.com/en-us/library/Microsoft.SharePoint.SPList.GetItems%28v=office.14%29.aspx

... and the one describing SPQuery objects here:

http://msdn.microsoft.com/en-us/library/microsoft.sharepoint.spquery.query%28v=office.14%29.aspx

Specifically I'm missing a description of the schema of Discussion Boards,
and how you'd construct a CAML query to get the missing rows.  All of this
stuff is pretty mysterious because if it is documented at all it is
documented in obscure places.  More full-time Microsoft coders seem to have
a similar problem, see:

http://social.msdn.microsoft.com/Forums/sharepoint/en-US/afe07483-6aec-424a-9434-c8e8b963e55c/how-to-get-all-the-items-from-a-discussion-board?forum=sharepointdevelopmentlegacy

... where they didn't figure out how to do it either, other than the advice
"don't do it that way" or just use the "Items" field, which I'm not sure
works in cases where the number of items in the list is large (I'll look
into this though).  Maybe you can experiment with the API directly under
SharePoint, and recommend C# code changes that will return the missing
rows, and if so I am happy to implement it and release it.

Thanks,
Karl






Re: Crawling SharePoint Lists

Posted by Mark Libucha <ml...@gmail.com>.
Pretty sure. Screenshot attached.



>>>> On Tue, Oct 15, 2013 at 4:36 PM, Karl Wright <da...@gmail.com>wrote:
>>>>
>>>>> This is the problem:
>>>>>
>>>>>
>>>>> SharePoint: Can't get version of '/DiscussStuff///Giants/3_.000'
>>>>> because modified date or attachment url not found
>>>>>
>>>>> It looks like it decided that the list item was in fact an attachment,
>>>>> which makes sense because it was a compound list id.
>>>>>
>>>>> I'll open a ticket for this.
>>>>>
>>>>> Thanks!
>>>>> Karl
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Oct 15, 2013 at 3:40 PM, Mark Libucha <ml...@gmail.com>wrote:
>>>>>
>>>>>>
>>>>>> On Tue, Oct 15, 2013 at 11:50 AM, Karl Wright <da...@gmail.com>wrote:
>>>>>>
>>>>>>> Do you see any of these in the log?
>>>>>>
>>>>>>
>>>>>>
>>>>>> There are 3 rows in my discussion group -- two topics, one post in
>>>>>> each, one with a reply. In the logs I'm only seeing one of them (the
>>>>>> chronologically last to be put into SharePoint).
>>>>>>
>>>>>> The log looks like this -- maybe that last message means it's choking
>>>>>> on this list and giving up on processing it further?
>>>>>>
>>>>>> Thanks...
>>>>>>
>>>>>>
>>>>>> SharePoint: Document identifier is a list: '/DiscussStuff'
>>>>>>
>>>>>> SharePoint: getListItems xml response: '<GetListItems xmlns="
>>>>>> http://schemas.microsoft.com/sharepoint/soap/directory/"><GetListItemsResponse
>>>>>> xmlns=""><GetListItemsResult
>>>>>> FileRef="Lists/DiscussStuff/Giants/3_.000"/></GetListItemsResponse></GetListItems>'
>>>>>>
>>>>>> SharePoint: Checking whether to include list item
>>>>>> '/DiscussStuff/Giants/3_.000'
>>>>>>
>>>>>> SharePoint: Getting version of '/DiscussStuff///Giants/3_.000'
>>>>>>
>>>>>> SharePoint: Checking whether to include list item attachment
>>>>>> '/DiscussStuff/Giants/3_.000'
>>>>>>
>>>>>> SharePoint: Can't get version of '/DiscussStuff///Giants/3_.000'
>>>>>> because modified date or attachment url not found
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Crawling SharePoint Lists

Posted by Karl Wright <da...@gmail.com>.
Are you sure you haven't deleted two of these rows?  Because the method
call on the server side is pretty generic:

SPListItemCollection collListItems = oList.GetItems(listQuery);

... where listQuery is this:

    SPQuery listQuery = new SPQuery();
    listQuery.Query = "<OrderBy Override=\"TRUE\"><FieldRef Name=\"FileRef\" /></OrderBy>";
    listQuery.QueryThrottleMode = SPQueryThrottleOption.Override;
    listQuery.ViewAttributes = "Scope=\"Recursive\"";
    listQuery.ViewFields = "<FieldRef Name='FileRef' />";
    listQuery.RowLimit = 1000;

It's the same code that is used for all other lists as well, and those do
not suffer any lost rows - I tested that just now against Dmitry's
SharePoint instance.

Karl




Re: Crawling SharePoint Lists

Posted by Mark Libucha <ml...@gmail.com>.
Hi Karl,

Thanks for the quick attention. It's better, but not fixed?

I am now getting metadata for the one list row we were choking on before,
but it doesn't see the other two rows at all. I think the relevant part of
the log is this:

SharePoint: Document identifier is a list: '/DiscussStuff'

SharePoint: getListItems xml response: '<GetListItems xmlns="
http://schemas.microsoft.com/sharepoint/soap/directory/"><GetListItemsResponse
xmlns=""><GetListItemsResult
FileRef="Lists/DiscussStuff/Giants/3_.000"/></GetListItemsResponse></GetListItems>'

There should be a 1_.000 and a 2_.000 as well.

Maybe the problem is in the webapp on the SharePoint server?

Thanks again for all the help.

Mark




Re: Crawling SharePoint Lists

Posted by Karl Wright <da...@gmail.com>.
Just resolved this ticket, on trunk.

Please synch up and try again.

Karl




Re: Crawling SharePoint Lists

Posted by Karl Wright <da...@gmail.com>.
CONNECTORS-786.

I've prioritized this as very high because this is functionality that used
to work but is now broken because I added attachment support.  With luck I
will be able to look at it later tonight.

Karl




Re: Crawling SharePoint Lists

Posted by Karl Wright <da...@gmail.com>.
This is the problem:

SharePoint: Can't get version of '/DiscussStuff///Giants/3_.000'
because modified date or attachment url not found

It looks like it decided that the list item was in fact an attachment,
which makes sense because it was a compound list id.

I'll open a ticket for this.

Thanks!
Karl





Re: Crawling SharePoint Lists

Posted by Mark Libucha <ml...@gmail.com>.
On Tue, Oct 15, 2013 at 11:50 AM, Karl Wright <da...@gmail.com> wrote:

> Do you see any of these in the log?



There are 3 rows in my discussion group -- two topics, one post in each,
one with a reply. In the logs I'm only seeing one of them (the
chronologically last to be put into SharePoint).

The log looks like this -- maybe that last message means it's choking on
this list and giving up on processing it further?

Thanks...


SharePoint: Document identifier is a list: '/DiscussStuff'

SharePoint: getListItems xml response: '<GetListItems xmlns="
http://schemas.microsoft.com/sharepoint/soap/directory/"><GetListItemsResponse
xmlns=""><GetListItemsResult
FileRef="Lists/DiscussStuff/Giants/3_.000"/></GetListItemsResponse></GetListItems>'

SharePoint: Checking whether to include list item
'/DiscussStuff/Giants/3_.000'

SharePoint: Getting version of '/DiscussStuff///Giants/3_.000'

SharePoint: Checking whether to include list item attachment
'/DiscussStuff/Giants/3_.000'

SharePoint: Can't get version of '/DiscussStuff///Giants/3_.000' because
modified date or attachment url not found
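
One way to check how many rows the server actually returned is to pull the
FileRef values out of the logged getListItems response. This is only a rough
sketch, not connector code: the class and method names are made up for
illustration, and it assumes the XML has been copied out of the log with the
line wraps removed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of ManifoldCF.
public class ListItemsResponseCheck {

    // Returns the FileRef attribute of every GetListItemsResult element.
    public static List<String> fileRefs(String xml) throws Exception {
        // The default factory is not namespace-aware, so plain tag names work.
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        NodeList results = doc.getElementsByTagName("GetListItemsResult");
        List<String> refs = new ArrayList<>();
        for (int i = 0; i < results.getLength(); i++) {
            refs.add(((Element) results.item(i)).getAttribute("FileRef"));
        }
        return refs;
    }

    public static void main(String[] args) throws Exception {
        // The response from the log above, joined onto one logical line.
        String xml =
            "<GetListItems xmlns=\"http://schemas.microsoft.com/sharepoint/soap/directory/\">"
            + "<GetListItemsResponse xmlns=\"\">"
            + "<GetListItemsResult FileRef=\"Lists/DiscussStuff/Giants/3_.000\"/>"
            + "</GetListItemsResponse></GetListItems>";
        List<String> refs = fileRefs(xml);
        System.out.println(refs.size() + " item(s): " + refs);
    }
}
```

For the response logged above this reports a single item, the 3_.000 row,
which matches what the rest of the log shows.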

Re: Crawling SharePoint Lists

Posted by Karl Wright <da...@gmail.com>.
Hi Mark,

Since you are seeing no entries whatsoever from underneath the discussion
list, but the discussion list itself is present, we know that the list
itself is being processed.  If you turn on connector debugging (property
org.apache.manifoldcf.connectors set to DEBUG), and recrawl, you should see
some of the following messages in the log pertaining to the discussion
group:

    Logging.connectors.debug( "SharePoint: Document identifier is a list: '" + siteListPath + "'" );

    Logging.connectors.debug("SharePoint: No list found for list '"+siteListPath+"' - deleting");

    Logging.connectors.debug("SharePoint: Access token lookup failed for list '"+siteListPath+"' - deleting");

    Logging.connectors.debug("SharePoint: Field list lookup failed for list '"+siteListPath+"' - deleting");

    Logging.connectors.debug("SharePoint: GUID lookup failed for list '"+siteListPath+"' - deleting");

Which of these do you see?

If the code *does* manage to discover list items, you should see messages
like these:

    Logging.connectors.debug("SharePoint: List '"+decodedListPath+"' no longer exists - deleting item '"+documentIdentifier+"'");

    Logging.connectors.debug( "SharePoint: Processing list item '"+documentIdentifier+"'; url: '" + itemUrl + "'" );

    Logging.connectors.debug("SharePoint: Item metadata fetch failure indicated that item is gone: '"+documentIdentifier+"' - removing");

Do you see any of these in the log?

Karl




Re: Crawling SharePoint Lists

Posted by Mark Libucha <ml...@gmail.com>.
Yes, screen shot attached.




On Tue, Oct 15, 2013 at 11:35 AM, Karl Wright <da...@gmail.com> wrote:

> Hi Mark,
>
> If you get a Document Status Report after running the job, do you see the
> missing list's document identifier in the queue?
>
> Karl
>
>
>

Re: Crawling SharePoint Lists

Posted by Karl Wright <da...@gmail.com>.
Hi Mark,

If you get a Document Status Report after running the job, do you see the
missing list's document identifier in the queue?

Karl




Re: Crawling SharePoint Lists

Posted by Mark Libucha <ml...@gmail.com>.
On Tue, Oct 15, 2013 at 10:30 AM, Karl Wright <da...@gmail.com> wrote:

> Can you please describe what goes wrong with "Discussion Boards"?



I'm using a Filesystem output connector. I've added some debug code in
there at the very top of the addOrReplaceDocument() method which prints out
the uri, fields etc. (Printing it out because the Filesystem connector
doesn't write this stuff to disk for lists.)

So, when I choose from the Job's "Add List" dropdown, if it's a task list
(like "Tasks"), or a contact list, addOrReplaceDocument() gets called for
each row of list data and my code prints out the uri and the fields.

However, if the list is a discussion group, addOrReplaceDocument() never
gets called.

I repeated the same test, and got the same results, using the Solr output
connector.

Mark

Re: Crawling SharePoint Lists

Posted by Karl Wright <da...@gmail.com>.
Can you please describe what goes wrong with "Discussion Boards"?

Karl




Re: Crawling SharePoint Lists

Posted by Mark Libucha <ml...@gmail.com>.
I made some progress on this...Dmitry turned out to be right, but I didn't
fully understand exactly what he was saying.

The mistake I made was entering a "Site path:" in my SharePoint
Repository Connection definition. Doing that seems to cause the Job "Path"
and "Metadata" specifications to fail whenever I try to include a SharePoint
list.

So, I left "Site path:" blank, and my site showed up correctly under
"Sites" on the Job page, which was what Dmitry was trying to tell me. Choosing
it there, and specifying "Path" and "Metadata", works for my lists now.

However, there is still (at least) one thing that is not working.
"Discussion Boards" show up as Lists, but they are not crawled correctly
(SharePoint 2010).

Thanks again for the feedback.

Mark



Re: Crawling SharePoint Lists

Posted by Dmitry Goldenberg <dg...@kmwllc.com>.
Mark,
I think you may have to build up the fuller path for that list, like
include the site before it...
- Dmitry

