Posted to dev@manifoldcf.apache.org by "Karl Wright (JIRA)" <ji...@apache.org> on 2019/02/05 13:56:00 UTC
[jira] [Assigned] (CONNECTORS-1579) Error when crawling a MSSQL table
[ https://issues.apache.org/jira/browse/CONNECTORS-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wright reassigned CONNECTORS-1579:
---------------------------------------
Assignee: Karl Wright
> Error when crawling a MSSQL table
> ---------------------------------
>
> Key: CONNECTORS-1579
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1579
> Project: ManifoldCF
> Issue Type: Bug
> Components: JDBC connector
> Affects Versions: ManifoldCF 2.12
> Reporter: Donald Van den Driessche
> Assignee: Karl Wright
> Priority: Major
> Attachments: 636_bb2.csv
>
>
> When I crawl an MSSQL table through the JDBC connector, I get the following error on multiple rows:
>
> {noformat}
> FATAL 2019-02-05T13:21:58,929 (Worker thread '40') - Error tossed: Multiple document primary component dispositions not allowed: document '636'
> java.lang.IllegalStateException: Multiple document primary component dispositions not allowed: document '636'
> at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.checkMultipleDispositions(WorkerThread.java:2125) ~[mcf-pull-agent.jar:?]
> at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.noDocument(WorkerThread.java:1624) ~[mcf-pull-agent.jar:?]
> at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.noDocument(WorkerThread.java:1605) ~[mcf-pull-agent.jar:?]
> at org.apache.manifoldcf.crawler.connectors.jdbc.JDBCConnector.processDocuments(JDBCConnector.java:944) ~[?:?]
> at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]{noformat}
> I searched for this error online, and the results suggested it might be caused by the same key being used for different rows.
> I checked, but I couldn't find any duplicates in any of the fields selected in the JDBC queries.
> Here are my queries:
> Seeding query:
> {code:java}
> SELECT pk1 as $(IDCOLUMN)
> FROM dbo.bb2
> WHERE search_url IS NOT NULL
> AND mimetype IS NOT NULL AND mimetype NOT IN ('unknown/unknown', 'application/xml', 'application/zip');
> {code}
> Version check query: none
> Access token query: none
> Data query:
>
>
> {code:java}
> SELECT
> pk1 AS $(IDCOLUMN),
> search_url AS $(URLCOLUMN),
> ISNULL(content, '') AS $(DATACOLUMN),
> doc_id,
> search_url AS url,
> ISNULL(title, '') as title,
> ISNULL(groups,'') as groups,
> ISNULL(type,'') as document_type,
> ISNULL(users, '') as users
> FROM dbo.bb2
> WHERE pk1 IN $(IDLIST);
> {code}
> The attached CSV contains the corresponding row from the table.
> [^636_bb2.csv]
>
> Because of this problem, the whole crawl pipeline is stalled: it keeps retrying this row.
> Could you help me understand this error?
>
>
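The following is a hypothetical, simplified Java model of the rule named in the error message. It is meant to make that message concrete. The rule is that, within one processing pass, each document may receive only one primary disposition, such as an ingest or a noDocument call.

The class and method names (DispositionSketch, DispositionTracker, processPass, duplicateIds) are invented for illustration. They do not reflect ManifoldCF's actual WorkerThread or JDBCConnector code.

The model shows how a data query that returns the same $(IDCOLUMN) value twice would trip such a check. It also includes a small helper that lists repeated IDs in a result set.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model: NOT ManifoldCF's actual WorkerThread/JDBCConnector code.
public class DispositionSketch {

    /** Allows at most one primary disposition per document ID within one processing pass. */
    static final class DispositionTracker {
        private final Map<String, String> dispositions = new HashMap<>();

        void record(String documentId, String disposition) {
            String previous = dispositions.putIfAbsent(documentId, disposition);
            if (previous != null) {
                throw new IllegalStateException(
                    "Multiple document primary component dispositions not allowed: document '"
                        + documentId + "'");
            }
        }

        int size() {
            return dispositions.size();
        }
    }

    /**
     * Simulates one pass: each row returned by the data query is "ingested",
     * then every requested ID that returned no row gets "noDocument".
     */
    static void processPass(List<String> requestedIds, List<String> returnedRowIds,
                            DispositionTracker tracker) {
        Set<String> unseen = new LinkedHashSet<>(requestedIds);
        for (String rowId : returnedRowIds) {
            tracker.record(rowId, "ingest"); // a repeated row ID fails here
            unseen.remove(rowId);
        }
        for (String missingId : unseen) {
            tracker.record(missingId, "noDocument");
        }
    }

    /** Returns the IDs that occur more than once in the returned rows, in first-seen order. */
    static List<String> duplicateIds(List<String> returnedRowIds) {
        Set<String> seen = new HashSet<>();
        Set<String> dupes = new LinkedHashSet<>();
        for (String id : returnedRowIds) {
            if (!seen.add(id)) {
                dupes.add(id);
            }
        }
        return new ArrayList<>(dupes);
    }

    public static void main(String[] args) {
        // One row per requested ID: every document gets exactly one disposition.
        DispositionTracker ok = new DispositionTracker();
        processPass(List.of("635", "636"), List.of("635", "636"), ok);
        System.out.println("clean pass recorded " + ok.size() + " dispositions");

        // The data query returns document 636 twice: the second record() call throws.
        List<String> rows = List.of("635", "636", "636");
        System.out.println("duplicate IDs: " + duplicateIds(rows));
        try {
            processPass(List.of("635", "636"), rows, new DispositionTracker());
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
{code}

In the reported trace, the second disposition comes from a noDocument call (JDBCConnector.java:944). The real sequence of calls for document '636' may therefore differ from this model. The sketch only illustrates the one-disposition-per-document invariant.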
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)