Posted to dev@manifoldcf.apache.org by "Karl Wright (JIRA)" <ji...@apache.org> on 2013/01/31 00:19:13 UTC

[jira] [Created] (CONNECTORS-630) Documents whose names begin with hash mark blow up the ManifoldCF solr connector

Karl Wright created CONNECTORS-630:
--------------------------------------

             Summary: Documents whose names begin with hash mark blow up the ManifoldCF solr connector
                 Key: CONNECTORS-630
                 URL: https://issues.apache.org/jira/browse/CONNECTORS-630
             Project: ManifoldCF
          Issue Type: Bug
          Components: Lucene/SOLR connector
    Affects Versions: ManifoldCF 1.1
            Reporter: Karl Wright
            Assignee: Karl Wright
            Priority: Critical
             Fix For: ManifoldCF 1.2


If a document has a name with a hash symbol (#) in it, and you try to ingest that into Solr via the Solr connector, SolrJ throws an IllegalArgumentException and the worker thread goes into an infinite loop.


{code}
FATAL 2013-01-30 17:46:13,664 (Worker thread '20') - Error tossed: Illegal character in query at index 537: http://localhost:8080/solr/Lisa/update/extract?literal.id=https%3A%2F%2Fopentextdev2.llan.ll.mit.edu%2Fcs%2Fllisapi.dll%3Ffunc%3Dll%26objID%3D1016599%26objAction%3Ddownload&literal.allow_token_document=LISA-Authority-DEV%3A1005367&literal.allow_token_document=LISA-Authority-DEV%3A68276&literal.allow_token_document=LISA-Authority-DEV%3A796642&literal.allow_token_document=LISA-Authority-DEV%3AGUEST&literal.allow_token_document=LISA-Authority-DEV%3ASYSTEM&literal.deny_token_document=LISA-Authority-DEV%3ADEAD_AUTHORITY&literal.Document Info:Keyword / Phrase=%3F&literal.general_creator=th23825&literal.Document Info:Performing Organization=%3F&literal.general_description=&literal.general_modifier=th23825&literal.general_creationdate=Wed+Nov+14+09%3A28%3A16+EST+2012&literal.Document Info:Document Date=%3F&literal.Document Info:Document Author(s)=%3F&literal.general_name=%23raodoc4.txt%3E&literal.ll_filename=%23raodoc4.txt%3E&literal.general_owner=th23825&literal.Document Info:Document Revision Notes=%3F&literal.Document Info:Data Classification=For+Laboratory+Use+Only+%28FLUO%29&literal.general_modifydate=Wed+Nov+14+09%3A28%3A16+EST+2012&literal.Document Info:Document Description=%3F&commitWithin=4000&wt=xml&version=2.2

java.lang.IllegalArgumentException: Illegal character in query at index 537: http://localhost:8080/solr/Lisa/update/extract?literal.id=https%3A%2F%2Fopentextdev2.llan.ll.mit.edu%2Fcs%2Fllisapi.dll%3Ffunc%3Dll%26objID%3D1016599%26objAction%3Ddownload&literal.allow_token_document=LISA-Authority-DEV%3A1005367&literal.allow_token_document=LISA-Authority-DEV%3A68276&literal.allow_token_document=LISA-Authority-DEV%3A796642&literal.allow_token_document=LISA-Authority-DEV%3AGUEST&literal.allow_token_document=LISA-Authority-DEV%3ASYSTEM&literal.deny_token_document=LISA-Authority-DEV%3ADEAD_AUTHORITY&literal.Document Info:Keyword / Phrase=%3F&literal.general_creator=th23825&literal.Document Info:Performing Organization=%3F&literal.general_description=&literal.general_modifier=th23825&literal.general_creationdate=Wed+Nov+14+09%3A28%3A16+EST+2012&literal.Document Info:Document Date=%3F&literal.Document Info:Document Author(s)=%3F&literal.general_name=%23raodoc4.txt%3E&literal.ll_filename=%23raodoc4.txt%3E&literal.general_owner=th23825&literal.Document Info:Document Revision Notes=%3F&literal.Document Info:Data Classification=For+Laboratory+Use+Only+%28FLUO%29&literal.general_modifydate=Wed+Nov+14+09%3A28%3A16+EST+2012&literal.Document Info:Document Description=%3F&commitWithin=4000&wt=xml&version=2.2

                at java.net.URI.create(Unknown Source)
                at org.apache.http.client.methods.HttpPost.<init>(HttpPost.java:76)
                at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:286)
                at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
                at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
                at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:797)

Caused by: java.net.URISyntaxException: Illegal character in query at index 537: http://localhost:8080/solr/Lisa/update/extract?literal.id=https%3A%2F%2Fopentextdev2.llan.ll.mit.edu%2Fcs%2Fllisapi.dll%3Ffunc%3Dll%26objID%3D1016599%26objAction%3Ddownload&literal.allow_token_document=LISA-Authority-DEV%3A1005367&literal.allow_token_document=LISA-Authority-DEV%3A68276&literal.allow_token_document=LISA-Authority-DEV%3A796642&literal.allow_token_document=LISA-Authority-DEV%3AGUEST&literal.allow_token_document=LISA-Authority-DEV%3ASYSTEM&literal.deny_token_document=LISA-Authority-DEV%3ADEAD_AUTHORITY&literal.Document Info:Keyword / Phrase=%3F&literal.general_creator=th23825&literal.Document Info:Performing Organization=%3F&literal.general_description=&literal.general_modifier=th23825&literal.general_creationdate=Wed+Nov+14+09%3A28%3A16+EST+2012&literal.Document Info:Document Date=%3F&literal.Document Info:Document Author(s)=%3F&literal.general_name=%23raodoc4.txt%3E&literal.ll_filename=%23raodoc4.txt%3E&literal.general_owner=th23825&literal.Document Info:Document Revision Notes=%3F&literal.Document Info:Data Classification=For+Laboratory+Use+Only+%28FLUO%29&literal.general_modifydate=Wed+Nov+14+09%3A28%3A16+EST+2012&literal.Document Info:Document Description=%3F&commitWithin=4000&wt=xml&version=2.2

                at java.net.URI$Parser.fail(Unknown Source)
                at java.net.URI$Parser.checkChars(Unknown Source)
                at java.net.URI$Parser.parseHierarchical(Unknown Source)
                at java.net.URI$Parser.parse(Unknown Source)
                at java.net.URI.<init>(Unknown Source)
                ... 6 more
{code}
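
For anyone trying to reproduce this outside ManifoldCF: the trace shows the failure coming from java.net.URI.create() while HttpPost is being constructed. Counting characters in the logged URL, index 537 appears to land on the unencoded space in the parameter name "literal.Document Info:Keyword / Phrase", whereas the file name itself is already encoded as %23raodoc4.txt%3E. The sketch below is not the connector's code; the class and method names (QueryEncodingSketch, encodeParam, parses) are made up for illustration. It only shows that URI.create() rejects a raw space in the query, and that the same parameter parses once both the name and the value are passed through URLEncoder.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

// Hypothetical helper class, not part of ManifoldCF or SolrJ.
class QueryEncodingSketch {

    // Form-encode BOTH the parameter name and its value, so that spaces,
    // colons, slashes, '#', '>' and similar characters never reach the
    // URI parser in raw form.
    static String encodeParam(String name, String value) {
        try {
            return URLEncoder.encode(name, "UTF-8") + "=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // True if java.net.URI accepts the string; false if URI.create()
    // throws IllegalArgumentException, as seen in the trace.
    static boolean parses(String url) {
        try {
            URI.create(url);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String base = "http://localhost:8080/solr/Lisa/update/extract?";

        // Value encoded, name left raw (the shape seen in the logged URL).
        String raw = base + "literal.Document Info:Document Date=%3F";

        // Name and value both encoded.
        String encoded = base + encodeParam("literal.Document Info:Document Date", "?");

        System.out.println(parses(raw));      // false
        System.out.println(parses(encoded));  // true
    }
}
```

Note that URLEncoder produces form encoding (a space becomes "+"), which matches how the values already appear in the logged URL. Whether the proper fix belongs in the connector's parameter handling or in SolrJ is not something this sketch tries to answer.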

