Posted to dev@manifoldcf.apache.org by "Karl Wright (JIRA)" <ji...@apache.org> on 2017/06/21 11:36:00 UTC

[jira] [Updated] (CONNECTORS-1434) Bad characters in file name can cause Solr 500 errors

     [ https://issues.apache.org/jira/browse/CONNECTORS-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright updated CONNECTORS-1434:
------------------------------------
    Attachment: CONNECTORS-1434.patch

Tentative patch, which escapes the file name according to a hint found on Stack Overflow.


> Bad characters in file name can cause Solr 500 errors
> -----------------------------------------------------
>
>                 Key: CONNECTORS-1434
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1434
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Lucene/SOLR connector
>    Affects Versions: ManifoldCF 2.7
>            Reporter: Karl Wright
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 2.8
>
>         Attachments: CONNECTORS-1434.patch
>
>
> There are reports that quotes or spaces in a file name can cause Solr indexing of the document to fail with a 500 error.
> The code in question (from ModifiedHttpSolrClient) is the following:
> {code}
>             String name = content.getName();
>             if (name == null) {
>               name = "";
>             }
>             parts.add(new FormBodyPart(name,
>                 new InputStreamBody(
>                     content.getStream(),
>                     contentType,
>                     content.getName())));
> {code}
> ... where content.getName() may return a name containing illegal characters.  The question is: what does HttpClient do with this name, and should the name be escaped in some way before it is passed in?
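
For illustration only, here is a minimal sketch of one way a file name could be made safe before it is handed to InputStreamBody. It is not the contents of CONNECTORS-1434.patch, and the class and method names (FilenameEscapeSketch, escapeForQuotedString) are hypothetical. It assumes the file name ends up inside a double-quoted filename="..." parameter of the part's Content-Disposition header, so it backslash-escapes quotes and backslashes and replaces control characters. Whether Solr accepts this form is an assumption that would need to be checked against a live instance.

```java
// Hypothetical helper; NOT taken from the attached patch.
// Assumption: the value is emitted inside filename="..." in a
// multipart Content-Disposition header.
final class FilenameEscapeSketch {
    private FilenameEscapeSketch() {}

    /** Returns a value that can sit between double quotes without ending the quoted string early. */
    static String escapeForQuotedString(String name) {
        if (name == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder(name.length() + 8);
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '"' || c == '\\') {
                // Backslash-escape the two characters that are special inside a quoted string.
                sb.append('\\').append(c);
            } else if (c < 0x20 || c == 0x7f) {
                // Control characters (including CR and LF) could break the header line; replace them.
                sb.append('_');
            } else {
                // Spaces and all other characters pass through unchanged.
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Under these assumptions, the snippet quoted above would pass FilenameEscapeSketch.escapeForQuotedString(content.getName()) as the last argument of InputStreamBody instead of the raw content.getName(). Spaces are left alone in this sketch because they are legal inside a quoted string; if the receiving side still rejects them, percent-encoding the name would be an alternative to evaluate.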


