Posted to dev@manifoldcf.apache.org by "Karl Wright (JIRA)" <ji...@apache.org> on 2014/09/17 14:49:33 UTC

[jira] [Commented] (CONNECTORS-1034) Manifold 1.7

    [ https://issues.apache.org/jira/browse/CONNECTORS-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137164#comment-14137164 ] 

Karl Wright commented on CONNECTORS-1034:
-----------------------------------------

Hi Edgardo,

First, since this is the same issue as CONNECTORS-956, and CONNECTORS-956 is still open, please let's close this issue and discuss your problem in that ticket.

Second, the issue is that SolrJ (and, apparently, Solr as well, to some extent) simply does not support field names containing characters outside a very specific set.  Until Solr changes this behavior, we cannot fix it.  Even if you managed to send a field name that included an illegal character to SolrJ and therefore to Solr, there's no guarantee that it would work.  URL encoding is not ideal for this purpose, so if you could look up the list of disallowed field name characters, we could try to be more specific about which characters we encode and which we don't.
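
To make the encoding question concrete, here is a minimal sketch in Java.  It is not ManifoldCF's actual preEncode() code: the class name FieldNameEncoder, the method encode(), and the choice of ASCII letters, digits and underscore as the "safe" set are all assumptions made for illustration.  It is only meant to be consistent with the cmis:name -> cmis_3Aname mapping in the report quoted below.

```java
import java.nio.charset.StandardCharsets;

public class FieldNameEncoder {
    // Assumption: only ASCII letters, digits and '_' are passed through unchanged.
    // The real list of characters Solr accepts would replace this check.
    private static boolean isSafe(int c) {
        return (c >= 'a' && c <= 'z')
            || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9')
            || c == '_';
    }

    // Replaces every unsafe UTF-8 byte with '_' followed by two uppercase hex digits.
    public static String encode(String fieldName) {
        StringBuilder sb = new StringBuilder();
        for (byte b : fieldName.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            if (isSafe(c)) {
                sb.append((char) c);
            } else {
                sb.append('_').append(String.format("%02X", c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("cmis:name"));
    }
}
```

Because '_' is itself passed through, this sketch is not reversible: a field literally named cmis_3Aname and an encoded cmis:name end up identical.  Narrowing the safe set to exactly what Solr allows would reduce how many names get rewritten, but it would not remove that ambiguity.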

Third, the behavior of SolrJ with regard to this issue is very broken.  SolrJ originally did not do anything to ensure that legal XML was generated for field names, because its developers assumed that nobody would be using field names that contained illegal characters.  So, no encoding at all will almost certainly lead to badly formed XML for many or even most documents, unless SolrJ has been changed to address this issue.  (I opened a SOLR ticket for this problem, but the Solr team declined to fix it for many releases, and since then I've lost track.)

Fourth, we now have backwards compatibility issues, because people have named their Solr fields based on ManifoldCF's workaround behavior for the above problems.  Your suggestion of a UI switch would address ONLY this last issue.

So, given all that, let's continue the discussion in the CONNECTORS-956 ticket, and I'll close this one.



> Manifold 1.7
> ------------
>
>                 Key: CONNECTORS-1034
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1034
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Solr-4.x-component
>    Affects Versions: ManifoldCF 1.7
>            Reporter: Edgardo Ambrosi
>              Labels: patch
>
> Following the issue CONNECTORS-956, since the behavior makes ManifoldCF unusable for Alfresco-Solr-based environments, because it is impossible to correctly populate Solr, could you provide at least a solution such as
> a checkbox in the "job specification" JSP page, tab "Solr Field Mapping", near "Keep All Metadata", to choose whether to apply preEncode() or not.
> Our Use Case is: 
> Alfresco Server 4.2 enterprise, ManifoldCF, Solr server 4.7.1.
> Set a repo connection type CMIS, 
> Set a output connection type Solr, 
> Set a job with cmis query as "select * from cmis:document" (the repo has only 1 document),
> Running the job, it ends normally, but...
> querying Solr, the result set reports a strange encoding of the field name:
> if in Alfresco the field is named: cmis:name
> then in Solr, after ManifoldCF has populated it, the index contains the encoded field as cmis_3Aname
> Best


