
Posted to dev@lucene.apache.org by "Alexandre Rafalovitch (JIRA)" <ji...@apache.org> on 2018/05/26 02:43:00 UTC

[jira] [Commented] (SOLR-12403) CSVLoader cannot split fields that contain new lines

    [ https://issues.apache.org/jira/browse/SOLR-12403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491474#comment-16491474 ] 

Alexandre Rafalovitch commented on SOLR-12403:
----------------------------------------------

I believe the problem is caused by *CSVLoaderBase.FieldSplitter* calling *parser.getLine()*.

The outer code uses the same call, but there the encapsulator character triggers the parser branch that consumes the newlines as part of the field. However, when the parser is called again on that field's value (which still contains the newlines), it does not know the value was encapsulated, so it reads only the first line and ignores the rest.
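
To illustrate the suspected mechanism, here is a minimal, self-contained sketch. It does not use Solr's classes: *SplitNewlineSketch* and *firstRecord* are hypothetical names, and the toy parser only models the "read one record, stop at an unquoted newline" behaviour that I assume *parser.getLine()* has.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitNewlineSketch {

    // Hypothetical stand-in for a "read one line" parser call: returns the
    // fields of the first record only. A newline ends the record unless it
    // sits inside the encapsulator (quote) characters.
    public static List<String> firstRecord(String input, char separator, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == quote) {
                inQuotes = !inQuotes;          // encapsulators are dropped from the value
            } else if (c == separator && !inQuotes) {
                fields.add(field.toString());  // field boundary
                field.setLength(0);
            } else if (c == '\n' && !inQuotes) {
                break;                         // end of record; remaining input is ignored
            } else {
                field.append(c);               // includes newlines inside quotes
            }
        }
        fields.add(field.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Outer pass: the quotes protect the newlines, so text1 keeps all three lines.
        String row = "1,\"t1.line1\nt1.line2\nt1.line3\",t2\n";
        List<String> outer = firstRecord(row, ',', '"');
        String text1 = outer.get(1);

        // Second pass (the field split): the value no longer carries quotes,
        // so the first newline ends the record and the rest is dropped.
        List<String> split = firstRecord(text1, '^', '"');
        System.out.println(split);             // prints [t1.line1]
    }
}
```

In this model the second pass loses everything after the first newline regardless of which separator is configured, which matches the behaviour described in the issue.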

> CSVLoader cannot split fields that contain new lines
> ----------------------------------------------------
>
>                 Key: SOLR-12403
>                 URL: https://issues.apache.org/jira/browse/SOLR-12403
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: update
>    Affects Versions: 7.3
>            Reporter: Alexandre Rafalovitch
>            Priority: Minor
>
> It is possible to import CSV that contains newlines in the field content; the field just needs to be escaped.
> However, if that field is split, any content from lines after the first is lost. It does not matter whether the split character is a newline or anything else, or whether it occurs in the value at all.
> Example:
> {code:java}
> id,text1,text2
> 1,"t1.line1
> t1.line2
> t1.line3",t2
> 2,t1.oneline,t2.oneline
> {code}
> {code:java}
> // bin/solr create -c splittest
> // bin/post -c splittest test.csv (creates "text1":["t1.line1\nt1.line2\nt1.line3"])
> // bin/post -c splittest -params "f.text1.split=true&f.text1.separator=^" test.csv (creates "text1":["t1.line1"])
> {code}



