Posted to solr-user@lucene.apache.org by mkorthuis <mi...@gemvara.com> on 2011/10/14 04:45:55 UTC

SolrJ stripping Carriage Returns

We recently updated our Solr and Solr indexing from DIH using Solr 1.4 to our
own Hadoop import using SolrJ and Solr 3.4.   

While everything seems to be working, we seem to have one stumper of a
problem.  

Any document that has a string field value containing a carriage return "\r"
has that carriage return stripped before it is added to the index.  Line
feeds "\n" are not stripped.

Example: I am adding the following to my SolrInputDocument - "Peter\r\nPan".
This is what is in the index after commit: "Peter\nPan".

This did not occur with the DIH.  

Thoughts? Is there a way to stop SolrJ from stripping carriage returns?

--
View this message in context: http://lucene.472066.n3.nabble.com/SolrJ-stripping-Carriage-Returns-tp3420557p3420557.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: SolrJ stripping Carriage Returns

Posted by mkorthuis <mi...@gemvara.com>.
Hmm..  Reason:

When I debug it in Eclipse, I can verify that the value I set in the
SolrInputDocument includes '\r\n'.  However, the index only has '\n'.  It is
just a simple string field in Solr.
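
Where a debugger is not available (inside a Hadoop task, for example), the same check can be done by logging the value with its control characters made visible at each stage. The following is only a minimal sketch using plain JDK calls; the class and method names are hypothetical and nothing in it is part of SolrJ.

```java
class ControlChars {
    /** Returns s with CR, LF and tab written out as visible escapes. */
    static String visible(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\r': out.append("\\r"); break;
                case '\n': out.append("\\n"); break;
                case '\t': out.append("\\t"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Log the field value just before it is handed to the indexing client.
        System.out.println(visible("Peter\r\nPan")); // prints Peter\r\nPan
    }
}
```

Logging the same field this way before it is set on the document and again after it is read back from the index narrows down the step at which the "\r" disappears.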

On Fri, Oct 14, 2011 at 2:23 PM, Chris Hostetter-3 [via Lucene] <
ml-node+s472066n3422431h58@n3.nabble.com> wrote:

> : We recently updated our Solr and Solr indexing from DIH using Solr 1.4 to our
> : own Hadoop import using SolrJ and Solr 3.4.
>         ...
> : Any document that has a string field value with a carriage return "\r" is
> : having that carriage return stripped before being added to the index.  All
> : line breaks "\n" are not being stripped.
>         ...
> : This did not occur with the DIH.
> :
> : Thoughts? Is there a way to not have solrJ strip all carriage returns?
>
> What makes you think this is SolrJ?  If it is, you should be able to
> create a ~10 line test of SolrJ demonstrating this with hard-coded data.
>
> I suspect your data is getting cleaned somewhere else in your data flow
> that didn't exist when DIH was fetching it directly.
>
> -Hoss



-- 
Michael Korthuis
Platform Architect
Gemvara Inc.



Re: SolrJ stripping Carriage Returns

Posted by Chris Hostetter <ho...@fucit.org>.
: We recently updated our Solr and Solr indexing from DIH using Solr 1.4 to our
: own Hadoop import using SolrJ and Solr 3.4.   
	...
: Any document that has a string field value with a carriage return "\r" is
: having that carriage return stripped before being added to the index.  All
: line breaks "\n" are not being stripped.  
	...
: This did not occur with the DIH.  
: 
: Thoughts? Is there a way to not have solrJ strip all carriage returns?

What makes you think this is SolrJ?  If it is, you should be able to
create a ~10 line test of SolrJ demonstrating this with hard-coded data.

I suspect your data is getting cleaned somewhere else in your data flow 
that didn't exist when DIH was fetching it directly.
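
One place where such cleaning can happen, offered only as something to rule out and not as a diagnosis of this thread, is XML end-of-line handling. The XML specification requires a parser to normalize a literal "\r\n" to "\n" before parsing, so a carriage return survives only when it is written as the character reference &#13;. Whether any step in this particular pipeline passes the value through XML is an assumption. The sketch below uses only the JDK's DOM parser, not SolrJ, and its class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class XmlLineEndingCheck {
    /** Wraps the given body in a <field> element, parses it, and returns the text the parser reports. */
    static String roundTrip(String rawFieldBody) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            String xml = "<field>" + rawFieldBody + "</field>";
            return builder.parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement()
                          .getTextContent();
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers need no throws clause.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Literal CR LF in the XML text: the parser normalizes it to LF.
        String literal = roundTrip("Peter\r\nPan");
        System.out.println("literal CR kept: " + literal.contains("\r"));  // prints false

        // CR written as a character reference: it is preserved.
        String escaped = roundTrip("Peter&#13;\nPan");
        System.out.println("escaped CR kept: " + escaped.contains("\r"));  // prints true
    }
}
```

If the hard-coded test loses the "\r" only when the document is sent as XML, this normalization is a likely candidate; if it is lost regardless of transport, the cause lies elsewhere in the data flow.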



-Hoss