Posted to solr-user@lucene.apache.org by Eran Buchnick <bu...@gmail.com> on 2020/12/01 05:30:46 UTC

Can solr index replacement character

Hi community,
During integration tests with a new data source I noticed a weird scenario
where the replacement character can't be searched, though it seems to be
stored. Honestly, I don't want that irrelevant data stored in my index, but
I wondered whether Solr can index the replacement character (U+FFFD �) as a
string and, if so, how to search for it.
And in general, is there any built-in character filtering?

Thanks

Re: Can solr index replacement character

Posted by Erick Erickson <er...@gmail.com>.
Solr handles UTF-8, so it should be able to. The problem you’ll have is
getting the UTF-8 characters through all the various transport
encodings, i.e. if you search from a browser, you need to encode the
character so the browser passes it through. If you search through SolrJ,
it needs to be encoded at that level. If you use cURL, it needs yet
another encoding.
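
For the browser/HTTP case, here is a minimal sketch of what "encoding it" could look like, using only the Python standard library. The host, the core name (mycore) and the field name (text_s) are made up for illustration and are not from this thread; whether the query then matches anything still depends on how that field's analysis chain treats U+FFFD.

```python
from urllib.parse import quote, urlencode

REPLACEMENT_CHAR = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def percent_encode(text: str) -> str:
    """Percent-encode text as UTF-8 bytes so it survives in a URL."""
    return quote(text, safe="")


def build_select_url(base_url: str, field: str, term: str) -> str:
    """Build a /select URL that queries `field` for `term`.

    `base_url` and `field` are hypothetical; adjust them to your setup.
    """
    params = urlencode({"q": f"{field}:{term}"})
    return f"{base_url}/select?{params}"


if __name__ == "__main__":
    # U+FFFD is the three bytes EF BF BD in UTF-8.
    print(percent_encode(REPLACEMENT_CHAR))
    # Assumed local Solr URL and core name, for illustration only.
    print(build_select_url("http://localhost:8983/solr/mycore",
                           "text_s", REPLACEMENT_CHAR))
```

The same percent-encoded form (%EF%BF%BD) is what you would paste into a browser address bar or a cURL URL if the client does not encode the character for you.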