Posted to solr-user@lucene.apache.org by neerajp <ne...@yahoo.com> on 2013/10/22 12:54:27 UTC

Solr indexing on urlencoded fields

Hi,
I am a new Solr user. I need to integrate Solr with my email application for
searching.
My code is in C++, so I am making REST requests to post the data to Solr for
indexing.

I have fields such as from, to, subject, and body, and they can contain
characters which need to be urlencoded. So I made a REST request to Solr
with content type text/xml, where the HTTP POST body contains an XML file with
the fields in urlencoded form. After successfully posting the data to Solr, I
opened the Solr GUI page to check the format of the data I had posted, and I see
that the data is in urlencoded format. I have a couple of questions here:

1. Is urlencoded data received by Solr not decoded automatically?
2. Which tokenizer should I use to index urlencoded fields?
3. Will that tokenizer work with stop words?

Please excuse my limited knowledge of Solr.




--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-indexing-on-urlencoded-fields-tp4096994.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr indexing on urlencoded fields

Posted by neerajp <ne...@yahoo.com>.
Thanks for sharing your valuable thoughts.
I used CDATA to escape the special characters ('<', '>', '&', etc.) in the XML
file.
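
For anyone taking the same CDATA route, here is a minimal C++ sketch of wrapping a field value. The helper name wrapCdata is hypothetical and not part of any Solr client library. One caveat it handles: the sequence "]]>" cannot appear inside a CDATA section, so the sketch closes the section after "]]" and reopens a new one before ">". It assumes the value contains only characters that are legal in XML.

```cpp
#include <string>

// Wrap arbitrary text in one or more CDATA sections.
// A literal "]]>" in the input would end the section early, so it is
// split across two adjacent sections: "...]]" + "]]><![CDATA[" + ">...".
std::string wrapCdata(const std::string& text) {
    static const std::string kEnd = "]]>";
    std::string out = "<![CDATA[";
    std::string::size_type pos = 0;
    for (;;) {
        const std::string::size_type hit = text.find(kEnd, pos);
        if (hit == std::string::npos) {
            // No terminator left: copy the remainder as-is.
            out.append(text, pos, std::string::npos);
            break;
        }
        // Copy up to and including the "]]" of the terminator,
        // close this section, and open a new one for the ">".
        out.append(text, pos, hit - pos + 2);
        out += "]]><![CDATA[";
        pos = hit + 2;
    }
    out += "]]>";
    return out;
}
```

The result can be placed directly between <field name="body"> and </field> in the posted XML; the body text itself is neither entity-escaped nor urlencoded.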



--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-indexing-on-urlencoded-fields-tp4096994p4097138.html

Re: Solr indexing on urlencoded fields

Posted by Erik Hatcher <er...@gmail.com>.
You only URL-encode data that is in the URL. If you're posting Solr XML, you need to encode entities appropriately for it to be valid XML, but that's not the same as URL encoding.

	Erik
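
To illustrate the distinction above, here is a minimal C++ sketch of entity-encoding a field value for the POST body instead of urlencoding it. It assumes the update message uses Solr's <add><doc><field name="...">...</field></doc></add> XML layout and that the values contain no control characters that are illegal in XML. The helper names xmlEscape and solrField are hypothetical, not part of any Solr library.

```cpp
#include <string>

// Replace the five characters that have predefined XML entities.
// This is XML escaping, not URL (percent) encoding.
std::string xmlEscape(const std::string& raw) {
    std::string out;
    out.reserve(raw.size());
    for (char c : raw) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}

// Build one <field> element for a Solr XML update document.
std::string solrField(const std::string& name, const std::string& value) {
    return "<field name=\"" + xmlEscape(name) + "\">" +
           xmlEscape(value) + "</field>";
}
```

For example, solrField("subject", "Q&A <urgent>") produces <field name="subject">Q&amp;A &lt;urgent&gt;</field>. The XML parser turns the entities back into the original characters, so the field is indexed as the original text and nothing needs to be urldecoded on the Solr side.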
