Posted to solr-user@lucene.apache.org by Shalin Shekhar Mangar <sh...@gmail.com> on 2008/04/19 07:41:04 UTC

Re: indexing text containing xml tags

CC'ing the solr-user mailing list because that is the right list for usage
questions.
You'll need to XML-encode the contents of your title field. Basically you need
to replace '<' with &lt;, '>' with &gt; and '&' with &amp;, then you will be
able to index them.
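
A minimal sketch of that encoding step, assuming the update XML is built in Python (the thread does not say which language the poster uses). The helper name to_field is hypothetical; escape and quoteattr come from the standard library's xml.sax.saxutils module.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Title from the original question, with its inline markup intact.
title = ("pK<small><sub>a</sub></small> Values of the Opened Form of a "
         "Thieno-1,2,4-triazolo-1,4-diazepine in Water")

def to_field(name, value):
    """Build one <field> element with XML-encoded text (hypothetical helper)."""
    # escape() replaces '&', '<' and '>' with '&amp;', '&lt;' and '&gt;'.
    # quoteattr() returns the attribute value already wrapped in quotes.
    return "<field name=%s>%s</field>" % (quoteattr(name), escape(value))

field_xml = to_field("title", title)
print(field_xml)

# Round trip: an XML parser hands back the original text, tags included.
assert ET.fromstring(field_xml).text == title
```

The encoded field is now well-formed XML, and a parser reading it recovers the title with the tags still in it as plain text.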

On Sat, Apr 19, 2008 at 10:54 AM, Saurabh Kataria <sk...@ist.psu.edu>
wrote:

>
> Hi everyone,
>
> I am having a problem while indexing my document. A very typical field of
> my document looks like:
>
> <field name="title">pK<small><sub>a</sub></small> Values of the Opened
> Form of a
> Thieno-1,2,4-triazolo-1,4-diazepine in Water</field>
>
> Solr has a problem indexing this because of the XML tags. I was wondering
> if there is any way that I can index this field "title" without stripping
> off my tags. If anyone could help me out, that would be great.
>
> Thanks,
> SK.
>



-- 
Regards,
Shalin Shekhar Mangar.

RE: indexing text containing xml tags

Posted by "Norskog, Lance" <la...@divvio.com>.
We wrap everything in CDATA tags. Works great. 
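
A rough sketch of the CDATA approach, again assuming Python; the helper name cdata is hypothetical and not something from this thread. The one thing to watch for is a literal "]]>" inside the value, which would end the section early, so the sketch splits it across two adjacent sections.

```python
import xml.etree.ElementTree as ET

# Title from the original question, with its inline markup intact.
title = ("pK<small><sub>a</sub></small> Values of the Opened Form of a "
         "Thieno-1,2,4-triazolo-1,4-diazepine in Water")

def cdata(value):
    """Wrap text in a CDATA section (hypothetical helper)."""
    # A literal "]]>" in the value is split so that "]]" closes one
    # section and ">" opens the next.
    return "<![CDATA[" + value.replace("]]>", "]]]]><![CDATA[>") + "]]>"

field_xml = '<field name="title">' + cdata(title) + "</field>"
print(field_xml)

# Round trip: the parser returns the text exactly as it was wrapped.
assert ET.fromstring(field_xml).text == title
```

Inside a CDATA section the '<' and '&' characters need no encoding, which is why the tags can stay as written.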


RE: indexing text containing xml tags

Posted by Saurabh Kataria <sk...@ist.psu.edu>.
Thanks Shalin. That worked. Also, I will make sure to post to the right mailing list next time :).

Saurabh.


