Posted to solr-user@lucene.apache.org by "Thung, Peter C CIV SPAWARSYSCEN-PACIFIC, 56340" <pe...@navy.mil> on 2009/09/28 11:12:09 UTC
Question on trying to index an XML document...
With a basically default install of the trunk version of Solr 1.4,
when I try to index an XML file, the XML tags appear to get
stripped during indexing.
If the tag names and their frequencies are important to me for search
purposes, could someone tell me what
my options are for keeping Solr from stripping out the XML tags?
For example,
if I have an XML element such as
<tag1> hello </tag1>
I'd like to see tag1 appear twice as a term and count as 2 in some
term frequency vector.
I was trying out the examples from this link
http://wiki.apache.org/solr/ExtractingRequestHandler
and sending in an XML file.
Would I need to modify some existing code, or is it just a configuration
change to not strip out XML tags in processing?
-Peter
******************************************************************
Peter Thung
Software Developer
IBS Project Technical Lead -Web Developer
Code 56340 - Net-centric ISR Development Branch
Joint & National ISR Systems Division
Intelligence, Surveillance and Reconnaissance Department
US Navy Space & Naval Warfare Systems Center Pacific (SSC PAC)
Topside Campus, Bldg A33, room 0055
53560 Hull Street, San Diego, CA 92152
UNCLASS Email: peter.thung@navy.mil
SIPRNET Email: thungp@spawar.navy.smil.mil
COMM (Primary): (619) 553-6513
COMM (Secondary):(619) 553-0777
FAX: (619) 553-1586
******************************************************************
Re: Question on trying to index an XML document...
Posted by Lance Norskog <go...@gmail.com>.
Another way to index XML data is to use the normal Solr XML updater
and wrap your XML documents inside CDATA blocks.
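
As a rough illustration of the CDATA approach, here is a minimal Python sketch that builds a Solr XML update message with the raw markup wrapped in a CDATA section. The field names "id" and "text" and the helper names are assumptions for illustration only; substitute the fields defined in your own schema.xml. Whether tag names such as tag1 actually end up as indexed terms still depends on the analyzer configured for that field.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def cdata(text):
    # A CDATA section cannot contain "]]>", so split that sequence
    # across two adjacent CDATA sections.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def build_add_message(doc_id, raw_xml, id_field="id", body_field="text"):
    # Hypothetical helper: the default field names are assumptions,
    # not something the original thread specifies.
    return (
        "<add><doc>"
        f"<field name={quoteattr(id_field)}>{escape(doc_id)}</field>"
        f"<field name={quoteattr(body_field)}>{cdata(raw_xml)}</field>"
        "</doc></add>"
    )


message = build_add_message("doc1", "<tag1> hello </tag1>")
print(message)

# Sanity check: an XML parser hands back the wrapped markup as plain
# field text, tags included, rather than as child elements.
fields = {f.get("name"): f.text for f in ET.fromstring(message).iter("field")}
```

The resulting message would then be POSTed to the XML update handler and followed by a commit. To get per-term frequencies back, the target field would presumably also need term vectors enabled in the schema; check this against your own configuration.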
--
Lance Norskog
goksron@gmail.com