Posted to solr-user@lucene.apache.org by Divyanand Tiwari <di...@gmail.com> on 2013/03/06 07:54:37 UTC

Re: How can I instruct Solr / Solr Cell to output the original HTML document that was fed to it?

Hi Chris, thank you for replying. My "content" field in the schema is
stored="true" and indexed="false", because I copy the "content" field
into the "text" field, which is indexed="true" by default.
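
A minimal sketch of the schema.xml setup described above. The field type
"text_general", the stored="false" and multiValued="true" settings on
"text" are assumptions borrowed from the stock Solr example schema; only
the stored/indexed values on "content" and the copy into "text" come from
the description.

```xml
<!-- Sketch only: type names and the attributes on "text" are assumptions. -->

<!-- "content" is kept for display but is not searched directly. -->
<field name="content" type="text_general" indexed="false" stored="true"/>

<!-- "text" is the catch-all search field that queries run against. -->
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

<!-- Copy everything placed in "content" into "text" at index time. -->
<copyField source="content" dest="text"/>
```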

My problem is this: I can search the HTML documents I fed to Solr, but
the result returned by Tika/ExtractingRequestHandler is a stripped-down
version of each HTML document, so I cannot present the document in its
original format on my site. :(

Based on Jack's reply, I got the idea of writing my own request handler,
and I am working on it.
I'll post an update if I come up with a solution. Any help is most
welcome!

Thank you all for your support!


On Fri, Feb 22, 2013 at 6:42 AM, Chris Hostetter <ho...@fucit.org> wrote:

>
> : Hi everyone, i am new to solr technology and not getting a way to get back
> : the original HTML document with Hits highlighted into it. what
> : configuration and where i can do to instruct SolrCell/ Tika so that it does
> : not strips down the tags of HTML document in the content field.
>
> I _think_ what you want is simply to ensure that you have a "content"
> field in your schema which is stored="true" (and indexed="true" if you
> want to search on it directly) ... and then ExtractingRequestHandler will
> put the entire XHTML it generates from the documents you index into that
> field.
>
> http://wiki.apache.org/solr/ExtractingRequestHandler
>
> If that isn't what you had in mind, then you need to provide us with more
> details about what you've tried, what results you get, and how exactly
> those results differ from what you want to get.
>
>
> -Hoss
>



-- 
Regards,
Divyanand Tiwari