Posted to solr-user@lucene.apache.org by Phong Dais <ph...@gmail.com> on 2010/10/26 12:50:26 UTC

Highlighting for non-stored fields

Hi,

I've been looking through the mailing list archive for the past week and I haven't
found any useful info regarding this issue.

My requirement is to index a few terabytes worth of data to be searched.
Due to the size of the data, I would like to index without storing but I
would like to use the highlighting feature.  Is this even possible?  What
are my options?

I've read that termOffsets or payloads could possibly be used to do this,
but I have no idea how this could be done.

Any pointers greatly appreciated.  Someone please point me in the right
direction.

I don't mind having to write some code or dig through existing code to
accomplish this task.

Thanks,
P.

Re: Highlighting for non-stored fields

Posted by Erick Erickson <er...@gmail.com>.
Also, consider what you'd be reconstructing from if you tried it. The
indexed data has been transformed by, say, stemming, lowercasing, etc. So
any attempt to reconstruct the fields for highlighting would necessarily
show the transformed version, which would not be pleasing. Plus you could
have synonyms in there...
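
To make the point concrete, here is a small sketch in Python. The `analyze`
function is a toy stand-in for an index-time analyzer chain (lowercasing,
stopword removal, a crude plural stripper). It is not Solr's or Lucene's
actual analysis; the stopword list and the stemming rule are assumptions
chosen only to show how far the indexed terms drift from the original text.

```python
import re

# Toy stand-in for an index-time analysis chain. The stopword list and the
# plural-stripping rule are illustrative assumptions, not Lucene's behaviour.
STOPWORDS = {"the", "a", "of", "in"}


def analyze(text):
    """Return the terms a (toy) analyzer would put in the index."""
    terms = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        term = token.lower()
        if term in STOPWORDS:
            continue  # stopwords never reach the index
        # Crude "stemming": strip a trailing 's' from longer words.
        if len(term) > 3 and term.endswith("s"):
            term = term[:-1]
        terms.append(term)
    return terms


original = "The Highlighting of Stored Fields in Solr"
indexed_terms = analyze(original)

# Joining the indexed terms is the best "reconstruction" the index allows.
reconstructed = " ".join(indexed_terms)
# reconstructed == "highlighting stored field solr"
```

Casing, stopwords, punctuation and the plural are all gone, so a snippet
built from the indexed terms alone would not look like the source text.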

I've used a variant on Pradeep's suggestion with good results...

Best
Erick

Re: Highlighting for non-stored fields

Posted by Phong Dais <ph...@gmail.com>.
Thanks for the insight.
This is definitely a feasible solution because I only need to highlight when
the user opens the document.
I guess the easiest way I can do this is to "reuse" the Solr code (with some
modification) in my own application.

Re: Highlighting for non-stored fields

Posted by Pradeep Singh <pk...@gmail.com>.
Another way you can do this: after the search has completed, load the
field in your application, write separate code to reanalyze that
field/document, index it in RAM, and run it through the highlighter classes.
All of this happens as part of your web application, outside of Solr.
Considering the size of your data, storing it doesn't look advisable because
you would be almost doubling the size of your index (if you are looking to
highlight on a field, it's probably going to be full of content).
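
The flow described above can be sketched roughly as follows. This is a
minimal Python illustration, not the Lucene highlighter classes themselves:
`analyze_with_offsets` and `highlight` are hypothetical helpers, the
"analysis" is just lowercasing on word boundaries, and `document_text`
stands in for content the application has loaded from wherever the
original lives.

```python
import re


def analyze_with_offsets(text):
    """Yield (term, start, end) for each token, keeping character offsets."""
    for match in re.finditer(r"\w+", text):
        yield match.group().lower(), match.start(), match.end()


def highlight(text, query, pre="<em>", post="</em>"):
    """Wrap every token of `text` whose analyzed form matches a query term."""
    query_terms = {term for term, _, _ in analyze_with_offsets(query)}
    out = []
    last = 0  # end of the last span copied into `out`
    for term, start, end in analyze_with_offsets(text):
        if term in query_terms:
            out.append(text[last:start])               # untouched text
            out.append(pre + text[start:end] + post)   # original casing kept
            last = end
    out.append(text[last:])
    return "".join(out)


# Hard-coded here; in practice this is loaded after the search completes.
document_text = "Solr can highlight stored fields. Highlight support is built in."
result = highlight(document_text, "highlight")
```

Because the text is re-analyzed at display time with its offsets, the
snippet shows the original wording and casing, and nothing extra has to be
kept in the index.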

-Pradeep

Re: Highlighting for non-stored fields

Posted by Phong Dais <ph...@gmail.com>.
Hi,

I understand that I need to store the fields in order to use highlighting
"out of the box".
I'm looking for a way to highlight using term offsets instead of the
actual text, since the text is not stored.  What I am asking is: is it possible
to modify the response (through a custom implementation) to contain highlighted
offsets instead of the actual matched text?  Should I be writing my own
DefaultHighlighter, or overriding some of its functionality?  Can this be
done this way, or am I way off?
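
For illustration, this is roughly what the client side of such an
offsets-only response could look like. It is a sketch in Python under
stated assumptions: `apply_offsets` is a hypothetical helper, the offsets
are assumed to be end-exclusive character positions into the original
text, and the `hits` list is made up rather than taken from any real Solr
response.

```python
def apply_offsets(text, offsets, pre="<em>", post="</em>"):
    """Wrap each (start, end) character span of `text` in pre/post markers.

    `offsets` are assumed to be end-exclusive character offsets into `text`.
    Overlapping or touching spans are merged so markers never nest.
    """
    merged = []
    for start, end in sorted(offsets):
        if merged and start <= merged[-1][1]:
            # Extend the previous span instead of opening a nested one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    out = []
    last = 0  # end of the last span copied into `out`
    for start, end in merged:
        out.append(text[last:start])
        out.append(pre + text[start:end] + post)
        last = end
    out.append(text[last:])
    return "".join(out)


# Hypothetical response: offsets only, no text. The text itself comes from
# the application's own copy of the document.
text = "index without storing but still highlight"
hits = [(32, 41), (0, 5)]
marked = apply_offsets(text, hits)
# marked == "<em>index</em> without storing but still <em>highlight</em>"
```

This only works if the offsets were computed against exactly the same text
the application later loads; any change to the source document after
indexing would shift the spans.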

BTW, I'm using solr-1.4.

Thanks,
P.

Re: Highlighting for non-stored fields

Posted by Israel Ekpo <is...@gmail.com>.
Check out this link

http://wiki.apache.org/solr/FieldOptionsByUseCase

You need to store the field if you want to use the highlighting feature.

If you need to retrieve and display the highlighted snippets, then the field
definitely needs to be stored.

To use term offsets, it is a good idea to enable the following
attributes for that field: termVectors, termPositions, termOffsets.

The only issue here is that your storage costs will increase because of
these extra features.

Nevertheless, you definitely need to store the field if you need to retrieve
it for highlighting purposes.

-- 
°O°
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/

Re: Highlighting for non-stored fields

Posted by Alessandro Benedetti <be...@gmail.com>.
We developed a custom Highlighter to solve this issue.
We added a "url" field to the Solr schema for our domain, and when
highlighting is requested, we access the file, extract the information, and
send it to the custom highlighter.

If you still need some help, I can provide our solution in detail!
Cheers

-- 
--------------------------

Benedetti Alessandro
Personal Page: http://tigerbolt.altervista.org

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England