You are viewing a plain text version of this content. The canonical link for it is here.

Posted to java-user@lucene.apache.org by Li Li <fa...@gmail.com> on 2010/07/07 08:07:49 UTC

How to manage resource out of index?

I used to store full text into lucene index. But I found it's very
slow when merging index because when merging 2 segments it copy the
fdt files into a new one. So I want to only index full text. But When
searching I need the full text for applications such as hightlight and
view full text. I can store the full text by <url,full text> pair in
database and load it to memory. And When I search in lucene(or solr),
I retrive url of doc first, then use url to get full text. But when
they are stored separately, it is hard to managed. They may be not
consistent with each other. Does lucene or solr provied any method to
ease this problem? Or any one  has some experience of this problem?

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: How to manage resource out of index?

Posted by Li Li <fa...@gmail.com>.

thank you.

2010/7/7 Rebecca Watson <be...@gmail.com>:
> hi li,
>
> i looked at doing something similar - where we only index the text
> but retrieve search results / highlight from files -- we ended up giving
> up because of the amount of customisation required in solr -- mainly
> because we wanted the distributed search functionality in solr which
> meant making
> sure the original file ended up the same filing system i.e. machine too!).
>
> we ended up just storing the main text field too even though there was a
> bit of text -- in the end solr/lucene can handle the index size fine and
> disk space is cheaper than man-hours to customise solr/lucene to work
> in this way!
>
> that was our conclusion anyway and it works fine -- we also have
> separate index / search server(s) so we don't care about merge time
> either -- and as i said above - we use the distributed search so don't tend
> to need to merge very large indexes anyway.
> when your system grows / you go into production you'll probably split
> the indexes too to use solr's distributed search func. for the sake of
> query speed).
>
> hope that helps,
>
> bec :)
>
> On 7 July 2010 14:07, Li Li <fa...@gmail.com> wrote:
>> I used to store full text into lucene index. But I found it's very
>> slow when merging index because when merging 2 segments it copy the
>> fdt files into a new one. So I want to only index full text. But When
>> searching I need the full text for applications such as hightlight and
>> view full text. I can store the full text by <url,full text> pair in
>> database and load it to memory. And When I search in lucene(or solr),
>> I retrive url of doc first, then use url to get full text. But when
>> they are stored separately, it is hard to managed. They may be not
>> consistent with each other. Does lucene or solr provied any method to
>> ease this problem? Or any one  has some experience of this problem?
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: How to manage resource out of index?

Posted by Rebecca Watson <be...@gmail.com>.

hi li,

i looked at doing something similar - where we only index the text
but retrieve search results / highlight from files -- we ended up giving
up because of the amount of customisation required in solr -- mainly
because we wanted the distributed search functionality in solr which
meant making
sure the original file ended up the same filing system i.e. machine too!).

we ended up just storing the main text field too even though there was a
bit of text -- in the end solr/lucene can handle the index size fine and
disk space is cheaper than man-hours to customise solr/lucene to work
in this way!

that was our conclusion anyway and it works fine -- we also have
separate index / search server(s) so we don't care about merge time
either -- and as i said above - we use the distributed search so don't tend
to need to merge very large indexes anyway.
when your system grows / you go into production you'll probably split
the indexes too to use solr's distributed search func. for the sake of
query speed).

hope that helps,

bec :)

On 7 July 2010 14:07, Li Li <fa...@gmail.com> wrote:
> I used to store full text into lucene index. But I found it's very
> slow when merging index because when merging 2 segments it copy the
> fdt files into a new one. So I want to only index full text. But When
> searching I need the full text for applications such as hightlight and
> view full text. I can store the full text by <url,full text> pair in
> database and load it to memory. And When I search in lucene(or solr),
> I retrive url of doc first, then use url to get full text. But when
> they are stored separately, it is hard to managed. They may be not
> consistent with each other. Does lucene or solr provied any method to
> ease this problem? Or any one  has some experience of this problem?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: How to manage resource out of index?

Posted by Rebecca Watson <be...@gmail.com>.

hi li,

i looked at doing something similar - where we only index the text
but retrieve search results / highlight from files -- we ended up giving
up because of the amount of customisation required in solr -- mainly
because we wanted the distributed search functionality in solr which
meant making
sure the original file ended up the same filing system i.e. machine too!).

we ended up just storing the main text field too even though there was a
bit of text -- in the end solr/lucene can handle the index size fine and
disk space is cheaper than man-hours to customise solr/lucene to work
in this way!

that was our conclusion anyway and it works fine -- we also have
separate index / search server(s) so we don't care about merge time
either -- and as i said above - we use the distributed search so don't tend
to need to merge very large indexes anyway.
when your system grows / you go into production you'll probably split
the indexes too to use solr's distributed search func. for the sake of
query speed).

hope that helps,

bec :)

On 7 July 2010 14:07, Li Li <fa...@gmail.com> wrote:
> I used to store full text into lucene index. But I found it's very
> slow when merging index because when merging 2 segments it copy the
> fdt files into a new one. So I want to only index full text. But When
> searching I need the full text for applications such as hightlight and
> view full text. I can store the full text by <url,full text> pair in
> database and load it to memory. And When I search in lucene(or solr),
> I retrive url of doc first, then use url to get full text. But when
> they are stored separately, it is hard to managed. They may be not
> consistent with each other. Does lucene or solr provied any method to
> ease this problem? Or any one  has some experience of this problem?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org