Posted to java-user@lucene.apache.org by Raghu Ram <ra...@gmail.com> on 2007/10/06 05:20:43 UTC

Group of documents.

Hi,
      We have an application in which we want to index feeds. Each feed is a
collection of articles and some other metadata. The problem is that
sometimes we want to search for feeds and sometimes for articles. As far as
I know, Lucene doesn't provide any abstraction for grouping its documents.
The only solution we have in mind now is to have two indexes, one
for articles and one for feeds. There are two problems with this approach:
1) redundancy
2) as feeds are just collections of articles and get updated, we have to
continuously update the document that represents the feed in the feed index.
Is this an efficient operation?

Are there any other solutions/hacks for this problem?
Thanks.
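A toy, plain-Java model of the two-index layout described above may help show where the update cost comes from. It is not Lucene code: `TwoIndexSketch`, its maps and its naive word matching are hypothetical stand-ins. The assumption it illustrates is that the feed document is rebuilt from all of its articles each time one is added; in Lucene an update is likewise a delete followed by an add (`IndexWriter.updateDocument(Term, Document)` in the 2.x API), so the whole feed document is re-indexed on every change.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical in-memory model of the "two indexes" layout (not Lucene):
// one document per article, plus one denormalized document per feed.
class TwoIndexSketch {
    // article index: articleId -> article text
    final Map<String, String> articleIndex = new LinkedHashMap<>();
    // feed index: feedId -> one document built from all of the feed's articles
    final Map<String, String> feedIndex = new LinkedHashMap<>();
    // bookkeeping needed to rebuild a feed document: feedId -> article ids
    private final Map<String, List<String>> feedArticles = new LinkedHashMap<>();

    void addArticle(String feedId, String articleId, String text) {
        articleIndex.put(articleId, text);
        List<String> ids = feedArticles.computeIfAbsent(feedId, k -> new ArrayList<>());
        ids.add(articleId);
        // "Update" the feed document by replacing it wholesale, as a
        // delete-then-add update would: the entire feed text is rebuilt.
        StringBuilder sb = new StringBuilder();
        for (String id : ids) {
            sb.append(articleIndex.get(id)).append(' ');
        }
        feedIndex.put(feedId, sb.toString().trim());
    }

    // Naive single-word lookup: ids whose text contains the word (case-insensitive).
    static List<String> search(Map<String, String> index, String word) {
        List<String> hits = new ArrayList<>();
        String w = word.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> e : index.entrySet()) {
            for (String token : e.getValue().toLowerCase(Locale.ROOT).split("\\W+")) {
                if (token.equals(w)) {
                    hits.add(e.getKey());
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TwoIndexSketch s = new TwoIndexSketch();
        s.addArticle("feedA", "a1", "Lucene release notes");
        s.addArticle("feedA", "a2", "Faceting tips");
        s.addArticle("feedB", "b1", "Lucene scoring explained");
        System.out.println(search(s.articleIndex, "lucene")); // [a1, b1]
        System.out.println(search(s.feedIndex, "faceting"));  // [feedA]
    }
}
```

In this model each new article costs one small article document plus a full rewrite of its feed document, so the feed-side work grows with the size of the feed.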

Re: Group of documents.

Posted by Chris Hostetter <ho...@fucit.org>.
: The only solution that we have in our minds now is to have two indexes one
: for articles and one for feeds. There are two problems with this approach
: 1) redundancy

This isn't really a "problem". A Lucene index is designed to make searching
fast, not to be a normalized data store; there are lots of little
redundancies inside a Lucene index to make searching faster, and you're
just talking about adding one more at a higher level.

Alternatively, you could just have an index of articles, and when you want to
"search for a feed" you would scan through every matching article to build
a list of matching feeds. This would be less "redundant", but it would
also probably be slower.



-Hoss
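The article-only alternative can be sketched the same way. This is again a hypothetical plain-Java model (the `FeedsFromArticles` and `Article` names are illustrative), assuming each article stores the id of the feed it came from; the feed list is derived at query time instead of being stored as its own document.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical model of "only an article index": feeds are found by
// scanning matching articles and collecting the feed id stored on each.
class FeedsFromArticles {
    static final class Article {
        final String text;
        final String feedId; // the feed this article came from

        Article(String text, String feedId) {
            this.text = text;
            this.feedId = feedId;
        }

        boolean matches(String word) {
            String w = word.toLowerCase(Locale.ROOT);
            for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (token.equals(w)) return true;
            }
            return false;
        }
    }

    // Checks every article here for simplicity (a real engine would walk only
    // the hit list); each matching article contributes its feed, once, in
    // first-seen order.
    static Set<String> matchingFeeds(List<Article> articles, String word) {
        Set<String> feeds = new LinkedHashSet<>();
        for (Article a : articles) {
            if (a.matches(word)) feeds.add(a.feedId);
        }
        return feeds;
    }

    public static void main(String[] args) {
        List<Article> index = Arrays.asList(
            new Article("Lucene release notes", "feedA"),
            new Article("Faceting tips", "feedA"),
            new Article("Lucene scoring explained", "feedB"));
        System.out.println(matchingFeeds(index, "lucene")); // [feedA, feedB]
    }
}
```

Nothing about feeds is stored twice, but the query cost now scales with the number of matching articles, which is the trade-off described above.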


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Group of documents.

Posted by Alf Eaton <li...@hubmed.org>.
Make a separate index of feeds?

alf

Raghu Ram wrote:
> But then how can i search for feeds ???
> 
> On 10/6/07, Alf Eaton <li...@hubmed.org> wrote:
>> Raghu Ram wrote:
>> Add a multi-valued field to each article that says which feed(s) it's
>> found in?
>>
>> alf
>>
> 




Re: Group of documents.

Posted by Yonik Seeley <yo...@apache.org>.
On 10/6/07, Raghu Ram <ra...@gmail.com> wrote:
> But then how can i search for feeds ???

I'm not quite sure what you mean by "search for feeds"...
but assuming you want a list of feeds that contain articles with the
search terms, you could do faceting on the "feeds" field.  That would
let you know which feeds had the most matching documents.

-Yonik
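Yonik's suggestion amounts to counting, over the articles that match a query, how many hits fall under each value of the feed field, similar in spirit to Solr's field faceting. The sketch below does that counting by hand in plain Java; `FeedFacetCounts` and its toy word matching are assumptions for illustration, not a Lucene or Solr API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical facet count over a multi-valued "feeds" field: for articles
// matching a word, count matches per feed and rank feeds by that count.
class FeedFacetCounts {
    static final class Article {
        final String text;
        final List<String> feeds; // an article may belong to several feeds

        Article(String text, String... feeds) {
            this.text = text;
            this.feeds = Arrays.asList(feeds);
        }

        boolean matches(String word) {
            String w = word.toLowerCase(Locale.ROOT);
            for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (token.equals(w)) return true;
            }
            return false;
        }
    }

    static List<Map.Entry<String, Integer>> facet(List<Article> articles, String word) {
        Map<String, Integer> counts = new HashMap<>();
        for (Article a : articles) {
            if (!a.matches(word)) continue;
            for (String feed : a.feeds) counts.merge(feed, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        // Most matching articles first; ties broken by feed id for stable output.
        ranked.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        return ranked;
    }

    public static void main(String[] args) {
        List<Article> articles = Arrays.asList(
            new Article("Lucene release notes", "feedA"),
            new Article("Lucene faceting tips", "feedA", "feedC"),
            new Article("Lucene scoring explained", "feedB"),
            new Article("Cooking with garlic", "feedD"));
        System.out.println(facet(articles, "lucene")); // [feedA=2, feedB=1, feedC=1]
    }
}
```

A feed's count is the number of its articles that matched, so ranking by count gives one plausible reading of "which feeds match this search".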



Re: Group of documents.

Posted by Raghu Ram <ra...@gmail.com>.
But then how can I search for feeds?

On 10/6/07, Alf Eaton <li...@hubmed.org> wrote:
>
> Raghu Ram wrote:
>
> Add a multi-valued field to each article that says which feed(s) it's
> found in?
>
> alf
>

Re: Group of documents.

Posted by Alf Eaton <li...@hubmed.org>.
Raghu Ram wrote:
> Hi,
>       We have an application in which we want to index feeds. Each feed is a
> collection of articles and some other metadata. The problem is that
> sometimes we want to search for feeds and sometimes for articles. As far as
> I know lucene doesn't provide any abstraction for grouping  its documents.
> The only solution that we have in our minds now is to have two indexes one
> for articles and one for feeds. There are two problems with this approach
> 1) redundancy
> 2) as feeds are just a collection of articles and get updated we have to
> continuously update the document that represents the feed in the feed index.
> Is this an efficient operation ??
> 
> Can we have any other solutions/hacks for this problem ??

Add a multi-valued field to each article that says which feed(s) it's
found in?

alf
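Alf's multi-valued field can be modeled as a small inverted index of `field:value` terms. In Lucene itself a field becomes multi-valued by adding a `Field` with the same name to a `Document` more than once; the class below is only a plain-Java stand-in for that, with hypothetical names, showing that an article listed in two feeds is reachable through either feed, and that a feed term can restrict an ordinary article search.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical toy inverted index where each term is "field:value".
// Giving an article several "feed" values mimics a multi-valued field.
class MultiValuedFeedField {
    private final Map<String, TreeSet<String>> postings = new HashMap<>();

    void addArticle(String articleId, String text, String... feedIds) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) post("body:" + token, articleId);
        }
        for (String feedId : feedIds) {
            post("feed:" + feedId, articleId); // one term per feed the article is in
        }
    }

    private void post(String term, String articleId) {
        postings.computeIfAbsent(term, k -> new TreeSet<>()).add(articleId);
    }

    // Conjunctive query: sorted ids of articles containing every given term.
    List<String> searchAll(String... terms) {
        TreeSet<String> result = null;
        for (String term : terms) {
            TreeSet<String> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) result = new TreeSet<>(docs);
            else result.retainAll(docs);
        }
        return result == null ? new ArrayList<>() : new ArrayList<>(result);
    }

    public static void main(String[] args) {
        MultiValuedFeedField idx = new MultiValuedFeedField();
        idx.addArticle("a1", "Lucene release notes", "feedA");
        idx.addArticle("a2", "Lucene faceting tips", "feedA", "feedC");
        idx.addArticle("b1", "Lucene scoring explained", "feedB");
        // Articles mentioning "lucene" that appear in feedC
        System.out.println(idx.searchAll("body:lucene", "feed:feedC")); // [a2]
        // Every article in feedA, whatever its text
        System.out.println(idx.searchAll("feed:feedA")); // [a1, a2]
    }
}
```

Because membership lives on the article, adding an article to a feed never touches any other document, which sidesteps the constant feed-document updates raised in the original question.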
