Posted to solr-user@lucene.apache.org by Bruno Mannina <bm...@free.fr> on 2015/01/09 18:12:10 UTC

Request two databases at the same time ?

Dear All,

I use Apache Solr 3.6 on Ubuntu (I am a newbie user).

I have a big database named BigDB1 with 90M documents;
each document contains several fields (docid, title, author, date, etc.).

Today I received, from another source, abstracts for some of these
documents (this source also has the same docid field).
I don't want to modify BigDB1 to add the abstracts to its documents,
because BigDB1 is already updated twice a week.

Do you think it's possible to create a new database named AbsDB1 and
query both databases at the same time?
  For example, if I search for:
title:airplane AND abstract:plastic

I would like to obtain documents from both BigDB1 and AbsDB1.

Many thanks for your help, information, and anything else that can help me.

Regards,
Bruno



Re: Request two databases at the same time ?

Posted by Bruno Mannina <bm...@free.fr>.
Dear Erick,

thank you for your answer.

My answers are below.

On 09/01/2015 20:43, Erick Erickson wrote:
> bq: I don't want to modify my BigDB1 to update documents with abstract
> because BigDB1 is always updated twice by week.
>
> Why not? Solr/Lucene handle updating docs: if a doc in the index has
> the same <uniqueKey>, the old doc is deleted and the new one takes its
> place. So why not just put the new abstracts into BigDB1? If you
> re-index the docs later (your twice/week comment), then they'll be
> overwritten. This will be much simpler than trying to maintain two.
I understand this process; I use it for other collections, and twice a
week for BigDB1.
But suppose, for example, that Doc1 is updated with its abstract on Monday.
On Tuesday I must update it with new data, and then the abstract will be lost.
I can't fetch the abstract before re-inserting it into the new doc, because
I receive several thousand docs every week (new and amended);
I think it would take a long time to do that.
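
One possible way to keep the abstract across the twice-weekly update is to look it up by docid outside Solr just before re-inserting, so no query against the index is needed. The sketch below is only an illustration under assumptions: the abstracts from the second source are assumed to fit in a local lookup table keyed by docid, and the function name `attach_abstracts` and the sample records are hypothetical, not part of Solr.

```python
def attach_abstracts(incoming_docs, abstracts_by_docid):
    """Return copies of the incoming docs, adding a stored abstract when one exists.

    incoming_docs: list of dicts, each with a "docid" field (assumed layout).
    abstracts_by_docid: dict mapping docid -> abstract text (assumed local store).
    """
    merged = []
    for doc in incoming_docs:
        out = dict(doc)  # copy, so the caller's records are not modified
        abstract = abstracts_by_docid.get(doc["docid"])
        if abstract is not None and "abstract" not in out:
            out["abstract"] = abstract
        merged.append(out)
    return merged


# Hypothetical weekly batch: one doc has a stored abstract, the other does not.
weekly_batch = [
    {"docid": "D1", "title": "Plastic airplane (amended)"},
    {"docid": "D2", "title": "Wooden boat"},
]
stored_abstracts = {"D1": "A wing made of plastic."}

ready_to_index = attach_abstracts(weekly_batch, stored_abstracts)
for doc in ready_to_index:
    print(doc)
```

Each lookup is a single dictionary access, so several thousand docs per week should add little time to the update; the merged docs would then be sent to BigDB1 in the usual way.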

> But if you cannot update BigDB1 just fire off two queries and combine
> them. Or specify the shards parameter on the URL pointing to both
> collections. Do note, though, that the relevance calculations may not
> be absolutely comparable, so mixing the results may show some
> surprises...
Shards... I will take a look at this; I don't know this parameter.
Concerning relevance, I don't really use it, so I don't think it will be
a problem.


Sincerely,

> Best,
> Erick

Re: Request two databases at the same time ?

Posted by Erick Erickson <er...@gmail.com>.
bq: I don't want to modify my BigDB1 to update documents with abstract
because BigDB1 is always updated twice by week.

Why not? Solr/Lucene handle updating docs: if a doc in the index has
the same <uniqueKey>, the old doc is deleted and the new one takes its
place. So why not just put the new abstracts into BigDB1? If you
re-index the docs later (your twice/week comment), then they'll be
overwritten. This will be much simpler than trying to maintain two.
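
The overwrite behaviour described above can be pictured with a toy model. This is not Solr code, only a plain Python dictionary standing in for an index keyed on the uniqueKey; the field names come from the original question and the sample values are invented for illustration.

```python
# Toy stand-in for an index keyed on the uniqueKey field (here "docid").
index = {}


def add_doc(index, doc, unique_key="docid"):
    """Adding a doc whose key already exists replaces the whole old doc."""
    index[doc[unique_key]] = dict(doc)


# Monday: the doc is indexed together with its abstract.
add_doc(index, {"docid": "D1", "title": "Plastic airplane",
                "abstract": "A wing made of plastic."})

# Later in the week: the regular refresh arrives without an abstract field.
add_doc(index, {"docid": "D1", "title": "Plastic airplane (amended)"})

print(index["D1"])
```

Because the whole document is replaced, any field missing from the newer version (here the abstract) is no longer present afterwards, so the re-indexed docs need to carry every field that should survive.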

But if you cannot update BigDB1 just fire off two queries and combine
them. Or specify the shards parameter on the URL pointing to both
collections. Do note, though, that the relevance calculations may not
be absolutely comparable, so mixing the results may show some
surprises...
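
Both options can be sketched roughly as follows. The host, port and core paths are assumptions made up for the example, and the helper names `build_shards_url` and `intersect_by_docid` are hypothetical. One caveat worth stating: with the shards parameter each index evaluates the whole query against its own documents and the results are merged, so a query such as title:airplane AND abstract:plastic would only match a document that has both fields in the same index. It does not join BigDB1 and AbsDB1 records on docid, and distributed search also expects the unique key to be unique across shards. If the title lives only in BigDB1 and the abstract only in AbsDB1, combining two separate queries on the client side is probably the closer fit.

```python
from urllib.parse import urlencode


def build_shards_url(base_url, query, shards):
    """Build a select URL that asks Solr to search several indexes at once.

    shards: list of "host:port/path" entries (no "http://" prefix).
    """
    params = {"q": query, "shards": ",".join(shards), "wt": "json"}
    # Keep "/", ":" and "," unescaped so the URL stays readable.
    return base_url + "/select?" + urlencode(params, safe="/:,")


def intersect_by_docid(title_hits, abstract_hits):
    """Client-side AND: keep title hits whose docid also matched the abstract query."""
    abstracts = {doc["docid"]: doc.get("abstract") for doc in abstract_hits}
    combined = []
    for doc in title_hits:
        if doc["docid"] in abstracts:
            out = dict(doc)
            out["abstract"] = abstracts[doc["docid"]]
            combined.append(out)
    return combined


# Option 1: one request with the shards parameter (assumed locations).
url = build_shards_url(
    "http://localhost:8983/solr/BigDB1",
    "title:airplane",
    ["localhost:8983/solr/BigDB1", "localhost:8983/solr/AbsDB1"],
)
print(url)

# Option 2: two queries, combined by docid. The lists below stand in for the
# parsed results of title:airplane on BigDB1 and abstract:plastic on AbsDB1.
big_hits = [{"docid": "D1", "title": "Plastic airplane"},
            {"docid": "D3", "title": "Paper airplane"}]
abs_hits = [{"docid": "D1", "abstract": "A wing made of plastic."},
            {"docid": "D2", "abstract": "Plastic boat hulls."}]
both = intersect_by_docid(big_hits, abs_hits)
print(both)
```

The intersection only sees the rows each query actually returned, so for large result sets the row limits of both queries would need to be high enough (or paged through) for the combination to be complete.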

Best,
Erick
