You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-user@lucene.apache.org by Jan Høydahl <ja...@cominvent.com> on 2019/10/31 07:51:47 UTC

DIH across two SQL DBs

I need a SELECT which filters IDS based on an ‘id’ list coming from another database, i.e. SELECT * FROM maindb.maintable WHERE id IN (SELECT myid FROM otherdb.other_table). 

The docs are fetched from a MySql DB while the list of IDs to includ in that first SELECT WHERE statement is fetched from a view in a PgSql DB, so you cannot simply include the table name in the WHERE clause. I have added two dataSources, and I think I’ll need an <entity> which caches the ID list from ‘otherdb’ in memory and then somehow references that cached list in place of the inner select?

However since the list of IDs are UUID strings and there are a few thousand of them, I guess the SELECT becomes too large if you just send a huge OR clause to MySql. I have been thinking about a 2-stage solution, first create a temp table in MySql and INSERT all the IDs there, then include the temp table in the WHERE as usual, and delete the tmp table afterwards. Does DIH have a built-in and efficient feature for such an operation?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com


Re: DIH across two SQL DBs

Posted by Jan Høydahl <ja...@cominvent.com>.
Hmm, I'll have a look, but the SELECT is a bit more involved so the IDs from the other DB will be OR'ed into the WHERE clause, i.e. be added to those selected from other part of the where clause, so it's not a pure join. I'll think some more

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 31. okt. 2019 kl. 14:43 skrev Mikhail Khludnev <mk...@apache.org>:
> 
> Hello, Jan.
> 
> Have you considered join="zipper" ?
> 
> On Thu, Oct 31, 2019 at 12:52 AM Jan Høydahl <ja...@cominvent.com> wrote:
> 
>> I need a SELECT which filters IDS based on an ‘id’ list coming from
>> another database, i.e. SELECT * FROM maindb.maintable WHERE id IN (SELECT
>> myid FROM otherdb.other_table).
>> 
>> The docs are fetched from a MySql DB while the list of IDs to includ in
>> that first SELECT WHERE statement is fetched from a view in a PgSql DB, so
>> you cannot simply include the table name in the WHERE clause. I have added
>> two dataSources, and I think I’ll need an <entity> which caches the ID list
>> from ‘otherdb’ in memory and then somehow references that cached list in
>> place of the inner select?
>> 
>> However since the list of IDs are UUID strings and there are a few
>> thousand of them, I guess the SELECT becomes too large if you just send a
>> huge OR clause to MySql. I have been thinking about a 2-stage solution,
>> first create a temp table in MySql and INSERT all the IDs there, then
>> include the temp table in the WHERE as usual, and delete the tmp table
>> afterwards. Does DIH have a built-in and efficient feature for such an
>> operation?
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> 
>> 
> 
> -- 
> Sincerely yours
> Mikhail Khludnev


Re: DIH across two SQL DBs

Posted by Mikhail Khludnev <mk...@apache.org>.
Hello, Jan.

Have you considered join="zipper" ?

On Thu, Oct 31, 2019 at 12:52 AM Jan Høydahl <ja...@cominvent.com> wrote:

> I need a SELECT which filters IDS based on an ‘id’ list coming from
> another database, i.e. SELECT * FROM maindb.maintable WHERE id IN (SELECT
> myid FROM otherdb.other_table).
>
> The docs are fetched from a MySql DB while the list of IDs to includ in
> that first SELECT WHERE statement is fetched from a view in a PgSql DB, so
> you cannot simply include the table name in the WHERE clause. I have added
> two dataSources, and I think I’ll need an <entity> which caches the ID list
> from ‘otherdb’ in memory and then somehow references that cached list in
> place of the inner select?
>
> However since the list of IDs are UUID strings and there are a few
> thousand of them, I guess the SELECT becomes too large if you just send a
> huge OR clause to MySql. I have been thinking about a 2-stage solution,
> first create a temp table in MySql and INSERT all the IDs there, then
> include the temp table in the WHERE as usual, and delete the tmp table
> afterwards. Does DIH have a built-in and efficient feature for such an
> operation?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
>

-- 
Sincerely yours
Mikhail Khludnev