Posted to solr-user@lucene.apache.org by spravin <sp...@gmail.com> on 2011/07/27 01:27:32 UTC

Solr DataImport with multiple DBs

Hi All

I am stuck with an issue with delta-import while configuring solr in an
environment where multiple databases exist.

My schema looks like this:
<id, name, keyword>
names exist in one DB and keywords in a table in the other DB (with id as
foreign key).

For delta import, I would need to check against the updated column in both
the tables. But they are in two different databases, so I can't do this in a
single deltaquery.
So I'm not able to detect if the field in the second database has changed.

The relevant part of my dataconfig xml looks like this:

<dataConfig>
  <dataSource ds1... />
  <dataSource ds2... />
  <document>
    <entity name="name" dataSource="ds1"
            query="SELECT ID, Name, Updated FROM records"
            deltaImportQuery="SELECT ID, Name, Updated FROM records
                              WHERE ID = '${dataimporter.delta.ID}'"
            deltaQuery="SELECT ID FROM records
                        WHERE Updated > '${dataimporter.last_index_time}'">

            <entity name="keywords"  dataSource="ds2"
                    query="SELECT Keyword, Updated AS KeywordUpdated FROM keywords
                           WHERE ID = '${name.ID}'">
            </entity>

    </entity>
  </document>
</dataConfig>

I'm hoping someone in this list could point me to a solution: a way to
specify deltaQuery across multiple databases.

(In the above example, I would like to add "OR ID IN (SELECT ID FROM
keywords WHERE Updated > '${dataimporter.last_index_time}')" to the
deltaQuery, but this table can be accessed only from a different dataSource.)

Thanks
- PS


--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-DataImport-with-multiple-DBs-tp3201843p3201843.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: Solr DataImport with multiple DBs

Posted by "Dyer, James" <Ja...@ingrambook.com>.
Would it be possible to just run two separate deltas, one that updates records that changed in ds1 and another that updates records that changed in ds2? Of course, this would be inefficient if a lot of records typically change in both places at the same time.

With this approach, you might have to run the deltas using "command=full-import / clean=false" as shown here: http://wiki.apache.org/solr/DataImportHandlerFaq#fullimportdelta
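
As a rough sketch of what the two passes could look like (this is not from the original config; the entity names "changedKeywords", "nameById" and "keywordsById" and the "..." dataSource attributes are placeholders I made up for illustration), each pass gets its own root entity. The first follows the full-import-as-delta pattern from the FAQ above; the second is driven from ds2 and looks the name up in ds1:

```xml
<dataConfig>
  <!-- Connection details are placeholders; fill in your own driver/url. -->
  <dataSource name="ds1" driver="..." url="..." />
  <dataSource name="ds2" driver="..." url="..." />
  <document>

    <!-- Pass 1: records whose name row changed in ds1.
         With clean=false only rows newer than last_index_time are selected;
         without it the query falls back to a normal full import. -->
    <entity name="name" dataSource="ds1"
            query="SELECT ID, Name, Updated FROM records
                   WHERE '${dataimporter.request.clean}' != 'false'
                      OR Updated > '${dataimporter.last_index_time}'">
      <entity name="keywords" dataSource="ds2"
              query="SELECT Keyword, Updated AS KeywordUpdated FROM keywords
                     WHERE ID = '${name.ID}'" />
    </entity>

    <!-- Pass 2 (hypothetical): records whose keywords changed in ds2.
         The root entity finds the affected IDs; the children rebuild
         the whole document so it overwrites the existing one. -->
    <entity name="changedKeywords" dataSource="ds2"
            query="SELECT DISTINCT ID FROM keywords
                   WHERE Updated > '${dataimporter.last_index_time}'">
      <entity name="nameById" dataSource="ds1"
              query="SELECT ID, Name, Updated FROM records
                     WHERE ID = '${changedKeywords.ID}'" />
      <entity name="keywordsById" dataSource="ds2"
              query="SELECT Keyword, Updated AS KeywordUpdated FROM keywords
                     WHERE ID = '${changedKeywords.ID}'" />
    </entity>

  </document>
</dataConfig>
```

Each pass would then be run on its own, e.g. command=full-import&clean=false&entity=name followed by command=full-import&clean=false&entity=changedKeywords. One thing to check before relying on this: the first run may advance last_index_time, in which case the second pass could miss changes. I believe DIH also keeps a per-entity timestamp that could be used instead, but please verify that against your Solr version.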

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Thursday, July 28, 2011 9:14 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr DataImport with multiple DBs

Often, the easiest solution when DIH gets really complex is to do one of
two things:
1> Use SolrJ instead. You can often do complex things more easily
     with SolrJ than with DIH.
2> You could consider using a custom Transformer in conjunction with your
      primary delta query to access the second table, see:
       http://wiki.apache.org/solr/DIHCustomTransformer


Best
Erick


Re: Solr DataImport with multiple DBs

Posted by Erick Erickson <er...@gmail.com>.
Often, the easiest solution when DIH gets really complex is to do one of
two things:
1> Use SolrJ instead. You can often do complex things more easily
     with SolrJ than with DIH.
2> You could consider using a custom Transformer in conjunction with your
      primary delta query to access the second table, see:
       http://wiki.apache.org/solr/DIHCustomTransformer


Best
Erick
