Posted to solr-user@lucene.apache.org by spravin <sp...@gmail.com> on 2011/07/27 01:27:32 UTC
Solr DataImport with multiple DBs
Hi All
I am stuck on an issue with delta-import while configuring Solr in an
environment where multiple databases exist.
My schema looks like this:
<id, name, keyword>
Names exist in one DB, and keywords are in a table in the other DB (with id as
a foreign key).
For delta import, I would need to check the Updated column in both
tables. But they are in two different databases, so I can't do this in a
single deltaQuery.
So I'm not able to detect whether the field in the second database has changed.
The relevant part of my data-config XML looks like this:
<dataConfig>
  <dataSource ds1... />
  <dataSource ds2... />
  <document>
    <entity name="name" dataSource="ds1"
            query="SELECT ID, Name, Updated FROM records"
            deltaImportQuery="SELECT ID, Name, Updated FROM records WHERE ID = '${dataimporter.delta.ID}'"
            deltaQuery="SELECT ID FROM records WHERE Updated > '${dataimporter.last_index_time}'">
      <entity name="keywords" dataSource="ds2"
              query="SELECT Keyword, Updated AS KeywordUpdated FROM keywords WHERE ID = '${name.ID}'">
      </entity>
    </entity>
  </document>
</dataConfig>
I'm hoping someone on this list could point me to a solution: a way to
specify a deltaQuery across multiple databases.
(In the above example, I would like to add "OR ID IN (SELECT ID FROM
keywords WHERE Updated > '${dataimporter.last_index_time}')" to the
deltaQuery, but that table can be accessed only from a different dataSource.)
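A possible single-delta alternative, not suggested by anyone in this thread and untested here: DIH lets a child entity declare its own deltaQuery plus a parentDeltaQuery that maps each changed child row back to the parent's primary key. The sketch assumes that DIH runs both of the child's queries against the child's own dataSource (ds2), so parentDeltaQuery simply echoes the foreign key ID from keywords instead of querying records in ds1. Verify that assumption against your Solr version before relying on it. The driver and url values are placeholders, and pk="ID" is added because DIH uses pk to match delta keys.

```xml
<dataConfig>
  <!-- Placeholder connection settings (hypothetical values) -->
  <dataSource name="ds1" driver="..." url="..." />
  <dataSource name="ds2" driver="..." url="..." />
  <document>
    <entity name="name" dataSource="ds1" pk="ID"
            query="SELECT ID, Name, Updated FROM records"
            deltaImportQuery="SELECT ID, Name, Updated FROM records WHERE ID = '${dataimporter.delta.ID}'"
            deltaQuery="SELECT ID FROM records WHERE Updated > '${dataimporter.last_index_time}'">
      <!-- Child delta detection. ASSUMPTION: deltaQuery and parentDeltaQuery
           both run against this entity's dataSource (ds2), not the parent's. -->
      <entity name="keywords" dataSource="ds2" pk="ID"
              query="SELECT Keyword, Updated AS KeywordUpdated FROM keywords WHERE ID = '${name.ID}'"
              deltaQuery="SELECT DISTINCT ID FROM keywords WHERE Updated > '${dataimporter.last_index_time}'"
              parentDeltaQuery="SELECT DISTINCT ID FROM keywords WHERE ID = '${keywords.ID}'">
      </entity>
    </entity>
  </document>
</dataConfig>
```

If the assumption holds, a regular command=delta-import should run deltaImportQuery (and the nested keywords query) for IDs whose rows changed in either database.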
Thanks
- PS
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-DataImport-with-multiple-DBs-tp3201843p3201843.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: Solr DataImport with multiple DBs
Posted by "Dyer, James" <Ja...@ingrambook.com>.
Would it be possible to just run two separate deltas, one that updates records that changed in ds1 and another that updates records that changed in ds2? Of course, this would be inefficient if many records typically change in both places at the same time.
With this approach, you might have to run the deltas using "command=full-import&clean=false", as shown here: http://wiki.apache.org/solr/DataImportHandlerFaq#fullimportdelta
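A sketch of the two-delta idea, assuming the full-import/clean=false pattern from the DIH wiki: each root entity reads only rows changed since ${dataimporter.last_index_time}, and the '${dataimporter.request.clean}' != 'false' test lets a normal full-import (without clean=false) still read every record. The entity names changedRecords, changedKeywords, recordName and allKeywords, and the driver/url values, are hypothetical; table and column names follow the original config.

```xml
<dataConfig>
  <!-- Placeholder connection settings (hypothetical values) -->
  <dataSource name="ds1" driver="..." url="..." />
  <dataSource name="ds2" driver="..." url="..." />
  <document>
    <!-- Pass 1: records whose Name changed in ds1. When clean is not
         'false' (a normal full-import), every row is read instead. -->
    <entity name="changedRecords" dataSource="ds1"
            query="SELECT ID, Name FROM records
                   WHERE '${dataimporter.request.clean}' != 'false'
                      OR Updated > '${dataimporter.last_index_time}'">
      <entity name="keywords" dataSource="ds2"
              query="SELECT Keyword FROM keywords WHERE ID = '${changedRecords.ID}'" />
    </entity>

    <!-- Pass 2: IDs whose keywords changed in ds2. Name is re-read from ds1
         and all keywords for the ID are re-read, so the document is complete. -->
    <entity name="changedKeywords" dataSource="ds2"
            query="SELECT DISTINCT ID FROM keywords
                   WHERE Updated > '${dataimporter.last_index_time}'">
      <entity name="recordName" dataSource="ds1"
              query="SELECT Name FROM records WHERE ID = '${changedKeywords.ID}'" />
      <entity name="allKeywords" dataSource="ds2"
              query="SELECT Keyword FROM keywords WHERE ID = '${changedKeywords.ID}'" />
    </entity>
  </document>
</dataConfig>
```

Running both root entities in a single command=full-import&clean=false request keeps them reading the same ${dataimporter.last_index_time}; splitting them into two requests (for example with the entity parameter) could let the first run advance that timestamp before the second reads it. Assuming id is the schema's uniqueKey, a document changed in both databases is built twice and the second copy replaces the first, which is the inefficiency mentioned above.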
James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311
-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com]
Sent: Thursday, July 28, 2011 9:14 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr DataImport with multiple DBs
Often, the easiest solution when DIH gets really complex is to do one of
two things:
1> Use SolrJ instead. Much of the time you can do complex things more
easily than with DIH.
2> You could consider using a custom Transformer in conjunction with your
primary delta query to access the second table, see:
http://wiki.apache.org/solr/DIHCustomTransformer
Best
Erick