You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-user@lucene.apache.org by Kevin Xiang <KE...@morningstar.com> on 2011/04/21 04:01:56 UTC

The issue of import data from database using Solr DIH

Hi all,
I am a new to solr,I am importing data from database using DIH(solr
1.4).One document is made up of two entity,Every entity is a table in
database.
For example:
Table1:have 3 fields;
Table2:have 4 fields;
If it is Ok,it will be 7 fields.
But it is only 4 fields,it seem that solr don't merge the fields and
table2 over write table1.
The key is OS06Y.
The configuration of db-data-config.xml is the following:
<document name="allperf">
    <entity name="PerformanceData1"
dataSource="getTrailingTotalReturnForMonthEnd1" query="SELECT Perfo
rmanceId,Trailing1MonthReturn,Trailing2MonthReturn,Trailing3MonthReturn,
FROM  Table1">
        <field column="PerformanceId" name="OS06Y" />
        <field column="Trailing1MonthReturn" name="PM004" />
        <field column="Trailing2MonthReturn" name="PM133" />
        <field column="Trailing3MonthReturn" name="PM006" />
            </entity>
<entity name="PerformanceData2"
dataSource="getTrailingTotalReturnForMonthEnd2" query="SELECT Performan
ceId,Trailing10YearReturn,Trailing15YearReturn,TrailingYearToDateReturn,
SinceInceptionReturn FROM Table2">
        <field column="PerformanceId" name="OS06Y" />
        <field column="Trailing10YearReturn" name="PM00I" />
        <field column="Trailing15YearReturn" name="PM00K" />
        <field column="TrailingYearToDateReturn" name="PM00A" />
        <field column="SinceInceptionReturn" name="PM00M" />
    </entity>
</document>
Has anyone come across this issue?
Any suggestions on how to fix this issue is much appreciated. 
Thanks.

RE: The issue of import data from database using Solr DIH

Posted by Em <ma...@yahoo.de>.
As Iboutrus mentioned, if you can summarize it in a query, than yes, Solr can
handle it. 

Make a step backward: Do not think of Solr. Write a query (one! query) that
shows exactly the output you exepct. Afterwards, implement this query as a
source for DIH.

Regards,
Em

--
View this message in context: http://lucene.472066.n3.nabble.com/The-issue-of-import-data-from-database-using-Solr-DIH-tp2845318p2846414.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: The issue of import data from database using Solr DIH

Posted by Kevin Xiang <KE...@morningstar.com>.
Yes, it is like the left outer join.
In my example.the table may be table or view or stored procedure,I can
not change it in database.
If for every id in table1,we need search the fields by id from table2 in
database,it will met performance issue,especially the size of tables are
very big.

-----Original Message-----
From: lboutros [mailto:boutrosl@gmail.com] 
Sent: Thursday, April 21, 2011 5:25 PM
To: solr-user@lucene.apache.org
Subject: RE: The issue of import data from database using Solr DIH

What you want to do is something like a left outer join, isn't it ?

something like : select table2.OS06Y, f1,f2,f3,f4,f5 from table2 left
outer
join table1 on table2.OS06Y = table1.OS06Y where ...

could you prepare a view in your RDBMS ? That could be another solution
?

Ludovic.

-----
Jouve
France.
--
View this message in context:
http://lucene.472066.n3.nabble.com/The-issue-of-import-data-from-databas
e-using-Solr-DIH-tp2845318p2846403.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: The issue of import data from database using Solr DIH

Posted by lboutros <bo...@gmail.com>.
What you want to do is something like a left outer join, isn't it ?

something like : select table2.OS06Y, f1,f2,f3,f4,f5 from table2 left outer
join table1 on table2.OS06Y = table1.OS06Y where ...

could you prepare a view in your RDBMS ? That could be another solution ?

Ludovic.

-----
Jouve
France.
--
View this message in context: http://lucene.472066.n3.nabble.com/The-issue-of-import-data-from-database-using-Solr-DIH-tp2845318p2846403.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: The issue of import data from database using Solr DIH

Posted by Kevin Xiang <KE...@morningstar.com>.
I try "remove the OS06Y-field from your second entity ",import the
second entity failed.

Give a example:

Table1:
OS06Y=123,f1=100,f2=200,f3=300;
OS06Y=456,f1=100,f2=200,f3=300;

Table2:
OS06Y=123,f4=100,f5=200;
OS06Y=456,f4=100;
OS06Y=789,f4=100;

I want the result:
OS06Y=123,f1=100,f2=200,f3=300,f4=100,f5=200;
OS06Y=456,f1=100,f2=200,f3=300,f4=100;
OS06Y=789,f4=100;

Can solr implement it? if yes,how to configure dataconfig.xml in solr?

-----Original Message-----
From: Em [mailto:mailformailinglists@yahoo.de] 
Sent: Thursday, April 21, 2011 4:59 PM
To: solr-user@lucene.apache.org
Subject: RE: The issue of import data from database using Solr DIH

Not sure I understood you correct:

You expect that OS06Y stores *two* different performanceIds? One from
table1
and the other from table2?
I think this may be a problem.

If both OS06Y-keys are equal, than you can use the syntax as mentioned
in
the wiki without any problems. You just have to rewrite your config to
make
the second entity a sub-entity and to add a WHERE-clause.

If this is really not possible for you, just a guess, what happens if
you
remove the OS06Y-field from your second entity?

Regards,
Em

--
View this message in context:
http://lucene.472066.n3.nabble.com/The-issue-of-import-data-from-databas
e-using-Solr-DIH-tp2845318p2846347.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: The issue of import data from database using Solr DIH

Posted by Em <ma...@yahoo.de>.
Not sure I understood you correct:

You expect that OS06Y stores *two* different performanceIds? One from table1
and the other from table2?
I think this may be a problem.

If both OS06Y-keys are equal, than you can use the syntax as mentioned in
the wiki without any problems. You just have to rewrite your config to make
the second entity a sub-entity and to add a WHERE-clause.

If this is really not possible for you, just a guess, what happens if you
remove the OS06Y-field from your second entity?

Regards,
Em

--
View this message in context: http://lucene.472066.n3.nabble.com/The-issue-of-import-data-from-database-using-Solr-DIH-tp2845318p2846347.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: The issue of import data from database using Solr DIH

Posted by Kevin Xiang <KE...@morningstar.com>.
Thanks Em.
Yes, OS06Y is the uniqueKey.
Table1 and Table2 is parallel in my example.
In the Url:
http://wiki.apache.org/solr/DIHQuickStart#Index_data_from_multiple_table
s_into_Solr
The tables don't have parallel relations in the above URL example
I want to know that can solr implement the case?
1.Get data from database table1;
2.Get data from database table2;
3.merge the fields of table1 and table2;

The configuration of db-data-config.xml is the following:
<document name="allperf">
    <entity name="PerformanceData1"
dataSource="getTrailingTotalReturnForMonthEnd1" query="SELECT Perfo
rmanceId,Trailing1MonthReturn,Trailing2MonthReturn,Trailing3MonthReturn,
FROM  Table1">
        <field column="PerformanceId" name="OS06Y" />
        <field column="Trailing1MonthReturn" name="PM004" />
        <field column="Trailing2MonthReturn" name="PM133" />
        <field column="Trailing3MonthReturn" name="PM006" />
            </entity>
<entity name="PerformanceData2"
dataSource="getTrailingTotalReturnForMonthEnd2" query="SELECT Performan
ceId,Trailing10YearReturn,Trailing15YearReturn,TrailingYearToDateReturn,
SinceInceptionReturn FROM Table2">
        <field column="PerformanceId" name="OS06Y" />
        <field column="Trailing10YearReturn" name="PM00I" />
        <field column="Trailing15YearReturn" name="PM00K" />
        <field column="TrailingYearToDateReturn" name="PM00A" />
        <field column="SinceInceptionReturn" name="PM00M" />
    </entity>
</document>

Because I don't want to get one id and data from table1 and then get the
data by id from table2,it may met performance issue.

-----Original Message-----
From: Em [mailto:mailformailinglists@yahoo.de] 
Sent: Thursday, April 21, 2011 4:38 PM
To: solr-user@lucene.apache.org
Subject: Re: The issue of import data from database using Solr DIH

Hi Kevin,

I think you made OS06Y the uniqueKey, right?
So, in entity 1 you specify values for it, but in entity 2 you do so as
well. 
I am not absolutely sure about this, but: It seems like your two
entities
create two documents and the second will overwrite the first.

Have a look at this page:
http://wiki.apache.org/solr/DIHQuickStart#Index_data_from_multiple_table
s_into_Solr

I think it will help you in rewriting your queries to fit your usecase.

Regards,
Em

--
View this message in context:
http://lucene.472066.n3.nabble.com/The-issue-of-import-data-from-databas
e-using-Solr-DIH-tp2845318p2846296.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: The issue of import data from database using Solr DIH

Posted by Em <ma...@yahoo.de>.
Hi Kevin,

I think you made OS06Y the uniqueKey, right?
So, in entity 1 you specify values for it, but in entity 2 you do so as
well. 
I am not absolutely sure about this, but: It seems like your two entities
create two documents and the second will overwrite the first.

Have a look at this page:
http://wiki.apache.org/solr/DIHQuickStart#Index_data_from_multiple_tables_into_Solr

I think it will help you in rewriting your queries to fit your usecase.

Regards,
Em

--
View this message in context: http://lucene.472066.n3.nabble.com/The-issue-of-import-data-from-database-using-Solr-DIH-tp2845318p2846296.html
Sent from the Solr - User mailing list archive at Nabble.com.