Posted to solr-user@lucene.apache.org by Eugeny Balakhonov <c0...@gmail.com> on 2011/08/10 22:52:57 UTC

Solr 3.3: DIH configuration for Oracle

Hello, all!

 

I want to create a good DIH configuration with delta support for my Oracle
database. Unfortunately, I cannot get it to work well because DIH has some
strange restrictions.

I will explain the problem with a simple example. In reality my database
has a much more complicated structure.

 

Initial conditions: two tables with the following simple structure:

Table1
-          ID_RECORD    (Primary key)
-          DATA_FIELD1
-          ..
-          DATA_FIELD2
-          LAST_CHANGE_TIME

Table2
-          ID_RECORD    (Primary key)
-          PARENT_ID_RECORD (Foreign key to Table1.ID_RECORD)
-          DATA_FIELD1
-          ..
-          DATA_FIELD2
-          LAST_CHANGE_TIME

 

For performance reasons, both tables have to be selected with a single
query (via an inner join).

 

My db-data-config.xml file:

 

<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
    <dataSource jndiName="jdbc/DB1" type="JdbcDataSource" user="" password=""/>
    <document>
        <entity name="ent" pk="T1_ID_RECORD, T2_ID_RECORD"
            query="select * from TABLE1 t1 inner join TABLE2 t2 on t1.ID_RECORD = t2.PARENT_ID_RECORD"
            deltaQuery="select t1.ID_RECORD T1_ID_RECORD, t1.ID_RECORD T2_ID_RECORD
                        from TABLE1 t1 inner join TABLE2 t2 on t1.ID_RECORD = t2.PARENT_ID_RECORD
                        where TABLE1.LAST_CHANGE_TIME > to_date('${dataimporter.last_index_time}', 'YYYY-MM-DD HH24:MI:SS')
                        or TABLE2.LAST_CHANGE_TIME > to_date('${dataimporter.last_index_time}', 'YYYY-MM-DD HH24:MI:SS')"
            deltaImportQuery="select * from TABLE1 t1 inner join TABLE2 t2 on t1.ID_RECORD = t2.PARENT_ID_RECORD
                              where t1.ID_RECORD = ${dataimporter.delta.T1_ID_RECORD} and t2.ID_RECORD = ${dataimporter.delta.T2_ID_RECORD}"
        />
    </document>
</dataConfig>

 

As a result I get the following error:

java.lang.IllegalArgumentException: deltaQuery has no column to resolve to declared primary key pk='T1_ID_RECORD, T2_ID_RECORD'

I have analyzed the DIH source code. In the DocBuilder class, the
collectDelta() method treats the value of the entity's "pk" attribute as a
simple string. But in my case it is a list of two values: T1_ID_RECORD,
T2_ID_RECORD.

What am I doing wrong?

 

Thanks,

Eugeny

 


Re: Solr 3.3: DIH configuration for Oracle

Posted by Shawn Heisey <so...@elyograg.org>.
On 8/10/2011 2:52 PM, Eugeny Balakhonov wrote:
> java.lang.IllegalArgumentException: deltaQuery has no column to resolve to
> declared primary key pk='T1_ID_RECORD, T2_ID_RECORD'
>
> I have analyzed the source code of DIH. I found that in the DocBuilder class
> collectDelta() method works with value of entity attribute "pk" as with
> simple string. But in my case this is array with two values: T1_ID_RECORD,
> T2_ID_RECORD

Whatever you declare as the DIH primary key must exist as a field name 
in the result set, or Solr will complain.  I had a perfectly working 
config in 1.4.1, with identical text in query and deltaImportQuery.  It 
didn't work when I tried to upgrade to 3.1.  The problem was that I was 
using a deltaQuery that just returned MAX(did), to tell Solr that 
something needed to be done.  I had to add "AS did" to the deltaQuery so 
that it matched my primary key.  I am controlling the delta-import from 
outside Solr, so I do not need to use the result set from deltaQuery.
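
For illustration, the setup described above might look roughly like the
following sketch. Only "did" and the MAX(did) deltaQuery come from this
message; the table name main_table and the request parameter minDid are
hypothetical placeholders, and the real queries are not shown in this thread.

```xml
<!-- Sketch only: main_table and minDid are hypothetical names. -->
<entity name="doc" pk="did"
        query="select * from main_table where did > ${dataimporter.request.minDid}"
        deltaQuery="select max(did) as did from main_table"
        deltaImportQuery="select * from main_table where did > ${dataimporter.request.minDid}"/>
```

Without the "as did" alias, the deltaQuery result set would contain a column
named after the expression rather than the declared pk, which is the situation
the IllegalArgumentException reports.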

The point is to pick something that will exist in all of your result 
sets.  You might need to include an "AS xxx" (with something you choose 
for xxx) in your queries and use the xxx value as your pk.  Because you 
have only provided a simple example, I can't really tell you what you 
should use.

The pk value is only used to coordinate your queries.  It only has 
meaning in the DIH, not the Solr index.  Uniqueness in the Solr index is 
controlled by the uniqueKey value in schema.xml.  In my case, pk and 
uniqueKey are not the same field.
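
To make the distinction concrete, here is a small sketch of the two places
involved. The field name "id", the column tag_id, and the table main_table are
assumptions made for the example, not names taken from a real configuration.

```xml
<!-- data-config.xml (fragment): pk only coordinates the DIH queries.
     main_table, tag_id and id are hypothetical names. -->
<entity name="doc" pk="did"
        query="select did, tag_id from main_table"
        deltaQuery="select max(did) as did from main_table">
    <field column="tag_id" name="id"/>
</entity>

<!-- schema.xml (fragment): uniqueness in the index comes from uniqueKey. -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>
```

Here the DIH pk is "did", while documents in the index are deduplicated on
"id".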

Side note: I'm not much of an expert, so I can't guarantee I can help 
further.  I will give it a try, though.

Thanks,
Shawn


Re: Solr 3.3: DIH configuration for Oracle

Posted by Alexey Serba <as...@gmail.com>.
Why do you need to collect both primary keys, T1_ID_RECORD and
T2_ID_RECORD, in your delta query? Isn't the T2_ID_RECORD primary key value
enough to get all the data from both tables? (Your Table1-Table2
relation is 1-N, right?)
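
If that assumption holds, a single-column pk could look something like the
sketch below. It is untested against a real Oracle schema. It takes
T2_ID_RECORD from t2.ID_RECORD and uses the t1/t2 aliases throughout the WHERE
clauses; everything else follows the posted config.

```xml
<!-- Sketch only: assumes T2_ID_RECORD alone identifies a joined row (1-N relation). -->
<entity name="ent" pk="T2_ID_RECORD"
        query="select t1.*, t2.*, t2.ID_RECORD as T2_ID_RECORD
               from TABLE1 t1 inner join TABLE2 t2 on t1.ID_RECORD = t2.PARENT_ID_RECORD"
        deltaQuery="select t2.ID_RECORD as T2_ID_RECORD
                    from TABLE1 t1 inner join TABLE2 t2 on t1.ID_RECORD = t2.PARENT_ID_RECORD
                    where t1.LAST_CHANGE_TIME > to_date('${dataimporter.last_index_time}', 'YYYY-MM-DD HH24:MI:SS')
                       or t2.LAST_CHANGE_TIME > to_date('${dataimporter.last_index_time}', 'YYYY-MM-DD HH24:MI:SS')"
        deltaImportQuery="select t1.*, t2.*, t2.ID_RECORD as T2_ID_RECORD
                          from TABLE1 t1 inner join TABLE2 t2 on t1.ID_RECORD = t2.PARENT_ID_RECORD
                          where t2.ID_RECORD = ${dataimporter.delta.T2_ID_RECORD}"/>
```

Because both tables share column names such as ID_RECORD and DATA_FIELD1, a
real configuration would probably need explicit aliases for those columns too,
so that they do not overwrite each other in the imported row.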
