Posted to solr-user@lucene.apache.org by Andrew Clegg <an...@gmail.com> on 2009/11/13 12:08:13 UTC

Data import problem with child entity from different database

Morning all,

I'm having problems with joining a child entity from one database to a
parent from another...

My entity definitions look like this (names changed for brevity):

<entity name="parent" dataSource="db1"
        query="select a, b, c from parent_table">

  <entity name="child" dataSource="db2" onError="continue"
          query="select c, d from child_table where c = '${parent.c}'" />

</entity>

c is getting indexed fine (it's stored, I can see field 'c' in the search
results) but child.d isn't. I know the child table has data for the
corresponding parent rows, and I've even watched the SQL queries against the
child table appearing in Oracle's sqldeveloper as the DataImportHandler
runs. But no content for child.d gets into the index.

My schema contains a definition for a field called d like so:

<field name="d" type="keywords_ids" indexed="true" stored="true"
multiValued="true" termVectors="true" />

(keywords_ids is a conservatively-analyzed text type which has worked fine
in other contexts.)

Two things occur to me.

1. db1 is PostgreSQL and db2 is Oracle, although the d field in both tables
is just a char(4), nothing fancy. Could something weird with character
encodings be happening?

2. d isn't a primary key in either parent or child, but this shouldn't
matter should it?

Additional data points -- I also tried using the CachedSqlEntityProcessor to
do in-memory table caching of child, but it didn't work then either. I got a
lot of error messages like this:

No value available for the cache key : d in the entity : child
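
For reference, a minimal sketch of how a cached child entity is usually
declared in DIH, using the placeholder names above. The where attribute here
is an assumption, keyed on the join column c; it is not necessarily the
configuration that produced the error message.

```xml
<entity name="parent" dataSource="db1"
        query="select a, b, c from parent_table">

  <!-- Assumption: the cache is keyed on the join column c. The child query
       has no per-row filter because the whole table is cached in memory. -->
  <entity name="child" dataSource="db2"
          processor="CachedSqlEntityProcessor"
          query="select c, d from child_table"
          where="c=parent.c" />

</entity>
```

In the where attribute, the left-hand side names a column in the child's
result set and the right-hand side names the parent value it is looked up by.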

If anyone knows whether this is a known limitation (if so I can work round
it), or an unexpected case (if so I'll file a bug report), please shout. I'm
using 1.4.

Yet again, many thanks :-)

Andrew.

-- 
View this message in context: http://old.nabble.com/Data-import-problem-with-child-entity-from-different-database-tp26334948p26334948.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Data import problem with child entity from different database

Posted by Lance Norskog <go...@gmail.com>.
<dataConfig>

    <dataSource name="caffdubya" driver="org.postgresql.Driver"
                url="jdbc:postgresql://db1/cathdb_v3_3_0"
                user="USER" password="PASS" />

    <dataSource name="sinatra" driver="oracle.jdbc.OracleDriver"
                url="jdbc:oracle:thin:@db2:1521:biomapwh"
                user="USER" password="PASS" />

    <!-- The following path is on bsmcmp11's local disk for speed. -->
    <!-- The master copy (compressed) lives at /cath/data/current/pdb-XML-noatom -->
    <!-- For convenience, there's a script at bsmcmp11:/export/local/refresh_pdb to copy and unpack it. -->

    <dataSource name="filesystem" type="FileDataSource"
                basePath="/export/local/pdb-XML-noatom/" encoding="UTF-8"
                connectionTimeout="5000" readTimeout="10000" />

    <document>

        <entity name="domain" dataSource="caffdubya"
                query="select * from domain_text">

            <!-- Subquery for related PubMed IDs (we could pull the actual text in later...) ... NOT WORKING! :-( -->

            <entity
                name="domain_pubmed_ids"
                dataSource="sinatra"
                onError="continue"
                query="select id as pdb_code, related_id as related_ids from biomap_admin.uniprot_pdb_pubmed_for_solr where id = '${domain.pdb_code}'" />

        </entity>

        <!-- REMOVED MOST ENTITIES FOR TEST PURPOSES, RESTORE FROM PREVIOUS REVISION -->

    </document>

</dataConfig>






-- 
Lance Norskog
goksron@gmail.com

Re: Data import problem with child entity from different database

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>.
I am unable to get the file:
http://old.nabble.com/file/p26335171/dataimport.temp.xml




-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com

Re: Data import problem with child entity from different database

Posted by Andrew Clegg <an...@gmail.com>.


Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
> 
> No obvious issues.
> You may post your entire data-config.xml.
> 

Here it is, exactly as last attempt but with usernames etc. removed.

Ignore the comments and the unused FileDataSource...

http://old.nabble.com/file/p26335171/dataimport.temp.xml


Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
> 
> Do it without CachedSqlEntityProcessor first and then apply that later.
> 

Yep, that was just a bit of a wild stab in the dark to see if it made any
difference.

Thanks,

Andrew.



Re: Data import problem with child entity from different database

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>.
No obvious issues.
You may post your entire data-config.xml.

Do it without CachedSqlEntityProcessor first and then apply that later.





-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com