You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-user@lucene.apache.org by kmf <kf...@gmail.com> on 2010/12/06 18:02:25 UTC

DIH - rdbms to index confusion

I'm new to solr (and indexing in general) and am having a hard time making
the transition from rdbms to indexing in terms of the DIH/data-config.xml
file.  I've successfully created a working index (so far) for the simple
queries in my db, but I'm struggling to add a more "complex" query.  When I
say simple I mean one or two tables and when I say complex I'm referring to
3 plus.

I have a table that contains the data values I'm wanting to return when
someone makes a search.  This table has, in addition to the data values, 3
id's (FKs) pointing to the data/info that I'm wanting the users to be able
to search on (while also returning the data values).

The general rdbms query would be something like:
select f.value, g.gar_name, c.cat_name from foo f, gar g, cat c, dub d
where g.id=f.gar_id
and c.id=f.cat_id
and d.id=f.dub_id

I tried following the item_category entity used in the DIH example here:
http://wiki.apache.org/solr/DataImportHandler#Full_Import_Example
and am struggling to get it to work.
 
My current attempt looks like (entity translated to the above rdbms query):
<dataConfig>
   <dataSource .... />
   <document>
     <entity ...simple query-working for main entity, cat>
	<field ... />
     
         <entity name="foo" query="SELECT gar_id FROM foo
                                     WHERE cat_id='${cat.id}'">     
        <entity name="gar" query="SELECT name FROM gar                 
                                    WHERE id='${f.gar_id}'">                     
          <field column="name" name="g_name" />
        </entity>                                                                                                   
        <entity name="dub" query="SELECT name FROM dub                                 
                                         WHERE id='${f.dub_id}'">
          <field column="name" name="dub_name" />
        </entity>       
        <field column="value" name="f_value" />
        </entity>   

....other working entities
</entity>
</document>
</dataConfig>

I'm getting some of the data/info back, but it's not what I am expecting. 
I'm hoping for/expecting a document/record to look like:
cat_name 1 : g_name 1 : dub_name 1 : f_value 1
cat_name 1 : g_name 1 : dub_name 2 : f_value 2 
cat_name 1 : g_name 2 : dub_name 1 : f_value 1
cat_name 1 : g_name 2 : dub_name 2 : f_value 2 
cat_name 2 : g_name 1 : dub_name 1 : f_value 1
cat_name 2 : g_name 1 : dub_name 2 : f_value 2 
cat_name 2 : g_name 2 : dub_name 1 : f_value 1
cat_name 2 : g_name 2 : dub_name 2 : f_value 2 

(All but the values are showing up in the index in some form)

Any suggestions on where my logic is failing?

Thanks

-- 
View this message in context: http://lucene.472066.n3.nabble.com/DIH-rdbms-to-index-confusion-tp2028543p2028543.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: DIH - rdbms to index confusion

Posted by Stefan Matheis <ma...@googlemail.com>.
Hi,

have a look at the mysql-query-log - it will tell you what queries are
executed from the solr dih. so you'll see which variables are empty/not set
as expected and therefore maybe missing in the result.

otherwise (for the rest of the list) it would be easier to help you, when
you're using real queries and not pseudo-queries :)

Regards
Stefan

Re: DIH - rdbms to index confusion

Posted by kmf <kf...@gmail.com>.
I'm not understanding this response.  My main table does have a one to many
relationship with the other tables.  What should I be anticipating/wanting
for each document if I want to return to the user the values while allowing
them to search on the other terms?  

Thanks.
-- 
View this message in context: http://lucene.472066.n3.nabble.com/DIH-rdbms-to-index-confusion-tp2028543p2030456.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: DIH - rdbms to index confusion

Posted by Alexey Serba <as...@gmail.com>.
> I have a table that contains the data values I'm wanting to return when
> someone makes a search.  This table has, in addition to the data values, 3
> id's (FKs) pointing to the data/info that I'm wanting the users to be able
> to search on (while also returning the data values).
>
> The general rdbms query would be something like:
> select f.value, g.gar_name, c.cat_name from foo f, gar g, cat c, dub d
> where g.id=f.gar_id
> and c.id=f.cat_id
> and d.id=f.dub_id
>
You can put this general rdbms query as is into single DIH entity - no
need to split it.

You would probably want to split it if your main table has one to many
relation with other tables, so you can't retrieve all the data and
have single result set row per Solr document.