You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-user@lucene.apache.org by Spadez <ja...@hotmail.com> on 2012/12/04 11:08:10 UTC

SQL DIH - Can I have some guidance please?

Hi.

I am having a bit of trouble figuruing out the DIH for SQL files. I have
asked around a few different places but havent got any replies so I was
hoping you could help me.

*I have a database schema like this:*

CREATE TABLE company (
    id SERIAL PRIMARY KEY,
    name varchar(60) NOT NULL
);

CREATE TABLE country (
    id SERIAL PRIMARY KEY,
    name varchar(255) NOT NULL
);

CREATE TABLE location (
    id SERIAL PRIMARY KEY,
    name varchar(255) NOT NULL,
    coordinate varchar(255) NOT NULL,
    location_id integer NOT NULL REFERENCES country (id)
);

CREATE TABLE source (
    id SERIAL PRIMARY KEY,
    name varchar(60) NOT NULL
);

CREATE TABLE item (
    id SERIAL PRIMARY KEY,
    title varchar(60) NOT NULL,
    description varchar(900) NOT NULL,
    company_id integer NOT NULL REFERENCES company (id),
    date timestamp NOT NULL,
    source_id integer NOT NULL REFERENCES source (id),
    link varchar(255) NOT NULL,
    location_id integer NOT NULL REFERENCES location (id)
);

*My what I want to put into my schema is this information (named as they are
in my schema):*

id
title
description
date
source
link
location_name
location_coordinates

*I made my DIH like this:*

<dataConfig>
	<dataSource name="app-ds" driver="org.postgresql.Driver"
url="jdbc:postgresql://localhost:5432/wikipedia" user="wikipedia"
password="secret" />
	<document>
		<entity dataSource="app-ds" name="application" query="SELECT id,
page_title from page">
      			<field column="id" name="id" />
      			<field column="title" name="title" />
      			<field column="description" name="description" />
      			<field column="date" name="date" />
      			<field column="name" name="source" />
      			<field column="link" name="link" />
      			<field column="name" name="location_name" />
      			<field column="coordinate" name="location_coordinates" />
   		</entity>
  	</document>
</dataConfig>

My main questions relate to the entity datastore query and also what to do
for field columns when it is a linked table. For example the word "name"
isnt unique since it appears in several different tables.

I would really appreciate any help on this, its taken me a while to get to
this stage and now I am truely stuck.



--
View this message in context: http://lucene.472066.n3.nabble.com/SQL-DIH-Can-I-have-some-guidance-please-tp4024207.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: SQL DIH - Can I have some guidance please?

Posted by Spadez <ja...@hotmail.com>.
Thank you so much for the help, I really appreciate it.



--
View this message in context: http://lucene.472066.n3.nabble.com/SQL-DIH-Can-I-have-some-guidance-please-tp4024207p4024250.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: SQL DIH - Can I have some guidance please?

Posted by Gora Mohanty <go...@mimirtech.com>.
On 04/12/2012, Spadez <ja...@hotmail.com> wrote:
> Thank you so much for your help. Based on the same schema in my first post
> and your help I created this, have I implemented it correctly based on your
> suggestion? I tried to comment it:

Looks almost correct. You only need two levels of
nesting, and can use proper nesting to differentiate
between the different "name" attributes.

<dataConfig>
  <dataSource name="app-ds" driver="org.postgresql.Driver"
url="jdbc:postgresql://localhost:5432/wikipedia" user="wikipedia"
password="secret" />
     <document>
          <entity dataSource="app-ds" name="item" query="SELECT id,
title, description, date, link, location_id, source_id, company_id
from item">
 			<field column="id" name="id" />
 			<field column="title" name="title" />
 			<field column="description" name="description" />			
 			<field column="date" name="date" />
 			<field column="link" name="link" />

 		        <entity dataSource="app-ds" name="location" query="SELECT
name, coordinate from location where location_id=${item.location_id}">
 			     <field column="name" name="location_name" />
 			     <field column="coordinate" name="location_coordinates" />
                       </entity>

          	       <entity dataSource="app-ds" name="source"
query="SELECT name from source where source_id=${item.source_id}">
 			     <field column="name" name="source_name" />
                       </entity>

 	               <entity dataSource="app-ds" name="company"
query="SELECT name from company where company_id=${item.company_id}">
 			     <field column="name" name="company_name" />
 		         </entity>
 		</entity>
 	</document>
</dataConfig>
>
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SQL-DIH-Can-I-have-some-guidance-please-tp4024207p4024235.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

Re: SQL DIH - Can I have some guidance please?

Posted by Spadez <ja...@hotmail.com>.
Thank you so much for your help. Based on the same schema in my first post
and your help I created this, have I implemented it correctly based on your
suggestion? I tried to comment it: 

<dataConfig> 
 <dataSource name="app-ds" driver="org.postgresql.Driver"
url="jdbc:postgresql://localhost:5432/wikipedia" user="wikipedia"
password="secret" />
 	<document> 

		
		<entity dataSource="app-ds" name="item" query="SELECT id, title,
description, date, link, location_id, source_id, company_id from item">

		
		<entity dataSource="app-ds" name="location" query="SELECT name, coordinate
from location where location_id=${item.location_id}"> 
		<entity dataSource="app-ds" name="source" query="SELECT name from source
where source_id=${item.source_id}">
		<entity dataSource="app-ds" name="company" query="SELECT name from company
where company_id=${item.company_id}"> 

			
			<field column="id" name="id" /> 
			<field column="title" name="title" />
			<field column="description" name="description" />			
			<field column="date" name="date" />
			<field column="link" name="link" />

			
			<field column="name" name="company_name" />
			<field column="name" name="source_name" /> 
			<field column="name" name="location_name" /> 
			<field column="coordinate" name="location_coordinates" />

		</entity> 
		</entity> 
	</document> 
</dataConfig>






--
View this message in context: http://lucene.472066.n3.nabble.com/SQL-DIH-Can-I-have-some-guidance-please-tp4024207p4024235.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: SQL DIH - Can I have some guidance please?

Posted by Gora Mohanty <go...@mimirtech.com>.
On 04/12/2012, Spadez <ja...@hotmail.com> wrote:
> Hi.
>
> I am having a bit of trouble figuruing out the DIH for SQL files. I have
> asked around a few different places but havent got any replies so I was
> hoping you could help me.
>
> *I have a database schema like this:*
>
> CREATE TABLE company (
>     id SERIAL PRIMARY KEY,
>     name varchar(60) NOT NULL
> );
>
> CREATE TABLE country (
>     id SERIAL PRIMARY KEY,
>     name varchar(255) NOT NULL
> );
>
> CREATE TABLE location (
>     id SERIAL PRIMARY KEY,
>     name varchar(255) NOT NULL,
>     coordinate varchar(255) NOT NULL,
>     location_id integer NOT NULL REFERENCES country (id)
> );
>
> CREATE TABLE source (
>     id SERIAL PRIMARY KEY,
>     name varchar(60) NOT NULL
> );
>
> CREATE TABLE item (
>     id SERIAL PRIMARY KEY,
>     title varchar(60) NOT NULL,
>     description varchar(900) NOT NULL,
>     company_id integer NOT NULL REFERENCES company (id),
>     date timestamp NOT NULL,
>     source_id integer NOT NULL REFERENCES source (id),
>     link varchar(255) NOT NULL,
>     location_id integer NOT NULL REFERENCES location (id)
> );
>
> *My what I want to put into my schema is this information (named as they
> are
> in my schema):*
>
> id
> title
> description
> date
> source
> link
> location_name
> location_coordinates

It is not entirely clear:
(a) How the tables are related. One can guess that item is
     related to source through source_id, and to location through
     location_id
(b) Which of the Solr fields are to be derived from which table.
     In particular, what are the tables, company and country, used
     for.

> *I made my DIH like this:*

The way to deal with related tables is by using nested entities, e.g.,
some of your fields can be populated by

<dataConfig>
 <dataSource name="app-ds" driver="org.postgresql.Driver"
 url="jdbc:postgresql://localhost:5432/wikipedia" user="wikipedia"
 password="secret" />
 	<document>
           <entity dataSource="app-ds" name="item" query="SELECT id,
 title, location_id from item">
              <entity dataSource="app-ds" name="location" query="SELECT name,
 coordinate from location where location_id=${item.location_id}">
       			<field column="id" name="id" />
       			<field column="title" name="title" />
       			<field column="name" name="location_name" />
       			<field column="coordinate" name="location_coordinates" />
    		</entity>
            </entity>
   	</document>
</dataConfig>

You can add more second-level entities, and/or fields as needed. The
${item.location_id} refers to the location_id in the select from the top-
level entity.

> My main questions relate to the entity datastore query and also what to do
> for field columns when it is a linked table. For example the word "name"
> isnt unique since it appears in several different tables.

You could change the name in the select, e.g.,
for the top-level entity have:
  query="select name as top_level_name"
and for the inner entity have:
  query="select name as second_level_name"

Regards,
Gora