You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-user@lucene.apache.org by Spadez <ja...@hotmail.com> on 2012/12/04 11:08:10 UTC
SQL DIH - Can I have some guidance please?
Hi.
I am having a bit of trouble figuruing out the DIH for SQL files. I have
asked around a few different places but havent got any replies so I was
hoping you could help me.
*I have a database schema like this:*
CREATE TABLE company (
id SERIAL PRIMARY KEY,
name varchar(60) NOT NULL
);
CREATE TABLE country (
id SERIAL PRIMARY KEY,
name varchar(255) NOT NULL
);
CREATE TABLE location (
id SERIAL PRIMARY KEY,
name varchar(255) NOT NULL,
coordinate varchar(255) NOT NULL,
location_id integer NOT NULL REFERENCES country (id)
);
CREATE TABLE source (
id SERIAL PRIMARY KEY,
name varchar(60) NOT NULL
);
CREATE TABLE item (
id SERIAL PRIMARY KEY,
title varchar(60) NOT NULL,
description varchar(900) NOT NULL,
company_id integer NOT NULL REFERENCES company (id),
date timestamp NOT NULL,
source_id integer NOT NULL REFERENCES source (id),
link varchar(255) NOT NULL,
location_id integer NOT NULL REFERENCES location (id)
);
*My what I want to put into my schema is this information (named as they are
in my schema):*
id
title
description
date
source
link
location_name
location_coordinates
*I made my DIH like this:*
<dataConfig>
<dataSource name="app-ds" driver="org.postgresql.Driver"
url="jdbc:postgresql://localhost:5432/wikipedia" user="wikipedia"
password="secret" />
<document>
<entity dataSource="app-ds" name="application" query="SELECT id,
page_title from page">
<field column="id" name="id" />
<field column="title" name="title" />
<field column="description" name="description" />
<field column="date" name="date" />
<field column="name" name="source" />
<field column="link" name="link" />
<field column="name" name="location_name" />
<field column="coordinate" name="location_coordinates" />
</entity>
</document>
</dataConfig>
My main questions relate to the entity datastore query and also what to do
for field columns when it is a linked table. For example the word "name"
isnt unique since it appears in several different tables.
I would really appreciate any help on this, its taken me a while to get to
this stage and now I am truely stuck.
--
View this message in context: http://lucene.472066.n3.nabble.com/SQL-DIH-Can-I-have-some-guidance-please-tp4024207.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: SQL DIH - Can I have some guidance please?
Posted by Spadez <ja...@hotmail.com>.
Thank you so much for the help, I really appreciate it.
--
View this message in context: http://lucene.472066.n3.nabble.com/SQL-DIH-Can-I-have-some-guidance-please-tp4024207p4024250.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: SQL DIH - Can I have some guidance please?
Posted by Gora Mohanty <go...@mimirtech.com>.
On 04/12/2012, Spadez <ja...@hotmail.com> wrote:
> Thank you so much for your help. Based on the same schema in my first post
> and your help I created this, have I implemented it correctly based on your
> suggestion? I tried to comment it:
Looks almost correct. You only need two levels of
nesting, and can use proper nesting to differentiate
between the different "name" attributes.
<dataConfig>
<dataSource name="app-ds" driver="org.postgresql.Driver"
url="jdbc:postgresql://localhost:5432/wikipedia" user="wikipedia"
password="secret" />
<document>
<entity dataSource="app-ds" name="item" query="SELECT id,
title, description, date, link, location_id, source_id, company_id
from item">
<field column="id" name="id" />
<field column="title" name="title" />
<field column="description" name="description" />
<field column="date" name="date" />
<field column="link" name="link" />
<entity dataSource="app-ds" name="location" query="SELECT
name, coordinate from location where location_id=${item.location_id}">
<field column="name" name="location_name" />
<field column="coordinate" name="location_coordinates" />
</entity>
<entity dataSource="app-ds" name="source"
query="SELECT name from source where source_id=${item.source_id}">
<field column="name" name="source_name" />
</entity>
<entity dataSource="app-ds" name="company"
query="SELECT name from company where company_id=${item.company_id}">
<field column="name" name="company_name" />
</entity>
</entity>
</document>
</dataConfig>
>
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SQL-DIH-Can-I-have-some-guidance-please-tp4024207p4024235.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
Re: SQL DIH - Can I have some guidance please?
Posted by Spadez <ja...@hotmail.com>.
Thank you so much for your help. Based on the same schema in my first post
and your help I created this, have I implemented it correctly based on your
suggestion? I tried to comment it:
<dataConfig>
<dataSource name="app-ds" driver="org.postgresql.Driver"
url="jdbc:postgresql://localhost:5432/wikipedia" user="wikipedia"
password="secret" />
<document>
<entity dataSource="app-ds" name="item" query="SELECT id, title,
description, date, link, location_id, source_id, company_id from item">
<entity dataSource="app-ds" name="location" query="SELECT name, coordinate
from location where location_id=${item.location_id}">
<entity dataSource="app-ds" name="source" query="SELECT name from source
where source_id=${item.source_id}">
<entity dataSource="app-ds" name="company" query="SELECT name from company
where company_id=${item.company_id}">
<field column="id" name="id" />
<field column="title" name="title" />
<field column="description" name="description" />
<field column="date" name="date" />
<field column="link" name="link" />
<field column="name" name="company_name" />
<field column="name" name="source_name" />
<field column="name" name="location_name" />
<field column="coordinate" name="location_coordinates" />
</entity>
</entity>
</document>
</dataConfig>
--
View this message in context: http://lucene.472066.n3.nabble.com/SQL-DIH-Can-I-have-some-guidance-please-tp4024207p4024235.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: SQL DIH - Can I have some guidance please?
Posted by Gora Mohanty <go...@mimirtech.com>.
On 04/12/2012, Spadez <ja...@hotmail.com> wrote:
> Hi.
>
> I am having a bit of trouble figuruing out the DIH for SQL files. I have
> asked around a few different places but havent got any replies so I was
> hoping you could help me.
>
> *I have a database schema like this:*
>
> CREATE TABLE company (
> id SERIAL PRIMARY KEY,
> name varchar(60) NOT NULL
> );
>
> CREATE TABLE country (
> id SERIAL PRIMARY KEY,
> name varchar(255) NOT NULL
> );
>
> CREATE TABLE location (
> id SERIAL PRIMARY KEY,
> name varchar(255) NOT NULL,
> coordinate varchar(255) NOT NULL,
> location_id integer NOT NULL REFERENCES country (id)
> );
>
> CREATE TABLE source (
> id SERIAL PRIMARY KEY,
> name varchar(60) NOT NULL
> );
>
> CREATE TABLE item (
> id SERIAL PRIMARY KEY,
> title varchar(60) NOT NULL,
> description varchar(900) NOT NULL,
> company_id integer NOT NULL REFERENCES company (id),
> date timestamp NOT NULL,
> source_id integer NOT NULL REFERENCES source (id),
> link varchar(255) NOT NULL,
> location_id integer NOT NULL REFERENCES location (id)
> );
>
> *My what I want to put into my schema is this information (named as they
> are
> in my schema):*
>
> id
> title
> description
> date
> source
> link
> location_name
> location_coordinates
It is not entirely clear:
(a) How the tables are related. One can guess that item is
related to source through source_id, and to location through
location_id
(b) Which of the Solr fields are to be derived from which table.
In particular, what are the tables, company and country, used
for.
> *I made my DIH like this:*
The way to deal with related tables is by using nested entities, e.g.,
some of your fields can be populated by
<dataConfig>
<dataSource name="app-ds" driver="org.postgresql.Driver"
url="jdbc:postgresql://localhost:5432/wikipedia" user="wikipedia"
password="secret" />
<document>
<entity dataSource="app-ds" name="item" query="SELECT id,
title, location_id from item">
<entity dataSource="app-ds" name="location" query="SELECT name,
coordinate from location where location_id=${item.location_id}">
<field column="id" name="id" />
<field column="title" name="title" />
<field column="name" name="location_name" />
<field column="coordinate" name="location_coordinates" />
</entity>
</entity>
</document>
</dataConfig>
You can add more second-level entities, and/or fields as needed. The
${item.location_id} refers to the location_id in the select from the top-
level entity.
> My main questions relate to the entity datastore query and also what to do
> for field columns when it is a linked table. For example the word "name"
> isnt unique since it appears in several different tables.
You could change the name in the select, e.g.,
for the top-level entity have:
query="select name as top_level_name"
and for the inner entity have:
query="select name as second_level_name"
Regards,
Gora