Posted to solr-user@lucene.apache.org by Radu Toev <ra...@gmail.com> on 2012/02/16 09:15:28 UTC

Entity with multiple datasources

Hello,

I created a data-config.xml file where I define a datasource and an entity
with 12 fields.
In my use case I have 2 databases with the same schema, so I want to
combine the 2 databases in one index.
I defined a second dataSource tag and duplicated the entity with its
fields (changed the name and the datasource).
What I'm expecting is to get around 7k results (I have around 6k in the
first db and 1k in the second). However, I'm getting a total of 2k.
Where could the problem be?

Thanks

Re: Entity with multiple datasources

Posted by Dmitry Kan <dm...@gmail.com>.
No problem, hope it helps. You're welcome.

On Thu, Feb 16, 2012 at 5:03 PM, Radu Toev <ra...@gmail.com> wrote:

> Really good point on the ids, I completely overlooked that matter.
> I will give it a try.
> Thanks again.



-- 
Regards,

Dmitry Kan

Re: Entity with multiple datasources

Posted by Radu Toev <ra...@gmail.com>.
Really good point on the ids, I completely overlooked that matter.
I will give it a try.
Thanks again.

On Thu, Feb 16, 2012 at 5:00 PM, Dmitry Kan <dm...@gmail.com> wrote:

> Each document in SOLR will correspond to one db record and since both
> databases have the same schema, you can't index two records from two
> databases into the same SOLR document.
>
> So after indexing, you should have 7k different documents, each of which
> holds data from a db record.
>
> Also, one problem I see here is that since the record id in each table is
> unique only within the table and (most probably) not globally, there will
> be collisions. To avoid this, I would prepend some static value to the
> record id, like: concat("t1", CONVERT(id, CHAR(8))).
>
> Dmitry

Re: Entity with multiple datasources

Posted by Dmitry Kan <dm...@gmail.com>.
Each document in SOLR will correspond to one db record and since both
databases have the same schema, you can't index two records from two
databases into the same SOLR document.

So after indexing, you should have 7k different documents, each of which
holds data from a db record.

Also, one problem I see here is that since the record id in each table is
unique only within the table and (most probably) not globally, there will
be collisions. To avoid this, I would prepend some static value to the
record id, like: concat("t1", CONVERT(id, CHAR(8))).
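A minimal sketch of what that could look like for the config posted in this
thread. It rests on several assumptions: the databases are SQL Server (so the
prefix is built with + and CAST rather than MySQL-style concat/CONVERT), the
uniqueKey field in schema.xml is a string type, and the prefixes 's_' and 'p_'
are made up for illustration. Only two columns are shown and the connection
details are left empty, as in the original post. Note that the entity
attribute is spelled dataSource (camel case) in the DataImportHandler
documentation.

```xml
<dataConfig>
  <!-- Connection details are placeholders, left empty as in the original post. -->
  <dataSource name="s" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="" user="" password=""/>
  <dataSource name="p" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="" user="" password=""/>
  <document>
    <!-- One root entity per database; each row becomes its own Solr document. -->
    <!-- Hypothetical prefix 's_' keeps ids from this database distinct. -->
    <entity name="ms" dataSource="s"
            query="SELECT 's_' + CAST(m.id AS VARCHAR(20)) AS id,
                          m.serial AS m_machine_serial
                   FROM Machine AS m">
      <field column="id"/>
      <field column="m_machine_serial"/>
    </entity>
    <!-- Hypothetical prefix 'p_' for the second database. -->
    <entity name="mp" dataSource="p"
            query="SELECT 'p_' + CAST(m.id AS VARCHAR(20)) AS id,
                          m.serial AS m_machine_serial
                   FROM Machine AS m">
      <field column="id"/>
      <field column="m_machine_serial"/>
    </entity>
  </document>
</dataConfig>
```

With ids such as s_42 and p_42, a row from one database should no longer
overwrite the row with the same numeric id from the other, so the index would
be expected to hold one document per row from both databases.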

Dmitry

On Thu, Feb 16, 2012 at 4:47 PM, Radu Toev <ra...@gmail.com> wrote:

> I'm not sure I follow.
> The idea is to have only one document. Do the multiple documents have the
> same structure then (different datasources), and if so, how are they actually
> indexed?
>
> Thanks.



-- 
Regards,

Dmitry Kan

Re: Entity with multiple datasources

Posted by Radu Toev <ra...@gmail.com>.
I'm not sure I follow.
The idea is to have only one document. Do the multiple documents have the
same structure then (different datasources), and if so, how are they actually
indexed?

Thanks.

On Thu, Feb 16, 2012 at 4:40 PM, Dmitry Kan <dm...@gmail.com> wrote:

> I think the problem here is that you are trying to create separate
> documents for two different tables, while your config is aiming to create
> only one document. Here is one solution (not tried by me):
>
> ------
> You can have multiple documents generated by the same data-config:
>
> <dataConfig>
>  <dataSource name="ds1" .../>
>  <dataSource name="ds2" .../>
>  <dataSource name="ds3" .../>
>  <document>
>   <entity blah blah rootEntity="false">
>       <entity blah blah this is a document>
>          <entity sets unique id/>
>       </entity>
>       <entity blah blah this is another document>
>          <entity sets unique id/>
>       </entity>
>   </entity>
>  </document>
> </dataConfig>
>
> It's the rootEntity="false" that makes the child entity a document.
> ------
>
> Dmitry
>
> On Thu, Feb 16, 2012 at 2:37 PM, Radu Toev <ra...@gmail.com> wrote:
>
> > <dataConfig>
> >  <dataSource
> >     name="s"
> >     driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
> >     url=""
> >     user=""
> >     password=""/>
> >  <dataSource
> >     name="p"
> >  driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
> >     url=""
> >     user=""
> >     password=""/>
> >  <document>
> >    <entity name="ms"
> >        datasource="s"
> > query="SELECT m.id as id, m.serial as m_machine_serial,
> >   m.ivk as m_machine_ivk, m.sitename as m_sitename,
> >   m.deliveryDate as m_delivery_date, m.hotsite as m_hotsite,
> >   m.guardian as m_guardian, m.warranty as m_warranty,
> >   m.contract as m_contract,
> >   st.name as m_st_name, pm.name as m_pm_name, p.name as m_p_name,
> >   sv.shortName as m_sv_name, c.clusterMajor as m_c_cluster_major,
> >   c.clusterMinor as m_c_cluster_minor, c.country as m_c_country,
> >   c.code as m_c_code
> >   FROM Machine AS m
> >   LEFT JOIN SystemType AS st ON m.fk_systemType=st.id
> >   LEFT JOIN ProductModel AS pm ON fk_productModel = pm.id
> >   LEFT JOIN Platform AS p ON m.fk_platform = p.id
> >   LEFT JOIN SoftwareVersion AS sv ON fk_softwareVersion = sv.id
> >   LEFT JOIN Country AS c ON fk_country = c.id"
> > readOnly="true"
> > transformer="DateFormatTransformer">
> > <field column="id" />
> > <field column="m_machine_serial"/>
> > <field column="m_machine_ivk"/>
> > <field column="m_sitename"/>
> > <field column="m_delivery_date" dateTimeFormat="yyyy-MM-dd"/>
> > <field column="m_hotsite"/>
> > <field column="m_guardian"/>
> > <field column="m_warranty"/>
> > <field column="m_contract"/>
> > <field column="m_st_name"/>
> > <field column="m_pm_name"/>
> > <field column="m_p_name"/>
> > <field column="m_sv_name"/>
> > <field column="m_c_cluster_major"/>
> > <field column="m_c_cluster_minor"/>
> > <field column="m_c_country"/>
> > <field column="m_c_code"/>
> >   </entity>
> >
> >   <entity name="mp"
> >        datasource="p"
> > query="SELECT m.id as id, m.serial as m_machine_serial,
> >   m.ivk as m_machine_ivk, m.sitename as m_sitename,
> >   m.deliveryDate as m_delivery_date, m.hotsite as m_hotsite,
> >   m.guardian as m_guardian, m.warranty as m_warranty,
> >   m.contract as m_contract,
> >   st.name as m_st_name, pm.name as m_pm_name, p.name as m_p_name,
> >   sv.shortName as m_sv_name, c.clusterMajor as m_c_cluster_major,
> >   c.clusterMinor as m_c_cluster_minor, c.country as m_c_country,
> >   c.code as m_c_code
> >   FROM Machine AS m
> >   LEFT JOIN SystemType AS st ON m.fk_systemType=st.id
> >   LEFT JOIN ProductModel AS pm ON fk_productModel = pm.id
> >   LEFT JOIN Platform AS p ON m.fk_platform = p.id
> >   LEFT JOIN SoftwareVersion AS sv ON fk_softwareVersion = sv.id
> >   LEFT JOIN Country AS c ON fk_country = c.id"
> > readOnly="true"
> > transformer="DateFormatTransformer">
> > <field column="id" />
> > <field column="m_machine_serial"/>
> > <field column="m_machine_ivk"/>
> > <field column="m_sitename"/>
> > <field column="m_delivery_date" dateTimeFormat="yyyy-MM-dd"/>
> > <field column="m_hotsite"/>
> > <field column="m_guardian"/>
> > <field column="m_warranty"/>
> > <field column="m_contract"/>
> > <field column="m_st_name"/>
> > <field column="m_pm_name"/>
> > <field column="m_p_name"/>
> > <field column="m_sv_name"/>
> > <field column="m_c_cluster_major"/>
> > <field column="m_c_cluster_minor"/>
> > <field column="m_c_country"/>
> > <field column="m_c_code"/>
> >   </entity>
> >  </document>
> > </dataConfig>
> >
> > I've removed the connection params
> > The unique key is id.
> >

Re: Entity with multiple datasources

Posted by Dmitry Kan <dm...@gmail.com>.
I think the problem here is that you are trying to create separate
documents for two different tables, while your config aims to create
only one document. Here is one possible solution (not tried by me):

------
You can have multiple documents generated by the same data-config:

<dataConfig>
 <dataSource name="ds1" .../>
 <dataSource name="ds2" .../>
 <dataSource name="ds3" .../>
 <document>
   <entity blah blah rootEntity="false">
       <entity blah blah this is a document>
          <entity sets unique id/>
       </entity>
       <entity blah blah this is another document>
          <entity sets unique id/>
       </entity>
   </entity>
 </document>
</dataConfig>

It's the rootEntity="false" on the outer entity that makes each child entity a document.
------
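
To make the outline above a bit more concrete, here is a rough, untested
sketch for the two-database case. The wrapper entity, its dummy query and
the 's_' / 'p_' id prefixes are my own assumptions, not something from the
quoted text, and the long SELECT list is cut down to two columns (the other
columns and joins would stay as they are). The prefixes assume that id is a
string field in schema.xml. Also note that the DIH documentation spells the
entity attribute dataSource (capital S), and attribute names are
case-sensitive, so the sketch uses that spelling.

```xml
<dataConfig>
  <!-- Connection params left empty, as in the thread -->
  <dataSource name="s" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="" user="" password=""/>
  <dataSource name="p" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="" user="" password=""/>
  <document>
    <!-- Hypothetical wrapper: returns a single row; rootEntity="false"
         hands the document-producing role to its direct children -->
    <entity name="wrapper" dataSource="s" rootEntity="false"
            query="SELECT 1 AS wrapper_row">
      <!-- One Solr document per Machine row in database "s" -->
      <entity name="ms" dataSource="s"
              query="SELECT 's_' + CAST(m.id AS VARCHAR(20)) AS id,
                            m.serial AS m_machine_serial
                     FROM Machine AS m">
        <field column="id"/>
        <field column="m_machine_serial"/>
      </entity>
      <!-- One Solr document per Machine row in database "p" -->
      <entity name="mp" dataSource="p"
              query="SELECT 'p_' + CAST(m.id AS VARCHAR(20)) AS id,
                            m.serial AS m_machine_serial
                     FROM Machine AS m">
        <field column="id"/>
        <field column="m_machine_serial"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

The prefix is there so that a row with id 1 in the first database and a row
with id 1 in the second one end up as two different documents.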

Dmitry

On Thu, Feb 16, 2012 at 2:37 PM, Radu Toev <ra...@gmail.com> wrote:

> <dataConfig>
>  <dataSource
>     name="s"
>     driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
>     url=""
>     user=""
>     password=""/>
>  <dataSource
>     name="p"
>  driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
>     url=""
>     user=""
>     password=""/>
>  <document>
>    <entity name="ms"
>        datasource="s"
> query="SELECT m.id as id, m.serial as m_machine_serial, m.ivk as
> m_machine_ivk, m.sitename as m_sitename, m.deliveryDate as m_delivery_date,
> m.hotsite as m_hotsite, m.guardian as m_guardian, m.warranty as m_warranty,
> m.contract as m_contract,
>   st.name as m_st_name, pm.name as m_pm_name, p.name as m_p_name,
> sv.shortName as m_sv_name, c.clusterMajor as m_c_cluster_major,
> c.clusterMinor as m_c_cluster_minor, c.country as m_c_country, c.code as
> m_c_code
>   FROM Machine AS m
>   LEFT JOIN SystemType AS st ON m.fk_systemType=st.id
>   LEFT JOIN ProductModel AS pm ON fk_productModel = pm.id
>   LEFT JOIN Platform AS p ON m.fk_platform = p.id
>   LEFT JOIN SoftwareVersion AS sv ON fk_softwareVersion = sv.id
>   LEFT JOIN Country AS c ON fk_country = c.id"
> readOnly="true"
> transformer="DateFormatTransformer">
> <field column="id" />
> <field column="m_machine_serial"/>
> <field column="m_machine_ivk"/>
> <field column="m_sitename"/>
> <field column="m_delivery_date" dateTimeFormat="yyyy-MM-dd"/>
> <field column="m_hotsite"/>
> <field column="m_guardian"/>
> <field column="m_warranty"/>
> <field column="m_contract"/>
> <field column="m_st_name"/>
> <field column="m_pm_name"/>
> <field column="m_p_name"/>
> <field column="m_sv_name"/>
> <field column="m_c_cluster_major"/>
> <field column="m_c_cluster_minor"/>
> <field column="m_c_country"/>
> <field column="m_c_code"/>
>   </entity>
>
>   <entity name="mp"
>        datasource="p"
> query="SELECT m.id as id, m.serial as m_machine_serial, m.ivk as
> m_machine_ivk, m.sitename as m_sitename, m.deliveryDate as m_delivery_date,
> m.hotsite as m_hotsite, m.guardian as m_guardian, m.warranty as m_warranty,
> m.contract as m_contract,
>   st.name as m_st_name, pm.name as m_pm_name, p.name as m_p_name,
> sv.shortName as m_sv_name, c.clusterMajor as m_c_cluster_major,
> c.clusterMinor as m_c_cluster_minor, c.country as m_c_country, c.code as
> m_c_code
>   FROM Machine AS m
>   LEFT JOIN SystemType AS st ON m.fk_systemType=st.id
>   LEFT JOIN ProductModel AS pm ON fk_productModel = pm.id
>   LEFT JOIN Platform AS p ON m.fk_platform = p.id
>   LEFT JOIN SoftwareVersion AS sv ON fk_softwareVersion = sv.id
>   LEFT JOIN Country AS c ON fk_country = c.id"
> readOnly="true"
> transformer="DateFormatTransformer">
> <field column="id" />
> <field column="m_machine_serial"/>
> <field column="m_machine_ivk"/>
> <field column="m_sitename"/>
> <field column="m_delivery_date" dateTimeFormat="yyyy-MM-dd"/>
> <field column="m_hotsite"/>
> <field column="m_guardian"/>
> <field column="m_warranty"/>
> <field column="m_contract"/>
> <field column="m_st_name"/>
> <field column="m_pm_name"/>
> <field column="m_p_name"/>
> <field column="m_sv_name"/>
> <field column="m_c_cluster_major"/>
> <field column="m_c_cluster_minor"/>
> <field column="m_c_country"/>
> <field column="m_c_code"/>
>   </entity>
>  </document>
> </dataConfig>
>
> I've removed the connection params
> The unique key is id.
>



-- 
Regards,

Dmitry Kan

Re: Entity with multiple datasources

Posted by Radu Toev <ra...@gmail.com>.
<dataConfig>
  <dataSource
     name="s"
     driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
     url=""
     user=""
     password=""/>
  <dataSource
     name="p"
 driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
     url=""
     user=""
     password=""/>
  <document>
    <entity name="ms"
        datasource="s"
query="SELECT m.id as id, m.serial as m_machine_serial, m.ivk as
m_machine_ivk, m.sitename as m_sitename, m.deliveryDate as m_delivery_date,
m.hotsite as m_hotsite, m.guardian as m_guardian, m.warranty as m_warranty,
m.contract as m_contract,
   st.name as m_st_name, pm.name as m_pm_name, p.name as m_p_name,
sv.shortName as m_sv_name, c.clusterMajor as m_c_cluster_major,
c.clusterMinor as m_c_cluster_minor, c.country as m_c_country, c.code as
m_c_code
   FROM Machine AS m
   LEFT JOIN SystemType AS st ON m.fk_systemType=st.id
   LEFT JOIN ProductModel AS pm ON fk_productModel = pm.id
   LEFT JOIN Platform AS p ON m.fk_platform = p.id
   LEFT JOIN SoftwareVersion AS sv ON fk_softwareVersion = sv.id
   LEFT JOIN Country AS c ON fk_country = c.id"
readOnly="true"
transformer="DateFormatTransformer">
<field column="id" />
<field column="m_machine_serial"/>
<field column="m_machine_ivk"/>
<field column="m_sitename"/>
<field column="m_delivery_date" dateTimeFormat="yyyy-MM-dd"/>
<field column="m_hotsite"/>
<field column="m_guardian"/>
<field column="m_warranty"/>
<field column="m_contract"/>
<field column="m_st_name"/>
<field column="m_pm_name"/>
<field column="m_p_name"/>
<field column="m_sv_name"/>
<field column="m_c_cluster_major"/>
<field column="m_c_cluster_minor"/>
<field column="m_c_country"/>
<field column="m_c_code"/>
   </entity>

   <entity name="mp"
        datasource="p"
query="SELECT m.id as id, m.serial as m_machine_serial, m.ivk as
m_machine_ivk, m.sitename as m_sitename, m.deliveryDate as m_delivery_date,
m.hotsite as m_hotsite, m.guardian as m_guardian, m.warranty as m_warranty,
m.contract as m_contract,
   st.name as m_st_name, pm.name as m_pm_name, p.name as m_p_name,
sv.shortName as m_sv_name, c.clusterMajor as m_c_cluster_major,
c.clusterMinor as m_c_cluster_minor, c.country as m_c_country, c.code as
m_c_code
   FROM Machine AS m
   LEFT JOIN SystemType AS st ON m.fk_systemType=st.id
   LEFT JOIN ProductModel AS pm ON fk_productModel = pm.id
   LEFT JOIN Platform AS p ON m.fk_platform = p.id
   LEFT JOIN SoftwareVersion AS sv ON fk_softwareVersion = sv.id
   LEFT JOIN Country AS c ON fk_country = c.id"
readOnly="true"
transformer="DateFormatTransformer">
<field column="id" />
<field column="m_machine_serial"/>
<field column="m_machine_ivk"/>
<field column="m_sitename"/>
<field column="m_delivery_date" dateTimeFormat="yyyy-MM-dd"/>
<field column="m_hotsite"/>
<field column="m_guardian"/>
<field column="m_warranty"/>
<field column="m_contract"/>
<field column="m_st_name"/>
<field column="m_pm_name"/>
<field column="m_p_name"/>
<field column="m_sv_name"/>
<field column="m_c_cluster_major"/>
<field column="m_c_cluster_minor"/>
<field column="m_c_country"/>
<field column="m_c_code"/>
   </entity>
  </document>
</dataConfig>

I've removed the connection params
The unique key is id.

On Thu, Feb 16, 2012 at 2:27 PM, Dmitry Kan <dm...@gmail.com> wrote:

> OK, maybe you can show the db-data-config.xml just in case?
> Also, in schema.xml, does your <uniqueKey> correspond to the unique field in
> the db?
>

Re: Entity with multiple datasources

Posted by Dmitry Kan <dm...@gmail.com>.
OK, maybe you can show the db-data-config.xml just in case?
Also, in schema.xml, does your <uniqueKey> correspond to the unique field in
the db?
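
For reference, this is roughly the part of schema.xml I mean. It is only a
sketch: the field name id is taken from your description, while the string
field type is an assumption and yours may differ.

```xml
<!-- schema.xml (excerpt) -->
<fields>
  <!-- Receives the "id" column produced by the DIH query -->
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
</fields>

<!-- Solr keeps at most one document per value of this field -->
<uniqueKey>id</uniqueKey>
```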

On Thu, Feb 16, 2012 at 2:13 PM, Radu Toev <ra...@gmail.com> wrote:

> I tried running with just one datasource(the one that has 6k entries) and
> it indexes them ok.
> The same, if I do the 1k database separately. It indexes ok.
>



-- 
Regards,

Dmitry Kan

Re: Entity with multiple datasources

Posted by Radu Toev <ra...@gmail.com>.
I tried running with just one datasource (the one that has 6k entries) and
it indexes them OK.
The same goes for the 1k database when I index it separately: it indexes OK.

On Thu, Feb 16, 2012 at 2:11 PM, Dmitry Kan <dm...@gmail.com> wrote:

> It sounds a bit, as if SOLR stopped processing data once it queried all
> from the smaller dataset. That's why you have 2000. If you just have a
> handler pointed to the bigger data set (6k), do you manage to get all 6k db
> entries into solr?
>

Re: Entity with multiple datasources

Posted by Dmitry Kan <dm...@gmail.com>.
It sounds a bit as if Solr stopped processing data once it had read
everything from the smaller dataset, which would explain the 2000. If you
have just one handler pointed at the bigger data set (6k), do you manage to
get all 6k db entries into Solr?

On Thu, Feb 16, 2012 at 1:46 PM, Radu Toev <ra...@gmail.com> wrote:

> 1. Nothing in the logs
> 2. No.
>



-- 
Regards,

Dmitry Kan

Re: Entity with multiple datasources

Posted by Radu Toev <ra...@gmail.com>.
1. Nothing in the logs
2. No.

On Thu, Feb 16, 2012 at 12:44 PM, Dmitry Kan <dm...@gmail.com> wrote:

> 1. Do you see any errors / exceptions in the logs?
> 2. Could you have duplicates?
>

Re: Entity with multiple datasources

Posted by Dmitry Kan <dm...@gmail.com>.
1. Do you see any errors / exceptions in the logs?
2. Could you have duplicates?
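
To explain why I ask about duplicates: Solr keeps only one document per
uniqueKey value, so a later document with the same key replaces the earlier
one. A small illustration in Solr's XML update format, assuming id is the
uniqueKey (the values are made up):

```xml
<add>
  <doc>
    <field name="id">42</field>
    <field name="m_machine_serial">SERIAL-FROM-FIRST-DB</field>
  </doc>
  <!-- Same id as above: this document overwrites the previous one -->
  <doc>
    <field name="id">42</field>
    <field name="m_machine_serial">SERIAL-FROM-SECOND-DB</field>
  </doc>
</add>
```

After a commit the index holds one document with id 42, not two, so
overlapping ids would make the total count lower than the sum of the rows.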

On Thu, Feb 16, 2012 at 10:15 AM, Radu Toev <ra...@gmail.com> wrote:

> Hello,
>
> I created a data-config.xml file where I define a datasource and an entity
> with 12 fields.
> In my use case I have 2 databases with the same schema, so I want to
> combine the 2 databases in one index.
> I defined a second dataSource tag and duplicated the entity with its
> fields (changed the name and the datasource).
> What I'm expecting is to get around 7k results (I have around 6k in the
> first db and 1k in the second). However, I'm getting a total of 2k.
> Where could the problem be?
>
> Thanks
>



-- 
Regards,

Dmitry Kan