Posted to solr-user@lucene.apache.org by st...@bt.com on 2010/02/08 13:12:23 UTC

How to configure multiple data import types

I have got a dataimport request handler configured to index data by selecting data from a DB view 

I now need to index additional data sets from other views so that I can support other search queries

I defined additional <entity ..> definitions within the <document ..>  section of my data-config.xml
But I only seem to pull in data for the 1st <entity ..>  and not both


Is there an xsd (or dtd) for 
	data-config.xml
	schema.xml
	solrconfig.xml

As these might help with understanding how to construct usable conf files

Regards
Stefan Maric 
BT Innovate & Design | Collaboration Platform - Customer Innovation Solutions

Re: How to configure multiple data import types

Posted by Chris Hostetter <ho...@fucit.org>.
: Subject: How to configure multiple data import types
: In-Reply-To: <4B...@zib.de>

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking




-Hoss


Re: Indexing / querying multiple data types

Posted by Chris Hostetter <ho...@fucit.org>.
: Subject: Indexing / querying multiple data types
: In-Reply-To: <8C...@XMB-RCD-104.cisco.com>

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking





-Hoss


Re: Indexing / querying multiple data types

Posted by Lance Norskog <go...@gmail.com>.
I gave you bad advice about qt=. Erik Hatcher kindly corrected me:

>> Actually qt selects the request handler.  defType selects the query parser.  qt may implicitly select a query parser of course, but that would depend on the request handler definition.

On Wed, Feb 10, 2010 at 1:10 PM, Stefan Maric <sm...@ntlworld.com> wrote:
> Lance
>
> after a bit more reading - & cleaning up my configuration (case sensitivity corrected but didn't appear to be affecting the indexing & i don't use the atomID field for querying anyhow)
>
> I've added a docType field when I index my data and now use the fq parameter to filter on that new field



-- 
Lance Norskog
goksron@gmail.com

RE: Indexing / querying multiple data types

Posted by Stefan Maric <sm...@ntlworld.com>.
Lance

after a bit more reading - & cleaning up my configuration (case sensitivity corrected but didn't appear to be affecting the indexing & i don't use the atomID field for querying anyhow)

I've added a docType field when I index my data and now use the fq parameter to filter on that new field
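
One way to tie the fq filter to a handler name, so that qt=name1 returns only one data set, is to put the filter in the handler's "appends" section in solrconfig.xml. This is a minimal sketch, not the configuration actually used in this thread: it assumes the docType field is indexed and holds the value name1 for documents imported from the first view.

```xml
<!-- Sketch only: assumes an indexed docType field whose value is "name1"
     for documents imported from the first view. -->
<requestHandler name="name1" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">name^1.5 description^1.0</str>
  </lst>
  <!-- "appends" parameters are added to every request this handler serves,
       so the filter applies even if the client sends no fq of its own. -->
  <lst name="appends">
    <str name="fq">docType:name1</str>
  </lst>
</requestHandler>
```

With something like this in place, http://localhost:7001/solr/select/?q=food&qt=name1 should behave as if &fq=docType:name1 had been added by hand. A second handler for name2 would differ only in its name and fq value.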





-----Original Message-----
From: Lance Norskog [mailto:goksron@gmail.com]
Sent: 10 February 2010 03:28
To: solr-user@lucene.apache.org
Subject: Re: Indexing / querying multiple data types


A couple of minor problems:

The qt parameter (Que Tee) selects the parser for the q (Q for query)
parameter. I think you mean 'qf':

http://wiki.apache.org/solr/DisMaxRequestHandler#qf_.28Query_Fields.29

Another problem is with atomID, atomId, atomid: Solr field names are
case-sensitive. I don't know how this plays out.

Now, to the main part:  the <entity name="name1"> part does not create
a column named name1.
The two queries only populate the same namespace of four fields: id,
atomID, name, description.

If you want data from each entity to have a constant field
distinguishing it, you have to create a new field with a constant
value. You do this with the TemplateTransformer.

http://wiki.apache.org/solr/DataImportHandler#TemplateTransformer

Add this as an entity attribute to both entities:
    transformer="TemplateTransformer"
and add this as a column to each entity:
    <field column="name" template="name1"> and then "name2".

You may have to do something else for these to appear in the document.




Re: Indexing / querying multiple data types

Posted by Lance Norskog <go...@gmail.com>.
A couple of minor problems:

The qt parameter (Que Tee) selects the parser for the q (Q for query)
parameter. I think you mean 'qf':

http://wiki.apache.org/solr/DisMaxRequestHandler#qf_.28Query_Fields.29

Another problem is with atomID, atomId, atomid: Solr field names are
case-sensitive. I don't know how this plays out.

Now, to the main part:  the <entity name="name1"> part does not create
a column named name1.
The two queries only populate the same namespace of four fields: id,
atomID, name, description.

If you want data from each entity to have a constant field
distinguishing it, you have to create a new field with a constant
value. You do this with the TemplateTransformer.

http://wiki.apache.org/solr/DataImportHandler#TemplateTransformer

Add this as an entity attribute to both entities:
    transformer="TemplateTransformer"
and add this as a column to each entity:
    <field column="name" template="name1"> and then "name2".

You may have to do something else for these to appear in the document.
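
The TemplateTransformer suggestion above might look like this in data-config.xml. It is a sketch under assumptions: the column name docType is illustrative (it is the name Stefan reports using in his follow-up), and it would also need to be declared in schema.xml for the value to reach the index. A separate column is used here because a template on column="name" would presumably replace the name value selected from the view.

```xml
<document>
  <!-- docType is an assumed field name; declare it in schema.xml as well,
       e.g. as an indexed string field. -->
  <entity name="name1" transformer="TemplateTransformer"
          query="select id, atomID, name, description from v_1">
    <!-- Constant value stamped on every row from this entity -->
    <field column="docType" template="name1"/>
  </entity>
  <entity name="name2" transformer="TemplateTransformer"
          query="select id, atomID, name, description from V_2">
    <field column="docType" template="name2"/>
  </entity>
</document>
```

Each document then carries a constant marker that a query can filter on, for example with fq=docType:name1.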

On Tue, Feb 9, 2010 at 12:41 AM,  <st...@bt.com> wrote:
> Sven
>
> In my data-config.xml I have the following
>        <document >
>                <entity name="name1" query="select id, atomID, name, description from v_1" />
>                <entity name="name2" query="select id, atomID, name, description from V_2" />
>        </document>
>
> In my schema.xml I have
>   <field name="id" type="string" indexed="true" stored="true" required="true" />
>   <field name="name" type="text" indexed="true" stored="true"/>
>   <field name="atomId" type="string" indexed="false" stored="true" required="true" />
>   <field name="description" type="text" indexed="true" stored="true" />
>
> And in my solrconfig.xml I have
>  <requestHandler name="/dataimport"
>        class="org.apache.solr.handler.dataimport.DataImportHandler">
>    <lst name="defaults">
>                <str name="config">data-config.xml</str>
>    </lst>
>  </requestHandler>
>
>        <requestHandler name="name1" class="solr.SearchHandler" >
>                <lst name="defaults">
>                        <str name="defType">dismax</str>
>                        <str name="echoParams">explicit</str>
>                        <float name="tie">0.01</float>
>                        <str name="qf">name^1.5 description^1.0</str>
>                </lst>
>        </requestHandler>
>
>        <requestHandler name="contacts" class="solr.SearchHandler" >
>                <lst name="defaults">
>                        <str name="defType">dismax</str>
>                        <str name="echoParams">explicit</str>
>                        <float name="tie">0.01</float>
>                        <str name="qf">name^1.5 description^1.0</str>
>                </lst>
>        </requestHandler>
>
> And the
>  <requestHandler name="dismax" class="solr.SearchHandler" >
> Has been untouched
>
> So when I run
> http://localhost:7001/solr/select/?q=food&qt=name1
> I was expecting to get results from the data that had been indexed by <entity name="name1">
>
>
> Regards
> Stefan Maric
>



-- 
Lance Norskog
goksron@gmail.com

RE: Indexing / querying multiple data types

Posted by st...@bt.com.
Sven

In my data-config.xml I have the following 
	<document >
		<entity name="name1" query="select id, atomID, name, description from v_1" />
		<entity name="name2" query="select id, atomID, name, description from V_2" />
	</document>

In my schema.xml I have
   <field name="id" type="string" indexed="true" stored="true" required="true" /> 
   <field name="name" type="text" indexed="true" stored="true"/>
   <field name="atomId" type="string" indexed="false" stored="true" required="true" /> 
   <field name="description" type="text" indexed="true" stored="true" />

And in my solrconfig.xml I have
 <requestHandler name="/dataimport" 
 	class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
		<str name="config">data-config.xml</str>
    </lst>
  </requestHandler>

	<requestHandler name="name1" class="solr.SearchHandler" >
		<lst name="defaults">
			<str name="defType">dismax</str>
			<str name="echoParams">explicit</str>
			<float name="tie">0.01</float>
			<str name="qf">name^1.5 description^1.0</str>
		</lst>
	</requestHandler>

	<requestHandler name="contacts" class="solr.SearchHandler" >
		<lst name="defaults">
			<str name="defType">dismax</str>
			<str name="echoParams">explicit</str>
			<float name="tie">0.01</float>
			<str name="qf">name^1.5 description^1.0</str>
		</lst>
	</requestHandler>

And the 
  <requestHandler name="dismax" class="solr.SearchHandler" >
Has been untouched

So when I run
http://localhost:7001/solr/select/?q=food&qt=name1
I was expecting to get results from the data that had been indexed by <entity name="name1">


Regards
Stefan Maric 

Re: Indexing / querying multiple data types

Posted by Sven Maurmann <sv...@kippdata.de>.
Hi,

could you be a little more precise about your configuration?
It may be much easier to answer your question then.

Cheers,
     Sven

--On Monday, 8 February 2010 17:39 +0000 stefan.maric@bt.com wrote:

> OK - so I've now got my data-config.xml sorted so that I'm pulling in the
> expected number of indexed documents for my two data sets
>
> So I've defined two entities (name1 & name2) and they both make use of
> the same fields  --  I'm not sure if this is a good thing to have done
>
> When I run a query I include qt=name1 (or qt=name2) and am expecting to
> only get the number of results from the appropriate data set --  in fact
> I'm getting the sum total from both
>
> Does the entity name=name1 equate to the query qt=name1?
>
> In my solrconfig.xml I have defined two requestHandlers (name1 & name2)
> using the common set of fields
>
> So how do I ensure that my query
> http://localhost:7001/solr/select/?q=food&qt=name1
> or
> http://localhost:7001/solr/select/?q=food&qt=name2
>
> Will operate on the correct data set as loaded via the data import  --
> <entity name=name1> or <entity name=name2>
>
>
>
>
> Thanks
> Stefan Maric
> BT Innovate & Design | Collaboration Platform - Customer Innovation
> Solutions

Indexing / querying multiple data types

Posted by st...@bt.com.
OK - so I've now got my data-config.xml sorted so that I'm pulling in the expected number of indexed documents for my two data sets

So I've defined two entities (name1 & name2) and they both make use of the same fields  --  I'm not sure if this is a good thing to have done

When I run a query I include qt=name1 (or qt=name2) and am expecting to only get the number of results from the appropriate data set --  in fact I'm getting the sum total from both

Does the entity name=name1 equate to the query qt=name1?

In my solrconfig.xml I have defined two requestHandlers (name1 & name2) using the common set of fields 

So how do I ensure that my query
http://localhost:7001/solr/select/?q=food&qt=name1
or
http://localhost:7001/solr/select/?q=food&qt=name2

Will operate on the correct data set as loaded via the data import  -- <entity name=name1> or <entity name=name2>




Thanks
Stefan Maric 
BT Innovate & Design | Collaboration Platform - Customer Innovation Solutions

RE: How to configure multiple data import types

Posted by "Ken Lane (kenlane)" <ke...@cisco.com>.
It sounds like you are doing it correctly, Stefan. Must be something
syntactical. The schema.xml and solrconfig.xml do not factor into your
problem, only the data-config.xml.

I do the same thing you are trying to do. A watered down version is:

<dataConfig>
  <dataSource type="JdbcDataSource"
              name="bdb-1"
              driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@(DESCRIPTION = (LOAD_BALANCE = on) (FAILOVER = on) (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = server.domain.com)(PORT = 1528))) (CONNECT_DATA = (SERVICE_NAME = instance.domain.COM)))"
              user="scott"
              password="tiger"/>
  <document name="monitors">
    <entity name="bdbmon" dataSource="bdb-1"
            query="SELECT column from table">
    </entity>
    <entity name="bug" dataSource="bdb-1"
            query="SELECT another_column from another_table">
    </entity>
  </document>
</dataConfig>

Hope this helps...

-----Original Message-----
From: stefan.maric@bt.com [mailto:stefan.maric@bt.com] 
Sent: Monday, February 08, 2010 7:34 AM
To: solr-user@lucene.apache.org; noble.paul@gmail.com
Subject: RE: How to configure multiple data import types

No my views have already taken care of pulling the related data together


I've indexed my first data set and now want to configure a second
(non-related) data set so that a User can issue a query for data set #1
whilst another user might be querying for data set #2

Should I be defining multiple <document ..> or <entity ..> entries
Or what ??

Thanks
Stefan Maric 

Re: How to configure multiple data import types

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Mon, Feb 8, 2010 at 6:03 PM, <st...@bt.com> wrote:

> No my views have already taken care of pulling the related data together
>
> I've indexed my first data set and now want to configure a second
> (non-related) data set so that a User can issue a query for data set #1
> whilst another user might be querying for data set #2
>
> Should I be defining multiple <document ..> or <entity ..> entries
> Or what ??
>
>
You can define multiple entities (all at the root level) to import all your
views at once.

-- 
Regards,
Shalin Shekhar Mangar.

RE: How to configure multiple data import types

Posted by st...@bt.com.
No my views have already taken care of pulling the related data together 

I've indexed my first data set and now want to configure a second (non-related) data set so that a User can issue a query for data set #1 whilst another user might be querying for data set #2

Should I be defining multiple <document ..> or <entity ..> entries
Or what ??

Thanks
Stefan Maric 

Re: How to configure multiple data import types

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>.
Are you referring to nested entities?
http://wiki.apache.org/solr/DIHQuickStart#Index_data_from_multiple_tables_into_Solr
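
For comparison, a nested entity in the DataImportHandler folds child rows into the parent's document rather than producing separate documents, whereas root-level entities each produce documents of their own. A rough sketch, with hypothetical table and column names (item, feature, item_id):

```xml
<document>
  <!-- Hypothetical tables: each item row becomes one Solr document -->
  <entity name="item" query="select id, name from item">
    <!-- The child entity runs once per parent row; ${item.id} refers to the
         parent's id column. Its fields are added to the parent document. -->
    <entity name="feature"
            query="select description from feature where item_id='${item.id}'"/>
  </entity>
</document>
```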

On Mon, Feb 8, 2010 at 5:42 PM,  <st...@bt.com> wrote:
> I have got a dataimport request handler configured to index data by selecting data from a DB view
>
> I now need to index additional data sets from other views so that I can support other search queries
>
> I defined additional <entity ..> definitions within the <document ..>  section of my data-config.xml
> But I only seem to pull in data for the 1st <entity ..>  and not both
>
>
> Is there an xsd (or dtd) for
>        data-config.xml
>        schema.xml
>        solrconfig.xml
>
> As these might help with understanding how to construct usable conf files
>
> Regards
> Stefan Maric
> BT Innovate & Design | Collaboration Platform - Customer Innovation Solutions
>



-- 
-----------------------------------------------------
Noble Paul | Systems Architect| AOL | http://aol.com