Posted to solr-user@lucene.apache.org by Andre Lopes <lo...@gmail.com> on 2012/08/05 13:47:53 UTC

How to configure schema.xml to take in account two database tables?

Hi,

I'm new to Solr. I've read a bit about how it works, but I can't
find a clue for my specific situation.

Here is my case. I've 2 database tables that I need to add to the
index, but they are related. One entry in the table "clients" could
have more than one entry in the table "contacts". Here is the visual
example:

Table clients:

id | name        | description
1  | customer 1  | This is the description of customer 1
2  | customer 2  | This is the description of customer 2
3  | customer 3  | This is the description of customer 3
4  | customer 4  | This is the description of customer 4

Table contacts:

id | phone_number
1  | 921234567
1  | 932122345
2  | 934545444
3  | 943322345
3  | 343445545
3  | 213443435

I think the case is simple. If I search for "921234567", I must
present the information about "customer 1".

My question: how can I set up schema.xml so that it takes these
two database tables into account?


Best Regards,

Re: How to configure schema.xml to take in account two database tables?

Posted by Andre Lopes <lo...@gmail.com>.
Hi,

I've found what was wrong. By default, the query was returning only
10 results.

With "rows" I can now return more than 10:

http://localhost:8983/solr/select?q=*:*&rows=400

Thanks for the help. From here I will try to dig deeper.

Best Regards,
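
For reference, the limit of 10 is the default value of the "rows" request
parameter. If a larger page size is wanted on every query, one option is
to set it in the defaults of the search handler in solrconfig.xml. This is
a minimal sketch only: the handler name and the value 400 are assumptions,
so match them to whatever your own solrconfig.xml already declares.

```xml
<!-- solrconfig.xml fragment: sketch only; handler name and the value 400 are assumptions -->
<requestHandler name="search" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <!-- rows = number of documents returned when the request does not pass rows= itself -->
    <int name="rows">400</int>
  </lst>
</requestHandler>
```

The total number of matches is reported in the numFound attribute of the
response regardless of "rows", and the "start" parameter can be combined
with "rows" to page through a large result set instead of fetching
everything at once.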


On Sun, Aug 5, 2012 at 7:20 PM, Andre Lopes <lo...@gmail.com> wrote:
> [...]

Re: How to configure schema.xml to take in account two database tables?

Posted by Andre Lopes <lo...@gmail.com>.
Hi,

Thanks for the replies. The info in my admin/stats is the following:

searcherName : Searcher@f4e40da main
caching : true
numDocs : 654
maxDoc : 654
reader : SolrIndexReader{this=6a6078e7,r=ReadOnlyDirectoryReader@6a6078e7,refCnt=1,segments=1}
readerDir : org.apache.lucene.store.MMapDirectory@/home/andre/workspace/test/3rd_party/solr/apache-solr-3.6.1/example/solr/data/index
lockFactory=org.apache.lucene.store.NativeFSLockFactory@51a422f6
indexVersion : 1343578710140
openedAt : Sun Aug 05 19:04:35 WEST 2012
registeredAt : Sun Aug 05 19:04:35 WEST 2012
warmupTime : 15

There are 654 docs.

Some more info, my solrconfig.xml:

  <!-- Request handler added by Andre Lopes to import data from database -->
  <requestHandler name="/dataimport"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <!-- default values for query parameters can be specified, these
         will be overridden by parameters in the request
      -->
    <lst name="defaults">
      <str name="config">db-data-config.xml</str>
    </lst>
  </requestHandler>


My db-data-config.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost:5432/euvoudebicicleta"
              user="myuser" password="mypass" />
  <document>
    <entity name="bicyclebusinesses"
            query="select * from table_text__single_occurrencies order by date_inserted">
      <field column="uri" name="uri" />
      <field column="business_name" name="name" />
      <field column="business_address" name="address" />
    </entity>
  </document>
</dataConfig>


My schema.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
  <types>
    <fieldType name="string" class="solr.StrField"/>
  </types>

  <fields>
    <dynamicField name="*" type="string" indexed="false" stored="false" />
    <field name="uri" type="string" indexed="true" stored="true" />
<!--
    <field name="name" type="string" indexed="true" stored="true" />
    <field name="address" type="string" indexed="true" stored="true" />
-->
  </fields>
  <uniqueKey>uri</uniqueKey>
  <!-- <defaultSearchField>catchall</defaultSearchField> -->
</schema>


I've tested the SELECT in db-data-config.xml, and it returns 654
rows. Any more clues?


Best Regards,




On Sun, Aug 5, 2012 at 6:59 PM, Erick Erickson <er...@gmail.com> wrote:
> [...]

Re: How to configure schema.xml to take in account two database tables?

Posted by Erick Erickson <er...@gmail.com>.
A quick check here is to go to your admin/stats page and look at
numDocs and maxDoc. numDocs is the number of documents that it's
possible to find, i.e. it excludes docs that have been deleted or
replaced by a newer version. maxDoc is the number of documents that
have been added, and that count still includes older copies that were
overwritten because they shared a unique ID (until they are merged away).

So I'm guessing that numDocs == 9 and maxDoc == 654, which as Jack
says indicates that your uniqueKey is repeated for lots and lots of
your data...

Best
Erick

On Sun, Aug 5, 2012 at 1:40 PM, Jack Krupansky <ja...@basetechnology.com> wrote:
> [...]

Re: How to configure schema.xml to take in account two database tables?

Posted by Jack Krupansky <ja...@basetechnology.com>.
Make sure the id is not duplicated. You might have inadvertently populated 
the id field in your Solr schema with some non-key value that occurs with 
high frequency (and may have roughly 9 unique values.)

Examine the 9 results and their id fields. Then look at some of your input 
data to verify that the values placed in the id field are what you expected.

If possible, identify one input record that isn't in the 9 results but 
should be and verify its id.

-- Jack Krupansky

-----Original Message----- 
From: Andre Lopes
Sent: Sunday, August 05, 2012 1:31 PM
To: solr-user@lucene.apache.org
Subject: Re: How to configure schema.xml to take in account two database 
tables?

[...]


Re: How to configure schema.xml to take in account two database tables?

Posted by Andre Lopes <lo...@gmail.com>.
Thanks for the replies,

I've now successfully indexed the database using the DataImportHandler,
but there is something weird. I've indexed 654 entries, but I can't
output all 654 results.

After I run
"http://localhost:8983/solr/dataimport?command=full-import" I get 654
adds:

Aug 5, 2012 6:16:51 PM
org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {deleteByQuery=*:*,add=[http://1.com, http://2.com,
http://3.com, http://4.com, http://5.com, http://6a.com, http://7.vu,
http://8.com/, ... (654 adds)],commit=} 0 35

But when I query Solr with
"http://localhost:8983/solr/select?q=*:*" I only get 9 results.

I've used a very basic schema.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">

  <types>
    <fieldType name="string" class="solr.StrField"/>
  </types>

  <fields>
    <dynamicField name="*" type="string" indexed="true" stored="true" />

    <field name="id" type="string" indexed="true" stored="true" multiValued="false" />
    <field name="name" type="string" indexed="true" stored="true" multiValued="false" />
    <field name="address" type="string" indexed="true" stored="true" multiValued="false" />

  </fields>

  <uniqueKey>id</uniqueKey>
  <!-- <defaultSearchField>catchall</defaultSearchField> -->

</schema>


Any clues on what I'm doing wrong?

Best Regards,






On Sun, Aug 5, 2012 at 1:19 PM, Gora Mohanty <go...@mimirtech.com> wrote:
> [...]

Re: How to configure schema.xml to take in account two database tables?

Posted by Gora Mohanty <go...@mimirtech.com>.
On 5 August 2012 17:17, Andre Lopes <lo...@gmail.com> wrote:
> Hi,
>
> I'm new to Solr. I've read a bit about how it works, but I can't
> find a clue for my specific situation.
>
> Here is my case. I've 2 database tables that I need to add to the
> index, but they are related. One entry in the table "clients" could
> have more than one entry in the table "contacts".
[...]

There seem to be various things that you need clarity on:
1. Firstly, schema.xml describes the various fields that you
    might be indexing, and/or storing in Solr. Thus, it should
    contain a description for each field that you will be using,
    no matter what data source the field might come from.
2. One typically flattens data when indexing into Solr.
    Following your example, as customers can have multiple
    phone numbers, you should denormalise your data.
    E.g., each Solr record could have these fields:
       <cust. name>, <cust. desc.>, <phone>
    Thus, for customer 1 you would need two records, for
    customer 2 one record, and for customer 3 three records.

    You might find this blog useful, though it probably has
    more detail than you need:
    http://mysolr.com/tips/denormalized-data-structure/
3. You will need some way to index the data into Solr. One
    way is to use the DataImportHandler which allows
    indexing from multiple databases:
    http://wiki.apache.org/solr/DataImportHandler

Regards,
Gora
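
As an illustration of points 2 and 3, the flattening can be done by the
SQL query inside a DataImportHandler config, so that each row of the join
becomes one Solr record. This is only a sketch under assumptions: the
database name, credentials, and the Solr field names (id, name,
description, phone) are made up for the example, and the "id" column of
"contacts" is taken to hold the client's id, as in the tables from the
original question. Each record also needs its own unique key, built here
by concatenating the client id and the phone number.

```xml
<!-- db-data-config.xml: sketch only; database name, credentials and Solr field names are assumptions -->
<dataConfig>
  <dataSource driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost:5432/mydb"
              user="myuser" password="mypass" />
  <document>
    <!-- One Solr record per (client, phone number) pair -->
    <entity name="client_contact"
            query="SELECT CAST(c.id AS text) || '-' || CAST(p.phone_number AS text) AS id,
                          c.name, c.description, p.phone_number
                   FROM clients c JOIN contacts p ON p.id = c.id">
      <field column="id" name="id" />
      <field column="name" name="name" />
      <field column="description" name="description" />
      <field column="phone_number" name="phone" />
    </entity>
  </document>
</dataConfig>
```

The four target fields would have to be declared in schema.xml, with "id"
as the uniqueKey. Note that an inner join like this one leaves out clients
that have no contacts at all (customer 4 in the example data).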

Re: How to configure schema.xml to take in account two database tables?

Posted by Jack Krupansky <ja...@basetechnology.com>.
In general, you need to "flatten" relational tables.

In this specific case, I see two choices:

1. Add a "customer_id" field to your contacts for the customer's id. The id 
field for a contact would need to be a unique id such as the concatenation 
of the customer id and the contact id. You can then query all contacts for a 
specified customer id.
2. Add "phone_number" as a multi-valued field for each customer.

The latter seems a little more compelling, for this specific example.

-- Jack Krupansky
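
A rough sketch of option 2, assuming the field names below and a
DataImportHandler setup; none of these names come from an actual
configuration in this thread. The schema declares phone_number as
multi-valued:

```xml
<!-- schema.xml fragment: sketch only; field names are assumptions -->
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="name" type="string" indexed="true" stored="true" />
<field name="description" type="string" indexed="true" stored="true" />
<field name="phone_number" type="string" indexed="true" stored="true" multiValued="true" />
```

A nested entity in db-data-config.xml can then collect every phone number
of a client into that one field:

```xml
<!-- db-data-config.xml fragment: the child query runs once per client row -->
<entity name="client" query="SELECT id, name, description FROM clients">
  <entity name="contact"
          query="SELECT phone_number FROM contacts WHERE id = '${client.id}'">
    <field column="phone_number" name="phone_number" />
  </entity>
</entity>
```

With this layout there is one Solr document per customer, and a query
such as q=phone_number:921234567 should return the document for
"customer 1".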

-----Original Message----- 
From: Andre Lopes
Sent: Sunday, August 05, 2012 7:47 AM
To: solr-user@lucene.apache.org
Subject: How to configure schema.xml to take in account two database tables?

[...]