Posted to solr-user@lucene.apache.org by Stefan Maric <sm...@ntlworld.com> on 2010/02/03 00:31:29 UTC

Basic indexing question

I have got a basic configuration of Solr up and running and have loaded some data to experiment with
 When I run a query for 'ore' I get 3 results when I'm expecting 4
Dataimport is pulling the expected number of rows in from my DB view

 In my schema.xml I have 
 <field name="id" type="string" indexed="true" stored="true" required="true" /> 
 <field name="atomId" type="string" indexed="true" stored="true" required="true" /> 
 <field name="name" type="text" indexed="true" stored="true"/>
 <field name="description" type="text" indexed="true" stored="true" />

 and the defaults
<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
<copyField source="name" dest="text"/>

 From an SQL point of view - I am expecting a search for 'ore' to retrieve 4 results (which the following does)
select * from v_sm_search_sectors where description like '% ore%' or name like '% ore%';
1000021  B0.010.010  Mining and quarrying                   Mining of metal ore, stone, sand, clay, coal and other solid minerals
1000144  E0.030      Metal and metal ores wholesale         (null)
1000145  E0.030.010  Metal and metal ores wholesale         (null)
1000146  E0.030.020  Metal and metal ores wholesale agents  (null)

From a Solr query for 'ore' - I get the following
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="rows">10</str>
      <str name="start">0</str>
      <str name="indent">on</str>
      <str name="q">ore</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
  <result name="response" numFound="3" start="0">
    <doc>
      <str name="atomId">E0.030</str>
      <str name="id">1000144</str>
      <str name="name">Metal and metal ores wholesale</str>
    </doc>
    <doc>
      <str name="atomId">E0.030.010</str>
      <str name="id">1000145</str>
      <str name="name">Metal and metal ores wholesale</str>
    </doc>
    <doc>
      <str name="atomId">E0.030.020</str>
      <str name="id">1000146</str>
      <str name="name">Metal and metal ores wholesale agents</str>
    </doc>
  </result>
</response>


So I don't retrieve the document where 'ore' is in the description field (and NOT the name field)

It would seem that Solr is ONLY returning me results based on what has been put into the "text" field by the <copyField source="name" dest="text"/>

Any hints as to what I've missed?

Regards
Stefan Maric

RE: Basic indexing question

Posted by Stefan Maric <sm...@ntlworld.com>.
Thanks, that was it - I've now configured a dismax request handler that suits
my needs





Re: Basic indexing question

Posted by Joe Calderon <ca...@gmail.com>.
See http://wiki.apache.org/solr/SchemaXml#The_Default_Search_Field for
details on the default field. Most people use the dismax handler when
handling queries from users;
see http://wiki.apache.org/solr/DisMaxRequestHandler for more details.
If you don't have many fields you can write your own query using the
Lucene query parser, as I mentioned before. The syntax can be found at
http://lucene.apache.org/java/2_9_1/queryparsersyntax.html
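
For readers following along, a dismax handler along these lines could be registered in solrconfig.xml. This is only a sketch under assumptions: the handler name /sectors and the boost on name are illustrative and do not come from the thread; the field names are the ones from the schema in the original question.

```xml
<!-- solrconfig.xml: hypothetical handler name, adjust to taste -->
<requestHandler name="/sectors" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- parse q with the dismax query parser instead of the Lucene one -->
    <str name="defType">dismax</str>
    <!-- fields to search; the ^2.0 boost on name is an assumed example value -->
    <str name="qf">name^2.0 description</str>
    <!-- fields returned for each hit -->
    <str name="fl">id,atomId,name,description</str>
  </lst>
</requestHandler>
```

With a handler like this, a plain user query such as ore should be matched against both name and description without any field prefixes in the query string.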

hope this helps


--joe

RE: Basic indexing question

Posted by Stefan Maric <sm...@ntlworld.com>.
Thanks for the quick reply
I will have to see if the default query mechanism will suffice for most of
my needs

I have skimmed through most of the Solr documentation and didn't see
anything describing

I can easily change my DB View so that I only source Solr with a single
string plus my id field
(as my application making the search will have to collate associated
information into a presentable screen anyhow - so I'm not too worried about
info being returned by Solr as such)

Would that be a reasonable way of using Solr?






Re: Another basic question

Posted by Ahmet Arslan <io...@yahoo.com>.
> I have got a basic configuration of
> Solr up and running and have loaded some
> data to experiment with
> Dataimport is pulling the expected number of rows in from
> my DB view
> 
> If I query for Beekeeping i get one result returned (as
> expected)
> 
> If I query for bee - I get no results
> similarly for Bee
> etc

Do you want the query (bee) to return documents containing beekeeping?

You can use the prefix query bee*, but I think DisMax does not support it.

Alternatively you can use index-time synonym expansion:
<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="true" />

with index_synonyms.txt:
beekeeping, bee keeping, bee-keeping
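
As a rough illustration of where that filter would sit, the sketch below defines a field type that applies the synonym filter only at index time. The type name text_syn and the choice of whitespace tokenizer are assumptions made for the example, not something stated in the thread.

```xml
<!-- schema.xml: "text_syn" is a hypothetical type name -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- expands "beekeeping" to the variants listed in index_synonyms.txt -->
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- no synonym filter here: expansion happens at index time only -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

A field would then need to be declared with type="text_syn" and the data re-indexed before a query for bee could be expected to match a document containing Beekeeping.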


      

Another basic question

Posted by Stefan Maric <sm...@ntlworld.com>.
I have got a basic configuration of Solr up and running and have loaded some
data to experiment with
Dataimport is pulling the expected number of rows in from my DB view

If I query for Beekeeping I get one result returned (as expected)

If I query for bee - I get no results
similarly for Bee
etc

What areas of Solr configuration do I need to look into?

Thanks
Stefan Maric


Re: Basic indexing question

Posted by Joe Calderon <ca...@gmail.com>.
By default Solr will only search the default field; you have to
either query all fields explicitly, e.g. field1:(ore) OR field2:(ore) OR field3:(ore),
or use a different query parser like dismax
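
A third option, sketched here against the schema from the question, is to copy description into the catch-all field as well, so that the single default field covers both. This assumes text is the default search field, as in the stock example schema; the defaultSearchField line is shown only to make that assumption explicit.

```xml
<!-- schema.xml: catch-all field, searched when the query names no field -->
<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>

<!-- already present in the question -->
<copyField source="name" dest="text"/>
<!-- the addition: without it, description never reaches the "text" field -->
<copyField source="description" dest="text"/>

<!-- assumed to match the existing setup -->
<defaultSearchField>text</defaultSearchField>
```

Because copyField is applied when documents are indexed, the data would need to be re-imported before the fourth document could show up for ore.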
