Posted to solr-user@lucene.apache.org by Saïd Radhouani <r....@gmail.com> on 2010/07/02 18:36:31 UTC

Use free text to search against boolean fields?

Hi,

I have the following kind of data to index in a multilingual context: is_man, is_single, has_job, etc.

Logically, the underlying fields have a value of "yes" or "no," which is why the boolean type would be appropriate. My problem is that, in addition to being able to filter on these fields, I would like to give my users the possibility to search against them using free text. For example, a query might be "single man having job." Therefore, I think the boolean type is no longer appropriate. Instead, I'm thinking of using the string type, where each field is either empty (the "no" case) or populated by its own tag. For example, if the document describes a man, the field is_man will contain the string "man." Then I copy all these fields into a text field that I can use for free-text search.

Does that make sense?

Does that make sense in a multilingual context, i.e., where field tags can be different in each language (EN => man, single, job; FR => homme, célibataire, emploi; etc.)?

Thanks!

-Saïd

Re: Use free text to search against boolean fields?

Posted by Saïd Radhouani <r....@gmail.com>.
Hi Jan,

The vocabulary of my domain is very small and pretty controlled. Users will ask queries about features of our products, and we have fewer than one hundred features. So the idea is to have a text field "features" storing all the features. As for multilingualism, I can have "features_en", "features_fr", etc.
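
A minimal sketch of the per-language field idea in Python, not tied to any Solr client. The label table, the helper name build_feature_fields, and the example flags are assumptions made for illustration; only the field names "features_en" and "features_fr" come from the message above.

```python
# Hypothetical label table: feature id -> label in each supported language.
FEATURE_LABELS = {
    "is_man": {"en": "man", "fr": "homme"},
    "is_single": {"en": "single", "fr": "célibataire"},
    "has_job": {"en": "job", "fr": "emploi"},
}


def build_feature_fields(flags, languages=("en", "fr")):
    """Return one text field per language, listing the labels of the true flags."""
    fields = {}
    for lang in languages:
        words = [
            FEATURE_LABELS[name][lang]
            for name, value in flags.items()
            if value and name in FEATURE_LABELS
        ]
        fields[f"features_{lang}"] = " ".join(words)
    return fields


fields = build_feature_fields({"is_man": True, "is_single": True, "has_job": False})
print(fields)
```

At query time the application would search the field that matches the user's language, for example "features_fr" for a French query.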

What do you think?
-Saïd


On Jul 3, 2010, at 5:09 PM, Jan Høydahl / Cominvent wrote:

> Hi,
> 
> It would help to know more about the actual application, and see some use cases in order to answer that question. I thought that this would be free-text queries from users, and as soon as you have free-text then you WILL get all kinds of stuff in the queries. However, if your users are well educated on how to query your system and behave, then what you suggest makes more sense. It's quick to test and see how it works.
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Training in Europe - www.solrtraining.com
> 


Re: Use free text to search against boolean fields?

Posted by Jan Høydahl / Cominvent <ja...@cominvent.com>.
Hi,

It would help to know more about the actual application, and see some use cases in order to answer that question. I thought that this would be free-text queries from users, and as soon as you have free-text then you WILL get all kinds of stuff in the queries. However, if your users are well educated on how to query your system and behave, then what you suggest makes more sense. It's quick to test and see how it works.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 3 July 2010, at 01:11, Saïd Radhouani wrote:

> Hi Jan,
> 
> Thanks for this suggestion. If we choose parsing, why not do it on the indexing side instead of the querying side, where it might slow down the search process? That is, if a document has "is_man=true" and "is_single=true", then we populate a text field with the words "man" and "single". Then, during the search, we compare the user query with the text field. There's no "intelligent" query in my application, i.e., users would not ask for "not smoking". If they mention a word, it means that the boolean value is true.
> 
> I don't have many fields, so populating a text field will not dramatically increase the size of my index.
> 
> What do you think?
> 
> -Saïd
> 


Re: Use free text to search against boolean fields?

Posted by Saïd Radhouani <r....@gmail.com>.
Hi Jan,

Thanks for this suggestion. If we choose parsing, why not do it on the indexing side instead of the querying side, where it might slow down the search process? That is, if a document has "is_man=true" and "is_single=true", then we populate a text field with the words "man" and "single". Then, during the search, we compare the user query with the text field. There's no "intelligent" query in my application, i.e., users would not ask for "not smoking". If they mention a word, it means that the boolean value is true.

I don't have many fields, so populating a text field will not dramatically increase the size of my index.
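
The index-side approach described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the tag table, the field name "features_text", and both helper functions are hypothetical, and the overlap function is a toy stand-in for the matching a search engine would do.

```python
# Hypothetical mapping from boolean field name to the word indexed for it.
TAGS = {"is_man": "man", "is_single": "single", "has_job": "job"}


def to_index_doc(doc_id, flags):
    """Keep the boolean flags for filtering and add a text field for free-text search."""
    words = [TAGS[name] for name, value in flags.items() if value and name in TAGS]
    doc = {"id": doc_id, "features_text": " ".join(words)}
    doc.update(flags)  # the original booleans stay available as filter fields
    return doc


def overlap(query, doc):
    """Count the query words that occur in the document's text field."""
    doc_words = set(doc["features_text"].split())
    return sum(1 for word in query.lower().split() if word in doc_words)


doc = to_index_doc("1", {"is_man": True, "is_single": True, "has_job": False})
print(doc["features_text"])                    # man single
print(overlap("single man having job", doc))   # 2
```

Because only true flags contribute a word, a word in the query can only ever match documents where the corresponding boolean is true, which is the behaviour described in the message.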

What do you think?

-Saïd

On Jul 3, 2010, at 12:36 AM, Jan Høydahl / Cominvent wrote:

> Hi,
> 
> I would rather go for the boolean variant and spend some time writing a query parser which tries to understand all kinds of input people may make, mapping it into boolean filters. In this way you can support both navigation and search and keep both in sync whatever people prefer to start with. I'm not saying it is easy to write such a parser, but you know the domain and the users...
> 
> Another reason for doing it this way is that if you have a field does_smoke=true, you still want to match if someone writes "not smoking". Your parser would have to understand negations, e.g. through a set of regex ((not|non|no) (smoker|smoking|smoke))...
> 
> You could always do a mix also - to keep a free-text field as well, and any words that your parser does not understand can be passed through to the free-text as a "should" term with a boost.
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Training in Europe - www.solrtraining.com
> 


Re: Use free text to search against boolean fields?

Posted by Jan Høydahl / Cominvent <ja...@cominvent.com>.
Hi,

I would rather go for the boolean variant and spend some time writing a query parser which tries to understand all kinds of input people may make, mapping it into boolean filters. In this way you can support both navigation and search and keep both in sync whatever people prefer to start with. I'm not saying it is easy to write such a parser, but you know the domain and the users...

Another reason for doing it this way is that if you have a field does_smoke=true, you still want to match if someone writes "not smoking". Your parser would have to understand negations, e.g. through a set of regex ((not|non|no) (smoker|smoking|smoke))...
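
A rough Python sketch of such a parser, using the negation regex from the paragraph above. The vocabulary table and the function name parse_filters are assumptions for illustration, and a real parser would need a far richer vocabulary and per-language rules.

```python
import re

# Hypothetical vocabulary: boolean field -> alternation of surface forms.
VOCAB = {
    "does_smoke": r"smoker|smoking|smoke",
    "is_single": r"single",
    "is_man": r"man",
    "has_job": r"job|employed",
}
NEGATION = r"(?:not|non|no)"


def parse_filters(query):
    """Map free text to {field: bool}; a negated mention yields False."""
    text = query.lower()
    filters = {}
    for field, forms in VOCAB.items():
        # Negation followed by whitespace or a hyphen, then one of the forms.
        if re.search(rf"\b{NEGATION}[\s-]+(?:{forms})\b", text):
            filters[field] = False
        elif re.search(rf"\b(?:{forms})\b", text):
            filters[field] = True
    return filters


print(parse_filters("single man not smoking"))
print(parse_filters("non-smoker with a job"))
```

Fields that are not mentioned are left out of the result, so they impose no filter at all.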

You could always do a mix also - to keep a free-text field as well, and any words that your parser does not understand can be passed through to the free-text as a "should" term with a boost.
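
One way the mix could look, sketched in Python. It assumes Lucene-style query syntax ("+" for required clauses, "^" for boosts) with OR as the default operator; the word table, the free-text field name "text", and the boost value are hypothetical, and special characters in user input are not escaped here.

```python
# Hypothetical table of words the parser understands -> boolean field.
WORD_TO_FIELD = {"man": "is_man", "single": "is_single", "job": "has_job"}


def build_query(query, free_text_field="text", boost=0.5):
    """Turn known words into required boolean clauses and the rest into optional boosted terms."""
    required, optional = [], []
    for word in query.lower().split():
        field = WORD_TO_FIELD.get(word)
        if field:
            required.append(f"+{field}:true")
        else:
            # Unknown words only influence ranking when required clauses are present.
            optional.append(f"{free_text_field}:{word}^{boost}")
    clauses = required + optional
    return " ".join(clauses) if clauses else "*:*"


print(build_query("single man having job"))
```

Note that if no word is recognised, the query consists only of optional clauses, and at least one of them then has to match for a document to be returned.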

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 2 July 2010, at 18:36, Saïd Radhouani wrote:

> Hi,
> 
> I have the following kind of data to index in a multilingual context: is_man, is_single, has_job, etc.
> 
> Logically, the underlying fields have a value of "yes" or "no," which is why the boolean type would be appropriate. My problem is that, in addition to being able to filter on these fields, I would like to give my users the possibility to search against them using free text. For example, a query might be "single man having job." Therefore, I think the boolean type is no longer appropriate. Instead, I'm thinking of using the string type, where each field is either empty (the "no" case) or populated by its own tag. For example, if the document describes a man, the field is_man will contain the string "man." Then I copy all these fields into a text field that I can use for free-text search.
> 
> Does that make sense?
> 
> Does that make sense in a multilingual context, i.e., where field tags can be different in each language (EN => man, single, job; FR => homme, célibataire, emploi; etc.)?
> 
> Thanks!
> 
> -Saïd