Posted to solr-user@lucene.apache.org by Eugeny Balakhonov <c0...@gmail.com> on 2011/09/10 21:31:20 UTC

Full-search index for the database

I want to create a full-text search for my database.

This means that the search engine should look up a string across all fields
of my database.

I have created a Solr configuration for extracting and indexing data from the
database.

Following the documentation, I have created a field for the full-text search
index in schema.xml:

 

<field name="TEXT" type="..." indexed="true" stored="true"
multiValued="true"/>

 

I have also added copyField directives that copy the values of all fields
into this full-text field:

 

...

    <copyField source="...." dest="TEXT"/>

...

 

As a result, I can search across all fields in my database. But I cannot
tell which field in the found record contains the requested string.

Highlighting just marks the string in the "TEXT" field, like this:

 

<lst name="highlighting">
 <lst name="431046.431344...8473633">
  <arr name="TEXT">
   <str>Any text any text <em>Test</em>"</str>
  </arr>
 </lst>
 <lst name="431046.431231...8476393">
  <arr name="TEXT">
   <str>Any text any text <em>Test</em>"</str>
  </arr>
 </lst>
</lst>

 

How can I create a full-text index that still lets me identify the source
database field of each match?

 

Thx a lot.

Eugeny


Re: Full-search index for the database

Posted by Erick Erickson <er...@gmail.com>.
How much search-specific stuff are we talking here? Do you want to
do stemming? Plurals? Or are you talking exact match? Phrases?
multi-word queries? If exact match on individual terms
is all you want, you could hack something together like this:

Index each term into a catch-all field with the source field name appended,
something like
val1|field1 val2|field2
Be sure you don't use an analysis chain that splits on non-letters. Then, at
query time, append |* to each search term, and the terms you get back will
carry the field they came from. Of course you'll have to "do the right thing"
with the results to show them correctly, but that'd work.
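
A rough sketch of that trick in Python, to make the moving parts concrete.
Nothing here talks to Solr; it only shows how the tagged terms and the
query string could be built, and how the field names could be read back.
The function names, the TEXT field name, the lowercasing, and the sample
row are all assumptions for illustration, and whether "|" needs escaping
depends on the query parser in use.

```python
SEP = "|"  # separator from the example above; the analyzer must not split on it


def tag_terms(row):
    """Turn {"field1": "val1 val2"} into ["val1|field1", "val2|field1"]."""
    tagged = []
    for field, value in row.items():
        for term in str(value).split():
            # Lowercasing by hand is an assumption: wildcard terms are
            # often not analyzed at query time, so both sides must agree.
            tagged.append(f"{term.lower()}{SEP}{field}")
    return tagged


def build_query(user_input, catch_all="TEXT"):
    """Append |* to every term so it matches whatever field suffix follows."""
    parts = [f"{catch_all}:{term.lower()}{SEP}*" for term in user_input.split()]
    return " AND ".join(parts)


def fields_for(term, tagged_terms):
    """Read the source field names back from the tagged terms of one hit."""
    prefix = term.lower() + SEP
    return sorted({t[len(prefix):] for t in tagged_terms if t.startswith(prefix)})


row = {"title": "Test report", "author": "Ann Test"}  # hypothetical DB row
tagged = tag_terms(row)
print(tagged)
print(build_query("test"))
print(fields_for("test", tagged))
```

Note that this only supports exact matches on whole terms, which is the
limitation mentioned above.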

But this is really abusing Solr <G>. I wonder if this is an "XY problem", so
can you explain what it is you're trying to do at a higher level and maybe
we can suggest some other approach?

You could also have some kind of hybrid solution that searched with
Solr (not using the trick above) and just returned the PK from Solr,
then go to the DB to fill things out.
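
A minimal sketch of the second half of such a hybrid, assuming Solr has
already returned a list of primary keys. It uses an in-memory SQLite
database only so that the example runs on its own; the table, the column
names, and the solr_pks list are hypothetical. The substring check is much
cruder than Solr's analysis (no stemming, for instance), so it may miss
fields that Solr matched.

```python
import sqlite3


def matching_columns(conn, table, pk_column, pk, text):
    """Return the names of the columns in one row whose value contains text."""
    # table and pk_column are interpolated, so they must come from trusted
    # metadata, never from user input.
    cur = conn.execute(f"SELECT * FROM {table} WHERE {pk_column} = ?", (pk,))
    row = cur.fetchone()
    if row is None:
        return []
    names = [d[0] for d in cur.description]
    needle = text.lower()
    return [
        name
        for name, value in zip(names, row)
        if name != pk_column and value is not None and needle in str(value).lower()
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, notes TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, "Ann", "passed the test"), (2, "Test User", None)],
)

solr_pks = [1, 2]  # hypothetical: primary keys returned by the Solr query
for pk in solr_pks:
    print(pk, matching_columns(conn, "person", "id", pk, "test"))
```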

Best
Erick


Re: Full-search index for the database

Posted by Eugeny Balakhonov <c0...@gmail.com>.
My task is very simple:

I have a big database with a lot of tables and fields. The database has a
dynamic structure and can be extended or changed at any time.
I need a tool that provides full-text search across all fields in all tables
of my database. The input to this tool is some text to search for. The output
is a unique key and the name of the field that contains this text.


Solr is a very good choice, but I have a serious problem with it: all Solr
query parsers (standard, dismax, edismax) require an explicit declaration of
the fields to search. But the list of these fields in my case is very, very
big! And at search time I don't know all the field names in the database.

I think that my task is not unique. According to Google, a lot of people are
trying to solve the same problem with Solr.

Maybe it would be a good idea to add more flexible options for searching in
all indexed fields?


I see the following options:

1. Add wildcards to the qf parameter of the dismax/edismax query parsers.

2. Add the ability to store the source field name in the <copyField>
directive in schema.xml (storeSource below is a proposed attribute, not an
existing one). In this case the user could do the following:

a) Create a field for default search:
<field name="TEXT" type="text_ALL" indexed="true" stored="true"
multiValued="true"/>
...
<defaultSearchField>TEXT</defaultSearchField>

b) Copy all fields to the default search field:
<copyField source="*" dest="TEXT" storeSource="true" />

c) In the query response the user would receive the needed source field name:

<lst name="highlighting">
 <lst name="......">
  <arr name="TEXT">
   <str source="SOURCE_FIELD_NAME">foo foo foo <em>test</em> foo foo</str>
  </arr>
 </lst>
</lst>





-- 
Best regards,
Eugeny Balakhonov

Re: Full-search index for the database

Posted by Eugeny Balakhonov <c0...@gmail.com>.
Hello,

Thanks for the answer!

I have created separate fields in my Solr schema for each field in the
database (more than 500!). How do I ask the parser to search across all these
fields? By default, the Solr schema should contain an explicit declaration of
the default search field, like the following:

<defaultSearchField>TEXT</defaultSearchField>

I tried to use the following search query:

.....?q=*:search text&hl=on&defType=edismax

In this case the search goes only against the default search field.

I can't concatenate all 500 database field names into one big search
expression.
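
One possible way around the manual concatenation is to generate the
request on the client from a list of field names, for example from the
same database metadata that was used to generate the schema. The sketch
below only builds the parameters and sends nothing. The field names and
the build_params helper are hypothetical, and it assumes a Solr version
in which hl.fl accepts "*" as a field glob. With several hundred fields
the encoded string gets long, so sending it as a POST body may be more
practical than putting it in the URL.

```python
from urllib.parse import urlencode


def build_params(user_text, field_names, boosts=None):
    """Build edismax request parameters that search every listed field."""
    boosts = boosts or {}
    qf = " ".join(
        f"{name}^{boosts[name]}" if name in boosts else name
        for name in field_names
    )
    return {
        "q": user_text,
        "defType": "edismax",
        "qf": qf,       # all fields to search, space separated
        "hl": "on",
        "hl.fl": "*",   # ask for highlighting on every highlightable field
    }


# Hypothetical field list; in practice it would be generated, not typed.
fields = ["person_name", "person_notes", "order_comment"]
params = build_params("search text", fields, boosts={"person_name": 2})
print(params["qf"])
print(urlencode(params))
```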





-- 
Best regards,
Eugeny Balakhonov

Re: Full-search index for the database

Posted by Jamie Johnson <je...@gmail.com>.
You should create a separate field in your Solr schema for each field
in your database that you want recognized separately. You can use a
query parser like edismax to do a weighted query across all of your
fields and then provide highlighting on the specific field that
matched.
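
With one Solr field per database field, the highlighting section of the
response already tells you where the match was: each document entry is
keyed by field name. A small sketch of reading that back, assuming the
response was requested as JSON and its "highlighting" part has been parsed
into a dict. The sample data and the matched_fields helper are made up for
illustration.

```python
def matched_fields(highlighting):
    """Map each document key to the fields that produced highlight snippets."""
    result = {}
    for doc_key, per_field in highlighting.items():
        fields = sorted(name for name, snippets in per_field.items() if snippets)
        if fields:
            result[doc_key] = fields
    return result


# Hypothetical parsed "highlighting" section of a response.
sample = {
    "doc-1": {"title": ["Any text any text <em>Test</em>"]},
    "doc-2": {"notes": ["a <em>test</em> note"], "author": ["<em>Test</em> User"]},
    "doc-3": {},  # returned by the query but nothing was highlighted
}

for doc_key, fields in matched_fields(sample).items():
    print(doc_key, fields)
```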
