Posted to solr-user@lucene.apache.org by Francisco Andrés Fernández <fr...@gmail.com> on 2015/09/10 15:58:03 UTC

Detect term occurrences

Hi all, I'm new to Solr.
I want to detect all occurrences of the terms in a thesaurus in one or
more documents.
What's the best strategy for doing this?
Doing a query for each term doesn't seem to be the best way.
Many thanks,

Francisco

Re: Detect term occurrences

Posted by Walter Underwood <wu...@wunderwood.org>.

Doing a query for each term should work well. Solr is fast for queries. Write a script.

I assume you only need to do this once. Running all the queries will probably take less time than figuring out a different approach.
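
A minimal sketch of such a script, in Python. The Solr URL, the collection name `leaflets` and the field name `text` are assumptions for illustration; nothing in this thread specifies them. It builds one phrase query per thesaurus term with `rows=0`, so only the hit count (`numFound`) comes back. The HTTP call is passed in as a function, so the sketch can be tried without a running Solr.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint: adjust host, port and collection name to your setup.
SOLR_SELECT = "http://localhost:8983/solr/leaflets/select"


def term_query_url(term, field="text", base=SOLR_SELECT):
    """Build a select URL that counts the documents containing one term."""
    # Quote the term so that multi-word entries are matched as a phrase.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    params = {"q": f'{field}:"{escaped}"', "rows": 0, "wt": "json"}
    return base + "?" + urlencode(params)


def fetch_json(url):
    """Fetch a URL and decode the JSON body (needs a reachable Solr)."""
    with urlopen(url) as response:
        return json.load(response)


def count_terms(terms, fetch=fetch_json):
    """Return {term: number of matching documents}, one query per term."""
    counts = {}
    for term in terms:
        result = fetch(term_query_url(term))
        counts[term] = result["response"]["numFound"]
    return counts
```

With about 10^5 terms this means about 10^5 small requests. To see which documents match rather than how many, raise `rows` and ask for the id field with `fl`.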

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)



RE: Detect term occurrences

Posted by Markus Jelsma <ma...@openindex.io>.

If you are interested in just the number of occurrences of an indexed term, the TermsComponent will give that answer.
Markus
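
A rough sketch of how that could look from Python. It assumes a `/terms` request handler is configured with the component enabled, a field called `text`, and the flat JSON layout (`["term", count, "term", count, ...]`); the URL and names are placeholders. The counts are document frequencies of indexed (analyzed, usually single-token) terms across the whole index, not positions inside one document.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; assumes a /terms request handler is configured.
SOLR_TERMS = "http://localhost:8983/solr/leaflets/terms"


def terms_url(field="text", limit=-1, base=SOLR_TERMS):
    """Build a TermsComponent request for the indexed terms of one field."""
    params = {
        "terms.fl": field,     # field whose indexed terms are listed
        "terms.limit": limit,  # a negative limit asks for all terms
        "wt": "json",
    }
    return base + "?" + urlencode(params)


def parse_flat_terms(response, field="text"):
    """Turn ["term", count, "term", count, ...] into {term: count}."""
    flat = response["terms"][field]
    return dict(zip(flat[0::2], flat[1::2]))


def thesaurus_hits(response, thesaurus, field="text"):
    """Keep only the indexed terms that also occur in the thesaurus."""
    indexed = parse_flat_terms(response, field)
    return {term: n for term, n in indexed.items() if term in thesaurus}
```

For a small index, one request for all terms followed by a local intersection with the thesaurus may be enough; multi-word thesaurus entries would not be found this way.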
 

Re: Detect term occurrences

Posted by Erick Erickson <er...@gmail.com>.

_Assuming_ this isn't a high-throughput case _and_ the leaflet text isn't too big...

Index the thesaurus and fire all the terms of the query in a big OR
clause against the index as a _query_. Perhaps turn highlighting on
and highlight the entire leaflet text.

Note, this is just "off the top of my head", I really haven't thought
it through too far and a lot depends on how many leaflets you have to
process and how often....
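
One possible reading of this, sketched in Python: with the leaflets indexed, send the dictionary terms as quoted phrases in OR clauses and let the highlighter mark the matches. The field name `text` and the batch size are assumptions. The batching is there because Solr's default `maxBooleanClauses` is 1024, so 10^5 terms will not fit into a single query unless that limit is raised.

```python
from urllib.parse import urlencode


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def or_query(terms, field="text"):
    """Join terms into one field-scoped OR clause of quoted phrases."""
    quoted = ['"' + t.replace("\\", "\\\\").replace('"', '\\"') + '"' for t in terms]
    return f"{field}:(" + " OR ".join(quoted) + ")"


def highlight_params(terms, field="text", rows=10):
    """Query parameters for one batch, with highlighting switched on."""
    return {
        "q": or_query(terms, field),
        "rows": rows,
        "fl": "id",
        "hl": "true",
        "hl.fl": field,      # field to highlight (it must be stored)
        "hl.fragsize": 0,    # 0 asks for the whole field, not snippets
        "wt": "json",
    }


def batched_requests(terms, batch_size=1000, field="text"):
    """Encode one query string per batch of terms."""
    return [urlencode(highlight_params(batch, field))
            for batch in chunks(list(terms), batch_size)]
```

Each encoded string is long, so it is better sent as a POST body than as a URL. For long leaflets, `hl.maxAnalyzedChars` may also need raising so the highlighter looks at the whole text.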

Best,
Erick


Re: Detect term occurrences

Posted by Francisco Andrés Fernández <fr...@gmail.com>.

Thanks again.
For the moment I think it won't be a problem. I have ~500 documents.
Regards,

Francisco


Re: Detect term occurrences

Posted by simon <mt...@gmail.com>.

+1 on Sujit's recommendation: we have a similar use case (detecting drug
names / disease entities / MeSH terms) and have been using the
SolrTextTagger with great success.

We run a separate Solr instance as a tagging service and add the detected
tags as metadata fields to a document before it is ingested into our main
Solr collection.
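
As a rough sketch of that flow in Python: the function `tag_text` stands in for the call to the tagging service, and the field names `text` and `tags_ss` are hypothetical. The toy tagger is only there to make the example self-contained; it is not how the SolrTextTagger works.

```python
def enrich_document(doc, tag_text, text_field="text", tags_field="tags_ss"):
    """Return a copy of `doc` with the detected tags added as a field."""
    tags = tag_text(doc.get(text_field, ""))
    enriched = dict(doc)
    # Sort and de-duplicate so repeated mentions yield one tag each.
    enriched[tags_field] = sorted(set(tags))
    return enriched


def enrich_all(docs, tag_text):
    """Tag every document before it is sent to the main collection."""
    return [enrich_document(doc, tag_text) for doc in docs]


def toy_tagger(text):
    """Stand-in for the tagging service: naive substring lookup."""
    known = ("aspirin", "diabetes")
    return [term for term in known if term in text.lower()]
```

The enriched documents can then be indexed as usual, and the tag field is available for faceting or filtering.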

How many documents/product leaflets do you have? The tagger is very fast
at the Solr level but I'm seeing quite a bit of HTTP overhead.

best

-Simon


Re: Detect term occurrences

Posted by Francisco Andrés Fernández <fr...@gmail.com>.

Thanks!


Re: Detect term occurrences

Posted by Sujit Pal <su...@comcast.net>.

Hi Francisco,

>> I have many drug product leaflets, each corresponding to one product. On the
>> other hand, we have a medical dictionary with about 10^5 terms.
>> I want to detect all the occurrences of those terms in any leaflet
>> document.

Take a look at SolrTextTagger for this use case:
https://github.com/OpenSextant/SolrTextTagger

10^5 entries is not that large; I am using it for much larger dictionaries
at the moment with very good results.

It's a project built (at least originally) by David Smiley, who is also
quite active in this group.
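
To illustrate what dictionary tagging produces, here is a naive in-memory longest-match tagger in Python. It is only a sketch of the idea and not the SolrTextTagger's implementation or API; the function names and the simple `\w+` tokenization are assumptions.

```python
import re

TOKEN = re.compile(r"\w+")


def build_dictionary(terms):
    """Map each term's lowercased token tuple to the lowercased term."""
    return {tuple(TOKEN.findall(term.lower())): term.lower() for term in terms}


def tag(text, dictionary):
    """Return (start, end, term) for the longest match at each position."""
    tokens = [(m.group().lower(), m.start(), m.end()) for m in TOKEN.finditer(text)]
    longest = max((len(key) for key in dictionary), default=0)
    tags = []
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest candidate first so "heart attack" beats "heart".
        for n in range(min(longest, len(tokens) - i), 0, -1):
            key = tuple(tok for tok, _, _ in tokens[i:i + n])
            if key in dictionary:
                tags.append((tokens[i][1], tokens[i + n - 1][2], dictionary[key]))
                match_len = n
                break
        # Skip past a match so that tags do not overlap.
        i += match_len or 1
    return tags
```

The character offsets make it possible to mark up the original leaflet text, which per-term counting alone does not give you.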

-sujit



Re: Detect term occurrences

Posted by Alexandre Rafalovitch <ar...@gmail.com>.

Assuming the medical dictionary is constant, I would do a copyField of
text into a separate field and have that separate field use:
http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/miscellaneous/KeepWordFilterFactory.html
with words coming from the dictionary (normalized).

That way the new field will contain ONLY the dictionary terms found in the
text. Then you can facet on that field or do anything else with it. Even
plain search against it should be a lot more efficient.

The main issue would be a gigantic filter, which may mean speed and/or
memory issues. Solr has some ways to deal with such large set matches
by compiling them into a state machine (used for auto-complete), but I
don't know if that's exposed for your purpose.

But it could make a fun custom filter to build.
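
To make the effect of the keep-word field concrete, here is a plain-Python emulation of a tokenize, lowercase and keep-word chain, plus a facet-like count. It is not Solr configuration, and the tokenization and normalization are simplifying assumptions. Note that a keep-word filter looks at one token at a time, so multi-word dictionary entries would need the tokens combined first.

```python
import re
from collections import Counter

TOKEN = re.compile(r"\w+")


def normalize(word):
    """Normalization applied to both dictionary words and text tokens."""
    return word.lower()


def keep_words(text, dictionary_words):
    """Return the tokens of `text` that are in the dictionary, in order."""
    keep = {normalize(w) for w in dictionary_words}
    return [tok for tok in map(normalize, TOKEN.findall(text)) if tok in keep]


def facet_counts(texts, dictionary_words):
    """Count, per kept term, how many texts contain it (like a field facet)."""
    counts = Counter()
    for text in texts:
        counts.update(set(keep_words(text, dictionary_words)))
    return dict(counts)
```

In Solr itself the dictionary would live in the filter's words file and the counting would be done by faceting on the copied field.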

Regards,
   Alex.
----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


Re: Detect term occurrences

Posted by Francisco Andrés Fernández <fr...@gmail.com>.

Many thanks, pals.
I will try some of these approaches (and come back with new questions)
;)
Best regards,

Francisco


Re: Detect term occurrences

Posted by Upayavira <uv...@odoko.co.uk>.

It sounds to me like you want to *filter* your document to include only
terms within that medical dictionary, or to have a keyword field
based upon those of your 100k terms that appear in that doc.

Synonyms are your saviour, if that's the case. Create a synonyms list
for your terms; each entry can be a one-to-one mapping, so:

diabetes => diabetes

is quite okay. Then, in your index-time analysis chain, have a
SynonymFilterFactory followed by a TypeTokenFilterFactory configured to
only allow SYNONYM tokens through.

Then, in your index, you will have a field that contains all the terms
from your 100k that are included in that particular document.
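
Generating the one-to-one synonyms file from the dictionary is easy to script. A small Python sketch follows; the file name `synonyms.txt`, the lowercasing (which assumes a lowercase filter earlier in the analysis chain) and the decision to skip entries containing a comma or `=>` are assumptions made to keep it simple. The filter configuration itself lives in the schema and is not shown here.

```python
def synonym_lines(terms):
    """Return identity mappings ("term => term"), de-duplicated and sorted."""
    seen = set()
    for raw in terms:
        term = " ".join(raw.lower().split())  # lowercase, collapse whitespace
        # Skip blanks and entries that clash with the file syntax.
        if not term or term.startswith("#") or "," in term or "=>" in term:
            continue
        seen.add(term)
    return [f"{term} => {term}" for term in sorted(seen)]


def write_synonyms(terms, path="synonyms.txt"):
    """Write the mappings to a file, one per line; return how many."""
    lines = synonym_lines(terms)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    return len(lines)
```

Entries that were skipped would need escaping or manual handling before they can go into the file.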

Does that get it?

Upayavira


Re: Detect term occurrences

Posted by Francisco Andrés Fernández <fr...@gmail.com>.

Yes.
I have many drug product leaflets, each corresponding to one product. On the
other hand, we have a medical dictionary with about 10^5 terms.
I want to detect all the occurrences of those terms in any leaflet
document.
Could you give me a clue about the best way to do it?
Perhaps the best way is (as Walter suggests) to run all the queries every
time, as needed.
Regards,

Francisco


Re: Detect term occurrences

Posted by Alexandre Rafalovitch <ar...@gmail.com>.

Can you tell us a bit more about the business case, not the current
technical one? It is entirely possible that Solr can solve the
higher-level problem out of the box without you doing manual term
comparisons, in which case your problem scope is not quite right.

Regards,
   Alex.
----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/
