
Posted to java-user@lucene.apache.org by Celso Fontes <ce...@gmail.com> on 2010/11/15 20:20:58 UTC

What is the best Analyzer and Parser for this type of question?

I am using this code, with Snowball and TopDocScore:
the code: http://pastebin.com/3X3gbpXE

Example of Question:
- What is the role of PrnP in mad cow disease?

I am searching 11,638 documents and this question returns 10,410 of them
(very low precision).
How can I optimize this?

Thanks,
Celso.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: What is the best Analyzer and Parser for this type of question?

Posted by Celso Fontes <ce...@gmail.com>.

Hi Erick,

My queries come from the TREC 2006 Genomics track topics.

Which operator do you recommend?

Thanks,
Celso


2010/11/15 Erick Erickson <er...@gmail.com>

> First question: What's the default operator? Out of
> the box, it's OR. See QueryParser.setDefaultOperator...
>
> Second, how are you forming your query? Just running
> it through the query parser? Query.toString() may be your friend.
>
> Best
> Erick

Re: What is the best Analyzer and Parser for this type of question?

Posted by Erick Erickson <er...@gmail.com>.

First question: What's the default operator? Out of
the box, it's OR. See QueryParser.setDefaultOperator...

Second, how are you forming your query? Just running
it through the query parser? Query.toString() may be your friend.

Best
Erick
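Erick's point about the default operator can be illustrated without Lucene. The stdlib-only Java sketch below is not Lucene code: the four-document corpus, the toy tokenizer and the class name OperatorSketch are hypothetical. It counts matches for the raw question under OR semantics (any shared term is enough), under AND semantics (every term is required), and under AND over the content terms only. In Lucene itself the switch is QueryParser.setDefaultOperator, and printing the parsed Query with toString() shows which clauses were actually produced.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class OperatorSketch {

    // Hypothetical toy corpus standing in for the 11,638 indexed documents.
    public static final List<String> DOCS = Arrays.asList(
        "What is the structure of the BRCA1 protein?",
        "PrnP expression in mad cow disease",
        "The role of insulin in diabetes",
        "Mad cow disease and the PrnP gene: the role of prion protein");

    // Toy analyzer: lowercase, then split on anything that is not a letter or digit.
    public static Set<String> tokens(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // OR semantics: a document matches if it shares at least one term with the query.
    public static long countOr(List<String> docs, Set<String> query) {
        return docs.stream()
            .filter(d -> {
                Set<String> docTerms = tokens(d);
                return query.stream().anyMatch(docTerms::contains);
            })
            .count();
    }

    // AND semantics: a document matches only if it contains every query term.
    public static long countAnd(List<String> docs, Set<String> query) {
        return docs.stream().filter(d -> tokens(d).containsAll(query)).count();
    }

    public static void main(String[] args) {
        Set<String> raw = tokens("What is the role of PrnP in mad cow disease?");
        Set<String> content = tokens("PrnP mad cow disease");
        System.out.println("OR matches: " + countOr(DOCS, raw));                // 4
        System.out.println("AND matches: " + countAnd(DOCS, raw));              // 0
        System.out.println("AND on content terms: " + countAnd(DOCS, content)); // 2
    }
}
```

On the raw question neither extreme works well in this toy case: OR matches any document that shares a word such as "the", while AND also demands the question words. Dropping the question words first is what gives AND a sensible result here.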


Re: What is the best Analyzer and Parser for this type of question?

Posted by Lance Norskog <go...@gmail.com>.

First, to understand what your query looks like, go to the Solr admin page
admin/analysis.jsp. It shows what happens to your query text as it is
analyzed. Then run the query with debugQuery=true. This adds some
complex junk to the end of the XML response that describes, in painful
detail, exactly how each document was scored.

After all that, you might have a problem with PrnP and similar terms
getting chopped up in weird ways. I don't know how people handle this in
chemistry/bio search.

Lance
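To make Lance's concern about PrnP concrete, here is a stdlib-only Java sketch comparing two hypothetical ways an analyzer might treat a gene symbol: keeping the token whole and lowercasing it, or splitting it where a lowercase letter meets an uppercase one or letters meet digits. The class and method names are made up for illustration; which behaviour your own analyzer has depends on its tokenizer and filters, so check it rather than assume it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TokenSplitSketch {

    // Strategy 1: keep the token whole, only lowercase it ("PrnP" -> "prnp").
    public static String lowercaseOnly(String token) {
        return token.toLowerCase(Locale.ROOT);
    }

    // Strategy 2 (hypothetical): split an alphanumeric token where a lowercase letter is
    // followed by an uppercase one, or where letters and digits meet, then lowercase each part.
    public static List<String> splitOnCaseChange(String token) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (i > 0) {
                char prev = token.charAt(i - 1);
                boolean caseChange = Character.isLowerCase(prev) && Character.isUpperCase(c);
                boolean letterDigit = Character.isDigit(prev) != Character.isDigit(c);
                if (caseChange || letterDigit) {
                    parts.add(current.toString().toLowerCase(Locale.ROOT));
                    current.setLength(0);
                }
            }
            current.append(c);
        }
        if (current.length() > 0) {
            parts.add(current.toString().toLowerCase(Locale.ROOT));
        }
        return parts;
    }

    public static void main(String[] args) {
        for (String t : new String[] {"PrnP", "BRCA1"}) {
            System.out.println(t + " -> " + lowercaseOnly(t) + " | " + splitOnCaseChange(t));
        }
        // PrnP -> prnp | [prn, p]
        // BRCA1 -> brca1 | [brca, 1]
    }
}
```

With the splitting strategy the query term PrnP becomes prn and p, and a lone p can match many unrelated documents, which hurts precision. Whichever strategy you pick, the same analysis has to run at index time and at query time.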


Re: What is the best Analyzer and Parser for this type of question?

Posted by Ahmet Arslan <io...@yahoo.com>.

> Example of Question:
> - What is the role of PrnP in mad cow disease?

First, do not query with the raw questions. Formulate the queries manually:
remove 'what', 'is', 'the', 'of', '?', etc.
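A minimal sketch of that first step, dropping question words before building the query. The stopword set and the class name QuestionToTerms are assumptions for illustration, and a real system would use a fuller list; splitting on non-alphanumeric characters is what removes the '?'.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class QuestionToTerms {

    // Hypothetical list of question words and stopwords; a real system would use a fuller list.
    static final Set<String> STOP = new HashSet<>(Arrays.asList(
        "what", "which", "how", "is", "are", "does", "do", "the", "a", "an", "of", "in"));

    // Split on anything that is not a letter or digit (this drops the '?'),
    // skip stopwords case-insensitively, and keep content words in their original case.
    public static List<String> contentTerms(String question) {
        List<String> terms = new ArrayList<>();
        for (String t : question.split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty() && !STOP.contains(t.toLowerCase(Locale.ROOT))) {
                terms.add(t);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(contentTerms("What is the role of PrnP in mad cow disease?"));
        // [role, PrnP, mad, cow, disease]
    }
}
```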

For example, I would convert this question into:

"mad cow"^5 "cow disease"^3 "mad cow disease"^15 "role PrnP"~5^2 "role mad cow disease"~45 mad^0.1 role^0.5 cow disease PrnP^10
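The query above uses classic QueryParser syntax: quotes for a phrase, ~N after a phrase for slop (how far apart the words may be), and ^B for a boost. The hypothetical WeightedQueryBuilder below only renders a list of weighted clauses into that syntax and reproduces the query string; choosing the phrases, slops and boosts remains the manual part, and the values used here are the ones from the example.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

public class WeightedQueryBuilder {

    private final List<String> clauses = new ArrayList<>();

    // Renders a boost in query syntax, e.g. 5.0 -> "^5", 0.1 -> "^0.1"; a boost of 1 is omitted.
    private static String boost(double b) {
        if (b == 1.0) {
            return "";
        }
        return "^" + BigDecimal.valueOf(b).stripTrailingZeros().toPlainString();
    }

    // Phrase clause: "w1 w2"~slop^boost (slop 0 means an exact phrase, so no ~ is written).
    public WeightedQueryBuilder phrase(String words, int slop, double b) {
        clauses.add("\"" + words + "\"" + (slop > 0 ? "~" + slop : "") + boost(b));
        return this;
    }

    // Single-term clause: term^boost.
    public WeightedQueryBuilder term(String t, double b) {
        clauses.add(t + boost(b));
        return this;
    }

    // Clauses are joined with spaces, so the parser's default operator decides how they combine.
    public String build() {
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        String q = new WeightedQueryBuilder()
            .phrase("mad cow", 0, 5)
            .phrase("cow disease", 0, 3)
            .phrase("mad cow disease", 0, 15)
            .phrase("role PrnP", 5, 2)
            .phrase("role mad cow disease", 45, 1)
            .term("mad", 0.1)
            .term("role", 0.5)
            .term("cow", 1)
            .term("disease", 1)
            .term("PrnP", 10)
            .build();
        System.out.println(q);
    }
}
```

Because none of the clauses carry + or -, the parser combines them with the default operator; with OR every clause is optional and mainly shapes the ranking.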

> I am running in 11.638 documents and the result is 10410
> docs for this question (lowwwwww precision)

Use OR as the default operator, and collect and evaluate only the top 1000 documents.
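With OR, the total hit count matters less than the ranking at the top. Lucene's search methods that take a number of hits already keep only the best N for you; the stdlib sketch below (the class TopKSketch and the toy scores are hypothetical) shows the usual technique behind that, a min-heap of size k whose head is the weakest hit kept so far.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopKSketch {

    // A scored hit: document id plus relevance score.
    public static final class Hit {
        public final int doc;
        public final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Orders hits from worst to best: lower score first; on equal scores, the higher doc id counts as worse.
    private static final Comparator<Hit> WORST_FIRST =
        Comparator.comparingDouble((Hit h) -> h.score).thenComparingInt(h -> -h.doc);

    // Returns the k best hits, best first, keeping at most k hits in memory at any time.
    public static List<Hit> topK(float[] scores, int k) {
        List<Hit> result = new ArrayList<>();
        if (k <= 0) {
            return result;
        }
        PriorityQueue<Hit> heap = new PriorityQueue<>(WORST_FIRST); // weakest kept hit at the head
        for (int doc = 0; doc < scores.length; doc++) {
            if (scores[doc] <= 0f) {
                continue; // in this toy setup a score of 0 means "did not match"
            }
            Hit hit = new Hit(doc, scores[doc]);
            if (heap.size() < k) {
                heap.add(hit);
            } else if (WORST_FIRST.compare(hit, heap.peek()) > 0) {
                heap.poll(); // evict the current weakest hit
                heap.add(hit);
            }
        }
        result.addAll(heap);
        result.sort(WORST_FIRST.reversed()); // best first
        return result;
    }

    public static void main(String[] args) {
        float[] scores = {0.2f, 0f, 0.9f, 0.5f, 0.9f, 0.1f};
        for (Hit h : topK(scores, 3)) {
            System.out.println("doc=" + h.doc + " score=" + h.score);
        }
        // doc=2 score=0.9
        // doc=4 score=0.9
        // doc=3 score=0.5
    }
}
```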

And instead of Porter you can try KStem.
http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi

Try the different length normalizations described in the paper below; the authors' Lucene query example (SpanNear) may also give you ideas:
http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf
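For context on the length-normalization suggestion: Lucene's default similarity of that era used a length norm of 1/sqrt(number of terms in the field), encoded into a single byte. The sketch below compares that formula with pivoted length normalization, one well-known alternative; it is not necessarily the scheme the linked paper uses, and avgLen, slope and the class name LengthNormSketch are assumptions for illustration.

```java
import java.util.Locale;

public class LengthNormSketch {

    // Classic default-similarity style length norm: 1 / sqrt(number of terms in the field).
    // (Lucene also stored it in one byte, so real values were coarser than this double.)
    public static double sqrtNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Pivoted length normalization: 1 / ((1 - slope) + slope * len / avgLen).
    // A field of average length gets 1.0, and very short fields are capped at 1 / (1 - slope).
    public static double pivotedNorm(int numTerms, double avgLen, double slope) {
        return 1.0 / ((1.0 - slope) + slope * numTerms / avgLen);
    }

    public static void main(String[] args) {
        double avgLen = 100.0; // assumed average field length, in terms
        double slope = 0.25;   // assumed pivot slope
        for (int len : new int[] {10, 100, 400}) {
            System.out.printf(Locale.ROOT, "len=%d sqrt=%.3f pivoted=%.3f%n",
                len, sqrtNorm(len), pivotedNorm(len, avgLen, slope));
        }
    }
}
```

With these assumed values, a 10-term field gets about 3.2 times the norm of an average-length field under 1/sqrt(len), but only about 1.3 times under the pivoted form, so short documents are favoured much less.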