Posted to java-user@lucene.apache.org by Grant Ingersoll <gs...@syr.edu> on 2005/12/09 15:31:01 UTC

ApacheCon next week

Is anyone planning on going to ApacheCon next week?  I will be giving a
talk on Lucene on Monday afternoon at 3pm, covering term vectors, span
queries, and some case studies from our work at CNLP with Lucene.

The abstract for my talk can be found at
http://www.apachecon.com/2005/US/html/sessions.html/e=MjAwNS9VUw#1400

-Grant

-- 
------------------------------------------------------------------- 
Grant Ingersoll 
Sr. Software Engineer 
Center for Natural Language Processing 
Syracuse University 
School of Information Studies 
337 Hinds Hall 
Syracuse, NY 13244 

http://www.cnlp.org 
Voice:  315-443-5484 
Fax: 315-443-6886 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: ApacheCon next week

Posted by Dave Kor <da...@gmail.com>.
On 12/27/05, Koji Sekiguchi <ko...@m4.dion.ne.jp> wrote:
> Hi Grant,
>
> > You stole my thunder!  :-)  Was going to post the URL after doing the
> > actual talk, but that's all right.  I will post a few changes I have
> > made on the plane tonight or tomorrow to the website below.
> >
> > Let me know if you have any questions...
>
> I'm curious about "Candidate Identification for QA".
> On slide 19 of your PPT, you write:
>
> "Can be as short as a word or as large as multiple documents, based on
> system goals"
>
> Can you explain it in more detail?

In Question Answering, for questions that expect a single answer (e.g.,
Which country won the most gold medals in the 1996 Atlanta Olympics?),
we typically just need to find a single document that contains the
answer we are seeking.

However, other types of questions require extracting answers
from several different documents, because a single document might not
contain all the relevant information. A simple example would be a
question with more than one valid answer (e.g., Who are the presidents
of the United States?). The QA system will have to find answers in
different documents, since there might not be a single document that
contains all the answers. Btw, in QA terminology we label this type of
question a List question.

Another example is Definition questions, where we would like to
provide all interesting facets of a particular topic (e.g., Tell me all
there is to know about the Grand Canyon). Again, each document in a set
might describe only a single aspect of the Grand Canyon. To build a
complete picture, we may need to sample most documents that mention
the Grand Canyon.
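
The distinction above can be sketched roughly in code. This is only an
illustration in Python, not what QAService.java does: the "factoid" label
for single-answer questions, the function candidate_documents, and the
list_cutoff value are all assumptions made for the example. A real system
would likely look beyond the top hit even for single-answer questions.

```python
from enum import Enum


class QuestionType(Enum):
    FACTOID = "factoid"        # expects a single answer (assumed label)
    LIST = "list"              # several valid answers
    DEFINITION = "definition"  # all interesting facets of a topic


def candidate_documents(question_type, ranked_hits, list_cutoff=10):
    """Pick which retrieved documents to mine for answer candidates.

    ranked_hits is a list of document ids, best match first.
    list_cutoff is an illustrative limit for List questions.
    """
    if not ranked_hits:
        return []
    if question_type is QuestionType.FACTOID:
        # One document containing the answer is enough.
        return ranked_hits[:1]
    if question_type is QuestionType.LIST:
        # Answers are spread over several documents.
        return ranked_hits[:list_cutoff]
    # Definition: sample most documents that mention the topic.
    return list(ranked_hits)


hits = ["doc7", "doc2", "doc9", "doc4"]
print(candidate_documents(QuestionType.FACTOID, hits))
print(candidate_documents(QuestionType.LIST, hits, list_cutoff=3))
print(candidate_documents(QuestionType.DEFINITION, hits))
```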

I hope this helps.


Regards,
Dave Kor.



Re: ApacheCon next week

Posted by Grant Ingersoll <gs...@syr.edu>.
We use boosts that are calculated from the term frequencies and the
standard alpha, beta, and gamma multipliers from Rocchio.  Terms from
non-relevant documents decrement the frequency.  If a term's weight is
<= 0, we remove the term (someone has posted a contribution for dealing
with negative weights; we just haven't adopted it yet).  I am sure there
is more you could do; we just haven't investigated much.  We also give
different weights to things we think are more important based on our
NLP analysis.
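
For readers who want something concrete, here is a minimal sketch of the
weighting described above, in Python. It is not the CNLP code: the
function names, the alpha/beta/gamma defaults, and the use of raw term
counts averaged per document set are assumptions for illustration. The
output uses the term^boost form of the Lucene query parser syntax.

```python
from collections import Counter


def rocchio_boosts(query_terms, relevant_docs, nonrelevant_docs,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Return {term: boost} from a Rocchio update on raw term frequencies.

    query_terms: list of terms in the original query.
    relevant_docs / nonrelevant_docs: lists of token lists.
    The alpha/beta/gamma defaults are illustrative, not tuned values.
    """
    weights = Counter()
    for term, tf in Counter(query_terms).items():
        weights[term] += alpha * tf
    for docs, factor in ((relevant_docs, beta), (nonrelevant_docs, -gamma)):
        if not docs:
            continue
        for doc in docs:
            for term, tf in Counter(doc).items():
                # Average each document set's contribution (its centroid).
                weights[term] += factor * tf / len(docs)
    # Drop terms whose weight is not positive.
    return {t: w for t, w in weights.items() if w > 0}


def to_query_string(boosts):
    """Format as term^boost pairs, highest boost first."""
    ordered = sorted(boosts.items(), key=lambda kv: (-kv[1], kv[0]))
    return " ".join(f"{term}^{weight:.2f}" for term, weight in ordered)


query = ["grand", "canyon"]
relevant = [["grand", "canyon", "hiking", "trail"],
            ["canyon", "river", "hiking"]]
nonrelevant = [["grand", "piano", "concert"]]
print(to_query_string(rocchio_boosts(query, relevant, nonrelevant)))
```

Terms that occur only in the non-relevant documents ("piano", "concert")
end up with negative weights and are dropped rather than negated.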


Ian Soboroff wrote:

>Grant Ingersoll <gs...@syr.edu> writes:
>
>>You stole my thunder!  :-)  Was going to post the URL after doing the
>>actual talk, but that's all right.  I will post a few changes I have
>>made on the plane tonight or tomorrow to the website below.
>>
>>Let me know if you have any questions...
>
>I have one.  I've been thinking about the problem with doing relevance
>feedback in Lucene, and I appreciate seeing your code on getting the
>top terms from a single document.
>
>However, the real problem for RF and pseudo-RF techniques is forming
>the query.  You can obviously add terms to a query, but how are you
>handling the weighting?  With boosts, or something more sophisticated?
>
>Ian


Re: ApacheCon next week

Posted by Ian Soboroff <ia...@nist.gov>.
Grant Ingersoll <gs...@syr.edu> writes:

> You stole my thunder!  :-)  Was going to post the URL after doing the
> actual talk, but that's all right.  I will post a few changes I have
> made on the plane tonight or tomorrow to the website below.
>
> Let me know if you have any questions...

I have one.  I've been thinking about the problem with doing relevance
feedback in Lucene, and I appreciate seeing your code on getting the
top terms from a single document.

However, the real problem for RF and pseudo-RF techniques is forming
the query.  You can obviously add terms to a query, but how are you
handling the weighting?  With boosts, or something more sophisticated?

Ian




Impact of Term Vectors (was ApacheCon next week)

Posted by Dan Climan <dc...@keepmedia.com>.
Good question. I was wondering about the impact of adding term vectors with
the various options. For example, does adding term vectors with both positions
and offsets have a significant impact? Which current parts of Lucene (including
contributions) take advantage of term vectors being present? I know that
the Highlighter class can make use of them if present.

Dan

-----Original Message-----
From: Jeff Rodenburg [mailto:jeff.rodenburg@gmail.com] 
Sent: Monday, December 12, 2005 9:08 PM
To: java-user@lucene.apache.org
Subject: Re: ApacheCon next week

Well done, Grant.  Very informative.

Question on Term Vectors: with their inclusion in an index, have you noticed
any degradation in performance, either from a search efficiency or
maintenance point of view?  Given the power of term vectors, if the perf
impact is negligible, I'm curious about the reasons why one would NOT include
term vectors in any and every index...

cheers,
j






Re: ApacheCon next week

Posted by Grant Ingersoll <gs...@syr.edu>.
Thanks, Jeff.

I have only done basic testing, so I'm not completely sure about the
answer to your question.  However, one trade-off is definitely disk
space.  As far as searching goes, I don't think there should be any
impact, because you get the vector separately from a search, via the
IndexReader.  Perhaps the compound file format might affect it, as the
file will be larger, but that is only speculation.  I haven't noticed
any performance impact other than the increase in disk space needed.
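
To give a feel for the disk space trade-off, here is a toy model in
Python of what a per-document term vector holds with and without
positions and offsets. It is an illustrative sketch only: the function
names and the simple count of stored integers are assumptions, and this
is not Lucene's actual on-disk format.

```python
import re
from collections import defaultdict


def build_term_vector(text, with_positions=False, with_offsets=False):
    """Build a toy per-document term vector: term -> frequency and extras."""
    vector = defaultdict(lambda: {"freq": 0, "positions": [], "offsets": []})
    for position, match in enumerate(re.finditer(r"\w+", text.lower())):
        entry = vector[match.group()]
        entry["freq"] += 1
        if with_positions:
            entry["positions"].append(position)
        if with_offsets:
            # (start, end) character offsets into the original text.
            entry["offsets"].append((match.start(), match.end()))
    return dict(vector)


def stored_numbers(vector):
    """Count the integers kept, as a rough stand-in for storage cost."""
    total = 0
    for entry in vector.values():
        total += 1                          # the frequency
        total += len(entry["positions"])    # one int per occurrence
        total += 2 * len(entry["offsets"])  # start and end per occurrence
    return total


text = "the grand canyon is grand"
for pos, off in ((False, False), (True, False), (True, True)):
    vector = build_term_vector(text, with_positions=pos, with_offsets=off)
    print(pos, off, stored_numbers(vector))
```

In this model the frequency-only vector grows with the number of distinct
terms, while positions and offsets grow with the number of term
occurrences, which is why the richer options cost more space.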

-Grant

Jeff Rodenburg wrote:

>Well done, Grant.  Very informative.
>
>Question on Term Vectors: with their inclusion in an index, have you noticed
>any degradation in performance, either from a search efficiency or
>maintenance point of view?  Given the power of term vectors, if the perf
>impact is negligible, I'm curious about the reasons why one would NOT include
>term vectors in any and every index...
>
>cheers,
>j



Re: ApacheCon next week

Posted by Jeff Rodenburg <je...@gmail.com>.
Well done, Grant.  Very informative.

Question on Term Vectors: with their inclusion in an index, have you noticed
any degradation in performance, either from a search efficiency or
maintenance point of view?  Given the power of term vectors, if the perf
impact is negligible, I'm curious about the reasons why one would NOT include
term vectors in any and every index...

cheers,
j

On 12/12/05, Grant Ingersoll <gs...@syr.edu> wrote:
>
> You stole my thunder!  :-)  Was going to post the URL after doing the
> actual talk, but that's all right.  I will post a few changes I have
> made on the plane tonight or tomorrow to the website below.
>
> Let me know if you have any questions...
>
> Luke Nezda wrote:
>
> >Where are my manners :-/
> >Anyway, I found the answer to my own request.
> >http://www.cnlp.org/apachecon2005/
> >Looks like some cool work, I only wish I could hear the accompanying
> speech.
> >Cheers,
> >-Luke

RE: ApacheCon next week

Posted by Koji Sekiguchi <ko...@m4.dion.ne.jp>.
Hi Grant,

> You stole my thunder!  :-)  Was going to post the URL after doing the
> actual talk, but that's all right.  I will post a few changes I have
> made on the plane tonight or tomorrow to the website below.
>
> Let me know if you have any questions...

I'm curious about "Candidate Identification for QA".
On slide 19 of your PPT, you write:

"Can be as short as a word or as large as multiple documents, based on
system goals"

Can you explain this in more detail?
I expect that QAService.java implements the spirit of the above sentence,
but I couldn't work out what it means.

Thanks in advance,

Koji






Re: ApacheCon next week

Posted by Grant Ingersoll <gs...@syr.edu>.
You stole my thunder!  :-)  Was going to post the URL after doing the 
actual talk, but that's all right.  I will post a few changes I have 
made on the plane tonight or tomorrow to the website below.

Let me know if you have any questions...

Luke Nezda wrote:

>Where are my manners :-/
>Anyway, I found the answer to my own request.
>http://www.cnlp.org/apachecon2005/
>Looks like some cool work, I only wish I could hear the accompanying speech.
>Cheers,
>-Luke



Re: ApacheCon next week

Posted by Luke Nezda <ln...@gmail.com>.
Where are my manners :-/
Anyway, I found the answer to my own request.
http://www.cnlp.org/apachecon2005/
Looks like some cool work, I only wish I could hear the accompanying speech.
Cheers,
-Luke

On 12/11/05, gekkokid <me...@gekkokid.org.uk> wrote:
>
> please :)
> ----- Original Message -----
> From: "Luke Nezda" <ln...@gmail.com>
> To: <ja...@lucene.apache.org>
> Sent: Sunday, December 11, 2005 6:28 PM
> Subject: Re: ApacheCon next week
>
>
> Hello Grant-
> Could you post the material you present (eg slides, handouts, etc) for
> those
> of us who cannot attend?
> Thanks in advance,
> -Luke

Re: ApacheCon next week

Posted by gekkokid <me...@gekkokid.org.uk>.
please :)
----- Original Message ----- 
From: "Luke Nezda" <ln...@gmail.com>
To: <ja...@lucene.apache.org>
Sent: Sunday, December 11, 2005 6:28 PM
Subject: Re: ApacheCon next week


Hello Grant-
Could you post the material you present (eg slides, handouts, etc) for those
of us who cannot attend?
Thanks in advance,
-Luke



Re: ApacheCon next week

Posted by Luke Nezda <ln...@gmail.com>.
Hello Grant-
Could you post the material you present (eg slides, handouts, etc) for those
of us who cannot attend?
Thanks in advance,
-Luke

On 12/9/05, Grant Ingersoll <gs...@syr.edu> wrote:
>
> Any one planning on going to ApacheCon next week?  I will be giving a
> talk on Lucene on Monday afternoon at 3pm on term vectors, span queries
> and some case studies from our work at CNLP with Lucene.
>
> Abstract for my talk can be found at
> http://www.apachecon.com/2005/US/html/sessions.html/e=MjAwNS9VUw#1400
>
> -Grant