Posted to java-user@lucene.apache.org by Grant Ingersoll <gs...@syr.edu> on 2005/12/09 15:31:01 UTC
ApacheCon next week
Anyone planning on going to ApacheCon next week? I will be giving a
talk on Lucene on Monday afternoon at 3pm on term vectors, span queries
and some case studies from our work at CNLP with Lucene.
Abstract for my talk can be found at
http://www.apachecon.com/2005/US/html/sessions.html/e=MjAwNS9VUw#1400
-Grant
--
-------------------------------------------------------------------
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
337 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Fax: 315-443-6886
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: ApacheCon next week
Posted by Dave Kor <da...@gmail.com>.
On 12/27/05, Koji Sekiguchi <ko...@m4.dion.ne.jp> wrote:
> Hi Grant,
>
> > You stole my thunder! :-) Was going to post the URL after doing the
> > actual talk, but that's all right. I will post a few changes I have
> > made on the plane tonight or tomorrow to the website below.
> >
> > Let me know if you have any questions...
>
> I'm curious to know "Candidate Identification for QA".
> At your PPT slide p.19,
>
> "Can be as short as a word or as large as multiple documents, based on
> system goals"
>
> Can you explain it in more detail?
In Question Answering, for questions that expect a single answer (e.g.,
Which country won the most gold medals in the 1996 Atlanta Olympics?),
we typically just need to find a single document that contains the
answer we are seeking.
However, there are other types of questions that require extracting answers
from several different documents, because a single document might not
contain all the relevant information. A simple example would be a
question with more than one valid answer (e.g., Who are the presidents
of the United States?). The QA system will have to find answers in
different documents, since there might not be a single document that
contains all the answers. Btw, in QA terminology we label this type of
question a List question.
Another example is Definition questions, where we would like to
provide all interesting facets of a particular topic (e.g., Tell me all
there is to know about the Grand Canyon). Again, each document in a set
might describe only a single aspect of the Grand Canyon. To build a
complete picture, we may need to sample most documents that mention
the Grand Canyon.
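A minimal Java sketch of the List-question case, assuming some upstream extractor has already produced candidate answer strings for each retrieved document. The class and method names (ListAnswerMerger, merge, answers) are hypothetical and are not taken from Lucene or CNLP's QAService. Answers are merged across documents and counted once per supporting document, so no single document needs to hold the whole list.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: merges candidate answers extracted from several documents. */
class ListAnswerMerger {

    /**
     * Maps each distinct answer (normalized to trimmed lower case) to the number of
     * documents that mention it, in first-seen order.
     */
    static Map<String, Integer> merge(List<List<String>> candidatesPerDoc) {
        Map<String, Integer> support = new LinkedHashMap<>();
        for (List<String> docCandidates : candidatesPerDoc) {
            // Count an answer at most once per document, however often it repeats there.
            Set<String> seenInDoc = new HashSet<>();
            for (String raw : docCandidates) {
                String key = raw.trim().toLowerCase(Locale.ROOT);
                if (!key.isEmpty() && seenInDoc.add(key)) {
                    support.merge(key, 1, Integer::sum);
                }
            }
        }
        return support;
    }

    /** Answers supported by at least minDocs documents, in first-seen order. */
    static List<String> answers(List<List<String>> candidatesPerDoc, int minDocs) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : merge(candidatesPerDoc).entrySet()) {
            if (e.getValue() >= minDocs) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

Counting supporting documents rather than raw mentions is just one simple way to rank list items; the minDocs threshold is an illustrative knob, not something described in this thread.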
I hope this helps.
Regards,
Dave Kor.
Re: ApacheCon next week
Posted by Grant Ingersoll <gs...@syr.edu>.
We use boosts that are calculated from the term frequencies and the
standard alpha, beta, gamma multipliers from Rocchio. Terms from
non-relevant documents decrement the frequency. If a term's weight is <= 0,
we remove the term (someone has posted a contribution for dealing with
negative weights; we just haven't adopted it yet). I am sure there are more
things you could do; we just haven't investigated much. We also give
different weights to things we think are more important based on our NLP
analysis.
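A minimal sketch of the textbook Rocchio update over raw term-frequency maps: alpha times the original query, plus beta times the centroid of the relevant documents, minus gamma times the centroid of the non-relevant ones, dropping any term whose weight ends up <= 0. The exact CNLP formula is not given here, so the centroid normalization and the class name RocchioSketch are assumptions. In the Lucene API of that era, each surviving weight could be applied with Query.setBoost(float) on a TermQuery added to a BooleanQuery; that wiring is left out so the sketch runs with only the JDK.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: textbook Rocchio reweighting over raw term-frequency maps. */
class RocchioSketch {

    /**
     * Returns term -> weight for
     * alpha * query + beta * centroid(relevant) - gamma * centroid(nonRelevant),
     * keeping only terms whose final weight is > 0.
     */
    static Map<String, Double> expand(Map<String, Integer> query,
                                      List<Map<String, Integer>> relevant,
                                      List<Map<String, Integer>> nonRelevant,
                                      double alpha, double beta, double gamma) {
        Map<String, Double> weights = new TreeMap<>(); // sorted for stable output
        for (Map.Entry<String, Integer> e : query.entrySet()) {
            weights.merge(e.getKey(), alpha * e.getValue(), Double::sum);
        }
        addCentroid(weights, relevant, beta);      // relevant terms push weights up
        addCentroid(weights, nonRelevant, -gamma); // non-relevant terms decrement them
        weights.values().removeIf(w -> w <= 0.0);  // drop terms at or below zero
        return weights;
    }

    /** Adds coeff times the average term-frequency vector of docs into weights. */
    private static void addCentroid(Map<String, Double> weights,
                                    List<Map<String, Integer>> docs, double coeff) {
        if (docs.isEmpty()) {
            return;
        }
        double scale = coeff / docs.size();
        for (Map<String, Integer> doc : docs) {
            for (Map.Entry<String, Integer> e : doc.entrySet()) {
                weights.merge(e.getKey(), scale * e.getValue(), Double::sum);
            }
        }
    }
}
```

Values such as alpha = 1.0, beta = 0.75, gamma = 0.15 are often quoted as starting points, but they are tuning parameters rather than fixed constants.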
Ian Soboroff wrote:
>Grant Ingersoll <gs...@syr.edu> writes:
>
>
>
>>You stole my thunder! :-) Was going to post the URL after doing the
>>actual talk, but that's all right. I will post a few changes I have
>>made on the plane tonight or tomorrow to the website below.
>>
>>Let me know if you have any questions...
>>
>>
>
>I have one. I've been thinking about the problem with doing relevance
>feedback in Lucene, and I appreciate seeing your code on getting the
>top terms from a single document.
>
>However, the real problem for RF and pseudo-RF techniques is forming
>the query. You can obviously add terms to a query, but how are you
>handling the weighting? With boosts, or something more sophisticated?
>
>Ian
Re: ApacheCon next week
Posted by Ian Soboroff <ia...@nist.gov>.
Grant Ingersoll <gs...@syr.edu> writes:
> You stole my thunder! :-) Was going to post the URL after doing the
> actual talk, but that's all right. I will post a few changes I have
> made on the plane tonight or tomorrow to the website below.
>
> Let me know if you have any questions...
I have one. I've been thinking about the problem with doing relevance
feedback in Lucene, and I appreciate seeing your code on getting the
top terms from a single document.
However, the real problem for RF and pseudo-RF techniques is forming
the query. You can obviously add terms to a query, but how are you
handling the weighting? With boosts, or something more sophisticated?
Ian
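For reference, picking the top terms from one document's term vector can be sketched as a bounded min-heap over (term, frequency) pairs. The parallel String[]/int[] inputs mirror what Lucene's TermFreqVector returns from getTerms() and getTermFrequencies(), but the TopTerms class is hypothetical and is not the code from Grant's talk; breaking ties alphabetically is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical helper: the k most frequent terms of one document's term vector. */
class TopTerms {

    /** terms[i] occurs freqs[i] times; returns up to k terms, most frequent first. */
    static List<String> topK(String[] terms, int[] freqs, int k) {
        if (terms.length != freqs.length) {
            throw new IllegalArgumentException("terms and freqs must be parallel arrays");
        }
        // Orders indices from weakest to strongest: lower frequency first, and on a
        // tie the alphabetically later term counts as weaker.
        Comparator<Integer> weakestFirst = (a, b) -> {
            if (freqs[a] != freqs[b]) {
                return Integer.compare(freqs[a], freqs[b]);
            }
            return terms[b].compareTo(terms[a]);
        };
        // Min-heap holding at most k indices; its head is the weakest one kept so far.
        PriorityQueue<Integer> heap = new PriorityQueue<>(weakestFirst);
        for (int i = 0; i < terms.length; i++) {
            heap.offer(i);
            if (heap.size() > k) {
                heap.poll(); // evict the weakest candidate
            }
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(terms[heap.poll()]); // weakest first
        }
        Collections.reverse(result); // strongest first
        return result;
    }
}
```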
Impact of Term Vectors (was ApacheCon next week)
Posted by Dan Climan <dc...@keepmedia.com>.
Good question. I was wondering about the impact of adding term vectors with
the various options. For example, does adding term vectors with both positions
and offsets have a significant impact? Which current parts of Lucene (including
contributions) take advantage of term vectors being present? I know that the
Highlighter class can make use of them if present.
Dan
-----Original Message-----
From: Jeff Rodenburg [mailto:jeff.rodenburg@gmail.com]
Sent: Monday, December 12, 2005 9:08 PM
To: java-user@lucene.apache.org
Subject: Re: ApacheCon next week
Well done, Grant. Very informative.
Question on Term Vectors: with their inclusion in an index, have you noticed
any degradation in performance, either from a search efficiency or
maintenance point of view? Given the power of term vectors, if the perf
impact is negligible, I'm curious about the reasons why one would NOT include
term vectors in any and every index...
cheers,
j
Re: ApacheCon next week
Posted by Grant Ingersoll <gs...@syr.edu>.
Thanks, Jeff.
I have only done basic testing, so I'm not completely sure about your
question. However, one trade-off is definitely disk space. As far
as searching goes, I don't think there should be any impact, because you get
the vector separately from a search, via the IndexReader. Perhaps the
compound file format might affect it, as the file will be larger, but
that is only speculation. I haven't noticed any performance impact
other than the increase in disk space needed.
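To illustrate why positions and offsets cost the most space, here is a toy in-memory model of what a term vector records for one field of one document. It is not Lucene's on-disk format (which encodes these values compactly), and the naive tokenizer and the ToyTermVector class are assumptions for illustration. The point it shows: the frequency is stored once per distinct term, while positions and offsets grow with every occurrence of every term. In the Lucene 1.9/2.x API these choices correspond to the Field.TermVector options YES, WITH_POSITIONS, WITH_OFFSETS and WITH_POSITIONS_OFFSETS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Toy in-memory term vector for one field of one document (not Lucene's file format). */
class ToyTermVector {

    /** What is recorded for one distinct term. */
    static final class TermData {
        int freq;
        final List<Integer> positions = new ArrayList<>(); // token index of each occurrence
        final List<int[]> offsets = new ArrayList<>();     // [start, end) chars of each occurrence
    }

    final Map<String, TermData> entries = new TreeMap<>();

    ToyTermVector(String text, boolean withPositions, boolean withOffsets) {
        int position = 0;
        int i = 0;
        while (i < text.length()) {
            // Naive tokenizer (an assumption): runs of letters/digits, lower-cased.
            while (i < text.length() && !Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            if (start == i) {
                break; // only trailing separators were left
            }
            String term = text.substring(start, i).toLowerCase(Locale.ROOT);
            TermData d = entries.computeIfAbsent(term, t -> new TermData());
            d.freq++;
            if (withPositions) {
                d.positions.add(position);
            }
            if (withOffsets) {
                d.offsets.add(new int[] {start, i});
            }
            position++;
        }
    }

    /** Integers held: 1 frequency per distinct term, +1 per position, +2 per offset pair. */
    int storedInts() {
        int n = 0;
        for (TermData d : entries.values()) {
            n += 1 + d.positions.size() + 2 * d.offsets.size();
        }
        return n;
    }
}
```

On a short sentence such as "to be or not to be", the frequency-only vector holds one integer per distinct term, while the positions-and-offsets variant holds three more integers per token.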
-Grant
Jeff Rodenburg wrote:
>Well done, Grant. Very informative.
>
>Question on Term Vectors: with their inclusion in an index, have you noticed
>any degradation in performance, either from a search efficiency or
>maintenance point of view? Given the power of term vectors, if the perf
>impact is negligible, I'm curious about the reasons why one would NOT include
>term vectors in any and every index...
>
>cheers,
>j
Re: ApacheCon next week
Posted by Jeff Rodenburg <je...@gmail.com>.
Well done, Grant. Very informative.
Question on Term Vectors: with their inclusion in an index, have you noticed
any degradation in performance, either from a search efficiency or
maintenance point of view? Given the power of term vectors, if the perf
impact is negligible, I'm curious about the reasons why one would NOT include
term vectors in any and every index...
cheers,
j
On 12/12/05, Grant Ingersoll <gs...@syr.edu> wrote:
>
> You stole my thunder! :-) Was going to post the URL after doing the
> actual talk, but that's all right. I will post a few changes I have
> made on the plane tonight or tomorrow to the website below.
>
> Let me know if you have any questions...
RE: ApacheCon next week
Posted by Koji Sekiguchi <ko...@m4.dion.ne.jp>.
Hi Grant,
> You stole my thunder! :-) Was going to post the URL after doing the
> actual talk, but that's all right. I will post a few changes I have
> made on the plane tonight or tomorrow to the website below.
>
> Let me know if you have any questions...
I'm curious about "Candidate Identification for QA".
On p.19 of your PPT slides:
"Can be as short as a word or as large as multiple documents, based on
system goals"
Can you explain it in more detail?
I expect that QAService.java implements the idea behind the above sentence,
but I couldn't figure out what it means.
Thanks in advance,
Koji
Re: ApacheCon next week
Posted by Grant Ingersoll <gs...@syr.edu>.
You stole my thunder! :-) Was going to post the URL after doing the
actual talk, but that's all right. I will post a few changes I have
made on the plane tonight or tomorrow to the website below.
Let me know if you have any questions...
Luke Nezda wrote:
>Where are my manners :-/
>Anyway, I found the answer to my own request.
>http://www.cnlp.org/apachecon2005/
>Looks like some cool work, I only wish I could hear the accompanying speech.
>Cheers,
>-Luke
Re: ApacheCon next week
Posted by Luke Nezda <ln...@gmail.com>.
Where are my manners :-/
Anyway, I found the answer to my own request.
http://www.cnlp.org/apachecon2005/
Looks like some cool work, I only wish I could hear the accompanying speech.
Cheers,
-Luke
On 12/11/05, gekkokid <me...@gekkokid.org.uk> wrote:
>
> please :)
> ----- Original Message -----
> From: "Luke Nezda" <ln...@gmail.com>
> To: <ja...@lucene.apache.org>
> Sent: Sunday, December 11, 2005 6:28 PM
> Subject: Re: ApacheCon next week
>
>
> Hello Grant-
> Could you post the material you present (e.g. slides, handouts, etc.) for
> those of us who cannot attend?
> Thanks in advance,
> -Luke
Re: ApacheCon next week
Posted by gekkokid <me...@gekkokid.org.uk>.
please :)
----- Original Message -----
From: "Luke Nezda" <ln...@gmail.com>
To: <ja...@lucene.apache.org>
Sent: Sunday, December 11, 2005 6:28 PM
Subject: Re: ApacheCon next week
Hello Grant-
Could you post the material you present (e.g. slides, handouts, etc.) for those
of us who cannot attend?
Thanks in advance,
-Luke
Re: ApacheCon next week
Posted by Luke Nezda <ln...@gmail.com>.
Hello Grant-
Could you post the material you present (e.g. slides, handouts, etc.) for those
of us who cannot attend?
Thanks in advance,
-Luke
On 12/9/05, Grant Ingersoll <gs...@syr.edu> wrote:
>
> Anyone planning on going to ApacheCon next week? I will be giving a
> talk on Lucene on Monday afternoon at 3pm on term vectors, span queries
> and some case studies from our work at CNLP with Lucene.
>
> Abstract for my talk can be found at
> http://www.apachecon.com/2005/US/html/sessions.html/e=MjAwNS9VUw#1400
>