Posted to java-user@lucene.apache.org by beatriz ramos <be...@gmail.com> on 2006/10/24 11:24:47 UTC

Re: number of term occurrences

Hi, thanks for all your answers, but they don't work.

I have tried the 3 options, and with all of them we get termDoc = 0.
I have checked my index with Luke, and there termDoc is 1, so my
index is correct.

Is it possible that I have a problem with the reader? (My index is
all right.)

Thanks

(By termDocs I mean the number of documents in which the term
appears.)



On 24/10/06, Grant Ingersoll <gs...@apache.org> wrote:
>
> You can also use Term Vectors, at the cost of extra storage.  Search
> this list for Term Vectors for info on how to implement.
>
> On Oct 23, 2006, at 5:50 AM, beatriz ramos wrote:
>
> > Hello,
> > I'm working with Lucene. I need to get the number of occurrences of
> > the term in the document. I have looked through the documentation
> > and I can't find anything.
> > Do you have any idea?
> > Thanks.
>
> --------------------------
> Grant Ingersoll
> Sr. Software Engineer
> Center for Natural Language Processing
> Syracuse University
> 335 Hinds Hall
> Syracuse, NY 13244
> http://www.cnlp.org
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
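The thread mixes two different counts. The original question asks how many times a term occurs in one document (its term frequency), while the follow-up measures termDocs as the number of documents that contain the term (its document frequency). In Lucene these roughly correspond to TermDocs.freq() and IndexReader.docFreq(Term). The sketch below is a toy in-memory inverted index in plain Java, not Lucene code: the class ToyInvertedIndex and its methods are hypothetical, and the whitespace tokenizer is a simplifying assumption that does not match StandardAnalyzer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy inverted index, NOT Lucene's API. It only illustrates
// document frequency versus per-document term frequency.
final class ToyInvertedIndex {
    // term -> (docId -> number of occurrences of the term in that doc)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    // Indexes a document with a naive whitespace tokenizer; returns its doc id.
    int addDocument(String text) {
        int docId = nextDocId++;
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) continue;
            postings.computeIfAbsent(token, k -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
        return docId;
    }

    // Number of documents containing the term (what the thread calls termDocs).
    int docFreq(String term) {
        Map<Integer, Integer> docs = postings.get(term);
        return docs == null ? 0 : docs.size();
    }

    // Number of occurrences of the term inside one document (0 if absent).
    int termFreq(String term, int docId) {
        Map<Integer, Integer> docs = postings.get(term);
        return docs == null ? 0 : docs.getOrDefault(docId, 0);
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        int d0 = index.addDocument("lucene term term vector");
        int d1 = index.addDocument("another term");
        System.out.println(index.docFreq("term"));      // 2
        System.out.println(index.termFreq("term", d0)); // 2
        System.out.println(index.termFreq("term", d1)); // 1
    }
}
```

A term can therefore have a document frequency of 1 while occurring several times in that one document; only the per-document count answers the original question.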

Re: number of term occurrences

Posted by Doron Cohen <DO...@il.ibm.com>.
I don't know why the termDocs option did not work for you. Perhaps you did
not (re)open the searcher after the index was populated? Anyhow, here is a
small code snippet that does just this; see if it works for you, and then
compare it to your code.

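  import java.io.IOException;

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.index.TermDocs;
  import org.apache.lucene.store.RAMDirectory;

  // Wrapper class so the snippet compiles on its own (Lucene 2.x API).
  public class TermOccurrences {

    void numberOfTermOcc() throws Exception {
      System.out.println("======== populate index");
      RAMDirectory dir = new RAMDirectory();
      // create=true: start a new, empty index in the RAM directory
      IndexWriter iw = new IndexWriter(dir, new StandardAnalyzer(), true);
      for (int i = 0; i < 10; i++) {
        Document doc = new Document();
        for (int j = 0; j < 10; j++) {
          // Each field gets "value_(i+j)" twice and "value_(i+j+1)" once,
          // so the two terms end up with different in-document frequencies.
          doc.add(new Field("field_"+(i+j), "value_"+(i+j),
                            Field.Store.NO, Field.Index.TOKENIZED));
          doc.add(new Field("field_"+(i+j), "value_"+(i+j),
                            Field.Store.NO, Field.Index.TOKENIZED));
          doc.add(new Field("field_"+(i+j), "value_"+(i+j+1),
                            Field.Store.NO, Field.Index.TOKENIZED));
        }
        iw.addDocument(doc);
      }
      iw.close();

      // Open the reader only after the writer is closed, so it sees the new docs.
      IndexReader ir = IndexReader.open(dir);
      printTermDocs(ir, new Term("field_7","value_7"));
      printTermDocs(ir, new Term("field_7","value_8"));
      ir.close();
    }

    void printTermDocs(IndexReader ir, Term t) throws IOException {
      System.out.println("========= iterate docs for "+t);
      TermDocs td = ir.termDocs(t);
      try {
        // One iteration per document containing t; freq() is the number
        // of occurrences of t in that document.
        while (td.next()) {
          System.out.println("term frequency in doc "+td.doc()+
                             " is: "+ td.freq());
        }
      } finally {
        td.close();
      }
    }

    public static void main(String[] args) throws Exception {
      new TermOccurrences().numberOfTermOcc();
    }
  }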
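Doron's first guess is that the reader (or searcher) was opened before the documents were added. A Lucene IndexReader presents the index as it was when the reader was opened, so a reader opened too early can report no documents for a term even though Luke, opened later, shows one. The plain-Java toy classes below (ToyWriter, ToyReader and StaleReaderDemo are hypothetical names, not Lucene classes) sketch that point-in-time behaviour under this simplifying assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical toy classes, NOT Lucene's API. They only model a reader as a
// snapshot of what the writer held when the reader was opened.
final class ToyWriter {
    private final List<List<String>> committed = new ArrayList<>();

    // Adds a document made of whitespace-separated tokens.
    void addDocument(String text) {
        committed.add(Arrays.asList(text.split("\\s+")));
    }

    // Opening a reader copies the current document list (the snapshot).
    ToyReader openReader() {
        return new ToyReader(new ArrayList<>(committed));
    }
}

final class ToyReader {
    private final List<List<String>> docs;

    ToyReader(List<List<String>> docs) {
        this.docs = Collections.unmodifiableList(docs);
    }

    // Number of documents in this snapshot that contain the term.
    int docFreq(String term) {
        int count = 0;
        for (List<String> doc : docs) {
            if (doc.contains(term)) count++;
        }
        return count;
    }
}

final class StaleReaderDemo {
    public static void main(String[] args) {
        ToyWriter writer = new ToyWriter();
        ToyReader stale = writer.openReader();          // opened before indexing
        writer.addDocument("value_7 value_7 value_8");
        ToyReader fresh = writer.openReader();          // (re)opened afterwards
        System.out.println(stale.docFreq("value_7"));   // 0
        System.out.println(fresh.docFreq("value_7"));   // 1
    }
}
```

The fix in this model, as in Doron's snippet, is simply to open (or reopen) the reader after indexing has finished.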

"beatriz ramos" <be...@gmail.com> wrote on 24/10/2006
02:24:47:

> Hi, thanks for all your answers, but they don't work
>
> I have tried the 3 options and with all of them we get termDoc = 0




Re: number of term occurrences

Posted by beatriz ramos <be...@gmail.com>.
Thank you

I had forgotten "Field.TermVector.YES" when I created the new Field
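This fix matches Samir's diagnosis below: term vectors exist only for fields indexed with Field.TermVector.YES, and Lucene's IndexReader.getTermFreqVector(docNumber, field) returns null for a field that has none, which is the extra storage Grant mentioned. The plain-Java sketch below is a toy model of that behaviour, not Lucene code; ToyTermVectors and its methods are hypothetical, and the whitespace tokenizer is a simplifying assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model, NOT Lucene's API. A per-document "term vector"
// (term -> frequency) is only built when the caller asks for it, mirroring
// how Lucene only stores vectors for fields indexed with a TermVector option.
final class ToyTermVectors {
    // docId -> term vector for that document (absent if not stored)
    private final Map<Integer, Map<String, Integer>> vectors = new HashMap<>();
    private int nextDocId = 0;

    // Indexes one single-field document; storeTermVector mirrors the flag.
    int addDocument(String text, boolean storeTermVector) {
        int docId = nextDocId++;
        if (storeTermVector) {
            Map<String, Integer> vector = new HashMap<>();
            for (String token : text.toLowerCase().split("\\s+")) {
                if (!token.isEmpty()) vector.merge(token, 1, Integer::sum);
            }
            vectors.put(docId, vector);
        }
        return docId;
    }

    // Returns the stored vector, or null if none was stored for this doc.
    Map<String, Integer> termVector(int docId) {
        return vectors.get(docId);
    }

    // Occurrences of term in docId; -1 signals "no term vector stored".
    int occurrences(int docId, String term) {
        Map<String, Integer> vector = termVector(docId);
        if (vector == null) return -1;
        return vector.getOrDefault(term, 0);
    }

    public static void main(String[] args) {
        ToyTermVectors index = new ToyTermVectors();
        int without = index.addDocument("term term vector", false);
        int with = index.addDocument("term term vector", true);
        System.out.println(index.termVector(without));   // null
        System.out.println(index.occurrences(with, "term")); // 2
    }
}
```

Returning -1 instead of 0 keeps "no vector was stored" distinct from "the term does not occur", which is the distinction that caused the null results in this thread.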



On 24/10/06, Samir Abdou <Sa...@unine.ch> wrote:
>
> Hi,
>
> You indexed without storing vectors! This is why the term vector is null.
>
> Samir
>
>
> -----Original Message-----
> From: Paz Belmonte [mailto:pazbl1@gmail.com]
> Sent: Tuesday, 24 October 2006 12:30
> To: java-user
> Subject: Re: number of term occurrences
>
> Hi,
>
> I have tried these options too, and the term vector returns null.
>
> What do you think the problem is?

Re: number of term occurrences

Posted by Tricia Williams <pg...@student.cs.uwaterloo.ca>.
When you create a Document by adding Field(s)
(http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field.html),
consider the last constructor, which lets you specify whether the field
will have its TermVector stored. Also, Luke's document view has a T
column that tells you whether the TermVector is stored: a + under the T
column means it is stored, and no + means it is not.

Cheers,
Tricia

On Tue, 24 Oct 2006, Paz Belmonte wrote:

> I don't know. How are these vectors stored?
> Could you show me an example (or documentation where I can find it)?



Re: number of term occurrences

Posted by Paz Belmonte <pa...@gmail.com>.
I don't know. How are these vectors stored?
Could you show me an example (or documentation where I can find it)?

2006/10/24, Samir Abdou <Sa...@unine.ch>:
>
> Hi,
>
> You indexed without storing vectors! This is why the term vector is null.
>
> Samir
>
>
> -----Message d'origine-----
> De: Paz Belmonte [mailto:pazbl1@gmail.com]
> Envoyé: mardi, 24. octobre 2006 12:30
> Ŕ: java-user
> Objet: Re: number of term occurrences
>
> Hi,
>
> I have tried this options too and the Term Vector return null.
>
> Which do you think that it is the problem?
>
>
> 2006/10/24, beatriz ramos <be...@gmail.com>:
> >
> >
> >
> > ---------- Forwarded message ----------
> > From: beatriz ramos <be...@gmail.com>
> > Date: 24-Oct-2006 11:24
> > Subject: Re: number of term occurrences
> > To: java-user@lucene.apache.org
> >
> > Hi, thanks for all your answers, but they don't work
> >
> > I have tried the 3 options and with all of them we get termDoc = 0
> > I have checked my index with Luke software and termDoc is 1 here, so my
> > index is correct.
> >
> > is it possible I have a problem with the reader? (because my index is
> > allright)
> >
> > Thanks
> >
> > (when I talk about termDocs, it means number of documents in which term
> > appears)
> >
> >
> >
> > On 24/10/06, Grant Ingersoll <gs...@apache.org> wrote:
> > >
> > > You can also use Term Vectors, at the cost of extra storage.  Search
> > > this list for Term Vectors for info on how to implement.
> > >
> > > On Oct 23, 2006, at 5:50 AM, beatriz ramos wrote:
> > >
> > > > Hello,
> > > > I´m working with Lucene. I need to get the number of occurrences of
> > > > the term
> > > > in the document. I had seen the documentations ant I don´t find
> > > > anything.
> > > > Do you have any idea?
> > > > Thanks.
> > >
> > > --------------------------
> > > Grant Ingersoll
> > > Sr. Software Engineer
> > > Center for Natural Language Processing
> > > Syracuse University
> > > 335 Hinds Hall
> > > Syracuse, NY 13244
> > > http://www.cnlp.org
> > >
> > >
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

RE: number of term occurrences

Posted by Samir Abdou <Sa...@unine.ch>.
Hi,

You indexed without storing vectors! This is why the term vector is null.

Samir



Re: number of term occurrences

Posted by Paz Belmonte <pa...@gmail.com>.
Hi,

I have tried these options too, and the term vector returns null.

What do you think the problem is?

