Posted to java-user@lucene.apache.org by Venkateshprasanna <pr...@yahoo.co.in> on 2006/09/13 09:30:28 UTC

Storing no. of occurrences of a token

Is it possible for me to store the number of occurrences of a token in a
particular document or a collection of documents?

Regards,
Venkateshprasanna
-- 
View this message in context: http://www.nabble.com/Storing-no.-of-occurances-of-a-token-tf2263455.html#a6280422
Sent from the Lucene - Java Users forum at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Storing no. of occurrences of a token

Posted by Chris Hostetter <ho...@fucit.org>.
: I found out how to determine the number of documents in which a term
: appeared by looking at the Luke code, but how does one determine the
: number of times it occurs in each document?

take a look at the TermDocs class.


-Hoss




Re: Storing no. of occurrences of a token

Posted by Doron Cohen <DO...@il.ibm.com>.
> I found out how to determine the number of documents in which a term
> appeared by looking at the Luke code, but how does one determine the
> number of times it occurs in each document?

Use TermDocs -
http://lucene.apache.org/java/docs/api/org/apache/lucene/index/TermDocs.html
Something like -
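 // TermDocs enumerates the documents that contain the given term.
 TermDocs td = myIndexReader.termDocs(new Term("name1", "value1"));
 try {
   while (td.next()) {
     System.out.println("term frequency in doc " + td.doc() + " is: " + td.freq());
   }
 } finally {
   td.close();  // release the enumeration when done
 }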




Re: Storing no. of occurrences of a token

Posted by Bill Taylor <wa...@as-st.com>.
On Sep 13, 2006, at 3:39 AM, Paul Elschot wrote:

> On Wednesday 13 September 2006 09:30, Venkateshprasanna wrote:
>>
>> Is it possible for me to store the number of occurrences of a token in a
>> particular document or a collection of documents?
>
> When the token is indexed as a term, an IndexReader provides
> access to the total number of documents containing the term,
> and to the number of times the term occurs in each document.

I found out how to determine the number of documents in which a term 
appeared by looking at the Luke code, but how does one determine the 
number of times it occurs in each document?

> The total number of term occurrences over all indexed documents
> is not present in a Lucene index.




Re: Storing no. of occurrences of a token

Posted by Paul Elschot <pa...@xs4all.nl>.
On Wednesday 13 September 2006 09:30, Venkateshprasanna wrote:
> 
> Is it possible for me to store the number of occurrences of a token in a
> particular document or a collection of documents?

When the token is indexed as a term, an IndexReader provides
access to the total number of documents containing the term,
and to the number of times the term occurs in each document.
The total number of term occurrences over all indexed documents
is not present in a Lucene index.
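
To make the three counts concrete, here is a minimal sketch in plain Java. It
is not Lucene code: the class TermStatsSketch and its methods are hypothetical
names, and the tokenizer is a naive whitespace split. It only illustrates the
distinction between the per-term document count, the per-document frequency,
and the overall total, which has to be summed from the per-document
frequencies because it is not stored.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical in-memory stand-in for an index (not Lucene code):
// term -> (document number -> occurrences of the term in that document).
final class TermStatsSketch {
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();

    // Count every whitespace-separated, lower-cased token of one document.
    void addDocument(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) {
                continue; // leading whitespace yields one empty token
            }
            postings.computeIfAbsent(token, t -> new HashMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Number of documents that contain the term.
    int docFreq(String term) {
        Map<Integer, Integer> docs = postings.get(term);
        return docs == null ? 0 : docs.size();
    }

    // Number of times the term occurs in one document.
    int freq(String term, int docId) {
        Map<Integer, Integer> docs = postings.get(term);
        return docs == null ? 0 : docs.getOrDefault(docId, 0);
    }

    // Total occurrences over all documents: not stored, so summed on demand.
    long totalFreq(String term) {
        Map<Integer, Integer> docs = postings.get(term);
        long total = 0;
        if (docs != null) {
            for (int f : docs.values()) {
                total += f;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        TermStatsSketch index = new TermStatsSketch();
        index.addDocument(0, "auto auto providencia");
        index.addDocument(1, "auto");
        System.out.println("docFreq(auto)   = " + index.docFreq("auto"));
        System.out.println("freq(auto, 0)   = " + index.freq("auto", 0));
        System.out.println("totalFreq(auto) = " + index.totalFreq("auto"));
    }
}
```

With a real index, the same total could presumably be obtained by summing the
per-document frequencies while iterating over the documents of a term.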

Regards,
Paul Elschot



RE: Queries in Lucene

Posted by mcarcelen <mc...@isoco.com>.
Thank you very much.

Yes, I'm very new to Lucene, sorry.

With the help of Lucene we want to classify 724,827 legal files whose first
line contains the word "Auto" or "Providencia". We want to separate them into
two groups. That's why I indexed these files with Lucene beforehand, and we
thought that we could reuse the index and apply a special query for that.

Thanks for your help.
Best regards
Teresa


 





Re: Queries in Lucene

Posted by Erick Erickson <er...@gmail.com>.
I'm assuming that you're new to Lucene, so if you're an old pro you probably
already know all this....

I think you'll have difficulty here. Lucene has no concept of lines, just
tokens and offsets. So here are a couple of suggestions off the top of my
head...

If the first line is the *only* way you want to restrict this, index the
tokens in the first line in a separate field for each document, and search
on that field (call it "firstline" <G>). Obviously, this won't work for
searching lines 2-n.
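
As a rough illustration of the "firstline" idea, the sketch below (plain Java,
no Lucene) extracts the text that would go into such a field and checks
whether all wanted words occur in it. The class and method names are
hypothetical, and the tokenizer (lower-case, split on anything that is not a
letter or digit) is an assumption; a real analyzer may split text differently.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: isolates the first line so it can be handled
// (or indexed) separately from the rest of the document.
final class FirstLineSketch {

    // Text that would be stored in the hypothetical "firstline" field.
    static String firstLine(String text) {
        int end = text.indexOf('\n');
        String line = end < 0 ? text : text.substring(0, end);
        return line.trim(); // also drops a trailing '\r'
    }

    // Naive tokenizer: lower-case, split on non-letter/non-digit characters.
    static Set<String> tokens(String line) {
        Set<String> result = new HashSet<>();
        for (String t : line.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    // True only if every word occurs in the first line (an AND query).
    static boolean firstLineContainsAll(String text, String... words) {
        Set<String> present = tokens(firstLine(text));
        for (String w : words) {
            if (!present.contains(w.toLowerCase(Locale.ROOT))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String doc = "<DIV class=My-Word1> Word2\nbody text with Word3";
        System.out.println(firstLineContainsAll(doc, "Word1", "Word2"));
        System.out.println(firstLineContainsAll(doc, "Word1", "Word3"));
    }
}
```

The same split would be done at indexing time: the first line goes into its
own field, so a query against that field can never match words from later
lines.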

If you're going to want to ask if terms are in line 2, 3, 4..., you could
bump your term position at the start of each line by, say, 500 and then do
some fancy dancing with TermPositions to get terms from a particular line.
This is going to be complicated though to get right, especially when you
want to do arbitrary boolean queries.
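
The position arithmetic behind this can be sketched in plain Java. This is not
the TermPositions API, and it uses a simplified variant of the idea: instead
of adding 500 at each line start, every line gets its own block of 500
positions (line * 500 + offset), which assumes no line has 500 or more tokens.
All names here are hypothetical, and lines are numbered from 0.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: encode the line number into each token position.
final class LinePositionSketch {
    static final int LINE_GAP = 500; // assumed upper bound on tokens per line

    // term -> positions, where position = line * LINE_GAP + offset in line.
    static Map<String, List<Integer>> positions(String text) {
        Map<String, List<Integer>> result = new HashMap<>();
        String[] lines = text.split("\n");
        for (int line = 0; line < lines.length; line++) {
            int offset = 0;
            String normalized = lines[line].trim().toLowerCase(Locale.ROOT);
            for (String token : normalized.split("\\s+")) {
                if (token.isEmpty()) {
                    continue; // blank line
                }
                if (offset >= LINE_GAP) {
                    throw new IllegalArgumentException("line " + line + " is too long");
                }
                result.computeIfAbsent(token, t -> new ArrayList<>())
                      .add(line * LINE_GAP + offset);
                offset++;
            }
        }
        return result;
    }

    // A term is in a given line if one of its positions falls in that block.
    static boolean occursInLine(Map<String, List<Integer>> positions, String term, int line) {
        List<Integer> found = positions.get(term);
        if (found == null) {
            return false;
        }
        for (int pos : found) {
            if (pos / LINE_GAP == line) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> p = positions("alpha beta\ngamma alpha");
        System.out.println(p.get("alpha"));
        System.out.println(occursInLine(p, "gamma", 1));
    }
}
```

Combining such per-line checks into arbitrary boolean queries is where the
complexity mentioned above comes in.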

You could creatively index things. Index a document with fields line1,
line2, line3, line4...., and when you wanted to search in a particular line,
form your query with a field corresponding to the correct line. You could
even index the full text of the document in a "fulltext" field if you wanted
to search over an entire document.

There are space tradeoffs to all this, so be sure you understand
Field.Store.YES and NO as they apply to your problem, and what effect
analyzers have on your indexing AND search streams. Lots of people are
confused by this issue.

If you haven't already, get a copy of Luke so you can poke around at your
index. Google luke lucene and it'll pop right up.

Before diving into this as stated, is there a way to re-think the problem to
make it easier? What question are you *really* trying to answer by asking
whether certain tokens are in a particular line?

Best
Erick


On 9/13/06, mcarcelen <mc...@isoco.com> wrote:
>
>
> Hi all,
> I've got an index and now I'm trying to create a query with lucene-2.0.0.
> I'd like to find files whose first line contains the following:
>
> <DIV class=My-Word1> AND Word2
>
> I tried org.apache.lucene.demo.SearchFiles,
> but I get files where the word "Word2" is not in the first line.
>
> I don't know how to filter the query, or whether I have to use another file.
>
> Can anyone help me?
>
> Thanks
>
> Best Regards
> Teresa

Queries in Lucene

Posted by mcarcelen <mc...@isoco.com>.
Hi all,
I've got an index and now I'm trying to create a query with lucene-2.0.0.
I'd like to find files whose first line contains the following:

<DIV class=My-Word1> AND Word2

I tried org.apache.lucene.demo.SearchFiles,
but I get files where the word "Word2" is not in the first line.

I don't know how to filter the query, or whether I have to use another file.

Can anyone help me?

Thanks 

Best Regards
Teresa

