Posted to user@nutch.apache.org by caomanhdat <ca...@gmail.com> on 2011/06/22 14:19:16 UTC
Get frequency of word
Hi all,
I have a problem getting the frequency of a word in Nutch :|
In Lucene it is quite easy with this code:
Directory dir2 = FSDirectory.open(new File(indexDir));
IndexReader ir = IndexReader.open(dir2);
TermDocs termDocs = ir.termDocs(new Term("contents", "eBank"));
int count = 0;
while (termDocs.next()) {
    count += termDocs.freq();
}
But in Nutch the indexer is quite weird, so I can't do the same thing:
Directory dir2 = FSDirectory.open(new File("D:\\nutch\\crawl\\indexes"));
IndexReader ir = IndexReader.open(dir2);
TermDocs termDocs = ir.termDocs(new Term("contents", "eBank"));
int count = 0;
while (termDocs.next()) {
    count += termDocs.freq();
}
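For readers following along, here is a minimal sketch of what the loop in both snippets accumulates: the per-document occurrence counts of a term, summed over every document that contains it. It uses only the Java standard library. The class TermFrequencySketch and its methods are hypothetical stand-ins for illustration, not part of the Lucene or Nutch API, and the counts in main are made-up sample values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for an index; NOT Lucene's or Nutch's API.
class TermFrequencySketch {
    // term -> one entry per matching document, holding that document's occurrence count
    private final Map<String, List<Integer>> postings = new HashMap<>();

    // Record that `term` occurs `freq` times in one more document.
    void addPosting(String term, int freq) {
        postings.computeIfAbsent(term, k -> new ArrayList<>()).add(freq);
    }

    // Number of documents that contain the term.
    int docFreq(String term) {
        return postings.getOrDefault(term, Collections.emptyList()).size();
    }

    // Total occurrences across all documents: the quantity the while loop sums up.
    int totalFreq(String term) {
        int count = 0;
        for (int freq : postings.getOrDefault(term, Collections.emptyList())) {
            count += freq;
        }
        return count;
    }

    public static void main(String[] args) {
        TermFrequencySketch index = new TermFrequencySketch();
        index.addPosting("eBank", 3); // sample value: first document
        index.addPosting("eBank", 2); // sample value: second document
        System.out.println("documents containing term: " + index.docFreq("eBank"));
        System.out.println("total occurrences: " + index.totalFreq("eBank"));
    }
}
```

The two numbers differ whenever a term appears more than once in a document, which is why the loop adds freq() per document rather than counting matching documents.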
--
View this message in context: http://lucene.472066.n3.nabble.com/Get-frequency-of-word-tp3095236p3095236.html
Sent from the Nutch - User mailing list archive at Nabble.com.
Re: Get frequency of word
Posted by caomanhdat <ca...@gmail.com>.
Thanks for your answer!
So how can I get the frequency of a word across all documents indexed by
Nutch?
Re: Get frequency of word
Posted by Gabriele Kahlout <ga...@mysimpatico.com>.
Are you trying to use Nutch's indexer? AFAIK that's deprecated, isn't it?
--
Regards,
K. Gabriele