Posted to java-user@lucene.apache.org by jason <gi...@gmail.com> on 2006/02/06 10:19:15 UTC
understand the queryNorm and the fieldNorm.
Hi,
I am having trouble understanding queryNorm and fieldNorm.
The following is an example. I tried to follow what the Javadoc says,
"Computes the normalization value for a query given the sum of the squared
weights of each of the query terms", but my result is different.
ID:0 C:/PDF2Text/SearchEngine/File/SIG/sigkdd/p374-zhang.pdf|initial rank: 0
0.31900567 = sum of:
  0.03968133 = weight(contents:associ in 920), product of:
    0.60161763 = queryWeight(contents:associ), product of:
      1.326625 = idf(docFreq=830)
      0.45349488 = queryNorm
    0.065957725 = fieldWeight(contents:associ in 920), product of:
      4.2426405 = tf(termFreq(contents:associ)=18)
      1.326625 = idf(docFreq=830)
      0.01171875 = fieldNorm(field=contents, doc=920)
  0.27932435 = weight(contents:rule in 920), product of:
    0.7987842 = queryWeight(contents:rule), product of:
      1.7613963 = idf(docFreq=537)
      0.45349488 = queryNorm
    0.34968686 = fieldWeight(contents:rule in 920), product of:
      16.941074 = tf(termFreq(contents:rule)=287)
      1.7613963 = idf(docFreq=537)
      0.01171875 = fieldNorm(field=contents, doc=920)
regards
jiang xing
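As a cross-check of the explain output above, here is a minimal sketch that recomputes the score from the leaf values. It assumes that tf is the square root of the raw term frequency (this matches the tf values printed above) and takes queryNorm and fieldNorm as given rather than deriving them. The function names are illustrative and are not Lucene APIs.

```python
import math

def tf(term_freq):
    # Assumption: tf is the square root of the raw term frequency,
    # which reproduces the tf values shown in the explain output.
    return math.sqrt(term_freq)

def term_weight(idf, query_norm, term_freq, field_norm):
    # weight = queryWeight * fieldWeight, following the "product of" lines.
    query_weight = idf * query_norm
    field_weight = tf(term_freq) * idf * field_norm
    return query_weight * field_weight

QUERY_NORM = 0.45349488   # copied from the explain output
FIELD_NORM = 0.01171875   # copied from the explain output

associ = term_weight(1.326625, QUERY_NORM, 18, FIELD_NORM)
rule = term_weight(1.7613963, QUERY_NORM, 287, FIELD_NORM)
score = associ + rule      # "sum of" at the top of the tree

print(associ, rule, score)
```

The three printed values should agree with 0.03968133, 0.27932435 and 0.31900567 from the output to within float rounding.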
Re: understand the queryNorm and the fieldNorm.
Posted by jason <gi...@gmail.com>.
Hi, thanks.
I think I forgot the ^0.5 (the square root).
cheers
Jason
On 2/6/06, Yonik Seeley <ys...@gmail.com> wrote:
>
> Hi Jason,
> I get the same thing for the queryNorm when I calculate it by hand:
> 1/((1.7613963**2 + 1.326625**2)**.5) = 0.45349488111693986
>
> -Yonik
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
Re: time of search for an index with the file .FDT much large
Posted by Yonik Seeley <ys...@gmail.com>.
20 seconds does seem like a long time to retrieve the stored fields of
the 3000 documents. However, you should also step back and determine
if you really need to do that, or if there is another way to narrow
the number of documents that need to be read from disk.
-Yonik
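One possible way to narrow the number of documents read from disk is to load stored fields only for the hits that are actually displayed. The sketch below is an assumption about how that could look, not something proposed in this thread: `load_page`, `fake_fetch` and the page size are hypothetical names and values, and `fake_fetch` merely stands in for whatever call reads a document's stored fields.

```python
def load_page(hit_ids, fetch_stored_fields, page=0, page_size=50):
    """Fetch stored fields only for the hits on the requested page."""
    start = page * page_size
    return [fetch_stored_fields(doc_id)
            for doc_id in hit_ids[start:start + page_size]]

# Stand-in for the index: records which documents were actually read.
fetched = []

def fake_fetch(doc_id):
    # Hypothetical replacement for the real stored-field lookup.
    fetched.append(doc_id)
    return {"id": doc_id, "payload": "x" * 500}

hit_ids = list(range(3000))          # ids of the 3000 matching documents
first_page = load_page(hit_ids, fake_fetch, page=0, page_size=50)

print(len(first_page), len(fetched))
```

With this approach only one page of stored values is read per request, while the remaining hit ids stay in memory without touching the stored-field data.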
time of search for an index with the file .FDT much large
Posted by Antonio Bruno <nt...@yahoo.it>.
Hi,
I have an index with 2.5 million documents.
Each document is structured as follows:
- 15 indexed fields
- 1 field that is stored but not indexed, whose value is a string of 500 bytes.
A search returns about 3000 documents on average. The 3000 document IDs come back very quickly, but retrieving the 3000 documents themselves takes at least 20 seconds!
I tried with a string of 5 bytes instead of 500, and the 3000 documents are returned in only 1 second!
The problem seems to be related to the size of the .fdt file.
How can I reduce the retrieval time in the first case?
Ing. Antonio Bruno
Software Analyst
http://xoomer.virgilio.it/lnb
cell: (+39) 3402347684
T&S S.r.l - Technologies And Solutions
email T&S: bruno@tessrl.com
email Yahoo: ntnbrn80@yahoo.it
Re: understand the queryNorm and the fieldNorm.
Posted by Yonik Seeley <ys...@gmail.com>.
Hi Jason,
I get the same thing for the queryNorm when I calculate it by hand:
1/((1.7613963**2 + 1.326625**2)**.5) = 0.45349488111693986
-Yonik
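The hand calculation above can be written as a small function. This is a sketch under the assumption that each term's weight is just its idf (no boosts), as in the numbers used above; `query_norm` is an illustrative name, not the Lucene method itself. It also shows the value obtained when the square root is left out.

```python
import math

def query_norm(weights):
    # 1 / sqrt(sum of squared term weights)
    sum_of_squared_weights = sum(w * w for w in weights)
    return 1.0 / math.sqrt(sum_of_squared_weights)

idfs = [1.7613963, 1.326625]   # idf values for "rule" and "associ"

with_sqrt = query_norm(idfs)
# Leaving out the square root gives a noticeably smaller value.
without_sqrt = 1.0 / sum(w * w for w in idfs)

print(with_sqrt, without_sqrt)
```

Here `with_sqrt` agrees with the queryNorm of 0.45349488 in the explain output, while `without_sqrt` does not.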