Posted to solr-user@lucene.apache.org by "S.Selvam Siva" <s....@gmail.com> on 2009/01/23 09:33:56 UTC

stats.jsp - maxDoc and numDoc-help

Hi all,

I am new to Solr. I have posted nearly 10 lakh XML docs over the last few
months.

Now I want to find out the total number of duplicate posts so far.

Are the numDocs and maxDocs values in stats.jsp the right ones to use for
finding the total number of duplicate posts (maxDocs - numDocs) so far?
Please guide me to the solution.
-- 
Yours,
S.Selvam

Re: stats.jsp - maxDoc and numDoc-help

Posted by Chris Hostetter <ho...@fucit.org>.
: 1) Then I take it that "maxDocs - numDocs" should be the maximum (upper
: bound) duplicate post count so far, if I assume no deletions happened
: other than those caused by duplicates.

not necessarily -- when Lucene merges segments (which can happen on any
add), deletes get flushed from the segments that get merged, so there may
have been duplicates that will be missing from your count.

the easiest way to see this is by sending an optimize command -- that
should cause maxDocs to equal numDocs.


-Hoss
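
The point above about merges flushing deletes can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyIndex, add, merge_all, max_doc, num_docs and overwrites are hypothetical names invented for illustration, and the class does not use or reproduce Lucene's real data structures. It only shows why "maxDocs - numDocs" can undercount duplicate posts once a merge has happened.

```python
class ToyIndex:
    """Toy model of a segmented index (hypothetical, not Lucene's API)."""

    def __init__(self):
        self.segments = []   # each segment is a list of [key, deleted] entries
        self.overwrites = 0  # ground truth: how many duplicate posts occurred

    def add(self, key):
        # A duplicate key marks the older live copy as deleted,
        # then the new copy is written to a fresh segment.
        for seg in self.segments:
            for entry in seg:
                if entry[0] == key and not entry[1]:
                    entry[1] = True
                    self.overwrites += 1
        self.segments.append([[key, False]])

    def merge_all(self):
        # Merging rewrites the segments and drops entries marked deleted,
        # which is the effect an optimize has on the counts.
        live = [e for seg in self.segments for e in seg if not e[1]]
        self.segments = [live] if live else []

    @property
    def max_doc(self):
        # Counts every entry still stored, deleted or not.
        return sum(len(seg) for seg in self.segments)

    @property
    def num_docs(self):
        # Counts only live (non-deleted) entries.
        return sum(1 for seg in self.segments for e in seg if not e[1])


idx = ToyIndex()
for key in ["a", "b", "a", "c", "b"]:
    idx.add(key)
print(idx.max_doc - idx.num_docs, idx.overwrites)  # 2 2

idx.merge_all()  # deletes are flushed; the difference drops to 0
idx.add("a")     # one more duplicate post
print(idx.max_doc - idx.num_docs, idx.overwrites)  # 1 3
```

After the merge, the difference reports 1 even though 3 duplicate posts occurred, so in this model the difference only reflects deletes that have not yet been merged away.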


Re: stats.jsp - maxDoc and numDoc-help

Posted by "S.Selvam Siva" <s....@gmail.com>.
On Fri, Jan 23, 2009 at 10:54 PM, Otis Gospodnetic <
otis_gospodnetic@yahoo.com> wrote:

> Hello,
>
> Those two numbers won't necessarily give you the number of duplicates, as
> they reflect the number of deletes in the index, and those deletes were not
> necessarily caused by Solr detecting a duplicate insert.
>
>
> Otis
>

Thank you, Otis.

1) Then I take it that "maxDocs - numDocs" should be the maximum (upper
bound) duplicate post count so far, if I assume no deletions happened
other than those caused by duplicates.

2) I also have another question, about the deletion of an indexed document
that happens when a duplicate is posted. My aim is to retrieve a particular
field (not the unique field) from the indexed document before it is deleted
due to duplication.

-- 
Yours,
S.Selvam
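
For question 2 above, one possible approach (a sketch, not something confirmed in this thread) is to look the document up by its unique key before posting the new copy, and read the wanted field from the result. The snippet below only builds the query URL for Solr's select handler using the q, fl, rows and wt parameters. The host, port and path, the field names "id" and "title", and the helper name build_lookup_url are all assumptions chosen for illustration.

```python
from urllib.parse import urlencode


def build_lookup_url(select_url, key_field, key_value, wanted_field):
    """Build a select URL that fetches one field of the doc with the given key.

    Hypothetical helper: the caller supplies the handler URL and field names.
    """
    # Quote the key as a phrase, escaping backslashes and double quotes.
    escaped = key_value.replace("\\", "\\\\").replace('"', '\\"')
    params = {
        "q": f'{key_field}:"{escaped}"',  # match on the unique key field
        "fl": wanted_field,               # return only the field we need
        "rows": 1,                        # a unique key matches at most one doc
        "wt": "json",                     # ask for a JSON response
    }
    return select_url + "?" + urlencode(params)


# Assumed local Solr URL and field names, for illustration only.
url = build_lookup_url("http://localhost:8983/solr/select", "id", "doc-42", "title")
print(url)
```

Fetching that URL before each post and saving the returned field would preserve the old value. Whether the extra query per document is acceptable depends on the posting volume.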

Re: stats.jsp - maxDoc and numDoc-help

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hello,

Those two numbers won't necessarily give you the number of duplicates, as they reflect the number of deletes in the index, and those deletes were not necessarily caused by Solr detecting a duplicate insert.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


