Posted to general@lucene.apache.org by ny1984 <na...@yahoo.com> on 2009/01/07 13:19:54 UTC

DuplicateFilter Problem

Hi everyone,

I have a problem with Lucene's DuplicateFilter. I have some PDF files, and
each indexed document has 3 fields (id, title and content). I am indexing the
PDF files page by page. Different pages of the same PDF store the same id and
title; only the content is different. I want to search for a string and
eliminate results that share the same id. For some documents DuplicateFilter
works perfectly, but for other documents it returns 0 results. By the way, if
I search for the string in title it returns the correct results, but if I
search in content it returns 0 results. I have added my code below. I could
not find the problem. Please help me with this issue. Thank you...

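        // Assumes the Lucene 2.x API; DuplicateFilter ships in the contrib "queries" module.
        // Imports needed: org.apache.lucene.analysis.*, org.apache.lucene.index.IndexReader,
        // org.apache.lucene.queryParser.QueryParser, org.apache.lucene.search.*
        String directory = "C:/indexes/";

        // Open the existing index and wrap it in a searcher.
        IndexReader reader = IndexReader.open(directory);
        IndexSearcher searcher = new IndexSearcher(reader);

        // Parse the search string against the "content" field.
        Analyzer sanalyzer = new StopAnalyzer();
        QueryParser parser = new QueryParser("content", sanalyzer);
        Query queryd = parser.parse("point");

        // Keep one document per distinct "id" value (keep mode 1, processing mode 1).
        DuplicateFilter df = new DuplicateFilter("id", 1, 1);
        Hits ehits = searcher.search(queryd, df);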

-- 
View this message in context: http://www.nabble.com/DuplicateFilter-Problem-tp21330217p21330217.html
Sent from the Lucene - General mailing list archive at Nabble.com.


Re: DuplicateFilter Problem

Posted by ny1984 <na...@yahoo.com>.
Thank you for your interest. We have found the problem. We had
misunderstood the structure and the working principle of
DuplicateFilter, so we were trying to do the wrong thing. In this case we
have to change our design. Thank you very much again...
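
The thread does not say what exactly was misunderstood. One reading that fits
the symptoms, given here only as an assumption, is that the filter picks one
page per id across the whole index, independently of the query, and the query
is then matched only against those surviving pages. If the kept page of a PDF
does not contain the search term, that PDF yields 0 results even though
another page matches. A search on title would still work, because every page
of a PDF carries the same title. The sketch below is a plain-Java simulation
of that ordering effect, not Lucene code; the class DedupSketch, the Page
type and the sample data are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation of "deduplicate, then query" versus
// "query, then deduplicate". It does not call Lucene.
class DedupSketch {
    // One indexed page: the PDF's id plus the text of that page.
    static final class Page {
        final String id;
        final String content;
        Page(String id, String content) { this.id = id; this.content = content; }
    }

    // Keep only the first page seen for each id.
    static List<Page> firstPerId(List<Page> pages) {
        Set<String> seen = new HashSet<>();
        List<Page> kept = new ArrayList<>();
        for (Page p : pages) {
            if (seen.add(p.id)) kept.add(p);
        }
        return kept;
    }

    // Keep only pages whose content contains the term.
    static List<Page> matching(List<Page> pages, String term) {
        List<Page> hits = new ArrayList<>();
        for (Page p : pages) {
            if (p.content.contains(term)) hits.add(p);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Page> index = new ArrayList<>();
        index.add(new Page("pdf-1", "introduction"));
        index.add(new Page("pdf-1", "the point of this chapter"));

        // Deduplicate the whole index first, then query: page 1 is kept,
        // it does not contain "point", so nothing matches.
        System.out.println(matching(firstPerId(index), "point").size()); // prints 0

        // Query first, then deduplicate: page 2 matches and is kept.
        System.out.println(firstPerId(matching(index, "point")).size()); // prints 1
    }
}
```

If that reading is right, the second ordering (collect the matching pages,
then collapse them by id in application code) is one possible redesign.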



hossman wrote:
> 
> 
> I don't have an answer to your question, but since you are specifically 
> asking about classes in the Lucene-Java API I would suggest you re-post 
> your question to the java-user@lucene list.
> 
> the general@lucene list is for broader discussions about the entire lucene 
> project (and all subprojects) or for when people have extremely general 
> questions and aren't sure where to start (e.g. use Solr vs use Nutch, vs 
> use PyLucene, etc...)
> 
> 
> -Hoss
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/DuplicateFilter-Problem-tp21330217p21495289.html


Re: DuplicateFilter Problem

Posted by Chris Hostetter <ho...@fucit.org>.
I don't have an answer to your question, but since you are specifically 
asking about classes in the Lucene-Java API I would suggest you re-post 
your question to the java-user@lucene list.

the general@lucene list is for broader discussions about the entire lucene 
project (and all subprojects) or for when people have extremely general 
questions and aren't sure where to start (e.g. use Solr vs use Nutch, vs 
use PyLucene, etc...)


-Hoss