
Posted to dev@lucene.apache.org by Paul Elschot <pa...@xs4all.nl> on 2004/07/27 11:47:29 UTC

Some javadoc additions: via bugzilla?

Dear developers,

I have written javadocs for Scorer.java and TermScorer.java.
For this I also had to change build.xml to use package access for the
javadocs target. That produced a few minor javadoc error messages
in CompoundFileReader.java and FieldInfos.java, which I fixed.

Can I post corresponding patches in bugzilla, or would you
prefer to have them on lucene-dev?

I could also add the patch that I posted earlier for Weight.java
(a broken javadoc link) in bugzilla.

Regards,
Paul.



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

RE: Powerpoint search using Lucene

Posted by Stephane James Vaucher <va...@cirano.qc.ca>.

I've seen a post on poi-user list with some more code. The links have been
added to the wiki.

http://wiki.apache.org/jakarta-lucene/PowerPoint

sv

On Thu, 29 Jul 2004, Divya S. Jesuraj wrote:

> The second link - does things a bit differently than one would expect.
>
> It creates multiple files "1.txt", "2.txt", so on, extracts the text and
> keeps it only in "1.txt" and doesn't save the name of the initial powerpoint
> file so it can't link to it when you search for it.
>
> What would be ideal is to extract the powerpoint text into an object
> {String?} and create a Lucene Doc that would add it to the index...
>
> I have been playing with the idea of using the code by Mr.Koundinya and
> somehow storing those contents to a string object which then got added as
> "content" to the Lucene Doc. The file name ( .ppt ) and path would get added
> too...will let you folks know how it goes...
>
> ~Divya
>
> -----Original Message-----
> From: Stephane James Vaucher [mailto:vauchers@cirano.qc.ca]
> Sent: Wednesday, July 28, 2004 11:41 PM
> To: Lucene Developers List
> Subject: Re: Powerpoint search using Lucene
>
> I haven't, but I've found a few links though...
>
> I just saw this on the poi list. I can't confirm if it works or not (if
> you try it, can you tell us)
>
> http://www.mail-archive.com/poi-user@jakarta.apache.org/msg04782.html
>
> This is a reference to some code that I found works on some ppts:
> http://nagoya.apache.org/eyebrowse/ReadMsg?listName=poi-dev@jakarta.apache.org&msgNo=4326
>
> sv
>
> On Wed, 28 Jul 2004, Divya S. Jesuraj wrote:
>
> > Hello,
> >
> > I am a VERY new Java Programmer and have now been thrust into development
> > using Lucene. I was able to figure out parsing/indexing of MS Word, MS
> > Excel, RTF, Text files, and PDFs with a lot of reading and using Poi& PDF
> > Sandbox. I however haven't been able to do anything with PPTs [or htmls -
> > that is the least of my worries]...
> >
> > I am indexing a directory on my machine and have a user interface with a
> > JSP. Has anyone figured out how to get a Powerpoint search to work? I
> > searched the forums but I can't find anything that would help my situation.
> > Some sample code would be appreciated.
> >
> > Thank you.
> >
> > ~Divya Jesuraj
> > Technical Summer Intern 2004
> > MITRE Corporation
> >



RE: Powerpoint search using Lucene

Posted by "Divya S. Jesuraj" <dj...@mitre.org>.

The second link - does things a bit differently than one would expect.

It creates multiple files "1.txt", "2.txt", so on, extracts the text and
keeps it only in "1.txt" and doesn't save the name of the initial powerpoint
file so it can't link to it when you search for it.

What would be ideal is to extract the powerpoint text into an object
{String?} and create a Lucene Doc that would add it to the index...

I have been playing with the idea of using the code by Mr.Koundinya and
somehow storing those contents to a string object which then got added as
"content" to the Lucene Doc. The file name ( .ppt ) and path would get added
too...will let you folks know how it goes...

~Divya

-----Original Message-----
From: Stephane James Vaucher [mailto:vauchers@cirano.qc.ca] 
Sent: Wednesday, July 28, 2004 11:41 PM
To: Lucene Developers List
Subject: Re: Powerpoint search using Lucene

I haven't, but I've found a few links though...

I just saw this on the poi list. I can't confirm if it works or not (if
you try it, can you tell us)

http://www.mail-archive.com/poi-user@jakarta.apache.org/msg04782.html

This is a reference to some code that I found works on some ppts:
http://nagoya.apache.org/eyebrowse/ReadMsg?listName=poi-dev@jakarta.apache.org&msgNo=4326

sv

On Wed, 28 Jul 2004, Divya S. Jesuraj wrote:

> Hello,
>
> I am a VERY new Java Programmer and have now been thrust into development
> using Lucene. I was able to figure out parsing/indexing of MS Word, MS
> Excel, RTF, Text files, and PDFs with a lot of reading and using Poi& PDF
> Sandbox. I however haven't been able to do anything with PPTs [or htmls -
> that is the least of my worries]...
>
> I am indexing a directory on my machine and have a user interface with a
> JSP. Has anyone figured out how to get a Powerpoint search to work? I
> searched the forums but I can't find anything that would help my situation.
> Some sample code would be appreciated.
>
> Thank you.
>
> ~Divya Jesuraj
> Technical Summer Intern 2004
> MITRE Corporation

Re: Powerpoint search using Lucene

Posted by Stephane James Vaucher <va...@cirano.qc.ca>.

I haven't, but I've found a few links though...

I just saw this on the poi list. I can't confirm if it works or not (if
you try it, can you tell us)

http://www.mail-archive.com/poi-user@jakarta.apache.org/msg04782.html

This is a reference to some code that I found works on some ppts:
http://nagoya.apache.org/eyebrowse/ReadMsg?listName=poi-dev@jakarta.apache.org&msgNo=4326

sv

On Wed, 28 Jul 2004, Divya S. Jesuraj wrote:

> Hello,
>
> I am a VERY new Java Programmer and have now been thrust into development
> using Lucene. I was able to figure out parsing/indexing of MS Word, MS
> Excel, RTF, Text files, and PDFs with a lot of reading and using Poi& PDF
> Sandbox. I however haven't been able to do anything with PPTs [or htmls -
> that is the least of my worries]...
>
> I am indexing a directory on my machine and have a user interface with a
> JSP. Has anyone figured out how to get a Powerpoint search to work? I
> searched the forums but I can't find anything that would help my situation.
> Some sample code would be appreciated.
>
> Thank you.
>
> ~Divya Jesuraj
> Technical Summer Intern 2004
> MITRE Corporation

Powerpoint search using Lucene

Posted by "Divya S. Jesuraj" <dj...@mitre.org>.

Hello,

I am a VERY new Java Programmer and have now been thrust into development
using Lucene. I was able to figure out parsing/indexing of MS Word, MS
Excel, RTF, Text files, and PDFs with a lot of reading and using POI & PDF
Sandbox. I however haven't been able to do anything with PPTs [or htmls -
that is the least of my worries]...

I am indexing a directory on my machine and have a user interface with a
JSP. Has anyone figured out how to get a Powerpoint search to work? I
searched the forums but I can't find anything that would help my situation.
Some sample code would be appreciated.

Thank you.

~Divya Jesuraj
Technical Summer Intern 2004
MITRE Corporation




Re: docfaq of IndexReader is showing the deleted document also

Posted by Bernhard Messer <Be...@intrafind.de>.

hi,

the code example you posted works, but as you mentioned before, you
have the problem that docFreq is updated only when the IndexWriter is
closed. The method I would prefer is to run a simple query and
check whether you get exactly one hit. This has the advantage that it
returns the correct result when a doc has been deleted within the
IndexReader, before the index is stored.

TermQuery tq = new TermQuery(new Term("OID", key));
IndexSearcher luceneSearcher = new IndexSearcher(IndexDirectory);
Hits hits = luceneSearcher.search(tq);
if (hits.length() == 1) {
    // doc is in index
}
luceneSearcher.close();

regards
Bernhard


lingaraju wrote:

>Thanks for information
>I have read the documentation for the IndexReader.delete. method
>After "Indexwriter optimize()" method "docFreq" is giving correct count I
>mean excluding deleted document
>I am having one more question
>Actually I am using docFreq method to find the particular document is
>present or not in index by using key(Unique) field OID
>
>  IndexReader reader = IndexReader.open("c:/index");
>  Term term = new Term("OID","9365");
>  int i=reader.docFreq(term);
>  if (i!=0)
>  {System.out.println("Document present the index:"+i); }
>
>This is the right way or is there any way to find out?
>
>Regards
>Raju
>
>
>
>----- Original Message ----- 
>From: "Bernhard Messer" <Be...@intrafind.de>
>To: "Lucene Developers List" <lu...@jakarta.apache.org>
>Sent: Tuesday, July 27, 2004 7:28 PM
>Subject: Re: docfaq of IndexReader is showing the deleted document also
>
>
>  
>
>>Hi Raju,
>>
>>read the documentation for the IndexReader.delete method and you will
>>find your way ;-)
>>
>>/** Deletes the document numbered <code>docNum</code>.  Once a document is
>>   deleted it will not appear in TermDocs or TermPositions enumerations.
>>   Attempts to read its field with the {@link #document}
>>   method will result in an error.  The presence of this document may still be
>>   reflected in the {@link #docFreq} statistic, though
>>   this will be corrected eventually as the index is further modified.
>>   */
>>
>>public final synchronized void delete(int docNum) throws IOException
>>
>>
>>regards
>>Bernhard
>>
>>lingaraju wrote:
>>
>>    
>>
>>>I used the below code
>>>
>>>reader.delete(term);
>>>i=reader.docFreq(term);
>>>System.out.println("docfaq:"+i);
>>>reader.close();
>>>
>>>reader.docFreq method is returning 10 count before delete even after delete
>>>also count is showing same why
>>>
>>>Regards
>>>Raju

Re: docfaq of IndexReader is showing the deleted document also

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Moved to the more appropriate lucene-user list.

Why don't you just use IndexSearcher with TermQuery with new
Term("OID","9365") ?

The search with such TermQuery will get you 1 or 0 hits, assuming OID
acts as a primary key.  1 if OID=9365 has not been deleted, 0
otherwise.
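
As a rough illustration of why a search reflects the deletion while docFreq may not, here is a toy model in plain Java. It is not Lucene code: the class ToyIndex and all of its methods are made up for this sketch. It only assumes what the IndexReader.delete javadoc says, namely that deleted documents are marked separately and the stored statistic is corrected later.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only; ToyIndex is hypothetical and is not part of Lucene. */
public class ToyIndex {
    // term -> document numbers containing that term (the stored postings)
    private final Map<String, List<Integer>> postings = new HashMap<>();
    // documents marked as deleted but not yet physically removed
    private final BitSet deleted = new BitSet();

    public void add(int docNr, String term) {
        postings.computeIfAbsent(term, t -> new ArrayList<>()).add(docNr);
    }

    /** Marks matching documents as deleted; the postings stay untouched. */
    public void delete(String term) {
        for (int docNr : postings.getOrDefault(term, List.of())) {
            deleted.set(docNr);
        }
    }

    /** Stored statistic: counts postings, so it still includes deleted documents. */
    public int docFreq(String term) {
        return postings.getOrDefault(term, List.of()).size();
    }

    /** What a search sees: only documents that are not marked as deleted. */
    public int hits(String term) {
        int count = 0;
        for (int docNr : postings.getOrDefault(term, List.of())) {
            if (!deleted.get(docNr)) count++;
        }
        return count;
    }

    /** Rewrites the postings without the deleted documents, as a merge would. */
    public void optimize() {
        for (List<Integer> docs : postings.values()) {
            docs.removeIf(deleted::get);
        }
        deleted.clear();
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add(0, "OID:9365");
        index.delete("OID:9365");
        System.out.println("docFreq=" + index.docFreq("OID:9365")
                + " hits=" + index.hits("OID:9365"));
        index.optimize();
        System.out.println("docFreq after optimize=" + index.docFreq("OID:9365"));
    }
}
```

In this model the hit count drops to zero right after the delete, while docFreq only catches up once the postings are rewritten.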

Otis

--- lingaraju <li...@infactindia.com> wrote:

> 
> Thanks for information
> I have read the documentation for the IndexReader.delete. method
> After "Indexwriter optimize()" method "docFreq" is giving correct
> count I
> mean excluding deleted document
> I am having one more question
> Actually I am using docFreq method to find the particular document is
> present or not in index by using key(Unique) field OID
> 
>   IndexReader reader = IndexReader.open("c:/index");
>   Term term = new Term("OID","9365");
>   int i=reader.docFreq(term);
>   if (i!=0)
>   {System.out.println("Document present the index:"+i); }
> 
> This is the right way or is there any way to find out?
> 
> Regards
> Raju
> 
> 
> 
> ----- Original Message ----- 
> From: "Bernhard Messer" <Be...@intrafind.de>
> To: "Lucene Developers List" <lu...@jakarta.apache.org>
> Sent: Tuesday, July 27, 2004 7:28 PM
> Subject: Re: docfaq of IndexReader is showing the deleted document also
> 
> 
> > Hi Raju,
> >
> > read the documentation for the IndexReader.delete method and you will
> > find your way ;-)
> >
> > /** Deletes the document numbered <code>docNum</code>.  Once a document is
> >    deleted it will not appear in TermDocs or TermPositions enumerations.
> >    Attempts to read its field with the {@link #document}
> >    method will result in an error.  The presence of this document may still be
> >    reflected in the {@link #docFreq} statistic, though
> >    this will be corrected eventually as the index is further modified.
> >    */
> >
> > public final synchronized void delete(int docNum) throws IOException
> >
> >
> > regards
> > Bernhard
> >
> > lingaraju wrote:
> >
> > >I used the below code
> > >
> > >reader.delete(term);
> > >i=reader.docFreq(term);
> > >System.out.println("docfaq:"+i);
> > >reader.close();
> > >
> > >reader.docFreq method is returning 10 count before delete even after delete
> > >also count is showing same why
> > >
> > >Regards
> > >Raju


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Re: docfaq of IndexReader is showing the deleted document also

Posted by lingaraju <li...@infactindia.com>.

Thanks for the information.
I have read the documentation for the IndexReader.delete method.
After IndexWriter.optimize(), docFreq gives the correct count, that is,
excluding the deleted documents.
I have one more question.
I am using the docFreq method to check whether a particular document is
present in the index, using the unique key field OID:

  IndexReader reader = IndexReader.open("c:/index");
  Term term = new Term("OID","9365");
  int i=reader.docFreq(term);
  if (i!=0)
  {System.out.println("Document present in the index:"+i); }

Is this the right way, or is there another way to find out?

Regards
Raju



----- Original Message ----- 
From: "Bernhard Messer" <Be...@intrafind.de>
To: "Lucene Developers List" <lu...@jakarta.apache.org>
Sent: Tuesday, July 27, 2004 7:28 PM
Subject: Re: docfaq of IndexReader is showing the deleted document also


> Hi Raju,
>
> read the documentation for the IndexReader.delete method and you will
> find your way ;-)
>
> /** Deletes the document numbered <code>docNum</code>.  Once a document is
>    deleted it will not appear in TermDocs or TermPositions enumerations.
>    Attempts to read its field with the {@link #document}
>    method will result in an error.  The presence of this document may still be
>    reflected in the {@link #docFreq} statistic, though
>    this will be corrected eventually as the index is further modified.
>    */
>
> public final synchronized void delete(int docNum) throws IOException
>
>
> regards
> Bernhard
>
> lingaraju wrote:
>
> >I used the below code
> >
> >reader.delete(term);
> >i=reader.docFreq(term);
> >System.out.println("docfaq:"+i);
> >reader.close();
> >
> >reader.docFreq method is returning 10 count before delete even after delete
> >also count is showing same why
> >
> >Regards
> >Raju

Re: docfaq of IndexReader is showing the deleted document also

Posted by Bernhard Messer <Be...@intrafind.de>.

Hi Raju,

read the documentation for the IndexReader.delete method and you will 
find your way ;-)

/** Deletes the document numbered <code>docNum</code>.  Once a document is
   deleted it will not appear in TermDocs or TermPositions enumerations.
   Attempts to read its field with the {@link #document}
   method will result in an error.  The presence of this document may still be
   reflected in the {@link #docFreq} statistic, though
   this will be corrected eventually as the index is further modified.
   */

public final synchronized void delete(int docNum) throws IOException


regards
Bernhard

lingaraju wrote:

>I used the below code
>
>IndexReader reader = IndexReader.open("c:/index");
>Term term = new Term("contents","books");
>int i=reader.docFreq(term);
>System.out.println("docfaq:"+i);
>reader.delete(term);
>i=reader.docFreq(term);
>System.out.println("docfaq:"+i);
>reader.close();
>
>reader.docFreq method is returning 10 count before delete even after delete
>also count is showing same why
>
>Regards
>Raju

docfaq of IndexReader is showing the deleted document also

Posted by lingaraju <li...@infactindia.com>.

I used the below code

IndexReader reader = IndexReader.open("c:/index");
Term term = new Term("contents","books");
int i=reader.docFreq(term);
System.out.println("docfaq:"+i);
reader.delete(term);
i=reader.docFreq(term);
System.out.println("docfaq:"+i);
reader.close();

reader.docFreq returns a count of 10 before the delete, and it still
returns the same count after the delete. Why?

Regards
Raju




Re: FilteringQuery.java, Filter

Posted by Paul Elschot <pa...@xs4all.nl>.

On Friday 30 July 2004 23:29, Robert Engels wrote:
> I thought that in the next release we were changing 'Filter' to an interface, with a
> definition
>
> interface Filter {
>    boolean accept(Document doc);
> }
>
> Is this not going to happen?

I don't know, I wasn't involved in that.

I'd rather have the BitSet in the current filter changed into a DocNrFilter
and leave the current filter as it is for backward compatibility. So how about:

interface DocNrFilter { /* new interface for BitSet, Set and other implementations */
  boolean accept(int docNr);
}

However, I'd like to have some more functionality in there to support doc(), next()
and skipTo(), i.e. a document number iterator as needed by a Scorer.
It would be a waste not to use BitSet.nextSetBit().
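
A minimal sketch of such an iterator, in plain Java without any Lucene classes. Only DocNrFilter.accept(int) comes from the proposal above; the names DocNrSketch, DocNrIterator and BitSetDocNrs are made up for illustration, and a real version would have to fit the Scorer contract.

```java
import java.util.BitSet;

/** Sketch only; these types are hypothetical and not part of Lucene. */
public class DocNrSketch {

    /** The proposed interface: accept or reject a document by its number. */
    interface DocNrFilter {
        boolean accept(int docNr);
    }

    /** Hypothetical extension: visit accepted document numbers in increasing order. */
    interface DocNrIterator {
        int doc();                   // current document number, -1 before the first next()
        boolean next();              // move to the next accepted document
        boolean skipTo(int target);  // move to the first accepted document >= target
    }

    /** BitSet-backed implementation of both interfaces. */
    static class BitSetDocNrs implements DocNrFilter, DocNrIterator {
        private final BitSet bits;
        private int currentDoc = -1;

        BitSetDocNrs(BitSet bits) { this.bits = bits; }

        public boolean accept(int docNr) { return bits.get(docNr); }

        public int doc() { return currentDoc; }

        /* should not be called again after returning false */
        public boolean next() {
            currentDoc = bits.nextSetBit(currentDoc + 1); // -1 when no bit is left
            return currentDoc >= 0;
        }

        /* should not be called again after returning false */
        public boolean skipTo(int target) {
            // never move backwards: start at target, or just past the current doc
            currentDoc = bits.nextSetBit(Math.max(target, currentDoc + 1));
            return currentDoc >= 0;
        }
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(3); bits.set(7); bits.set(20);
        BitSetDocNrs docs = new BitSetDocNrs(bits);
        docs.skipTo(5);
        System.out.println(docs.doc());
        while (docs.next()) System.out.println(docs.doc());
    }
}
```

A Set-based or sparser implementation could offer the same two interfaces, so a Scorer would not need to know which representation is behind them.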

Thinking about it, the current FilteredQuery might be reimplemented using a 
FilteringQuery. I might give that a try one of these days.

(Rereading the posted FilteringQuery.java I see that it doesn't compile
as it is, the constructor for class SkipReaderBitsScorer is
still named FilterReaderBitsScorer,  sorry.)

Regards,
Paul


RE: FilteringQuery.java

Posted by Robert Engels <re...@ix.netcom.com>.

I thought that in the next release we were changing 'Filter' to an interface, with a
definition

interface Filter {
   boolean accept(Document doc);
}

Is this not going to happen?

Robert Engels

-----Original Message-----
From: Paul Elschot [mailto:paul.elschot@xs4all.nl]
Sent: Friday, July 30, 2004 4:18 PM
To: lucene-dev@jakarta.apache.org
Subject: FilteringQuery.java



FilteringQuery.java

Posted by Paul Elschot <pa...@xs4all.nl>.

Dear developers,

At the moment IndexSearcher.search(Query, Filter) computes a score
for every document matching the query before checking the filter.

With the BitSet.nextSetBit() method one might implement a
filter as a required clause in a Query. This would even allow the possible use of
ConjunctionScorer and skipTo() in appropriate circumstances, currently
when all other clauses are required.

Below is a Query that intends to do this.
It compiles against current CVS, but it has not yet been tested.
Before I start writing test code I'd like to have some comments.

For very large indexes, and relatively small nrs of filtered docs,
a similar filter could be used with something sparser than a full BitSet,
eg. a byte array of VInts with the differences between the document numbers.
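
To make the sparse alternative concrete, here is a small sketch in plain Java of document numbers stored as VInt-coded gaps. It is an illustration under assumptions, not a proposal for the actual classes: SparseDocNrs and its methods are made-up names, and the byte layout simply follows the usual VInt scheme (7 data bits per byte, low-order bits first, high bit set on every byte except the last).

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Sketch only; SparseDocNrs is a hypothetical name, not a Lucene class. */
public class SparseDocNrs {

    /** Encodes increasing document numbers as VInt-coded gaps. */
    public static byte[] encode(int[] sortedDocNrs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int docNr : sortedDocNrs) {
            int gap = docNr - previous; // the first gap is the first document number itself
            while ((gap & ~0x7F) != 0) {
                out.write((gap & 0x7F) | 0x80); // more bytes follow
                gap >>>= 7;
            }
            out.write(gap); // last byte of this value, high bit clear
            previous = docNr;
        }
        return out.toByteArray();
    }

    /** Decodes the gaps back into document numbers; count is the number of encoded values. */
    public static int[] decode(byte[] bytes, int count) {
        int[] docNrs = new int[count];
        int pos = 0;
        int previous = 0;
        for (int i = 0; i < count; i++) {
            int gap = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                gap |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += gap;
            docNrs[i] = previous;
        }
        return docNrs;
    }

    public static void main(String[] args) {
        int[] docNrs = {5, 130, 131, 100000};
        byte[] encoded = encode(docNrs);
        System.out.println(encoded.length + " bytes for " + docNrs.length + " document numbers");
        System.out.println(Arrays.toString(decode(encoded, docNrs.length)));
    }
}
```

Small gaps take one byte each, so the array stays compact when few documents pass the filter, whereas a full BitSet needs one bit per document in the index. The price is that skipTo() has to decode sequentially unless extra skip information is kept.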

Regards,
Paul.

Here it is, FilteringQuery.java, under Apache 2.0 licence:

package org.apache.lucene.search;

import java.util.BitSet;
import java.io.IOException;
import org.apache.lucene.index.IndexReader;

public abstract class FilteringQuery extends Query {
  Filter filter;
  String filterName;
  
  public FilteringQuery(Filter filter, String filterName) {
    this.filter = filter; /* should be non null */
    this.filterName = filterName; /* for explanations */
  }
  
  protected String getFilterExplanation() {
    return (filterName != null) ? filterName : filter.toString();
  }
  
  /** Prints this <code>FilteringQuery</code> to a <code>String</code>.
   * @param field Should be null because a FilteringQuery depends on a filter.
   */
  public String toString(String field) {
    String res = "FilteringQuery( " + getFilterExplanation() + ")";
    if (field == null)
      return res;
    else
      return res + "(" + field + " ?)";
  }
  
  /** Prints this query to a string. */
  public String toString() { return toString(null); }
  
  /** Expert:
   * @return <code>null</code>. No similarity is used for scoring a <code>FilteringQuery</code>.
   */
  public Similarity getSimilarity(Searcher searcher) {return null;}

  /** Expert: Apply the Filter and use the result in another Query which
   * extends BooleanQuery to have ConjunctionScorer used when this Query is required.
   */
  public Query rewrite(IndexReader reader) throws IOException {
    
    class SkipReaderBitsQuery extends Query {
      /** Prints this to a <code>String</code>.
       * @param  field Should be null.
       */
      public String toString(String field) {
        String res = "SkipReaderBitsQuery( " + getFilterExplanation() + ")";
        if (field == null)
          return res;
        else
          return res + "(" + field + " ?)";
      }
      
      /** Expert: Constructs a Weight implementation for this <code>SkipReaderBitsQuery</code>.
       * <p>Only implemented by primitive queries, which re-write to themselves.
       */
      protected Weight createWeight(final Searcher searcher) {
        
        class FilterWeight implements Weight {
          public float getValue() {return 0.0f;}

          public void normalize(float norm) {}

          public float sumOfSquaredWeights() {return 0.0f;}

          public Query getQuery() {return FilteringQuery.this;}

          public Explanation explain(IndexReader reader, int doc) {
            return new Explanation(getValue(), "weightless " + getFilterExplanation());
          }

          public Scorer scorer(final IndexReader reader) throws IOException {

            class SkipReaderBitsScorer extends Scorer {
              BitSet docNrs;
              int currentDoc;
              
              FilterReaderBitsScorer(Similarity similarity) throws IOException {
                super(similarity);
                /* CHECKME: ok not to compute the bits earlier? */
                docNrs = FilteringQuery.this.filter.bits(reader);
                currentDoc = -1;
              }
              
              public int doc() {return currentDoc;}

              public float score() {return 0.0f;}
              
              /* should not be called after returning false */
              public boolean next() {
                currentDoc = docNrs.nextSetBit(currentDoc + 1); /* -1 when no next bit */
                return currentDoc >= 0;
              }
              
              /* should not be called after returning false */
              public boolean skipTo(int target) {
                currentDoc = docNrs.nextSetBit((currentDoc < target) ? target : (currentDoc + 1));
                return currentDoc >= 0;
              }
              
              public Explanation explain(int doc) {
                skipTo(doc);
                return new Explanation(score() /* zero anyway */,
                                        "document " + doc + " "
                                        + ((currentDoc == doc) 
                                            ? "matches"
                                            : "does not match"
                                          )
                                        + " filter: " + getFilterExplanation());
              }
            }

            return new SkipReaderBitsScorer(getSimilarity(searcher));
          }
        }
        
        return new FilterWeight();
      }
    }
    
    return new SkipReaderBitsQuery();
  }
}



Re: Some javadoc additions: via bugzilla?

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Bugzilla, please.

Thanks,
Otis

--- Paul Elschot <pa...@xs4all.nl> wrote:

> Dear developers,
> 
> I have made javadocs for Scorer.java and TermScorer.java.
> For this I also had to change build.xml to use package access for the
> javadocs target. That caused some minor error javadoc messages
> in CompoundFileReader.java and FieldInfos.java, which I fixed.
> 
> Can I post corresponding patches in bugzilla, or would you
> prefer to have them on lucene-dev?
> 
> I could also add the patch that I posted earlier for Weight.java
> (a broken javadoc link) in bugzilla.
> 
> Regards,
> Paul.