Posted to java-user@lucene.apache.org by Elias Khsheibun <el...@gmail.com> on 2009/12/19 14:07:16 UTC

Payloads

Hi,

I need to add a query operator '!' such that when it precedes a word or a
phrase in the query, that term contributes twice its weight if it is
positioned at an even offset in the document. The position of a phrase is
determined by the offset of its first word.

I guess it involves payloads...

Elias.



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

RE: Payloads

Posted by Uwe Schindler <uw...@thetaphi.de>.

The problem was already solved in the #lucene IRC channel. The behaviour of
PayloadTermQuery was correct if you compare the scores of documents with an
even-position match and with a non-even-position match in the *same* query.

In general: you cannot compare scores across different queries or different
indexes.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


RE: Payloads

Posted by Elias Khsheibun <el...@gmail.com>.

I'm trying to run queries now. The problem is that BoostingTermQuery always
gives double weight to terms at even positions, regardless of what the query
itself contains. Here is the code that I'm using:


public class DocumentAnalyzer extends Analyzer {

	@Override
	public TokenStream tokenStream(String fieldName, Reader reader) {
		TokenStream result = new WhitespaceTokenizer(reader);
		result = new TermPositionPayloadTokenFilter(result);
		
		return result;
	}
	
}


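// Imports for this class (Lucene 2.9 package names).
import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.payloads.PayloadHelper;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.index.Payload;

public class TermPositionPayloadTokenFilter extends TokenFilter {

    protected PayloadAttribute payAtt;
    protected PositionIncrementAttribute posIncrAtt;

    // Payload of 2.0f marks a token that sits at an even position.
    private static final Payload evenPayload =
            new Payload(PayloadHelper.encodeFloat(2.0f));

    // Position of the current token; correct as long as every position
    // increment is 1, which holds for WhitespaceTokenizer.
    private int termPosition = 0;

    public TermPositionPayloadTokenFilter(TokenStream input) {
        super(input);
        payAtt = (PayloadAttribute) addAttribute(PayloadAttribute.class);
        posIncrAtt = (PositionIncrementAttribute)
                addAttribute(PositionIncrementAttribute.class);
    }

    @Override
    public final boolean incrementToken() throws IOException {
        if (input.incrementToken()) {
            if ((termPosition % 2) == 0)
                payAtt.setPayload(evenPayload);
            termPosition += posIncrAtt.getPositionIncrement();
            return true;
        } else {
            return false;
        }
    }

}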
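
To see which tokens the filter above would mark, here is a minimal plain-Java sketch that does not use Lucene at all. The class name EvenPositionSketch and the method factors are made up for illustration; the sketch assumes whitespace tokenization and a position increment of 1 for every token, as with WhitespaceTokenizer.

```java
import java.util.ArrayList;
import java.util.List;

public class EvenPositionSketch {

    /** One factor per whitespace-separated token: 2.0f at even positions, else 1.0f. */
    public static List<Float> factors(String text) {
        List<Float> result = new ArrayList<Float>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return result; // no tokens, no factors
        }
        int position = 0; // the first token is at position 0
        for (String token : trimmed.split("\\s+")) {
            result.add(position % 2 == 0 ? 2.0f : 1.0f);
            position++;
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "There is a tornado warning for Worcester county until 6 PM today";
        String[] tokens = text.split("\\s+");
        List<Float> f = factors(text);
        // Print each token next to the factor it would carry.
        for (int i = 0; i < tokens.length; i++) {
            System.out.println(i + " " + tokens[i] + " " + f.get(i));
        }
    }
}
```

Note that the factor depends only on the document, not on the query, which is why a payload-aware query sees it for every matching term.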



public class BoostingSimilarity extends DefaultSimilarity {
	public float scorePayload(String fieldName, byte[] payload,
			int offset, int length) {
		// Use the stored float as the score factor; no payload means no boost.
		if (payload != null)
			return PayloadHelper.decodeFloat(payload, offset);
		else
			return 1.0F;
	}
}
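
For readers who want to see what the similarity above does with the payload bytes, here is a small plain-Java sketch of a float round trip. It is not Lucene's PayloadHelper; the class FloatPayloadSketch and its methods are hypothetical stand-ins, and the 4-byte big-endian layout is an assumption made for the illustration.

```java
import java.nio.ByteBuffer;

public class FloatPayloadSketch {

    /** Packs a float into 4 bytes (big-endian, the ByteBuffer default). */
    public static byte[] encodeFloat(float value) {
        return ByteBuffer.allocate(4).putFloat(value).array();
    }

    /** Reads a float back from 4 bytes starting at offset. */
    public static float decodeFloat(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes, offset, 4).getFloat();
    }

    /** Mirrors the similarity logic: stored factor if present, otherwise 1.0f. */
    public static float scorePayload(byte[] payload, int offset) {
        return payload != null ? decodeFloat(payload, offset) : 1.0f;
    }

    public static void main(String[] args) {
        byte[] even = encodeFloat(2.0f);
        System.out.println(scorePayload(even, 0)); // token with a payload
        System.out.println(scorePayload(null, 0)); // token without a payload
    }
}
```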

And this is a test I've written. If you look at the scores, you will notice
that BoostingTermQuery always gives double weight to even terms, whether or
not they appear in the query (this is my current problem):

public class PayloadsTest extends TestCase {
	Directory dir;
	IndexWriter writer;
	DocumentAnalyzer analyzer;

	protected void setUp() throws Exception {
		super.setUp();
		dir = new RAMDirectory();
		analyzer = new DocumentAnalyzer();
		writer = new IndexWriter(dir, analyzer,
				IndexWriter.MaxFieldLength.UNLIMITED);
	}

	protected void tearDown() throws Exception {
		super.tearDown();
		writer.close();
	}

	void addDoc(String title, String contents) throws IOException {
		Document doc = new Document();
		doc.add(new Field("title",
				title,
				Field.Store.YES,
				Field.Index.NO));

		doc.add(new Field("contents",
				contents,
				Field.Store.NO,
				Field.Index.ANALYZED));

		writer.addDocument(doc);
	}

	public void testBoostingTermQuery() throws Throwable {
		addDoc("Hurricane warning",
				"A hurricane warning was issued at 6 AM for the outer great banks");
		addDoc("Warning label maker",
				"The warning label maker is a delightful toy for your precocious six year old's warning needs");
		addDoc("Tornado warning",
				"There is a tornado warning for Worcester county until 6 PM today");
		writer.commit();
		IndexSearcher searcher = new IndexSearcher(dir);
		searcher.setSimilarity(new BoostingSimilarity());
		Term warning = new Term("contents", "tornado");
		Query query1 = new TermQuery(warning);
		System.out.println("\nTermQuery results:");

		ScoreDoc[] hits = searcher.search(query1, 10).scoreDocs;
		for (int i = 0; i < hits.length; i++) {
			Document hitDoc = searcher.doc(hits[i].doc);
			System.out.println(hitDoc.get("title"));
		}
		Query query2 = new BoostingTermQuery(warning);
		System.out.println("\nBoostingTermQuery results:");

		ScoreDoc[] hits2 = searcher.search(query2, 10).scoreDocs;
		for (int i = 0; i < hits2.length; i++) {
			Document hitDoc = searcher.doc(hits2[i].doc);
			System.out.println(hitDoc.get("title"));
		}
	}
}



RE: Payloads

Posted by Elias Khsheibun <el...@gmail.com>.

What do you mean by a custom one? Please explain. Must I use a
PayloadTermQuery?

And in TermPositionPayloadTokenFilter there is a method, incrementToken,
that I can see is only called from the main method. I didn't see the place in
the code that examines whether the query term is at an even offset in the
document. How should this all work together?

Thank you.



RE: Payloads

Posted by AHMET ARSLAN <io...@yahoo.com>.

> If I need to override the QueryParser
> to return PayloadTermQuery, what
> function for PayloadFunction should I use in the
> constructor (If you can
> show me an example).

I am not sure about that. Maybe custom one.

> In your code I didn't see an indexer, will this work with
> the regular
> IndexWriter but with the new Analyzer that you overloaded

No, at index time [IndexWriter] you are going to use a new analyzer that uses WhitespaceTokenizer  + TermPositionPayloadTokenFilter.

PayloadAnalyzer will be used at query time. [QueryParser]

You need to setSimilarity(new CustomSimilarity) of both indexer and searcher.


      


RE: Payloads

Posted by Elias Khsheibun <el...@gmail.com>.

If I need to override the QueryParser to return PayloadTermQuery, which
PayloadFunction should I pass to the constructor? (An example would help.)

In your code I didn't see an indexer. Will this work with the regular
IndexWriter but with the new Analyzer that you overloaded?


RE: Payloads

Posted by Elias Khsheibun <el...@gmail.com>.

Thank you, I managed to do that for terms, but for a phrase like the
example below ("!Graph Algorithms") I still don't know how to do it...


RE: Payloads

Posted by Elias Khsheibun <el...@gmail.com>.

About 60 students, I think. If you have already given some answers, I would
be grateful if you could link me to them or quote them again.


RE: Payloads

Posted by Uwe Schindler <uw...@thetaphi.de>.

Just a question, how big is this university course about Lucene? You are the
third asking for the same :-)

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


RE: Payloads

Posted by AHMET ARSLAN <io...@yahoo.com>.

> Let's say I have a document that
> contains the following text:
> 
> "Graph Algorithms is one of the most important topics in
> computer science"
> 
> And a query "!Graph Algorithms" then the term Graph in the
> query should have
> a double weight because the offset of Graph is 0 (and it is
> even) - we apply
> this doubling of weight only if a '!' operator precedes the
> term and if its
> offset from the document is even.

I modified the TokenOffsetPayloadTokenFilter and created TermPositionPayloadTokenFilter.

At index time you can use WhitespaceTokenizer + TermPositionPayloadTokenFilter to assign a payload value of 2.0f to the tokens that have an even term position.

Modifying the QueryParser to change the meaning of the ! operator is very troublesome.
If you can convert your query "!Graph Algorithms" to "Graph|2.0 Algorithms" you can use DelimitedPayloadTokenFilter to set the payload of the marked term.

Additionally you need to override QueryParser to return PayloadTermQuery,
and override the scorePayload method of DefaultSimilarity.
By doing so payloads will be included in score calculation.


public class PayloadAnalyzer extends Analyzer {

    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream result = new WhitespaceTokenizer(reader);
        result = new DelimitedPayloadTokenFilter(result, '|', new FloatEncoder());
        return result;
    }

    public static void main(String[] args) throws ParseException {
        QueryParser qp = new QueryParser(Version.LUCENE_29, "f", new PayloadAnalyzer());
        System.out.println(qp.parse("Graph|2.0 Algorithms").toString());
    }

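}

public class TermPositionPayloadTokenFilter extends TokenFilter {

    protected PayloadAttribute payAtt;
    protected PositionIncrementAttribute posIncrAtt;

    // Payload of 2.0f marks a token that sits at an even position.
    private static final Payload evenPayload = new Payload(PayloadHelper.encodeFloat(2.0f));

    // Position of the current token; correct as long as every position
    // increment is 1, which holds for WhitespaceTokenizer.
    private int termPosition = 0;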

    public TermPositionPayloadTokenFilter(TokenStream input) {
        super(input);
        payAtt = (PayloadAttribute) addAttribute(PayloadAttribute.class);
        posIncrAtt = (PositionIncrementAttribute) addAttribute(PositionIncrementAttribute.class);
    }

    public final boolean incrementToken() throws IOException {
        if (input.incrementToken()) {
            if ((termPosition % 2) == 0)
                payAtt.setPayload(evenPayload);
            termPosition += posIncrAtt.getPositionIncrement();
            return true;
        } else {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        String test = "Graph Algorithms is one of the most important topics in computer science";
        TokenStream tokenStream = new TermPositionPayloadTokenFilter(new WhitespaceTokenizer(new StringReader(test)));
        TermAttribute termAtt = (TermAttribute) tokenStream.getAttribute(TermAttribute.class);
        PayloadAttribute payloadAtt = (PayloadAttribute) tokenStream.getAttribute(PayloadAttribute.class);

        while (tokenStream.incrementToken()) {
            System.out.print(termAtt.term());
            Payload payload = payloadAtt.getPayload();
            if (payload != null)
                System.out.println(" Payload = " + PayloadHelper.decodeFloat(payload.toByteArray()));
            else
                System.out.println(" Payload is null.");
        }
    }
}
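
The conversion from "!Graph Algorithms" to "Graph|2.0 Algorithms" mentioned above could be done as a plain string rewrite before the query reaches the QueryParser. The following is only a sketch under assumptions: the class BangQueryRewriter is hypothetical, it treats '!' as a marker only at the start of a word or quoted phrase, and for a phrase it marks the first word, since the position of a phrase is defined by its first word.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BangQueryRewriter {

    // '!' at the start of the string or after whitespace, then an optional
    // opening quote, then the first word (no whitespace, quote or '|').
    private static final Pattern BANG =
            Pattern.compile("(?<!\\S)!(\"?)([^\\s\"|]+)");

    /** Rewrites !word to word|2.0 and !"a b" to "a|2.0 b". */
    public static String rewrite(String query) {
        Matcher m = BANG.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Drop the '!', keep the quote if any, append the payload suffix.
            String replacement = m.group(1) + m.group(2) + "|2.0";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite("!Graph Algorithms"));
        System.out.println(rewrite("!\"Graph Algorithms\" topics"));
    }
}
```

The rewritten string is then what you would hand to a QueryParser built with PayloadAnalyzer.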


      


RE: Payloads

Posted by Elias Khsheibun <el...@gmail.com>.

Let's say I have a document that contains the following text:

"Graph Algorithms is one of the most important topics in computer science"

And a query "!Graph Algorithms" then the term Graph in the query should have
a double weight because the offset of Graph is 0 (and it is even) - we apply
this doubling of weight only if a '!' operator precedes the term and if its
offset from the document is even.
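
Written as code, the rule described above might look like the following plain-Java sketch. It does not use Lucene; the class BangBoostRule and its methods are made up for illustration, and it simplifies by looking only at the first occurrence of the term in a whitespace-tokenized document.

```java
import java.util.Arrays;
import java.util.List;

public class BangBoostRule {

    /** Position of the first occurrence of term among whitespace-separated tokens, or -1. */
    public static int positionOf(String document, String term) {
        List<String> tokens = Arrays.asList(document.trim().split("\\s+"));
        return tokens.indexOf(term);
    }

    /** 2.0f only when the term was marked with '!' AND sits at an even position. */
    public static float weightFactor(String document, String term, boolean markedWithBang) {
        int position = positionOf(document, term);
        boolean even = position >= 0 && position % 2 == 0;
        return (markedWithBang && even) ? 2.0f : 1.0f;
    }

    public static void main(String[] args) {
        String doc = "Graph Algorithms is one of the most important topics in computer science";
        System.out.println(weightFactor(doc, "Graph", true));      // marked, position 0
        System.out.println(weightFactor(doc, "Graph", false));     // not marked, position 0
        System.out.println(weightFactor(doc, "Algorithms", true)); // marked, position 1
    }
}
```

For a phrase, the same function would be called with the first word of the phrase.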



RE: Payloads

Posted by AHMET ARSLAN <io...@yahoo.com>.

> I want to override the operator - it
> is for a project purpose.

Can you explain your requirements more? What do you mean by "an even offset of the document"?


      


RE: Payloads

Posted by Elias Khsheibun <el...@gmail.com>.

I want to override the operator - it is for a project purpose.


Re: Payloads

Posted by AHMET ARSLAN <io...@yahoo.com>.

> Hi,
> 
> I need to add a query operator '!' such that when it
> precedes a word or a
> phrase in the query, that term will contribute twice its
> weight if it is
> positioned in an even offset of the document. The position
> of a phrase is
> determined by the offset of its first word. 
> 
> I guess it involves payloads...
> 
> Elias.

'!' is already a query operator. It is equivalent to NOT, so you cannot use it. Why not use the caret operator? Like singleterm^2 "some phrase"^2

[Boosting a Term] http://lucene.apache.org/java/3_0_0/queryparsersyntax.html



      
