You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Clemens Wyss DEV <cl...@mysign.ch> on 2014/10/27 07:32:57 UTC
[suggestions] fetch terms from a FilterAtomicReader(subclass)?
Is it possible to fetch the terms of a FilterAtomicReader in order to provide suggestions from a subset of all documents in an index?
So my target is to "provide suggestions from a subset of all documents in an index".
Note:
I have an "equal" discussion ongoing in the solr-mailinglist. But I thought I'd might ask in the core-of-solr (i.e. lucene)-mailinglist, too ;)
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: AW: [suggestions] fetch terms from a FilterAtomicReader(subclass)?
Posted by Olivier Binda <ol...@wanadoo.fr>.
Here you are. This is written in kotlin but it is similar enough to Java
to be usable
private fun buildTermsFromIndex(indexReader:IndexReader, field: String,
file: File, bits:Bits): WFSTCompletionLookup {
val lookup = WFSTCompletionLookup(true)
lookup.build(WeightedLuceneDictionary(indexReader, field,
bits:Bits))
val output = BufferedOutputStream(FileOutputStream(file))
lookup.store(output)
output.close()
return lookup
}
package org.lakedaemon.queries
import org.apache.lucene.index.IndexReader
import org.apache.lucene.util.BytesRefIterator
import org.apache.lucene.index.MultiFields
import org.apache.lucene.search.spell.Dictionary
import org.apache.lucene.search.suggest.InputIterator
import org.lakedaemon.constants.Lucene
import org.apache.lucene.util.BytesRef
import org.apache.lucene.util.NumericUtils
import org.lakedaemon.L
import org.apache.lucene.search.Filter
import org.apache.lucene.util.OpenBitSet
import org.apache.lucene.util.Bits
/**
* Lucene Dictionary: terms taken from the given field
* of a Lucene index, weight computed out of docFreqs.
*/
public final class WeightedLuceneDictionary(private val reader:
IndexReader, private val field: String, private val bits:Bits?) :
Dictionary {
override fun getEntryIterator(): InputIterator {
val terms = MultiFields.getTerms(reader, field)
if (terms == null) return InputIterator.EMPTY
val termsEnum = terms.iterator(null)
return if (termsEnum == null) InputIterator.EMPTY else
WeightedLuceneInputIterator(termsEnum,
Lucene.numericFields.contains(field), bits)
}
}
package org.lakedaemon.queries
import org.apache.lucene.search.suggest.InputIterator
import org.apache.lucene.index.TermsEnum
import java.util.Comparator
import org.apache.lucene.util.BytesRef
import org.apache.lucene.util.NumericUtils
import org.lakedaemon.L
import org.lakedaemon.kotlin.safeGet
import org.apache.lucene.util.Bits
import org.apache.lucene.index.DocsEnum
import org.apache.lucene.search.DocIdSetIterator
final class WeightedLuceneInputIterator(private val termsEnum:
TermsEnum, hasNumericTerms:Boolean = false, val bits :Bits?) :
InputIterator {
val bytesRef = BytesRef()
var docsEnum : DocsEnum? = null
var docSize = 0L
// this is for traversing numeric terms (they are encoded as strings
with a prefix)
val transform = if (hasNumericTerms) {BytesRef.()->
val shift = bytes!![offset].toInt() -
NumericUtils.SHIFT_START_INT.toInt()
if (shift > 31 || shift < 0) null else {
val intString = NumericUtils.prefixCodedToInt(this).toString()
bytesRef.offset = 0
bytesRef.length = 0
bytesRef.copyChars(intString)
bytesRef
}
} else {BytesRef.()-> this}
override fun getComparator() : Comparator<BytesRef>? = null
override fun next(): BytesRef? {
docSize = 0L
while (true) {
val t = termsEnum.next() ?: return null
docsEnum = termsEnum.docs(bits, docsEnum)
val enum = docsEnum ?: continue
while (true) {
val docId = enum.nextDoc()
if (docId == -1 || docId ==
DocIdSetIterator.NO_MORE_DOCS) break
++docSize
}
if (docSize != 0L) return t.transform()
}
}
override fun weight(): Long = docSize
override fun payload(): BytesRef? = null
override fun hasPayloads(): Boolean = false
}
On 10/27/2014 02:08 PM, Clemens Wyss DEV wrote:
> Salut Olivier,
> would you mind providing me your Suggester-class code (or the relevant snippets) as an ideal jump-start?
>
> -Clemens
>
> -----Ursprüngliche Nachricht-----
> Von: Olivier Binda [mailto:olivier.binda@wanadoo.fr]
> Gesendet: Montag, 27. Oktober 2014 11:51
> An: java-user@lucene.apache.org
> Betreff: Re: [suggestions] fetch terms from a FilterAtomicReader(subclass)?
>
> On 10/27/2014 07:32 AM, Clemens Wyss DEV wrote:
>> Is it possible to fetch the terms of a FilterAtomicReader in order to provide suggestions from a subset of all documents in an index?
> Yes, it is possible.
> I do it by feeding a custom Dictionary with a custom InputIterator in the lookup.build() method for WFSTCompletionLookup
>
> Suggestions are preprocessed once at runtime
>
>> So my target is to "provide suggestions from a subset of all documents in an index".
> I provide different suggestions relevant to the languages chosen by my users
>
>> Note:
>> I have an "equal" discussion ongoing in the solr-mailinglist. But I
>> thought I'd might ask in the core-of-solr (i.e. lucene)-mailinglist,
>> too ;)
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
AW: [suggestions] fetch terms from a FilterAtomicReader(subclass)?
Posted by Clemens Wyss DEV <cl...@mysign.ch>.
Salut Olivier,
would you mind providing me your Suggester-class code (or the relevant snippets) as an ideal jump-start?
-Clemens
-----Ursprüngliche Nachricht-----
Von: Olivier Binda [mailto:olivier.binda@wanadoo.fr]
Gesendet: Montag, 27. Oktober 2014 11:51
An: java-user@lucene.apache.org
Betreff: Re: [suggestions] fetch terms from a FilterAtomicReader(subclass)?
On 10/27/2014 07:32 AM, Clemens Wyss DEV wrote:
> Is it possible to fetch the terms of a FilterAtomicReader in order to provide suggestions from a subset of all documents in an index?
Yes, it is possible.
I do it by feeding a custom Dictionary with a custom InputIterator in the lookup.build() method for WFSTCompletionLookup
Suggestions are preprocessed once at runtime
> So my target is to "provide suggestions from a subset of all documents in an index".
I provide different suggestions relevant to the languages chosen by my users
> Note:
> I have an "equal" discussion ongoing in the solr-mailinglist. But I
> thought I'd might ask in the core-of-solr (i.e. lucene)-mailinglist,
> too ;)
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: [suggestions] fetch terms from a FilterAtomicReader(subclass)?
Posted by Olivier Binda <ol...@wanadoo.fr>.
On 10/27/2014 07:32 AM, Clemens Wyss DEV wrote:
> Is it possible to fetch the terms of a FilterAtomicReader in order to provide suggestions from a subset of all documents in an index?
Yes, it is possible.
I do it by feeding a custom Dictionary with a custom InputIterator
in the lookup.build() method for WFSTCompletionLookup
Suggestions are preprocessed once at runtime
> So my target is to "provide suggestions from a subset of all documents in an index".
I provide different suggestions relevant to the languages chosen by my users
> Note:
> I have an "equal" discussion ongoing in the solr-mailinglist. But I thought I'd might ask in the core-of-solr (i.e. lucene)-mailinglist, too ;)
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org