Posted to java-user@lucene.apache.org by Bernhard Haslhofer <be...@univie.ac.at> on 2010/07/08 10:47:14 UTC

Retrieve term payloads / custom PayloadFilter

Hi,

In my application I have documents that may contain terms and term translations in multiple languages. The language tag of each term is given explicitly and should be available in the index, so that I can query for documents that contain a certain term (optionally in a given language).

I could split the documents into a set of sub-documents, each containing terms in one specific language and a dedicated field indicating the language. But then I need multiple queries to retrieve the stored term translations from the sub-documents.

The better alternative, IMO, is not to split the document but to assign the language tags to the terms as payloads. But then I need

(i) a search filter that eliminates docs based on a given language tag and

(ii) a way to access the term payloads from the documents returned by the searcher

For both I haven't found a solution yet. Can I write a custom PayloadFilter or is there already some implementation available? Is it possible to access the term payloads from the search results?

Thanks.
Bernhard
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Retrieve term payloads / custom PayloadFilter

Posted by Erick Erickson <er...@gmail.com>.
If you know this at index time, could you index language-specific fields,
i.e. text_en, text_de, title_en, title_de etc.? Perhaps you could also have
a catch-all field that contains everything.

Then your searching would be on a per field_lang basis.
PerFieldAnalyzerWrapper would then apply the language-specific Analyzer
you register for each field.
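
To make the naming scheme concrete, here is a minimal sketch in plain Java
(no Lucene calls). LangFields, the "_all" suffix for the catch-all field and
all method names are made up for illustration, not part of Lucene. It only
shows how a term would be routed to its language field plus the catch-all at
index time, and which field a query would target when the language is
optional.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: builds per-language field names
// following the text_en / text_de convention.
final class LangFields {
    // Assumed suffix of the catch-all field that receives every language.
    static final String CATCH_ALL_SUFFIX = "all";

    // Builds a field name such as "text_de" from a base name and a language tag.
    static String fieldFor(String base, String lang) {
        return base + "_" + lang.toLowerCase(Locale.ROOT);
    }

    // Fields a term is indexed into: its own language field plus the catch-all.
    static List<String> indexFields(String base, String lang) {
        List<String> fields = new ArrayList<>();
        fields.add(fieldFor(base, lang));
        fields.add(fieldFor(base, CATCH_ALL_SUFFIX));
        return fields;
    }

    // Field to search: language-specific if a language is given, else catch-all.
    static String searchField(String base, String lang) {
        if (lang == null || lang.isEmpty()) {
            return fieldFor(base, CATCH_ALL_SUFFIX);
        }
        return fieldFor(base, lang);
    }

    public static void main(String[] args) {
        System.out.println(indexFields("text", "de"));   // [text_de, text_all]
        System.out.println(searchField("title", null));  // title_all
    }
}
```

The names returned by indexFields would be the field names you add to the
document, and the same names are what you would register with
PerFieldAnalyzerWrapper, one Analyzer per language field.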

This may turn out to be too clumsy if you have many fields x many
languages...

Actually, this looks a lot like what Solr could provide, perhaps with
dynamic fields and the dismax query parser.

Best
Erick
