Posted to solr-user@lucene.apache.org by shm <sh...@dbc.dk> on 2011/01/20 11:38:32 UTC

Indexing same data in multiple fields with different filters

Hi, I have a small indexing problem that I don't know how to
solve: I need to index the same data in different ways into the
same field. It is a normalization problem, and here is an
example:

I have a special character, \uA732, which I need to normalize in
two different ways for phrase searching. So if I encounter this
character in, for example, the title field, I would like it to
result in these two phrase fields:

        raw data = "\uA732lborg"
        phrase.title= "ålborg"        
        phrase.title= "aalborg"

This is because both are valid representations of the phrase.

I can copy the field from the raw data, but then I cannot
normalize the copies differently, so I am at a loss.

Does anyone have a solution or a good idea?

Regards
  shm


Re: Indexing same data in multiple fields with different filters

Posted by Gora Mohanty <go...@mimirtech.com>.
On Thu, Jan 20, 2011 at 4:08 PM, shm <sh...@dbc.dk> wrote:
> Hi, I have a little problem regarding indexing, that i don't know
> how to solve, i need to index the same data in different ways
> into the same field. The problem is a normalization problem, and
> here is an example:
>
> I have a special character \uA732, which i need to normalize in
> two different ways for phrase searching. So if i encounter this
> character in, for example, title field I would like it to result
> in these two phrase fields:
>
>        raw data = "\uA732lborg"
>        phrase.title= "ålborg"
>        phrase.title= "aalborg"
[...]

You could use a multi-valued field along with a
ScriptTransformer in the DataImportHandler.
Read in the raw data, call a ScriptTransformer
to do the normalisation, and store both output
versions in the multi-valued field (or you could
store them in two separate fields, if you prefer).

Regards,
Gora
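
The normalisation step suggested above could look roughly like the
following. This is only a sketch of the logic in plain Python, not
actual ScriptTransformer or DataImportHandler code; the VARIANTS table
and the normalized_variants function are hypothetical names, and only
the \uA732 entry comes from the thread. Each returned string would be
added as one value of the multi-valued field.

```python
from itertools import product

# Hypothetical mapping: each special character and the replacements it
# may take. Only the \uA732 entry is taken from the thread.
VARIANTS = {
    "\uA732": ("\u00e5", "aa"),  # \u00e5 is "å"
}

def normalized_variants(raw):
    """Return every normalised form of `raw`, one per combination of replacements."""
    # For each character, list its possible replacements (itself if not special).
    choices = [VARIANTS.get(ch, (ch,)) for ch in raw]
    # Build each combination, lowercase it, and drop duplicates in order.
    seen = []
    for combo in product(*choices):
        value = "".join(combo).lower()
        if value not in seen:
            seen.append(value)
    return seen

# Both spellings from the original question, in table order.
values = normalized_variants("\uA732lborg")
print(values)
```

A string without special characters comes back as a single lowercased
value, so the same function can be applied to every title.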

Re: Indexing same data in multiple fields with different filters

Posted by Erick Erickson <er...@gmail.com>.
I'm assuming that this is just one example of many different
kinds of transformations you could do. It *seems* like a variant
of a synonym analyzer, so you could write a custom analyzer
(it's not actually hard) to create a bunch of synonyms
for your "special" terms at index time. Or you could use the
synonyms at query time (query time is more flexible).

Best
Erick
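
To illustrate the index-time synonym idea: in Lucene, a synonym is
normally emitted at the same token position as the original term,
which is what lets a phrase query match either spelling. The sketch
below models that in plain Python. It does not use the Lucene or Solr
API, and SYNONYMS, expand and phrase_matches are hypothetical names
chosen for the example.

```python
# Hypothetical synonym table; in Solr this would normally live in a
# synonyms file rather than in code.
SYNONYMS = {
    "\u00e5lborg": ["aalborg"],  # \u00e5 is "å"
    "aalborg": ["\u00e5lborg"],
}

def expand(tokens):
    """Yield (term, position) pairs; synonyms share the original term's position."""
    for position, term in enumerate(tokens):
        yield term, position
        for alt in SYNONYMS.get(term, []):
            yield alt, position

def phrase_matches(indexed, phrase):
    """True if the phrase terms occur at consecutive positions in `indexed`."""
    positions = {}
    for term, pos in indexed:
        positions.setdefault(term, set()).add(pos)
    starts = positions.get(phrase[0], set())
    return any(
        all(start + i in positions.get(term, set()) for i, term in enumerate(phrase))
        for start in starts
    )

# Index one title, then search for the phrase with the other spelling.
indexed = list(expand("visit \u00e5lborg today".split()))
print(phrase_matches(indexed, ["visit", "aalborg", "today"]))
```

Applying the same table at query time instead would leave the index
untouched and expand only the search terms.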
