Posted to solr-user@lucene.apache.org by Tom Mortimer <to...@flax.co.uk> on 2011/03/28 10:45:06 UTC

copyField at search time / multi-language support

Hi,

Here's my problem: I'm indexing a corpus with text in a variety of
languages. I'm planning to detect the language at index time and send
the text to one of several suitably configured fields (e.g. "mytext_de"
for German, "mytext_cjk" for Chinese/Japanese/Korean, etc.).

At search time I want to search all of these fields. However, there
will be at least 12 of them, which could lead to a very long query
string. (Also I need to use the standard query parser rather than
dismax, for full query syntax.)

Therefore I was wondering if there was a way to copy fields at search
time, so I can have my mytext query in a single field and have it
copied to mytext_de, mytext_cjk etc. Something like:

   <copyQueryField source="mytext" dest="mytext_de" />
   <copyQueryField source="mytext" dest="mytext_cjk" />
  ...

If this is not currently possible, could someone give me some pointers
for hacking Solr to support it? Should I subclass solr.SearchHandler?
I know nothing about Solr internals at the moment...

thanks,
Tom

Re: copyField at search time / multi-language support

Posted by Erick Erickson <er...@gmail.com>.
This may not be all that helpful, but have you looked at edismax?
https://issues.apache.org/jira/browse/SOLR-1553

It allows the full Solr query syntax while preserving the goodness of
dismax.

This is standard equipment on 3.1, which is being released even as we
speak, and I also know it's being used in production situations.

If going to 3.1 is not an option, I know people have applied that patch
to 1.4.1, but I haven't done it myself.

Best
Erick
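
To illustrate the edismax suggestion above, here is a minimal client-side
sketch that builds a request searching several language fields at once.
defType, q and qf are standard Solr request parameters; the base URL and
the field names other than mytext_de and mytext_cjk are assumptions made
for the example.

```python
from urllib.parse import urlencode

# Hypothetical per-language fields; only mytext_de and mytext_cjk
# are named in the thread.
LANGUAGE_FIELDS = ["mytext_de", "mytext_cjk", "mytext_en", "mytext_fr"]


def edismax_params(user_query, fields=LANGUAGE_FIELDS):
    """Return Solr request parameters that search every field in `fields`."""
    return {
        "defType": "edismax",    # select the extended dismax query parser
        "q": user_query,         # the user's query, passed through unchanged
        "qf": " ".join(fields),  # fields to search, space-separated
    }


def select_url(base_url, user_query):
    """Build a /select URL; base_url is an assumed Solr core address."""
    return base_url.rstrip("/") + "/select?" + urlencode(edismax_params(user_query))


if __name__ == "__main__":
    print(select_url("http://localhost:8983/solr", 'haus AND "gruener garten"'))
```

The qf list can also be set as a default on the request handler in
solrconfig.xml, in which case the client only has to send q.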


Re: copyField at search time / multi-language support

Posted by Gora Mohanty <go...@mimirtech.com>.
On Mon, Mar 28, 2011 at 2:15 PM, Tom Mortimer <to...@flax.co.uk> wrote:
> Hi,
>
> Here's my problem: I'm indexing a corpus with text in a variety of
> languages. I'm planning to detect these at index time and send the
> text to one of a suitably-configured field (e.g. "mytext_de" for
> German, "mytext_cjk" for Chinese/Japanese/Korean etc.)

>
> At search time I want to search all of these fields. However, there
> will be at least 12 of them, which could lead to a very long query
> string. (Also I need to use the standard query parser rather than
> dismax, for full query syntax.)

Sorry, unable to understand this. Are you detecting the language,
and based on that, indexing to one of mytext_de, mytext_cjk, etc.,
or does each field have mixed languages? If the former, why could
you not also detect the language at query time (or, have separate
query sources for users of different languages), and query the
appropriate field based on the known language to be searched?

> Therefore I was wondering if there was a way to copy fields at search
> time, so I can have my mytext query in a single field and have it
> copied to mytext_de, mytext_cjk etc. Something like:
>
>   <copyQueryField source="mytext" dest="mytext_de" />
>   <copyQueryField source="mytext" dest="mytext_cjk" />
>  ...
>
> If this is not currently possible, could someone give me some pointers
> for hacking Solr to support it? Should I subclass solr.SearchHandler?
> I know nothing about Solr internals at the moment...
[...]

This is not possible as far as I know, and would be quite inefficient.

Regards,
Gora
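
The query-time alternative described above (detect the query language,
then search only the matching field) could be sketched as follows. The
language codes, the mapping and the fall-back to all fields are
assumptions for illustration, and the term is assumed to be a single,
already escaped query term.

```python
# Hypothetical mapping from a detected language code to its index field.
FIELD_BY_LANGUAGE = {
    "de": "mytext_de",
    "zh": "mytext_cjk",
    "ja": "mytext_cjk",
    "ko": "mytext_cjk",
}


def fields_for_query(language):
    """Return the field(s) to search for a query in `language`.

    Falls back to every known field when the language is unknown.
    """
    field = FIELD_BY_LANGUAGE.get(language)
    if field:
        return [field]
    return sorted(set(FIELD_BY_LANGUAGE.values()))


def fielded_query(term, language):
    """Build a standard-parser query for one term in the given language."""
    clauses = [f"{field}:{term}" for field in fields_for_query(language)]
    if len(clauses) == 1:
        return clauses[0]
    return "(" + " OR ".join(clauses) + ")"


if __name__ == "__main__":
    print(fielded_query("haus", "de"))
    print(fielded_query("haus", None))
```

With a known language only one field is queried, so the long
multi-field query is needed only as a fall-back.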

Re: copyField at search time / multi-language support

Posted by lboutros <bo...@gmail.com>.
Tom,

To solve this kind of problem, if I understand it correctly, you could
extend the query parser to support something like meta-fields. I'm
currently developing a QueryParser plugin to support a specific syntax.
Support for meta-fields that search across several fields (one per
language) is one of the features this parser will provide.

Ludovic.
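
The meta-field idea above can be approximated outside Solr by rewriting
the query string before it is sent. The sketch below is not the
QueryParser plugin mentioned in this message; it is a naive client-side
rewrite, and the function and constant names are hypothetical. It only
handles simple mytext:term and mytext:"phrase" clauses, not escaped
quotes, grouped clauses such as mytext:(a b), or range queries.

```python
import re

# Real fields the meta-field "mytext" stands for (names from the thread).
TARGET_FIELDS = ["mytext_de", "mytext_cjk"]

# Matches mytext:term or mytext:"a phrase". The lookbehind stops the
# pattern from matching inside a longer field name.
META_CLAUSE = re.compile(r'(?<!\w)mytext:("[^"]*"|[^\s()"]+)')


def expand_meta_field(query, fields=TARGET_FIELDS):
    """Replace each mytext:<value> clause with an OR over `fields`."""

    def replace(match):
        value = match.group(1)
        return "(" + " OR ".join(f"{field}:{value}" for field in fields) + ")"

    return META_CLAUSE.sub(replace, query)


if __name__ == "__main__":
    print(expand_meta_field("mytext:haus AND title:x"))
```

Clauses on other fields, including mytext_de itself, pass through
unchanged, so the rest of the standard query syntax is preserved.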


-----
Jouve
France.

Re: copyField at search time / multi-language support

Posted by Markus Jelsma <ma...@openindex.io>.
I haven't tried this as an UpdateProcessor, but it relies on Tika, and
Tika's LanguageIdentifier works well except on short texts.
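
Since detection is reported above to be unreliable on short texts, an
index-time router may want a fall-back field. The sketch below assumes a
caller-supplied detect_language function (for example a wrapper around
whatever detector is used); the length threshold, the mytext_general
field and the mapping are all hypothetical.

```python
MIN_CHARS_FOR_DETECTION = 50      # assumed threshold, tune on real data
DEFAULT_FIELD = "mytext_general"  # hypothetical catch-all field

# Hypothetical mapping from detected language code to index field.
FIELD_BY_LANGUAGE = {
    "de": "mytext_de",
    "zh": "mytext_cjk",
    "ja": "mytext_cjk",
    "ko": "mytext_cjk",
}


def route_text(text, detect_language):
    """Return {field_name: text} for the document to be indexed.

    Short texts skip detection and go to DEFAULT_FIELD, as do texts
    whose detected language has no dedicated field.
    """
    if len(text.strip()) < MIN_CHARS_FOR_DETECTION:
        return {DEFAULT_FIELD: text}
    language = detect_language(text)
    return {FIELD_BY_LANGUAGE.get(language, DEFAULT_FIELD): text}


if __name__ == "__main__":
    # Stub detector standing in for a real one.
    always_german = lambda text: "de"
    print(route_text("kurz", always_german))
    print(route_text("ein laengerer deutscher Text " * 3, always_german))
```
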

> Thanks Markus.
> 
> Do you know if this patch is good enough for production use? Thanks.
> 
> Andy

Re: copyField at search time / multi-language support

Posted by Andy <an...@yahoo.com>.
Thanks Markus.

Do you know if this patch is good enough for production use? Thanks.

Andy

--- On Tue, 3/29/11, Markus Jelsma <ma...@openindex.io> wrote:

> https://issues.apache.org/jira/browse/SOLR-1979

Re: copyField at search time / multi-language support

Posted by Markus Jelsma <ma...@openindex.io>.
https://issues.apache.org/jira/browse/SOLR-1979

> Tom,
> 
> Could you share the method you use to perform language detection? Any open
> source tools that do that?
> 
> Thanks.

Re: copyField at search time / multi-language support

Posted by Andy <an...@yahoo.com>.
Tom,

Could you share the method you use to perform language detection? Any open source tools that do that?

Thanks.
