Posted to solr-user@lucene.apache.org by Furkan KAMACI <fu...@gmail.com> on 2013/06/26 16:51:09 UTC

Dynamic Type For Solr Schema

I use Solr 4.3.1 as SolrCloud. I know that I can define an analyzer in
schema.xml. Let's assume that I have specialized my analyzer for Turkish.
However, I want to have another analyzer too, e.g. for English. I have these
fields in my schema:
...
<field name="content" type="text_tr" stored="true" indexed="true"/>
<field name="title" type="text_tr" stored="true" indexed="true"/>
...

I have a field type, text_tr, that is configured for Turkish, and another
field type, text_en, that is configured for English. I also have a field
named lang in my schema; it holds the language of the document, "en" or "tr".

If I get a document whose "lang" field holds "*tr*", I want this:

...
<field name="content" type="text_tr" stored="true" indexed="true"/>
<field name="title" type="text_tr" stored="true" indexed="true"/>
...

If I get a document whose "lang" field holds "*en*", I want this:

...
<field name="content" type="text_en" stored="true" indexed="true"/>
<field name="title" type="text_en" stored="true" indexed="true"/>
...

I want dynamic types just for these fields; the others will stay the same.
How can I do that properly in Solr? (UpdateRequestProcessor, ...?)

Re: Dynamic Type For Solr Schema

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
On Wed, Jun 26, 2013 at 11:46 AM, Jack Krupansky
<ja...@basetechnology.com> wrote:
> But there are also built-in "language identifier" update processors that can
> simultaneously identify what language is used in the input value for a field
> AND do the redirection to a language-specific field AND store the language
> code.

I have an example of using this as well (for English/Russian):
https://github.com/arafalov/solr-indexing-book/tree/master/published/languages
It includes the collection data files, so you can see the end
result and play with it. The instructions on how to recreate it, and the
explanation behind the routing and field-alias setup, are in my book:
http://blog.outerthoughts.com/2013/06/my-book-on-solr-is-now-published/
:-)
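As a rough illustration of the query side of such a setup (an assumption on my part, not necessarily what the book does): once title and content have been routed into per-language fields such as title_en/title_tr and content_en/content_tr (hypothetical names following the <field>_<lang> pattern), edismax per-field qf aliasing lets users keep querying title:... and content:... as before. The handler and field lists below are illustrative.

```xml
<!-- Hypothetical solrconfig.xml handler: field names are assumed, not taken from the book -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- Unqualified terms are searched in both language variants -->
    <str name="qf">title_en title_tr content_en content_tr</str>
    <!-- Alias: a user query on title:foo is expanded across title_en and title_tr -->
    <str name="f.title.qf">title_en title_tr</str>
    <!-- Alias: a user query on content:foo is expanded across content_en and content_tr -->
    <str name="f.content.qf">content_en content_tr</str>
  </lst>
</requestHandler>
```

Each language variant is still analyzed by its own field type at query time, so Turkish and English text keep their separate analyzers.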

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)

Re: Dynamic Type For Solr Schema

Posted by Jack Krupansky <ja...@basetechnology.com>.
You can certainly redirect input values in an update processor,
even in a JavaScript script.
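A minimal sketch of wiring such a script into solrconfig.xml, assuming StatelessScriptUpdateProcessorFactory (available in Solr 4.x) and a hypothetical script file named route-by-lang.js in the collection's conf directory (in SolrCloud, the config set uploaded to ZooKeeper). The script itself is not shown here; it would define a processAdd(cmd) function that reads the lang field and renames content and title accordingly. The chain name is also an assumption.

```xml
<!-- Hypothetical chain: "script-route" and "route-by-lang.js" are assumed names -->
<updateRequestProcessorChain name="script-route">
  <processor class="solr.StatelessScriptUpdateProcessorFactory">
    <!-- Script file resolved from the config directory; it would move
         content/title into content_tr/title_tr or content_en/title_en
         depending on the value of the "lang" field -->
    <str name="script">route-by-lang.js</str>
  </processor>
  <!-- Standard tail of an update chain -->
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain can be selected per request with update.chain=script-route, or set as a default on the /update handler.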

But there are also built-in "language identifier" update processors that can 
simultaneously identify what language is used in the input value for a field 
AND do the redirection to a language-specific field AND store the language 
code.
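A minimal solrconfig.xml sketch of the language-identifier approach, assuming the LangDetect-based processor and the field names from the original question. The solr-langid contrib jar and its langdetect dependencies must be on the core's classpath (typically via <lib> directives; exact paths depend on the install). The chain name is hypothetical and the parameter values are illustrative, not tuned.

```xml
<!-- Hypothetical chain name "langid"; fields taken from the question -->
<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <lst name="defaults">
      <!-- Fields whose text is used for detection -->
      <str name="langid.fl">title,content</str>
      <!-- Field that receives (or already holds) the language code -->
      <str name="langid.langField">lang</str>
      <!-- Only accept languages the schema has analyzers for -->
      <str name="langid.whitelist">en,tr</str>
      <!-- Used when detection fails or returns a non-whitelisted language -->
      <str name="langid.fallback">en</str>
      <!-- Rename title/content to title_en, content_tr, etc. -->
      <str name="langid.map">true</str>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

With langid.map enabled, the schema needs matching per-language fields. If langid.overwrite is left at its default (false), a lang value already sent with the document should be used instead of running detection; that is worth verifying on 4.3.1 before relying on it. The chain is selected with update.chain=langid or as the /update handler's default.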

See:
LangDetectLanguageIdentifierUpdateProcessorFactory
TikaLanguageIdentifierUpdateProcessorFactory
http://lucene.apache.org/solr/4_3_0/solr-langid/org/apache/solr/update/processor/LangDetectLanguageIdentifierUpdateProcessorFactory.html
http://lucene.apache.org/solr/4_3_0/solr-langid/org/apache/solr/update/processor/TikaLanguageIdentifierUpdateProcessorFactory.html
http://wiki.apache.org/solr/LanguageDetection

The non-Tika version may be better, depending on the nature of your input.

Neither processor is covered in the new Apache Solr Reference Guide or in the
current release from Lucid, but see the detailed examples in my book.

-- Jack Krupansky



Re: Dynamic Type For Solr Schema

Posted by Shawn Heisey <so...@elyograg.org>.
On 6/26/2013 8:51 AM, Furkan KAMACI wrote:
> If I get a document that has a "lang" field holds "*tr*" I want that:
> 
> ...
> <field name="content" type="*text_tr*" stored="true" indexed="true"/>
> <field name="title" type="*text_tr*" stored="true" indexed="true"/>

Changing the TYPE of a field based on the contents of another field
isn't possible.  The language detection that has been mentioned in your
other replies makes it possible to direct different languages to
different fields, but won't change the type.
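Under that approach the type never changes; instead each language gets its own field with a fixed type. A minimal schema.xml sketch, assuming the <field>_<lang> names that langid field mapping produces by default (content_tr, title_en, and so on) and assuming lang is a plain string field:

```xml
<!-- Hypothetical schema.xml fragment (inside <fields>): each field keeps one fixed type -->
<!-- Anything ending in _tr is analyzed as Turkish, e.g. content_tr, title_tr -->
<dynamicField name="*_tr" type="text_tr" indexed="true" stored="true"/>
<!-- Anything ending in _en is analyzed as English, e.g. content_en, title_en -->
<dynamicField name="*_en" type="text_en" indexed="true" stored="true"/>
<!-- Language code of the document, "en" or "tr" (string type assumed) -->
<field name="lang" type="string" indexed="true" stored="true"/>
```

Routing decides which field the text lands in at index time; the analyzer applied is then simply whatever that field's type defines.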

Solr is highly dependent on its schema.  The schema is necessarily
fairly static.  This is changing to some degree with the schema REST API
in newer versions, but even with that, types aren't dynamic.  If you
change them, you have to reindex.  Making them dynamic would require a
major rewrite of Solr internals, and it's very likely that nobody would
be able to agree on the criteria used to choose a type.

What you are trying to do could be done by writing a custom Lucene
application, because Lucene has no schema.  Field types are determined
by whatever code you write yourself.  The problem with this approach is
that you have to write ALL the server code, something that you get for
free with Solr.  It would not be a trivial task.

Thanks,
Shawn