Posted to solr-user@lucene.apache.org by Furkan KAMACI <fu...@gmail.com> on 2018/10/03 07:50:49 UTC

Solr 6.6 LanguageDetector

Hi,

I'm using Solr 6.6 and trying to test automatic language detection. I've added
the following configuration to my solrconfig.xml:

    <updateRequestProcessorChain name="langid">
      <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
        <lst name="invariants">
          <str name="langid.fl">content</str>
          <str name="langid.whitelist">en,tr</str>
          <str name="langid.langField">language_code</str>
          <str name="langid.fallback">other</str>
          <bool name="langid.map">true</bool>
          <bool name="langid.map.keepOrig">true</bool>
        </lst>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory" />
      <processor class="solr.RunUpdateProcessorFactory" />
    </updateRequestProcessorChain>
...
  <requestHandler name="/update/extract"
                  startup="lazy"
                  class="solr.extraction.ExtractingRequestHandler" >
    <lst name="defaults">
      <str name="lowernames">true</str>
      <str name="captureAttr">true</str>
      <str name="fmap.meta">ignored_</str>
      <str name="fmap.content">content</str>
      <str name="fmap.div">ignored_</str>
      <str name="fmap.a">ignored_</str>
    </lst>
    <lst name="invariants">
      <str name="update.chain">dedupe</str>
      <str name="update.chain">langid</str>
      <str name="update.chain">ignore-commit-from-client</str>
    </lst>
  </requestHandler>

The content field is populated, but the content_en, content_tr, content_other
and language_code fields are empty.

What am I missing?

Kind Regards,
Furkan KAMACI
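
One thing that may be worth checking in the configuration above, offered as a hedged sketch rather than a confirmed diagnosis: as far as I know, a request runs through a single update chain, so when update.chain is listed three times only one value is likely to take effect, and the langid chain may never run. A minimal sketch of the alternative, assuming the processors from the dedupe and ignore-commit-from-client chains (not shown in this message) are copied into one chain. The chain name "extract-langid" and the placeholder comment are hypothetical:

```xml
<!-- Sketch only: one chain that carries every processor the handler needs.
     The name "extract-langid" is hypothetical. -->
<updateRequestProcessorChain name="extract-langid">
  <!-- Assumption: the processors from the "dedupe" and
       "ignore-commit-from-client" chains would go here; their
       definitions are not shown in the original message. -->
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <lst name="invariants">
      <str name="langid.fl">content</str>
      <str name="langid.whitelist">en,tr</str>
      <str name="langid.langField">language_code</str>
      <str name="langid.fallback">other</str>
      <bool name="langid.map">true</bool>
      <bool name="langid.map.keepOrig">true</bool>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<!-- In the /update/extract handler, reference that single chain once. -->
<lst name="invariants">
  <str name="update.chain">extract-langid</str>
</lst>
```

Keeping a single update.chain entry makes it unambiguous which processors run for documents sent to /update/extract.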

Re: Solr 6.6 LanguageDetector

Posted by Furkan KAMACI <fu...@gmail.com>.
Here is my schema configuration:

   <field name="content" type="text_suggest" indexed="true" stored="true" multiValued="false"/>
   <field name="content_en" type="text_general" stored="true" indexed="true" termVectors="true" termPositions="true" termOffsets="true"/>
   <field name="content_tr" type="text_tr" stored="true" indexed="true" termVectors="true" termPositions="true" termOffsets="true"/>
   <field name="content_other" type="text_general" stored="true" indexed="true" termVectors="true" termPositions="true" termOffsets="true"/>
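
The schema excerpt above does not show a definition for language_code, the field named in langid.langField. If that field is not declared elsewhere in the schema or matched by a dynamic field, a declaration along these lines may be needed. This is a minimal sketch, assuming a field type named "string" exists in this schema:

```xml
<!-- Assumption: "string" is a field type already defined in this schema. -->
<field name="language_code" type="string" indexed="true" stored="true"/>
```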

