Posted to solr-user@lucene.apache.org by Adriano <ad...@gmail.com> on 2017/08/30 20:15:19 UTC

Knn classifier doesn't work

Hello,

I'm trying to use the knn classifier by following this link:
https://wiki.apache.org/solr/SolrClassification

I use this config:

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">classification</str>
  </lst>
</requestHandler>

and 

<updateRequestProcessorChain name="classification">
  <processor class="solr.ClassificationUpdateProcessorFactory">
    <str name="inputFields">Title,Body</str>
    <str name="classField">Tags</str>
    <str name="predictedClassField">Tags</str>
    <str name="algorithm">knn</str>
    <str name="knn.k">20</str>
    <str name="knn.minTf">1</str>
    <str name="knn.minDf">5</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

For the schema.xml:

<field name="Title" type="text_en" indexed="true" stored="true" termVectors="true"/>
<field name="Body" type="text_en" indexed="true" stored="true" termVectors="true"/>
<field name="Tags" type="strings" indexed="true" stored="true" multiValued="true"/>

There's no error, even in the log.

It looks as if the updateRequestProcessorChain is never called. So I tried
the bayes algorithm instead, and this error occurs:

java.lang.NullPointerException
	at org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.assignNormClasses(SimpleNaiveBayesDocumentClassifier.java:116)
	at org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.getClasses(SimpleNaiveBayesDocumentClassifier.java:106)
	at org.apache.solr.update.processor.ClassificationUpdateProcessor.processAdd(ClassificationUpdateProcessor.java:107)
	at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:98)
	at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:180)
	at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:136)
	at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:306)
	at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
	at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:122)
	at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:271)
	at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
	at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:173)
	at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:187)
	at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:108)
	at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:55)
	at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
	at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
	at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
	at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
	at org.eclipse.jetty.server.Server.handle(Server.java:534)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
	at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
	at java.lang.Thread.run(Thread.java:748)

What am I doing wrong? Did I miss something?





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Knn classifier doesn't work

Posted by Tommaso Teofili <to...@gmail.com>.
Hi Alessandro,

yes please, feel free to open a Jira issue. Patches are welcome!

Tommaso

On Mon, 18 Sep 2017 at 14:30, alessandro.benedetti <a.benedetti@sease.io>
wrote:

> Hi Tommaso,
> you are definitely right!
> I see that MultiFields.getTerms can return null:
>
>     if (termsPerLeaf.size() == 0) {
>       return null;
>     }
>
> As you correctly mentioned, this is not handled in:
>
> org/apache/lucene/classification/document/SimpleNaiveBayesDocumentClassifier.java:115
> org/apache/lucene/classification/document/SimpleNaiveBayesDocumentClassifier.java:228
> org/apache/lucene/classification/SimpleNaiveBayesClassifier.java:243
>
> Can you make the change, or should I open a Jira issue and attach the
> simple patch for you to commit?
> Let me know,
>
> Regards

Re: Knn classifier doesn't work

Posted by "alessandro.benedetti" <a....@sease.io>.
Hi Tommaso,
you are definitely right!
I see that MultiFields.getTerms can return null:

    if (termsPerLeaf.size() == 0) {
      return null;
    }

As you correctly mentioned, this is not handled in:

org/apache/lucene/classification/document/SimpleNaiveBayesDocumentClassifier.java:115
org/apache/lucene/classification/document/SimpleNaiveBayesDocumentClassifier.java:228
org/apache/lucene/classification/SimpleNaiveBayesClassifier.java:243

Can you make the change, or should I open a Jira issue and attach the simple
patch for you to commit?
Let me know,

Regards



-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io

Re: Knn classifier doesn't work

Posted by Tommaso Teofili <to...@gmail.com>.
It sounds like none of the docs in your index has the "class" field (in
your case Tags), whereas classification needs some bootstrapping: add some
examples of correctly classified docs to the index beforehand.
On the other hand, the naive Bayes implementation definitely has a bug, as
the MultiFields.getTerms(indexReader, classFieldName) call may return null
(this is handled in SimpleNaiveBayesClassifier but not in
SimpleNaiveBayesDocumentClassifier).
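
To make the bootstrapping step concrete, here is a minimal sketch of an XML
update message that indexes a few already-tagged examples before any
unclassified docs are sent. Only the Title, Body and Tags field names come
from the config in this thread; the id field (assuming your schema uses it as
uniqueKey) and all the values are made up for illustration. It also assumes
that docs which already carry a value in the class field are indexed as they
are, which is worth verifying on your Solr version.

```xml
<add>
  <!-- Hypothetical training examples: Tags is already filled in by hand -->
  <doc>
    <field name="id">train-1</field>
    <field name="Title">How do I sort a list in Java?</field>
    <field name="Body">I have an ArrayList of strings and want it ordered alphabetically.</field>
    <field name="Tags">java</field>
    <field name="Tags">sorting</field>
  </doc>
  <doc>
    <field name="id">train-2</field>
    <field name="Title">Python list comprehension with a condition</field>
    <field name="Body">How can I filter a list while building it in one expression?</field>
    <field name="Tags">python</field>
  </doc>
  <!-- ...more tagged examples, then commit before sending docs without Tags -->
</add>
```

Once such examples are committed, a doc posted without Tags at least has
classified neighbours in the index for the classifier to work from.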

Hope this helps,
Tommaso
