Posted to solr-user@lucene.apache.org by Alessandro Benedetti <ab...@apache.org> on 2016/06/22 15:39:32 UTC

Re: A working example to play with Naive Bayes classifier

Hi Tomas,
first, a consideration:
an empty string is different from a null value.
This is a controversial point, but I would suggest never using the empty string, as
it can cause other side effects.
Apart from that, the plugin will add the class only if the class field has
no value at all:

    Object documentClass = doc.getFieldValue(classFieldName);
    if (documentClass == null) {

That said, I would suggest building a sample index with some
documents and then trying to classify.
If this doesn't solve your issue, I can help you further.
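
The null check quoted above can be illustrated with a minimal, self-contained sketch. It assumes a plain Map as a stand-in for the Solr input document, and needsClassification is a hypothetical helper name that does not exist in Solr. It only shows why an empty string and a missing value are treated differently by a check of this shape.

```java
import java.util.HashMap;
import java.util.Map;

public class ClassFieldCheck {

    // Hypothetical helper mirroring the quoted check: a class is predicted
    // only when the class field has no value at all (null).
    public static boolean needsClassification(Map<String, Object> doc, String classFieldName) {
        Object documentClass = doc.get(classFieldName);
        return documentClass == null;
    }

    public static void main(String[] args) {
        // Document without any cat_s value: eligible for classification.
        Map<String, Object> missing = new HashMap<>();
        missing.put("id", "book7");

        // Document with cat_s set to the empty string: the value is not null.
        Map<String, Object> empty = new HashMap<>(missing);
        empty.put("cat_s", "");

        // Document that already carries a class.
        Map<String, Object> labelled = new HashMap<>(missing);
        labelled.put("cat_s", "fantasy");

        System.out.println(needsClassification(missing, "cat_s"));
        System.out.println(needsClassification(empty, "cat_s"));
        System.out.println(needsClassification(labelled, "cat_s"));
    }
}
```

Run as written, this prints true, false, false: only the document with no cat_s value passes the null check, while the empty string counts as an existing value.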

Cheers

On Wed, Jun 22, 2016 at 3:45 PM, Tomas Ramanauskas <
Tomas.Ramanauskas@springer.com> wrote:

> I also tried this configuration, but couldn't get the feature to work:
>
>
>
>   <initParams path="/update/">
>     <lst name="defaults">
>       <str name="update.chain">classification</str>
>     </lst>
>   </initParams>
>
>
>   <updateRequestProcessorChain name="classification">
>     <processor class="solr.ClassificationUpdateProcessorFactory">
>       <str name="inputFields">title_t,author_s</str>
>       <str name="classField">cat_s</str>
>       <str name="algorithm">bayes</str>
>     </processor>
>   </updateRequestProcessorChain>
>
>
> Tomas
>
> On 22 Jun 2016, at 13:46, Tomas Ramanauskas
> <Tomas.Ramanauskas@springer.com> wrote:
>
> P.S. The version I use:
>
> 6.1.0-68
>
> Also, earlier I said “If I modify an existing record, I think the
> functionality works:”, but I think it doesn’t work for me at all.
>
> $ curl http://localhost:8983/solr/demo/get?id=book1
> {
>   "doc":
>   {
>     "id":"book1",
>     "title_t":["The Way of Kings"],
>     "author_s":"Brandon Sanderson",
>     "cat_s":"fantasy",
>     "pubyear_i":2010,
>     "ISBN_s":"978-0-7653-2635-5",
>     "_version_":1535488016326328320}}
>
> $ curl http://localhost:8983/solr/demo/update -d '
> [
> {"id" : "book1",
> "title_t":["The Way of Kings"],
> "author_s":"Brandon Sanderson",
> "cat_s":"aaa",
> "pubyear_i":2010,
> "ISBN_s":"978-0-7653-2635-5"
> }
> ]'
> {"responseHeader":{"status":0,"QTime":0}}
>
> $ curl http://localhost:8983/solr/demo/get?id=book1
> {
>   "doc":
>   {
>     "id":"book1",
>     "title_t":["The Way of Kings"],
>     "author_s":"Brandon Sanderson",
>     "cat_s":"fantasy",
>     "pubyear_i":2010,
>     "ISBN_s":"978-0-7653-2635-5",
>     "_version_":1535488016326328320}}
>
>
> Tomas
>
>
> On 22 Jun 2016, at 12:47, Tomas Ramanauskas
> <Tomas.Ramanauskas@springer.com> wrote:
>
> Hi, everyone,
>
>
> would someone be able to share a working example (step by step) that
> demonstrates the use of Naive Bayes classifier in Solr?
>
>
> I followed this Blog post:
>
> https://alexbenedetti.blogspot.co.uk/2015/07/solr-document-classification-part-1.html?showComment=1464358093048#c2489902302085000947
>
> And this tutorial:
> http://yonik.com/solr-tutorial/
>
> And this JIRA ticket:
> https://issues.apache.org/jira/browse/SOLR-7739
>
>
>
> So this is my configuration file (only what I added or modified):
>
>   <initParams path="/update/**">
>     <lst name="defaults">
>       <str name="update.chain">classification</str>
>     </lst>
>   </initParams>
>
>
>   <updateRequestProcessorChain name="classification">
>     <processor class="solr.ClassificationUpdateProcessorFactory">
>       <str name="inputFields">title_t,author_s</str>
>       <str name="classField">cat_s</str>
>       <str name="algorithm">bayes</str>
>     </processor>
>   </updateRequestProcessorChain>
>
>
>
> If I modify an existing record, I think the functionality works:
>
>
> $ curl http://localhost:8983/solr/demo/update -d '
> [
> {"id" : "book1",
> "title_t":["The Way of Kings"],
> "author_s":"Brandon Sanderson",
> "cat_s":"",
> "pubyear_i":2010,
> "ISBN_s":"978-0-7653-2635-5"
> }
> ]'
> {"responseHeader":{"status":0,"QTime":8}}
> $ curl http://localhost:8983/solr/demo/get?id=book1
> {
>   "doc":
>   {
>     "id":"book1",
>     "title_t":["The Way of Kings"],
>     "author_s":"Brandon Sanderson",
>     "cat_s":"fantasy",
>     "pubyear_i":2010,
>     "ISBN_s":"978-0-7653-2635-5",
>     "_version_":1535488016326328320}}
>
>
>
>
> If I add a new document, something isn’t quite working:
>
> $ curl http://localhost:8983/solr/demo/update -d '
> [
> {"id" : "book7",
> "title_t":["The Way of Kings"],
> "author_s":"Brandon Sanderson",
> "cat_s":"",
> "pubyear_i":2010,
> "ISBN_s":"978-0-7653-2635-5"
> }
> ]'
> {"responseHeader":{"status":0,"QTime":0}}
> $ curl http://localhost:8983/solr/demo/get?id=book7
> {
>   "doc":null}
>
>
>
>
>
>
>


-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England

Re: A working example to play with Naive Bayes classifier

Posted by koponk <ar...@gdplabs.id>.
Hi, I have a problem implementing this Solr classification.

This is my schema:

<field name="pagetext_mlt" type="text_mlt" indexed="true" stored="true"
required="false" multiValued="false" termVectors="true"/>
<field name="knn_tags" type="string" indexed="true" stored="true"
required="false" multiValued="true"/>
<fieldType name="string" class="solr.StrField" sortMissingLast="true"
docValues="true" useDocValuesAsStored="true"/>

<fieldType name="text_mlt" class="solr.TextField"
positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
words="lang/stopwords_id.txt"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

And this is my solrconfig:

<requestHandler name="/update" class="solr.UpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">classi</str>
    </lst>
  </requestHandler>

  <updateRequestProcessorChain name="classi"> 
    <processor class="solr.ClassificationUpdateProcessorFactory">
      <str name="inputFields">pagetext_mlt</str>
      <str name="classField">knn_tags</str>
      <str name="predictedClassField">prebayes_tags</str>
      <field name="prebayes_tags" type="string" indexed="true" stored="true"
required="false" multiValued="true"/>
      <str name="algorithm">bayes</str> 
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" /> 
  </updateRequestProcessorChain> 


But this is not working. Steps:
1. Insert document A with pagetext_mlt="something A" and knn_tags="aaa"
2. Insert document B with pagetext_mlt="something B" and knn_tags="bbb"
3. Insert document C with pagetext_mlt="something B" and knn_tags=null

The prebayes_tags field is always empty (I can't see it even though the field
is stored). Is there something I am missing?

Thanks,


Alessandro Benedetti wrote
> But how big is your index? Are you expecting Solr to automatically
> classify your documents without any knowledge base?
> Please attach an example of your schema.
> There was a reason I asked for it :)
> It seems related to the fact that we get no tokens from the text analysis.
> 
> Cheers
> 

Re: A working example to play with Naive Bayes classifier

Posted by Alessandro Benedetti <ab...@apache.org>.
But how big is your index? Are you expecting Solr to automatically
classify your documents without any knowledge base?
Please attach an example of your schema.
There was a reason I asked for it :)
It seems related to the fact that we get no tokens from the text analysis.
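
To make the two points above concrete (the classifier needs already-labelled documents to learn from, and the text must produce tokens), here is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing. It is not the Lucene SimpleNaiveBayesDocumentClassifier that appears in the stack trace; the class TinyNaiveBayes, its methods, and the whitespace tokenizer are all assumptions made purely for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class TinyNaiveBayes {
    private final Map<String, Integer> docsPerClass = new HashMap<>();
    private final Map<String, Map<String, Integer>> termCounts = new HashMap<>();
    private final Map<String, Integer> termsPerClass = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Naive whitespace tokenizer (an assumption; Solr uses the field's analyzer).
    static String[] tokenize(String text) {
        String trimmed = text.toLowerCase().trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Adds one labelled document to the "knowledge base".
    public void train(String text, String label) {
        totalDocs++;
        docsPerClass.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = termCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokenize(text)) {
            counts.merge(token, 1, Integer::sum);
            termsPerClass.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    // Returns the most likely class, or empty when there is nothing to learn
    // from or the input text yields no tokens.
    public Optional<String> classify(String text) {
        String[] tokens = tokenize(text);
        if (totalDocs == 0 || vocabulary.isEmpty() || tokens.length == 0) {
            return Optional.empty();
        }
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerClass.keySet()) {
            // log prior + sum of log likelihoods with add-one smoothing
            double score = Math.log((double) docsPerClass.get(label) / totalDocs);
            Map<String, Integer> counts = termCounts.get(label);
            int classTerms = termsPerClass.getOrDefault(label, 0);
            for (String token : tokens) {
                int count = counts.getOrDefault(token, 0);
                score += Math.log((count + 1.0) / (classTerms + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        TinyNaiveBayes nb = new TinyNaiveBayes();
        System.out.println(nb.classify("the way of kings")); // nothing indexed yet
        nb.train("the way of kings epic magic", "fantasy");
        nb.train("starship galaxy robots", "scifi");
        System.out.println(nb.classify("kings and magic"));
    }
}
```

In this sketch an empty index or a text without tokens simply yields no prediction. How the real Solr processor behaves in those situations depends on its implementation and on the analysis chain of the input fields, which is why the schema matters here.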

Cheers

On Fri, Jul 15, 2016 at 12:11 PM, Tomas Ramanauskas <
Tomas.Ramanauskas@springer.com> wrote:

> Hi, Alessandro,
>
> sorry for the delay. What do you mean?
>
>
> As I mentioned earlier, I followed a super simple set of steps.
>
> 1. Download Solr
> 2. Configure classification
> 3. Create some documents using curl over HTTP.
>
>
> Is it difficult to reproduce the steps / problem?
>
>
> Tomas
>
>
>
> > On 23 Jun 2016, at 16:42, Alessandro Benedetti <
> benedetti.alex85@gmail.com> wrote:
> >
> > Can you give an example of your schema, and can you run a simple query
> > against your index? I am curious to see how the input fields are analyzed.
> >
> > Cheers
> >
> > On Wed, Jun 22, 2016 at 6:05 PM, Alessandro Benedetti <
> > benedetti.alex85@gmail.com> wrote:
> >
> >> This is better! At least the classifier is invoked!
> >> How many docs in the index have the class assigned?
> >> Take a look at the stack trace and you should find the cause!
> >> I am on mobile now; I will check the code tomorrow!
> >> Cheers
> >> On 22 Jun 2016 5:26 pm, "Tomas Ramanauskas" <
> >> Tomas.Ramanauskas@springer.com> wrote:
> >>
> >>>
> >>> I also tried with this config (adding **):
> >>>
> >>>
> >>>  <initParams path="/update/**">
> >>>    <lst name="defaults">
> >>>      <str name="update.chain">classification</str>
> >>>    </lst>
> >>>  </initParams>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> And I get the error:
> >>>
> >>>
> >>>
> >>> $ curl http://localhost:8983/solr/demo/update -d '
> >>> [
> >>> {"id" : "book15",
> >>> "title_t":["The Way of Kings"],
> >>> "author_s":"Brandon Sanderson",
> >>> "cat_s": null,
> >>> "pubyear_i":2010,
> >>> "ISBN_s":"978-0-7653-2635-5"
> >>> }
> >>> ]'
> >>>
> >>> {"responseHeader":{"status":500,"QTime":29},"error":{"trace":"java.lang.NullPointerException\n\tat
> >>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.getTokenArray(SimpleNaiveBayesDocumentClassifier.java:202)\n\tat
> >>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.analyzeSeedDocument(SimpleNaiveBayesDocumentClassifier.java:162)\n\tat
> >>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.assignNormClasses(SimpleNaiveBayesDocumentClassifier.java:121)\n\tat
> >>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.assignClass(SimpleNaiveBayesDocumentClassifier.java:81)\n\tat
> >>> org.apache.solr.update.processor.ClassificationUpdateProcessor.processAdd(ClassificationUpdateProcessor.java:94)\n\tat
> >>> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.handleAdds(JsonLoader.java:474)\n\tat
> >>> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:138)\n\tat
> >>> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:114)\n\tat
> >>> org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:77)\n\tat
> >>> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)\n\tat
> >>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)\n\tat
> >>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)\n\tat
> >>> org.apache.solr.core.SolrCore.execute(SolrCore.java:2036)\n\tat
> >>> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:657)\n\tat
> >>> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)\n\tat
> >>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)\n\tat
> >>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)\n\tat
> >>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)\n\tat
> >>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)\n\tat
> >>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
> >>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
> >>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat
> >>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)\n\tat
> >>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)\n\tat
> >>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
> >>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)\n\tat
> >>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
> >>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\n\tat
> >>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\n\tat
> >>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat
> >>> org.eclipse.jetty.server.Server.handle(Server.java:518)\n\tat
> >>> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)\n\tat
> >>> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)\n\tat
> >>> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\n\tat
> >>> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\n\tat
> >>> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat
> >>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)\n\tat
> >>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)\n\tat
> >>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)\n\tat
> >>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)\n\tat
> >>> java.lang.Thread.run(Thread.java:745)\n","code":500}}
> >>>
> >>>
> >>> Tomas
> >>>
> >>>
> >>> On 22 Jun 2016, at 17:22, Tomas Ramanauskas
> >>> <Tomas.Ramanauskas@springer.com> wrote:
> >>>
> >>> Thanks for the response, Alessandro.
> >>>
> >>> I tried this and it didn’t work either:
> >>>
> >>>
> >>>
> >>> $ curl http://localhost:8983/solr/demo/update -d '
> >>> [
> >>> {"id" : "book14",
> >>> "title_t":["The Way of Kings"],
> >>> "author_s":"Brandon Sanderson",
> >>> "cat_s": null,
> >>> "pubyear_i":2010,
> >>> "ISBN_s":"978-0-7653-2635-5"
> >>> }
> >>> ]'
> >>>
> >>> {"responseHeader":{"status":0,"QTime":2}}
> >>>
> >>> $ curl http://localhost:8983/solr/demo/get?id=book14
> >>> {
> >>>  "doc":
> >>>  {
> >>>    "id":"book14",
> >>>    "title_t":["The Way of Kings"],
> >>>    "author_s":"Brandon Sanderson",
> >>>    "pubyear_i":2010,
> >>>    "ISBN_s":"978-0-7653-2635-5",
> >>>    "_version_":1537854598189940736}}
> >>>
> >>>
> >>> I don’t see “cat_s” field in the results at all.
> >>>
> >>>
> >>> Tomas
> >>>
> >>>
> >>> On 22 Jun 2016, at 16:39, Alessandro Benedetti
> >>> <abenedetti@apache.org> wrote:
> >>>
> >>> Hi Tomas,
> >>> first, a consideration:
> >>> an empty string is different from a null value.
> >>> This is a controversial point, but I would suggest never using the empty
> >>> string, as it can cause other side effects.
> >>> Apart from that, the plugin will add the class only if the class field
> >>> has no value at all:
> >>>
> >>> Object documentClass = doc.getFieldValue(classFieldName);
> >>> if (documentClass == null) {
> >>>
> >>> That said, I would suggest building a sample index with some
> >>> documents and then trying to classify.
> >>> If this doesn't solve your issue, I can help you further.
> >>>
> >>> Cheers
> >>>
> >>> On Wed, Jun 22, 2016 at 3:45 PM, Tomas Ramanauskas
> >>> <Tomas.Ramanauskas@springer.com> wrote:
> >>>
> >>> I also tried this configuration, but couldn't get the feature to work:
> >>>
> >>>
> >>>
> >>> <initParams path="/update/">
> >>>   <lst name="defaults">
> >>>     <str name="update.chain">classification</str>
> >>>   </lst>
> >>> </initParams>
> >>>
> >>>
> >>> <updateRequestProcessorChain name="classification">
> >>>   <processor class="solr.ClassificationUpdateProcessorFactory">
> >>>     <str name="inputFields">title_t,author_s</str>
> >>>     <str name="classField">cat_s</str>
> >>>     <str name="algorithm">bayes</str>
> >>>   </processor>
> >>> </updateRequestProcessorChain>
> >>>
> >>>
> >>> Tomas
> >>>
> >>> On 22 Jun 2016, at 13:46, Tomas Ramanauskas
> >>> <Tomas.Ramanauskas@springer.com> wrote:
> >>>
> >>> P.S. The version I use:
> >>>
> >>> 6.1.0-68
> >>>
> >>> Also, earlier I said “If I modify an existing record, I think the
> >>> functionality works:”, but I think it doesn’t work for me at all.
> >>>
> >>> $ curl http://localhost:8983/solr/demo/get?id=book1
> >>> {
> >>> "doc":
> >>> {
> >>>   "id":"book1",
> >>>   "title_t":["The Way of Kings"],
> >>>   "author_s":"Brandon Sanderson",
> >>>   "cat_s":"fantasy",
> >>>   "pubyear_i":2010,
> >>>   "ISBN_s":"978-0-7653-2635-5",
> >>>   "_version_":1535488016326328320}}
> >>>
> >>> $ curl http://localhost:8983/solr/demo/update -d '
> >>> [
> >>> {"id" : "book1",
> >>> "title_t":["The Way of Kings"],
> >>> "author_s":"Brandon Sanderson",
> >>> "cat_s":"aaa",
> >>> "pubyear_i":2010,
> >>> "ISBN_s":"978-0-7653-2635-5"
> >>> }
> >>> ]'
> >>> {"responseHeader":{"status":0,"QTime":0}}
> >>>
> >>> $ curl http://localhost:8983/solr/demo/get?id=book1
> >>> {
> >>> "doc":
> >>> {
> >>>   "id":"book1",
> >>>   "title_t":["The Way of Kings"],
> >>>   "author_s":"Brandon Sanderson",
> >>>   "cat_s":"fantasy",
> >>>   "pubyear_i":2010,
> >>>   "ISBN_s":"978-0-7653-2635-5",
> >>>   "_version_":1535488016326328320}}
> >>>
> >>>
> >>> Tomas
> >>>
> >>>
> >>> On 22 Jun 2016, at 12:47, Tomas Ramanauskas
> >>> <Tomas.Ramanauskas@springer.com> wrote:
> >>>
> >>> Hi, everyone,
> >>>
> >>>
> >>> would someone be able to share a working example (step by step) that
> >>> demonstrates the use of Naive Bayes classifier in Solr?
> >>>
> >>>
> >>> I followed this Blog post:
> >>>
> >>>
> >>>
> >>> https://alexbenedetti.blogspot.co.uk/2015/07/solr-document-classification-part-1.html?showComment=1464358093048#c2489902302085000947
> >>>
> >>> And this tutorial:
> >>> http://yonik.com/solr-tutorial/
> >>>
> >>> And this JIRA ticket:
> >>> https://issues.apache.org/jira/browse/SOLR-7739
> >>>
> >>>
> >>>
> >>> So this is my configuration file (only what I added or modified):
> >>>
> >>> <initParams path="/update/**">
> >>>   <lst name="defaults">
> >>>     <str name="update.chain">classification</str>
> >>>   </lst>
> >>> </initParams>
> >>>
> >>>
> >>> <updateRequestProcessorChain name="classification">
> >>>   <processor class="solr.ClassificationUpdateProcessorFactory">
> >>>     <str name="inputFields">title_t,author_s</str>
> >>>     <str name="classField">cat_s</str>
> >>>     <str name="algorithm">bayes</str>
> >>>   </processor>
> >>> </updateRequestProcessorChain>
> >>>
> >>>
> >>>
> >>> If I modify an existing record, I think the functionality works:
> >>>
> >>>
> >>> $ curl http://localhost:8983/solr/demo/update -d '
> >>> [
> >>> {"id" : "book1",
> >>> "title_t":["The Way of Kings"],
> >>> "author_s":"Brandon Sanderson",
> >>> "cat_s":"",
> >>> "pubyear_i":2010,
> >>> "ISBN_s":"978-0-7653-2635-5"
> >>> }
> >>> ]'
> >>> {"responseHeader":{"status":0,"QTime":8}}
> >>> $ curl http://localhost:8983/solr/demo/get?id=book1
> >>> {
> >>> "doc":
> >>> {
> >>>   "id":"book1",
> >>>   "title_t":["The Way of Kings"],
> >>>   "author_s":"Brandon Sanderson",
> >>>   "cat_s":"fantasy",
> >>>   "pubyear_i":2010,
> >>>   "ISBN_s":"978-0-7653-2635-5",
> >>>   "_version_":1535488016326328320}}
> >>>
> >>>
> >>>
> >>>
> >>> If I add a new document, something isn’t quite working:
> >>>
> >>> $ curl http://localhost:8983/solr/demo/update -d '
> >>> [
> >>> {"id" : "book7",
> >>> "title_t":["The Way of Kings"],
> >>> "author_s":"Brandon Sanderson",
> >>> "cat_s":"",
> >>> "pubyear_i":2010,
> >>> "ISBN_s":"978-0-7653-2635-5"
> >>> }
> >>> ]'
> >>> {"responseHeader":{"status":0,"QTime":0}}
> >>> $ curl http://localhost:8983/solr/demo/get?id=book7
> >>> {
> >>> "doc":null}
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> --------------------------
> >>>
> >>> Benedetti Alessandro
> >>> Visiting card : http://about.me/alessandro_benedetti
> >>>
> >>> "Tyger, tyger burning bright
> >>> In the forests of the night,
> >>> What immortal hand or eye
> >>> Could frame thy fearful symmetry?"
> >>>
> >>> William Blake - Songs of Experience -1794 England
> >>>
> >>>
> >>>
> >
> >
> > --
> > --------------------------
> >
> > Benedetti Alessandro
> > Visiting card - http://about.me/alessandro_benedetti
> > Blog - http://alexbenedetti.blogspot.co.uk
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
>
>


-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England

Re: A working example to play with Naive Bayes classifier

Posted by Tomas Ramanauskas <To...@springer.com>.
Hi, Alessandro,

sorry for the delay. What do you mean?


As I mentioned earlier, I followed a super simple set of steps.

1. Download Solr
2. Configure classification 
3. Create some documents using curl over HTTP.


Is it difficult to reproduce the steps / problem?


Tomas



> On 23 Jun 2016, at 16:42, Alessandro Benedetti <be...@gmail.com> wrote:
> 
> Can you give an example of your schema, and can you run a simple query against
> your index? I am curious to see how the input fields are analyzed.
> 
> Cheers
> 
> On Wed, Jun 22, 2016 at 6:05 PM, Alessandro Benedetti <
> benedetti.alex85@gmail.com> wrote:
> 
>> This is better!  At least the classifier is invoked!
>> How many docs in the index have the class assigned?
>> Take a look at the stack trace and you should find the cause!
>> I am now on mobile, I will check the code tomorrow!
>> Cheers
>> On 22 Jun 2016 5:26 pm, "Tomas Ramanauskas" <
>> Tomas.Ramanauskas@springer.com> wrote:
>> 
>>> 
>>> I also tried with this config (adding **):
>>> 
>>> 
>>>  <initParams path="/update/**">
>>>    <lst name="defaults">
>>>      <str name="update.chain">classification</str>
>>>    </lst>
>>>  </initParams>
>>> 
>>> 
>>> 
>>> 
>>> 
>>> And I get the error:
>>> 
>>> 
>>> 
>>> $ curl http://localhost:8983/solr/demo/update -d '
>>> [
>>> {"id" : "book15",
>>> "title_t":["The Way of Kings"],
>>> "author_s":"Brandon Sanderson",
>>> "cat_s": null,
>>> "pubyear_i":2010,
>>> "ISBN_s":"978-0-7653-2635-5"
>>> }
>>> ]'
>>> {"responseHeader":{"status":500,"QTime":29},"error":{"trace":"java.lang.NullPointerException\n\tat
>>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.getTokenArray(SimpleNaiveBayesDocumentClassifier.java:202)\n\tat
>>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.analyzeSeedDocument(SimpleNaiveBayesDocumentClassifier.java:162)\n\tat
>>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.assignNormClasses(SimpleNaiveBayesDocumentClassifier.java:121)\n\tat
>>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.assignClass(SimpleNaiveBayesDocumentClassifier.java:81)\n\tat
>>> org.apache.solr.update.processor.ClassificationUpdateProcessor.processAdd(ClassificationUpdateProcessor.java:94)\n\tat
>>> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.handleAdds(JsonLoader.java:474)\n\tat
>>> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:138)\n\tat
>>> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:114)\n\tat
>>> org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:77)\n\tat
>>> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)\n\tat
>>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)\n\tat
>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)\n\tat
>>> org.apache.solr.core.SolrCore.execute(SolrCore.java:2036)\n\tat
>>> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:657)\n\tat
>>> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)\n\tat
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)\n\tat
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)\n\tat
>>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)\n\tat
>>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)\n\tat
>>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
>>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
>>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat
>>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)\n\tat
>>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)\n\tat
>>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
>>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)\n\tat
>>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
>>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\n\tat
>>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\n\tat
>>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat
>>> org.eclipse.jetty.server.Server.handle(Server.java:518)\n\tat
>>> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)\n\tat
>>> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)\n\tat
>>> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\n\tat
>>> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\n\tat
>>> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat
>>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)\n\tat
>>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)\n\tat
>>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)\n\tat
>>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)\n\tat
>>> java.lang.Thread.run(Thread.java:745)\n","code":500}}
>>> 
>>> 
>>> Tomas
>>> 
>>> 
>>> On 22 Jun 2016, at 17:22, Tomas Ramanauskas <
>>> Tomas.Ramanauskas@springer.com<ma...@springer.com>>
>>> wrote:
>>> 
>>> Thanks for the response, Alessandro.
>>> 
>>> I tried this and it didn’t work either:
>>> 
>>> 
>>> 
>>> $ curl http://localhost:8983/solr/demo/update -d '
>>> [
>>> {"id" : "book14",
>>> "title_t":["The Way of Kings"],
>>> "author_s":"Brandon Sanderson",
>>> "cat_s": null,
>>> "pubyear_i":2010,
>>> "ISBN_s":"978-0-7653-2635-5"
>>> }
>>> ]'
>>> 
>>> {"responseHeader":{"status":0,"QTime":2}}
>>> 
>>> $ curl http://localhost:8983/solr/demo/get?id=book14
>>> {
>>>  "doc":
>>>  {
>>>    "id":"book14",
>>>    "title_t":["The Way of Kings"],
>>>    "author_s":"Brandon Sanderson",
>>>    "pubyear_i":2010,
>>>    "ISBN_s":"978-0-7653-2635-5",
>>>    "_version_":1537854598189940736}}
>>> 
>>> 
>>> I don’t see “cat_s” field in the results at all.
>>> 
>>> 
>>> Tomas
>>> 
>>> 
>>> On 22 Jun 2016, at 16:39, Alessandro Benedetti <abenedetti@apache.org
>>> <ma...@apache.org>> wrote:
>>> 
>>> Hi Tomas,
>>> first consideration :
>>> an empty string is different from a NULL string.
>>> This is controversial, but I would suggest you never use the empty String,
>>> as this can cause other side effects.
>>> Apart from that, the plugin will add the class only if the class field is
>>> without any value
>>> 
>>> Object documentClass = doc.getFieldValue(classFieldName);
>>> if (documentClass == null) {
>>> 
>>> That said, I would suggest you build a sample index with some
>>> documents and then try to classify.
>>> If this doesn't solve your issue, I can help you further.
>>> 
>>> Cheers
>>> 
>>> On Wed, Jun 22, 2016 at 3:45 PM, Tomas Ramanauskas <
>>> Tomas.Ramanauskas@springer.com<ma...@springer.com>>
>>> wrote:
>>> 
>>> I also tried this configuration, but couldn't get the feature to work:
>>> 
>>> 
>>> 
>>> <initParams path="/update/">
>>>   <lst name="defaults">
>>>     <str name="update.chain">classification</str>
>>>   </lst>
>>> </initParams>
>>> 
>>> 
>>> <updateRequestProcessorChain name="classification">
>>>   <processor class="solr.ClassificationUpdateProcessorFactory">
>>>     <str name="inputFields">title_t,author_s</str>
>>>     <str name="classField">cat_s</str>
>>>     <str name="algorithm">bayes</str>
>>>   </processor>
>>> </updateRequestProcessorChain>
>>> 
>>> 
>>> Tomas
>>> 
>>> On 22 Jun 2016, at 13:46, Tomas Ramanauskas <
>>> Tomas.Ramanauskas@springer.com<mailto:Tomas.Ramanauskas@springer.com
>>>> <ma...@springer.com>>
>>> wrote:
>>> 
>>> P.S. The version I use:
>>> 
>>> 6.1.0-68
>>> 
>>> Also, earlier I said “If I modify an existing record, I think the
>>> functionality works:”, but I think it doesn’t work for me at all.
>>> 
>>> $ curl http://localhost:8983/solr/demo/get?id=book1
>>> {
>>> "doc":
>>> {
>>>   "id":"book1",
>>>   "title_t":["The Way of Kings"],
>>>   "author_s":"Brandon Sanderson",
>>>   "cat_s":"fantasy",
>>>   "pubyear_i":2010,
>>>   "ISBN_s":"978-0-7653-2635-5",
>>>   "_version_":1535488016326328320}}
>>> 
>>> $ curl http://localhost:8983/solr/demo/update -d '
>>> [
>>> {"id" : "book1",
>>> "title_t":["The Way of Kings"],
>>> "author_s":"Brandon Sanderson",
>>> "cat_s":"aaa",
>>> "pubyear_i":2010,
>>> "ISBN_s":"978-0-7653-2635-5"
>>> }
>>> ]'
>>> {"responseHeader":{"status":0,"QTime":0}}
>>> 
>>> $ curl http://localhost:8983/solr/demo/get?id=book1
>>> {
>>> "doc":
>>> {
>>>   "id":"book1",
>>>   "title_t":["The Way of Kings"],
>>>   "author_s":"Brandon Sanderson",
>>>   "cat_s":"fantasy",
>>>   "pubyear_i":2010,
>>>   "ISBN_s":"978-0-7653-2635-5",
>>>   "_version_":1535488016326328320}}
>>> 
>>> 
>>> Tomas
>>> 
>>> 
>>> On 22 Jun 2016, at 12:47, Tomas Ramanauskas <
>>> Tomas.Ramanauskas@springer.com<mailto:Tomas.Ramanauskas@springer.com
>>>> <ma...@springer.com>>
>>> wrote:
>>> 
>>> Hi, everyone,
>>> 
>>> 
>>> would someone be able to share a working example (step by step) that
>>> demonstrates the use of Naive Bayes classifier in Solr?
>>> 
>>> 
>>> I followed this Blog post:
>>> 
>>> 
>>> https://alexbenedetti.blogspot.co.uk/2015/07/solr-document-classification-part-1.html?showComment=1464358093048#c2489902302085000947
>>> 
>>> And this tutorial:
>>> http://yonik.com/solr-tutorial/
>>> 
>>> And this JIRA ticket:
>>> https://issues.apache.org/jira/browse/SOLR-7739
>>> 
>>> 
>>> 
>>> So this is my configuration file (only what I added or modified):
>>> 
>>> <initParams path="/update/**">
>>>   <lst name="defaults">
>>>     <str name="update.chain">classification</str>
>>>   </lst>
>>> </initParams>
>>> 
>>> 
>>> <updateRequestProcessorChain name="classification">
>>>   <processor class="solr.ClassificationUpdateProcessorFactory">
>>>     <str name="inputFields">title_t,author_s</str>
>>>     <str name="classField">cat_s</str>
>>>     <str name="algorithm">bayes</str>
>>>   </processor>
>>> </updateRequestProcessorChain>
>>> 
>>> 
>>> 
>>> If I modify an existing record, I think the functionality works:
>>> 
>>> 
>>> $ curl http://localhost:8983/solr/demo/update -d '
>>> [
>>> {"id" : "book1",
>>> "title_t":["The Way of Kings"],
>>> "author_s":"Brandon Sanderson",
>>> "cat_s":"",
>>> "pubyear_i":2010,
>>> "ISBN_s":"978-0-7653-2635-5"
>>> }
>>> ]'
>>> {"responseHeader":{"status":0,"QTime":8}}
>>> $ curl http://localhost:8983/solr/demo/get?id=book1
>>> {
>>> "doc":
>>> {
>>>   "id":"book1",
>>>   "title_t":["The Way of Kings"],
>>>   "author_s":"Brandon Sanderson",
>>>   "cat_s":"fantasy",
>>>   "pubyear_i":2010,
>>>   "ISBN_s":"978-0-7653-2635-5",
>>>   "_version_":1535488016326328320}}
>>> 
>>> 
>>> 
>>> 
>>> If I add a new document, something isn’t quite working:
>>> 
>>> $ curl http://localhost:8983/solr/demo/update -d '
>>> [
>>> {"id" : "book7",
>>> "title_t":["The Way of Kings"],
>>> "author_s":"Brandon Sanderson",
>>> "cat_s":"",
>>> "pubyear_i":2010,
>>> "ISBN_s":"978-0-7653-2635-5"
>>> }
>>> ]'
>>> {"responseHeader":{"status":0,"QTime":0}}
>>> $ curl http://localhost:8983/solr/demo/get?id=book7
>>> {
>>> "doc":null}
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
> 
> 


Re: A working example to play with Naive Bayes classifier

Posted by Alessandro Benedetti <be...@gmail.com>.
Can you give an example of your schema, and can you run a simple query against
your index? I am curious to see how the input fields are analyzed.

Cheers
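
To make the analysis request concrete, here is a minimal sketch that only builds
the URL to call. It assumes the core from this thread (demo on localhost:8983)
and that Solr's field analysis handler is reachable at /analysis/field with the
analysis.fieldname and analysis.fieldvalue parameters; the class and method
names are made up for illustration. If the handler is not registered in your
solrconfig.xml, the Analysis screen of the admin UI shows the same information.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class FieldAnalysisUrl {

    // Builds a URL that asks Solr to run a sample value through the
    // index-time analysis chain of the given field.
    // Assumption: the handler is exposed at <core>/analysis/field.
    static String buildUrl(String coreUrl, String fieldName, String sampleValue) {
        return coreUrl + "/analysis/field"
                + "?analysis.fieldname=" + encode(fieldName)
                + "&analysis.fieldvalue=" + encode(sampleValue)
                + "&wt=json";
    }

    // URL-encodes a single parameter value as UTF-8.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is supported by every JVM, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Print the URL; pass it to curl or a browser to see the tokens.
        System.out.println(buildUrl("http://localhost:8983/solr/demo",
                "title_t", "The Way of Kings"));
    }
}
```

Running the same request for author_s would show whether that field produces
any tokens at all, which is worth knowing since both fields are listed in
inputFields.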

On Wed, Jun 22, 2016 at 6:05 PM, Alessandro Benedetti <
benedetti.alex85@gmail.com> wrote:

> This is better!  At least the classifier is invoked!
> How many docs in the index have the class assigned?
> Take a look at the stack trace and you should find the cause!
> I am now on mobile, I will check the code tomorrow!
> Cheers
> On 22 Jun 2016 5:26 pm, "Tomas Ramanauskas" <
> Tomas.Ramanauskas@springer.com> wrote:
>
>>
>> I also tried with this config (adding **):
>>
>>
>>   <initParams path="/update/**">
>>     <lst name="defaults">
>>       <str name="update.chain">classification</str>
>>     </lst>
>>   </initParams>
>>
>>
>>
>>
>>
>> And I get the error:
>>
>>
>>
>> $ curl http://localhost:8983/solr/demo/update -d '
>> [
>> {"id" : "book15",
>> "title_t":["The Way of Kings"],
>> "author_s":"Brandon Sanderson",
>> "cat_s": null,
>> "pubyear_i":2010,
>> "ISBN_s":"978-0-7653-2635-5"
>> }
>> ]'
>> {"responseHeader":{"status":500,"QTime":29},"error":{"trace":"java.lang.NullPointerException\n\tat
>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.getTokenArray(SimpleNaiveBayesDocumentClassifier.java:202)\n\tat
>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.analyzeSeedDocument(SimpleNaiveBayesDocumentClassifier.java:162)\n\tat
>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.assignNormClasses(SimpleNaiveBayesDocumentClassifier.java:121)\n\tat
>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.assignClass(SimpleNaiveBayesDocumentClassifier.java:81)\n\tat
>> org.apache.solr.update.processor.ClassificationUpdateProcessor.processAdd(ClassificationUpdateProcessor.java:94)\n\tat
>> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.handleAdds(JsonLoader.java:474)\n\tat
>> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:138)\n\tat
>> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:114)\n\tat
>> org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:77)\n\tat
>> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)\n\tat
>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)\n\tat
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)\n\tat
>> org.apache.solr.core.SolrCore.execute(SolrCore.java:2036)\n\tat
>> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:657)\n\tat
>> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)\n\tat
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)\n\tat
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)\n\tat
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)\n\tat
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)\n\tat
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)\n\tat
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)\n\tat
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)\n\tat
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\n\tat
>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\n\tat
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat
>> org.eclipse.jetty.server.Server.handle(Server.java:518)\n\tat
>> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)\n\tat
>> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)\n\tat
>> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\n\tat
>> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\n\tat
>> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat
>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)\n\tat
>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)\n\tat
>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)\n\tat
>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)\n\tat
>> java.lang.Thread.run(Thread.java:745)\n","code":500}}
>>
>>
>> Tomas
>>
>>
>> On 22 Jun 2016, at 17:22, Tomas Ramanauskas <
>> Tomas.Ramanauskas@springer.com<ma...@springer.com>>
>> wrote:
>>
>> Thanks for the response, Alessandro.
>>
>> I tried this and it didn’t work either:
>>
>>
>>
>> $ curl http://localhost:8983/solr/demo/update -d '
>> [
>> {"id" : "book14",
>> "title_t":["The Way of Kings"],
>> "author_s":"Brandon Sanderson",
>> "cat_s": null,
>> "pubyear_i":2010,
>> "ISBN_s":"978-0-7653-2635-5"
>> }
>> ]'
>>
>> {"responseHeader":{"status":0,"QTime":2}}
>>
>> $ curl http://localhost:8983/solr/demo/get?id=book14
>> {
>>   "doc":
>>   {
>>     "id":"book14",
>>     "title_t":["The Way of Kings"],
>>     "author_s":"Brandon Sanderson",
>>     "pubyear_i":2010,
>>     "ISBN_s":"978-0-7653-2635-5",
>>     "_version_":1537854598189940736}}
>>
>>
>> I don’t see “cat_s” field in the results at all.
>>
>>
>> Tomas
>>
>>
>> On 22 Jun 2016, at 16:39, Alessandro Benedetti <abenedetti@apache.org
>> <ma...@apache.org>> wrote:
>>
>> Hi Tomas,
>> first consideration :
>> an empty string is different from a NULL string.
>> This is controversial, but I would suggest you never use the empty String,
>> as this can cause other side effects.
>> Apart from that, the plugin will add the class only if the class field is
>> without any value
>>
>> Object documentClass = doc.getFieldValue(classFieldName);
>> if (documentClass == null) {
>>
>> That said, I would suggest you build a sample index with some
>> documents and then try to classify.
>> If this doesn't solve your issue, I can help you further.
>>
>> Cheers
>>
>> On Wed, Jun 22, 2016 at 3:45 PM, Tomas Ramanauskas <
>> Tomas.Ramanauskas@springer.com<ma...@springer.com>>
>> wrote:
>>
>> I also tried this configuration, but couldn't get the feature to work:
>>
>>
>>
>>  <initParams path="/update/">
>>    <lst name="defaults">
>>      <str name="update.chain">classification</str>
>>    </lst>
>>  </initParams>
>>
>>
>>  <updateRequestProcessorChain name="classification">
>>    <processor class="solr.ClassificationUpdateProcessorFactory">
>>      <str name="inputFields">title_t,author_s</str>
>>      <str name="classField">cat_s</str>
>>      <str name="algorithm">bayes</str>
>>    </processor>
>>  </updateRequestProcessorChain>
>>
>>
>> Tomas
>>
>> On 22 Jun 2016, at 13:46, Tomas Ramanauskas <
>> Tomas.Ramanauskas@springer.com<mailto:Tomas.Ramanauskas@springer.com
>> ><ma...@springer.com>>
>> wrote:
>>
>> P.S. The version I use:
>>
>> 6.1.0-68
>>
>> Also, earlier I said “If I modify an existing record, I think the
>> functionality works:”, but I think it doesn’t work for me at all.
>>
>> $ curl http://localhost:8983/solr/demo/get?id=book1
>> {
>>  "doc":
>>  {
>>    "id":"book1",
>>    "title_t":["The Way of Kings"],
>>    "author_s":"Brandon Sanderson",
>>    "cat_s":"fantasy",
>>    "pubyear_i":2010,
>>    "ISBN_s":"978-0-7653-2635-5",
>>    "_version_":1535488016326328320}}
>>
>> $ curl http://localhost:8983/solr/demo/update -d '
>> [
>> {"id" : "book1",
>> "title_t":["The Way of Kings"],
>> "author_s":"Brandon Sanderson",
>> "cat_s":"aaa",
>> "pubyear_i":2010,
>> "ISBN_s":"978-0-7653-2635-5"
>> }
>> ]'
>> {"responseHeader":{"status":0,"QTime":0}}
>>
>> $ curl http://localhost:8983/solr/demo/get?id=book1
>> {
>>  "doc":
>>  {
>>    "id":"book1",
>>    "title_t":["The Way of Kings"],
>>    "author_s":"Brandon Sanderson",
>>    "cat_s":"fantasy",
>>    "pubyear_i":2010,
>>    "ISBN_s":"978-0-7653-2635-5",
>>    "_version_":1535488016326328320}}
>>
>>
>> Tomas
>>
>>
>> On 22 Jun 2016, at 12:47, Tomas Ramanauskas <
>> Tomas.Ramanauskas@springer.com<mailto:Tomas.Ramanauskas@springer.com
>> ><ma...@springer.com>>
>> wrote:
>>
>> Hi, everyone,
>>
>>
>> would someone be able to share a working example (step by step) that
>> demonstrates the use of Naive Bayes classifier in Solr?
>>
>>
>> I followed this Blog post:
>>
>>
>> https://alexbenedetti.blogspot.co.uk/2015/07/solr-document-classification-part-1.html?showComment=1464358093048#c2489902302085000947
>>
>> And this tutorial:
>> http://yonik.com/solr-tutorial/
>>
>> And this JIRA ticket:
>> https://issues.apache.org/jira/browse/SOLR-7739
>>
>>
>>
>> So this is my configuration file (only what I added or modified):
>>
>>  <initParams path="/update/**">
>>    <lst name="defaults">
>>      <str name="update.chain">classification</str>
>>    </lst>
>>  </initParams>
>>
>>
>>  <updateRequestProcessorChain name="classification">
>>    <processor class="solr.ClassificationUpdateProcessorFactory">
>>      <str name="inputFields">title_t,author_s</str>
>>      <str name="classField">cat_s</str>
>>      <str name="algorithm">bayes</str>
>>    </processor>
>>  </updateRequestProcessorChain>
>>
>>
>>
>> If I modify an existing record, I think the functionality works:
>>
>>
>> $ curl http://localhost:8983/solr/demo/update -d '
>> [
>> {"id" : "book1",
>> "title_t":["The Way of Kings"],
>> "author_s":"Brandon Sanderson",
>> "cat_s":"",
>> "pubyear_i":2010,
>> "ISBN_s":"978-0-7653-2635-5"
>> }
>> ]'
>> {"responseHeader":{"status":0,"QTime":8}}
>> $ curl http://localhost:8983/solr/demo/get?id=book1
>> {
>>  "doc":
>>  {
>>    "id":"book1",
>>    "title_t":["The Way of Kings"],
>>    "author_s":"Brandon Sanderson",
>>    "cat_s":"fantasy",
>>    "pubyear_i":2010,
>>    "ISBN_s":"978-0-7653-2635-5",
>>    "_version_":1535488016326328320}}
>>
>>
>>
>>
>> If I add a new document, something isn’t quite working:
>>
>> $ curl http://localhost:8983/solr/demo/update -d '
>> [
>> {"id" : "book7",
>> "title_t":["The Way of Kings"],
>> "author_s":"Brandon Sanderson",
>> "cat_s":"",
>> "pubyear_i":2010,
>> "ISBN_s":"978-0-7653-2635-5"
>> }
>> ]'
>> {"responseHeader":{"status":0,"QTime":0}}
>> $ curl http://localhost:8983/solr/demo/get?id=book7
>> {
>>  "doc":null}
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>


-- 
--------------------------

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England

Re: A working example to play with Naive Bayes classifier

Posted by Alessandro Benedetti <be...@gmail.com>.
This is better! At least the classifier is invoked!
How many docs in the index have the class assigned?
Take a look at the stack trace and you should find the cause!
I am on mobile now; I will check the code tomorrow!
Cheers
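
To check how many documents already have the class assigned, a query along
these lines should do. This is only a sketch that builds the request URL; it
assumes the core and class field used in this thread (demo, cat_s), and the
class and method names are made up for illustration. The numFound value in the
response is the number of documents the classifier can learn from.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class TrainingDocCount {

    // Builds a select URL matching every document that has any value in
    // classField. "field:[* TO *]" is the usual Solr idiom for "field exists";
    // rows=0 skips the documents themselves, since only numFound matters.
    static String buildUrl(String coreUrl, String classField) {
        String query = classField + ":[* TO *]";
        try {
            return coreUrl + "/select?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&rows=0&wt=json";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is supported by every JVM, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Print the URL; pass it to curl to read numFound from the response.
        System.out.println(buildUrl("http://localhost:8983/solr/demo", "cat_s"));
    }
}
```

If numFound comes back as 0 or very small, there is little or nothing for the
classifier to train on.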
On 22 Jun 2016 5:26 pm, "Tomas Ramanauskas" <To...@springer.com>
wrote:

>
> I also tried with this config (adding **):
>
>
>   <initParams path="/update/**">
>     <lst name="defaults">
>       <str name="update.chain">classification</str>
>     </lst>
>   </initParams>
>
>
>
>
>
> And I get the error:
>
>
>
> $ curl http://localhost:8983/solr/demo/update -d '
> [
> {"id" : "book15",
> "title_t":["The Way of Kings"],
> "author_s":"Brandon Sanderson",
> "cat_s": null,
> "pubyear_i":2010,
> "ISBN_s":"978-0-7653-2635-5"
> }
> ]'
> {"responseHeader":{"status":500,"QTime":29},"error":{"trace":"java.lang.NullPointerException\n\tat
> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.getTokenArray(SimpleNaiveBayesDocumentClassifier.java:202)\n\tat
> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.analyzeSeedDocument(SimpleNaiveBayesDocumentClassifier.java:162)\n\tat
> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.assignNormClasses(SimpleNaiveBayesDocumentClassifier.java:121)\n\tat
> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.assignClass(SimpleNaiveBayesDocumentClassifier.java:81)\n\tat
> org.apache.solr.update.processor.ClassificationUpdateProcessor.processAdd(ClassificationUpdateProcessor.java:94)\n\tat
> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.handleAdds(JsonLoader.java:474)\n\tat
> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:138)\n\tat
> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:114)\n\tat
> org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:77)\n\tat
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)\n\tat
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)\n\tat
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)\n\tat
> org.apache.solr.core.SolrCore.execute(SolrCore.java:2036)\n\tat
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:657)\n\tat
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)\n\tat
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)\n\tat
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)\n\tat
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)\n\tat
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\n\tat
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\n\tat
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat
> org.eclipse.jetty.server.Server.handle(Server.java:518)\n\tat
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)\n\tat
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)\n\tat
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\n\tat
> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\n\tat
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)\n\tat
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)\n\tat
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)\n\tat
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)\n\tat
> java.lang.Thread.run(Thread.java:745)\n","code":500}}
>
>
> Tomas
>
>
> On 22 Jun 2016, at 17:22, Tomas Ramanauskas
> <Tomas.Ramanauskas@springer.com> wrote:
>
> Thanks for the response, Alessandro.
>
> I tried this and it didn’t work either:
>
>
>
> $ curl http://localhost:8983/solr/demo/update -d '
> [
> {"id" : "book14",
> "title_t":["The Way of Kings"],
> "author_s":"Brandon Sanderson",
> "cat_s": null,
> "pubyear_i":2010,
> "ISBN_s":"978-0-7653-2635-5"
> }
> ]'
>
> {"responseHeader":{"status":0,"QTime":2}}
>
> $ curl http://localhost:8983/solr/demo/get?id=book14
> {
>   "doc":
>   {
>     "id":"book14",
>     "title_t":["The Way of Kings"],
>     "author_s":"Brandon Sanderson",
>     "pubyear_i":2010,
>     "ISBN_s":"978-0-7653-2635-5",
>     "_version_":1537854598189940736}}
>
>
> I don’t see “cat_s” field in the results at all.
>
>
> Tomas
>
>
> On 22 Jun 2016, at 16:39, Alessandro Benedetti
> <abenedetti@apache.org> wrote:
>
> Hi Tomas,
> first consideration:
> an empty string is different from a NULL string.
> This is controversial, but I would suggest never using the empty string,
> as it can cause other side effects.
> Apart from that, the plugin will add the class only if the class field
> has no value at all:
>
> Object documentClass = doc.getFieldValue(classFieldName);
> if (documentClass == null) {
>
> That said, I would suggest building a sample index with some
> documents and then trying to classify.
> If this doesn't solve your issue, I can help you further.
>
> Cheers
>
> On Wed, Jun 22, 2016 at 3:45 PM, Tomas Ramanauskas
> <Tomas.Ramanauskas@springer.com> wrote:
>
> I also tried this configuration, but couldn't get the feature to work:
>
>
>
>  <initParams path="/update/">
>    <lst name="defaults">
>      <str name="update.chain">classification</str>
>    </lst>
>  </initParams>
>
>
>  <updateRequestProcessorChain name="classification">
>    <processor class="solr.ClassificationUpdateProcessorFactory">
>      <str name="inputFields">title_t,author_s</str>
>      <str name="classField">cat_s</str>
>      <str name="algorithm">bayes</str>
>    </processor>
>  </updateRequestProcessorChain>
>
>
> Tomas
>
> On 22 Jun 2016, at 13:46, Tomas Ramanauskas
> <Tomas.Ramanauskas@springer.com> wrote:
>
> P.S. The version I use:
>
> 6.1.0-68
>
> Also, earlier I said “If I modify an existing record, I think the
> functionality works:”, but I think it doesn’t work for me at all.
>
> $ curl http://localhost:8983/solr/demo/get?id=book1
> {
>  "doc":
>  {
>    "id":"book1",
>    "title_t":["The Way of Kings"],
>    "author_s":"Brandon Sanderson",
>    "cat_s":"fantasy",
>    "pubyear_i":2010,
>    "ISBN_s":"978-0-7653-2635-5",
>    "_version_":1535488016326328320}}
>
> $ curl http://localhost:8983/solr/demo/update -d '
> [
> {"id" : "book1",
> "title_t":["The Way of Kings"],
> "author_s":"Brandon Sanderson",
> "cat_s":"aaa",
> "pubyear_i":2010,
> "ISBN_s":"978-0-7653-2635-5"
> }
> ]'
> {"responseHeader":{"status":0,"QTime":0}}
>
> $ curl http://localhost:8983/solr/demo/get?id=book1
> {
>  "doc":
>  {
>    "id":"book1",
>    "title_t":["The Way of Kings"],
>    "author_s":"Brandon Sanderson",
>    "cat_s":"fantasy",
>    "pubyear_i":2010,
>    "ISBN_s":"978-0-7653-2635-5",
>    "_version_":1535488016326328320}}
>
>
> Tomas
>
>
> On 22 Jun 2016, at 12:47, Tomas Ramanauskas
> <Tomas.Ramanauskas@springer.com> wrote:
>
> Hi, everyone,
>
>
> would someone be able to share a working example (step by step) that
> demonstrates the use of Naive Bayes classifier in Solr?
>
>
> I followed this Blog post:
>
>
> https://alexbenedetti.blogspot.co.uk/2015/07/solr-document-classification-part-1.html?showComment=1464358093048#c2489902302085000947
>
> And this tutorial:
> http://yonik.com/solr-tutorial/
>
> And this JIRA ticket:
> https://issues.apache.org/jira/browse/SOLR-7739
>
>
>
> So this is my configuration file (only what I added or modified):
>
>  <initParams path="/update/**">
>    <lst name="defaults">
>      <str name="update.chain">classification</str>
>    </lst>
>  </initParams>
>
>
>  <updateRequestProcessorChain name="classification">
>    <processor class="solr.ClassificationUpdateProcessorFactory">
>      <str name="inputFields">title_t,author_s</str>
>      <str name="classField">cat_s</str>
>      <str name="algorithm">bayes</str>
>    </processor>
>  </updateRequestProcessorChain>
>
>
>
> If I modify an existing record, I think the functionality works:
>
>
> $ curl http://localhost:8983/solr/demo/update -d '
> [
> {"id" : "book1",
> "title_t":["The Way of Kings"],
> "author_s":"Brandon Sanderson",
> "cat_s":"",
> "pubyear_i":2010,
> "ISBN_s":"978-0-7653-2635-5"
> }
> ]'
> {"responseHeader":{"status":0,"QTime":8}}
> $ curl http://localhost:8983/solr/demo/get?id=book1
> {
>  "doc":
>  {
>    "id":"book1",
>    "title_t":["The Way of Kings"],
>    "author_s":"Brandon Sanderson",
>    "cat_s":"fantasy",
>    "pubyear_i":2010,
>    "ISBN_s":"978-0-7653-2635-5",
>    "_version_":1535488016326328320}}
>
>
>
>
> If I add a new document, something isn’t quite working:
>
> $ curl http://localhost:8983/solr/demo/update -d '
> [
> {"id" : "book7",
> "title_t":["The Way of Kings"],
> "author_s":"Brandon Sanderson",
> "cat_s":"",
> "pubyear_i":2010,
> "ISBN_s":"978-0-7653-2635-5"
> }
> ]'
> {"responseHeader":{"status":0,"QTime":0}}
> $ curl http://localhost:8983/solr/demo/get?id=book7
> {
>  "doc":null}
>
>
>
>
>
>
>
>
>
> --
> --------------------------
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>
>
>

Re: A working example to play with Naive Bayes classifier

Posted by Tomas Ramanauskas <To...@springer.com>.
I also tried with this config (adding **):


  <initParams path="/update/**">
    <lst name="defaults">
      <str name="update.chain">classification</str>
    </lst>
  </initParams>





And I get the error:



$ curl http://localhost:8983/solr/demo/update -d '
[
{"id" : "book15",
"title_t":["The Way of Kings"],
"author_s":"Brandon Sanderson",
"cat_s": null,
"pubyear_i":2010,
"ISBN_s":"978-0-7653-2635-5"
}
]'
{"responseHeader":{"status":500,"QTime":29},"error":{"trace":"java.lang.NullPointerException\n\tat org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.getTokenArray(SimpleNaiveBayesDocumentClassifier.java:202)\n\tat org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.analyzeSeedDocument(SimpleNaiveBayesDocumentClassifier.java:162)\n\tat org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.assignNormClasses(SimpleNaiveBayesDocumentClassifier.java:121)\n\tat org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.assignClass(SimpleNaiveBayesDocumentClassifier.java:81)\n\tat org.apache.solr.update.processor.ClassificationUpdateProcessor.processAdd(ClassificationUpdateProcessor.java:94)\n\tat org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.handleAdds(JsonLoader.java:474)\n\tat org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:138)\n\tat org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:114)\n\tat org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:77)\n\tat org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)\n\tat org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)\n\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:2036)\n\tat org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:657)\n\tat org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)\n\tat 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:518)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\n\tat org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)\n\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)\n\tat java.lang.Thread.run(Thread.java:745)\n","code":500}}


Tomas



Re: A working example to play with Naive Bayes classifier

Posted by Tomas Ramanauskas <To...@springer.com>.
Thanks for the response, Alessandro.

I tried this and it didn’t work either:



$ curl http://localhost:8983/solr/demo/update -d '
[
{"id" : "book14",
"title_t":["The Way of Kings"],
"author_s":"Brandon Sanderson",
"cat_s": null,
"pubyear_i":2010,
"ISBN_s":"978-0-7653-2635-5"
}
]'

{"responseHeader":{"status":0,"QTime":2}}

$ curl http://localhost:8983/solr/demo/get?id=book14
{
  "doc":
  {
    "id":"book14",
    "title_t":["The Way of Kings"],
    "author_s":"Brandon Sanderson",
    "pubyear_i":2010,
    "ISBN_s":"978-0-7653-2635-5",
    "_version_":1537854598189940736}}


I don’t see “cat_s” field in the results at all.


Tomas


On 22 Jun 2016, at 16:39, Alessandro Benedetti <ab...@apache.org> wrote:

Hi Tomas,
first consideration:
an empty string is different from a NULL string.
This is controversial, but I would suggest never using the empty string,
as it can cause other side effects.
Apart from that, the plugin will add the class only if the class field
has no value at all:

Object documentClass = doc.getFieldValue(classFieldName);
if (documentClass == null) {
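
To make the difference concrete, here is a minimal sketch of that null test. It is not the plugin's code: the class ClassFieldCheck, the method needsClassification and the plain Map standing in for the Solr document are all assumptions made for illustration. It only shows that a field holding "" counts as having a value, while a missing field does not.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: a plain field-name -> value map plays the role of the document.
final class ClassFieldCheck {

    // Same shape as the quoted test: classify only when the class field has no value at all.
    static boolean needsClassification(Map<String, Object> doc, String classFieldName) {
        Object documentClass = doc.get(classFieldName);
        return documentClass == null;
    }

    public static void main(String[] args) {
        // Document without any cat_s field.
        Map<String, Object> missing = new HashMap<>();
        missing.put("id", "book14");

        // Document whose cat_s field is the empty string.
        Map<String, Object> empty = new HashMap<>();
        empty.put("id", "book7");
        empty.put("cat_s", "");

        System.out.println(needsClassification(missing, "cat_s")); // true
        System.out.println(needsClassification(empty, "cat_s"));   // false
    }
}
```

With a check of this shape, a document sent with "cat_s":"" would be treated as already classified and skipped.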

That said, I would suggest building a sample index with some
documents and then trying to classify.
If this doesn't solve your issue, I can help you further.
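
The reason a sample index matters is that a Naive Bayes classifier learns from documents that already carry a class. The sketch below is a toy, self-contained multinomial Naive Bayes with add-one smoothing, written only to illustrate that dependency. It is not Lucene's SimpleNaiveBayesDocumentClassifier, and the class ToyNaiveBayes, its methods and the second training document are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy multinomial Naive Bayes with add-one smoothing; NOT Lucene's implementation.
final class ToyNaiveBayes {
    private final Map<String, Integer> docsPerClass = new HashMap<>();
    private final Map<String, Map<String, Integer>> termCounts = new HashMap<>();
    private final Map<String, Integer> tokensPerClass = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Add one labelled training document (the role played by the sample index).
    void train(String text, String label) {
        totalDocs++;
        docsPerClass.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = termCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokenize(text)) {
            counts.merge(token, 1, Integer::sum);
            tokensPerClass.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    // Return the label with the highest log-probability, or null if nothing was trained.
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerClass.keySet()) {
            // Log prior: share of training documents carrying this label.
            double score = Math.log((double) docsPerClass.get(label) / totalDocs);
            Map<String, Integer> counts = termCounts.get(label);
            int classTokens = tokensPerClass.getOrDefault(label, 0);
            for (String token : tokenize(text)) {
                int count = counts.getOrDefault(token, 0);
                // Add-one smoothing keeps unseen tokens from zeroing the product.
                score += Math.log((count + 1.0) / (classTokens + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    private static String[] tokenize(String text) {
        return text.toLowerCase().trim().split("\\s+");
    }

    public static void main(String[] args) {
        ToyNaiveBayes nb = new ToyNaiveBayes();
        System.out.println(nb.classify("The Way of Kings")); // null: nothing to learn from yet

        nb.train("the way of kings brandon sanderson", "fantasy");
        nb.train("a practical guide to gardening", "howto"); // hypothetical sample document
        System.out.println(nb.classify("kings of the way")); // fantasy
    }
}
```

With no labelled documents the toy classifier has no classes to choose from, which is the situation an empty or unlabelled index would put any such classifier in.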

Cheers

On Wed, Jun 22, 2016 at 3:45 PM, Tomas Ramanauskas
<Tomas.Ramanauskas@springer.com> wrote:

I also tried this configuration, but couldn't get the feature to work:



 <initParams path="/update/">
   <lst name="defaults">
     <str name="update.chain">classification</str>
   </lst>
 </initParams>


 <updateRequestProcessorChain name="classification">
   <processor class="solr.ClassificationUpdateProcessorFactory">
     <str name="inputFields">title_t,author_s</str>
     <str name="classField">cat_s</str>
     <str name="algorithm">bayes</str>
   </processor>
 </updateRequestProcessorChain>


Tomas

On 22 Jun 2016, at 13:46, Tomas Ramanauskas
<Tomas.Ramanauskas@springer.com> wrote:

P.S. The version I use:

6.1.0-68

Also, earlier I said “If I modify an existing record, I think the
functionality works:”, but I think it doesn’t work for me at all.

$ curl http://localhost:8983/solr/demo/get?id=book1
{
 "doc":
 {
   "id":"book1",
   "title_t":["The Way of Kings"],
   "author_s":"Brandon Sanderson",
   "cat_s":"fantasy",
   "pubyear_i":2010,
   "ISBN_s":"978-0-7653-2635-5",
   "_version_":1535488016326328320}}

$ curl http://localhost:8983/solr/demo/update -d '
[
{"id" : "book1",
"title_t":["The Way of Kings"],
"author_s":"Brandon Sanderson",
"cat_s":"aaa",
"pubyear_i":2010,
"ISBN_s":"978-0-7653-2635-5"
}
]'
{"responseHeader":{"status":0,"QTime":0}}

$ curl http://localhost:8983/solr/demo/get?id=book1
{
 "doc":
 {
   "id":"book1",
   "title_t":["The Way of Kings"],
   "author_s":"Brandon Sanderson",
   "cat_s":"fantasy",
   "pubyear_i":2010,
   "ISBN_s":"978-0-7653-2635-5",
   "_version_":1535488016326328320}}


Tomas


On 22 Jun 2016, at 12:47, Tomas Ramanauskas
<Tomas.Ramanauskas@springer.com> wrote:

Hi, everyone,


would someone be able to share a working example (step by step) that
demonstrates the use of Naive Bayes classifier in Solr?


I followed this Blog post:

https://alexbenedetti.blogspot.co.uk/2015/07/solr-document-classification-part-1.html?showComment=1464358093048#c2489902302085000947

And this tutorial:
http://yonik.com/solr-tutorial/

And this JIRA ticket:
https://issues.apache.org/jira/browse/SOLR-7739



So this is my configuration file (only what I added or modified):

 <initParams path="/update/**">
   <lst name="defaults">
     <str name="update.chain">classification</str>
   </lst>
 </initParams>


 <updateRequestProcessorChain name="classification">
   <processor class="solr.ClassificationUpdateProcessorFactory">
     <str name="inputFields">title_t,author_s</str>
     <str name="classField">cat_s</str>
     <str name="algorithm">bayes</str>
   </processor>
 </updateRequestProcessorChain>



If I modify an existing record, I think the functionality works:


$ curl http://localhost:8983/solr/demo/update -d '
[
{"id" : "book1",
"title_t":["The Way of Kings"],
"author_s":"Brandon Sanderson",
"cat_s":"",
"pubyear_i":2010,
"ISBN_s":"978-0-7653-2635-5"
}
]'
{"responseHeader":{"status":0,"QTime":8}}
$ curl http://localhost:8983/solr/demo/get?id=book1
{
 "doc":
 {
   "id":"book1",
   "title_t":["The Way of Kings"],
   "author_s":"Brandon Sanderson",
   "cat_s":"fantasy",
   "pubyear_i":2010,
   "ISBN_s":"978-0-7653-2635-5",
   "_version_":1535488016326328320}}




If I add a new document, something isn’t quite working:

$ curl http://localhost:8983/solr/demo/update -d '
[
{"id" : "book7",
"title_t":["The Way of Kings"],
"author_s":"Brandon Sanderson",
"cat_s":"",
"pubyear_i":2010,
"ISBN_s":"978-0-7653-2635-5"
}
]'
{"responseHeader":{"status":0,"QTime":0}}
$ curl http://localhost:8983/solr/demo/get?id=book7
{
 "doc":null}









--
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England