Posted to solr-user@lucene.apache.org by Susheel Kumar <su...@gmail.com> on 2017/02/07 19:14:13 UTC

alerting system with Solr's Streaming Expressions

Hello,

I tried to follow http://joelsolr.blogspot.com/ to see if we can
classify positive and negative feedback using streaming expressions.
Everything works, but in the end result the probability_d value from the
classify expression is similar for positive and negative feedback. See below.

What am I missing here? Do I need to put more data in the training set, or
is it something else?


{ "result-set": { "docs": [
  { "body_txt": [ "love the company" ],
    "score_d": 2.1892474120319667,
    "id": "6",
    "probability_d": 0.977944433135261 },
  { "body_txt": [ "bad experience " ],
    "score_d": 3.1689453250842914,
    "id": "5",
    "probability_d": 0.9888109278133054 },
  { "body_txt": [ "This company rewards its employees, but you should only work here if you truly love sales. The stress of the job can get to you and they definitely push you." ],
    "score_d": 4.621702323888672,
    "id": "4",
    "probability_d": 0.9999999999898557 },
  { "body_txt": [ "no chance for advancement with that company every year I was there it got worse I don't know if all branches of adp but Florence organization was turn over rate would be higher if it was for temp workers" ],
    "score_d": 5.288898825826228,
    "id": "3",
    "probability_d": 0.9999999999999956 },
  { "body_txt": [ "It was a pleasure to work at the Milpitas campus. The team that works there are professional and dedicated individuals. The level of loyalty and dedication is impressive" ],
    "score_d": 2.5303947056922937,
    "id": "2",
    "probability_d": 0.9999990430778418 },

Re: alerting system with Solr's Streaming Expressions

Posted by Susheel Kumar <su...@gmail.com>.
Hello Joel,

I took a bigger trainingSet of around 200K documents (Amazon reviews) and it
worked out well.  I verified the feature terms extracted, and the classify
function was able to output the correct probability of reviews being negative
or positive.  Big thanks for adding this.

I wonder what you plan to implement next toward NLU in Solr, where
queries like "average revenue in last quarter" etc. could be converted to
streaming functions to return appropriate results.

Thanks,
Susheel


On Thu, Feb 9, 2017 at 11:23 AM, Susheel Kumar <su...@gmail.com>
wrote:

> got it, Thanks, Joel.

Re: alerting system with Solr's Streaming Expressions

Posted by Susheel Kumar <su...@gmail.com>.
Got it, thanks, Joel.

On Thu, Feb 9, 2017 at 11:17 AM, Susheel Kumar <su...@gmail.com>
wrote:

> I increased from 250 to 2500 and 100 to 1000 when I didn't get the expected
> result.  Let me add more examples.
>
> Thanks,
> Susheel

Re: alerting system with Solr's Streaming Expressions

Posted by Susheel Kumar <su...@gmail.com>.
I increased from 250 to 2500 and 100 to 1000 when I didn't get the expected
result.  Let me add more examples.

Thanks,
Susheel

On Thu, Feb 9, 2017 at 11:03 AM, Joel Bernstein <jo...@gmail.com> wrote:

> A few things that I see right off:
>
> 1) 2500 terms is too many. I was testing with 100-250 terms.
> 2) 1000 iterations is too high. If the model hasn't converged by 100
> iterations, it's likely not going to converge.
> 3) You're going to need more examples. You may want to run features first
> and see what it selects. Then you need multiple examples for each feature.
> I was testing with the Enron ham/spam data set. It would be good to
> download that dataset and see what it looks like.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/

Re: alerting system with Solr's Streaming Expressions

Posted by Joel Bernstein <jo...@gmail.com>.
Also, you can see in the final iteration of the model that there are 8 true
positives and 8 false positives. So this model classifies everything as
positive. At that point you know that it's not a good model.
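
To make that reasoning concrete, here is a minimal Python sketch. The
function name confusion_metrics is hypothetical, and the zero true-negative
and false-negative counts are an inference from the totals in this thread
(16 training documents, 8 true positives, 8 false positives), not values
read from the model output.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, recall and specificity for binary counts."""
    def ratio(numerator, denominator):
        # Guard against empty denominators so degenerate models do not crash.
        return numerator / denominator if denominator else 0.0

    total = tp + fp + tn + fn
    return {
        "accuracy": ratio(tp + tn, total),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Counts from the thread: 8 true positives and 8 false positives.
# tn=0 and fn=0 are assumed, since all 16 documents are accounted for.
metrics = confusion_metrics(tp=8, fp=8, tn=0, fn=0)
print(metrics)
```

A recall of 1.0 together with a specificity of 0.0 is the signature of a
model that labels every document positive, so its 0.5 accuracy on a balanced
set is no better than a constant guess.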

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Feb 9, 2017 at 11:03 AM, Joel Bernstein <jo...@gmail.com> wrote:

> A few things that I see right off:
>
> 1) 2500 terms is too many. I was testing with 100-250 terms.
> 2) 1000 iterations is too high. If the model hasn't converged by 100
> iterations, it's likely not going to converge.
> 3) You're going to need more examples. You may want to run features first
> and see what it selects. Then you need multiple examples for each feature.
> I was testing with the Enron ham/spam data set. It would be good to
> download that dataset and see what it looks like.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/

Re: alerting system with Solr's Streaming Expressions

Posted by Joel Bernstein <jo...@gmail.com>.
A few things that I see right off:

1) 2500 terms is too many. I was testing with 100-250 terms.
2) 1000 iterations is too high. If the model hasn't converged by 100
iterations, it's likely not going to converge.
3) You're going to need more examples. You may want to run features first
and see what it selects. Then you need multiple examples for each feature.
I was testing with the Enron ham/spam data set. It would be good to
download that dataset and see what it looks like.
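
The points above can be sketched as a small Python helper that rebuilds the
expression from this thread with smaller values. The collection, field, and
model names are copied from the posted expression. The helper names
features_expression and train_expression are hypothetical, and the defaults
of 250 terms and 100 iterations are simply the upper ends of the ranges
mentioned above, not tuned values.

```python
def features_expression(num_terms: int = 250) -> str:
    """Build the features() call from the thread with a smaller term count."""
    return (
        "features(trainingSet,\n"
        '         q="*:*",\n'
        '         featureSet="threatFeatures",\n'
        '         field="body_txt",\n'
        '         outcome="out_i",\n'
        f"         numTerms={num_terms})"
    )


def train_expression(num_terms: int = 250, max_iterations: int = 100) -> str:
    """Wrap features() in the update(train(...)) call used in the thread."""
    # Indent the nested features() call so it lines up inside train().
    features = features_expression(num_terms).replace("\n", "\n             ")
    return (
        "update(models,\n"
        '       batchSize="50",\n'
        "       train(trainingSet,\n"
        f"             {features},\n"
        '             q="*:*",\n'
        '             name="threatModel",\n'
        '             field="body_txt",\n'
        '             outcome="out_i",\n'
        f'             maxIterations="{max_iterations}"))'
    )


# Inspect the selected terms first, then train with the same settings.
print(features_expression())
print(train_expression())
```

Running the features() expression on its own first shows which terms are
selected before any training happens. Either string can then be sent as the
expr parameter to the collection's /stream handler.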

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Feb 9, 2017 at 10:15 AM, Susheel Kumar <su...@gmail.com>
wrote:

> Hello Joel,
>
> Here is the final iteration in JSON format:
>
> https://www.dropbox.com/s/g3a3606ms6cu8q4/final_iteration.json?dl=0
>
> Below is the expression used:
>
> update(models,
>              batchSize="50",
>              train(trainingSet,
>                       features(trainingSet,
>                                      q="*:*",
>                                      featureSet="threatFeatures",
>                                      field="body_txt",
>                                      outcome="out_i",
>                                      numTerms=2500),
>                       q="*:*",
>                       name="threatModel",
>                       field="body_txt",
>                       outcome="out_i",
>                       maxIterations="1000"))
>
> I have just 16 documents, 8 positive and 8 negative. The field that contains
> the feedback is body_txt (text_general type).
>
> Thanks for looking.

Re: alerting system with Solr's Streaming Expressions

Posted by Susheel Kumar <su...@gmail.com>.
Hello Joel,

Here is the final iteration in JSON format:

https://www.dropbox.com/s/g3a3606ms6cu8q4/final_iteration.json?dl=0

Below is the expression used:

update(models,
             batchSize="50",
             train(trainingSet,
                      features(trainingSet,
                                     q="*:*",
                                     featureSet="threatFeatures",
                                     field="body_txt",
                                     outcome="out_i",
                                     numTerms=2500),
                      q="*:*",
                      name="threatModel",
                      field="body_txt",
                      outcome="out_i",
                      maxIterations="1000"))

I have just 16 documents, 8 positive and 8 negative. The field that contains
the feedback is body_txt (text_general type).

Thanks for looking.



On Wed, Feb 8, 2017 at 7:52 AM, Joel Bernstein <jo...@gmail.com> wrote:

> Can you post the final iteration of the model?
>
> Also the expression you used to train the model?
>
> How much training data do you have? How many positive examples and negative
> examples?
>
> Joel Bernstein
> http://joelsolr.blogspot.com/

Re: alerting system with Solr's Streaming Expressions

Posted by Joel Bernstein <jo...@gmail.com>.
Can you post the final iteration of the model?

Also the expression you used to train the model?

How much training data do you have? How many positive examples and negative
examples?

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Feb 7, 2017 at 2:14 PM, Susheel Kumar <su...@gmail.com> wrote:

> Hello,
>
> I tried to follow http://joelsolr.blogspot.com/ to see if we can
> classify positive and negative feedback using streaming expressions.
> Everything works, but in the end result the probability_d value from the
> classify expression is similar for positive and negative feedback. See below.
>
> What am I missing here? Do I need to put more data in the training set, or
> is it something else?
>
>
> { "result-set": { "docs": [
>   { "body_txt": [ "love the company" ],
>     "score_d": 2.1892474120319667,
>     "id": "6",
>     "probability_d": 0.977944433135261 },
>   { "body_txt": [ "bad experience " ],
>     "score_d": 3.1689453250842914,
>     "id": "5",
>     "probability_d": 0.9888109278133054 },
>   { "body_txt": [ "This company rewards its employees, but you should only work here if you truly love sales. The stress of the job can get to you and they definitely push you." ],
>     "score_d": 4.621702323888672,
>     "id": "4",
>     "probability_d": 0.9999999999898557 },
>   { "body_txt": [ "no chance for advancement with that company every year I was there it got worse I don't know if all branches of adp but Florence organization was turn over rate would be higher if it was for temp workers" ],
>     "score_d": 5.288898825826228,
>     "id": "3",
>     "probability_d": 0.9999999999999956 },
>   { "body_txt": [ "It was a pleasure to work at the Milpitas campus. The team that works there are professional and dedicated individuals. The level of loyalty and dedication is impressive" ],
>     "score_d": 2.5303947056922937,
>     "id": "2",
>     "probability_d": 0.9999990430778418 },
>