Posted to issues@lucene.apache.org by "Ahmed Adel (Jira)" <ji...@apache.org> on 2019/11/08 04:06:00 UTC
[jira] [Updated] (SOLR-13903) Classification Model Confusion Matrix Discrepancy
[ https://issues.apache.org/jira/browse/SOLR-13903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ahmed Adel updated SOLR-13903:
------------------------------
Description:
Using the features and train stream sources generates a model with TP, TN, FP, and FN fields. For some reason, the sum of these fields is sometimes less than the training set size.
Steps to reproduce:
# Create two collections: cellphones and cellphones-model
# Index the attached dataset into cellphones
# Run the following expression:
{{commit(cellphones-model,
  update(cellphones-model, batchSize=500,
    train(cellphones,
      features(cellphones, q="*:*", featureSet="featureSet",
               field="title_t",
               outcome="brand_i", numTerms=25),
      q="*:*",
      name="cellphones-classification-model",
      field="title_t",
      outcome="brand_i",
      maxIterations=100)))
}}
# Run the following query to retrieve the confusion matrix:
{{search q=*:*&collection=cellphones-model&fl=name_s,trueNegative_i,truePositive_i,falseNegative_i,falsePositive_i,iteration_i&sort=iteration_i%20desc&rows=100}}
In this instance, the sum of TP, TN, FP, and FN is one less than the training set size at every iteration.
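The check described above can be sketched as a small script run against the documents returned by the query. This is only a minimal illustration, not part of Solr: the function name find_discrepancies and the sample documents are hypothetical, and the field names are the ones listed in the fl parameter of the query. It assumes every training document should land in exactly one of the four cells, so the four counts should add up to the training set size.

```python
from typing import Dict, Iterable, List, Tuple

# Field names taken from the fl parameter of the query in this report.
MATRIX_FIELDS = (
    "truePositive_i",
    "trueNegative_i",
    "falsePositive_i",
    "falseNegative_i",
)


def find_discrepancies(
    model_docs: Iterable[Dict[str, int]], training_set_size: int
) -> List[Tuple[int, int]]:
    """Return (iteration, missing_count) for each iteration whose
    confusion matrix cells do not add up to the training set size."""
    discrepancies = []
    for doc in model_docs:
        total = sum(doc[field] for field in MATRIX_FIELDS)
        if total != training_set_size:
            discrepancies.append((doc["iteration_i"], training_set_size - total))
    return discrepancies


# Made-up documents for illustration only; these are NOT values from
# the attached cellphones.csv dataset.
sample_docs = [
    {"iteration_i": 1, "truePositive_i": 40, "trueNegative_i": 50,
     "falsePositive_i": 5, "falseNegative_i": 4},
    {"iteration_i": 2, "truePositive_i": 42, "trueNegative_i": 51,
     "falsePositive_i": 4, "falseNegative_i": 3},
]

print(find_discrepancies(sample_docs, training_set_size=100))
```

A missing_count of 1 for every iteration would match the behaviour reported here.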
was:
Using features and train stream sources generate a model with TP, TN, FP, FN fields. For some reason, the summation of the values of these fields is sometimes less than the training set size.
How to regenerate:
# Create two collections: cellphones and cellphones-model
# Indexing the attached dataset into cellphones
# Run the following expression:
{{commit(cellphones-model,update(cellphones-model,batchSize=500,
}}{{ train(cellphones,
}}{{ features(cellphones, q="*:*", featureSet="featureSet", field="title_t", outcome="brand_i", numTerms=25),
}}{{ q="*:*",
}}{{ name="cellphones-classification-model",
}}{{ field="title_t",
}}{{ outcome="brand_i",
}}{{ maxIterations=100)))
}}
4) Run the following query to retrieve confusion matrix:
{{search q=*:*&collection=cellphones-model&fl=name_s,trueNegative_i,truePositive_i,falseNegative_i,falsePositive_i,iteration_i&sort=iteration_i%20desc&rows=100
}}
The summation of the metrics TP, TN, FP, FN is always less than the training set size by one in this instance for all iterations.
> Classification Model Confusion Matrix Discrepancy
> -------------------------------------------------
>
> Key: SOLR-13903
> URL: https://issues.apache.org/jira/browse/SOLR-13903
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: streaming expressions
> Affects Versions: 8.2
> Reporter: Ahmed Adel
> Priority: Major
> Labels: classification
> Attachments: cellphones.csv
--
This message was sent by Atlassian Jira
(v8.3.4#803005)