Posted to dev@opennlp.apache.org by Damiano Porta <da...@gmail.com> on 2017/03/05 10:34:06 UTC

Training perceptron model

Hello,

I am training a NER model with the perceptron classifier (using OpenNLP 1.7.0).

The output of the training is:

Indexing events using cutoff of 0

Computing event counts...  done. 11861603 events
Indexing...  done.
Collecting events... Done indexing.
Incorporating indexed data for training...
done.
Number of Event Tokens: 11861603
   Number of Outcomes: 23
 Number of Predicates: 6623489
Computing model parameters...
Performing 300 iterations.
  1:  . (11795234/11861603) 0.9944047191597966
  2:  . (11820243/11861603) 0.9965131188423689
  3:  . (11829329/11861603) 0.9972791198626357
  4:  . (11834935/11861603) 0.9977517372651908
  5:  . (11838996/11861603) 0.9980941024581584
  6:  . (11841501/11861603) 0.9983052880795286
  7:  . (11843704/11861603) 0.998491013398442
  8:  . (11845304/11861603) 0.9986259024180796
  9:  . (11846421/11861603) 0.9987200718149141
 10:  . (11847181/11861603) 0.9987841440992419
 20:  . (11852226/11861603) 0.9992094660392866
 30:  . (11853947/11861603) 0.9993545560410343
 40:  . (11854831/11861603) 0.999429082224384
 50:  . (11855471/11861603) 0.999483037832239
Stopping: change in training set accuracy less than 1.0E-5
Stats: (11846242/11861603) 0.998704981105842
...done.
Compressed 6623489 parameters to 554312
6892 outcome patterns
Indexing events using cutoff of 0

Computing event counts...  done. 6370206 events
Indexing...  done.
Collecting events... Done indexing.
Incorporating indexed data for training...
done.
Number of Event Tokens: 6370206
   Number of Outcomes: 23
 Number of Predicates: 3737425
Computing model parameters...
Performing 300 iterations.
  1:  . (6330365/6370206) 0.9937457281601254
  2:  . (6345859/6370206) 0.9961779885925196
  3:  . (6351552/6370206) 0.9970716802564941
  4:  . (6354847/6370206) 0.9975889319748843
  5:  . (6356872/6370206) 0.997906818084062
  6:  . (6358350/6370206) 0.998138835698563
  7:  . (6359611/6370206) 0.9983367884806237
  8:  . (6360473/6370206) 0.9984721059256169
  9:  . (6361138/6370206) 0.9985764981540628
 10:  . (6361532/6370206) 0.9986383485871572
 20:  . (6364161/6370206) 0.9990510510963068
 30:  . (6365106/6370206) 0.9991993979472563
Stopping: change in training set accuracy less than 1.0E-5
Stats: (6360617/6370206) 0.9984947111600473
...done.
Indexing events using cutoff of 0

Computing event counts...  done. 6370114 events
Indexing...  done.
Collecting events... Done indexing.
Incorporating indexed data for training...
done.
Number of Event Tokens: 6370114
   Number of Outcomes: 23
 Number of Predicates: 3737390
Computing model parameters...
Performing 300 iterations.
  1:  . (6330266/6370114) 0.9937445389517362
  2:  . (6345810/6370114) 0.9961846836650019
  3:  . (6351374/6370114) 0.9970581374210885
  4:  . (6354747/6370114) 0.9975876412886803
  5:  . (6356872/6370114) 0.9979212302950936
  6:  . (6358429/6370114) 0.998165652922381
  7:  . (6359417/6370114) 0.9983207521874805
  8:  . (6360292/6370114) 0.9984581123665919
  9:  . (6361076/6370114) 0.9985811870870757
 10:  . (6361693/6370114) 0.998678045636232
 20:  . (6364109/6370114) 0.9990573167136413
 30:  . (6365008/6370114) 0.9991984444862368
 40:  . (6365478/6370114) 0.9992722265253023
Stopping: change in training set accuracy less than 1.0E-5
Stats: (6359985/6370114) 0.9984099185666065
...done.
Indexing events using cutoff of 0

Computing event counts...  done. 6370480 events
Indexing...  done.
Collecting events... Done indexing.
Incorporating indexed data for training...
done.
Number of Event Tokens: 6370480
   Number of Outcomes: 23
 Number of Predicates: 3737798
Computing model parameters...
Performing 300 iterations.
  1:  . (6330685/6370480) 0.9937532179678769
  2:  . (6346153/6370480) 0.9961812924614786
  3:  . (6351726/6370480) 0.9970561088018485
  4:  . (6355089/6370480) 0.9975840125076917
  5:  . (6357173/6370480) 0.9979111464128292
  6:  . (6358780/6370480) 0.9981634036995642
  7:  . (6359845/6370480) 0.9983305810551167
  8:  . (6360827/6370480) 0.9984847295651191
  9:  . (6361316/6370480) 0.9985614898720347
 10:  . (6362076/6370480) 0.9986807901445417
 20:  . (6364506/6370480) 0.9990622370684784
 30:  . (6365415/6370480) 0.9992049264733583
Stopping: change in training set accuracy less than 1.0E-5
Stats: (6362594/6370480) 0.9987621026986977
...done.
Indexing events using cutoff of 0

Computing event counts...  done. 6370008 events
Indexing...  done.
Collecting events... Done indexing.
Incorporating indexed data for training...
done.
Number of Event Tokens: 6370008
   Number of Outcomes: 23
 Number of Predicates: 3737824
Computing model parameters...
Performing 300 iterations.
  1:  . (6330200/6370008) 0.9937507142848172
  2:  . (6345643/6370008) 0.9961750440501802
  3:  . (6351415/6370008) 0.9970811653611737
  4:  . (6354522/6370008) 0.9975689198506501
  5:  . (6356723/6370008) 0.9979144453193779
  6:  . (6358164/6370008) 0.9981406616757781
  7:  . (6359399/6370008) 0.9983345389833106
  8:  . (6360274/6370008) 0.9984719014481614
  9:  . (6360694/6370008) 0.9985378354312899
 10:  . (6361531/6370008) 0.9986692324405244
....
....
....

And so on. Is that normal? The parameters are *0 cutoff* and *300 iterations*.

The corpus is relatively small; it has 20k sentences.

I do not remember output like that when using the MAXENT classifier.

Damiano
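
The log above announces "Performing 300 iterations", yet every run ends early with "Stopping: change in training set accuracy less than 1.0E-5". A minimal sketch of that kind of stopping rule follows. It assumes the trainer simply compares each iteration's training accuracy with the previous one; the thread does not say how OpenNLP implements the check, and `EarlyStopSketch` and `stopIteration` are hypothetical names, not OpenNLP API.

```java
/** Illustrative only: an early-stopping check on training accuracy. */
final class EarlyStopSketch {

    /**
     * Returns the 1-based iteration at which training would stop because the
     * accuracy changed by less than the tolerance compared with the previous
     * iteration (an assumed rule), or the total number of iterations if the
     * tolerance is never reached.
     */
    static int stopIteration(double[] accuracies, double tolerance) {
        for (int i = 1; i < accuracies.length; i++) {
            if (Math.abs(accuracies[i] - accuracies[i - 1]) < tolerance) {
                return i + 1;
            }
        }
        return accuracies.length;
    }

    public static void main(String[] args) {
        // Made-up accuracies, not taken from the log above.
        double[] accuracies = {0.90, 0.95, 0.97, 0.970001, 0.98};
        System.out.println(stopIteration(accuracies, 1.0E-5));
    }
}
```

Under such a rule, the configured iteration count is only an upper bound, which would explain why the runs in the log finish long before iteration 300.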

Re: Training perceptron model

Posted by Joern Kottmann <ko...@gmail.com>.
Yes, open an issue for the name samples, that should be fixed.

Jörn

On Mon, Mar 6, 2017 at 2:17 PM, Damiano Porta <da...@gmail.com>
wrote:

> I have to redesign it. Reading the wiki article you gave me, I noticed that I
> should not create two partitions (one for training and one for testing).
> It avoids overfitting, so I will pass all the data!
> Thanks Jörn!
>
> P.S. Did you read my previous email about the bug in namesamples? Should I
> open an issue?
>

Re: Training perceptron model

Posted by Damiano Porta <da...@gmail.com>.
I have to redesign it. Reading the wiki article you gave me, I noticed that I
should not create two partitions (one for training and one for testing).
It avoids overfitting, so I will pass all the data!
Thanks Jörn!

P.S. Did you read my previous email about the bug in namesamples? Should I
open an issue?
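
As a rough illustration of the cross-validation scheme discussed in this thread (split the data into n partitions, train on n-1, test on 1, repeat n times), here is a minimal sketch in plain Java. It does not use OpenNLP. The names `KFoldSketch`, `Fold` and `folds` are made up for illustration, and the round-robin assignment of samples to partitions is an assumption, not necessarily how TokenNameFinderCrossValidator partitions its input.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: plain-Java n-fold partitioning, not OpenNLP code. */
final class KFoldSketch {

    /** One cross-validation round: samples to train on and samples to test on. */
    static final class Fold<T> {
        final List<T> train = new ArrayList<>();
        final List<T> test = new ArrayList<>();
    }

    /**
     * Assigns samples to n partitions round-robin (an assumption) and builds
     * n folds; fold i tests on partition i and trains on the other n-1.
     */
    static <T> List<Fold<T>> folds(List<T> samples, int n) {
        if (n < 2 || n > samples.size()) {
            throw new IllegalArgumentException("need 2 <= n <= number of samples");
        }
        List<Fold<T>> result = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Fold<T> fold = new Fold<>();
            for (int j = 0; j < samples.size(); j++) {
                if (j % n == i) {
                    fold.test.add(samples.get(j));
                } else {
                    fold.train.add(samples.get(j));
                }
            }
            result.add(fold);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> samples = new ArrayList<>();
        for (int i = 0; i < 30; i++) {
            samples.add(i);
        }
        for (Fold<Integer> fold : folds(samples, 10)) {
            System.out.println("train=" + fold.train.size() + " test=" + fold.test.size());
        }
    }
}
```

With 30 samples and 10 folds, each round trains on 27 samples and tests on 3, and every sample is used for testing exactly once. This is also why a cross-validation run prints one training log per fold.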

2017-03-06 13:43 GMT+01:00 Damiano Porta <da...@gmail.com>:

> Oh I see. Thanks!
>
> Basically, I have 30k sentences. I apply the labels with a script, then
> pass sentences 0-15k to train the model (to build the .bin) and 15k-30k to
> evaluate it.
>
> I am trying to build the model with 300 iterations again.
>
> 2017-03-06 13:31 GMT+01:00 Joern Kottmann <ko...@gmail.com>:
>
>> You should understand how it works. Have a look at this Wikipedia article;
>> the picture on the right side explains it quite nicely.
>> https://en.wikipedia.org/wiki/Cross-validation_(statistics)
>>
>> The idea is to split the data into n partitions and then use n-1 for
>> training and 1 for testing. This is repeated n times, so that each
>> partition is used once for testing.
>>
>> It really should take three times as long in your case. Maybe there is
>> something else wrong?
>>
>> Jörn
>>
>> On Mon, Mar 6, 2017 at 12:36 PM, Damiano Porta <da...@gmail.com>
>> wrote:
>>
>> > Unfortunately not: 100 iterations take ~30 minutes, while 300 iterations
>> > have taken more than 2 days and are still running... I will stop it.
>> >
>> > I still do not understand what number I should set as *folds*. OK, I will
>> > set a number > 1, but do I have to pay more attention to this
>> > parameter? Does it make any difference if I set 8 or 10?
>> >
>> >
>> >
>> > 2017-03-06 12:19 GMT+01:00 Joern Kottmann <ko...@gmail.com>:
>> >
>> > > In test.evaluate(samples, 1), the second parameter is the number of
>> > > folds. Usually you use 10, or at least a number larger than 1.
>> > >
>> > > The amount of time you need for training with the perceptron is linear
>> > > in the number of iterations; if you use 300 instead of 100, it should
>> > > take three times as long.
>> > >
>> > > Jörn
>> > >
>> > > On Mon, Mar 6, 2017 at 11:12 AM, Damiano Porta <damianoporta@gmail.com>
>> > > wrote:
>> > >
>> > > > Jörn,
>> > > > I am training and testing the model via the API. If it is not a
>> > > > training problem, how is it possible that the evaluation is taking
>> > > > 2 days (and is still running)? As I told you, with 100 iterations I
>> > > > can get the model and the test results in ~30 minutes.
>> > > >
>> > > > I only have a doubt about the evaluation. This is the code:
>> > > >
>> > > >         try (ObjectStream<NameSample> samples =
>> > > >                 ObjectStreamUtils.createObjectStream(evaluation)) {
>> > > >
>> > > >             TrainingParameters mlParams = new TrainingParameters();
>> > > >             mlParams.put(TrainingParameters.ALGORITHM_PARAM,
>> > > >                 PerceptronTrainer.PERCEPTRON_VALUE);
>> > > >             mlParams.put(TrainingParameters.ITERATIONS_PARAM,
>> > > >                 Integer.toString(100));
>> > > >             mlParams.put(TrainingParameters.CUTOFF_PARAM,
>> > > >                 Integer.toString(0));
>> > > >
>> > > >             TokenNameFinderCrossValidator test =
>> > > >                 new TokenNameFinderCrossValidator("it",
>> > > >                     null, mlParams, null,
>> > > >                     (TokenNameFinderEvaluationMonitor) null);
>> > > >
>> > > >             test.evaluate(samples, 1); // <---- SECOND PARAMETER HERE
>> > > >
>> > > >             FMeasure result = test.getFMeasure();
>> > > >
>> > > >             System.out.println(result.toString());
>> > > >         }
>> > > >
>> > > > What should I pass as the second parameter of test.evaluate()? Each
>> > > > sample (in the samples variable) represents a document. There are no
>> > > > relations with other samples.
>> > > >
>> > > > 2017-03-06 10:56 GMT+01:00 Joern Kottmann <ko...@gmail.com>:
>> > > >
>> > > > > Hello,
>> > > > >
>> > > > > The model is only available after the training has finished, so it
>> > > > > is hard to guess what you are doing.
>> > > > >
>> > > > > Do you use the command line? Which command?
>> > > > >
>> > > > > Jörn
>> > > > >
>> > > > > On Mon, Mar 6, 2017 at 10:29 AM, Damiano Porta <damianoporta@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > > > Hello Jörn,
>> > > > > > I tried with 300 iterations and it takes forever. After reducing
>> > > > > > that number to 100, I can finally get the model in half an hour.
>> > > > > >
>> > > > > > The problem with 300 iterations is that I can see the model (.bin)
>> > > > > > after half an hour too, but the computations are still running. So
>> > > > > > I do not really understand what it is doing.
>> > > > > >
>> > > > > > Damiano
>> > > > > >
>> > > > > > 2017-03-06 10:19 GMT+01:00 Joern Kottmann <ko...@gmail.com>:
>> > > > > >
>> > > > > > > Hello,
>> > > > > > >
>> > > > > > > this looks like output from the cross validator.
>> > > > > > >
>> > > > > > > Jörn
>> > > > > > >
>> > > > > > > On Sun, Mar 5, 2017 at 11:34 AM, Damiano Porta <damianoporta@gmail.com>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > Hello,
>> > > > > > > >
>> > > > > > > > I am training a NER model with the perceptron classifier (using
>> > > > > > > > OpenNLP 1.7.0).
>> > > > > > > >
>> > > > > > > > The output of the training is:
>> > > > > > > >
>> > > > > > > > [...]
>> > > > > > > >
>> > > > > > > > etc etc is that normal ? The parameters are; *0 cutoff* and
>> > *300
>> > > > > > > > iterators*.
>> > > > > > > >
>> > > > > > > > The corpus is relative small, it has 20k sentences.
>> > > > > > > >
>> > > > > > > > I do not remember an output like that using MAXENT
>> classifier.
>> > > > > > > >
>> > > > > > > > Damiano
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Re: Training perceptron model

Posted by Damiano Porta <da...@gmail.com>.
Oh, I see. Thanks!

Basically, I have 30k sentences. I apply the labels with a script, then
pass sentences 0-15k to train the model (to build the .bin) and sentences
15k-30k to evaluate it.

I am trying to build the model with 300 iterations again.
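
A minimal sketch of that kind of fixed train/evaluation split, in plain Java
and independent of OpenNLP. The class name HoldoutSplit and the helper
methods are hypothetical, and cutting the list exactly in half is an
assumption based on the 0-15k / 15k-30k description above:

```java
import java.util.ArrayList;
import java.util.List;

public class HoldoutSplit {

    /** Returns the first half of the samples, used for training. */
    public static <T> List<T> trainingHalf(List<T> samples) {
        return new ArrayList<>(samples.subList(0, samples.size() / 2));
    }

    /** Returns the remaining samples, held out for evaluation. */
    public static <T> List<T> evaluationHalf(List<T> samples) {
        return new ArrayList<>(samples.subList(samples.size() / 2, samples.size()));
    }

    public static void main(String[] args) {
        // Stand-in data: 30k placeholder sentences instead of labelled samples.
        List<String> sentences = new ArrayList<>();
        for (int i = 0; i < 30_000; i++) {
            sentences.add("sentence " + i);
        }
        System.out.println(trainingHalf(sentences).size() + " for training, "
            + evaluationHalf(sentences).size() + " for evaluation");
    }
}
```

Unlike cross-validation, this trains a single model and evaluates it once on
data it has never seen.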

2017-03-06 13:31 GMT+01:00 Joern Kottmann <ko...@gmail.com>:

> You should understand how it works. Have a look at this Wikipedia article;
> the picture on the right side explains it quite nicely.
> https://en.wikipedia.org/wiki/Cross-validation_(statistics)
>
> The idea is to split the data into n partitions, then use n-1 of them for
> training and 1 for testing. This is repeated n times, so that each
> partition is used for testing exactly once.
>
> It really should take only three times as long in your case; maybe there is
> something else wrong?
>
> Jörn
>

Re: Training perceptron model

Posted by Joern Kottmann <ko...@gmail.com>.
You should understand how it works. Have a look at this Wikipedia article;
the picture on the right side explains it quite nicely.
https://en.wikipedia.org/wiki/Cross-validation_(statistics)

The idea is to split the data into n partitions, then use n-1 of them for
training and 1 for testing. This is repeated n times, so that each
partition is used for testing exactly once.
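
A minimal sketch of that partitioning in plain Java. This is not OpenNLP's
actual implementation: the class name KFoldSketch, the Split holder, and the
round-robin assignment of samples to partitions (sample i goes to partition
i % n) are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class KFoldSketch {

    /** One round of cross-validation: samples to train on and samples held out for testing. */
    public static final class Split<T> {
        public final List<T> train = new ArrayList<>();
        public final List<T> test = new ArrayList<>();
    }

    /**
     * Builds n train/test splits. Sample i belongs to partition (i % n);
     * in round k, partition k is held out and the other n-1 are used for training.
     */
    public static <T> List<Split<T>> splits(List<T> samples, int n) {
        // With n = 1 there would be n-1 = 0 partitions left for training.
        if (n < 2 || n > samples.size()) {
            throw new IllegalArgumentException("n must be between 2 and the number of samples");
        }
        List<Split<T>> rounds = new ArrayList<>();
        for (int k = 0; k < n; k++) {
            Split<T> split = new Split<>();
            for (int i = 0; i < samples.size(); i++) {
                if (i % n == k) {
                    split.test.add(samples.get(i));
                } else {
                    split.train.add(samples.get(i));
                }
            }
            rounds.add(split);
        }
        return rounds;
    }

    public static void main(String[] args) {
        List<Integer> samples = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            samples.add(i);
        }
        for (Split<Integer> s : splits(samples, 5)) {
            System.out.println("train=" + s.train + " test=" + s.test);
        }
    }
}
```

Each round trains a separate model, which is why the training log repeats
once per partition.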

It really should take only three times as long in your case; maybe there is
something else wrong?

Jörn

On Mon, Mar 6, 2017 at 12:36 PM, Damiano Porta <da...@gmail.com>
wrote:

> Unfortunately not: 100 iterations take ~30 minutes, while 300 iterations
> have been running for more than 2 days and are still going... I will stop it.
>
> I still do not understand what number I should set as *folds*. OK, I will
> set a number > 1, but do I need to pay more attention to this
> parameter? Does it make any difference whether I set 8 or 10?
>
>
>

Re: Training perceptron model

Posted by Damiano Porta <da...@gmail.com>.
Unfortunately not: 100 iterations take ~30 minutes, while 300 iterations
have been running for more than 2 days and are still going... I will stop it.
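
For comparison, a back-of-the-envelope estimate in plain Java. It assumes
training time scales linearly with the iteration count, as Jörn describes,
and takes the ~30 minutes measured for 100 iterations as the baseline. The
class and method names are hypothetical.

```java
public class TrainingTimeEstimate {

    /** Scales a measured training time linearly with the number of iterations. */
    public static double expectedMinutes(double measuredMinutes,
                                         int measuredIterations,
                                         int targetIterations) {
        return measuredMinutes * targetIterations / measuredIterations;
    }

    public static void main(String[] args) {
        // Baseline from this thread: 100 iterations took about 30 minutes.
        double minutes = expectedMinutes(30.0, 100, 300);
        System.out.println(minutes + " minutes expected for 300 iterations");
    }
}
```

Under that assumption 300 iterations should need roughly 90 minutes, so a run
of more than 2 days is far outside what linear scaling would predict.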

I still do not understand what number I should set as *folds*. OK, I will
set a number > 1, but do I need to pay more attention to this
parameter? Does it make any difference whether I set 8 or 10?



2017-03-06 12:19 GMT+01:00 Joern Kottmann <ko...@gmail.com>:

> test.evaluate(samples, 1): here the second parameter is the number of
> folds. Usually you use 10, or at least a number larger than 1.
>
> The amount of time you need for training with the perceptron is linear in
> the number of iterations; if you use 300 instead of 100, it should take
> three times as long.
>
> Jörn
>
> On Mon, Mar 6, 2017 at 11:12 AM, Damiano Porta <da...@gmail.com>
> wrote:
>
> > Jörn,
> > I am training and testing the model via the API. If it is not a training
> > problem, how is it possible that the evaluation is taking 2 days (and is
> > still running)? As I told you, with 100 iterations I
> > can get the model and the test results in ~30 minutes.
> >
> > I only have a doubt about the evaluation. This is the code:
> >
> >         try (ObjectStream<NameSample> samples =
> >                 ObjectStreamUtils.createObjectStream(evaluation)) {
> >
> >             TrainingParameters mlParams = new TrainingParameters();
> >             mlParams.put(TrainingParameters.ALGORITHM_PARAM,
> >                 PerceptronTrainer.PERCEPTRON_VALUE);
> >             mlParams.put(TrainingParameters.ITERATIONS_PARAM,
> >                 Integer.toString(100));
> >             mlParams.put(TrainingParameters.CUTOFF_PARAM,
> >                 Integer.toString(0));
> >
> >             TokenNameFinderCrossValidator test =
> >                 new TokenNameFinderCrossValidator("it",
> >                     null, mlParams, null,
> >                     (TokenNameFinderEvaluationMonitor) null);
> >
> >             test.evaluate(samples, 1); // <---- SECOND PARAMETER HERE
> >
> >             FMeasure result = test.getFMeasure();
> >
> >             System.out.println(result.toString());
> >         }
> >
> > What should I pass as the second parameter of test.evaluate()? Each sample
> > (in the samples variable) represents a document. There are no relations
> > with other samples.
> >
> > > > > > Computing model parameters...
> > > > > > Performing 300 iterations.
> > > > > >   1:  . (6330200/6370008) 0.9937507142848172
> > > > > >   2:  . (6345643/6370008) 0.9961750440501802
> > > > > >   3:  . (6351415/6370008) 0.9970811653611737
> > > > > >   4:  . (6354522/6370008) 0.9975689198506501
> > > > > >   5:  . (6356723/6370008) 0.9979144453193779
> > > > > >   6:  . (6358164/6370008) 0.9981406616757781
> > > > > >   7:  . (6359399/6370008) 0.9983345389833106
> > > > > >   8:  . (6360274/6370008) 0.9984719014481614
> > > > > >   9:  . (6360694/6370008) 0.9985378354312899
> > > > > >  10:  . (6361531/6370008) 0.9986692324405244
> > > > > > ....
> > > > > > ....
> > > > > > ....
> > > > > >
> > > > > > etc etc is that normal ? The parameters are; *0 cutoff* and *300
> > > > > > iterators*.
> > > > > >
> > > > > > The corpus is relative small, it has 20k sentences.
> > > > > >
> > > > > > I do not remember an output like that using MAXENT classifier.
> > > > > >
> > > > > > Damiano
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Training perceptron model

Posted by Joern Kottmann <ko...@gmail.com>.
In test.evaluate(samples, 1), the second parameter is the number of
folds. Usually you use 10, or at least a number larger than 1.

The time needed to train with the perceptron is linear in the number of
iterations: if you use 300 instead of 100, it should take three times as
long.
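
As a rough illustration of that scaling, here is a minimal sketch in plain Java. It is not OpenNLP code. It assumes the "~30 minutes" reported in this thread is the cost of one 100-iteration training run, and that cross validation trains one model per fold; both are assumptions made for the arithmetic. The class and method names are hypothetical.

```java
import java.time.Duration;

/** Back-of-the-envelope estimate of training time (illustrative only). */
final class TrainingTimeSketch {

    /**
     * Scales a measured run linearly to another iteration count and
     * multiplies by the number of folds (assumed: one training run per fold).
     */
    static Duration estimate(Duration measured, int measuredIterations,
                             int plannedIterations, int folds) {
        long perIterationMillis = measured.toMillis() / measuredIterations;
        return Duration.ofMillis(perIterationMillis * plannedIterations * folds);
    }

    public static void main(String[] args) {
        // "~30 minutes" for 100 iterations, as reported in the thread.
        Duration measured = Duration.ofMinutes(30);

        // One training run with 300 iterations.
        System.out.println(estimate(measured, 100, 300, 1).toMinutes());  // 90

        // Ten folds with 300 iterations each.
        System.out.println(estimate(measured, 100, 300, 10).toMinutes()); // 900
    }
}
```

This is an upper bound: a run that ends early with the "Stopping: change in training set accuracy" message performs fewer iterations than requested.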

Jörn

On Mon, Mar 6, 2017 at 11:12 AM, Damiano Porta <da...@gmail.com>
wrote:

> Jorn,
> I am training and testing the model via api. If it is not a training
> problem. How is that possible that the evaluation is taking 2 days (and
> still running) to evaluate the model? As i told you with 100 iterations i
> can get the model and the test in ~30 minutes.
>
> I only have a doubt about evaluation, this is the code:
>
>         try (ObjectStream<NameSample> samples =
> ObjectStreamUtils.createObjectStream(evaluation)) {
>
>             TrainingParameters mlParams = new TrainingParameters();
>             mlParams.put(TrainingParameters.ALGORITHM_PARAM,
> PerceptronTrainer.PERCEPTRON_VALUE);
>             mlParams.put(TrainingParameters.ITERATIONS_PARAM,
> Integer.toString(100));
>             mlParams.put(TrainingParameters.CUTOFF_PARAM,
> Integer.toString(0));
>
>             TokenNameFinderCrossValidator test = new
> TokenNameFinderCrossValidator("it",
>                 null, mlParams, null,
> (TokenNameFinderEvaluationMonitor)null);
>
>             test.evaluate(samples, 1); *// <---- SECOND PARAMETER HERE*
>
>             FMeasure result = test.getFMeasure();
>
>             System.out.println(result.toString());
>         }
>
> What should i put on the second parameter of test.evaluate() ? Each sample
> (in samples variable) represents a document. There are no relations with
> other samples.

Re: Training perceptron model

Posted by Damiano Porta <da...@gmail.com>.
Jorn,
I am training and testing the model via the API. If this is not a training
problem, how is it possible that the evaluation has been running for 2 days
(and is still running)? As I told you, with 100 iterations I can get the
model and the test results in ~30 minutes.

My only doubt is about the evaluation. This is the code:

        try (ObjectStream<NameSample> samples = ObjectStreamUtils.createObjectStream(evaluation)) {

            TrainingParameters mlParams = new TrainingParameters();
            mlParams.put(TrainingParameters.ALGORITHM_PARAM, PerceptronTrainer.PERCEPTRON_VALUE);
            mlParams.put(TrainingParameters.ITERATIONS_PARAM, Integer.toString(100));
            mlParams.put(TrainingParameters.CUTOFF_PARAM, Integer.toString(0));

            TokenNameFinderCrossValidator test = new TokenNameFinderCrossValidator("it",
                null, mlParams, null, (TokenNameFinderEvaluationMonitor) null);

            test.evaluate(samples, 1); // <---- SECOND PARAMETER HERE

            FMeasure result = test.getFMeasure();

            System.out.println(result.toString());
        }

What should I pass as the second parameter of test.evaluate()? Each sample
(in the samples variable) represents a document. There are no relations
between samples.
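
To make the role of the fold count concrete, here is a minimal sketch in plain Java of how k-fold cross validation splits independent documents. It is not OpenNLP's implementation: the round-robin assignment (document index modulo the fold count) and all names are assumptions chosen for illustration. Under this scheme a single fold holds out every document for testing and leaves nothing to train on, while 10 folds test each document exactly once against a model trained on the rest.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative k-fold split over document indices (not OpenNLP code). */
final class FoldSketch {

    /** Indices of the documents held out for testing in the given fold. */
    static List<Integer> testIndices(int documents, int folds, int fold) {
        List<Integer> held = new ArrayList<>();
        for (int i = 0; i < documents; i++) {
            // Assumed round-robin assignment of documents to folds.
            if (i % folds == fold) {
                held.add(i);
            }
        }
        return held;
    }

    /** Number of documents left for training in the given fold. */
    static int trainingSize(int documents, int folds, int fold) {
        return documents - testIndices(documents, folds, fold).size();
    }

    public static void main(String[] args) {
        int documents = 20; // hypothetical corpus size

        // One fold: every document is test data, none is left for training.
        System.out.println("folds=1  train=" + trainingSize(documents, 1, 0)
                + " test=" + testIndices(documents, 1, 0));

        // Ten folds: each round trains on 18 documents and tests on 2.
        for (int fold = 0; fold < 10; fold++) {
            System.out.println("folds=10 fold=" + fold
                    + " train=" + trainingSize(documents, 10, fold)
                    + " test=" + testIndices(documents, 10, fold));
        }
    }
}
```

Because the samples are independent documents, no grouping is needed; only the number of folds has to be chosen.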

2017-03-06 10:56 GMT+01:00 Joern Kottmann <ko...@gmail.com>:

> Hello,
>
> the model is only available after the training finished, hard to guess what
> you are doing.
>
> Do you use the command line? Which command?
>
> Jörn

Re: Training perceptron model

Posted by Joern Kottmann <ko...@gmail.com>.
Hello,

the model is only available after training has finished, so it is hard to
guess what you are doing.

Are you using the command line? If so, which command?

Jörn

On Mon, Mar 6, 2017 at 10:29 AM, Damiano Porta <da...@gmail.com>
wrote:

> Hello Jorn,
> I tried with 300 iterations and it takes forever, reducing that number to
> 100 i can finally get the model in half an hour.
>
> The problem with 300 iterations is that i can see the model (.bin) in half
> an hour too but the computations are still running. So i do not really
> understand what it is doing.
>
> Damiano

Re: Training perceptron model

Posted by Damiano Porta <da...@gmail.com>.
Hello Jorn,
I tried with 300 iterations and it takes forever. After reducing that number
to 100, I can finally get the model in half an hour.

The problem with 300 iterations is that I can see the model (.bin) after half
an hour too, but the computations are still running. So I do not really
understand what it is doing.
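
For context on the quoted log: each run announces "Performing 300 iterations" but ends with "Stopping: change in training set accuracy less than 1.0E-5", so the configured iteration count acts as an upper bound. The following is a minimal sketch in plain Java of one such stopping rule. It assumes the trainer compares the current training accuracy with each of the previous three iterations; that window size, the accuracy values, and all names are assumptions for illustration, not taken from the log or from OpenNLP's source.

```java
/** Illustrative early-stopping check on training accuracy (not OpenNLP code). */
final class EarlyStopSketch {

    /**
     * Returns the 1-based iteration at which training would stop, or
     * accuracies.length if the tolerance is never reached.
     */
    static int stoppingIteration(double[] accuracies, double tolerance) {
        for (int i = 3; i < accuracies.length; i++) {
            double current = accuracies[i];
            // Assumed rule: stop once the current accuracy is within the
            // tolerance of all three preceding iterations.
            boolean flat = Math.abs(accuracies[i - 1] - current) < tolerance
                    && Math.abs(accuracies[i - 2] - current) < tolerance
                    && Math.abs(accuracies[i - 3] - current) < tolerance;
            if (flat) {
                return i + 1;
            }
        }
        return accuracies.length;
    }

    public static void main(String[] args) {
        // Synthetic accuracies that rise quickly and then level off.
        double[] accuracies = {
            0.9937, 0.9962, 0.9971, 0.9976, 0.9977,
            0.997702, 0.997704, 0.997706, 0.997708, 0.997710
        };
        System.out.println(stoppingIteration(accuracies, 1.0E-5)); // 8
    }
}
```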

Damiano

2017-03-06 10:19 GMT+01:00 Joern Kottmann <ko...@gmail.com>:

> Hello,
>
> this looks like output from the cross validator.
>
> Jörn
>
> On Sun, Mar 5, 2017 at 11:34 AM, Damiano Porta <da...@gmail.com>
> wrote:
>
> > Hello,
> >
> > I am training a NER model with perceptron classifier (using OpenNLP
> 1.7.0)
> >
> > the output of the training is:
> >
> > Indexing events using cutoff of 0
> >
> > Computing event counts...  done. 11861603 events
> > Indexing...  done.
> > Collecting events... Done indexing.
> > Incorporating indexed data for training...
> > done.
> > Number of Event Tokens: 11861603
> >    Number of Outcomes: 23
> >  Number of Predicates: 6623489
> > Computing model parameters...
> > Performing 300 iterations.
> >   1:  . (11795234/11861603) 0.9944047191597966
> >   2:  . (11820243/11861603) 0.9965131188423689
> >   3:  . (11829329/11861603) 0.9972791198626357
> >   4:  . (11834935/11861603) 0.9977517372651908
> >   5:  . (11838996/11861603) 0.9980941024581584
> >   6:  . (11841501/11861603) 0.9983052880795286
> >   7:  . (11843704/11861603) 0.998491013398442
> >   8:  . (11845304/11861603) 0.9986259024180796
> >   9:  . (11846421/11861603) 0.9987200718149141
> >  10:  . (11847181/11861603) 0.9987841440992419
> >  20:  . (11852226/11861603) 0.9992094660392866
> >  30:  . (11853947/11861603) 0.9993545560410343
> >  40:  . (11854831/11861603) 0.999429082224384
> >  50:  . (11855471/11861603) 0.999483037832239
> > Stopping: change in training set accuracy less than 1.0E-5
> > Stats: (11846242/11861603) 0.998704981105842
> > ...done.
> > Compressed 6623489 parameters to 554312
> > 6892 outcome patterns
> > Indexing events using cutoff of 0
> >
> > Computing event counts...  done. 6370206 events
> > Indexing...  done.
> > Collecting events... Done indexing.
> > Incorporating indexed data for training...
> > done.
> > Number of Event Tokens: 6370206
> >    Number of Outcomes: 23
> >  Number of Predicates: 3737425
> > Computing model parameters...
> > Performing 300 iterations.
> >   1:  . (6330365/6370206) 0.9937457281601254
> >   2:  . (6345859/6370206) 0.9961779885925196
> >   3:  . (6351552/6370206) 0.9970716802564941
> >   4:  . (6354847/6370206) 0.9975889319748843
> >   5:  . (6356872/6370206) 0.997906818084062
> >   6:  . (6358350/6370206) 0.998138835698563
> >   7:  . (6359611/6370206) 0.9983367884806237
> >   8:  . (6360473/6370206) 0.9984721059256169
> >   9:  . (6361138/6370206) 0.9985764981540628
> >  10:  . (6361532/6370206) 0.9986383485871572
> >  20:  . (6364161/6370206) 0.9990510510963068
> >  30:  . (6365106/6370206) 0.9991993979472563
> > Stopping: change in training set accuracy less than 1.0E-5
> > Stats: (6360617/6370206) 0.9984947111600473
> > ...done.
> > Indexing events using cutoff of 0
> >
> > Computing event counts...  done. 6370114 events
> > Indexing...  done.
> > Collecting events... Done indexing.
> > Incorporating indexed data for training...
> > done.
> > Number of Event Tokens: 6370114
> >    Number of Outcomes: 23
> >  Number of Predicates: 3737390
> > Computing model parameters...
> > Performing 300 iterations.
> >   1:  . (6330266/6370114) 0.9937445389517362
> >   2:  . (6345810/6370114) 0.9961846836650019
> >   3:  . (6351374/6370114) 0.9970581374210885
> >   4:  . (6354747/6370114) 0.9975876412886803
> >   5:  . (6356872/6370114) 0.9979212302950936
> >   6:  . (6358429/6370114) 0.998165652922381
> >   7:  . (6359417/6370114) 0.9983207521874805
> >   8:  . (6360292/6370114) 0.9984581123665919
> >   9:  . (6361076/6370114) 0.9985811870870757
> >  10:  . (6361693/6370114) 0.998678045636232
> >  20:  . (6364109/6370114) 0.9990573167136413
> >  30:  . (6365008/6370114) 0.9991984444862368
> >  40:  . (6365478/6370114) 0.9992722265253023
> > Stopping: change in training set accuracy less than 1.0E-5
> > Stats: (6359985/6370114) 0.9984099185666065
> > ...done.
> > Indexing events using cutoff of 0
> >
> > Computing event counts...  done. 6370480 events
> > Indexing...  done.
> > Collecting events... Done indexing.
> > Incorporating indexed data for training...
> > done.
> > Number of Event Tokens: 6370480
> >    Number of Outcomes: 23
> >  Number of Predicates: 3737798
> > Computing model parameters...
> > Performing 300 iterations.
> >   1:  . (6330685/6370480) 0.9937532179678769
> >   2:  . (6346153/6370480) 0.9961812924614786
> >   3:  . (6351726/6370480) 0.9970561088018485
> >   4:  . (6355089/6370480) 0.9975840125076917
> >   5:  . (6357173/6370480) 0.9979111464128292
> >   6:  . (6358780/6370480) 0.9981634036995642
> >   7:  . (6359845/6370480) 0.9983305810551167
> >   8:  . (6360827/6370480) 0.9984847295651191
> >   9:  . (6361316/6370480) 0.9985614898720347
> >  10:  . (6362076/6370480) 0.9986807901445417
> >  20:  . (6364506/6370480) 0.9990622370684784
> >  30:  . (6365415/6370480) 0.9992049264733583
> > Stopping: change in training set accuracy less than 1.0E-5
> > Stats: (6362594/6370480) 0.9987621026986977
> > ...done.
> > Indexing events using cutoff of 0
> >
> > Computing event counts...  done. 6370008 events
> > Indexing...  done.
> > Collecting events... Done indexing.
> > Incorporating indexed data for training...
> > done.
> > Number of Event Tokens: 6370008
> >    Number of Outcomes: 23
> >  Number of Predicates: 3737824
> > Computing model parameters...
> > Performing 300 iterations.
> >   1:  . (6330200/6370008) 0.9937507142848172
> >   2:  . (6345643/6370008) 0.9961750440501802
> >   3:  . (6351415/6370008) 0.9970811653611737
> >   4:  . (6354522/6370008) 0.9975689198506501
> >   5:  . (6356723/6370008) 0.9979144453193779
> >   6:  . (6358164/6370008) 0.9981406616757781
> >   7:  . (6359399/6370008) 0.9983345389833106
> >   8:  . (6360274/6370008) 0.9984719014481614
> >   9:  . (6360694/6370008) 0.9985378354312899
> >  10:  . (6361531/6370008) 0.9986692324405244
> > ....
> > ....
> > ....
> >
> > etc etc is that normal ? The parameters are; *0 cutoff* and *300
> > iterators*.
> >
> > The corpus is relative small, it has 20k sentences.
> >
> > I do not remember an output like that using MAXENT classifier.
> >
> > Damiano
> >
>

Re: Training perceptron model

Posted by Joern Kottmann <ko...@gmail.com>.
Hello,

This looks like output from the cross validator.

Jörn
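
For context, here is a minimal sketch of why a cross validator prints
one training log per fold: each fold triggers a full, independent
training run on the samples that are not held out. This is a generic
k-fold illustration, not OpenNLP's implementation; every name in it
(k_fold_splits, cross_validate, toy_train, toy_eval) is hypothetical
and the data is made up.

```python
from collections import Counter


def k_fold_splits(samples, k):
    """Yield (train, held_out) pairs, one pair per fold."""
    for fold in range(k):
        held_out = [s for i, s in enumerate(samples) if i % k == fold]
        train = [s for i, s in enumerate(samples) if i % k != fold]
        yield train, held_out


def cross_validate(samples, k, train_fn, eval_fn):
    """Train once per fold and score each model on its held-out part."""
    scores = []
    for train, held_out in k_fold_splits(samples, k):
        model = train_fn(train)  # a complete training run for every fold
        scores.append(eval_fn(model, held_out))
    return scores


training_sizes = []  # records how many samples each training run saw


def toy_train(train):
    """Hypothetical trainer: remembers the most frequent label."""
    training_sizes.append(len(train))
    print(f"Training run on {len(train)} samples")
    return Counter(label for _, label in train).most_common(1)[0][0]


def toy_eval(model, held_out):
    """Fraction of held-out samples whose label equals the model's guess."""
    return sum(1 for _, label in held_out if label == model) / len(held_out)


# Made-up data: 20 samples with alternating labels, split into 5 folds.
samples = [(i, i % 2) for i in range(20)]
scores = cross_validate(samples, 5, toy_train, toy_eval)
```

With 5 folds the trainer is called 5 times, each time on a subset of
the data, which matches the pattern of repeated training blocks with
smaller event counts in a log like the one quoted.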
