Posted to users@opennlp.apache.org by "ttrung@nlke-group.net" <tt...@nlke-group.net> on 2016/05/17 07:06:05 UTC
Stuck with class ChunkerME: java.lang.String cannot be cast to opennlp.tools.parser.Parse
Dear Apache OpenNLP Project Team,
I have a critical issue when training with the Chunker tool in Java:
- Firstly, the sample code on the documentation site
(https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training.api)
does not work with either version 1.5.3 or 1.6.0.
- Secondly, I had to modify the code myself as follows (using version 1.5.3):
try {
    Charset charset = Charset.forName("UTF-8");
    ObjectStream lineStream = new PlainTextByLineStream(new FileInputStream(fileChunker), charset);
    ObjectStream<ChunkSample> sampleStream = new ChunkSampleStream(lineStream);
    chunkerModel = ChunkerME.train("vn", sampleStream, TrainingParameters.defaultParams(), new ChunkerFactory());
    modelApacheChunkerPath = UtilityHelper.getTemporaryFilePathInsideDir("chunkerModel.bin");
    OutputStream modelOut = new BufferedOutputStream(new FileOutputStream(modelApacheChunkerPath));
    chunkerModel.serialize(modelOut);
} catch (FileNotFoundException fe) {
} catch (IOException ie) {
}
- Thirdly, I get the error "java.lang.String cannot be cast to
opennlp.tools.parser.Parse". The reason is:
+ The constructor of class ChunkSampleStream requires the
parameter "ObjectStream<Parse> in"
+ However, the second parameter of method ChunkerME.train
is "ObjectStream<ChunkSample> in"
I cannot find any way to work around this issue.
Could you please look into this for me?
Thank you so much for your help.
Best regards,
Trung Tran.
Re: Stuck with class ChunkerME: java.lang.String cannot be cast to opennlp.tools.parser.Parse
Posted by Rodrigo Agerri <ra...@apache.org>.
Hi,
Can you provide the errors? Otherwise we can only guess at what the problem is.
You are using the
/opennlp-tools/src/main/java/opennlp/tools/parser/ChunkSampleStream.java
class instead of the
/opennlp-tools/src/main/java/opennlp/tools/chunker/ChunkSampleStream.java
The first one creates chunk samples from Parse trees, which is why it
asks for ObjectStream<Parse> as input.
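As a rough illustration of why this mismatch shows up only at runtime, here is a minimal sketch using plain JDK types. The Tree class and the firstLabel method are hypothetical stand-ins, not OpenNLP classes. The point is that a raw-typed variable (like the raw ObjectStream in the original code) lets a stream of strings be passed where a stream of parse trees is expected, and the failure then surfaces as a ClassCastException when the first element is read.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class RawTypeCastDemo {

    /** Hypothetical stand-in for a parse-tree class; not the OpenNLP type. */
    static class Tree {
        final String label;
        Tree(String label) { this.label = label; }
    }

    /** Expects trees, like a constructor declared with a typed stream parameter. */
    static String firstLabel(Iterator<Tree> trees) {
        Tree t = trees.next(); // the compiler inserts a cast to Tree here
        return t.label;
    }

    @SuppressWarnings({"rawtypes", "unchecked"})
    static String run() {
        List<String> lines = Arrays.asList("He PRP B-NP");
        Iterator raw = lines.iterator(); // raw type: the element type is not checked
        try {
            return firstLabel(raw);      // compiles with only an unchecked warning
        } catch (ClassCastException e) {
            return "ClassCastException"; // a String cannot be cast to Tree
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Declaring the line stream with its element type (for example ObjectStream<String>) would let the compiler reject the wrong ChunkSampleStream import instead of failing at runtime.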
What is the error you get when using the code in the documentation?
Best,
R
Re: Stuck with class ChunkerME: java.lang.String cannot be cast to opennlp.tools.parser.Parse
Posted by Rodrigo Agerri <ra...@apache.org>.
Hi,
Your corpus is not well formatted. Did you check the format of the
training and test corpus from the English CoNLL 2000 dataset? That is
the format that it requires.
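A quick way to check a training file before handing it to the trainer is sketched below. It assumes that the chunker's reader splits each line on a single space and expects exactly three fields (word, POS tag, chunk tag), which is consistent with the "Skipping corrupt line" messages but should be verified against the OpenNLP source. The class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ChunkFormatCheck {

    /**
     * Returns the 1-based numbers of non-empty lines that do not consist of
     * exactly three columns separated by single spaces. Empty lines are
     * treated as sentence separators and are skipped.
     */
    static List<Integer> badLines(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.isEmpty()) {
                continue; // sentence boundary
            }
            // Tabs or runs of several spaces produce a field count other than 3.
            String[] parts = line.split(" ");
            if (parts.length != 3) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "He PRP B-NP",
                "reckons\tVBZ\tB-VP", // tab-separated, so it is flagged
                "",
                ". . O");
        System.out.println("Suspicious lines: " + badLines(sample));
    }
}
```

If every line of a file is reported, the column separator (tabs or multiple spaces instead of one space) is a likely cause.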
Best,
R
On Tue, May 17, 2016 at 11:56 PM, ttrung@nlke-group.net
<tt...@nlke-group.net> wrote:
> Dear Apache OpenNLP Project Team,
>
> Thank you so much for the very useful information about the class (
>
> /opennlp-tools/src/main/java/opennlp/tools/chunker/ChunkSampleStream.java
>
> )
>
> It works very well.
>
> There is one more point: I get an error when training on Vietnamese sentences
> (more than 2 sentences in one training file).
>
> Here are 2 example sentences from the file trainChunker.txt:
>
> buo^?i _T_C B-ADVP
> tru+a _T_C I-ADVP
> , , O
> cu+`u A_C B-NP
> cha.y IT_M B-VP
> theo IT_M I-VP
> me. H_C I-VP
> ra IT_M B-PP
> bo+` S_C I-PP
> suo^'i S_C I-PP
> . . O
>
> nó C_N_T B-NP
> tha^'y S_P B-VP
> ba^`y A_G B-NP
> hu+o+u A_C I-NP
> nai A_C I-NP
> ?ã ST_P_S B-CONJP
> o+? IT_P_C B-PP
> ?a^'y C_N_T I-PP
> ro^`i T_G I-PP
> . . O
>
> Here is the error right after training on the first sentence:
>
> Skipping corrupt line: buo^?i _T_C B-ADVP
> Skipping corrupt line: tru+a _T_C I-ADVP
> Skipping corrupt line: , , O
> Skipping corrupt line: cu+`u A_C B-NP
> Skipping corrupt line: cha.y IT_M B-VP
> Skipping corrupt line: theo IT_M I-VP
> Skipping corrupt line: me. H_C I-VP
> Skipping corrupt line: ra IT_M B-PP
> Skipping corrupt line: bo+` S_C I-PP
> Skipping corrupt line: suo^'i S_C I-PP
> Skipping corrupt line: . . O
> Exception in thread "AWT-EventQueue-0" java.lang.IndexOutOfBoundsException:
> Index: 0, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:653)
> at java.util.ArrayList.get(ArrayList.java:429)
> at
> opennlp.tools.ml.model.AbstractDataIndexer.sortAndMerge(AbstractDataIndexer.java:89)
> at
> opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:105)
> at
> opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
> at
> opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
> at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:217)
> at
> form.UtilitiesForm.handleTrainingTreebankWithApacheChunker(UtilitiesForm.java:2939)
> at
> form.UtilitiesForm.jmnitLoadTreebankActionPerformed(UtilitiesForm.java:1136)
> at form.UtilitiesForm.access$1400(UtilitiesForm.java:108)
> at form.UtilitiesForm$17.actionPerformed(UtilitiesForm.java:901)
> at
> javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022)
> at
> javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2348)
> at
> javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
> at
> javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
> at javax.swing.AbstractButton.doClick(AbstractButton.java:376)
> at
> javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:833)
> at
> javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:877)
> Sorting and merging events... at
> java.awt.Component.processMouseEvent(Component.java:6535)
> at javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
> at java.awt.Component.processEvent(Component.java:6300)
> at java.awt.Container.processEvent(Container.java:2236)
> at java.awt.Component.dispatchEventImpl(Component.java:4891)
> at java.awt.Container.dispatchEventImpl(Container.java:2294)
> at java.awt.Component.dispatchEvent(Component.java:4713)
> at
> java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4888)
> at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4525)
> at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4466)
> at java.awt.Container.dispatchEventImpl(Container.java:2280)
> at java.awt.Window.dispatchEventImpl(Window.java:2750)
> at java.awt.Component.dispatchEvent(Component.java:4713)
> at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758)
> at java.awt.EventQueue.access$500(EventQueue.java:97)
> at java.awt.EventQueue$3.run(EventQueue.java:709)
> at java.awt.EventQueue$3.run(EventQueue.java:703)
> at java.security.AccessController.doPrivileged(Native Method)
> at
> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
> at
> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86)
> at java.awt.EventQueue$4.run(EventQueue.java:731)
> at java.awt.EventQueue$4.run(EventQueue.java:729)
> at java.security.AccessController.doPrivileged(Native Method)
> at
> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
> at java.awt.EventQueue.dispatchEvent(EventQueue.java:728)
> at
> java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
> at
> java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
> at
> java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
> at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
> at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
> at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
>
> Would you please check these points for me?
>
> Thank you so much for your help.
>
> Best regards,
>
> Trung Tran.
>
> On 05/17/2016 08:15 PM, ttrung@nlke-group.net wrote:
>>
>> Dear Apache OpenNLP Project Team,
>>
>> I have another error with the command line tool:
>>
>> - I followed the instructions on the site exactly
>> (https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training.tool):
>>
>> E:\test\apache-opennlp-1.5.3\bin>opennlp.bat ChunkerTrainerME -model
>> E:\test\en-chunker.bin -lang en -data E:\test\tmp.txt -encoding UTF-8
>>
>> The test file contains only the sample sentence from the site:
>>
>> He PRP B-NP
>> reckons VBZ B-VP
>> the DT B-NP
>> current JJ I-NP
>> account NN I-NP
>> deficit NN I-NP
>> will MD B-VP
>> narrow VB I-VP
>> to TO B-PP
>> only RB B-NP
>> # # I-NP
>> 1.8 CD I-NP
>> billion CD I-NP
>> in IN B-PP
>> September NNP B-NP
>> . . O
>> And here is the error:
>>
>> Computing event counts... done. 0 events
>> Indexing... done.
>> Sorting and merging events... Done indexing.
>> Incorporating indexed data for training...
>> Exception in thread "main" java.lang.NullPointerException
>> at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
>> at opennlp.maxent.GIS.trainModel(GIS.java:256)
>> at opennlp.model.TrainUtil.train(TrainUtil.java:184)
>> at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:214)
>> at opennlp.tools.cmdline.chunker.ChunkerTrainerTool.run(ChunkerTrainerTool.java:68)
>> at opennlp.tools.cmdline.CLI.main(CLI.java:222)
>>
>>
>> Another point: The function cannot read more than 2 sentences in one
>> training file.
>>
>> Would you please check these points for me?
>>
>> Thank you so much for your help.
>>
>> Best regards,
>>
>> Trung Tran.
>>
Re: Cannot train Chunker
Posted by Joern Kottmann <ko...@gmail.com>.
On which data do you train exactly?
How many sentences?
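For answering that question, a small sketch like the following could count the sentences and tokens in a file that uses the blank-line-separated format from the manual. The class and method names are invented for this example, and the lines are passed in as a list to keep the sketch self-contained.

```java
import java.util.Arrays;
import java.util.List;

class SentenceCount {

    /**
     * Counts sentences and tokens in CoNLL-style input where each token is on
     * its own line and sentences are separated by blank lines.
     * Returns an array {sentences, tokens}.
     */
    static int[] count(List<String> lines) {
        int sentences = 0;
        int tokens = 0;
        boolean inSentence = false;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                inSentence = false; // a blank line closes the current sentence
                continue;
            }
            tokens++;
            if (!inSentence) {
                sentences++;        // first token of a new sentence
                inSentence = true;
            }
        }
        return new int[] {sentences, tokens};
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "He PRP B-NP", ". . O", "", "It PRP B-NP", ". . O");
        int[] c = count(sample);
        System.out.println(c[0] + " sentences, " + c[1] + " tokens");
    }
}
```

A count of one sentence for a file that should hold hundreds would suggest that the blank separator lines are missing or contain stray characters.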
Jörn
On Thu, May 26, 2016 at 2:49 PM, ttrung@nlke-group.net <
ttrung@nlke-group.net> wrote:
> Dear Apache OpenNLP Project Team,
>
> I have re-tested with the sample sentence from the site (
> https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training)
> :
>
> He PRP B-NP
> reckons VBZ B-VP
> the DT B-NP
> current JJ I-NP
> account NN I-NP
> deficit NN I-NP
> will MD B-VP
> narrow VB I-VP
> to TO B-PP
> only RB B-NP
> # # I-NP
> 1.8 CD I-NP
> billion CD I-NP
> in IN B-PP
> September NNP B-NP
> . . O
>
> And I still receive the same error:
>
> Skipping corrupt line: He PRP B-NPreckons VBZ B-VPthe DT
> B-NPcurrent JJ I-NPaccount NN I-NPdeficit NN I-NPwill MD
> B-VPnarrow VB I-VPto TO B-PPonly RB B-NP# #
> I-NP1.8 CD I-NPbillion CD I-NPin IN B-PPSeptember NNP
> B-NP. . O
> Exception in thread "AWT-EventQueue-0"
> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:653)
> at java.util.ArrayList.get(ArrayList.java:429)
> at
> opennlp.tools.ml.model.AbstractDataIndexer.sortAndMerge(AbstractDataIndexer.java:89)
> at
> opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:105)
> at opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
> at opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
> at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:217)
> at
> form.UtilitiesForm.handleTrainingTreebankWithApacheChunker(UtilitiesForm.java:2989)
> at
> form.UtilitiesForm.jmnitLoadTreebankActionPerformed(UtilitiesForm.java:1166)
> at form.UtilitiesForm.access$1400(UtilitiesForm.java:108)
> at form.UtilitiesForm$17.actionPerformed(UtilitiesForm.java:901)
> at
> javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022)
> at
> javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2348)
> at
> javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
> at
> javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
> at javax.swing.AbstractButton.doClick(AbstractButton.java:376)
> at
> javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:833)
> at
> javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:877)
> at java.awt.Component.processMouseEvent(Component.java:6535)
> at javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
> at java.awt.Component.processEvent(Component.java:6300)
> at java.awt.Container.processEvent(Container.java:2236)
> at java.awt.Component.dispatchEventImpl(Component.java:4891)
> at java.awt.Container.dispatchEventImpl(Container.java:2294)
> at java.awt.Component.dispatchEvent(Component.java:4713)
> at
> java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4888)
> at
> java.awt.LightweightDispatcher.processMouseEvent(Container.java:4525)
> at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4466)
> at java.awt.Container.dispatchEventImpl(Container.java:2280)
> at java.awt.Window.dispatchEventImpl(Window.java:2750)
> at java.awt.Component.dispatchEvent(Component.java:4713)
> at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758)
> at java.awt.EventQueue.access$500(EventQueue.java:97)
> at java.awt.EventQueue$3.run(EventQueue.java:709)
> at java.awt.EventQueue$3.run(EventQueue.java:703)
> at java.security.AccessController.doPrivileged(Native Method)
> at
> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
> at
> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86)
> at java.awt.EventQueue$4.run(EventQueue.java:731)
> at java.awt.EventQueue$4.run(EventQueue.java:729)
> at java.security.AccessController.doPrivileged(Native Method)
> at
> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
> at java.awt.EventQueue.dispatchEvent(EventQueue.java:728)
> at
> java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
> at
> java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
> at
> java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
> at
> java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
> at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
> at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
> Sorting and merging events...
>
> Here is the complete Java code:
>
> try {
>     Charset charset = Charset.forName("UTF-8");
>     File fileChunker = new File("trainApacheChunker.txt");
>     MarkableFileInputStreamFactory i = new MarkableFileInputStreamFactory(fileChunker);
>     ObjectStream lineStream = new PlainTextByLineStream(i, charset);
>     ObjectStream<ChunkSample> sampleStream = new ChunkSampleStream(lineStream);
>
>     chunkerModel = ChunkerME.train("en", sampleStream, TrainingParameters.defaultParams(), new ChunkerFactory());
>
>     modelApacheChunkerPath = "chunkerModel.bin";
>     OutputStream modelOut = new BufferedOutputStream(new FileOutputStream(modelApacheChunkerPath));
>     chunkerModel.serialize(modelOut);
> } catch (FileNotFoundException fe) {
>
> } catch (IOException ie) {
>
> }
>
> Would you please check this point for me?
>
> Thank you so much for your help.
>
> Best regards,
>
> Trung Tran.
>
>
Re: Stuck: Cannot train Chunker even with the instruction on the site
Posted by Rodrigo Agerri <ra...@apache.org>.
Hi,
Can you please try training with the CoNLL 2000 data and see if you
get the same error?
Also, can you please put somewhere online your Vietnamese data so we
can try and reproduce that error?
Best,
R
On Fri, Jun 3, 2016 at 11:30 PM, ttrung@nlke-group.net
<tt...@nlke-group.net> wrote:
> Dear Apache OpenNLP Project Team,
>
> We really appreciate that you provide the wonderful OpenNLP tools, and we
> have already trained most of them successfully (Tokenizer, POS Tagger).
>
> There is only one small problem (we really believe it is small), described
> below, when training the Chunker.
>
> I hope that you will re-test and give us some information soon so that we
> can fix this critical point.
>
> By the way, you are always an amazing team :)
>
> Thank you so much for your help.
>
> Best regards,
>
> Trung Tran.
>
>
> On 05/27/2016 08:23 AM, ttrung@nlke-group.net wrote:
>>
>> Dear Apache OpenNLP Project Team,
>>
>> To help you reproduce the situation, I describe the experiment step by
>> step here:
>>
>> - Firstly, I carefully read the instructions on the site
>> (https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training):
>>
>> " The training data can be converted to the OpenNLP chunker training
>> format, that is based on CoNLL2000
>> <http://www.cnts.ua.ac.be/conll2000/chunking>. Other formats may also be
>> available. The train data consist of three columns separated by spaces. Each
>> word has been put on a separate line and there is an empty line after each
>> sentence. The first column contains the current word, the second its
>> part-of-speech tag and the third its chunk tag. The chunk tags contain the
>> name of the chunk type, for example I-NP for noun phrase words and I-VP for
>> verb phrase words. Most chunk types have two types of chunk tags, B-CHUNK
>> for the first word of the chunk and I-CHUNK for each other word in the
>> chunk. Here is an example of the file format:"
>>
>> - Secondly, I created two test files (".txt"). The first file contains
>> only the sample sentence from the site:
>>
>> He PRP B-NP
>> reckons VBZ B-VP
>> the DT B-NP
>> current JJ I-NP
>> account NN I-NP
>> deficit NN I-NP
>> will MD B-VP
>> narrow VB I-VP
>> to TO B-PP
>> only RB B-NP
>> # # I-NP
>> 1.8 CD I-NP
>> billion CD I-NP
>> in IN B-PP
>> September NNP B-NP
>> . . O
>>
>> - The second test file contains 300 Vietnamese sentences. As
>> described on the site, each word has been put on a separate line and there
>> is an empty line after each sentence.
>>
>> - Thirdly, I ran the program twice, once for each file. Both times
>> I got the same error, right after reading the first sentence.
>>
>> Could you please point out what I am missing?
>>
>> PS: I trained the Tokenizer and POS Tagger successfully according to the
>> instructions on this site :)
>>
>> Thank you so much for helping me.
>>
>> Best regards,
>>
>> Trung Tran.
>>
>>
>>> at
>>> java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
>>> at
>>> java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
>>> at
>>> java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
>>> at
>>> java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
>>> at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
>>> Sorting and merging events...
>>>
>>> Here is the whole Java code:
>>>
>>> try {
>>> Charset charset = Charset.forName("UTF-8");
>>> File fileChunker = new File("trainApacheChunker.txt");
>>> MarkableFileInputStreamFactory i = new
>>> MarkableFileInputStreamFactory(fileChunker);
>>> // Typed streams instead of raw types; ChunkSampleStream must be the one
>>> // from opennlp.tools.chunker, not the one from opennlp.tools.parser.
>>> ObjectStream<String> lineStream = new PlainTextByLineStream(i,
>>> charset);
>>> ObjectStream<ChunkSample> sampleStream = new
>>> ChunkSampleStream(lineStream);
>>>
>>> chunkerModel = ChunkerME.train("en", sampleStream,
>>> TrainingParameters.defaultParams(), new ChunkerFactory());
>>> sampleStream.close();
>>>
>>> modelApacheChunkerPath = "chunkerModel.bin";
>>> OutputStream modelOut = new BufferedOutputStream(new
>>> FileOutputStream(modelApacheChunkerPath));
>>> chunkerModel.serialize(modelOut);
>>> modelOut.close();
>>> } catch (FileNotFoundException fe) {
>>> fe.printStackTrace(); // report the error instead of swallowing it
>>> } catch (IOException ie) {
>>> ie.printStackTrace();
>>> }
>>>
>>> Would you please check this point for me?
>>>
>>> Thank you so much for your help.
>>>
>>> Best regards,
>>>
>>> Trung Tran.
>>>
>>>
>>> On 05/18/2016 04:56 AM, ttrung@nlke-group.net wrote:
>>>>
>>>> Dear Apache OpenNLP Project Team,
>>>>
>>>> Thank you so much for giving me very useful information about class (
>>>>
>>>> /opennlp-tools/src/main/java/opennlp/tools/chunker/ChunkSampleStream.java
>>>> )
>>>>
>>>> It works very well.
>>>>
>>>> There is one more point: I get an error when training on Vietnamese
>>>> sentences (more than 2 sentences in one training file).
>>>>
>>>> Here are 2 example sentences from the file trainChunker.txt:
>>>>
>>>> buổi _T_C B-ADVP
>>>> trưa _T_C I-ADVP
>>>> , , O
>>>> cừu A_C B-NP
>>>> chạy IT_M B-VP
>>>> theo IT_M I-VP
>>>> mẹ H_C I-VP
>>>> ra IT_M B-PP
>>>> bờ S_C I-PP
>>>> suối S_C I-PP
>>>> . . O
>>>>
>>>> nó C_N_T B-NP
>>>> thấy S_P B-VP
>>>> bầy A_G B-NP
>>>> hươu A_C I-NP
>>>> nai A_C I-NP
>>>> đã ST_P_S B-CONJP
>>>> ở IT_P_C B-PP
>>>> đấy C_N_T I-PP
>>>> rồi T_G I-PP
>>>> . . O
>>>>
>>>> Here is the error right after training on the first sentence:
>>>>
>>>> Skipping corrupt line: buổi _T_C B-ADVP
>>>> Skipping corrupt line: trưa _T_C I-ADVP
>>>> Skipping corrupt line: , , O
>>>> Skipping corrupt line: cừu A_C B-NP
>>>> Skipping corrupt line: chạy IT_M B-VP
>>>> Skipping corrupt line: theo IT_M I-VP
>>>> Skipping corrupt line: mẹ H_C I-VP
>>>> Skipping corrupt line: ra IT_M B-PP
>>>> Skipping corrupt line: bờ S_C I-PP
>>>> Skipping corrupt line: suối S_C I-PP
>>>> Skipping corrupt line: . . O
>>>> Exception in thread "AWT-EventQueue-0"
>>>> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>>>> at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>>>> at java.util.ArrayList.get(ArrayList.java:429)
>>>> at
>>>> opennlp.tools.ml.model.AbstractDataIndexer.sortAndMerge(AbstractDataIndexer.java:89)
>>>> at
>>>> opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:105)
>>>> at
>>>> opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
>>>> at
>>>> opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
>>>> at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:217)
>>>> at
>>>> form.UtilitiesForm.handleTrainingTreebankWithApacheChunker(UtilitiesForm.java:2939)
>>>> at
>>>> form.UtilitiesForm.jmnitLoadTreebankActionPerformed(UtilitiesForm.java:1136)
>>>> at form.UtilitiesForm.access$1400(UtilitiesForm.java:108)
>>>> at form.UtilitiesForm$17.actionPerformed(UtilitiesForm.java:901)
>>>> at
>>>> javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022)
>>>> at
>>>> javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2348)
>>>> at
>>>> javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
>>>> at
>>>> javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
>>>> at javax.swing.AbstractButton.doClick(AbstractButton.java:376)
>>>> at
>>>> javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:833)
>>>> at
>>>> javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:877)
>>>> Sorting and merging events...
>>>> at
>>>> java.awt.Component.processMouseEvent(Component.java:6535)
>>>> at javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
>>>> at java.awt.Component.processEvent(Component.java:6300)
>>>> at java.awt.Container.processEvent(Container.java:2236)
>>>> at java.awt.Component.dispatchEventImpl(Component.java:4891)
>>>> at java.awt.Container.dispatchEventImpl(Container.java:2294)
>>>> at java.awt.Component.dispatchEvent(Component.java:4713)
>>>> at
>>>> java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4888)
>>>> at
>>>> java.awt.LightweightDispatcher.processMouseEvent(Container.java:4525)
>>>> at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4466)
>>>> at java.awt.Container.dispatchEventImpl(Container.java:2280)
>>>> at java.awt.Window.dispatchEventImpl(Window.java:2750)
>>>> at java.awt.Component.dispatchEvent(Component.java:4713)
>>>> at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758)
>>>> at java.awt.EventQueue.access$500(EventQueue.java:97)
>>>> at java.awt.EventQueue$3.run(EventQueue.java:709)
>>>> at java.awt.EventQueue$3.run(EventQueue.java:703)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>> at
>>>> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
>>>> at
>>>> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86)
>>>> at java.awt.EventQueue$4.run(EventQueue.java:731)
>>>> at java.awt.EventQueue$4.run(EventQueue.java:729)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>> at
>>>> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
>>>> at java.awt.EventQueue.dispatchEvent(EventQueue.java:728)
>>>> at
>>>> java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
>>>> at
>>>> java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
>>>> at
>>>> java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
>>>> at
>>>> java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
>>>> at
>>>> java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
>>>> at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
>>>>
>>>> Would you please check these points for me?
>>>>
>>>> Thank you so much for your help.
>>>>
>>>> Best regards,
>>>>
>>>> Trung Tran.
>>>>
>>>> On 05/17/2016 08:15 PM, ttrung@nlke-group.net wrote:
>>>>>
>>>>> Dear Apache OpenNLP Project Team,
>>>>>
>>>>> I have another error with the command-line tool:
>>>>>
>>>>> - I followed the instructions on the site exactly
>>>>> (https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training.tool):
>>>>>
>>>>> E:\test\apache-opennlp-1.5.3\bin>opennlp.bat ChunkerTrainerME -model
>>>>> E:\test\en-chunker.bin -lang en -data E:\test\tmp.txt -encoding UTF-8
>>>>>
>>>>> The test file contains only the sample sentence from the site:
>>>>>
>>>>> He PRP B-NP
>>>>> reckons VBZ B-VP
>>>>> the DT B-NP
>>>>> current JJ I-NP
>>>>> account NN I-NP
>>>>> deficit NN I-NP
>>>>> will MD B-VP
>>>>> narrow VB I-VP
>>>>> to TO B-PP
>>>>> only RB B-NP
>>>>> # # I-NP
>>>>> 1.8 CD I-NP
>>>>> billion CD I-NP
>>>>> in IN B-PP
>>>>> September NNP B-NP
>>>>> . . O
>>>>> And here is the error:
>>>>>
>>>>> Computing event counts... done. 0 events
>>>>> Indexing... done.
>>>>> Sorting and merging events... Done indexing.
>>>>> Incorporating indexed data for training...
>>>>> Exception in thread "main" java.lang.NullPointerException
>>>>> at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
>>>>> at opennlp.maxent.GIS.trainModel(GIS.java:256)
>>>>> at opennlp.model.TrainUtil.train(TrainUtil.java:184)
>>>>> at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:214)
>>>>> at
>>>>> opennlp.tools.cmdline.chunker.ChunkerTrainerTool.run(ChunkerTrainerTool.java:68)
>>>>> at opennlp.tools.cmdline.CLI.main(CLI.java:222)
>>>>>
>>>>>
>>>>> Another point: the function cannot read more than 2 sentences in one
>>>>> training file.
>>>>>
>>>>> Would you please check these points for me?
>>>>>
>>>>> Thank you so much for your help.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Trung Tran.
>>>>>
>>>>> On 05/17/2016 02:06 PM, ttrung@nlke-group.net wrote:
>>>>>>
>>>>>> Dear Apache OpenNLP Project Team,
>>>>>>
>>>>>> I have a critical issue when training with the Chunker tool in Java:
>>>>>>
>>>>>> - Firstly, the sample code on the documentation site
>>>>>> (https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training.api)
>>>>>> does not work, for either version 1.5.3 or 1.6.0.
>>>>>>
>>>>>> - Secondly, I had to edit the code myself as follows (using version
>>>>>> 1.5.3):
>>>>>>
>>>>>> try {
>>>>>> Charset charset = Charset.forName("UTF-8");
>>>>>> ObjectStream lineStream = new PlainTextByLineStream(new
>>>>>> FileInputStream(fileChunker), charset);
>>>>>> ObjectStream<ChunkSample> sampleStream = new
>>>>>> ChunkSampleStream(lineStream);
>>>>>>
>>>>>> chunkerModel = ChunkerME.train("vn", sampleStream,
>>>>>> TrainingParameters.defaultParams(), new ChunkerFactory());
>>>>>>
>>>>>> modelApacheChunkerPath =
>>>>>> UtilityHelper.getTemporaryFilePathInsideDir("chunkerModel.bin");
>>>>>> OutputStream modelOut = new BufferedOutputStream(new
>>>>>> FileOutputStream(modelApacheChunkerPath));
>>>>>> chunkerModel.serialize(modelOut);
>>>>>> } catch (FileNotFoundException fe) {
>>>>>>
>>>>>> } catch (IOException ie) {
>>>>>>
>>>>>> }
>>>>>>
>>>>>> - Thirdly, I get the error "java.lang.String cannot be cast to
>>>>>> opennlp.tools.parser.Parse". The reason is:
>>>>>>
>>>>>> + The constructor of class ChunkSampleStream requires the
>>>>>> parameter "ObjectStream<Parse> in"
>>>>>>
>>>>>> + However, the second parameter of method ChunkerME.train
>>>>>> is "ObjectStream<ChunkSample> in"
>>>>>>
>>>>>> I cannot find any way to work around this issue.
>>>>>>
>>>>>> Would you please check this point for me?
>>>>>>
>>>>>> Thank you so much for your help.
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Trung Tran.
>>>>>
>>>>>
>>>>
>>>
>>
>
Re: Stuck: Cannot train Chunker even with the instruction on the site
Posted by "ttrung@nlke-group.net" <tt...@nlke-group.net>.
Dear Apache OpenNLP Project Team,
I really appreciate that you found what I missed in the training
file format.
I think that is also why I could not train on the sample sentence from
the site: I just copied it from the site into a file. I did not know
that the columns must be separated by exactly one space.
I suggest updating the information on the site from " The
train data consist of three columns separated by spaces." to " The train
data consist of three columns separated by one single space." or
something similar. I have already read the information on
CoNLL-2000 (http://www.clips.uantwerpen.be/conll2000/chunking/) carefully,
and I think the description there should be changed too.
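For anyone who hits the same "Skipping corrupt line" message, a small check
like the one below might help to find the offending lines before training.
It is only a sketch: the class name ChunkLineCheck and the method badLines
are made up for illustration and are not part of OpenNLP, and it assumes
(based on the discussion in this thread) that every non-empty line must
split into exactly three fields on a single space.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ChunkLineCheck {

    // Returns the 1-based numbers of the non-empty lines that do not split
    // into exactly three fields on a single space character.
    static List<Integer> badLines(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.isEmpty()) {
                continue; // an empty line marks the end of a sentence
            }
            if (line.split(" ").length != 3) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "He PRP B-NP",          // fine: single spaces
            "reckons  VBZ  B-VP",   // two spaces between columns
            "the\tDT\tB-NP",        // tabs instead of spaces
            "",
            ". . O");
        System.out.println(badLines(lines)); // prints [2, 3]
    }
}
```

Lines with two spaces or with tabs between the columns are reported, while
the blank line between sentences is ignored.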
Thank you so much for your help,
Best regards,
Trung Tran.
On 06/07/2016 08:57 PM, ttrung@nlke-group.net wrote:
> Dear Apache OpenNLP Project Team,
>
> Here are the two test files that we tried to train the Chunker with:
> https://www.fshare.vn/folder/WNDJBN38LYV7
>
> Would you please try to test these files for us?
>
> Thank you so much for your help.
>
> Best regards.
>
> Trung Tran.
>
>
> On 06/04/2016 04:30 AM, ttrung@nlke-group.net wrote:
>> Dear Apache OpenNLP Project Team,
>>
>> We really appreciate that you provide the wonderful OpenNLP tools,
>> and we have already trained most of them successfully (Tokenizer,
>> POS Tagger).
>>
>> There is only one small problem (we really believe it is small),
>> described below, when training the Chunker.
>>
>> I hope that you will re-test and give us some information soon so
>> that we can fix this critical point.
>>
>> By the way, you are always an amazing team :)
>>
>> Thank you so much for your help.
>>
>> Best regards,
>>
>> Trung Tran.
>>
>>
>> On 05/27/2016 08:23 AM, ttrung@nlke-group.net wrote:
>>> Dear Apache OpenNLP Project Team,
>>>
>>> To help you reproduce the situation, I describe the experiment step
>>> by step here:
>>>
>>> - Firstly, I read carefully the instruction on the site
>>> (https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training):
>>>
>>> " The training data can be converted to the OpenNLP chunker training
>>> format, that is based on CoNLL2000
>>> <http://www.cnts.ua.ac.be/conll2000/chunking>. Other formats may
>>> also be available. The train data consist of three columns separated
>>> by spaces. Each word has been put on a separate line and there is an
>>> empty line after each sentence. The first column contains the
>>> current word, the second its part-of-speech tag and the third its
>>> chunk tag. The chunk tags contain the name of the chunk type, for
>>> example I-NP for noun phrase words and I-VP for verb phrase words.
>>> Most chunk types have two types of chunk tags, B-CHUNK for the first
>>> word of the chunk and I-CHUNK for each other word in the chunk. Here
>>> is an example of the file format:"
>>>
>>> - Secondly, I created two ".txt" test files. The first file
>>> contains only the sample sentence from the site:
>>>
>>> He PRP B-NP
>>> reckons VBZ B-VP
>>> the DT B-NP
>>> current JJ I-NP
>>> account NN I-NP
>>> deficit NN I-NP
>>> will MD B-VP
>>> narrow VB I-VP
>>> to TO B-PP
>>> only RB B-NP
>>> # # I-NP
>>> 1.8 CD I-NP
>>> billion CD I-NP
>>> in IN B-PP
>>> September NNP B-NP
>>> . . O
>>>
>>> - The second test file contains 300 Vietnamese sentences. As
>>> described on the site, each word is on a separate line and
>>> there is an empty line after each sentence.
>>>
>>> - Thirdly, I ran the program twice, once for each file.
>>> Both times I got the same error, right after reading the first
>>> sentence.
>>>
>>> Would you please point out what I am missing?
>>>
>>> PS: I trained Tokenizer and POS Tagger successfully according to the
>>> instruction on this site :)
>>>
>>> Thank you so much for helping me.
>>>
>>> Best regards,
>>>
>>> Trung Tran.
>>>
>>>
Re: Stuck: Cannot train Chunker even with the instruction on the site
Posted by Rodrigo Agerri <ro...@ehu.eus>.
Hi,
Your data is badly formatted. Exactly one space is required between
columns. As I have mentioned repeatedly, the chunker takes the CoNLL
2000 format, which I assume you have not checked.
If you separate the columns with a single space, training will
complete successfully, although the model will not be of much use with
only one or two sentences in the training data.
bin/opennlp ChunkerTrainerME -params lang/ml/PerceptronTrainerParams.txt -lang en -model vietnamese.bin -data ~/vietnamese.txt
Indexing events using cutoff of 0
Computing event counts... done. 27 events
Indexing... done.
Collecting events... Done indexing.
Incorporating indexed data for training...
done.
Number of Event Tokens: 27
Number of Outcomes: 10
Number of Predicates: 944
Computing model parameters...
Performing 300 iterations.
1: . (8/27) 0.2962962962962963
2: . (24/27) 0.8888888888888888
3: . (27/27) 1.0
4: . (27/27) 1.0
5: . (27/27) 1.0
6: . (27/27) 1.0
Stopping: change in training set accuracy less than 1.0E-5
Stats: (27/27) 1.0
...done.
Writing chunker model ... Compressed 944 parameters to 805
39 outcome patterns
done (0.103s)
Wrote chunker model to
path: vietnamese.bin
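If the file was produced by a tool that pads columns with tabs or several spaces, the cleanup can be scripted. The following is a minimal sketch under my own assumptions, not an OpenNLP utility: the class name ChunkDataNormalizer is hypothetical, the file is assumed to be UTF-8, and no token is assumed to contain whitespace. It collapses every run of whitespace into one space and leaves the empty lines between sentences empty.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: rewrites a chunker training
// file so that columns are separated by exactly one space.
public class ChunkDataNormalizer {

    /** Trims the line and collapses each run of whitespace into one space. */
    public static String normalizeLine(String line) {
        return line.trim().replaceAll("\\s+", " ");
    }

    /** Normalizes every line; whitespace-only lines become empty sentence separators. */
    public static List<String> normalize(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            out.add(normalizeLine(line));
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ChunkDataNormalizer <input> <output>");
            return;
        }
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        // Read and write as UTF-8 (an assumption about the training file).
        List<String> lines = Files.readAllLines(in, StandardCharsets.UTF_8);
        Files.write(out, normalize(lines), StandardCharsets.UTF_8);
    }
}
```

It could be run as "java ChunkDataNormalizer vietnamese.txt vietnamese-clean.txt" (the output name is just an example), and the cleaned file then passed to -data.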
HTH,
Rodrigo
On Tue, Jun 7, 2016 at 3:57 PM, ttrung@nlke-group.net
<tt...@nlke-group.net> wrote:
> Dear Apache OpenNLP Project Team,
>
> Here are the two test files that we used for chunker training:
> https://www.fshare.vn/folder/WNDJBN38LYV7
>
> Would you please try to test these files for us?
>
> Thank you so much for your help.
>
> Best regards.
>
> Trung Tran.
>
>
>
> On 06/04/2016 04:30 AM, ttrung@nlke-group.net wrote:
>>
>> Dear Apache OpenNLP Project Team,
>>
>> We really appreciate that you provide the wonderful OpenNLP tools, and we
>> have already trained most of them successfully (Tokenizer, POS Tagger).
>>
>> There is only one small problem (we really believe this) that I described
>> below when training with Chunker.
>>
>> I hope that you will re-test and give us some information soon so that we
>> can fix this critical point.
>>
>> By the way, you are always amazing team :)
>>
>> Thank you so much for your help.
>>
>> Best regards,
>>
>> Trung Tran.
>>
>>
>> On 05/27/2016 08:23 AM, ttrung@nlke-group.net wrote:
>>>
>>> Dear Apache OpenNLP Project Team,
>>>
>>> To help you reproduce the situation, I describe the experiment step by
>>> step here:
>>>
>>> - Firstly, I read carefully the instruction on the site
>>> (https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training):
>>>
>>> " The training data can be converted to the OpenNLP chunker training
>>> format, that is based onCoNLL2000
>>> <http://www.cnts.ua.ac.be/conll2000/chunking>. Other formats may also be
>>> available. The train data consist of three columns separated by spaces. Each
>>> word has been put on a separate line and there is an empty line after each
>>> sentence. The first column contains the current word, the second its
>>> part-of-speech tag and the third its chunk tag. The chunk tags contain the
>>> name of the chunk type, for example I-NP for noun phrase words and I-VP for
>>> verb phrase words. Most chunk types have two types of chunk tags, B-CHUNK
>>> for the first word of the chunk and I-CHUNK for each other word in the
>>> chunk. Here is an example of the file format:"
>>>
>>> - Secondly, I created two tested file ".txt". The first file contains
>>> only one sample sentence on the site:
>>>
>>> He PRP B-NP
>>> reckons VBZ B-VP
>>> the DT B-NP
>>> current JJ I-NP
>>> account NN I-NP
>>> deficit NN I-NP
>>> will MD B-VP
>>> narrow VB I-VP
>>> to TO B-PP
>>> only RB B-NP
>>> # # I-NP
>>> 1.8 CD I-NP
>>> billion CD I-NP
>>> in IN B-PP
>>> September NNP B-NP
>>> . . O
>>>
>>> - The second tested file contains 300 Vietnamese sentences. As
>>> described on the site: Each word has been put on a separate line and there
>>> is an empty line after each sentence..
>>>
>>> - Thirdly, I ran the program 2 times to train these two files. With
>>> both times, I had the same error, right after reading the first sentence.
>>>
>>> Would you please point out that I misses something?
>>>
>>> PS: I trained Tokenizer and POS Tagger successfully according to the
>>> instruction on this site :)
>>>
>>> Thank you so much for helping me.
>>>
>>> Best regards,
>>>
>>> Trung Tran.
>>>
>>>
>>> On 05/26/2016 07:49 PM, ttrung@nlke-group.net wrote:
>>>>
>>>> Dear Apache OpenNLP Project Team,
>>>
>>>
>>>> I have re-tested with sample sentence in the site
>>>> (https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training)
>>>> :
>>>>
>>>> He PRP B-NP
>>>> reckons VBZ B-VP
>>>> the DT B-NP
>>>> current JJ I-NP
>>>> account NN I-NP
>>>> deficit NN I-NP
>>>> will MD B-VP
>>>> narrow VB I-VP
>>>> to TO B-PP
>>>> only RB B-NP
>>>> # # I-NP
>>>> 1.8 CD I-NP
>>>> billion CD I-NP
>>>> in IN B-PP
>>>> September NNP B-NP
>>>> . . O
>>>> And I still receive the same error:
>>>>
>>>> Skipping corrupt line: He PRP B-NPreckons VBZ B-VPthe DT
>>>> B-NPcurrent JJ I-NPaccount NN I-NPdeficit NN I-NPwill MD
>>>> B-VPnarrow VB I-VPto TO B-PPonly RB B-NP# #
>>>> I-NP1.8 CD I-NPbillion CD I-NPin IN B-PPSeptember NNP
>>>> B-NP. . O
>>>> Exception in thread "AWT-EventQueue-0"
>>>> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>>>> at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>>>> at java.util.ArrayList.get(ArrayList.java:429)
>>>> at
>>>> opennlp.tools.ml.model.AbstractDataIndexer.sortAndMerge(AbstractDataIndexer.java:89)
>>>> at
>>>> opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:105)
>>>> at
>>>> opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
>>>> at
>>>> opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
>>>> at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:217)
>>>> at
>>>> form.UtilitiesForm.handleTrainingTreebankWithApacheChunker(UtilitiesForm.java:2989)
>>>> at
>>>> form.UtilitiesForm.jmnitLoadTreebankActionPerformed(UtilitiesForm.java:1166)
>>>> at form.UtilitiesForm.access$1400(UtilitiesForm.java:108)
>>>> at form.UtilitiesForm$17.actionPerformed(UtilitiesForm.java:901)
>>>> at
>>>> javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022)
>>>> at
>>>> javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2348)
>>>> at
>>>> javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
>>>> at
>>>> javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
>>>> at javax.swing.AbstractButton.doClick(AbstractButton.java:376)
>>>> at
>>>> javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:833)
>>>> at
>>>> javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:877)
>>>> at java.awt.Component.processMouseEvent(Component.java:6535)
>>>> at javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
>>>> at java.awt.Component.processEvent(Component.java:6300)
>>>> at java.awt.Container.processEvent(Container.java:2236)
>>>> at java.awt.Component.dispatchEventImpl(Component.java:4891)
>>>> at java.awt.Container.dispatchEventImpl(Container.java:2294)
>>>> at java.awt.Component.dispatchEvent(Component.java:4713)
>>>> at
>>>> java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4888)
>>>> at
>>>> java.awt.LightweightDispatcher.processMouseEvent(Container.java:4525)
>>>> at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4466)
>>>> at java.awt.Container.dispatchEventImpl(Container.java:2280)
>>>> at java.awt.Window.dispatchEventImpl(Window.java:2750)
>>>> at java.awt.Component.dispatchEvent(Component.java:4713)
>>>> at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758)
>>>> at java.awt.EventQueue.access$500(EventQueue.java:97)
>>>> at java.awt.EventQueue$3.run(EventQueue.java:709)
>>>> at java.awt.EventQueue$3.run(EventQueue.java:703)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>> at
>>>> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
>>>> at
>>>> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86)
>>>> at java.awt.EventQueue$4.run(EventQueue.java:731)
>>>> at java.awt.EventQueue$4.run(EventQueue.java:729)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>> at
>>>> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
>>>> at java.awt.EventQueue.dispatchEvent(EventQueue.java:728)
>>>> at
>>>> java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
>>>> at
>>>> java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
>>>> at
>>>> java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
>>>> at
>>>> java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
>>>> at
>>>> java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
>>>> at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
>>>> Sorting and merging events...
>>>>
>>>> Here are whole java code:
>>>>
>>>> try {
>>>> Charset charset = Charset.forName("UTF-8");
>>>> File fileChunker = new File("trainApacheChunker.txt");
>>>> MarkableFileInputStreamFactory i = new
>>>> MarkableFileInputStreamFactory(fileChunker);
>>>> ObjectStream lineStream = new PlainTextByLineStream(i,
>>>> charset);
>>>> ObjectStream<ChunkSample> sampleStream = new
>>>> ChunkSampleStream(lineStream);
>>>>
>>>> chunkerModel = ChunkerME.train("en", sampleStream,
>>>> TrainingParameters.defaultParams(), new ChunkerFactory());
>>>>
>>>> modelApacheChunkerPath = "chunkerModel.bin";
>>>> OutputStream modelOut = new BufferedOutputStream(new
>>>> FileOutputStream(modelApacheChunkerPath));
>>>> chunkerModel.serialize(modelOut);
>>>> } catch (FileNotFoundException fe) {
>>>>
>>>> } catch (IOException ie) {
>>>>
>>>> }
>>>>
>>>> Would you please check this point for me?
>>>>
>>>> Thank you so much for your help.
>>>>
>>>> Best regards,
>>>>
>>>> Trung Tran.
>>>>
>>>>
>>>> On 05/18/2016 04:56 AM, ttrung@nlke-group.net wrote:
>>>>>
>>>>> Dear Apache OpenNLP Project Team,
>>>>>
>>>>> Thank you so much for giving me very useful information about class (
>>>>>
>>>>> /opennlp-tools/src/main/java/opennlp/tools/chunker/ChunkSampleStream.java
>>>>> )
>>>>>
>>>>> It works very well.
>>>>>
>>>>> There is one more point: I have error when train Vietnamese sentences
>>>>> (more than 2 sentences in one training file).
>>>>>
>>>>> Here is 2 example sentences in file trainChunker.txt:
>>>>>
>>>>> buo^?i _T_C B-ADVP
>>>>> tru+a _T_C I-ADVP
>>>>> , , O
>>>>> cu+`u A_C B-NP
>>>>> cha.y IT_M B-VP
>>>>> theo IT_M I-VP
>>>>> me. H_C I-VP
>>>>> ra IT_M B-PP
>>>>> bo+` S_C I-PP
>>>>> suo^'i S_C I-PP
>>>>> . . O
>>>>>
>>>>> nó C_N_T B-NP
>>>>> tha^'y S_P B-VP
>>>>> ba^`y A_G B-NP
>>>>> hu+o+u A_C I-NP
>>>>> nai A_C I-NP
>>>>> ?ã ST_P_S B-CONJP
>>>>> o+? IT_P_C B-PP
>>>>> ?a^'y C_N_T I-PP
>>>>> ro^`i T_G I-PP
>>>>> . . O
>>>>>
>>>>> Here is the error right after train the first sentence:
>>>>>
>>>>> Skipping corrupt line: buo^?i _T_C B-ADVP
>>>>> Skipping corrupt line: tru+a _T_C I-ADVP
>>>>> Skipping corrupt line: , , O
>>>>> Skipping corrupt line: cu+`u A_C B-NP
>>>>> Skipping corrupt line: cha.y IT_M B-VP
>>>>> Skipping corrupt line: theo IT_M I-VP
>>>>> Skipping corrupt line: me. H_C I-VP
>>>>> Skipping corrupt line: ra IT_M B-PP
>>>>> Skipping corrupt line: bo+` S_C I-PP
>>>>> Skipping corrupt line: suo^'i S_C I-PP
>>>>> Skipping corrupt line: . . O
>>>>> Exception in thread "AWT-EventQueue-0"
>>>>> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>>>>> at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>>>>> at java.util.ArrayList.get(ArrayList.java:429)
>>>>> at
>>>>> opennlp.tools.ml.model.AbstractDataIndexer.sortAndMerge(AbstractDataIndexer.java:89)
>>>>> at
>>>>> opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:105)
>>>>> at
>>>>> opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
>>>>> at
>>>>> opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
>>>>> at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:217)
>>>>> at
>>>>> form.UtilitiesForm.handleTrainingTreebankWithApacheChunker(UtilitiesForm.java:2939)
>>>>> at
>>>>> form.UtilitiesForm.jmnitLoadTreebankActionPerformed(UtilitiesForm.java:1136)
>>>>> at form.UtilitiesForm.access$1400(UtilitiesForm.java:108)
>>>>> at form.UtilitiesForm$17.actionPerformed(UtilitiesForm.java:901)
>>>>> at
>>>>> javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022)
>>>>> at
>>>>> javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2348)
>>>>> at
>>>>> javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
>>>>> at
>>>>> javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
>>>>> at javax.swing.AbstractButton.doClick(AbstractButton.java:376)
>>>>> at
>>>>> javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:833)
>>>>> at
>>>>> javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:877)
>>>>> Sorting and merging events... at
>>>>> java.awt.Component.processMouseEvent(Component.java:6535)
>>>>> at javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
>>>>> at java.awt.Component.processEvent(Component.java:6300)
>>>>> at java.awt.Container.processEvent(Container.java:2236)
>>>>> at java.awt.Component.dispatchEventImpl(Component.java:4891)
>>>>> at java.awt.Container.dispatchEventImpl(Container.java:2294)
>>>>> at java.awt.Component.dispatchEvent(Component.java:4713)
>>>>> at
>>>>> java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4888)
>>>>> at
>>>>> java.awt.LightweightDispatcher.processMouseEvent(Container.java:4525)
>>>>> at
>>>>> java.awt.LightweightDispatcher.dispatchEvent(Container.java:4466)
>>>>> at java.awt.Container.dispatchEventImpl(Container.java:2280)
>>>>> at java.awt.Window.dispatchEventImpl(Window.java:2750)
>>>>> at java.awt.Component.dispatchEvent(Component.java:4713)
>>>>> at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758)
>>>>> at java.awt.EventQueue.access$500(EventQueue.java:97)
>>>>> at java.awt.EventQueue$3.run(EventQueue.java:709)
>>>>> at java.awt.EventQueue$3.run(EventQueue.java:703)
>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>> at
>>>>> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
>>>>> at
>>>>> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86)
>>>>> at java.awt.EventQueue$4.run(EventQueue.java:731)
>>>>> at java.awt.EventQueue$4.run(EventQueue.java:729)
>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>> at
>>>>> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
>>>>> at java.awt.EventQueue.dispatchEvent(EventQueue.java:728)
>>>>> at
>>>>> java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
>>>>> at
>>>>> java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
>>>>> at
>>>>> java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
>>>>> at
>>>>> java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
>>>>> at
>>>>> java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
>>>>> at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
>>>>>
>>>>> Would you please check these points for me?
>>>>>
>>>>> Thank you so much for your help.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Trung Tran.
>>>>>
>>>>> On 05/17/2016 08:15 PM, ttrung@nlke-group.net wrote:
>>>>>>
>>>>>> Dear Apache OpenNLP Project Team,
>>>>>>
>>>>>> I have another error with command line tool:
>>>>>>
>>>>>> - I did exactly as information in site
>>>>>> (https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training.tool):
>>>>>>
>>>>>> E:\test\apache-opennlp-1.5.3\bin>opennlp.bat ChunkerTrainerME -model
>>>>>> E:\test\en-chunker.bin -lang en -data E:\test\tmp.txt -encoding UTF-8
>>>>>>
>>>>>> File test only contains sample sentence as in the site :
>>>>>>
>>>>>> He PRP B-NP
>>>>>> reckons VBZ B-VP
>>>>>> the DT B-NP
>>>>>> current JJ I-NP
>>>>>> account NN I-NP
>>>>>> deficit NN I-NP
>>>>>> will MD B-VP
>>>>>> narrow VB I-VP
>>>>>> to TO B-PP
>>>>>> only RB B-NP
>>>>>> # # I-NP
>>>>>> 1.8 CD I-NP
>>>>>> billion CD I-NP
>>>>>> in IN B-PP
>>>>>> September NNP B-NP
>>>>>> . . O
>>>>>> And here is the error:
>>>>>>
>>>>>> Computing event counts... done. 0 events
>>>>>> Indexing... done.
>>>>>> Sorting and merging events... Done indexing.
>>>>>> Incorporating indexed data for training...
>>>>>> Exception in thread "main" java.lang.NullPointerException
>>>>>> at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
>>>>>> at opennlp.maxent.GIS.trainModel(GIS.java:256)
>>>>>> at opennlp.model.TrainUtil.train(TrainUtil.java:184)
>>>>>> at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:214)
>>>>>> at
>>>>>> opennlp.tools.cmdline.chunker.ChunkerTrainerTool.run(ChunkerTrainerTo
>>>>>> ol.java:68)
>>>>>> at opennlp.tools.cmdline.CLI.main(CLI.java:222)
>>>>>>
>>>>>>
>>>>>> Another point: The function cannot read more than 2 sentence in one
>>>>>> train file.
>>>>>>
>>>>>> Would you please check these points for me?
>>>>>>
>>>>>> Thank you so much for your help.
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Trung Tran.
>>>>>>
>>>>>> On 05/17/2016 02:06 PM, ttrung@nlke-group.net wrote:
>>>>>>>
>>>>>>> Dear Apache OpenNLP Project Team,
>>>>>>>
>>>>>>> I have an critical issue when training with Chunker tool in Java:
>>>>>>>
>>>>>>> - Firstly, the sample code in documentation site
>>>>>>> (https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training.api)
>>>>>>> is not work, both for version 1.5.3 and 1.6.0
>>>>>>>
>>>>>>> - Secondly, I have to edit the codes myself to (using version
>>>>>>> 1.5.3):
>>>>>>>
>>>>>>> try {
>>>>>>> Charset charset = Charset.forName("UTF-8");
>>>>>>> ObjectStream lineStream = new PlainTextByLineStream(new
>>>>>>> FileInputStream(fileChunker), charset);
>>>>>>> ObjectStream<ChunkSample> sampleStream = new
>>>>>>> ChunkSampleStream(lineStream);
>>>>>>>
>>>>>>> chunkerModel = ChunkerME.train("vn", sampleStream,
>>>>>>> TrainingParameters.defaultParams(), new ChunkerFactory());
>>>>>>>
>>>>>>> modelApacheChunkerPath =
>>>>>>> UtilityHelper.getTemporaryFilePathInsideDir("chunkerModel.bin");
>>>>>>> OutputStream modelOut = new BufferedOutputStream(new
>>>>>>> FileOutputStream(modelApacheChunkerPath));
>>>>>>> chunkerModel.serialize(modelOut);
>>>>>>> } catch (FileNotFoundException fe) {
>>>>>>>
>>>>>>> } catch (IOException ie) {
>>>>>>>
>>>>>>> }
>>>>>>>
>>>>>>> - Thirdly, I have the error "java.lang.String cannot be cast to
>>>>>>> opennlp.tools.parser.Parse". The reason is:
>>>>>>>
>>>>>>> + The constructor of class ChunkSampleStream requires
>>>>>>> parameter is "ObjectStream<Parse> in"
>>>>>>>
>>>>>>> + However, the second parameter of method ChunkerME.train
>>>>>>> is "ObjectStream<ChunkSample> in"
>>>>>>>
>>>>>>> I cannot find any way to work around this issue.
>>>>>>>
>>>>>>> Would you please check this point for me?
>>>>>>>
>>>>>>> Thank you so much for your help.
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>> Trung Tran.
Re: Stuck: Cannot train Chunker even with the instruction on the site
Posted by "ttrung@nlke-group.net" <tt...@nlke-group.net>.
Dear Apache OpenNLP Project Team,
Here are the two test files that we used for chunker training:
https://www.fshare.vn/folder/WNDJBN38LYV7
Would you please test these files for us?
Thank you so much for your help.
Best regards.
Trung Tran.
On 06/04/2016 04:30 AM, ttrung@nlke-group.net wrote:
> Dear Apache OpenNLP Project Team,
>
> We really appreciate that you provide the wonderful OpenNLP tools, and
> we have already trained most of them successfully (Tokenizer, POS Tagger).
>
> There is only one small problem (we really believe this) that I
> described below when training with Chunker.
>
> I hope that you will re-test and give us some information soon so that
> we can fix this critical point.
>
> By the way, you are always amazing team :)
>
> Thank you so much for your help.
>
> Best regards,
>
> Trung Tran.
>
>
> On 05/27/2016 08:23 AM, ttrung@nlke-group.net wrote:
>> Dear Apache OpenNLP Project Team,
>>
>> To help you reproduce the situation, I describe the experiment step
>> by step here:
>>
>> - Firstly, I read carefully the instruction on the site
>> (https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training):
>>
>> " The training data can be converted to the OpenNLP chunker training
>> format, that is based onCoNLL2000
>> <http://www.cnts.ua.ac.be/conll2000/chunking>. Other formats may also
>> be available. The train data consist of three columns separated by
>> spaces. Each word has been put on a separate line and there is an
>> empty line after each sentence. The first column contains the current
>> word, the second its part-of-speech tag and the third its chunk tag.
>> The chunk tags contain the name of the chunk type, for example I-NP
>> for noun phrase words and I-VP for verb phrase words. Most chunk
>> types have two types of chunk tags, B-CHUNK for the first word of the
>> chunk and I-CHUNK for each other word in the chunk. Here is an
>> example of the file format:"
>>
>> - Secondly, I created two tested file ".txt". The first file
>> contains only one sample sentence on the site:
>>
>> He PRP B-NP
>> reckons VBZ B-VP
>> the DT B-NP
>> current JJ I-NP
>> account NN I-NP
>> deficit NN I-NP
>> will MD B-VP
>> narrow VB I-VP
>> to TO B-PP
>> only RB B-NP
>> # # I-NP
>> 1.8 CD I-NP
>> billion CD I-NP
>> in IN B-PP
>> September NNP B-NP
>> . . O
>>
>> - The second tested file contains 300 Vietnamese sentences. As
>> described on the site: Each word has been put on a separate line and
>> there is an empty line after each sentence..
>>
>> - Thirdly, I ran the program 2 times to train these two files.
>> With both times, I had the same error, right after reading the first
>> sentence.
>>
>> Would you please point out that I misses something?
>>
>> PS: I trained Tokenizer and POS Tagger successfully according to the
>> instruction on this site :)
>>
>> Thank you so much for helping me.
>>
>> Best regards,
>>
>> Trung Tran.
>>
>>
>> On 05/26/2016 07:49 PM, ttrung@nlke-group.net wrote:
>>> Dear Apache OpenNLP Project Team,
>>
>>> I have re-tested with sample sentence in the site
>>> (https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training)
>>> :
>>>
>>> He PRP B-NP
>>> reckons VBZ B-VP
>>> the DT B-NP
>>> current JJ I-NP
>>> account NN I-NP
>>> deficit NN I-NP
>>> will MD B-VP
>>> narrow VB I-VP
>>> to TO B-PP
>>> only RB B-NP
>>> # # I-NP
>>> 1.8 CD I-NP
>>> billion CD I-NP
>>> in IN B-PP
>>> September NNP B-NP
>>> . . O
>>> And I still receive the same error:
>>>
>>> Skipping corrupt line: He PRP B-NPreckons VBZ
>>> B-VPthe DT B-NPcurrent JJ I-NPaccount NN I-NPdeficit
>>> NN I-NPwill MD B-VPnarrow VB I-VPto TO
>>> B-PPonly RB B-NP# # I-NP1.8 CD I-NPbillion
>>> CD I-NPin IN B-PPSeptember NNP B-NP. . O
>>> Exception in thread "AWT-EventQueue-0"
>>> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>>> at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>>> at java.util.ArrayList.get(ArrayList.java:429)
>>> at
>>> opennlp.tools.ml.model.AbstractDataIndexer.sortAndMerge(AbstractDataIndexer.java:89)
>>> at
>>> opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:105)
>>> at
>>> opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
>>> at
>>> opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
>>> at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:217)
>>> at
>>> form.UtilitiesForm.handleTrainingTreebankWithApacheChunker(UtilitiesForm.java:2989)
>>> at
>>> form.UtilitiesForm.jmnitLoadTreebankActionPerformed(UtilitiesForm.java:1166)
>>> at form.UtilitiesForm.access$1400(UtilitiesForm.java:108)
>>> at form.UtilitiesForm$17.actionPerformed(UtilitiesForm.java:901)
>>> at
>>> javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022)
>>> at
>>> javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2348)
>>> at
>>> javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
>>> at
>>> javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
>>> at javax.swing.AbstractButton.doClick(AbstractButton.java:376)
>>> at
>>> javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:833)
>>> at
>>> javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:877)
>>> at java.awt.Component.processMouseEvent(Component.java:6535)
>>> at javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
>>> at java.awt.Component.processEvent(Component.java:6300)
>>> at java.awt.Container.processEvent(Container.java:2236)
>>> at java.awt.Component.dispatchEventImpl(Component.java:4891)
>>> at java.awt.Container.dispatchEventImpl(Container.java:2294)
>>> at java.awt.Component.dispatchEvent(Component.java:4713)
>>> at
>>> java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4888)
>>> at
>>> java.awt.LightweightDispatcher.processMouseEvent(Container.java:4525)
>>> at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4466)
>>> at java.awt.Container.dispatchEventImpl(Container.java:2280)
>>> at java.awt.Window.dispatchEventImpl(Window.java:2750)
>>> at java.awt.Component.dispatchEvent(Component.java:4713)
>>> at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758)
>>> at java.awt.EventQueue.access$500(EventQueue.java:97)
>>> at java.awt.EventQueue$3.run(EventQueue.java:709)
>>> at java.awt.EventQueue$3.run(EventQueue.java:703)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at
>>> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
>>> at
>>> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86)
>>> at java.awt.EventQueue$4.run(EventQueue.java:731)
>>> at java.awt.EventQueue$4.run(EventQueue.java:729)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at
>>> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
>>> at java.awt.EventQueue.dispatchEvent(EventQueue.java:728)
>>> at
>>> java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
>>> at
>>> java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
>>> at
>>> java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
>>> at
>>> java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
>>> at
>>> java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
>>> at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
>>> Sorting and merging events...
>>>
>>> Here are whole java code:
>>>
>>> try {
>>> Charset charset = Charset.forName("UTF-8");
>>> File fileChunker = new File("trainApacheChunker.txt");
>>> MarkableFileInputStreamFactory i = new
>>> MarkableFileInputStreamFactory(fileChunker);
>>> ObjectStream lineStream = new PlainTextByLineStream(i,
>>> charset);
>>> ObjectStream<ChunkSample> sampleStream = new
>>> ChunkSampleStream(lineStream);
>>>
>>> chunkerModel = ChunkerME.train("en", sampleStream,
>>> TrainingParameters.defaultParams(), new ChunkerFactory());
>>>
>>> modelApacheChunkerPath = "chunkerModel.bin";
>>> OutputStream modelOut = new BufferedOutputStream(new
>>> FileOutputStream(modelApacheChunkerPath));
>>> chunkerModel.serialize(modelOut);
>>> } catch (FileNotFoundException fe) {
>>>
>>> } catch (IOException ie) {
>>>
>>> }
>>>
>>> Would you please check this point for me?
>>>
>>> Thank you so much for your help.
>>>
>>> Best regards,
>>>
>>> Trung Tran.
>>>
>>>
>>> On 05/18/2016 04:56 AM, ttrung@nlke-group.net wrote:
>>>> Dear Apache OpenNLP Project Team,
>>>>
>>>> Thank you so much for giving me very useful information about class (
>>>> /opennlp-tools/src/main/java/opennlp/tools/chunker/ChunkSampleStream.java
>>>> )
>>>>
>>>> It works very well.
>>>>
>>>> There is one more point: I have error when train Vietnamese
>>>> sentences (more than 2 sentences in one training file).
>>>>
>>>> Here is 2 example sentences in file trainChunker.txt:
>>>>
>>>> buo^?i _T_C B-ADVP
>>>> tru+a _T_C I-ADVP
>>>> , , O
>>>> cu+`u A_C B-NP
>>>> cha.y IT_M B-VP
>>>> theo IT_M I-VP
>>>> me. H_C I-VP
>>>> ra IT_M B-PP
>>>> bo+` S_C I-PP
>>>> suo^'i S_C I-PP
>>>> . . O
>>>>
>>>> n C_N_T B-NP
>>>> tha^'y S_P B-VP
>>>> ba^`y A_G B-NP
>>>> hu+o+u A_C I-NP
>>>> nai A_C I-NP
>>>> ? ST_P_S B-CONJP
>>>> o+? IT_P_C B-PP
>>>> ?a^'y C_N_T I-PP
>>>> ro^`i T_G I-PP
>>>> . . O
>>>>
>>>> Here is the error right after train the first sentence:
>>>>
>>>> Skipping corrupt line: buo^?i _T_C B-ADVP
>>>> Skipping corrupt line: tru+a _T_C I-ADVP
>>>> Skipping corrupt line: , , O
>>>> Skipping corrupt line: cu+`u A_C B-NP
>>>> Skipping corrupt line: cha.y IT_M B-VP
>>>> Skipping corrupt line: theo IT_M I-VP
>>>> Skipping corrupt line: me. H_C I-VP
>>>> Skipping corrupt line: ra IT_M B-PP
>>>> Skipping corrupt line: bo+` S_C I-PP
>>>> Skipping corrupt line: suo^'i S_C I-PP
>>>> Skipping corrupt line: . . O
>>>> Exception in thread "AWT-EventQueue-0"
>>>> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>>>> at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>>>> at java.util.ArrayList.get(ArrayList.java:429)
>>>> at
>>>> opennlp.tools.ml.model.AbstractDataIndexer.sortAndMerge(AbstractDataIndexer.java:89)
>>>> at
>>>> opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:105)
>>>> at
>>>> opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
>>>> at
>>>> opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
>>>> at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:217)
>>>> at
>>>> form.UtilitiesForm.handleTrainingTreebankWithApacheChunker(UtilitiesForm.java:2939)
>>>> at
>>>> form.UtilitiesForm.jmnitLoadTreebankActionPerformed(UtilitiesForm.java:1136)
>>>> at form.UtilitiesForm.access$1400(UtilitiesForm.java:108)
>>>> at form.UtilitiesForm$17.actionPerformed(UtilitiesForm.java:901)
>>>> at
>>>> javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022)
>>>> at
>>>> javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2348)
>>>> at
>>>> javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
>>>> at
>>>> javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
>>>> at javax.swing.AbstractButton.doClick(AbstractButton.java:376)
>>>> at
>>>> javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:833)
>>>> at
>>>> javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:877)
>>>> Sorting and merging events... at
>>>> java.awt.Component.processMouseEvent(Component.java:6535)
>>>> at javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
>>>> at java.awt.Component.processEvent(Component.java:6300)
>>>> at java.awt.Container.processEvent(Container.java:2236)
>>>> at java.awt.Component.dispatchEventImpl(Component.java:4891)
>>>> at java.awt.Container.dispatchEventImpl(Container.java:2294)
>>>> at java.awt.Component.dispatchEvent(Component.java:4713)
>>>> at
>>>> java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4888)
>>>> at
>>>> java.awt.LightweightDispatcher.processMouseEvent(Container.java:4525)
>>>> at
>>>> java.awt.LightweightDispatcher.dispatchEvent(Container.java:4466)
>>>> at java.awt.Container.dispatchEventImpl(Container.java:2280)
>>>> at java.awt.Window.dispatchEventImpl(Window.java:2750)
>>>> at java.awt.Component.dispatchEvent(Component.java:4713)
>>>> at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758)
>>>> at java.awt.EventQueue.access$500(EventQueue.java:97)
>>>> at java.awt.EventQueue$3.run(EventQueue.java:709)
>>>> at java.awt.EventQueue$3.run(EventQueue.java:703)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>> at
>>>> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
>>>> at
>>>> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86)
>>>> at java.awt.EventQueue$4.run(EventQueue.java:731)
>>>> at java.awt.EventQueue$4.run(EventQueue.java:729)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>> at
>>>> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
>>>> at java.awt.EventQueue.dispatchEvent(EventQueue.java:728)
>>>> at
>>>> java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
>>>> at
>>>> java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
>>>> at
>>>> java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
>>>> at
>>>> java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
>>>> at
>>>> java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
>>>> at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
>>>>
>>>> Would you please check these points for me?
>>>>
>>>> Thank you so much for your help.
>>>>
>>>> Best regards,
>>>>
>>>> Trung Tran.
>>>>
>>>> On 05/17/2016 08:15 PM, ttrung@nlke-group.net wrote:
>>>>> Dear Apache OpenNLP Project Team,
>>>>>
>>>>> I have another error with command line tool:
>>>>>
>>>>> - I followed the instructions on the site exactly
>>>>> (https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training.tool):
>>>>>
>>>>> E:\test\apache-opennlp-1.5.3\bin>opennlp.bat ChunkerTrainerME
>>>>> -model E:\test\en-chunker.bin -lang en -data E:\test\tmp.txt
>>>>> -encoding UTF-8
>>>>>
>>>>> The test file contains only the sample sentence from the site:
>>>>>
>>>>> He PRP B-NP
>>>>> reckons VBZ B-VP
>>>>> the DT B-NP
>>>>> current JJ I-NP
>>>>> account NN I-NP
>>>>> deficit NN I-NP
>>>>> will MD B-VP
>>>>> narrow VB I-VP
>>>>> to TO B-PP
>>>>> only RB B-NP
>>>>> # # I-NP
>>>>> 1.8 CD I-NP
>>>>> billion CD I-NP
>>>>> in IN B-PP
>>>>> September NNP B-NP
>>>>> . . O
>>>>> And here is the error:
>>>>>
>>>>> Computing event counts... done. 0 events
>>>>> Indexing... done.
>>>>> Sorting and merging events... Done indexing.
>>>>> Incorporating indexed data for training...
>>>>> Exception in thread "main" java.lang.NullPointerException
>>>>> at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
>>>>> at opennlp.maxent.GIS.trainModel(GIS.java:256)
>>>>> at opennlp.model.TrainUtil.train(TrainUtil.java:184)
>>>>> at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:214)
>>>>> at
>>>>> opennlp.tools.cmdline.chunker.ChunkerTrainerTool.run(ChunkerTrainerTool.java:68)
>>>>> at opennlp.tools.cmdline.CLI.main(CLI.java:222)
>>>>>
>>>>>
>>>>> Another point: the function cannot read more than 2 sentences in
>>>>> one training file.
>>>>>
>>>>> Would you please check these points for me?
>>>>>
>>>>> Thank you so much for your help.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Trung Tran.
>>>>>
Stuck: Cannot train Chunker even with the instruction on the site
Posted by "ttrung@nlke-group.net" <tt...@nlke-group.net>.
Dear Apache OpenNLP Project Team,
We really appreciate the wonderful OpenNLP tools you provide, and we
have already trained most of them successfully (Tokenizer, POS Tagger).
There is only one small problem (we really believe it is small), described
below, when training the Chunker.
I hope that you will re-test and give us some information soon so that
we can fix this critical point.
By the way, you are always an amazing team :)
Thank you so much for your help.
Best regards,
Trung Tran.
On 05/27/2016 08:23 AM, ttrung@nlke-group.net wrote:
> Dear Apache OpenNLP Project Team,
>
> To help you reproduce the situation, I describe the experiment step by
> step here:
>
> - Firstly, I carefully read the instructions on the site
> (https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training):
>
> " The training data can be converted to the OpenNLP chunker training
> format, that is based on CoNLL2000
> <http://www.cnts.ua.ac.be/conll2000/chunking>. Other formats may also
> be available. The train data consist of three columns separated by
> spaces. Each word has been put on a separate line and there is an
> empty line after each sentence. The first column contains the current
> word, the second its part-of-speech tag and the third its chunk tag.
> The chunk tags contain the name of the chunk type, for example I-NP
> for noun phrase words and I-VP for verb phrase words. Most chunk types
> have two types of chunk tags, B-CHUNK for the first word of the chunk
> and I-CHUNK for each other word in the chunk. Here is an example of
> the file format:"
>
> - Secondly, I created two test ".txt" files. The first file
> contains only the one sample sentence from the site:
>
> He PRP B-NP
> reckons VBZ B-VP
> the DT B-NP
> current JJ I-NP
> account NN I-NP
> deficit NN I-NP
> will MD B-VP
> narrow VB I-VP
> to TO B-PP
> only RB B-NP
> # # I-NP
> 1.8 CD I-NP
> billion CD I-NP
> in IN B-PP
> September NNP B-NP
> . . O
>
> - The second test file contains 300 Vietnamese sentences. As
> described on the site, each word is on a separate line and
> there is an empty line after each sentence.
>
> - Thirdly, I ran the program twice, once for each file.
> Both times I got the same error, right after reading the first
> sentence.
>
> Would you please point out what I am missing?
>
> PS: I trained Tokenizer and POS Tagger successfully according to the
> instruction on this site :)
>
> Thank you so much for helping me.
>
> Best regards,
>
> Trung Tran.
>
>
> On 05/26/2016 07:49 PM, ttrung@nlke-group.net wrote:
>> Dear Apache OpenNLP Project Team,
>>
>> I have re-tested with the sample sentence from the site
>> (https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training)
>> :
>>
>> He PRP B-NP
>> reckons VBZ B-VP
>> the DT B-NP
>> current JJ I-NP
>> account NN I-NP
>> deficit NN I-NP
>> will MD B-VP
>> narrow VB I-VP
>> to TO B-PP
>> only RB B-NP
>> # # I-NP
>> 1.8 CD I-NP
>> billion CD I-NP
>> in IN B-PP
>> September NNP B-NP
>> . . O
>> And I still receive the same error:
>>
>> Skipping corrupt line: He PRP B-NPreckons VBZ B-VPthe
>> DT B-NPcurrent JJ I-NPaccount NN I-NPdeficit NN
>> I-NPwill MD B-VPnarrow VB I-VPto TO B-PPonly
>> RB B-NP# # I-NP1.8 CD I-NPbillion CD
>> I-NPin IN B-PPSeptember NNP B-NP. . O
>> Exception in thread "AWT-EventQueue-0"
>> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>> at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>> at java.util.ArrayList.get(ArrayList.java:429)
>> at
>> opennlp.tools.ml.model.AbstractDataIndexer.sortAndMerge(AbstractDataIndexer.java:89)
>> at
>> opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:105)
>> at
>> opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
>> at
>> opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
>> at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:217)
>> at
>> form.UtilitiesForm.handleTrainingTreebankWithApacheChunker(UtilitiesForm.java:2989)
>> at
>> form.UtilitiesForm.jmnitLoadTreebankActionPerformed(UtilitiesForm.java:1166)
>> at form.UtilitiesForm.access$1400(UtilitiesForm.java:108)
>> at form.UtilitiesForm$17.actionPerformed(UtilitiesForm.java:901)
>> at
>> javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022)
>> at
>> javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2348)
>> at
>> javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
>> at
>> javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
>> at javax.swing.AbstractButton.doClick(AbstractButton.java:376)
>> at
>> javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:833)
>> at
>> javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:877)
>> at java.awt.Component.processMouseEvent(Component.java:6535)
>> at javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
>> at java.awt.Component.processEvent(Component.java:6300)
>> at java.awt.Container.processEvent(Container.java:2236)
>> at java.awt.Component.dispatchEventImpl(Component.java:4891)
>> at java.awt.Container.dispatchEventImpl(Container.java:2294)
>> at java.awt.Component.dispatchEvent(Component.java:4713)
>> at
>> java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4888)
>> at
>> java.awt.LightweightDispatcher.processMouseEvent(Container.java:4525)
>> at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4466)
>> at java.awt.Container.dispatchEventImpl(Container.java:2280)
>> at java.awt.Window.dispatchEventImpl(Window.java:2750)
>> at java.awt.Component.dispatchEvent(Component.java:4713)
>> at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758)
>> at java.awt.EventQueue.access$500(EventQueue.java:97)
>> at java.awt.EventQueue$3.run(EventQueue.java:709)
>> at java.awt.EventQueue$3.run(EventQueue.java:703)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at
>> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
>> at
>> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86)
>> at java.awt.EventQueue$4.run(EventQueue.java:731)
>> at java.awt.EventQueue$4.run(EventQueue.java:729)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at
>> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
>> at java.awt.EventQueue.dispatchEvent(EventQueue.java:728)
>> at
>> java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
>> at
>> java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
>> at
>> java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
>> at
>> java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
>> at
>> java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
>> at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
>> Sorting and merging events...
>>
>> Here is the whole Java code:
>>
>> try {
>>     Charset charset = Charset.forName("UTF-8");
>>     File fileChunker = new File("trainApacheChunker.txt");
>>     MarkableFileInputStreamFactory i =
>>         new MarkableFileInputStreamFactory(fileChunker);
>>     // Typed streams; ChunkSampleStream here is
>>     // opennlp.tools.chunker.ChunkSampleStream.
>>     ObjectStream<String> lineStream = new PlainTextByLineStream(i, charset);
>>     ObjectStream<ChunkSample> sampleStream =
>>         new ChunkSampleStream(lineStream);
>>
>>     chunkerModel = ChunkerME.train("en", sampleStream,
>>         TrainingParameters.defaultParams(), new ChunkerFactory());
>>
>>     modelApacheChunkerPath = "chunkerModel.bin";
>>     OutputStream modelOut = new BufferedOutputStream(
>>         new FileOutputStream(modelApacheChunkerPath));
>>     chunkerModel.serialize(modelOut);
>>     modelOut.close(); // make sure the buffered model reaches the disk
>> } catch (FileNotFoundException fe) {
>>     fe.printStackTrace(); // report the error instead of swallowing it
>> } catch (IOException ie) {
>>     ie.printStackTrace();
>> }
>>
>> Would you please check this point for me?
>>
>> Thank you so much for your help.
>>
>> Best regards,
>>
>> Trung Tran.
>>
>>
>> On 05/18/2016 04:56 AM, ttrung@nlke-group.net wrote:
>>> Dear Apache OpenNLP Project Team,
>>>
>>> Thank you so much for giving me very useful information about class (
>>> /opennlp-tools/src/main/java/opennlp/tools/chunker/ChunkSampleStream.java
>>> )
>>>
>>> It works very well.
>>>
>>> There is one more point: I get an error when training on Vietnamese
>>> sentences (more than 2 sentences in one training file).
>>>
>>> Here are 2 example sentences from the file trainChunker.txt:
>>>
>>> buo^?i _T_C B-ADVP
>>> tru+a _T_C I-ADVP
>>> , , O
>>> cu+`u A_C B-NP
>>> cha.y IT_M B-VP
>>> theo IT_M I-VP
>>> me. H_C I-VP
>>> ra IT_M B-PP
>>> bo+` S_C I-PP
>>> suo^'i S_C I-PP
>>> . . O
>>>
>>> n C_N_T B-NP
>>> tha^'y S_P B-VP
>>> ba^`y A_G B-NP
>>> hu+o+u A_C I-NP
>>> nai A_C I-NP
>>> ? ST_P_S B-CONJP
>>> o+? IT_P_C B-PP
>>> ?a^'y C_N_T I-PP
>>> ro^`i T_G I-PP
>>> . . O
>>>
>>> Here is the error, which occurs right after training on the first sentence:
>>>
>>> Skipping corrupt line: buo^?i _T_C B-ADVP
>>> Skipping corrupt line: tru+a _T_C I-ADVP
>>> Skipping corrupt line: , , O
>>> Skipping corrupt line: cu+`u A_C B-NP
>>> Skipping corrupt line: cha.y IT_M B-VP
>>> Skipping corrupt line: theo IT_M I-VP
>>> Skipping corrupt line: me. H_C I-VP
>>> Skipping corrupt line: ra IT_M B-PP
>>> Skipping corrupt line: bo+` S_C I-PP
>>> Skipping corrupt line: suo^'i S_C I-PP
>>> Skipping corrupt line: . . O
>>> Exception in thread "AWT-EventQueue-0"
>>> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>>> at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>>> at java.util.ArrayList.get(ArrayList.java:429)
>>> at
>>> opennlp.tools.ml.model.AbstractDataIndexer.sortAndMerge(AbstractDataIndexer.java:89)
>>> at
>>> opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:105)
>>> at
>>> opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
>>> at
>>> opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
>>> at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:217)
>>> at
>>> form.UtilitiesForm.handleTrainingTreebankWithApacheChunker(UtilitiesForm.java:2939)
>>> at
>>> form.UtilitiesForm.jmnitLoadTreebankActionPerformed(UtilitiesForm.java:1136)
>>> at form.UtilitiesForm.access$1400(UtilitiesForm.java:108)
>>> at form.UtilitiesForm$17.actionPerformed(UtilitiesForm.java:901)
>>> at
>>> javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022)
>>> at
>>> javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2348)
>>> at
>>> javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
>>> at
>>> javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
>>> at javax.swing.AbstractButton.doClick(AbstractButton.java:376)
>>> at
>>> javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:833)
>>> at
>>> javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:877)
>>> Sorting and merging events... at
>>> java.awt.Component.processMouseEvent(Component.java:6535)
>>> at javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
>>> at java.awt.Component.processEvent(Component.java:6300)
>>> at java.awt.Container.processEvent(Container.java:2236)
>>> at java.awt.Component.dispatchEventImpl(Component.java:4891)
>>> at java.awt.Container.dispatchEventImpl(Container.java:2294)
>>> at java.awt.Component.dispatchEvent(Component.java:4713)
>>> at
>>> java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4888)
>>> at
>>> java.awt.LightweightDispatcher.processMouseEvent(Container.java:4525)
>>> at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4466)
>>> at java.awt.Container.dispatchEventImpl(Container.java:2280)
>>> at java.awt.Window.dispatchEventImpl(Window.java:2750)
>>> at java.awt.Component.dispatchEvent(Component.java:4713)
>>> at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758)
>>> at java.awt.EventQueue.access$500(EventQueue.java:97)
>>> at java.awt.EventQueue$3.run(EventQueue.java:709)
>>> at java.awt.EventQueue$3.run(EventQueue.java:703)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at
>>> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
>>> at
>>> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86)
>>> at java.awt.EventQueue$4.run(EventQueue.java:731)
>>> at java.awt.EventQueue$4.run(EventQueue.java:729)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at
>>> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
>>> at java.awt.EventQueue.dispatchEvent(EventQueue.java:728)
>>> at
>>> java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
>>> at
>>> java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
>>> at
>>> java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
>>> at
>>> java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
>>> at
>>> java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
>>> at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
>>>
>>> Would you please check these points for me?
>>>
>>> Thank you so much for your help.
>>>
>>> Best regards,
>>>
>>> Trung Tran.
>>>
Re: Cannot train Chunker
Posted by "ttrung@nlke-group.net" <tt...@nlke-group.net>.
Dear Apache OpenNLP Project Team,
To help you reproduce the situation, I describe the experiment step by
step here:
- Firstly, I carefully read the instructions on the site
(https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training):
" The training data can be converted to the OpenNLP chunker training
format, that is based on CoNLL2000
<http://www.cnts.ua.ac.be/conll2000/chunking>. Other formats may also be
available. The train data consist of three columns separated by spaces.
Each word has been put on a separate line and there is an empty line
after each sentence. The first column contains the current word, the
second its part-of-speech tag and the third its chunk tag. The chunk
tags contain the name of the chunk type, for example I-NP for noun
phrase words and I-VP for verb phrase words. Most chunk types have two
types of chunk tags, B-CHUNK for the first word of the chunk and I-CHUNK
for each other word in the chunk. Here is an example of the file format:"
- Secondly, I created two test ".txt" files. The first file
contains only the one sample sentence from the site:
He PRP B-NP
reckons VBZ B-VP
the DT B-NP
current JJ I-NP
account NN I-NP
deficit NN I-NP
will MD B-VP
narrow VB I-VP
to TO B-PP
only RB B-NP
# # I-NP
1.8 CD I-NP
billion CD I-NP
in IN B-PP
September NNP B-NP
. . O
- The second test file contains 300 Vietnamese sentences. As
described on the site, each word is on a separate line and
there is an empty line after each sentence.
- Thirdly, I ran the program twice, once for each file. Both
times I got the same error, right after reading the first sentence.
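The format description quoted above (three columns separated by spaces, one word per line, an empty line after each sentence) can be checked mechanically before training. The following is only a minimal sketch: it assumes that "separated by spaces" means exactly one space between columns, which is not confirmed anywhere in this thread, and ChunkFormatCheck and findBadLines are hypothetical names that are not part of OpenNLP.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of OpenNLP.
final class ChunkFormatCheck {

    // Returns the 1-based numbers of all non-empty lines that do not split
    // into exactly three fields on a single space character.
    // Assumption: columns are separated by exactly one space.
    static List<Integer> findBadLines(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.isEmpty()) {
                continue; // an empty line marks the end of a sentence
            }
            if (line.split(" ").length != 3) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "He PRP B-NP",
            "reckons\tVBZ\tB-VP", // tab-separated, so it is reported
            "the  DT  B-NP",      // two spaces between columns, so it is reported
            "",
            ". . O");
        System.out.println(findBadLines(sample)); // prints [2, 3]
    }
}
```

If such a check reports every token line of a file, the column separator (tabs, repeated spaces) or the file encoding would be the first things to look at.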
Would you please point out what I am missing?
PS: I trained Tokenizer and POS Tagger successfully according to the
instruction on this site :)
Thank you so much for helping me.
Best regards,
Trung Tran.
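The "Skipping corrupt line" messages quoted in this thread are printed for every token line, so the reader apparently does not find three space-separated fields, although the archived text does not show why. One possible explanation, which is only a guess, is that the columns are separated by tabs or by several spaces. If that turns out to be the case, the file could be normalized before training. A minimal sketch under that assumption; ChunkColumnNormalizer and normalize are hypothetical names and not OpenNLP API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of OpenNLP.
final class ChunkColumnNormalizer {

    // Trims each line and collapses every run of whitespace (tabs, repeated
    // spaces) into a single space. Blank lines come out as empty strings, so
    // the sentence boundaries are preserved.
    static List<String> normalize(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            out.add(line.trim().replaceAll("\\s+", " "));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("He\tPRP\tB-NP", "  ", ".   .   O");
        for (String line : normalize(raw)) {
            System.out.println("[" + line + "]");
        }
    }
}
```

The normalized lines would then be written back to a file with one line per token before the file is passed to the trainer.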
On 05/26/2016 07:49 PM, ttrung@nlke-group.net wrote:
> Dear Apache OpenNLP Project Team,
> I have re-tested with the sample sentence from the site
> (https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training)
> :
>
> He PRP B-NP
> reckons VBZ B-VP
> the DT B-NP
> current JJ I-NP
> account NN I-NP
> deficit NN I-NP
> will MD B-VP
> narrow VB I-VP
> to TO B-PP
> only RB B-NP
> # # I-NP
> 1.8 CD I-NP
> billion CD I-NP
> in IN B-PP
> September NNP B-NP
> . . O
> And I still receive the same error:
>
> Skipping corrupt line: He PRP B-NPreckons VBZ B-VPthe
> DT B-NPcurrent JJ I-NPaccount NN I-NPdeficit NN
> I-NPwill MD B-VPnarrow VB I-VPto TO B-PPonly
> RB B-NP# # I-NP1.8 CD I-NPbillion CD
> I-NPin IN B-PPSeptember NNP B-NP. . O
> Exception in thread "AWT-EventQueue-0"
> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:653)
> at java.util.ArrayList.get(ArrayList.java:429)
> at
> opennlp.tools.ml.model.AbstractDataIndexer.sortAndMerge(AbstractDataIndexer.java:89)
> at
> opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:105)
> at
> opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
> at
> opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
> at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:217)
> at
> form.UtilitiesForm.handleTrainingTreebankWithApacheChunker(UtilitiesForm.java:2989)
> at
> form.UtilitiesForm.jmnitLoadTreebankActionPerformed(UtilitiesForm.java:1166)
> at form.UtilitiesForm.access$1400(UtilitiesForm.java:108)
> at form.UtilitiesForm$17.actionPerformed(UtilitiesForm.java:901)
> at
> javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022)
> at
> javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2348)
> at
> javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
> at
> javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
> at javax.swing.AbstractButton.doClick(AbstractButton.java:376)
> at
> javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:833)
> at
> javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:877)
> at java.awt.Component.processMouseEvent(Component.java:6535)
> at javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
> at java.awt.Component.processEvent(Component.java:6300)
> at java.awt.Container.processEvent(Container.java:2236)
> at java.awt.Component.dispatchEventImpl(Component.java:4891)
> at java.awt.Container.dispatchEventImpl(Container.java:2294)
> at java.awt.Component.dispatchEvent(Component.java:4713)
> at
> java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4888)
> at
> java.awt.LightweightDispatcher.processMouseEvent(Container.java:4525)
> at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4466)
> at java.awt.Container.dispatchEventImpl(Container.java:2280)
> at java.awt.Window.dispatchEventImpl(Window.java:2750)
> at java.awt.Component.dispatchEvent(Component.java:4713)
> at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758)
> at java.awt.EventQueue.access$500(EventQueue.java:97)
> at java.awt.EventQueue$3.run(EventQueue.java:709)
> at java.awt.EventQueue$3.run(EventQueue.java:703)
> at java.security.AccessController.doPrivileged(Native Method)
> at
> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
> at
> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86)
> at java.awt.EventQueue$4.run(EventQueue.java:731)
> at java.awt.EventQueue$4.run(EventQueue.java:729)
> at java.security.AccessController.doPrivileged(Native Method)
> at
> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
> at java.awt.EventQueue.dispatchEvent(EventQueue.java:728)
> at
> java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
> at
> java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
> at
> java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
> at
> java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
> at
> java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
> at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
> Sorting and merging events...
>
> Here is the whole Java code:
>
> try {
>     Charset charset = Charset.forName("UTF-8");
>     File fileChunker = new File("trainApacheChunker.txt");
>     MarkableFileInputStreamFactory i =
>         new MarkableFileInputStreamFactory(fileChunker);
>     // Typed streams; ChunkSampleStream here is
>     // opennlp.tools.chunker.ChunkSampleStream.
>     ObjectStream<String> lineStream = new PlainTextByLineStream(i, charset);
>     ObjectStream<ChunkSample> sampleStream =
>         new ChunkSampleStream(lineStream);
>
>     chunkerModel = ChunkerME.train("en", sampleStream,
>         TrainingParameters.defaultParams(), new ChunkerFactory());
>
>     modelApacheChunkerPath = "chunkerModel.bin";
>     OutputStream modelOut = new BufferedOutputStream(
>         new FileOutputStream(modelApacheChunkerPath));
>     chunkerModel.serialize(modelOut);
>     modelOut.close(); // make sure the buffered model reaches the disk
> } catch (FileNotFoundException fe) {
>     fe.printStackTrace(); // report the error instead of swallowing it
> } catch (IOException ie) {
>     ie.printStackTrace();
> }
>
> Would you please check this point for me?
>
> Thank you so much for your help.
>
> Best regards,
>
> Trung Tran.
>
>
> On 05/18/2016 04:56 AM, ttrung@nlke-group.net wrote:
>> Dear Apache OpenNLP Project Team,
>>
>> Thank you so much for giving me very useful information about class (
>> /opennlp-tools/src/main/java/opennlp/tools/chunker/ChunkSampleStream.java
>> )
>>
>> It works very well.
>>
>> There is one more point: I get an error when training on Vietnamese
>> sentences (more than 2 sentences in one training file).
>>
>> Here are 2 example sentences from the file trainChunker.txt:
>>
>> buo^?i _T_C B-ADVP
>> tru+a _T_C I-ADVP
>> , , O
>> cu+`u A_C B-NP
>> cha.y IT_M B-VP
>> theo IT_M I-VP
>> me. H_C I-VP
>> ra IT_M B-PP
>> bo+` S_C I-PP
>> suo^'i S_C I-PP
>> . . O
>>
>> n C_N_T B-NP
>> tha^'y S_P B-VP
>> ba^`y A_G B-NP
>> hu+o+u A_C I-NP
>> nai A_C I-NP
>> ? ST_P_S B-CONJP
>> o+? IT_P_C B-PP
>> ?a^'y C_N_T I-PP
>> ro^`i T_G I-PP
>> . . O
>>
>> Here is the error, which occurs right after training on the first sentence:
>>
>> Skipping corrupt line: buo^?i _T_C B-ADVP
>> Skipping corrupt line: tru+a _T_C I-ADVP
>> Skipping corrupt line: , , O
>> Skipping corrupt line: cu+`u A_C B-NP
>> Skipping corrupt line: cha.y IT_M B-VP
>> Skipping corrupt line: theo IT_M I-VP
>> Skipping corrupt line: me. H_C I-VP
>> Skipping corrupt line: ra IT_M B-PP
>> Skipping corrupt line: bo+` S_C I-PP
>> Skipping corrupt line: suo^'i S_C I-PP
>> Skipping corrupt line: . . O
>> Exception in thread "AWT-EventQueue-0"
>> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>> at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>> at java.util.ArrayList.get(ArrayList.java:429)
>> at
>> opennlp.tools.ml.model.AbstractDataIndexer.sortAndMerge(AbstractDataIndexer.java:89)
>> at
>> opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:105)
>> at
>> opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
>> at
>> opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
>> at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:217)
>> at
>> form.UtilitiesForm.handleTrainingTreebankWithApacheChunker(UtilitiesForm.java:2939)
>> at
>> form.UtilitiesForm.jmnitLoadTreebankActionPerformed(UtilitiesForm.java:1136)
>> at form.UtilitiesForm.access$1400(UtilitiesForm.java:108)
>> at form.UtilitiesForm$17.actionPerformed(UtilitiesForm.java:901)
>> at
>> javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022)
>> at
>> javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2348)
>> at
>> javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
>> at
>> javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
>> at javax.swing.AbstractButton.doClick(AbstractButton.java:376)
>> at
>> javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:833)
>> at
>> javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:877)
>> Sorting and merging events... at
>> java.awt.Component.processMouseEvent(Component.java:6535)
>> at javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
>> at java.awt.Component.processEvent(Component.java:6300)
>> at java.awt.Container.processEvent(Container.java:2236)
>> at java.awt.Component.dispatchEventImpl(Component.java:4891)
>> at java.awt.Container.dispatchEventImpl(Container.java:2294)
>> at java.awt.Component.dispatchEvent(Component.java:4713)
>> at
>> java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4888)
>> at
>> java.awt.LightweightDispatcher.processMouseEvent(Container.java:4525)
>> at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4466)
>> at java.awt.Container.dispatchEventImpl(Container.java:2280)
>> at java.awt.Window.dispatchEventImpl(Window.java:2750)
>> at java.awt.Component.dispatchEvent(Component.java:4713)
>> at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758)
>> at java.awt.EventQueue.access$500(EventQueue.java:97)
>> at java.awt.EventQueue$3.run(EventQueue.java:709)
>> at java.awt.EventQueue$3.run(EventQueue.java:703)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at
>> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
>> at
>> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86)
>> at java.awt.EventQueue$4.run(EventQueue.java:731)
>> at java.awt.EventQueue$4.run(EventQueue.java:729)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at
>> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
>> at java.awt.EventQueue.dispatchEvent(EventQueue.java:728)
>> at
>> java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
>> at
>> java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
>> at
>> java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
>> at
>> java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
>> at
>> java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
>> at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
>>
>> Would you please check these points for me?
>>
>> Thank you so much for your help.
>>
>> Best regards,
>>
>> Trung Tran.
>>
>
Cannot train Chunker
Posted by "ttrung@nlke-group.net" <tt...@nlke-group.net>.
Dear Apache OpenNLP Project Team,
I have re-tested with the sample sentence from the documentation
(https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training):
He PRP B-NP
reckons VBZ B-VP
the DT B-NP
current JJ I-NP
account NN I-NP
deficit NN I-NP
will MD B-VP
narrow VB I-VP
to TO B-PP
only RB B-NP
# # I-NP
1.8 CD I-NP
billion CD I-NP
in IN B-PP
September NNP B-NP
. . O
And I still receive the same error:
Skipping corrupt line: He PRP B-NPreckons VBZ B-VPthe
DT B-NPcurrent JJ I-NPaccount NN I-NPdeficit NN
I-NPwill MD B-VPnarrow VB I-VPto TO B-PPonly
RB B-NP# # I-NP1.8 CD I-NPbillion CD I-NPin
IN B-PPSeptember NNP B-NP. . O
Exception in thread "AWT-EventQueue-0"
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at
opennlp.tools.ml.model.AbstractDataIndexer.sortAndMerge(AbstractDataIndexer.java:89)
at
opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:105)
at
opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
at
opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:217)
at
form.UtilitiesForm.handleTrainingTreebankWithApacheChunker(UtilitiesForm.java:2989)
at
form.UtilitiesForm.jmnitLoadTreebankActionPerformed(UtilitiesForm.java:1166)
at form.UtilitiesForm.access$1400(UtilitiesForm.java:108)
at form.UtilitiesForm$17.actionPerformed(UtilitiesForm.java:901)
at
javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022)
at
javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2348)
at
javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
at
javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
at javax.swing.AbstractButton.doClick(AbstractButton.java:376)
at
javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:833)
at
javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:877)
at java.awt.Component.processMouseEvent(Component.java:6535)
at javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
at java.awt.Component.processEvent(Component.java:6300)
at java.awt.Container.processEvent(Container.java:2236)
at java.awt.Component.dispatchEventImpl(Component.java:4891)
at java.awt.Container.dispatchEventImpl(Container.java:2294)
at java.awt.Component.dispatchEvent(Component.java:4713)
at
java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4888)
at
java.awt.LightweightDispatcher.processMouseEvent(Container.java:4525)
at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4466)
at java.awt.Container.dispatchEventImpl(Container.java:2280)
at java.awt.Window.dispatchEventImpl(Window.java:2750)
at java.awt.Component.dispatchEvent(Component.java:4713)
at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758)
at java.awt.EventQueue.access$500(EventQueue.java:97)
at java.awt.EventQueue$3.run(EventQueue.java:709)
at java.awt.EventQueue$3.run(EventQueue.java:703)
at java.security.AccessController.doPrivileged(Native Method)
at
java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
at
java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86)
at java.awt.EventQueue$4.run(EventQueue.java:731)
at java.awt.EventQueue$4.run(EventQueue.java:729)
at java.security.AccessController.doPrivileged(Native Method)
at
java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
at java.awt.EventQueue.dispatchEvent(EventQueue.java:728)
at
java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
at
java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
at
java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
at
java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
Sorting and merging events...
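A note on the "Skipping corrupt line" output above: the whole sentence is reported as one line with the tokens fused together ("B-NPreckons"), which suggests that the separators or line endings in the file are not what the reader expects. As far as I can tell from the source of opennlp.tools.chunker.ChunkSampleStream, a line is skipped whenever splitting it on a single space does not give exactly three fields; please treat that as an assumption and check it against your version. Under that assumption, here is a minimal plain-Java sketch that reports the field count of a line and makes hidden characters visible. ChunkLineInspector and its methods are made-up names for illustration, not part of OpenNLP.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: inspects one training line.
class ChunkLineInspector {

    // Number of fields when splitting on a single space
    // (assumed to match what the chunker sample reader does).
    static int fieldCount(String line) {
        return line.split(" ").length;
    }

    // Renders tabs, carriage returns and any other control or non-ASCII
    // character as an escape, so unexpected separators become visible.
    static String reveal(String line) {
        StringBuilder sb = new StringBuilder();
        for (char c : line.toCharArray()) {
            if (c == '\t') {
                sb.append("\\t");
            } else if (c == '\r') {
                sb.append("\\r");
            } else if (c < 0x20 || c > 0x7e) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> samples = Arrays.asList(
                "He PRP B-NP",      // single spaces
                "He\tPRP\tB-NP",    // tab separated
                "He  PRP  B-NP");   // double spaces
        for (String line : samples) {
            System.out.println(fieldCount(line) + " field(s): " + reveal(line));
        }
    }
}
```

Only the first sample yields three fields. Note that reveal also escapes legitimate non-ASCII letters, so Vietnamese words will look noisy; the point is only to spot the separators.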
Here is the whole Java code:
try {
    Charset charset = Charset.forName("UTF-8");
    File fileChunker = new File("trainApacheChunker.txt");
    MarkableFileInputStreamFactory i = new MarkableFileInputStreamFactory(fileChunker);
    ObjectStream lineStream = new PlainTextByLineStream(i, charset);
    ObjectStream<ChunkSample> sampleStream = new ChunkSampleStream(lineStream);
    chunkerModel = ChunkerME.train("en", sampleStream,
            TrainingParameters.defaultParams(), new ChunkerFactory());
    modelApacheChunkerPath = "chunkerModel.bin";
    OutputStream modelOut = new BufferedOutputStream(
            new FileOutputStream(modelApacheChunkerPath));
    chunkerModel.serialize(modelOut);
} catch (FileNotFoundException fe) {
} catch (IOException ie) {
}
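A note on the raw ObjectStream declaration in the snippet above: because lineStream has no type argument, the compiler accepts it wherever any ObjectStream<...> is expected and only emits an unchecked warning. That is consistent with how the "java.lang.String cannot be cast to opennlp.tools.parser.Parse" error from the subject of this thread could compile and then fail at run time. The sketch below reproduces the effect in plain Java. Source, LineSource, Tree and TreeReader are hypothetical stand-ins chosen for illustration, not OpenNLP classes.

```java
// Hypothetical stand-ins, not OpenNLP classes.
interface Source<T> {
    T read();
}

// Produces plain text lines, like a line-by-line reader.
class LineSource implements Source<String> {
    public String read() {
        return "He PRP B-NP";
    }
}

class Tree {
}

// Declared for Source<Tree>, like a consumer that expects parse trees.
class TreeReader {
    private final Source<Tree> in;

    TreeReader(Source<Tree> in) {
        this.in = in;
    }

    Tree next() {
        return in.read(); // the compiler inserts a cast to Tree here
    }
}

class RawTypeDemo {

    // Returns true if reading through the raw-typed source
    // fails with a ClassCastException.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean rawSourceFails() {
        Source raw = new LineSource();           // raw type: element type is lost
        TreeReader reader = new TreeReader(raw); // compiles, unchecked warning only
        try {
            reader.next();
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("ClassCastException thrown: " + rawSourceFails());
    }
}
```

If raw is declared as Source<String> instead, the call to the TreeReader constructor no longer compiles, so the mismatch is caught before the program runs. Declaring the line stream with its element type should give the same early warning in the training code.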
Would you please check this point for me?
Thank you so much for your help.
Best regards,
Trung Tran.
On 05/18/2016 04:56 AM, ttrung@nlke-group.net wrote:
> Dear Apache OpenNLP Project Team,
>
> Thank you so much for giving me very useful information about class (
> /opennlp-tools/src/main/java/opennlp/tools/chunker/ChunkSampleStream.java
> )
>
> It works very well.
>
> There is one more point: I have error when train Vietnamese sentences
> (more than 2 sentences in one training file).
>
> Here is 2 example sentences in file trainChunker.txt:
>
> buo^?i _T_C B-ADVP
> tru+a _T_C I-ADVP
> , , O
> cu+`u A_C B-NP
> cha.y IT_M B-VP
> theo IT_M I-VP
> me. H_C I-VP
> ra IT_M B-PP
> bo+` S_C I-PP
> suo^'i S_C I-PP
> . . O
>
> n C_N_T B-NP
> tha^'y S_P B-VP
> ba^`y A_G B-NP
> hu+o+u A_C I-NP
> nai A_C I-NP
> ? ST_P_S B-CONJP
> o+? IT_P_C B-PP
> ?a^'y C_N_T I-PP
> ro^`i T_G I-PP
> . . O
>
> Here is the error right after train the first sentence:
>
> Skipping corrupt line: buo^?i _T_C B-ADVP
> Skipping corrupt line: tru+a _T_C I-ADVP
> Skipping corrupt line: , , O
> Skipping corrupt line: cu+`u A_C B-NP
> Skipping corrupt line: cha.y IT_M B-VP
> Skipping corrupt line: theo IT_M I-VP
> Skipping corrupt line: me. H_C I-VP
> Skipping corrupt line: ra IT_M B-PP
> Skipping corrupt line: bo+` S_C I-PP
> Skipping corrupt line: suo^'i S_C I-PP
> Skipping corrupt line: . . O
> Exception in thread "AWT-EventQueue-0"
> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:653)
> at java.util.ArrayList.get(ArrayList.java:429)
> at
> opennlp.tools.ml.model.AbstractDataIndexer.sortAndMerge(AbstractDataIndexer.java:89)
> at
> opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:105)
> at
> opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
> at
> opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
> at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:217)
> at
> form.UtilitiesForm.handleTrainingTreebankWithApacheChunker(UtilitiesForm.java:2939)
> at
> form.UtilitiesForm.jmnitLoadTreebankActionPerformed(UtilitiesForm.java:1136)
> at form.UtilitiesForm.access$1400(UtilitiesForm.java:108)
> at form.UtilitiesForm$17.actionPerformed(UtilitiesForm.java:901)
> at
> javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022)
> at
> javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2348)
> at
> javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
> at
> javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
> at javax.swing.AbstractButton.doClick(AbstractButton.java:376)
> at
> javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:833)
> at
> javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:877)
> Sorting and merging events... at
> java.awt.Component.processMouseEvent(Component.java:6535)
> at javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
> at java.awt.Component.processEvent(Component.java:6300)
> at java.awt.Container.processEvent(Container.java:2236)
> at java.awt.Component.dispatchEventImpl(Component.java:4891)
> at java.awt.Container.dispatchEventImpl(Container.java:2294)
> at java.awt.Component.dispatchEvent(Component.java:4713)
> at
> java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4888)
> at
> java.awt.LightweightDispatcher.processMouseEvent(Container.java:4525)
> at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4466)
> at java.awt.Container.dispatchEventImpl(Container.java:2280)
> at java.awt.Window.dispatchEventImpl(Window.java:2750)
> at java.awt.Component.dispatchEvent(Component.java:4713)
> at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758)
> at java.awt.EventQueue.access$500(EventQueue.java:97)
> at java.awt.EventQueue$3.run(EventQueue.java:709)
> at java.awt.EventQueue$3.run(EventQueue.java:703)
> at java.security.AccessController.doPrivileged(Native Method)
> at
> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
> at
> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86)
> at java.awt.EventQueue$4.run(EventQueue.java:731)
> at java.awt.EventQueue$4.run(EventQueue.java:729)
> at java.security.AccessController.doPrivileged(Native Method)
> at
> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
> at java.awt.EventQueue.dispatchEvent(EventQueue.java:728)
> at
> java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
> at
> java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
> at
> java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
> at
> java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
> at
> java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
> at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
>
> Would you please check these points for me?
>
> Thank you so much for your help.
>
> Best regards,
>
> Trung Tran.
>
Re: Stuck with class ChunkerME: java.lang.String cannot be cast to opennlp.tools.parser.Parse
Posted by "ttrung@nlke-group.net" <tt...@nlke-group.net>.
Dear Apache OpenNLP Project Team,
Thank you so much for the very useful information about the class
/opennlp-tools/src/main/java/opennlp/tools/chunker/ChunkSampleStream.java
It works very well.
There is one more point: I get an error when I train on Vietnamese sentences
(more than 2 sentences in one training file).
Here are 2 example sentences from the file trainChunker.txt:
buo^?i _T_C B-ADVP
tru+a _T_C I-ADVP
, , O
cu+`u A_C B-NP
cha.y IT_M B-VP
theo IT_M I-VP
me. H_C I-VP
ra IT_M B-PP
bo+` S_C I-PP
suo^'i S_C I-PP
. . O

n C_N_T B-NP
tha^'y S_P B-VP
ba^`y A_G B-NP
hu+o+u A_C I-NP
nai A_C I-NP
? ST_P_S B-CONJP
o+? IT_P_C B-PP
?a^'y C_N_T I-PP
ro^`i T_G I-PP
. . O
Here is the error, right after training on the first sentence:
Skipping corrupt line: buo^?i _T_C B-ADVP
Skipping corrupt line: tru+a _T_C I-ADVP
Skipping corrupt line: , , O
Skipping corrupt line: cu+`u A_C B-NP
Skipping corrupt line: cha.y IT_M B-VP
Skipping corrupt line: theo IT_M I-VP
Skipping corrupt line: me. H_C I-VP
Skipping corrupt line: ra IT_M B-PP
Skipping corrupt line: bo+` S_C I-PP
Skipping corrupt line: suo^'i S_C I-PP
Skipping corrupt line: . . O
Exception in thread "AWT-EventQueue-0"
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at
opennlp.tools.ml.model.AbstractDataIndexer.sortAndMerge(AbstractDataIndexer.java:89)
at
opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:105)
at
opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
at
opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:217)
at
form.UtilitiesForm.handleTrainingTreebankWithApacheChunker(UtilitiesForm.java:2939)
at
form.UtilitiesForm.jmnitLoadTreebankActionPerformed(UtilitiesForm.java:1136)
at form.UtilitiesForm.access$1400(UtilitiesForm.java:108)
at form.UtilitiesForm$17.actionPerformed(UtilitiesForm.java:901)
at
javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022)
at
javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2348)
at
javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
at
javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
at javax.swing.AbstractButton.doClick(AbstractButton.java:376)
at
javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:833)
at
javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:877)
Sorting and merging events... at
java.awt.Component.processMouseEvent(Component.java:6535)
at javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
at java.awt.Component.processEvent(Component.java:6300)
at java.awt.Container.processEvent(Container.java:2236)
at java.awt.Component.dispatchEventImpl(Component.java:4891)
at java.awt.Container.dispatchEventImpl(Container.java:2294)
at java.awt.Component.dispatchEvent(Component.java:4713)
at
java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4888)
at
java.awt.LightweightDispatcher.processMouseEvent(Container.java:4525)
at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4466)
at java.awt.Container.dispatchEventImpl(Container.java:2280)
at java.awt.Window.dispatchEventImpl(Window.java:2750)
at java.awt.Component.dispatchEvent(Component.java:4713)
at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758)
at java.awt.EventQueue.access$500(EventQueue.java:97)
at java.awt.EventQueue$3.run(EventQueue.java:709)
at java.awt.EventQueue$3.run(EventQueue.java:703)
at java.security.AccessController.doPrivileged(Native Method)
at
java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
at
java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86)
at java.awt.EventQueue$4.run(EventQueue.java:731)
at java.awt.EventQueue$4.run(EventQueue.java:729)
at java.security.AccessController.doPrivileged(Native Method)
at
java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
at java.awt.EventQueue.dispatchEvent(EventQueue.java:728)
at
java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
at
java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
at
java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
at
java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
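A note on the trace above: it ends in an empty list (Index: 0, Size: 0), which would fit a training run that received no events at all because every line of the first sentence was reported as corrupt. As far as I understand the chunker training format, each token line carries three fields (word, POS tag, chunk tag) separated by one single space, and there is an empty line after each sentence. Here is a small sketch under those assumptions that reports how many sentences and usable token lines a file would contribute. SentenceCounter is a hypothetical helper, not an OpenNLP class, and it only approximates what the real reader does.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of OpenNLP.
class SentenceCounter {

    // Returns {sentences, usableTokenLines, skippedLines} for the given lines.
    // A line counts as usable when a single-space split yields three fields;
    // an empty line closes the current sentence.
    static int[] count(List<String> lines) {
        int sentences = 0;
        int usable = 0;
        int skipped = 0;
        int tokensInCurrent = 0;
        for (String line : lines) {
            if (line.isEmpty()) {
                if (tokensInCurrent > 0) {
                    sentences++;
                }
                tokensInCurrent = 0;
            } else if (line.split(" ").length == 3) {
                usable++;
                tokensInCurrent++;
            } else {
                skipped++;
            }
        }
        if (tokensInCurrent > 0) {
            sentences++; // last sentence without a trailing empty line
        }
        return new int[] {sentences, usable, skipped};
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "He PRP B-NP", "reckons VBZ B-VP", ". . O",
                "",
                "n C_N_T B-NP", "tha^'y\tS_P\tB-VP", ". . O");
        int[] r = count(lines);
        System.out.println(r[0] + " sentences, " + r[1]
                + " usable lines, " + r[2] + " skipped lines");
    }
}
```

If the usable count comes out as zero for the real file, the trainer has nothing to index, whatever the number of sentences in the file.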
Would you please check these points for me?
Thank you so much for your help.
Best regards,
Trung Tran.
On 05/17/2016 08:15 PM, ttrung@nlke-group.net wrote:
> Dear Apache OpenNLP Project Team,
>
> I have another error with command line tool:
>
> - I did exactly as information in site
> (https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training.tool):
>
> E:\test\apache-opennlp-1.5.3\bin>opennlp.bat ChunkerTrainerME -model
> E:\test\en-chunker.bin -lang en -data E:\test\tmp.txt -encoding UTF-8
>
> File test only contains sample sentence as in the site :
>
> He PRP B-NP
> reckons VBZ B-VP
> the DT B-NP
> current JJ I-NP
> account NN I-NP
> deficit NN I-NP
> will MD B-VP
> narrow VB I-VP
> to TO B-PP
> only RB B-NP
> # # I-NP
> 1.8 CD I-NP
> billion CD I-NP
> in IN B-PP
> September NNP B-NP
> . . O
> And here is the error:
>
> Computing event counts... done. 0 events
> Indexing... done.
> Sorting and merging events... Done indexing.
> Incorporating indexed data for training...
> Exception in thread "main" java.lang.NullPointerException
> at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
> at opennlp.maxent.GIS.trainModel(GIS.java:256)
> at opennlp.model.TrainUtil.train(TrainUtil.java:184)
> at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:214)
> at
> opennlp.tools.cmdline.chunker.ChunkerTrainerTool.run(ChunkerTrainerTo
> ol.java:68)
> at opennlp.tools.cmdline.CLI.main(CLI.java:222)
>
>
> Another point: The function cannot read more than 2 sentence in one
> train file.
>
> Would you please check these points for me?
>
> Thank you so much for your help.
>
> Best regards,
>
> Trung Tran.
>
Re: Stuck with class ChunkerME: java.lang.String cannot be cast to opennlp.tools.parser.Parse
Posted by Rodrigo Agerri <ra...@apache.org>.
Hi,
I cannot reproduce this error. If I take the training data from the
CoNLL 2000 website as is,
http://www.cnts.ua.ac.be/conll2000/chunking/
it trains perfectly well with the default training parameters and obtains
92.40 F1 on the test set, which is also distributed on the CoNLL 2000 site.
Best,
R
On Tue, May 17, 2016 at 3:15 PM, ttrung@nlke-group.net
<tt...@nlke-group.net> wrote:
> Dear Apache OpenNLP Project Team,
>
> I have another error with command line tool:
>
> - I did exactly as information in site
> (https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training.tool):
>
> E:\test\apache-opennlp-1.5.3\bin>opennlp.bat ChunkerTrainerME -model
> E:\test\en-chunker.bin -lang en -data E:\test\tmp.txt -encoding UTF-8
>
> File test only contains sample sentence as in the site :
>
> He PRP B-NP
> reckons VBZ B-VP
> the DT B-NP
> current JJ I-NP
> account NN I-NP
> deficit NN I-NP
> will MD B-VP
> narrow VB I-VP
> to TO B-PP
> only RB B-NP
> # # I-NP
> 1.8 CD I-NP
> billion CD I-NP
> in IN B-PP
> September NNP B-NP
> . . O
>
> And here is the error:
>
> Computing event counts... done. 0 events
> Indexing... done.
> Sorting and merging events... Done indexing.
> Incorporating indexed data for training...
> Exception in thread "main" java.lang.NullPointerException
> at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
> at opennlp.maxent.GIS.trainModel(GIS.java:256)
> at opennlp.model.TrainUtil.train(TrainUtil.java:184)
> at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:214)
> at
> opennlp.tools.cmdline.chunker.ChunkerTrainerTool.run(ChunkerTrainerTo
> ol.java:68)
> at opennlp.tools.cmdline.CLI.main(CLI.java:222)
>
>
> Another point: The function cannot read more than 2 sentence in one train
> file.
>
> Would you please check these points for me?
>
> Thank you so much for your help.
>
> Best regards,
>
> Trung Tran.
>
Re: Stuck with class ChunkerME: java.lang.String cannot be cast to opennlp.tools.parser.Parse
Posted by "ttrung@nlke-group.net" <tt...@nlke-group.net>.
Dear Apache OpenNLP Project Team,
I have another error with the command line tool:
- I followed the instructions on the site exactly
(https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training.tool):
E:\test\apache-opennlp-1.5.3\bin>opennlp.bat ChunkerTrainerME -model
E:\test\en-chunker.bin -lang en -data E:\test\tmp.txt -encoding UTF-8
The test file contains only the sample sentence from the site:
He PRP B-NP
reckons VBZ B-VP
the DT B-NP
current JJ I-NP
account NN I-NP
deficit NN I-NP
will MD B-VP
narrow VB I-VP
to TO B-PP
only RB B-NP
# # I-NP
1.8 CD I-NP
billion CD I-NP
in IN B-PP
September NNP B-NP
. . O
And here is the error:
Computing event counts... done. 0 events
Indexing... done.
Sorting and merging events... Done indexing.
Incorporating indexed data for training...
Exception in thread "main" java.lang.NullPointerException
at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
at opennlp.maxent.GIS.trainModel(GIS.java:256)
at opennlp.model.TrainUtil.train(TrainUtil.java:184)
at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:214)
at opennlp.tools.cmdline.chunker.ChunkerTrainerTool.run(ChunkerTrainerTool.java:68)
at opennlp.tools.cmdline.CLI.main(CLI.java:222)
Another point: the trainer cannot read more than two sentences from one
training file.
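One thing that may be worth ruling out for both symptoms above is the layout of the training file itself. The sketch below is a stand-alone check, not OpenNLP code: the class name ChunkTrainingFileCheck and its methods are hypothetical, and it assumes (going by the sample in the manual) that every token line has exactly three fields separated by single spaces and that sentences are separated by one empty line. It reports how many sentences it can find and lists lines that break these assumptions, such as tab-separated fields, a leading byte order mark, or an extra empty line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

/** Hypothetical helper, not part of OpenNLP: sanity-checks chunker training data. */
public class ChunkTrainingFileCheck {

    /** Outcome of a scan: number of sentences found plus readable problem notes. */
    public static final class Report {
        public final int sentences;
        public final List<String> problems;

        Report(int sentences, List<String> problems) {
            this.sentences = sentences;
            this.problems = problems;
        }
    }

    /**
     * Scans training text under the ASSUMED format: "word POS chunkTag" per line,
     * single spaces between fields, one empty line between sentences.
     */
    public static Report check(String text) {
        List<String> problems = new ArrayList<>();
        int sentences = 0;
        int tokensInSentence = 0;
        int lineNo = 0;
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();
                lineNo++;
                // A UTF-8 byte order mark survives decoding as U+FEFF on line 1.
                if (lineNo == 1 && line.startsWith("\uFEFF")) {
                    problems.add("line 1: starts with a byte order mark");
                    line = line.substring(1);
                }
                if (line.isEmpty()) {
                    if (tokensInSentence > 0) {
                        sentences++;          // an empty line closes the current sentence
                        tokensInSentence = 0;
                    } else {
                        problems.add("line " + lineNo
                                + ": empty line, but no valid tokens were collected before it");
                    }
                } else if (line.split(" ").length != 3) {
                    problems.add("line " + lineNo
                            + ": expected 3 space-separated fields: " + line);
                } else {
                    tokensInSentence++;
                }
            }
        }
        if (tokensInSentence > 0) {
            sentences++;                      // last sentence without a trailing empty line
        }
        return new Report(sentences, problems);
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the training file, read as UTF-8.
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        Report report = check(text);
        System.out.println("Sentences found: " + report.sentences);
        for (String problem : report.problems) {
            System.out.println(problem);
        }
    }
}
```

It can be run with the path to the training file as its only argument, for example E:\test\tmp.txt. Whether any line it flags actually explains the "0 events" output or the sentence limit is not confirmed here; it only narrows down one possible cause.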
Would you please check these points for me?
Thank you so much for your help.
Best regards,
Trung Tran.