Posted to dev@opennlp.apache.org by "william.colen@gmail.com" <wi...@gmail.com> on 2012/01/16 00:54:44 UTC

Error in POS Tagger CrossValidator

Hi,

I am getting an error in the POS Tagger CrossValidator tool from trunk.
The same command works with a released version, and the Chunker CV tool
works too.
I tried debugging the code and checking the SVN history for clues,
but could not find anything.

$ bin/opennlp POSTaggerCrossValidator -lang pt -encoding MacRoman -data pos1.txt -cutoff 50

IO error while reading training data or indexing data: Stream not marked

Stack trace:
java.io.IOException: Stream not marked
	at java.io.BufferedReader.reset(BufferedReader.java:485)
	at opennlp.tools.util.PlainTextByLineStream.reset(PlainTextByLineStream.java:79)
	at opennlp.tools.util.FilterObjectStream.reset(FilterObjectStream.java:43)
	at opennlp.tools.util.eval.CrossValidationPartitioner.next(CrossValidationPartitioner.java:256)
	at opennlp.tools.postag.POSTaggerCrossValidator.evaluate(POSTaggerCrossValidator.java:113)
	at opennlp.tools.cmdline.postag.POSTaggerCrossValidatorTool.run(POSTaggerCrossValidatorTool.java:72)
	at opennlp.tools.cmdline.CLI.main(CLI.java:212)


Any idea what is wrong?

Thanks,
William

Re: Error in POS Tagger CrossValidator

Posted by "william.colen@gmail.com" <wi...@gmail.com>.
Hi, Aliaksandr,

Yes, it is clear to me now, and I could even learn from it.
The hierarchy is complex, and a naming convention is really important here.
The diagram helps and we can use it for reference in the future.

Thanks,
William

On Sun, Jan 22, 2012 at 6:41 PM, Aliaksandr Autayeu
<al...@autayeu.com> wrote:

> Hi all,
>
> I have just committed the improvement for OPENNLP-402, which should make it
> much easier to understand the class structure of the cmdline package.
> Naming, javadocs, and the class hierarchy is improved.
>
> Can you take a look at the comment:
>
> https://issues.apache.org/jira/browse/OPENNLP-402?focusedCommentId=13190767#comment-13190767
> and
> the diagram:
>
> https://issues.apache.org/jira/secure/attachment/12511441/opennlp-cmdline-package-class-structure.png
> and
> let me know whether it is clearer now? Thank you!
>
> William, I would appreciate if you give it a look, since you asked above.
>
> Aliaksandr

Re: Error in POS Tagger CrossValidator

Posted by Aliaksandr Autayeu <al...@autayeu.com>.
Hi all,

I have just committed the improvement for OPENNLP-402, which should make it
much easier to understand the class structure of the cmdline package.
Naming, javadocs, and the class hierarchy have been improved.

Can you take a look at the comment:
https://issues.apache.org/jira/browse/OPENNLP-402?focusedCommentId=13190767#comment-13190767
and
the diagram:
https://issues.apache.org/jira/secure/attachment/12511441/opennlp-cmdline-package-class-structure.png
and
let me know whether it is clearer now? Thank you!

William, I would appreciate it if you took a look, since you asked above.

Aliaksandr

On Sun, Jan 22, 2012 at 7:24 PM, william.colen@gmail.com <
william.colen@gmail.com> wrote:

> Thank you, Aliaksandr,
>
> I really like how we can use stream factories during training and
> evaluating tools. The changes you did are great.
> I don't know if we need to improve the documentation. The code looks
> good. Now I understand it better with your commit to fix this issue.
>
> Thanks
> William

Re: Error in POS Tagger CrossValidator

Posted by "william.colen@gmail.com" <wi...@gmail.com>.
Thank you, Aliaksandr,

I really like how we can use stream factories in the training and
evaluation tools. The changes you made are great.
I don't know if we need to improve the documentation. The code looks
good. I understand it better now, thanks to your commit fixing this issue.

Thanks
William

On Sun, Jan 22, 2012 at 8:56 AM, Aliaksandr Autayeu
<al...@autayeu.com> wrote:
> Hi William!
>
> Thank you for pointing this out. I have fixed it. Can you check and close
> the issue?
>
> How can I improve the documentation (or the code itself) so that the new
> code is easier to understand? Any advice?
>
> Aliaksandr

Re: Error in POS Tagger CrossValidator

Posted by Aliaksandr Autayeu <al...@autayeu.com>.
Hi William!

Thank you for pointing this out. I have fixed it. Can you check and close
the issue?

How can I improve the documentation (or the code itself) so that the new
code is easier to understand? Any advice?

Aliaksandr

On Thu, Jan 19, 2012 at 2:57 PM, william.colen@gmail.com <
william.colen@gmail.com> wrote:

> Aliaksandr,
>
> Could you also check the issue
> https://issues.apache.org/jira/browse/OPENNLP-418 ? I tried to fix it
> by myself but I could not find an appropriate solution. I still have
> to better understand the new code.
>
> Thank you,
> William

Re: Error in POS Tagger CrossValidator

Posted by "william.colen@gmail.com" <wi...@gmail.com>.
Aliaksandr,

Could you also check the issue
https://issues.apache.org/jira/browse/OPENNLP-418 ? I tried to fix it
myself but could not find an appropriate solution. I still need
to understand the new code better.

Thank you,
William


On Wed, Jan 18, 2012 at 7:34 AM, Jörn Kottmann <ko...@gmail.com> wrote:
> On 1/18/12 1:05 AM, James Kosin wrote:
>>
>> I put the TODO there; because I couldn't determine if it was a better
>> place.  The only big downside to using the Stream is we have no control
>> over the encoding.  So, I was thinking more that this method of loading
>> the item would be deprecated anyway.  In favor of the other method.
>
>
> Decoding with the correct encoding is the responsibility of the
> Reader. There is no need to pass an encoding along in this method, right?
>
> Jörn

Re: Error in POS Tagger CrossValidator

Posted by Jörn Kottmann <ko...@gmail.com>.
On 1/18/12 1:05 AM, James Kosin wrote:
> I put the TODO there; because I couldn't determine if it was a better
> place.  The only big downside to using the Stream is we have no control
> over the encoding.  So, I was thinking more that this method of loading
> the item would be deprecated anyway.  In favor of the other method.

Decoding with the correct encoding is the responsibility of the
Reader. There is no need to pass an encoding along in this method, right?
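
To illustrate the point, here is a minimal sketch that assumes only the JDK. It is not OpenNLP code: ReaderDecodingDemo, firstLine and decodeFirstLine are hypothetical names, and the sample bytes are invented. The caller picks the charset when it builds the Reader, so the line-reading method needs no encoding parameter.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReaderDecodingDemo {

  // Hypothetical consumer: it only sees characters, never bytes or an encoding.
  static String firstLine(Reader reader) throws IOException {
    return new BufferedReader(reader).readLine();
  }

  // The caller decides how the bytes are decoded by constructing the Reader.
  static String decodeFirstLine(byte[] data, Charset charset) throws IOException {
    InputStream bytes = new ByteArrayInputStream(data);
    return firstLine(new InputStreamReader(bytes, charset));
  }

  public static void main(String[] args) throws IOException {
    // "Jo\u00e3o_N" contains a non-ASCII character; encode it as ISO-8859-1.
    byte[] latin1 = "Jo\u00e3o_N".getBytes(StandardCharsets.ISO_8859_1);

    // Decoded with the matching charset, the text round-trips.
    System.out.println(decodeFirstLine(latin1, StandardCharsets.ISO_8859_1));

    // Decoded with the wrong charset, the non-ASCII character is damaged.
    System.out.println(decodeFirstLine(latin1, StandardCharsets.UTF_8));
  }
}
```

The same firstLine method serves both calls; only the Reader handed to it differs.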

Jörn

Re: Error in POS Tagger CrossValidator

Posted by Aliaksandr Autayeu <al...@autayeu.com>.
Ah... OK.

Aliaksandr

On Wed, Jan 18, 2012 at 1:05 AM, James Kosin <ja...@gmail.com> wrote:

> Aliaksandr,
>
> I put the TODO there; because I couldn't determine if it was a better
> place.  The only big downside to using the Stream is we have no control
> over the encoding.  So, I was thinking more that this method of loading
> the item would be deprecated anyway.  In favor of the other method.
>
> James

Re: Error in POS Tagger CrossValidator

Posted by James Kosin <ja...@gmail.com>.
Aliaksandr,

I put the TODO there because I couldn't determine whether it was a better
place.  The only big downside to using the Stream is that we have no control
over the encoding, so I was thinking that this method of loading
the item would be deprecated anyway in favor of the other method.

James

On 1/17/2012 5:50 AM, Aliaksandr Autayeu wrote:
> Guys, if somebody knows that part of the code well, it would be nice to
> take a look at:
>
> 1) TODO left there
> 2) .reset() raising the above exception if the PlainTextByLineStream is
> created with a stream.
>
> Aliaksandr

Re: Error in POS Tagger CrossValidator

Posted by Aliaksandr Autayeu <al...@autayeu.com>.
Guys, if somebody knows that part of the code well, it would be nice to
take a look at:

1) the TODO left there
2) .reset() raising the exception above when the PlainTextByLineStream is
created with a stream.

Aliaksandr

On Tue, Jan 17, 2012 at 12:12 AM, william.colen@gmail.com <
william.colen@gmail.com> wrote:

> Thank you, Aliaksandr!
>
>
>
> On Mon, Jan 16, 2012 at 6:13 PM, Aliaksandr Autayeu
> <al...@autayeu.com> wrote:
> > I have reproduced the problem. It boils down to different initialization
> > of PlainTextByLineStream. If it is instantiated by
> >
> >   public PlainTextByLineStream(Reader in) {
> >     this.in = new BufferedReader(in);
> >     this.channel = null;
> >     this.encoding = null;
> >   }
> >
> > it does not work. If it is instantiated with a channel:
> >
> >   public PlainTextByLineStream(FileChannel channel, String charsetName) {
> >     this.encoding = charsetName;
> >     this.channel = channel;
> >
> >     // TODO: Why isn't reset called here ?
> >     in = new BufferedReader(Channels.newReader(channel, encoding));
> >   }
> >
> > it does work, because later on in reset:
> >
> >     if (channel == null) {
> >         in.reset();
> >     }
> >     else {
> >       channel.position(0);
> >       in = new BufferedReader(Channels.newReader(channel, encoding));
> >     }
> >
> > reader is recreated instead of direct in.reset() call.
> >
> >
> > Now, these differences come into play because WordTagSampleStreamFactory
> has
> > different PlainTextByLineStream initialization, which is probably my
> fault
> > due to work on factories in 402. Looks like a copy-paste error.
> >
> > I have tried to commit a fix, but I'm getting 403 error :(  Please, apply
> > the attached patch.
> >
> > Aliaksandr

Re: Error in POS Tagger CrossValidator

Posted by "william.colen@gmail.com" <wi...@gmail.com>.
Thank you, Aliaksandr!



On Mon, Jan 16, 2012 at 6:13 PM, Aliaksandr Autayeu
<al...@autayeu.com> wrote:
> I have reproduced the problem. It boils down to different initialization
> of PlainTextByLineStream. If it is instantiated by
>
>   public PlainTextByLineStream(Reader in) {
>     this.in = new BufferedReader(in);
>     this.channel = null;
>     this.encoding = null;
>   }
>
> it does not work. If it is instantiated with a channel:
>
>   public PlainTextByLineStream(FileChannel channel, String charsetName) {
>     this.encoding = charsetName;
>     this.channel = channel;
>
>     // TODO: Why isn't reset called here ?
>     in = new BufferedReader(Channels.newReader(channel, encoding));
>   }
>
> it does work, because later on in reset:
>
>     if (channel == null) {
>         in.reset();
>     }
>     else {
>       channel.position(0);
>       in = new BufferedReader(Channels.newReader(channel, encoding));
>     }
>
> reader is recreated instead of direct in.reset() call.
>
>
> Now, these differences come into play because WordTagSampleStreamFactory has
> different PlainTextByLineStream initialization, which is probably my fault
> due to work on factories in 402. Looks like a copy-paste error.
>
> I have tried to commit a fix, but I'm getting 403 error :(  Please, apply
> the attached patch.
>
> Aliaksandr

Re: Error in POS Tagger CrossValidator

Posted by Aliaksandr Autayeu <al...@autayeu.com>.
Yes.

Aliaksandr

On Mon, Jan 16, 2012 at 9:23 PM, Jörn Kottmann <ko...@gmail.com> wrote:

> On 1/16/12 9:13 PM, Aliaksandr Autayeu wrote:
>
>> I have tried to commit a fix, but I'm getting 403 error :(  Please, apply
>> the attached patch.
>>
>
> We need to investigate why you cannot commit.
> Did you checkout opennlp via the https URL?
>
> Jörn
>

Re: Error in POS Tagger CrossValidator

Posted by Jörn Kottmann <ko...@gmail.com>.
On 1/16/12 9:13 PM, Aliaksandr Autayeu wrote:
> I have tried to commit a fix, but I'm getting 403 error :(  Please, 
> apply the attached patch.

We need to investigate why you cannot commit.
Did you check out opennlp via the https URL?

Jörn

Re: Error in POS Tagger CrossValidator

Posted by Aliaksandr Autayeu <al...@autayeu.com>.
I have reproduced the problem. It boils down to how PlainTextByLineStream
is initialized. If it is instantiated with a Reader:

  public PlainTextByLineStream(Reader in) {
    this.in = new BufferedReader(in);
    this.channel = null;
    this.encoding = null;
  }

it does not work, because reset() ends up calling in.reset() on a
BufferedReader that was never marked. If it is instantiated with a channel:

  public PlainTextByLineStream(FileChannel channel, String charsetName) {
    this.encoding = charsetName;
    this.channel = channel;

    // TODO: Why isn't reset called here ?
    in = new BufferedReader(Channels.newReader(channel, encoding));
  }

it does work, because later on in reset():

    if (channel == null) {
      in.reset();
    }
    else {
      channel.position(0);
      in = new BufferedReader(Channels.newReader(channel, encoding));
    }

the reader is recreated instead of calling in.reset() directly.
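
For reference, the failure can be reproduced outside OpenNLP with a minimal sketch that assumes only the JDK. ResetDemo and its method names are made up for illustration, and the sample line is invented. The first method calls reset() on a BufferedReader that was never marked, which is the situation in the stack trace; the second calls mark() first, so reset() can rewind.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class ResetDemo {

  // Calls reset() without a prior mark() and returns the exception message.
  static String resetWithoutMark() throws IOException {
    BufferedReader in = new BufferedReader(new StringReader("a_DT dog_NN\n"));
    in.readLine();
    try {
      in.reset(); // no mark() was ever set on this reader
      return "no exception";
    } catch (IOException e) {
      return e.getMessage();
    }
  }

  // Marks the start, reads a line, rewinds, and reads the same line again.
  static String resetWithMark() throws IOException {
    BufferedReader in = new BufferedReader(new StringReader("a_DT dog_NN\n"));
    in.mark(1024); // read-ahead limit comfortably larger than the sample data
    in.readLine();
    in.reset();
    return in.readLine();
  }

  public static void main(String[] args) throws IOException {
    System.out.println(resetWithoutMark());
    System.out.println(resetWithMark());
  }
}
```

Note that mark() only helps while the amount read stays within the read-ahead limit, so it is not an obvious fit for rewinding a whole training file.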


These differences matter because WordTagSampleStreamFactory initializes
PlainTextByLineStream differently, which is probably my fault from the
work on factories in OPENNLP-402. It looks like a copy-paste error.
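
The channel-based rewind that does work can be sketched in isolation as follows. This is not the OpenNLP code, only the same idea using JDK classes: ChannelRewindDemo, readFirstLineTwice and demo are hypothetical names, and the temporary file and its content are invented for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ChannelRewindDemo {

  // Reads the first line, rewinds the channel, and reads the first line again.
  static String[] readFirstLineTwice(Path file, String charsetName) throws IOException {
    try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
      BufferedReader in = new BufferedReader(Channels.newReader(channel, charsetName));
      String first = in.readLine();

      // Rewind: reposition the channel and replace the reader, since the old
      // reader may still hold buffered data from the previous pass.
      channel.position(0);
      in = new BufferedReader(Channels.newReader(channel, charsetName));
      String again = in.readLine();

      return new String[] {first, again};
    }
  }

  // Writes a small sample file, runs the rewind, and cleans up.
  static String[] demo() throws IOException {
    Path tmp = Files.createTempFile("pos-sample", ".txt");
    try {
      Files.write(tmp, "O_ART gato_N\ndorme_V\n".getBytes(StandardCharsets.UTF_8));
      return readFirstLineTwice(tmp, "UTF-8");
    } finally {
      Files.deleteIfExists(tmp);
    }
  }

  public static void main(String[] args) throws IOException {
    String[] lines = demo();
    System.out.println(lines[0]);
    System.out.println(lines[1]);
  }
}
```

No mark() is needed here, because the rewind happens on the channel rather than on the reader.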

I tried to commit a fix, but I'm getting a 403 error :(  Please apply
the attached patch.

Aliaksandr

On Mon, Jan 16, 2012 at 12:54 AM, william.colen@gmail.com <
william.colen@gmail.com> wrote:

> Hi,
>
> I am having an error in POS Tagger CrossValidator tool from the trunk.
> I tried the same command with a released version and it worked, also I
> tried Chunker CV tool and it is working too.
> I tried debugging the code and check the SVN history for some clue,
> but could not find anything. Any idea what is wrong?
>
> $ bin/opennlp POSTaggerCrossValidator -lang pt -encoding MacRoman
> -data pos1.txt -cutoff 50
>
> IO error while reading training data or indexing data: Stream not marked
>
> Stack trace:
> java.io.IOException: Stream not marked
>        at java.io.BufferedReader.reset(BufferedReader.java:485)
>        at
> opennlp.tools.util.PlainTextByLineStream.reset(PlainTextByLineStream.java:79)
>        at
> opennlp.tools.util.FilterObjectStream.reset(FilterObjectStream.java:43)
>        at
> opennlp.tools.util.eval.CrossValidationPartitioner.next(CrossValidationPartitioner.java:256)
>        at
> opennlp.tools.postag.POSTaggerCrossValidator.evaluate(POSTaggerCrossValidator.java:113)
>        at
> opennlp.tools.cmdline.postag.POSTaggerCrossValidatorTool.run(POSTaggerCrossValidatorTool.java:72)
>        at opennlp.tools.cmdline.CLI.main(CLI.java:212)
>
>
> Any idea what is wrong?
>
> Thanks,
> William
>