Posted to dev@nutch.apache.org by Piotr Kosiorowski <pk...@gmail.com> on 2005/08/19 20:30:01 UTC
Failing JUnit test
Hello,
I have updated my local copy today and JUnit tests started to fail.
expected:<el> but was:<sv>
junit.framework.ComparisonFailure: expected:<el> but was:<sv>
at org.apache.nutch.analysis.lang.TestLanguageIdentifier.testIdentify(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
I suspect it is a result of the latest updates to the LanguageIdentifier
plugin or its tests. I am not deeply familiar with it, so I will not try to
debug it myself at the moment - I just wanted you to know about the issue.
Regards
Piotr
Re: Failing JUnit test
Posted by Piotr Kosiorowski <pk...@gmail.com>.
On 8/22/05, Jérôme Charron <je...@gmail.com> wrote:
> Does anyone know a typical pattern for this?
I would read chars instead of bytes - I think it should solve this problem.
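
A minimal sketch of that idea, assuming the input is UTF-8: wrap the stream in an InputStreamReader and read into a char buffer. The reader keeps decoder state between reads, so a multibyte character that straddles two underlying byte reads is still decoded as one character. The class and method names below are hypothetical and are not taken from the Nutch code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not the actual LanguageIdentifier code.
class CharReadSketch {

    // Reads the whole stream as UTF-8 text, chunkSize chars at a time.
    static String readAll(InputStream in, int chunkSize) throws IOException {
        // The reader decodes bytes to chars and carries any incomplete
        // multibyte sequence over to the next read.
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        StringBuilder out = new StringBuilder();
        char[] buffer = new char[chunkSize];
        int len;
        while ((len = reader.read(buffer)) != -1) {
            out.append(buffer, 0, len);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // 'a' is one byte in UTF-8; U+00E9 and U+03B1 are two bytes each.
        String text = "a\u00e9\u03b1";
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        String decoded = readAll(new ByteArrayInputStream(utf8), 1);
        System.out.println(text.equals(decoded));
    }
}
```

The chunk size here counts chars, not bytes, so it no longer matters where the byte boundaries fall.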
P.
Re: Failing JUnit test
Posted by Jérôme Charron <je...@gmail.com>.
> I found it and committed the fix. It was not always using UTF-8
> encoding.
Thanks Piotr
> But while looking at the code I feel a little bit worried about
> LanguageIdentifier.identify(InputStream is) - it reads bytes from the
> file in chunks and converts each chunk to a String separately. If a
> multibyte UTF-8 character is located at a chunk boundary, it would be
> split in two parts.
> Am I right?
Yes Piotr, you are right. It's a very good analysis.
Who said code review isn't useful? ;-)
Hopefully, this method is not used in nutch internals.
I will provide a correction as soon as possible.
Does anyone know a typical pattern for this?
Thanks again Piotr.
Regards
Jérôme
--
http://motrech.free.fr/
http://www.frutch.org/
Re: Failing JUnit test
Posted by Piotr Kosiorowski <pk...@gmail.com>.
Hello Jérôme,
I found it and committed the fix. It was not always using UTF-8 encoding.
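
As an illustration only (the thread does not show the actual fix), here is a sketch of why a missing explicit encoding can make results differ between machines: a reader created without a charset falls back to the platform default, which is often not the same on Windows and Linux. The class and method names are hypothetical, and ISO-8859-1 merely stands in for a non-UTF-8 platform default.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical example, not the committed Nutch fix.
class ExplicitCharsetSketch {

    // Decodes the first line of the data using the given charset.
    static String firstLine(byte[] data, Charset charset) throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), charset));
        try {
            return reader.readLine();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Two Greek letters, stored as UTF-8 (two bytes each).
        String ngram = "\u03b1\u03b2";
        byte[] stored = ngram.getBytes(StandardCharsets.UTF_8);

        // Explicit UTF-8: the same result on every platform.
        System.out.println(ngram.equals(firstLine(stored, StandardCharsets.UTF_8)));

        // A different charset (standing in for a platform default):
        // each byte becomes its own char and the n-gram no longer matches.
        System.out.println(ngram.equals(firstLine(stored, StandardCharsets.ISO_8859_1)));
    }
}
```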
But while looking at the code I feel a little bit worried about
LanguageIdentifier.identify(InputStream is) - it reads bytes from the
file in chunks and converts each chunk to a String separately. If a
multibyte UTF-8 character is located at a chunk boundary, it would be
split in two parts.
Am I right?
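
To make the concern concrete, here is a small sketch of the pattern described above. It is a hypothetical reconstruction, not the actual LanguageIdentifier code, and the class and method names are made up for illustration. Each chunk of bytes is converted to a String on its own, so a two-byte character cut at the chunk boundary is decoded as replacement characters.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical reconstruction of the risky pattern.
class ChunkedDecodeDemo {

    // Decodes every byte chunk independently, with no state kept
    // between chunks.
    static String decodePerChunk(InputStream in, int chunkSize) throws IOException {
        StringBuilder out = new StringBuilder();
        byte[] buffer = new byte[chunkSize];
        int len;
        while ((len = in.read(buffer)) != -1) {
            // A multibyte sequence cut off at the end of the buffer
            // cannot be decoded here and is replaced with U+FFFD.
            out.append(new String(buffer, 0, len, StandardCharsets.UTF_8));
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // 'a' takes one byte in UTF-8, U+00E9 takes two: three bytes total.
        String text = "a\u00e9";
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        // A chunk size of 2 splits the two-byte character across chunks.
        String decoded = decodePerChunk(new ByteArrayInputStream(utf8), 2);
        System.out.println("intact: " + text.equals(decoded));
    }
}
```

With a chunk size of 3 the same input decodes correctly, which is why this kind of bug tends to show up only for some inputs.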
Regards
Piotr
Jérôme Charron wrote:
>> It works on my Linux box - with both JDK 1.4 and 1.5.
>
> OK, so it seems to be consistent with my configuration.
>
>> I will try to track it down.
>
> I assume it is an encoding problem with the Ngram profile files, but I have no
> time this evening.
> Regards
>
> Jérôme
Re: Failing JUnit test
Posted by Jérôme Charron <je...@gmail.com>.
> It works on my Linux box - with both JDK 1.4 and 1.5.
OK, so it seems to be consistent with my configuration.
> I will try to track it down.
I assume it is an encoding problem with the Ngram profile files, but I have no
time this evening.
Regards
Jérôme
Re: Failing JUnit test
Posted by Piotr Kosiorowski <pk...@gmail.com>.
It works on my Linux box - with both JDK 1.4 and 1.5.
I will try to track it down.
Regards
Piotr
Jérôme Charron wrote:
>> I am using JDK 1.5 on
>> Windows - I can test it on 1.4 and 1.5 on Linux tomorrow - maybe this is the
>> problem.
>
> OK. Thanks
> Jérôme
Re: Failing JUnit test
Posted by Jérôme Charron <je...@gmail.com>.
> I am using JDK 1.5 on
> Windows - I can test it on 1.4 and 1.5 on Linux tomorrow - maybe this is the
> problem.
OK. Thanks
Jérôme
Re: Failing JUnit test
Posted by Piotr Kosiorowski <pk...@gmail.com>.
It looks like it fails again. Can I do anything to help identify
this issue (not today, but during the weekend)? I can try to
debug it myself or run some code prepared by you. I am using JDK 1.5 on
Windows - I can test it on 1.4 and 1.5 on Linux tomorrow - maybe this is the
problem.
Regards,
Piotr
Jérôme Charron wrote:
>> I suspect it is a result of the latest updates to the LanguageIdentifier
>> plugin or its tests.
>
> Piotr, I have just committed a minor change to the language identifier plugin
> unit test.
> Could you please update your local copy and test again?
>
> Jerome
Re: Failing JUnit test
Posted by Jérôme Charron <je...@gmail.com>.
> I suspect it is a result of the latest updates to the LanguageIdentifier
> plugin or its tests.
Piotr, I have just committed a minor change to the language identifier plugin
unit test.
Could you please update your local copy and test again?
Jerome
Re: Failing JUnit test
Posted by Jérôme Charron <je...@gmail.com>.
> expected:<el> but was:<sv>
> junit.framework.ComparisonFailure: expected:<el> but was:<sv>
> I suspect it is a result of the latest updates to the LanguageIdentifier
> plugin or its tests. I am not deeply familiar with it, so I will not try to
> debug it myself at the moment - I just wanted you to know about the issue.
You are right Piotr, it's a language identifier unit test failure.
It's quite strange, since this test passes on my local copy.
(I reduced a "tolerance" parameter in the unit test before committing. I am
changing it right now, so that the unit tests should pass on your local copy
too.)
Regards
Jérôme