Posted to dev@nutch.apache.org by Piotr Kosiorowski <pk...@gmail.com> on 2005/08/19 20:30:01 UTC

Failing JUnit test

Hello,
I have updated my local copy today and JUnit tests started to fail.

expected:<el> but was:<sv>
junit.framework.ComparisonFailure: expected:<el> but was:<sv>
	at org.apache.nutch.analysis.lang.TestLanguageIdentifier.testIdentify(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

I suspect it is a result of the latest updates to the LanguageIdentifier 
plugin or its tests. I am not deep into it, so I will not try to debug it 
myself at the moment - I just wanted you to know about the issue.
Regards
Piotr

Re: Failing JUnit test

Posted by Piotr Kosiorowski <pk...@gmail.com>.
On 8/22/05, Jérôme Charron <je...@gmail.com> wrote:
> Does anyone know a typical pattern for this?
I would read chars instead of bytes - I think it should solve this problem.
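
A minimal sketch of what "reading chars instead of bytes" could look like, assuming the input is UTF-8. The class and method names (CharReadingSketch, readAsChars) are hypothetical and are not the actual LanguageIdentifier code; StandardCharsets needs Java 7 or later, and on older JDKs the charset name "UTF-8" can be passed instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class CharReadingSketch {

    // Hypothetical helper: wraps the byte stream in a Reader so that decoding
    // happens once, across the whole stream, instead of once per chunk.
    static String readAsChars(InputStream in, int chunkSize) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        StringBuilder out = new StringBuilder();
        char[] buffer = new char[chunkSize];
        int len;
        while ((len = reader.read(buffer)) != -1) {
            // The buffer holds already-decoded chars; the Reader keeps any
            // incomplete byte sequence internally until the rest arrives.
            out.append(buffer, 0, len);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // 'a', e-acute, Greek alpha and beta, written as escapes so the
        // example does not depend on the source file encoding.
        String text = "a\u00e9\u03b1\u03b2";
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        String decoded = readAsChars(new ByteArrayInputStream(bytes), 2);
        System.out.println(text.equals(decoded));
    }
}
```

The chunk size here only controls how many chars are pulled from the Reader per call, so it no longer matters where a multibyte sequence falls in the underlying byte stream.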
P.

Re: Failing JUnit test

Posted by Jérôme Charron <je...@gmail.com>.
> I found it and committed the fix. It was not always using UTF-8
> encoding.

Thanks Piotr

> But while looking at the code I feel a little bit worried about
> LanguageIdentifier.identify(InputStream is) - as it reads bytes from
> file in chunks and converts each chunk to a String separately. If a
> multibyte UTF-8 character is located at the chunk boundary, it would be
> split in two parts.
> Am I right?

Yes Piotr, you are right. It's a very good analysis.
Who said code review isn't useful? ;-)
Hopefully, this method is not used in nutch internals.
I will provide a correction as soon as possible.
Does anyone know a typical pattern for this?

Thanks again Piotr.

Regards

Jérôme

-- 
http://motrech.free.fr/
http://www.frutch.org/

Re: Failing JUnit test

Posted by Piotr Kosiorowski <pk...@gmail.com>.
Hello Jérôme,
I found it and committed the fix. It was not always using UTF-8 encoding.
But while looking at the code I feel a little bit worried about
LanguageIdentifier.identify(InputStream is) - as it reads bytes from 
file in chunks and converts each chunk to a String separately. If a 
multibyte UTF-8 character is located at the chunk boundary, it would be 
split in two parts.
Am I right?
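
To illustrate the concern, here is a small self-contained sketch of the pattern described above. It is a hypothetical stand-in (ChunkBoundaryDemo and decodeChunkByChunk are made-up names), not the real LanguageIdentifier.identify code, and the chunk size is chosen deliberately so that the boundary falls inside a two-byte character.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ChunkBoundaryDemo {

    // Hypothetical reproduction of the pattern: read bytes in fixed-size
    // chunks and decode every chunk on its own.
    static String decodeChunkByChunk(InputStream in, int chunkSize) throws IOException {
        StringBuilder out = new StringBuilder();
        byte[] buffer = new byte[chunkSize];
        int len;
        while ((len = in.read(buffer)) != -1) {
            // Each chunk is decoded in isolation, so a character whose bytes
            // straddle two chunks cannot be reassembled.
            out.append(new String(buffer, 0, len, StandardCharsets.UTF_8));
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // 'a' followed by e-acute, which is two bytes in UTF-8 (0xC3 0xA9).
        String text = "a\u00e9";
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        // With a chunk size of 2 the boundary falls between 0xC3 and 0xA9.
        String decoded = decodeChunkByChunk(new ByteArrayInputStream(bytes), 2);
        // The split character is decoded as U+FFFD replacement characters,
        // so the round trip no longer reproduces the input.
        System.out.println(text.equals(decoded));
    }
}
```

With a chunk size large enough to hold the whole input the same method round-trips correctly, which would explain why the problem only shows up for some inputs.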

Regards
Piotr


Jérôme Charron wrote:
>>It works on my Linux box - with both JDK 1.4 and 1.5.
> 
> 
> OK, so it seems to be consistent with my configuration.
> 
> 
>>I will try to track it down.
> 
> 
> I assume it is an encoding problem of the Ngram profile files, but I have no 
> time this evening.
> Regards
> 
> Jérôme
> 



Re: Failing JUnit test

Posted by Jérôme Charron <je...@gmail.com>.
> It works on my Linux box - with both JDK 1.4 and 1.5.

OK, so it seems to be consistent with my configuration.

> I will try to track it down.

I assume it is an encoding problem of the Ngram profile files, but I have no 
time this evening.
Regards

Jérôme

Re: Failing JUnit test

Posted by Piotr Kosiorowski <pk...@gmail.com>.
It works on my Linux box - with both JDK 1.4 and 1.5.
I will try to track it down.
Regards
Piotr
Jérôme Charron wrote:
>>I am using JDK 1.5 on
>>Windows - I can test it on 1.4 and 1.5 on Linux tomorrow - maybe this is the
>>problem.
> 
> 
> OK. Thanks
> Jérôme
> 



Re: Failing JUnit test

Posted by Jérôme Charron <je...@gmail.com>.
> I am using JDK 1.5 on
> Windows - I can test it on 1.4 and 1.5 on Linux tomorrow - maybe this is the
> problem.

OK. Thanks
Jérôme

-- 
http://motrech.free.fr/
http://www.frutch.org/

Re: Failing JUnit test

Posted by Piotr Kosiorowski <pk...@gmail.com>.
It looks like it is still failing. Is there anything I can do to help 
identify this issue (not today but during the weekend)? I can try to 
debug it myself or run some code prepared by you. I am using JDK 1.5 on 
Windows - I can test it on 1.4 and 1.5 on Linux tomorrow - maybe this is the 
problem.
Regards,
Piotr

Jérôme Charron wrote:
>> I suspect it is a result of the latest updates to the LanguageIdentifier
>> plugin or its tests.
>
> Piotr, I have just committed a minor change in the language identifier plugin
> unit test.
> Could you please update your local copy and test again?
>
> Jerome
> 



Re: Failing JUnit test

Posted by Jérôme Charron <je...@gmail.com>.
> I suspect it is a result of the latest updates to the LanguageIdentifier
> plugin or its tests.

Piotr, I have just committed a minor change in the language identifier plugin 
unit test.
Could you please update your local copy and test again?

Jerome

Re: Failing JUnit test

Posted by Jérôme Charron <je...@gmail.com>.
> expected:<el> but was:<sv>
> junit.framework.ComparisonFailure: expected:<el> but was:<sv>

> I suspect it is a result of the latest updates to the LanguageIdentifier
> plugin or its tests. I am not deep into it, so I will not try to debug it
> myself at the moment - I just wanted you to know about the issue.

You are right Piotr, it's a language identifier unit test failure.
It's quite strange, since this test passes on my local copy.
(I reduced a "tolerance" parameter in the unit test before committing. I 
am changing it right now, so that the unit tests should pass on your local 
copy too.)

Regards 

Jérôme

-- 
http://motrech.free.fr/
http://www.frutch.org/