Posted to user@nutch.apache.org by mbehlok <m_...@hotmail.com> on 2013/02/06 17:28:45 UTC

Re: Parsing error : java.lang.NoClassDefFoundError: org/cyberneko/html/LostText

I fixed it. The Nutch source comes with an outdated nekohtml.jar. I tried
many neko versions by trial and error until this one worked for me:

nekohtml-1.9.12.tar.gz
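
For anyone hitting the same NoClassDefFoundError, here is a minimal sketch of
a way to check whether a class is visible on the classpath and which jar
supplies it. It is not part of Nutch: the ClassOrigin class and its describe
method are hypothetical names made up for illustration, and it only inspects
the classpath it is run with. Nutch plugins are loaded through their own class
loaders, so the plugin's jars would have to be put on the classpath by hand
for the result to mean anything.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of Nutch or NekoHTML.
public class ClassOrigin {

    /** Reports whether the named class can be loaded and, if so, from where. */
    public static String describe(String className) {
        try {
            // initialize=false: load the class without running its static initializers.
            Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            URL location = (src == null) ? null : src.getLocation();
            // JDK classes usually have no code source location.
            return (location == null)
                    ? "found (no code source location reported)"
                    : "found in " + location;
        } catch (ClassNotFoundException | LinkageError e) {
            // NoClassDefFoundError is a LinkageError, so it lands here too.
            return "missing: " + e;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the error message of this thread.
        String name = (args.length > 0) ? args[0] : "org.cyberneko.html.LostText";
        System.out.println(name + " -> " + describe(name));
    }
}
```

Running it with the old jar and then with the newer jar on the classpath
(java -cp followed by the jar path and the directory holding ClassOrigin)
should show which of them actually contains the class.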

mitch




--
View this message in context: http://lucene.472066.n3.nabble.com/Parsing-error-java-lang-NoClassDefFoundError-org-cyberneko-html-LostText-tp4029958p4038809.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: Parsing error : java.lang.NoClassDefFoundError: org/cyberneko/html/LostText

Posted by Arcondo Dasilva <ar...@gmail.com>.
Hello,

That sounds good. I'm going to retry.
Thanks for your great support. Never give up!

Kr,


On Thu, Feb 7, 2013 at 5:49 AM, Lewis John Mcgibbney <
lewis.mcgibbney@gmail.com> wrote:

> I uploaded a new patch on NUTCH-1253 for this. It would be greatly
> appreciated if someone could look into it, as it seems that the
> TestDOMContentUtils tests are all broken following the neko version
> upgrade!
>
> https://issues.apache.org/jira/browse/NUTCH-1253
>
> Thank you very much.
>
> Best
> Lewis

Re: Parsing error : java.lang.NoClassDefFoundError: org/cyberneko/html/LostText

Posted by Lewis John Mcgibbney <le...@gmail.com>.
I uploaded a new patch on NUTCH-1253 for this. It would be greatly
appreciated if someone could look into it, as it seems that the
TestDOMContentUtils tests are all broken following the neko version
upgrade!

https://issues.apache.org/jira/browse/NUTCH-1253

Thank you very much.

Best
Lewis

On Wed, Feb 6, 2013 at 9:35 AM, Lewis John Mcgibbney <
lewis.mcgibbney@gmail.com> wrote:

> Hi,
> Two observations here:
> 1) Did you try any versions more recent than 1.9.12? I assume you are
> talking about the net.sourceforge.nekohtml groupId artifact [0] as opposed
> to the nekohtml groupId artifact [1]?
> 2) We need to update the nekohtml dependency altogether. We
> currently use a completely outdated artifact [2] which was, rather
> embarrassingly, released in 2005!
> Great work, and great persistence. This is something which we definitely
> need to address.
> Thanks
> Lewis
>
> [0]
> http://search.maven.org/#search|gav|1|g%3A%22net.sourceforge.nekohtml%22%20AND%20a%3A%22nekohtml%22
> [1]
> http://search.maven.org/#search|gav|1|g%3A%22nekohtml%22%20AND%20a%3A%22nekohtml%22
> [2] http://search.maven.org/#artifactdetails|nekohtml|nekohtml|0.9.5|jar



-- 
*Lewis*

Re: Parsing error : java.lang.NoClassDefFoundError: org/cyberneko/html/LostText

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi,
Two observations here:
1) Did you try any versions more recent than 1.9.12? I assume you are
talking about the net.sourceforge.nekohtml groupId artifact [0] as opposed
to the nekohtml groupId artifact [1]?
2) We need to update the nekohtml dependency altogether. We
currently use a completely outdated artifact [2] which was, rather
embarrassingly, released in 2005!
Great work, and great persistence. This is something which we definitely
need to address.
Thanks
Lewis

[0]
http://search.maven.org/#search|gav|1|g%3A%22net.sourceforge.nekohtml%22%20AND%20a%3A%22nekohtml%22
[1]
http://search.maven.org/#search|gav|1|g%3A%22nekohtml%22%20AND%20a%3A%22nekohtml%22
[2] http://search.maven.org/#artifactdetails|nekohtml|nekohtml|0.9.5|jar

On Wed, Feb 6, 2013 at 8:28 AM, mbehlok <m_...@hotmail.com> wrote:

> I fixed it. The Nutch source comes with an outdated nekohtml.jar. I tried
> many neko versions by trial and error until this one worked for me:
>
> nekohtml-1.9.12.tar.gz
>
> mitch



-- 
*Lewis*