Posted to user@nutch.apache.org by Bai Shen <ba...@gmail.com> on 2012/08/13 15:10:07 UTC

MoreIndexingFilter plugin failing with NPE

MoreIndexingFilter is failing with an NPE when trying to index
http://spiderbites.nytimes.com/

The contentType comes back as null.  There is a check for this in order to
determine which MIME command to run.

However, when you check to see if the content type needs to be split into
sub parts, there is no check and it throws an NPE.
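
A minimal sketch of the kind of null guard described above. This is not the
actual MoreIndexingFilter source: the class name MimeTypeParts and the method
split are hypothetical, and the choice to return an empty list for a missing
content type is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Nutch.
public class MimeTypeParts {

    /**
     * Splits a MIME type such as "text/html" into the full type, the primary
     * type, and the sub type. Returns an empty list when the type is null or
     * blank instead of dereferencing it.
     */
    public static List<String> split(String mimeType) {
        // The guard the thread says is missing: check before splitting.
        if (mimeType == null || mimeType.trim().isEmpty()) {
            return Collections.emptyList();
        }
        String trimmed = mimeType.trim();
        List<String> parts = new ArrayList<>();
        parts.add(trimmed);
        // A limit of 2 keeps any further "/" inside the sub type.
        String[] pieces = trimmed.split("/", 2);
        if (pieces.length == 2 && !pieces[0].isEmpty() && !pieces[1].isEmpty()) {
            parts.add(pieces[0]);
            parts.add(pieces[1]);
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("text/html")); // prints [text/html, text, html]
        System.out.println(split(null));        // prints []
    }
}
```

With a guard like this, a document whose content type is null would simply
get no MIME type fields rather than aborting the indexing step.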

Re: MoreIndexingFilter plugin failing with NPE

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Thanks for reporting this.

Are you able to do as Markus said and open an issue? If so, please
include the Nutch version and a suggestion for the test we need to add,
plus the fix.

Thank you very much

Lewis

On Wed, Aug 15, 2012 at 1:38 PM, Bai Shen <ba...@gmail.com> wrote:
> It's not being tested for.
>
> For the time being, I just set moreIndexingFilter.indexMimeTypeParts to
> false as I don't need the parts.



-- 
Lewis

Re: MoreIndexingFilter plugin failing with NPE

Posted by Bai Shen <ba...@gmail.com>.
It's not being tested for.

For the time being, I just set moreIndexingFilter.indexMimeTypeParts to
false as I don't need the parts.
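
A rough sketch of why turning the flag off sidesteps the failure. Only the
property name moreIndexingFilter.indexMimeTypeParts comes from this thread.
The class MimePartsToggle, the method typeFields, the use of
java.util.Properties as a stand-in for Nutch's configuration, and the default
of "true" are all assumptions for illustration, not the plugin's real code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical stand-in for the filter's type handling, not Nutch code.
public class MimePartsToggle {

    // Property name as quoted in the thread.
    static final String KEY = "moreIndexingFilter.indexMimeTypeParts";

    public static List<String> typeFields(String contentType, Properties conf) {
        List<String> fields = new ArrayList<>();
        // First step is guarded, as described in the original report.
        if (contentType != null) {
            fields.add(contentType);
        }
        // Assumed default of "true"; the real default is not stated in the thread.
        boolean indexParts = Boolean.parseBoolean(conf.getProperty(KEY, "true"));
        if (indexParts) {
            // Deliberately unguarded to mirror the reported bug:
            // this throws a NullPointerException when contentType is null.
            for (String part : contentType.split("/")) {
                fields.add(part);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(KEY, "false");
        // With the flag off, the split step is never reached.
        System.out.println(typeFields(null, conf)); // prints []
    }
}
```

The flag only hides the problem for pages with no content type; the split
step would still need its own null check for anyone who wants the parts.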

On Mon, Aug 13, 2012 at 10:12 AM, Lewis John Mcgibbney <
lewis.mcgibbney@gmail.com> wrote:

> Hi Bai,
>
> On Mon, Aug 13, 2012 at 2:10 PM, Bai Shen <ba...@gmail.com> wrote:
>
> > The contentType comes back as null.  There is a check for this in order to
> > determine which MIME command to run.
> >
> > However, when you check to see if the content type needs to be split into
> > sub parts, there is no check and it throws an NPE.
>
> Can you check here [0] and see whether we test for it? If not, we can
> add the trivial test and provide a fix if required.
>
> Best
>
> Lewis
>
> [0]
> http://svn.apache.org/repos/asf/nutch/branches/2.x/src/plugin/index-more/src/test/org/apache/nutch/indexer/more/TestMoreIndexingFilter.java
>
>
>
> --
> Lewis
>

Re: MoreIndexingFilter plugin failing with NPE

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Bai,

On Mon, Aug 13, 2012 at 2:10 PM, Bai Shen <ba...@gmail.com> wrote:

> The contentType comes back as null.  There is a check for this in order to
> determine which MIME command to run.
>
> However, when you check to see if the content type needs to be split into
> sub parts, there is no check and it throws an NPE.

Can you check here [0] and see whether we test for it? If not, we can
add the trivial test and provide a fix if required.

Best

Lewis

[0] http://svn.apache.org/repos/asf/nutch/branches/2.x/src/plugin/index-more/src/test/org/apache/nutch/indexer/more/TestMoreIndexingFilter.java



-- 
Lewis

Re: MoreIndexingFilter plugin failing with NPE

Posted by Bai Shen <ba...@gmail.com>.
Sorry, I keep forgetting.  I'm using the Nutch 2.x branch as of last week.
However, there hasn't been a change to the filter in a month or so.

It was parsed correctly as far as I can tell.  I'm seeing the same content
in Solr as what I see in the browser.

On Mon, Aug 13, 2012 at 9:19 AM, Markus Jelsma
<ma...@openindex.io> wrote:

> Strange, no content type, that should not happen. Anyway, you can open an
> issue in Jira for this.
> Please mention your Nutch version.
>
> I cannot replicate it with trunk.
>
> Also, is it being parsed at all?
>
> -----Original message-----
> > From:Bai Shen <ba...@gmail.com>
> > Sent: Mon 13-Aug-2012 15:12
> > To: user@nutch.apache.org
> > Subject: MoreIndexingFilter plugin failing with NPE
> >
> > MoreIndexingFilter is failing with an NPE when trying to index
> > http://spiderbites.nytimes.com/
> >
> > The contentType comes back as null.  There is a check for this in order to
> > determine which MIME command to run.
> >
> > However, when you check to see if the content type needs to be split into
> > sub parts, there is no check and it throws an NPE.
> >
>

RE: MoreIndexingFilter plugin failing with NPE

Posted by Markus Jelsma <ma...@openindex.io>.
Strange, no content type, that should not happen. Anyway, you can open an issue in Jira for this.
Please mention your Nutch version. 

I cannot replicate it with trunk.

Also, is it being parsed at all? 
 
-----Original message-----
> From:Bai Shen <ba...@gmail.com>
> Sent: Mon 13-Aug-2012 15:12
> To: user@nutch.apache.org
> Subject: MoreIndexingFilter plugin failing with NPE
> 
> MoreIndexingFilter is failing with an NPE when trying to index
> http://spiderbites.nytimes.com/
> 
> The contentType comes back as null.  There is a check for this in order to
> determine which MIME command to run.
> 
> However, when you check to see if the content type needs to be split into
> sub parts, there is no check and it throws an NPE.
>