Posted to user@nutch.apache.org by vi...@socialinfra.net on 2014/09/02 09:22:10 UTC

NullPointerException occured during indexing to solr from nutch 1.7 source build.



Hi,

I took the Nutch 1.7 source, copied mapred-site.xml, hdfs-site.xml,
yarn-site.xml, hadoop-env.sh, and core-site.xml from my Hadoop
2.3.0-cdh5.1.0 installation, and ran an ant build. I then went to
runtime/deploy/bin to start the crawl. The jobs were submitted to my YARN
cluster successfully, but later, during indexing to Solr, I get the
exception below.

I have copied schema-solr4.xml to my Solr instance and added exceptions in
regex-urlfilter.txt for the particular website that I list for crawling in
urls/seed.txt.

Error:
java.lang.NullPointerException
	at org.apache.hadoop.io.Text.encode(Text.java:443)
	at org.apache.hadoop.io.Text.set(Text.java:198)
	at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:270)
	at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:241)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:198)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:184)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

Could anyone kindly tell me how to solve this issue? I'm basically stuck
here!

Re: NullPointerException occured during indexing to solr from nutch 1.7 source build.

Posted by vi...@socialinfra.net.





Hi Ameer,

Thanks for the information. Yes, I had created multiple collections for my
own understanding. Once I deleted everything and kept only one collection,
the issue resolved itself.

Thanks, all, for the help.
________________________________________________
From: "atawfik" <co...@gmail.com>
Sent: user@nutch.apache.org
Date: Fri, September 5, 2014 5:15 am
Subject: Re: NullPointerException occured during indexing to solr from nutch 1.7 source build.

> Hi,
>
> If I am not mistaken, your Solr URL is not accurate. You should provide the
> Solr URL plus the core in use. For instance, if your core is named
> "collection1" (the default Solr core name), then your URL should be
> http://solr-server:8983/solr/collection1. I believe that if you review the
> Solr or Nutch logs, you will see that the indexing job has failed.
>
> Regards
> Ameer

Re: NullPointerException occured during indexing to solr from nutch 1.7 source build.

Posted by atawfik <co...@gmail.com>.

Hi,

If I am not mistaken, your Solr URL is not accurate. You should provide the
Solr URL plus the core in use. For instance, if your core is named
"collection1" (the default Solr core name), then your URL should be
http://solr-server:8983/solr/collection1. I believe that if you review the
Solr or Nutch logs, you will see that the indexing job has failed.
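
As a rough illustration of the point above, the core name can be appended to the base Solr URL before it is passed to the crawl script. This is only a sketch: the seed and output paths are the ones quoted elsewhere in this thread, the core name "collection1" is an assumption that must match your own Solr setup, and the command is printed rather than executed.

```shell
#!/bin/sh
# Base Solr URL as used in this thread (host name is a placeholder).
SOLR_BASE="http://solr-server:8983/solr"
# Assumption: the target core is the default "collection1"; change to your core.
SOLR_CORE="collection1"
# Full URL that includes the core, as suggested above.
SOLR_URL="${SOLR_BASE}/${SOLR_CORE}"

# Dry run: build and print the crawl invocation instead of executing it.
CRAWL_CMD="./crawl /user/nutch/urls /tmp/nutch_1_8_first_output ${SOLR_URL} 1"
echo "${CRAWL_CMD}"
```

Once the printed command looks right for your installation, it can be run from the bin directory of the Nutch deployment.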

Regards
Ameer




Re: NullPointerException occured during indexing to solr from nutch 1.7 source build.

Posted by vi...@socialinfra.net.





Hi Talat,

Thanks for the information. I tried Nutch 1.8; it works fine and the job
completed. However, I could not find the indexed data in Solr, even though
I ran the command below, which includes the Solr URL:

./crawl /user/nutch/urls /tmp/nutch_1_8_first_output http://solr-server:8983/solr 1

I assumed that, after migrating, specifying the Solr server URL on the
command line would ensure that the crawled data is indexed into Solr
automatically. Is that not the case? If not, how do I do it manually?
:)
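
One way to check whether a crawl actually wrote anything to Solr is to ask the core for its document count through the standard select handler. The sketch below only builds and prints the query; the core name "collection1" is an assumption, and the host name is the placeholder used in this thread.

```shell
#!/bin/sh
# Assumption: documents are indexed into the core "collection1" on this host.
SOLR_URL="http://solr-server:8983/solr/collection1"
# Match all documents but return no rows; only the total count is of interest.
COUNT_QUERY="${SOLR_URL}/select?q=*:*&rows=0&wt=json"

# Dry run: print the curl command instead of contacting the server.
echo "curl -s '${COUNT_QUERY}'"
```

When the printed command is run against a live server, the numFound value in the response reports how many documents the core holds; a value of 0 after a crawl suggests the indexing step did not reach that core.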
________________________________________________
From: "Talat Uyarer" <ta...@uyarer.com>
Sent: user@nutch.apache.org
Date: Tue, September 2, 2014 8:35 pm
Subject: Re: NullPointerException occured during indexing to solr from nutch 1.7 source build.

> Hi,
>
> This is an issue in the SolrDeleteDuplicates class of the Nutch 1.7 trunk,
> where Solr records are deleted by the id field. Because the documents don't
> have the url field, their id is empty, so the class throws a
> NullPointerException when it runs.
>
> I am writing from my phone right now and did not find this issue. But if
> you update from 1.7 to a newer version, you will not get this error.
>
> Talat


Re: NullPointerException occured during indexing to solr from nutch 1.7 source build.

Posted by Talat Uyarer <ta...@uyarer.com>.

Hi,

This is an issue in the SolrDeleteDuplicates class of the Nutch 1.7 trunk,
where Solr records are deleted by the id field. Because the documents don't
have the url field, their id is empty, so the class throws a
NullPointerException when it runs.

I am writing from my phone right now and did not find this issue. But if
you update from 1.7 to a newer version, you will not get this error.

Talat