Posted to user@nutch.apache.org by glumet <ja...@gmail.com> on 2014/02/20 15:47:02 UTC

Re: Please help - Nutch fetch command not fetching data

Hello, did you manage to solve this problem? I have exactly the same
situation right now.

The generator generates a batchId and the fetcher is given this batchId, but the result is:

...
0/0 spinwaiting/active, 0 pages, 0 errors, 0.0 0 pages/s, 0 0 kb/s, 0 URLs in 0 queues
-activeThreads=0
FetcherJob: done

and no URLs are fetched.
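In case it helps to reason about this: a simplified model, assuming the generator tags each selected row with the batch ID and the fetcher then keeps only rows whose tag equals the batch ID it was given. If no row carries that tag, every row is still read but nothing is queued, which would look exactly like "0 URLs in 0 queues". The `Row` type, `select_for_fetch` and the batch IDs are hypothetical illustrations, not Nutch classes or real values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    """Hypothetical stand-in for one row of the webpage table."""

    url: str
    generate_mark: Optional[str]  # batch ID written by the generator, or None


def select_for_fetch(rows: List[Row], batch_id: str) -> List[str]:
    """Read every row, but queue only the URLs tagged with the requested batch ID."""
    return [row.url for row in rows if row.generate_mark == batch_id]


# Hypothetical table: one row tagged by an earlier generate run, one never generated.
table = [
    Row("http://example.com/a", "1392904022-1111"),
    Row("http://example.com/b", None),
]

print(select_for_fetch(table, "1392904022-1111"))  # ['http://example.com/a']
print(select_for_fetch(table, "1392904099-2222"))  # []: whole table read, nothing queued
```

The point of the model is that an empty fetch does not require an error anywhere: a scan that matches no tagged rows simply produces an empty queue.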



--
View this message in context: http://lucene.472066.n3.nabble.com/Please-help-Nutch-fetch-command-not-fetching-data-tp3764751p4118565.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: Re: Please help - Nutch fetch command not fetching data

Posted by glumet <ja...@gmail.com>.
Hi,

When I look into hadoop.log, I see:

2014-02-22 16:16:19,174 INFO  mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
2014-02-22 16:16:21,200 INFO  store.HBaseStore - Keyclass and nameclass match but mismatching table names  mappingfile schema is 'webpage' vs actual schema 'webpage_webpage' , assuming they are the same.
2014-02-22 16:16:21,220 INFO  store.HBaseStore - Keyclass and nameclass match but mismatching table names  mappingfile schema is 'webpage' vs actual schema 'webpage_webpage' , assuming they are the same.
2014-02-22 16:16:21,249 INFO  store.HBaseStore - Keyclass and nameclass match but mismatching table names  mappingfile schema is 'webpage' vs actual schema 'webpage_webpage' , assuming they are the same.
2014-02-22 16:16:21,284 INFO  mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000

and so on; this repeats many times. The whole HBase table is scanned, and I
think the fetcher cannot find the specified batch ID.
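The repeated HBaseStore message compares the table name from the Gora mapping file ('webpage') with the table actually opened ('webpage_webpage'). A minimal sketch, assuming (as I understand Nutch 2.x) that the actual table name is the crawl ID, an underscore, and the base schema name: it rebuilds that name and recovers the crawl-ID prefix from such log lines. Both function names are hypothetical. Under this assumption, generate and fetch runs with different crawl IDs would read different tables, so the fetcher would find no rows tagged with the new batch ID.

```python
import re
from typing import Set

# Matches the HBaseStore INFO message quoted above; \s+ tolerates line wrapping.
_MISMATCH_RE = re.compile(r"mappingfile schema is '([^']+)' vs actual\s+schema '([^']+)'")


def actual_table_name(base_schema: str, crawl_id: str = "") -> str:
    """Assumed naming rule: '<crawlId>_<schema>', or just '<schema>' with no crawl ID."""
    return f"{crawl_id}_{base_schema}" if crawl_id else base_schema


def crawl_ids_in_log(log_text: str) -> Set[str]:
    """Recover the crawl-ID prefixes implied by the mismatch messages in a log."""
    ids = set()
    for mapping, actual in _MISMATCH_RE.findall(log_text):
        suffix = "_" + mapping
        if actual.endswith(suffix):
            ids.add(actual[: -len(suffix)])
    return ids
```

Applied to the hadoop.log excerpt above, `crawl_ids_in_log` returns `{'webpage'}`, consistent with `actual_table_name("webpage", "webpage") == "webpage_webpage"`. If a log covering both the generate and the fetch steps produced two different prefixes, that would point at mismatched crawl IDs.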

When I look into the HBase master log, I see:

xbouj19@ir:~$ tail -f /opt/ir/hbase/logs/hbase-root-master-ir.log
2014-02-22 16:15:20,697 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 19.38 MB of total=164.39 MB
2014-02-22 16:15:20,699 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=19.43 MB, total=144.96 MB, single=86.25 MB, multi=72.5 MB, memory=4.05 MB
2014-02-22 16:15:21,257 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 19.35 MB of total=164.37 MB
2014-02-22 16:15:21,258 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=19.43 MB, total=145.08 MB, single=86.22 MB, multi=72.5 MB, memory=4.05 MB
2014-02-22 16:15:21,645 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 19.35 MB of total=164.37 MB
2014-02-22 16:15:21,646 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=19.38 MB, total=145.07 MB, single=86.22 MB, multi=72.5 MB, memory=4.05 MB
2014-02-22 16:15:22,251 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 19.36 MB of total=164.37 MB
2014-02-22 16:15:22,252 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=19.36 MB, total=145.01 MB, single=86.23 MB, multi=72.5 MB, memory=4.05 MB
2014-02-22 16:15:22,596 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 19.38 MB of total=164.4 MB
2014-02-22 16:15:22,598 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=19.42 MB, total=144.98 MB, single=86.25 MB, multi=72.5 MB, memory=4.05 MB
2014-02-22 16:15:23,694 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 19.39 MB of total=164.41 MB
2014-02-22 16:15:23,696 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=19.41 MB, total=144.99 MB, single=86.26 MB, multi=72.5 MB, memory=4.05 MB
2014-02-22 16:15:24,164 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 19.39 MB of total=164.4 MB
2014-02-22 16:15:24,165 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=19.49 MB, total=144.91 MB, single=86.25 MB, multi=72.5 MB, memory=4.05 MB
2014-02-22 16:15:24,432 INFO org.apache.zookeeper.server.NIOServerCnxn: Accepted socket connection from /127.0.0.1:37587
2014-02-22 16:15:24,432 INFO org.apache.zookeeper.server.NIOServerCnxn: Client attempting to establish new session at /127.0.0.1:37587
2014-02-22 16:15:24,434 INFO org.apache.zookeeper.server.NIOServerCnxn: Established session 0x1444be389493276 with negotiated timeout 40000 for client /127.0.0.1:37587
2014-02-22 16:15:24,440 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x1444be389493276
2014-02-22 16:15:24,441 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:37587 which had sessionid 0x1444be389493276
2014-02-22 16:15:24,454 INFO org.apache.zookeeper.server.NIOServerCnxn: Accepted socket connection from /127.0.0.1:37588
2014-02-22 16:15:24,454 INFO org.apache.zookeeper.server.NIOServerCnxn: Client attempting to establish new session at /127.0.0.1:37588
2014-02-22 16:15:24,455 INFO org.apache.zookeeper.server.NIOServerCnxn: Established session 0x1444be389493277 with negotiated timeout 40000 for client /127.0.0.1:37588
2014-02-22 16:15:24,461 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x1444be389493277
2014-02-22 16:15:24,462 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:37588 which had sessionid 0x1444be389493277
2014-02-22 16:15:24,474 INFO org.apache.zookeeper.server.NIOServerCnxn: Accepted socket connection from /127.0.0.1:37589
2014-02-22 16:15:24,474 INFO org.apache.zookeeper.server.NIOServerCnxn: Client attempting to establish new session at /127.0.0.1:37589
2014-02-22 16:15:24,476 INFO org.apache.zookeeper.server.NIOServerCnxn: Established session 0x1444be389493278 with negotiated timeout 40000 for client /127.0.0.1:37589
2014-02-22 16:15:24,484 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x1444be389493278
2014-02-22 16:15:24,485 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:37589 which had sessionid 0x1444be389493278
2014-02-22 16:15:24,810 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 19.37 MB of total=164.39 MB
2014-02-22 16:15:24,812 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=19.39 MB, total=145 MB, single=86.24 MB, multi=72.5 MB, memory=4.05 MB
2014-02-22 16:15:25,236 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 19.4 MB of total=164.42 MB
2014-02-22 16:15:25,237 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=19.43 MB, total=145.11 MB, single=86.27 MB, multi=72.5 MB, memory=4.05 MB
2014-02-22 16:15:25,680 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 19.36 MB of total=164.37 MB
2014-02-22 16:15:25,681 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=19.42 MB, total=145.07 MB, single=86.35 MB, multi=72.5 MB, memory=4.05 MB
2014-02-22 16:15:25,963 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 19.37 MB of total=164.38 MB
2014-02-22 16:15:25,965 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=19.37 MB, total=145.13 MB, single=86.36 MB, multi=72.5 MB, memory=4.05 MB
2014-02-22 16:15:26,310 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 19.35 MB of total=164.36 MB
2014-02-22 16:15:26,312 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=19.35 MB, total=145.01 MB, single=86.22 MB, multi=72.5 MB, memory=4.05 MB

and so on; there are no ERRORs or anything like that.
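Rather than watching `tail -f`, a short script can confirm that a whole log really contains no ERROR or WARN entries. A minimal sketch, assuming every entry starts with a timestamp in the `YYYY-MM-DD HH:MM:SS,mmm LEVEL` form used in both logs above; wrapped continuation lines have no timestamp and are skipped. The function name is hypothetical.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# An entry starts with "YYYY-MM-DD HH:MM:SS,mmm" followed by a level, as in the logs above.
_ENTRY_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\s+(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\b"
)


def count_levels(lines: Iterable[str]) -> Counter:
    """Count entries per level; lines without a timestamp prefix are ignored."""
    counts: Counter = Counter()
    for line in lines:
        match = _ENTRY_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python count_levels.py /opt/ir/hbase/logs/hbase-root-master-ir.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        levels = count_levels(log)
    print("ERROR:", levels["ERROR"], "WARN:", levels["WARN"], "total:", sum(levels.values()))
```

Running it against both hadoop.log and the HBase master log gives a quick yes/no on whether anything above INFO/DEBUG was logged during the fetch.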



--
View this message in context: http://lucene.472066.n3.nabble.com/Please-help-Nutch-fetch-command-not-fetching-data-tp3764751p4118965.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: Re: Please help - Nutch fetch command not fetching data

Posted by sumant <su...@gmail.com>.
Hi,

Please refer to the links below:

[1] http://lucene.472066.n3.nabble.com/Nutch-2-with-Cassandra-as-a-storage-is-not-crawling-data-properly-td4188115.html

and 

[2] https://issues.apache.org/jira/browse/GORA-416

Please check your hadoop.log file for a log entry like this:

WARN mapreduce.GoraRecordWriter - Exception at GoraRecordWriter.class while closing datastore.InvalidRequestException(why:supercolumn parameter is not optional for super CF sc)

If so, a bug has already been logged for it at link [2].

Please let me know if your issue is not related to the one I mentioned.
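To look for that warning without depending on how the lines happen to be wrapped, one option is to split the log into entries at each timestamp and search each entry for the message text. A minimal sketch, assuming the timestamped entry format of hadoop.log shown earlier in this thread; the function name is hypothetical.

```python
import re
from typing import List

# Zero-width split point before every line that starts with a timestamp.
_ENTRY_START = re.compile(r"^(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})", re.MULTILINE)
_MESSAGE = "Exception at GoraRecordWriter.class while closing datastore"


def gora_writer_warnings(log_text: str) -> List[str]:
    """Return WARN entries that mention the GoraRecordWriter close exception."""
    hits = []
    for entry in _ENTRY_START.split(log_text):
        flat = " ".join(entry.split())  # undo line wrapping inside one entry
        if " WARN " in flat and _MESSAGE in flat:
            hits.append(flat)
    return hits
```

A non-empty result for something like `gora_writer_warnings(open("logs/hadoop.log").read())` (the path is an assumption; use wherever your hadoop.log lives) would suggest the GORA-416 case; an empty list points elsewhere.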




--
View this message in context: http://lucene.472066.n3.nabble.com/Please-help-Nutch-fetch-command-not-fetching-data-tp3764751p4194608.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: Re: Please help - Nutch fetch command not fetching data

Posted by Bayu Widyasanyata <bw...@gmail.com>.
Hi,

Have you checked hadoop.log?

Re: Re: Please help - Nutch fetch command not fetching data

Posted by glumet <ja...@gmail.com>.
Unfortunately, that didn't solve the problem. I ran the /bin/crawl script many
times and everything worked fine, but suddenly something went wrong and
nothing is fetched. I suspect that the generator reports that it
generated a batch with ID XXXXXX-XXX, but no URLs are actually generated, and
that could be the reason nothing is fetched.

Does anybody know how to solve it?
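One way a batch ID could be reported while no URLs are selected: the generator only picks pages that are due for fetching. A simplified model, assuming a page becomes due once its stored next-fetch time has passed, and assuming a 30-day re-fetch interval (2592000 s, which I believe is the default of `db.fetch.interval.default` in nutch-default.xml). Right after a successful crawl every page is scheduled in the future, so a new generate run gets a batch ID but selects nothing. This is only one possible explanation, and `due_urls`/`reschedule` are hypothetical names, not Nutch code.

```python
from typing import Dict, List

DAY = 24 * 60 * 60          # seconds
FETCH_INTERVAL = 30 * DAY   # assumed re-fetch interval (2592000 s)


def due_urls(next_fetch: Dict[str, int], now: int) -> List[str]:
    """Hypothetical generator filter: select URLs whose next fetch time has passed."""
    return sorted(url for url, when in next_fetch.items() if when <= now)


def reschedule(next_fetch: Dict[str, int], fetched: List[str], now: int) -> None:
    """After a successful fetch, push each page one interval into the future."""
    for url in fetched:
        next_fetch[url] = now + FETCH_INTERVAL


schedule = {"http://example.com/": 0, "http://example.com/about": 0}
now = 1_000_000
first = due_urls(schedule, now)  # both URLs are due on the first run
reschedule(schedule, first, now)
print(due_urls(schedule, now + DAY))             # []: a batch ID, but no URLs in it
print(due_urls(schedule, now + FETCH_INTERVAL))  # both URLs are due again
```

If this is what is happening, newly injected seed URLs (or waiting out the interval) should make the generator select pages again.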



--
View this message in context: http://lucene.472066.n3.nabble.com/Please-help-Nutch-fetch-command-not-fetching-data-tp3764751p4118946.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: Re: Please help - Nutch fetch command not fetching data

Posted by 钟逊 <kk...@gmail.com>.
Hello, did you manage to solve this problem? I have exactly the same
situation right now.

The generator generates a batchId and the fetcher is given this batchId, but the result is:

...
0/0 spinwaiting/active, 0 pages, 0 errors, 0.0 0 pages/s, 0 0 kb/s, 0 URLs in 0 queues
-activeThreads=0
FetcherJob: done

and no URLs are fetched.



--
View this message in context: http://lucene.472066.n3.nabble.com/Please-help-Nutch-fetch-command-not-fetching-data-tp3764751p4118565.html
Sent from the Nutch - User mailing list archive at Nabble.com.
