Posted to user@nutch.apache.org by cervenkovab <ce...@gmail.com> on 2014/02/19 21:58:19 UTC

Re: WrongRegionException after updatedb

Hi, thanks for the hint.
We repaired the HBase regions with /hbase hbck -fixMeta -fixAssignments/.
We don't know what caused the problem, but it happened while we were trying
to stop the following message from being logged:

/INFO  store.HBaseStore - Keyclass and nameclass match but mismatching table
names  mappingfile schema is 'webpage' vs actual schema 'webpage_webpage' ,
assuming they are the same./

The message is logged in class HBaseStore:

if (!tableName.equals(tableNameFromMapping)) {
  LOG.info("Keyclass and nameclass match but mismatching table names "
      + " mappingfile schema is '" + tableNameFromMapping
      + "' vs actual schema '" + tableName
      + "' , assuming they are the same.");
  if (tableNameFromMapping != null) {
    mappingBuilder.renameTable(tableNameFromMapping, tableName);
  }
}

We tried every configuration we could think of (-crawlID on the command line,
nutch-site.xml, gora-hbase-mapping.xml), but we still get this strange INFO
message. When we change the table name in gora-hbase-mapping.xml to
/webpage_webpage/, a table named /webpage_webpage_webpage/ is created.
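
The two table names reported above are consistent with a prefix being joined to the mapping file's table name with an underscore. Below is a minimal sketch of that assumed rule. The class, the method, and the prefix value "webpage" are hypothetical, inferred only from the names in this message, and are not taken from Nutch or Gora code.

```java
public class TableNameSketch {

    // Hypothetical helper, NOT a Nutch or Gora API: joins a prefix and the
    // table name from the mapping file with an underscore. An empty or null
    // prefix leaves the mapping name unchanged.
    public static String effectiveTableName(String prefix, String mappingTableName) {
        if (prefix == null || prefix.isEmpty()) {
            return mappingTableName;
        }
        return prefix + "_" + mappingTableName;
    }

    public static void main(String[] args) {
        // Mapping file says 'webpage', assumed prefix is 'webpage'.
        System.out.println(effectiveTableName("webpage", "webpage"));
        // Mapping file changed to 'webpage_webpage', same assumed prefix.
        System.out.println(effectiveTableName("webpage", "webpage_webpage"));
        // Under the same assumption, a prefix of 'first' would be needed
        // to end up with a table called 'first_webpage'.
        System.out.println(effectiveTableName("first", "webpage"));
    }
}
```

If this assumption holds, renaming the table in gora-hbase-mapping.xml only changes the suffix, which would explain why the prefix shows up a second time.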

Is there a guide on how to configure this?

For example, I want to run a crawl into a table named "first_webpage" and
then (for testing purposes) into one named "second_webpage". Is that
possible? I assumed it could be done with the schema.prefix property, but
that does not work for me (maybe I am doing something wrong).
Thanks in advance



--
View this message in context: http://lucene.472066.n3.nabble.com/WrongRegionException-after-updatedb-tp4112982p4118418.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: WrongRegionException after updatedb

Posted by cervenkovab <ce...@gmail.com>.
Hi,
the issue appeared again after running an update, and it is possibly the
reason pages are no longer being fetched.

Hadoop log:
/2014-03-10 23:24:13,240 ERROR store.HBaseStore - Failed 1418 actions:
WrongRegionException: 1418 times, servers with issues: localhost:33227, 
2014-03-10 23:24:13,241 ERROR store.HBaseStore -
[Ljava.lang.StackTraceElement;@6ce27643/


HBase log:
/2014-03-10 23:26:33,793 INFO org.apache.zookeeper.server.NIOServerCnxn:
Accepted socket connection from /127.0.0.1:42526
2014-03-10 23:26:33,794 INFO org.apache.zookeeper.server.NIOServerCnxn:
Refusing session request for client /127.0.0.1:42526 as it has seen zxid
0x1602 our last zxid is 0xff1 client must try another server/
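
The ZooKeeper message above says that the client had already seen transaction id (zxid) 0x1602 while the server's last zxid was only 0xff1, so the server refused the session. Here is a minimal sketch of that comparison, using the two values from the log. The class and method names are hypothetical and this is not ZooKeeper's actual code.

```java
public class ZxidCheckSketch {

    // Hypothetical helper, NOT a ZooKeeper API: a session request is refused
    // when the client has seen a newer zxid than the server's last zxid.
    public static boolean wouldRefuse(long clientLastZxidSeen, long serverLastZxid) {
        return clientLastZxidSeen > serverLastZxid;
    }

    public static void main(String[] args) {
        long clientZxid = 0x1602L; // "it has seen zxid 0x1602"
        long serverZxid = 0xff1L;  // "our last zxid is 0xff1"
        System.out.println("client=" + clientZxid + " server=" + serverZxid
            + " refuse=" + wouldRefuse(clientZxid, serverZxid));
    }
}
```

In other words, the server's state appears to be older than what the client has already observed. Why that is the case here cannot be determined from these log lines alone.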


My stats:
/2014-03-10 23:05:11,393 INFO  crawl.WebTableReader - TOTAL urls:	510594
2014-03-10 23:05:11,393 INFO  crawl.WebTableReader - status 4
(status_redir_temp):	13947
2014-03-10 23:05:11,393 INFO  crawl.WebTableReader - status 5
(status_redir_perm):	6303
2014-03-10 23:05:11,393 INFO  crawl.WebTableReader - max score:	863.04
2014-03-10 23:05:11,393 INFO  crawl.WebTableReader - retry 64:	1
2014-03-10 23:05:11,393 INFO  crawl.WebTableReader - status 34
(status_retry):	692
2014-03-10 23:05:11,394 INFO  crawl.WebTableReader - status 3 (status_gone):
22738
2014-03-10 23:05:11,394 INFO  crawl.WebTableReader - *status 1
(status_unfetched):	255963*
2014-03-10 23:05:11,394 INFO  crawl.WebTableReader - status 0 (null):	1753
2014-03-10 23:05:11,394 INFO  crawl.WebTableReader - retry 156:	1
2014-03-10 23:05:11,394 INFO  crawl.WebTableReader - avg score:	0.40301135
2014-03-10 23:05:11,394 INFO  crawl.WebTableReader - WebTable statistics:
done
/


I found a thread where they had the same exception, but they resolved it
by repairing the client:
/http://zookeeper-user.578899.n2.nabble.com/Session-refused-zxid-too-high-td7577866.html/

Any idea how to get fetching and updating to work properly again?

Thanks in advance




--
View this message in context: http://lucene.472066.n3.nabble.com/WrongRegionException-after-updatedb-tp4112982p4122709.html