Posted to dev@nutch.apache.org by kiran chitturi <ch...@gmail.com> on 2013/01/29 16:42:50 UTC

Inlinks not being saved in the database

Hi!

I have noticed that inlinks are not being saved even though the property
(db.ignore.internal.links) is set to 'false'. I have created a JIRA issue
for this (https://issues.apache.org/jira/browse/NUTCH-1524).
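
For reference, an override of this kind would normally live in
conf/nutch-site.xml. The fragment below is only a sketch, assuming the
standard Hadoop-style property format that Nutch configuration files use;
the file location and the surrounding <configuration> element are
assumptions here, not details taken from the setup described above.

```xml
<configuration>
  <!-- Assumed override: keep links between pages of the same host
       instead of discarding them. -->
  <property>
    <name>db.ignore.internal.links</name>
    <value>false</value>
  </property>
</configuration>
```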

I am using the Nutch 2.x code with HBase as the backend. While debugging
this issue in Eclipse, I noticed that the configuration property
(db.ignore.internal.links) is recognized, and I could see the 'inlinks'
variable in the 'Webpage' class being updated with the inlink and its
value.

This happens in the DbUpdateReducer class, which calls the putToInlinks
function. At that point, I could see the page carrying other properties
such as fetch time, markers and inlinks. But when I check in the HBase
shell after the DbUpdaterJob has finished, no inlinks are shown there,
only the fetch family and markers.

I was not able to identify where the saving of inlinks is actually skipped.
Has anyone faced a similar issue?

Please let me know your suggestions on debugging this error.

Thanks,

-- 
Kiran Chitturi

Re: Inlinks not being saved in the database

Posted by brian4 <bq...@gmail.com>.
Thank you for posting.  I have found the same thing (also using HBase) and am
currently trying to figure out where it's going wrong.

Have you been able to figure out a solution since posting?

I am at about the same stage you were: I confirmed that all the inlinks are
being stored correctly in the "page" variable in the reduce method (I can
call "page.getInLinks()" there to get them), but for whatever reason they
are not being copied over to the HBase table. I manually checked the table,
after checking the db dump with Nutch, and no inlinks are stored there.




