Posted to user@nutch.apache.org by Nguyen Ngoc Giang <gi...@gmail.com> on 2006/02/10 12:55:22 UTC

Bug in closing the database?

  Hi everyone,

  I constantly encounter this problem when Nutch reaches the database closing
stage. The crawler causes my system to hang, and it needs to be restarted. Can
anyone help me figure this out? Here is my log file from just before the hang:

060210 161454 Finishing update
060210 161959 Processing pagesByURL: Sorted 25968131 instructions in 305.23 seconds.
060210 161959 Processing pagesByURL: Sorted 85077.25649510205 instructions/second
060210 162403 Processing pagesByURL: Merged to new DB containing 4088028 records in 133.31 seconds
060210 162403 Processing pagesByURL: Merged 30665.576475883277 records/second
060210 162428 Processing pagesByMD5: Sorted 3479369 instructions in 25.146 seconds.
060210 162428 Processing pagesByMD5: Sorted 138366.6984808717 instructions/second
060210 162529 Processing pagesByMD5: Merged to new DB containing 4088028 records in 56.115 seconds
060210 162529 Processing pagesByMD5: Merged 72850.89548249131 records/second
060210 163426 Processing linksByMD5: Sorted 25926879 instructions in 536.227 seconds.
060210 163426 Processing linksByMD5: Sorted 48350.56608488569 instructions/second
060210 164415 Processing linksByMD5: Merged to new DB containing 50840622 records in 465.897 seconds
060210 164415 Processing linksByMD5: Merged 109124.16692960032 records/second
060210 164840 Processing linksByURL: Sorted 22072525 instructions in 264.309 seconds.
060210 164840 Processing linksByURL: Sorted 83510.30422724916 instructions/second
060210 165830 Processing linksByURL: Merged to new DB containing 50840622 records in 478.304 seconds
060210 165830 Processing linksByURL: Merged 106293.53298320733 records/second
060210 170433 Processing linksByMD5: Sorted 23422502 instructions in 362.723 seconds.
060210 170433 Processing linksByMD5: Sorted 64574.0744314532 instructions/second
060210 171409 Processing linksByMD5: Merged to new DB containing 50840622 records in 453.612 seconds
060210 171409 Processing linksByMD5: Merged 112079.53493293827 records/second

  The weird thing is that Nutch processed linksByMD5 twice, and the number of
instructions is not the same each time (25926879 in the first pass, 23422502
in the second).
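
  To make the repeated pass easier to spot, the "Sorted" lines can be grouped
by stage name. This is only a minimal Python sketch: the regular expression
and the sort_passes helper are my own illustration and not part of Nutch, and
the sample lines are copied from the log above with the wrapped lines rejoined.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   060210 163426 Processing linksByMD5: Sorted 25926879 instructions in 536.227 seconds.
# Whitespace around the elapsed time is optional, since the pasted log
# sometimes has "536.227seconds" without a space.
SORTED = re.compile(
    r"^(\d{6}) (\d{6}) Processing (\w+): "
    r"Sorted (\d+) instructions in\s*([\d.]+)\s*seconds"
)


def sort_passes(log_text):
    """Group the 'Sorted N instructions in T seconds' lines by stage name."""
    passes = defaultdict(list)
    for line in log_text.splitlines():
        m = SORTED.match(line.strip())
        if m:
            _date, time, stage, count, seconds = m.groups()
            passes[stage].append((time, int(count), float(seconds)))
    return dict(passes)


# Sample input: the five "Sorted ... instructions in ..." lines from the log.
LOG = """\
060210 161959 Processing pagesByURL: Sorted 25968131 instructions in 305.23 seconds.
060210 162428 Processing pagesByMD5: Sorted 3479369 instructions in 25.146 seconds.
060210 163426 Processing linksByMD5: Sorted 25926879 instructions in 536.227 seconds.
060210 164840 Processing linksByURL: Sorted 22072525 instructions in 264.309 seconds.
060210 170433 Processing linksByMD5: Sorted 23422502 instructions in 362.723 seconds.
"""

# Keep only the stages that were sorted more than once.
repeated = {stage: runs for stage, runs in sort_passes(LOG).items() if len(runs) > 1}

for stage, runs in repeated.items():
    for time, count, seconds in runs:
        print(f"{stage} at {time}: {count} instructions in {seconds} s")
```

  With the lines above, only linksByMD5 shows up as repeated, at 163426 and
again at 170433, with the two different instruction counts.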

  Regards,
   Giang