
Posted to user@nutch.apache.org by Otis Gospodnetic <ot...@gmail.com> on 2013/11/27 05:32:36 UTC

Nutch, HBase, slow scans and FuzzyRowFilter

Hi,

I'm not intimately familiar with how Nutch 2.x uses HBase, but it seems that
a lot of time is spent on various scans, even with the improvements from
GORA-119 and related issues.

Would the FuzzyRowFilter approach described in
http://blog.sematext.com/2012/08/09/consider-using-fuzzyrowfilter-when-in-need-for-secondary-indexes-in-hbase/
help speed things up?
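For readers who haven't seen the post: in HBase's FuzzyRowFilter, each fuzzy row key comes with a mask where 0 marks a byte that must match and 1 marks a byte that may take any value. The sketch below shows only that matching rule in plain Java, not HBase's implementation. The class, its helpers and the "uuuu_aa" key layout (4-byte user id, "_", 2-byte action id) are hypothetical and unrelated to Nutch's real row keys.

```java
import java.nio.charset.StandardCharsets;

/**
 * Illustration of the FuzzyRowFilter matching rule (not HBase's code).
 * mask[i] == 0: rowKey[i] must equal template[i]; any other value: byte is free.
 */
public class FuzzyMatchSketch {

    /** True if rowKey starts with a byte sequence matching template at every fixed position. */
    static boolean matches(byte[] rowKey, byte[] template, byte[] mask) {
        if (template.length != mask.length) {
            throw new IllegalArgumentException("template and mask must have equal length");
        }
        // The template covers a key prefix; in this sketch shorter keys never match.
        if (rowKey.length < template.length) {
            return false;
        }
        for (int i = 0; i < template.length; i++) {
            if (mask[i] == 0 && rowKey[i] != template[i]) {
                return false;
            }
        }
        return true;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Hypothetical layout "uuuu_aa": template "????_99" selects action 99 for any user.
        byte[] template = bytes("????_99");
        byte[] mask = {1, 1, 1, 1, 0, 0, 0};
        for (String key : new String[] {"0001_99", "0002_07", "0417_99"}) {
            System.out.println(key + " -> " + matches(bytes(key), template, mask));
        }
    }
}
```

Running main reports a match for 0001_99 and 0417_99 but not for 0002_07: the user id bytes are ignored, only the action id is checked.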

If you look at the comments on that post, you'll see one from the author of
Phoenix, who essentially borrowed this idea and added it to Phoenix (as its
skip scan) with great success.

See:
http://phoenix-hbase.blogspot.com/2013/05/demystifying-skip-scan-in-phoenix.html
https://issues.apache.org/jira/browse/HBASE-6618
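A matching rule alone would still read every row. The speedup in FuzzyRowFilter and in Phoenix's skip scan comes from the filter handing the scanner a hint: when a row doesn't match, it works out the next key that could match, and the scan seeks straight there. Below is a simplified plain-Java sketch of that idea, assuming fixed-length keys and the same 0 = fixed / 1 = any mask convention. The class, methods and the "uuuu_aa" table are hypothetical, a TreeSet stands in for an HBase region, and this is not HBase's or Phoenix's actual code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Skip-scan illustration over fixed-length keys (not HBase's or Phoenix's code). */
public class SkipScanSketch {

    /**
     * Smallest key k >= row (unsigned byte order, same length) with k[i] == template[i]
     * wherever mask[i] == 0. Returns row itself if it matches, null if no such key exists.
     */
    static byte[] nextCandidate(byte[] row, byte[] template, byte[] mask) {
        int n = template.length;
        if (row.length != n || mask.length != n) {
            throw new IllegalArgumentException("sketch assumes fixed-length keys");
        }
        // Find the first fixed position where row disagrees with the template.
        int i = 0;
        while (i < n && (mask[i] != 0 || row[i] == template[i])) {
            i++;
        }
        if (i == n) {
            return row;
        }
        byte[] next = Arrays.copyOf(row, n);
        int from; // every position after this one is reset to its minimum
        if (Byte.toUnsignedInt(row[i]) < Byte.toUnsignedInt(template[i])) {
            next[i] = template[i]; // raise the fixed byte to the template value
            from = i;
        } else {
            // Row is past all matches with this prefix: increment the last free byte before i.
            int j = i - 1;
            while (j >= 0 && (mask[j] == 0 || row[j] == (byte) 0xFF)) {
                j--;
            }
            if (j < 0) {
                return null;
            }
            next[j] = (byte) (row[j] + 1);
            from = j;
        }
        // Minimal tail: fixed bytes take the template value, free bytes become 0x00.
        for (int k = from + 1; k < n; k++) {
            next[k] = mask[k] == 0 ? template[k] : 0;
        }
        return next;
    }

    /** Returns matching keys; visited[0] counts the rows the scan actually landed on. */
    static List<String> skipScan(TreeSet<byte[]> table, byte[] template, byte[] mask, int[] visited) {
        List<String> hits = new ArrayList<>();
        byte[] cur = table.isEmpty() ? null : table.first();
        while (cur != null) {
            visited[0]++;
            byte[] hint = nextCandidate(cur, template, mask);
            if (hint == null) {
                break; // no later key can match
            }
            if (Arrays.equals(hint, cur)) {
                hits.add(new String(cur, StandardCharsets.US_ASCII));
                cur = table.higher(cur);
            } else {
                cur = table.ceiling(hint); // seek instead of reading the rows in between
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical table: users 0000-0009, actions 00-99, keys "uuuu_aa".
        TreeSet<byte[]> table = new TreeSet<>((a, b) -> Arrays.compareUnsigned(a, b));
        for (int u = 0; u < 10; u++) {
            for (int act = 0; act < 100; act++) {
                table.add(String.format("%04d_%02d", u, act).getBytes(StandardCharsets.US_ASCII));
            }
        }
        byte[] template = "????_99".getBytes(StandardCharsets.US_ASCII);
        byte[] mask = {1, 1, 1, 1, 0, 0, 0};
        int[] visited = {0};
        List<String> hits = skipScan(table, template, mask, visited);
        System.out.println(hits.size() + " matches, " + visited[0] + " of " + table.size() + " rows visited");
    }
}
```

On this toy table main prints "10 matches, 20 of 1000 rows visited": for each user the scan lands on action 00, seeks directly to action 99, then moves on to the next user.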

For those of you who know more about how Nutch 2.x uses HBase: could this
help speed things up?

Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/