Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2018/08/30 06:48:59 UTC

[GitHub] snoami opened a new issue #624: Performance with midsize table size

URL: https://github.com/apache/accumulo/issues/624
 
 
   I have a table with about 4M rows in Accumulo, grouped into about 90K entities with distinct rowIds. Scanning the whole table from the Java API takes about 10 seconds, whether I use one large rowId range or several smaller ones. I have also tried a BatchScanner with one Range per rowId. Running scan -np in the Accumulo shell gives similar results.
   I have tried different configurations on powerful AWS machines, and performance never improves beyond roughly 10 seconds. Is that reasonable? It seems very slow, but maybe that is simply what Accumulo can do. Does anyone have experience with queries like this? Setup: Accumulo 1.7.4, Hadoop 2.8.1; 4 vCPU, 16 GB RAM (AWS m5d.xlarge); 5 servers running Accumulo tservers and Hadoop datanodes; one namenode running the Accumulo master.
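   For reference, the kind of scan described above (one Range per rowId, fanned out over a BatchScanner) might look roughly like the sketch below against the Accumulo 1.7 Java API. The instance name, ZooKeeper address, credentials, table name, and the fetchRowIds() helper are all placeholders, not details from this report; it needs a live cluster and the accumulo-core dependency to run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map.Entry;

import org.apache.accumulo.core.client.BatchScanner;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;

public class BatchScanSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder connection details -- adjust for your cluster.
    Connector conn = new ZooKeeperInstance("instance", "zk1:2181")
        .getConnector("user", new PasswordToken("pass"));

    // One exact Range per rowId, as the report describes.
    List<Range> ranges = new ArrayList<>();
    for (String rowId : fetchRowIds()) {
      ranges.add(Range.exact(rowId));
    }

    // The thread count controls how many tserver requests run in parallel.
    BatchScanner scanner =
        conn.createBatchScanner("mytable", Authorizations.EMPTY, 10);
    try {
      scanner.setRanges(ranges);
      long count = 0;
      for (Entry<Key,Value> entry : scanner) {
        count++;
      }
      System.out.println("entries scanned: " + count);
    } finally {
      scanner.close();
    }
  }

  private static List<String> fetchRowIds() {
    // Hypothetical helper returning the ~90K rowIds to look up.
    return new ArrayList<>();
  }
}
```

   With ~90K ranges, the number of query threads passed to createBatchScanner is one of the few client-side knobs that affects wall-clock time, since it bounds how many tablet servers are queried concurrently.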
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services