Posted to notifications@accumulo.apache.org by "Eric Newton (JIRA)" <ji...@apache.org> on 2014/07/18 06:00:08 UTC

[jira] [Resolved] (ACCUMULO-1417) data storage efficiency

     [ https://issues.apache.org/jira/browse/ACCUMULO-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Newton resolved ACCUMULO-1417.
-----------------------------------

    Resolution: Fixed

Code to ingest the Google Books ngrams was added.  I posted some numbers on the efficiency of the ingest and storage [here|http://tinyurl.com/nrvj7xv].

Other key-value stores can compare their numbers, if they like.  Beating compressed CSVs was an unexpected result.


> data storage efficiency
> -----------------------
>
>                 Key: ACCUMULO-1417
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1417
>             Project: Accumulo
>          Issue Type: Task
>            Reporter: Eric Newton
>
> David Medinets wrote to the user list:
> {quote}
> Are there any published numbers for the amount of disk space used by
> Accumulo versus other products? I'm thinking some dataset like dbpedia
> or something from http://books.google.com/ngrams/datasets. If there is
> not such a comparison, what comparisons would you like to see? What
> about WordNet stored in CSV, MySQL, Cassandra, HBase, and Accumulo?
> WordNet is just a large set of CSV files so it would be a good
> candidate for this concept, I think.
> {quote}
> Good idea.



--
This message was sent by Atlassian JIRA
(v6.2#6252)