Posted to users@jena.apache.org by "Dimov, Stefan" <st...@sap.com> on 2018/01/05 06:28:20 UTC

Re: Hardware for in-memory storage

Thanks Adam!

S.

On 12/23/17, 3:40 AM, "ajs6f" <aj...@apache.org> wrote:

    There are several in-memory dataset impls for Jena. TIM is fully transactional but requires more space. The older in-memory dataset is more space-efficient, but isn't as fully transactional. It would be possible to use TIM's MVCC machinery to build a dataset impl that would be smaller and fully transactional but less performant (it would require scanning in indexes), but I haven't had time to do that.
    
    There is no simple function for determining how much heap you will need to bring a TDB db fully into memory. For one thing, TDB uses a dictionary to save space in indexes, and the in-memory impls do not. How much space is being saved by TDB will depend on the actual qualities of your data. (E.g. data with many large literals will have more space saved than data with literals that are few and small.) TDB also relies on the use of off-heap memory for caching.
    
    If you use TIM, how much memory you ultimately need will also depend a little on your pattern of access, since TIM stores triples in covering indexes made from MVCC data structures.
    
    The amount of memory TIM will need should increase roughly linearly with more data of the same provenance, so you could take some extracts of your data (0.5G, 1G, 5G, etc) and try loading them into TIM to get an idea of what the full dataset might require. That should be true of the older in-memory dataset impl, too.
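
The extrapolation described above can be sketched in Java as follows. This is only an illustrative sketch, not part of Jena: the class name HeapEstimate and its methods are hypothetical, the heap readings are approximate (Runtime.gc() is only a hint to the JVM), and the straight-line fit assumes the roughly linear growth mentioned above.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of Jena: measure heap growth for a few
// extracts, then fit a straight line to extrapolate to the full dataset.
class HeapEstimate {

    // Approximate used heap in bytes. gc() is only a request, so treat
    // the result as a rough figure and repeat measurements if possible.
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        rt.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Heap growth (bytes) caused by running the loader. The loaded object
    // is referenced after the second reading so it stays reachable.
    static <T> long heapDelta(Supplier<T> loader) {
        long before = usedHeap();
        T loaded = loader.get();
        long after = usedHeap();
        if (loaded == null) {
            throw new IllegalStateException("loader returned null");
        }
        return after - before;
    }

    // Ordinary least-squares fit of heap = a + b * size.
    // Returns {a, b}. Needs at least two distinct sizes.
    static double[] fitLine(double[] size, double[] heap) {
        int n = size.length;
        if (n < 2 || heap.length != n) {
            throw new IllegalArgumentException("need >= 2 paired samples");
        }
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += size[i];
            meanY += heap[i];
        }
        meanX /= n;
        meanY /= n;
        double sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            double dx = size[i] - meanX;
            sxx += dx * dx;
            sxy += dx * (heap[i] - meanY);
        }
        if (sxx == 0) {
            throw new IllegalArgumentException("sizes must not all be equal");
        }
        double b = sxy / sxx;
        return new double[] { meanY - b * meanX, b };
    }

    // Predicted heap for a dataset of the given size (same units as the fit).
    static double predict(double[] fit, double size) {
        return fit[0] + fit[1] * size;
    }
}
```

To use it, pass heapDelta a loader that reads one extract into an in-memory dataset (for TIM, a dataset created with DatasetFactory.createTxnMem()), record the result for each extract size, and feed the pairs to fitLine and predict. Only on-heap usage is measured this way, and the prediction is only as good as the assumption that the extracts are representative of the full data.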
    
    Adam
    
    > On Dec 22, 2017, at 3:32 PM, Dimov, Stefan <st...@sap.com> wrote:
    > 
    > Hi all,
    > 
    > Let’s say I have a Jena/Fuseki setup with TDB that takes 100G of disk space. What hardware do I need to run it as an in-memory DB?
    > 
    > Regards,
    > Stefan