Posted to dev@jena.apache.org by "André Lanka (JIRA)" <ji...@apache.org> on 2013/11/21 13:46:36 UTC

[jira] [Commented] (JENA-524) Global Cache for servers hosting a large number of TDB stores

    [ https://issues.apache.org/jira/browse/JENA-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828882#comment-13828882 ] 

André Lanka commented on JENA-524:
----------------------------------

Sorry Andy, I missed your comments.

It's faster to compare a single long value than to call equals/hashCode on a composite key made of two objects (file reference, block ID). This is why we map the path and the block ID to one long value. Admittedly, it's a drawback that everything later goes into a Long-based map (because the long values have to be boxed and unboxed). As we plan to introduce a long-based (not Long-based) LinkedHashMap, we could profit even more from the single-long approach.
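A minimal sketch of the single-long idea in plain Java, using no Jena classes. It assumes each file path has already been mapped to a small integer file number; the class name PackedBlockKey and the 20/44 bit split are hypothetical and not taken from the patch.

```java
// Hypothetical sketch (not from the JENA-524 patch): pack a file number and a
// block ID into a single long so that cache keys are plain primitives.
// Assumes each file path has already been mapped to a small int "file number".
public final class PackedBlockKey {
    // Illustrative split: upper 20 bits for the file number, lower 44 for the block ID.
    static final int FILE_BITS = 20;
    static final int BLOCK_BITS = 64 - FILE_BITS;
    static final long BLOCK_MASK = (1L << BLOCK_BITS) - 1;

    private PackedBlockKey() {}

    static long pack(int fileNo, long blockId) {
        if (fileNo < 0 || fileNo >= (1 << FILE_BITS)) {
            throw new IllegalArgumentException("fileNo out of range: " + fileNo);
        }
        if (blockId < 0 || blockId > BLOCK_MASK) {
            throw new IllegalArgumentException("blockId out of range: " + blockId);
        }
        return ((long) fileNo << BLOCK_BITS) | blockId;
    }

    static int fileNo(long key) {
        return (int) (key >>> BLOCK_BITS);
    }

    static long blockId(long key) {
        return key & BLOCK_MASK;
    }

    public static void main(String[] args) {
        long a = pack(7, 12345L);
        long b = pack(7, 12345L);
        // Equality is one primitive comparison; no equals()/hashCode() on key objects.
        System.out.println(a == b);                                // true
        System.out.println(fileNo(a) + " " + blockId(a));          // 7 12345
        System.out.println(Long.hashCode(a) == Long.hashCode(b));  // true
    }
}
```

Storing such keys in a java.util map still boxes them to Long, which is the drawback mentioned in the comment; a map specialized for primitive long keys would avoid that.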

Yes, the mappers (FilenameAndBlockIDLongMapper and FilenameAndFilePosLongMapper) are essentially the same. The only difference is how many bits go to the first and second parts of the long value. The reason is that the block mapper has to manage more files with smaller block numbers (each block is 8 kB), whereas the ID mapper (for the node IDs) has to manage fewer files but larger positions within each file. Of course, a single class with parameterized bit widths could also handle this. However, accessing a public static final constant (which the compiler can inline) is much faster than reading an instance field. This is why I used this slightly confusing approach.
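A sketch of the trade-off described above, in plain Java. The class names BlockKeys, NodePosKeys and ParamKeys and the 24/40 and 16/48 bit splits are hypothetical; they only mirror the idea that the block mapper reserves more bits for files and the node-ID mapper reserves more bits for file positions.

```java
// Hypothetical sketch (names and bit widths are illustrative, not from the patch):
// two key mappers that differ only in how they split the 64 bits, plus the
// parameterized alternative that reads its split from instance fields.
public final class KeyMappers {

    // Block keys: many files, comparatively small block numbers (8 kB blocks).
    static final class BlockKeys {
        static final int FILE_BITS = 24;
        static final int POS_BITS = 64 - FILE_BITS;          // 40, a compile-time constant
        static final long POS_MASK = (1L << POS_BITS) - 1;

        // Precondition: 0 <= fileNo < 2^FILE_BITS and 0 <= blockNo <= POS_MASK.
        static long pack(int fileNo, long blockNo) { return ((long) fileNo << POS_BITS) | blockNo; }
        static int fileNo(long key) { return (int) (key >>> POS_BITS); }
        static long pos(long key) { return key & POS_MASK; }
    }

    // Node-ID keys: fewer files, but larger byte positions within each file.
    static final class NodePosKeys {
        static final int FILE_BITS = 16;
        static final int POS_BITS = 64 - FILE_BITS;          // 48, a compile-time constant
        static final long POS_MASK = (1L << POS_BITS) - 1;

        // Precondition: 0 <= fileNo < 2^FILE_BITS and 0 <= filePos <= POS_MASK.
        static long pack(int fileNo, long filePos) { return ((long) fileNo << POS_BITS) | filePos; }
        static int fileNo(long key) { return (int) (key >>> POS_BITS); }
        static long pos(long key) { return key & POS_MASK; }
    }

    // One class for both cases: the split is read from instance fields at run time.
    static final class ParamKeys {
        private final int posBits;
        private final long posMask;

        ParamKeys(int fileBits) {
            this.posBits = 64 - fileBits;
            this.posMask = (1L << posBits) - 1;
        }

        long pack(int fileNo, long pos) { return ((long) fileNo << posBits) | pos; }
        int fileNo(long key) { return (int) (key >>> posBits); }
        long pos(long key) { return key & posMask; }
    }

    public static void main(String[] args) {
        long bk = BlockKeys.pack(100_000, 5L);
        long nk = NodePosKeys.pack(3, 1L << 45);
        System.out.println(BlockKeys.fileNo(bk) + " " + BlockKeys.pos(bk));      // 100000 5
        System.out.println(NodePosKeys.fileNo(nk) + " " + NodePosKeys.pos(nk));  // 3 35184372088832
        System.out.println(new ParamKeys(24).pack(100_000, 5L) == bk);          // true
    }
}
```

In BlockKeys and NodePosKeys, POS_BITS and POS_MASK are compile-time constants that javac folds directly into the shift and mask expressions; ParamKeys has to load its split from the instance on every call.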

> Global Cache for servers hosting a large number of TDB stores
> -------------------------------------------------------------
>
>                 Key: JENA-524
>                 URL: https://issues.apache.org/jira/browse/JENA-524
>             Project: Apache Jena
>          Issue Type: New Feature
>          Components: TDB
>    Affects Versions: TDB 0.10.1
>            Reporter: André Lanka
>            Priority: Minor
>              Labels: patch
>         Attachments: patch_hojoki_global_cache.txt
>
>
> Hello,
> we (namely Hojoki) have been using Jena/TDB for a couple of years. In 2011 we started implementing a global cache shared by all TDB stores currently open on a server. The motivation was that we need many TDB stores on a single machine to provide parallel write access to the different graphs. Our goal was to have more than 2000 stores on a single machine. As we have only 8 GB of memory for the JVM, we can't give each store an appropriately sized local cache.
> So we decided to implement a global shared cache for both Nodes/NodeIDs and Blocks. We have tested our changes intensively with the current TDB version 0.10.1 since its release, and it works well. Currently we host more than 5000 stores on each server, containing more than a billion triples per server (stored in roughly 150-200 GB of TDB data). The cache has a size of approximately 500 MB.
> We would be very happy if we could integrate our changes into the official TDB branch. Our cache can be turned on by calling SystemTDB.useGlobalCache(true). If this method is not called, the factories use the original NodeTableCache and the original BlockMgrCache. If it is called, our table and our manager are used. Of course, this has some overhead, but at least it makes it possible to host this large number of stores on a single machine.
> We only tested it with FileMode.direct, as we only use this mode (for smaller file sizes, and because we know for sure when changes are written to disk, which is important for our backup mechanism). The cache applies only to the big data files on disk, not to the journal files.
> I can provide a patch I created yesterday against the current snapshot version (I can't find an upload field in this "Create issue" form). The patch still contains a few tests that are rather Hojoki-specific, and a few parts could use a more general approach (configuration via config files instead of code constants, and such things).
> Anyway, if you allow us to integrate our changes, I'll improve these parts.
> What do you think?
> Best wishes
> André
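The on/off switch described in the quoted issue (one cache shared by all stores versus a private cache per store) can be illustrated with a minimal, self-contained sketch that uses no Jena classes. BlockCache, LruCache, useGlobalCache(boolean) and cacheForNewStore are hypothetical stand-ins, not the patch's SystemTDB.useGlobalCache, NodeTableCache or BlockMgrCache; the LRU policy built on LinkedHashMap in access order is an assumption, and the tiny capacity only makes eviction visible.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Jena code: one LRU cache shared by every open store,
// versus a small private cache per store, selected by a global switch.
public class SharedCacheSketch {
    interface BlockCache {
        byte[] get(long key);
        void put(long key, byte[] block);
    }

    // Bounded LRU built on LinkedHashMap in access order (third constructor arg = true).
    static final class LruCache implements BlockCache {
        private final Map<Long, byte[]> map;

        LruCache(int maxEntries) {
            this.map = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                    return size() > maxEntries; // evict least recently used entry
                }
            };
        }

        @Override public synchronized byte[] get(long key) { return map.get(key); }
        @Override public synchronized void put(long key, byte[] block) { map.put(key, block); }
        synchronized int size() { return map.size(); }
    }

    // Hypothetical global switch, similar in spirit to the patch's SystemTDB.useGlobalCache(true).
    private static volatile boolean globalCacheEnabled = false;
    static final LruCache GLOBAL = new LruCache(4);

    static void useGlobalCache(boolean on) { globalCacheEnabled = on; }

    // Factory: each new store gets either the shared cache or its own small one.
    static BlockCache cacheForNewStore(int localEntries) {
        return globalCacheEnabled ? GLOBAL : new LruCache(localEntries);
    }

    public static void main(String[] args) {
        useGlobalCache(true);
        BlockCache storeA = cacheForNewStore(2);
        BlockCache storeB = cacheForNewStore(2);
        // Keys must identify the store/file (e.g. a packed fileNo/blockId long)
        // so that entries from different stores never collide in the shared map.
        for (long k = 0; k < 6; k++) {
            (k % 2 == 0 ? storeA : storeB).put(k, new byte[8]);
        }
        System.out.println(storeA == storeB);       // true: both stores share one cache
        System.out.println(GLOBAL.size());          // 4: capacity is global, not per store
        System.out.println(GLOBAL.get(0) == null);  // true: oldest entry was evicted
    }
}
```

The shared variant fixes one memory budget for the whole JVM instead of multiplying it by the number of stores; the cost is that keys must encode the store, and that in this naive version all stores contend for a single lock.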



--
This message was sent by Atlassian JIRA
(v6.1#6144)