Posted to dev@lucene.apache.org by "Shawn Heisey (JIRA)" <ji...@apache.org> on 2016/03/16 20:32:33 UTC

[jira] [Commented] (SOLR-2613) DIH Cache backed w/bdb-je

    [ https://issues.apache.org/jira/browse/SOLR-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15197981#comment-15197981 ] 

Shawn Heisey commented on SOLR-2613:
------------------------------------

This came up in a discussion on IRC today about nested entity setups where the inner entities have a very large number of rows, so memory-based caches would need far more memory than the machine has.

The Oracle Berkeley DB implementation was specifically mentioned, which is why I'm commenting here instead of opening a new issue.  It is licensed under the AGPL, so we can't distribute it, but I wonder whether we could implement enough of an API layer that users could provide the jar themselves, tell Solr which class to use, and be in business.  Is that what the patch on this issue does?  I haven't looked deeply.

Other ideas, which might need a separate issue for disk-based caching implementations:

I had the idea of using SQLite for caching in a single-file database.  SQLite is public domain, and there are ways to access it from Java.
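
To make the SQLite idea concrete, here is a minimal sketch of a single-file cache that maps a join key to the rows for that key.  It uses Python's built-in sqlite3 module only to keep the example self-contained; the class name, table name, and column names are all hypothetical, and storing rows as JSON is an assumption rather than anything DIH does today.

```python
import json
import sqlite3


class SqliteRowCache:
    """Disk-backed map from a join key to the list of rows for that key.

    Hypothetical sketch: the table and column names are illustrative only.
    """

    def __init__(self, path):
        # One database file holds the whole cache.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS dih_cache ("
            "cache_key TEXT NOT NULL, payload TEXT NOT NULL)"
        )
        # The index is what makes per-key lookups cheap.
        self.conn.execute(
            "CREATE INDEX IF NOT EXISTS dih_cache_key ON dih_cache (cache_key)"
        )

    def add(self, key, row):
        # Assumption: each row is a JSON-serializable dict.
        self.conn.execute(
            "INSERT INTO dih_cache (cache_key, payload) VALUES (?, ?)",
            (key, json.dumps(row)),
        )

    def get(self, key):
        # Return rows in insertion order; an unknown key yields an empty list.
        cur = self.conn.execute(
            "SELECT payload FROM dih_cache WHERE cache_key = ? ORDER BY rowid",
            (key,),
        )
        return [json.loads(payload) for (payload,) in cur]

    def close(self):
        self.conn.commit()
        self.conn.close()
```

The inner entity would be read once and loaded with add(), and each outer row would then call get() instead of querying the remote database.  From Java the same statements would presumably go through a JDBC driver for SQLite.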

Even a simple implementation that writes little files to the disk would work.  To avoid tons of files in a single directory, it could take a 32-bit hash of the key and write to a four-level directory structure where each directory name is two hex characters, for example df/8c/12/b5.
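
The hashed directory layout described above could look roughly like the following sketch.  CRC-32 is only an assumed choice of 32-bit hash, the function names are hypothetical, and naming the file inside the leaf directory after a SHA-1 of the key is my own addition so that two keys with the same 32-bit hash don't overwrite each other.

```python
import hashlib
import os
import zlib


def cache_dir_for(key):
    """Map a key to a relative four-level path of two-hex-char directories."""
    # Assumption: CRC-32 as the 32-bit hash; any stable 32-bit hash would do.
    h = zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF
    hex8 = format(h, "08x")
    # Split the 8 hex characters into 4 pairs, one per directory level.
    return "/".join(hex8[i:i + 2] for i in range(0, 8, 2))


def _entry_path(root, key):
    leaf = os.path.join(root, *cache_dir_for(key).split("/"))
    # Different keys can share a 32-bit hash, so the file name uses a
    # longer digest of the key to keep entries separate.
    name = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return os.path.join(leaf, name)


def write_entry(root, key, payload):
    """Store payload (bytes) for key under root and return the file path."""
    path = _entry_path(root, key)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(payload)
    return path


def read_entry(root, key):
    """Return the stored bytes for key, or None if nothing was cached."""
    try:
        with open(_entry_path(root, key), "rb") as f:
            return f.read()
    except FileNotFoundError:
        return None
```

With two hex characters per level, no directory above the leaves ever holds more than 256 subdirectories.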

A disk-based solution would not be as fast as the memory-based solution already available, but as long as it was on a local physical disk, it would probably be faster than making N+1 queries to a remote database.

> DIH Cache backed w/bdb-je
> -------------------------
>
>                 Key: SOLR-2613
>                 URL: https://issues.apache.org/jira/browse/SOLR-2613
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.0-ALPHA
>            Reporter: James Dyer
>            Priority: Minor
>         Attachments: SOLR-2613.patch, SOLR-2613.patch, SOLR-2613.patch, SOLR-2613.patch, SOLR-2613.patch, SOLR-2613.patch, SOLR-2613.patch
>
>
> This is spun out of SOLR-2382, which provides a framework for multiple caching implementations with DIH.  This cache implementation is fast & flexible, supporting persistence and delta updates.  However, it depends on Berkeley Database Java Edition, so in order to evaluate this and use it you must download bdb-je from Oracle and accept the license requirements.


