Posted to dev@jena.apache.org by Andy Seaborne <an...@apache.org> on 2013/06/05 12:32:44 UTC

Re: [DISCUSS] SDB future


On 28/05/13 18:22, Rob Vesse wrote:
> Personally I would vote for Option 3: none of us are maintaining it, nor do
> we have the time/interest to do so.  To continue to put out releases
> implies a level of support that does not really exist.
>
> Over in dotNetRDF we long ago abandoned any use of SQL backed storage
> because it just doesn't scale as well as a native triple store.
>
> I understand the points about why people may prefer SDB over TDB but
> whatever we choose to do we should start making it clear to people that
> SDB is deprecated/dormant and we recommend they investigate other
> alternatives.
>
> Let's also be clear that TDB is not the only alternative; it is just the
> one that comes with Jena.  Most of the big triple store vendors have Jena
> providers that will let you access commercially developed triple stores
> via Jena (e.g. Virtuoso, Stardog, OWLIM, Oracle, IBM DB2 etc).  Many of
> these commercial stores include the type of clustering, replication and
> failover capabilities that Simon is highlighting as being desirable in
> highly scalable production scenarios.

Good point that there is Jena-as-client, able to work with any
SPARQL-compliant system (that's what standards are for, right?!),
Jena-as-framework (additional storage options) and
Jena-as-database-provider + Jena-as-SPARQL-server.  This may not come
across clearly enough - if not, what can we do to fix that?

	Andy

>
> Rob
>
>
>
> On 5/27/13 7:43 AM, "Simon Helsen" <sh...@ca.ibm.com> wrote:
>
>> Andy, others,
>>
>> we do not use SDB because it is way too slow for us. Although I'm sure it
>> can be improved as you suggested below, we do not believe it will ever
>> come close to TDB's performance because of how SDB is designed. The fact
>> it is used at all keeps surprising me, but it probably doesn't matter for
>> simple use cases, especially if the dataset remains small. Btw, a little
>> while back we were reconsidering it because SDB supports multiple vendors
>> and our use case was not very performance sensitive, but it turned out
>> that it was still too slow for our needs.
>>
>> As for motivations why some people may prefer SDB over TDB, I don't think
>> it is just "SQL" and corporate acceptance. There are some very good
>> reasons why file-based systems like TDB are difficult to use in
>> commercial deployments. Corporate Java-based server deployments are
>> almost always based on one or more app servers (JEE, but sometimes just a
>> regular web server) where *all* persisted state goes into a relational
>> database. Storing state on the file system is generally taboo for many
>> reasons, including: the inability to cluster the app server - a critical
>> step to scale up beyond a few hundred users; and the fact that
>> organizations generally have a hard time understanding and managing file
>> system state as opposed to standard relational database management. For
>> instance, how do you perform online backups, and when and how does the
>> state corrupt? On a DB server, admins are watchful for running out of
>> disk space. This sort of monitoring is usually less critical on the app
>> server, but now, suddenly, there is this growing "thing" they have to
>> back up because it will corrupt if you run out of disk space. On top of
>> that, DB servers usually use very fast and expensive disk systems (RAIDs,
>> SSDs, etc.). This is usually not the case for the app server. Moreover,
>> when customers realize there is this large set of data on the file
>> system, they have a tendency to put it on larger disks connected via NFS,
>> which is unfortunately very dangerous because even short network glitches
>> can corrupt TDB. All of this is manageable if you carefully encapsulate
>> TDB and provide good administration tools on top of your system, but it
>> is not trivial, and it doesn't come out of the box. Cluster management
>> especially is quite tricky, as you can imagine.
>>
>> Just wanted to set that out there as to why SQL-based systems remain
>> attractive. I would advise, though, stating more clearly on the SDB
>> download page that SDB is deprecated and no longer actively supported
>> (unless that changes, of course).
>>
>> Simon
>>
>>
>>
>> From: Andy Seaborne <an...@apache.org>
>> To: dev@jena.apache.org
>> Date: 05/25/2013 06:19 AM
>> Subject: [DISCUSS] SDB future
>>
>>
>>
>> Yes - I'm conflicted as well, flip-flopping between opt1 and opt3.
>>
>> There is enough user@ traffic to suggest it's used.  I'm guessing it's
>> the "SQL" part that makes it easier in corp IT. TDB is faster, scales
>> better and is better supported (and I'm not corp IT bound).
>>
>> There are ways to improve its performance - pushing some filters into
>> SQL, for example - so there's scope for development.
>>
>> Option 1 - add to the main distribution, remove if it becomes a block -
>> means that there is no additional work on a release vote.
>>
>> Testing SDB using Derby only (that's what the junit does by default) is
>> easy to set up because it's pulled in by maven.  It only runs embedded,
>> not as a server, but it does check the code generation.  Unlike other
>> Java SQL DBs, Derby implements tree join plans (the others only do
>> linear join plans, which makes some optional cases impossible - the code
>> falls back to brute-force-and-ignorance in these cases). Derby is quite
>> picky about its SQL 92.  SDB tests run without additional setup.
>>
>> We state on users@ that this is the position, as an indication that we
>> still have option 3 (retirement) available.  Unless we shake the tree a bit,
>>
>> Proposal: (option 1)
>>
>>    add to apache-jena, remove at the first sign of trouble.
>>    make a clear statement of situation on users@ including
>>      encouraging people to come forward
>>    option 3 still on the cards.
>>
>> I've added DISCUSS to the subject line for now to leave open the
>> possibility of a vote because it affects the whole project.
>>
>> But all PMC chipping in here is enough.
>>
>> I don't see a rush to make a choice just yet.
>>
>>                  Andy
>>
>> On 22/05/13 11:47, Claude Warren wrote:
>>> I am conflicted about this one.  I think we need SDB (or something
>>> similar) that will allow users to use standard infrastructure
>>> (shared/pooled DB is fairly common).  But I don't have the bandwidth to
>>> support it.
>>>
>>> Is there a status wherein we release the SDB package with each release
>>> of Jena but only ensure that the current test cases work -- that is,
>>> the latest release doesn't break something?  Perhaps with a reduced set
>>> of supported DBs (perhaps Derby & MySQL)?
>>>
>>> If not then I think we take Andy's approach and release one more before
>>> putting it on the shelf.
>>>
>>>
>>> On Wed, May 22, 2013 at 10:35 AM, Charles Li <ch...@gmail.com> wrote:
>>>
>>>> What are alternatives to SDB? I have a 4GB RDF/XML to load for later
>>>> queries
>>>>
>>>> Thanks!
>>>> - Charles
>>>>
>>>> On May 21, 2013, at 12:00 PM, Stephen Owens <st...@thoughtwire.ca> wrote:
>>>>
>>>>> +1 for option 3 if no one currently is taking ownership of that
>>>>> project. I think it's a useful signal to potential adopters about what
>>>>> they should expect.
>>>>>
>>>>> On 2013-05-21, at 12:47 PM, Andy Seaborne <an...@apache.org> wrote:
>>>>>
>>>>>> SDB is getting some user attention but not much developer attention.
>>>>>> I was hoping that there would be a contribution to go with JENA-447
>>>>>> but nothing has come in.  I don't have the bandwidth to even answer
>>>>>> questions about it properly, partly because I don't use it.  I guess
>>>>>> others are in a similar position.
>>>>>>
>>>>>> I do think we should be clear as to its status.
>>>>>>
>>>>>> In the future, I see these options:
>>>>>>
>>>>>> 1/ Add jena-sdb to the main distribution.
>>>>>>    (If it becomes a block on a release, remove it.)
>>>>>>
>>>>>> 2/ As is - release "sometimes".
>>>>>>
>>>>>> 3/ Dormant SDB.
>>>>>>    This is the last release unless some activity arises to maintain it.
>>>>>>    Keep the source around but move out of trunk.
>>>>>>    Can be built from source.
>>>>>>
>>>>>> 4/ Legacy SDB.
>>>>>>    More definite statement than (3) that it is dropped.
>>>>>>    Keep the source around.
>>>>>>
>>>>>> For 3 and 4, where there are no plans to release again if nothing
>>>>>> changes, the snapshot builds should be stopped.  Users can build from
>>>>>> source if they want to, but the current snapshot should not become a
>>>>>> distribution-under-the-radar, which I feel it becomes if there are no
>>>>>> plans to make it a formal release.
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> I'm tending towards doing this one last release then (3).
>>>>>>
>>>>>>     Andy
>>>>
>>>
>>>
>>>
>>
>>
>>
>