Posted to users@jena.apache.org by Adam Kimball <ak...@healthwise.org> on 2016/01/14 18:57:26 UTC

Sizing an SDB instance?

Hi all,

I’m working with a vendor to purchase a taxonomy management tool.  The vendor uses SDB, and we’ll need to set up a database with sufficient CPU/RAM to support the tool.  The vendor, perhaps surprisingly, doesn’t have any data that can help me with sizing the DB.  I’m planning on storing 50M triples off the bat.  About a third of the properties will have datatype (literal) values, generally shorter than 256 characters.

My hunch is:


  *   This is a relatively small amount of data from a disk perspective – 256 GB of SSD would be more than enough
  *   CPU-wise, a standard 2-processor, 4-core machine would be more than enough

Thoughts?  Is there a better way of getting these questions answered?

Thanks!
-Adam

Re: Sizing an SDB instance?

Posted by Håvard Mikkelsen Ottestad <ha...@acando.no>.
Any chance you are planning to purchase Topbraid EVN?


Håvard




Re: Sizing an SDB instance?

Posted by Andy Seaborne <an...@apache.org>.
On 15/01/16 13:22, Adam Kimball wrote:
> Thanks for that great response.  I appreciate it a ton.
>
> We do typically virtualize our environments - and further, we often buy high-performance blades and then run multiple VMs on each one.  So your comments concern me a little bit.  If absolutely essential, I could get the database running on a physical machine, but I'd need to verify insufficient performance in the VM first.
>
> Do you have any other tips for me regarding this aspect?

Talk to your IT department.  They really are your friends!

A VM that is provisioned to run a database is fine - the trouble starts 
when the provisioning ticket says something like "it's an application 
server".  If it's known to be a DB, it'll get set up correctly.

What I've seen is that apps are assumed to need less RAM, and not to 
need it all the time, so they are densely packed, e.g. more VM RAM 
allocated than there is physical RAM.  TDB is more prone to that - you 
have an SQL database, so it's something of an unknown.

The geo separation point is to do with the fact that SDB goes over JDBC, 
so latencies can accumulate.  If it is on the same machine, or in the 
same datacenter, it is unlikely to be trouble.
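
To make the JDBC point concrete, here is a minimal SDB connection 
sketch (the JDBC URL, database name and credentials are made up, and it 
assumes jena-sdb plus a MySQL driver on the classpath).  Every SPARQL 
query is translated to SQL and goes over that one JDBC connection, so 
any network latency between the application and the database is paid on 
every round trip:

    import org.apache.jena.query.Dataset;
    import org.apache.jena.query.QueryExecution;
    import org.apache.jena.query.QueryExecutionFactory;
    import org.apache.jena.query.ResultSetFormatter;
    import org.apache.jena.sdb.SDBFactory;
    import org.apache.jena.sdb.Store;
    import org.apache.jena.sdb.StoreDesc;
    import org.apache.jena.sdb.sql.SDBConnection;
    import org.apache.jena.sdb.store.DatabaseType;
    import org.apache.jena.sdb.store.LayoutType;

    public class SdbLatencySketch {
        public static void main(String[] args) {
            // Store layout and backing database type.
            StoreDesc desc = new StoreDesc(LayoutType.LayoutTripleNodesHash,
                                           DatabaseType.MySQL);

            // Plain JDBC connection - every query round-trips over this link.
            SDBConnection conn = SDBFactory.createConnection(
                "jdbc:mysql://db-host:3306/taxonomy",   // hypothetical URL
                "sdb_user", "sdb_password");            // hypothetical credentials

            Store store = SDBFactory.connectStore(conn, desc);
            Dataset dataset = SDBFactory.connectDataset(store);

            QueryExecution qe = QueryExecutionFactory.create(
                "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }", dataset);
            try {
                // SPARQL is rewritten to SQL and executed in the database.
                ResultSetFormatter.out(qe.execSelect());
            } finally {
                qe.close();
                store.close();
            }
        }
    }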

Any/all IT stacks have become quite complicated these days.  Ultimately, 
the only way to know is to test in your environment.

	Andy



RE: Sizing an SDB instance?

Posted by Adam Kimball <ak...@healthwise.org>.
Thanks for that great response.  I appreciate it a ton.  

We do typically virtualize our environments - and further, we often buy high-performance blades and then run multiple VMs on each one.  So your comments concern me a little bit.  If absolutely essential, I could get the database running on a physical machine, but I'd need to verify insufficient performance in the VM first.

Do you have any other tips for me regarding this aspect?

Thanks again,
Adam


Re: Sizing an SDB instance?

Posted by Andy Seaborne <an...@apache.org>.

Hi Adam,

(1) Yes and (2) yes, on both the disk and the CPU hunches.


1/ Normal data is very roughly 5M triples per 1 GB of disk, so you 
should have plenty of room.

Caveats:
   It assumes the data is not dominated by long literals.
   The space used can be less (e.g. if the ratio of triples to distinct nodes is high).
   It assumes no compression.

(I did a very, very quick check with SDB/MySQL and with TDB on some BSBM 
data and got numbers of that sort.)
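
Putting your numbers into that estimate (a rough back-of-envelope, not 
a measurement of your actual data):

    50e6 triples / 5e6 triples per GB  ~=  10 GB on disk

so even with a generous margin for the node table and future growth, a 
256 GB SSD leaves plenty of headroom.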


2/ This one is more dependent on workload and environment.


The amount of RAM matters, but at 50e6 triples (roughly 10 GB on disk), 
a machine with 32 GB of RAM or more can cache the working set.  
(Database tuning may be needed.)
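
As an illustration of the kind of tuning involved, assuming MySQL/InnoDB 
as the backing store (other databases SDB supports have equivalent 
knobs), the main setting is the buffer pool size; the values below are 
illustrative only, not a recommendation:

    # my.cnf fragment - illustrative values for a dedicated 32 GB machine
    [mysqld]
    # Let InnoDB cache the ~10 GB working set with room to spare, while
    # leaving headroom for the OS and any co-located application.
    innodb_buffer_pool_size = 20G
    # Avoid double-buffering between InnoDB and the OS page cache.
    innodb_flush_method = O_DIRECT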

It's a SQL database - all the usual considerations apply.

If the machine has any other applications or databases, database 
performance is impacted.

If the machine is VM'ed, then that can cause poor performance.

If the machine is VM'ed and running on hardware supporting several VMs, 
then performance can be poor, erratic and thoroughly mysterious.

If the database is a long way away from the SDB application/engine (e.g. 
different datacenters), it can impact performance.

Having an SSD is good - the cold start performance is better.

	Andy