Posted to dev@samza.apache.org by Leo Woessner <es...@gmail.com> on 2016/02/18 17:35:55 UTC

key value store restore time

We are starting to use the key-value store with RocksDB. We are trying to
officially add Samza to our stack, and functionally everything is great.
However, I am seeing restore times of minutes to hours. Does anyone have
benchmarks on data size versus restore time? My big question is how this
will scale.

Thanks in advance

-- 
Leo Woessner

Re: key value store restore time

Posted by Tao Feng <fe...@gmail.com>.
Hi Leo,

At LinkedIn, when we switched to using RocksDB for Samza last year, we ran
some tests to see how well RocksDB performs. We used the RocksDB
microbenchmark (
https://github.com/facebook/rocksdb/blob/master/java/benchmark/src/main/java/org/rocksdb/benchmark/DbBenchmark.java)
to conduct several tests. For sequential writes (10-byte keys, 800-byte
values, 1,000,000,000 entries), RocksDB write throughput is around
311 MB/sec on SSD. You can take a look at the results (
https://issues.apache.org/jira/secure/attachment/12723431/2015-04-06%20RocksDB%20Performance.pdf)
in the attachment to SAMZA-543.
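
To relate that benchmark figure to restore time, here is a rough
back-of-the-envelope sketch in Python. It assumes the restore is limited
only by the sequential-write throughput quoted above and that MB means 10^6
bytes. A real restore also has to read and deserialize the changelog, so
treat the result as an optimistic lower bound. The function name and the
100 GB state size are hypothetical, chosen only for illustration.

```python
# Figures taken from the benchmark above; MB is assumed to mean 10**6 bytes.
WRITE_THROUGHPUT_BYTES_PER_SEC = 311 * 10**6
ENTRY_SIZE_BYTES = 10 + 800  # 10-byte key + 800-byte value


def estimate_restore_seconds(state_size_bytes,
                             throughput=WRITE_THROUGHPUT_BYTES_PER_SEC):
    """Optimistic lower bound: assumes only RocksDB writes limit the restore."""
    return state_size_bytes / throughput


if __name__ == "__main__":
    # Entries written per second at the benchmarked throughput.
    entries_per_sec = WRITE_THROUGHPUT_BYTES_PER_SEC / ENTRY_SIZE_BYTES
    print(f"~{entries_per_sec:,.0f} entries/sec")

    # Hypothetical 100 GB of state, for illustration only.
    hypothetical_state_bytes = 100 * 10**9
    minutes = estimate_restore_seconds(hypothetical_state_bytes) / 60
    print(f"~{minutes:.1f} minutes to rewrite 100 GB")
```

Under these assumptions the estimate grows linearly with state size, so
restore times well above it point to a bottleneck other than RocksDB writes.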

When Samza restores data into RocksDB, it performs RocksDB put operations
for the changelog entries (RocksDbKeyValueStore->putAll), so reseeding takes
time if your changelog is huge. That is why Samza 0.10 introduced the YARN
host-affinity feature that Jagadish mentions. It should solve the long
RocksDB restore time in most cases.
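
The restore path described above can be pictured with a minimal Python
sketch. This is not Samza's code: a plain dict stands in for the RocksDB
store, the changelog is modelled as an iterable of (key, value) pairs, a
value of None is treated as a delete, and the function name and batch size
are hypothetical.

```python
from itertools import islice


def restore_from_changelog(changelog, store, batch_size=500):
    """Replay (key, value) records into `store`; a value of None is a delete.

    Returns the number of changelog records replayed.
    """
    records = iter(changelog)
    replayed = 0
    while True:
        # Pull the next batch; each batch stands in for one batched put call.
        batch = list(islice(records, batch_size))
        if not batch:
            return replayed
        for key, value in batch:
            if value is None:
                store.pop(key, None)  # delete marker: drop the key if present
            else:
                store[key] = value    # later records overwrite earlier ones
        replayed += len(batch)
```

Every record in the changelog is written again, so in this model the work
is proportional to changelog length rather than to the final store size.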

Thanks,
-Tao

Re: key value store restore time

Posted by Jagadish Venkatraman <ja...@gmail.com>.
Samza 0.10 introduces YARN host affinity for this exact reason. For jobs
that need to bootstrap a lot of state, downtime during bootstrapping is not
acceptable. In our production use cases, we've observed bootstrap times drop
from 25 minutes to about 30 seconds.

Please refer to
https://samza.apache.org/learn/documentation/0.10/yarn/yarn-host-affinity.html
for the configs needed to take advantage of this feature.

-- 
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University