Posted to dev@samza.apache.org by Roger Hoover <ro...@gmail.com> on 2015/01/17 06:20:31 UTC

Local state write throughput

Hi guys,

I'm testing a job that needs to load 40M records (6GB in Kafka as JSON)
from a bootstrap topic.  The topic has 4 partitions and I'm running the job
using the ProcessJobFactory so all four tasks are in one container.

Using RocksDB, it's taking 19 minutes to load all the data, which amounts to
35k records/sec or 5MB/s based on input size.  I ran iostat during this
time and see that the disk write throughput is 14MB/s.
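
For reference, my rough math (please double-check me):

   40,000,000 records / (19 min * 60 s/min) ~= 35,000 records/sec
   6 GB / (19 min * 60 s/min) ~= 5.3 MB/s of input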

I didn't tweak any of the storage settings.

A few questions:
1) Does this seem low?  I'm running on a Macbook Pro with SSD.
2) Do you have any recommendations for improving the load speed?

Thanks,

Roger

Re: Local state write throughput

Posted by Chris Riccomini <cr...@apache.org>.
Hey all,

Another note from Yi:

"""
Here are a few more links regarding SSD performance:
A comprehensive overview:
http://codecapsule.com/2014/02/12/coding-for-ssds-part-5-access-patterns-and-system-optimizations/
A quick note on filesystem configuration:
http://superuser.com/questions/228657/which-linux-filesystem-works-best-with-ssd/
A blog of test results on different I/O schedulers on SSD:
http://www.phoronix.com/scan.php?page=article&item=linux_iosched_2012&num=1

If we are sure that the main workload generated by the RocksDB store is
sequential READ/WRITE, can we check whether we are using the file system
configuration mentioned in the second link above?
"""

Cheers,
Chris

On Mon, Jan 26, 2015 at 8:14 AM, Jon Bringhurst <
jbringhurst@linkedin.com.invalid> wrote:

> Right now we're mostly running with noop for our most recently installed
> SSDs. Some older ones are running with cfq.
>
> Early on in the development of samza-kv, I tried deadline and noop (in
> place of cfq) and didn't notice a significant change in performance.
> However, I don't have any numbers to back this up, so this observation is
> probably worthless. :) That was also when we were still using LevelDB
> backed KV and a different SSD model and brand, so I agree that testing the
> different schedulers (mostly noop vs deadline) is worth revisiting.
>
> -Jon
>
> On Jan 25, 2015, at 12:29 PM, Roger Hoover <ro...@gmail.com> wrote:
>
> > FYI, for Linux with SSDs, changing the io scheduler to deadline or noop
> can
> > make a 500x improvement.  I haven't tried this myself.
> >
> >
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/hardware.html#_disks
> >
> > On Tue, Jan 20, 2015 at 9:28 AM, Chris Riccomini <
> > criccomini@linkedin.com.invalid> wrote:
> >
> >> Hey Roger,
> >>
> >> We did some benchmarking, and discovered very similar performance to
> what
> >> you've described. We saw ~40k writes/sec, and ~20 k reads/sec,
> >> per-container, on a Virident SSD. This was without any changelog. Are
> you
> >> using a changelog on the store?
> >>
> >> When we attached a changelog to the store, the writes dropped
> >> significantly (~1000 writes/sec). When we hooked up VisualVM, we saw
> that
> >> the container was spending > 99% of its time in
> KafkaSystemProducer.send().
> >>
> >> We're currently doing two things:
> >>
> >> 1. Working with our performance team to understand and tune RocksDB
> >> properly.
> >> 2. Upgrading the Kafka producer to use the new Java-based API.
> (SAMZA-227)
> >>
> >> For (1), it seems like we should be able to get a lot higher throughput
> >> from RocksDB. Anecdotally, we've heard that RocksDB requires many
> threads
> >> in order to max out an SSD, and since Samza is single-threaded, we could
> >> just be hitting a RocksDB bottleneck. We won't know until we dig into
> the
> >> problem (which we started investigating last week). The current plan is
> to
> >> start by benchmarking RocksDB JNI outside of Samza, and see what we can
> >> get. From there, we'll know our "speed of light", and can try to get
> Samza
> >> as close as possible to it. If RocksDB JNI can't be made to go "fast",
> >> then we'll have to understand why.
> >>
> >> (2) should help with the changelog issue. I believe that the slowness
> with
> >> the changelog is caused because the changelog is using a sync producer
> to
> >> send to Kafka, and is blocking when a batch is flushed. In the new API,
> >> the concept of a "sync" producer is removed. All writes are handled on
> an
> >> async writer thread (though we can still guarantee writes are safely
> >> written before checkpointing, which is what we need).
> >>
> >> In short, I agree, it seems slow. We see this behavior, too. We're
> digging
> >> into it.
> >>
> >> Cheers,
> >> Chris
> >>
> >> On 1/17/15 12:58 PM, "Roger Hoover" <ro...@gmail.com> wrote:
> >>
> >>> Michael,
> >>>
> >>> Thanks for the response.  I used VisualVM and YourKit and see the CPU
> is
> >>> not being used (0.1%).  I took a few thread dumps and see the main
> thread
> >>> blocked on the flush() method inside the KV store.
> >>>
> >>> On Sat, Jan 17, 2015 at 7:09 AM, Michael Rose <el...@gmail.com>
> >>> wrote:
> >>>
> >>>> Is your process at 100% CPU? I suspect you're spending most of your
> >>>> time in
> >>>> JSON deserialization, but profile it and check.
> >>>>
> >>>> Michael
> >>>>
> >>>> On Friday, January 16, 2015, Roger Hoover <ro...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Hi guys,
> >>>>>
> >>>>> I'm testing a job that needs to load 40M records (6GB in Kafka as
> >>>> JSON)
> >>>>> from a bootstrap topic.  The topic has 4 partitions and I'm running
> >>>> the
> >>>> job
> >>>>> using the ProcessJobFactory so all four tasks are in one container.
> >>>>>
> >>>>> Using RocksDB, it's taking 19 minutes to load all the data which
> >>>> amounts
> >>>> to
> >>>>> 35k records/sec or 5MB/s based on input size.  I ran iostat during
> >>>> this
> >>>>> time as see the disk write throughput is 14MB/s.
> >>>>>
> >>>>> I didn't tweak any of the storage settings.
> >>>>>
> >>>>> A few questions:
> >>>>> 1) Does this seem low?  I'm running on a Macbook Pro with SSD.
> >>>>> 2) Do you have any recommendations for improving the load speed?
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Roger
> >>>>>
> >>>>
> >>
> >>
>
>

Re: Local state write throughput

Posted by Jon Bringhurst <jb...@linkedin.com.INVALID>.
Right now we're mostly running with noop for our most recently installed SSDs. Some older ones are running with cfq.

Early on in the development of samza-kv, I tried deadline and noop (in place of cfq) and didn't notice a significant change in performance. However, I don't have any numbers to back this up, so this observation is probably worthless. :) That was also when we were still using LevelDB backed KV and a different SSD model and brand, so I agree that testing the different schedulers (mostly noop vs deadline) is worth revisiting.

-Jon

On Jan 25, 2015, at 12:29 PM, Roger Hoover <ro...@gmail.com> wrote:

> FYI, for Linux with SSDs, changing the io scheduler to deadline or noop can
> make a 500x improvement.  I haven't tried this myself.
> 
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/hardware.html#_disks
> 
> On Tue, Jan 20, 2015 at 9:28 AM, Chris Riccomini <
> criccomini@linkedin.com.invalid> wrote:
> 
>> Hey Roger,
>> 
>> We did some benchmarking, and discovered very similar performance to what
>> you've described. We saw ~40k writes/sec, and ~20 k reads/sec,
>> per-container, on a Virident SSD. This was without any changelog. Are you
>> using a changelog on the store?
>> 
>> When we attached a changelog to the store, the writes dropped
>> significantly (~1000 writes/sec). When we hooked up VisualVM, we saw that
>> the container was spending > 99% of its time in KafkaSystemProducer.send().
>> 
>> We're currently doing two things:
>> 
>> 1. Working with our performance team to understand and tune RocksDB
>> properly.
>> 2. Upgrading the Kafka producer to use the new Java-based API. (SAMZA-227)
>> 
>> For (1), it seems like we should be able to get a lot higher throughput
>> from RocksDB. Anecdotally, we've heard that RocksDB requires many threads
>> in order to max out an SSD, and since Samza is single-threaded, we could
>> just be hitting a RocksDB bottleneck. We won't know until we dig into the
>> problem (which we started investigating last week). The current plan is to
>> start by benchmarking RocksDB JNI outside of Samza, and see what we can
>> get. From there, we'll know our "speed of light", and can try to get Samza
>> as close as possible to it. If RocksDB JNI can't be made to go "fast",
>> then we'll have to understand why.
>> 
>> (2) should help with the changelog issue. I believe that the slowness with
>> the changelog is caused because the changelog is using a sync producer to
>> send to Kafka, and is blocking when a batch is flushed. In the new API,
>> the concept of a "sync" producer is removed. All writes are handled on an
>> async writer thread (though we can still guarantee writes are safely
>> written before checkpointing, which is what we need).
>> 
>> In short, I agree, it seems slow. We see this behavior, too. We're digging
>> into it.
>> 
>> Cheers,
>> Chris
>> 
>> On 1/17/15 12:58 PM, "Roger Hoover" <ro...@gmail.com> wrote:
>> 
>>> Michael,
>>> 
>>> Thanks for the response.  I used VisualVM and YourKit and see the CPU is
>>> not being used (0.1%).  I took a few thread dumps and see the main thread
>>> blocked on the flush() method inside the KV store.
>>> 
>>> On Sat, Jan 17, 2015 at 7:09 AM, Michael Rose <el...@gmail.com>
>>> wrote:
>>> 
>>>> Is your process at 100% CPU? I suspect you're spending most of your
>>>> time in
>>>> JSON deserialization, but profile it and check.
>>>> 
>>>> Michael
>>>> 
>>>> On Friday, January 16, 2015, Roger Hoover <ro...@gmail.com>
>>>> wrote:
>>>> 
>>>>> Hi guys,
>>>>> 
>>>>> I'm testing a job that needs to load 40M records (6GB in Kafka as
>>>> JSON)
>>>>> from a bootstrap topic.  The topic has 4 partitions and I'm running
>>>> the
>>>> job
>>>>> using the ProcessJobFactory so all four tasks are in one container.
>>>>> 
>>>>> Using RocksDB, it's taking 19 minutes to load all the data which
>>>> amounts
>>>> to
>>>>> 35k records/sec or 5MB/s based on input size.  I ran iostat during
>>>> this
>>>>> time as see the disk write throughput is 14MB/s.
>>>>> 
>>>>> I didn't tweak any of the storage settings.
>>>>> 
>>>>> A few questions:
>>>>> 1) Does this seem low?  I'm running on a Macbook Pro with SSD.
>>>>> 2) Do you have any recommendations for improving the load speed?
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Roger
>>>>> 
>>>> 
>> 
>> 


Re: Local state write throughput

Posted by Jay Kreps <ja...@gmail.com>.
To change the readahead amount, iirc it's something like
   blockdev --setra 16 /dev/sda
where the number is the readahead size in 512-byte sectors (so 16 here would
be 8 KB), if I remember right.
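You can check the current value with
   blockdev --getra /dev/sda
which should print 256 (i.e. 128 KB) on a stock Linux setup.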

-Jay

On Sun, Jan 25, 2015 at 3:50 PM, Roger Hoover <ro...@gmail.com>
wrote:

> I haven't had a chance to try it yet.  Hopefully next week.  I'll let you
> know what I find.
>
> On Sun, Jan 25, 2015 at 2:40 PM, Chris Riccomini <cr...@apache.org>
> wrote:
>
> > Awesome, I'll have a look at this. @Roger, did setting this improve your
> > RocksDB throughput?
> >
> > On Sun, Jan 25, 2015 at 12:53 PM, Jay Kreps <ja...@gmail.com> wrote:
> >
> > > I have seen a similar thing from the OS tunable readahead. I think
> Linux
> > > defaults to reading a full 128K into pagecache with every read. This is
> > > sensible for spinning disks where maybe blowing 500us may mean you get
> > > lucky and save a 10ms seek. But for SSDs, especially a key-value store
> > > doing purely random access, it is a total waste and huge perf hit.
> > >
> > > -Jay
> > >
> > > On Sun, Jan 25, 2015 at 12:29 PM, Roger Hoover <roger.hoover@gmail.com
> >
> > > wrote:
> > >
> > > > FYI, for Linux with SSDs, changing the io scheduler to deadline or
> noop
> > > can
> > > > make a 500x improvement.  I haven't tried this myself.
> > > >
> > > >
> > > >
> > >
> >
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/hardware.html#_disks
> > > >
> > > > On Tue, Jan 20, 2015 at 9:28 AM, Chris Riccomini <
> > > > criccomini@linkedin.com.invalid> wrote:
> > > >
> > > > > Hey Roger,
> > > > >
> > > > > We did some benchmarking, and discovered very similar performance
> to
> > > what
> > > > > you've described. We saw ~40k writes/sec, and ~20 k reads/sec,
> > > > > per-container, on a Virident SSD. This was without any changelog.
> Are
> > > you
> > > > > using a changelog on the store?
> > > > >
> > > > > When we attached a changelog to the store, the writes dropped
> > > > > significantly (~1000 writes/sec). When we hooked up VisualVM, we
> saw
> > > that
> > > > > the container was spending > 99% of its time in
> > > > KafkaSystemProducer.send().
> > > > >
> > > > > We're currently doing two things:
> > > > >
> > > > > 1. Working with our performance team to understand and tune RocksDB
> > > > > properly.
> > > > > 2. Upgrading the Kafka producer to use the new Java-based API.
> > > > (SAMZA-227)
> > > > >
> > > > > For (1), it seems like we should be able to get a lot higher
> > throughput
> > > > > from RocksDB. Anecdotally, we've heard that RocksDB requires many
> > > threads
> > > > > in order to max out an SSD, and since Samza is single-threaded, we
> > > could
> > > > > just be hitting a RocksDB bottleneck. We won't know until we dig
> into
> > > the
> > > > > problem (which we started investigating last week). The current
> plan
> > is
> > > > to
> > > > > start by benchmarking RocksDB JNI outside of Samza, and see what we
> > can
> > > > > get. From there, we'll know our "speed of light", and can try to
> get
> > > > Samza
> > > > > as close as possible to it. If RocksDB JNI can't be made to go
> > "fast",
> > > > > then we'll have to understand why.
> > > > >
> > > > > (2) should help with the changelog issue. I believe that the
> slowness
> > > > with
> > > > > the changelog is caused because the changelog is using a sync
> > producer
> > > to
> > > > > send to Kafka, and is blocking when a batch is flushed. In the new
> > API,
> > > > > the concept of a "sync" producer is removed. All writes are handled
> > on
> > > an
> > > > > async writer thread (though we can still guarantee writes are
> safely
> > > > > written before checkpointing, which is what we need).
> > > > >
> > > > > In short, I agree, it seems slow. We see this behavior, too. We're
> > > > digging
> > > > > into it.
> > > > >
> > > > > Cheers,
> > > > > Chris
> > > > >
> > > > > On 1/17/15 12:58 PM, "Roger Hoover" <ro...@gmail.com>
> wrote:
> > > > >
> > > > > >Michael,
> > > > > >
> > > > > >Thanks for the response.  I used VisualVM and YourKit and see the
> > CPU
> > > is
> > > > > >not being used (0.1%).  I took a few thread dumps and see the main
> > > > thread
> > > > > >blocked on the flush() method inside the KV store.
> > > > > >
> > > > > >On Sat, Jan 17, 2015 at 7:09 AM, Michael Rose <
> > elementation@gmail.com
> > > >
> > > > > >wrote:
> > > > > >
> > > > > >> Is your process at 100% CPU? I suspect you're spending most of
> > your
> > > > > >>time in
> > > > > >> JSON deserialization, but profile it and check.
> > > > > >>
> > > > > >> Michael
> > > > > >>
> > > > > >> On Friday, January 16, 2015, Roger Hoover <
> roger.hoover@gmail.com
> > >
> > > > > >>wrote:
> > > > > >>
> > > > > >> > Hi guys,
> > > > > >> >
> > > > > >> > I'm testing a job that needs to load 40M records (6GB in Kafka
> > as
> > > > > >>JSON)
> > > > > >> > from a bootstrap topic.  The topic has 4 partitions and I'm
> > > running
> > > > > >>the
> > > > > >> job
> > > > > >> > using the ProcessJobFactory so all four tasks are in one
> > > container.
> > > > > >> >
> > > > > >> > Using RocksDB, it's taking 19 minutes to load all the data
> which
> > > > > >>amounts
> > > > > >> to
> > > > > >> > 35k records/sec or 5MB/s based on input size.  I ran iostat
> > during
> > > > > >>this
> > > > > >> > time as see the disk write throughput is 14MB/s.
> > > > > >> >
> > > > > >> > I didn't tweak any of the storage settings.
> > > > > >> >
> > > > > >> > A few questions:
> > > > > >> > 1) Does this seem low?  I'm running on a Macbook Pro with SSD.
> > > > > >> > 2) Do you have any recommendations for improving the load
> speed?
> > > > > >> >
> > > > > >> > Thanks,
> > > > > >> >
> > > > > >> > Roger
> > > > > >> >
> > > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: Local state write throughput

Posted by Roger Hoover <ro...@gmail.com>.
I haven't had a chance to try it yet.  Hopefully next week.  I'll let you
know what I find.

On Sun, Jan 25, 2015 at 2:40 PM, Chris Riccomini <cr...@apache.org>
wrote:

> Awesome, I'll have a look at this. @Roger, did setting this improve your
> RocksDB throughput?
>
> On Sun, Jan 25, 2015 at 12:53 PM, Jay Kreps <ja...@gmail.com> wrote:
>
> > I have seen a similar thing from the OS tunable readahead. I think Linux
> > defaults to reading a full 128K into pagecache with every read. This is
> > sensible for spinning disks where maybe blowing 500us may mean you get
> > lucky and save a 10ms seek. But for SSDs, especially a key-value store
> > doing purely random access, it is a total waste and huge perf hit.
> >
> > -Jay
> >
> > On Sun, Jan 25, 2015 at 12:29 PM, Roger Hoover <ro...@gmail.com>
> > wrote:
> >
> > > FYI, for Linux with SSDs, changing the io scheduler to deadline or noop
> > can
> > > make a 500x improvement.  I haven't tried this myself.
> > >
> > >
> > >
> >
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/hardware.html#_disks
> > >
> > > On Tue, Jan 20, 2015 at 9:28 AM, Chris Riccomini <
> > > criccomini@linkedin.com.invalid> wrote:
> > >
> > > > Hey Roger,
> > > >
> > > > We did some benchmarking, and discovered very similar performance to
> > what
> > > > you've described. We saw ~40k writes/sec, and ~20 k reads/sec,
> > > > per-container, on a Virident SSD. This was without any changelog. Are
> > you
> > > > using a changelog on the store?
> > > >
> > > > When we attached a changelog to the store, the writes dropped
> > > > significantly (~1000 writes/sec). When we hooked up VisualVM, we saw
> > that
> > > > the container was spending > 99% of its time in
> > > KafkaSystemProducer.send().
> > > >
> > > > We're currently doing two things:
> > > >
> > > > 1. Working with our performance team to understand and tune RocksDB
> > > > properly.
> > > > 2. Upgrading the Kafka producer to use the new Java-based API.
> > > (SAMZA-227)
> > > >
> > > > For (1), it seems like we should be able to get a lot higher
> throughput
> > > > from RocksDB. Anecdotally, we've heard that RocksDB requires many
> > threads
> > > > in order to max out an SSD, and since Samza is single-threaded, we
> > could
> > > > just be hitting a RocksDB bottleneck. We won't know until we dig into
> > the
> > > > problem (which we started investigating last week). The current plan
> is
> > > to
> > > > start by benchmarking RocksDB JNI outside of Samza, and see what we
> can
> > > > get. From there, we'll know our "speed of light", and can try to get
> > > Samza
> > > > as close as possible to it. If RocksDB JNI can't be made to go
> "fast",
> > > > then we'll have to understand why.
> > > >
> > > > (2) should help with the changelog issue. I believe that the slowness
> > > with
> > > > the changelog is caused because the changelog is using a sync
> producer
> > to
> > > > send to Kafka, and is blocking when a batch is flushed. In the new
> API,
> > > > the concept of a "sync" producer is removed. All writes are handled
> on
> > an
> > > > async writer thread (though we can still guarantee writes are safely
> > > > written before checkpointing, which is what we need).
> > > >
> > > > In short, I agree, it seems slow. We see this behavior, too. We're
> > > digging
> > > > into it.
> > > >
> > > > Cheers,
> > > > Chris
> > > >
> > > > On 1/17/15 12:58 PM, "Roger Hoover" <ro...@gmail.com> wrote:
> > > >
> > > > >Michael,
> > > > >
> > > > >Thanks for the response.  I used VisualVM and YourKit and see the
> CPU
> > is
> > > > >not being used (0.1%).  I took a few thread dumps and see the main
> > > thread
> > > > >blocked on the flush() method inside the KV store.
> > > > >
> > > > >On Sat, Jan 17, 2015 at 7:09 AM, Michael Rose <
> elementation@gmail.com
> > >
> > > > >wrote:
> > > > >
> > > > >> Is your process at 100% CPU? I suspect you're spending most of
> your
> > > > >>time in
> > > > >> JSON deserialization, but profile it and check.
> > > > >>
> > > > >> Michael
> > > > >>
> > > > >> On Friday, January 16, 2015, Roger Hoover <roger.hoover@gmail.com
> >
> > > > >>wrote:
> > > > >>
> > > > >> > Hi guys,
> > > > >> >
> > > > >> > I'm testing a job that needs to load 40M records (6GB in Kafka
> as
> > > > >>JSON)
> > > > >> > from a bootstrap topic.  The topic has 4 partitions and I'm
> > running
> > > > >>the
> > > > >> job
> > > > >> > using the ProcessJobFactory so all four tasks are in one
> > container.
> > > > >> >
> > > > >> > Using RocksDB, it's taking 19 minutes to load all the data which
> > > > >>amounts
> > > > >> to
> > > > >> > 35k records/sec or 5MB/s based on input size.  I ran iostat
> during
> > > > >>this
> > > > >> > time as see the disk write throughput is 14MB/s.
> > > > >> >
> > > > >> > I didn't tweak any of the storage settings.
> > > > >> >
> > > > >> > A few questions:
> > > > >> > 1) Does this seem low?  I'm running on a Macbook Pro with SSD.
> > > > >> > 2) Do you have any recommendations for improving the load speed?
> > > > >> >
> > > > >> > Thanks,
> > > > >> >
> > > > >> > Roger
> > > > >> >
> > > > >>
> > > >
> > > >
> > >
> >
>

Re: Local state write throughput

Posted by Chris Riccomini <cr...@apache.org>.
Awesome, I'll have a look at this. @Roger, did setting this improve your
RocksDB throughput?

On Sun, Jan 25, 2015 at 12:53 PM, Jay Kreps <ja...@gmail.com> wrote:

> I have seen a similar thing from the OS tunable readahead. I think Linux
> defaults to reading a full 128K into pagecache with every read. This is
> sensible for spinning disks where maybe blowing 500us may mean you get
> lucky and save a 10ms seek. But for SSDs, especially a key-value store
> doing purely random access, it is a total waste and huge perf hit.
>
> -Jay
>
> On Sun, Jan 25, 2015 at 12:29 PM, Roger Hoover <ro...@gmail.com>
> wrote:
>
> > FYI, for Linux with SSDs, changing the io scheduler to deadline or noop
> can
> > make a 500x improvement.  I haven't tried this myself.
> >
> >
> >
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/hardware.html#_disks
> >
> > On Tue, Jan 20, 2015 at 9:28 AM, Chris Riccomini <
> > criccomini@linkedin.com.invalid> wrote:
> >
> > > Hey Roger,
> > >
> > > We did some benchmarking, and discovered very similar performance to
> what
> > > you've described. We saw ~40k writes/sec, and ~20 k reads/sec,
> > > per-container, on a Virident SSD. This was without any changelog. Are
> you
> > > using a changelog on the store?
> > >
> > > When we attached a changelog to the store, the writes dropped
> > > significantly (~1000 writes/sec). When we hooked up VisualVM, we saw
> that
> > > the container was spending > 99% of its time in
> > KafkaSystemProducer.send().
> > >
> > > We're currently doing two things:
> > >
> > > 1. Working with our performance team to understand and tune RocksDB
> > > properly.
> > > 2. Upgrading the Kafka producer to use the new Java-based API.
> > (SAMZA-227)
> > >
> > > For (1), it seems like we should be able to get a lot higher throughput
> > > from RocksDB. Anecdotally, we've heard that RocksDB requires many
> threads
> > > in order to max out an SSD, and since Samza is single-threaded, we
> could
> > > just be hitting a RocksDB bottleneck. We won't know until we dig into
> the
> > > problem (which we started investigating last week). The current plan is
> > to
> > > start by benchmarking RocksDB JNI outside of Samza, and see what we can
> > > get. From there, we'll know our "speed of light", and can try to get
> > Samza
> > > as close as possible to it. If RocksDB JNI can't be made to go "fast",
> > > then we'll have to understand why.
> > >
> > > (2) should help with the changelog issue. I believe that the slowness
> > with
> > > the changelog is caused because the changelog is using a sync producer
> to
> > > send to Kafka, and is blocking when a batch is flushed. In the new API,
> > > the concept of a "sync" producer is removed. All writes are handled on
> an
> > > async writer thread (though we can still guarantee writes are safely
> > > written before checkpointing, which is what we need).
> > >
> > > In short, I agree, it seems slow. We see this behavior, too. We're
> > digging
> > > into it.
> > >
> > > Cheers,
> > > Chris
> > >
> > > On 1/17/15 12:58 PM, "Roger Hoover" <ro...@gmail.com> wrote:
> > >
> > > >Michael,
> > > >
> > > >Thanks for the response.  I used VisualVM and YourKit and see the CPU
> is
> > > >not being used (0.1%).  I took a few thread dumps and see the main
> > thread
> > > >blocked on the flush() method inside the KV store.
> > > >
> > > >On Sat, Jan 17, 2015 at 7:09 AM, Michael Rose <elementation@gmail.com
> >
> > > >wrote:
> > > >
> > > >> Is your process at 100% CPU? I suspect you're spending most of your
> > > >>time in
> > > >> JSON deserialization, but profile it and check.
> > > >>
> > > >> Michael
> > > >>
> > > >> On Friday, January 16, 2015, Roger Hoover <ro...@gmail.com>
> > > >>wrote:
> > > >>
> > > >> > Hi guys,
> > > >> >
> > > >> > I'm testing a job that needs to load 40M records (6GB in Kafka as
> > > >>JSON)
> > > >> > from a bootstrap topic.  The topic has 4 partitions and I'm
> running
> > > >>the
> > > >> job
> > > >> > using the ProcessJobFactory so all four tasks are in one
> container.
> > > >> >
> > > >> > Using RocksDB, it's taking 19 minutes to load all the data which
> > > >>amounts
> > > >> to
> > > >> > 35k records/sec or 5MB/s based on input size.  I ran iostat during
> > > >>this
> > > >> > time as see the disk write throughput is 14MB/s.
> > > >> >
> > > >> > I didn't tweak any of the storage settings.
> > > >> >
> > > >> > A few questions:
> > > >> > 1) Does this seem low?  I'm running on a Macbook Pro with SSD.
> > > >> > 2) Do you have any recommendations for improving the load speed?
> > > >> >
> > > >> > Thanks,
> > > >> >
> > > >> > Roger
> > > >> >
> > > >>
> > >
> > >
> >
>

Re: Local state write throughput

Posted by Jay Kreps <ja...@gmail.com>.
I have seen a similar thing from the OS tunable readahead. I think Linux
defaults to reading a full 128K into pagecache with every read. This is
sensible for spinning disks where maybe blowing 500us may mean you get
lucky and save a 10ms seek. But for SSDs, especially a key-value store
doing purely random access, it is a total waste and huge perf hit.
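
If you want to check what a device is currently set to, I think

   cat /sys/block/sda/queue/read_ahead_kb

prints the readahead in KB (128 is the usual default), and you can write a
smaller value to that file, or use blockdev --setra, to change it until the
next reboot.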

-Jay

On Sun, Jan 25, 2015 at 12:29 PM, Roger Hoover <ro...@gmail.com>
wrote:

> FYI, for Linux with SSDs, changing the io scheduler to deadline or noop can
> make a 500x improvement.  I haven't tried this myself.
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/hardware.html#_disks
>
> On Tue, Jan 20, 2015 at 9:28 AM, Chris Riccomini <
> criccomini@linkedin.com.invalid> wrote:
>
> > Hey Roger,
> >
> > We did some benchmarking, and discovered very similar performance to what
> > you've described. We saw ~40k writes/sec, and ~20 k reads/sec,
> > per-container, on a Virident SSD. This was without any changelog. Are you
> > using a changelog on the store?
> >
> > When we attached a changelog to the store, the writes dropped
> > significantly (~1000 writes/sec). When we hooked up VisualVM, we saw that
> > the container was spending > 99% of its time in
> KafkaSystemProducer.send().
> >
> > We're currently doing two things:
> >
> > 1. Working with our performance team to understand and tune RocksDB
> > properly.
> > 2. Upgrading the Kafka producer to use the new Java-based API.
> (SAMZA-227)
> >
> > For (1), it seems like we should be able to get a lot higher throughput
> > from RocksDB. Anecdotally, we've heard that RocksDB requires many threads
> > in order to max out an SSD, and since Samza is single-threaded, we could
> > just be hitting a RocksDB bottleneck. We won't know until we dig into the
> > problem (which we started investigating last week). The current plan is
> to
> > start by benchmarking RocksDB JNI outside of Samza, and see what we can
> > get. From there, we'll know our "speed of light", and can try to get
> Samza
> > as close as possible to it. If RocksDB JNI can't be made to go "fast",
> > then we'll have to understand why.
> >
> > (2) should help with the changelog issue. I believe that the slowness
> with
> > the changelog is caused because the changelog is using a sync producer to
> > send to Kafka, and is blocking when a batch is flushed. In the new API,
> > the concept of a "sync" producer is removed. All writes are handled on an
> > async writer thread (though we can still guarantee writes are safely
> > written before checkpointing, which is what we need).
> >
> > In short, I agree, it seems slow. We see this behavior, too. We're
> digging
> > into it.
> >
> > Cheers,
> > Chris
> >
> > On 1/17/15 12:58 PM, "Roger Hoover" <ro...@gmail.com> wrote:
> >
> > >Michael,
> > >
> > >Thanks for the response.  I used VisualVM and YourKit and see the CPU is
> > >not being used (0.1%).  I took a few thread dumps and see the main
> thread
> > >blocked on the flush() method inside the KV store.
> > >
> > >On Sat, Jan 17, 2015 at 7:09 AM, Michael Rose <el...@gmail.com>
> > >wrote:
> > >
> > >> Is your process at 100% CPU? I suspect you're spending most of your
> > >>time in
> > >> JSON deserialization, but profile it and check.
> > >>
> > >> Michael
> > >>
> > >> On Friday, January 16, 2015, Roger Hoover <ro...@gmail.com>
> > >>wrote:
> > >>
> > >> > Hi guys,
> > >> >
> > >> > I'm testing a job that needs to load 40M records (6GB in Kafka as
> > >>JSON)
> > >> > from a bootstrap topic.  The topic has 4 partitions and I'm running
> > >>the
> > >> job
> > >> > using the ProcessJobFactory so all four tasks are in one container.
> > >> >
> > >> > Using RocksDB, it's taking 19 minutes to load all the data which
> > >>amounts
> > >> to
> > >> > 35k records/sec or 5MB/s based on input size.  I ran iostat during
> > >>this
> > >> > time as see the disk write throughput is 14MB/s.
> > >> >
> > >> > I didn't tweak any of the storage settings.
> > >> >
> > >> > A few questions:
> > >> > 1) Does this seem low?  I'm running on a Macbook Pro with SSD.
> > >> > 2) Do you have any recommendations for improving the load speed?
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Roger
> > >> >
> > >>
> >
> >
>

Re: Local state write throughput

Posted by Roger Hoover <ro...@gmail.com>.
FYI, for Linux with SSDs, changing the io scheduler to deadline or noop can
make a 500x improvement.  I haven't tried this myself.

http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/hardware.html#_disks
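
For anyone who wants to try it, I believe the switch itself is just (as root,
per device, and it doesn't persist across reboots):

   cat /sys/block/sda/queue/scheduler       # lists the available schedulers, current one in brackets
   echo deadline > /sys/block/sda/queue/scheduler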

On Tue, Jan 20, 2015 at 9:28 AM, Chris Riccomini <
criccomini@linkedin.com.invalid> wrote:

> Hey Roger,
>
> We did some benchmarking, and discovered very similar performance to what
> you've described. We saw ~40k writes/sec, and ~20 k reads/sec,
> per-container, on a Virident SSD. This was without any changelog. Are you
> using a changelog on the store?
>
> When we attached a changelog to the store, the writes dropped
> significantly (~1000 writes/sec). When we hooked up VisualVM, we saw that
> the container was spending > 99% of its time in KafkaSystemProducer.send().
>
> We're currently doing two things:
>
> 1. Working with our performance team to understand and tune RocksDB
> properly.
> 2. Upgrading the Kafka producer to use the new Java-based API. (SAMZA-227)
>
> For (1), it seems like we should be able to get a lot higher throughput
> from RocksDB. Anecdotally, we've heard that RocksDB requires many threads
> in order to max out an SSD, and since Samza is single-threaded, we could
> just be hitting a RocksDB bottleneck. We won't know until we dig into the
> problem (which we started investigating last week). The current plan is to
> start by benchmarking RocksDB JNI outside of Samza, and see what we can
> get. From there, we'll know our "speed of light", and can try to get Samza
> as close as possible to it. If RocksDB JNI can't be made to go "fast",
> then we'll have to understand why.
>
> (2) should help with the changelog issue. I believe that the slowness with
> the changelog is caused because the changelog is using a sync producer to
> send to Kafka, and is blocking when a batch is flushed. In the new API,
> the concept of a "sync" producer is removed. All writes are handled on an
> async writer thread (though we can still guarantee writes are safely
> written before checkpointing, which is what we need).
>
> In short, I agree, it seems slow. We see this behavior, too. We're digging
> into it.
>
> Cheers,
> Chris
>
> On 1/17/15 12:58 PM, "Roger Hoover" <ro...@gmail.com> wrote:
>
> >Michael,
> >
> >Thanks for the response.  I used VisualVM and YourKit and see the CPU is
> >not being used (0.1%).  I took a few thread dumps and see the main thread
> >blocked on the flush() method inside the KV store.
> >
> >On Sat, Jan 17, 2015 at 7:09 AM, Michael Rose <el...@gmail.com>
> >wrote:
> >
> >> Is your process at 100% CPU? I suspect you're spending most of your
> >>time in
> >> JSON deserialization, but profile it and check.
> >>
> >> Michael
> >>
> >> On Friday, January 16, 2015, Roger Hoover <ro...@gmail.com>
> >>wrote:
> >>
> >> > Hi guys,
> >> >
> >> > I'm testing a job that needs to load 40M records (6GB in Kafka as
> >>JSON)
> >> > from a bootstrap topic.  The topic has 4 partitions and I'm running
> >>the
> >> job
> >> > using the ProcessJobFactory so all four tasks are in one container.
> >> >
> >> > Using RocksDB, it's taking 19 minutes to load all the data which
> >>amounts
> >> to
> >> > 35k records/sec or 5MB/s based on input size.  I ran iostat during
> >>this
> >> > time as see the disk write throughput is 14MB/s.
> >> >
> >> > I didn't tweak any of the storage settings.
> >> >
> >> > A few questions:
> >> > 1) Does this seem low?  I'm running on a Macbook Pro with SSD.
> >> > 2) Do you have any recommendations for improving the load speed?
> >> >
> >> > Thanks,
> >> >
> >> > Roger
> >> >
> >>
>
>

Re: Local state write throughput

Posted by Chris Riccomini <cr...@linkedin.com.INVALID>.
Hey Roger,

> Is this the way you typically do it for stream/table joins?

Yup, in cases where the source stream has a copy of the data, and you
fully bootstrap, there's no need to have a changelog.

> but the current write throughput puts hard limits on the amount of local
>state that a container can have without really long
>initialization/recovery times.

Yea. The current guidance we give is <10G/container, and every gig adds to
startup/recovery time. We definitely want to decrease startup times.

> In my tests, LevelDB has about the same performance.  Have you noticed
>that as well?

I haven't checked, but I'm not surprised. LevelDB and RocksDB are fairly
similar. Anecdotally, we have found LevelDB less predictable: it performs
on par with RocksDB, and then sometimes falls off a cliff for some reason
(data size, access patterns, etc.).

Cheers,
Chris

On 1/20/15 4:05 PM, "Roger Hoover" <ro...@gmail.com> wrote:

>Thanks, Chris.
>
>I am not using a changelog for the store because the bootstrap stream
>is a master copy of the data and the job can recover from there.  No need
>to write out another copy.  Is this the way you typically do it for
>stream/table joins?
>
>Great to know that you're looking into the performance issues.  I love
>the idea of local state for isolation and predictable throughput but the
>current write throughput puts hard limits on the amount of local state
>that
>a container can have without really long initialization/recovery times.
>
>In my tests, LevelDB has about the same performance.  Have you noticed
>that
>as well?
>
>Cheers,
>
>Roger
>
>On Tue, Jan 20, 2015 at 9:28 AM, Chris Riccomini <
>criccomini@linkedin.com.invalid> wrote:
>
>> Hey Roger,
>>
>> We did some benchmarking, and discovered very similar performance to
>>what
>> you've described. We saw ~40k writes/sec, and ~20 k reads/sec,
>> per-container, on a Virident SSD. This was without any changelog. Are
>>you
>> using a changelog on the store?
>>
>> When we attached a changelog to the store, the writes dropped
>> significantly (~1000 writes/sec). When we hooked up VisualVM, we saw
>>that
>> the container was spending > 99% of its time in
>>KafkaSystemProducer.send().
>>
>> We're currently doing two things:
>>
>> 1. Working with our performance team to understand and tune RocksDB
>> properly.
>> 2. Upgrading the Kafka producer to use the new Java-based API.
>>(SAMZA-227)
>>
>> For (1), it seems like we should be able to get a lot higher throughput
>> from RocksDB. Anecdotally, we've heard that RocksDB requires many
>>threads
>> in order to max out an SSD, and since Samza is single-threaded, we could
>> just be hitting a RocksDB bottleneck. We won't know until we dig into
>>the
>> problem (which we started investigating last week). The current plan is
>>to
>> start by benchmarking RocksDB JNI outside of Samza, and see what we can
>> get. From there, we'll know our "speed of light", and can try to get
>>Samza
>> as close as possible to it. If RocksDB JNI can't be made to go "fast",
>> then we'll have to understand why.
>>
>> (2) should help with the changelog issue. I believe that the slowness
>>with
>> the changelog is caused because the changelog is using a sync producer
>>to
>> send to Kafka, and is blocking when a batch is flushed. In the new API,
>> the concept of a "sync" producer is removed. All writes are handled on
>>an
>> async writer thread (though we can still guarantee writes are safely
>> written before checkpointing, which is what we need).
>>
>> In short, I agree, it seems slow. We see this behavior, too. We're
>>digging
>> into it.
>>
>> Cheers,
>> Chris
>>
>> On 1/17/15 12:58 PM, "Roger Hoover" <ro...@gmail.com> wrote:
>>
>> >Michael,
>> >
>> >Thanks for the response.  I used VisualVM and YourKit and see the CPU
>>is
>> >not being used (0.1%).  I took a few thread dumps and see the main
>>thread
>> >blocked on the flush() method inside the KV store.
>> >
>> >On Sat, Jan 17, 2015 at 7:09 AM, Michael Rose <el...@gmail.com>
>> >wrote:
>> >
>> >> Is your process at 100% CPU? I suspect you're spending most of your
>> >>time in
>> >> JSON deserialization, but profile it and check.
>> >>
>> >> Michael
>> >>
>> >> On Friday, January 16, 2015, Roger Hoover <ro...@gmail.com>
>> >>wrote:
>> >>
>> >> > Hi guys,
>> >> >
>> >> > I'm testing a job that needs to load 40M records (6GB in Kafka as
>> >>JSON)
>> >> > from a bootstrap topic.  The topic has 4 partitions and I'm running
>> >>the
>> >> job
>> >> > using the ProcessJobFactory so all four tasks are in one container.
>> >> >
>> >> > Using RocksDB, it's taking 19 minutes to load all the data which
>> >>amounts
>> >> to
>> >> > 35k records/sec or 5MB/s based on input size.  I ran iostat during
>> >>this
>> >> > time as see the disk write throughput is 14MB/s.
>> >> >
>> >> > I didn't tweak any of the storage settings.
>> >> >
>> >> > A few questions:
>> >> > 1) Does this seem low?  I'm running on a Macbook Pro with SSD.
>> >> > 2) Do you have any recommendations for improving the load speed?
>> >> >
>> >> > Thanks,
>> >> >
>> >> > Roger
>> >> >
>> >>
>>
>>


Re: Local state write throughput

Posted by Roger Hoover <ro...@gmail.com>.
Good to know.  Thanks, Jay and Chris.  Since I want the job to accept
updates, it may be worthwhile for me to add a changelog so recoveries are
faster.


On Tue, Jan 20, 2015 at 4:38 PM, Chris Riccomini <
criccomini@linkedin.com.invalid> wrote:

> Hey Roger,
>
> To add to Jay's comment, if you don't care about getting updates after the
> initial bootstrap, you can configure a store with a changelog pointed to
> your bootstrap topic. This will cause the SamzaContainer to bootstrap
> using the optimized code that Jay described. Just make sure you don't
> write to the store (since it would put the mutation back into your
> bootstrap stream). This configuration won't allow new updates to come into
> the store until the job is restarted. If you use the 'bootstrap stream'
> concept, then you continue getting updates after the initial bootstrap.
> The 'bootstrap' stream also allows you to have arbitrary logic, which
> might be useful for your job--not sure.
>
> Cheers,
> Chris
>
> On 1/20/15 4:30 PM, "Jay Kreps" <ja...@confluent.io> wrote:
>
> >It's also worth noting that restoring from a changelog *should* be much
> >faster than restoring from upstream. The restore case is optimized and
> >batches the updates and skips serialization both of which help a ton with
> >performance.
> >
> >-Jay
> >
> >On Tue, Jan 20, 2015 at 4:19 PM, Chinmay Soman <chinmay.cerebro@gmail.com
> >
> >wrote:
> >
> >> I remember running both RocksDB and LevelDB and it was definitely better
> >> (in that 1 test case, it was ~40K vs ~30K random writes/sec) - but I
> >> haven't done any exhaustive comparison.
> >>
> >> Btw, I see that you're using 4 partitions ? Any reason you're not using
> >> like >= 128 and running with more containers ?
> >>
> >> On Tue, Jan 20, 2015 at 4:05 PM, Roger Hoover <ro...@gmail.com>
> >> wrote:
> >>
> >> > Thanks, Chris.
> >> >
> >> > I am not using a changelog for the store because the the bootstrap
> >>stream
> >> > is a master copy of the data and the job can recover from there.  No
> >>need
> >> > to write out another copy.  Is this the way you typically do it for
> >> > stream/table joins?
> >> >
> >> > Great to know you that you're looking into the performance issues.  I
> >> love
> >> > the idea of local state for isolation and predictable throughput but
> >>the
> >> > current write throughput puts hard limits on the amount of local state
> >> that
> >> > a container can have without really long initialization/recovery
> >>times.
> >> >
> >> > Is my tests, LevelDB has about the same performance.  Have you noticed
> >> that
> >> > as well?
> >> >
> >> > Cheers,
> >> >
> >> > Roger
> >> >
> >> > On Tue, Jan 20, 2015 at 9:28 AM, Chris Riccomini <
> >> > criccomini@linkedin.com.invalid> wrote:
> >> >
> >> > > Hey Roger,
> >> > >
> >> > > We did some benchmarking, and discovered very similar performance to
> >> what
> >> > > you've described. We saw ~40k writes/sec, and ~20 k reads/sec,
> >> > > per-container, on a Virident SSD. This was without any changelog.
> >>Are
> >> you
> >> > > using a changelog on the store?
> >> > >
> >> > > When we attached a changelog to the store, the writes dropped
> >> > > significantly (~1000 writes/sec). When we hooked up VisualVM, we saw
> >> that
> >> > > the container was spending > 99% of its time in
> >> > KafkaSystemProducer.send().
> >> > >
> >> > > We're currently doing two things:
> >> > >
> >> > > 1. Working with our performance team to understand and tune RocksDB
> >> > > properly.
> >> > > 2. Upgrading the Kafka producer to use the new Java-based API.
> >> > (SAMZA-227)
> >> > >
> >> > > For (1), it seems like we should be able to get a lot higher
> >>throughput
> >> > > from RocksDB. Anecdotally, we've heard that RocksDB requires many
> >> threads
> >> > > in order to max out an SSD, and since Samza is single-threaded, we
> >> could
> >> > > just be hitting a RocksDB bottleneck. We won't know until we dig
> >>into
> >> the
> >> > > problem (which we started investigating last week). The current
> >>plan is
> >> > to
> >> > > start by benchmarking RocksDB JNI outside of Samza, and see what we
> >>can
> >> > > get. From there, we'll know our "speed of light", and can try to get
> >> > Samza
> >> > > as close as possible to it. If RocksDB JNI can't be made to go
> >>"fast",
> >> > > then we'll have to understand why.
> >> > >
> >> > > (2) should help with the changelog issue. I believe that the
> >>slowness
> >> > with
> >> > > the changelog is caused because the changelog is using a sync
> >>producer
> >> to
> >> > > send to Kafka, and is blocking when a batch is flushed. In the new
> >>API,
> >> > > the concept of a "sync" producer is removed. All writes are handled
> >>on
> >> an
> >> > > async writer thread (though we can still guarantee writes are safely
> >> > > written before checkpointing, which is what we need).
> >> > >
> >> > > In short, I agree, it seems slow. We see this behavior, too. We're
> >> > digging
> >> > > into it.
> >> > >
> >> > > Cheers,
> >> > > Chris
> >> > >
> >> > > On 1/17/15 12:58 PM, "Roger Hoover" <ro...@gmail.com> wrote:
> >> > >
> >> > > >Michael,
> >> > > >
> >> > > >Thanks for the response.  I used VisualVM and YourKit and see the
> >>CPU
> >> is
> >> > > >not being used (0.1%).  I took a few thread dumps and see the main
> >> > thread
> >> > > >blocked on the flush() method inside the KV store.
> >> > > >
> >> > > >On Sat, Jan 17, 2015 at 7:09 AM, Michael Rose
> >><elementation@gmail.com
> >> >
> >> > > >wrote:
> >> > > >
> >> > > >> Is your process at 100% CPU? I suspect you're spending most of
> >>your
> >> > > >>time in
> >> > > >> JSON deserialization, but profile it and check.
> >> > > >>
> >> > > >> Michael
> >> > > >>
> >> > > >> On Friday, January 16, 2015, Roger Hoover
> >><ro...@gmail.com>
> >> > > >>wrote:
> >> > > >>
> >> > > >> > Hi guys,
> >> > > >> >
> >> > > >> > I'm testing a job that needs to load 40M records (6GB in Kafka
> >>as
> >> > > >>JSON)
> >> > > >> > from a bootstrap topic.  The topic has 4 partitions and I'm
> >> running
> >> > > >>the
> >> > > >> job
> >> > > >> > using the ProcessJobFactory so all four tasks are in one
> >> container.
> >> > > >> >
> >> > > >> > Using RocksDB, it's taking 19 minutes to load all the data
> >>which
> >> > > >>amounts
> >> > > >> to
> >> > > >> > 35k records/sec or 5MB/s based on input size.  I ran iostat
> >>during
> >> > > >>this
> >> > > >> > time as see the disk write throughput is 14MB/s.
> >> > > >> >
> >> > > >> > I didn't tweak any of the storage settings.
> >> > > >> >
> >> > > >> > A few questions:
> >> > > >> > 1) Does this seem low?  I'm running on a Macbook Pro with SSD.
> >> > > >> > 2) Do you have any recommendations for improving the load
> >>speed?
> >> > > >> >
> >> > > >> > Thanks,
> >> > > >> >
> >> > > >> > Roger
> >> > > >> >
> >> > > >>
> >> > >
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >> Thanks and regards
> >>
> >> Chinmay Soman
> >>
>
>

Re: Local state write throughput

Posted by Chris Riccomini <cr...@linkedin.com.INVALID>.
Hey Roger,

To add to Jay's comment, if you don't care about getting updates after the
initial bootstrap, you can configure a store with a changelog pointed to
your bootstrap topic. This will cause the SamzaContainer to bootstrap
using the optimized code that Jay described. Just make sure you don't
write to the store (since it would put the mutation back into your
bootstrap stream). This configuration won't allow new updates to come into
the store until the job is restarted. If you use the 'bootstrap stream'
concept, then you continue getting updates after the initial bootstrap.
The 'bootstrap' stream also allows you to have arbitrary logic, which
might be useful for your job--not sure.
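
Roughly, from memory (so please double-check against the config docs), the
two variants look something like this, where "my-store", "my-table-topic",
and the serde names are placeholders for whatever you've defined and
registered in your job:

  # Variant A: changelog pointed at the bootstrap topic itself
  # (fast restore path, but no new updates until the job is restarted)
  stores.my-store.factory=org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory
  stores.my-store.key.serde=string
  stores.my-store.msg.serde=json
  stores.my-store.changelog=kafka.my-table-topic

  # Variant B: consume the topic as a bootstrap stream and write to the store
  # in your task (keeps getting updates after the initial bootstrap)
  task.inputs=kafka.my-table-topic,kafka.my-event-topic
  systems.kafka.streams.my-table-topic.samza.bootstrap=true
  systems.kafka.streams.my-table-topic.samza.offset.default=oldest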

Cheers,
Chris

On 1/20/15 4:30 PM, "Jay Kreps" <ja...@confluent.io> wrote:

>It's also worth noting that restoring from a changelog *should* be much
>faster than restoring from upstream. The restore case is optimized and
>batches the updates and skips serialization both of which help a ton with
>performance.
>
>-Jay
>
>On Tue, Jan 20, 2015 at 4:19 PM, Chinmay Soman <ch...@gmail.com>
>wrote:
>
>> I remember running both RocksDB and LevelDB and it was definitely better
>> (in that 1 test case, it was ~40K vs ~30K random writes/sec) - but I
>> haven't done any exhaustive comparison.
>>
>> Btw, I see that you're using 4 partitions ? Any reason you're not using
>> like >= 128 and running with more containers ?
>>
>> On Tue, Jan 20, 2015 at 4:05 PM, Roger Hoover <ro...@gmail.com>
>> wrote:
>>
>> > Thanks, Chris.
>> >
>> > I am not using a changelog for the store because the the bootstrap
>>stream
>> > is a master copy of the data and the job can recover from there.  No
>>need
>> > to write out another copy.  Is this the way you typically do it for
>> > stream/table joins?
>> >
>> > Great to know you that you're looking into the performance issues.  I
>> love
>> > the idea of local state for isolation and predictable throughput but
>>the
>> > current write throughput puts hard limits on the amount of local state
>> that
>> > a container can have without really long initialization/recovery
>>times.
>> >
>> > Is my tests, LevelDB has about the same performance.  Have you noticed
>> that
>> > as well?
>> >
>> > Cheers,
>> >
>> > Roger
>> >
>> > On Tue, Jan 20, 2015 at 9:28 AM, Chris Riccomini <
>> > criccomini@linkedin.com.invalid> wrote:
>> >
>> > > Hey Roger,
>> > >
>> > > We did some benchmarking, and discovered very similar performance to
>> what
>> > > you've described. We saw ~40k writes/sec, and ~20 k reads/sec,
>> > > per-container, on a Virident SSD. This was without any changelog.
>>Are
>> you
>> > > using a changelog on the store?
>> > >
>> > > When we attached a changelog to the store, the writes dropped
>> > > significantly (~1000 writes/sec). When we hooked up VisualVM, we saw
>> that
>> > > the container was spending > 99% of its time in
>> > KafkaSystemProducer.send().
>> > >
>> > > We're currently doing two things:
>> > >
>> > > 1. Working with our performance team to understand and tune RocksDB
>> > > properly.
>> > > 2. Upgrading the Kafka producer to use the new Java-based API.
>> > (SAMZA-227)
>> > >
>> > > For (1), it seems like we should be able to get a lot higher
>>throughput
>> > > from RocksDB. Anecdotally, we've heard that RocksDB requires many
>> threads
>> > > in order to max out an SSD, and since Samza is single-threaded, we
>> could
>> > > just be hitting a RocksDB bottleneck. We won't know until we dig
>>into
>> the
>> > > problem (which we started investigating last week). The current
>>plan is
>> > to
>> > > start by benchmarking RocksDB JNI outside of Samza, and see what we
>>can
>> > > get. From there, we'll know our "speed of light", and can try to get
>> > Samza
>> > > as close as possible to it. If RocksDB JNI can't be made to go
>>"fast",
>> > > then we'll have to understand why.
>> > >
>> > > (2) should help with the changelog issue. I believe that the
>>slowness
>> > with
>> > > the changelog is caused because the changelog is using a sync
>>producer
>> to
>> > > send to Kafka, and is blocking when a batch is flushed. In the new
>>API,
>> > > the concept of a "sync" producer is removed. All writes are handled
>>on
>> an
>> > > async writer thread (though we can still guarantee writes are safely
>> > > written before checkpointing, which is what we need).
>> > >
>> > > In short, I agree, it seems slow. We see this behavior, too. We're
>> > digging
>> > > into it.
>> > >
>> > > Cheers,
>> > > Chris
>> > >
>> > > On 1/17/15 12:58 PM, "Roger Hoover" <ro...@gmail.com> wrote:
>> > >
>> > > >Michael,
>> > > >
>> > > >Thanks for the response.  I used VisualVM and YourKit and see the
>>CPU
>> is
>> > > >not being used (0.1%).  I took a few thread dumps and see the main
>> > thread
>> > > >blocked on the flush() method inside the KV store.
>> > > >
>> > > >On Sat, Jan 17, 2015 at 7:09 AM, Michael Rose
>><elementation@gmail.com
>> >
>> > > >wrote:
>> > > >
>> > > >> Is your process at 100% CPU? I suspect you're spending most of
>>your
>> > > >>time in
>> > > >> JSON deserialization, but profile it and check.
>> > > >>
>> > > >> Michael
>> > > >>
>> > > >> On Friday, January 16, 2015, Roger Hoover
>><ro...@gmail.com>
>> > > >>wrote:
>> > > >>
>> > > >> > Hi guys,
>> > > >> >
>> > > >> > I'm testing a job that needs to load 40M records (6GB in Kafka
>>as
>> > > >>JSON)
>> > > >> > from a bootstrap topic.  The topic has 4 partitions and I'm
>> running
>> > > >>the
>> > > >> job
>> > > >> > using the ProcessJobFactory so all four tasks are in one
>> container.
>> > > >> >
>> > > >> > Using RocksDB, it's taking 19 minutes to load all the data
>>which
>> > > >>amounts
>> > > >> to
>> > > >> > 35k records/sec or 5MB/s based on input size.  I ran iostat
>>during
>> > > >>this
>> > > >> > time as see the disk write throughput is 14MB/s.
>> > > >> >
>> > > >> > I didn't tweak any of the storage settings.
>> > > >> >
>> > > >> > A few questions:
>> > > >> > 1) Does this seem low?  I'm running on a Macbook Pro with SSD.
>> > > >> > 2) Do you have any recommendations for improving the load
>>speed?
>> > > >> >
>> > > >> > Thanks,
>> > > >> >
>> > > >> > Roger
>> > > >> >
>> > > >>
>> > >
>> > >
>> >
>>
>>
>>
>> --
>> Thanks and regards
>>
>> Chinmay Soman
>>


Re: Local state write throughput

Posted by Jay Kreps <ja...@confluent.io>.
It's also worth noting that restoring from a changelog *should* be much
faster than restoring from upstream. The restore path is optimized: it
batches the updates and skips serialization, both of which help a ton with
performance.

-Jay

On Tue, Jan 20, 2015 at 4:19 PM, Chinmay Soman <ch...@gmail.com>
wrote:

> I remember running both RocksDB and LevelDB and it was definitely better
> (in that 1 test case, it was ~40K vs ~30K random writes/sec) - but I
> haven't done any exhaustive comparison.
>
> Btw, I see that you're using 4 partitions ? Any reason you're not using
> like >= 128 and running with more containers ?
>
> On Tue, Jan 20, 2015 at 4:05 PM, Roger Hoover <ro...@gmail.com>
> wrote:
>
> > Thanks, Chris.
> >
> > I am not using a changelog for the store because the the bootstrap stream
> > is a master copy of the data and the job can recover from there.  No need
> > to write out another copy.  Is this the way you typically do it for
> > stream/table joins?
> >
> > Great to know you that you're looking into the performance issues.  I
> love
> > the idea of local state for isolation and predictable throughput but the
> > current write throughput puts hard limits on the amount of local state
> that
> > a container can have without really long initialization/recovery times.
> >
> > Is my tests, LevelDB has about the same performance.  Have you noticed
> that
> > as well?
> >
> > Cheers,
> >
> > Roger
> >
> > On Tue, Jan 20, 2015 at 9:28 AM, Chris Riccomini <
> > criccomini@linkedin.com.invalid> wrote:
> >
> > > Hey Roger,
> > >
> > > We did some benchmarking, and discovered very similar performance to
> what
> > > you've described. We saw ~40k writes/sec, and ~20 k reads/sec,
> > > per-container, on a Virident SSD. This was without any changelog. Are
> you
> > > using a changelog on the store?
> > >
> > > When we attached a changelog to the store, the writes dropped
> > > significantly (~1000 writes/sec). When we hooked up VisualVM, we saw
> that
> > > the container was spending > 99% of its time in
> > KafkaSystemProducer.send().
> > >
> > > We're currently doing two things:
> > >
> > > 1. Working with our performance team to understand and tune RocksDB
> > > properly.
> > > 2. Upgrading the Kafka producer to use the new Java-based API.
> > (SAMZA-227)
> > >
> > > For (1), it seems like we should be able to get a lot higher throughput
> > > from RocksDB. Anecdotally, we've heard that RocksDB requires many
> threads
> > > in order to max out an SSD, and since Samza is single-threaded, we
> could
> > > just be hitting a RocksDB bottleneck. We won't know until we dig into
> the
> > > problem (which we started investigating last week). The current plan is
> > to
> > > start by benchmarking RocksDB JNI outside of Samza, and see what we can
> > > get. From there, we'll know our "speed of light", and can try to get
> > Samza
> > > as close as possible to it. If RocksDB JNI can't be made to go "fast",
> > > then we'll have to understand why.
> > >
> > > (2) should help with the changelog issue. I believe that the slowness
> > with
> > > the changelog is caused because the changelog is using a sync producer
> to
> > > send to Kafka, and is blocking when a batch is flushed. In the new API,
> > > the concept of a "sync" producer is removed. All writes are handled on
> an
> > > async writer thread (though we can still guarantee writes are safely
> > > written before checkpointing, which is what we need).
> > >
> > > In short, I agree, it seems slow. We see this behavior, too. We're
> > digging
> > > into it.
> > >
> > > Cheers,
> > > Chris
> > >
> > > On 1/17/15 12:58 PM, "Roger Hoover" <ro...@gmail.com> wrote:
> > >
> > > >Michael,
> > > >
> > > >Thanks for the response.  I used VisualVM and YourKit and see the CPU
> is
> > > >not being used (0.1%).  I took a few thread dumps and see the main
> > thread
> > > >blocked on the flush() method inside the KV store.
> > > >
> > > >On Sat, Jan 17, 2015 at 7:09 AM, Michael Rose <elementation@gmail.com
> >
> > > >wrote:
> > > >
> > > >> Is your process at 100% CPU? I suspect you're spending most of your
> > > >>time in
> > > >> JSON deserialization, but profile it and check.
> > > >>
> > > >> Michael
> > > >>
> > > >> On Friday, January 16, 2015, Roger Hoover <ro...@gmail.com>
> > > >>wrote:
> > > >>
> > > >> > Hi guys,
> > > >> >
> > > >> > I'm testing a job that needs to load 40M records (6GB in Kafka as
> > > >>JSON)
> > > >> > from a bootstrap topic.  The topic has 4 partitions and I'm
> running
> > > >>the
> > > >> job
> > > >> > using the ProcessJobFactory so all four tasks are in one
> container.
> > > >> >
> > > >> > Using RocksDB, it's taking 19 minutes to load all the data which
> > > >>amounts
> > > >> to
> > > >> > 35k records/sec or 5MB/s based on input size.  I ran iostat during
> > > >>this
> > > >> > time as see the disk write throughput is 14MB/s.
> > > >> >
> > > >> > I didn't tweak any of the storage settings.
> > > >> >
> > > >> > A few questions:
> > > >> > 1) Does this seem low?  I'm running on a Macbook Pro with SSD.
> > > >> > 2) Do you have any recommendations for improving the load speed?
> > > >> >
> > > >> > Thanks,
> > > >> >
> > > >> > Roger
> > > >> >
> > > >>
> > >
> > >
> >
>
>
>
> --
> Thanks and regards
>
> Chinmay Soman
>

Re: Local state write throughput

Posted by Roger Hoover <ro...@gmail.com>.
"Btw, I see that you're using 4 partitions ? Any reason you're not using
like >= 128 and running with more containers ?"

In my case, the table that I need to join is small (1.5GB), so rather than
partitioning the state, I have each task keep its own full copy.  This
makes the overall job flow simpler in that I don't need partitioning jobs
and I can add more partitions if needed without a complex migration process
or data loss.

So, in this case, adding more partitions won't decrease the start-up time.
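
For anyone curious, a minimal sketch of that kind of setup is below.  The
store and topic names are placeholders and the exact keys should be
double-checked against the Samza config docs (serde registration is
omitted); the idea is a bootstrapped input plus a store with no changelog
configured:

    # Consume the table topic from the beginning, fully, before other inputs
    task.inputs=kafka.dimension-table
    systems.kafka.streams.dimension-table.samza.bootstrap=true
    systems.kafka.streams.dimension-table.samza.reset.offset=true
    systems.kafka.streams.dimension-table.samza.offset.default=oldest

    # Local RocksDB store; no stores.*.changelog entry, so nothing is
    # written back to Kafka and recovery replays the bootstrap topic instead
    stores.dimension-store.factory=org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory
    stores.dimension-store.key.serde=string
    stores.dimension-store.msg.serde=json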

On Tue, Jan 20, 2015 at 4:19 PM, Chinmay Soman <ch...@gmail.com>
wrote:

> I remember running both RocksDB and LevelDB and it was definitely better
> (in that 1 test case, it was ~40K vs ~30K random writes/sec) - but I
> haven't done any exhaustive comparison.
>
> Btw, I see that you're using 4 partitions ? Any reason you're not using
> like >= 128 and running with more containers ?
>
> On Tue, Jan 20, 2015 at 4:05 PM, Roger Hoover <ro...@gmail.com>
> wrote:
>
> > Thanks, Chris.
> >
> > I am not using a changelog for the store because the the bootstrap stream
> > is a master copy of the data and the job can recover from there.  No need
> > to write out another copy.  Is this the way you typically do it for
> > stream/table joins?
> >
> > Great to know you that you're looking into the performance issues.  I
> love
> > the idea of local state for isolation and predictable throughput but the
> > current write throughput puts hard limits on the amount of local state
> that
> > a container can have without really long initialization/recovery times.
> >
> > Is my tests, LevelDB has about the same performance.  Have you noticed
> that
> > as well?
> >
> > Cheers,
> >
> > Roger
> >
> > On Tue, Jan 20, 2015 at 9:28 AM, Chris Riccomini <
> > criccomini@linkedin.com.invalid> wrote:
> >
> > > Hey Roger,
> > >
> > > We did some benchmarking, and discovered very similar performance to
> what
> > > you've described. We saw ~40k writes/sec, and ~20 k reads/sec,
> > > per-container, on a Virident SSD. This was without any changelog. Are
> you
> > > using a changelog on the store?
> > >
> > > When we attached a changelog to the store, the writes dropped
> > > significantly (~1000 writes/sec). When we hooked up VisualVM, we saw
> that
> > > the container was spending > 99% of its time in
> > KafkaSystemProducer.send().
> > >
> > > We're currently doing two things:
> > >
> > > 1. Working with our performance team to understand and tune RocksDB
> > > properly.
> > > 2. Upgrading the Kafka producer to use the new Java-based API.
> > (SAMZA-227)
> > >
> > > For (1), it seems like we should be able to get a lot higher throughput
> > > from RocksDB. Anecdotally, we've heard that RocksDB requires many
> threads
> > > in order to max out an SSD, and since Samza is single-threaded, we
> could
> > > just be hitting a RocksDB bottleneck. We won't know until we dig into
> the
> > > problem (which we started investigating last week). The current plan is
> > to
> > > start by benchmarking RocksDB JNI outside of Samza, and see what we can
> > > get. From there, we'll know our "speed of light", and can try to get
> > Samza
> > > as close as possible to it. If RocksDB JNI can't be made to go "fast",
> > > then we'll have to understand why.
> > >
> > > (2) should help with the changelog issue. I believe that the slowness
> > with
> > > the changelog is caused because the changelog is using a sync producer
> to
> > > send to Kafka, and is blocking when a batch is flushed. In the new API,
> > > the concept of a "sync" producer is removed. All writes are handled on
> an
> > > async writer thread (though we can still guarantee writes are safely
> > > written before checkpointing, which is what we need).
> > >
> > > In short, I agree, it seems slow. We see this behavior, too. We're
> > digging
> > > into it.
> > >
> > > Cheers,
> > > Chris
> > >
> > > On 1/17/15 12:58 PM, "Roger Hoover" <ro...@gmail.com> wrote:
> > >
> > > >Michael,
> > > >
> > > >Thanks for the response.  I used VisualVM and YourKit and see the CPU
> is
> > > >not being used (0.1%).  I took a few thread dumps and see the main
> > thread
> > > >blocked on the flush() method inside the KV store.
> > > >
> > > >On Sat, Jan 17, 2015 at 7:09 AM, Michael Rose <elementation@gmail.com
> >
> > > >wrote:
> > > >
> > > >> Is your process at 100% CPU? I suspect you're spending most of your
> > > >>time in
> > > >> JSON deserialization, but profile it and check.
> > > >>
> > > >> Michael
> > > >>
> > > >> On Friday, January 16, 2015, Roger Hoover <ro...@gmail.com>
> > > >>wrote:
> > > >>
> > > >> > Hi guys,
> > > >> >
> > > >> > I'm testing a job that needs to load 40M records (6GB in Kafka as
> > > >>JSON)
> > > >> > from a bootstrap topic.  The topic has 4 partitions and I'm
> running
> > > >>the
> > > >> job
> > > >> > using the ProcessJobFactory so all four tasks are in one
> container.
> > > >> >
> > > >> > Using RocksDB, it's taking 19 minutes to load all the data which
> > > >>amounts
> > > >> to
> > > >> > 35k records/sec or 5MB/s based on input size.  I ran iostat during
> > > >>this
> > > >> > time as see the disk write throughput is 14MB/s.
> > > >> >
> > > >> > I didn't tweak any of the storage settings.
> > > >> >
> > > >> > A few questions:
> > > >> > 1) Does this seem low?  I'm running on a Macbook Pro with SSD.
> > > >> > 2) Do you have any recommendations for improving the load speed?
> > > >> >
> > > >> > Thanks,
> > > >> >
> > > >> > Roger
> > > >> >
> > > >>
> > >
> > >
> >
>
>
>
> --
> Thanks and regards
>
> Chinmay Soman
>

Re: Local state write throughput

Posted by Chinmay Soman <ch...@gmail.com>.
I remember running both RocksDB and LevelDB, and RocksDB was definitely
better (in that one test case, ~40K vs ~30K random writes/sec) - but I
haven't done any exhaustive comparison.

Btw, I see that you're using 4 partitions? Any reason you're not using
like >= 128 and running with more containers?

On Tue, Jan 20, 2015 at 4:05 PM, Roger Hoover <ro...@gmail.com>
wrote:

> Thanks, Chris.
>
> I am not using a changelog for the store because the the bootstrap stream
> is a master copy of the data and the job can recover from there.  No need
> to write out another copy.  Is this the way you typically do it for
> stream/table joins?
>
> Great to know you that you're looking into the performance issues.  I love
> the idea of local state for isolation and predictable throughput but the
> current write throughput puts hard limits on the amount of local state that
> a container can have without really long initialization/recovery times.
>
> Is my tests, LevelDB has about the same performance.  Have you noticed that
> as well?
>
> Cheers,
>
> Roger
>
> On Tue, Jan 20, 2015 at 9:28 AM, Chris Riccomini <
> criccomini@linkedin.com.invalid> wrote:
>
> > Hey Roger,
> >
> > We did some benchmarking, and discovered very similar performance to what
> > you've described. We saw ~40k writes/sec, and ~20 k reads/sec,
> > per-container, on a Virident SSD. This was without any changelog. Are you
> > using a changelog on the store?
> >
> > When we attached a changelog to the store, the writes dropped
> > significantly (~1000 writes/sec). When we hooked up VisualVM, we saw that
> > the container was spending > 99% of its time in
> KafkaSystemProducer.send().
> >
> > We're currently doing two things:
> >
> > 1. Working with our performance team to understand and tune RocksDB
> > properly.
> > 2. Upgrading the Kafka producer to use the new Java-based API.
> (SAMZA-227)
> >
> > For (1), it seems like we should be able to get a lot higher throughput
> > from RocksDB. Anecdotally, we've heard that RocksDB requires many threads
> > in order to max out an SSD, and since Samza is single-threaded, we could
> > just be hitting a RocksDB bottleneck. We won't know until we dig into the
> > problem (which we started investigating last week). The current plan is
> to
> > start by benchmarking RocksDB JNI outside of Samza, and see what we can
> > get. From there, we'll know our "speed of light", and can try to get
> Samza
> > as close as possible to it. If RocksDB JNI can't be made to go "fast",
> > then we'll have to understand why.
> >
> > (2) should help with the changelog issue. I believe that the slowness
> with
> > the changelog is caused because the changelog is using a sync producer to
> > send to Kafka, and is blocking when a batch is flushed. In the new API,
> > the concept of a "sync" producer is removed. All writes are handled on an
> > async writer thread (though we can still guarantee writes are safely
> > written before checkpointing, which is what we need).
> >
> > In short, I agree, it seems slow. We see this behavior, too. We're
> digging
> > into it.
> >
> > Cheers,
> > Chris
> >
> > On 1/17/15 12:58 PM, "Roger Hoover" <ro...@gmail.com> wrote:
> >
> > >Michael,
> > >
> > >Thanks for the response.  I used VisualVM and YourKit and see the CPU is
> > >not being used (0.1%).  I took a few thread dumps and see the main
> thread
> > >blocked on the flush() method inside the KV store.
> > >
> > >On Sat, Jan 17, 2015 at 7:09 AM, Michael Rose <el...@gmail.com>
> > >wrote:
> > >
> > >> Is your process at 100% CPU? I suspect you're spending most of your
> > >>time in
> > >> JSON deserialization, but profile it and check.
> > >>
> > >> Michael
> > >>
> > >> On Friday, January 16, 2015, Roger Hoover <ro...@gmail.com>
> > >>wrote:
> > >>
> > >> > Hi guys,
> > >> >
> > >> > I'm testing a job that needs to load 40M records (6GB in Kafka as
> > >>JSON)
> > >> > from a bootstrap topic.  The topic has 4 partitions and I'm running
> > >>the
> > >> job
> > >> > using the ProcessJobFactory so all four tasks are in one container.
> > >> >
> > >> > Using RocksDB, it's taking 19 minutes to load all the data which
> > >>amounts
> > >> to
> > >> > 35k records/sec or 5MB/s based on input size.  I ran iostat during
> > >>this
> > >> > time as see the disk write throughput is 14MB/s.
> > >> >
> > >> > I didn't tweak any of the storage settings.
> > >> >
> > >> > A few questions:
> > >> > 1) Does this seem low?  I'm running on a Macbook Pro with SSD.
> > >> > 2) Do you have any recommendations for improving the load speed?
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Roger
> > >> >
> > >>
> >
> >
>



-- 
Thanks and regards

Chinmay Soman

Re: Local state write throughput

Posted by Roger Hoover <ro...@gmail.com>.
Thanks, Chris.

I am not using a changelog for the store because the bootstrap stream
is a master copy of the data and the job can recover from there.  No need
to write out another copy.  Is this the way you typically do it for
stream/table joins?

Great to know that you're looking into the performance issues.  I love
the idea of local state for isolation and predictable throughput but the
current write throughput puts hard limits on the amount of local state that
a container can have without really long initialization/recovery times.

In my tests, LevelDB has about the same performance.  Have you noticed that
as well?

Cheers,

Roger

On Tue, Jan 20, 2015 at 9:28 AM, Chris Riccomini <
criccomini@linkedin.com.invalid> wrote:

> Hey Roger,
>
> We did some benchmarking, and discovered very similar performance to what
> you've described. We saw ~40k writes/sec, and ~20 k reads/sec,
> per-container, on a Virident SSD. This was without any changelog. Are you
> using a changelog on the store?
>
> When we attached a changelog to the store, the writes dropped
> significantly (~1000 writes/sec). When we hooked up VisualVM, we saw that
> the container was spending > 99% of its time in KafkaSystemProducer.send().
>
> We're currently doing two things:
>
> 1. Working with our performance team to understand and tune RocksDB
> properly.
> 2. Upgrading the Kafka producer to use the new Java-based API. (SAMZA-227)
>
> For (1), it seems like we should be able to get a lot higher throughput
> from RocksDB. Anecdotally, we've heard that RocksDB requires many threads
> in order to max out an SSD, and since Samza is single-threaded, we could
> just be hitting a RocksDB bottleneck. We won't know until we dig into the
> problem (which we started investigating last week). The current plan is to
> start by benchmarking RocksDB JNI outside of Samza, and see what we can
> get. From there, we'll know our "speed of light", and can try to get Samza
> as close as possible to it. If RocksDB JNI can't be made to go "fast",
> then we'll have to understand why.
>
> (2) should help with the changelog issue. I believe that the slowness with
> the changelog is caused because the changelog is using a sync producer to
> send to Kafka, and is blocking when a batch is flushed. In the new API,
> the concept of a "sync" producer is removed. All writes are handled on an
> async writer thread (though we can still guarantee writes are safely
> written before checkpointing, which is what we need).
>
> In short, I agree, it seems slow. We see this behavior, too. We're digging
> into it.
>
> Cheers,
> Chris
>
> On 1/17/15 12:58 PM, "Roger Hoover" <ro...@gmail.com> wrote:
>
> >Michael,
> >
> >Thanks for the response.  I used VisualVM and YourKit and see the CPU is
> >not being used (0.1%).  I took a few thread dumps and see the main thread
> >blocked on the flush() method inside the KV store.
> >
> >On Sat, Jan 17, 2015 at 7:09 AM, Michael Rose <el...@gmail.com>
> >wrote:
> >
> >> Is your process at 100% CPU? I suspect you're spending most of your
> >>time in
> >> JSON deserialization, but profile it and check.
> >>
> >> Michael
> >>
> >> On Friday, January 16, 2015, Roger Hoover <ro...@gmail.com>
> >>wrote:
> >>
> >> > Hi guys,
> >> >
> >> > I'm testing a job that needs to load 40M records (6GB in Kafka as
> >>JSON)
> >> > from a bootstrap topic.  The topic has 4 partitions and I'm running
> >>the
> >> job
> >> > using the ProcessJobFactory so all four tasks are in one container.
> >> >
> >> > Using RocksDB, it's taking 19 minutes to load all the data which
> >>amounts
> >> to
> >> > 35k records/sec or 5MB/s based on input size.  I ran iostat during
> >>this
> >> > time as see the disk write throughput is 14MB/s.
> >> >
> >> > I didn't tweak any of the storage settings.
> >> >
> >> > A few questions:
> >> > 1) Does this seem low?  I'm running on a Macbook Pro with SSD.
> >> > 2) Do you have any recommendations for improving the load speed?
> >> >
> >> > Thanks,
> >> >
> >> > Roger
> >> >
> >>
>
>

Re: Local state write throughput

Posted by Chris Riccomini <cr...@linkedin.com.INVALID>.
Hey Roger,

We did some benchmarking, and discovered very similar performance to what
you've described. We saw ~40k writes/sec and ~20k reads/sec per container
on a Virident SSD. This was without any changelog. Are you
using a changelog on the store?

When we attached a changelog to the store, the writes dropped
significantly (~1000 writes/sec). When we hooked up VisualVM, we saw that
the container was spending > 99% of its time in KafkaSystemProducer.send().

We're currently doing two things:

1. Working with our performance team to understand and tune RocksDB
properly.
2. Upgrading the Kafka producer to use the new Java-based API. (SAMZA-227)

For (1), it seems like we should be able to get a lot higher throughput
from RocksDB. Anecdotally, we've heard that RocksDB requires many threads
in order to max out an SSD, and since Samza is single-threaded, we could
just be hitting a RocksDB bottleneck. We won't know until we dig into the
problem (which we started investigating last week). The current plan is to
start by benchmarking RocksDB JNI outside of Samza, and see what we can
get. From there, we'll know our "speed of light", and can try to get Samza
as close as possible to it. If RocksDB JNI can't be made to go "fast",
then we'll have to understand why.
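
To give a flavor of what "benchmarking RocksDB JNI outside of Samza" looks
like, a bare-bones single-threaded write loop is sketched below. This is
not our actual benchmark (the path, key/value sizes, and counts are
arbitrary); it just shows the rocksdbjni calls involved:

    import org.rocksdb.Options;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;

    public class RocksDbWriteBench {
      public static void main(String[] args) throws RocksDBException {
        RocksDB.loadLibrary();
        Options options = new Options().setCreateIfMissing(true);
        RocksDB db = RocksDB.open(options, "/tmp/rocksdb-bench");
        byte[] value = new byte[100];            // ~100-byte payloads
        int n = 1000000;
        long start = System.currentTimeMillis();
        for (int i = 0; i < n; i++) {
          // Sequential keys; swap in hashed keys to measure random writes
          db.put(String.valueOf(i).getBytes(), value);
        }
        long elapsedMs = Math.max(1, System.currentTimeMillis() - start);
        System.out.println((n * 1000L / elapsedMs) + " writes/sec");
        db.close();
      }
    }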

(2) should help with the changelog issue. I believe the slowness with the
changelog is caused by the changelog using a sync producer to send to
Kafka, which blocks when a batch is flushed. In the new API,
the concept of a "sync" producer is removed. All writes are handled on an
async writer thread (though we can still guarantee writes are safely
written before checkpointing, which is what we need).
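
To make the pattern concrete (this is an illustration of the idea, not the
actual Samza code), the new producer returns a future per send, so a
changelog writer can send asynchronously and only block on the outstanding
futures when it needs to flush before a checkpoint:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Properties;
    import java.util.concurrent.Future;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;

    public class AsyncChangelogWriter {
      private final KafkaProducer<byte[], byte[]> producer;
      private final List<Future<RecordMetadata>> pending = new ArrayList<Future<RecordMetadata>>();

      public AsyncChangelogWriter(String brokers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", brokers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        producer = new KafkaProducer<byte[], byte[]>(props);
      }

      // The send is handed to the producer's background I/O thread; we just track the future.
      public void send(String topic, byte[] key, byte[] value) {
        pending.add(producer.send(new ProducerRecord<byte[], byte[]>(topic, key, value)));
      }

      // Block only at checkpoint time, so every changelog write is durable before offsets commit.
      public void flushBeforeCheckpoint() throws Exception {
        for (Future<RecordMetadata> f : pending) {
          f.get();
        }
        pending.clear();
      }
    }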

In short, I agree, it seems slow. We see this behavior, too. We're digging
into it.

Cheers,
Chris

On 1/17/15 12:58 PM, "Roger Hoover" <ro...@gmail.com> wrote:

>Michael,
>
>Thanks for the response.  I used VisualVM and YourKit and see the CPU is
>not being used (0.1%).  I took a few thread dumps and see the main thread
>blocked on the flush() method inside the KV store.
>
>On Sat, Jan 17, 2015 at 7:09 AM, Michael Rose <el...@gmail.com>
>wrote:
>
>> Is your process at 100% CPU? I suspect you're spending most of your
>>time in
>> JSON deserialization, but profile it and check.
>>
>> Michael
>>
>> On Friday, January 16, 2015, Roger Hoover <ro...@gmail.com>
>>wrote:
>>
>> > Hi guys,
>> >
>> > I'm testing a job that needs to load 40M records (6GB in Kafka as
>>JSON)
>> > from a bootstrap topic.  The topic has 4 partitions and I'm running
>>the
>> job
>> > using the ProcessJobFactory so all four tasks are in one container.
>> >
>> > Using RocksDB, it's taking 19 minutes to load all the data which
>>amounts
>> to
>> > 35k records/sec or 5MB/s based on input size.  I ran iostat during
>>this
>> > time as see the disk write throughput is 14MB/s.
>> >
>> > I didn't tweak any of the storage settings.
>> >
>> > A few questions:
>> > 1) Does this seem low?  I'm running on a Macbook Pro with SSD.
>> > 2) Do you have any recommendations for improving the load speed?
>> >
>> > Thanks,
>> >
>> > Roger
>> >
>>


Re: Local state write throughput

Posted by Roger Hoover <ro...@gmail.com>.
Michael,

Thanks for the response.  I used VisualVM and YourKit and saw that the CPU
is barely being used (0.1%).  I took a few thread dumps and saw the main
thread blocked on the flush() method inside the KV store.

On Sat, Jan 17, 2015 at 7:09 AM, Michael Rose <el...@gmail.com>
wrote:

> Is your process at 100% CPU? I suspect you're spending most of your time in
> JSON deserialization, but profile it and check.
>
> Michael
>
> On Friday, January 16, 2015, Roger Hoover <ro...@gmail.com> wrote:
>
> > Hi guys,
> >
> > I'm testing a job that needs to load 40M records (6GB in Kafka as JSON)
> > from a bootstrap topic.  The topic has 4 partitions and I'm running the
> job
> > using the ProcessJobFactory so all four tasks are in one container.
> >
> > Using RocksDB, it's taking 19 minutes to load all the data which amounts
> to
> > 35k records/sec or 5MB/s based on input size.  I ran iostat during this
> > time as see the disk write throughput is 14MB/s.
> >
> > I didn't tweak any of the storage settings.
> >
> > A few questions:
> > 1) Does this seem low?  I'm running on a Macbook Pro with SSD.
> > 2) Do you have any recommendations for improving the load speed?
> >
> > Thanks,
> >
> > Roger
> >
>

Re: Local state write throughput

Posted by Michael Rose <el...@gmail.com>.
Is your process at 100% CPU? I suspect you're spending most of your time in
JSON deserialization, but profile it and check.

Michael

On Friday, January 16, 2015, Roger Hoover <ro...@gmail.com> wrote:

> Hi guys,
>
> I'm testing a job that needs to load 40M records (6GB in Kafka as JSON)
> from a bootstrap topic.  The topic has 4 partitions and I'm running the job
> using the ProcessJobFactory so all four tasks are in one container.
>
> Using RocksDB, it's taking 19 minutes to load all the data which amounts to
> 35k records/sec or 5MB/s based on input size.  I ran iostat during this
> time as see the disk write throughput is 14MB/s.
>
> I didn't tweak any of the storage settings.
>
> A few questions:
> 1) Does this seem low?  I'm running on a Macbook Pro with SSD.
> 2) Do you have any recommendations for improving the load speed?
>
> Thanks,
>
> Roger
>