You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@samza.apache.org by Jake Maes <ja...@gmail.com> on 2016/06/01 02:22:45 UTC

Review Request 48109: SAMZA-957 Avoid unnecessary KV Store flushes (part 3)

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48109/
-----------------------------------------------------------

Review request for samza, Boris Shkolnik, Chris Pettitt, Jake Maes, Navina Ramesh, Jagadish Venkatraman, Xinyu Liu, and Yi Pan (Data Infrastructure).


Bugs: SAMZA-957
    https://issues.apache.org/jira/browse/SAMZA-957


Repository: samza


Description
-------

SAMZA-957 Avoid unnecessary KV Store flushes (part 3)


Diffs
-----

  samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala c28f8db8cb59bd5415e78535877acc1e5bee0f67 
  samza-kv/src/test/scala/org/apache/samza/storage/kv/TestCachedStore.scala eee74473726cb2a36d0b75fe5c9df737440980bc 

Diff: https://reviews.apache.org/r/48109/diff/


Testing
-------


Thanks,

Jake Maes


Re: Review Request 48109: SAMZA-957 Avoid unnecessary KV Store flushes (part 3)

Posted by Jake Maes <ja...@gmail.com>.

> On June 1, 2016, 3:20 p.m., Chris Pettitt wrote:
> > samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala, line 179
> > <https://reviews.apache.org/r/48109/diff/1/?file=1402973#file1402973line179>
> >
> >     I think you could even get away with returning true here only if the key is an array and we're already doing the check earlier. That would also allow the hasArrayKeys method to be dropped along with the containsArrayKeys field, which would later need to be guarded by a lock or made volatile. This would be an improvement but is tangential to your change. It would be fine to bias towards safety in doubt.

I like the idea of getting rid of hasArrayKeys, but I'm not sure how to get rid of containsArrayKeys if we want to preserve 1) printing the warning only once, 2) performance of put(). Although instanceOf is supposed to be fast, it's probably not as fast as checking a boolean. Any suggestions?

As far as only returning true if the key is an array, that would basically disable the write batching. The current code seems to purge dirty values if there are writeBatchSize dirty keys or if the cache is full and the LRU item is dirty. So the check above is similar, but for a different purpose.


- Jake


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48109/#review135803
-----------------------------------------------------------


On June 1, 2016, 2:23 a.m., Jake Maes wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48109/
> -----------------------------------------------------------
> 
> (Updated June 1, 2016, 2:23 a.m.)
> 
> 
> Review request for samza, Boris Shkolnik, Chris Pettitt, Jake Maes, Navina Ramesh, Jagadish Venkatraman, Xinyu Liu, and Yi Pan (Data Infrastructure).
> 
> 
> Bugs: SAMZA-957
>     https://issues.apache.org/jira/browse/SAMZA-957
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> SAMZA-957 Avoid unnecessary KV Store flushes (part 3)
> 
> 
> Diffs
> -----
> 
>   samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala c28f8db8cb59bd5415e78535877acc1e5bee0f67 
>   samza-kv/src/test/scala/org/apache/samza/storage/kv/TestCachedStore.scala eee74473726cb2a36d0b75fe5c9df737440980bc 
> 
> Diff: https://reviews.apache.org/r/48109/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew build
> 
> 
> Thanks,
> 
> Jake Maes
> 
>


Re: Review Request 48109: SAMZA-957 Avoid unnecessary KV Store flushes (part 3)

Posted by Chris Pettitt <cp...@linkedin.com>.

> On June 1, 2016, 3:20 p.m., Chris Pettitt wrote:
> > samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala, line 179
> > <https://reviews.apache.org/r/48109/diff/1/?file=1402973#file1402973line179>
> >
> >     I think you could even get away with returning true here only if the key is an array and we're already doing the check earlier. That would also allow the hasArrayKeys method to be dropped along with the containsArrayKeys field, which would later need to be guarded by a lock or made volatile. This would be an improvement but is tangential to your change. It would be fine to bias towards safety in doubt.
> 
> Jake Maes wrote:
>     I like the idea of getting rid of hasArrayKeys, but I'm not sure how to get rid of containsArrayKeys if we want to preserve 1) printing the warning only once, 2) performance of put(). Although instanceOf is supposed to be fast, it's probably not as fast as checking a boolean. Any suggestions?
>     
>     As far as only returning true if the key is an array, that would basically disable the write batching. The current code seems to purge dirty values if there are writeBatchSize dirty keys or if the cache is full and the LRU item is dirty. So the check above is similar, but for a different purpose.

You're right. We must take the hit of some additional state if we want to log a warning at runtime. Regarding the second paragraph, I wasn't intending to change the behavior for the other branches.


- Chris


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48109/#review135803
-----------------------------------------------------------


On June 1, 2016, 2:23 a.m., Jake Maes wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48109/
> -----------------------------------------------------------
> 
> (Updated June 1, 2016, 2:23 a.m.)
> 
> 
> Review request for samza, Boris Shkolnik, Chris Pettitt, Jake Maes, Navina Ramesh, Jagadish Venkatraman, Xinyu Liu, and Yi Pan (Data Infrastructure).
> 
> 
> Bugs: SAMZA-957
>     https://issues.apache.org/jira/browse/SAMZA-957
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> SAMZA-957 Avoid unnecessary KV Store flushes (part 3)
> 
> 
> Diffs
> -----
> 
>   samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala c28f8db8cb59bd5415e78535877acc1e5bee0f67 
>   samza-kv/src/test/scala/org/apache/samza/storage/kv/TestCachedStore.scala eee74473726cb2a36d0b75fe5c9df737440980bc 
> 
> Diff: https://reviews.apache.org/r/48109/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew build
> 
> 
> Thanks,
> 
> Jake Maes
> 
>


Re: Review Request 48109: SAMZA-957 Avoid unnecessary KV Store flushes (part 3)

Posted by Chris Pettitt <cp...@linkedin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48109/#review135803
-----------------------------------------------------------


Ship it!





samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala (line 173)
<https://reviews.apache.org/r/48109/#comment200849>

    I think you could even get away with returning true here only if the key is an array and we're already doing the check earlier. That would also allow the hasArrayKeys method to be dropped along with the containsArrayKeys field, which would later need to be guarded by a lock or made volatile. This would be an improvement but is tangential to your change. It would be fine to bias towards safety in doubt.


- Chris Pettitt


On June 1, 2016, 2:23 a.m., Jake Maes wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48109/
> -----------------------------------------------------------
> 
> (Updated June 1, 2016, 2:23 a.m.)
> 
> 
> Review request for samza, Boris Shkolnik, Chris Pettitt, Jake Maes, Navina Ramesh, Jagadish Venkatraman, Xinyu Liu, and Yi Pan (Data Infrastructure).
> 
> 
> Bugs: SAMZA-957
>     https://issues.apache.org/jira/browse/SAMZA-957
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> SAMZA-957 Avoid unnecessary KV Store flushes (part 3)
> 
> 
> Diffs
> -----
> 
>   samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala c28f8db8cb59bd5415e78535877acc1e5bee0f67 
>   samza-kv/src/test/scala/org/apache/samza/storage/kv/TestCachedStore.scala eee74473726cb2a36d0b75fe5c9df737440980bc 
> 
> Diff: https://reviews.apache.org/r/48109/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew build
> 
> 
> Thanks,
> 
> Jake Maes
> 
>


Re: Review Request 48109: SAMZA-957 Avoid unnecessary KV Store flushes (part 3)

Posted by "Yi Pan (Data Infrastructure)" <yi...@linkedin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48109/#review136036
-----------------------------------------------------------


Ship it!




lgtm. Thanks!

- Yi Pan (Data Infrastructure)


On June 1, 2016, 2:23 a.m., Jake Maes wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48109/
> -----------------------------------------------------------
> 
> (Updated June 1, 2016, 2:23 a.m.)
> 
> 
> Review request for samza, Boris Shkolnik, Chris Pettitt, Jake Maes, Navina Ramesh, Jagadish Venkatraman, Xinyu Liu, and Yi Pan (Data Infrastructure).
> 
> 
> Bugs: SAMZA-957
>     https://issues.apache.org/jira/browse/SAMZA-957
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> SAMZA-957 Avoid unnecessary KV Store flushes (part 3)
> 
> 
> Diffs
> -----
> 
>   samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala c28f8db8cb59bd5415e78535877acc1e5bee0f67 
>   samza-kv/src/test/scala/org/apache/samza/storage/kv/TestCachedStore.scala eee74473726cb2a36d0b75fe5c9df737440980bc 
> 
> Diff: https://reviews.apache.org/r/48109/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew build
> 
> 
> Thanks,
> 
> Jake Maes
> 
>


Re: Review Request 48109: SAMZA-957 Avoid unnecessary KV Store flushes (part 3)

Posted by Jake Maes <ja...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48109/
-----------------------------------------------------------

(Updated June 1, 2016, 2:23 a.m.)


Review request for samza, Boris Shkolnik, Chris Pettitt, Jake Maes, Navina Ramesh, Jagadish Venkatraman, Xinyu Liu, and Yi Pan (Data Infrastructure).


Bugs: SAMZA-957
    https://issues.apache.org/jira/browse/SAMZA-957


Repository: samza


Description
-------

SAMZA-957 Avoid unnecessary KV Store flushes (part 3)


Diffs
-----

  samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala c28f8db8cb59bd5415e78535877acc1e5bee0f67 
  samza-kv/src/test/scala/org/apache/samza/storage/kv/TestCachedStore.scala eee74473726cb2a36d0b75fe5c9df737440980bc 

Diff: https://reviews.apache.org/r/48109/diff/


Testing (updated)
-------

./gradlew build


Thanks,

Jake Maes