Posted to dev@beam.apache.org by David Morávek <dm...@apache.org> on 2018/12/20 11:27:02 UTC

Re: Spark-optimized Shuffle (SOS) any update?

This is awesome news! Is there anything we can do to help? We are
currently facing huge performance penalties due to this issue.

Thanks,
David

On Wed, Dec 19, 2018 at 5:43 PM Ilan Filonenko <if...@cornell.edu> wrote:

> Recently, the community has actively been working on this. The JIRA to
> follow is:
> https://issues.apache.org/jira/browse/SPARK-25299. A group of companies
> including Bloomberg and Palantir is working on a WIP solution that
> implements a variant of Option #5 (which is elaborated upon in the
> Google doc linked in the JIRA summary).
>
> On Wed, Dec 19, 2018 at 5:20 AM <ma...@seznam.cz> wrote:
>
>> Hi everyone,
>>     we are facing the same problems Facebook had, where the shuffle
>> service is a bottleneck. For now we have worked around it with a large
>> task size (2g) to reduce shuffle I/O.
>>
>> I saw a very nice presentation from Brian Cho on optimizing shuffle I/O
>> at large scale[1]. It is an implementation of a white paper[2].
>> At the end of the lecture, Brian Cho kindly mentioned plans to
>> contribute it back to Spark[3]. I checked the mailing list and the Spark
>> JIRA and didn't find any ticket on this topic.
>>
>> Please, does anyone have a contact for someone from Facebook who might
>> know more about this? Or are there plans to bring a similar optimization
>> to Spark?
>>
>> [1] https://databricks.com/session/sos-optimizing-shuffle-i-o
>> [2] https://haoyuzhang.org/publications/riffle-eurosys18.pdf
>> [3]
>> https://image.slidesharecdn.com/5brianchoerginseyfe-180613004126/95/sos-optimizing-shuffle-io-with-brian-cho-and-ergin-seyfe-30-638.jpg?cb=1528850545
>>
>

Re: Spark-optimized Shuffle (SOS) any update?

Posted by David Morávek <dm...@apache.org>.
I've just quickly gone through the design doc, and it seems to have
nothing to do with the paper Marek mentioned. The paper tries to solve
the problem that the number of I/O operations required for shuffle grows
quadratically with the number of tasks (shuffle files), which is why we
need to keep the number of tasks low.
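The quadratic growth can be sketched with a quick back-of-the-envelope
calculation (the function name and task counts below are purely
illustrative, not taken from the paper or from Spark's code):

```python
# Back-of-the-envelope: shuffle fetch count grows quadratically with
# task count. Hypothetical numbers, for illustration only.

def shuffle_fetches(num_map_tasks: int, num_reduce_tasks: int) -> int:
    """Each reduce task fetches one output block from every map task."""
    return num_map_tasks * num_reduce_tasks

# Doubling parallelism in both stages quadruples the number of small reads:
small = shuffle_fetches(1_000, 1_000)  # 1,000,000 fetches
large = shuffle_fetches(2_000, 2_000)  # 4,000,000 fetches
print(large // small)  # 4
```

This is why fewer, larger tasks (as Marek did with 2g tasks) reduce the
random-I/O pressure on the shuffle service, at the cost of parallelism.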

Am I missing something?

Thanks,
D.

On Thu, Dec 20, 2018 at 12:27 PM David Morávek <dm...@apache.org> wrote:

> This is awesome news! Is there anything we can do to help? We are
> currently facing huge performance penalties due to this issue.
>
> Thanks,
> David
>
> On Wed, Dec 19, 2018 at 5:43 PM Ilan Filonenko <if...@cornell.edu> wrote:
>
>> Recently, the community has actively been working on this. The JIRA to
>> follow is:
>> https://issues.apache.org/jira/browse/SPARK-25299. A group of companies
>> including Bloomberg and Palantir is working on a WIP solution that
>> implements a variant of Option #5 (which is elaborated upon in the
>> Google doc linked in the JIRA summary).
>>
>> On Wed, Dec 19, 2018 at 5:20 AM <ma...@seznam.cz> wrote:
>>
>>> Hi everyone,
>>>     we are facing the same problems Facebook had, where the shuffle
>>> service is a bottleneck. For now we have worked around it with a large
>>> task size (2g) to reduce shuffle I/O.
>>>
>>> I saw a very nice presentation from Brian Cho on optimizing shuffle
>>> I/O at large scale[1]. It is an implementation of a white paper[2].
>>> At the end of the lecture, Brian Cho kindly mentioned plans to
>>> contribute it back to Spark[3]. I checked the mailing list and the
>>> Spark JIRA and didn't find any ticket on this topic.
>>>
>>> Please, does anyone have a contact for someone from Facebook who might
>>> know more about this? Or are there plans to bring a similar
>>> optimization to Spark?
>>>
>>> [1] https://databricks.com/session/sos-optimizing-shuffle-i-o
>>> [2] https://haoyuzhang.org/publications/riffle-eurosys18.pdf
>>> [3]
>>> https://image.slidesharecdn.com/5brianchoerginseyfe-180613004126/95/sos-optimizing-shuffle-io-with-brian-cho-and-ergin-seyfe-30-638.jpg?cb=1528850545
>>>
>>