You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@spark.apache.org by Jungtaek Lim <ka...@gmail.com> on 2020/06/16 06:40:32 UTC

Re: [DISCUSS] remove the incomplete code path on aggregation for continuous mode

Bump this again. I filed SPARK-31985 [1] and plan to submit a PR in a
couple of days if there's no voice on the reason we should keep it.

1. https://issues.apache.org/jira/browse/SPARK-31985

On Thu, May 21, 2020 at 8:54 AM Jungtaek Lim <ka...@gmail.com>
wrote:

> Let me share the effect on removing the incomplete and undocumented code
> path. I manually tried out removing the code path and here's the change.
>
>
> https://github.com/HeartSaVioR/spark/commit/aa53e9b1b33c0b8aec37704ad290b42ffb2962d8
>
> 1,120 lines deleted without hurting any existing streaming tests, except a
> suite I removed as well since it tests such code path. No need to update
> documentation as it was never publicized. This also removes some rules
> which only apply for continuous mode, which gave exceptions on the fact
> that "continuous mode is only available for map-like operations". Removing
> them would be back to simplify the usage of continuous mode.
>
> Also worth noting that I had to manually remove the code path instead of
> revert, because the code path has been changed to reflect DSv2 change. What
> this means? We have to "update" the code path and concern
> about compatibility, etc. while it never be used in production.
>
> On Tue, May 19, 2020 at 1:14 PM Jungtaek Lim <ka...@gmail.com>
> wrote:
>
>> Hi devs,
>>
>> during experiment on complete mode I realized we left some incomplete
>> code parts on supporting aggregation for continuous mode. (shuffle &
>> coalesce)
>>
>> The work had been occurred around first half of 2018 and stopped, and no
>> work has been done for around 2 years (so I don't expect anyone is working
>> on this). The functionality is undocumented (as the work was only done
>> partially) and continuous mode is experimental so I don't feel risks to get
>> rid of the part.
>>
>> What do you think? If it makes sense then I'll raise a PR to get rid of
>> the incomplete codes.
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>> ps. I'm kind of feeling the same with continuous mode itself - it has
>> been still experimental over the 2 years, and while it lacks lots of
>> essential things (backpressure, epoch synchronization, query progress,
>> etc.) there's no additional major work over 1 year. We eventually have been
>> excluded continuous mode for the new streaming feature like observable
>> metric, streaming UI, etc. because it is on top of the feature
>> which continuous mode lacks.
>>
>> Unlike incomplete code path, I'm not strongly against this, as it
>> has been documented and someone might use the feature in production. I
>> still think it's ideal to retire the feature smoothly.
>> (Please chime in with the use case if someone has the production cases.)
>>
>> ps2. It feels like "feature branch" looks to be a thing to consider for
>> efforts on one big feature.
>>
>

Re: [DISCUSS] remove the incomplete code path on aggregation for continuous mode

Posted by Jungtaek Lim <ka...@gmail.com>.

Just submitted the patch: https://github.com/apache/spark/pull/29077

On Tue, Jun 16, 2020 at 3:40 PM Jungtaek Lim <ka...@gmail.com>
wrote:

> Bump this again. I filed SPARK-31985 [1] and plan to submit a PR in a
> couple of days if there's no voice on the reason we should keep it.
>
> 1. https://issues.apache.org/jira/browse/SPARK-31985
>
> On Thu, May 21, 2020 at 8:54 AM Jungtaek Lim <ka...@gmail.com>
> wrote:
>
>> Let me share the effect on removing the incomplete and undocumented code
>> path. I manually tried out removing the code path and here's the change.
>>
>>
>> https://github.com/HeartSaVioR/spark/commit/aa53e9b1b33c0b8aec37704ad290b42ffb2962d8
>>
>> 1,120 lines deleted without hurting any existing streaming tests, except
>> a suite I removed as well since it tests such code path. No need to update
>> documentation as it was never publicized. This also removes some rules
>> which only apply for continuous mode, which gave exceptions on the fact
>> that "continuous mode is only available for map-like operations". Removing
>> them would be back to simplify the usage of continuous mode.
>>
>> Also worth noting that I had to manually remove the code path instead of
>> revert, because the code path has been changed to reflect DSv2 change. What
>> this means? We have to "update" the code path and concern
>> about compatibility, etc. while it never be used in production.
>>
>> On Tue, May 19, 2020 at 1:14 PM Jungtaek Lim <
>> kabhwan.opensource@gmail.com> wrote:
>>
>>> Hi devs,
>>>
>>> during experiment on complete mode I realized we left some incomplete
>>> code parts on supporting aggregation for continuous mode. (shuffle &
>>> coalesce)
>>>
>>> The work had been occurred around first half of 2018 and stopped, and no
>>> work has been done for around 2 years (so I don't expect anyone is working
>>> on this). The functionality is undocumented (as the work was only done
>>> partially) and continuous mode is experimental so I don't feel risks to get
>>> rid of the part.
>>>
>>> What do you think? If it makes sense then I'll raise a PR to get rid of
>>> the incomplete codes.
>>>
>>> Thanks,
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>> ps. I'm kind of feeling the same with continuous mode itself - it has
>>> been still experimental over the 2 years, and while it lacks lots of
>>> essential things (backpressure, epoch synchronization, query progress,
>>> etc.) there's no additional major work over 1 year. We eventually have been
>>> excluded continuous mode for the new streaming feature like observable
>>> metric, streaming UI, etc. because it is on top of the feature
>>> which continuous mode lacks.
>>>
>>> Unlike incomplete code path, I'm not strongly against this, as it
>>> has been documented and someone might use the feature in production. I
>>> still think it's ideal to retire the feature smoothly.
>>> (Please chime in with the use case if someone has the production cases.)
>>>
>>> ps2. It feels like "feature branch" looks to be a thing to consider for
>>> efforts on one big feature.
>>>
>>