Posted to dev@spark.apache.org by Jungtaek Lim <ka...@gmail.com> on 2019/10/02 03:21:43 UTC

[DISCUSS] Preferred approach on dealing with SPARK-29322

Hi devs,

I've discovered an issue with the event logger: when reading an incomplete
event log file compressed with 'zstd', the reader thread gets stuck reading
that file.

This is very easy to reproduce: set the configuration as below

- spark.eventLog.enabled=true
- spark.eventLog.compress=true
- spark.eventLog.compression.codec=zstd

and start a Spark application. While the application is running, load the
application in the SHS web page. It may succeed in replaying the event log,
but it is highly likely to get stuck, and the loading page will hang as well.
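For clarity, the same settings expressed as a spark-defaults.conf fragment (deployment details such as the event log directory vary and are omitted here):

```properties
spark.eventLog.enabled              true
spark.eventLog.compress             true
spark.eventLog.compression.codec    zstd
```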

Please refer to SPARK-29322 for more details.

As the issue only occurs with 'zstd', the simplest approach is to drop
support for 'zstd' for the event log. A more general approach would be to
introduce a timeout on reading the event log file, but it would need to
differentiate a stuck thread from a thread busy reading a huge event log file.
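One way to make that differentiation is to key the timeout off read progress rather than wall-clock time alone: only declare the reader stuck if its byte counter has stopped advancing. A minimal sketch of the idea (all names hypothetical; this is not Spark's actual SHS code):

```python
import threading
import time

class ProgressWatchdog:
    """Flags a reader as stuck only if its byte counter stops advancing.

    A thread busy decompressing a huge file keeps advancing the counter,
    so it is never flagged; a thread blocked on an unfinished zstd frame
    makes no progress and eventually trips the timeout.
    """
    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.bytes_read = 0
        self.last_progress_at = time.monotonic()
        self.lock = threading.Lock()

    def record(self, n):
        # Called by the reader thread after each successful read of n bytes.
        with self.lock:
            self.bytes_read += n
            self.last_progress_at = time.monotonic()

    def is_stuck(self):
        # True only if no progress has been made within the timeout window.
        with self.lock:
            return time.monotonic() - self.last_progress_at > self.timeout_s

# Busy reader: keeps reporting progress, so it is never considered stuck.
w = ProgressWatchdog(timeout_s=0.2)
w.record(1024)
assert not w.is_stuck()

# Stuck reader: no progress for longer than the timeout.
time.sleep(0.3)
assert w.is_stuck()
```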

Which approach would be preferred by the Spark community, or would someone
propose a better idea for handling this?

Thanks,
Jungtaek Lim (HeartSaVioR)

Re: [DISCUSS] Preferred approach on dealing with SPARK-29322

Posted by Jungtaek Lim <ka...@gmail.com>.
I'm not 100% sure I understand the question. Assuming that by "both" you are
referring to SPARK-26283 [1] and SPARK-29322 [2]: if you are asking about the
fix, then yes, only the master branch, as the fix for SPARK-26283 was not
ported back to branch-2.4. If you are asking about the issue (the problem)
itself, then maybe no, according to the affected versions of SPARK-26283
(2.4.0 is listed there).

On Wed, Oct 2, 2019 at 11:47 PM Dongjoon Hyun <do...@gmail.com>
wrote:

> Thank you for the investigation and making a fix.
>
> So, both issues are on only master (3.0.0) branch?
>
> Bests,
> Dongjoon.

Re: [DISCUSS] Preferred approach on dealing with SPARK-29322

Posted by Dongjoon Hyun <do...@gmail.com>.
Thank you for the investigation and making a fix.

So, are both issues only on the master (3.0.0) branch?

Bests,
Dongjoon.


On Wed, Oct 2, 2019 at 00:06 Jungtaek Lim <ka...@gmail.com>
wrote:

> FYI: patch submitted - https://github.com/apache/spark/pull/25996

Re: [DISCUSS] Preferred approach on dealing with SPARK-29322

Posted by Jungtaek Lim <ka...@gmail.com>.
FYI: patch submitted - https://github.com/apache/spark/pull/25996


Re: [DISCUSS] Preferred approach on dealing with SPARK-29322

Posted by Jungtaek Lim <ka...@gmail.com>.
I need to do a full manual test to be sure, but according to an experiment
(a small UT), "closeFrameOnFlush" seems to fix it.

There was a relevant change on the master branch, SPARK-26283 [1], which
changed the way the zstd event log file is read to "continuous", which
appears to read the open frame. With "closeFrameOnFlush" set to false for
ZstdOutputStream, the frame is never closed (even when the output stream is
flushed) unless the output stream itself is closed.
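To illustrate the mechanism with a toy model (this is Python pretending to be a framed compressor; it is not zstd itself and not Spark's code): data buffered inside the current frame is not decodable by a streaming reader until the frame is closed, so if flush() leaves the frame open, an in-progress event log never yields anything readable.

```python
import io
import struct

class FrameOutputStream:
    """Toy framed writer: data is decodable only once its frame is closed."""
    def __init__(self, sink, close_frame_on_flush=False):
        self.sink = sink
        self.close_frame_on_flush = close_frame_on_flush
        self.buf = bytearray()

    def write(self, data):
        self.buf += data          # stays inside the open frame

    def _close_frame(self):
        if self.buf:
            # emit a length-prefixed "frame"
            self.sink.write(struct.pack(">I", len(self.buf)) + bytes(self.buf))
            self.buf.clear()

    def flush(self):
        if self.close_frame_on_flush:
            self._close_frame()   # analogous to closeFrameOnFlush=true
        self.sink.flush()

    def close(self):
        self._close_frame()

def decodable_bytes(data):
    """What a streaming reader can decode: completed frames only."""
    total = i = 0
    while i + 4 <= len(data):
        n = struct.unpack(">I", data[i:i + 4])[0]
        if i + 4 + n > len(data):
            break                 # open/partial frame: the reader must wait
        total += n
        i += 4 + n
    return total

# close_frame_on_flush=False: flush leaves the frame open; the reader sees
# nothing and blocks, like the SHS reader on an incomplete zstd log.
sink = io.BytesIO()
out = FrameOutputStream(sink, close_frame_on_flush=False)
out.write(b"event log line")
out.flush()
assert decodable_bytes(sink.getvalue()) == 0

# close_frame_on_flush=True: each flush seals a frame the reader can consume.
sink2 = io.BytesIO()
out2 = FrameOutputStream(sink2, close_frame_on_flush=True)
out2.write(b"event log line")
out2.flush()
assert decodable_bytes(sink2.getvalue()) == len(b"event log line")
```

In zstd terms the frame is the unit of independent decompression; the behavior described in the message corresponds to the close_frame_on_flush=True branch above.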

I'll raise a patch once the manual test passes. Sorry for the false alarm.

Thanks,
Jungtaek Lim (HeartSaVioR)

1. https://issues.apache.org/jira/browse/SPARK-26283


Re: [DISCUSS] Preferred approach on dealing with SPARK-29322

Posted by Jungtaek Lim <ka...@gmail.com>.
The change log for zstd v1.4.3 suggests to me that the changes are not
related.

https://github.com/facebook/zstd/blob/dev/CHANGELOG#L1-L5

v1.4.3
bug: Fix Dictionary Compression Ratio Regression by @cyan4973 (#1709)
bug: Fix Buffer Overflow in v0.3 Decompression by @felixhandte (#1722)
build: Add support for IAR C/C++ Compiler for Arm by @joseph0918 (#1705)
misc: Add NULL pointer check in util.c by @leeyoung624 (#1706)

But it's only a matter of a dependency update and rebuild, so I'll try it
out.

Before that, I just noticed that ZstdOutputStream has a parameter,
"closeFrameOnFlush", which seems to deal with flushing. We leave it at its
default value, which is "false". Let me set the value to "true" and see if
it helps. Please let me know if someone knows why we picked false (or left
it at the default).


On Wed, Oct 2, 2019 at 1:48 PM Dongjoon Hyun <do...@gmail.com>
wrote:

> Thank you for reporting, Jungtaek.
>
> Can we try to upgrade it to the newer version first?
>
> Since we are at 1.4.2, the newer version is 1.4.3.
>
> Bests,
> Dongjoon.

Re: [DISCUSS] Preferred approach on dealing with SPARK-29322

Posted by Dongjoon Hyun <do...@gmail.com>.
Thank you for reporting, Jungtaek.

Can we try to upgrade it to the newer version first?

Since we are at 1.4.2, the newer version is 1.4.3.
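Trying the upgrade is just a dependency bump, e.g. in a Maven build (a sketch; the exact version suffix and where Spark pins it in its parent pom may differ):

```xml
<dependency>
  <groupId>com.github.luben</groupId>
  <artifactId>zstd-jni</artifactId>
  <version>1.4.3-1</version>
</dependency>
```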

Bests,
Dongjoon.




Re: [DISCUSS] Preferred approach on dealing with SPARK-29322

Posted by Mridul Muralidharan <mr...@gmail.com>.
It makes more sense to drop support for zstd, assuming the fix is not
something at the Spark end (configuration, etc.).
It does not make sense to try to detect a deadlock in the codec.

Regards,
Mridul


---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org