Posted to dev@yunikorn.apache.org by Craig Condit <cc...@apache.org> on 2022/12/20 00:18:19 UTC

[DISCUSS] Removal of periodic state dumps

All,

I’d like to open a discussion about the future of the periodic state dump feature. To jumpstart the discussion, I opened https://issues.apache.org/jira/browse/YUNIKORN-1483, which is copied below for context. In the process of writing this up, it seems to me that we might actually be better off simply removing the feature, and relying solely on the REST API to retrieve state dumps on demand.

In the current state, periodic state dumps need to be enabled, at which point they write to a local filesystem within the YuniKorn scheduler. This maps onto ephemeral storage, so to avoid out-of-space scenarios, an administrator needs to customize the YK Helm deployment with additional resource quota. Additionally, to even access the dumps, the filesystem needs to be mounted as a persistent volume and external code written to interact with the saved dumps. Given the mixed text-and-json format of these dumps, this can be rather complicated.

Alternatively, users could simply deploy a cron container which pulls the state dump on-demand from the existing REST API. This ends up being considerably simpler.
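
For illustration only, here is a minimal Python sketch of such a pull job. The service URL (host yunikorn-service, port 9080, path /ws/v1/fullstatedump) is an assumption about a typical deployment and should be checked against the REST API documentation for the version in use. The function names are hypothetical. The file name borrows the yunikorn-state-dump-YYYYMMDD-HHMM.json pattern from the write-up below.

```python
import json
import urllib.request
from datetime import datetime, timezone
from pathlib import Path

# Assumption: the scheduler's REST service is reachable at this address.
# Adjust host, port, and path to match the actual deployment.
STATE_DUMP_URL = "http://yunikorn-service:9080/ws/v1/fullstatedump"


def fetch_state_dump(url: str = STATE_DUMP_URL, timeout: float = 30.0) -> bytes:
    """Fetch the raw state dump body from the REST API."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def save_state_dump(out_dir: Path, fetch=fetch_state_dump, now=None) -> Path:
    """Fetch one dump, check that it parses as JSON, and write it to a timestamped file."""
    body = fetch()
    json.loads(body)  # raises ValueError if the response is not valid JSON
    now = now or datetime.now(timezone.utc)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"yunikorn-state-dump-{now:%Y%m%d-%H%M}.json"
    target.write_bytes(body)
    return target
```

Run from a Kubernetes CronJob or any other scheduler, save_state_dump(Path("/dumps")) writes one file per invocation. Where the files live and how long they are kept is then the job's concern rather than the scheduler's.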

Are there objections to removing the existing periodic state dump functionality? Existing users who would be impacted greatly? To be clear, I’m not proposing removing the state dump itself; the version available via the REST API has proven extremely valuable. All that is on the table is removal of the automatic, periodic state dump which writes to local files.

Looking forward to feedback,

Craig



------------------------------------
YUNIKORN-1483 write-up:

The current support for generating periodic state dumps implemented in YUNIKORN-940 <https://issues.apache.org/jira/browse/YUNIKORN-940> has several warts:
The configuration in YUNIKORN-949 <https://issues.apache.org/jira/browse/YUNIKORN-949> is done via the core scheduler configuration, leading to a random option on partitions which doesn't belong there and has nothing to do with scheduling.
Changing the frequency of the state dumps is done via the unsecured REST API. This is a potential denial-of-service vector.
Configuration V2 is now complete, which standardizes on using a ConfigMap to configure all YuniKorn options that make sense to be reconfigured. However, allowing the location to be changed at runtime makes no sense in a containerized environment.
Retrieving the state dumps requires mounting of external storage. This is necessarily a site-specific configuration and currently requires a custom Helm deployment.
The state dumps, though JSON, are emitted as text files with JSON appended to them, making parsing difficult.
To address these issues:
Deprecate existing REST API configuration for frequency, and make it a no-op now for security reasons. We can remove it completely in 2.0.
Deprecate the statedumpfilepath option on partitions. Ignore it for security reasons now (and warn if found), and remove completely in 2.0.
Disable the feature by default. To enable it, we should require setting a specific environment variable:
YUNIKORN_STATE_DUMP_LOCATION=/path/to/dir : This would be required to enable the feature at all. Making it an env var makes sense as it is not an option that should be reconfigured (or even visible) in configuration.
Via configmap, we should allow the feature to be enabled / disabled and its frequency set. These options would have no effect if YUNIKORN_STATE_DUMP_LOCATION is not defined:
periodicStateDump.enabled: "true" | "false" (default "false")
periodicStateDump.frequency: "15m" (default value, do not allow more frequently than 1m intervals)
periodicStateDump.count: 10 (default value)
Create an empty directory /yunikorn-state in the Docker image to store state dumps.
Add support to Helm for enabling state dump support as well as setting custom mount options (including quota). Enabling support should set the env var YUNIKORN_STATE_DUMP_LOCATION=/yunikorn-state and mount this directory via the options specified.
Output a single JSON file per dump, and remove the oldest files until no more than periodicStateDump.count remain: yunikorn-state-dump-YYYYMMDD-HHMM.json
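
The gating described above (environment variable first, then the ConfigMap options) could be sketched roughly as follows. This is a hypothetical Python illustration, not scheduler code: the names resolve_settings, parse_duration and StateDumpSettings are made up, only single-unit durations such as "15m" are handled, and a frequency below one minute is clamped to one minute, whereas the proposal only says shorter intervals should not be allowed.

```python
import re
from dataclasses import dataclass
from datetime import timedelta
from typing import Mapping, Optional

ENV_LOCATION = "YUNIKORN_STATE_DUMP_LOCATION"
DEFAULT_FREQUENCY = timedelta(minutes=15)
MIN_FREQUENCY = timedelta(minutes=1)
DEFAULT_COUNT = 10
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_duration(text: str) -> timedelta:
    """Parse a single-unit duration such as "90s", "15m" or "2h"."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})


@dataclass(frozen=True)
class StateDumpSettings:
    location: str
    frequency: timedelta
    count: int


def resolve_settings(env: Mapping[str, str],
                     config: Mapping[str, str]) -> Optional[StateDumpSettings]:
    """Return the effective settings, or None when periodic dumps are off."""
    location = env.get(ENV_LOCATION)
    if not location:
        return None  # ConfigMap options have no effect without the env var
    if config.get("periodicStateDump.enabled", "false") != "true":
        return None  # disabled by default
    raw = config.get("periodicStateDump.frequency")
    frequency = parse_duration(raw) if raw is not None else DEFAULT_FREQUENCY
    count = int(config.get("periodicStateDump.count", DEFAULT_COUNT))
    if count < 1:
        raise ValueError("periodicStateDump.count must be at least 1")
    # Assumption: clamp rather than reject intervals shorter than one minute.
    return StateDumpSettings(location, max(frequency, MIN_FREQUENCY), count)
```

Keeping the location out of the ConfigMap means a configuration reload can toggle the feature or change its cadence, but can never redirect where files are written.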

Re: [DISCUSS] Removal of periodic state dumps

Posted by Wilfred Spiegelenburg <wi...@apache.org>.
I am in favor of just having the REST option as per YUNIKORN-1500 [1].
Volume mounting, maintaining files, etc. is more involved than setting up
a simple job that pulls the detail on a schedule.
The code to support writing to a volume properly could become complex as
state dumps might grow and handling out of space while still within
parameters becomes a complex case.

I have approved the change for YUNIKORN-1500 and will commit soon.

Wilfred

[1] https://issues.apache.org/jira/browse/YUNIKORN-1500

On Tue, 20 Dec 2022 at 12:29, Weiwei Yang <ww...@apache.org> wrote:

> +1 to remove this, I don't believe users heavily depend on this, but let's
> keep this thread for at least a few days to collect feedback.
>
> Thanks
> Weiwei
>

Re: [DISCUSS] Removal of periodic state dumps

Posted by Weiwei Yang <ww...@apache.org>.
+1 to remove this, I don't believe users heavily depend on this, but let's
keep this thread for at least a few days to collect feedback.

Thanks
Weiwei

On Mon, Dec 19, 2022 at 4:21 PM Craig Condit <cc...@apache.org> wrote:

> Re-adding existing YUNIKORN-1483 text as formatting broke badly. I’m not
> proposing this is the way to go, just referencing the JIRA for discussion:
>
>

Re: [DISCUSS] Removal of periodic state dumps

Posted by Craig Condit <cc...@apache.org>.
Re-adding existing YUNIKORN-1483 text as formatting broke badly. I’m not proposing this is the way to go, just referencing the JIRA for discussion:


<SNIP>

The current support for generating periodic state dumps implemented in YUNIKORN-940 has several warts:

1. The configuration in YUNIKORN-949 is done via the core scheduler configuration, leading to a random option on partitions which doesn't belong there and has nothing to do with scheduling.

2. Changing the frequency of the state dumps is done via the unsecured REST API. This is a potential denial-of-service vector.

3. Configuration V2 is now complete, which standardizes on using a ConfigMap to configure all YuniKorn options that make sense to be reconfigured. However, allowing the location to be changed at runtime makes no sense in a containerized environment.

4. Retrieving the state dumps requires mounting of external storage. This is necessarily a site-specific configuration and currently requires a custom Helm deployment.

5. The state dumps, though JSON, are emitted as text files with JSON appended to them, making parsing difficult.

To address these issues:

1. Deprecate existing REST API configuration for frequency, and make it a no-op now for security reasons. We can remove it completely in 2.0.

2. Deprecate the statedumpfilepath option on partitions. Ignore it for security reasons now (and warn if found), and remove completely in 2.0.

3. Disable the feature by default. To enable it, we should require setting a specific environment variable:
   - YUNIKORN_STATE_DUMP_LOCATION=/path/to/dir : This would be required to enable the feature at all. Making it an env var makes sense as it is not an option that should be reconfigured (or even visible) in configuration.

4. Via configmap, we should allow the feature to be enabled / disabled and its frequency set. These options would have no effect if YUNIKORN_STATE_DUMP_LOCATION is not defined:
  - periodicStateDump.enabled: "true" | "false" (default "false")
  - periodicStateDump.frequency: "15m" (default value, do not allow more frequently than 1m intervals)
  - periodicStateDump.count: 10 (default value)

5. Create an empty directory /yunikorn-state in the Docker image to store state dumps.

6. Add support to Helm for enabling state dump support as well as setting custom mount options (including quota). Enabling support should set the env var YUNIKORN_STATE_DUMP_LOCATION=/yunikorn-state and mount this directory via the options specified.

7. Output a single JSON file per dump, and remove the oldest files until no more than periodicStateDump.count remain: yunikorn-state-dump-YYYYMMDD-HHMM.json
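
The retention rule in item 7 is small enough to sketch. The following Python is a hypothetical illustration (prune_state_dumps is a made-up name) and relies on the fact that YYYYMMDD-HHMM timestamps sort chronologically as plain strings, so no file metadata is needed to find the oldest dumps.

```python
from pathlib import Path
from typing import List

PREFIX = "yunikorn-state-dump-"


def prune_state_dumps(directory: Path, keep: int) -> List[Path]:
    """Delete the oldest dump files until at most `keep` remain.

    Returns the paths that were removed, oldest first.
    """
    if keep < 0:
        raise ValueError("keep must not be negative")
    # File names embed YYYYMMDD-HHMM, so name order is chronological order.
    dumps = sorted(directory.glob(f"{PREFIX}*.json"), key=lambda p: p.name)
    excess = dumps[: max(len(dumps) - keep, 0)]
    for path in excess:
        path.unlink()
    return excess
```

Only files matching the dump name pattern are considered, so anything else stored in the same directory is left alone.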

</SNIP>



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@yunikorn.apache.org
For additional commands, e-mail: dev-help@yunikorn.apache.org