You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Andrew Schwartzmeyer <an...@schwartzmeyer.com> on 2018/08/21 21:02:50 UTC

Follow up to discussion regarding use : in paths on Windows (MESOS-9109)

Hey all,

I have a set of patches up for MESOS-9109 that I need reviewed, starting 
here: https://reviews.apache.org/r/68297/.

Eduard here was trying to use Chronos to schedule a task on a Windows 
agent, and found an error due to the fact that Chronos uses colons (as 
in `:`) in its generated framework (and task) IDs. Now, to maintain 
backward compatibility, we obviously can't disallow the use of `:` as 
there are frameworks already using it. However, this is a reserved 
character on Windows for file system paths 
(https://docs.microsoft.com/en-us/windows/desktop/FileIO/naming-a-file), 
so it cannot be in the path.

My first implementation simply applied `s/:/_COLON_` to `frameworkId` 
and `taskId` in the functions in `paths.cpp` which generate Mesos's 
filesystem paths. While this worked, it's kind of a kludge. Or that is 
to say, it would nicer to use the ASCII representation of `%3A` instead. 
Doing so, however, revealed a bug in libprocess (MESOS-9168) that I have 
also fixed and need reviewed, starting here: 
https://reviews.apache.org/r/68420/

So combining the two fixes, the chain maps `:` in `frameworkId` and 
`taskId` to `%3A` (and back when appropriate). This obviously doesn't 
fix any third-party tooling, but being Windows, I don't think there is 
any yet to worry about.

I wanted to get this in for 1.7, but due to a miscommunication, we were 
not able to land it in time. If you can, please review! Or if you have a 
better way of doing this, let me know!

Thanks,

Andy

P.S. Original discussion here: 
https://mesos.slack.com/archives/C1LPTK50T/p1533324650000396 (our Slack 
archives seem to be down, so this is only available until Slack cycles 
out sadly).

Re: Follow up to discussion regarding use : in paths on Windows (MESOS-9109)

Posted by Chun-Hung Hsiao <ch...@mesosphere.io>.
It seems we have the following issues w.r.t path generation:

1. Path separators are disallowed:
    This is general to all systems, so we'll need to put a
platform-independent check. But since no one's doing this we can put this
into the backlog.
2. Other invalid characters on different platforms:
    For now let's just focus on Windows since Un*x doesn't have any
restriction other than /,
    but since we're already working on this issue, how about resolve all
of 0x00-0x1F 0x7F " * / : < > ? | at once?
    This can be a Windows-specific now, as proposed by Andy.
3. Other path constraints, e.g., invalid sequences of valid characters.
    This is platform-dependent but the problem is there for both Un*x and
Windows. We can resolve this along with 1 later.

As long as the way we resolve 2 (i.e., the encoding/decoding process) won't
introduce any compatibility problem in the future,
I'm good at only fixing 2 for now and follow up with a clean up later.
To be conservative, if we're sure that there's no existing framework using
% in its ID,
does it make sense to add a check for now to ensure that?

On Tue, Sep 4, 2018 at 2:12 PM Andrew Schwartzmeyer <
andrew@schwartzmeyer.com> wrote:

> I think your approach would be fairly sound. That is, to change the
> logic to read the IDs from the info file instead of the paths. But I
> also think we can punt this for now (as I do not think a task ID like
> 'Hello%3AWorld' is plausibly in use right now), and implement a fix for
> colons now that would remain compatible.
>
> If we add encode/decode logic for colons on Windows, we do not introduce
> backward compatibility issues on other platforms (as we'd constrain the
> change to Windows), and in the future, we can safely replace the decode
> logic with your approach. That is to say, we implement the encoding as
> sparingly as possible, but implement it now, because it's kind of
> required, and we implement the decoding only as a stop-gap until we
> replace this logic with reading from the info file instead. If we later
> find another character in use that also needs to be encoded, we can then
> abstract the single encoding into a per-platform encoding set.
>
> Does this seem reasonable?
>
> Thanks,
>
> Andy
>
> P.S. Sorry this took a while to get back to, I was out last week.
>
> On 08/23/2018 3:34 pm, Chun-Hung Hsiao wrote:
> > I'm a bit concerned about the recovery logic and backward
> > compatibility:
> > The changes we're making shouldn't affect existing users,
> > and we should try hard to avoid any future backward compatibility
> > problem.
> >
> > Say if there is already some custom framework using task ID
> > 'Hello%3AWorld',
> > then if we blindly decode the task path during recovery, we will get
> > the
> > wrong ID 'Hello:World'.
> > On the other hand, if we don't decode the task path during recovery,
> > then later on during checkpointing for the same task,
> > we shouldn't blindly encode the task ID, because it might create a
> > different path,
> > and we'll need to introduce some migration code to avoid such
> > duplication.
> >
> > Fortunately, we do checkpoint the executor IDs and task IDs in the info
> > files under the meta dir.
> > So I'm considering the following design to minimize the backward
> > compatibility issue we might have:
> > During recovery, we don't decode the recovered task path,
> > but get the executor/task ID from the info file instead of relying on
> > parsing the executor/task path.
> > When checkpointing, we only encode executor/task IDs if they contain
> > reserved characters.
> > The set of reserved characters could be defined as a platform-dependent
> > variable,
> > similar to what we have done for `PATH_SEPARATOR`.
> >
> > The above design would look a bit more complicated then just blindly
> > applying percent encoding
> > to when constructing checkpoint paths, but it doesn't require extra
> > checkpoint migration logic,
> > and would keep the exact same behavior we have now for "normal"
> > executor/task IDs.
> >
> > What did you guys think? Please feel free to raise any concern :)
> > And we don't need to implement the whole thing for now.
> > For example, we could start with just dealing with colons,
> > and extend the implementation later on,
> > as long as the partial solution we're going to have right now doesn't
> > create future tech debts!
> >
> > Best,
> > Chun-Hung
> >
> > On Thu, Aug 23, 2018 at 1:42 PM Greg Mann <gr...@mesosphere.io> wrote:
> >
> >> Thanks Andy! Responses inlined below.
> >>
> >>
> >>
> >>> No: As the only character we've run into a problem with is `:`
> >>> (MESOS-9109), it might not be worth it to generalize this to solve a
> >>> bunch
> >>> of problems that we haven't encountered.
> >>>
> >>>
> >> It's true that I'm not aware of other scenarios where
> >> filesystem-disallowed characters in task/executor IDs have caused
> >> issues
> >> for users, and this issue has existed for a long time. However, when
> >> feasible I would like to fix issues that we're aware of before they
> >> cause
> >> problems for users, rather than after. I would suggest that since we
> >> have
> >> one compelling case that we need to address now, it's worth
> >> formulating an
> >> approach for the general case, so that we can be sure any current work
> >> doesn't get in our way later on.
> >>
> >>
> >>> I'm somewhat comfortable doing so only for Windows, as we don't
> >>> really
> >>> need to worry about the recovery scenario; but very uncomfortable
> >>> about
> >>> doing so for Linux etc., for precisely that reason.
> >>>
> >>> So expanding this is definitely up for debate; but we must fix the
> >>> bug
> >>> with `:`.
> >>>
> >>>
> >> Indeed, addressing the general case may prove to be much more complex
> >> - I
> >> can certainly identify with this situation, where a fix for a smaller
> >> issue
> >> turns into a big project :)
> >> It may turn out to be possible to implement a scoped-down solution for
> >> the
> >> colon case now, and extend it later on. I think it would be good if we
> >> could at least get an idea of how we want to handle the general case
> >> now,
> >> so that any short-term solutions can be a constructive step toward the
> >> long-term.
> >>
> >> Cheers,
> >> G
> >>
>

Re: Follow up to discussion regarding use : in paths on Windows (MESOS-9109)

Posted by Chun-Hung Hsiao <ch...@mesosphere.io>.
It seems we have the following issues w.r.t path generation:

1. Path separators are disallowed:
    This is general to all systems, so we'll need to put a
platform-independent check. But since no one's doing this we can put this
into the backlog.
2. Other invalid characters on different platforms:
    For now let's just focus on Windows since Un*x doesn't have any
restriction other than /,
    but since we're already working on this issue, how about resolve all
of 0x00-0x1F 0x7F " * / : < > ? | at once?
    This can be a Windows-specific now, as proposed by Andy.
3. Other path constraints, e.g., invalid sequences of valid characters.
    This is platform-dependent but the problem is there for both Un*x and
Windows. We can resolve this along with 1 later.

As long as the way we resolve 2 (i.e., the encoding/decoding process) won't
introduce any compatibility problem in the future,
I'm good at only fixing 2 for now and follow up with a clean up later.
To be conservative, if we're sure that there's no existing framework using
% in its ID,
does it make sense to add a check for now to ensure that?

On Tue, Sep 4, 2018 at 2:12 PM Andrew Schwartzmeyer <
andrew@schwartzmeyer.com> wrote:

> I think your approach would be fairly sound. That is, to change the
> logic to read the IDs from the info file instead of the paths. But I
> also think we can punt this for now (as I do not think a task ID like
> 'Hello%3AWorld' is plausibly in use right now), and implement a fix for
> colons now that would remain compatible.
>
> If we add encode/decode logic for colons on Windows, we do not introduce
> backward compatibility issues on other platforms (as we'd constrain the
> change to Windows), and in the future, we can safely replace the decode
> logic with your approach. That is to say, we implement the encoding as
> sparingly as possible, but implement it now, because it's kind of
> required, and we implement the decoding only as a stop-gap until we
> replace this logic with reading from the info file instead. If we later
> find another character in use that also needs to be encoded, we can then
> abstract the single encoding into a per-platform encoding set.
>
> Does this seem reasonable?
>
> Thanks,
>
> Andy
>
> P.S. Sorry this took a while to get back to, I was out last week.
>
> On 08/23/2018 3:34 pm, Chun-Hung Hsiao wrote:
> > I'm a bit concerned about the recovery logic and backward
> > compatibility:
> > The changes we're making shouldn't affect existing users,
> > and we should try hard to avoid any future backward compatibility
> > problem.
> >
> > Say if there is already some custom framework using task ID
> > 'Hello%3AWorld',
> > then if we blindly decode the task path during recovery, we will get
> > the
> > wrong ID 'Hello:World'.
> > On the other hand, if we don't decode the task path during recovery,
> > then later on during checkpointing for the same task,
> > we shouldn't blindly encode the task ID, because it might create a
> > different path,
> > and we'll need to introduce some migration code to avoid such
> > duplication.
> >
> > Fortunately, we do checkpoint the executor IDs and task IDs in the info
> > files under the meta dir.
> > So I'm considering the following design to minimize the backward
> > compatibility issue we might have:
> > During recovery, we don't decode the recovered task path,
> > but get the executor/task ID from the info file instead of relying on
> > parsing the executor/task path.
> > When checkpointing, we only encode executor/task IDs if they contain
> > reserved characters.
> > The set of reserved characters could be defined as a platform-dependent
> > variable,
> > similar to what we have done for `PATH_SEPARATOR`.
> >
> > The above design would look a bit more complicated then just blindly
> > applying percent encoding
> > to when constructing checkpoint paths, but it doesn't require extra
> > checkpoint migration logic,
> > and would keep the exact same behavior we have now for "normal"
> > executor/task IDs.
> >
> > What did you guys think? Please feel free to raise any concern :)
> > And we don't need to implement the whole thing for now.
> > For example, we could start with just dealing with colons,
> > and extend the implementation later on,
> > as long as the partial solution we're going to have right now doesn't
> > create future tech debts!
> >
> > Best,
> > Chun-Hung
> >
> > On Thu, Aug 23, 2018 at 1:42 PM Greg Mann <gr...@mesosphere.io> wrote:
> >
> >> Thanks Andy! Responses inlined below.
> >>
> >>
> >>
> >>> No: As the only character we've run into a problem with is `:`
> >>> (MESOS-9109), it might not be worth it to generalize this to solve a
> >>> bunch
> >>> of problems that we haven't encountered.
> >>>
> >>>
> >> It's true that I'm not aware of other scenarios where
> >> filesystem-disallowed characters in task/executor IDs have caused
> >> issues
> >> for users, and this issue has existed for a long time. However, when
> >> feasible I would like to fix issues that we're aware of before they
> >> cause
> >> problems for users, rather than after. I would suggest that since we
> >> have
> >> one compelling case that we need to address now, it's worth
> >> formulating an
> >> approach for the general case, so that we can be sure any current work
> >> doesn't get in our way later on.
> >>
> >>
> >>> I'm somewhat comfortable doing so only for Windows, as we don't
> >>> really
> >>> need to worry about the recovery scenario; but very uncomfortable
> >>> about
> >>> doing so for Linux etc., for precisely that reason.
> >>>
> >>> So expanding this is definitely up for debate; but we must fix the
> >>> bug
> >>> with `:`.
> >>>
> >>>
> >> Indeed, addressing the general case may prove to be much more complex
> >> - I
> >> can certainly identify with this situation, where a fix for a smaller
> >> issue
> >> turns into a big project :)
> >> It may turn out to be possible to implement a scoped-down solution for
> >> the
> >> colon case now, and extend it later on. I think it would be good if we
> >> could at least get an idea of how we want to handle the general case
> >> now,
> >> so that any short-term solutions can be a constructive step toward the
> >> long-term.
> >>
> >> Cheers,
> >> G
> >>
>

Re: Follow up to discussion regarding use : in paths on Windows (MESOS-9109)

Posted by Andrew Schwartzmeyer <an...@schwartzmeyer.com>.
I think your approach would be fairly sound. That is, to change the 
logic to read the IDs from the info file instead of the paths. But I 
also think we can punt this for now (as I do not think a task ID like 
'Hello%3AWorld' is plausibly in use right now), and implement a fix for 
colons now that would remain compatible.

If we add encode/decode logic for colons on Windows, we do not introduce 
backward compatibility issues on other platforms (as we'd constrain the 
change to Windows), and in the future, we can safely replace the decode 
logic with your approach. That is to say, we implement the encoding as 
sparingly as possible, but implement it now, because it's kind of 
required, and we implement the decoding only as a stop-gap until we 
replace this logic with reading from the info file instead. If we later 
find another character in use that also needs to be encoded, we can then 
abstract the single encoding into a per-platform encoding set.

Does this seem reasonable?

Thanks,

Andy

P.S. Sorry this took a while to get back to, I was out last week.

On 08/23/2018 3:34 pm, Chun-Hung Hsiao wrote:
> I'm a bit concerned about the recovery logic and backward 
> compatibility:
> The changes we're making shouldn't affect existing users,
> and we should try hard to avoid any future backward compatibility 
> problem.
> 
> Say if there is already some custom framework using task ID 
> 'Hello%3AWorld',
> then if we blindly decode the task path during recovery, we will get 
> the
> wrong ID 'Hello:World'.
> On the other hand, if we don't decode the task path during recovery,
> then later on during checkpointing for the same task,
> we shouldn't blindly encode the task ID, because it might create a
> different path,
> and we'll need to introduce some migration code to avoid such 
> duplication.
> 
> Fortunately, we do checkpoint the executor IDs and task IDs in the info
> files under the meta dir.
> So I'm considering the following design to minimize the backward
> compatibility issue we might have:
> During recovery, we don't decode the recovered task path,
> but get the executor/task ID from the info file instead of relying on
> parsing the executor/task path.
> When checkpointing, we only encode executor/task IDs if they contain
> reserved characters.
> The set of reserved characters could be defined as a platform-dependent
> variable,
> similar to what we have done for `PATH_SEPARATOR`.
> 
> The above design would look a bit more complicated then just blindly
> applying percent encoding
> to when constructing checkpoint paths, but it doesn't require extra
> checkpoint migration logic,
> and would keep the exact same behavior we have now for "normal"
> executor/task IDs.
> 
> What did you guys think? Please feel free to raise any concern :)
> And we don't need to implement the whole thing for now.
> For example, we could start with just dealing with colons,
> and extend the implementation later on,
> as long as the partial solution we're going to have right now doesn't
> create future tech debts!
> 
> Best,
> Chun-Hung
> 
> On Thu, Aug 23, 2018 at 1:42 PM Greg Mann <gr...@mesosphere.io> wrote:
> 
>> Thanks Andy! Responses inlined below.
>> 
>> 
>> 
>>> No: As the only character we've run into a problem with is `:`
>>> (MESOS-9109), it might not be worth it to generalize this to solve a 
>>> bunch
>>> of problems that we haven't encountered.
>>> 
>>> 
>> It's true that I'm not aware of other scenarios where
>> filesystem-disallowed characters in task/executor IDs have caused 
>> issues
>> for users, and this issue has existed for a long time. However, when
>> feasible I would like to fix issues that we're aware of before they 
>> cause
>> problems for users, rather than after. I would suggest that since we 
>> have
>> one compelling case that we need to address now, it's worth 
>> formulating an
>> approach for the general case, so that we can be sure any current work
>> doesn't get in our way later on.
>> 
>> 
>>> I'm somewhat comfortable doing so only for Windows, as we don't 
>>> really
>>> need to worry about the recovery scenario; but very uncomfortable 
>>> about
>>> doing so for Linux etc., for precisely that reason.
>>> 
>>> So expanding this is definitely up for debate; but we must fix the 
>>> bug
>>> with `:`.
>>> 
>>> 
>> Indeed, addressing the general case may prove to be much more complex 
>> - I
>> can certainly identify with this situation, where a fix for a smaller 
>> issue
>> turns into a big project :)
>> It may turn out to be possible to implement a scoped-down solution for 
>> the
>> colon case now, and extend it later on. I think it would be good if we
>> could at least get an idea of how we want to handle the general case 
>> now,
>> so that any short-term solutions can be a constructive step toward the
>> long-term.
>> 
>> Cheers,
>> G
>> 

Re: Follow up to discussion regarding use : in paths on Windows (MESOS-9109)

Posted by Andrew Schwartzmeyer <an...@schwartzmeyer.com>.
I think your approach would be fairly sound. That is, to change the 
logic to read the IDs from the info file instead of the paths. But I 
also think we can punt this for now (as I do not think a task ID like 
'Hello%3AWorld' is plausibly in use right now), and implement a fix for 
colons now that would remain compatible.

If we add encode/decode logic for colons on Windows, we do not introduce 
backward compatibility issues on other platforms (as we'd constrain the 
change to Windows), and in the future, we can safely replace the decode 
logic with your approach. That is to say, we implement the encoding as 
sparingly as possible, but implement it now, because it's kind of 
required, and we implement the decoding only as a stop-gap until we 
replace this logic with reading from the info file instead. If we later 
find another character in use that also needs to be encoded, we can then 
abstract the single encoding into a per-platform encoding set.

Does this seem reasonable?

Thanks,

Andy

P.S. Sorry this took a while to get back to, I was out last week.

On 08/23/2018 3:34 pm, Chun-Hung Hsiao wrote:
> I'm a bit concerned about the recovery logic and backward 
> compatibility:
> The changes we're making shouldn't affect existing users,
> and we should try hard to avoid any future backward compatibility 
> problem.
> 
> Say if there is already some custom framework using task ID 
> 'Hello%3AWorld',
> then if we blindly decode the task path during recovery, we will get 
> the
> wrong ID 'Hello:World'.
> On the other hand, if we don't decode the task path during recovery,
> then later on during checkpointing for the same task,
> we shouldn't blindly encode the task ID, because it might create a
> different path,
> and we'll need to introduce some migration code to avoid such 
> duplication.
> 
> Fortunately, we do checkpoint the executor IDs and task IDs in the info
> files under the meta dir.
> So I'm considering the following design to minimize the backward
> compatibility issue we might have:
> During recovery, we don't decode the recovered task path,
> but get the executor/task ID from the info file instead of relying on
> parsing the executor/task path.
> When checkpointing, we only encode executor/task IDs if they contain
> reserved characters.
> The set of reserved characters could be defined as a platform-dependent
> variable,
> similar to what we have done for `PATH_SEPARATOR`.
> 
> The above design would look a bit more complicated then just blindly
> applying percent encoding
> to when constructing checkpoint paths, but it doesn't require extra
> checkpoint migration logic,
> and would keep the exact same behavior we have now for "normal"
> executor/task IDs.
> 
> What did you guys think? Please feel free to raise any concern :)
> And we don't need to implement the whole thing for now.
> For example, we could start with just dealing with colons,
> and extend the implementation later on,
> as long as the partial solution we're going to have right now doesn't
> create future tech debts!
> 
> Best,
> Chun-Hung
> 
> On Thu, Aug 23, 2018 at 1:42 PM Greg Mann <gr...@mesosphere.io> wrote:
> 
>> Thanks Andy! Responses inlined below.
>> 
>> 
>> 
>>> No: As the only character we've run into a problem with is `:`
>>> (MESOS-9109), it might not be worth it to generalize this to solve a 
>>> bunch
>>> of problems that we haven't encountered.
>>> 
>>> 
>> It's true that I'm not aware of other scenarios where
>> filesystem-disallowed characters in task/executor IDs have caused 
>> issues
>> for users, and this issue has existed for a long time. However, when
>> feasible I would like to fix issues that we're aware of before they 
>> cause
>> problems for users, rather than after. I would suggest that since we 
>> have
>> one compelling case that we need to address now, it's worth 
>> formulating an
>> approach for the general case, so that we can be sure any current work
>> doesn't get in our way later on.
>> 
>> 
>>> I'm somewhat comfortable doing so only for Windows, as we don't 
>>> really
>>> need to worry about the recovery scenario; but very uncomfortable 
>>> about
>>> doing so for Linux etc., for precisely that reason.
>>> 
>>> So expanding this is definitely up for debate; but we must fix the 
>>> bug
>>> with `:`.
>>> 
>>> 
>> Indeed, addressing the general case may prove to be much more complex 
>> - I
>> can certainly identify with this situation, where a fix for a smaller 
>> issue
>> turns into a big project :)
>> It may turn out to be possible to implement a scoped-down solution for 
>> the
>> colon case now, and extend it later on. I think it would be good if we
>> could at least get an idea of how we want to handle the general case 
>> now,
>> so that any short-term solutions can be a constructive step toward the
>> long-term.
>> 
>> Cheers,
>> G
>> 

Re: Follow up to discussion regarding use : in paths on Windows (MESOS-9109)

Posted by Chun-Hung Hsiao <ch...@mesosphere.io>.
I'm a bit concerned about the recovery logic and backward compatibility:
The changes we're making shouldn't affect existing users,
and we should try hard to avoid any future backward compatibility problem.

Say if there is already some custom framework using task ID 'Hello%3AWorld',
then if we blindly decode the task path during recovery, we will get the
wrong ID 'Hello:World'.
On the other hand, if we don't decode the task path during recovery,
then later on during checkpointing for the same task,
we shouldn't blindly encode the task ID, because it might create a
different path,
and we'll need to introduce some migration code to avoid such duplication.

Fortunately, we do checkpoint the executor IDs and task IDs in the info
files under the meta dir.
So I'm considering the following design to minimize the backward
compatibility issue we might have:
During recovery, we don't decode the recovered task path,
but get the executor/task ID from the info file instead of relying on
parsing the executor/task path.
When checkpointing, we only encode executor/task IDs if they contain
reserved characters.
The set of reserved characters could be defined as a platform-dependent
variable,
similar to what we have done for `PATH_SEPARATOR`.

The above design would look a bit more complicated then just blindly
applying percent encoding
to when constructing checkpoint paths, but it doesn't require extra
checkpoint migration logic,
and would keep the exact same behavior we have now for "normal"
executor/task IDs.

What did you guys think? Please feel free to raise any concern :)
And we don't need to implement the whole thing for now.
For example, we could start with just dealing with colons,
and extend the implementation later on,
as long as the partial solution we're going to have right now doesn't
create future tech debts!

Best,
Chun-Hung

On Thu, Aug 23, 2018 at 1:42 PM Greg Mann <gr...@mesosphere.io> wrote:

> Thanks Andy! Responses inlined below.
>
>
>
>> No: As the only character we've run into a problem with is `:`
>> (MESOS-9109), it might not be worth it to generalize this to solve a bunch
>> of problems that we haven't encountered.
>>
>>
> It's true that I'm not aware of other scenarios where
> filesystem-disallowed characters in task/executor IDs have caused issues
> for users, and this issue has existed for a long time. However, when
> feasible I would like to fix issues that we're aware of before they cause
> problems for users, rather than after. I would suggest that since we have
> one compelling case that we need to address now, it's worth formulating an
> approach for the general case, so that we can be sure any current work
> doesn't get in our way later on.
>
>
>> I'm somewhat comfortable doing so only for Windows, as we don't really
>> need to worry about the recovery scenario; but very uncomfortable about
>> doing so for Linux etc., for precisely that reason.
>>
>> So expanding this is definitely up for debate; but we must fix the bug
>> with `:`.
>>
>>
> Indeed, addressing the general case may prove to be much more complex - I
> can certainly identify with this situation, where a fix for a smaller issue
> turns into a big project :)
> It may turn out to be possible to implement a scoped-down solution for the
> colon case now, and extend it later on. I think it would be good if we
> could at least get an idea of how we want to handle the general case now,
> so that any short-term solutions can be a constructive step toward the
> long-term.
>
> Cheers,
> G
>

Re: Follow up to discussion regarding use : in paths on Windows (MESOS-9109)

Posted by Chun-Hung Hsiao <ch...@mesosphere.io>.
I'm a bit concerned about the recovery logic and backward compatibility:
The changes we're making shouldn't affect existing users,
and we should try hard to avoid any future backward compatibility problem.

Say if there is already some custom framework using task ID 'Hello%3AWorld',
then if we blindly decode the task path during recovery, we will get the
wrong ID 'Hello:World'.
On the other hand, if we don't decode the task path during recovery,
then later on during checkpointing for the same task,
we shouldn't blindly encode the task ID, because it might create a
different path,
and we'll need to introduce some migration code to avoid such duplication.

Fortunately, we do checkpoint the executor IDs and task IDs in the info
files under the meta dir.
So I'm considering the following design to minimize the backward
compatibility issue we might have:
During recovery, we don't decode the recovered task path,
but get the executor/task ID from the info file instead of relying on
parsing the executor/task path.
When checkpointing, we only encode executor/task IDs if they contain
reserved characters.
The set of reserved characters could be defined as a platform-dependent
variable,
similar to what we have done for `PATH_SEPARATOR`.

The above design would look a bit more complicated then just blindly
applying percent encoding
to when constructing checkpoint paths, but it doesn't require extra
checkpoint migration logic,
and would keep the exact same behavior we have now for "normal"
executor/task IDs.

What did you guys think? Please feel free to raise any concern :)
And we don't need to implement the whole thing for now.
For example, we could start with just dealing with colons,
and extend the implementation later on,
as long as the partial solution we're going to have right now doesn't
create future tech debts!

Best,
Chun-Hung

On Thu, Aug 23, 2018 at 1:42 PM Greg Mann <gr...@mesosphere.io> wrote:

> Thanks Andy! Responses inlined below.
>
>
>
>> No: As the only character we've run into a problem with is `:`
>> (MESOS-9109), it might not be worth it to generalize this to solve a bunch
>> of problems that we haven't encountered.
>>
>>
> It's true that I'm not aware of other scenarios where
> filesystem-disallowed characters in task/executor IDs have caused issues
> for users, and this issue has existed for a long time. However, when
> feasible I would like to fix issues that we're aware of before they cause
> problems for users, rather than after. I would suggest that since we have
> one compelling case that we need to address now, it's worth formulating an
> approach for the general case, so that we can be sure any current work
> doesn't get in our way later on.
>
>
>> I'm somewhat comfortable doing so only for Windows, as we don't really
>> need to worry about the recovery scenario; but very uncomfortable about
>> doing so for Linux etc., for precisely that reason.
>>
>> So expanding this is definitely up for debate; but we must fix the bug
>> with `:`.
>>
>>
> Indeed, addressing the general case may prove to be much more complex - I
> can certainly identify with this situation, where a fix for a smaller issue
> turns into a big project :)
> It may turn out to be possible to implement a scoped-down solution for the
> colon case now, and extend it later on. I think it would be good if we
> could at least get an idea of how we want to handle the general case now,
> so that any short-term solutions can be a constructive step toward the
> long-term.
>
> Cheers,
> G
>

Re: Follow up to discussion regarding use : in paths on Windows (MESOS-9109)

Posted by Greg Mann <gr...@mesosphere.io>.
Thanks Andy! Responses inlined below.



> No: As the only character we've run into a problem with is `:`
> (MESOS-9109), it might not be worth it to generalize this to solve a bunch
> of problems that we haven't encountered.
>
>
It's true that I'm not aware of other scenarios where filesystem-disallowed
characters in task/executor IDs have caused issues for users, and this
issue has existed for a long time. However, when feasible I would like to
fix issues that we're aware of before they cause problems for users, rather
than after. I would suggest that since we have one compelling case that we
need to address now, it's worth formulating an approach for the general
case, so that we can be sure any current work doesn't get in our way later
on.


> I'm somewhat comfortable doing so only for Windows, as we don't really
> need to worry about the recovery scenario; but very uncomfortable about
> doing so for Linux etc., for precisely that reason.
>
> So expanding this is definitely up for debate; but we must fix the bug
> with `:`.
>
>
Indeed, addressing the general case may prove to be much more complex - I
can certainly identify with this situation, where a fix for a smaller issue
turns into a big project :)
It may turn out to be possible to implement a scoped-down solution for the
colon case now, and extend it later on. I think it would be good if we
could at least get an idea of how we want to handle the general case now,
so that any short-term solutions can be a constructive step toward the
long-term.

Cheers,
G

Re: Follow up to discussion regarding use : in paths on Windows (MESOS-9109)

Posted by Greg Mann <gr...@mesosphere.io>.
Thanks Andy! Responses inlined below.



> No: As the only character we've run into a problem with is `:`
> (MESOS-9109), it might not be worth it to generalize this to solve a bunch
> of problems that we haven't encountered.
>
>
It's true that I'm not aware of other scenarios where filesystem-disallowed
characters in task/executor IDs have caused issues for users, and this
issue has existed for a long time. However, when feasible I would like to
fix issues that we're aware of before they cause problems for users, rather
than after. I would suggest that since we have one compelling case that we
need to address now, it's worth formulating an approach for the general
case, so that we can be sure any current work doesn't get in our way later
on.


> I'm somewhat comfortable doing so only for Windows, as we don't really
> need to worry about the recovery scenario; but very uncomfortable about
> doing so for Linux etc., for precisely that reason.
>
> So expanding this is definitely up for debate; but we must fix the bug
> with `:`.
>
>
Indeed, addressing the general case may prove to be much more complex - I
can certainly identify with this situation, where a fix for a smaller issue
turns into a big project :)
It may turn out to be possible to implement a scoped-down solution for the
colon case now, and extend it later on. I think it would be good if we
could at least get an idea of how we want to handle the general case now,
so that any short-term solutions can be a constructive step toward the
long-term.

Cheers,
G

Re: Follow up to discussion regarding use : in paths on Windows (MESOS-9109)

Posted by Andrew Schwartzmeyer <an...@schwartzmeyer.com>.
 > I'm surprised we haven't run into issues like this before on Linux.

Indeed!

> I'm wondering if this warrants a general solution that could take care of all filesystem-disallowed characters. 

I think yes and no.

Yes: it'd be easy to just `process::http::encode(executorId/taskId)`
(though it might look funny) right where I'm currently applying the same
but only to `:`.

No: As the only character we've run into a problem with is `:`
(MESOS-9109), it might not be worth it to generalize this to solve a
bunch of problems that we haven't encountered.

I'm somewhat comfortable doing so only for Windows, as we don't really
need to worry about the recovery scenario; but very uncomfortable about
doing so for Linux etc., for precisely that reason.

So expanding this is definitely up for debate; but we must fix the bug
with `:`.

Thanks for the feedback,

Andy

On 08/22/2018 5:11 pm, Greg Mann wrote: 

> Thanks for addressing this Andy!! AFAIK we allow all characters in executor and task IDs; I'm surprised we haven't run into issues like this before on Linux. 
> 
> The percent-encoding approach seems fine to me. As long as the percent character isn't an issue on any filesystems that we're interested in? As a starting point, Wikipedia seems to have a decent survey of restrictions on different filesystems here [5]. Looks like the percent character may be fine. 
> 
> I wonder if there are other characters we should be concerned about? I'm guessing we should worry about slashes and backslashes as well? Seems like a more general solution might help us avoid similar pitfalls in the future. Perhaps we could just percent-encode executor and task IDs before we write to disk? If we did this, we would have issues during recovery to consider, where we need to look for "old" paths when recovering state from an "old" agent. 
> 
> In any case, I'm wondering if this warrants a general solution that could take care of all filesystem-disallowed characters. WDYT? 
> 
> Cheers, 
> Greg 
> 
> On Tue, Aug 21, 2018 at 2:02 PM, Andrew Schwartzmeyer <an...@schwartzmeyer.com> wrote:
> 
>> Hey all,
>> 
>> I have a set of patches up for MESOS-9109 that I need reviewed, starting here: https://reviews.apache.org/r/68297/ [1].
>> 
>> Eduard here was trying to use Chronos to schedule a task on a Windows agent, and found an error due to the fact that Chronos uses colons (as in `:`) in its generated framework (and task) IDs. Now, to maintain backward compatibility, we obviously can't disallow the use of `:` as there are frameworks already using it. However, this is a reserved character on Windows for file system paths (https://docs.microsoft.com/en-us/windows/desktop/FileIO/naming-a-file [2]), so it cannot be in the path.
>> 
>> My first implementation simply applied `s/:/_COLON_` to `frameworkId` and `taskId` in the functions in `paths.cpp` which generate Mesos's filesystem paths. While this worked, it's kind of a kludge. Or that is to say, it would nicer to use the ASCII representation of `%3A` instead. Doing so, however, revealed a bug in libprocess (MESOS-9168) that I have also fixed and need reviewed, starting here: https://reviews.apache.org/r/68420/ [3]
>> 
>> So combining the two fixes, the chain maps `:` in `frameworkId` and `taskId` to `%3A` (and back when appropriate). This obviously doesn't fix any third-party tooling, but being Windows, I don't think there is any yet to worry about.
>> 
>> I wanted to get this in for 1.7, but due to a miscommunication, we were not able to land it in time. If you can, please review! Or if you have a better way of doing this, let me know!
>> 
>> Thanks,
>> 
>> Andy
>> 
>> P.S. Original discussion here: https://mesos.slack.com/archives/C1LPTK50T/p1533324650000396 [4] (our Slack archives seem to be down, so this is only available until Slack cycles out sadly).

 

Links:
------
[1] https://reviews.apache.org/r/68297/
[2]
https://docs.microsoft.com/en-us/windows/desktop/FileIO/naming-a-file
[3] https://reviews.apache.org/r/68420/
[4] https://mesos.slack.com/archives/C1LPTK50T/p1533324650000396
[5]
https://en.wikipedia.org/wiki/Filename#Comparison_of_filename_limitations

Re: Follow up to discussion regarding use : in paths on Windows (MESOS-9109)

Posted by Andrew Schwartzmeyer <an...@schwartzmeyer.com>.
 > I'm surprised we haven't run into issues like this before on Linux.

Indeed!

> I'm wondering if this warrants a general solution that could take care of all filesystem-disallowed characters. 

I think yes and no.

Yes: it'd be easy to just `process::http::encode(executorId/taskId)`
(though it might look funny) right where I'm currently applying the same
but only to `:`.

No: As the only character we've run into a problem with is `:`
(MESOS-9109), it might not be worth it to generalize this to solve a
bunch of problems that we haven't encountered.

I'm somewhat comfortable doing so only for Windows, as we don't really
need to worry about the recovery scenario; but very uncomfortable about
doing so for Linux etc., for precisely that reason.

So expanding this is definitely up for debate; but we must fix the bug
with `:`.

Thanks for the feedback,

Andy

On 08/22/2018 5:11 pm, Greg Mann wrote: 

> Thanks for addressing this Andy!! AFAIK we allow all characters in executor and task IDs; I'm surprised we haven't run into issues like this before on Linux. 
> 
> The percent-encoding approach seems fine to me. As long as the percent character isn't an issue on any filesystems that we're interested in? As a starting point, Wikipedia seems to have a decent survey of restrictions on different filesystems here [5]. Looks like the percent character may be fine. 
> 
> I wonder if there are other characters we should be concerned about? I'm guessing we should worry about slashes and backslashes as well? Seems like a more general solution might help us avoid similar pitfalls in the future. Perhaps we could just percent-encode executor and task IDs before we write to disk? If we did this, we would have issues during recovery to consider, where we need to look for "old" paths when recovering state from an "old" agent. 
> 
> In any case, I'm wondering if this warrants a general solution that could take care of all filesystem-disallowed characters. WDYT? 
> 
> Cheers, 
> Greg 
> 
> On Tue, Aug 21, 2018 at 2:02 PM, Andrew Schwartzmeyer <an...@schwartzmeyer.com> wrote:
> 
>> Hey all,
>> 
>> I have a set of patches up for MESOS-9109 that I need reviewed, starting here: https://reviews.apache.org/r/68297/ [1].
>> 
>> Eduard here was trying to use Chronos to schedule a task on a Windows agent, and found an error due to the fact that Chronos uses colons (as in `:`) in its generated framework (and task) IDs. Now, to maintain backward compatibility, we obviously can't disallow the use of `:` as there are frameworks already using it. However, this is a reserved character on Windows for file system paths (https://docs.microsoft.com/en-us/windows/desktop/FileIO/naming-a-file [2]), so it cannot be in the path.
>> 
>> My first implementation simply applied `s/:/_COLON_` to `frameworkId` and `taskId` in the functions in `paths.cpp` which generate Mesos's filesystem paths. While this worked, it's kind of a kludge. Or that is to say, it would nicer to use the ASCII representation of `%3A` instead. Doing so, however, revealed a bug in libprocess (MESOS-9168) that I have also fixed and need reviewed, starting here: https://reviews.apache.org/r/68420/ [3]
>> 
>> So combining the two fixes, the chain maps `:` in `frameworkId` and `taskId` to `%3A` (and back when appropriate). This obviously doesn't fix any third-party tooling, but being Windows, I don't think there is any yet to worry about.
>> 
>> I wanted to get this in for 1.7, but due to a miscommunication, we were not able to land it in time. If you can, please review! Or if you have a better way of doing this, let me know!
>> 
>> Thanks,
>> 
>> Andy
>> 
>> P.S. Original discussion here: https://mesos.slack.com/archives/C1LPTK50T/p1533324650000396 [4] (our Slack archives seem to be down, so this is only available until Slack cycles out sadly).

 

Links:
------
[1] https://reviews.apache.org/r/68297/
[2]
https://docs.microsoft.com/en-us/windows/desktop/FileIO/naming-a-file
[3] https://reviews.apache.org/r/68420/
[4] https://mesos.slack.com/archives/C1LPTK50T/p1533324650000396
[5]
https://en.wikipedia.org/wiki/Filename#Comparison_of_filename_limitations

Re: Follow up to discussion regarding use : in paths on Windows (MESOS-9109)

Posted by Greg Mann <gr...@mesosphere.io>.
Thanks for addressing this Andy!! AFAIK we allow all characters in executor
and task IDs; I'm surprised we haven't run into issues like this before on
Linux.

The percent-encoding approach seems fine to me. As long as the percent
character isn't an issue on any filesystems that we're interested in? As a
starting point, Wikipedia seems to have a decent survey of restrictions on
different filesystems here
<https://en.wikipedia.org/wiki/Filename#Comparison_of_filename_limitations>.
Looks like the percent character may be fine.

I wonder if there are other characters we should be concerned about? I'm
guessing we should worry about slashes and backslashes as well? Seems like
a more general solution might help us avoid similar pitfalls in the future.
Perhaps we could just percent-encode executor and task IDs before we write
to disk? If we did this, we would have issues during recovery to consider,
where we need to look for "old" paths when recovering state from an "old"
agent.

In any case, I'm wondering if this warrants a general solution that could
take care of all filesystem-disallowed characters. WDYT?

Cheers,
Greg


On Tue, Aug 21, 2018 at 2:02 PM, Andrew Schwartzmeyer <
andrew@schwartzmeyer.com> wrote:

> Hey all,
>
> I have a set of patches up for MESOS-9109 that I need reviewed, starting
> here: https://reviews.apache.org/r/68297/.
>
> Eduard here was trying to use Chronos to schedule a task on a Windows
> agent, and found an error due to the fact that Chronos uses colons (as in
> `:`) in its generated framework (and task) IDs. Now, to maintain backward
> compatibility, we obviously can't disallow the use of `:` as there are
> frameworks already using it. However, this is a reserved character on
> Windows for file system paths (https://docs.microsoft.com/en
> -us/windows/desktop/FileIO/naming-a-file), so it cannot be in the path.
>
> My first implementation simply applied `s/:/_COLON_` to `frameworkId` and
> `taskId` in the functions in `paths.cpp` which generate Mesos's filesystem
> paths. While this worked, it's kind of a kludge. Or that is to say, it
> would nicer to use the ASCII representation of `%3A` instead. Doing so,
> however, revealed a bug in libprocess (MESOS-9168) that I have also fixed
> and need reviewed, starting here: https://reviews.apache.org/r/68420/
>
> So combining the two fixes, the chain maps `:` in `frameworkId` and
> `taskId` to `%3A` (and back when appropriate). This obviously doesn't fix
> any third-party tooling, but being Windows, I don't think there is any yet
> to worry about.
>
> I wanted to get this in for 1.7, but due to a miscommunication, we were
> not able to land it in time. If you can, please review! Or if you have a
> better way of doing this, let me know!
>
> Thanks,
>
> Andy
>
> P.S. Original discussion here: https://mesos.slack.com/archiv
> es/C1LPTK50T/p1533324650000396 (our Slack archives seem to be down, so
> this is only available until Slack cycles out sadly).
>

Re: Follow up to discussion regarding use : in paths on Windows (MESOS-9109)

Posted by Greg Mann <gr...@mesosphere.io>.
Thanks for addressing this Andy!! AFAIK we allow all characters in executor
and task IDs; I'm surprised we haven't run into issues like this before on
Linux.

The percent-encoding approach seems fine to me. As long as the percent
character isn't an issue on any filesystems that we're interested in? As a
starting point, Wikipedia seems to have a decent survey of restrictions on
different filesystems here
<https://en.wikipedia.org/wiki/Filename#Comparison_of_filename_limitations>.
Looks like the percent character may be fine.

I wonder if there are other characters we should be concerned about? I'm
guessing we should worry about slashes and backslashes as well? Seems like
a more general solution might help us avoid similar pitfalls in the future.
Perhaps we could just percent-encode executor and task IDs before we write
to disk? If we did this, we would have issues during recovery to consider,
where we need to look for "old" paths when recovering state from an "old"
agent.

In any case, I'm wondering if this warrants a general solution that could
take care of all filesystem-disallowed characters. WDYT?

Cheers,
Greg


On Tue, Aug 21, 2018 at 2:02 PM, Andrew Schwartzmeyer <
andrew@schwartzmeyer.com> wrote:

> Hey all,
>
> I have a set of patches up for MESOS-9109 that I need reviewed, starting
> here: https://reviews.apache.org/r/68297/.
>
> Eduard here was trying to use Chronos to schedule a task on a Windows
> agent, and found an error due to the fact that Chronos uses colons (as in
> `:`) in its generated framework (and task) IDs. Now, to maintain backward
> compatibility, we obviously can't disallow the use of `:` as there are
> frameworks already using it. However, this is a reserved character on
> Windows for file system paths (https://docs.microsoft.com/en
> -us/windows/desktop/FileIO/naming-a-file), so it cannot be in the path.
>
> My first implementation simply applied `s/:/_COLON_` to `frameworkId` and
> `taskId` in the functions in `paths.cpp` which generate Mesos's filesystem
> paths. While this worked, it's kind of a kludge. Or that is to say, it
> would nicer to use the ASCII representation of `%3A` instead. Doing so,
> however, revealed a bug in libprocess (MESOS-9168) that I have also fixed
> and need reviewed, starting here: https://reviews.apache.org/r/68420/
>
> So combining the two fixes, the chain maps `:` in `frameworkId` and
> `taskId` to `%3A` (and back when appropriate). This obviously doesn't fix
> any third-party tooling, but being Windows, I don't think there is any yet
> to worry about.
>
> I wanted to get this in for 1.7, but due to a miscommunication, we were
> not able to land it in time. If you can, please review! Or if you have a
> better way of doing this, let me know!
>
> Thanks,
>
> Andy
>
> P.S. Original discussion here: https://mesos.slack.com/archiv
> es/C1LPTK50T/p1533324650000396 (our Slack archives seem to be down, so
> this is only available until Slack cycles out sadly).
>