You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@airflow.apache.org by Abdul Hadi Shakir <sh...@gmail.com> on 2023/01/12 07:42:43 UTC

[VOTE][ISSUE-22073] Finalising approach for displaying non-ascii characters in DAG display name

Hi team,

While discussing the approach for
https://github.com/apache/airflow/issues/22073 (adding support for national
characters in DAG display name) - two approaches came out. Need votes to
finalise on one of the two:

   1. [Vote *+1*] Using *name* as the only parameter; and then generating a
   unique *dag_id* from it using *slugify*. This makes the interface
   simpler; but it makes *dag_id* unknown from the users. Ongoing PR for
   this: https://github.com/apache/airflow/pull/28183
   2. [Vote -*1*] To use *display_name* along with *dag_id* as DAG params.
   While this is a simpler solution on the backend - it needs lots of work on
   the frontend for a consistent experience. Ongoing PR for this:
   https://github.com/apache/airflow/pull/27145

Cheers,
*Abdul Hadi Shakir*

Re: [VOTE][ISSUE-22073] Finalising approach for displaying non-ascii characters in DAG display name

Posted by Ping Zhang <pi...@umich.edu>.
Hi Abdul,

Friendly bump this thread. Do we have an agreement on which route we
are going to take?

Thanks,

Ping


On Fri, Jan 20, 2023 at 3:50 PM Ping Zhang <pi...@umich.edu> wrote:

> Hi,
>
> I would vote for: `[Vote -1] To use display_name along with dag_id as DAG
> params`.
>
> `dag_id` is a fundamental core concept in the system, changing that might
> have some repercussions. There is also another advantage of having the
> display_name, which allows the UI to display a more meaningful and
> human-readable format of dags.
>
> Thanks,
>
> Ping
>
>
> On Thu, Jan 19, 2023 at 3:47 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> One more thing here. I think historically we also have other places we
>> don't fully realise where we rely on dag_id being ASCII.
>>
>> Example here https://github.com/apache/airflow/issues/18010 where
>> non-ASCII dag_id causes airflow scheduler to crash (and in
>> irrecoverable way at that) when statsd is enabled. Another example are the
>> log file names (there are still filesystems that do not support unicode or
>> they can have different encoding. But also there are a number of edge cases
>> (can we have ":" in the DAG_ID, or "';" ? or even space? Those are not
>> insurmountable of course - for example we could also slugify the statsd
>> label. But then those are opening up pandora's box of problems that we
>> won't be able to fully unit test and prevent - because we can't reasonably
>> test all kinds of combinations of filesystems, encoding, telemetry systems
>> etc. etc. And we will have plenty of errors reported after we release it.
>>
>> I personally think that if we continue insisting on "let's make the
>> dag_id unicode" we might keep on bumping on those issues we do not
>> understand or realise and that might mean - more issues, more frustrated
>> users, more support and diagnostics for the community. While I see how
>> "idealistic" approach on getting the id being fully "internationalizable"
>> is (hey we are in 2023, we should be able to handle non-ASCII as id),
>> pragmatically speaking, getting extra "display" while keeping the original
>> dag_id  as ASCII unique identifier might be much pragmatic and attainable.
>>
>> I remember similar discussions happening for > 1 year now, any attempt to
>> add an "extra" display was effectively blocked by a strong opposition "hey
>> we should make the dag_id unicode" ... and nothing happened. Thus
>> effectively our users cannot use non-ASCII characters in dag_id really.
>>
>> We might continue having the idea that the dag_id "can be non-ASCII", but
>> I think it is quite difficult to pull off and no-one has the courage to try
>> it and take responsibility to tackle all the issues (some of which we do
>> not even know).
>>
>> I would rather attempt it in a pragmatic way - even if it means we need
>> to add an extra "display" entity. Seems like more predictable in terms of
>> problems our users will experience.
>>
>> J,
>>
>>
>> On Mon, Jan 16, 2023 at 9:19 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>>> > Possibly contentious idea: We allow unicode dag_ids for Postgres,
>>> MSQQL (and sqllite) but for Mysql we enforce it as ASCII only.
>>>
>>> If we can make it easy and make sure it is handled well and we do not
>>> have extra maintenance - yes, I think it is viable solution. One more
>>> reason for those who use MySQL to migrate :D
>>>
>>>
>>>
>>> On Thu, Jan 12, 2023 at 10:57 PM Ash Berlin-Taylor <as...@apache.org>
>>> wrote:
>>>
>>>> The description of option 1 is wrong/misleading. It sounds like you are
>>>> proposing DAG(name="tést") but what actually happens its that gets
>>>> magically changed to `test` behind the users back. -1 veto to that.
>>>>
>>>> So it's Option2 or some combo of the "break MySQL in some way" options.
>>>>
>>>> -ash
>>>>
>>>> On Jan 12 2023, at 9:50 pm, Ash Berlin-Taylor <as...@apache.org> wrote:
>>>>
>>>>
>>>> Possibly contentious idea: We allow unicode dag_ids for Postgres, MSQQL
>>>> (and sqllite) but for Mysql we enforce it as ASCII only.
>>>>
>>>> On Jan 12 2023, at 6:15 pm, Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>
>>>> As I mentioned multiple times in similar discussions We have a huge
>>>> problem with unicode in dag_id. Namely MySQL limit on indexes. We would
>>>> have to shorten the Id significantly in the database to workaround MySQL
>>>> limits for index size.
>>>>
>>>> We can have a wishful thinking that we can change dag_id to unicode but
>>>> until someone solves the problem - this is just this - wishful thinking.
>>>>
>>>> If someone has a proposal how to do it without breaking compatibility
>>>> or enormously complicating mysql case (or if we drop mysql proposal) - I
>>>> would also be for what Daniel said. But so far I have not seen any.
>>>>
>>>> So in the absence of a viable way to add unicode to dag_id (which
>>>> currently IMHO is not an option) my vote goes to 2.
>>>>
>>>> We can also drop MySQL support :D
>>>>
>>>> J.
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Jan 12, 2023 at 9:56 AM Ash Berlin-Taylor <as...@apache.org>
>>>> wrote:
>>>>
>>>> +1 to what Daniel said
>>>>
>>>> On 12 January 2023 08:32:29 GMT, Daniel Standish
>>>> <da...@astronomer.io.INVALID> wrote:
>>>>
>>>> 1 appears to have potential fix:
>>>> https://github.com/apache/airflow/issues/21127#issuecomment-1030673862
>>>> 2. seems to fail due to our own ascii enforcement... what if we remove
>>>> that?
>>>> 3. does not appear to be unicode-related or dag_id-related but a
>>>> feature request for user-friendly mapped task aliases...
>>>>
>>>> not saying we should not add a "name" of some kind... but ... does not
>>>> yet seem clear we can't just enable unicode...  i know others have given
>>>> this much more thought than I and maybe they can chime in with other
>>>> concerns we may have encountered as this idea has bounced around
>>>>
>>>> On Thu, Jan 12, 2023 at 12:01 AM Abdul Hadi Shakir <
>>>> shakir.iitd@gmail.com> wrote:
>>>>
>>>> Directly using non-ascii characters (unicodes included) in *dag_id* breaks
>>>> couples of functionalities. See issues:
>>>>
>>>>    - Fail to download task log if there are Chinese characters in
>>>>    dag_id #21127 <https://github.com/apache/airflow/issues/21127>
>>>>    - Airflow scheduler with statsd enabled crashes when dag_id
>>>>    contains unexpected characters #18010
>>>>    <https://github.com/apache/airflow/issues/18010>
>>>>    - Names for expanded tasks #23020
>>>>    <https://github.com/apache/airflow/issues/23020>
>>>>
>>>> *Abdul Hadi Shakir*
>>>>
>>>>
>>>> On Thu, Jan 12, 2023 at 1:19 PM Daniel Standish
>>>> <da...@astronomer.io.invalid> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Is it not possible to just have unicode dag_id with no distinct
>>>> "name"?  If you explored this route and encountered problems which caused
>>>> you to abandon, can you share what were the problems?
>>>>
>>>> I think having just one ID for a dag is a nice thing, if we can keep it.
>>>>
>>>> On Wed, Jan 11, 2023 at 11:43 PM Abdul Hadi Shakir <
>>>> shakir.iitd@gmail.com> wrote:
>>>>
>>>> Hi team,
>>>>
>>>> While discussing the approach for
>>>> https://github.com/apache/airflow/issues/22073 (adding support for
>>>> national characters in DAG display name) - two approaches came out. Need
>>>> votes to finalise on one of the two:
>>>>
>>>>    1. [Vote *+1*] Using *name* as the only parameter; and then
>>>>    generating a unique *dag_id* from it using *slugify*. This makes
>>>>    the interface simpler; but it makes *dag_id* unknown from the
>>>>    users. Ongoing PR for this:
>>>>    https://github.com/apache/airflow/pull/28183
>>>>    2. [Vote -*1*] To use *display_name* along with *dag_id* as DAG
>>>>    params. While this is a simpler solution on the backend - it needs lots of
>>>>    work on the frontend for a consistent experience. Ongoing PR for this:
>>>>    https://github.com/apache/airflow/pull/27145
>>>>    3.
>>>>
>>>> Cheers,
>>>> *Abdul Hadi Shakir*
>>>>
>>>>

Re: [VOTE][ISSUE-22073] Finalising approach for displaying non-ascii characters in DAG display name

Posted by Ping Zhang <pi...@umich.edu>.
Hi,

I would vote for: `[Vote -1] To use display_name along with dag_id as DAG
params`.

`dag_id` is a fundamental core concept in the system, changing that might
have some repercussions. There is also another advantage of having the
display_name, which allows the UI to display a more meaningful and
human-readable format of dags.

Thanks,

Ping


On Thu, Jan 19, 2023 at 3:47 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> One more thing here. I think historically we also have other places we
> don't fully realise where we rely on dag_id being ASCII.
>
> Example here https://github.com/apache/airflow/issues/18010 where
> non-ASCII dag_id causes airflow scheduler to crash (and in
> irrecoverable way at that) when statsd is enabled. Another example are the
> log file names (there are still filesystems that do not support unicode or
> they can have different encoding. But also there are a number of edge cases
> (can we have ":" in the DAG_ID, or "';" ? or even space? Those are not
> insurmountable of course - for example we could also slugify the statsd
> label. But then those are opening up pandora's box of problems that we
> won't be able to fully unit test and prevent - because we can't reasonably
> test all kinds of combinations of filesystems, encoding, telemetry systems
> etc. etc. And we will have plenty of errors reported after we release it.
>
> I personally think that if we continue insisting on "let's make the dag_id
> unicode" we might keep on bumping on those issues we do not understand or
> realise and that might mean - more issues, more frustrated users, more
> support and diagnostics for the community. While I see how "idealistic"
> approach on getting the id being fully "internationalizable" is (hey we are
> in 2023, we should be able to handle non-ASCII as id), pragmatically
> speaking, getting extra "display" while keeping the original dag_id  as
> ASCII unique identifier might be much pragmatic and attainable.
>
> I remember similar discussions happening for > 1 year now, any attempt to
> add an "extra" display was effectively blocked by a strong opposition "hey
> we should make the dag_id unicode" ... and nothing happened. Thus
> effectively our users cannot use non-ASCII characters in dag_id really.
>
> We might continue having the idea that the dag_id "can be non-ASCII", but
> I think it is quite difficult to pull off and no-one has the courage to try
> it and take responsibility to tackle all the issues (some of which we do
> not even know).
>
> I would rather attempt it in a pragmatic way - even if it means we need to
> add an extra "display" entity. Seems like more predictable in terms of
> problems our users will experience.
>
> J,
>
>
> On Mon, Jan 16, 2023 at 9:19 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> > Possibly contentious idea: We allow unicode dag_ids for Postgres, MSQQL
>> (and sqllite) but for Mysql we enforce it as ASCII only.
>>
>> If we can make it easy and make sure it is handled well and we do not
>> have extra maintenance - yes, I think it is viable solution. One more
>> reason for those who use MySQL to migrate :D
>>
>>
>>
>> On Thu, Jan 12, 2023 at 10:57 PM Ash Berlin-Taylor <as...@apache.org>
>> wrote:
>>
>>> The description of option 1 is wrong/misleading. It sounds like you are
>>> proposing DAG(name="tést") but what actually happens its that gets
>>> magically changed to `test` behind the users back. -1 veto to that.
>>>
>>> So it's Option2 or some combo of the "break MySQL in some way" options.
>>>
>>> -ash
>>>
>>> On Jan 12 2023, at 9:50 pm, Ash Berlin-Taylor <as...@apache.org> wrote:
>>>
>>>
>>> Possibly contentious idea: We allow unicode dag_ids for Postgres, MSQQL
>>> (and sqllite) but for Mysql we enforce it as ASCII only.
>>>
>>> On Jan 12 2023, at 6:15 pm, Jarek Potiuk <ja...@potiuk.com> wrote:
>>>
>>> As I mentioned multiple times in similar discussions We have a huge
>>> problem with unicode in dag_id. Namely MySQL limit on indexes. We would
>>> have to shorten the Id significantly in the database to workaround MySQL
>>> limits for index size.
>>>
>>> We can have a wishful thinking that we can change dag_id to unicode but
>>> until someone solves the problem - this is just this - wishful thinking.
>>>
>>> If someone has a proposal how to do it without breaking compatibility or
>>> enormously complicating mysql case (or if we drop mysql proposal) - I would
>>> also be for what Daniel said. But so far I have not seen any.
>>>
>>> So in the absence of a viable way to add unicode to dag_id (which
>>> currently IMHO is not an option) my vote goes to 2.
>>>
>>> We can also drop MySQL support :D
>>>
>>> J.
>>>
>>>
>>>
>>>
>>> On Thu, Jan 12, 2023 at 9:56 AM Ash Berlin-Taylor <as...@apache.org>
>>> wrote:
>>>
>>> +1 to what Daniel said
>>>
>>> On 12 January 2023 08:32:29 GMT, Daniel Standish
>>> <da...@astronomer.io.INVALID> wrote:
>>>
>>> 1 appears to have potential fix:
>>> https://github.com/apache/airflow/issues/21127#issuecomment-1030673862
>>> 2. seems to fail due to our own ascii enforcement... what if we remove
>>> that?
>>> 3. does not appear to be unicode-related or dag_id-related but a feature
>>> request for user-friendly mapped task aliases...
>>>
>>> not saying we should not add a "name" of some kind... but ... does not
>>> yet seem clear we can't just enable unicode...  i know others have given
>>> this much more thought than I and maybe they can chime in with other
>>> concerns we may have encountered as this idea has bounced around
>>>
>>> On Thu, Jan 12, 2023 at 12:01 AM Abdul Hadi Shakir <
>>> shakir.iitd@gmail.com> wrote:
>>>
>>> Directly using non-ascii characters (unicodes included) in *dag_id* breaks
>>> couples of functionalities. See issues:
>>>
>>>    - Fail to download task log if there are Chinese characters in
>>>    dag_id #21127 <https://github.com/apache/airflow/issues/21127>
>>>    - Airflow scheduler with statsd enabled crashes when dag_id contains
>>>    unexpected characters #18010
>>>    <https://github.com/apache/airflow/issues/18010>
>>>    - Names for expanded tasks #23020
>>>    <https://github.com/apache/airflow/issues/23020>
>>>
>>> *Abdul Hadi Shakir*
>>>
>>>
>>> On Thu, Jan 12, 2023 at 1:19 PM Daniel Standish
>>> <da...@astronomer.io.invalid> wrote:
>>>
>>> Hi,
>>>
>>> Is it not possible to just have unicode dag_id with no distinct "name"?
>>> If you explored this route and encountered problems which caused you to
>>> abandon, can you share what were the problems?
>>>
>>> I think having just one ID for a dag is a nice thing, if we can keep it.
>>>
>>> On Wed, Jan 11, 2023 at 11:43 PM Abdul Hadi Shakir <
>>> shakir.iitd@gmail.com> wrote:
>>>
>>> Hi team,
>>>
>>> While discussing the approach for
>>> https://github.com/apache/airflow/issues/22073 (adding support for
>>> national characters in DAG display name) - two approaches came out. Need
>>> votes to finalise on one of the two:
>>>
>>>    1. [Vote *+1*] Using *name* as the only parameter; and then
>>>    generating a unique *dag_id* from it using *slugify*. This makes the
>>>    interface simpler; but it makes *dag_id* unknown from the users.
>>>    Ongoing PR for this: https://github.com/apache/airflow/pull/28183
>>>    2. [Vote -*1*] To use *display_name* along with *dag_id* as DAG
>>>    params. While this is a simpler solution on the backend - it needs lots of
>>>    work on the frontend for a consistent experience. Ongoing PR for this:
>>>    https://github.com/apache/airflow/pull/27145
>>>    3.
>>>
>>> Cheers,
>>> *Abdul Hadi Shakir*
>>>
>>>

Re: [VOTE][ISSUE-22073] Finalising approach for displaying non-ascii characters in DAG display name

Posted by Jarek Potiuk <ja...@potiuk.com>.
One more thing here. I think historically we also have other places we
don't fully realise where we rely on dag_id being ASCII.

Example here https://github.com/apache/airflow/issues/18010 where non-ASCII
dag_id causes airflow scheduler to crash (and in irrecoverable way at that)
when statsd is enabled. Another example are the log file names (there are
still filesystems that do not support unicode or they can have different
encoding. But also there are a number of edge cases (can we have ":" in the
DAG_ID, or "';" ? or even space? Those are not insurmountable of course -
for example we could also slugify the statsd label. But then those are
opening up pandora's box of problems that we won't be able to fully unit
test and prevent - because we can't reasonably test all kinds of
combinations of filesystems, encoding, telemetry systems etc. etc. And we
will have plenty of errors reported after we release it.

I personally think that if we continue insisting on "let's make the dag_id
unicode" we might keep on bumping on those issues we do not understand or
realise and that might mean - more issues, more frustrated users, more
support and diagnostics for the community. While I see how "idealistic"
approach on getting the id being fully "internationalizable" is (hey we are
in 2023, we should be able to handle non-ASCII as id), pragmatically
speaking, getting extra "display" while keeping the original dag_id  as
ASCII unique identifier might be much pragmatic and attainable.

I remember similar discussions happening for > 1 year now, any attempt to
add an "extra" display was effectively blocked by a strong opposition "hey
we should make the dag_id unicode" ... and nothing happened. Thus
effectively our users cannot use non-ASCII characters in dag_id really.

We might continue having the idea that the dag_id "can be non-ASCII", but I
think it is quite difficult to pull off and no-one has the courage to try
it and take responsibility to tackle all the issues (some of which we do
not even know).

I would rather attempt it in a pragmatic way - even if it means we need to
add an extra "display" entity. Seems like more predictable in terms of
problems our users will experience.

J,


On Mon, Jan 16, 2023 at 9:19 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> > Possibly contentious idea: We allow unicode dag_ids for Postgres, MSQQL
> (and sqllite) but for Mysql we enforce it as ASCII only.
>
> If we can make it easy and make sure it is handled well and we do not have
> extra maintenance - yes, I think it is viable solution. One more reason for
> those who use MySQL to migrate :D
>
>
>
> On Thu, Jan 12, 2023 at 10:57 PM Ash Berlin-Taylor <as...@apache.org> wrote:
>
>> The description of option 1 is wrong/misleading. It sounds like you are
>> proposing DAG(name="tést") but what actually happens its that gets
>> magically changed to `test` behind the users back. -1 veto to that.
>>
>> So it's Option2 or some combo of the "break MySQL in some way" options.
>>
>> -ash
>>
>> On Jan 12 2023, at 9:50 pm, Ash Berlin-Taylor <as...@apache.org> wrote:
>>
>>
>> Possibly contentious idea: We allow unicode dag_ids for Postgres, MSQQL
>> (and sqllite) but for Mysql we enforce it as ASCII only.
>>
>> On Jan 12 2023, at 6:15 pm, Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>> As I mentioned multiple times in similar discussions We have a huge
>> problem with unicode in dag_id. Namely MySQL limit on indexes. We would
>> have to shorten the Id significantly in the database to workaround MySQL
>> limits for index size.
>>
>> We can have a wishful thinking that we can change dag_id to unicode but
>> until someone solves the problem - this is just this - wishful thinking.
>>
>> If someone has a proposal how to do it without breaking compatibility or
>> enormously complicating mysql case (or if we drop mysql proposal) - I would
>> also be for what Daniel said. But so far I have not seen any.
>>
>> So in the absence of a viable way to add unicode to dag_id (which
>> currently IMHO is not an option) my vote goes to 2.
>>
>> We can also drop MySQL support :D
>>
>> J.
>>
>>
>>
>>
>> On Thu, Jan 12, 2023 at 9:56 AM Ash Berlin-Taylor <as...@apache.org> wrote:
>>
>> +1 to what Daniel said
>>
>> On 12 January 2023 08:32:29 GMT, Daniel Standish
>> <da...@astronomer.io.INVALID> wrote:
>>
>> 1 appears to have potential fix:
>> https://github.com/apache/airflow/issues/21127#issuecomment-1030673862
>> 2. seems to fail due to our own ascii enforcement... what if we remove
>> that?
>> 3. does not appear to be unicode-related or dag_id-related but a feature
>> request for user-friendly mapped task aliases...
>>
>> not saying we should not add a "name" of some kind... but ... does not
>> yet seem clear we can't just enable unicode...  i know others have given
>> this much more thought than I and maybe they can chime in with other
>> concerns we may have encountered as this idea has bounced around
>>
>> On Thu, Jan 12, 2023 at 12:01 AM Abdul Hadi Shakir <sh...@gmail.com>
>> wrote:
>>
>> Directly using non-ascii characters (unicodes included) in *dag_id* breaks
>> couples of functionalities. See issues:
>>
>>    - Fail to download task log if there are Chinese characters in
>>    dag_id #21127 <https://github.com/apache/airflow/issues/21127>
>>    - Airflow scheduler with statsd enabled crashes when dag_id contains
>>    unexpected characters #18010
>>    <https://github.com/apache/airflow/issues/18010>
>>    - Names for expanded tasks #23020
>>    <https://github.com/apache/airflow/issues/23020>
>>
>> *Abdul Hadi Shakir*
>>
>>
>> On Thu, Jan 12, 2023 at 1:19 PM Daniel Standish
>> <da...@astronomer.io.invalid> wrote:
>>
>> Hi,
>>
>> Is it not possible to just have unicode dag_id with no distinct "name"?
>> If you explored this route and encountered problems which caused you to
>> abandon, can you share what were the problems?
>>
>> I think having just one ID for a dag is a nice thing, if we can keep it.
>>
>> On Wed, Jan 11, 2023 at 11:43 PM Abdul Hadi Shakir <sh...@gmail.com>
>> wrote:
>>
>> Hi team,
>>
>> While discussing the approach for
>> https://github.com/apache/airflow/issues/22073 (adding support for
>> national characters in DAG display name) - two approaches came out. Need
>> votes to finalise on one of the two:
>>
>>    1. [Vote *+1*] Using *name* as the only parameter; and then
>>    generating a unique *dag_id* from it using *slugify*. This makes the
>>    interface simpler; but it makes *dag_id* unknown from the users.
>>    Ongoing PR for this: https://github.com/apache/airflow/pull/28183
>>    2. [Vote -*1*] To use *display_name* along with *dag_id* as DAG
>>    params. While this is a simpler solution on the backend - it needs lots of
>>    work on the frontend for a consistent experience. Ongoing PR for this:
>>    https://github.com/apache/airflow/pull/27145
>>    3.
>>
>> Cheers,
>> *Abdul Hadi Shakir*
>>
>>

Re: [VOTE][ISSUE-22073] Finalising approach for displaying non-ascii characters in DAG display name

Posted by Jarek Potiuk <ja...@potiuk.com>.
> Possibly contentious idea: We allow unicode dag_ids for Postgres, MSQQL
(and sqllite) but for Mysql we enforce it as ASCII only.

If we can make it easy and make sure it is handled well and we do not have
extra maintenance - yes, I think it is viable solution. One more reason for
those who use MySQL to migrate :D



On Thu, Jan 12, 2023 at 10:57 PM Ash Berlin-Taylor <as...@apache.org> wrote:

> The description of option 1 is wrong/misleading. It sounds like you are
> proposing DAG(name="tést") but what actually happens its that gets
> magically changed to `test` behind the users back. -1 veto to that.
>
> So it's Option2 or some combo of the "break MySQL in some way" options.
>
> -ash
>
> On Jan 12 2023, at 9:50 pm, Ash Berlin-Taylor <as...@apache.org> wrote:
>
>
> Possibly contentious idea: We allow unicode dag_ids for Postgres, MSQQL
> (and sqllite) but for Mysql we enforce it as ASCII only.
>
> On Jan 12 2023, at 6:15 pm, Jarek Potiuk <ja...@potiuk.com> wrote:
>
> As I mentioned multiple times in similar discussions We have a huge
> problem with unicode in dag_id. Namely MySQL limit on indexes. We would
> have to shorten the Id significantly in the database to workaround MySQL
> limits for index size.
>
> We can have a wishful thinking that we can change dag_id to unicode but
> until someone solves the problem - this is just this - wishful thinking.
>
> If someone has a proposal how to do it without breaking compatibility or
> enormously complicating mysql case (or if we drop mysql proposal) - I would
> also be for what Daniel said. But so far I have not seen any.
>
> So in the absence of a viable way to add unicode to dag_id (which
> currently IMHO is not an option) my vote goes to 2.
>
> We can also drop MySQL support :D
>
> J.
>
>
>
>
> On Thu, Jan 12, 2023 at 9:56 AM Ash Berlin-Taylor <as...@apache.org> wrote:
>
> +1 to what Daniel said
>
> On 12 January 2023 08:32:29 GMT, Daniel Standish
> <da...@astronomer.io.INVALID> wrote:
>
> 1 appears to have potential fix:
> https://github.com/apache/airflow/issues/21127#issuecomment-1030673862
> 2. seems to fail due to our own ascii enforcement... what if we remove
> that?
> 3. does not appear to be unicode-related or dag_id-related but a feature
> request for user-friendly mapped task aliases...
>
> not saying we should not add a "name" of some kind... but ... does not yet
> seem clear we can't just enable unicode...  i know others have given this
> much more thought than I and maybe they can chime in with other concerns we
> may have encountered as this idea has bounced around
>
> On Thu, Jan 12, 2023 at 12:01 AM Abdul Hadi Shakir <sh...@gmail.com>
> wrote:
>
> Directly using non-ascii characters (unicodes included) in *dag_id* breaks
> couples of functionalities. See issues:
>
>    - Fail to download task log if there are Chinese characters in
>    dag_id #21127 <https://github.com/apache/airflow/issues/21127>
>    - Airflow scheduler with statsd enabled crashes when dag_id contains
>    unexpected characters #18010
>    <https://github.com/apache/airflow/issues/18010>
>    - Names for expanded tasks #23020
>    <https://github.com/apache/airflow/issues/23020>
>
> *Abdul Hadi Shakir*
>
>
> On Thu, Jan 12, 2023 at 1:19 PM Daniel Standish
> <da...@astronomer.io.invalid> wrote:
>
> Hi,
>
> Is it not possible to just have unicode dag_id with no distinct "name"?
> If you explored this route and encountered problems which caused you to
> abandon, can you share what were the problems?
>
> I think having just one ID for a dag is a nice thing, if we can keep it.
>
> On Wed, Jan 11, 2023 at 11:43 PM Abdul Hadi Shakir <sh...@gmail.com>
> wrote:
>
> Hi team,
>
> While discussing the approach for
> https://github.com/apache/airflow/issues/22073 (adding support for
> national characters in DAG display name) - two approaches came out. Need
> votes to finalise on one of the two:
>
>    1. [Vote *+1*] Using *name* as the only parameter; and then generating
>    a unique *dag_id* from it using *slugify*. This makes the interface
>    simpler; but it makes *dag_id* unknown from the users. Ongoing PR for
>    this: https://github.com/apache/airflow/pull/28183
>    2. [Vote -*1*] To use *display_name* along with *dag_id* as DAG
>    params. While this is a simpler solution on the backend - it needs lots of
>    work on the frontend for a consistent experience. Ongoing PR for this:
>    https://github.com/apache/airflow/pull/27145
>    3.
>
> Cheers,
> *Abdul Hadi Shakir*
>
>

Re: [VOTE][ISSUE-22073] Finalising approach for displaying non-ascii characters in DAG display name

Posted by Ash Berlin-Taylor <as...@apache.org>.
The description of option 1 is wrong/misleading. It sounds like you are proposing DAG(name="tést") but what actually happens its that gets magically changed to `test` behind the users back. -1 veto to that.

So it's Option2 or some combo of the "break MySQL in some way" options.
-ash
On Jan 12 2023, at 9:50 pm, Ash Berlin-Taylor <as...@apache.org> wrote:
>
> Possibly contentious idea: We allow unicode dag_ids for Postgres, MSQQL (and sqllite) but for Mysql we enforce it as ASCII only.
> On Jan 12 2023, at 6:15 pm, Jarek Potiuk <ja...@potiuk.com> wrote:
> > As I mentioned multiple times in similar discussions We have a huge problem with unicode in dag_id. Namely MySQL limit on indexes. We would have to shorten the Id significantly in the database to workaround MySQL limits for index size.
> >
> > We can have a wishful thinking that we can change dag_id to unicode but until someone solves the problem - this is just this - wishful thinking.
> >
> > If someone has a proposal how to do it without breaking compatibility or enormously complicating mysql case (or if we drop mysql proposal) - I would also be for what Daniel said. But so far I have not seen any.
> >
> > So in the absence of a viable way to add unicode to dag_id (which currently IMHO is not an option) my vote goes to 2.
> >
> > We can also drop MySQL support :D
> >
> > J.
> >
> >
> >
> >
> > On Thu, Jan 12, 2023 at 9:56 AM Ash Berlin-Taylor <ash@apache.org (mailto:ash@apache.org)> wrote:
> > > +1 to what Daniel said
> > >
> > > On 12 January 2023 08:32:29 GMT, Daniel Standish <da...@astronomer.io.INVALID> wrote:
> > > > 1 appears to have potential fix: https://github.com/apache/airflow/issues/21127#issuecomment-1030673862
> > > > 2. seems to fail due to our own ascii enforcement... what if we remove that?
> > > > 3. does not appear to be unicode-related or dag_id-related but a feature request for user-friendly mapped task aliases...
> > > >
> > > > not saying we should not add a "name" of some kind... but ... does not yet seem clear we can't just enable unicode... i know others have given this much more thought than I and maybe they can chime in with other concerns we may have encountered as this idea has bounced around
> > > > On Thu, Jan 12, 2023 at 12:01 AM Abdul Hadi Shakir <shakir.iitd@gmail.com (mailto:shakir.iitd@gmail.com)> wrote:
> > > > > Directly using non-ascii characters (unicodes included) in dag_id breaks couples of functionalities. See issues:
> > > > > Fail to download task log if there are Chinese characters in dag_id #21127 (https://github.com/apache/airflow/issues/21127)
> > > > >
> > > > > Airflow scheduler with statsd enabled crashes when dag_id contains unexpected characters #18010 (https://github.com/apache/airflow/issues/18010)
> > > > >
> > > > > Names for expanded tasks #23020 (https://github.com/apache/airflow/issues/23020)
> > > > >
> > > > >
> > > > > Abdul Hadi Shakir
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Jan 12, 2023 at 1:19 PM Daniel Standish <da...@astronomer.io.invalid> wrote:
> > > > > > Hi,
> > > > > >
> > > > > > Is it not possible to just have unicode dag_id with no distinct "name"? If you explored this route and encountered problems which caused you to abandon, can you share what were the problems?
> > > > > >
> > > > > > I think having just one ID for a dag is a nice thing, if we can keep it.
> > > > > > On Wed, Jan 11, 2023 at 11:43 PM Abdul Hadi Shakir <shakir.iitd@gmail.com (mailto:shakir.iitd@gmail.com)> wrote:
> > > > > > > Hi team,
> > > > > > >
> > > > > > > While discussing the approach for https://github.com/apache/airflow/issues/22073 (adding support for national characters in DAG display name) - two approaches came out. Need votes to finalise on one of the two:
> > > > > > > [Vote +1] Using name as the only parameter; and then generating a unique dag_id from it using slugify. This makes the interface simpler; but it makes dag_id unknown from the users. Ongoing PR for this: https://github.com/apache/airflow/pull/28183
> > > > > > >
> > > > > > > [Vote -1] To use display_name along with dag_id as DAG params. While this is a simpler solution on the backend - it needs lots of work on the frontend for a consistent experience. Ongoing PR for this: https://github.com/apache/airflow/pull/27145
> > > > > > >
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Abdul Hadi Shakir
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> > >
> >
> >
>


Re: [VOTE][ISSUE-22073] Finalising approach for displaying non-ascii characters in DAG display name

Posted by Ash Berlin-Taylor <as...@apache.org>.
Possibly contentious idea: We allow unicode dag_ids for Postgres, MSQQL (and sqllite) but for Mysql we enforce it as ASCII only.
On Jan 12 2023, at 6:15 pm, Jarek Potiuk <ja...@potiuk.com> wrote:
> As I mentioned multiple times in similar discussions We have a huge problem with unicode in dag_id. Namely MySQL limit on indexes. We would have to shorten the Id significantly in the database to workaround MySQL limits for index size.
>
> We can have a wishful thinking that we can change dag_id to unicode but until someone solves the problem - this is just this - wishful thinking.
>
> If someone has a proposal how to do it without breaking compatibility or enormously complicating mysql case (or if we drop mysql proposal) - I would also be for what Daniel said. But so far I have not seen any.
>
> So in the absence of a viable way to add unicode to dag_id (which currently IMHO is not an option) my vote goes to 2.
>
> We can also drop MySQL support :D
>
> J.
>
>
>
>
> On Thu, Jan 12, 2023 at 9:56 AM Ash Berlin-Taylor <ash@apache.org (mailto:ash@apache.org)> wrote:
> > +1 to what Daniel said
> >
> > On 12 January 2023 08:32:29 GMT, Daniel Standish <da...@astronomer.io.INVALID> wrote:
> > > 1 appears to have potential fix: https://github.com/apache/airflow/issues/21127#issuecomment-1030673862
> > > 2. seems to fail due to our own ascii enforcement... what if we remove that?
> > > 3. does not appear to be unicode-related or dag_id-related but a feature request for user-friendly mapped task aliases...
> > >
> > > not saying we should not add a "name" of some kind... but ... does not yet seem clear we can't just enable unicode... i know others have given this much more thought than I and maybe they can chime in with other concerns we may have encountered as this idea has bounced around
> > > On Thu, Jan 12, 2023 at 12:01 AM Abdul Hadi Shakir <shakir.iitd@gmail.com (mailto:shakir.iitd@gmail.com)> wrote:
> > > > Directly using non-ascii characters (unicodes included) in dag_id breaks couples of functionalities. See issues:
> > > > Fail to download task log if there are Chinese characters in dag_id #21127 (https://github.com/apache/airflow/issues/21127)
> > > >
> > > > Airflow scheduler with statsd enabled crashes when dag_id contains unexpected characters #18010 (https://github.com/apache/airflow/issues/18010)
> > > >
> > > > Names for expanded tasks #23020 (https://github.com/apache/airflow/issues/23020)
> > > >
> > > >
> > > > Abdul Hadi Shakir
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Jan 12, 2023 at 1:19 PM Daniel Standish <da...@astronomer.io.invalid> wrote:
> > > > > Hi,
> > > > >
> > > > > Is it not possible to just have unicode dag_id with no distinct "name"? If you explored this route and encountered problems which caused you to abandon, can you share what were the problems?
> > > > >
> > > > > I think having just one ID for a dag is a nice thing, if we can keep it.
> > > > > On Wed, Jan 11, 2023 at 11:43 PM Abdul Hadi Shakir <shakir.iitd@gmail.com (mailto:shakir.iitd@gmail.com)> wrote:
> > > > > > Hi team,
> > > > > >
> > > > > > While discussing the approach for https://github.com/apache/airflow/issues/22073 (adding support for national characters in DAG display name) - two approaches came out. Need votes to finalise on one of the two:
> > > > > > [Vote +1] Using name as the only parameter; and then generating a unique dag_id from it using slugify. This makes the interface simpler; but it makes dag_id unknown from the users. Ongoing PR for this: https://github.com/apache/airflow/pull/28183
> > > > > >
> > > > > > [Vote -1] To use display_name along with dag_id as DAG params. While this is a simpler solution on the backend - it needs lots of work on the frontend for a consistent experience. Ongoing PR for this: https://github.com/apache/airflow/pull/27145
> > > > > >
> > > > > >
> > > > > > Cheers,
> > > > > > Abdul Hadi Shakir
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> >
> >
> >
>
>


Re: [VOTE][ISSUE-22073] Finalising approach for displaying non-ascii characters in DAG display name

Posted by Daniel Standish <da...@astronomer.io.INVALID>.
Well, I did acknowledge that others thought about it more than me :) Have
not followed the discussions in a while and didn't know where we got.  I
suspect there are others in the same boat and we're asked to vote on it
so it's worth asking and surfacing why that has been ruled out, assuming it
has.

Is it not feasible to have shorter dag_id requirements for unicode
dags, perhaps only in the mysql case?







On Thu, Jan 12, 2023 at 10:16 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> As I mentioned multiple times in similar discussions We have a huge
> problem with unicode in dag_id. Namely MySQL limit on indexes. We would
> have to shorten the Id significantly in the database to workaround MySQL
> limits for index size.
>
> We can have a wishful thinking that we can change dag_id to unicode but
> until someone solves the problem - this is just this - wishful thinking.
>
> If someone has a proposal how to do it without breaking compatibility or
> enormously complicating mysql case (or if we drop mysql proposal) - I would
> also be for what Daniel said. But so far I have not seen any.
>
> So in the absence of a viable way to add unicode to dag_id (which
> currently IMHO is not an option) my vote goes to 2.
>
> We can also drop MySQL support :D
>
> J.
>
>
>
>
> On Thu, Jan 12, 2023 at 9:56 AM Ash Berlin-Taylor <as...@apache.org> wrote:
>
>> +1 to what Daniel said
>>
>> On 12 January 2023 08:32:29 GMT, Daniel Standish
>> <da...@astronomer.io.INVALID> wrote:
>>>
>>> 1 appears to have potential fix:
>>> https://github.com/apache/airflow/issues/21127#issuecomment-1030673862
>>> 2. seems to fail due to our own ascii enforcement... what if we remove
>>> that?
>>> 3. does not appear to be unicode-related or dag_id-related but a feature
>>> request for user-friendly mapped task aliases...
>>>
>>> not saying we should not add a "name" of some kind... but ... does not
>>> yet seem clear we can't just enable unicode...  i know others have given
>>> this much more thought than I and maybe they can chime in with other
>>> concerns we may have encountered as this idea has bounced around
>>>
>>> On Thu, Jan 12, 2023 at 12:01 AM Abdul Hadi Shakir <
>>> shakir.iitd@gmail.com> wrote:
>>>
>>>> Directly using non-ascii characters (unicodes included) in *dag_id* breaks
>>>> couples of functionalities. See issues:
>>>>
>>>>    - Fail to download task log if there are Chinese characters in
>>>>    dag_id #21127 <https://github.com/apache/airflow/issues/21127>
>>>>    - Airflow scheduler with statsd enabled crashes when dag_id
>>>>    contains unexpected characters #18010
>>>>    <https://github.com/apache/airflow/issues/18010>
>>>>    - Names for expanded tasks #23020
>>>>    <https://github.com/apache/airflow/issues/23020>
>>>>
>>>> *Abdul Hadi Shakir*
>>>>
>>>>
>>>> On Thu, Jan 12, 2023 at 1:19 PM Daniel Standish
>>>> <da...@astronomer.io.invalid> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Is it not possible to just have unicode dag_id with no distinct
>>>>> "name"?  If you explored this route and encountered problems which caused
>>>>> you to abandon, can you share what were the problems?
>>>>>
>>>>> I think having just one ID for a dag is a nice thing, if we can keep
>>>>> it.
>>>>>
>>>>> On Wed, Jan 11, 2023 at 11:43 PM Abdul Hadi Shakir <
>>>>> shakir.iitd@gmail.com> wrote:
>>>>>
>>>>>> Hi team,
>>>>>>
>>>>>> While discussing the approach for
>>>>>> https://github.com/apache/airflow/issues/22073 (adding support for
>>>>>> national characters in DAG display name) - two approaches came out. Need
>>>>>> votes to finalise on one of the two:
>>>>>>
>>>>>>    1. [Vote *+1*] Using *name* as the only parameter; and then
>>>>>>    generating a unique *dag_id* from it using *slugify*. This makes
>>>>>>    the interface simpler; but it makes *dag_id* unknown from the
>>>>>>    users. Ongoing PR for this:
>>>>>>    https://github.com/apache/airflow/pull/28183
>>>>>>    2. [Vote -*1*] To use *display_name* along with *dag_id* as DAG
>>>>>>    params. While this is a simpler solution on the backend - it needs lots of
>>>>>>    work on the frontend for a consistent experience. Ongoing PR for this:
>>>>>>    https://github.com/apache/airflow/pull/27145
>>>>>>
>>>>>> Cheers,
>>>>>> *Abdul Hadi Shakir*
>>>>>>
>>>>>

Re: [VOTE][ISSUE-22073] Finalising approach for displaying non-ascii characters in DAG display name

Posted by Jarek Potiuk <ja...@potiuk.com>.
As I mentioned multiple times in similar discussions We have a huge problem
with unicode in dag_id. Namely MySQL limit on indexes. We would have to
shorten the Id significantly in the database to workaround MySQL limits for
index size.

We can have a wishful thinking that we can change dag_id to unicode but
until someone solves the problem - this is just this - wishful thinking.

If someone has a proposal how to do it without breaking compatibility or
enormously complicating mysql case (or if we drop mysql proposal) - I would
also be for what Daniel said. But so far I have not seen any.

So in the absence of a viable way to add unicode to dag_id (which currently
IMHO is not an option) my vote goes to 2.

We can also drop MySQL support :D

J.




On Thu, Jan 12, 2023 at 9:56 AM Ash Berlin-Taylor <as...@apache.org> wrote:

> +1 to what Daniel said
>
> On 12 January 2023 08:32:29 GMT, Daniel Standish
> <da...@astronomer.io.INVALID> wrote:
>>
>> 1 appears to have potential fix:
>> https://github.com/apache/airflow/issues/21127#issuecomment-1030673862
>> 2. seems to fail due to our own ascii enforcement... what if we remove
>> that?
>> 3. does not appear to be unicode-related or dag_id-related but a feature
>> request for user-friendly mapped task aliases...
>>
>> not saying we should not add a "name" of some kind... but ... does not
>> yet seem clear we can't just enable unicode...  i know others have given
>> this much more thought than I and maybe they can chime in with other
>> concerns we may have encountered as this idea has bounced around
>>
>> On Thu, Jan 12, 2023 at 12:01 AM Abdul Hadi Shakir <sh...@gmail.com>
>> wrote:
>>
>>> Directly using non-ascii characters (unicodes included) in *dag_id* breaks
>>> couples of functionalities. See issues:
>>>
>>>    - Fail to download task log if there are Chinese characters in dag_id
>>>     #21127 <https://github.com/apache/airflow/issues/21127>
>>>    - Airflow scheduler with statsd enabled crashes when dag_id contains
>>>    unexpected characters #18010
>>>    <https://github.com/apache/airflow/issues/18010>
>>>    - Names for expanded tasks #23020
>>>    <https://github.com/apache/airflow/issues/23020>
>>>
>>> *Abdul Hadi Shakir*
>>>
>>>
>>> On Thu, Jan 12, 2023 at 1:19 PM Daniel Standish
>>> <da...@astronomer.io.invalid> wrote:
>>>
>>>> Hi,
>>>>
>>>> Is it not possible to just have unicode dag_id with no distinct
>>>> "name"?  If you explored this route and encountered problems which caused
>>>> you to abandon, can you share what were the problems?
>>>>
>>>> I think having just one ID for a dag is a nice thing, if we can keep it.
>>>>
>>>> On Wed, Jan 11, 2023 at 11:43 PM Abdul Hadi Shakir <
>>>> shakir.iitd@gmail.com> wrote:
>>>>
>>>>> Hi team,
>>>>>
>>>>> While discussing the approach for
>>>>> https://github.com/apache/airflow/issues/22073 (adding support for
>>>>> national characters in DAG display name) - two approaches came out. Need
>>>>> votes to finalise on one of the two:
>>>>>
>>>>>    1. [Vote *+1*] Using *name* as the only parameter; and then
>>>>>    generating a unique *dag_id* from it using *slugify*. This makes
>>>>>    the interface simpler; but it makes *dag_id* unknown from the
>>>>>    users. Ongoing PR for this:
>>>>>    https://github.com/apache/airflow/pull/28183
>>>>>    2. [Vote -*1*] To use *display_name* along with *dag_id* as DAG
>>>>>    params. While this is a simpler solution on the backend - it needs lots of
>>>>>    work on the frontend for a consistent experience. Ongoing PR for this:
>>>>>    https://github.com/apache/airflow/pull/27145
>>>>>
>>>>> Cheers,
>>>>> *Abdul Hadi Shakir*
>>>>>
>>>>

Re: [VOTE][ISSUE-22073] Finalising approach for displaying non-ascii characters in DAG display name

Posted by Ash Berlin-Taylor <as...@apache.org>.
+1 to what Daniel said

On 12 January 2023 08:32:29 GMT, Daniel Standish <da...@astronomer.io.INVALID> wrote:
>1 appears to have potential fix:
>https://github.com/apache/airflow/issues/21127#issuecomment-1030673862
>2. seems to fail due to our own ascii enforcement... what if we remove that?
>3. does not appear to be unicode-related or dag_id-related but a feature
>request for user-friendly mapped task aliases...
>
>not saying we should not add a "name" of some kind... but ... does not yet
>seem clear we can't just enable unicode...  i know others have given this
>much more thought than I and maybe they can chime in with other concerns we
>may have encountered as this idea has bounced around
>
>On Thu, Jan 12, 2023 at 12:01 AM Abdul Hadi Shakir <sh...@gmail.com>
>wrote:
>
>> Directly using non-ascii characters (unicodes included) in *dag_id* breaks
>> couples of functionalities. See issues:
>>
>>    - Fail to download task log if there are Chinese characters in dag_id
>>     #21127 <https://github.com/apache/airflow/issues/21127>
>>    - Airflow scheduler with statsd enabled crashes when dag_id contains
>>    unexpected characters #18010
>>    <https://github.com/apache/airflow/issues/18010>
>>    - Names for expanded tasks #23020
>>    <https://github.com/apache/airflow/issues/23020>
>>
>> *Abdul Hadi Shakir*
>>
>>
>> On Thu, Jan 12, 2023 at 1:19 PM Daniel Standish
>> <da...@astronomer.io.invalid> wrote:
>>
>>> Hi,
>>>
>>> Is it not possible to just have unicode dag_id with no distinct "name"?
>>> If you explored this route and encountered problems which caused you to
>>> abandon, can you share what were the problems?
>>>
>>> I think having just one ID for a dag is a nice thing, if we can keep it.
>>>
>>> On Wed, Jan 11, 2023 at 11:43 PM Abdul Hadi Shakir <sh...@gmail.com>
>>> wrote:
>>>
>>>> Hi team,
>>>>
>>>> While discussing the approach for
>>>> https://github.com/apache/airflow/issues/22073 (adding support for
>>>> national characters in DAG display name) - two approaches came out. Need
>>>> votes to finalise on one of the two:
>>>>
>>>>    1. [Vote *+1*] Using *name* as the only parameter; and then
>>>>    generating a unique *dag_id* from it using *slugify*. This makes the
>>>>    interface simpler; but it makes *dag_id* unknown from the users.
>>>>    Ongoing PR for this: https://github.com/apache/airflow/pull/28183
>>>>    2. [Vote -*1*] To use *display_name* along with *dag_id* as DAG
>>>>    params. While this is a simpler solution on the backend - it needs lots of
>>>>    work on the frontend for a consistent experience. Ongoing PR for this:
>>>>    https://github.com/apache/airflow/pull/27145
>>>>
>>>> Cheers,
>>>> *Abdul Hadi Shakir*
>>>>
>>>

Re: [VOTE][ISSUE-22073] Finalising approach for displaying non-ascii characters in DAG display name

Posted by Daniel Standish <da...@astronomer.io.INVALID>.
1 appears to have potential fix:
https://github.com/apache/airflow/issues/21127#issuecomment-1030673862
2. seems to fail due to our own ascii enforcement... what if we remove that?
3. does not appear to be unicode-related or dag_id-related but a feature
request for user-friendly mapped task aliases...

not saying we should not add a "name" of some kind... but ... does not yet
seem clear we can't just enable unicode...  i know others have given this
much more thought than I and maybe they can chime in with other concerns we
may have encountered as this idea has bounced around

On Thu, Jan 12, 2023 at 12:01 AM Abdul Hadi Shakir <sh...@gmail.com>
wrote:

> Directly using non-ascii characters (unicodes included) in *dag_id* breaks
> couples of functionalities. See issues:
>
>    - Fail to download task log if there are Chinese characters in dag_id
>     #21127 <https://github.com/apache/airflow/issues/21127>
>    - Airflow scheduler with statsd enabled crashes when dag_id contains
>    unexpected characters #18010
>    <https://github.com/apache/airflow/issues/18010>
>    - Names for expanded tasks #23020
>    <https://github.com/apache/airflow/issues/23020>
>
> *Abdul Hadi Shakir*
>
>
> On Thu, Jan 12, 2023 at 1:19 PM Daniel Standish
> <da...@astronomer.io.invalid> wrote:
>
>> Hi,
>>
>> Is it not possible to just have unicode dag_id with no distinct "name"?
>> If you explored this route and encountered problems which caused you to
>> abandon, can you share what were the problems?
>>
>> I think having just one ID for a dag is a nice thing, if we can keep it.
>>
>> On Wed, Jan 11, 2023 at 11:43 PM Abdul Hadi Shakir <sh...@gmail.com>
>> wrote:
>>
>>> Hi team,
>>>
>>> While discussing the approach for
>>> https://github.com/apache/airflow/issues/22073 (adding support for
>>> national characters in DAG display name) - two approaches came out. Need
>>> votes to finalise on one of the two:
>>>
>>>    1. [Vote *+1*] Using *name* as the only parameter; and then
>>>    generating a unique *dag_id* from it using *slugify*. This makes the
>>>    interface simpler; but it makes *dag_id* unknown from the users.
>>>    Ongoing PR for this: https://github.com/apache/airflow/pull/28183
>>>    2. [Vote -*1*] To use *display_name* along with *dag_id* as DAG
>>>    params. While this is a simpler solution on the backend - it needs lots of
>>>    work on the frontend for a consistent experience. Ongoing PR for this:
>>>    https://github.com/apache/airflow/pull/27145
>>>
>>> Cheers,
>>> *Abdul Hadi Shakir*
>>>
>>

Re: [VOTE][ISSUE-22073] Finalising approach for displaying non-ascii characters in DAG display name

Posted by Abdul Hadi Shakir <sh...@gmail.com>.
Directly using non-ascii characters (unicodes included) in *dag_id* breaks
couples of functionalities. See issues:

   - Fail to download task log if there are Chinese characters in dag_id
    #21127 <https://github.com/apache/airflow/issues/21127>
   - Airflow scheduler with statsd enabled crashes when dag_id contains
   unexpected characters #18010
   <https://github.com/apache/airflow/issues/18010>
   - Names for expanded tasks #23020
   <https://github.com/apache/airflow/issues/23020>

*Abdul Hadi Shakir*


On Thu, Jan 12, 2023 at 1:19 PM Daniel Standish
<da...@astronomer.io.invalid> wrote:

> Hi,
>
> Is it not possible to just have unicode dag_id with no distinct "name"?
> If you explored this route and encountered problems which caused you to
> abandon, can you share what were the problems?
>
> I think having just one ID for a dag is a nice thing, if we can keep it.
>
> On Wed, Jan 11, 2023 at 11:43 PM Abdul Hadi Shakir <sh...@gmail.com>
> wrote:
>
>> Hi team,
>>
>> While discussing the approach for
>> https://github.com/apache/airflow/issues/22073 (adding support for
>> national characters in DAG display name) - two approaches came out. Need
>> votes to finalise on one of the two:
>>
>>    1. [Vote *+1*] Using *name* as the only parameter; and then
>>    generating a unique *dag_id* from it using *slugify*. This makes the
>>    interface simpler; but it makes *dag_id* unknown from the users.
>>    Ongoing PR for this: https://github.com/apache/airflow/pull/28183
>>    2. [Vote -*1*] To use *display_name* along with *dag_id* as DAG
>>    params. While this is a simpler solution on the backend - it needs lots of
>>    work on the frontend for a consistent experience. Ongoing PR for this:
>>    https://github.com/apache/airflow/pull/27145
>>
>> Cheers,
>> *Abdul Hadi Shakir*
>>
>

Re: [VOTE][ISSUE-22073] Finalising approach for displaying non-ascii characters in DAG display name

Posted by Daniel Standish <da...@astronomer.io.INVALID>.
Hi,

Is it not possible to just have unicode dag_id with no distinct "name"?  If
you explored this route and encountered problems which caused you to
abandon, can you share what were the problems?

I think having just one ID for a dag is a nice thing, if we can keep it.

On Wed, Jan 11, 2023 at 11:43 PM Abdul Hadi Shakir <sh...@gmail.com>
wrote:

> Hi team,
>
> While discussing the approach for
> https://github.com/apache/airflow/issues/22073 (adding support for
> national characters in DAG display name) - two approaches came out. Need
> votes to finalise on one of the two:
>
>    1. [Vote *+1*] Using *name* as the only parameter; and then generating
>    a unique *dag_id* from it using *slugify*. This makes the interface
>    simpler; but it makes *dag_id* unknown from the users. Ongoing PR for
>    this: https://github.com/apache/airflow/pull/28183
>    2. [Vote -*1*] To use *display_name* along with *dag_id* as DAG
>    params. While this is a simpler solution on the backend - it needs lots of
>    work on the frontend for a consistent experience. Ongoing PR for this:
>    https://github.com/apache/airflow/pull/27145
>
> Cheers,
> *Abdul Hadi Shakir*
>