You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@airflow.apache.org by Ping Zhang <pi...@umich.edu> on 2023/02/15 01:20:49 UTC

Re: [VOTE][ISSUE-22073] Finalising approach for displaying non-ascii characters in DAG display name

Hi Abdul,

Friendly bump this thread. Do we have an agreement on which route we
are going to take?

Thanks,

Ping


On Fri, Jan 20, 2023 at 3:50 PM Ping Zhang <pi...@umich.edu> wrote:

> Hi,
>
> I would vote for: `[Vote -1] To use display_name along with dag_id as DAG
> params`.
>
> `dag_id` is a fundamental core concept in the system, changing that might
> have some repercussions. There is also another advantage of having the
> display_name, which allows the UI to display a more meaningful and
> human-readable format of dags.
>
> Thanks,
>
> Ping
>
>
> On Thu, Jan 19, 2023 at 3:47 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> One more thing here. I think historically we also have other places we
>> don't fully realise where we rely on dag_id being ASCII.
>>
>> Example here https://github.com/apache/airflow/issues/18010 where
>> non-ASCII dag_id causes airflow scheduler to crash (and in
>> irrecoverable way at that) when statsd is enabled. Another example are the
>> log file names (there are still filesystems that do not support unicode or
>> they can have different encoding. But also there are a number of edge cases
>> (can we have ":" in the DAG_ID, or "';" ? or even space? Those are not
>> insurmountable of course - for example we could also slugify the statsd
>> label. But then those are opening up pandora's box of problems that we
>> won't be able to fully unit test and prevent - because we can't reasonably
>> test all kinds of combinations of filesystems, encoding, telemetry systems
>> etc. etc. And we will have plenty of errors reported after we release it.
>>
>> I personally think that if we continue insisting on "let's make the
>> dag_id unicode" we might keep on bumping on those issues we do not
>> understand or realise and that might mean - more issues, more frustrated
>> users, more support and diagnostics for the community. While I see how
>> "idealistic" approach on getting the id being fully "internationalizable"
>> is (hey we are in 2023, we should be able to handle non-ASCII as id),
>> pragmatically speaking, getting extra "display" while keeping the original
>> dag_id  as ASCII unique identifier might be much pragmatic and attainable.
>>
>> I remember similar discussions happening for > 1 year now, any attempt to
>> add an "extra" display was effectively blocked by a strong opposition "hey
>> we should make the dag_id unicode" ... and nothing happened. Thus
>> effectively our users cannot use non-ASCII characters in dag_id really.
>>
>> We might continue having the idea that the dag_id "can be non-ASCII", but
>> I think it is quite difficult to pull off and no-one has the courage to try
>> it and take responsibility to tackle all the issues (some of which we do
>> not even know).
>>
>> I would rather attempt it in a pragmatic way - even if it means we need
>> to add an extra "display" entity. Seems like more predictable in terms of
>> problems our users will experience.
>>
>> J,
>>
>>
>> On Mon, Jan 16, 2023 at 9:19 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>>> > Possibly contentious idea: We allow unicode dag_ids for Postgres,
>>> MSQQL (and sqllite) but for Mysql we enforce it as ASCII only.
>>>
>>> If we can make it easy and make sure it is handled well and we do not
>>> have extra maintenance - yes, I think it is viable solution. One more
>>> reason for those who use MySQL to migrate :D
>>>
>>>
>>>
>>> On Thu, Jan 12, 2023 at 10:57 PM Ash Berlin-Taylor <as...@apache.org>
>>> wrote:
>>>
>>>> The description of option 1 is wrong/misleading. It sounds like you are
>>>> proposing DAG(name="tést") but what actually happens its that gets
>>>> magically changed to `test` behind the users back. -1 veto to that.
>>>>
>>>> So it's Option2 or some combo of the "break MySQL in some way" options.
>>>>
>>>> -ash
>>>>
>>>> On Jan 12 2023, at 9:50 pm, Ash Berlin-Taylor <as...@apache.org> wrote:
>>>>
>>>>
>>>> Possibly contentious idea: We allow unicode dag_ids for Postgres, MSQQL
>>>> (and sqllite) but for Mysql we enforce it as ASCII only.
>>>>
>>>> On Jan 12 2023, at 6:15 pm, Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>
>>>> As I mentioned multiple times in similar discussions We have a huge
>>>> problem with unicode in dag_id. Namely MySQL limit on indexes. We would
>>>> have to shorten the Id significantly in the database to workaround MySQL
>>>> limits for index size.
>>>>
>>>> We can have a wishful thinking that we can change dag_id to unicode but
>>>> until someone solves the problem - this is just this - wishful thinking.
>>>>
>>>> If someone has a proposal how to do it without breaking compatibility
>>>> or enormously complicating mysql case (or if we drop mysql proposal) - I
>>>> would also be for what Daniel said. But so far I have not seen any.
>>>>
>>>> So in the absence of a viable way to add unicode to dag_id (which
>>>> currently IMHO is not an option) my vote goes to 2.
>>>>
>>>> We can also drop MySQL support :D
>>>>
>>>> J.
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Jan 12, 2023 at 9:56 AM Ash Berlin-Taylor <as...@apache.org>
>>>> wrote:
>>>>
>>>> +1 to what Daniel said
>>>>
>>>> On 12 January 2023 08:32:29 GMT, Daniel Standish
>>>> <da...@astronomer.io.INVALID> wrote:
>>>>
>>>> 1 appears to have potential fix:
>>>> https://github.com/apache/airflow/issues/21127#issuecomment-1030673862
>>>> 2. seems to fail due to our own ascii enforcement... what if we remove
>>>> that?
>>>> 3. does not appear to be unicode-related or dag_id-related but a
>>>> feature request for user-friendly mapped task aliases...
>>>>
>>>> not saying we should not add a "name" of some kind... but ... does not
>>>> yet seem clear we can't just enable unicode...  i know others have given
>>>> this much more thought than I and maybe they can chime in with other
>>>> concerns we may have encountered as this idea has bounced around
>>>>
>>>> On Thu, Jan 12, 2023 at 12:01 AM Abdul Hadi Shakir <
>>>> shakir.iitd@gmail.com> wrote:
>>>>
>>>> Directly using non-ascii characters (unicodes included) in *dag_id* breaks
>>>> couples of functionalities. See issues:
>>>>
>>>>    - Fail to download task log if there are Chinese characters in
>>>>    dag_id #21127 <https://github.com/apache/airflow/issues/21127>
>>>>    - Airflow scheduler with statsd enabled crashes when dag_id
>>>>    contains unexpected characters #18010
>>>>    <https://github.com/apache/airflow/issues/18010>
>>>>    - Names for expanded tasks #23020
>>>>    <https://github.com/apache/airflow/issues/23020>
>>>>
>>>> *Abdul Hadi Shakir*
>>>>
>>>>
>>>> On Thu, Jan 12, 2023 at 1:19 PM Daniel Standish
>>>> <da...@astronomer.io.invalid> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Is it not possible to just have unicode dag_id with no distinct
>>>> "name"?  If you explored this route and encountered problems which caused
>>>> you to abandon, can you share what were the problems?
>>>>
>>>> I think having just one ID for a dag is a nice thing, if we can keep it.
>>>>
>>>> On Wed, Jan 11, 2023 at 11:43 PM Abdul Hadi Shakir <
>>>> shakir.iitd@gmail.com> wrote:
>>>>
>>>> Hi team,
>>>>
>>>> While discussing the approach for
>>>> https://github.com/apache/airflow/issues/22073 (adding support for
>>>> national characters in DAG display name) - two approaches came out. Need
>>>> votes to finalise on one of the two:
>>>>
>>>>    1. [Vote *+1*] Using *name* as the only parameter; and then
>>>>    generating a unique *dag_id* from it using *slugify*. This makes
>>>>    the interface simpler; but it makes *dag_id* unknown from the
>>>>    users. Ongoing PR for this:
>>>>    https://github.com/apache/airflow/pull/28183
>>>>    2. [Vote -*1*] To use *display_name* along with *dag_id* as DAG
>>>>    params. While this is a simpler solution on the backend - it needs lots of
>>>>    work on the frontend for a consistent experience. Ongoing PR for this:
>>>>    https://github.com/apache/airflow/pull/27145
>>>>    3.
>>>>
>>>> Cheers,
>>>> *Abdul Hadi Shakir*
>>>>
>>>>