You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/07/18 23:07:37 UTC

[GitHub] [airflow] pingzh opened a new issue, #25144: Airflow db init fails on mysql 5.7

pingzh opened a new issue, #25144:
URL: https://github.com/apache/airflow/issues/25144

   ### Apache Airflow version
   
   2.2.4
   
   ### What happened
   
   Airflow db init fails on mysql 5.7 due to :
   
   `Specified key was too long; max key length is 767 bytes`
   
   On mysql 5.7,  the index key prefix limit is 767 bytes for tables see: https://dev.mysql.com/doc/refman/5.7/en/innodb-limits.html when `innodb_large_prefix` is `OFF` or it  use the REDUNDANT or COMPACT row format. 
   and mysql 5.7 uses utf8mb3 charset (which is utf8), thus the max length for
   index key should be 251
   
   The following key length leads to the error:
   https://github.com/apache/airflow/blob/808035e00aaf59a8012c50903a09d3f50bd92ca4/airflow/models/pool.py#L48
   
   https://github.com/apache/airflow/blob/808035e00aaf59a8012c50903a09d3f50bd92ca4/airflow/models/xcom.py#L61
   
   (we don't really need that long key ^^)
   
   https://github.com/apache/airflow/blob/808035e00aaf59a8012c50903a09d3f50bd92ca4/airflow/migrations/versions/0061_2_0_0_increase_length_of_pool_name.py#L44
   
   ### What you think should happen instead
   
   `airflow db init` should works for mysql 5.7
   
   ### How to reproduce
   
   _No response_
   
   ### Operating System
   
   Apple M1 Max, version: 12.2
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Other
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pingzh commented on issue #25144: Airflow db init fails on mysql 5.7

Posted by GitBox <gi...@apache.org>.
pingzh commented on issue #25144:
URL: https://github.com/apache/airflow/issues/25144#issuecomment-1189695673

   @potiuk agree that we don't need to decrease it now. my concern is more in the future changes. I think we should be more careful when changing the database schema, as it has implications for users when upgrading. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #25144: Airflow db init fails on mysql 5.7

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25144:
URL: https://github.com/apache/airflow/issues/25144#issuecomment-1190541574

   And I just learned by answering #24526 that they are planning to remove utf8mb3 altogether (but this in another part of documentation):
   
   https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-utf8.html
   
   >  The utf8mb3 character set is deprecated and you should expect it to be removed in a future MySQL release. Please use utf8mb4 instead. utf8 is currently an alias for utf8mb3, but it is now deprecated as such, and utf8 is expected subsequently to become a reference to utf8mb4. Beginning with MySQL 8.0.28, utf8mb3 is also displayed in place of utf8 in columns of Information Schema tables, and in the output of SQL SHOW statements.
   
   > To avoid ambiguity about the meaning of utf8, consider specifying utf8mb4 explicitly for character set references.
   This is a nightmare 
   
   
   I am at a total loss.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk closed issue #25144: Airflow db init fails on mysql 5.7

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #25144: Airflow db init fails on mysql 5.7
URL: https://github.com/apache/airflow/issues/25144


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #25144: Airflow db init fails on mysql 5.7

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25144:
URL: https://github.com/apache/airflow/issues/25144#issuecomment-1190089091

   BTW. The limited index size in MySQL is a bummer. This is an extremely bad design choice especially connected with the fact that the actual size taken by the index depends on collation.
   
   Sorry for the RANT, but I am so fed-up with the problems MySQL caused use because of they bad design choices and wrong (IMHO) decisions that it is a blueprint of "how not to treat your users". And whenever I see someone smart who understands how it all works I will use all the strenght I have to convince them to dump MySQL because their decision make their users to suffer (and in our case - our users and us).
   
   When I first learned (2 years ago when it caused us the problem) was when a user wanted to use 🦅 in the id of the DAG. This is extended UTF-8 set and required utf8mb4 and it was not possible - precisely because suddenly when you used utf8mb4 the index was to big on (back then) MySQL 5.6. And well - this is a very legitimat And then it got worse - if you look at how collaction charset approach changes in MySQL 5.7. 8 and possibly in the future - this is nothing but madness.
   
   5.6: default charset in 5.6 was .... Latin2 and collation is Swedish. This is what - unknowingly by the admins  - is by far biggest charset/collation MySQL database have. Yep. Swedish. BTW. Back then even if indexes of MySQL 5.6 were smaller, this was not a problem because the index for latin2 has huge capacity. If you wanted to change to utf8 you were immediately cutting it down - which basically means that definition of your database had to take into account which encoding/collation should be used. And what's worse - utf8mb3 or mb4 is designed in the way that it requires 1,2,3 (or 4) characters - Why on earth the limit for index for utf8mb3 is 1/3 of latin-2? and utf8mb4 ? You can fit the same number of utf8mb*  characters in the same space as latin-2 if you do not use special characters. UTF8 was DESIGNED for that. It was supposed to take exactly the same amount of space as ASCII for ASCII characters. Why on earth MySQL decided that - no matter which characters you have you can
  only fit 1/3 of them in the index? Beats me. It must have been for performance, but it is a terrible choice for unsuspected user who - if they want to go utf route has to basically redesign their database. 
   
   And (as you noticed) max index size of your InnoDB storage is different, depending on your database configuration. What you might also not be aware - the size of the index can go down depending on the page size you choose for the DB. So generally speaking if you want to have design of schema that will work for all the different charset, page sizes, COMPACT setting etc. then your index has to have ..... bear with me ..... "some small-ish size". This is not defined what exactly is "Small enough". The index size in 767 (strange round number isn't it???)  Is it 300 characters ? maybe, in some cases of latin2, but if you use utf8mb4, COMPACT and small page size it is WAY to much . worst possible case and your index size cannot be ~ 80-ish characters or so (or maybe less, don't know). And remember  - that if you have unique columns or foreign keys, where mutliple columns are are involved, the size of the index is COMBINED size of the columns. If you want to combine for columns, then - e
 ach of them has to have > 20 characters. And if you want to add more, then, well, you are out of luck. Good? Not really.
   
   But the really BAD thing is that when you create your schema - you do not KNOW what the limit is. It is arbitrary based on decisions of the deployment side.
   
   What's more - I am not sure if you know, but you can specify different encoding/collation not only for the database, but also different for a schema, table and different for the column (the last one BTW is a trick we managed to achieve Utf8mb4 databases with our index - we default ALL ID columns to be utf8mb3 if utf8mb4 is used for the database). But what's more you can also specifiy encoding and collation on the client (WHAT?) which defaults to what is your LANGUAAGE on the client (????)  and BAD things happen if those two encodings/collations do not agree with each other.
   
   5.7: This was apparently noticed they increased the size of index to 3072 (Andther round number). But all the problems remained: 
   > If you reduce the InnoDB [page size](https://dev.mysql.com/doc/refman/5.7/en/glossary.html#glos_page_size) to 8KB or 4KB by specifying the [innodb_page_size](https://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html#sysvar_innodb_page_size) option when creating the MySQL instance, the maximum length of the index key is lowered proportionally, based on the limit of 3072 bytes for a 16KB page size. That is, the maximum index key length is 1536 bytes when the page size is 8KB, and 768 bytes when the page size is 4KB.
   
   How on earth shoudl I decide on the column size when I am designing my database???????  For database - basicallly they force me to use autoincremented ids as primary keys. This was maybe good idea in 1980s - but this is an antipattern now and it does not really help when you want to have unique indexes on the actual fields..... because this apply to ALL indexes  -not only primary keys. How do I make sure my long text column is unique together with another long text column? Should I write a TRIGGER to protect against accidenal entering of non-unique value ?
   
   But the real fun thing starts with 8. Bear with me. They had a chance to fix it all and they screwed up even more. Instead of simplifying it, they complicated it even further and made even more unobvious choices and deliberately forward-incompatible ones.
   
   8: Default encoding for 8 utf8mb4 and collation utf8mb4_0900_ai_ci. But all the limits did not change. This means that when previously your DB schema installation worked on 5.7 no "stock" mysql (latin2) - they suddenly might stop working on 8 (because effective size of the index decreased). For us it means that if we did not use the (utf8mb3 id collation trick) - users who previously installed Airlfow on MySQL 5.7 with default settings (vast majo rity of people do not change the encoding/collation and use default) suddenly it would stop working. This is why actually we implemented the trick. Because suddenly people started to raise issues that Airflow cannot be installed on MySQL 8. Very nice "feature"- thank you Oracle. But this is nothing yet. They actually built-in another trap for their unsuspected users (and us). If you are a bit "smarter admin" and in the past you actually did the right thing and chose "utf8" as a character set (sounds legit like?) then ... Bear with me. In 
 MySQL 8 you end up with..... utf8mb3 ... WHAT? Default is 'utf8mb4', but 'utf8' is actually utf8mb3 .... But this is not the worst part. The worst part is deeply hidden in the documentation.
   
   https://dev.mysql.com/doc/refman/8.0/en/charset-unicode.html
   
   > utf8: An alias for utf8mb3. In MySQL 8.0, this alias is deprecated; use utf8mb4 instead. utf8 is expected in a future release to become an alias for utf8mb4.
   
   Let that sink in for a while. .......
   
   There are no plans to increase the index size in the future versions. And what this basically means that if you use 'utf8' as your character set and migrate to MySql 10 (assuming this will be 10) - then, yes, you guessed it , Airflow database will stop working because suddenly the same utf8 you had before will have smaller indexes to use.
   
   This is a trap. Again. 
   
   ----------------
   
   End of a RANT.
   
   Sorry @pingzh but - I lost  a lot of hair because of that already and whenever I can I try to convince everyone - if you only can, switch to Postgres.
   
   BTW. You know that Postgres has virtually no limits on sizes of the indexes. do you? 
   
   > There is no PostgreSQL-imposed limit on the number of indexes you can create on a table. Of course, performance may degrade if you choose to create more and more indexes on a table with more and more columns. PostgreSQL has a limit of 1GB for the size of any one field in a table.
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #25144: Airflow db init fails on mysql 5.7

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25144:
URL: https://github.com/apache/airflow/issues/25144#issuecomment-1190091267

   I hope the above will explain that I was not mean @pingzh  - it was not my intention for sure. But when someone says that "We should be more careful with our design choices" when it comes to MySQL, there are NO GOOD DESIGN CHOICES. Any design choice you might make might be broken by very bad decisions of their.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #25144: Airflow db init fails on mysql 5.7

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25144:
URL: https://github.com/apache/airflow/issues/25144#issuecomment-1188685159

   (And you probably has wrong collation set - you likely have utf8_mb4 not utf8_mb3) - all our tests are run with MySQL 5.7.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pingzh commented on issue #25144: Airflow db init fails on mysql 5.7

Posted by GitBox <gi...@apache.org>.
pingzh commented on issue #25144:
URL: https://github.com/apache/airflow/issues/25144#issuecomment-1189361302

   > This is the problem of your definition See the notes https://airflow.apache.org/docs/apache-airflow/stable/howto/set-up-database.html#setting-up-a-mysql-database about collation used (you need to use utf8_mb3 collation).
   
   hi @potiuk thanks for the comments. In fact for mysql 5.7, the default charset is `utf8` (with collation is `utf8_general_ci` ,which is `utf8mb3` in mysql 5.7 , so using that wont solve the issue.
   
   
   my question is when setting the db columns an indexes, we might be extra careful about the length. for example, for the key of xcom, do we really need `512` that long?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #25144: Airflow db init fails on mysql 5.7

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25144:
URL: https://github.com/apache/airflow/issues/25144#issuecomment-1189392728

   The really, really big problem is that we cannot make a migration. To decrease the size. This ship has sailed long ago in 1.10. There is no sane way to "cut" ids any more


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #25144: Airflow db init fails on mysql 5.7

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25144:
URL: https://github.com/apache/airflow/issues/25144#issuecomment-1190520351

   So just to add one more comment: THIS IS EMBODIMENT what MySQL bad decisions lead to:
   
   https://github.com/apache/airflow/issues/24526#issuecomment-1190450784
   
   Precisely the resuilt of all the mess above.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #25144: Airflow db init fails on mysql 5.7

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25144:
URL: https://github.com/apache/airflow/issues/25144#issuecomment-1189391132

   > (actually your problem is that innodb_large_prefix is off) - you should enable it. We are not going to change that requirement.
   
   As explained: actually your problem is that innodb_large_prefix is off) - you should enable it. We are not going to change that requirement.
   
   > my question is when setting the db columns an indexes, we might be extra careful about the length. for example, for the key of xcom, do we really need 512 that long?
   
   Yes. Unless you want to break compatibliity of about 10000 installations out there.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #25144: Airflow db init fails on mysql 5.7

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25144:
URL: https://github.com/apache/airflow/issues/25144#issuecomment-1188692464

   (actually your problem is that innodb_large_prefix is off) - you should enable it. We are not going to change that requirement.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #25144: Airflow db init fails on mysql 5.7

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25144:
URL: https://github.com/apache/airflow/issues/25144#issuecomment-1189886508

   > @potiuk agree that we don't need to decrease it now. my concern is more in the future changes. I think we should be more careful when changing the database schema, as it has implications for users when upgrading.
   
   Agreee. I think we;ve learned something. Back then when it was changed (before me as committer) we become super careful - we have now not only much better migrations (including being able to go back and forth), automated testing of upgrade and downgrade on CI and very careful reviews of what has changeed precisely beceuse we learned from that experience. Collation and index length for MySQL was something that gave us a lot fo trouble. 
   
   You are complaining about a change that was implemented ~3 years ago (you simply were a bit disconnected since).
   
   Check out those:
   
   * https://github.com/apache/airflow/pull/12591
   * https://github.com/apache/airflow/pull/14535
   * https://github.com/apache/airflow/pull/17603
   * https://github.com/apache/airflow/pull/17603
   * https://github.com/apache/airflow/pull/17729
   * https://github.com/apache/airflow/pull/18072
   * https://github.com/apache/airflow/pull/19268
   
   BTW. Speaking of migratioins. If you want to avoid troubles I heartily recomend to move to Postgres. You will not regret it. MySQL is a really bad choice you shoulld reconsider at AirBnB. 
    
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #25144: Airflow db init fails on mysql 5.7

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25144:
URL: https://github.com/apache/airflow/issues/25144#issuecomment-1189393785

   Personally - I would also vote for decreasing the lenght. But there is simply no sane way to release airlfow 2.* with decreased length of id.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #25144: Airflow db init fails on mysql 5.7

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25144:
URL: https://github.com/apache/airflow/issues/25144#issuecomment-1188680373

   This is the problem of your definition See the notes https://airflow.apache.org/docs/apache-airflow/stable/howto/set-up-database.html#setting-up-a-mysql-database about collation used (you need to use utf8_mb3 collation).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #25144: Airflow db init fails on mysql 5.7

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25144:
URL: https://github.com/apache/airflow/issues/25144#issuecomment-1188683149

   BTW. We NEED that long of IDs. This is part of the product.
   
   BTW. MySQL has a number of problems like that that had to be workaround  - we recommend to switch to Postgres - which is rock-solid.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org