Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2022/11/18 16:43:50 UTC

[GitHub] [superset] artemonsh commented on pull request #22118: fix: slug is empty if filename is non-ASCII

artemonsh commented on PR #22118:
URL: https://github.com/apache/superset/pull/22118#issuecomment-1320268015

   @EugeneTorap @villebro 
   Could you please take a look at this issue: https://github.com/apache/superset/issues/21657?
   
   There is a quite annoying problem with Cyrillic-script languages like Russian. In short, dashboards, charts and databases whose names contain only Cyrillic letters **cannot be imported into Superset**! The function Eugene provided, for instance, converts the chart title "Мой график" ("My chart") into an empty string in Python, leading to a filename like `_123.yaml`. Because the first character of the filename is an underscore, the `is_valid_config` function returns `False` for it, which prevents the file from being imported, despite the notification in the bottom right corner saying that everything was imported successfully :/
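
The empty slug can be reproduced with a minimal sketch of the ASCII-only path. The helper name `ascii_only_slug` and the `_123.yaml` suffix are illustrative assumptions, not Superset's actual code; the normalize/encode/strip steps mirror the ones in the function proposed in this comment.

```python
import re
import unicodedata

# Same ASCII-only character class that the proposed function falls back to.
_ASCII_STRIP_RE = re.compile(r"[^A-Za-z0-9_.-]")


def ascii_only_slug(title: str) -> str:
    """Hypothetical helper: slugify a title while keeping only ASCII characters."""
    normalized = unicodedata.normalize("NFKD", title)
    # Every non-ASCII character (all Cyrillic letters included) is dropped here.
    ascii_text = normalized.encode("ascii", "ignore").decode("ascii")
    return _ASCII_STRIP_RE.sub("", "_".join(ascii_text.split())).strip("._")


slug = ascii_only_slug("Мой график")
file_name = f"{slug}_123.yaml"  # assumed naming scheme, taken from the example above

print(repr(slug))   # ''
print(file_name)    # _123.yaml
```

With nothing left of the title, the filename starts with the underscore that originally separated the slug from the numeric suffix.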
   
   Therefore, charts/dashboards/databases with these kinds of titles **cannot be imported** into Superset. My suggestion is to extend werkzeug's `secure_filename` [function](https://tedboy.github.io/flask/_modules/werkzeug/utils.html#secure_filename) as follows:
   
   ```python
   import os
   import re
   import unicodedata
   
   
   def secure_filename(filename: str) -> str:
       r"""Pass it a filename and it will return a secure version of it.  This
       filename can then safely be stored on a regular file system and passed
       to :func:`os.path.join`.
   
       On windows systems the function also makes sure that the file is not
       named after one of the special device files.
   
       The function also accepts filenames containing Cyrillic letters.
   
       >>> secure_filename("My cool movie.mov")
       'My_cool_movie.mov'
       >>> secure_filename("../../../etc/passwd")
       'etc_passwd'
       >>> secure_filename('i contain cool \xfcml\xe4uts.txt')
       'i_contain_cool_umlauts.txt'
       >>> secure_filename('Мой красивый график.yaml')
       'Мои_красивыи_график.yaml'
   
       The function might return an empty filename.  It's your responsibility
       to ensure that the filename is unique and that you abort or
       generate a random filename if the function returned an empty one.
   
       .. versionadded:: 0.5
   
       :param filename: the filename to secure
       """
       # If the text contains Cyrillic letters, skip the ASCII round-trip
       # below: ASCII cannot represent them, so they would all be dropped
       contains_cyrillic_letters = bool(re.search("[\u0400-\u04FF]", filename))
   
       _windows_device_files = (
           "CON",
           "AUX",
           "COM1",
           "COM2",
           "COM3",
           "COM4",
           "LPT1",
           "LPT2",
           "LPT3",
           "PRN",
           "NUL",
       )
   
       _filename_ascii_strip_re = re.compile(r"[^A-Za-z0-9_.-]")
       _filename_strip_re = (
           re.compile(r"[^A-Za-zа-яА-ЯёЁ0-9_.-]")
           if contains_cyrillic_letters
           else _filename_ascii_strip_re
       )
   
       filename = unicodedata.normalize("NFKD", filename)
       if not contains_cyrillic_letters:
           filename = filename.encode("ascii", "ignore").decode("ascii")
   
       for sep in os.path.sep, os.path.altsep:
           if sep:
               filename = filename.replace(sep, " ")
       filename = str(_filename_strip_re.sub("", "_".join(filename.split()))).strip("._")
   
       # on nt a couple of special files are present in each folder.  We
       # have to ensure that the target file is not such a filename.  In
       # this case we prepend an underline
       if (
           os.name == "nt"
           and filename
           and filename.split(".")[0].upper() in _windows_device_files
       ):
           filename = f"_{filename}"
   
       return filename
   ```
   
   Before:
   ```python
   >>> print(secure_filename("Мой красивый график"))
   
   >>> print(secure_filename("My beautiful график"))
   My_beautiful
   >>> print(secure_filename("My beautiful chart"))
   My_beautiful_chart
   ```
   After:
   ```python
   >>> print(secure_filename("Мой красивый график"))
   Мои_красивыи_график
   >>> print(secure_filename("My beautiful график"))
   My_beautiful_график
   >>> print(secure_filename("My beautiful chart"))
   My_beautiful_chart
   ```
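
In the "After" output, "й" comes out as "и". A short sketch of why, using only the standard library: NFKD normalization splits "й" into "и" plus a combining breve, and the Cyrillic character class in the function above then strips the combining mark. The variable names are illustrative.

```python
import re
import unicodedata

# NFKD splits "й" (U+0439) into "и" (U+0438) followed by a combining breve (U+0306).
decomposed = unicodedata.normalize("NFKD", "й")
code_points = [f"U+{ord(ch):04X}" for ch in decomposed]
print(code_points)  # ['U+0438', 'U+0306']

# The Cyrillic character class from the proposed function does not include
# combining marks, so the breve is removed and only "и" remains.
cyrillic_strip_re = re.compile(r"[^A-Za-zа-яА-ЯёЁ0-9_.-]")
stripped = cyrillic_strip_re.sub("", decomposed)
print(stripped)  # и
```

If keeping "й" and "ё" intact matters, one option (an assumption, not something tested in this PR) would be to recompose the string with `unicodedata.normalize("NFC", ...)` before applying the strip pattern.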


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org