Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/07/06 07:42:31 UTC

[GitHub] [spark] HeartSaVioR edited a comment on pull request #28967: [SPARK-32149][SHUFFLE] Improve file path name normalisation at block resolution within the external shuffle service

HeartSaVioR edited a comment on pull request #28967:
URL: https://github.com/apache/spark/pull/28967#issuecomment-654065873


   Thanks for the links. That's all I wanted to see.
   
   > This is a redundant code of the package-private JDK counterpart. As the code not a perfect match even it could happen one method results in a bit different (but semantically equal) path.
   
   Yeah, I just wanted to see which code the JDK itself runs to normalize the path (so the comment `here the old createNormalizedInternedPathname was as good as it could imitate the java.io.FileSystem#normalize()` answers my question), and honestly I didn't know the method would simply be named "normalize". (I should have just tried to find it myself. My bad.)
   
   For sure, I prefer to follow the normalization provided by the JDK, which at least doesn't use regex, which would be slower than char manipulation. I also agree that we can be confident in excluding the test part as well, since the code is replaced with the JDK's, which we tend to trust.
   
   That said, assuming we never create weird file names containing separators, the only place where normalization actually has an effect is localDirs. We could probably pay the cost only once per entry by normalizing each entry up front, and avoid re-normalizing on all further calls. (By that I mean the path being changed during normalization. The normalization check itself can't be avoided, as the JDK will do it anyway.)
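
   To make the "normalize once per localDirs entry" idea concrete, here is a minimal sketch. It is not the code in this PR: the class `LocalDirPaths` and the method `blockPath` are hypothetical names, and it assumes the sub-directory and file names never contain separators. It relies only on the fact that `new File(String)` normalizes the pathname and `getPath()` returns the normalized form.

```java
import java.io.File;

// Hypothetical helper, for illustration only (not part of Spark).
final class LocalDirPaths {
    // Each local dir is normalized exactly once, at construction time.
    private final String[] normalizedLocalDirs;

    LocalDirPaths(String[] localDirs) {
        normalizedLocalDirs = new String[localDirs.length];
        for (int i = 0; i < localDirs.length; i++) {
            // The File constructor applies the JDK's own normalization
            // (e.g. collapsing duplicate separators, dropping a trailing one).
            normalizedLocalDirs[i] = new File(localDirs[i]).getPath();
        }
    }

    // Builds the block file path by plain concatenation. Assumption: subDir and
    // fileName contain no separators, so the result is already in normalized form
    // and a later new File(result) finds nothing left to change.
    String blockPath(int dirIndex, String subDir, String fileName) {
        String dir = normalizedLocalDirs[dirIndex];
        StringBuilder sb = new StringBuilder(dir);
        // A root directory ("/" on Unix) already ends with a separator.
        if (!dir.endsWith(File.separator)) {
            sb.append(File.separator);
        }
        return sb.append(subDir).append(File.separator).append(fileName).toString();
    }
}
```

   With this shape, the JDK still runs its normalization check when the resulting string is passed to `new File(...)`, but the path should no longer need to be rewritten on each call.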

