Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/09/05 23:52:55 UTC

[GitHub] [incubator-druid] clintropolis commented on a change in pull request #8465: disallow whitespace characters except space in data source names

URL: https://github.com/apache/incubator-druid/pull/8465#discussion_r321530238
 
 

 ##########
 File path: server/src/main/java/org/apache/druid/segment/indexing/DataSchema.java
 ##########
 @@ -41,13 +41,17 @@
 import java.util.HashSet;
 import java.util.Map;
 import java.util.Set;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
 
 /**
+ *
  */
 public class DataSchema
 {
   private static final Logger log = new Logger(DataSchema.class);
-
+  private static final Pattern INVALIDCHARS = Pattern.compile("(?s).*[^\\S ].*");
 
 Review comment:
   Should we be checking for more invalid characters? Since the underlying problem was an unfriendly character causing issues with ZooKeeper paths, maybe we should forbid everything [that is not allowed in ZooKeeper paths](http://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html#ch_zkDataModel). I think these are the relevant ones:
   
   >ZooKeeper has a hierarchal name space, much like a distributed file system. The only difference is that each node in the namespace can have data associated with it as well as children. It is like having a file system that allows a file to also be a directory. Paths to nodes are always expressed as canonical, absolute, slash-separated paths; there are no relative reference. Any unicode character can be used in a path subject to the following constraints:
   
   > * The null character (\u0000) cannot be part of a path name. (This causes problems with the C binding.)
   > * The following characters can't be used because they don't display well, or render in confusing ways: \u0001 - \u0019 and \u007F - \u009F.
   > * The following characters are not allowed: \ud800 -uF8FFF, \uFFF0-uFFFF, \uXFFFE - \uXFFFF (where X is a digit 1 - E), \uF0000 - \uFFFFF.
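
A rough sketch of what checking against the quoted constraints could look like, in case it helps the discussion. The class and method names here are hypothetical and not part of `DataSchema` or this PR. It also assumes the garbled `\ud800 -uF8FFF` in the quoted docs means `\uD800` through `\uF8FF`, which should be verified against ZooKeeper itself. With Java's default `\s`, the tab, newline, vertical tab, form feed, and carriage return that the PR's pattern rejects all fall inside `\u0001 - \u0019`, so this check would also cover them while still allowing a plain space.

```java
import java.util.OptionalInt;

// Hypothetical helper for illustration only; not part of DataSchema or this PR.
final class ZkPathNameCheck
{
  private ZkPathNameCheck() {}

  // Returns true if the code point falls in one of the ranges listed in the
  // quoted ZooKeeper documentation.
  static boolean isForbidden(int cp)
  {
    int plane = cp >>> 16;   // 0 for the BMP, 1..16 for supplementary planes
    int low = cp & 0xFFFF;   // position within the plane
    return cp == 0x0000                                   // null character
           || (cp >= 0x0001 && cp <= 0x0019)              // control chars, incl. tab, LF, CR
           || (cp >= 0x007F && cp <= 0x009F)              // DEL and C1 controls
           || (cp >= 0xD800 && cp <= 0xF8FF)              // ASSUMPTION: surrogates and private use
           || (cp >= 0xFFF0 && cp <= 0xFFFF)              // specials block
           || (plane >= 0x1 && plane <= 0xE && low >= 0xFFFE) // last two code points of planes 1..E
           || (cp >= 0xF0000 && cp <= 0xFFFFF);           // plane 15
  }

  // Returns the char index of the first forbidden code point, or empty if the
  // name is clean. Iterates by code point so surrogate pairs are read whole.
  static OptionalInt firstForbiddenIndex(String name)
  {
    for (int i = 0; i < name.length(); ) {
      int cp = name.codePointAt(i);
      if (isForbidden(cp)) {
        return OptionalInt.of(i);
      }
      i += Character.charCount(cp);
    }
    return OptionalInt.empty();
  }

  public static void main(String[] args)
  {
    System.out.println(firstForbiddenIndex("my data source"));
    System.out.println(firstForbiddenIndex("bad\tname"));
  }
}
```

Returning the index rather than a boolean would let the error message point at the offending character, which is hard to see otherwise when it is a control character.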
   
   This might 
