Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/05/19 10:35:52 UTC

[GitHub] [arrow] pitrou opened a new pull request #7223: ARROW-8711: [Python] Expose timestamp_parsers in csv.ConvertOptions

pitrou opened a new pull request #7223:
URL: https://github.com/apache/arrow/pull/7223


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] fsaintjacques closed pull request #7223: ARROW-8711: [Python] Expose timestamp_parsers in csv.ConvertOptions

Posted by GitBox <gi...@apache.org>.
fsaintjacques closed pull request #7223:
URL: https://github.com/apache/arrow/pull/7223


   





[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7223: ARROW-8711: [Python] Expose timestamp_parsers in csv.ConvertOptions

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #7223:
URL: https://github.com/apache/arrow/pull/7223#discussion_r428043526



##########
File path: python/pyarrow/_csv.pyx
##########
@@ -296,10 +297,31 @@ cdef class ParseOptions:
         out.options = options
         return out
 
-    def __reduce__(self):
-        return ParseOptions, (self.delimiter, self.quote_char,
-                              self.double_quote, self.escape_char,
-                              self.newlines_in_values, self.ignore_empty_lines)
+    def __getstate__(self):
+        return (self.delimiter, self.quote_char, self.double_quote,
+                self.escape_char, self.newlines_in_values,
+                self.ignore_empty_lines)
+
+    def __setstate__(self, state):
+        (self.delimiter, self.quote_char, self.double_quote,
+         self.escape_char, self.newlines_in_values,
+         self.ignore_empty_lines) = state
+
+
+cdef class _ISO8601:
+    """
+    A special object indicating ISO-8601 parsing.

Review comment:
       An alternative could be to use a string for this (so you could write `["ISO-8601", "%Y/%m/%d"]`).
   I understand that it is cleaner to use a separate object, since "ISO-8601" is strictly speaking not a format string, but it would mean that users need to use yet another object (no strong objection to the current approach, though, to be clear).







[GitHub] [arrow] fsaintjacques commented on a change in pull request #7223: ARROW-8711: [Python] Expose timestamp_parsers in csv.ConvertOptions

Posted by GitBox <gi...@apache.org>.
fsaintjacques commented on a change in pull request #7223:
URL: https://github.com/apache/arrow/pull/7223#discussion_r434275294



##########
File path: python/pyarrow/_csv.pyx
##########
@@ -296,10 +297,31 @@ cdef class ParseOptions:
         out.options = options
         return out
 
-    def __reduce__(self):
-        return ParseOptions, (self.delimiter, self.quote_char,
-                              self.double_quote, self.escape_char,
-                              self.newlines_in_values, self.ignore_empty_lines)
+    def __getstate__(self):
+        return (self.delimiter, self.quote_char, self.double_quote,
+                self.escape_char, self.newlines_in_values,
+                self.ignore_empty_lines)
+
+    def __setstate__(self, state):
+        (self.delimiter, self.quote_char, self.double_quote,
+         self.escape_char, self.newlines_in_values,
+         self.ignore_empty_lines) = state
+
+
+cdef class _ISO8601:

Review comment:
       yes







[GitHub] [arrow] github-actions[bot] commented on pull request #7223: ARROW-8711: [Python] Expose timestamp_parsers in csv.ConvertOptions

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7223:
URL: https://github.com/apache/arrow/pull/7223#issuecomment-630740695


   https://issues.apache.org/jira/browse/ARROW-8711





[GitHub] [arrow] fsaintjacques commented on a change in pull request #7223: ARROW-8711: [Python] Expose timestamp_parsers in csv.ConvertOptions

Posted by GitBox <gi...@apache.org>.
fsaintjacques commented on a change in pull request #7223:
URL: https://github.com/apache/arrow/pull/7223#discussion_r434275238



##########
File path: python/pyarrow/_csv.pyx
##########
@@ -296,10 +297,31 @@ cdef class ParseOptions:
         out.options = options
         return out
 
-    def __reduce__(self):
-        return ParseOptions, (self.delimiter, self.quote_char,
-                              self.double_quote, self.escape_char,
-                              self.newlines_in_values, self.ignore_empty_lines)
+    def __getstate__(self):
+        return (self.delimiter, self.quote_char, self.double_quote,
+                self.escape_char, self.newlines_in_values,
+                self.ignore_empty_lines)
+
+    def __setstate__(self, state):
+        (self.delimiter, self.quote_char, self.double_quote,
+         self.escape_char, self.newlines_in_values,
+         self.ignore_empty_lines) = state
+
+
+cdef class _ISO8601:
+    """
+    A special object indicating ISO-8601 parsing.

Review comment:
       Good UX feature, I'll take note.







[GitHub] [arrow] pitrou commented on a change in pull request #7223: ARROW-8711: [Python] Expose timestamp_parsers in csv.ConvertOptions

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7223:
URL: https://github.com/apache/arrow/pull/7223#discussion_r434010492



##########
File path: cpp/src/arrow/util/value_parsing.h
##########
@@ -45,6 +45,10 @@ class ARROW_EXPORT TimestampParser {
   virtual bool operator()(const char* s, size_t length, TimeUnit::type out_unit,
                           int64_t* out) const = 0;
 
+  virtual const char* kind() const = 0;
+
+  virtual const char* detail() const;

Review comment:
       How would you do it otherwise?







[GitHub] [arrow] fsaintjacques commented on a change in pull request #7223: ARROW-8711: [Python] Expose timestamp_parsers in csv.ConvertOptions

Posted by GitBox <gi...@apache.org>.
fsaintjacques commented on a change in pull request #7223:
URL: https://github.com/apache/arrow/pull/7223#discussion_r433909097



##########
File path: cpp/src/arrow/util/value_parsing.h
##########
@@ -45,6 +45,10 @@ class ARROW_EXPORT TimestampParser {
   virtual bool operator()(const char* s, size_t length, TimeUnit::type out_unit,
                           int64_t* out) const = 0;
 
+  virtual const char* kind() const = 0;
+
+  virtual const char* detail() const;

Review comment:
       Do you need this to be virtual?

##########
File path: python/pyarrow/_csv.pyx
##########
@@ -296,10 +297,31 @@ cdef class ParseOptions:
         out.options = options
         return out
 
-    def __reduce__(self):
-        return ParseOptions, (self.delimiter, self.quote_char,
-                              self.double_quote, self.escape_char,
-                              self.newlines_in_values, self.ignore_empty_lines)
+    def __getstate__(self):
+        return (self.delimiter, self.quote_char, self.double_quote,
+                self.escape_char, self.newlines_in_values,
+                self.ignore_empty_lines)
+
+    def __setstate__(self, state):
+        (self.delimiter, self.quote_char, self.double_quote,
+         self.escape_char, self.newlines_in_values,
+         self.ignore_empty_lines) = state
+
+
+cdef class _ISO8601:

Review comment:
       Why not make an equivalent class for `strptime`? That would make the `timestamp_parsers` setter more robust to refactoring, e.g. when a new field is added to the wrapped C++ class.







[GitHub] [arrow] pitrou commented on a change in pull request #7223: ARROW-8711: [Python] Expose timestamp_parsers in csv.ConvertOptions

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7223:
URL: https://github.com/apache/arrow/pull/7223#discussion_r434419478



##########
File path: cpp/src/arrow/util/value_parsing.h
##########
@@ -45,6 +45,10 @@ class ARROW_EXPORT TimestampParser {
   virtual bool operator()(const char* s, size_t length, TimeUnit::type out_unit,
                           int64_t* out) const = 0;
 
+  virtual const char* kind() const = 0;
+
+  virtual const char* detail() const;

Review comment:
       Then we need to expose StrptimeTimestampParser publicly... let's see...







[GitHub] [arrow] pitrou commented on a change in pull request #7223: ARROW-8711: [Python] Expose timestamp_parsers in csv.ConvertOptions

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7223:
URL: https://github.com/apache/arrow/pull/7223#discussion_r434421634



##########
File path: python/pyarrow/_csv.pyx
##########
@@ -296,10 +297,31 @@ cdef class ParseOptions:
         out.options = options
         return out
 
-    def __reduce__(self):
-        return ParseOptions, (self.delimiter, self.quote_char,
-                              self.double_quote, self.escape_char,
-                              self.newlines_in_values, self.ignore_empty_lines)
+    def __getstate__(self):
+        return (self.delimiter, self.quote_char, self.double_quote,
+                self.escape_char, self.newlines_in_values,
+                self.ignore_empty_lines)
+
+    def __setstate__(self, state):
+        (self.delimiter, self.quote_char, self.double_quote,
+         self.escape_char, self.newlines_in_values,
+         self.ignore_empty_lines) = state
+
+
+cdef class _ISO8601:

Review comment:
       Then it sounds like a bad idea to make the API more difficult to use.







[GitHub] [arrow] pitrou commented on a change in pull request #7223: ARROW-8711: [Python] Expose timestamp_parsers in csv.ConvertOptions

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7223:
URL: https://github.com/apache/arrow/pull/7223#discussion_r434011213



##########
File path: python/pyarrow/_csv.pyx
##########
@@ -296,10 +297,31 @@ cdef class ParseOptions:
         out.options = options
         return out
 
-    def __reduce__(self):
-        return ParseOptions, (self.delimiter, self.quote_char,
-                              self.double_quote, self.escape_char,
-                              self.newlines_in_values, self.ignore_empty_lines)
+    def __getstate__(self):
+        return (self.delimiter, self.quote_char, self.double_quote,
+                self.escape_char, self.newlines_in_values,
+                self.ignore_empty_lines)
+
+    def __setstate__(self, state):
+        (self.delimiter, self.quote_char, self.double_quote,
+         self.escape_char, self.newlines_in_values,
+         self.ignore_empty_lines) = state
+
+
+cdef class _ISO8601:

Review comment:
       Do you mean people would have to pass e.g. `[Strptime('%Y')]` instead of `['%Y']`?
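
To make the two call styles in this question concrete, here is a minimal plain-Python sketch. The `Strptime` class below is hypothetical; it is not taken from the pull request or from pyarrow, and only shows what an explicit wrapper around a format string might look like next to plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strptime:
    """Hypothetical wrapper around a strptime-style format string."""

    format: str


# Style asked about in the review: every format is wrapped explicitly.
wrapped = [Strptime("%Y"), Strptime("%Y/%m/%d")]

# Style with plain format strings, as in `['%Y']`.
plain = ["%Y", "%Y/%m/%d"]

# Both lists carry the same information; the wrapper only adds a type
# that a setter could dispatch on.
unwrapped = [p.format for p in wrapped]
print(unwrapped == plain)
```

The wrapper gives the setter a dedicated type to check, at the cost of one more name that users have to import and spell out.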







[GitHub] [arrow] pitrou commented on a change in pull request #7223: ARROW-8711: [Python] Expose timestamp_parsers in csv.ConvertOptions

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7223:
URL: https://github.com/apache/arrow/pull/7223#discussion_r428071211



##########
File path: python/pyarrow/_csv.pyx
##########
@@ -296,10 +297,31 @@ cdef class ParseOptions:
         out.options = options
         return out
 
-    def __reduce__(self):
-        return ParseOptions, (self.delimiter, self.quote_char,
-                              self.double_quote, self.escape_char,
-                              self.newlines_in_values, self.ignore_empty_lines)
+    def __getstate__(self):
+        return (self.delimiter, self.quote_char, self.double_quote,
+                self.escape_char, self.newlines_in_values,
+                self.ignore_empty_lines)
+
+    def __setstate__(self, state):
+        (self.delimiter, self.quote_char, self.double_quote,
+         self.escape_char, self.newlines_in_values,
+         self.ignore_empty_lines) = state
+
+
+cdef class _ISO8601:
+    """
+    A special object indicating ISO-8601 parsing.

Review comment:
       It's easy to misspell "ISO-8601" (for example forget the hyphen, or use lowercase) and then get an entirely different behaviour :-) That's why I went for the dedicated object (if you misspell it, you get an exception).
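
A minimal sketch of this trade-off, in plain Python and independent of pyarrow. The names `ISO8601`, `classify_with_string` and `classify_with_sentinel` are hypothetical illustrations, not the code in the pull request; they only show why a misspelled magic string silently changes behaviour while a dedicated object does not.

```python
class _ISO8601:
    """Hypothetical sentinel type; one shared instance marks ISO-8601 parsing."""

    def __repr__(self):
        return "ISO8601"


ISO8601 = _ISO8601()


def classify_with_string(parsers):
    """String-based design: only the exact string "ISO-8601" selects ISO parsing."""
    return ["iso8601" if p == "ISO-8601" else "strptime" for p in parsers]


def classify_with_sentinel(parsers):
    """Sentinel-based design: only the ISO8601 object selects ISO parsing."""
    kinds = []
    for p in parsers:
        if p is ISO8601:
            kinds.append("iso8601")
        elif isinstance(p, str):
            kinds.append("strptime")
        else:
            raise TypeError(f"expected ISO8601 or a format string, got {p!r}")
    return kinds


# A typo in the magic string ("iso8601") is accepted without complaint and
# ends up being treated as a strptime format.
print(classify_with_string(["ISO-8601", "iso8601"]))

# With the sentinel, a typo in the object name (say, ISO_8601) would raise
# NameError before the list is even built.
print(classify_with_sentinel([ISO8601, "%Y/%m/%d"]))
```

In the string-based variant the misspelling only shows up later, as timestamps that fail to parse; in the sentinel variant it fails immediately at the call site.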







[GitHub] [arrow] fsaintjacques commented on a change in pull request #7223: ARROW-8711: [Python] Expose timestamp_parsers in csv.ConvertOptions

Posted by GitBox <gi...@apache.org>.
fsaintjacques commented on a change in pull request #7223:
URL: https://github.com/apache/arrow/pull/7223#discussion_r434275053



##########
File path: cpp/src/arrow/util/value_parsing.h
##########
@@ -45,6 +45,10 @@ class ARROW_EXPORT TimestampParser {
   virtual bool operator()(const char* s, size_t length, TimeUnit::type out_unit,
                           int64_t* out) const = 0;
 
+  virtual const char* kind() const = 0;
+
+  virtual const char* detail() const;

Review comment:
       Don't make this method part of the interface; make it part of StrptimeTimestampParser only.
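
A rough Python analogue of this suggestion, assuming nothing about the actual C++ code beyond the method names quoted in the diff. All class and function names below are hypothetical; the sketch only shows the shape of an interface that requires `kind()` while the format detail lives on the strptime-based subclass alone.

```python
from abc import ABC, abstractmethod


class TimestampParser(ABC):
    """Hypothetical abstract interface: every parser reports its kind."""

    @abstractmethod
    def kind(self) -> str:
        ...


class ISO8601Parser(TimestampParser):
    """Parser with no extra detail to report."""

    def kind(self) -> str:
        return "iso8601"


class StrptimeParser(TimestampParser):
    """Parser that also carries a format string, exposed only on this class."""

    def __init__(self, fmt: str):
        self._fmt = fmt

    def kind(self) -> str:
        return "strptime"

    def detail(self) -> str:
        return self._fmt


def describe(parser: TimestampParser) -> str:
    # The caller has to know the concrete type to reach detail().
    if isinstance(parser, StrptimeParser):
        return f"{parser.kind()}({parser.detail()})"
    return parser.kind()


print(describe(ISO8601Parser()))
print(describe(StrptimeParser("%Y/%m/%d")))
```

The interface stays smaller this way, but any code that wants the format string must be able to name the concrete subclass.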



