You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "zeroshade (via GitHub)" <gi...@apache.org> on 2023/03/14 15:26:06 UTC

[GitHub] [arrow] zeroshade commented on a diff in pull request #34558: feat: Add binary support for CSV

zeroshade commented on code in PR #34558:
URL: https://github.com/apache/arrow/pull/34558#discussion_r1135724467


##########
go/arrow/csv/reader.go:
##########
@@ -469,6 +469,10 @@ func (r *Reader) initFieldConverter(bldr array.Builder) func(string) {
 		return func(s string) {
 			r.parseList(bldr, s)
 		}
+	case arrow.BinaryDataType:
+		return func(s string) {
+			r.parseBinaryDataType(bldr, s)
+		}

Review Comment:
   Can you add a case for `*arrow.LargeStringType` above this (such as with the string case) so that it doesn't get dropped into this case too? (I'm pretty sure the LargeString type does meet this interface)



##########
go/arrow/csv/reader.go:
##########
@@ -469,6 +469,10 @@ func (r *Reader) initFieldConverter(bldr array.Builder) func(string) {
 		return func(s string) {
 			r.parseList(bldr, s)
 		}
+	case arrow.BinaryDataType:
+		return func(s string) {
+			r.parseBinaryDataType(bldr, s)

Review Comment:
   Like the string case, we probably want to check the `r.stringsCanBeNull` value to determine whether we want to check for nulls or just let it be empty values etc.



##########
go/arrow/csv/transformer.go:
##########
@@ -215,6 +216,15 @@ func (w *Writer) transformColToStringArr(typ arrow.DataType, col arrow.Array) []
 				res[i] = w.nullValue
 			}
 		}
+	case arrow.BinaryDataType:

Review Comment:
   same as above, make sure that `LargeString` is covered above this case so it doesn't fall into here.



##########
go/arrow/csv/reader.go:
##########
@@ -755,6 +759,18 @@ func (r *Reader) parseList(field array.Builder, str string) {
 	}
 }
 
+func (r *Reader) parseBinaryDataType(field array.Builder, str string) {
+	if r.isNull(str) {
+		field.AppendNull()
+		return
+	}
+	decodedVal, err := base64.StdEncoding.DecodeString(str)

Review Comment:
   do we want to check both the padded and non-padded encodings? (ie: both `StdEncoding` and `RawStdEncoding`, checking the other if the first one errors)
   
   If we don't check both encodings, we should make sure the documentation states which we use.



##########
go/arrow/csv/testdata/types.csv:
##########
@@ -15,7 +15,7 @@
 # specific language governing permissions and limitations
 # under the License.
 #
-## supported types: bool;int8;int16;int32;int64;uint8;uint16;uint32;uint64;float32;float64;string;timestamp
-true;-1;-1;-1;-1;1;1;1;1;1.1;1.1;str-1;2022-05-09T00:01:01;{1,2,3}
-false;-2;-2;-2;-2;2;2;2;2;2.2;2.2;str-2;2022-05-09T23:59:59;{}
-null;NULL;null;N/A;;null;null;null;null;null;null;null;null;null
\ No newline at end of file
+## supported types: bool;int8;int16;int32;int64;uint8;uint16;uint32;uint64;float32;float64;string;timestamp;list(i64);binary
+true;-1;-1;-1;-1;1;1;1;1;1.1;1.1;str-1;2022-05-09T00:01:01;{1,2,3};AAEC
+false;-2;-2;-2;-2;2;2;2;2;2.2;2.2;str-2;2022-05-09T23:59:59;{};AwQF
+null;NULL;null;N/A;;null;null;null;null;null;null;null;null;null;null

Review Comment:
   Can we make sure there's a test case for testing an empty value in addition to a null value?



##########
go/arrow/csv/transformer.go:
##########
@@ -215,6 +216,15 @@ func (w *Writer) transformColToStringArr(typ arrow.DataType, col arrow.Array) []
 				res[i] = w.nullValue
 			}
 		}
+	case arrow.BinaryDataType:
+		arr := col.(*array.Binary)

Review Comment:
   you can't assume this, it could also be `*array.LargeBinary` or otherwise. If you don't want to duplicate the code and create cases for both `LargeBinary` and `Binary` separately, you can create and use an interface which contains `Value(int) []byte` which would cover both `Binary` and `LargeBinary` types.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org