Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/31 14:31:39 UTC

[GitHub] [arrow] zeroshade commented on a diff in pull request #13136: ARROW-16556: [Go] Add Layout method to DataTypes

zeroshade commented on code in PR #13136:
URL: https://github.com/apache/arrow/pull/13136#discussion_r885716884


##########
go/arrow/datatype.go:
##########
@@ -209,3 +210,77 @@ func timeUnitFingerprint(unit TimeUnit) rune {
 		return rune(0)
 	}
 }
+
+// BufferKind describes the type of buffer expected when defining a layout specification
+type BufferKind int8
+
+// The expected types of buffers
+const (
+	KindFixedWidth BufferKind = iota
+	KindVarWidth
+	KindBitmap
+	KindAlwaysNull
+)
+
+// BufferSpec provides a specification for the buffers of a particular datatype
+type BufferSpec struct {
+	Kind      BufferKind
+	ByteWidth int // for KindFixedWidth
+}
+
+func (b BufferSpec) Equals(other BufferSpec) bool {
+	return b.Kind == other.Kind && (b.Kind != KindFixedWidth || b.ByteWidth == other.ByteWidth)
+}
+
+// DataTypeLayout represents the physical layout of a datatype's buffers including
+// the number of and types of those binary buffers. This will correspond
+// with the buffers in the ArrayData for an array of that type.
+type DataTypeLayout struct {
+	Buffers []BufferSpec
+	HasDict bool
+}
+
+func SpecFixedWidth(w int) BufferSpec { return BufferSpec{KindFixedWidth, w} }
+func SpecVariableWidth() BufferSpec   { return BufferSpec{KindVarWidth, -1} }
+func SpecBitmap() BufferSpec          { return BufferSpec{KindBitmap, -1} }
+func SpecAlwaysNull() BufferSpec      { return BufferSpec{KindAlwaysNull, -1} }
+
+// IsInteger is a helper to return true if the type ID provided is one of the
+// integral types of uint or int with the varying sizes.
+func IsInteger(t Type) bool {
+	switch t {
+	case UINT8, INT8, UINT16, INT16, UINT32, INT32, UINT64, INT64:
+		return true
+	}
+	return false
+}
+
+// IsPrimitive returns true if the provided type ID represents a fixed width
+// primitive type.
+func IsPrimitive(t Type) bool {
+	switch t {
+	case BOOL, UINT8, INT8, UINT16, INT16, UINT32, INT32, UINT64, INT64,
+		FLOAT16, FLOAT32, FLOAT64, DATE32, DATE64, TIME32, TIME64, TIMESTAMP,
+		DURATION, INTERVAL_MONTHS, INTERVAL_DAY_TIME, INTERVAL_MONTH_DAY_NANO:
+		return true
+	}
+	return false
+}
+
+// IsBaseBinary returns true for Binary/String and their LARGE variants
+func IsBaseBinary(t Type) bool {
+	switch t {
+	case BINARY, STRING, LARGE_BINARY, LARGE_STRING:
+		return true
+	}
+	return false
+}
+
+// IsFixedSizeBinary returns true for Decimal128/256 and FixedSizeBinary
+func IsFixedSizeBinary(t Type) bool {
+	switch t {
+	case DECIMAL128, DECIMAL256, FIXED_SIZE_BINARY:

Review Comment:
   Honestly, I was just following the categorizations used in the C++ implementation. Thinking about it, though, the reason for `IsPrimitive` vs `IsFixedSizeBinary` is that each "primitive" value is interpretable on its own as a primitive that exists in most programming languages (an integer of a specific bit-width, or a float/double). The DECIMAL values, by contrast, are each represented as a compound of more than one primitive (for example, a DECIMAL128 is two 64-bit integers). That's likely the original thinking behind the C++ categorization that I followed here.
   
   It's also probably why it's `IsPrimitive` rather than `IsKnownFixedWidth`.
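
To make the distinction concrete, here is a minimal sketch. It assumes a locally defined `typeID` enum and a `dec128` struct; both are hypothetical stand-ins written for illustration, not the actual `arrow.Type` constants or the Arrow decimal type. It only shows the idea that a "primitive" value is a single native scalar, while a 128-bit decimal has to be assembled from two 64-bit words.

```go
package main

import "fmt"

// typeID is a hypothetical stand-in for a type identifier, defined locally
// so the sketch compiles without importing the Arrow module.
type typeID int

const (
	int64ID typeID = iota
	float64ID
	decimal128ID
	fixedSizeBinaryID
)

// dec128 is an illustrative 128-bit value built from two 64-bit words.
// It is an assumption for this sketch, not the Arrow decimal128 type.
type dec128 struct {
	lo uint64 // least significant 64 bits
	hi int64  // most significant 64 bits, carries the sign
}

// isPrimitive reports whether each value of the type maps to a single
// native integer or float.
func isPrimitive(t typeID) bool {
	switch t {
	case int64ID, float64ID:
		return true
	}
	return false
}

// isFixedSizeBinary reports whether each value is a fixed-width run of
// bytes that is composed of more than one native scalar.
func isFixedSizeBinary(t typeID) bool {
	switch t {
	case decimal128ID, fixedSizeBinaryID:
		return true
	}
	return false
}

func main() {
	d := dec128{lo: 42, hi: 0} // a small positive value: only the low word is set
	fmt.Println(isPrimitive(int64ID), isFixedSizeBinary(int64ID))
	fmt.Println(isPrimitive(decimal128ID), isFixedSizeBinary(decimal128ID))
	fmt.Println(d.lo, d.hi)
}
```

Both categories are fixed width; what separates them in this sketch is only whether one value fits a single native scalar.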


