Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/08/22 14:52:04 UTC

[GitHub] [arrow] kszucs commented on a change in pull request #8008: ARROW-9369: [Python] Support conversion from python sequence to dictionary type

kszucs commented on a change in pull request #8008:
URL: https://github.com/apache/arrow/pull/8008#discussion_r475098380



##########
File path: cpp/src/arrow/python/python_to_arrow.cc
##########
@@ -1123,6 +1168,50 @@ class DecimalConverter : public TypedConverter<arrow::Decimal128Type, null_codin
   std::shared_ptr<DecimalType> decimal_type_;
 };
 
+#define DICTIONARY_PRIMITIVE(TYPE_ENUM, TYPE_CLASS)                 \
+  case Type::TYPE_ENUM:                                             \
+    *out = std::unique_ptr<SeqConverter>(                           \
+        new PrimitiveDictionaryConverter<TYPE_CLASS, null_coding>); \
+    break;
+
+#define DICTIONARY_BINARY_LIKE(TYPE_ENUM, TYPE_CLASS)                \
+  case Type::TYPE_ENUM:                                              \
+    *out = std::unique_ptr<SeqConverter>(                            \
+        new BinaryLikeDictionaryConverter<TYPE_CLASS, null_coding>); \
+    break;
+
+template <NullCoding null_coding>
+Status GetDictionaryConverter(const std::shared_ptr<DataType>& type,
+                              std::unique_ptr<SeqConverter>* out) {
+  const auto& dict_type = checked_cast<const DictionaryType&>(*type);
+  const auto& value_type = dict_type.value_type();
+
+  switch (value_type->id()) {
+    DICTIONARY_PRIMITIVE(BOOL, BooleanType);
+    DICTIONARY_PRIMITIVE(INT8, Int8Type);
+    DICTIONARY_PRIMITIVE(INT16, Int16Type);
+    DICTIONARY_PRIMITIVE(INT32, Int32Type);
+    DICTIONARY_PRIMITIVE(INT64, Int64Type);
+    DICTIONARY_PRIMITIVE(UINT8, UInt8Type);
+    DICTIONARY_PRIMITIVE(UINT16, UInt16Type);
+    DICTIONARY_PRIMITIVE(UINT32, UInt32Type);
+    DICTIONARY_PRIMITIVE(UINT64, UInt64Type);
+    DICTIONARY_PRIMITIVE(HALF_FLOAT, HalfFloatType);
+    DICTIONARY_PRIMITIVE(FLOAT, FloatType);
+    DICTIONARY_PRIMITIVE(DOUBLE, DoubleType);
+    DICTIONARY_PRIMITIVE(DATE32, Date32Type);
+    DICTIONARY_PRIMITIVE(DATE64, Date64Type);
+    DICTIONARY_BINARY_LIKE(BINARY, BinaryType);
+    DICTIONARY_BINARY_LIKE(STRING, StringType);
+    // DICTIONARY_BINARY_LIKE(LARGE_BINARY, LargeBinaryType);
+    // DICTIONARY_BINARY_LIKE(LARGE_STRING, LargeStringType);

Review comment:
   Yes, there are a couple of design decisions we need to make because of the following problems:
   - the index (key) type is ignored because an adaptive builder is used
   - large binary/string types are not supported by the builder, and I'm not sure whether that is intentional
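
   To illustrate the first point, here is a minimal, self-contained sketch of how an adaptive index builder behaves. It is a toy stand-in, not Arrow's builder API: the class name `ToyAdaptiveDictionaryBuilder` and its methods are hypothetical, and the widening thresholds are an assumption modeled on signed integer limits. The point is only that the index width comes from the data, so an index type requested by the caller has nowhere to go.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy class, NOT Arrow's API. It dictionary-encodes strings and
// reports the narrowest signed integer width that can hold the largest index.
class ToyAdaptiveDictionaryBuilder {
 public:
  // Append a value, adding it to the dictionary the first time it is seen.
  void Append(const std::string& value) {
    auto it = memo_.find(value);
    if (it == memo_.end()) {
      it = memo_.emplace(value, static_cast<int64_t>(dictionary_.size())).first;
      dictionary_.push_back(value);
    }
    indices_.push_back(it->second);
  }

  // The width is derived from the data only; there is no way for the caller
  // to request a specific index type (the problem described above).
  int index_bit_width() const {
    const int64_t max_index =
        dictionary_.empty() ? 0 : static_cast<int64_t>(dictionary_.size()) - 1;
    if (max_index <= std::numeric_limits<int8_t>::max()) return 8;
    if (max_index <= std::numeric_limits<int16_t>::max()) return 16;
    if (max_index <= std::numeric_limits<int32_t>::max()) return 32;
    return 64;
  }

  const std::vector<int64_t>& indices() const { return indices_; }
  const std::vector<std::string>& dictionary() const { return dictionary_; }

 private:
  std::unordered_map<std::string, int64_t> memo_;
  std::vector<std::string> dictionary_;
  std::vector<int64_t> indices_;
};

// Usage sketch: even if the caller wanted 64-bit indices, three values with
// two distinct entries end up with the narrowest width.
inline void ExampleUsage() {
  ToyAdaptiveDictionaryBuilder builder;
  for (const char* v : {"a", "b", "a"}) builder.Append(v);
  assert(builder.index_bit_width() == 8);
}
```

   One possible design choice would be to build with the adaptive width and cast the indices to the requested type afterwards; another would be to pick a fixed-width index builder up front. Which of these fits the converter is exactly the decision left open here.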



