You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/07/08 15:18:40 UTC

[GitHub] [arrow] kszucs opened a new pull request #7684: ARROW-9374: [C++][Python] Expose MakeArrayFromScalar [WIP]

kszucs opened a new pull request #7684:
URL: https://github.com/apache/arrow/pull/7684


   Since we have a complete scalar implementation on the python side, we can implement `pa.repeat(value, size=n)`


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on a change in pull request #7684: ARROW-9374: [C++][Python] Expose MakeArrayFromScalar [WIP]

Posted by GitBox <gi...@apache.org>.
kszucs commented on a change in pull request #7684:
URL: https://github.com/apache/arrow/pull/7684#discussion_r451626479



##########
File path: python/pyarrow/tests/test_array.py
##########
@@ -297,6 +297,24 @@ def test_nulls(ty):
         assert arr.type == ty
 
 
+@pytest.mark.parametrize(('value', 'size', 'expected'), [
+    (None, 1, pa.array([None])),
+    (None, 10, pa.nulls(10)),
+    (-1, 3, pa.array([-1, -1, -1], type=pa.int64())),
+    (2.71, 2, pa.array([2.71, 2.71], type=pa.float64())),
+    ("string", 4, pa.array(["string"] * 4)),
+    (pa.scalar(8, type=pa.uint8()), 17, pa.array([8] * 17, type=pa.uint8())),
+    (pa.scalar(None), 3, pa.array([None, None, None])),
+    (pa.scalar(True), 11, pa.array([True] * 11)),
+    (False, 9, pa.array([False] * 9)),
+    # ([1, 2], 2, pa.array([[1, 2], [1, 2]]))

Review comment:
       The C++ implementation is not complete, so I'm going to extend its type support.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #7684: ARROW-9374: [C++][Python] Expose MakeArrayFromScalar [WIP]

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7684:
URL: https://github.com/apache/arrow/pull/7684#issuecomment-655592710


   https://issues.apache.org/jira/browse/ARROW-9374


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on a change in pull request #7684: ARROW-9374: [C++][Python] Expose MakeArrayFromScalar [WIP]

Posted by GitBox <gi...@apache.org>.
wesm commented on a change in pull request #7684:
URL: https://github.com/apache/arrow/pull/7684#discussion_r453243667



##########
File path: python/pyarrow/tests/test_scalars.py
##########
@@ -458,6 +458,7 @@ def test_map():
     assert len(s) == 2
     assert isinstance(s, pa.MapScalar)
     assert isinstance(s.values, pa.Array)
+

Review comment:
       remove whitespace change

##########
File path: python/pyarrow/array.pxi
##########
@@ -334,6 +334,25 @@ def nulls(size, type=None, MemoryPool memory_pool=None):
     return pyarrow_wrap_array(arr)
 
 
+def repeat(value, size, MemoryPool memory_pool=None):
+    cdef:

Review comment:
       needs docstring




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on a change in pull request #7684: ARROW-9374: [C++][Python] Expose MakeArrayFromScalar

Posted by GitBox <gi...@apache.org>.
kszucs commented on a change in pull request #7684:
URL: https://github.com/apache/arrow/pull/7684#discussion_r453896254



##########
File path: cpp/src/arrow/array/array_test.cc
##########
@@ -384,6 +398,40 @@ TEST_F(TestArray, TestMakeArrayFromScalar) {
   }
 }
 
+TEST_F(TestArray, TestMakeArrayFromDictionaryScalar) {
+  auto dictionary = ArrayFromJSON(utf8(), R"(["foo", "bar", "baz"])");
+  auto type = std::make_shared<DictionaryType>(int8(), utf8());
+  ASSERT_OK_AND_ASSIGN(auto value, MakeScalar(int8(), 1));
+  auto scalar = DictionaryScalar({value, dictionary}, type);

Review comment:
       I know, just my eyes prefer the assignments instead.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on a change in pull request #7684: ARROW-9374: [C++][Python] Expose MakeArrayFromScalar

Posted by GitBox <gi...@apache.org>.
wesm commented on a change in pull request #7684:
URL: https://github.com/apache/arrow/pull/7684#discussion_r453807979



##########
File path: cpp/src/arrow/array/array_test.cc
##########
@@ -384,6 +398,40 @@ TEST_F(TestArray, TestMakeArrayFromScalar) {
   }
 }
 
+TEST_F(TestArray, TestMakeArrayFromDictionaryScalar) {
+  auto dictionary = ArrayFromJSON(utf8(), R"(["foo", "bar", "baz"])");
+  auto type = std::make_shared<DictionaryType>(int8(), utf8());

Review comment:
       `::arrow::dictionary` works too

##########
File path: cpp/src/arrow/array/array_test.cc
##########
@@ -384,6 +398,40 @@ TEST_F(TestArray, TestMakeArrayFromScalar) {
   }
 }
 
+TEST_F(TestArray, TestMakeArrayFromDictionaryScalar) {
+  auto dictionary = ArrayFromJSON(utf8(), R"(["foo", "bar", "baz"])");
+  auto type = std::make_shared<DictionaryType>(int8(), utf8());
+  ASSERT_OK_AND_ASSIGN(auto value, MakeScalar(int8(), 1));
+  auto scalar = DictionaryScalar({value, dictionary}, type);

Review comment:
       Note you can also do `DictionaryScalar scalar({..., `




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm closed pull request #7684: ARROW-9374: [C++][Python] Expose MakeArrayFromScalar

Posted by GitBox <gi...@apache.org>.
wesm closed pull request #7684:
URL: https://github.com/apache/arrow/pull/7684


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on a change in pull request #7684: ARROW-9374: [C++][Python] Expose MakeArrayFromScalar

Posted by GitBox <gi...@apache.org>.
kszucs commented on a change in pull request #7684:
URL: https://github.com/apache/arrow/pull/7684#discussion_r453648012



##########
File path: cpp/src/arrow/array/util.cc
##########
@@ -323,15 +336,52 @@ class RepeatedArrayFactory {
     return Status::OK();
   }
 
-  Status Visit(const FixedSizeBinaryType&) {
-    std::shared_ptr<Buffer> value =
-        checked_cast<const FixedSizeBinaryScalar&>(scalar_).value;
-    return FinishFixedWidth(value->data(), value->size());
+  template <typename T>
+  enable_if_var_size_list<T, Status> Visit(const T& type) {
+    using ScalarType = typename TypeTraits<T>::ScalarType;
+    using ArrayType = typename TypeTraits<T>::ArrayType;
+
+    auto value = checked_cast<const ScalarType&>(scalar_).value;
+
+    ArrayVector values(length_, value);
+    ARROW_ASSIGN_OR_RAISE(auto value_array, Concatenate(values, pool_));
+
+    std::shared_ptr<Buffer> offsets_buffer;
+    auto size = static_cast<typename T::offset_type>(value->length());
+    RETURN_NOT_OK(CreateOffsetsBuffer(size, &offsets_buffer));
+
+    out_ =
+        std::make_shared<ArrayType>(scalar_.type, length_, offsets_buffer, value_array);
+    return Status::OK();
   }
 
-  Status Visit(const Decimal128Type&) {
-    auto value = checked_cast<const Decimal128Scalar&>(scalar_).value.ToBytes();
-    return FinishFixedWidth(value.data(), value.size());
+  Status Visit(const FixedSizeListType& type) {
+    auto value = checked_cast<const FixedSizeListScalar&>(scalar_).value;
+
+    ArrayVector values(length_, value);
+    ARROW_ASSIGN_OR_RAISE(auto value_array, Concatenate(values, pool_));
+
+    out_ = std::make_shared<FixedSizeListArray>(scalar_.type, length_, value_array);
+    return Status::OK();
+  }
+
+  Status Visit(const MapType& type) {
+    auto map_scalar = checked_cast<const MapScalar&>(scalar_);
+    auto struct_array = checked_cast<const StructArray*>(map_scalar.value.get());
+
+    ArrayVector keys(length_, struct_array->field(0));
+    ArrayVector values(length_, struct_array->field(1));
+
+    ARROW_ASSIGN_OR_RAISE(auto key_array, Concatenate(keys, pool_));

Review comment:
       We could add a more efficient implementations later.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on a change in pull request #7684: ARROW-9374: [C++][Python] Expose MakeArrayFromScalar

Posted by GitBox <gi...@apache.org>.
kszucs commented on a change in pull request #7684:
URL: https://github.com/apache/arrow/pull/7684#discussion_r453648012



##########
File path: cpp/src/arrow/array/util.cc
##########
@@ -323,15 +336,52 @@ class RepeatedArrayFactory {
     return Status::OK();
   }
 
-  Status Visit(const FixedSizeBinaryType&) {
-    std::shared_ptr<Buffer> value =
-        checked_cast<const FixedSizeBinaryScalar&>(scalar_).value;
-    return FinishFixedWidth(value->data(), value->size());
+  template <typename T>
+  enable_if_var_size_list<T, Status> Visit(const T& type) {
+    using ScalarType = typename TypeTraits<T>::ScalarType;
+    using ArrayType = typename TypeTraits<T>::ArrayType;
+
+    auto value = checked_cast<const ScalarType&>(scalar_).value;
+
+    ArrayVector values(length_, value);
+    ARROW_ASSIGN_OR_RAISE(auto value_array, Concatenate(values, pool_));
+
+    std::shared_ptr<Buffer> offsets_buffer;
+    auto size = static_cast<typename T::offset_type>(value->length());
+    RETURN_NOT_OK(CreateOffsetsBuffer(size, &offsets_buffer));
+
+    out_ =
+        std::make_shared<ArrayType>(scalar_.type, length_, offsets_buffer, value_array);
+    return Status::OK();
   }
 
-  Status Visit(const Decimal128Type&) {
-    auto value = checked_cast<const Decimal128Scalar&>(scalar_).value.ToBytes();
-    return FinishFixedWidth(value.data(), value.size());
+  Status Visit(const FixedSizeListType& type) {
+    auto value = checked_cast<const FixedSizeListScalar&>(scalar_).value;
+
+    ArrayVector values(length_, value);
+    ARROW_ASSIGN_OR_RAISE(auto value_array, Concatenate(values, pool_));
+
+    out_ = std::make_shared<FixedSizeListArray>(scalar_.type, length_, value_array);
+    return Status::OK();
+  }
+
+  Status Visit(const MapType& type) {
+    auto map_scalar = checked_cast<const MapScalar&>(scalar_);
+    auto struct_array = checked_cast<const StructArray*>(map_scalar.value.get());
+
+    ArrayVector keys(length_, struct_array->field(0));
+    ArrayVector values(length_, struct_array->field(1));
+
+    ARROW_ASSIGN_OR_RAISE(auto key_array, Concatenate(keys, pool_));

Review comment:
       We may want to add a more efficient implementations later.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on pull request #7684: ARROW-9374: [C++][Python] Expose MakeArrayFromScalar

Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #7684:
URL: https://github.com/apache/arrow/pull/7684#issuecomment-657537128


   It supports MapScalar, but not the union scalar because we'd need to have the type_code information stored in the union scalar value in order to properly reconstruct a union array if the union type contains identical types.
   
   Created a follow-up JIRA: https://issues.apache.org/jira/browse/ARROW-9434


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org