Posted to dev@arrow.apache.org by Ying Zhou <yz...@gmail.com> on 2021/02/07 20:29:45 UTC

[C++] RandomArrayGenerator::List bugs

Hi,

Recently I found a weird bug in RandomArrayGenerator.

RandomArrayGenerator::List consistently produces ListArrays whose length is one less than the size its documentation says it should be. The validity bitmaps it produces also look wrong.

Here is a simple test:

TEST(TestAdapterWriteNested, ListTest) {
  int64_t num_rows = 2;
  static constexpr random::SeedType kRandomSeed2 = 0x0ff1ce;
  arrow::random::RandomArrayGenerator rand(kRandomSeed2);
  // Child values: 2 * num_rows int32 values, each null with probability 0.2.
  std::shared_ptr<Array> value_array = rand.ArrayOf(int32(), 2 * num_rows, 0.2);
  // Request num_rows lists with null_probability = 1.
  std::shared_ptr<Array> array = rand.List(*value_array, num_rows, 1);
  // Record the first byte of the validity bitmap, the length, and the contents.
  RecordProperty("bitmap", *(array->null_bitmap_data()));
  RecordProperty("length", array->length());
  RecordProperty("array", array->ToString());
}

Here are the results:

<testcase name="ListTest" status="run" result="completed" time="0" timestamp="2021-02-07T15:23:16" classname="TestAdapterWriteNested">
<properties>
<property name="bitmap" value="3"/>
<property name="length" value="1"/>
<property name="array" value="[&#x0A;  [&#x0A;    null,&#x0A;    1074834796,&#x0A;    551076274,&#x0A;    1184187771&#x0A;  ]&#x0A;]"/>
</properties>
    </testcase>

Here is what RandomArrayGenerator::List should do:

  /// \brief Generate a random ListArray
  ///
  /// \param[in] values The underlying values array
  /// \param[in] size The size of the generated list array
  /// \param[in] null_probability the probability of a list value being null
  /// \param[in] force_empty_nulls if true, null list entries must have 0 length
  ///
  /// \return a generated Array
  std::shared_ptr<Array> List(const Array& values, int64_t size, double null_probability,
                              bool force_empty_nulls = false);

Note that the generator fails in at least three respects:
1. The length of the generated array is too low.
2. Even when null_probability is set to 1, there are still 1s in the bitmap.
3. The size of the bitmap is larger than the size of the Array.
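
To make these points concrete, here is a minimal standalone sketch of the checks I have in mind. It is not Arrow code: ListLayout, ListLength, CountValid and HasBitsPastLength are hypothetical names used only for illustration. It assumes the usual list layout (one more offset than there are lists) and an LSB-first validity bitmap where a set bit means "valid".

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a list array's buffers; not an Arrow type.
struct ListLayout {
  std::vector<int32_t> offsets;   // one more entry than the number of lists
  std::vector<uint8_t> validity;  // LSB-first bitmap, 1 = valid, 0 = null
};

// Number of list slots implied by the offsets buffer.
inline int64_t ListLength(const ListLayout& a) {
  return a.offsets.empty() ? 0 : static_cast<int64_t>(a.offsets.size()) - 1;
}

// Counts the set (valid) bits among the first `length` slots.
inline int64_t CountValid(const ListLayout& a, int64_t length) {
  assert(static_cast<int64_t>(a.validity.size()) * 8 >= length);
  int64_t count = 0;
  for (int64_t i = 0; i < length; ++i) {
    count += (a.validity[i / 8] >> (i % 8)) & 1;
  }
  return count;
}

// Returns true if any bit at or beyond `length` is set in the bitmap.
inline bool HasBitsPastLength(const ListLayout& a, int64_t length) {
  const int64_t total_bits = static_cast<int64_t>(a.validity.size()) * 8;
  for (int64_t i = length; i < total_bits; ++i) {
    if ((a.validity[i / 8] >> (i % 8)) & 1) return true;
  }
  return false;
}
```

Modelling the output above as offsets {0, 4} with bitmap byte 3 gives a length of 1, one valid slot within that length, and one set bit past the end. For size = 2 and null_probability = 1 I would instead expect a length of 2 and no valid slots.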

I’d like to know where we can find the tests for arrow/testing/random. If there are none, I will need to write them.

Thanks,
Ying


Re: [C++] RandomArrayGenerator::List bugs

Posted by Ying Zhou <yz...@gmail.com>.
A Jira ticket on this bug has been filed: https://issues.apache.org/jira/browse/ARROW-11548
