Posted to github@arrow.apache.org by "pitrou (via GitHub)" <gi...@apache.org> on 2023/05/23 16:04:39 UTC

[GitHub] [arrow] pitrou commented on a diff in pull request #35691: GH-34785: [C++][Parquet] Skeleton for Parquet bloom filter writer

pitrou commented on code in PR #35691:
URL: https://github.com/apache/arrow/pull/35691#discussion_r1202627334


##########
cpp/src/parquet/column_writer.cc:
##########
@@ -2299,12 +2319,112 @@ Status TypedColumnWriterImpl<FLBAType>::WriteArrowDense(
   return Status::OK();
 }
 
+template <typename DType>
+void TypedColumnWriterImpl<DType>::UpdateBloomFilter(const T* values,
+                                                     int64_t num_values) {
+  if (bloom_filter_) {
+    for (int64_t i = 0; i < num_values; ++i) {
+      bloom_filter_->InsertHash(bloom_filter_->Hash(values + i));

Review Comment:
   This will be very slow in any case, because there is a double virtual indirection (first `bloom_filter_->Hash`, then `hasher_->Hash`).
   
   I think you want to add batch methods, such as `BloomFilter::InsertHashes(const double* values, int64_t num_values)`.
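A minimal sketch of the batching idea suggested above, under stated assumptions: `Hasher`, `MixHasher`, `SimpleBloomFilter`, and `UpdateBloomFilterBatched` are hypothetical names made up for illustration and are not the Arrow/Parquet API, the mixer is only a stand-in for a real hash function, and the filter is a plain bit vector rather than Parquet's block-based layout. The point it illustrates is that hashing a chunk of values behind one virtual call, then inserting the resulting hashes in a tight loop, removes the per-value virtual dispatch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical hasher interface (illustration only, not the Parquet API).
class Hasher {
 public:
  virtual ~Hasher() = default;
  // Per-value entry point: one virtual call for every value.
  virtual uint64_t Hash(int64_t value) const = 0;
  // Batch entry point: one virtual call for a whole run of values.
  virtual void Hashes(const int64_t* values, int64_t num_values,
                      uint64_t* out) const = 0;
};

// Stand-in hash: a splitmix64-style bit mixer, used only to keep the sketch
// self-contained.
class MixHasher final : public Hasher {
 public:
  uint64_t Hash(int64_t value) const override {
    return Mix(static_cast<uint64_t>(value));
  }
  void Hashes(const int64_t* values, int64_t num_values,
              uint64_t* out) const override {
    // Mix() is a non-virtual static call, so this loop has no dispatch cost.
    for (int64_t i = 0; i < num_values; ++i) {
      out[i] = Mix(static_cast<uint64_t>(values[i]));
    }
  }

 private:
  static uint64_t Mix(uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
  }
};

// Simplified bit-vector Bloom filter with two probes per hash (assumption:
// much simpler than a real Parquet Bloom filter).
class SimpleBloomFilter {
 public:
  explicit SimpleBloomFilter(uint64_t num_bits) : bits_(num_bits, false) {
    assert(num_bits > 0);
  }
  void InsertHash(uint64_t hash) {
    bits_[hash % bits_.size()] = true;
    bits_[(hash >> 32) % bits_.size()] = true;
  }
  // Batch insert: non-virtual, so the loop body can be inlined.
  void InsertHashes(const uint64_t* hashes, int64_t num_hashes) {
    for (int64_t i = 0; i < num_hashes; ++i) InsertHash(hashes[i]);
  }
  bool FindHash(uint64_t hash) const {
    return bits_[hash % bits_.size()] && bits_[(hash >> 32) % bits_.size()];
  }

 private:
  std::vector<bool> bits_;
};

// Hash and insert in fixed-size chunks: one virtual call per chunk instead of
// one (or two) per value.
void UpdateBloomFilterBatched(const Hasher& hasher, SimpleBloomFilter* filter,
                              const int64_t* values, int64_t num_values) {
  constexpr int64_t kChunk = 256;  // arbitrary chunk size for this sketch
  uint64_t hashes[kChunk];
  for (int64_t offset = 0; offset < num_values; offset += kChunk) {
    const int64_t n = std::min(kChunk, num_values - offset);
    hasher.Hashes(values + offset, n, hashes);
    filter->InsertHashes(hashes, n);
  }
}
```

In this sketch the only virtual call left in the write path is `Hasher::Hashes`, made once per chunk of 256 values. Whether the batch method should take raw values or precomputed hashes, and where the scratch buffer lives, are design choices for the actual PR.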


