You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@datasketches.apache.org by GitBox <gi...@apache.org> on 2021/01/28 11:25:38 UTC

[GitHub] [datasketches-cpp] chufucun edited a comment on pull request #185: Matches the Theta interface of the Java implementation

chufucun edited a comment on pull request #185:
URL: https://github.com/apache/datasketches-cpp/pull/185#issuecomment-768986515


   > It seems that this change breaks something in Python wrapper.
   > I never liked this mixed union functionality. In my view it is just unnecessary complexity. That is why I did not implement it in C++.
   > There is a complication: we have a new implementation of Theta sketch, which was developed as a part of Tuple sketch effort. The goal was to have a common base for both Theta and Tuple sketches. Currently this new Theta sketch implementation is sitting under Tuple sketch as experimental. I think it is time to make the move: replace the existing Theta sketch with this new implementation. I think it would be better to make sure that your integration works with this new version.
   > Perhaps we could add these update methods to the Theta and Tuple unions, but I still think that they are not necessary.
   
   thx reply.
   Use udaf function to integrate theta into impala. This function mainly affects the function logic of the merge phase.
   I found a solution, but the code doesn't look very elegant, I will test the performance and reply again。
   
   **Merge function:**
   The `dst` parameter is used to store intermediate results and is initialized in the init phase
   
   1. use update_theta_sketch and theta_union
   ```c++
   void AggregateFunctions::DsThetaMerge(
       FunctionContext* ctx, const StringVal& src, StringVal* dst) {
     DCHECK(!src.is_null);
     DCHECK(!dst->is_null);
     DCHECK(dst->len == sizeof(datasketches::update_theta_sketch)
         or dst->len == sizeof(datasketches::compact_theta_sketch));
     auto src_sketch = datasketches::theta_sketch::deserialize((void*)src.ptr, src.len);
     if (src_sketch->is_empty()) return;
   
     auto dst_sketch_ptr = reinterpret_cast<datasketches::theta_sketch*>(dst->ptr);
   
     datasketches::theta_union union_sketch = datasketches::theta_union::builder().build();
     union_sketch.update(*src_sketch);
     union_sketch.update(*dst_sketch_ptr);
   
     // Note, union_result may not match the type of dst and need to reallocate space
     const datasketches::compact_theta_sketch union_result = union_sketch.get_result();
     ctx->Reallocate(dst->ptr, sizeof(datasketches::compact_theta_sketch));
     if (LIKELY(dst->ptr != NULL)) {
       memset(dst->ptr, 0, sizeof(datasketches::compact_theta_sketch));
       datasketches::compact_theta_sketch* sketch_ptr =
           reinterpret_cast<datasketches::compact_theta_sketch*>(dst->ptr);
       std::uninitialized_fill_n(sketch_ptr, 1, union_result);
     } else {
       DCHECK(!ctx->impl()->state()->GetQueryStatus().ok());
     }
   }
   ```
   2. use theta_union(need add new update functionality)
   ```c++
   void AggregateFunctions::DsThetaMerge(
       FunctionContext* ctx, const StringVal& src, StringVal* dst) {
     DCHECK(!src.is_null);
     DCHECK(!dst->is_null);
     DCHECK_EQ(dst->len, sizeof(datasketches::theta_union));
   
     // Note, 'src' is a serialized compact_theta_sketch and not a serialized theta_union.
     auto src_sketch =
         datasketches::compact_theta_sketch::deserialize((void*)src.ptr, src.len);
   
     datasketches::theta_union* dst_union_ptr =
         reinterpret_cast<datasketches::theta_union*>(dst->ptr);
   
     dst_union_ptr->update(src_sketch);
   }
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org