Posted to issues-all@impala.apache.org by "Gabor Kaszab (Jira)" <ji...@apache.org> on 2021/08/31 06:43:00 UTC

[jira] [Created] (IMPALA-10901) Clean up Datasketches serialization and deserialization

Gabor Kaszab created IMPALA-10901:
-------------------------------------

             Summary: Clean up Datasketches serialization and deserialization
                 Key: IMPALA-10901
                 URL: https://issues.apache.org/jira/browse/IMPALA-10901
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
    Affects Versions: Impala 4.0.0
            Reporter: Gabor Kaszab


(copy-paste from a mail thread)

Regarding serialization to bytes as opposed to a stream: this has nothing to do with the BINARY data type in Impala.
Currently I see in the Impala code something like this (simplified):
std::stringstream tmp;
sketch.serialize(tmp);
std::string str = tmp.str(); // in StringStreamToStringVal
StringVal result(context, str.size());
memcpy(result.ptr, str.c_str(), str.size());

You could do it faster like this:
auto bytes = sketch.serialize();
StringVal result(context, bytes.size());
memcpy(result.ptr, bytes.data(), bytes.size());
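
To make the difference between the two paths concrete, here is a minimal self-contained sketch. It does not use the real DataSketches or Impala types: ToySketch, ViaStream and ViaBytes are hypothetical stand-ins, and a std::vector<uint8_t> plays the role of the StringVal buffer. It only assumes that the sketch offers both a stream overload and a bytes-returning overload of serialize(), as in the snippets above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a sketch that offers both serialization overloads.
struct ToySketch {
  std::vector<uint8_t> payload;

  // Stream-based overload: writes the serialized image into an output stream.
  void serialize(std::ostream& os) const {
    os.write(reinterpret_cast<const char*>(payload.data()),
             static_cast<std::streamsize>(payload.size()));
  }

  // Bytes-based overload: returns the serialized image directly.
  std::vector<uint8_t> serialize() const { return payload; }
};

// Current pattern: stream -> std::string -> destination buffer.
std::vector<uint8_t> ViaStream(const ToySketch& sketch) {
  std::stringstream tmp;
  sketch.serialize(tmp);
  std::string str = tmp.str();  // extra copy out of the stream buffer
  std::vector<uint8_t> result(str.size());
  if (!str.empty()) std::memcpy(result.data(), str.data(), str.size());
  return result;
}

// Suggested pattern: bytes -> destination buffer, no stream involved.
std::vector<uint8_t> ViaBytes(const ToySketch& sketch) {
  auto bytes = sketch.serialize();
  std::vector<uint8_t> result(bytes.size());
  if (!bytes.empty()) std::memcpy(result.data(), bytes.data(), bytes.size());
  return result;
}
```

Both functions produce the same bytes. The second one skips the stringstream and the intermediate std::string, which is where the expected saving comes from.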

Regarding the unnecessary constructor call during deserialization: I see code like this (HLL is an example, but the pattern is the same):
datasketches::hll_sketch src_sketch(DS_SKETCH_CONFIG, DS_HLL_TYPE); // construct an empty sketch, which is not needed
DeserializeDsSketch(src, &src_sketch); // pass it into a function, which will replace it by an assignment (hopefully a move, not copy)
// in the function
*sketch = T::deserialize((void*)serialized_sketch.ptr, serialized_sketch.len);

This can be done as follows, avoiding the unnecessary constructor call:
datasketches::hll_sketch src_sketch = datasketches::hll_sketch::deserialize((void*)serialized_sketch.ptr, serialized_sketch.len);
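
The effect can be illustrated with a small self-contained sketch. CountingSketch, DeserializeInto, ConstructThenAssign and DirectInit are hypothetical names, not part of DataSketches or Impala. The toy type only mimics the shape of a static T::deserialize(bytes, size) factory and counts how often its "empty sketch" constructor runs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Hypothetical stand-in for a sketch type. It counts calls to the
// "empty sketch" constructor so the two patterns can be compared.
class CountingSketch {
 public:
  static int empty_ctor_calls;

  // Plays the role of constructing an empty sketch from a config value.
  explicit CountingSketch(int config) : config_(config), value_(0) {
    ++empty_ctor_calls;
  }

  // Static factory with the same shape as T::deserialize(bytes, size).
  static CountingSketch deserialize(const void* bytes, size_t size) {
    if (size != sizeof(uint32_t)) throw std::invalid_argument("bad size");
    uint32_t v;
    std::memcpy(&v, bytes, sizeof(v));
    return CountingSketch(FromBytes{}, v);
  }

  uint32_t value() const { return value_; }

 private:
  struct FromBytes {};  // tag selecting the deserializing constructor
  CountingSketch(FromBytes, uint32_t v) : config_(0), value_(v) {}

  int config_;
  uint32_t value_;
};

int CountingSketch::empty_ctor_calls = 0;

// Helper in the style of the existing code: overwrite an out-parameter.
template <typename T>
void DeserializeInto(const void* bytes, size_t size, T* sketch) {
  *sketch = T::deserialize(bytes, size);
}

// Existing pattern: build a throwaway empty sketch, then assign over it.
uint32_t ConstructThenAssign(const void* bytes, size_t size) {
  CountingSketch sketch(12);
  DeserializeInto(bytes, size, &sketch);
  return sketch.value();
}

// Suggested pattern: initialize directly from the factory's return value.
uint32_t DirectInit(const void* bytes, size_t size) {
  CountingSketch sketch = CountingSketch::deserialize(bytes, size);
  return sketch.value();
}
```

In this toy the first pattern runs the empty constructor once per call and the second never runs it. For a real sketch the saved work is whatever the empty constructor allocates or initializes.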


