Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2022/06/22 08:32:00 UTC

[jira] [Resolved] (IMPALA-10901) Clean up Datasketches serialization and deserialization

     [ https://issues.apache.org/jira/browse/IMPALA-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang resolved IMPALA-10901.
-------------------------------------
    Fix Version/s: Impala 4.1.0
       Resolution: Fixed

Resolving this. Thanks [~alsay]!

> Clean up Datasketches serialization and deserialization
> -------------------------------------------------------
>
>                 Key: IMPALA-10901
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10901
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 4.0.0
>            Reporter: Gabor Kaszab
>            Assignee: Alexander Saydakov
>            Priority: Major
>              Labels: datasketches
>             Fix For: Impala 4.1.0
>
>
> (copy-paste from a mail thread)
> Regarding serialization using bytes as opposed to a stream: this has nothing to do with the BINARY data type in Impala.
> Currently I see in the Impala code something like this (simplified):
> std::stringstream tmp;
> sketch.serialize(tmp);
> std::string str = tmp.str(); // in StringStreamToStringVal
> StringVal result(context, str.size());
> memcpy(result.ptr, str.c_str(), str.size());
> You could do it faster like this:
> auto bytes = sketch.serialize();
> StringVal result(context, bytes.size());
> memcpy(result.ptr, bytes.data(), bytes.size());
> Regarding the unnecessary constructor call during deserialization: I see code like this (HLL is an example, but the pattern is the same):
> datasketches::hll_sketch src_sketch(DS_SKETCH_CONFIG, DS_HLL_TYPE); // construct an empty sketch, which is not needed
> DeserializeDsSketch(src, &src_sketch); // pass it into a function, which will replace it by an assignment (hopefully a move, not copy)
> // in the function
> *sketch = T::deserialize((void*)serialized_sketch.ptr, serialized_sketch.len);
> This can be done as follows, avoiding the unnecessary constructor call:
> datasketches::hll_sketch src_sketch = datasketches::hll_sketch::deserialize((void*)serialized_sketch.ptr, serialized_sketch.len);
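The two suggestions above can be compared in a small self-contained sketch. Everything named here is an assumption made for illustration: ToySketch stands in for a DataSketches sketch class and ToyStringVal for Impala's StringVal, so the code compiles without either library. It is not the actual Impala or DataSketches code. It only shows that the byte-based path produces the same payload as the stream-based path with one fewer intermediate copy, and that a sketch can be initialized directly from deserialize().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a DataSketches sketch. Only the entry points
// discussed in the thread are modelled; the "sketch" is a single counter.
class ToySketch {
 public:
  explicit ToySketch(uint64_t count = 0) : count_(count) {}
  uint64_t count() const { return count_; }

  // Stream-based serialization.
  void serialize(std::ostream& os) const {
    os.write(reinterpret_cast<const char*>(&count_), sizeof(count_));
  }

  // Byte-based serialization: returns the payload directly.
  std::vector<uint8_t> serialize() const {
    std::vector<uint8_t> bytes(sizeof(count_));
    std::memcpy(bytes.data(), &count_, sizeof(count_));
    return bytes;
  }

  // Builds a sketch straight from a byte buffer.
  static ToySketch deserialize(const void* bytes, size_t len) {
    assert(len == sizeof(uint64_t));
    uint64_t count = 0;
    std::memcpy(&count, bytes, sizeof(count));
    return ToySketch(count);
  }

 private:
  uint64_t count_;
};

// Hypothetical stand-in for Impala's StringVal: a buffer exposed as ptr/len.
struct ToyStringVal {
  explicit ToyStringVal(size_t n) : buf(n), ptr(buf.data()), len(n) {}
  ToyStringVal(const ToyStringVal&) = delete;  // a copy would leave ptr dangling
  ToyStringVal(ToyStringVal&&) = default;      // a move keeps ptr valid
  std::vector<uint8_t> buf;
  uint8_t* ptr;
  size_t len;
};

// Pattern quoted as the current code: stream -> std::string -> result.
ToyStringVal SerializeViaStream(const ToySketch& sketch) {
  std::stringstream tmp;
  sketch.serialize(tmp);
  std::string str = tmp.str();  // extra copy out of the stream buffer
  ToyStringVal result(str.size());
  std::memcpy(result.ptr, str.data(), str.size());
  return result;
}

// Suggested pattern: bytes -> result, with no stream in between.
ToyStringVal SerializeViaBytes(const ToySketch& sketch) {
  auto bytes = sketch.serialize();
  ToyStringVal result(bytes.size());
  std::memcpy(result.ptr, bytes.data(), bytes.size());
  return result;
}

// Suggested pattern: initialize from deserialize() instead of constructing
// an empty sketch first and then assigning over it.
ToySketch DeserializeDirect(const ToyStringVal& serialized_sketch) {
  return ToySketch::deserialize(serialized_sketch.ptr, serialized_sketch.len);
}
```

Both serialization helpers are expected to return identical bytes; the difference is only in how many temporary buffers are filled on the way.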



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
