Posted to issues-all@impala.apache.org by "Alexander Saydakov (Jira)" <ji...@apache.org> on 2022/01/04 21:43:00 UTC
[jira] [Commented] (IMPALA-10901) Clean up Datasketches serialization and deserialization
[ https://issues.apache.org/jira/browse/IMPALA-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468867#comment-17468867 ]
Alexander Saydakov commented on IMPALA-10901:
---------------------------------------------
I believe this is resolved.
> Clean up Datasketches serialization and deserialization
> -------------------------------------------------------
>
> Key: IMPALA-10901
> URL: https://issues.apache.org/jira/browse/IMPALA-10901
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 4.0.0
> Reporter: Gabor Kaszab
> Priority: Major
> Labels: datasketches
>
> (copy-paste from a mail thread)
> Regarding serialization to bytes as opposed to a stream: this has nothing to do with the BINARY data type in Impala.
> Currently I see in the Impala code something like this (simplified):
> std::stringstream tmp;
> sketch.serialize(tmp);
> std::string str = tmp.str(); // in StringStreamToStringVal
> StringVal result(context, str.size());
> memcpy(result.ptr, str.c_str(), str.size());
> You could do it faster like this:
> auto bytes = sketch.serialize();
> StringVal result(context, bytes.size());
> memcpy(result.ptr, bytes.data(), bytes.size());
> Regarding the unnecessary constructor call during deserialization: I see code like this (HLL is an example, but the pattern is the same):
> datasketches::hll_sketch src_sketch(DS_SKETCH_CONFIG, DS_HLL_TYPE); // construct an empty sketch, which is not needed
> DeserializeDsSketch(src, &src_sketch); // pass it into a function, which will replace it by an assignment (hopefully a move, not copy)
> // in the function
> *sketch = T::deserialize((void*)serialized_sketch.ptr, serialized_sketch.len);
> This can be accomplished as follows, avoiding the unnecessary constructor call:
> datasketches::hll_sketch src_sketch = datasketches::hll_sketch::deserialize((void*)serialized_sketch.ptr, serialized_sketch.len);
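The two patterns quoted above can be compared side by side in a small standalone sketch. Everything in it is a hypothetical stand-in so that it compiles without Impala or the DataSketches library: ToySketch only imitates the shape of a sketch class (a stream-based serialize, a byte-returning serialize, and a static deserialize), and ToyStringVal only imitates a ptr/len buffer. The real signatures in either code base may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a sketch class; NOT the real DataSketches type.
class ToySketch {
 public:
  explicit ToySketch(uint8_t lg_k) : lg_k_(lg_k) {}
  void update(uint32_t v) { values_.push_back(v); }
  uint8_t lg_k() const { return lg_k_; }
  size_t size() const { return values_.size(); }

  // Byte-based serialization: one config byte followed by the raw values.
  std::vector<uint8_t> serialize() const {
    std::vector<uint8_t> out(1 + values_.size() * sizeof(uint32_t));
    out[0] = lg_k_;
    if (!values_.empty()) {
      std::memcpy(out.data() + 1, values_.data(), out.size() - 1);
    }
    return out;
  }

  // Stream-based serialization: writes the same bytes to an ostream.
  void serialize(std::ostream& os) const {
    const std::vector<uint8_t> bytes = serialize();
    os.write(reinterpret_cast<const char*>(bytes.data()),
             static_cast<std::streamsize>(bytes.size()));
  }

  // Builds a sketch straight from bytes; the caller needs no pre-built object.
  static ToySketch deserialize(const void* bytes, size_t len) {
    if (len < 1 || (len - 1) % sizeof(uint32_t) != 0) {
      throw std::invalid_argument("malformed sketch bytes");
    }
    const uint8_t* p = static_cast<const uint8_t*>(bytes);
    ToySketch s(p[0]);
    s.values_.resize((len - 1) / sizeof(uint32_t));
    if (!s.values_.empty()) std::memcpy(s.values_.data(), p + 1, len - 1);
    return s;
  }

 private:
  uint8_t lg_k_;
  std::vector<uint32_t> values_;
};

// Hypothetical stand-in for a ptr/len result buffer; owns its storage.
struct ToyStringVal {
  explicit ToyStringVal(size_t n) : storage(n), ptr(storage.data()), len(n) {}
  ToyStringVal(const ToyStringVal&) = delete;  // a copy would leave ptr dangling
  ToyStringVal(ToyStringVal&&) = default;      // a move keeps the same buffer
  std::vector<uint8_t> storage;
  uint8_t* ptr;
  size_t len;
};

// Current pattern: stream -> std::string -> result (two copies of the bytes).
ToyStringVal SerializeViaStream(const ToySketch& sketch) {
  std::stringstream tmp;
  sketch.serialize(tmp);
  const std::string str = tmp.str();
  ToyStringVal result(str.size());
  std::memcpy(result.ptr, str.data(), str.size());
  return result;
}

// Suggested pattern: bytes -> result (one copy, no stream involved).
ToyStringVal SerializeViaBytes(const ToySketch& sketch) {
  const std::vector<uint8_t> bytes = sketch.serialize();
  ToyStringVal result(bytes.size());
  std::memcpy(result.ptr, bytes.data(), bytes.size());
  return result;
}

// Suggested pattern: initialize directly from deserialize() instead of
// constructing an empty sketch and assigning over it afterwards.
ToySketch DeserializeDirect(const ToyStringVal& src) {
  return ToySketch::deserialize(src.ptr, src.len);
}
```

Calling SerializeViaStream and SerializeViaBytes on the same ToySketch yields identical bytes, so in this toy setting the change is purely about skipping the intermediate stream and string. Whether the saving is measurable in Impala itself would have to be confirmed by profiling.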