Posted to dev@carbondata.apache.org by Akash Nilugal <ak...@gmail.com> on 2018/08/27 06:21:36 UTC

[SUGGESTION]Support Decoder based fallback mechanism in local dictionary

Hi all,

Currently, when fallback is triggered for a column page that uses local
dictionary encoding, we keep both the encoded data and the actual data in
memory, build a new column page without dictionary encoding, and only at
the end free the encoded column page. Because of this, the off-heap memory
footprint increases.

We can reduce the off-heap memory footprint with a decoder-based fallback
mechanism. With this approach there is no need to keep the actual data
alongside the encoded data in the encoded column page; we keep only the
encoded data. To form the new column page, we uncompress the encoded column
page to get the dictionary values, use the local dictionary generator to
convert those values back into the actual data, put the actual data into
the newly created column page, compress it again, and hand it to the
consumer for writing the blocklet.
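The steps above can be sketched in Java as follows. This is a minimal sketch under stated assumptions, not CarbonData code: a page holds string values, dictionary keys are stored as 4-byte ints, and java.util.zip Deflater/Inflater stand in for the real page compressor. LocalDictionary, encodePage, decoderBasedFallback and readPlainPage are hypothetical names used only for illustration. The encoded page keeps only compressed keys; fallback uncompresses them, decodes each key through the dictionary, writes the actual values into a new plain page and compresses that page again.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DecoderFallbackSketch {
  // Hypothetical local dictionary: assigns an int key to each distinct value.
  static final class LocalDictionary {
    private final Map<String, Integer> valueToKey = new HashMap<>();
    private final List<String> keyToValue = new ArrayList<>();
    int encode(String value) {
      return valueToKey.computeIfAbsent(value, v -> {
        keyToValue.add(v);
        return keyToValue.size() - 1;
      });
    }
    String decode(int key) { return keyToValue.get(key); }
  }

  // Deflater/Inflater stand in for the real page compressor (assumption).
  static byte[] compress(byte[] input) {
    Deflater deflater = new Deflater();
    deflater.setInput(input);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[1024];
    while (!deflater.finished()) {
      out.write(buf, 0, deflater.deflate(buf));
    }
    deflater.end();
    return out.toByteArray();
  }

  static byte[] decompress(byte[] input) {
    Inflater inflater = new Inflater();
    inflater.setInput(input);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[1024];
    try {
      while (!inflater.finished()) {
        int n = inflater.inflate(buf);
        if (n == 0 && inflater.needsInput()) throw new IllegalStateException("truncated page");
        out.write(buf, 0, n);
      }
    } catch (DataFormatException e) {
      throw new IllegalStateException(e);
    } finally {
      inflater.end();
    }
    return out.toByteArray();
  }

  // Encoded page: only the compressed dictionary keys, no actual data.
  static byte[] encodePage(List<String> rows, LocalDictionary dict) {
    ByteBuffer keys = ByteBuffer.allocate(4 * rows.size());
    for (String row : rows) {
      keys.putInt(dict.encode(row));
    }
    return compress(keys.array());
  }

  // Decoder-based fallback: uncompress the keys, decode each through the
  // local dictionary, write actual values into a new plain page, recompress.
  static byte[] decoderBasedFallback(byte[] encodedPage, LocalDictionary dict) {
    ByteBuffer keys = ByteBuffer.wrap(decompress(encodedPage));
    ByteArrayOutputStream plain = new ByteArrayOutputStream();
    while (keys.hasRemaining()) {
      byte[] value = dict.decode(keys.getInt()).getBytes(StandardCharsets.UTF_8);
      // Plain layout: 4-byte length prefix followed by the UTF-8 bytes.
      plain.write(ByteBuffer.allocate(4).putInt(value.length).array(), 0, 4);
      plain.write(value, 0, value.length);
    }
    return compress(plain.toByteArray());
  }

  // Reads a plain page back into values (only used to check the result).
  static List<String> readPlainPage(byte[] page) {
    ByteBuffer buf = ByteBuffer.wrap(decompress(page));
    List<String> values = new ArrayList<>();
    while (buf.hasRemaining()) {
      byte[] value = new byte[buf.getInt()];
      buf.get(value);
      values.add(new String(value, StandardCharsets.UTF_8));
    }
    return values;
  }

  public static void main(String[] args) {
    LocalDictionary dict = new LocalDictionary();
    byte[] encoded = encodePage(List.of("IN", "US", "IN", "CN", "US"), dict);
    System.out.println(readPlainPage(decoderBasedFallback(encoded, dict)));
  }
}
```

The trade-off described in this proposal shows up directly in the sketch: the fallback path pays for one decompression, one dictionary lookup per row and one extra compression, instead of holding a second copy of the actual data until fallback completes.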

The above process may slow down loading, but it will reduce the memory
footprint. So we can add a property that decides whether to use the current
fallback procedure or the decoder-based fallback mechanism during fallback.
Any inputs or suggestions are welcome.
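A minimal Java sketch of such a switch, assuming the setting is read from a plain java.util.Properties object. The key local.dictionary.fallback.mode, the FallbackMode enum, and the choice to default to the current behaviour are all hypothetical, for illustration only, and are not existing CarbonData settings.

```java
import java.util.Locale;
import java.util.Properties;

public class FallbackModeSketch {

  // Hypothetical property key; not an existing CarbonData setting.
  static final String FALLBACK_MODE_KEY = "local.dictionary.fallback.mode";

  enum FallbackMode {
    // Current behaviour: keep actual data next to the encoded data;
    // cheaper fallback, higher off-heap memory footprint.
    KEEP_ACTUAL_DATA,
    // Proposed behaviour: keep only encoded data and decode it through the
    // local dictionary on fallback; lower memory, some extra CPU work.
    DECODER_BASED
  }

  // Resolves the mode, defaulting to the current behaviour (assumption).
  static FallbackMode resolve(Properties props) {
    String raw = props.getProperty(FALLBACK_MODE_KEY, FallbackMode.KEEP_ACTUAL_DATA.name());
    try {
      return FallbackMode.valueOf(raw.trim().toUpperCase(Locale.ROOT));
    } catch (IllegalArgumentException e) {
      // Unknown values also fall back to the current behaviour.
      return FallbackMode.KEEP_ACTUAL_DATA;
    }
  }

  // Whether the encoded column page must also retain the actual values.
  static boolean retainActualData(FallbackMode mode) {
    return mode == FallbackMode.KEEP_ACTUAL_DATA;
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    System.out.println(resolve(props));
    props.setProperty(FALLBACK_MODE_KEY, "decoder_based");
    System.out.println(resolve(props));
  }
}
```

Keeping the choice behind one resolver means the write path only asks whether the encoded page must retain the actual data, so both mechanisms can coexist internally whether or not the property is ever exposed to end users.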


Regards,
Akash

Re: [SUGGESTION]Support Decoder based fallback mechanism in local dictionary

Posted by akashrn5 <ak...@gmail.com>.
For now, I will implement it as a user property, and we can make a decision
once we have the performance report.



--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [SUGGESTION]Support Decoder based fallback mechanism in local dictionary

Posted by manish gupta <to...@gmail.com>.
+1
@Akash, I suggest not exposing any property to the user for this. The
design should support choosing the mechanism based on a property, but
whether to expose it to the end user can be decided once you complete your
performance testing.

Regards
Manish Gupta

On Mon, 27 Aug 2018 at 1:57 PM, Kumar Vishal <ku...@gmail.com>
wrote:


Re: [SUGGESTION]Support Decoder based fallback mechanism in local dictionary

Posted by Kumar Vishal <ku...@gmail.com>.
+1
@ xuchuanyin
This will not impact the datamap writing flow, as the actual column page is
cleared only after the datamap writer has consumed all the records, so
there will be no change in that area.

-Regards
Kumar Vishal

On Mon, Aug 27, 2018 at 1:01 PM xuchuanyin <xu...@hust.edu.cn> wrote:

> This means, no need to keep the actual data along with encoded data in
> encoded column page.
> ---
> A problem is that, currently index datamap needs the actual data to
> generate
> index. You may affect this procedure if you do not keep the actual data.
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>

Re: [SUGGESTION]Support Decoder based fallback mechanism in local dictionary

Posted by xuchuanyin <xu...@hust.edu.cn>.
This means, no need to keep the actual data along with encoded data in 
encoded column page. 
---
A problem is that the index datamap currently needs the actual data to
generate the index. This procedure may be affected if you do not keep the
actual data.




Re: [SUGGESTION]Support Decoder based fallback mechanism in local dictionary

Posted by Akash Nilugal <ak...@gmail.com>.
Hi all,

With PR https://github.com/apache/carbondata/pull/2662, I have tested the
performance and memory requirement of decoder-based fallback for local
dictionary. The results are as below.

1. With the current implementation, loading 3 million rows took around
4 GB of memory when local dictionary was enabled, which is almost 10 times
the memory required to load the same data with local dictionary disabled.
With decoder-based fallback, the memory requirement is reduced from
10 times to almost 2 times.


2. Data loading performance is as follows.
With the current implementation, loading 1 billion rows takes around
1.1 hours, and with decoder-based fallback it takes around 1.2 hours.
The difference is small, while the memory requirement is reduced
considerably. I think this PR will help.

Consolidated points:
1. Store size is not impacted.
2. GC time is not impacted.
3. The impact on load time is low, as mentioned above.
4. The memory requirement is reduced significantly.
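As a rough back-of-envelope reading of these results (the reported figures are approximate, and the implied values here were not measured directly), the 10x and 2x ratios put the load without local dictionary at roughly 0.4 GB and the decoder-based load at roughly 0.8 GB, about a 5x reduction from 4 GB, while going from 1.1 to 1.2 hours is roughly a 9% longer load. The snippet below only reproduces that arithmetic; the class and method names are hypothetical.

```java
public class LoadNumbersSketch {
  // Memory implied for a load without local dictionary, given the measured
  // memory with local dictionary and the reported "N times" ratio.
  static double impliedBaselineGb(double measuredGb, double ratio) {
    return measuredGb / ratio;
  }

  // Relative load-time increase, in percent.
  static double slowdownPercent(double beforeHours, double afterHours) {
    return (afterHours - beforeHours) / beforeHours * 100.0;
  }

  public static void main(String[] args) {
    // Approximate figures from the 3-million-row test above.
    double baselineGb = impliedBaselineGb(4.0, 10.0); // implied, not measured
    double decoderGb = baselineGb * 2.0;              // "almost 2 times" the baseline
    System.out.printf("baseline ~%.1f GB, decoder-based ~%.1f GB, ~%.0fx less than 4 GB%n",
        baselineGb, decoderGb, 4.0 / decoderGb);
    // 1-billion-row load: ~1.1 h (current) vs ~1.2 h (decoder-based).
    System.out.printf("load time increase ~%.1f%%%n", slowdownPercent(1.1, 1.2));
  }
}
```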



Regards,
Akash R Nilugal

On Mon, Aug 27, 2018 at 11:51 AM Akash Nilugal <ak...@gmail.com>
wrote:
