Posted to dev@carbondata.apache.org by Mahesh Raju Somalaraju <ma...@gmail.com> on 2022/04/05 06:55:19 UTC

Re: Carbondata Compacted file is bigger than original files

Hi Chin Wei Low,

Normally, the compacted segment folder should be the same size as, or smaller
than, the original segment folders.
I need to understand in which case it becomes bigger. Can you please provide
the steps of the scenario you are executing, so that I can reproduce and check
it?
1) create table statement, with its properties
2) data load steps

Reference:
https://github.com/apache/carbondata/blob/master/docs/ddl-of-carbondata.md#table-compaction-configuration
https://github.com/apache/carbondata/blob/master/docs/dml-of-carbondata.md#compaction


Thanks & Regards
-Mahesh Raju S
(github id: maheshrajus)

On Mon, Mar 28, 2022 at 3:21 PM Chin Wei Low <lo...@gmail.com> wrote:

> Hi Community,
>
> When I run compact 'MINOR' on a table, the compacted files (those with .1)
> are bigger than the total size of the original carbondata files. How can I
> check why this happened? Does anyone know what is happening?
>
> Regards,
> Chin Wei
>

Re: Carbondata Compacted file is bigger than original files

Posted by Chin Wei Low <lo...@gmail.com>.

Hi Mahesh,

The sample table schema:
create table if not exists test_table (ts bigint, field STRING, tag STRING,
measurement DOUBLE, write_ts bigint)
partitioned by (day bigint)
STORED AS carbondata
TBLPROPERTIES ('SORT_COLUMNS'='field')

Carbon properties:
carbon.compaction.level.threshold=10,6
carbon.enable.auto.load.merge=false
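The `carbon.compaction.level.threshold=10,6` setting above controls how many segments a MINOR compaction merges at each level. The Python sketch below is a simplified, hypothetical model of that grouping, not CarbonData's implementation: `plan_minor_compaction` is an illustrative name, and the sketch assumes that level 1 merges every 10 not-yet-compacted segments into one segment named `<first id>.1`, and that level 2 merges every 6 level-1 segments into `<first id>.2` in the same pass. It ignores segment status and any other properties that affect which segments are eligible.

```python
from typing import Dict, List, Tuple


def _first_id(segment_id: str) -> int:
    """Numeric part before the first '.', e.g. '10.1' -> 10."""
    return int(segment_id.split(".")[0])


def plan_minor_compaction(
    segment_ids: List[str], thresholds: Tuple[int, int]
) -> Dict[str, List[str]]:
    """Hypothetical model of one MINOR compaction pass.

    segment_ids: currently valid segment IDs, e.g. ["0", "1", "10.1"].
    thresholds: (level-1 count, level-2 count), e.g. (10, 6).
    Returns {new merged segment ID: segment IDs it replaces}.
    """
    level1_n, level2_n = thresholds
    plan: Dict[str, List[str]] = {}

    # Level 1: merge never-compacted segments (no '.' in the ID), in load order,
    # in full groups of level1_n; a partial group is left for a later pass.
    fresh = sorted((s for s in segment_ids if "." not in s), key=int)
    level1 = [s for s in segment_ids if s.endswith(".1")]
    for i in range(0, len(fresh) - level1_n + 1, level1_n):
        group = fresh[i : i + level1_n]
        merged = f"{group[0]}.1"
        plan[merged] = group
        level1.append(merged)

    # Level 2 (assumed to cascade in the same pass): merge full groups of
    # level2_n level-1 segments into '<first id>.2'.
    level1.sort(key=_first_id)
    for i in range(0, len(level1) - level2_n + 1, level2_n):
        group = level1[i : i + level2_n]
        plan[f"{_first_id(group[0])}.2"] = group

    return plan


if __name__ == "__main__":
    loads = [str(i) for i in range(10)]  # ten loads -> segments 0..9
    print(plan_minor_compaction(loads, (10, 6)))
```

Under this model, ten loads (segments 0 to 9) produce a single merged segment `0.1`, and nothing reaches level 2 until six level-1 segments exist.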

Steps:
1. Use 'load data inpath' to load a CSV file into the table. Each CSV has around
1 million records.
2. After 10 loads, run compact 'MINOR' on the table.
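To put a number on the growth seen after step 2, one option is to record the size of every `.carbondata` file before and after the compaction and compare. The Python sketch below assumes the table directory is on a local or locally mounted file system, that no other load runs during the compaction, and that every segment present beforehand gets merged (true for 10 loads with a level-1 threshold of 10). `snapshot` and `compare` are hypothetical helper names, and index files (`.carbonindex`) are deliberately excluded.

```python
from pathlib import Path
from typing import Dict


def snapshot(root: str) -> Dict[str, int]:
    """Map every *.carbondata file under root (recursively) to its size in bytes."""
    return {
        str(p): p.stat().st_size
        for p in Path(root).rglob("*.carbondata")
        if p.is_file()
    }


def compare(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, float]:
    """Compare the data files from before compaction with the files it added."""
    # Files that only exist in the second snapshot were written by the compaction.
    new_files = {p: size for p, size in after.items() if p not in before}
    original = sum(before.values())
    compacted = sum(new_files.values())
    return {
        "original_bytes": original,
        "compacted_bytes": compacted,
        "ratio": compacted / original if original else float("nan"),
    }


# Example (the table path is a placeholder):
#   before = snapshot("/path/to/test_table")
#   ... run: ALTER TABLE test_table COMPACT 'MINOR' ...
#   after = snapshot("/path/to/test_table")
#   print(compare(before, after))
```

A ratio above 1.0 confirms that the compacted data files are larger than the originals. Diffing two snapshots avoids depending on how a partitioned table names or places its segment files.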

Regards,
Chin Wei
