Posted to user@kylin.apache.org by Roberto Tardío <ro...@stratebi.com> on 2018/02/06 17:02:44 UTC

Segment management and auto-merging

Hi,

I have to build a big cube from about 400 M rows of historical data (with
many dimensions, on a small-to-mid-size cluster). To avoid one very large
cube build, I divided the process into monthly periods (about
30-40 M rows per month). When this finishes, an hourly load
process will begin, so we will have several historical monthly
segments followed by new incremental hourly segments. This scenario
raises the following questions:
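The Kylin REST build endpoints take segment boundaries as epoch-millisecond `startTime`/`endTime` values, so the monthly split above can be computed up front. A minimal sketch, assuming UTC month boundaries (check how your cube's partition column is interpreted); `month_ranges` is a hypothetical helper, not part of Kylin:

```python
from datetime import datetime, timezone


def month_ranges(first_year, first_month, last_year, last_month):
    """Yield half-open (start_ms, end_ms) UTC ranges, one per calendar month.

    Hypothetical helper: the UTC assumption must match how the cube's
    partition column is interpreted.
    """
    year, month = first_year, first_month
    while (year, month) <= (last_year, last_month):
        next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
        start = datetime(year, month, 1, tzinfo=timezone.utc)
        end = datetime(next_year, next_month, 1, tzinfo=timezone.utc)
        # Whole seconds since the epoch, converted to milliseconds.
        yield int(start.timestamp()) * 1000, int(end.timestamp()) * 1000
        year, month = next_year, next_month


for start_ms, end_ms in month_ranges(2017, 11, 2018, 1):
    print(start_ms, end_ms)
```

Half-open ranges keep consecutive monthly segments contiguous, with no gaps or overlaps; the first range here starts at 1509494400000 (2017-11-01T00:00Z).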

  * Do you recommend merging all the historical segments?
      o Sometimes we will need to rebuild one of the last six
        months. Given the cube size, we thought it would be faster to
        rebuild just that month's segment.
  * Once all the historical data is loaded, I'm going to define the
    following auto-merge time ranges for the hourly incremental load:
      o 1 day
      o 7 days
      o 28 days
      o If I understand correctly, this means that:
          + Every day, all hourly segments will be merged.
          + Every 7 days, all daily segments will be merged.
          + Every 28 days, all 7-day segments will be merged.
      o This configuration raises two questions:
          + Will the 28-day segments ever be merged automatically?
          + Will our big historical segments be merged automatically?
  * I thought that maybe I need to develop a script that merges segments
    as needed (using the Kylin REST API) instead of using the Kylin cube
    auto-merge option.
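If the built-in auto-merge does not fit, the custom script mentioned in the last question could look roughly like this. It is hypothetical: `plan_merge` is a simplified model of tiered merging (larger windows tried first, contiguous segments merged once they span exactly one window, segments already as long as a window left alone), not Kylin's own merge logic. `rebuild_request` assumes the `PUT /kylin/api/cubes/{cubeName}/rebuild` endpoint with `startTime`, `endTime` and `buildType` (`BUILD`, `MERGE`, `REFRESH`) as described in the Kylin REST API docs; verify both against your Kylin version. The host, cube name and credentials are placeholders.

```python
import json
import urllib.request
from base64 import b64encode

HOUR = 3600 * 1000  # milliseconds
DAY = 24 * HOUR


def plan_merge(segments, windows):
    """Return (start_ms, end_ms) of the next run of segments to merge, or None.

    Hypothetical, simplified model of tiered auto-merge (not Kylin's code):
    - segments: sorted list of (start_ms, end_ms) READY segments;
    - windows: merge window lengths in ms, tried from largest to smallest;
    - a run of contiguous segments is merged once it spans exactly one window;
    - a segment already at least as long as the window is never included.
    """
    for window in sorted(windows, reverse=True):
        for i, (run_start, _) in enumerate(segments):
            covered_to = run_start
            for seg_start, seg_end in segments[i:]:
                # Stop at a gap, at a segment that is already "big enough",
                # or when this segment would push the run past the window.
                if (seg_start != covered_to
                        or seg_end - seg_start >= window
                        or seg_end - run_start > window):
                    break
                covered_to = seg_end
                if covered_to - run_start == window:
                    return run_start, covered_to
    return None


def rebuild_request(base_url, cube, start_ms, end_ms, user, password,
                    build_type="MERGE"):
    """Build (but do not send) a PUT to the assumed Kylin rebuild endpoint."""
    if build_type not in ("BUILD", "MERGE", "REFRESH"):
        raise ValueError(f"unsupported build type: {build_type}")
    body = json.dumps({"startTime": start_ms, "endTime": end_ms,
                       "buildType": build_type}).encode("utf-8")
    # HTTP Basic auth header built from the (placeholder) credentials.
    token = b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/kylin/api/cubes/{cube}/rebuild",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json;charset=utf-8",
                 "Authorization": f"Basic {token}"},
    )


if __name__ == "__main__":
    t0 = 1517443200000  # 2018-02-01T00:00Z
    hourly = [(t0 + h * HOUR, t0 + (h + 1) * HOUR) for h in range(24)]
    span = plan_merge(hourly, [DAY, 7 * DAY, 28 * DAY])
    if span is not None:
        # Placeholder host, cube name and credentials.
        req = rebuild_request("http://kylin-host:7070", "sales_cube",
                              *span, "ADMIN", "KYLIN")
        print(req.get_method(), req.full_url, req.data.decode())
```

Under this model, 24 hourly segments produce a one-day merge, while monthly historical segments (28 days or longer) are never picked up by any of the 1/7/28-day windows. The request is only built here; `urllib.request.urlopen(req)` would submit it. Passing `build_type="REFRESH"` with one month's range would cover rebuilding a single historical month.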

Thanks in advance,

Roberto

-- 

*Roberto Tardío Olmos*

/Senior Big Data & Business Intelligence Consultant/
Avenida de Brasil, 17, Planta 16, 28020 Madrid
Landline: 91.788.34.10