Posted to dev@kylin.apache.org by Samuel Bock <sb...@marinsoftware.com> on 2015/02/19 19:40:41 UTC

OutOfMemoryError on step #3 of Cube build

Hello all,

We are in the process of evaluating Kylin for use as an OLAP engine. To
that end, we are trying to get a minimum viable setup with a representative
sample of our data in order to gather performance metrics. We have Kylin
running against a 10-node cluster, the provided cubes build successfully,
and the system seems functional. Attempting to build a simple cube against
our data results in an OutOfMemoryError in the Kylin server process (so far
we have given it up to a 46 GB heap). I was wondering if you could give me
some guidance as to likely causes and any configurations I'm likely to have
missed before I start diving into the source. I have changed the
"dictionary" setting to false, as recommended for high-cardinality
dimensions, but have not changed the configuration significantly apart from
that.
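
Concretely, the change was in the rowkey definition of the cube descriptor
(the full descriptor JSON appears later in the thread):

  "rowkey_columns": [
    {
      "column": "KEYWORD_DIM_ID",
      "length": 0,
      "dictionary": "false",
      "mandatory": false
    }
  ]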

For reference, the sizes of the Hive tables we're building the cubes from are:
dimension table: 25,399,061 rows
fact table: 270,940,921 rows

(And as a note, there are no pertinent log messages except to indicate that
it is in the Build Dimension Dictionary step)

Thank you,
sam bock

Re: OutOfMemoryError on step #3 of Cube build

Posted by Samuel Bock <sb...@marinsoftware.com>.
Thank you for the follow-up.

Our dimension table is 25 million rows for our test data set, and it would be
far larger in production. Given that, it sounds like our data doesn't fit the
Kylin use case; a rough back-of-envelope on the snapshot size (below) bears
that out. I appreciate the assistance in tracking down the source of this
issue.
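
For a sense of scale, using the sample row from the log further down the
thread (roughly 135 characters across 14 columns) and an assumed JVM overhead
of about 40 bytes per String object plus 2 bytes per character:

  raw text:        25,399,061 rows x ~135 bytes/row              ~ 3.4 GB
  as Java Strings: 14 cells/row x (~40 + 2 x ~10) bytes/cell     ~ 840 bytes/row,
                   so roughly 20 GB for the in-memory snapshot alone,
  plus the serialized ByteArrayOutputStream and the byte[] copy handed to
  HBase on top of that.

The exact figures depend on string lengths and JVM settings, but they make it
clear why heaps in the tens of gigabytes are not enough.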

cheers,
sam

On Tue, Feb 24, 2015 at 7:28 PM, Shi, Shaofeng <sh...@ebay.com> wrote:

> Hi Samuel,
>
> Kylin only supports the star schema: only 1 fact table join with multiple
> lookup tables. The lookup table need be small so that Kylin can read them
> into memory for join and cube build. Also as you found, Kylin will take
> snapshot on the lookup tables and persist them in Hbase; That should be
> the problem. In your case, how many rows there in the KEYWORDS table?
>
> On 2/21/15, 2:12 AM, "Samuel Bock" <sb...@marinsoftware.com> wrote:
>
> >Thank you for you response,
> >
> >I went into the code, and I'm fairly confident that I've isolated the
> >problem. The OutOfMemoryError is part of the dimension dictionary step,
> >but
> >is not actually related to the dictionary itself (since, as you mentioned,
> >that is skipped when dictionary=false). The problem arises from the second
> >half of that step in which it builds the dimension table snapshot. Looking
> >at the code, the process of building the snapshot table loads in the
> >entire
> >table into memory as strings (SnapshotTable.takeSnapshot), then serializes
> >that to an in memory ByteArrayOutputStream (ResourceStore.putResource),
> >then finally creates a copy of the internal byte array from the stream in
> >order to store it in HBase (HBaseResourceStore.checkAndPutResourceImpl).
> >That means that there needs to be space for three in-memory copies of the
> >full dimension table. Given that even our test subset dimension table is
> >25
> >million rows, 14 columns, that becomes problematic. From experimentation,
> >it breaks even with 95 gig heap.
> >
> >For completeness, the log leading up to the crash (minus the pointless zk
> >messages) is:
> > - Start to execute command:
> > -cubename foo -segmentname FULL_BUILD -input
> >/tmp/kylin-7d2b7588-17c0-4d80-9962-14ca63929186/foo/fact_distinct_columns
> >[QuartzScheduler_Worker-1]:[2015-02-19
> >22:59:01,284][INFO][com.kylinolap.cube.cli.DictionaryGeneratorCLI.processS
> >egment(DictionaryGeneratorCLI.java:57)]
> >- Building snapshot of KEYWORDS
> >[QuartzScheduler_Worker-2]:[2015-02-19
> >22:59:53,241][DEBUG][com.kylinolap.job.engine.JobFetcher.execute(JobFetche
> >r.java:60)]
> >- 0 pending jobs
> >[QuartzScheduler_Worker-3]:[2015-02-19
> >23:00:53,252][DEBUG][com.kylinolap.job.engine.JobFetcher.execute(JobFetche
> >r.java:60)]
> >- 0 pending jobs
> >[QuartzScheduler_Worker-1]:[2015-02-19
> >23:01:01,278][INFO][com.kylinolap.dict.lookup.FileTableReader.autoDetectDe
> >lim(FileTableReader.java:156)]
> >- Auto detect delim to be ' ', split line to 14 columns --
> >1020_18768_4_127200_4647593_group_341686994 group 19510703 0 18768 1020
> >341686994 4647593 371981 4 127200 CONTENT 2015-01-21 22:16:36.227246
> >[http-bio-7070-exec-8]:[2015-02-19
> >23:02:07,980][DEBUG][com.kylinolap.rest.service.AdminService.getConfigAsSt
> >ring(AdminService.java:91)]
> >- Get Kylin Runtime Config
> >[QuartzScheduler_Worker-4]:[2015-02-19
> >23:02:53,934][DEBUG][com.kylinolap.job.engine.JobFetcher.execute(JobFetche
> >r.java:60)]
> >- 0 pending jobs
> >[QuartzScheduler_Worker-1]:[2015-02-19
> >23:03:10,216][DEBUG][com.kylinolap.common.persistence.ResourceStore.putRes
> >ource(ResourceStore.java:166)]
> >- Saving resource
> >/table_snapshot/part-00000.csv/f87954d5-fdfa-4903-9f82-771d85df6367.snapsh
> >ot
> >(Store kylin_metadata_qa@hbase)
> >[QuartzScheduler_Worker-6]:[2015-02-19
> >23:04:53,230][DEBUG][com.kylinolap.job.engine.JobFetcher.execute(JobFetche
> >r.java:60)]
> >- 0 pending jobs
> >java.lang.OutOfMemoryError: Requested array size exceeds VM limit
> >Dumping heap to java_pid3705.hprof ...
> >
> >
> >The cube JSON is:
> >
> >{
> >  "uuid": "ba6105ca-a18d-4839-bed0-c89b86817110",
> >  "name": "foo",
> >  "description": "",
> >  "dimensions": [
> >    {
> >      "id": 1,
> >      "name": "KEYWORDS_DERIVED",
> >      "join": {
> >        "type": "left",
> >        "primary_key": [
> >          "DIM_ID"
> >        ],
> >        "foreign_key": [
> >          "KEYWORD_DIM_ID"
> >        ]
> >      },
> >      "hierarchy": null,
> >      "table": "KEYWORDS",
> >      "column": "{FK}",
> >      "datatype": null,
> >      "derived": [
> >        "PUBLISHER_GROUP_ID",
> >        "PUBLISHER_CAMPAIGN_ID",
> >        "PUBLISHER_ID"
> >      ]
> >    }
> >  ],
> >  "measures": [
> >    {
> >      "id": 1,
> >      "name": "_COUNT_",
> >      "function": {
> >        "expression": "COUNT",
> >        "parameter": {
> >          "type": "constant",
> >          "value": "1"
> >        },
> >        "returntype": "bigint"
> >      },
> >      "dependent_measure_ref": null
> >    },
> >    {
> >      "id": 2,
> >      "name": "CONVERSIONS",
> >      "function": {
> >        "expression": "SUM",
> >        "parameter": {
> >          "type": "column",
> >          "value": "CONVERSIONS"
> >        },
> >        "returntype": "bigint"
> >      },
> >      "dependent_measure_ref": null
> >    }
> >  ],
> >  "rowkey": {
> >    "rowkey_columns": [
> >      {
> >        "column": "KEYWORD_DIM_ID",
> >        "length": 0,
> >        "dictionary": "false",
> >        "mandatory": false
> >      }
> >    ],
> >    "aggregation_groups": [
> >      [
> >        "KEYWORD_DIM_ID"
> >      ]
> >    ]
> >  },
> >  "signature": "T+aYTH/KlCwwmVAGRQR3hQ==",
> >  "capacity": "LARGE",
> >  "last_modified": 1424367558297,
> >  "fact_table": "FACTS",
> >  "null_string": null,
> >  "filter_condition": "KEYWORDS.PUBLISHER_GROUP_ID=386784931",
> >  "cube_partition_desc": {
> >    "partition_date_column": null,
> >    "partition_date_start": 0,
> >    "cube_partition_type": "APPEND"
> >  },
> >  "hbase_mapping": {
> >    "column_family": [
> >      {
> >        "name": "F1",
> >        "columns": [
> >          {
> >            "qualifier": "M",
> >            "measure_refs": [
> >              "_COUNT_",
> >              "CONVERSIONS"
> >            ]
> >          }
> >        ]
> >      }
> >    ]
> >  },
> >  "notify_list": [
> >    "sam"
> >  ]
> >}
> >
> >
> >Cheers,
> >sam
> >
> >On Thu, Feb 19, 2015 at 9:49 PM, 周千昊 <z....@gmail.com> wrote:
> >
> >> Also since you set the dictionary to false, there should not be any
> >>memory
> >> consuming while building dictionary.
> >> So can you also give us the json description of the cube?(in the cube
> >>tab,
> >> click the corresponding cube, click the json button)
> >>
> >>
> >> On Fri Feb 20 2015 at 1:39:15 PM 周千昊 <z....@gmail.com> wrote:
> >>
> >> > Hi, Samuel
> >> >      Can you give us some detail log, so we can dig into the root
> >>cause
> >> >
> >> > On Fri Feb 20 2015 at 2:44:32 AM Samuel Bock <sbock@marinsoftware.com
> >
> >> > wrote:
> >> >
> >> >> Hello all,
> >> >>
> >> >> We are in the process of evaluating Kylin for use as an OLAP engine.
> >>To
> >> >> that end, we are trying to get a minimum viable setup with a
> >> >> representative
> >> >> sample of our data in order to gather performance metrics. We have
> >>kylin
> >> >> running against a 10 node cluster, the provided cubes build
> >>successfully
> >> >> and the system seems functional. Attempting to build a simple cube
> >> against
> >> >> our data results in an OutOfMemoryError in the kylin server process
> >>(so
> >> >> far
> >> >> we have given it up to a 46 gig heap). I was wondering if you could
> >>give
> >> >> me
> >> >> some guidance as to likely causes, any configurations I'm likely to
> >>have
> >> >> missed before I start diving into the source. I have changed the
> >> >> "dictionary" setting to false, as recommended for high-cardinality
> >> >> dimensions, but have not changed configuration significantly apart
> >>from
> >> >> that.
> >> >>
> >> >> For reference, the sizes of the hive tables we're building the cubes
> >> from
> >> >> dimension table: 25,399,061 rows
> >> >> fact table: 270,940,921 rows
> >> >>
> >> >> (And as a note, there are no pertinent log messages except to
> >>indicate
> >> >> that
> >> >> it is in the Build Dimension Dictionary step)
> >> >>
> >> >> Thank you,
> >> >> sam bock
> >> >>
> >> >
> >>
>
>

Re: OutOfMemoryError on step #3 of Cube build

Posted by "Shi, Shaofeng" <sh...@ebay.com>.
Hi Samuel,

Kylin only supports the star schema: a single fact table joined with multiple
lookup tables. The lookup tables need to be small so that Kylin can read them
into memory for the join and the cube build. Also, as you found, Kylin takes
a snapshot of each lookup table and persists it in HBase; that should be the
problem. In your case, how many rows are there in the KEYWORDS table?

On 2/21/15, 2:12 AM, "Samuel Bock" <sb...@marinsoftware.com> wrote:

>Thank you for you response,
>
>I went into the code, and I'm fairly confident that I've isolated the
>problem. The OutOfMemoryError is part of the dimension dictionary step,
>but
>is not actually related to the dictionary itself (since, as you mentioned,
>that is skipped when dictionary=false). The problem arises from the second
>half of that step in which it builds the dimension table snapshot. Looking
>at the code, the process of building the snapshot table loads in the
>entire
>table into memory as strings (SnapshotTable.takeSnapshot), then serializes
>that to an in memory ByteArrayOutputStream (ResourceStore.putResource),
>then finally creates a copy of the internal byte array from the stream in
>order to store it in HBase (HBaseResourceStore.checkAndPutResourceImpl).
>That means that there needs to be space for three in-memory copies of the
>full dimension table. Given that even our test subset dimension table is
>25
>million rows, 14 columns, that becomes problematic. From experimentation,
>it breaks even with 95 gig heap.
>
>For completeness, the log leading up to the crash (minus the pointless zk
>messages) is:
> - Start to execute command:
> -cubename foo -segmentname FULL_BUILD -input
>/tmp/kylin-7d2b7588-17c0-4d80-9962-14ca63929186/foo/fact_distinct_columns
>[QuartzScheduler_Worker-1]:[2015-02-19
>22:59:01,284][INFO][com.kylinolap.cube.cli.DictionaryGeneratorCLI.processS
>egment(DictionaryGeneratorCLI.java:57)]
>- Building snapshot of KEYWORDS
>[QuartzScheduler_Worker-2]:[2015-02-19
>22:59:53,241][DEBUG][com.kylinolap.job.engine.JobFetcher.execute(JobFetche
>r.java:60)]
>- 0 pending jobs
>[QuartzScheduler_Worker-3]:[2015-02-19
>23:00:53,252][DEBUG][com.kylinolap.job.engine.JobFetcher.execute(JobFetche
>r.java:60)]
>- 0 pending jobs
>[QuartzScheduler_Worker-1]:[2015-02-19
>23:01:01,278][INFO][com.kylinolap.dict.lookup.FileTableReader.autoDetectDe
>lim(FileTableReader.java:156)]
>- Auto detect delim to be ' ', split line to 14 columns --
>1020_18768_4_127200_4647593_group_341686994 group 19510703 0 18768 1020
>341686994 4647593 371981 4 127200 CONTENT 2015-01-21 22:16:36.227246
>[http-bio-7070-exec-8]:[2015-02-19
>23:02:07,980][DEBUG][com.kylinolap.rest.service.AdminService.getConfigAsSt
>ring(AdminService.java:91)]
>- Get Kylin Runtime Config
>[QuartzScheduler_Worker-4]:[2015-02-19
>23:02:53,934][DEBUG][com.kylinolap.job.engine.JobFetcher.execute(JobFetche
>r.java:60)]
>- 0 pending jobs
>[QuartzScheduler_Worker-1]:[2015-02-19
>23:03:10,216][DEBUG][com.kylinolap.common.persistence.ResourceStore.putRes
>ource(ResourceStore.java:166)]
>- Saving resource
>/table_snapshot/part-00000.csv/f87954d5-fdfa-4903-9f82-771d85df6367.snapsh
>ot
>(Store kylin_metadata_qa@hbase)
>[QuartzScheduler_Worker-6]:[2015-02-19
>23:04:53,230][DEBUG][com.kylinolap.job.engine.JobFetcher.execute(JobFetche
>r.java:60)]
>- 0 pending jobs
>java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>Dumping heap to java_pid3705.hprof ...
>
>
>The cube JSON is:
>
>{
>  "uuid": "ba6105ca-a18d-4839-bed0-c89b86817110",
>  "name": "foo",
>  "description": "",
>  "dimensions": [
>    {
>      "id": 1,
>      "name": "KEYWORDS_DERIVED",
>      "join": {
>        "type": "left",
>        "primary_key": [
>          "DIM_ID"
>        ],
>        "foreign_key": [
>          "KEYWORD_DIM_ID"
>        ]
>      },
>      "hierarchy": null,
>      "table": "KEYWORDS",
>      "column": "{FK}",
>      "datatype": null,
>      "derived": [
>        "PUBLISHER_GROUP_ID",
>        "PUBLISHER_CAMPAIGN_ID",
>        "PUBLISHER_ID"
>      ]
>    }
>  ],
>  "measures": [
>    {
>      "id": 1,
>      "name": "_COUNT_",
>      "function": {
>        "expression": "COUNT",
>        "parameter": {
>          "type": "constant",
>          "value": "1"
>        },
>        "returntype": "bigint"
>      },
>      "dependent_measure_ref": null
>    },
>    {
>      "id": 2,
>      "name": "CONVERSIONS",
>      "function": {
>        "expression": "SUM",
>        "parameter": {
>          "type": "column",
>          "value": "CONVERSIONS"
>        },
>        "returntype": "bigint"
>      },
>      "dependent_measure_ref": null
>    }
>  ],
>  "rowkey": {
>    "rowkey_columns": [
>      {
>        "column": "KEYWORD_DIM_ID",
>        "length": 0,
>        "dictionary": "false",
>        "mandatory": false
>      }
>    ],
>    "aggregation_groups": [
>      [
>        "KEYWORD_DIM_ID"
>      ]
>    ]
>  },
>  "signature": "T+aYTH/KlCwwmVAGRQR3hQ==",
>  "capacity": "LARGE",
>  "last_modified": 1424367558297,
>  "fact_table": "FACTS",
>  "null_string": null,
>  "filter_condition": "KEYWORDS.PUBLISHER_GROUP_ID=386784931",
>  "cube_partition_desc": {
>    "partition_date_column": null,
>    "partition_date_start": 0,
>    "cube_partition_type": "APPEND"
>  },
>  "hbase_mapping": {
>    "column_family": [
>      {
>        "name": "F1",
>        "columns": [
>          {
>            "qualifier": "M",
>            "measure_refs": [
>              "_COUNT_",
>              "CONVERSIONS"
>            ]
>          }
>        ]
>      }
>    ]
>  },
>  "notify_list": [
>    "sam"
>  ]
>}
>
>
>Cheers,
>sam
>
>On Thu, Feb 19, 2015 at 9:49 PM, 周千昊 <z....@gmail.com> wrote:
>
>> Also since you set the dictionary to false, there should not be any
>>memory
>> consuming while building dictionary.
>> So can you also give us the json description of the cube?(in the cube
>>tab,
>> click the corresponding cube, click the json button)
>>
>>
>> On Fri Feb 20 2015 at 1:39:15 PM 周千昊 <z....@gmail.com> wrote:
>>
>> > Hi, Samuel
>> >      Can you give us some detail log, so we can dig into the root
>>cause
>> >
>> > On Fri Feb 20 2015 at 2:44:32 AM Samuel Bock <sb...@marinsoftware.com>
>> > wrote:
>> >
>> >> Hello all,
>> >>
>> >> We are in the process of evaluating Kylin for use as an OLAP engine.
>>To
>> >> that end, we are trying to get a minimum viable setup with a
>> >> representative
>> >> sample of our data in order to gather performance metrics. We have
>>kylin
>> >> running against a 10 node cluster, the provided cubes build
>>successfully
>> >> and the system seems functional. Attempting to build a simple cube
>> against
>> >> our data results in an OutOfMemoryError in the kylin server process
>>(so
>> >> far
>> >> we have given it up to a 46 gig heap). I was wondering if you could
>>give
>> >> me
>> >> some guidance as to likely causes, any configurations I'm likely to
>>have
>> >> missed before I start diving into the source. I have changed the
>> >> "dictionary" setting to false, as recommended for high-cardinality
>> >> dimensions, but have not changed configuration significantly apart
>>from
>> >> that.
>> >>
>> >> For reference, the sizes of the hive tables we're building the cubes
>> from
>> >> dimension table: 25,399,061 rows
>> >> fact table: 270,940,921 rows
>> >>
>> >> (And as a note, there are no pertinent log messages except to
>>indicate
>> >> that
>> >> it is in the Build Dimension Dictionary step)
>> >>
>> >> Thank you,
>> >> sam bock
>> >>
>> >
>>


Re: OutOfMemoryError on step #3 of Cube build

Posted by Samuel Bock <sb...@marinsoftware.com>.
Thank you for your response,

I went into the code, and I'm fairly confident that I've isolated the
problem. The OutOfMemoryError occurs during the dimension dictionary step, but
it is not actually related to the dictionary itself (since, as you mentioned,
that is skipped when dictionary=false). The problem arises from the second
half of that step, in which it builds the dimension table snapshot. Looking
at the code, the process of building the snapshot loads the entire table into
memory as strings (SnapshotTable.takeSnapshot), then serializes that to an
in-memory ByteArrayOutputStream (ResourceStore.putResource), then finally
creates a copy of the internal byte array from the stream in order to store
it in HBase (HBaseResourceStore.checkAndPutResourceImpl). That means there
needs to be space for three in-memory copies of the full dimension table.
Given that even our test subset dimension table is 25 million rows by 14
columns, that becomes problematic. From experimentation, it fails even with
a 95 GB heap (the "Requested array size exceeds VM limit" error suggests a
single serialized array is hitting the JVM's maximum array size, so no heap
size would be enough).
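
To illustrate the allocation pattern, here is a minimal sketch; this is not
the actual Kylin code, and apart from the three method names cited above,
every name in it is made up:

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class SnapshotPathSketch {

    // Copy #1: every cell of the lookup table held as a String in memory,
    // roughly what SnapshotTable.takeSnapshot does.
    static List<String[]> takeSnapshot(Iterable<String[]> rows) {
        List<String[]> table = new ArrayList<>();
        for (String[] row : rows) {
            table.add(row.clone());
        }
        return table;
    }

    // Copy #2: the whole snapshot serialized into one in-memory buffer,
    // roughly what ResourceStore.putResource does.
    static byte[] serialize(List<String[]> table) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        for (String[] row : table) {
            for (String cell : row) {
                // The buffer doubles as it grows; once a single array beyond
                // the VM limit is requested (here or in toByteArray below),
                // "Requested array size exceeds VM limit" appears no matter
                // how large the heap is.
                out.writeUTF(cell);
            }
        }
        // Copy #3: toByteArray() duplicates the internal buffer, and that
        // byte[] is what gets handed to HBase, roughly
        // HBaseResourceStore.checkAndPutResourceImpl.
        return buf.toByteArray();
    }
}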

For completeness, the log leading up to the crash (minus the pointless zk
messages) is:
 - Start to execute command:
 -cubename foo -segmentname FULL_BUILD -input
/tmp/kylin-7d2b7588-17c0-4d80-9962-14ca63929186/foo/fact_distinct_columns
[QuartzScheduler_Worker-1]:[2015-02-19
22:59:01,284][INFO][com.kylinolap.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:57)]
- Building snapshot of KEYWORDS
[QuartzScheduler_Worker-2]:[2015-02-19
22:59:53,241][DEBUG][com.kylinolap.job.engine.JobFetcher.execute(JobFetcher.java:60)]
- 0 pending jobs
[QuartzScheduler_Worker-3]:[2015-02-19
23:00:53,252][DEBUG][com.kylinolap.job.engine.JobFetcher.execute(JobFetcher.java:60)]
- 0 pending jobs
[QuartzScheduler_Worker-1]:[2015-02-19
23:01:01,278][INFO][com.kylinolap.dict.lookup.FileTableReader.autoDetectDelim(FileTableReader.java:156)]
- Auto detect delim to be ' ', split line to 14 columns --
1020_18768_4_127200_4647593_group_341686994 group 19510703 0 18768 1020
341686994 4647593 371981 4 127200 CONTENT 2015-01-21 22:16:36.227246
[http-bio-7070-exec-8]:[2015-02-19
23:02:07,980][DEBUG][com.kylinolap.rest.service.AdminService.getConfigAsString(AdminService.java:91)]
- Get Kylin Runtime Config
[QuartzScheduler_Worker-4]:[2015-02-19
23:02:53,934][DEBUG][com.kylinolap.job.engine.JobFetcher.execute(JobFetcher.java:60)]
- 0 pending jobs
[QuartzScheduler_Worker-1]:[2015-02-19
23:03:10,216][DEBUG][com.kylinolap.common.persistence.ResourceStore.putResource(ResourceStore.java:166)]
- Saving resource
/table_snapshot/part-00000.csv/f87954d5-fdfa-4903-9f82-771d85df6367.snapshot
(Store kylin_metadata_qa@hbase)
[QuartzScheduler_Worker-6]:[2015-02-19
23:04:53,230][DEBUG][com.kylinolap.job.engine.JobFetcher.execute(JobFetcher.java:60)]
- 0 pending jobs
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
Dumping heap to java_pid3705.hprof ...


The cube JSON is:

{
  "uuid": "ba6105ca-a18d-4839-bed0-c89b86817110",
  "name": "foo",
  "description": "",
  "dimensions": [
    {
      "id": 1,
      "name": "KEYWORDS_DERIVED",
      "join": {
        "type": "left",
        "primary_key": [
          "DIM_ID"
        ],
        "foreign_key": [
          "KEYWORD_DIM_ID"
        ]
      },
      "hierarchy": null,
      "table": "KEYWORDS",
      "column": "{FK}",
      "datatype": null,
      "derived": [
        "PUBLISHER_GROUP_ID",
        "PUBLISHER_CAMPAIGN_ID",
        "PUBLISHER_ID"
      ]
    }
  ],
  "measures": [
    {
      "id": 1,
      "name": "_COUNT_",
      "function": {
        "expression": "COUNT",
        "parameter": {
          "type": "constant",
          "value": "1"
        },
        "returntype": "bigint"
      },
      "dependent_measure_ref": null
    },
    {
      "id": 2,
      "name": "CONVERSIONS",
      "function": {
        "expression": "SUM",
        "parameter": {
          "type": "column",
          "value": "CONVERSIONS"
        },
        "returntype": "bigint"
      },
      "dependent_measure_ref": null
    }
  ],
  "rowkey": {
    "rowkey_columns": [
      {
        "column": "KEYWORD_DIM_ID",
        "length": 0,
        "dictionary": "false",
        "mandatory": false
      }
    ],
    "aggregation_groups": [
      [
        "KEYWORD_DIM_ID"
      ]
    ]
  },
  "signature": "T+aYTH/KlCwwmVAGRQR3hQ==",
  "capacity": "LARGE",
  "last_modified": 1424367558297,
  "fact_table": "FACTS",
  "null_string": null,
  "filter_condition": "KEYWORDS.PUBLISHER_GROUP_ID=386784931",
  "cube_partition_desc": {
    "partition_date_column": null,
    "partition_date_start": 0,
    "cube_partition_type": "APPEND"
  },
  "hbase_mapping": {
    "column_family": [
      {
        "name": "F1",
        "columns": [
          {
            "qualifier": "M",
            "measure_refs": [
              "_COUNT_",
              "CONVERSIONS"
            ]
          }
        ]
      }
    ]
  },
  "notify_list": [
    "sam"
  ]
}


Cheers,
sam

On Thu, Feb 19, 2015 at 9:49 PM, 周千昊 <z....@gmail.com> wrote:

> Also since you set the dictionary to false, there should not be any memory
> consuming while building dictionary.
> So can you also give us the json description of the cube?(in the cube tab,
> click the corresponding cube, click the json button)
>
>
> On Fri Feb 20 2015 at 1:39:15 PM 周千昊 <z....@gmail.com> wrote:
>
> > Hi, Samuel
> >      Can you give us some detail log, so we can dig into the root cause
> >
> > On Fri Feb 20 2015 at 2:44:32 AM Samuel Bock <sb...@marinsoftware.com>
> > wrote:
> >
> >> Hello all,
> >>
> >> We are in the process of evaluating Kylin for use as an OLAP engine. To
> >> that end, we are trying to get a minimum viable setup with a
> >> representative
> >> sample of our data in order to gather performance metrics. We have kylin
> >> running against a 10 node cluster, the provided cubes build successfully
> >> and the system seems functional. Attempting to build a simple cube
> against
> >> our data results in an OutOfMemoryError in the kylin server process (so
> >> far
> >> we have given it up to a 46 gig heap). I was wondering if you could give
> >> me
> >> some guidance as to likely causes, any configurations I'm likely to have
> >> missed before I start diving into the source. I have changed the
> >> "dictionary" setting to false, as recommended for high-cardinality
> >> dimensions, but have not changed configuration significantly apart from
> >> that.
> >>
> >> For reference, the sizes of the hive tables we're building the cubes
> from
> >> dimension table: 25,399,061 rows
> >> fact table: 270,940,921 rows
> >>
> >> (And as a note, there are no pertinent log messages except to indicate
> >> that
> >> it is in the Build Dimension Dictionary step)
> >>
> >> Thank you,
> >> sam bock
> >>
> >
>

Re: OutOfMemoryError on step #3 of Cube build

Posted by 周千昊 <z....@gmail.com>.
Also, since you set the dictionary to false, there should not be any memory
consumed while building the dictionary.
Can you also give us the JSON description of the cube? (In the cube tab,
click the corresponding cube, then click the JSON button.)


On Fri Feb 20 2015 at 1:39:15 PM 周千昊 <z....@gmail.com> wrote:

> Hi, Samuel
>      Can you give us some detail log, so we can dig into the root cause
>
> On Fri Feb 20 2015 at 2:44:32 AM Samuel Bock <sb...@marinsoftware.com>
> wrote:
>
>> Hello all,
>>
>> We are in the process of evaluating Kylin for use as an OLAP engine. To
>> that end, we are trying to get a minimum viable setup with a
>> representative
>> sample of our data in order to gather performance metrics. We have kylin
>> running against a 10 node cluster, the provided cubes build successfully
>> and the system seems functional. Attempting to build a simple cube against
>> our data results in an OutOfMemoryError in the kylin server process (so
>> far
>> we have given it up to a 46 gig heap). I was wondering if you could give
>> me
>> some guidance as to likely causes, any configurations I'm likely to have
>> missed before I start diving into the source. I have changed the
>> "dictionary" setting to false, as recommended for high-cardinality
>> dimensions, but have not changed configuration significantly apart from
>> that.
>>
>> For reference, the sizes of the hive tables we're building the cubes from
>> dimension table: 25,399,061 rows
>> fact table: 270,940,921 rows
>>
>> (And as a note, there are no pertinent log messages except to indicate
>> that
>> it is in the Build Dimension Dictionary step)
>>
>> Thank you,
>> sam bock
>>
>

Re: OutOfMemoryError on step #3 of Cube build

Posted by 周千昊 <z....@gmail.com>.
Hi, Samuel
     Can you give us some detailed logs, so we can dig into the root cause?

On Fri Feb 20 2015 at 2:44:32 AM Samuel Bock <sb...@marinsoftware.com>
wrote:

> Hello all,
>
> We are in the process of evaluating Kylin for use as an OLAP engine. To
> that end, we are trying to get a minimum viable setup with a representative
> sample of our data in order to gather performance metrics. We have kylin
> running against a 10 node cluster, the provided cubes build successfully
> and the system seems functional. Attempting to build a simple cube against
> our data results in an OutOfMemoryError in the kylin server process (so far
> we have given it up to a 46 gig heap). I was wondering if you could give me
> some guidance as to likely causes, any configurations I'm likely to have
> missed before I start diving into the source. I have changed the
> "dictionary" setting to false, as recommended for high-cardinality
> dimensions, but have not changed configuration significantly apart from
> that.
>
> For reference, the sizes of the hive tables we're building the cubes from
> dimension table: 25,399,061 rows
> fact table: 270,940,921 rows
>
> (And as a note, there are no pertinent log messages except to indicate that
> it is in the Build Dimension Dictionary step)
>
> Thank you,
> sam bock
>