Posted to user@kylin.apache.org by Ashika Umanga Umagiliya <um...@gmail.com> on 2016/09/28 01:56:52 UTC

Tomcat crashes while building long-running cube jobs

Greetings,

After successfully building the Kylin sample cube, I tried to build our
own cube with Kylin.
After a long-running job, I noticed that Tomcat crashes. The following
"java" process keeps running even after the Tomcat crash.

After killing this process, I could restart Kylin.
I noticed that the job had failed at the "#4 Step Name: Build Dimension
Dictionary Duration: 0 seconds" step.

However, Tomcat crashes at this same step every time I try to build my cube.
I couldn't find any useful information in the Tomcat log.
Any tips, please?




[kylin@ins-ascale102 tomcat]$ ps -ef | grep java
kylin   9775  9686  0 00:14 pts/0    00:00:00 grep --color=auto java
kylin  30628 29623 67 Sep27 ?        10:59:29
/home/kylin/hdp_c5000/jdk/bin/java -Dproc_-Xms1024M
-XX:OnOutOfMemoryError=kill -9 %p -XX:+UseConcMarkSweepGC
-Djava.security.krb5.conf=/home/kylin/hdp_c5000/krb5.conf
-Dhbase.log.dir=/home/kylin/hdp_c5000/hbase/logs -Dhbase.log.file=hbase.log
-Dhbase.home.dir=/home/kylin/hdp_c5000/hbase -Dhbase.id.str=
-Dhbase.root.logger=INFO,console
-Djava.library.path=/home/kylin/hdp_c5000/hadoop-2.7.1.2.4.2.0-258/lib/native
-Dhbase.security.logger=INFO,NullAppender
-Dhbase.log.dir=/home/kylin/hdp_c5000/hbase/logs -Dhbase.log.file=hbase.log
-Dhbase.home.dir=/home/kylin/hdp_c5000/hbase -Dhbase.id.str=
-Dhbase.root.logger=INFO,console
-Djava.library.path=/home/kylin/hdp_c5000/hadoop-2.7.1.2.4.2.0-258/lib/native
-Dhbase.security.logger=INFO,NullAppender -Xms1024M -Xmx4096M -Xss256K
-XX:MaxPermSize=128M -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
-Xloggc:/home/kylin/git/ashika-kylin/kylin/dist/apache-kylin-1.5.4-SNAPSHOT-bin/logs/kylin.gc.26034
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64M
-Dlog4j.configuration=kylin-server-log4j.properties
-Dcatalina.home=/home/kylin/git/ashika-kylin/kylin/dist/apache-kylin-1.5.4-SNAPSHOT-bin/bin/../tomcat
org.apache.kylin.tool.JobDiagnosisInfoCLI -jobId
56a5d296-7d61-4aaf-8073-cad8df28606a -destDir
/home/kylin/git/ashika-kylin/kylin/dist/apache-kylin-1.5.4-SNAPSHOT-bin/bin/../tomcat/temp/1474962693869-0
kylin  31291 29869 67 Sep27 ?        11:00:05
/home/kylin/hdp_c5000/jdk/bin/java -Dproc_-Xms1024M
-XX:OnOutOfMemoryError=kill -9 %p -XX:+UseConcMarkSweepGC
-Djava.security.krb5.conf=/home/kylin/hdp_c5000/krb5.conf
-Dhbase.log.dir=/home/kylin/hdp_c5000/hbase/logs -Dhbase.log.file=hbase.log
-Dhbase.home.dir=/home/kylin/hdp_c5000/hbase -Dhbase.id.str=
-Dhbase.root.logger=INFO,console
-Djava.library.path=/home/kylin/hdp_c5000/hadoop-2.7.1.2.4.2.0-258/lib/native
-Dhbase.security.logger=INFO,NullAppender
-Dhbase.log.dir=/home/kylin/hdp_c5000/hbase/logs -Dhbase.log.file=hbase.log
-Dhbase.home.dir=/home/kylin/hdp_c5000/hbase -Dhbase.id.str=
-Dhbase.root.logger=INFO,console
-Djava.library.path=/home/kylin/hdp_c5000/hadoop-2.7.1.2.4.2.0-258/lib/native
-Dhbase.security.logger=INFO,NullAppender -Xms1024M -Xmx4096M -Xss256K
-XX:MaxPermSize=128M -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
-Xloggc:/home/kylin/git/ashika-kylin/kylin/dist/apache-kylin-1.5.4-SNAPSHOT-bin/logs/kylin.gc.26034
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64M
-Dlog4j.configuration=kylin-server-log4j.properties
-Dcatalina.home=/home/kylin/git/ashika-kylin/kylin/dist/apache-kylin-1.5.4-SNAPSHOT-bin/bin/../tomcat
org.apache.kylin.tool.JobDiagnosisInfoCLI -jobId
56a5d296-7d61-4aaf-8073-cad8df28606a -destDir
/home/kylin/git/ashika-kylin/kylin/dist/apache-kylin-1.5.4-SNAPSHOT-bin/bin/../tomcat/temp/1474962697366-0
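For reference, a leftover JVM like the one above can be located and killed before restarting Kylin. This is only a sketch, not an official procedure; the class-name pattern is taken from the ps output above.

```shell
# Kill any leftover Kylin JobDiagnosisInfoCLI JVMs (class name taken from
# the ps output above; the [J] bracket trick stops grep from matching its
# own process in the ps listing).
ps -ef | grep '[J]obDiagnosisInfoCLI' | awk '{print $2}' | while read -r pid; do
  echo "killing $pid"
  kill -9 "$pid"
done
```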

Re: Tomcat crashes while building long-running cube jobs

Posted by Li Yang <li...@apache.org>.
Right, ultra-high cardinality is not suitable for a dictionary. Please
consider other encodings.
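For example, a rowkey column can opt out of dictionary encoding in the cube descriptor. This is a sketch of the descriptor format: the column name comes from the stack trace in this thread, and the length 20 is an illustrative value, not a recommendation.

```json
{
  "rowkey": {
    "rowkey_columns": [
      { "column": "EASY_ID", "encoding": "fixed_length:20" }
    ]
  }
}
```

With fixed_length the value is stored directly (padded or truncated to the given byte length), so no in-memory dictionary has to be built for that column.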

On Thu, Sep 29, 2016 at 9:29 AM, Ashika Umanga Umagiliya <
umanga.pdn@gmail.com> wrote:

> I think I found an explanation here:
>
> https://github.com/KylinOLAP/Kylin/issues/364
>
> On Thu, Sep 29, 2016 at 9:55 AM, Ashika Umanga Umagiliya <
> umanga.pdn@gmail.com> wrote:
>
>> Finally, the 4th step failed without throwing an OOM exception.
>> The logged error was:
>>
>>
>> -------
>>
>> java.lang.RuntimeException: Failed to create dictionary on RAT_LOG_FILTERED.RAT_LOG_APRL_MAY_2015.EASY_ID
>> 	at org.apache.kylin.dict.DictionaryManager.buildDictionary(DictionaryManager.java:325)
>> 	at org.apache.kylin.cube.CubeManager.buildDictionary(CubeManager.java:185)
>> 	at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:50)
>> 	at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:41)
>> 	at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
>> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>> 	at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
>> 	at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
>> 	at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:57)
>> 	at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
>> 	at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:136)
>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> 	at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.lang.IllegalArgumentException: Too high cardinality is not suitable for dictionary -- cardinality: 96111330
>> 	at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:96)
>> 	at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:73)
>> 	at org.apache.kylin.dict.DictionaryManager.buildDictionary(DictionaryManager.java:321)
>> 	... 14 more
>>
>> result code:2
>>
>>
>> On Thu, Sep 29, 2016 at 9:15 AM, Ashika Umanga Umagiliya <
>> umanga.pdn@gmail.com> wrote:
>>
>>> Thanks for the tips,
>>>
>>> I increased the memory up to 28 GB (32 GB total on the Kylin node),
>>> but the java process (the only java process on the server) kept growing
>>> in memory consumption and finally crashed with an OutOfMemoryError.
>>>
>>> This happens in the 4th step, "#4 Step Name: Build Dimension Dictionary
>>> Duration: 0 seconds", which runs for about 25 minutes before the crash.
>>> Why does this step need that much memory on the Kylin side?
>>> Also, I couldn't find any logs to investigate the issue.
>>> Apart from the GC dump, where else can I find useful information?
>>>
>>>
>>> On Wed, Sep 28, 2016 at 4:55 PM, Li Yang <li...@apache.org> wrote:
>>>
>>>> Increase memory in $KYLIN_HOME/bin/setenv.sh
>>>>
>>>> # (if you're deploying KYLIN on a powerful server and want to replace
>>>> # the default conservative settings)
>>>> # uncomment the following for it to take effect
>>>> export KYLIN_JVM_SETTINGS=...
>>>> # export KYLIN_JVM_SETTINGS=...
>>>>
>>>> The commented-out line is there for reference.
>>>>
>>>> Cheers
>>>> Yang
>>>>
>>>>
>>>> On Wed, Sep 28, 2016 at 3:06 PM, Ashika Umanga Umagiliya <
>>>> umanga.pdn@gmail.com> wrote:
>>>>
>>>>> Looks like Tomcat crashed after running out of memory.
>>>>> I saw this in "kylin.out":
>>>>>
>>>>> #
>>>>> # java.lang.OutOfMemoryError: Java heap space
>>>>> # -XX:OnOutOfMemoryError="kill -9 %p"
>>>>> #   Executing /bin/sh -c "kill -9 12727"...
>>>>>
>>>>>
>>>>>
>>>>> Before the crash, the "kylin.log" file shows the following lines.
>>>>> It seems to keep trying to reconnect to ZooKeeper.
>>>>> What is the reason for Kylin to communicate with ZooKeeper?
>>>>>
>>>>> I also see the line "System free memory less than 100 MB."
>>>>>
>>>>> ---- kylin.log ----
>>>>>
>>>>> 2016-09-28 06:50:02,495 ERROR [Curator-Framework-0]
>>>>> curator.ConnectionState:200 : Connection timed out for connection string
>>>>> (hdp-jz5001.hadoop.local:2181,hdp-jz5002.hadoop.local:2181,hdp-jz5003.hadoop.local:2181)
>>>>> and timeout (15000) / elapsed (28428)
>>>>> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode =
>>>>> ConnectionLoss
>>>>> at org.apache.curator.ConnectionState.checkTimeouts(ConnectionS
>>>>> tate.java:197)
>>>>> at org.apache.curator.ConnectionState.getZooKeeper(ConnectionSt
>>>>> ate.java:87)
>>>>> at org.apache.curator.CuratorZookeeperClient.getZooKeeper(Curat
>>>>> orZookeeperClient.java:115)
>>>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.perfo
>>>>> rmBackgroundOperation(CuratorFrameworkImpl.java:806)
>>>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.backg
>>>>> roundOperationsLoop(CuratorFrameworkImpl.java:792)
>>>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.acces
>>>>> s$300(CuratorFrameworkImpl.java:62)
>>>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.cal
>>>>> l(CuratorFrameworkImpl.java:257)
>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
>>>>> Executor.java:1142)
>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
>>>>> lExecutor.java:617)
>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>> 2016-09-28 06:50:02,495 INFO  [Thread-10-SendThread(hdp-jz5001.hadoop.local:2181)]
>>>>> zookeeper.ClientCnxn:1279 : Session establishment complete on server
>>>>> hdp-jz5001.hadoop.local/100.78.7.155:2181, sessionid =
>>>>> 0x156d401adb1701a, negotiated timeout = 40000
>>>>> 2016-09-28 06:50:02,495 INFO  [localhost-startStop-1-SendTh
>>>>> read(hdp-jz5003.hadoop.local:2181)] zookeeper.ClientCnxn:1019 :
>>>>> Opening socket connection to server hdp-jz5003.hadoop.local/100.78
>>>>> .8.153:2181. Will not attempt to authenticate using SASL (unknown
>>>>> error)
>>>>> 2016-09-28 06:50:02,495 ERROR [Curator-Framework-0]
>>>>> curator.ConnectionState:200 : Connection timed out for connection string
>>>>> (hdp-jz5001.hadoop.local:2181,hdp-jz5002.hadoop.local:2181,hdp-jz5003.hadoop.local:2181)
>>>>> and timeout (15000) / elapsed (28429)
>>>>> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode =
>>>>> ConnectionLoss
>>>>> at org.apache.curator.ConnectionState.checkTimeouts(ConnectionS
>>>>> tate.java:197)
>>>>> at org.apache.curator.ConnectionState.getZooKeeper(ConnectionSt
>>>>> ate.java:87)
>>>>> at org.apache.curator.CuratorZookeeperClient.getZooKeeper(Curat
>>>>> orZookeeperClient.java:115)
>>>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.perfo
>>>>> rmBackgroundOperation(CuratorFrameworkImpl.java:806)
>>>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.doSyn
>>>>> cForSuspendedConnection(CuratorFrameworkImpl.java:681)
>>>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.acces
>>>>> s$700(CuratorFrameworkImpl.java:62)
>>>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl$7.ret
>>>>> riesExhausted(CuratorFrameworkImpl.java:677)
>>>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.check
>>>>> BackgroundRetry(CuratorFrameworkImpl.java:696)
>>>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.perfo
>>>>> rmBackgroundOperation(CuratorFrameworkImpl.java:826)
>>>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.backg
>>>>> roundOperationsLoop(CuratorFrameworkImpl.java:792)
>>>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.acces
>>>>> s$300(CuratorFrameworkImpl.java:62)
>>>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.cal
>>>>> l(CuratorFrameworkImpl.java:257)
>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
>>>>> Executor.java:1142)
>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
>>>>> lExecutor.java:617)
>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>> 2016-09-28 06:50:02,495 INFO  [localhost-startStop-1-SendTh
>>>>> read(hdp-jz5003.hadoop.local:2181)] zookeeper.ClientCnxn:864 : Socket
>>>>> connection established to hdp-jz5003.hadoop.local/100.78.8.153:2181,
>>>>> initiating session
>>>>> 2016-09-28 06:50:15,060 INFO  [localhost-startStop-1-SendTh
>>>>> read(hdp-jz5003.hadoop.local:2181)] zookeeper.ClientCnxn:1140 :
>>>>> Client session timed out, have not heard from server in 12565ms for
>>>>> sessionid 0x356d401ac017143, closing socket connection and attempting
>>>>> reconnect
>>>>> 2016-09-28 06:50:02,495 INFO  [Thread-10-EventThread]
>>>>> state.ConnectionStateManager:228 : State change: RECONNECTED
>>>>> 2016-09-28 06:50:31,040 INFO  [Thread-10-SendThread(hdp-jz5001.hadoop.local:2181)]
>>>>> zookeeper.ClientCnxn:1140 : Client session timed out, have not heard from
>>>>> server in 28544ms for sessionid 0x156d401adb1701a, closing socket
>>>>> connection and attempting reconnect
>>>>> 2016-09-28 06:50:31,042 DEBUG [http-bio-7070-exec-7]
>>>>> service.AdminService:89 : Get Kylin Runtime Config
>>>>> 2016-09-28 06:50:31,043 DEBUG [http-bio-7070-exec-1]
>>>>> controller.UserController:64 : authentication.getPrincipal() is
>>>>> org.springframework.security.core.userdetails.User@3b40b2f: Username:
>>>>> ADMIN; Password: [PROTECTED]; Enabled: true; AccountNonExpired: true;
>>>>> credentialsNonExpired: true; AccountNonLocked: true; Granted Authorities:
>>>>> ROLE_ADMIN,ROLE_ANALYST,ROLE_MODELER
>>>>> 2016-09-28 06:50:43,799 INFO  [localhost-startStop-1-SendTh
>>>>> read(hdp-jz5002.hadoop.local:2181)] zookeeper.ClientCnxn:1019 :
>>>>> Opening socket connection to server hdp-jz5002.hadoop.local/100.78
>>>>> .8.20:2181. Will not attempt to authenticate using SASL (unknown
>>>>> error)
>>>>> 2016-09-28 06:50:43,799 INFO  [Thread-10-EventThread]
>>>>> state.ConnectionStateManager:228 : State change: SUSPENDED
>>>>> 2016-09-28 06:50:59,925 INFO  [BadQueryDetector]
>>>>> service.BadQueryDetector:151 : System free memory less than 100 MB. 0
>>>>> queries running.
>>>>> 2016-09-28 06:50:59,926 INFO  [localhost-startStop-1-SendTh
>>>>> read(hdp-jz5002.hadoop.local:2181)] zookeeper.ClientCnxn:864 : Socket
>>>>> connection established to hdp-jz5002.hadoop.local/100.78.8.20:2181,
>>>>> initiating session
>>>>> 2016-09-28 06:51:28,723 INFO  [localhost-startStop-1-SendTh
>>>>> read(hdp-jz5002.hadoop.local:2181)] zookeeper.ClientCnxn:1140 :
>>>>> Client session timed out, have not heard from server in 28798ms for
>>>>> sessionid 0x356d401ac017143, closing socket connection and attempting
>>>>> reconnect
>>>>> 2016-09-28 06:51:41,129 INFO  [pool-8-thread-10-SendThread(hdp-jz5001.hadoop.local:2181)]
>>>>> zookeeper.ClientCnxn:1142 : Unable to read additional data from server
>>>>> sessionid 0x356d401ac01714a, likely server has closed socket, closing
>>>>> socket connection and attempting reconnect
>>>>> 2016-09-28 06:51:53,474 INFO  [Thread-10-SendThread(hdp-jz5003.hadoop.local:2181)]
>>>>> zookeeper.ClientCnxn:1019 : Opening socket connection to server
>>>>> hdp-jz5003.hadoop.local/100.78.8.153:2181. Will not attempt to
>>>>> authenticate using SASL (unknown error)
>>>>> 2016-09-28 06:51:12,316 INFO  [pool-8-thread-10-SendThread(hdp-jz5003.hadoop.local:2181)]
>>>>> zookeeper.ClientCnxn:1140 : Client session timed out, have not heard from
>>>>> server in 28517ms for sessionid 0x256d401adbf6f77, closing socket
>>>>> connection and attempting reconnect
>>>>> 2016-09-28 06:54:29,304 INFO  [localhost-startStop-1-SendTh
>>>>> read(hdp-jz5001.hadoop.local:2181)] zookeeper.ClientCnxn:1019 :
>>>>> Opening socket connection to server hdp-jz5001.hadoop.local/100.78
>>>>> .7.155:2181. Will not attempt to authenticate using SASL (unknown
>>>>> error)
>>>>> 2016-09-28 06:52:05,570 INFO  [BadQueryDetector]
>>>>> service.BadQueryDetector:151 : System free memory less than 100 MB. 0
>>>>> queries running.
>>>>> 2016-09-28 06:56:29,665 ERROR [Curator-Framework-0]
>>>>> imps.CuratorFrameworkImpl:537 : Background operation retry gave up
>>>>> org.apache.zookeeper.KeeperException$ConnectionLossException:
>>>>> KeeperErrorCode = ConnectionLoss
>>>>> at org.apache.zookeeper.KeeperException.create(KeeperException.
>>>>> java:99)
>>>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.check
>>>>> BackgroundRetry(CuratorFrameworkImpl.java:708)
>>>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.perfo
>>>>> rmBackgroundOperation(CuratorFrameworkImpl.java:826)
>>>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.backg
>>>>> roundOperationsLoop(CuratorFrameworkImpl.java:792)
>>>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.acces
>>>>> s$300(CuratorFrameworkImpl.java:62)
>>>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.cal
>>>>> l(CuratorFrameworkImpl.java:257)
>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
>>>>> Executor.java:1142)
>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
>>>>> lExecutor.java:617)
>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>> 2016-09-28 06:57:31,275 INFO  [BadQueryDetector]
>>>>> service.BadQueryDetector:151 : System free memory less than 100 MB. 0
>>>>> queries running.
>>>>> 2016-09-28 06:56:29,665 INFO  [pool-8-thread-10-SendThread(hdp-jz5001.hadoop.local:2181)]
>>>>> zookeeper.ClientCnxn:1019 : Opening socket connection to server
>>>>> hdp-jz5001.hadoop.local/100.78.7.155:2181. Will not attempt to
>>>>> authenticate using SASL (unknown error)
>>>>>
>>>>>
>>>>>
>>>>> #
>>>>> # java.lang.OutOfMemoryError: Java heap space
>>>>> # -XX:OnOutOfMemoryError="kill -9 %p"
>>>>> #   Executing /bin/sh -c "kill -9 12727"...
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Umanga
>>> http://jp.linkedin.com/in/umanga
>>> http://umanga.ifreepages.com
>>>
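Li Yang's setenv.sh suggestion above can be sketched concretely. The heap sizes below are illustrative values for a 32 GB node, not recommendations from the thread.

```shell
# Illustrative $KYLIN_HOME/bin/setenv.sh override (values are examples only).
# Replace the stock conservative KYLIN_JVM_SETTINGS line with a larger heap
# on a machine with enough RAM, then restart Kylin for it to take effect.
export KYLIN_JVM_SETTINGS="-Xms4g -Xmx16g -XX:MaxPermSize=512M \
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64M"
echo "KYLIN_JVM_SETTINGS=$KYLIN_JVM_SETTINGS"
```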

Re: Tomcat crashes while building long-running cube jobs

Posted by Ashika Umanga Umagiliya <um...@gmail.com>.
I think I found an explanation here:

https://github.com/KylinOLAP/Kylin/issues/364
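A rough back-of-envelope shows why a dictionary over the 96,111,330 distinct values reported in the stack trace cannot fit comfortably in a modest job-server heap. The 16-byte average value size and 48-byte per-entry overhead are assumptions for illustration, not measurements of Kylin's actual trie dictionary.

```python
# Back-of-envelope estimate of in-heap dictionary size for the cardinality
# reported in the stack trace. The per-entry costs are assumptions, not
# measurements of Kylin's dictionary implementation.
cardinality = 96_111_330          # from the IllegalArgumentException
avg_value_bytes = 16              # assumed average encoded value size
per_entry_overhead = 48           # assumed object/pointer/trie overhead

total_bytes = cardinality * (avg_value_bytes + per_entry_overhead)
print(f"~{total_bytes / 1024**3:.1f} GiB just for dictionary entries")
# prints: ~5.7 GiB just for dictionary entries
```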


Re: tomcat crashes while building long running Cubes jobs

Posted by Ashika Umanga Umagiliya <um...@gmail.com>.
Finally, the 4th step failed without throwing an OOM exception.
The logged error was:


-------

java.lang.RuntimeException: Failed to create dictionary on
RAT_LOG_FILTERED.RAT_LOG_APRL_MAY_2015.EASY_ID
	at org.apache.kylin.dict.DictionaryManager.buildDictionary(DictionaryManager.java:325)
	at org.apache.kylin.cube.CubeManager.buildDictionary(CubeManager.java:185)
	at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:50)
	at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:41)
	at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
	at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
	at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
	at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:57)
	at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
	at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:136)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: Too high cardinality is
not suitable for dictionary -- cardinality: 96111330
	at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:96)
	at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:73)
	at org.apache.kylin.dict.DictionaryManager.buildDictionary(DictionaryManager.java:321)
	... 14 more

result code:2
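The IllegalArgumentException above is the root cause: Kylin materializes the dimension dictionary in memory, and a column with ~96 million distinct values is rejected as too large for dictionary encoding. As a rough illustration only (the cap value below is an assumption, not Kylin's actual internal constant), the guard behaves like this sketch:

```python
# Illustrative sketch of a dictionary-cardinality guard, mirroring the
# error above. DICT_MAX_CARDINALITY is an assumed value, not Kylin's.
DICT_MAX_CARDINALITY = 5_000_000

def build_dictionary(values, cap=DICT_MAX_CARDINALITY):
    """Map each distinct value to a small integer id, or fail loudly."""
    distinct = sorted(set(values))
    if len(distinct) > cap:
        raise ValueError(
            "Too high cardinality is not suitable for dictionary -- "
            "cardinality: %d" % len(distinct))
    return {v: i for i, v in enumerate(distinct)}

print(build_dictionary(["a", "b", "a", "c"]))  # {'a': 0, 'b': 1, 'c': 2}
```

For a column like EASY_ID with this cardinality, the usual fix is to stop dictionary-encoding it (for example, use a fixed-length or integer encoding, or drop it from the dimensions) rather than to add more heap.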


On Thu, Sep 29, 2016 at 9:15 AM, Ashika Umanga Umagiliya <
umanga.pdn@gmail.com> wrote:

> Thanks for the tips,
>
> I increased memory up to 28 GB (of the 32 GB total on the Kylin node),
> but the java process (the only java process on the server) kept growing
> in memory until it finally crashed with an OutOfMemoryError.
>
> This happens in the 4th step, "#4 Step Name: Build Dimension Dictionary
> Duration: 0 seconds", which ran for about 25 minutes before the crash.
> Why does this step need so much memory on the Kylin side?
> Also, I couldn't see any logs to investigate the issue.
> Apart from the GC dump, where else can I find useful information?
>
>
> On Wed, Sep 28, 2016 at 4:55 PM, Li Yang <li...@apache.org> wrote:
>
>> Increase memory in $KYLIN_HOME/bin/setenv.sh
>>
>> # (if you're deploying Kylin on a powerful server and want to replace
>> # the default conservative settings)
>> # uncomment the following for it to take effect
>> export KYLIN_JVM_SETTINGS=...
>> # export KYLIN_JVM_SETTINGS=...
>>
>> The commented line is a reference.
>>
>> Cheers
>> Yang
>>
>>
> --
> Umanga
> http://jp.linkedin.com/in/umanga
> http://umanga.ifreepages.com
>



-- 
Umanga
http://jp.linkedin.com/in/umanga
http://umanga.ifreepages.com

Re: tomcat crashes while building long running Cubes jobs

Posted by Ashika Umanga Umagiliya <um...@gmail.com>.
Thanks for the tips,

I increased memory up to 28 GB (of the 32 GB total in the Kylin node),
but the java process (the only java process on the server) kept growing
in memory until it finally crashed with an OutOfMemoryError.

This happens in the 4th step, "#4 Step Name: Build Dimension Dictionary
Duration: 0 seconds", which ran for about 25 minutes before the crash.
Why does this step need so much memory on the Kylin side?
Also, I couldn't see any logs to investigate the issue.
Apart from the GC dump, where else can I find useful information?
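To see why this step alone can exhaust the heap: the dictionary for a high-cardinality column is built inside the Kylin server's JVM, so its size scales with the number of distinct values. A back-of-envelope estimate (all per-entry numbers below are rough assumptions, not measured figures):

```python
# Rough heap estimate for holding one in-memory dictionary; the
# 64-byte per-entry overhead and the value length are assumptions.
def estimate_dict_heap_bytes(cardinality, avg_value_len, overhead=64):
    # ~2 bytes per char for a Java String plus per-entry bookkeeping
    return cardinality * (2 * avg_value_len + overhead)

# e.g. ~96M distinct EASY_ID values of ~16 characters each:
gib = estimate_dict_heap_bytes(96_111_330, 16) / 2**30
print(round(gib, 1))  # roughly 8.6 GiB before any trie compression
```

Even with compression, a column of this cardinality makes the dictionary step a heap hog, which matches the growth-then-OOM pattern described above.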


On Wed, Sep 28, 2016 at 4:55 PM, Li Yang <li...@apache.org> wrote:

> Increase memory in $KYLIN_HOME/bin/setenv.sh
>
> # (if you're deploying Kylin on a powerful server and want to replace
> # the default conservative settings)
> # uncomment the following for it to take effect
> export KYLIN_JVM_SETTINGS=...
> # export KYLIN_JVM_SETTINGS=...
>
> The commented line is a reference.
>
> Cheers
> Yang
>
>


-- 
Umanga
http://jp.linkedin.com/in/umanga
http://umanga.ifreepages.com

Re: tomcat crashes while building long running Cubes jobs

Posted by Li Yang <li...@apache.org>.
Increase memory in $KYLIN_HOME/bin/setenv.sh

# (if you're deploying Kylin on a powerful server and want to replace
# the default conservative settings)
# uncomment the following for it to take effect
export KYLIN_JVM_SETTINGS=...
# export KYLIN_JVM_SETTINGS=...

The commented line is a reference.
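For example, replacing the conservative default with something sized to a larger node might look like this (the flag values are illustrative assumptions, not tested recommendations):

```shell
# Illustrative override in $KYLIN_HOME/bin/setenv.sh; the sizes are
# assumptions -- tune -Xmx to the memory actually free on the node.
export KYLIN_JVM_SETTINGS="-Xms4g -Xmx16g -Xss1024K -XX:MaxPermSize=512m"
echo "KYLIN_JVM_SETTINGS=$KYLIN_JVM_SETTINGS"
```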

Cheers
Yang


On Wed, Sep 28, 2016 at 3:06 PM, Ashika Umanga Umagiliya <
umanga.pdn@gmail.com> wrote:

> Looks like tomcat crashed after running out of memory.
> I saw this in "kylin.out" :
>
> #
> # java.lang.OutOfMemoryError: Java heap space
> # -XX:OnOutOfMemoryError="kill -9 %p"
> #   Executing /bin/sh -c "kill -9 12727"...
>
>
>
> Before the crash, the "kylin.log" file shows the following lines.
> It seems to keep trying to reconnect to ZooKeeper.
> What is the reason for Kylin to communicate with ZK?
>
> I see the line "System free memory less than 100 MB."
>
> ---- kylin.log ----
>
> 2016-09-28 06:50:02,495 ERROR [Curator-Framework-0]
> curator.ConnectionState:200 : Connection timed out for connection string
> (hdp-jz5001.hadoop.local:2181,hdp-jz5002.hadoop.local:2181,hdp-jz5003.hadoop.local:2181)
> and timeout (15000) / elapsed (28428)
> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode =
> ConnectionLoss
> at org.apache.curator.ConnectionState.checkTimeouts(
> ConnectionState.java:197)
> at org.apache.curator.ConnectionState.getZooKeeper(
> ConnectionState.java:87)
> at org.apache.curator.CuratorZookeeperClient.getZooKeeper(
> CuratorZookeeperClient.java:115)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl.
> performBackgroundOperation(CuratorFrameworkImpl.java:806)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl.
> backgroundOperationsLoop(CuratorFrameworkImpl.java:792)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(
> CuratorFrameworkImpl.java:62)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.
> call(CuratorFrameworkImpl.java:257)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2016-09-28 06:50:02,495 INFO  [Thread-10-SendThread(hdp-jz5001.hadoop.local:2181)]
> zookeeper.ClientCnxn:1279 : Session establishment complete on server
> hdp-jz5001.hadoop.local/100.78.7.155:2181, sessionid = 0x156d401adb1701a,
> negotiated timeout = 40000
> 2016-09-28 06:50:02,495 INFO  [localhost-startStop-1-
> SendThread(hdp-jz5003.hadoop.local:2181)] zookeeper.ClientCnxn:1019 :
> Opening socket connection to server hdp-jz5003.hadoop.local/100.
> 78.8.153:2181. Will not attempt to authenticate using SASL (unknown error)
> 2016-09-28 06:50:02,495 ERROR [Curator-Framework-0]
> curator.ConnectionState:200 : Connection timed out for connection string
> (hdp-jz5001.hadoop.local:2181,hdp-jz5002.hadoop.local:2181,hdp-jz5003.hadoop.local:2181)
> and timeout (15000) / elapsed (28429)
> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode =
> ConnectionLoss
> at org.apache.curator.ConnectionState.checkTimeouts(
> ConnectionState.java:197)
> at org.apache.curator.ConnectionState.getZooKeeper(
> ConnectionState.java:87)
> at org.apache.curator.CuratorZookeeperClient.getZooKeeper(
> CuratorZookeeperClient.java:115)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl.
> performBackgroundOperation(CuratorFrameworkImpl.java:806)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl.
> doSyncForSuspendedConnection(CuratorFrameworkImpl.java:681)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$700(
> CuratorFrameworkImpl.java:62)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl$7.
> retriesExhausted(CuratorFrameworkImpl.java:677)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl.
> checkBackgroundRetry(CuratorFrameworkImpl.java:696)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl.
> performBackgroundOperation(CuratorFrameworkImpl.java:826)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl.
> backgroundOperationsLoop(CuratorFrameworkImpl.java:792)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(
> CuratorFrameworkImpl.java:62)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.
> call(CuratorFrameworkImpl.java:257)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2016-09-28 06:50:02,495 INFO  [localhost-startStop-1-
> SendThread(hdp-jz5003.hadoop.local:2181)] zookeeper.ClientCnxn:864 :
> Socket connection established to hdp-jz5003.hadoop.local/100.78.8.153:2181,
> initiating session
> 2016-09-28 06:50:15,060 INFO  [localhost-startStop-1-
> SendThread(hdp-jz5003.hadoop.local:2181)] zookeeper.ClientCnxn:1140 :
> Client session timed out, have not heard from server in 12565ms for
> sessionid 0x356d401ac017143, closing socket connection and attempting
> reconnect
> 2016-09-28 06:50:02,495 INFO  [Thread-10-EventThread]
> state.ConnectionStateManager:228 : State change: RECONNECTED
> 2016-09-28 06:50:31,040 INFO  [Thread-10-SendThread(hdp-jz5001.hadoop.local:2181)]
> zookeeper.ClientCnxn:1140 : Client session timed out, have not heard from
> server in 28544ms for sessionid 0x156d401adb1701a, closing socket
> connection and attempting reconnect

Re: tomcat crashes while building long running Cubes jobs

Posted by Ashika Umanga Umagiliya <um...@gmail.com>.
Looks like tomcat crashed after running out of memory.
I saw this in "kylin.out":

#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
#   Executing /bin/sh -c "kill -9 12727"...
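For what it's worth, that behavior matches the -XX:OnOutOfMemoryError="kill -9 %p" flag visible in the ps output: HotSpot runs the given command the moment the first OutOfMemoryError is raised, so the process is killed before it can write a stack trace to its own logs. A minimal sketch of that failure mode (hypothetical class, not Kylin code; run it with a small heap and the flag set to watch the handler fire):

```java
// Hypothetical demo, not Kylin code. Run with e.g.:
//   java -Xmx64m "-XX:OnOutOfMemoryError=echo handler fired for pid %p" OomDemo
// With "kill -9 %p" as the handler (Kylin's default here), the JVM dies
// the instant the error is raised - the catch block below never runs.
import java.util.ArrayList;
import java.util.List;

public class OomDemo {
    // Allocates 8 MB chunks until the heap is exhausted; returns how many
    // chunks fit before the OutOfMemoryError was thrown.
    static int allocateUntilOom() {
        List<byte[]> hog = new ArrayList<>();
        try {
            while (true) {
                hog.add(new byte[8 * 1024 * 1024]);
            }
        } catch (OutOfMemoryError e) {
            int n = hog.size();
            hog.clear(); // release everything so the JVM can continue
            return n;
        }
    }

    public static void main(String[] args) {
        System.out.println("allocated " + allocateUntilOom() + " chunks before OOM");
    }
}
```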



Before the crash, the "kylin.log" file shows the following lines.
It seems Kylin keeps trying to reconnect to ZooKeeper.
What is the reason for Kylin to communicate with ZK?

I also see the line "System free memory less than 100 MB." logged repeatedly.
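That line comes from Kylin's BadQueryDetector and fits the OOM above: the JVM heap was nearly exhausted well before the crash. The check is essentially "how much more can this JVM still allocate" - here is a hedged sketch of that kind of estimate (class and method names are mine, not Kylin's exact code):

```java
// Sketch of a low-free-heap check like the one behind the warning above.
// Class/method names are hypothetical, not Kylin's actual BadQueryDetector.
public class FreeHeapCheck {
    // Estimated heap still available to this JVM, in MB:
    // (max heap it may grow to) - (already committed) + (free inside committed).
    static long freeHeapMB() {
        Runtime rt = Runtime.getRuntime();
        long availBytes = rt.maxMemory() - rt.totalMemory() + rt.freeMemory();
        return availBytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        long mb = freeHeapMB();
        // Kylin logs its warning when this drops below roughly 100 MB.
        System.out.println("free heap: " + mb + " MB" + (mb < 100 ? " (LOW)" : ""));
    }
}
```

When this value stays near zero, long GC pauses follow, which would also explain the ZooKeeper session timeouts in the log below - the client thread is stalled too long to send heartbeats.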

---- kylin.log ----

2016-09-28 06:50:02,495 ERROR [Curator-Framework-0]
curator.ConnectionState:200 : Connection timed out for connection string
(hdp-jz5001.hadoop.local:2181,hdp-jz5002.hadoop.local:2181,hdp-jz5003.hadoop.local:2181)
and timeout (15000) / elapsed (28428)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode =
ConnectionLoss
at
org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197)
at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:87)
at
org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:806)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:792)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:62)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:257)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2016-09-28 06:50:02,495 INFO
 [Thread-10-SendThread(hdp-jz5001.hadoop.local:2181)]
zookeeper.ClientCnxn:1279 : Session establishment complete on server
hdp-jz5001.hadoop.local/100.78.7.155:2181, sessionid = 0x156d401adb1701a,
negotiated timeout = 40000
2016-09-28 06:50:02,495 INFO
 [localhost-startStop-1-SendThread(hdp-jz5003.hadoop.local:2181)]
zookeeper.ClientCnxn:1019 : Opening socket connection to server
hdp-jz5003.hadoop.local/100.78.8.153:2181. Will not attempt to authenticate
using SASL (unknown error)
2016-09-28 06:50:02,495 ERROR [Curator-Framework-0]
curator.ConnectionState:200 : Connection timed out for connection string
(hdp-jz5001.hadoop.local:2181,hdp-jz5002.hadoop.local:2181,hdp-jz5003.hadoop.local:2181)
and timeout (15000) / elapsed (28429)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode =
ConnectionLoss
at
org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197)
at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:87)
at
org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:806)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.doSyncForSuspendedConnection(CuratorFrameworkImpl.java:681)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.access$700(CuratorFrameworkImpl.java:62)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl$7.retriesExhausted(CuratorFrameworkImpl.java:677)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:696)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:826)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:792)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:62)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:257)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2016-09-28 06:50:02,495 INFO
 [localhost-startStop-1-SendThread(hdp-jz5003.hadoop.local:2181)]
zookeeper.ClientCnxn:864 : Socket connection established to
hdp-jz5003.hadoop.local/100.78.8.153:2181, initiating session
2016-09-28 06:50:15,060 INFO
 [localhost-startStop-1-SendThread(hdp-jz5003.hadoop.local:2181)]
zookeeper.ClientCnxn:1140 : Client session timed out, have not heard from
server in 12565ms for sessionid 0x356d401ac017143, closing socket
connection and attempting reconnect
2016-09-28 06:50:02,495 INFO  [Thread-10-EventThread]
state.ConnectionStateManager:228 : State change: RECONNECTED
2016-09-28 06:50:31,040 INFO
 [Thread-10-SendThread(hdp-jz5001.hadoop.local:2181)]
zookeeper.ClientCnxn:1140 : Client session timed out, have not heard from
server in 28544ms for sessionid 0x156d401adb1701a, closing socket
connection and attempting reconnect
2016-09-28 06:50:31,042 DEBUG [http-bio-7070-exec-7]
service.AdminService:89 : Get Kylin Runtime Config
2016-09-28 06:50:31,043 DEBUG [http-bio-7070-exec-1]
controller.UserController:64 : authentication.getPrincipal() is
org.springframework.security.core.userdetails.User@3b40b2f: Username:
ADMIN; Password: [PROTECTED]; Enabled: true; AccountNonExpired: true;
credentialsNonExpired: true; AccountNonLocked: true; Granted Authorities:
ROLE_ADMIN,ROLE_ANALYST,ROLE_MODELER
2016-09-28 06:50:43,799 INFO
 [localhost-startStop-1-SendThread(hdp-jz5002.hadoop.local:2181)]
zookeeper.ClientCnxn:1019 : Opening socket connection to server
hdp-jz5002.hadoop.local/100.78.8.20:2181. Will not attempt to authenticate
using SASL (unknown error)
2016-09-28 06:50:43,799 INFO  [Thread-10-EventThread]
state.ConnectionStateManager:228 : State change: SUSPENDED
2016-09-28 06:50:59,925 INFO  [BadQueryDetector]
service.BadQueryDetector:151 : System free memory less than 100 MB. 0
queries running.
2016-09-28 06:50:59,926 INFO
 [localhost-startStop-1-SendThread(hdp-jz5002.hadoop.local:2181)]
zookeeper.ClientCnxn:864 : Socket connection established to
hdp-jz5002.hadoop.local/100.78.8.20:2181, initiating session
2016-09-28 06:51:28,723 INFO
 [localhost-startStop-1-SendThread(hdp-jz5002.hadoop.local:2181)]
zookeeper.ClientCnxn:1140 : Client session timed out, have not heard from
server in 28798ms for sessionid 0x356d401ac017143, closing socket
connection and attempting reconnect
2016-09-28 06:51:41,129 INFO
 [pool-8-thread-10-SendThread(hdp-jz5001.hadoop.local:2181)]
zookeeper.ClientCnxn:1142 : Unable to read additional data from server
sessionid 0x356d401ac01714a, likely server has closed socket, closing
socket connection and attempting reconnect
2016-09-28 06:51:53,474 INFO
 [Thread-10-SendThread(hdp-jz5003.hadoop.local:2181)]
zookeeper.ClientCnxn:1019 : Opening socket connection to server
hdp-jz5003.hadoop.local/100.78.8.153:2181. Will not attempt to authenticate
using SASL (unknown error)
2016-09-28 06:51:12,316 INFO
 [pool-8-thread-10-SendThread(hdp-jz5003.hadoop.local:2181)]
zookeeper.ClientCnxn:1140 : Client session timed out, have not heard from
server in 28517ms for sessionid 0x256d401adbf6f77, closing socket
connection and attempting reconnect
2016-09-28 06:54:29,304 INFO
 [localhost-startStop-1-SendThread(hdp-jz5001.hadoop.local:2181)]
zookeeper.ClientCnxn:1019 : Opening socket connection to server
hdp-jz5001.hadoop.local/100.78.7.155:2181. Will not attempt to authenticate
using SASL (unknown error)
2016-09-28 06:52:05,570 INFO  [BadQueryDetector]
service.BadQueryDetector:151 : System free memory less than 100 MB. 0
queries running.
2016-09-28 06:56:29,665 ERROR [Curator-Framework-0]
imps.CuratorFrameworkImpl:537 : Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:708)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:826)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:792)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:62)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:257)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2016-09-28 06:57:31,275 INFO  [BadQueryDetector]
service.BadQueryDetector:151 : System free memory less than 100 MB. 0
queries running.
2016-09-28 06:56:29,665 INFO
 [pool-8-thread-10-SendThread(hdp-jz5001.hadoop.local:2181)]
zookeeper.ClientCnxn:1019 : Opening socket connection to server
hdp-jz5001.hadoop.local/100.78.7.155:2181. Will not attempt to authenticate
using SASL (unknown error)



#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
#   Executing /bin/sh -c "kill -9 12727"...