Posted to dev@kylin.apache.org by Srinivasan Hariharan02 <Sr...@infosys.com> on 2015/06/11 11:45:56 UTC

Hive external Table Dimension

Hi,

I have an external dimension table in Hive that was created using the HBase storage handler. After I created a cube using this Hive table, the cube build job failed at the "Build Dimension Dictionary" step with the error below:
java.lang.IllegalStateException: Expect 1 and only 1 non-zero file under hdfs://host:8020/user/hive/warehouse/hbase.db/department/, but find 0
        at org.apache.kylin.dict.lookup.HiveTable.findOnlyFile(HiveTable.java:123)
        at org.apache.kylin.dict.lookup.HiveTable.computeHDFSLocation(HiveTable.java:107)
        at org.apache.kylin.dict.lookup.HiveTable.getHDFSLocation(HiveTable.java:83)
        at org.apache.kylin.dict.lookup.HiveTable.getFileTable(HiveTable.java:76)
        at org.apache.kylin.dict.lookup.HiveTable.getSignature(HiveTable.java:71)
        at org.apache.kylin.dict.DictionaryManager.buildDictionary(DictionaryManager.java:164)
        at org.apache.kylin.cube.CubeManager.buildDictionary(CubeManager.java:154)
        at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:53)
        at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
        at org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
        at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
        at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

For external tables backed by other sources such as HBase, Hive does not store any data in its warehouse directory, so Kylin should not check for files under the warehouse directory for external tables. Please help.
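
To illustrate the check that is failing, here is a minimal sketch only, not Kylin's actual code. The names OnlyFileCheck and findOnlyFile are hypothetical, and a local directory stands in for the HDFS warehouse path. A table backed by HBase leaves that directory empty, so a check of this shape cannot succeed for it:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical stand-in for the check behind the error above; not Kylin's code.
class OnlyFileCheck {

    // Returns the single non-empty regular file directly under dir,
    // or throws if there are zero or several such files.
    static Path findOnlyFile(Path dir) throws IOException {
        List<Path> nonEmpty = new ArrayList<>();
        try (Stream<Path> entries = Files.list(dir)) {
            for (Path p : (Iterable<Path>) entries::iterator) {
                if (Files.isRegularFile(p) && Files.size(p) > 0) {
                    nonEmpty.add(p);
                }
            }
        }
        if (nonEmpty.size() != 1) {
            throw new IllegalStateException("Expect 1 and only 1 non-zero file under "
                    + dir + ", but find " + nonEmpty.size());
        }
        return nonEmpty.get(0);
    }

    public static void main(String[] args) throws IOException {
        // An empty directory plays the role of the warehouse dir of an HBase-backed table.
        Path emptyDir = Files.createTempDirectory("department");
        try {
            findOnlyFile(emptyDir);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main on the empty directory prints a message of the same shape as the exception above, ending in "but find 0".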

Regards,
Srinivasan Hariharan
Mob +91-9940395830



RE: Hive external Table Dimension

Posted by Srinivasan Hariharan <sr...@outlook.com>.
Hi Shi,

Any comments on this patch?
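
To make the suggestion quoted further down concrete (build the table signature from metadata instead of opening a data file), here is a minimal sketch, not the actual patch. The class and method names are hypothetical, and the path, size and last modified time are passed in by hand; in a real fix they would come from Hive table metadata, which this sketch does not attempt to call:

```java
import java.util.Objects;

// Hypothetical illustration; these names are not Kylin's or Hive's API.
class TableSignatureSketch {

    // Metadata-only fingerprint of a table: where it lives, how big it is,
    // and when it last changed. No data file is opened to build it.
    static final class Signature {
        final String path;
        final long sizeBytes;
        final long lastModifiedMillis;

        Signature(String path, long sizeBytes, long lastModifiedMillis) {
            this.path = path;
            this.sizeBytes = sizeBytes;
            this.lastModifiedMillis = lastModifiedMillis;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Signature)) return false;
            Signature other = (Signature) o;
            return sizeBytes == other.sizeBytes
                    && lastModifiedMillis == other.lastModifiedMillis
                    && Objects.equals(path, other.path);
        }

        @Override
        public int hashCode() {
            return Objects.hash(path, sizeBytes, lastModifiedMillis);
        }
    }

    // A snapshot can be reused only if the table's fingerprint is unchanged.
    static boolean needsNewSnapshot(Signature previous, Signature current) {
        return previous == null || !previous.equals(current);
    }

    public static void main(String[] args) {
        String path = "hdfs://host:8020/user/hive/warehouse/hbase.db/department";
        Signature atLastBuild = new Signature(path, 0L, 1434000000000L);
        Signature unchanged = new Signature(path, 0L, 1434000000000L);
        Signature modified = new Signature(path, 0L, 1434600000000L);
        System.out.println(needsNewSnapshot(null, atLastBuild));      // true
        System.out.println(needsNewSnapshot(atLastBuild, unchanged)); // false
        System.out.println(needsNewSnapshot(atLastBuild, modified));  // true
    }
}
```

The point of the design is that comparing such a fingerprint never touches the warehouse directory, so it would behave the same for managed and external tables.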

> From: srinivasan.hariharan@outlook.com
> To: dev@kylin.incubator.apache.org
> Subject: RE: Hive external Table Dimension
> Date: Thu, 18 Jun 2015 23:51:50 +0530
> 
> Hi Shi,
> 
> The unit tests failed again for the query module. I have uploaded the patch to the JIRA; please check.
> 
> > Date: Thu, 18 Jun 2015 14:00:56 +0800
> > Subject: Re: Hive external Table Dimension
> > From: shaofengshi@gmail.com
> > To: dev@kylin.incubator.apache.org
> > 
> > Oh, to run a full mvn test in v0.7, an HDP sandbox is needed, and the test
> > cubes need to be built before running the query tests. I have a Jenkins job
> > that runs in the sandbox and automates these steps. Please upload your patch
> > and I will use Jenkins to test it.
> > 
> > BTW: from v0.8 the regression tests and unit tests are separated, so it will
> > be easy for users to run a quick unit test.
> > 
> > Thanks;
> > 
> > 2015-06-18 4:43 GMT+08:00 Srinivasan Hariharan <
> > srinivasan.hariharan@outlook.com>:
> > 
> > > Hi,
> > > I made the changes, but the Kylin query module unit tests fail on the
> > > 0.7-staging branch. The query module unit tests also fail without my changes.
> > >
> > >
> > > CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204
> > > type mismatch:
> > > type1:
> > > DECIMAL(19, 4)
> > > type2:
> > > DECIMAL(39, 16) NOT NULL
> > >
> > > CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204
> > > type mismatch:
> > > type1:
> > > DECIMAL(19, 4)
> > > type2:
> > > DECIMAL(39, 16) NOT NULL
> > >
> > > CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204
> > > type mismatch:
> > > type1:
> > > DECIMAL(19, 4)
> > > type2:
> > > DECIMAL(39, 16) NOT NULL
> > >
> > > CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204
> > > type mismatch:
> > > type1:
> > > DECIMAL(19, 4)
> > > type2:
> > > DECIMAL(39, 16) NOT NULL
> > >
> > > CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204
> > > type mismatch:
> > > type1:
> > > DECIMAL(19, 4)
> > > type2:
> > > DECIMAL(39, 16) NOT NULL
> > >
> > > CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204
> > > type mismatch:
> > > type1:
> > > DECIMAL(19, 4)
> > > type2:
> > > DECIMAL(39, 16) NOT NULL
> > >
> > > IIQueryTest.testDetailedQuery:59->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204
> > > type mismatch:
> > > type1:
> > > DECIMAL(19, 4)
> > > type2:
> > > DECIMAL(39, 16) NOT NULL
> > >
> > > IIQueryTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204
> > > type mismatch:
> > > type1:
> > > DECIMAL(19, 4)
> > > type2:
> > > DECIMAL(39, 16) NOT NULL
> > >
> > >
> > > Regards,
> > > Srinivasan Hariharan
> > >
> > >
> > >
> > > > Date: Wed, 17 Jun 2015 11:32:05 +0800
> > > > Subject: Re: Hive external Table Dimension
> > > > From: shaofengshi@gmail.com
> > > > To: dev@kylin.incubator.apache.org
> > > >
> > > > Srinivasan,
> > > >
> > > > You can check out the 0.7-staging branch as a start. Look into
> > > > org.apache.kylin.dict.lookup.HiveTable: the implementations of
> > > > "getSignature()" and "getColumnDelimeter()" are not perfect. They call
> > > > "getFileTable()", which checks the underlying HDFS file; as we know,
> > > > this is not suitable for external tables.
> > > >
> > > > To fix the problem, these two methods need to be rewritten. In the new
> > > > "getSignature()", use the Hive API to get the table's path, size and last
> > > > modified time; you may need to do some searching here. For the new
> > > > "getColumnDelimeter()", just returning DELIM_AUTO is okay.
> > > >
> > > > Once the code is finished and passes all unit tests, please create a patch
> > > > and attach it to the JIRA for review ("pull requests" are no longer accepted).
> > > >
> > > > Thanks for the contribution;
> > > >
> > > >
> > > >
> > > > 2015-06-17 1:10 GMT+08:00 Srinivasan Hariharan <
> > > > srinivasan.hariharan@outlook.com>:
> > > >
> > > > > Hi ,
> > > > >
> > > > > > I am interested in contributing to this JIRA. Could anyone help me
> > > > > > with where to start?
> > > > >
> > > > > https://issues.apache.org/jira/browse/KYLIN-824
> > > > >
> > > > > Regards,
> > > > > Srinivasan Hariharan
> > > > >
> > > > >
> > > > >
> > > > > From: srinivasan.hariharan@outlook.com
> > > > > To: dev@kylin.incubator.apache.org
> > > > > Subject: RE: Hive external Table Dimension
> > > > > Date: Thu, 11 Jun 2015 21:51:08 +0530
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > > I have created JIRA.
> > > > >
> > > > > https://issues.apache.org/jira/browse/KYLIN-824
> > > > >
> > > > > > I am interested in contributing; I will look at the code and post an update if I need help.
> > > > >
> > > > >
> > > > > > From: shaoshi@ebay.com
> > > > > > To: dev@kylin.incubator.apache.org
> > > > > > Subject: Re: Hive external Table Dimension
> > > > > > Date: Thu, 11 Jun 2015 14:33:59 +0000
> > > > > >
> > > > > > > Kylin needs to take snapshots of lookup tables for runtime queries (to
> > > > > > > derive the dimensions that are not on the row key); that's why it tries
> > > > > > > to seek the underlying data file.
> > > > > > >
> > > > > > > So far, without this it couldn't move ahead. In the long run, Kylin can
> > > > > > > consider abstracting this. Please open a JIRA as a requirement if you
> > > > > > > like.
> > > > > >
> > > > > > On 6/11/15, 5:45 PM, "Srinivasan Hariharan02" <
> > > > > Srinivasan_H02@infosys.com>
> > > > > > wrote:
> > > > > >
> > > > > > >Hi,
> > > > > > >
> > > > > > >I have a dimension external  table in Hive which is created using
> > > Hbase
> > > > > > >Storage handler. After creating the cube using this hive  table cube
> > > > > > >build job failed  in the "Build Dimension Dictionary" with below
> > > error
> > > > > > >java.lang.IllegalStateException: Expect 1 and only 1 non-zero file
> > > under
> > > > > > >hdfs://host:8020/user/hive/warehouse/hbase.db/department/, but find
> > > 0
> > > > > > >        at
> > > > > >
> > > >org.apache.kylin.dict.lookup.HiveTable.findOnlyFile(HiveTable.java:123)
> > > > > > >        at
> > > > > >
> > > > >
> > > >org.apache.kylin.dict.lookup.HiveTable.computeHDFSLocation(HiveTable.java:
> > > > > > >107)
> > > > > > >        at
> > > > > >
> > > > >
> > > >org.apache.kylin.dict.lookup.HiveTable.getHDFSLocation(HiveTable.java:83)
> > > > > > >        at
> > > > > >
> > > >org.apache.kylin.dict.lookup.HiveTable.getFileTable(HiveTable.java:76)
> > > > > > >        at
> > > > > >
> > > >org.apache.kylin.dict.lookup.HiveTable.getSignature(HiveTable.java:71)
> > > > > > >        at
> > > > > >
> > > > >
> > > >org.apache.kylin.dict.DictionaryManager.buildDictionary(DictionaryManager.
> > > > > > >java:164)
> > > > > > >        at
> > > > > >
> > > >org.apache.kylin.cube.CubeManager.buildDictionary(CubeManager.java:154)
> > > > > > >        at
> > > > > >
> > > > >
> > > >org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(Dictionary
> > > > > > >GeneratorCLI.java:53)
> > > > > > >        at
> > > > > >
> > > > >
> > > >org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(Dictionary
> > > > > > >GeneratorCLI.java:42)
> > > > > > >        at
> > > > > >
> > > > >
> > > >org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJ
> > > > > > >ob.java:53)
> > > > > > >        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> > > > > > >        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> > > > > > >        at
> > > > > >
> > > > >
> > > >org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecut
> > > > > > >able.java:63)
> > > > > > >        at
> > > > > >
> > > > >
> > > >org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutab
> > > > > > >le.java:107)
> > > > > > >        at
> > > > > >
> > > > >
> > > >org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChai
> > > > > > >nedExecutable.java:50)
> > > > > > >        at
> > > > > >
> > > > >
> > > >org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutab
> > > > > > >le.java:107)
> > > > > > >        at
> > > > > >
> > > > >
> > > >org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(Defaul
> > > > > > >tScheduler.java:132)
> > > > > > >        at
> > > > > >
> > > > >
> > > >java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:
> > > > > > >1145)
> > > > > > >        at
> > > > > >
> > > > >
> > > >java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java
> > > > > > >:615)
> > > > > > >        at java.lang.Thread.run(Thread.java:744)
> > > > > > >
> > > > > > >Since external table created from other sources like Hbase hive
> > > doesn't
> > > > > > >store any data in their warehouse directory. So it should not check
> > > for
> > > > > > >files under  warehouse dir for external tables. Please help.
> > > > > > >
> > > > > > >Regards,
> > > > > > >Srinivasan Hariharan
> > > > > > >Mob +91-9940395830
> > > > > > >
> > > > > > >
> > > > > > >**************** CAUTION - Disclaimer *****************
> > > > > > >This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION
> > > intended
> > > > > > >solely
> > > > > > >for the use of the addressee(s). If you are not the intended
> > > recipient,
> > > > > > >please
> > > > > > >notify the sender by e-mail and delete the original message.
> > > Further,
> > > > > you
> > > > > > >are not
> > > > > > >to copy, disclose, or distribute this e-mail or its contents to any
> > > > > other
> > > > > > >person and
> > > > > > >any such actions are unlawful. This e-mail may contain viruses.
> > > Infosys
> > > > > > >has taken
> > > > > > >every reasonable precaution to minimize this risk, but is not
> > > liable for
> > > > > > >any damage
> > > > > > >you may sustain as a result of any virus in this e-mail. You should
> > > > > carry
> > > > > > >out your
> > > > > > >own virus checks before opening the e-mail or attachment. Infosys
> > > > > > >reserves the
> > > > > > >right to monitor and review the content of all messages sent to or
> > > from
> > > > > > >this e-mail
> > > > > > >address. Messages sent to or from this e-mail address may be stored
> > > on
> > > > > > >the
> > > > > > >Infosys e-mail system.
> > > > > > >***INFOSYS******** End of Disclaimer ********INFOSYS***
> > > > > >
> > > > >
> > > > >
> > > > >
> > >
> > >
>  		 	   		  
 		 	   		  

RE: Hive external Table Dimension

Posted by Srinivasan Hariharan <sr...@outlook.com>.
Hi Shi,

The unit tests failed again for the query module. I have uploaded the patch to the JIRA; please check.


Re: Hive external Table Dimension

Posted by ShaoFeng Shi <sh...@gmail.com>.
Oh, to run a full mvn test in v0.7, an HDP sandbox is needed, and the test
cubes need to be built before running the query tests. I have a Jenkins job
that runs in the sandbox and automates these steps. Please upload your patch
and I will use Jenkins to test it.

BTW: from v0.8 the regression tests and unit tests are separated, so it will
be easy for users to run a quick unit test.

Thanks;

2015-06-18 4:43 GMT+08:00 Srinivasan Hariharan <
srinivasan.hariharan@outlook.com>:

> Hi,
> I made the changes but Kylin-Query module unit tests fails in 0.7 staging
> branch code. Without my changes also unit tests fails for the query module.
>
>
> CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204
> type mismatch:
> type1:
> DECIMAL(19, 4)
> type2:
> DECIMAL(39, 16) NOT NULL
>
> CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204
> type mismatch:
> type1:
> DECIMAL(19, 4)
> type2:
> DECIMAL(39, 16) NOT NULL
>
> CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204
> type mismatch:
> type1:
> DECIMAL(19, 4)
> type2:
> DECIMAL(39, 16) NOT NULL
>
> CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204
> type mismatch:
> type1:
> DECIMAL(19, 4)
> type2:
> DECIMAL(39, 16) NOT NULL
>
> CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204
> type mismatch:
> type1:
> DECIMAL(19, 4)
> type2:
> DECIMAL(39, 16) NOT NULL
>
> CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204
> type mismatch:
> type1:
> DECIMAL(19, 4)
> type2:
> DECIMAL(39, 16) NOT NULL
>
> IIQueryTest.testDetailedQuery:59->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204
> type mismatch:
> type1:
> DECIMAL(19, 4)
> type2:
> DECIMAL(39, 16) NOT NULL
>
> IIQueryTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204
> type mismatch:
> type1:
> DECIMAL(19, 4)
> type2:
> DECIMAL(39, 16) NOT NULL
>
>
> Regards,
> Srinivasan Hariharan
>
>
>
> > Date: Wed, 17 Jun 2015 11:32:05 +0800
> > Subject: Re: Hive external Table Dimension
> > From: shaofengshi@gmail.com
> > To: dev@kylin.incubator.apache.org
> >
> > Srinivasan,
> >
> > You can checkout 0.7-staging branch as start; Look into
> > org.apache.kylin.dict.lookup.HiveTable, the implementation of
> > "getSignature()" and "getColumnDelimeter()" is not perfect: it calls
> > "getFileTable()", which will check the underlying HDFS file, as we know
> > this is not suitable for external table;
> >
> > To fix the problem, need re-write two methods; In the new
> "getSignature()",
> > using Hive API to get the table's path, size and last modified time, you
> > may need do some search here; For the new "getColumnDelimeter()", just
> > return DELIM_AUTO is okay;
> >
> > After finish the code and pass all unit test, please create a patch and
> > attache it in the JIRA for review ("pull request" is not accepted
> anymore);
> >
> > Thanks for the contribution;
> >
> >
> >
> > 2015-06-17 1:10 GMT+08:00 Srinivasan Hariharan <
> > srinivasan.hariharan@outlook.com>:
> >
> > > Hi ,
> > >
> > > I am interested to contribute to this JIRA, could anyone help me out
> where
> > > can I start.
> > >
> > > https://issues.apache.org/jira/browse/KYLIN-824
> > >
> > > Regards,
> > > Srinivasan Hariharan
> > >
> > >
> > >
> > > From: srinivasan.hariharan@outlook.com
> > > To: dev@kylin.incubator.apache.org
> > > Subject: RE: Hive external Table Dimension
> > > Date: Thu, 11 Jun 2015 21:51:08 +0530
> > >
> > >
> > >
> > >
> > >
> > > Thanks,
> > >
> > > I have created JIRA.
> > >
> > > https://issues.apache.org/jira/browse/KYLIN-824
> > >
> > > I am interested to contribute, i will see the code and update for help.
> > >
> > >
> > > > From: shaoshi@ebay.com
> > > > To: dev@kylin.incubator.apache.org
> > > > Subject: Re: Hive external Table Dimension
> > > > Date: Thu, 11 Jun 2015 14:33:59 +0000
> > > >
> > > > Kylin need take snapshot for lookup tables for runtime queries (to
> derive
> > > > the dimensions that not on row key), that¹s why it try to seek the
> > > > underlying data file;
> > > >
> > > > So far without this it couldn¹t move ahead; For long run, Kylin can
> > > > consider to abstract this; Please open a JIRA as requirement if you
> like;
> > > >
> > > > On 6/11/15, 5:45 PM, "Srinivasan Hariharan02" <
> > > Srinivasan_H02@infosys.com>
> > > > wrote:
> > > >
> > > > >Hi,
> > > > >
> > > > >I have a dimension external  table in Hive which is created using
> Hbase
> > > > >Storage handler. After creating the cube using this hive  table cube
> > > > >build job failed  in the "Build Dimension Dictionary" with below
> error
> > > > >java.lang.IllegalStateException: Expect 1 and only 1 non-zero file
> under
> > > > >hdfs://host:8020/user/hive/warehouse/hbase.db/department/, but find
> 0
> > > > >        at
> > > >
> >org.apache.kylin.dict.lookup.HiveTable.findOnlyFile(HiveTable.java:123)
> > > > >        at
> > > >
> > >
> >org.apache.kylin.dict.lookup.HiveTable.computeHDFSLocation(HiveTable.java:
> > > > >107)
> > > > >        at
> > > >
> > >
> >org.apache.kylin.dict.lookup.HiveTable.getHDFSLocation(HiveTable.java:83)
> > > > >        at
> > > >
> >org.apache.kylin.dict.lookup.HiveTable.getFileTable(HiveTable.java:76)
> > > > >        at
> > > >
> >org.apache.kylin.dict.lookup.HiveTable.getSignature(HiveTable.java:71)
> > > > >        at
> > > >
> > >
> >org.apache.kylin.dict.DictionaryManager.buildDictionary(DictionaryManager.
> > > > >java:164)
> > > > >        at
> > > >
> >org.apache.kylin.cube.CubeManager.buildDictionary(CubeManager.java:154)
> > > > >        at
> > > >
> > >
> >org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(Dictionary
> > > > >GeneratorCLI.java:53)
> > > > >        at
> > > >
> > >
> >org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(Dictionary
> > > > >GeneratorCLI.java:42)
> > > > >        at
> > > >
> > >
> >org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJ
> > > > >ob.java:53)
> > > > >        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> > > > >        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> > > > >        at
> > > >
> > >
> >org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecut
> > > > >able.java:63)
> > > > >        at
> > > >
> > >
> >org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutab
> > > > >le.java:107)
> > > > >        at
> > > >
> > >
> >org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChai
> > > > >nedExecutable.java:50)
> > > > >        at
> > > >
> > >
> >org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutab
> > > > >le.java:107)
> > > > >        at
> > > >
> > >
> >org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(Defaul
> > > > >tScheduler.java:132)
> > > > >        at
> > > >
> > >
> >java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:
> > > > >1145)
> > > > >        at
> > > >
> > >
> >java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java
> > > > >:615)
> > > > >        at java.lang.Thread.run(Thread.java:744)
> > > > >
> > > > >Since external table created from other sources like Hbase hive doesn't
> > > > >store any data in their warehouse directory. So it should not check for
> > > > >files under  warehouse dir for external tables. Please help.
> > > > >
> > > > >Regards,
> > > > >Srinivasan Hariharan
> > > > >Mob +91-9940395830
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
>
>

RE: Hive external Table Dimension

Posted by Srinivasan Hariharan <sr...@outlook.com>.
Hi Shi,

I am running the unit tests; the full suite takes more than 2 hours to complete. My machine has 8 GB of RAM and a 4-core processor, running CentOS.

> From: shaoshi@ebay.com
> To: dev@kylin.incubator.apache.org
> Subject: Re: Hive external Table Dimension
> Date: Thu, 18 Jun 2015 09:05:50 +0000
> 
> The bug was fixed and all tests passed in my Jenkins; Please pull the
> latest code from 0.7-staging and try again; Thanks;
> 
> On 6/18/15, 3:01 PM, "Shi, Shaofeng" <sh...@ebay.com> wrote:
> 
> >The unit test was broken with a commit in yesterday; we’re fixing it; Will
> >update you when it got fixed;
> >
> >On 6/18/15, 4:43 AM, "Srinivasan Hariharan"
> ><sr...@outlook.com> wrote:
> >
> >>Hi,
> >>I made the changes but Kylin-Query module unit tests fails in 0.7 staging
> >>branch code. Without my changes also unit tests fails for the query
> >>module.
> >>
> >>  CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204 type mismatch:
> >>type1:
> >>DECIMAL(19, 4)
> >>type2:
> >>DECIMAL(39, 16) NOT NULL
> >>  CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204 type mismatch:
> >>type1:
> >>DECIMAL(19, 4)
> >>type2:
> >>DECIMAL(39, 16) NOT NULL
> >>  CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204 type mismatch:
> >>type1:
> >>DECIMAL(19, 4)
> >>type2:
> >>DECIMAL(39, 16) NOT NULL
> >>  CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204 type mismatch:
> >>type1:
> >>DECIMAL(19, 4)
> >>type2:
> >>DECIMAL(39, 16) NOT NULL
> >>  CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204 type mismatch:
> >>type1:
> >>DECIMAL(19, 4)
> >>type2:
> >>DECIMAL(39, 16) NOT NULL
> >>  CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204 type mismatch:
> >>type1:
> >>DECIMAL(19, 4)
> >>type2:
> >>DECIMAL(39, 16) NOT NULL
> >>  IIQueryTest.testDetailedQuery:59->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204 type mismatch:
> >>type1:
> >>DECIMAL(19, 4)
> >>type2:
> >>DECIMAL(39, 16) NOT NULL
> >>  IIQueryTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204 type mismatch:
> >>type1:
> >>DECIMAL(19, 4)
> >>type2:
> >>DECIMAL(39, 16) NOT NULL
> >>
> >>
> >>Regards,
> >>Srinivasan Hariharan
> >>
> >>
> >>
> >>> Date: Wed, 17 Jun 2015 11:32:05 +0800
> >>> Subject: Re: Hive external Table Dimension
> >>> From: shaofengshi@gmail.com
> >>> To: dev@kylin.incubator.apache.org
> >>> 
> >>> Srinivasan,
> >>> 
> >>> You can checkout 0.7-staging branch as start; Look into
> >>> org.apache.kylin.dict.lookup.HiveTable, the implementation of
> >>> "getSignature()" and "getColumnDelimeter()" is not perfect: it calls
> >>> "getFileTable()", which will check the underlying HDFS file, as we know
> >>> this is not suitable for external table;
> >>> 
> >>> To fix the problem, need re-write two methods; In the new "getSignature()",
> >>> using Hive API to get the table's path, size and last modified time, you
> >>> may need do some search here; For the new "getColumnDelimeter()", just
> >>> return DELIM_AUTO is okay;
> >>> 
> >>> After finish the code and pass all unit test, please create a patch and
> >>> attache it in the JIRA for review ("pull request" is not accepted anymore);
> >>> 
> >>> Thanks for the contribution;
> >>> 
> >>> 
> >>> 
> >>> 2015-06-17 1:10 GMT+08:00 Srinivasan Hariharan <
> >>> srinivasan.hariharan@outlook.com>:
> >>> 
> >>> > Hi ,
> >>> >
> >>> > I am interested to contribute to this JIRA, could anyone help me out where
> >>> > can I start.
> >>> >
> >>> > https://issues.apache.org/jira/browse/KYLIN-824
> >>> >
> >>> > Regards,
> >>> > Srinivasan Hariharan
> >>> >
> >>> >
> >>> >
> >>> > From: srinivasan.hariharan@outlook.com
> >>> > To: dev@kylin.incubator.apache.org
> >>> > Subject: RE: Hive external Table Dimension
> >>> > Date: Thu, 11 Jun 2015 21:51:08 +0530
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>> > Thanks,
> >>> >
> >>> > I have created JIRA.
> >>> >
> >>> > https://issues.apache.org/jira/browse/KYLIN-824
> >>> >
> >>> > I am interested to contribute, i will see the code and update for help.
> >>> >
> >>> >
> >>> > > From: shaoshi@ebay.com
> >>> > > To: dev@kylin.incubator.apache.org
> >>> > > Subject: Re: Hive external Table Dimension
> >>> > > Date: Thu, 11 Jun 2015 14:33:59 +0000
> >>> > >
> >>> > > Kylin need take snapshot for lookup tables for runtime queries (to derive
> >>> > > the dimensions that not on row key), that's why it try to seek the
> >>> > > underlying data file;
> >>> > >
> >>> > > So far without this it couldn't move ahead; For long run, Kylin can
> >>> > > consider to abstract this; Please open a JIRA as requirement if you like;
> >>> > >
> >>> > > On 6/11/15, 5:45 PM, "Srinivasan Hariharan02" <
> >>> > Srinivasan_H02@infosys.com>
> >>> > > wrote:
> >>> > >
> >>> > > >Hi,
> >>> > > >
> >>> > > >I have a dimension external  table in Hive which is created using Hbase
> >>> > > >Storage handler. After creating the cube using this hive  table cube
> >>> > > >build job failed  in the "Build Dimension Dictionary" with below error
> >>> > > >java.lang.IllegalStateException: Expect 1 and only 1 non-zero file under
> >>> > > >hdfs://host:8020/user/hive/warehouse/hbase.db/department/, but find 0
> >>> > > >        at org.apache.kylin.dict.lookup.HiveTable.findOnlyFile(HiveTable.java:123)
> >>> > > >        at org.apache.kylin.dict.lookup.HiveTable.computeHDFSLocation(HiveTable.java:107)
> >>> > > >        at org.apache.kylin.dict.lookup.HiveTable.getHDFSLocation(HiveTable.java:83)
> >>> > > >        at org.apache.kylin.dict.lookup.HiveTable.getFileTable(HiveTable.java:76)
> >>> > > >        at org.apache.kylin.dict.lookup.HiveTable.getSignature(HiveTable.java:71)
> >>> > > >        at org.apache.kylin.dict.DictionaryManager.buildDictionary(DictionaryManager.java:164)
> >>> > > >        at org.apache.kylin.cube.CubeManager.buildDictionary(CubeManager.java:154)
> >>> > > >        at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:53)
> >>> > > >        at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
> >>> > > >        at org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53)
> >>> > > >        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >>> > > >        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> >>> > > >        at org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
> >>> > > >        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> >>> > > >        at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
> >>> > > >        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> >>> > > >        at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
> >>> > > >        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >>> > > >        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >>> > > >        at java.lang.Thread.run(Thread.java:744)
> >>> > > >
> >>> > > >Since external table created from other sources like Hbase hive doesn't
> >>> > > >store any data in their warehouse directory. So it should not check for
> >>> > > >files under  warehouse dir for external tables. Please help.
> >>> > > >
> >>> > > >Regards,
> >>> > > >Srinivasan Hariharan
> >>> > > >Mob +91-9940395830
> >>> > > >
> >>> > > >
> >>> > >
> >>> >
> >>> >
> >>> >
> >> 		 	   		  
> >
> 
 		 	   		  

Re: Hive external Table Dimension

Posted by "Shi, Shaofeng" <sh...@ebay.com>.
The bug has been fixed and all tests pass on my Jenkins. Please pull the
latest code from 0.7-staging and try again. Thanks.

On 6/18/15, 3:01 PM, "Shi, Shaofeng" <sh...@ebay.com> wrote:

>The unit test was broken with a commit in yesterday; we’re fixing it; Will
>update you when it got fixed;
>
>On 6/18/15, 4:43 AM, "Srinivasan Hariharan"
><sr...@outlook.com> wrote:
>
>>Hi,
>>I made the changes but Kylin-Query module unit tests fails in 0.7 staging
>>branch code. Without my changes also unit tests fails for the query
>>module.
>>
>>  CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204 type mismatch:
>>type1:
>>DECIMAL(19, 4)
>>type2:
>>DECIMAL(39, 16) NOT NULL
>>  CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204 type mismatch:
>>type1:
>>DECIMAL(19, 4)
>>type2:
>>DECIMAL(39, 16) NOT NULL
>>  CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204 type mismatch:
>>type1:
>>DECIMAL(19, 4)
>>type2:
>>DECIMAL(39, 16) NOT NULL
>>  CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204 type mismatch:
>>type1:
>>DECIMAL(19, 4)
>>type2:
>>DECIMAL(39, 16) NOT NULL
>>  CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204 type mismatch:
>>type1:
>>DECIMAL(19, 4)
>>type2:
>>DECIMAL(39, 16) NOT NULL
>>  CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204 type mismatch:
>>type1:
>>DECIMAL(19, 4)
>>type2:
>>DECIMAL(39, 16) NOT NULL
>>  IIQueryTest.testDetailedQuery:59->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204 type mismatch:
>>type1:
>>DECIMAL(19, 4)
>>type2:
>>DECIMAL(39, 16) NOT NULL
>>  IIQueryTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204 type mismatch:
>>type1:
>>DECIMAL(19, 4)
>>type2:
>>DECIMAL(39, 16) NOT NULL
>>
>>
>>Regards,
>>Srinivasan Hariharan
>>
>>
>>
>>> Date: Wed, 17 Jun 2015 11:32:05 +0800
>>> Subject: Re: Hive external Table Dimension
>>> From: shaofengshi@gmail.com
>>> To: dev@kylin.incubator.apache.org
>>> 
>>> Srinivasan,
>>> 
>>> You can checkout 0.7-staging branch as start; Look into
>>> org.apache.kylin.dict.lookup.HiveTable, the implementation of
>>> "getSignature()" and "getColumnDelimeter()" is not perfect: it calls
>>> "getFileTable()", which will check the underlying HDFS file, as we know
>>> this is not suitable for external table;
>>> 
>>> To fix the problem, need re-write two methods; In the new "getSignature()",
>>> using Hive API to get the table's path, size and last modified time, you
>>> may need do some search here; For the new "getColumnDelimeter()", just
>>> return DELIM_AUTO is okay;
>>> 
>>> After finish the code and pass all unit test, please create a patch and
>>> attache it in the JIRA for review ("pull request" is not accepted anymore);
>>> 
>>> Thanks for the contribution;
>>> 
>>> 
>>> 
>>> 2015-06-17 1:10 GMT+08:00 Srinivasan Hariharan <
>>> srinivasan.hariharan@outlook.com>:
>>> 
>>> > Hi ,
>>> >
>>> > I am interested to contribute to this JIRA, could anyone help me out where
>>> > can I start.
>>> >
>>> > https://issues.apache.org/jira/browse/KYLIN-824
>>> >
>>> > Regards,
>>> > Srinivasan Hariharan
>>> >
>>> >
>>> >
>>> > From: srinivasan.hariharan@outlook.com
>>> > To: dev@kylin.incubator.apache.org
>>> > Subject: RE: Hive external Table Dimension
>>> > Date: Thu, 11 Jun 2015 21:51:08 +0530
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > Thanks,
>>> >
>>> > I have created JIRA.
>>> >
>>> > https://issues.apache.org/jira/browse/KYLIN-824
>>> >
>>> > I am interested to contribute, i will see the code and update for help.
>>> >
>>> >
>>> > > From: shaoshi@ebay.com
>>> > > To: dev@kylin.incubator.apache.org
>>> > > Subject: Re: Hive external Table Dimension
>>> > > Date: Thu, 11 Jun 2015 14:33:59 +0000
>>> > >
>>> > > Kylin need take snapshot for lookup tables for runtime queries (to derive
>>> > > the dimensions that not on row key), that's why it try to seek the
>>> > > underlying data file;
>>> > >
>>> > > So far without this it couldn't move ahead; For long run, Kylin can
>>> > > consider to abstract this; Please open a JIRA as requirement if you like;
>>> > >
>>> > > On 6/11/15, 5:45 PM, "Srinivasan Hariharan02" <
>>> > Srinivasan_H02@infosys.com>
>>> > > wrote:
>>> > >
>>> > > >Hi,
>>> > > >
>>> > > >I have a dimension external  table in Hive which is created using Hbase
>>> > > >Storage handler. After creating the cube using this hive  table cube
>>> > > >build job failed  in the "Build Dimension Dictionary" with below error
>>> > > >java.lang.IllegalStateException: Expect 1 and only 1 non-zero file under
>>> > > >hdfs://host:8020/user/hive/warehouse/hbase.db/department/, but find 0
>>> > > >        at org.apache.kylin.dict.lookup.HiveTable.findOnlyFile(HiveTable.java:123)
>>> > > >        at org.apache.kylin.dict.lookup.HiveTable.computeHDFSLocation(HiveTable.java:107)
>>> > > >        at org.apache.kylin.dict.lookup.HiveTable.getHDFSLocation(HiveTable.java:83)
>>> > > >        at org.apache.kylin.dict.lookup.HiveTable.getFileTable(HiveTable.java:76)
>>> > > >        at org.apache.kylin.dict.lookup.HiveTable.getSignature(HiveTable.java:71)
>>> > > >        at org.apache.kylin.dict.DictionaryManager.buildDictionary(DictionaryManager.java:164)
>>> > > >        at org.apache.kylin.cube.CubeManager.buildDictionary(CubeManager.java:154)
>>> > > >        at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:53)
>>> > > >        at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>>> > > >        at org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53)
>>> > > >        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>> > > >        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>>> > > >        at org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
>>> > > >        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>>> > > >        at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>>> > > >        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>>> > > >        at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
>>> > > >        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> > > >        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> > > >        at java.lang.Thread.run(Thread.java:744)
>>> > > >
>>> > > >Since external table created from other sources like Hbase hive doesn't
>>> > > >store any data in their warehouse directory. So it should not check for
>>> > > >files under  warehouse dir for external tables. Please help.
>>> > > >
>>> > > >Regards,
>>> > > >Srinivasan Hariharan
>>> > > >Mob +91-9940395830
>>> > > >
>>> > > >
>>> > >
>>> >
>>> >
>>> >
>> 		 	   		  
>


Re: Hive external Table Dimension

Posted by "Shi, Shaofeng" <sh...@ebay.com>.
The unit tests were broken by a commit made yesterday; we're fixing it and
will update you once it is fixed.

On 6/18/15, 4:43 AM, "Srinivasan Hariharan"
<sr...@outlook.com> wrote:

>Hi,
>I made the changes but Kylin-Query module unit tests fails in 0.7 staging
>branch code. Without my changes also unit tests fails for the query
>module.
>
>  CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204 type mismatch:
>type1:
>DECIMAL(19, 4)
>type2:
>DECIMAL(39, 16) NOT NULL
>  CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204 type mismatch:
>type1:
>DECIMAL(19, 4)
>type2:
>DECIMAL(39, 16) NOT NULL
>  CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204 type mismatch:
>type1:
>DECIMAL(19, 4)
>type2:
>DECIMAL(39, 16) NOT NULL
>  CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204 type mismatch:
>type1:
>DECIMAL(19, 4)
>type2:
>DECIMAL(39, 16) NOT NULL
>  CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204 type mismatch:
>type1:
>DECIMAL(19, 4)
>type2:
>DECIMAL(39, 16) NOT NULL
>  CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204 type mismatch:
>type1:
>DECIMAL(19, 4)
>type2:
>DECIMAL(39, 16) NOT NULL
>  IIQueryTest.testDetailedQuery:59->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204 type mismatch:
>type1:
>DECIMAL(19, 4)
>type2:
>DECIMAL(39, 16) NOT NULL
>  IIQueryTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204 type mismatch:
>type1:
>DECIMAL(19, 4)
>type2:
>DECIMAL(39, 16) NOT NULL
>
>
>Regards,
>Srinivasan Hariharan
>
>
>
>> Date: Wed, 17 Jun 2015 11:32:05 +0800
>> Subject: Re: Hive external Table Dimension
>> From: shaofengshi@gmail.com
>> To: dev@kylin.incubator.apache.org
>> 
>> Srinivasan,
>> 
>> You can checkout 0.7-staging branch as start; Look into
>> org.apache.kylin.dict.lookup.HiveTable, the implementation of
>> "getSignature()" and "getColumnDelimeter()" is not perfect: it calls
>> "getFileTable()", which will check the underlying HDFS file, as we know
>> this is not suitable for external table;
>> 
>> To fix the problem, need re-write two methods; In the new "getSignature()",
>> using Hive API to get the table's path, size and last modified time, you
>> may need do some search here; For the new "getColumnDelimeter()", just
>> return DELIM_AUTO is okay;
>> 
>> After finish the code and pass all unit test, please create a patch and
>> attache it in the JIRA for review ("pull request" is not accepted anymore);
>> 
>> Thanks for the contribution;
>> 
>> 
>> 
>> 2015-06-17 1:10 GMT+08:00 Srinivasan Hariharan <
>> srinivasan.hariharan@outlook.com>:
>> 
>> > Hi ,
>> >
>> > I am interested to contribute to this JIRA, could anyone help me out where
>> > can I start.
>> >
>> > https://issues.apache.org/jira/browse/KYLIN-824
>> >
>> > Regards,
>> > Srinivasan Hariharan
>> >
>> >
>> >
>> > From: srinivasan.hariharan@outlook.com
>> > To: dev@kylin.incubator.apache.org
>> > Subject: RE: Hive external Table Dimension
>> > Date: Thu, 11 Jun 2015 21:51:08 +0530
>> >
>> >
>> >
>> >
>> >
>> > Thanks,
>> >
>> > I have created JIRA.
>> >
>> > https://issues.apache.org/jira/browse/KYLIN-824
>> >
>> > I am interested to contribute, i will see the code and update for help.
>> >
>> >
>> > > From: shaoshi@ebay.com
>> > > To: dev@kylin.incubator.apache.org
>> > > Subject: Re: Hive external Table Dimension
>> > > Date: Thu, 11 Jun 2015 14:33:59 +0000
>> > >
>> > > Kylin need take snapshot for lookup tables for runtime queries (to derive
>> > > the dimensions that not on row key), that's why it try to seek the
>> > > underlying data file;
>> > >
>> > > So far without this it couldn't move ahead; For long run, Kylin can
>> > > consider to abstract this; Please open a JIRA as requirement if you like;
>> > >
>> > > On 6/11/15, 5:45 PM, "Srinivasan Hariharan02" <
>> > Srinivasan_H02@infosys.com>
>> > > wrote:
>> > >
>> > > >Hi,
>> > > >
>> > > >I have a dimension external  table in Hive which is created using Hbase
>> > > >Storage handler. After creating the cube using this hive  table cube
>> > > >build job failed  in the "Build Dimension Dictionary" with below error
>> > > >java.lang.IllegalStateException: Expect 1 and only 1 non-zero file under
>> > > >hdfs://host:8020/user/hive/warehouse/hbase.db/department/, but find 0
>> > > >        at org.apache.kylin.dict.lookup.HiveTable.findOnlyFile(HiveTable.java:123)
>> > > >        at org.apache.kylin.dict.lookup.HiveTable.computeHDFSLocation(HiveTable.java:107)
>> > > >        at org.apache.kylin.dict.lookup.HiveTable.getHDFSLocation(HiveTable.java:83)
>> > > >        at org.apache.kylin.dict.lookup.HiveTable.getFileTable(HiveTable.java:76)
>> > > >        at org.apache.kylin.dict.lookup.HiveTable.getSignature(HiveTable.java:71)
>> > > >        at org.apache.kylin.dict.DictionaryManager.buildDictionary(DictionaryManager.java:164)
>> > > >        at org.apache.kylin.cube.CubeManager.buildDictionary(CubeManager.java:154)
>> > > >        at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:53)
>> > > >        at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>> > > >        at org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53)
>> > > >        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>> > > >        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>> > > >        at org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
>> > > >        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>> > > >        at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>> > > >        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>> > > >        at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
>> > > >        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> > > >        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> > > >        at java.lang.Thread.run(Thread.java:744)
>> > > >
>> > > >Since external table created from other sources like Hbase hive doesn't
>> > > >store any data in their warehouse directory. So it should not check for
>> > > >files under  warehouse dir for external tables. Please help.
>> > > >
>> > > >Regards,
>> > > >Srinivasan Hariharan
>> > > >Mob +91-9940395830
>> > > >
>> > > >
>> > >
>> >
>> >
>> >
> 		 	   		  


RE: Hive external Table Dimension

Posted by Srinivasan Hariharan <sr...@outlook.com>.
Hi,
I made the changes, but the Kylin query module unit tests fail on the 0.7-staging branch. They also fail without my changes.

  CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204 type mismatch:
type1:
DECIMAL(19, 4)
type2:
DECIMAL(39, 16) NOT NULL
  CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204 type mismatch:
type1:
DECIMAL(19, 4)
type2:
DECIMAL(39, 16) NOT NULL
  CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204 type mismatch:
type1:
DECIMAL(19, 4)
type2:
DECIMAL(39, 16) NOT NULL
  CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204 type mismatch:
type1:
DECIMAL(19, 4)
type2:
DECIMAL(39, 16) NOT NULL
  CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204 type mismatch:
type1:
DECIMAL(19, 4)
type2:
DECIMAL(39, 16) NOT NULL
  CombinationTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204 type mismatch:
type1:
DECIMAL(19, 4)
type2:
DECIMAL(39, 16) NOT NULL
  IIQueryTest.testDetailedQuery:59->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204 type mismatch:
type1:
DECIMAL(19, 4)
type2:
DECIMAL(39, 16) NOT NULL
  IIQueryTest>KylinQueryTest.testCommonQuery:165->KylinTestBase.execAndCompQuery:359->KylinTestBase.executeQuery:204 type mismatch:
type1:
DECIMAL(19, 4)
type2:
DECIMAL(39, 16) NOT NULL


Regards,
Srinivasan Hariharan



> Date: Wed, 17 Jun 2015 11:32:05 +0800
> Subject: Re: Hive external Table Dimension
> From: shaofengshi@gmail.com
> To: dev@kylin.incubator.apache.org
> 
> Srinivasan,
> 
> You can check out the 0.7-staging branch as a start. Look into
> org.apache.kylin.dict.lookup.HiveTable: the implementation of
> "getSignature()" and "getColumnDelimeter()" is not perfect. Both call
> "getFileTable()", which checks the underlying HDFS file; as we know,
> this is not suitable for an external table.
> 
> To fix the problem, these two methods need to be rewritten. The new
> "getSignature()" should use the Hive API to get the table's path, size and
> last modified time (you may need to do some searching here). For the new
> "getColumnDelimeter()", just returning DELIM_AUTO is okay.
> 
> After you finish the code and pass all unit tests, please create a patch and
> attach it to the JIRA for review ("pull requests" are not accepted anymore).
> 
> Thanks for the contribution.

Re: Hive external Table Dimension

Posted by ShaoFeng Shi <sh...@gmail.com>.
Srinivasan,

You can check out the 0.7-staging branch as a start. Look into
org.apache.kylin.dict.lookup.HiveTable: the implementation of
"getSignature()" and "getColumnDelimeter()" is not perfect. Both call
"getFileTable()", which checks the underlying HDFS file; as we know,
this is not suitable for an external table.

To fix the problem, these two methods need to be rewritten. The new
"getSignature()" should use the Hive API to get the table's path, size and
last modified time (you may need to do some searching here). For the new
"getColumnDelimeter()", just returning DELIM_AUTO is okay.
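
As a rough illustration of the idea above, here is a minimal Java sketch that builds a
signature from table metadata only, without touching any HDFS file. It is not the actual
HiveTable code. The class name, the Signature type, the method names and the DELIM_AUTO
value are all hypothetical. It also assumes the table location and parameter map have
already been fetched from the Hive metastore, and that the parameters "totalSize" and
"transient_lastDdlTime" (in seconds) may or may not be present for an external table.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Illustrative sketch only; this is not Kylin's HiveTable implementation. */
public class ExternalTableSignatureSketch {

    /** Stand-in for the DELIM_AUTO constant named above; the value here is assumed. */
    static final String DELIM_AUTO = "auto";

    /** Path, size and last-modified time, compared as a unit to detect changes. */
    static final class Signature {
        final String path;
        final long size;
        final long lastModifiedMillis;

        Signature(String path, long size, long lastModifiedMillis) {
            this.path = path;
            this.size = size;
            this.lastModifiedMillis = lastModifiedMillis;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Signature)) return false;
            Signature other = (Signature) o;
            return size == other.size
                    && lastModifiedMillis == other.lastModifiedMillis
                    && Objects.equals(path, other.path);
        }

        @Override
        public int hashCode() {
            return Objects.hash(path, size, lastModifiedMillis);
        }

        @Override
        public String toString() {
            return path + "|" + size + "|" + lastModifiedMillis;
        }
    }

    /**
     * Builds the signature from metadata alone. Missing or malformed
     * parameters fall back to 0 instead of failing the build.
     */
    static Signature signatureFromMetadata(String location, Map<String, String> tableParams) {
        long size = parseLongOrZero(tableParams.get("totalSize"));
        long lastDdlSeconds = parseLongOrZero(tableParams.get("transient_lastDdlTime"));
        return new Signature(location, size, lastDdlSeconds * 1000L);
    }

    static long parseLongOrZero(String value) {
        if (value == null) return 0L;
        try {
            return Long.parseLong(value.trim());
        } catch (NumberFormatException e) {
            return 0L;
        }
    }

    /** No data file is inspected; the delimiter is simply left to auto-detection. */
    static String columnDelimiter() {
        return DELIM_AUTO;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("transient_lastDdlTime", "1434000000");
        String location = "hdfs://host:8020/user/hive/warehouse/hbase.db/department";

        Signature before = signatureFromMetadata(location, params);
        params.put("transient_lastDdlTime", "1434500000");
        Signature after = signatureFromMetadata(location, params);

        System.out.println(before);
        System.out.println("changed: " + !before.equals(after));
    }
}
```

One caveat with this approach: for a table backed by a storage handler such as HBase,
the metadata may not change when the underlying rows change, so a signature like this
could be stale. Whether that is acceptable is a design decision for the patch.
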

After you finish the code and pass all unit tests, please create a patch and
attach it to the JIRA for review ("pull requests" are not accepted anymore).

Thanks for the contribution.



2015-06-17 1:10 GMT+08:00 Srinivasan Hariharan <
srinivasan.hariharan@outlook.com>:

> Hi ,
>
> I am interested in contributing to this JIRA. Could anyone help me figure
> out where to start?
>
> https://issues.apache.org/jira/browse/KYLIN-824
>
> Regards,
> Srinivasan Hariharan

RE: Hive external Table Dimension

Posted by Srinivasan Hariharan <sr...@outlook.com>.
Hi ,

I am interested in contributing to this JIRA. Could anyone help me figure out where to start?

https://issues.apache.org/jira/browse/KYLIN-824

Regards,
Srinivasan Hariharan



From: srinivasan.hariharan@outlook.com
To: dev@kylin.incubator.apache.org
Subject: RE: Hive external Table Dimension
Date: Thu, 11 Jun 2015 21:51:08 +0530





Thanks, 

I have created JIRA.

https://issues.apache.org/jira/browse/KYLIN-824

I am interested in contributing. I will look at the code and post an update if I need help.



RE: Hive external Table Dimension

Posted by Srinivasan Hariharan <sr...@outlook.com>.
Thanks, 

I have created JIRA.

https://issues.apache.org/jira/browse/KYLIN-824

I am interested in contributing. I will look at the code and post an update if I need help.


> From: shaoshi@ebay.com
> To: dev@kylin.incubator.apache.org
> Subject: Re: Hive external Table Dimension
> Date: Thu, 11 Jun 2015 14:33:59 +0000
> 
> Kylin needs to take a snapshot of lookup tables for runtime queries (to derive
> the dimensions that are not on the row key); that's why it tries to seek the
> underlying data file.
> 
> So far, without this it can't move ahead. In the long run, Kylin can
> consider abstracting this. Please open a JIRA as a requirement if you like.

Re: Hive external Table Dimension

Posted by "Shi, Shaofeng" <sh...@ebay.com>.
Kylin needs to take a snapshot of lookup tables for runtime queries (to derive
the dimensions that are not on the row key); that's why it tries to seek the
underlying data file.

So far, without this it can't move ahead. In the long run, Kylin can
consider abstracting this. Please open a JIRA as a requirement if you like.
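
To make the snapshot idea concrete, here is a small Java sketch of how an in-memory copy
of a lookup table could be used to derive a dimension that is not stored on the row key.
This is only an illustration under stated assumptions, not Kylin's snapshot code. The
class and method names, the DEPT_NAME column and the sample row are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; names and data are hypothetical. */
public class LookupSnapshotSketch {

    /** In-memory copy of a lookup table, indexed by its primary key. */
    static final class Snapshot {
        private final Map<String, Map<String, String>> rowsByKey = new HashMap<>();

        /** Stores a defensive copy of the row so later changes do not leak in. */
        void addRow(String primaryKey, Map<String, String> row) {
            rowsByKey.put(primaryKey, new HashMap<>(row));
        }

        /**
         * Resolves a column value from the key found on the row key.
         * Returns null when the key or the column is unknown.
         */
        String derive(String primaryKey, String column) {
            Map<String, String> row = rowsByKey.get(primaryKey);
            return row == null ? null : row.get(column);
        }
    }

    public static void main(String[] args) {
        Snapshot department = new Snapshot();

        Map<String, String> row = new HashMap<>();
        row.put("DEPT_NAME", "Engineering");
        department.addRow("10", row);

        // Only the key "10" needs to be on the row key; the name is looked up.
        System.out.println(department.derive("10", "DEPT_NAME"));
    }
}
```

Because the snapshot has to be filled by reading every row of the lookup table, the build
step needs some way to read the table's data, which is where the HDFS file check comes from.
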

On 6/11/15, 5:45 PM, "Srinivasan Hariharan02" <Sr...@infosys.com>
wrote:

>Hi,
>
>I have an external dimension table in Hive which is created using the HBase
>storage handler. After creating the cube using this Hive table, the cube
>build job failed in the "Build Dimension Dictionary" step with the error below:
>java.lang.IllegalStateException: Expect 1 and only 1 non-zero file under hdfs://host:8020/user/hive/warehouse/hbase.db/department/, but find 0
>        at org.apache.kylin.dict.lookup.HiveTable.findOnlyFile(HiveTable.java:123)
>        at org.apache.kylin.dict.lookup.HiveTable.computeHDFSLocation(HiveTable.java:107)
>        at org.apache.kylin.dict.lookup.HiveTable.getHDFSLocation(HiveTable.java:83)
>        at org.apache.kylin.dict.lookup.HiveTable.getFileTable(HiveTable.java:76)
>        at org.apache.kylin.dict.lookup.HiveTable.getSignature(HiveTable.java:71)
>        at org.apache.kylin.dict.DictionaryManager.buildDictionary(DictionaryManager.java:164)
>        at org.apache.kylin.cube.CubeManager.buildDictionary(CubeManager.java:154)
>        at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:53)
>        at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>        at org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>        at org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
>        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>        at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>        at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>        at java.lang.Thread.run(Thread.java:744)
>
>For external tables created from other sources like HBase, Hive doesn't
>store any data in its warehouse directory, so Kylin should not check for
>files under the warehouse dir for external tables. Please help.
>
>Regards,
>Srinivasan Hariharan
>Mob +91-9940395830