Posted to dev@kylin.apache.org by mailpig <al...@163.com> on 2019/02/20 03:17:47 UTC

HBase table is always empty when building with Spark

In kylin-2.5.2, the resulting HBase table is always empty when I build a
cube with Spark.
I found that the step "Load HFile to HBase Table" logs some warnings:
2019-01-27 00:49:30,067 WARN [Scheduler 448149092 Job
89a25959-e12d-7a5e-0ecb-80c978533eab-6419]
mapreduce.LoadIncrementalHFiles:204 : Skipping non-directory
hdfs://test/kylin/kylin_metadata/kylin-89a25959-e12d-7a5e-0ecb-80c978533eab/test_UUID_spark/hfile/_SUCCESS
2019-01-27 00:49:30,068 WARN [Scheduler 448149092 Job
89a25959-e12d-7a5e-0ecb-80c978533eab-6419]
mapreduce.LoadIncrementalHFiles:204 : Skipping non-directory
hdfs://test/kylin/kylin_metadata/kylin-89a25959-e12d-7a5e-0ecb-80c978533eab/test_UUID_spark/hfile/part-r-00000
2019-01-27 00:49:30,068 WARN [Scheduler 448149092 Job
89a25959-e12d-7a5e-0ecb-80c978533eab-6419]
mapreduce.LoadIncrementalHFiles:204 : Skipping non-directory
hdfs://test/kylin/kylin_metadata/kylin-89a25959-e12d-7a5e-0ecb-80c978533eab/test_UUID_spark/hfile/part-r-00001/

After reading the source code, I found that the Spark version of the step
"Convert Cuboid Data to HFile" has a bug: that step's output directory
should contain one subdirectory per column family, but the HFiles end up
at the top level, which is why LoadIncrementalHFiles skips them as
"non-directory". Specifically, SparkCubeHFile must set
mapreduce.job.outputformat.class to HFileOutputFormat2.class.
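
For illustration, a minimal sketch of that fix, assuming a Hadoop Job
drives the Spark write (the method and job name are illustrative, not the
exact Kylin code):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.mapreduce.Job;

static Job newHFileJob(Configuration hbaseConf) throws IOException {
    Job job = Job.getInstance(hbaseConf, "Convert Cuboid Data to HFile");
    // registering HFileOutputFormat2 sets mapreduce.job.outputformat.class,
    // so the job writes one subdirectory per column family under the output
    // path, which is the layout LoadIncrementalHFiles expects
    job.setOutputFormatClass(HFileOutputFormat2.class);
    return job;
}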

Please check if I am correct!


Re: HBase table is always empty when building with Spark

Posted by ShaoFeng Shi <sh...@apache.org>.
Hi Alex,

Could you please report a JIRA to Kylin, or send a pull request if you
already have a hotfix? Thank you!

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofengshi@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org




mailpig <al...@163.com> wrote on Mon, Feb 25, 2019 at 5:18 PM:

> Sure, the Hive table is not empty, and the HFile output directory also
> has data.
>
> <http://apache-kylin.74782.x6.nabble.com/file/t635/IMG20190225_171051.png>
>
>
> After setting mapreduce.job.outputformat.class in the job config, loading
> HFiles into HBase succeeds.
> Besides that, I found that the source code had this configuration in the
> first commit:
> ..............................
> // sets HFileOutputFormat2 as the job's output format and applies the
> // table's column-family settings
> HTable table = new HTable(hbaseConf,
>         cubeSegment.getStorageLocationIdentifier());
> try {
>     HFileOutputFormat2.configureIncrementalLoadMap(job, table);
> } catch (IOException ioe) {
>     // this can be ignored.
>     logger.debug(ioe.getMessage(), ioe);
> }
> ...............................
> But commit 76c9c960be542c919301c72b34c7ae5ce6f1ec1c removed this
> configuration; I don't know why. Please check.
>

Re: HBase table is always empty when building with Spark

Posted by mailpig <al...@163.com>.
Sure, the Hive table is not empty, and the HFile output directory also has
data.

<http://apache-kylin.74782.x6.nabble.com/file/t635/IMG20190225_171051.png> 

After setting mapreduce.job.outputformat.class in the job config, loading
HFiles into HBase succeeds.
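
For reference, a hypothetical one-liner version of that workaround, where
job is whatever Hadoop Job object drives the Spark write:

// equivalent to job.setOutputFormatClass(HFileOutputFormat2.class)
job.getConfiguration().set("mapreduce.job.outputformat.class",
        HFileOutputFormat2.class.getName());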
Besides that, I found that the source code had this configuration in the
first commit:
..............................
// sets HFileOutputFormat2 as the job's output format and applies the
// table's column-family settings
HTable table = new HTable(hbaseConf,
        cubeSegment.getStorageLocationIdentifier());
try {
    HFileOutputFormat2.configureIncrementalLoadMap(job, table);
} catch (IOException ioe) {
    // this can be ignored.
    logger.debug(ioe.getMessage(), ioe);
}
...............................
But commit 76c9c960be542c919301c72b34c7ae5ce6f1ec1c removed this
configuration; I don't know why. Please check.
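
For anyone who wants to try restoring it, a self-contained sketch of the
snippet above (HBase 1.x-era API as in that snippet; the method signature
is illustrative, and CubeSegment is Kylin's segment class):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.mapreduce.Job;
import org.apache.kylin.cube.CubeSegment;

static void configureHFileOutput(Job job, Configuration hbaseConf,
        CubeSegment cubeSegment) throws IOException {
    // the segment's storage location identifier is the target HBase table name
    try (HTable table = new HTable(hbaseConf,
            cubeSegment.getStorageLocationIdentifier())) {
        // sets mapreduce.job.outputformat.class to HFileOutputFormat2 and
        // applies the table's column-family settings to the job
        HFileOutputFormat2.configureIncrementalLoadMap(job, table);
    }
}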


Re: HBase table is always empty when building with Spark

Posted by ShaoFeng Shi <sh...@apache.org>.
Hello Alex,

Interesting; we didn't observe such an issue. Can you confirm that your
Hive table has the data, and that this is not an input error? Does the
problem get solved after setting "mapreduce.job.outputformat.class"?

Thanks for the feedback!

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org




mailpig <al...@163.com> wrote on Wed, Feb 20, 2019 at 11:18 AM:

> In kylin-2.5.2, the resulting HBase table is always empty when I build a
> cube with Spark.
> I found that the step "Load HFile to HBase Table" logs some warnings:
> 2019-01-27 00:49:30,067 WARN [Scheduler 448149092 Job
> 89a25959-e12d-7a5e-0ecb-80c978533eab-6419]
> mapreduce.LoadIncrementalHFiles:204 : Skipping non-directory
>
> hdfs://test/kylin/kylin_metadata/kylin-89a25959-e12d-7a5e-0ecb-80c978533eab/test_UUID_spark/hfile/_SUCCESS
> 2019-01-27 00:49:30,068 WARN [Scheduler 448149092 Job
> 89a25959-e12d-7a5e-0ecb-80c978533eab-6419]
> mapreduce.LoadIncrementalHFiles:204 : Skipping non-directory
>
> hdfs://test/kylin/kylin_metadata/kylin-89a25959-e12d-7a5e-0ecb-80c978533eab/test_UUID_spark/hfile/part-r-00000
> 2019-01-27 00:49:30,068 WARN [Scheduler 448149092 Job
> 89a25959-e12d-7a5e-0ecb-80c978533eab-6419]
> mapreduce.LoadIncrementalHFiles:204 : Skipping non-directory
>
> hdfs://test/kylin/kylin_metadata/kylin-89a25959-e12d-7a5e-0ecb-80c978533eab/test_UUID_spark/hfile/part-r-00001/
>
> After reading the source code, I found that the Spark version of the step
> "Convert Cuboid Data to HFile" has a bug: that step's output directory
> should contain one subdirectory per column family, but the HFiles end up
> at the top level, which is why LoadIncrementalHFiles skips them as
> "non-directory". Specifically, SparkCubeHFile must set
> mapreduce.job.outputformat.class to HFileOutputFormat2.class.
>
> Please check if I am correct!
>