Posted to issues@flink.apache.org by "Stephan Ewen (Jira)" <ji...@apache.org> on 2020/01/15 13:30:00 UTC
[jira] [Assigned] (FLINK-9407) Support orc rolling sink writer
[ https://issues.apache.org/jira/browse/FLINK-9407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stephan Ewen reassigned FLINK-9407:
-----------------------------------
Assignee: (was: zhangminglei)
> Support orc rolling sink writer
> -------------------------------
>
> Key: FLINK-9407
> URL: https://issues.apache.org/jira/browse/FLINK-9407
> Project: Flink
> Issue Type: New Feature
> Components: Connectors / FileSystem
> Reporter: zhangminglei
> Priority: Major
> Labels: pull-request-available, usability
> Fix For: 1.11.0
>
>
> Currently, we only support {{StringWriter}}, {{SequenceFileWriter}} and {{AvroKeyValueSinkWriter}}. I suggest adding an ORC writer for the rolling sink.
> For reference, see the output below.
> I tested the PR and verified the results with Spark SQL: reading the files back returns the records that were written. Over the next couple of days I will run more tests, including performance with compression enabled under short checkpoint intervals, and add more unit tests.
> {code:java}
> scala> spark.read.orc("hdfs://10.199.196.0:9000/data/hive/man/2018-07-06--21")
> res1: org.apache.spark.sql.DataFrame = [name: string, age: int ... 1 more field]
> scala>
> scala> res1.registerTempTable("tablerice")
> warning: there was one deprecation warning; re-run with -deprecation for details
> scala> spark.sql("select * from tablerice")
> res3: org.apache.spark.sql.DataFrame = [name: string, age: int ... 1 more field]
> scala> res3.show(3)
> +-----+---+-------+
> | name|age|married|
> +-----+---+-------+
> |Sagar| 26| false|
> |Sagar| 30| false|
> |Sagar| 34| false|
> +-----+---+-------+
> only showing top 3 rows
> {code}
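Unlike a line- or record-oriented writer, an ORC writer is columnar: rows are buffered into per-column vectors (in the ORC Java library, a `VectorizedRowBatch` passed to `org.apache.orc.Writer#addRowBatch`) before anything is written. The following is a hypothetical, standard-library-only Java sketch of that buffering, using the name/age/married schema visible in the Spark output above. `PersonBatchBuffer` and its methods are illustrative names, not part of Flink or ORC, and the "flushed" list stands in for the actual file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: this is not Flink's Writer interface or the ORC API.
// It mimics how a columnar writer buffers rows into per-column arrays
// (ORC uses VectorizedRowBatch for this) and emits them one batch at a time.
final class PersonBatchBuffer {
    // One array per column of struct<name:string,age:int,married:boolean>.
    private final String[] name;
    private final long[] age;          // ORC keeps int columns in a LongColumnVector
    private final boolean[] married;
    private final int capacity;
    private int size = 0;
    // Stand-in for the output file: each flushed batch is recorded as a string.
    private final List<String> flushed = new ArrayList<>();

    PersonBatchBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.name = new String[capacity];
        this.age = new long[capacity];
        this.married = new boolean[capacity];
    }

    /** Appends one row, column by column; flushes automatically when the batch is full. */
    void write(String n, int a, boolean m) {
        name[size] = n;
        age[size] = a;
        married[size] = m;
        size++;
        if (size == capacity) {
            flush();
        }
    }

    /** Emits the buffered rows as one columnar batch and resets the buffer. */
    void flush() {
        if (size == 0) {
            return; // nothing buffered
        }
        // A real ORC-based writer would hand the batch to the ORC writer at this point.
        flushed.add("name=" + Arrays.toString(Arrays.copyOf(name, size))
                + " age=" + Arrays.toString(Arrays.copyOf(age, size))
                + " married=" + Arrays.toString(Arrays.copyOf(married, size)));
        size = 0;
    }

    int buffered() {
        return size;
    }

    List<String> flushedBatches() {
        return flushed;
    }
}
```

With a capacity of 2, writing the three rows shown above flushes the first two automatically as `name=[Sagar, Sagar] age=[26, 30] married=[false, false]`, while the third stays buffered until `flush()` is called explicitly. That explicit flush is the kind of point where a sink has to force out a partially filled batch, for example when a part file is closed.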
--
This message was sent by Atlassian Jira
(v8.3.4#803005)