Posted to user-zh@flink.apache.org by JasonLee <17...@163.com> on 2020/07/21 12:38:30 UTC

Re: flink-1.11 ddl kafka-to-hive problem

Hi,
Has the Hive table never had any data, or does data show up after a while?


JasonLee

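On 2020-07-21 19:09, kcz wrote:
hive-1.2.1
Checkpoints are succeeding (I looked in the checkpoint directory and the checkpoint data is there, and Kafka has data too), but the Hive table has no data. Am I missing something?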
String hiveSql = "CREATE  TABLE  stream_tmp.fs_table (\n" +
       "  host STRING,\n" +
       "  url STRING," +
       "  public_date STRING" +
       ") partitioned by (public_date string) " +
       "stored as PARQUET " +
       "TBLPROPERTIES (\n" +
       "  'sink.partition-commit.delay'='0 s',\n" +
       "  'sink.partition-commit.trigger'='partition-time',\n" +
       "  'sink.partition-commit.policy.kind'='metastore,success-file'" +
       ")";
tableEnv.executeSql(hiveSql);


tableEnv.executeSql("INSERT INTO  stream_tmp.fs_table SELECT host, url, DATE_FORMAT(public_date, 'yyyy-MM-dd') FROM stream_tmp.source_table");

Re: flink-1.11 ddl kafka-to-hive problem

Posted by Jark Wu <im...@gmail.com>.

Have you tried configuring a rolling policy?
https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/connectors/filesystem.html#sink-rolling-policy-rollover-interval

Best,
Jark
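
A minimal sketch of what configuring a rolling policy could look like for the table in the original post. The option keys come from the filesystem connector page linked above; the values (128MB, 1 min) are assumptions for illustration, and the helper class RollingPolicyDdl is hypothetical. The partition column is declared only in PARTITIONED BY, as Hive DDL normally expects. The thread does not establish that the rolling policy is the cause, so treat this as something to try rather than a confirmed fix.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Sketch: the sink DDL from the original post with rolling-policy options added. */
final class RollingPolicyDdl {

    /** Renders a map as comma-separated 'key'='value' lines for TBLPROPERTIES. */
    static String tblProperties(Map<String, String> props) {
        return props.entrySet().stream()
                .map(e -> "  '" + e.getKey() + "'='" + e.getValue() + "'")
                .collect(Collectors.joining(",\n"));
    }

    /** Builds the Hive-dialect CREATE TABLE statement as a string. */
    static String createTable() {
        Map<String, String> props = new LinkedHashMap<>();
        // Partition-commit options copied from the original post.
        props.put("sink.partition-commit.delay", "0 s");
        props.put("sink.partition-commit.trigger", "partition-time");
        props.put("sink.partition-commit.policy.kind", "metastore,success-file");
        // Rolling-policy options; both values are assumptions for illustration.
        props.put("sink.rolling-policy.file-size", "128MB");
        props.put("sink.rolling-policy.rollover-interval", "1 min");

        return "CREATE TABLE stream_tmp.fs_table (\n"
                + "  host STRING,\n"
                + "  url STRING\n"
                + ") PARTITIONED BY (public_date STRING)\n"
                + "STORED AS PARQUET\n"
                + "TBLPROPERTIES (\n"
                + tblProperties(props) + "\n"
                + ")";
    }

    public static void main(String[] args) {
        // Pass this string to tableEnv.executeSql(...) in place of hiveSql.
        System.out.println(createTable());
    }
}
```

The resulting string would be passed to tableEnv.executeSql(...) exactly as hiveSql is in the original post.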


Re: flink-1.11 ddl kafka-to-hive problem

Posted by Jingsong Li <ji...@gmail.com>.

How is your source table defined? Are you sure the watermark is advancing? (You can check this in the Flink UI.)

Try removing 'sink.partition-commit.trigger'='partition-time'?

Best,
Jingsong
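
The two suggestions above can be sketched as follows. The source table was not shown in the thread, so the DDL here is a guess: the topic, bootstrap servers, group id, format, and the 5-second watermark delay are all hypothetical, as is the helper class PartitionCommitSketch. With the partition-time trigger, a partition is committed only once the watermark passes the partition's time, so the source needs a WATERMARK clause. Because the partition value is just a date (yyyy-MM-dd), the sketch also sets 'partition.time-extractor.timestamp-pattern' to pad it into a full timestamp; that addition is an assumption, not something proposed in the thread. Leaving the trigger unset falls back to the default process-time trigger, which does not depend on watermarks.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: a watermarked source table and the two partition-commit trigger variants. */
final class PartitionCommitSketch {

    /** Default-dialect DDL for a Kafka source; all connector values are hypothetical. */
    static String sourceDdl() {
        return "CREATE TABLE stream_tmp.source_table (\n"
                + "  host STRING,\n"
                + "  url STRING,\n"
                + "  public_date TIMESTAMP(3),\n"
                // Event-time attribute; without it the job emits no watermark.
                + "  WATERMARK FOR public_date AS public_date - INTERVAL '5' SECOND\n"
                + ") WITH (\n"
                + "  'connector'='kafka',\n"
                + "  'topic'='access_log',\n"
                + "  'properties.bootstrap.servers'='localhost:9092',\n"
                + "  'properties.group.id'='fs_table_writer',\n"
                + "  'format'='json'\n"
                + ")";
    }

    /** TBLPROPERTIES for the Hive sink, with or without the partition-time trigger. */
    static Map<String, String> sinkProperties(boolean usePartitionTime) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("sink.partition-commit.policy.kind", "metastore,success-file");
        props.put("sink.partition-commit.delay", "0 s");
        if (usePartitionTime) {
            props.put("sink.partition-commit.trigger", "partition-time");
            // Assumption: pad the yyyy-MM-dd partition value into a full timestamp.
            props.put("partition.time-extractor.timestamp-pattern", "$public_date 00:00:00");
        }
        // When usePartitionTime is false the trigger is left unset (process-time default).
        return props;
    }

    public static void main(String[] args) {
        System.out.println(sourceDdl());
        System.out.println(sinkProperties(true));
        System.out.println(sinkProperties(false));
    }
}
```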


Re: flink-1.11 ddl kafka-to-hive problem

Posted by Leonard Xu <xb...@gmail.com>.

Hi,

Was the Hive table created in Flink? If so, did you use the Hive dialect when creating it? You can set it as described in [1].

Best
Leonard Xu
[1] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/hive/hive_dialect.html#use-hive-dialect
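
A minimal sketch of the statement ordering this implies: switch to the Hive dialect before the CREATE TABLE (STORED AS and TBLPROPERTIES are Hive syntax), then switch back before the INSERT. To keep the example runnable without Flink on the classpath, the dialect switches and executeSql are passed in as callbacks; the class HiveDialectSketch and its method are hypothetical. Running the INSERT under the default dialect is an assumption based on the Hive dialect in 1.11 being aimed at DDL.

```java
import java.util.function.Consumer;

/** Sketch: run the Hive DDL under the Hive dialect, then the INSERT under the default one. */
final class HiveDialectSketch {

    static void createAndInsert(Runnable useHiveDialect,
                                Runnable useDefaultDialect,
                                Consumer<String> executeSql,
                                String hiveDdl,
                                String insertSql) {
        useHiveDialect.run();         // the DDL uses Hive syntax
        executeSql.accept(hiveDdl);
        useDefaultDialect.run();      // assumption: DML stays on the default dialect
        executeSql.accept(insertSql);
    }

    // With Flink 1.11 and the Hive connector on the classpath, the wiring would be:
    //   createAndInsert(
    //       () -> tableEnv.getConfig().setSqlDialect(SqlDialect.HIVE),
    //       () -> tableEnv.getConfig().setSqlDialect(SqlDialect.DEFAULT),
    //       tableEnv::executeSql, hiveSql, insertSql);
    // where SqlDialect is org.apache.flink.table.api.SqlDialect.
}
```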


Re: flink-1.11 ddl kafka-to-hive problem

Posted by kcz <57...@qq.com>.

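There has never been any data, and I don't know what is wrong. The table already exists in Hive. When I tested writing to HDFS with DDL, it worked fine.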




