Posted to issues@flink.apache.org by "zouyunhe (Jira)" <ji...@apache.org> on 2020/10/12 17:18:00 UTC

[jira] [Updated] (FLINK-19588) HBase zookeeper connection not released in hbase batch table source while flink job failover

     [ https://issues.apache.org/jira/browse/FLINK-19588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zouyunhe updated FLINK-19588:
-----------------------------
    Description: 
Hi, I created a SQL job that reads from an HBase table. The SQL is as follows:
{code:sql}
create table hbase_source_test(
  id bigint not null,
  f1 ROW<
    uid bigint,
    all_stay bigint>)
with (
  'connector.type' = 'hbase',
  'connector.version' = '1.4.3',
  'connector.table-name' = 'test_out',
  'connector.zookeeper.quorum' = 'testcluster-dn1:2181,testcluster-dn2:2181,testcluster-dn3:2181'
);

create table test_mysql(
  id BIGINT,
  `name` VARCHAR,
  COST DOUBLE
) with (
  'connector.type' = 'jdbc',
  'connector.url' = 'jdbc:mysql://192.168.1.22:3306/test',
  'connector.table' = 'test_result',
  'connector.username' = 'test',
  'connector.write.flush.interval' = '2s'
);

create view view_1 as
select
  if(f1.uid is null, 0, f1.uid) as uid,
  proctime() as itime
from hbase_source_test;

insert into `test_mysql` select uid, '', 0 from view_1;
{code}

The field types declared in `hbase_source_test` (uid BIGINT, all_stay BIGINT) do not match the columns in the actual HBase table (uid INT, all_stay INT). When this SQL job runs on a YARN cluster, it keeps failing over because of the data type mismatch, with the following HBase exception:
{code:java}
Source: HBaseTableSource[schema=[id, f1], projectFields=[1]] (1/3) (b16b12602c2e7e442785b15c5d6509f9) switched from RUNNING to FAILED on org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@110fff83. java.lang.IllegalArgumentException: offset (0) + length (8) exceed the capacity of the array: 4
 at org.apache.hadoop.hbase.util.Bytes.explainWrongLengthOrOffset(Bytes.java:779)
 ......................................
{code}
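For reference, a minimal sketch of where this exception comes from (an illustration, not the connector's actual decode path): HBase stores an INT cell as 4 bytes, while a BIGINT field is decoded with Bytes.toLong, which requires exactly 8 bytes:
{code:java}
import org.apache.hadoop.hbase.util.Bytes;

public class WidthMismatchDemo {
    public static void main(String[] args) {
        // The HBase cell was written as a 4-byte int...
        byte[] cell = Bytes.toBytes(42);
        // ...but decoding it as BIGINT reads 8 bytes, which throws:
        // java.lang.IllegalArgumentException:
        //   offset (0) + length (8) exceed the capacity of the array: 4
        long value = Bytes.toLong(cell);
        System.out.println(value);
    }
}
{code}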
Each time the job fails over, it reconnects to HBase's ZooKeeper, as shown in the log:

{code:java}
2020-09-16 07:56:33,383 INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ClientCnxn - Opening socket connection to server hr-rec2/10.221.114.150:2181
2020-09-16 07:56:33,383 INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ClientCnxn - Socket connection established to hr-rec2/10.221.114.150:2181, initiating session
2020-09-16 07:56:33,385 INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ClientCnxn - Session establishment complete on server hr-rec2/10.221.114.150:2181, sessionid = 0x3737ad5b2ccd9fd, negotiated timeout = 60000
{code}
After the job has failed over many times, we found that the HBase ZooKeeper connection count keeps increasing (checked with `netstat -an | grep 2181 | wc -l`). Eventually the connection count reaches a very large number (thousands), which exhausts HBase ZooKeeper's connections.
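This behavior suggests the HBase source does not close its HBase connection when a task is torn down, so each restart leaks one ZooKeeper session. A minimal sketch of the kind of cleanup that would release the session on task teardown (illustrative names only, not the actual Flink connector code):
{code:java}
import java.io.IOException;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Table;

// Sketch only: assumes the source's input format keeps a Connection
// and a Table open for the lifetime of the task.
public class LeakFreeHBaseInputFormat {
    private Table table;
    private Connection connection;

    public void close() throws IOException {
        if (table != null) {
            table.close();        // release the Table and its buffers
            table = null;
        }
        if (connection != null) {
            connection.close();   // closes the underlying ZooKeeper session
            connection = null;
        }
    }
}
{code}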


> HBase zookeeper connection not released in hbase batch table source while flink job failover
> --------------------------------------------------------------------------------------------
>
>                 Key: FLINK-19588
>                 URL: https://issues.apache.org/jira/browse/FLINK-19588
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / HBase
>    Affects Versions: 1.11.0
>            Reporter: zouyunhe
>            Priority: Critical
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)