Posted to commits@seatunnel.apache.org by "Dylan-zj (via GitHub)" <gi...@apache.org> on 2023/05/17 08:13:53 UTC
[GitHub] [incubator-seatunnel] Dylan-zj opened a new issue, #4770: [Bug] [seatunnel-connectors] [seatunnel-connector-spark-clickhouse] Data cannot be imported to all nodes by configuring split_mode and sharding_key
Dylan-zj opened a new issue, #4770:
URL: https://github.com/apache/incubator-seatunnel/issues/4770
### Search before asking
- [X] I had searched in the [issues](https://github.com/apache/incubator-seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
### What happened
Cluster configuration:
- 20 machine nodes (clickhouse-01 ~ clickhouse-20)
- 10 shards
- every shard has 2 replicas
- the weight of every shard is 1
I configured `split_mode=true` and specified a `sharding_key` whose column is of String type.
After importing the data, I found that only 8 nodes (clickhouse-01 ~ clickhouse-08) have data.
Root cause:
`(hashInstance.hash(ByteBuffer.wrap(row.getString(fieldIndex).getBytes), 0) & Long.MaxValue % shardWeightCount).toInt` (Clickhouse.scala, line 440)
When the hash function is used to decide which shard a row is imported to, `%` has higher precedence than `&`, so the expression is evaluated as `hash & (Long.MaxValue % shardWeightCount)`. In my case shardWeightCount is 20, so `Long.MaxValue % shardWeightCount` is 7 and the result of this line is always one of 0 to 7.
As a result, the data can only be imported to clickhouse-01 ~ clickhouse-08.
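The precedence problem can be reproduced without the connector or the xxHash library. The following is a minimal sketch, not the connector's code: the object name, the helper names `buggyOffset` and `fixedOffset`, and the sample hash values are hypothetical, and only the two expressions themselves come from this report.

```scala
object ShardPrecedenceDemo {
  // Assumption: 20 matches the shardWeightCount described in this report.
  val shardWeightCount = 20

  // Expression as reported: `%` binds tighter than `&`,
  // so this computes hash & (Long.MaxValue % 20), i.e. hash & 7.
  def buggyOffset(hash: Long): Int =
    (hash & Long.MaxValue % shardWeightCount).toInt

  // Parenthesized form: clear the sign bit first, then take the modulo.
  def fixedOffset(hash: Long): Int =
    ((hash & Long.MaxValue) % shardWeightCount).toInt

  // Stand-in hash values (positive and negative); the real code uses XXHash64 output.
  val sampleHashes: Seq[Long] = -1000L to 1000L

  val buggyOffsets: Seq[Int] = sampleHashes.map(buggyOffset).distinct.sorted
  val fixedOffsets: Seq[Int] = sampleHashes.map(fixedOffset).distinct.sorted

  def main(args: Array[String]): Unit = {
    println(s"Long.MaxValue % $shardWeightCount = ${Long.MaxValue % shardWeightCount}") // 7
    println(s"buggy offsets: ${buggyOffsets.mkString(",")}") // 0 through 7 only
    println(s"fixed offsets: ${fixedOffsets.mkString(",")}") // 0 through 19
  }
}
```

With the unparenthesized expression only offsets 0 to 7 ever appear, which matches the 8 nodes that received data. With the mask parenthesized, all 20 offsets are reachable.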
### SeaTunnel Version
v 2.1.3
### SeaTunnel Config
```conf
None
```
### Running Command
```shell
None
```
### Error Exception
```log
None
```
### Flink or Spark Version
_No response_
### Java or Scala Version
_No response_
### Screenshots
_No response_
### Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [seatunnel] github-actions[bot] closed issue #4770: [Bug] [seatunnel-connectors] [seatunnel-connector-spark-clickhouse] Data cannot be imported to all nodes by configuring split_mode and sharding_key
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed issue #4770: [Bug] [seatunnel-connectors] [seatunnel-connector-spark-clickhouse] Data cannot be imported to all nodes by configuring split_mode and sharding_key
URL: https://github.com/apache/seatunnel/issues/4770
[GitHub] [incubator-seatunnel] wuxizhi777 commented on issue #4770: [Bug] [seatunnel-connectors] [seatunnel-connector-spark-clickhouse] Data cannot be imported to all nodes by configuring split_mode and sharding_key
Posted by "wuxizhi777 (via GitHub)" <gi...@apache.org>.
wuxizhi777 commented on issue #4770:
URL: https://github.com/apache/incubator-seatunnel/issues/4770#issuecomment-1551198805
import java.nio.ByteBuffer
import net.jpountz.xxhash.XXHashFactory

val hashInstance = XXHashFactory.fastestInstance().hash64()
val s = "hah,what,you,am,who,am,i,123,de,df,fd,CR5,QA6,T7H,E8G,B11I,123,ZW2,SN1,34,23"
val apps = s.split(",")
for (app <- apps) {
  // Clear the sign bit first, then take the modulo over the 20 shard slots
  val offset = ((hashInstance.hash(ByteBuffer.wrap(app.getBytes), 0) & Long.MaxValue) % 20).toInt
  println(app)
  println(offset)
}
This is the correct way to write it.
[GitHub] [seatunnel] github-actions[bot] commented on issue #4770: [Bug] [seatunnel-connectors] [seatunnel-connector-spark-clickhouse] Data cannot be imported to all nodes by configuring split_mode and sharding_key
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #4770:
URL: https://github.com/apache/seatunnel/issues/4770#issuecomment-1595510463
This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in the next 7 days if no further activity occurs.
[GitHub] [incubator-seatunnel] wuxizhi777 commented on issue #4770: [Bug] [seatunnel-connectors] [seatunnel-connector-spark-clickhouse] Data cannot be imported to all nodes by configuring split_mode and sharding_key
Posted by "wuxizhi777 (via GitHub)" <gi...@apache.org>.
wuxizhi777 commented on issue #4770:
URL: https://github.com/apache/incubator-seatunnel/issues/4770#issuecomment-1550959725
assign to me!!
[GitHub] [incubator-seatunnel] liugddx commented on issue #4770: [Bug] [seatunnel-connectors] [seatunnel-connector-spark-clickhouse] Data cannot be imported to all nodes by configuring split_mode and sharding_key
Posted by "liugddx (via GitHub)" <gi...@apache.org>.
liugddx commented on issue #4770:
URL: https://github.com/apache/incubator-seatunnel/issues/4770#issuecomment-1551164373
> assign to me!!!
Already assigned to you.
[GitHub] [seatunnel] github-actions[bot] commented on issue #4770: [Bug] [seatunnel-connectors] [seatunnel-connector-spark-clickhouse] Data cannot be imported to all nodes by configuring split_mode and sharding_key
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #4770:
URL: https://github.com/apache/seatunnel/issues/4770#issuecomment-1605185977
This issue has been closed because it has not received a response for too long. You can reopen it if you encounter similar problems in the future.