Posted to commits@seatunnel.apache.org by "Dylan-zj (via GitHub)" <gi...@apache.org> on 2023/05/17 08:13:53 UTC

[GitHub] [incubator-seatunnel] Dylan-zj opened a new issue, #4770: [Bug] [seatunnel-connectors] [seatunnel-connector-spark-clickhouse] Data cannot be imported to all nodes by configuring split_mode and sharding_key

Dylan-zj opened a new issue, #4770:
URL: https://github.com/apache/incubator-seatunnel/issues/4770

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/incubator-seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
   
   
   ### What happened
   
   Cluster configuration:
   
   - 20 machine nodes (clickhouse-01 ~ clickhouse-20)
   - 10 shards
   - every shard has 2 replicas
   - the weight of every shard is 1
   
   I set `split_mode=true` and specified a `sharding_key` whose column is of String type.
   
   After importing the data, only 8 nodes (clickhouse-01 ~ clickhouse-08) contain data.
   
   Root cause (Clickhouse.scala:440):
   `(hashInstance.hash(ByteBuffer.wrap(row.getString(fieldIndex).getBytes), 0) & Long.MaxValue % shardWeightCount).toInt`
   This expression uses the hash of the sharding key to decide which shard receives each row. In Scala, `%` has higher precedence than `&`, so `Long.MaxValue % shardWeightCount` is evaluated first. With `shardWeightCount` = 20 in my case, that value is 7, so the line reduces to `hash & 7` and the result is always one of 0 to 7.
   As a result, data can only be imported to clickhouse-01 ~ clickhouse-08.
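   
   The precedence behaviour described above can be checked in isolation. The following is a minimal sketch, not the connector code: `ShardIndexDemo`, `buggyIndex` and `fixedIndex` are hypothetical names, and the values 0 to 999 stand in for real xxHash output so that the example needs no extra dependency.
   
   ```scala
   object ShardIndexDemo {
     // Same shape as the expression quoted above. `%` binds tighter than `&`,
     // so this is parsed as hash & (Long.MaxValue % shardWeightCount).
     def buggyIndex(hash: Long, shardWeightCount: Int): Int =
       (hash & Long.MaxValue % shardWeightCount).toInt
   
     // Parenthesized variant: clear the sign bit first, then take the remainder.
     def fixedIndex(hash: Long, shardWeightCount: Int): Int =
       ((hash & Long.MaxValue) % shardWeightCount).toInt
   
     def main(args: Array[String]): Unit = {
       val shardWeightCount = 20
       // Stand-in hash values (assumption); a real run would hash the sharding key.
       val hashes = 0L until 1000L
   
       println(Long.MaxValue % shardWeightCount) // 7
       // Only indices 0 to 7 are ever produced by the unparenthesized form.
       println(hashes.map(buggyIndex(_, shardWeightCount)).distinct.sorted.mkString(","))
       // All indices 0 to 19 are produced once the parentheses are added.
       println(hashes.map(fixedIndex(_, shardWeightCount)).distinct.sorted.mkString(","))
     }
   }
   ```
   
   With a mask of 7, the unparenthesized form can never return an index above 7, regardless of how well the hash function distributes its output.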
   
   
   
   
   ### SeaTunnel Version
   
   v 2.1.3
   
   ### SeaTunnel Config
   
   ```conf
   None
   ```
   
   
   ### Running Command
   
   ```shell
   None
   ```
   
   
   ### Error Exception
   
   ```log
   None
   ```
   
   
   ### Flink or Spark Version
   
   _No response_
   
   ### Java or Scala Version
   
   _No response_
   
   ### Screenshots
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [seatunnel] github-actions[bot] closed issue #4770: [Bug] [seatunnel-connectors] [seatunnel-connector-spark-clickhouse] Data cannot be imported to all nodes by configuring split_mode and sharding_key

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed issue #4770: [Bug] [seatunnel-connectors] [seatunnel-connector-spark-clickhouse]  Data cannot be imported to all nodes by configuring split_mode and sharding_key
URL: https://github.com/apache/seatunnel/issues/4770




[GitHub] [incubator-seatunnel] wuxizhi777 commented on issue #4770: [Bug] [seatunnel-connectors] [seatunnel-connector-spark-clickhouse] Data cannot be imported to all nodes by configuring split_mode and sharding_key

Posted by "wuxizhi777 (via GitHub)" <gi...@apache.org>.
wuxizhi777 commented on issue #4770:
URL: https://github.com/apache/incubator-seatunnel/issues/4770#issuecomment-1551198805

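       // XXHashFactory comes from the lz4-java library (net.jpountz.xxhash).
       import java.nio.ByteBuffer
       import net.jpountz.xxhash.XXHashFactory
   
       val hashInstance = XXHashFactory.fastestInstance().hash64()
   
       val s = "hah,what,you,am,who,am,i,123,de,df,fd,CR5,QA6,T7H,E8G,B11I,123,ZW2,SN1,34,23"
   
       val apps = s.split(",")
   
       for (app <- apps) {
         // Parenthesize the `&` so the sign bit is cleared before `% 20` is applied.
         val offset = ((hashInstance.hash(ByteBuffer.wrap(app.getBytes), 0) & Long.MaxValue) % 20).toInt
         println(app)
         println(offset)
       }
   
   This is the correct way to write it.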
   




[GitHub] [seatunnel] github-actions[bot] commented on issue #4770: [Bug] [seatunnel-connectors] [seatunnel-connector-spark-clickhouse] Data cannot be imported to all nodes by configuring split_mode and sharding_key

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #4770:
URL: https://github.com/apache/seatunnel/issues/4770#issuecomment-1595510463

   This issue has been automatically marked as stale because it has not had any activity for 30 days. It will be closed in the next 7 days if no further activity occurs.




[GitHub] [incubator-seatunnel] wuxizhi777 commented on issue #4770: [Bug] [seatunnel-connectors] [seatunnel-connector-spark-clickhouse] Data cannot be imported to all nodes by configuring split_mode and sharding_key

Posted by "wuxizhi777 (via GitHub)" <gi...@apache.org>.
wuxizhi777 commented on issue #4770:
URL: https://github.com/apache/incubator-seatunnel/issues/4770#issuecomment-1550959725

   assign to me!!




[GitHub] [incubator-seatunnel] liugddx commented on issue #4770: [Bug] [seatunnel-connectors] [seatunnel-connector-spark-clickhouse] Data cannot be imported to all nodes by configuring split_mode and sharding_key

Posted by "liugddx (via GitHub)" <gi...@apache.org>.
liugddx commented on issue #4770:
URL: https://github.com/apache/incubator-seatunnel/issues/4770#issuecomment-1551164373

   > assign to me!!!
   
   Already assigned to you.




[GitHub] [seatunnel] github-actions[bot] commented on issue #4770: [Bug] [seatunnel-connectors] [seatunnel-connector-spark-clickhouse] Data cannot be imported to all nodes by configuring split_mode and sharding_key

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #4770:
URL: https://github.com/apache/seatunnel/issues/4770#issuecomment-1605185977

   This issue has been closed because it has not received a response for a long time. You can reopen it if you encounter a similar problem in the future.

