
Posted to dev@shardingsphere.apache.org by Sun Nianjun <ka...@outlook.com> on 2020/06/14 15:54:38 UTC

Some suggestions for sharding scaling

Hi, community,

    Over the last week I gained some experience using sharding scaling, and I have summarized some suggestions below that I hope to discuss with the community:


  1.  The dumper, importer, and channel form a typical producer-consumer pattern built mainly on Java's BlockingQueue. In my own use I adjusted a few details: I changed `offer` to `put` in MemoryChannel and made the queue somewhat larger. The dumper fills the queue very quickly while the importer is relatively slow, so even though `offer` has a timeout, a failed offer easily causes a RuntimeException.
  2.  I'm thinking about how to add a breakpoint resume function to scaling, because a problem during the migration process is very troublesome to recover from. Maybe we could record the scaling position in a local log or something similar; we have to make sure scaling can resume from the breakpoint after a failure.
  3.  Since ElasticJob has been resurrected, why not separate out the jobs in scaling and handle them with ElasticJob? A dedicated job system should be more robust and stable.
  4.  Loading an implementation of an interface through SPI is very common in ShardingSphere. I think the current dumper and importer should be extracted as default implementations, with more implementations added and loaded by SPI. BTW, an importer need not only import data into a database; it could also be used for encryption, for example.
  5.  I believe the proxy will be the bottleneck of the current architecture. Even when I tried scaling more than 40 million records from one MySQL instance to 8 new shards, Sharding-Proxy was a little bit 'slow'. If we want to scale billions of records from 16 shards to 32 shards, I don't think the proxy could withstand such pressure; we might use Sharding-JDBC for resharding instead.
  6.  Currently the parameters for `/scaling/job/start` are in YAML format. That's OK for API invocation, but it's not friendly for debugging. Should we add a function to translate the YAML configuration to JSON format for API invocation?
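
To illustrate point 1, here is a minimal sketch of the difference between a timed `offer` and `put` on a bounded BlockingQueue. It is not the actual MemoryChannel code: the class `ChannelSketch`, the methods `pushWithOffer` and `pushWithPut`, the queue size, and the timeouts are all made up for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a memory channel; not the real MemoryChannel code.
final class ChannelSketch {

    // Timed offer: returns false if the queue is still full when the timeout expires.
    static boolean pushWithOffer(BlockingQueue<String> queue, String record, long timeoutMillis) {
        try {
            return queue.offer(record, timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while offering", ex);
        }
    }

    // put: blocks until the consumer frees a slot, so a slow importer only slows the dumper down.
    static void pushWithPut(BlockingQueue<String> queue, String record) {
        try {
            queue.put(record);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while putting", ex);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(2);
        pushWithPut(queue, "record-1");
        pushWithPut(queue, "record-2");

        // The queue is full and nobody is consuming, so the timed offer gives up.
        System.out.println("offer accepted: " + pushWithOffer(queue, "record-3", 50));

        // A slow importer takes one record after a delay.
        Thread importer = new Thread(() -> {
            try {
                Thread.sleep(100);
                queue.take();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });
        importer.start();

        // put waits for the importer instead of failing.
        pushWithPut(queue, "record-3");
        importer.join();
        System.out.println("queue size: " + queue.size());
    }
}
```

With `put` the dumper is simply throttled to the importer's speed (backpressure); the trade-off is that a stuck importer blocks the dumper indefinitely, so some way to interrupt the dumper is still needed.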

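For point 2, a rough sketch of recording the position in a local file could look like the following. Everything here is an assumption for illustration: the class `PositionCheckpoint`, the file name, and the idea that the position is a single `long` (a real position might be a binlog file plus offset, or a primary key range).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical checkpoint store; the position is assumed to be a single long.
final class PositionCheckpoint {

    private final Path file;

    PositionCheckpoint(Path file) {
        this.file = file;
    }

    // Write to a temporary file first, then move it over the checkpoint,
    // so a crash in the middle of a write does not leave a truncated checkpoint.
    void save(long position) {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try {
            Files.write(tmp, Long.toString(position).getBytes(StandardCharsets.UTF_8));
            Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Returns 0 when no checkpoint exists yet, meaning "start from the beginning".
    long load() {
        if (!Files.exists(file)) {
            return 0L;
        }
        try {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            return Long.parseLong(text.trim());
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("scaling-sketch");
        PositionCheckpoint checkpoint = new PositionCheckpoint(dir.resolve("position.txt"));

        // First run: migrate records 1..5, saving the position after each one, then "crash".
        for (long id = checkpoint.load() + 1; id <= 5; id++) {
            checkpoint.save(id);
        }

        // Second run: a new process reads the checkpoint and continues after it.
        long resumeFrom = checkpoint.load() + 1;
        System.out.println("resuming from record " + resumeFrom);
    }
}
```

Because the position is saved after the record is imported, a crash between the two steps would replay the last record on resume, so the import should be idempotent. In practice the position would probably be saved per batch rather than per record.
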
ShardingSphere has a very active and open community; I hope the mentors can give some suggestions on this.

Regards

Re: Some suggestions for sharding scaling

Posted by KimmKing <ki...@apache.org>.

Thanks for these awesome suggestions.

1. "Breakpoint resume" would be an amazing feature for me; I look forward to more from you on this.
2. The other optimizations sound nice; we could clarify and discuss them one by one, and then move on.


