You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@storm.apache.org by mycFelix <gi...@git.apache.org> on 2016/01/25 12:39:50 UTC

[GitHub] storm pull request: make the txid continuous and bug fixed

GitHub user mycFelix opened a pull request:

    https://github.com/apache/storm/pull/1041

    make the txid continuous and bug fixed

    hello, i'm Felix.
    
    When we used Trident API and set Config.TOPOLOGY_MAX_SPOUT_PENDING greater than 1, assuming 100, we found the txid was incontinuous.
    
    That phenomenon has two effects: 
    
    *  When using TransactionalTridentKafkaSpout class, we got different txid and continuance offset value but the process speed would be slow, because some unnecessary loops in function MasterBatchCoordinator.sync(). And the greater Config.TOPOLOGY_MAX_SPOUT_PENDING, the slower speed.
    we printed some logs here:
    
    ````
    Config.TOPOLOGY_MAX_SPOUT_PENDING=100
    Config.TOPOLOGY_TRIDENT_BATCH_EMIT_INTERVAL_MILLIS=50
    
    14:05:00.337 [Thread-15-spout0] ERROR s.kafka.trident.TridentKafkaEmitter - emit:[id:135,offset:2546,nextOffset:2565]
    14:05:00.483 [Thread-15-spout0] ERROR s.kafka.trident.TridentKafkaEmitter - emit:[id:136,offset:2565,nextOffset:2584]
    14:05:00.495 [Thread-15-spout0] ERROR s.kafka.trident.TridentKafkaEmitter - emit:[id:197,offset:2584,nextOffset:2603]
    14:05:03.550 [Thread-15-spout0] ERROR s.kafka.trident.TridentKafkaEmitter - emit:[id:198,offset:2603,nextOffset:2622]
    14:05:03.593 [Thread-15-spout0] ERROR s.kafka.trident.TridentKafkaEmitter - emit:[id:199,offset:2622,nextOffset:2641]
    please NOTICE the timestamp from id 136 to 198 and offset.
    
    Config.TOPOLOGY_MAX_SPOUT_PENDING=1000
    Config.TOPOLOGY_TRIDENT_BATCH_EMIT_INTERVAL_MILLIS=50
    11:35:36.265 [Thread-15-spout0] ERROR s.kafka.trident.TridentKafkaEmitter - emit:[id:232,offset:228,nextOffset:247]
    11:35:36.305 [Thread-15-spout0] ERROR s.kafka.trident.TridentKafkaEmitter - emit:[id:233,offset:247,nextOffset:266]
    11:35:36.343 [Thread-15-spout0] ERROR s.kafka.trident.TridentKafkaEmitter - emit:[id:446,offset:266,nextOffset:285]
    11:35:41.345 [Thread-15-spout0] ERROR s.kafka.trident.TridentKafkaEmitter - emit:[id:1266,offset:285,nextOffset:304]
    11:35:47.063 [Thread-15-spout0] ERROR s.kafka.trident.TridentKafkaEmitter - emit:[id:1330,offset:304,nextOffset:323]
    11:35:56.221 [Thread-15-spout0] ERROR s.kafka.trident.TridentKafkaEmitter - emit:[id:1447,offset:323,nextOffset:342]
    please notice every log's timestamp,txid and offset.
    ````
    
    *  When using OpaqueTridentKafkaSpout class, we found that the different txid got same offset value which lead that we got repeated value from kafka.
    we printed some logs here:
    
    ````
    11:10:13.516 [Thread-15-spout0] ERROR s.kafka.trident.TridentKafkaEmitter - emit:[id:96,offset:1805,nextOffset:1824]
    11:10:13.567 [Thread-15-spout0] ERROR s.kafka.trident.TridentKafkaEmitter - emit:[id:142,offset:1824,nextOffset:1843]
    11:10:13.619 [Thread-15-spout0] ERROR s.kafka.trident.TridentKafkaEmitter - emit:[id:97,offset:1824,nextOffset:1843]
    11:10:13.670 [Thread-15-spout0] ERROR s.kafka.trident.TridentKafkaEmitter - emit:[id:98,offset:1843,nextOffset:1862]
    please NOTICE that id 142 and 97 got same kafka offset.
    ````
    
    We thought the txid's distribution algorithm needs to be with the continuous principle in MasterBatchCoordinator class. ONLY when the time windows and other condition is ready, the txid could be added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mycFelix/storm 0.9.x-branch

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/1041.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1041
    
----
commit 3abcd55a9b8105c7c5eccfd882fbca35fa16ce21
Author: mycFelix <my...@gmail.com>
Date:   2016-01-25T11:23:53Z

    make the txid continuous

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request #1041: make the txid continuous and bug fixed

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/storm/pull/1041


---