You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Joel Koshy <jj...@gmail.com> on 2014/02/13 23:40:03 UTC

Re: Review Request 17006: Check if there are hanging producer threads in stopping all entities

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17006/#review34432
-----------------------------------------------------------



system_test/utils/kafka_system_test_utils.py
<https://reviews.apache.org/r/17006/#comment64523>

    This does not seem right (or I may be misunderstanding the issue in this
    jira).
    
    This will return the PID of the locally running test-runner. You will need
    to parse out the PPID from the response of the ssh "start and echo pid"
    command - see kafka_system_test_utils.start_entity_in_background for how the
    PPID is obtained for the broker/console consumer, etc.
    
    


- Joel Koshy


On Jan. 28, 2014, 5:26 p.m., Guozhang Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17006/
> -----------------------------------------------------------
> 
> (Updated Jan. 28, 2014, 5:26 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1212
>     https://issues.apache.org/jira/browse/KAFKA-1212
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> KAFKA-1212.v3.4
> 
> 
> KAFKA-1212.v3.3
> 
> 
> KAFKA-1212.v3.2
> 
> 
> KAFKA-1212.v3.1
> 
> 
> KAFKA-1212.v3
> 
> 
> KAFKA-1212.v2
> 
> 
> Diffs
> -----
> 
>   system_test/utils/kafka_system_test_utils.py fb4a9c05bf6f39a7abf41126325ed5ca26bcc246 
> 
> Diff: https://reviews.apache.org/r/17006/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Guozhang Wang
> 
>


Re: Review Request 17006: Check if there are hanging producer threads in stopping all entities

Posted by Neha Narkhede <ne...@gmail.com>.

> On Feb. 13, 2014, 10:40 p.m., Joel Koshy wrote:
> > system_test/utils/kafka_system_test_utils.py, line 937
> > <https://reviews.apache.org/r/17006/diff/9/?file=452899#file452899line937>
> >
> >     This does not seem right (or I may be misunderstanding the issue in this
> >     jira).
> >     
> >     This will return the PID of the locally running test-runner. You will need
> >     to parse out the PPID from the response of the ssh "start and echo pid"
> >     command - see kafka_system_test_utils.start_entity_in_background for how the
> >     PPID is obtained for the broker/console consumer, etc.
> >     
> >

Guozhang's reply to the same question I asked as well :)

"Remotely a producer performance process will be started, which will terminate itself when the number of messages specified has been sent. So we only need to stop the local script which periodically kicks off new producer performance remotely."


- Neha


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17006/#review34432
-----------------------------------------------------------


On Jan. 28, 2014, 5:26 p.m., Guozhang Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17006/
> -----------------------------------------------------------
> 
> (Updated Jan. 28, 2014, 5:26 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1212
>     https://issues.apache.org/jira/browse/KAFKA-1212
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> KAFKA-1212.v3.4
> 
> 
> KAFKA-1212.v3.3
> 
> 
> KAFKA-1212.v3.2
> 
> 
> KAFKA-1212.v3.1
> 
> 
> KAFKA-1212.v3
> 
> 
> KAFKA-1212.v2
> 
> 
> Diffs
> -----
> 
>   system_test/utils/kafka_system_test_utils.py fb4a9c05bf6f39a7abf41126325ed5ca26bcc246 
> 
> Diff: https://reviews.apache.org/r/17006/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Guozhang Wang
> 
>


Re: Review Request 17006: Check if there are hanging producer threads in stopping all entities

Posted by Guozhang Wang <gu...@linkedin.com>.

> On Feb. 13, 2014, 10:40 p.m., Joel Koshy wrote:
> > system_test/utils/kafka_system_test_utils.py, line 937
> > <https://reviews.apache.org/r/17006/diff/9/?file=452899#file452899line937>
> >
> >     This does not seem right (or I may be misunderstanding the issue in this
> >     jira).
> >     
> >     This will return the PID of the locally running test-runner. You will need
> >     to parse out the PPID from the response of the ssh "start and echo pid"
> >     command - see kafka_system_test_utils.start_entity_in_background for how the
> >     PPID is obtained for the broker/console consumer, etc.
> >     
> >
> 
> Neha Narkhede wrote:
>     Guozhang's reply to the same question I asked as well :)
>     
>     "Remotely a producer performance process will be started, which will terminate itself when the number of messages specified has been sent. So we only need to stop the local script which periodically kicks off new producer performance remotely."
>
> 
> Joel Koshy wrote:
>     I think the loop that spawns off remote producers will end when the "stopBackgroundProducer" flag is set to true (which is already the case). What seems to happen with this patch is the local pid (of the test runner is taken) and in stop remote entities, we will ssh to that remote entity and SIGTERM a pid that doesn't even exist on that remote entity. i.e., I don't think this actually kills the local script. Furthermore in this case I do think we should SIGTERM the remote producer and not wait for them to terminate themselves.

The issue here is that if one test case ends abnormally, the script will not be terminated, and hence will keep spawning remote producers and hence mess up with the subsequent test cases. What this patch does it that in stop_all_remote_running_processes, which is called at the end of each test case no matter if it exists normally or not, check if the local script still exists. If yes, signal it to terminate. The PID value is actually not used, we only check the length of the pid list. Does this make more sense now?


- Guozhang


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17006/#review34432
-----------------------------------------------------------


On Jan. 28, 2014, 5:26 p.m., Guozhang Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17006/
> -----------------------------------------------------------
> 
> (Updated Jan. 28, 2014, 5:26 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1212
>     https://issues.apache.org/jira/browse/KAFKA-1212
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> KAFKA-1212.v3.4
> 
> 
> KAFKA-1212.v3.3
> 
> 
> KAFKA-1212.v3.2
> 
> 
> KAFKA-1212.v3.1
> 
> 
> KAFKA-1212.v3
> 
> 
> KAFKA-1212.v2
> 
> 
> Diffs
> -----
> 
>   system_test/utils/kafka_system_test_utils.py fb4a9c05bf6f39a7abf41126325ed5ca26bcc246 
> 
> Diff: https://reviews.apache.org/r/17006/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Guozhang Wang
> 
>


Re: Review Request 17006: Check if there are hanging producer threads in stopping all entities

Posted by Joel Koshy <jj...@gmail.com>.

> On Feb. 13, 2014, 10:40 p.m., Joel Koshy wrote:
> > system_test/utils/kafka_system_test_utils.py, line 937
> > <https://reviews.apache.org/r/17006/diff/9/?file=452899#file452899line937>
> >
> >     This does not seem right (or I may be misunderstanding the issue in this
> >     jira).
> >     
> >     This will return the PID of the locally running test-runner. You will need
> >     to parse out the PPID from the response of the ssh "start and echo pid"
> >     command - see kafka_system_test_utils.start_entity_in_background for how the
> >     PPID is obtained for the broker/console consumer, etc.
> >     
> >
> 
> Neha Narkhede wrote:
>     Guozhang's reply to the same question I asked as well :)
>     
>     "Remotely a producer performance process will be started, which will terminate itself when the number of messages specified has been sent. So we only need to stop the local script which periodically kicks off new producer performance remotely."
>

I think the loop that spawns off remote producers will end when the "stopBackgroundProducer" flag is set to true (which is already the case). What seems to happen with this patch is the local pid (of the test runner is taken) and in stop remote entities, we will ssh to that remote entity and SIGTERM a pid that doesn't even exist on that remote entity. i.e., I don't think this actually kills the local script. Furthermore in this case I do think we should SIGTERM the remote producer and not wait for them to terminate themselves.


- Joel


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17006/#review34432
-----------------------------------------------------------


On Jan. 28, 2014, 5:26 p.m., Guozhang Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17006/
> -----------------------------------------------------------
> 
> (Updated Jan. 28, 2014, 5:26 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1212
>     https://issues.apache.org/jira/browse/KAFKA-1212
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> KAFKA-1212.v3.4
> 
> 
> KAFKA-1212.v3.3
> 
> 
> KAFKA-1212.v3.2
> 
> 
> KAFKA-1212.v3.1
> 
> 
> KAFKA-1212.v3
> 
> 
> KAFKA-1212.v2
> 
> 
> Diffs
> -----
> 
>   system_test/utils/kafka_system_test_utils.py fb4a9c05bf6f39a7abf41126325ed5ca26bcc246 
> 
> Diff: https://reviews.apache.org/r/17006/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Guozhang Wang
> 
>