Posted to dev@plc4x.apache.org by Julian Feinauer <j....@pragmaticminds.de> on 2019/07/03 17:58:35 UTC

Possibly undefined behavior with PooledPlcDriverManager

Hi all,

Over the last few weeks we have repeatedly observed strange behavior when connecting to Siemens S7 devices.
We have not yet been able to track it down entirely, but I suspect that it is an issue with the PooledPlcDriverManager.

What's the issue?
When doing requests (either via OPM or the “regular” API) we reach a point where all subsequent requests simply fail (and in some cases we were no longer able to send requests to the PLC from other instances, so it looks like the internal server went down).

What's the setup?
If I remember correctly, all situations where this occurred used the pool as the basis.
We had it both with OPM and the normal API but NOT with the Scraper.

I remember that I spent something like a whole day at the Hackathon in Mallorca getting all the timeout handling to work correctly, as the S7 does not like it when you simply cancel your request futures.
Currently there are two “suspects” from my side.

First, the pool calls the “.connect()” method on a new connection it establishes, but by API convention you also have to do that in your code, so it gets called multiple times, which could mess things up.
Second, connecting can also time out (but it is not a future in our API), so in the Scraper I implemented it as a Future with a timeout (as I’m unsure how everything behaves if the pool starts to initialize a connection but then the “waitTime” times out and it abandons the attempt).
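
To illustrate the second suspect, here is a minimal Java sketch of a blocking connect wrapped in a future with a timeout. It is not the Scraper's code and not PLC4X API: ConnectWithTimeoutSketch, FakeConnection and the connect helper are hypothetical names made up for illustration. The one point it tries to show is that a connect attempt that outlives its timeout should be closed when it finally completes, rather than simply abandoned.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: none of these names come from PLC4X.
class ConnectWithTimeoutSketch {

    /** Stand-in for a connection handle that must be closed when no longer needed. */
    static class FakeConnection implements AutoCloseable {
        final AtomicBoolean closed = new AtomicBoolean(false);
        @Override public void close() { closed.set(true); }
    }

    /**
     * Runs a blocking connect on the executor and waits at most timeoutMillis.
     * On timeout the attempt is not cancelled; instead a callback closes the
     * connection if it still arrives later, so it is not left behind.
     */
    static <C extends AutoCloseable> C connect(Callable<C> blockingConnect,
                                               ExecutorService executor,
                                               long timeoutMillis) throws Exception {
        CompletableFuture<C> future = new CompletableFuture<>();
        executor.execute(() -> {
            try {
                future.complete(blockingConnect.call());
            } catch (Exception e) {
                future.completeExceptionally(e);
            }
        });
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Nobody will use a late result any more, so close it on arrival.
            future.thenAccept(late -> {
                try { late.close(); } catch (Exception ignored) { }
            });
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            FakeConnection c = connect(FakeConnection::new, executor, 1000);
            System.out.println("connected, closed=" + c.closed.get());
        } finally {
            executor.shutdown();
        }
    }
}
```

Whether the pool's “waitTime” handling needs something like this is exactly the open question above; the sketch only shows one way to avoid leaking a half-initialized connection.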

Just wanted to share my thoughts with you… I hope I find some time in the next few days to set up an MWE and try it out.

Julian

Re: Possibly undefined behavior with PooledPlcDriverManager

Posted by Christofer Dutz <ch...@c-ware.de>.
Hi Julian,

Would also be a great test for the updated Release Documentation ;-)

Chris

Re: Possibly undefined behavior with PooledPlcDriverManager

Posted by Julian Feinauer <j....@pragmaticminds.de>.
I guess that's what they're for :)

So I'll try to get a patch ready on that branch and would then like to prepare an RC based on it as soon as possible.

J

Re: Possibly undefined behavior with PooledPlcDriverManager

Posted by Christofer Dutz <ch...@c-ware.de>.
Ah ... yes ... you are right.

That should work :-)

Good thing we have these branches ;-)

Chris

Re: Possibly undefined behavior with PooledPlcDriverManager

Posted by Julian Feinauer <j....@pragmaticminds.de>.
Hi Chris,

But as it's a hotfix, it would branch off the rel/0.4 branch and be cherry-picked onto develop in parallel.
So this should not be a problem, should it?

J

Re: Possibly undefined behavior with PooledPlcDriverManager

Posted by Christofer Dutz <ch...@c-ware.de>.
Hi Julian,

this time we would need to release plc4x-build-tools first, as we now depend on it.
As soon as that’s done, we could update the versions to release versions and start releasing plc4x again.

Chris

Re: Possibly undefined behavior with PooledPlcDriverManager

Posted by Julian Feinauer <j....@pragmaticminds.de>.
Hi all,

Just a quick update. Tim was able to reproduce the problem: it is caused by connect being called multiple times on a connection (this then creates a zombie connection in the background).
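
One way to make repeated connect calls harmless is a guard that forwards only the first call. The Java sketch below is an assumption for illustration only: it is not the actual patch, and Connection, CountingConnection and ConnectOnce are hypothetical types, not PLC4X classes.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: these types are illustrative and not part of the PLC4X API.
class IdempotentConnectSketch {

    /** Minimal stand-in for a connection whose connect() must not run twice. */
    interface Connection {
        void connect() throws Exception;
    }

    /** Test double that counts how many times the transport was actually opened. */
    static class CountingConnection implements Connection {
        final AtomicInteger opened = new AtomicInteger();
        @Override public void connect() { opened.incrementAndGet(); }
    }

    /** Decorator that forwards only the first connect() call to the delegate. */
    static class ConnectOnce implements Connection {
        private final Connection delegate;
        private final AtomicBoolean started = new AtomicBoolean(false);

        ConnectOnce(Connection delegate) { this.delegate = delegate; }

        @Override public void connect() throws Exception {
            // compareAndSet succeeds for exactly one caller, also across threads.
            if (started.compareAndSet(false, true)) {
                try {
                    delegate.connect();
                } catch (Exception e) {
                    started.set(false); // allow a retry after a failed attempt
                    throw e;
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        CountingConnection raw = new CountingConnection();
        Connection guarded = new ConnectOnce(raw);
        guarded.connect(); // e.g. called by the pool
        guarded.connect(); // e.g. called again by application code
        System.out.println("transport opened " + raw.opened.get() + " time(s)");
    }
}
```

With the guard in place the delegate is opened once, however often connect is called. Note that a second caller returns immediately and does not wait for the first connect to finish, so a real fix might need stronger synchronization.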

I consider this issue severe (we had it in prod yesterday).
I will make a Jira issue tomorrow and prepare a patch.

I suggest releasing this as hotfix 0.4.1 ASAP.
Any opinions on that?

Julian
