You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Stefan Miklosovic (Jira)" <ji...@apache.org> on 2022/02/05 14:45:00 UTC

[jira] [Updated] (CASSANDRA-17350) Investigate and remove Windows-specific threading-related code in copyutils.py

     [ https://issues.apache.org/jira/browse/CASSANDRA-17350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Miklosovic updated CASSANDRA-17350:
------------------------------------------
    Description: 
There are bits of the code to be removed or refactored related to how Windows were treating threading in copyutils.py as Windows is not longer supported.

[~Bowen Song] put it best so I just copy it here from GitHub PR for 16956, this code relates to FilesReader, FeedingProcess and ChildProcess Python classes.

We agreed on the fact that 16956 may be merged without this being addressed as it requires further investigation in the matter which would unncessarily postpone and delay it.

Bowen's take on this:

I believe that we should move the code from on_fork() to __init__(), and may also remove the on_fork() method if it's no longer used.

The problem with Windows and Python multiprocessing is that Windows doesn't support fork(), therefore Python implemented a workaround. On Windows, Python multiprocessing library uses pickle to serialise everything in memory, spawn a new process, and then restores the memory content from the serialised data. The ReceivingChannel and SendingChannel objects are not serialisable because they have file descriptors (which I believe it's called a file handle on Windows) in them, therefore the code has to handle them after the fake fork().

However, I'm concerned that moving the code from on_fork() to __init__() may have other unintended side effects. For example, the file descriptors (FDs) will be opened before the fork, in some edge cases the fork may never happen (e.g.: an exception raised in or just after the init, but before the fork). Where's the code responsible for closing the FDs on the parent process side? Will this cause any FD leak? This clearly requires further work to find out.

To be honest, I don't think removing the comments without addressing the above is a wise move. Future developers wouldn't have the opportunity to understand why the code is written in this way if the comment is removed. 

  was:
There are bits of the code to be removed or refactored related to how Windows were treating threading in copyutils.py as Windows is not longer supported.

FilesReader, FeedingProcess and ChildProcess:

[~Bowen Song] put it the best so I just copy it here from GitHub PR for 16956, this code relates to FilesReader, FeedingProcess and ChildProcess Python classes.

We agreed on the fact that 16956 may be merged without this being addressed as it requires further investigation in the matter which would unncessarily postpone and delay it.

Bowen's take on this:

I believe that we should move the code from on_fork() to __init__(), and may also remove the on_fork() method if it's no longer used.

The problem with Windows and Python multiprocessing is that Windows doesn't support fork(), therefore Python implemented a workaround. On Windows, Python multiprocessing library uses pickle to serialise everything in memory, spawn a new process, and then restores the memory content from the serialised data. The ReceivingChannel and SendingChannel objects are not serialisable because they have file descriptors (which I believe it's called a file handle on Windows) in them, therefore the code has to handle them after the fake fork().

However, I'm concerned that moving the code from on_fork() to __init__() may have other unintended side effects. For example, the file descriptors (FDs) will be opened before the fork, in some edge cases the fork may never happen (e.g.: an exception raised in or just after the init, but before the fork). Where's the code responsible for closing the FDs on the parent process side? Will this cause any FD leak? This clearly requires further work to find out.

To be honest, I don't think removing the comments without addressing the above is a wise move. Future developers wouldn't have the opportunity to understand why the code is written in this way if the comment is removed. 


> Investigate and remove Windows-specific threading-related code in copyutils.py
> ------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-17350
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17350
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tool/cqlsh
>            Reporter: Stefan Miklosovic
>            Priority: Normal
>
> There are bits of the code to be removed or refactored related to how Windows were treating threading in copyutils.py as Windows is not longer supported.
> [~Bowen Song] put it best so I just copy it here from GitHub PR for 16956, this code relates to FilesReader, FeedingProcess and ChildProcess Python classes.
> We agreed on the fact that 16956 may be merged without this being addressed as it requires further investigation in the matter which would unncessarily postpone and delay it.
> Bowen's take on this:
> I believe that we should move the code from on_fork() to __init__(), and may also remove the on_fork() method if it's no longer used.
> The problem with Windows and Python multiprocessing is that Windows doesn't support fork(), therefore Python implemented a workaround. On Windows, Python multiprocessing library uses pickle to serialise everything in memory, spawn a new process, and then restores the memory content from the serialised data. The ReceivingChannel and SendingChannel objects are not serialisable because they have file descriptors (which I believe it's called a file handle on Windows) in them, therefore the code has to handle them after the fake fork().
> However, I'm concerned that moving the code from on_fork() to __init__() may have other unintended side effects. For example, the file descriptors (FDs) will be opened before the fork, in some edge cases the fork may never happen (e.g.: an exception raised in or just after the init, but before the fork). Where's the code responsible for closing the FDs on the parent process side? Will this cause any FD leak? This clearly requires further work to find out.
> To be honest, I don't think removing the comments without addressing the above is a wise move. Future developers wouldn't have the opportunity to understand why the code is written in this way if the comment is removed. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org