You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/10 02:15:36 UTC

[GitHub] [arrow-julia] baumgold opened a new pull request, #325: allow ntasks to be 0 and determine whetheher to use threads based on nthreads rather than ntasks

baumgold opened a new pull request, #325:
URL: https://github.com/apache/arrow-julia/pull/325

   This allows configuring an unbuffered channel while still using multithreading


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-julia] codecov-commenter commented on pull request #325: allow ntasks to be 0 and determine whetheher to use threads based on nthreads rather than ntasks

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #325:
URL: https://github.com/apache/arrow-julia/pull/325#issuecomment-1151854796

   # [Codecov](https://codecov.io/gh/apache/arrow-julia/pull/325?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#325](https://codecov.io/gh/apache/arrow-julia/pull/325?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (d3123fd) into [main](https://codecov.io/gh/apache/arrow-julia/commit/aeda3f41ace7b560a39a7ec669a6399ae8e60796?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (aeda3f4) will **increase** coverage by `0.02%`.
   > The diff coverage is `100.00%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##             main     #325      +/-   ##
   ==========================================
   + Coverage   87.15%   87.17%   +0.02%     
   ==========================================
     Files          26       26              
     Lines        3301     3299       -2     
   ==========================================
   - Hits         2877     2876       -1     
   + Misses        424      423       -1     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-julia/pull/325?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [src/write.jl](https://codecov.io/gh/apache/arrow-julia/pull/325/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3JjL3dyaXRlLmps) | `96.92% <100.00%> (+0.28%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-julia/pull/325?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-julia/pull/325?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [aeda3f4...d3123fd](https://codecov.io/gh/apache/arrow-julia/pull/325?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-julia] ericphanson commented on a diff in pull request #325: allow ntasks to be 0 and determine whetheher to use threads based on nthreads rather than ntasks

Posted by GitBox <gi...@apache.org>.
ericphanson commented on code in PR #325:
URL: https://github.com/apache/arrow-julia/pull/325#discussion_r894624797


##########
src/write.jl:
##########
@@ -135,16 +135,13 @@ mutable struct Writer{T<:IO}
 end
 
 function Base.open(::Type{Writer}, io::T, compress::Union{Nothing,LZ4FrameCompressor,<:AbstractVector{LZ4FrameCompressor},ZstdCompressor,<:AbstractVector{ZstdCompressor}}, writetofile::Bool, largelists::Bool, denseunions::Bool, dictencode::Bool, dictencodenested::Bool, alignment::Integer, maxdepth::Integer, ntasks::Integer, meta::Union{Nothing,Any}, colmeta::Union{Nothing,Any}, closeio::Bool) where {T<:IO}
-    if ntasks < 1
-        throw(ArgumentError("ntasks keyword argument must be > 0; pass `ntasks=1` to disable multithreaded writing"))
-    end
     msgs = OrderedChannel{Message}(ntasks)
     schema = Ref{Tables.Schema}()
     firstcols = Ref{Any}()
     dictencodings = Dict{Int64,Any}() # Lockable{DictEncoding}
     blocks = (Block[], Block[])
     # start message writing from channel
-    threaded = ntasks > 1
+    threaded = Threads.nthreads() > 1

Review Comment:
   ah ok, thank you for clarifying, that seems reasonable & non-breaking to me too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-julia] quinnj merged pull request #325: allow ntasks to be 0 and determine whetheher to use threads based on nthreads rather than ntasks

Posted by GitBox <gi...@apache.org>.
quinnj merged PR #325:
URL: https://github.com/apache/arrow-julia/pull/325


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-julia] baumgold commented on a diff in pull request #325: allow ntasks to be 0 and determine whetheher to use threads based on nthreads rather than ntasks

Posted by GitBox <gi...@apache.org>.
baumgold commented on code in PR #325:
URL: https://github.com/apache/arrow-julia/pull/325#discussion_r894582858


##########
src/write.jl:
##########
@@ -135,16 +135,13 @@ mutable struct Writer{T<:IO}
 end
 
 function Base.open(::Type{Writer}, io::T, compress::Union{Nothing,LZ4FrameCompressor,<:AbstractVector{LZ4FrameCompressor},ZstdCompressor,<:AbstractVector{ZstdCompressor}}, writetofile::Bool, largelists::Bool, denseunions::Bool, dictencode::Bool, dictencodenested::Bool, alignment::Integer, maxdepth::Integer, ntasks::Integer, meta::Union{Nothing,Any}, colmeta::Union{Nothing,Any}, closeio::Bool) where {T<:IO}
-    if ntasks < 1
-        throw(ArgumentError("ntasks keyword argument must be > 0; pass `ntasks=1` to disable multithreaded writing"))
-    end
     msgs = OrderedChannel{Message}(ntasks)
     schema = Ref{Tables.Schema}()
     firstcols = Ref{Any}()
     dictencodings = Dict{Int64,Any}() # Lockable{DictEncoding}
     blocks = (Block[], Block[])
     # start message writing from channel
-    threaded = ntasks > 1
+    threaded = Threads.nthreads() > 1

Review Comment:
   Thanks for your comment.  I don't think this change is breaking.  In this context, `threaded` just determines whether to use threads or coroutines.  In either case the processing of each partition is done in parallel and then sequenced through an `OrderedChannel` to be serialized to the IO.  The IO serialization is sequential (not parallel) since order matters so no locking is required.  We should use threads when they are available, otherwise we should use coroutines.
   
   The `ntasks` parameter is orthogonal to the choice of threads or coroutines.  `ntasks` defines how large the buffer can be within the `OrderedChannel`.  A larger buffer means the parallel processing of partitions can outpace  the sequential serialization of the results to the IO since the processed results can be cached in the buffer.  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-julia] ericphanson commented on a diff in pull request #325: allow ntasks to be 0 and determine whetheher to use threads based on nthreads rather than ntasks

Posted by GitBox <gi...@apache.org>.
ericphanson commented on code in PR #325:
URL: https://github.com/apache/arrow-julia/pull/325#discussion_r894390429


##########
src/write.jl:
##########
@@ -135,16 +135,13 @@ mutable struct Writer{T<:IO}
 end
 
 function Base.open(::Type{Writer}, io::T, compress::Union{Nothing,LZ4FrameCompressor,<:AbstractVector{LZ4FrameCompressor},ZstdCompressor,<:AbstractVector{ZstdCompressor}}, writetofile::Bool, largelists::Bool, denseunions::Bool, dictencode::Bool, dictencodenested::Bool, alignment::Integer, maxdepth::Integer, ntasks::Integer, meta::Union{Nothing,Any}, colmeta::Union{Nothing,Any}, closeio::Bool) where {T<:IO}
-    if ntasks < 1
-        throw(ArgumentError("ntasks keyword argument must be > 0; pass `ntasks=1` to disable multithreaded writing"))
-    end
     msgs = OrderedChannel{Message}(ntasks)
     schema = Ref{Tables.Schema}()
     firstcols = Ref{Any}()
     dictencodings = Dict{Int64,Any}() # Lockable{DictEncoding}
     blocks = (Block[], Block[])
     # start message writing from channel
-    threaded = ntasks > 1
+    threaded = Threads.nthreads() > 1

Review Comment:
   isn't this breaking? since someone could've been relying on "to disable multithreaded writing, pass `ntasks=1`" if they are writing to some `io` handle that doesn't use locking internally. (I wasn't so maybe it's theoretical, just looks like a strange change to me)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org