You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "baumgold (via GitHub)" <gi...@apache.org> on 2023/04/14 22:58:39 UTC

[GitHub] [arrow-julia] baumgold opened a new pull request, #428: Base.isdone for Stream

baumgold opened a new pull request, #428:
URL: https://github.com/apache/arrow-julia/pull/428

   Implement Base.isdone for Arrow.Stream
   
   https://github.com/JuliaLang/julia/blob/v1.8.5/base/essentials.jl#L882-L895


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-julia] quinnj merged pull request #428: Base.isdone for Stream

Posted by "quinnj (via GitHub)" <gi...@apache.org>.
quinnj merged PR #428:
URL: https://github.com/apache/arrow-julia/pull/428


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-julia] quinnj commented on a diff in pull request #428: Base.isdone for Stream

Posted by "quinnj (via GitHub)" <gi...@apache.org>.
quinnj commented on code in PR #428:
URL: https://github.com/apache/arrow-julia/pull/428#discussion_r1174504543


##########
src/table.jl:
##########
@@ -80,9 +80,17 @@ function Stream(inputs::Vector{ArrowBlob}; convert::Bool=true)
     Stream(inputs, inputindex, batchiterator, names, types, schema, dictencodings, dictencoded, convert, compression)
 end
 
-Stream(input, pos::Integer=1, len=nothing; kw...) = Stream([ArrowBlob(tobytes(input), pos, len)]; kw...)
-Stream(input::Vector{UInt8}, pos::Integer=1, len=nothing; kw...) = Stream([ArrowBlob(tobytes(input), pos, len)]; kw...)
-Stream(inputs::Vector; kw...) = Stream([ArrowBlob(tobytes(x), 1, nothing) for x in inputs]; kw...)
+function Stream(input, pos::Integer=1, len=nothing; kw...)
+    b = tobytes(input)
+    isempty(b) ? Stream(ArrowBlob[]; kw...) : Stream([ArrowBlob(b, pos, len)]; kw...)
+end
+
+function Stream(input::Vector{UInt8}, pos::Integer=1, len=nothing; kw...)
+    b = tobytes(input)
+    isempty(b) ? Stream(ArrowBlob[]; kw...) : Stream([ArrowBlob(b, pos, len)]; kw...)
+end
+
+Stream(inputs::AbstractVector; kw...) = Stream([ArrowBlob(tobytes(x), 1, nothing) for x in inputs]; kw...)

Review Comment:
   Do we not want to do a similar check for each input element?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-julia] quinnj commented on pull request #428: Base.isdone for Stream

Posted by "quinnj (via GitHub)" <gi...@apache.org>.
quinnj commented on PR #428:
URL: https://github.com/apache/arrow-julia/pull/428#issuecomment-1518938619

   Hmmmm, I'm not sure I'm following the changes or reasoning here. Can you explain how `isempty` changes the current `Arrow.Stream` state? I'm not super familiar w/ `Base.isdone` or how it's used for other types or why; if you could share an example, that would be great. Another question that comes to mind is why we shouldn't define `isempty(::Arrow.Stream)` ourselves if it's causing problems?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-julia] baumgold commented on pull request #428: Base.isdone for Stream

Posted by "baumgold (via GitHub)" <gi...@apache.org>.
baumgold commented on PR #428:
URL: https://github.com/apache/arrow-julia/pull/428#issuecomment-1520107608

   @quinnj - Iterating on Arrow.Stream modifies the `inputindex` field, making it stateful iteration rather than stateless.  The default implementation for Base.isempty uses iteration to check if there are further elements remaining:
   
   https://github.com/JuliaLang/julia/blob/v1.8.5/base/essentials.jl#L785-L789
   
   Unfortunately this doesn't work for stateful iterators since calling Base.isdone modifies the iterator's state.  A solution to this is to define Base.isdone:
   
   https://github.com/JuliaLang/julia/blob/v1.8.5/base/essentials.jl#L882-L895
   
   Note the doc-string explains this quite clearly:
   
   > This function provides a fast-path hint for iterator completion. This is useful for mutable iterators that want to avoid having elements consumed, if they are not going to be exposed to the user (e.g. to check for done-ness in `isempty` or `zip`). Mutable iterators that want to opt into this feature should define an isdone method that returns true/false depending on whether the iterator is done or not. Stateless iterators need not implement this function. If the result is `missing`, callers may go ahead and compute `iterate(x, state...) === nothing` to compute a definite answer.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-julia] quinnj commented on a diff in pull request #428: Base.isdone for Stream

Posted by "quinnj (via GitHub)" <gi...@apache.org>.
quinnj commented on code in PR #428:
URL: https://github.com/apache/arrow-julia/pull/428#discussion_r1203357397


##########
src/table.jl:
##########
@@ -80,9 +80,17 @@ function Stream(inputs::Vector{ArrowBlob}; convert::Bool=true)
     Stream(inputs, inputindex, batchiterator, names, types, schema, dictencodings, dictencoded, convert, compression)
 end
 
-Stream(input, pos::Integer=1, len=nothing; kw...) = Stream([ArrowBlob(tobytes(input), pos, len)]; kw...)
-Stream(input::Vector{UInt8}, pos::Integer=1, len=nothing; kw...) = Stream([ArrowBlob(tobytes(input), pos, len)]; kw...)
-Stream(inputs::Vector; kw...) = Stream([ArrowBlob(tobytes(x), 1, nothing) for x in inputs]; kw...)
+function Stream(input, pos::Integer=1, len=nothing; kw...)
+    b = tobytes(input)
+    isempty(b) ? Stream(ArrowBlob[]; kw...) : Stream([ArrowBlob(b, pos, len)]; kw...)
+end
+
+function Stream(input::Vector{UInt8}, pos::Integer=1, len=nothing; kw...)
+    b = tobytes(input)
+    isempty(b) ? Stream(ArrowBlob[]; kw...) : Stream([ArrowBlob(b, pos, len)]; kw...)
+end
+
+Stream(inputs::AbstractVector; kw...) = Stream([ArrowBlob(tobytes(x), 1, nothing) for x in inputs]; kw...)

Review Comment:
   Bump; it seems like we should filter out empty elements in the `inputs` array here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-julia] baumgold commented on a diff in pull request #428: Base.isdone for Stream

Posted by "baumgold (via GitHub)" <gi...@apache.org>.
baumgold commented on code in PR #428:
URL: https://github.com/apache/arrow-julia/pull/428#discussion_r1210704253


##########
src/table.jl:
##########
@@ -80,9 +80,17 @@ function Stream(inputs::Vector{ArrowBlob}; convert::Bool=true)
     Stream(inputs, inputindex, batchiterator, names, types, schema, dictencodings, dictencoded, convert, compression)
 end
 
-Stream(input, pos::Integer=1, len=nothing; kw...) = Stream([ArrowBlob(tobytes(input), pos, len)]; kw...)
-Stream(input::Vector{UInt8}, pos::Integer=1, len=nothing; kw...) = Stream([ArrowBlob(tobytes(input), pos, len)]; kw...)
-Stream(inputs::Vector; kw...) = Stream([ArrowBlob(tobytes(x), 1, nothing) for x in inputs]; kw...)
+function Stream(input, pos::Integer=1, len=nothing; kw...)
+    b = tobytes(input)
+    isempty(b) ? Stream(ArrowBlob[]; kw...) : Stream([ArrowBlob(b, pos, len)]; kw...)
+end
+
+function Stream(input::Vector{UInt8}, pos::Integer=1, len=nothing; kw...)
+    b = tobytes(input)
+    isempty(b) ? Stream(ArrowBlob[]; kw...) : Stream([ArrowBlob(b, pos, len)]; kw...)
+end
+
+Stream(inputs::AbstractVector; kw...) = Stream([ArrowBlob(tobytes(x), 1, nothing) for x in inputs]; kw...)

Review Comment:
   Agreed, fixed.  Thanks.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org