Posted to issues@arrow.apache.org by GitBox <gi...@apache.org> on 2022/09/07 12:57:32 UTC

[GitHub] [arrow-julia] TanookiToad opened a new issue, #336: Invalid argument error

TanookiToad opened a new issue, #336:
URL: https://github.com/apache/arrow-julia/issues/336

   If you try to save a loaded table into the same file, it will lead to an invalid argument error.
   
   Seems like it's caused by mmap on Windows. See JuliaData/CSV.jl#70.
   
   ```jl
   using Arrow
   using DataFrames
   
   df = DataFrame(rand(100, 100), :auto)
   Arrow.write("test.arrow", df)
   
   df = Arrow.Table("test.arrow")
   Arrow.write("test.arrow", df)
   ```
   
   The last line will raise an error. 
   
   ```jl
   ERROR: SystemError: opening file "test.arrow": Invalid argument
   Stacktrace:
     [1] systemerror(p::String, errno::Int32; extrainfo::Nothing)
       @ Base .\error.jl:174
     [2] #systemerror#68
       @ .\error.jl:173 [inlined]
     [3] systemerror
       @ .\error.jl:173 [inlined]
     [4] open(fname::String; lock::Bool, read::Nothing, write::Nothing, create::Nothing, truncate::Bool, append::Nothing)
       @ Base .\iostream.jl:293
     [5] open(fname::String, mode::String; lock::Bool)
       @ Base .\iostream.jl:355
     [6] open(fname::String, mode::String)
       @ Base .\iostream.jl:355
     [7] open(::Arrow.var"#116#117"{Nothing, Nothing, Bool, Nothing, Bool, Bool, Bool, Int64, Int64, Float64, Bool, Arrow.Table}, ::String, ::Vararg{String}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
       @ Base .\io.jl:328
     [8] open(::Function, ::String, ::String)
       @ Base .\io.jl:328
     [9] #write#115
       @ C:\Users\R9000K\.julia\packages\Arrow\SFb8h\src\write.jl:57 [inlined]
    [10] write(file_path::String, tbl::Arrow.Table)
       @ Arrow C:\Users\R9000K\.julia\packages\Arrow\SFb8h\src\write.jl:57
    [11] top-level scope
       @ Untitled-1:8
   ```
   
   However, it works when the table is saved under a different file name.
   
   ```jl
   Arrow.write("test1.arrow", df)
   ```
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-julia] bkamins commented on issue #336: Invalid argument error

Posted by GitBox <gi...@apache.org>.
bkamins commented on issue #336:
URL: https://github.com/apache/arrow-julia/issues/336#issuecomment-1336105104

   @TanookiToad - this is strange: `copy(df)` and `df[:, :]` are almost the same (their only difference is how metadata is handled, but that should not affect the result here).




[GitHub] [arrow-julia] TanookiToad commented on issue #336: Invalid argument error

Posted by GitBox <gi...@apache.org>.
TanookiToad commented on issue #336:
URL: https://github.com/apache/arrow-julia/issues/336#issuecomment-1335930504

   Just found out that if you use `DataFrame(Arrow.Table("test.arrow"))[:, :]` instead of `copy(DataFrame(Arrow.Table("test.arrow")))`, the loaded file can be overwritten on Windows. I don't know why these two behave differently.




[GitHub] [arrow-julia] bkamins commented on issue #336: Invalid argument error

Posted by GitBox <gi...@apache.org>.
bkamins commented on issue #336:
URL: https://github.com/apache/arrow-julia/issues/336#issuecomment-1327021437

   `df = Arrow.Table("test.arrow")` does memory mapping, so this is expected. The difference between OSes might be due to how file locking is handled while a file is memory-mapped.
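
   One way to sidestep the memory map altogether is to read the file into an in-memory byte vector before constructing the table. The following is a minimal sketch, not verified on Windows. It assumes that `Arrow.Table` accepts a `Vector{UInt8}` (as its docstring describes) and that the file fits comfortably in memory; the `path` and `bytes` names are only illustrative.

   ```julia
   using Arrow
   using DataFrames

   path = "test.arrow"
   Arrow.write(path, DataFrame(rand(100, 100), :auto))

   # Read the whole file eagerly, so the table is backed by ordinary
   # memory rather than by a memory map of `path`.
   bytes = read(path)
   tbl = Arrow.Table(bytes)

   # The columns of `df` reference `bytes`, not the file on disk,
   # so the file can be reopened for writing.
   df = DataFrame(tbl)
   Arrow.write(path, df)
   ```

   Because nothing maps the file in this variant, the write should not depend on when the garbage collector releases a mapping, at the cost of holding the full file contents in memory.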




[GitHub] [arrow-julia] quinnj closed issue #336: Invalid argument error

Posted by "quinnj (via GitHub)" <gi...@apache.org>.
quinnj closed issue #336: Invalid argument error
URL: https://github.com/apache/arrow-julia/issues/336




[GitHub] [arrow-julia] TanookiToad commented on issue #336: Invalid argument error

Posted by GitBox <gi...@apache.org>.
TanookiToad commented on issue #336:
URL: https://github.com/apache/arrow-julia/issues/336#issuecomment-1336261151

   > @TanookiToad - this is strange: `copy(df)` and `df[:, :]` are almost the same (their only difference is how metadata is handled, but that should not affect the result here).
   
   Yeah, I was wrong about that. There's actually no difference between `copy(df)` and `df[:, :]` for this issue. After more tests, I find it pretty confusing that loaded data can only sometimes be rewritten on Windows.
   
   For example, the following code works if `"test.arrow"` was created from `DataFrame(rand(10000, 1000), :auto)`, but it raises an error for a smaller table like `DataFrame(rand(100, 100), :auto)` or a larger one like `DataFrame(rand(10000, 10000), :auto)`.
   
   ```julia
   df = copy(DataFrame(Arrow.Table("test.arrow")))
   Arrow.write("test.arrow", df)
   ```




[GitHub] [arrow-julia] quinnj commented on issue #336: Invalid argument error

Posted by "quinnj (via GitHub)" <gi...@apache.org>.
quinnj commented on issue #336:
URL: https://github.com/apache/arrow-julia/issues/336#issuecomment-1590175221

   A fix for this is up: https://github.com/apache/arrow-julia/pull/469. Sorry for the slow response here, but it would be great if anyone on Windows could confirm that the original issue is fixed with that PR.




[GitHub] [arrow-julia] jeremiedb commented on issue #336: Invalid argument error

Posted by GitBox <gi...@apache.org>.
jeremiedb commented on issue #336:
URL: https://github.com/apache/arrow-julia/issues/336#issuecomment-1326877205

   I can reproduce this on Windows. I cannot write an Arrow file to the same file path it was previously read from.




[GitHub] [arrow-julia] TanookiToad commented on issue #336: Invalid argument error

Posted by GitBox <gi...@apache.org>.
TanookiToad commented on issue #336:
URL: https://github.com/apache/arrow-julia/issues/336#issuecomment-1336165621

   > @TanookiToad - this is strange: `copy(df)` and `df[:, :]` are almost the same (their only difference is how metadata is handled, but that should not affect the result here).
   
   Sorry, I made a mistake when writing my comment above. It's actually `copy(df[:, :])` vs `copy(df)`, and I'm able to rewrite the original file using the first command.




[GitHub] [arrow-julia] jeremiedb commented on issue #336: Invalid argument error

Posted by GitBox <gi...@apache.org>.
jeremiedb commented on issue #336:
URL: https://github.com/apache/arrow-julia/issues/336#issuecomment-1326892898

   The issue isn't OS / Windows specific, as the above example crashed the Julia session on Ubuntu.
   In order to write to the same path that was read from, a `copy` looks necessary. For example, the following does work:
   ```jl
   using Arrow
   using DataFrames
   
   df = DataFrame(rand(100, 100), :auto)
   Arrow.write("test.arrow", df)
   df = copy(DataFrame(Arrow.Table("test.arrow")))
   Arrow.write("test.arrow", df)
   ``` 




[GitHub] [arrow-julia] TanookiToad commented on issue #336: Invalid argument error

Posted by GitBox <gi...@apache.org>.
TanookiToad commented on issue #336:
URL: https://github.com/apache/arrow-julia/issues/336#issuecomment-1327674749

   I forgot to `copy` the mmapped table in my sample code.
   
   As @jeremiedb mentioned, you can overwrite the loaded Arrow file on Linux if it's copied, but the same code will raise an error on Windows. A temporary workaround from [JuliaData/CSV.jl#170](https://github.com/JuliaData/CSV.jl/issues/170#issuecomment-365259677) is to call `GC.gc()` before saving it.
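
   The workaround above might look like the following sketch. It assumes that, once the table has been copied, nothing else references the memory-mapped data. `GC.gc()` only requests a collection, so whether the mapping is actually released before the write is not guaranteed; treat this as best effort rather than a fix.

   ```julia
   using Arrow
   using DataFrames

   Arrow.write("test.arrow", DataFrame(rand(100, 100), :auto))

   # Copy so that `df` owns its columns instead of referencing the memory map.
   df = copy(DataFrame(Arrow.Table("test.arrow")))

   # Request a collection so the now-unreferenced mapping can be finalized
   # (best effort; the runtime does not promise when finalizers run).
   GC.gc()

   Arrow.write("test.arrow", df)
   ```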

