Posted to jira@arrow.apache.org by "Dewey Dunnington (Jira)" <ji...@apache.org> on 2021/11/05 17:35:00 UTC

[jira] [Commented] (ARROW-14611) [R][C++] Reporting progress from copy_files()?

    [ https://issues.apache.org/jira/browse/ARROW-14611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17439398#comment-17439398 ] 

Dewey Dunnington commented on ARROW-14611:
------------------------------------------



I took the opportunity to learn a bit about the C++ sources! I didn’t find a way to pass a callback into CopyFiles that would report progress, but perhaps there is one. 
C++ source from the R package: <https://github.com/apache/arrow/blob/master/r/src/filesystem.cpp#L267-L275> 
Implementation of arrow::fs::CopyFiles: <https://github.com/apache/arrow/blob/master/cpp/src/arrow/filesystem/filesystem.cc#L586-L607> 

A small reprex that calls arrow::fs::CopyFiles directly via cpp11:

{code:R}
library(cpp11)
Sys.setenv(
  PKG_CXXFLAGS = paste0("-I", Sys.getenv("ARROW_HOME"), "/include"),
  PKG_LIBS = paste0("-L", Sys.getenv("ARROW_HOME"), "/lib", " -larrow")
)

cpp11::cpp_source(code = '

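#include <memory>
#include <string>

#include <cpp11.hpp>
#include <arrow/filesystem/api.h>
using namespace cpp11;
using namespace arrow;

[[cpp11::register]]
void copy_files2(std::string src_dir, std::string dst_dir) {
  // Use the local filesystem as both source and destination.
  auto local_fs = std::make_shared<fs::LocalFileSystem>();

  // Select the files under src_dir (non-recursive by default).
  fs::FileSelector source_sel;
  source_sel.base_dir = src_dir;

  // CopyFiles only returns a Status at the end; no progress is reported.
  Status status = fs::CopyFiles(local_fs, source_sel, local_fs, dst_dir);

  // Turn an Arrow error into an R error.
  if (!status.ok()) {
    std::string s = status.ToString();
    stop("%s", s.c_str());
  }
}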

')

source_dir <- tempfile()
dest_dir <- tempfile()

dir.create(source_dir)
for (i in 1:1000) {
  write(
    as.character(1:i),
    sprintf("%s/file%03d.txt", source_dir, i)
  )
}
dir.create(dest_dir)

copy_files2(source_dir, dest_dir)
waldo::compare(list.files(source_dir), list.files(dest_dir))
#> ✓ No differences
{code}


If there is a way to do this with a callback, the C++ API of the progress package might be useful for drawing a progress bar? <https://github.com/r-lib/progress#c-api>


> [R][C++] Reporting progress from copy_files()?
> ----------------------------------------------
>
>                 Key: ARROW-14611
>                 URL: https://issues.apache.org/jira/browse/ARROW-14611
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, R
>            Reporter: Nicola Crane
>            Priority: Minor
>
> Would it be possible to have something that reports progress from {{copy_files()}} which calls CopyFiles from FileSystem?  When copying huge files, the R session just hangs and the user doesn't know if it's working or not.


