Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2021/09/29 18:36:00 UTC

[jira] [Created] (ARROW-14175) [C++][Dataset] Add more fine-grained error for existing data to dataset writer

Weston Pace created ARROW-14175:
-----------------------------------

             Summary: [C++][Dataset] Add more fine-grained error for existing data to dataset writer
                 Key: ARROW-14175
                 URL: https://issues.apache.org/jira/browse/ARROW-14175
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Weston Pace


ARROW-13650 is adding different behaviors for handling existing data in the dataset writer.  One of these is a very coarse "error" behavior that returns an error if the destination directory contains any files at all.

However, during the discussion of the PR, we decided it would be helpful to have a more fine-grained error behavior that returns an error only if it encounters a file that is going to be overwritten.  This would allow someone to safely do a write that should only append new data.
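
To make the difference between the two behaviors concrete, here is a rough sketch using only std::filesystem.  It is not Arrow code: DestinationHasAnyFiles and FindCollisions are hypothetical names, the file names are illustrative, and FindCollisions assumes the planned output names are available up front, which the dataset writer generally cannot guarantee.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Coarse check (hypothetical helper): true if the destination directory
// contains any entry at all, whether a file or a subdirectory.
bool DestinationHasAnyFiles(const fs::path& dir) {
  return fs::is_directory(dir) && !fs::is_empty(dir);
}

// Fine-grained check (hypothetical helper): returns only those planned
// output paths that already exist and would therefore be overwritten.
// Assumes the planned names are known ahead of time.
std::vector<fs::path> FindCollisions(const fs::path& dir,
                                     const std::vector<std::string>& planned) {
  std::vector<fs::path> collisions;
  for (const auto& name : planned) {
    // An empty name would resolve to the directory itself and always "exist".
    assert(!name.empty());
    fs::path candidate = dir / name;
    if (fs::exists(candidate)) {
      collisions.push_back(candidate);
    }
  }
  return collisions;
}
```

With a directory that already holds part-0.parquet, the coarse check rejects every write, while the fine-grained check rejects only a write that would produce part-0.parquet again.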

This is a bit tricky, though, because the files to be written will not be known ahead of time, so the error may be encountered after we have already started writing data.  The data already written would then need to be rolled back somehow.
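
One possible shape for the rollback, sketched with plain std::filesystem rather than Arrow's filesystem abstraction.  WriteTransaction and WriteAllOrNothing are hypothetical names, not existing Arrow APIs, and the sketch assumes a local filesystem where created files can simply be deleted.  It records every file it creates and removes them again if a collision is found part-way through.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper, not part of Arrow: tracks files created during one
// write so they can be removed if the write has to be abandoned.
class WriteTransaction {
 public:
  WriteTransaction() = default;
  WriteTransaction(const WriteTransaction&) = delete;
  WriteTransaction& operator=(const WriteTransaction&) = delete;

  // Anything not committed is rolled back when the transaction goes away.
  ~WriteTransaction() {
    if (!committed_) Rollback();
  }

  // Writes `contents` to `path`.  Returns false, leaving the existing file
  // untouched, if `path` already exists.
  // Caveat: the exists-then-open sequence is not atomic, so a concurrent
  // writer could still race with this check.
  bool WriteNewFile(const fs::path& path, const std::string& contents) {
    assert(!committed_);
    if (fs::exists(path)) return false;
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    created_.push_back(path);  // record first so a failed write is cleaned up
    out << contents;
    return static_cast<bool>(out);
  }

  // Keeps everything written so far.
  void Commit() {
    committed_ = true;
    created_.clear();
  }

  // Best-effort removal of every file created by this transaction.
  void Rollback() {
    for (const auto& path : created_) {
      std::error_code ec;
      fs::remove(path, ec);
    }
    created_.clear();
  }

 private:
  std::vector<fs::path> created_;
  bool committed_ = false;
};

// Writes every (name, contents) pair into `dir`, or none of them if any
// target file already exists.
bool WriteAllOrNothing(
    const fs::path& dir,
    const std::vector<std::pair<std::string, std::string>>& files) {
  WriteTransaction txn;
  for (const auto& [name, contents] : files) {
    if (!txn.WriteNewFile(dir / name, contents)) {
      return false;  // txn's destructor removes the files written so far
    }
  }
  txn.Commit();
  return true;
}
```

Pre-existing files are never touched, so only data from the failed write is removed.  An alternative would be to stage output in a temporary location and move it into place at the end, at the cost of an extra rename per file.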



--
This message was sent by Atlassian Jira
(v8.3.4#803005)