Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/17 16:28:18 UTC

[GitHub] [arrow] varadal opened a new issue, #14438: How to properly resize a MemoryMappedFile when using RecordBatchWriter?

varadal opened a new issue, #14438:
URL: https://github.com/apache/arrow/issues/14438

   Hello,
   Using the Arrow C++ API, I am creating a MemoryMappedFile with excess space and writing to it with a RecordBatchWriter. After closing the RecordBatchWriter, I don't have the exact number of bytes that were written to the MemoryMappedFile, so I can't truncate/resize the file. Is there a way to get the total number of bytes written to the MemoryMappedFile? I found that RecordBatchWriter::stats() does not include the bytes written when RecordBatchWriter::Close() is called.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] varadal closed issue #14438: [C++][Python] Get size of RecordBatchWriter.close()?

Posted by "varadal (via GitHub)" <gi...@apache.org>.
varadal closed issue #14438: [C++][Python] Get size of RecordBatchWriter.close()?
URL: https://github.com/apache/arrow/issues/14438




[GitHub] [arrow] varadal commented on issue #14438: [C++][Python] Get size of RecordBatchWriter.close()?

Posted by "varadal (via GitHub)" <gi...@apache.org>.
varadal commented on issue #14438:
URL: https://github.com/apache/arrow/issues/14438#issuecomment-1651778863

   That makes sense. Then is there a way to use MakeStreamWriter with a MemoryMappedFile sink?




[GitHub] [arrow] ljluestc commented on issue #14438: Get size of RecordBatchWriter.close()?

Posted by "ljluestc (via GitHub)" <gi...@apache.org>.
ljluestc commented on issue #14438:
URL: https://github.com/apache/arrow/issues/14438#issuecomment-1646977064

   In Apache Arrow, the RecordBatchWriter.close() method is responsible for finalizing the writing process and flushing any remaining data to the underlying storage or stream. However, the Arrow library itself does not provide a direct method to retrieve the size of the data that was written when the writer is closed.




[GitHub] [arrow] ljluestc commented on issue #14438: [C++][Python] Get size of RecordBatchWriter.close()?

Posted by "ljluestc (via GitHub)" <gi...@apache.org>.
ljluestc commented on issue #14438:
URL: https://github.com/apache/arrow/issues/14438#issuecomment-1652861250

   ```
   #include <arrow/api.h>
   #include <arrow/io/api.h>
   
   using namespace arrow;
   
   int main() {
     std::shared_ptr<io::MemoryMappedFile> mmap_file;
     ARROW_ASSIGN_OR_RAISE(mmap_file, io::MemoryMappedFile::Create("data.arrow", io::FileMode::WRITE));
     std::shared_ptr<io::FileOutputStream> output_stream = std::make_shared<io::FileOutputStream>(mmap_file);
     std::shared_ptr<ipc::RecordBatchWriter> writer;
     ARROW_ASSIGN_OR_RAISE(writer, ipc::MakeFileWriter(output_stream, schema));
     ARROW_RETURN_NOT_OK(writer->WriteRecordBatch(*record_batch));
     ARROW_RETURN_NOT_OK(writer->Close());
     int64_t written_size = mmap_file->GetSize();
     std::cout << "Written data size: " << written_size << std::endl;
     return 0;
   }
   
   ```




[GitHub] [arrow] varadal commented on issue #14438: [C++][Python] Get size of RecordBatchWriter.close()?

Posted by "varadal (via GitHub)" <gi...@apache.org>.
varadal commented on issue #14438:
URL: https://github.com/apache/arrow/issues/14438#issuecomment-1654374557

   FileOutputStream doesn't accept a MemoryMappedFile, but I was able to get my code to work with your help. Thanks so much!

