Posted to issues@arrow.apache.org by "wgtmac (via GitHub)" <gi...@apache.org> on 2023/03/03 14:59:14 UTC

[GitHub] [arrow] wgtmac opened a new issue, #34432: [C++][Java][IPC] Java reader cannot read compressed file created by C++ writer

wgtmac opened a new issue, #34432:
URL: https://github.com/apache/arrow/issues/34432

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   To reproduce the issue, use the C++ code below to write and then read an Arrow IPC file. I am using version 11.0.0. Make sure `arrow_testing` is linked so that the `ArrayFromJSON` utility is available.
   
   ```cpp
   #include "arrow/filesystem/localfs.h"
   #include "arrow/io/api.h"
   #include "arrow/ipc/feather.h"
   #include "arrow/ipc/writer.h"
   #include "arrow/table.h"
   #include "arrow/testing/gtest_util.h"
   #include "arrow/util/compression.h"
   
   #include <iostream>
   #include <memory>
   #include <string>
   
   int main(int argc, char** argv) {
     // Change this value to reproduce the different behaviors listed below.
     auto codec_type = ::arrow::Compression::UNCOMPRESSED;
     std::string path = "/tmp/test.arrow";
   
     // Build a small two-column table: int32 "a" and utf8 "b".
     auto schema = arrow::schema(
         {arrow::field("a", arrow::int32()), arrow::field("b", arrow::utf8())});
     auto a = arrow::ArrayFromJSON(arrow::int32(), "[1, 2, 3]");
     auto b = arrow::ArrayFromJSON(arrow::utf8(), R"(["a", "b", "c"])");
     auto table = arrow::Table::Make(schema, {a, b});
   
     // Write the table as an IPC file with the selected codec.
     auto out = arrow::io::FileOutputStream::Open(path).MoveValueUnsafe();
     ::arrow::ipc::IpcWriteOptions options{
         .codec = arrow::util::Codec::Create(codec_type).MoveValueUnsafe(),
         .metadata_version = arrow::ipc::MetadataVersion::V5,
     };
     auto writer = ::arrow::ipc::MakeFileWriter(std::move(out), schema, options).MoveValueUnsafe();
     // Note: the returned statuses are deliberately ignored in this repro.
     writer->WriteTable(*table).ok();
     writer->Close().ok();
   
     // Read the file back with the C++ reader.
     auto fs = arrow::fs::LocalFileSystem();
     auto in = fs.OpenInputFile(path).MoveValueUnsafe();
     auto reader = arrow::ipc::feather::Reader::Open(in).MoveValueUnsafe();
     std::shared_ptr<arrow::Table> read_table;
     if (reader->Read(&read_table).ok()) {
       std::cout << read_table->ToString() << std::endl;
     } else {
       std::cerr << "Failed to read table" << std::endl;
     }
     return 0;
   }
   ```
   
   On the Java side, use the code below:
   ```java
   package test.arrow;
   
   import org.apache.arrow.memory.RootAllocator;
   import org.apache.arrow.vector.ipc.ArrowFileReader;
   import org.apache.commons.compress.utils.SeekableInMemoryByteChannel;
   import org.apache.commons.io.IOUtils;
   
   import java.io.DataInputStream;
   import java.io.FileInputStream;
   import java.io.IOException;
   import java.io.InputStream;
   
   public class ArrowReaderTest {
   
     public static void main(String[] args) throws IOException {
       InputStream inputStream = new DataInputStream(new FileInputStream("/tmp/test.arrow"));
       RootAllocator allocator = new RootAllocator(Long.MAX_VALUE);
       SeekableInMemoryByteChannel channel = new SeekableInMemoryByteChannel
         (IOUtils.toByteArray(inputStream));
       try (ArrowFileReader reader = new ArrowFileReader(channel, allocator)) {
         while (reader.loadNextBatch()) {
           System.out.println(reader.getVectorSchemaRoot().contentToTSVString());
         }
       } catch (Exception e) {
         e.printStackTrace();
       }
     }
   }
   ```
   
   The behavior varies with `codec_type`:
   1. UNCOMPRESSED: Both the C++ and Java readers read the data correctly.
   2. SNAPPY: Neither the C++ nor the Java reader can read any data. (Same for GZIP.)
   3. ZSTD: The C++ reader reads the data correctly, but the Java reader throws `java.lang.NegativeArraySizeException` at `org.apache.arrow.vector.VarCharVector.get(VarCharVector.java:115)`.
   
   ### Component(s)
   
   C++, Java


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on issue #34432: [C++][Java][IPC] Java reader cannot read compressed file created by C++ writer

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on issue #34432:
URL: https://github.com/apache/arrow/issues/34432#issuecomment-1468463854

   I don't have time in the near future, unfortunately. A PR would be welcome. (I was hoping David or Larry could help, but it seems they're busy.)




[GitHub] [arrow] wgtmac commented on issue #34432: [C++][Java][IPC] Java reader cannot read compressed file created by C++ writer

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on issue #34432:
URL: https://github.com/apache/arrow/issues/34432#issuecomment-1453664804

   It seems related to https://github.com/apache/arrow/issues/33688




[GitHub] [arrow] davisusanibar commented on issue #34432: [C++][Java][IPC] Java reader cannot read compressed file created by C++ writer

Posted by "davisusanibar (via GitHub)" <gi...@apache.org>.
davisusanibar commented on issue #34432:
URL: https://github.com/apache/arrow/issues/34432#issuecomment-1471681397

   My apologies for jumping in late.
   
   This is closely related to https://github.com/apache/arrow/issues/33384
   
   In addition, there is a PR for cookbooks, so I'll get to work on the remaining tasks so we can have a recipe for this use case.




[GitHub] [arrow] wgtmac commented on issue #34432: [C++][Java][IPC] Java reader cannot read compressed file created by C++ writer

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on issue #34432:
URL: https://github.com/apache/arrow/issues/34432#issuecomment-1453765235

   > Java-zstd seems like a bug.
   > 
   > C++ shouldn't let you specify snappy or gzip - that's also a bug.
   
   I can work around it by not using compression for now.
   
   The fix on the C++ side should be trivial. Will someone fix the Java-zstd bug?




[GitHub] [arrow] lidavidm closed issue #34432: [C++][Java][IPC] Java reader cannot read compressed file created by C++ writer

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm closed issue #34432: [C++][Java][IPC] Java reader cannot read compressed file created by C++ writer
URL: https://github.com/apache/arrow/issues/34432




[GitHub] [arrow] wgtmac commented on issue #34432: [C++][Java][IPC] Java reader cannot read compressed file created by C++ writer

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on issue #34432:
URL: https://github.com/apache/arrow/issues/34432#issuecomment-1471331371

   Please feel free to close this issue if there is no further action. @lidavidm 
   I think we can throw from `ArrowFileReader` if it detects a compressed file but `NoCompressionCodec.Factory.INSTANCE` is supplied.
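   
   The fail-fast check proposed above could look roughly like the sketch below. It is a standalone model, not Arrow code: `CompressionGuardSketch`, `BodyCompression`, and `checkCanDecode` are hypothetical names introduced only for illustration, and the boolean flag stands in for "the reader was given a codec factory that can decompress".
   
   ```java
   // Hypothetical model of the proposed check; none of these names are Arrow APIs.
   final class CompressionGuardSketch {
   
     /** Compression declared in a record batch's metadata (hypothetical enum). */
     enum BodyCompression { NONE, LZ4_FRAME, ZSTD }
   
     /**
      * Throws if the batch is declared compressed but the reader has no way to
      * decompress it, rather than returning vectors backed by undecoded buffers.
      */
     static void checkCanDecode(BodyCompression declared, boolean readerCanDecompress) {
       if (declared != BodyCompression.NONE && !readerCanDecompress) {
         throw new IllegalStateException(
             "Record batch is compressed with " + declared
                 + " but the reader has no decompression codec configured");
       }
     }
   
     public static void main(String[] args) {
       checkCanDecode(BodyCompression.NONE, false); // uncompressed: always accepted
       checkCanDecode(BodyCompression.ZSTD, true);  // compressed, codec available
       try {
         checkCanDecode(BodyCompression.ZSTD, false); // compressed, no codec
       } catch (IllegalStateException e) {
         System.out.println("rejected: " + e.getMessage());
       }
     }
   }
   ```
   
   With a check like this, the ZSTD case would surface as a clear configuration error instead of a `NegativeArraySizeException` deep inside `VarCharVector`.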




[GitHub] [arrow] lidavidm commented on issue #34432: [C++][Java][IPC] Java reader cannot read compressed file created by C++ writer

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on issue #34432:
URL: https://github.com/apache/arrow/issues/34432#issuecomment-1453680462

   Java-zstd seems like a bug.
   
   C++ shouldn't let you specify snappy or gzip - that's also a bug.




[GitHub] [arrow] lidavidm commented on issue #34432: [C++][Java][IPC] Java reader cannot read compressed file created by C++ writer

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on issue #34432:
URL: https://github.com/apache/arrow/issues/34432#issuecomment-1453766029

   @lwhite1 @davisusanibar 




[GitHub] [arrow] wgtmac commented on issue #34432: [C++][Java][IPC] Java reader cannot read compressed file created by C++ writer

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on issue #34432:
URL: https://github.com/apache/arrow/issues/34432#issuecomment-1468240789

   > @lwhite1 @davisusanibar
   
   Do we have a plan to fix this in the next release v12.0.0?




[GitHub] [arrow] wgtmac commented on issue #34432: [C++][Java][IPC] Java reader cannot read compressed file created by C++ writer

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on issue #34432:
URL: https://github.com/apache/arrow/issues/34432#issuecomment-1453657955

   @lidavidm Do you have any idea?




[GitHub] [arrow] wgtmac commented on issue #34432: [C++][Java][IPC] Java reader cannot read compressed file created by C++ writer

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on issue #34432:
URL: https://github.com/apache/arrow/issues/34432#issuecomment-1470254698

   > I don't have time in the near future, unfortunately. A PR would be welcome. (I was hoping David or Larry could help, but it seems they're busy.)
   
   I tried to debug a little bit and found that the offset buffer of VarCharVector has some strange values when decompressed. In the meantime, I happened to see https://github.com/apache/arrow/pull/15194 which deals with the problem in the inverse direction. Not sure if it can solve this issue.




[GitHub] [arrow] wgtmac commented on issue #34432: [C++][Java][IPC] Java reader cannot read compressed file created by C++ writer

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on issue #34432:
URL: https://github.com/apache/arrow/issues/34432#issuecomment-1471327958

   I have finally figured out that the default constructor of `ArrowFileReader` applies `NoCompressionCodec.Factory.INSTANCE`, so it cannot decompress a compressed file. To enable decompression, one has to add a dependency on `arrow-compression` and use one of the constructors that takes a `CompressionCodec.Factory`.
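   
   For reference, a minimal sketch of that fix, assuming the `arrow-compression` artifact is on the classpath next to `arrow-vector` and that `/tmp/test.arrow` is the file produced by the C++ program in the issue description. The class name `CompressedArrowReaderSketch` is made up for this example; `CommonsCompressionFactory` is the factory shipped in `arrow-compression`.
   
   ```java
   import org.apache.arrow.compression.CommonsCompressionFactory;
   import org.apache.arrow.memory.BufferAllocator;
   import org.apache.arrow.memory.RootAllocator;
   import org.apache.arrow.vector.ipc.ArrowFileReader;
   
   import java.io.IOException;
   import java.nio.channels.FileChannel;
   import java.nio.file.Paths;
   import java.nio.file.StandardOpenOption;
   
   public class CompressedArrowReaderSketch {
   
     public static void main(String[] args) throws IOException {
       // try-with-resources closes the reader first, then the channel, then the allocator.
       try (BufferAllocator allocator = new RootAllocator();
            FileChannel channel =
                FileChannel.open(Paths.get("/tmp/test.arrow"), StandardOpenOption.READ);
            // Pass a codec factory explicitly; without it the reader uses
            // NoCompressionCodec.Factory.INSTANCE and cannot decompress buffers.
            ArrowFileReader reader =
                new ArrowFileReader(channel, allocator, CommonsCompressionFactory.INSTANCE)) {
         while (reader.loadNextBatch()) {
           System.out.println(reader.getVectorSchemaRoot().contentToTSVString());
         }
       }
     }
   }
   ```
   
   Opening the file as a `FileChannel` also avoids copying it into memory first, as the original reproduction does with `SeekableInMemoryByteChannel`.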




[GitHub] [arrow] lidavidm commented on issue #34432: [C++][Java][IPC] Java reader cannot read compressed file created by C++ writer

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on issue #34432:
URL: https://github.com/apache/arrow/issues/34432#issuecomment-1472276059

   I filed #34592 to track the C++ side of this

