Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/25 07:58:54 UTC

[GitHub] [arrow] Oooorchid opened a new issue, #13703: [Java][C++] Java VectorSchemaRoot to C++ RecordBatchreader

Oooorchid opened a new issue, #13703:
URL: https://github.com/apache/arrow/issues/13703

   I have a Java library that loads Arrow data into a VectorSchemaRoot object in memory, and I want to read that data from C++. However, I keep getting an error. What should I do?
   
   In the [docs](https://arrow.apache.org/docs/java/cdata.html) I can only find the C++ to Java direction, not Java to C++.
   
   Here is my example:
   [Java]
   ```
   public String rootToString(){
       try( 
           BufferAllocator allocator = new RootAllocator(); 
           FileInputStream fileInputStream = new FileInputStream(filepath);
           ArrowFileReader reader = new ArrowFileReader(fileInputStream.getChannel(), allocator);
           ByteArrayOutputStream out = new ByteArrayOutputStream();
       ){
           System.out.println("Record batches in file : " + reader.getRecordBlocks().size()); // Actually size = 1
           ArrowBlock arrowBlock = reader.getRecordBlocks().get(0);
           VectorSchemaRoot root = reader.getVectorSchemaRoot();
           reader.loadRecordBatch(arrowBlock);
           System.out.println(root.contentToTSVString());
    
           ArrowFileWriter writer = new ArrowFileWriter(root, null, Channels.newChannel(out));
           writer.start();
           writer.writeBatch();
           writer.end();
           writer.close();
   
           return out.toString();}}
   ``` 
   
   
   [C++]
   
   ```
   ....
   std::string JStringToString(JNIEnv* env, jstring string){
       if(string == nullptr){
           return std::string();
       }
       const char* chars = env->GetStringUTFChars(string, nullptr);
       std::string ret(chars);
       env->ReleaseStringUTFChars(string, chars);
       return ret;
   }
   
   std::string test(std::string name){
       if (status != JNI_ERR){
           jclass cls = env->FindClass("com/xxxxx");
           // "<init>" is the JNI name of a constructor
           jmethodID mid = env->GetMethodID(cls, "<init>", "(Ljava/lang/String;)V");
           jstring arg = NewJString(name.c_str());
           jobject obj = env->NewObject(cls, mid, arg);
           mid = env->GetMethodID(cls, "rootToString", "()Ljava/lang/String;");
           jstring ret = (jstring)env->CallObjectMethod(obj, mid);
           std::cout << "Java String length is : " << env->GetStringLength(ret) << std::endl; // length is 2563
           std::string result = JStringToString(env, ret);
           std::cout << "result length is : " << result.length() << std::endl; // length is 4609
           return result;
       }
       return std::string();
   }
   
   int main(){
       std::string test_result = test("/data/...../4stock_5day.arrow");
       std::shared_ptr<arrow::io::BufferReader> bufferReader = std::make_shared<arrow::io::BufferReader>(test_result);
       std::shared_ptr<arrow::ipc::RecordBatchFileReader> reader =
           arrow::ipc::RecordBatchFileReader::Open(bufferReader.get()).ValueOrDie();
       std::cout << reader->num_record_batches() << std::endl;
       return 0;
   }
   ```
   
   The error is as follows:
   
   `xxxxxxxxxx/work/cpp/src/arrow/result.cc:28 : ValueOrDie called on an error : Invalid: File is smaller than indicated metadata size
   /usr/local/conda3/lib/libarrow.so.500(+0x518f0c)[0x7f44fc2d3f0c]
   /usr/local/conda3/lib/libarrow.so.500(_ZN) ...`
   
   What should I do? Is there another way to do this? Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] Oooorchid commented on issue #13703: [Java][C++] Java VectorSchemaRoot to C++ RecordBatchreader

Posted by GitBox <gi...@apache.org>.
Oooorchid commented on issue #13703:
URL: https://github.com/apache/arrow/issues/13703#issuecomment-1206117305

   [Java]
   On the Java side, the method now returns byte[], like this:
   ```
   ......
   
        VectorSchemaRoot root = reader.getVectorSchemaRoot();
        reader.loadRecordBatch(arrowBlock);
        System.out.println(root.contentToTSVString());
    
        ArrowFileWriter writer = new ArrowFileWriter(root, null, Channels.newChannel(out));
        writer.start();
        writer.writeBatch();
        writer.end();
        writer.close();
   
        return out.toByteArray();}}
   .....
   
   ```
   




[GitHub] [arrow] Oooorchid closed issue #13703: [Java][C++] Java VectorSchemaRoot to C++ RecordBatchreader

Posted by GitBox <gi...@apache.org>.
Oooorchid closed issue #13703: [Java][C++] Java VectorSchemaRoot to C++ RecordBatchreader
URL: https://github.com/apache/arrow/issues/13703




[GitHub] [arrow] lidavidm commented on issue #13703: [Java][C++] Java VectorSchemaRoot to C++ RecordBatchreader

Posted by GitBox <gi...@apache.org>.
lidavidm commented on issue #13703:
URL: https://github.com/apache/arrow/issues/13703#issuecomment-1205167877

   @davisusanibar you already had a solution here I think, can you post it?




[GitHub] [arrow] lidavidm commented on issue #13703: [Java][C++] Java VectorSchemaRoot to C++ RecordBatchreader

Posted by GitBox <gi...@apache.org>.
lidavidm commented on issue #13703:
URL: https://github.com/apache/arrow/issues/13703#issuecomment-1205170453

   Or actually @Oooorchid see https://github.com/apache/arrow/pull/13788
   
   FWIW, I believe the immediate problem in the code is the use of a string to transport the buffer over JNI. It should be byte[]. `ByteArrayOutputStream#toString` will attempt to decode the contents _as text_ which corrupts the content.
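   
   To illustrate the point about `ByteArrayOutputStream#toString`, here is a minimal sketch that uses only the JDK (no Arrow dependency). The class name `BinaryTransportDemo`, its methods, and the sample bytes are hypothetical stand-ins for the real IPC payload. UTF-8 is assumed as the charset; `toString()` itself uses the platform default, which may differ.
   
   ```java
   import java.io.ByteArrayOutputStream;
   import java.nio.charset.StandardCharsets;
   import java.util.Arrays;

   public class BinaryTransportDemo {
       // Hypothetical binary payload: an ASCII prefix followed by bytes
       // (0x00, 0xFF, 0x80) that do not form valid UTF-8 text.
       public static byte[] samplePayload() {
           return new byte[] {
               0x41, 0x52, 0x52, 0x4F, 0x57, 0x31, 0, 0,
               (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0x80, 0x10
           };
       }

       // Binary-safe path: the bytes come back exactly as they were written.
       public static byte[] viaByteArray(byte[] payload) {
           ByteArrayOutputStream out = new ByteArrayOutputStream();
           out.write(payload, 0, payload.length);
           return out.toByteArray();
       }

       // Text path: decode the buffer as a String, then encode it again.
       // Decoding replaces malformed input with a replacement character,
       // so the original bytes cannot be recovered afterwards.
       // Assumption: UTF-8 stands in for the platform default charset.
       public static byte[] viaString(byte[] payload) {
           ByteArrayOutputStream out = new ByteArrayOutputStream();
           out.write(payload, 0, payload.length);
           String text = new String(out.toByteArray(), StandardCharsets.UTF_8);
           return text.getBytes(StandardCharsets.UTF_8);
       }

       public static void main(String[] args) {
           byte[] payload = samplePayload();
           byte[] safe = viaByteArray(payload);
           byte[] lossy = viaString(payload);
           System.out.println("original length: " + payload.length);
           System.out.println("byte[] path identical: " + Arrays.equals(payload, safe));
           System.out.println("String path identical: " + Arrays.equals(payload, lossy));
           System.out.println("String path length: " + lossy.length);
       }
   }
   ```
   
   In this sketch the `byte[]` path returns the payload unchanged, while the text round trip yields a different byte sequence, which would be consistent with the C++ reader rejecting the buffer.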




[GitHub] [arrow] Oooorchid commented on issue #13703: [Java][C++] Java VectorSchemaRoot to C++ RecordBatchreader

Posted by GitBox <gi...@apache.org>.
Oooorchid commented on issue #13703:
URL: https://github.com/apache/arrow/issues/13703#issuecomment-1206098206

   > Or actually @Oooorchid see #13788
   > 
   > FWIW, I believe the immediate problem in the code is the use of a string to transport the buffer over JNI. It should be byte[]. `ByteArrayOutputStream#toString` will attempt to decode the contents _as text_ which corrupts the content.
   
   Thanks for your reply, your solution is perfect. I also solved this problem a few days ago. As you said, transporting the buffer as a string over JNI causes problems. I ended up using byte[] to solve it, and here is my example:
   
   
   ```
    ......
        jclass cls = env->FindClass("com/xxxxx");
        // "<init>" is the JNI name of a constructor
        jmethodID mid = env->GetMethodID(cls, "<init>", "(Ljava/lang/String;)V");
        jstring arg = NewJString(name.c_str());
        jobject obj = env->NewObject(cls, mid, arg);
        // "()[B" is the JNI signature of a no-argument method returning byte[]
        mid = env->GetMethodID(cls, "rootToByte", "()[B");
        jbyteArray dataArray = (jbyteArray)env->CallObjectMethod(obj, mid);
       
        int arr_len = env->GetArrayLength(dataArray);
        std::cout  << "arr_len is: "  <<  arr_len << std::endl;
        jbyte* bytes = env->GetByteArrayElements(dataArray, nullptr);
        
        // Pass arr_len explicitly: without it, building the string from a char* stops at the first 0 byte
        std::string result(reinterpret_cast<const char*>(bytes), arr_len);
        // Release the elements; JNI_ABORT skips the copy-back because nothing was modified
        env->ReleaseByteArrayElements(dataArray, bytes, JNI_ABORT);
   
        std::cout << "result is : " << result << std::endl; 
        return result;
   .....
   ```




[GitHub] [arrow] pitrou commented on issue #13703: [Java][C++] Java VectorSchemaRoot to C++ RecordBatchreader

Posted by GitBox <gi...@apache.org>.
pitrou commented on issue #13703:
URL: https://github.com/apache/arrow/issues/13703#issuecomment-1204876072

   cc @lidavidm @davisusanibar 

