Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/26 16:06:25 UTC

[GitHub] [arrow] Sounek opened a new issue #10164: [Cpp][Parquet] WriteTable does not write the schema metadata into the parquet file

Sounek opened a new issue #10164:
URL: https://github.com/apache/arrow/issues/10164


   Hey,
   
   I create a parquet file in the following way:
   
   ```
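   #include <arrow/api.h>
   #include <arrow/io/api.h>
   #include <arrow/util/key_value_metadata.h>
   #include <parquet/arrow/writer.h>
   #include <parquet/exception.h>

   int main() {
     arrow::DoubleBuilder f64builder;
     PARQUET_THROW_NOT_OK(f64builder.AppendValues({1.0, 2.0, 3.0}));
     std::shared_ptr<arrow::Array> array;
     PARQUET_THROW_NOT_OK(f64builder.Finish(&array));

     // Column "DataRow1" with field-level metadata; it must be added to `fields`
     auto field = std::make_shared<arrow::Field>("DataRow1", arrow::float64(), true, arrow::key_value_metadata({ "Unit" }, { "mm" }));
     std::vector<std::shared_ptr<arrow::Field>> fields = { field };

     // Schema-level metadata
     auto schema = std::make_shared<arrow::Schema>(fields);
     schema = schema->WithMetadata(arrow::key_value_metadata({ "user" }, { "me" }));

     auto table = arrow::Table::Make(schema, {array});

     std::shared_ptr<arrow::io::FileOutputStream> outfile;
     PARQUET_ASSIGN_OR_THROW(outfile, arrow::io::FileOutputStream::Open("D:\\parquet-arrow-example.parquet"));

     // WriteTable takes the table by reference; chunk_size 1 = one row per row group
     PARQUET_THROW_NOT_OK(parquet::arrow::WriteTable(*table, arrow::default_memory_pool(), outfile, 1));
     return 0;
   }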
   ```
   
   The Parquet file is created fine and contains all the data, but the custom metadata I add seems to be gone.
   Am I doing something wrong, or do I need to export the metadata explicitly?
   
   Thanks in advance.
   
   Best regards
   Sven


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Sounek commented on issue #10164: [Cpp][Parquet] WriteTable does not write the schema metadata into the parquet file

Sounek commented on issue #10164:
URL: https://github.com/apache/arrow/issues/10164#issuecomment-827462391


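   Sorry for the trouble. I found the solution on my own.
   
   By default, the Arrow writer properties do not store the Arrow schema in the Parquet file, so the schema and field metadata are not written. Enabling `store_schema()` on `parquet::ArrowWriterProperties` does what I need: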
   
   ```
   #include <arrow/api.h>
   #include <arrow/io/api.h>
   #include <arrow/util/key_value_metadata.h>
   #include <parquet/arrow/writer.h>
   #include <parquet/exception.h>
   #include <parquet/properties.h>

   int main() {
     arrow::DoubleBuilder f64builder;
     PARQUET_THROW_NOT_OK(f64builder.AppendValues({1.0, 2.0, 3.0}));
     std::shared_ptr<arrow::Array> array;
     PARQUET_THROW_NOT_OK(f64builder.Finish(&array));

     auto field = std::make_shared<arrow::Field>("DataRow1", arrow::float64(), true, arrow::key_value_metadata({ "Unit" }, { "mm" }));
     std::vector<std::shared_ptr<arrow::Field>> fields = { field };

     auto schema = std::make_shared<arrow::Schema>(fields);
     schema = schema->WithMetadata(arrow::key_value_metadata({ "user" }, { "me" }));

     auto table = arrow::Table::Make(schema, {array});

     std::shared_ptr<arrow::io::FileOutputStream> outfile;
     PARQUET_ASSIGN_OR_THROW(outfile, arrow::io::FileOutputStream::Open("D:\\parquet-arrow-example.parquet"));

     // Default Parquet writer properties
     parquet::WriterProperties::Builder builder;
     std::shared_ptr<parquet::WriterProperties> props = builder.build();

     // store_schema() serializes the Arrow schema, including schema- and
     // field-level metadata, into the file's key-value metadata
     parquet::ArrowWriterProperties::Builder builder2;
     builder2.store_schema();
     std::shared_ptr<parquet::ArrowWriterProperties> props2 = builder2.build();

     PARQUET_THROW_NOT_OK(parquet::arrow::WriteTable(*table, arrow::default_memory_pool(), outfile, 1, props, props2));
     return 0;
   }
   ```




[GitHub] [arrow] Sounek closed issue #10164: [Cpp][Parquet] WriteTable does not write the schema metadata into the parquet file

Sounek closed issue #10164:
URL: https://github.com/apache/arrow/issues/10164


   
