Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/21 10:57:29 UTC

[GitHub] [arrow-rs] tustvold opened a new pull request #16: ARROW-12493: Add support for writing dictionary arrays to CSV and JSON

tustvold opened a new pull request #16:
URL: https://github.com/apache/arrow-rs/pull/16


   Provide support for serializing dictionary arrays to CSV and JSON by hydrating them to their underlying value representation before writing. This is not the most efficient approach, but it was the simplest way I could think of to cover all cases.
   
   It may be worthwhile special-casing StringDictionaries with a more efficient implementation in a subsequent PR, as I imagine they're the most common form of DictionaryArray.
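
To illustrate what "hydrating" means here, the following is a minimal sketch using only the standard library. It is not the arrow crate's API and not necessarily how this PR implements it: `DictColumn`, `hydrate`, and `to_csv` are hypothetical names made up for illustration. The sketch also assumes that null rows become empty CSV fields and that no quoting or escaping is needed.

```rust
// Hypothetical stand-in for a dictionary-encoded string column. This is NOT
// the arrow crate's DictionaryArray, only the same idea built from plain Vecs.
struct DictColumn {
    keys: Vec<Option<usize>>, // one key per row; None marks a null row
    values: Vec<String>,      // distinct values, indexed by key
}

impl DictColumn {
    // "Hydrate": replace every key with the value it points to, so a writer
    // can treat the column like an ordinary string column.
    fn hydrate(&self) -> Vec<Option<&str>> {
        self.keys
            .iter()
            .map(|&k| k.map(|i| self.values[i].as_str()))
            .collect()
    }
}

// Write the hydrated column as one CSV field per line.
// Assumption: nulls become empty fields and values need no escaping.
fn to_csv(col: &DictColumn) -> String {
    col.hydrate()
        .iter()
        .map(|v| v.unwrap_or(""))
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let col = DictColumn {
        keys: vec![Some(0), Some(1), Some(1), None, Some(0)],
        values: vec!["cupcakes".to_string(), "foo".to_string()],
    };
    println!("{}", to_csv(&col));
}
```

The cost mentioned above is visible in `hydrate`: it materializes one entry per row before anything is written. A special-cased string path could instead look up `values[key]` while writing each row and skip the intermediate vector.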


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] codecov-commenter commented on pull request #16: ARROW-12493: Add support for writing dictionary arrays to CSV and JSON

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #16:
URL: https://github.com/apache/arrow-rs/pull/16#issuecomment-823980424


   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/16?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#16](https://codecov.io/gh/apache/arrow-rs/pull/16?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4aaa465) into [master](https://codecov.io/gh/apache/arrow-rs/commit/5479e199d8c91f47f812cd8b708b68aba89dc21a?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (5479e19) will **increase** coverage by `0.01%`.
   > The diff coverage is `97.29%`.
   
   > :exclamation: Current head 4aaa465 differs from pull request most recent head 4fa2a1c. Consider uploading reports for the commit 4fa2a1c to get more accurate results
   
   ```diff
   @@            Coverage Diff             @@
   ##           master      #16      +/-   ##
   ==========================================
   + Coverage   82.47%   82.48%   +0.01%     
   ==========================================
     Files         162      162              
     Lines       43414    43447      +33     
   ==========================================
   + Hits        35806    35838      +32     
   - Misses       7608     7609       +1     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/16?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [arrow/src/csv/writer.rs](https://codecov.io/gh/apache/arrow-rs/pull/16/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2Nzdi93cml0ZXIucnM=) | `83.01% <91.66%> (+0.27%)` | :arrow_up: |
   | [arrow/src/json/writer.rs](https://codecov.io/gh/apache/arrow-rs/pull/16/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2pzb24vd3JpdGVyLnJz) | `91.92% <100.00%> (+0.43%)` | :arrow_up: |
   
   ------
   
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/16?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [5479e19...4fa2a1c](https://codecov.io/gh/apache/arrow-rs/pull/16?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   





[GitHub] [arrow-rs] jorgecarleitao commented on pull request #16: ARROW-12493: Add support for writing dictionary arrays to CSV and JSON

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #16:
URL: https://github.com/apache/arrow-rs/pull/16#issuecomment-824173793


   We had to perform a small re-write of master. The commits may look a bit odd, but it should not cause conflicts. Could you kindly rebase this against the latest master to make it easier to review?
   





[GitHub] [arrow-rs] alamb commented on a change in pull request #16: ARROW-12493: Add support for writing dictionary arrays to CSV and JSON

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #16:
URL: https://github.com/apache/arrow-rs/pull/16#discussion_r617547804



##########
File path: arrow/src/json/writer.rs
##########
@@ -709,6 +715,56 @@ mod tests {
         );
     }
 
+    #[test]
+    fn write_dictionary() {
+        let schema = Schema::new(vec![
+            Field::new(
+                "c1",
+                DataType::Dictionary(Box::new(DataType::Int32), Box::new(DataType::Utf8)),
+                true,
+            ),
+            Field::new(
+                "c2",
+                DataType::Dictionary(Box::new(DataType::Int8), Box::new(DataType::Utf8)),
+                true,
+            ),
+        ]);
+
+        let a: DictionaryArray<Int32Type> = vec![
+            Some("cupcakes"),
+            Some("foo"),
+            Some("foo"),
+            None,
+            Some("cupcakes"),
+        ]
+        .into_iter()
+        .collect();
+        let b: DictionaryArray<Int8Type> =
+            vec![Some("sdsd"), Some("sdsd"), None, Some("sd"), Some("sdsd")]
+                .into_iter()
+                .collect();
+
+        let batch =
+            RecordBatch::try_new(Arc::new(schema), vec![Arc::new(a), Arc::new(b)])
+                .unwrap();
+
+        let mut buf = Vec::new();
+        {
+            let mut writer = LineDelimitedWriter::new(&mut buf);
+            writer.write_batches(&[batch]).unwrap();
+        }
+
+        assert_eq!(
+            String::from_utf8(buf).unwrap(),
+            r#"{"c1":"cupcakes","c2":"sdsd"}

Review comment:
       🧁 







[GitHub] [arrow-rs] alamb merged pull request #16: ARROW-12493: Add support for writing dictionary arrays to CSV and JSON

Posted by GitBox <gi...@apache.org>.
alamb merged pull request #16:
URL: https://github.com/apache/arrow-rs/pull/16


   

