Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/30 13:53:53 UTC

[GitHub] [arrow-datafusion] andygrove edited a comment on pull request #55: Support qualified columns in queries

andygrove edited a comment on pull request #55:
URL: https://github.com/apache/arrow-datafusion/pull/55#issuecomment-830111572


   > @andygrove could you give some examples for schemaless data sources? I would like to incorporate that into the design doc.
   
   @houqp Sure. My limited experience with schemaless data sources comes from Apache Drill. One use case is reading JSON documents where there could be thousands of different attributes, but each document contains only a small subset. Rather than forcing every document into a fixed schema, you have a schema that contains one column named `*` or `**` (I can't quite remember how Drill does this). For each document (row) you can look up a column by name and get either a value or null/None if the attribute doesn't exist on that row.
   
   The benefit of this approach is that we don't need to create a Schema object with thousands of fields, and we don't need each RecordBatch to carry thousands of empty arrays. Of course, our current RecordBatch does not support this use case.
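
To make the per-row lookup concrete, here is a rough sketch of the idea in plain Rust. It is only an illustration under assumptions: `Value`, `SparseRow`, `from_pairs`, and `project` are hypothetical names invented for this example, and none of them are Arrow, DataFusion, or Drill APIs. Each row stores only the attributes it actually has, and a lookup by column name returns `None` when the attribute is absent.

```rust
use std::collections::HashMap;

/// Hypothetical value type for illustration; not an Arrow or DataFusion type.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Text(String),
}

/// One schemaless row: only the attributes present in the document are stored.
#[derive(Debug, Default)]
struct SparseRow {
    attrs: HashMap<String, Value>,
}

impl SparseRow {
    /// Build a row from (name, value) pairs, e.g. the fields of one JSON document.
    fn from_pairs(pairs: Vec<(&str, Value)>) -> Self {
        let attrs = pairs
            .into_iter()
            .map(|(name, value)| (name.to_string(), value))
            .collect();
        SparseRow { attrs }
    }

    /// Look up a column by name; `None` means the attribute is absent on this row.
    fn get(&self, column: &str) -> Option<&Value> {
        self.attrs.get(column)
    }
}

/// Project one column across all rows, yielding `None` where it is missing.
fn project<'a>(rows: &'a [SparseRow], column: &str) -> Vec<Option<&'a Value>> {
    rows.iter().map(|row| row.get(column)).collect()
}
```

With this layout, a query that references a column name simply probes each row, so no row has to carry placeholders for the thousands of attributes it does not use. A real columnar implementation would look quite different; this sketch only shows the lookup semantics.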

