
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/07/09 08:00:43 UTC

[GitHub] [iceberg] deadwind4 opened a new issue #2798: Semi-structured data and unstructured data support.

deadwind4 opened a new issue #2798:
URL: https://github.com/apache/iceberg/issues/2798


   Will Iceberg support semi-structured and unstructured data in the future?
   For example: CSV, logs, XML, JSON, documents, PDFs, etc.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] kbendick commented on issue #2798: Semi-structured data and unstructured data support.

Posted by GitBox <gi...@apache.org>.

kbendick commented on issue #2798:
URL: https://github.com/apache/iceberg/issues/2798#issuecomment-877908540


   By support, do you mean storing the data in these formats?
   
   I can't speak for the direction of the project as a whole, but I would not expect so. Iceberg is designed for tabular data and for storage formats that use a schema to reduce the amount of data read for a query. I'm not sure that could be accomplished with JSON data storage.
   
   Could you describe your specific use case (or one of them) in more detail? There might be an existing solution or pattern that you could benefit from.
   
   If you're looking for a simple one-off column, you can always use a string column to hold a JSON blob, for example.
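
The string-column suggestion above can be sketched in a few lines. This is a minimal illustration in plain Python, not Iceberg API code: the `EventRow` type, its field names, and the helper functions are all hypothetical. It assumes a row with a couple of typed columns plus one string column that carries the semi-structured part as serialized JSON.

```python
import json
from dataclasses import dataclass


@dataclass
class EventRow:
    """Hypothetical row layout: typed columns plus one JSON string column."""
    event_id: int     # typed column, usable for filtering
    event_date: str   # typed column, usable for filtering
    payload: str      # semi-structured data kept as a plain string


def to_row(event_id, event_date, attributes):
    # Serialize the free-form attributes so they fit in a string column.
    return EventRow(event_id, event_date, json.dumps(attributes, sort_keys=True))


def read_attribute(row, key, default=None):
    # The blob is opaque to the table schema, so it is parsed at read time.
    return json.loads(row.payload).get(key, default)


rows = [
    to_row(1, "2021-07-09", {"source": "app.log", "level": "ERROR"}),
    to_row(2, "2021-07-10", {"source": "feed.xml"}),
]

# Filter on a typed column first, then parse only the surviving blobs.
selected = [
    read_attribute(row, "level", "UNKNOWN")
    for row in rows
    if row.event_date == "2021-07-09"
]
```

The trade-off in this sketch is that only the typed columns are visible to the schema; anything inside `payload` has to be parsed row by row, so it cannot be used to skip data the way a typed column can.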



