Posted to issues@carbondata.apache.org by "Ajantha Bhat (JIRA)" <ji...@apache.org> on 2019/05/02 11:25:00 UTC

[jira] [Created] (CARBONDATA-3365) Support Apache arrow vector filling from carbondata SDK

Ajantha Bhat created CARBONDATA-3365:
----------------------------------------

             Summary: Support Apache arrow vector filling from carbondata SDK
                 Key: CARBONDATA-3365
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3365
             Project: CarbonData
          Issue Type: Improvement
            Reporter: Ajantha Bhat


*Background:*
Apache Arrow is a cross-language development platform for in-memory
data. It specifies a standardised, language-independent columnar memory
format for flat and hierarchical data, organised for efficient analytic
operations on modern hardware.
By adding support in carbon for filling Arrow vectors, data read from
carbondata files can be used for analytics in any programming language.
For example, an Arrow vector filled by the carbon Java SDK can be read from
Python, C, C++ and the many other languages supported by Arrow.
This also widens the range of carbondata use cases: because Arrow is
already integrated with many query engines, carbondata can be used in
those applications as well.
*Implementation:*
*Stage 1:*
After the SDK reads the carbondata file, convert the carbon rows and fill
the Arrow vector.
*Stage 2:*
Deep integration with the carbon vector. Currently the carbon SDK vector
does not support filling complex columns. Once that is supported, the
Arrow vector can be wrapped around the carbon SDK vector for deep
integration.
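
The row-to-vector conversion described in Stage 1 can be pictured with a
minimal Java sketch. It does not use the Arrow or carbondata APIs: the
classes IntColumnSketch and RowToColumnSketch are hypothetical stand-ins,
and the sketch assumes rows arrive as Object[] records. The column is a
values array plus a validity bitmap, which is similar in spirit to the
Arrow columnar layout but much simplified.

```java
import java.util.BitSet;

// Simplified stand-in for a nullable int column: a values array plus a
// validity bitmap. NOT the Arrow Java API; all names are hypothetical.
final class IntColumnSketch {
    final int[] values;
    final BitSet validity;   // bit i set => row i holds a non-null value
    final int rowCount;

    IntColumnSketch(int rowCount) {
        this.values = new int[rowCount];
        this.validity = new BitSet(rowCount);
        this.rowCount = rowCount;
    }

    boolean isNull(int i) {
        return !validity.get(i);
    }
}

final class RowToColumnSketch {
    // Pivot one field of row-oriented records into a column.
    // Assumption: each row is an Object[] and the field holds Integer or null.
    static IntColumnSketch fillIntColumn(Object[][] rows, int fieldIndex) {
        IntColumnSketch column = new IntColumnSketch(rows.length);
        for (int i = 0; i < rows.length; i++) {
            Object cell = rows[i][fieldIndex];
            if (cell != null) {
                column.values[i] = (Integer) cell;
                column.validity.set(i);
            }
            // Null cells leave the validity bit clear; the value slot is unused.
        }
        return column;
    }

    public static void main(String[] args) {
        // Made-up sample rows standing in for rows read by the SDK.
        Object[][] rows = {
            {"a", 10},
            {"b", null},
            {"c", 30},
        };
        IntColumnSketch column = fillIntColumn(rows, 1);
        for (int i = 0; i < column.rowCount; i++) {
            System.out.println(column.isNull(i) ? "null" : String.valueOf(column.values[i]));
        }
    }
}
```

An actual implementation would fill Arrow vector classes instead of the
stand-in column, and would handle every carbon data type rather than a
single int field.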



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)