Posted to dev@parquet.apache.org by "Julien Le Dem (JIRA)" <ji...@apache.org> on 2015/04/03 02:07:53 UTC

[jira] [Commented] (PARQUET-224) Implement writing Parquet files into Cassandra natively

    [ https://issues.apache.org/jira/browse/PARQUET-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393776#comment-14393776 ] 

Julien Le Dem commented on PARQUET-224:
---------------------------------------

That sounds doable.
You'd create a new PageStore:
https://github.com/apache/incubator-parquet-mr/blob/master/parquet-column/src/test/java/parquet/column/page/mem/MemPageStore.java

There is an example here of using the Parquet assembly independently of a file:
https://github.com/apache/incubator-parquet-mr/blob/master/parquet-column/src/test/java/parquet/io/TestColumnIO.java
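
To make that concrete, here is a minimal sketch of such a PageStore, modeled on MemPageStore. It is written against the PageWriteStore/PageWriter interfaces in parquet.column.page as a best guess at their current shape; the exact PageWriter method set differs between versions (e.g. writePageV2), so MemPageWriter is the reference for the real contract. A Cassandra-backed version would push each page to its own cell instead of just counting bytes as this one does. The class name CellPageStore is illustrative only.

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import parquet.bytes.BytesInput;
import parquet.column.ColumnDescriptor;
import parquet.column.Encoding;
import parquet.column.page.DictionaryPage;
import parquet.column.page.PageWriteStore;
import parquet.column.page.PageWriter;
import parquet.column.statistics.Statistics;

// Illustrative only: hands out one PageWriter per column, like MemPageStore.
public class CellPageStore implements PageWriteStore {

  private final Map<ColumnDescriptor, CellPageWriter> writers =
      new HashMap<ColumnDescriptor, CellPageWriter>();

  @Override
  public PageWriter getPageWriter(ColumnDescriptor path) {
    CellPageWriter writer = writers.get(path);
    if (writer == null) {
      writer = new CellPageWriter(path);
      writers.put(path, writer);
    }
    return writer;
  }

  private static class CellPageWriter implements PageWriter {

    private final ColumnDescriptor path;
    private int pageCount = 0;
    private long bytesWritten = 0;

    CellPageWriter(ColumnDescriptor path) {
      this.path = path;
    }

    @Override
    public void writePage(BytesInput bytes, int valueCount,
        Statistics statistics, Encoding rlEncoding,
        Encoding dlEncoding, Encoding valuesEncoding) throws IOException {
      // A Cassandra-backed writer would persist bytes.toByteArray()
      // under its own cell key here instead of just counting.
      bytesWritten += bytes.size();
      pageCount++;
    }

    @Override
    public void writeDictionaryPage(DictionaryPage dictionaryPage)
        throws IOException {
      bytesWritten += dictionaryPage.getBytes().size();
    }

    @Override
    public long getMemSize() {
      return bytesWritten;
    }

    @Override
    public long allocatedSize() {
      return bytesWritten;
    }

    @Override
    public String memUsageString(String prefix) {
      return prefix + " " + path + ": " + pageCount + " pages, "
          + bytesWritten + " bytes";
    }

    // Newer versions of PageWriter also declare writePageV2;
    // it would need the same treatment.
  }
}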

> Implement writing Parquet files into Cassandra natively
> -------------------------------------------------------
>
>                 Key: PARQUET-224
>                 URL: https://issues.apache.org/jira/browse/PARQUET-224
>             Project: Parquet
>          Issue Type: New Feature
>            Reporter: Issac Buenrostro
>            Priority: Minor
>
> Writing Parquet files into Cassandra could allow parallel writes of multiple pages into different cells, and low-latency reads with a persistent connection to C*.
> Each page could be written to separate C* cells, with metadata written into a separate column family.
> A possible way of implementing this (a rough sketch follows this list) is:
> - abstract ParquetFileWriter -> ParquetDataWriter; writeDictionaryPage and writeDataPage are abstract methods.
> - ParquetFileWriter implements ParquetDataWriter, writing the data to Hadoop compatible files.
> - ParquetCassandraWriter implements ParquetDataWriter, writing data to Cassandra
> -- for each page, metadata is written to Metadata CF, with key <parquet-file-name>:<row-chunk>:<column>:<page>
> -- for each page, data is written to Data CF, with key <parquet-file-name>:<row-chunk>:<column>:<page>
> -- footer is written to Metadata CF, with key <parquet-file-name>
> - abstract ParquetFileReader -> ParquetDataReader; readNextRowGroup and readFooter are abstract methods. Chunk will also need to be abstract.
> - ParquetFileReader implements ParquetDataReader, reading from Hadoop compatible files.
> - ParquetCassandraReader implements ParquetDataReader, reading from Cassandra
> - ParquetDataWriter and ParquetDataReader are instantiated through reflection.
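
For illustration, a rough sketch of the proposed split in Java. The class names (ParquetDataWriter, ParquetCassandraWriter) and the key scheme come from the description above; the method signatures, the DataStax-driver usage, and the keyspace/table layout are assumptions, and the dictionary-page path and the reader side are elided.

import java.io.IOException;
import java.nio.ByteBuffer;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

// The proposed split: page-level writes become abstract, and each
// backend decides where a page lands.
abstract class ParquetDataWriter {

  abstract void writeDataPage(String file, int rowChunk, String column,
      int page, byte[] bytes) throws IOException;

  abstract void writeFooter(String file, byte[] footer) throws IOException;

  // Last bullet above: choose the implementation at runtime via reflection.
  static ParquetDataWriter newInstance(String className) throws Exception {
    return (ParquetDataWriter) Class.forName(className).newInstance();
  }
}

class ParquetCassandraWriter extends ParquetDataWriter {

  private final Session session;
  private final PreparedStatement insertData;
  private final PreparedStatement insertMeta;

  ParquetCassandraWriter(String contactPoint) {
    // Assumes a keyspace "parquet" with tables
    //   data(key text PRIMARY KEY, value blob)      -- Data CF
    //   metadata(key text PRIMARY KEY, value blob)  -- Metadata CF
    Cluster cluster = Cluster.builder().addContactPoint(contactPoint).build();
    this.session = cluster.connect("parquet");
    this.insertData = session.prepare(
        "INSERT INTO data (key, value) VALUES (?, ?)");
    this.insertMeta = session.prepare(
        "INSERT INTO metadata (key, value) VALUES (?, ?)");
  }

  @Override
  void writeDataPage(String file, int rowChunk, String column, int page,
      byte[] bytes) {
    // Key scheme from the description:
    // <parquet-file-name>:<row-chunk>:<column>:<page>
    String key = file + ":" + rowChunk + ":" + column + ":" + page;
    session.execute(insertData.bind(key, ByteBuffer.wrap(bytes)));
    // Per-page metadata would go to the metadata table under the same key.
  }

  @Override
  void writeFooter(String file, byte[] footer) {
    // The footer is keyed by the file name alone.
    session.execute(insertMeta.bind(file, ByteBuffer.wrap(footer)));
  }
}

The reader side would mirror this: a ParquetCassandraReader looks up cells by the same keys, so readNextRowGroup could fetch all pages of a row group in parallel over a persistent connection.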



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)