Posted to dev@predictionio.apache.org by "Wojciech Indyk (JIRA)" <ji...@apache.org> on 2016/09/18 16:10:20 UTC

[jira] [Created] (PIO-38) add Apache Parquet as a data source

Wojciech Indyk created PIO-38:
---------------------------------

             Summary: add Apache Parquet as a data source
                 Key: PIO-38
                 URL: https://issues.apache.org/jira/browse/PIO-38
             Project: PredictionIO
          Issue Type: New Feature
            Reporter: Wojciech Indyk


Apache Parquet (https://parquet.apache.org/) is a columnar storage format that Apache Spark supports natively, and it is well suited to storing batch data used as input for a PredictionIO engine.
Parquet is a popular format for archiving clickstream data, so supporting it as a data source would let users run PredictionIO on that data directly, without first importing (and duplicating) it into HBase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)