
Posted to user@kylin.apache.org by "Kumar, Manoj H" <ma...@jpmorgan.com> on 2017/11/14 06:30:10 UTC

Apache Kylin - Does it take input data from Parquet file format

Please advise: can Kylin take its input data from Parquet files instead of a Hive table? Our upstream system generates Parquet files as its output. Please suggest an approach; this would be an important feature to have.

Manoj

This message is confidential and subject to terms at: http://www.jpmorgan.com/emaildisclaimer including on confidentiality, legal privilege, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately. Any unauthorized use is strictly prohibited.

Re: Apache Kylin - Does it take input data from Parquet file format

Posted by ShaoFeng Shi <sh...@apache.org>.

Exactly, use an external table. After creating the table, have your
upstream application write its files to the table's location. You can then
query the data from Hive; if the table is available in Hive, Kylin can
fetch the data.
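
A quick way to confirm that Hive can read what the upstream application wrote is sketched below. It assumes the table name parquet_table_name and the columns x and y from the sample DDL quoted later in this message; substitute your own names. Note that Hive reads every file under the table's LOCATION directory, so the table name does not have to match any file name.

```sql
-- Assumes the sample table from this thread: parquet_table_name (x INT, y STRING).

-- Show the table's location, SerDe, and input/output formats.
DESCRIBE FORMATTED parquet_table_name;

-- Read a few rows to confirm the Parquet files are readable through Hive.
SELECT x, y FROM parquet_table_name LIMIT 10;

-- Count the rows currently visible to Hive.
SELECT COUNT(*) FROM parquet_table_name;
```

If these queries return the expected rows, the table should be ready to be loaded as a Hive source in Kylin.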

2017-11-14 16:13 GMT+08:00 Kumar, Manoj H <ma...@jpmorgan.com>:

> Do you mean this is possible using an external table? Just as a sample:
> would the table name be the same as the file name, with the upstream
> system loading the file into that directory?
>
> create external table parquet_table_name (x INT, y STRING)
>   ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
>   STORED AS
>     INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
>     OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"
>     LOCATION '/user/cloudera/tinytable';
>
> Regards,
>
> Manoj
>
>
>
> *From:* ShaoFeng Shi [mailto:shaofengshi@apache.org]
> *Sent:* Tuesday, November 14, 2017 12:58 PM
> *To:* user
> *Subject:* Re: Apache Kylin - Does it take input data from Parquet file
> format
>
>
>
> Hi Kumar,
>
>
>
> You can create a Hive table in Parquet format and have the upstream
> application write its Parquet files to that table's HDFS location (each
> delivery can go into a new partition). Kylin will then be able to fetch
> the new data incrementally from this table.
>
> Hive provides a logical layer over the physical model, so going through
> Hive is the recommended approach.
>
> 2017-11-14 14:30 GMT+08:00 Kumar, Manoj H <ma...@jpmorgan.com>:
>
> Please advise: can Kylin take its input data from Parquet files instead
> of a Hive table? Our upstream system generates Parquet files as its
> output. Please suggest an approach; this would be an important feature
> to have.
>
> Manoj
>
> --
>
> Best regards,
>
>
>
> Shaofeng Shi 史少锋



-- 
Best regards,

Shaofeng Shi 史少锋

RE: Apache Kylin - Does it take input data from Parquet file format

Posted by "Kumar, Manoj H" <ma...@jpmorgan.com>.

Do you mean this is possible using an external table? Just as a sample: would the table name be the same as the file name, with the upstream system loading the file into that directory?

create external table parquet_table_name (x INT, y STRING)
  ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
  STORED AS
    INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
    OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"
    LOCATION '/user/cloudera/tinytable';

Regards,
Manoj

From: ShaoFeng Shi [mailto:shaofengshi@apache.org]
Sent: Tuesday, November 14, 2017 12:58 PM
To: user
Subject: Re: Apache Kylin - Does it take input data from Parquet file format

Hi Kumar,

You can create a Hive table in Parquet format and have the upstream application write its Parquet files to that table's HDFS location (each delivery can go into a new partition). Kylin will then be able to fetch the new data incrementally from this table.

Hive provides a logical layer over the physical model, so going through Hive is the recommended approach.


2017-11-14 14:30 GMT+08:00 Kumar, Manoj H <ma...@jpmorgan.com>:
Please advise: can Kylin take its input data from Parquet files instead of a Hive table? Our upstream system generates Parquet files as its output. Please suggest an approach; this would be an important feature to have.

Manoj


--
Best regards,

Shaofeng Shi 史少锋



Re: Apache Kylin - Does it take input data from Parquet file format

Posted by ShaoFeng Shi <sh...@apache.org>.

Hi Kumar,

You can create a Hive table in Parquet format and have the upstream
application write its Parquet files to that table's HDFS location (each
delivery can go into a new partition). Kylin will then be able to fetch the
new data incrementally from this table.

Hive provides a logical layer over the physical model, so going through
Hive is the recommended approach.
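
As a rough illustration of the partition-per-delivery layout described above, here is a minimal HiveQL sketch. The table name upstream_events, the partition column dt, and the path /data/upstream_events are hypothetical placeholders, not names from this thread. The STORED AS PARQUET shorthand assumes Hive 0.13 or later; older Hive versions need the explicit SerDe and input/output format classes instead.

```sql
-- Hypothetical names throughout: upstream_events, dt, /data/upstream_events.
-- The partition column (dt) is declared in PARTITIONED BY and must not
-- also appear in the regular column list.
CREATE EXTERNAL TABLE IF NOT EXISTS upstream_events (
  x INT,
  y STRING
)
PARTITIONED BY (dt STRING)
STORED AS PARQUET
LOCATION '/data/upstream_events';

-- After the upstream application has written its Parquet files into
-- /data/upstream_events/dt=2017-11-14/, register that directory as a
-- new partition so Hive can see the data.
ALTER TABLE upstream_events ADD IF NOT EXISTS
  PARTITION (dt = '2017-11-14')
  LOCATION '/data/upstream_events/dt=2017-11-14';
```

Because the table is external, dropping it in Hive leaves the upstream files in place. Each new delivery only needs its own directory and one ADD PARTITION statement.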


2017-11-14 14:30 GMT+08:00 Kumar, Manoj H <ma...@jpmorgan.com>:

> Please advise: can Kylin take its input data from Parquet files instead
> of a Hive table? Our upstream system generates Parquet files as its
> output. Please suggest an approach; this would be an important feature
> to have.
>
> Manoj



-- 
Best regards,

Shaofeng Shi 史少锋