Posted to user@hive.apache.org by Divya Gehlot <di...@gmail.com> on 2015/12/16 03:04:33 UTC

Pros and cons - Saving Spark data in Hive

Hi,
I am a newbie to Spark and am exploring the options, with their pros and
cons, to find which one works best with Spark and Hive. My input datasets
are CSV files; I use Spark to process the data and save it in Hive using
HiveContext.

1) Process the CSV file using the spark-csv package, register it as a temp
table, and store the data in Hive using HiveContext.
2) Process the file as a plain text file in SQLContext, register it as a
temp table in SQLContext, store it as an ORC file, then read that ORC file
in HiveContext and store it in Hive.
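
A minimal sketch of option 1, assuming Spark 1.x with the spark-csv package available and an already-created HiveContext passed in as `hive_context`. The function name, the path and the table name are hypothetical, and the `header` and `inferSchema` settings are assumptions about the input files. Writing with `saveAsTable` in ORC format stores the data in Hive directly, so the separate ORC file of option 2 should not be needed.

```python
def save_csv_to_hive(hive_context, csv_path, table_name):
    """Load a CSV file with spark-csv and save it as an ORC-backed Hive table.

    hive_context is assumed to be an existing pyspark.sql.HiveContext;
    csv_path and table_name are placeholders chosen by the caller.
    """
    df = (hive_context.read
          .format("com.databricks.spark.csv")  # data source name of the spark-csv package
          .option("header", "true")            # assumption: first line holds column names
          .option("inferSchema", "true")       # assumption: let spark-csv guess column types
          .load(csv_path))
    # Write the DataFrame as a Hive table stored in ORC format,
    # replacing the table if it already exists.
    df.write.format("orc").mode("overwrite").saveAsTable(table_name)
    return df
```

It would be called as `save_csv_to_hive(hiveContext, "/data/input.csv", "mydb.mytable")`, where both arguments after the context are made-up examples.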

Are there any better options apart from those mentioned above?
I would really appreciate your input.
Thanks in advance.

Thanks,
Regards,
Divya

Re: Pros and cons - Saving Spark data in Hive

Posted by Sabarish Sasidharan <sa...@manthan.com>.
If all you want to do is to load data into Hive, you don't need to use
Spark.

For subsequent query performance you would want to convert to ORC or
Parquet when loading into Hive.
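
As a rough sketch of that Hive-only route, the helper below just builds the two HiveQL statements: an external text table over the CSV directory, then a CREATE TABLE ... AS SELECT into ORC or Parquet. The helper name, table names, columns and directory are all hypothetical, and it assumes simple comma-delimited files without quoted fields that contain commas.

```python
def hive_load_statements(staging_table, target_table, columns, csv_dir, fmt="ORC"):
    """Return HiveQL statements that expose CSV files and copy them into ORC/Parquet.

    columns is a list of (name, hive_type) pairs, e.g. [("id", "INT")].
    """
    fmt = fmt.upper()
    if fmt not in ("ORC", "PARQUET"):
        raise ValueError("fmt must be ORC or PARQUET")
    col_defs = ", ".join("{} {}".format(name, hive_type) for name, hive_type in columns)
    # Step 1: an external table that reads the CSV files in place.
    create_staging = (
        "CREATE EXTERNAL TABLE IF NOT EXISTS {t} ({cols}) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
        "STORED AS TEXTFILE LOCATION '{loc}'"
    ).format(t=staging_table, cols=col_defs, loc=csv_dir)
    # Step 2: copy the rows into a columnar table for faster queries.
    convert = "CREATE TABLE {t} STORED AS {f} AS SELECT * FROM {s}".format(
        t=target_table, f=fmt, s=staging_table)
    return [create_staging, convert]


if __name__ == "__main__":
    for stmt in hive_load_statements(
            "events_csv", "events_orc",
            [("id", "INT"), ("name", "STRING")], "/data/events_csv"):
        print(stmt + ";")
```

The printed statements could then be run from the Hive CLI or Beeline.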

Regards
Sab

Re: Pros and cons - Saving Spark data in Hive

Posted by Xuefu Zhang <xz...@cloudera.com>.
You might want to consider Hive on Spark where you can work directly with
Hive and your query execution is powered by Spark as an engine.
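
For illustration only: assuming a Hive installation where Hive on Spark is already set up, the engine is selected per session through the `hive.execution.engine` property. The helper below is hypothetical and merely assembles a script that sets the property before the queries; the query in the usage line is made up.

```python
def hive_on_spark_script(queries):
    """Build a HiveQL script that runs the given queries with Spark as the engine."""
    # Select Spark as the execution engine for this session, then run the queries.
    statements = ["SET hive.execution.engine=spark"] + list(queries)
    return ";\n".join(statements) + ";"


if __name__ == "__main__":
    print(hive_on_spark_script(["SELECT COUNT(*) FROM events"]))
```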

--Xuefu
