Posted to user@spark.apache.org by LinQili <li...@outlook.com> on 2014/12/23 09:09:41 UTC
How to export data from hive into hdfs in spark program?
Hi all:
I wonder if there is a way to export data from a Hive table into HDFS using Spark, like this:
INSERT OVERWRITE DIRECTORY '/user/linqili/tmp/src' select * from $DB.$tableName
Re: How to export data from hive into hdfs in spark program?
Posted by Cheng Lian <li...@gmail.com>.
This depends on which output format you want. For Parquet, you can
simply do this:
hiveContext.table("some_db.some_table").saveAsParquetFile("hdfs://path/to/file")
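If the goal is plain delimited text (what INSERT OVERWRITE DIRECTORY would
produce) rather than Parquet, one workaround is to read the table with
hiveContext.table and write it out with saveAsTextFile. A minimal, hedged
sketch of the row formatting -- the tab separator and the \N null marker
below are assumptions modeled on Hive's default text serialization, so
check them against your table's actual SerDe:

```scala
// Hypothetical row formatter approximating Hive's default text layout:
// fields joined by a separator (tab here) and nulls rendered as \N.
// Both choices are assumptions, not guaranteed to match a given SerDe.
object HiveTextFormat {
  def toDelimitedLine(fields: Seq[Any], sep: String = "\t"): String =
    fields.map(f => if (f == null) "\\N" else f.toString).mkString(sep)
}

// Intended usage from a Spark program (needs a live HiveContext, so it
// stays in a comment here):
//   hiveContext.table("some_db.some_table")
//     .map(row => HiveTextFormat.toDelimitedLine(row.toSeq))
//     .saveAsTextFile("hdfs:///user/linqili/tmp/src")
```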
On 12/23/14 5:22 PM, LinQili wrote:
> Hi Leo:
> Thanks for your reply.
> I am talking about using Hive from Spark to export data from Hive to HDFS,
> maybe like:
> val exportData = s"insert overwrite directory
> '/user/linqili/tmp/src' select * from $DB.$tableName"
> hiveContext.sql(exportData)
> but it is unsupported in Spark for now:
> Exception in thread "Thread-3" java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:183)
> Caused by: java.lang.RuntimeException:
> Unsupported language features in query: insert overwrite directory
> '/user/linqili/tmp/src' select * from test_spark.src
> TOK_QUERY
> TOK_FROM
> TOK_TABREF
> TOK_TABNAME
> test_spark
> src
> TOK_INSERT
> TOK_DESTINATION
> TOK_DIR
> '/user/linqili/tmp/src'
> TOK_SELECT
> TOK_SELEXPR
> TOK_ALLCOLREF
>
> at scala.sys.package$.error(package.scala:27)
> at org.apache.spark.sql.hive.HiveQl$.parseSql(HiveQl.scala:256)
> at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:106)
> at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:110)
> at com.nd.huayuedu.HiveExportTest$.main(HiveExportTest.scala:35)
> at com.nd.huayuedu.HiveExportTest.main(HiveExportTest.scala)
> ... 5 more
> ------------------------------------------------------------------------
> Date: Tue, 23 Dec 2014 16:47:11 +0800
> From: leo.chen.cipher@outlook.com
> To: lin_qili@outlook.com
> Subject: Re: How to export data from hive into hdfs in spark program?
>
> Hi,
>
> If you are talking about using Spark's thrift server, this query should
> work:
> export table $DB.$tableName to '/user/linqili/tmp/src';
> However, you need to take care of that folder first (by deleting it, I
> presume).
>
> Cheers,
> Leo
> On 2014/12/23 16:09, LinQili wrote:
>
> Hi all:
> I wonder if there is a way to export data from a Hive table into
> HDFS using Spark?
> like this: INSERT OVERWRITE DIRECTORY '/user/linqili/tmp/src'
> select * from $DB.$tableName
>
>
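Leo's EXPORT TABLE suggestion above is Hive's own export facility: it
copies the table's data together with a metadata file, and it expects the
target directory to be empty or absent, which is why he mentions deleting
the folder first. A small, hedged helper for building that statement --
the exact syntax follows standard HiveQL EXPORT as documented, but verify
it against your Hive version:

```scala
// Builds the HiveQL EXPORT statement Leo suggested. A sketch only:
// the "export table db.tbl to 'dir'" shape is taken from Hive's
// documented EXPORT command.
object ExportStatement {
  def exportTable(db: String, table: String, dir: String): String =
    s"export table $db.$table to '$dir'"
}
```

The resulting string would then be submitted through whatever SQL channel
supports it (per Leo, the thrift server rather than HiveContext.sql).
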