Posted to user@spark.apache.org by LinQili <li...@outlook.com> on 2014/12/23 09:09:41 UTC

How to export data from hive into hdfs in spark program?

Hi all:
I wonder if there is a way to export data from a Hive table into HDFS using Spark, like this:

    INSERT OVERWRITE DIRECTORY '/user/linqili/tmp/src' select * from $DB.$tableName

Re: How to export data from hive into hdfs in spark program?

Posted by Cheng Lian <li...@gmail.com>.
This depends on which output format you want. For Parquet, you can 
simply do this:

    hiveContext.table("some_db.some_table").saveAsParquetFile("hdfs://path/to/file")
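If Parquet is not the format you want, plain text is another option. A minimal sketch (the rowToTsv helper and the paths are hypothetical, and it assumes a HiveContext named hiveContext is in scope; the Spark calls are left commented as they need a running cluster):

```scala
// Hypothetical helper: render one row's fields as a tab-separated line.
def rowToTsv(fields: Seq[Any]): String =
  fields.map(f => String.valueOf(f)).mkString("\t")

// Sketch, assuming a HiveContext in scope:
// hiveContext.table("some_db.some_table")
//   .map(row => rowToTsv(row.toSeq))
//   .saveAsTextFile("hdfs://path/to/dir")
```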

On 12/23/14 5:22 PM, LinQili wrote:

> Hi Leo:
> Thanks for your reply.
> I am talking about using Hive from Spark to export data from Hive to HDFS,
> maybe like:
>       val exportData = s"insert overwrite directory 
> '/user/linqili/tmp/src' select * from $DB.$tableName"
>       hiveContext.sql(exportData)
> but it is unsupported in Spark now:
> Exception in thread "Thread-3" java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:183)
> Caused by: java.lang.RuntimeException:
> Unsupported language features in query: insert overwrite directory 
> '/user/linqili/tmp/src' select * from test_spark.src
> TOK_QUERY
>   TOK_FROM
>     TOK_TABREF
>       TOK_TABNAME
>         test_spark
>         src
>   TOK_INSERT
>     TOK_DESTINATION
>       TOK_DIR
>         '/user/linqili/tmp/src'
>     TOK_SELECT
>       TOK_SELEXPR
>         TOK_ALLCOLREF
>
>     at scala.sys.package$.error(package.scala:27)
>     at org.apache.spark.sql.hive.HiveQl$.parseSql(HiveQl.scala:256)
>     at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:106)
>     at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:110)
>     at com.nd.huayuedu.HiveExportTest$.main(HiveExportTest.scala:35)
>     at com.nd.huayuedu.HiveExportTest.main(HiveExportTest.scala)
>     ... 5 more
> ------------------------------------------------------------------------
> Date: Tue, 23 Dec 2014 16:47:11 +0800
> From: leo.chen.cipher@outlook.com
> To: lin_qili@outlook.com
> Subject: Re: How to export data from hive into hdfs in spark program?
>
> Hi,
>
> If you are talking about using spark's thriftserver, this query should 
> work:
> export table $DB.$tableName to '/user/linqili/tmp/src';
> However you need to take care of that folder (by deleting it I 
> presume) first.
>
> Cheers,
> Leo
> On 2014/12/23 16:09, LinQili wrote:
>
>     Hi all:
>     I wonder if there is a way to export data from a Hive table into
>     HDFS using Spark,
>     like this:  INSERT OVERWRITE DIRECTORY '/user/linqili/tmp/src'
>     select * from $DB.$tableName
>
>
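Leo's EXPORT TABLE route can also be driven from the Spark side. A small helper that builds the statement string (a sketch — the helper name is made up, the db/table/path are the thread's examples, and the resulting SQL still has to be run by an engine that supports Hive's EXPORT syntax):

```scala
// Hypothetical helper: build Hive's EXPORT TABLE statement.
// As Leo notes above, the target directory should be removed
// (or not yet exist) before the statement is run.
def exportTableSql(db: String, table: String, dir: String): String =
  s"EXPORT TABLE $db.$table TO '$dir'"
```

For example, exportTableSql("test_spark", "src", "/user/linqili/tmp/src") builds the statement for the table from the stack trace above.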

RE: How to export data from hive into hdfs in spark program?

Posted by LinQili <li...@outlook.com>.
Hi Leo:
Thanks for your reply.
I am talking about using Hive from Spark to export data from Hive to HDFS, maybe like:

      val exportData = s"insert overwrite directory '/user/linqili/tmp/src' select * from $DB.$tableName"
      hiveContext.sql(exportData)

but it is unsupported in Spark now:

Exception in thread "Thread-3" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:183)
Caused by: java.lang.RuntimeException:
Unsupported language features in query: insert overwrite directory '/user/linqili/tmp/src' select * from test_spark.src
TOK_QUERY
  TOK_FROM
    TOK_TABREF
      TOK_TABNAME
        test_spark
        src
  TOK_INSERT
    TOK_DESTINATION
      TOK_DIR
        '/user/linqili/tmp/src'
    TOK_SELECT
      TOK_SELEXPR
        TOK_ALLCOLREF

    at scala.sys.package$.error(package.scala:27)
    at org.apache.spark.sql.hive.HiveQl$.parseSql(HiveQl.scala:256)
    at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:106)
    at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:110)
    at com.nd.huayuedu.HiveExportTest$.main(HiveExportTest.scala:35)
    at com.nd.huayuedu.HiveExportTest.main(HiveExportTest.scala)
    ... 5 more
------------------------------------------------------------------------
Date: Tue, 23 Dec 2014 16:47:11 +0800
From: leo.chen.cipher@outlook.com
To: lin_qili@outlook.com
Subject: Re: How to export data from hive into hdfs in spark program?

Hi,

If you are talking about using Spark's thriftserver, this query should work:

    export table $DB.$tableName to '/user/linqili/tmp/src';

However you need to take care of that folder (by deleting it I presume) first.

Cheers,
Leo

On 2014/12/23 16:09, LinQili wrote:

    Hi all:
    I wonder if there is a way to export data from a Hive table into HDFS using Spark, like this:

        INSERT OVERWRITE DIRECTORY '/user/linqili/tmp/src' select * from $DB.$tableName