Posted to user@spark.apache.org by "Dai, Kevin" <yu...@ebay.com> on 2015/03/23 10:29:25 UTC

Use pig load function in spark

Hi, all

Can Spark use Pig's load functions to load data?

Best Regards,
Kevin.

Re: Use pig load function in spark

Posted by Denny Lee <de...@gmail.com>.
You may be able to utilize Spork (Pig on Apache Spark) as a mechanism to do
this: https://github.com/sigmoidanalytics/spork



RE: Use pig load function in spark

Posted by "Dai, Kevin" <yu...@ebay.com>.
Hi, Yin

But our data is in a customized sequence file format, which is read by our custom load function in Pig.

I want Spark to reuse these load functions to read the data and turn it into an RDD.

Best Regards,
Kevin.
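
For the custom sequence file case described above, a minimal sketch follows. It assumes the files are standard Hadoop SequenceFiles with Text keys and BytesWritable values and that the payload is UTF-8 text; the object name, the key/value types, and the decode step are hypothetical placeholders for whatever the existing Pig loader does. If the format needs its own Hadoop InputFormat (the one the Pig loader returns from getInputFormat()), sc.newAPIHadoopFile accepts an InputFormat class in place of sc.sequenceFile.

```scala
import java.nio.charset.StandardCharsets

import org.apache.hadoop.io.{BytesWritable, Text}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object SequenceFileSketch {
  // Placeholder for the record-decoding logic lifted out of the Pig loader.
  // Only the first `length` bytes are valid: BytesWritable's backing array
  // can be longer than the record it currently holds.
  def decode(bytes: Array[Byte], length: Int): String =
    new String(bytes, 0, length, StandardCharsets.UTF_8)

  // Reads (Text, BytesWritable) pairs and converts them to plain Scala values
  // right away, because Hadoop reuses the same Writable instances per record.
  def load(sc: SparkContext, path: String): RDD[(String, String)] =
    sc.sequenceFile(path, classOf[Text], classOf[BytesWritable]).map {
      case (key, value) => (key.toString, decode(value.getBytes, value.getLength))
    }
}
```

Converting inside the map, before any caching or shuffling, avoids ending up with an RDD whose elements all point at the same reused Writable object.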




Re: Use pig load function in spark

Posted by Yin Huai <yh...@databricks.com>.
Hello Kevin,

You can take a look at our generic load function <https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#generic-loadsave-functions>.

For example, you can use

val df = sqlContext.load("/myData", "parquet")

to load a Parquet dataset stored in "/myData" as a DataFrame <https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#dataframes>.

You can use it to load data stored in various formats, such as JSON (Spark
built-in), Parquet (Spark built-in), Avro
<https://github.com/databricks/spark-avro>, and CSV
<https://github.com/databricks/spark-csv>.

Thanks,

Yin
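
The generic load function mentioned above picks the data source by name, so the same call shape works for any supported format. Here is a minimal, self-contained sketch against the Spark 1.3 API, using the built-in "json" source on a small temporary file; the object name, method name, and sample records are made up for illustration. On Spark 1.4 and later, the equivalent call is sqlContext.read.format("json").load(path).

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object GenericLoadSketch {
  // Writes two JSON records (one per line) to a temp file, loads them through
  // the generic load function, and returns the number of rows read.
  def loadSampleCount(sqlContext: SQLContext): Long = {
    val file = Files.createTempFile("people", ".json")
    val records = "{\"name\":\"a\"}\n{\"name\":\"b\"}\n"
    Files.write(file, records.getBytes(StandardCharsets.UTF_8))
    // Second argument selects the data source; "parquet" works the same way.
    val df = sqlContext.load(file.toUri.toString, "json")
    df.count()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("generic-load-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(loadSampleCount(new SQLContext(sc)))
    } finally {
      sc.stop()
    }
  }
}
```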


RE: Use pig load function in spark

Posted by "Dai, Kevin" <yu...@ebay.com>.
Hi, Paul

You are right.

The story is that we have a lot of Pig load functions for loading our different data sets.

Now we want to use Spark to read and process this data.

So we want to figure out a way to reuse our existing load functions in Spark to read this data.

Any ideas?

Best Regards,
Kevin.

From: Paul Brown [mailto:prb@mult.ifario.us]
Sent: March 24, 2015 4:11
To: Dai, Kevin
Subject: Re: Use pig load function in spark


The answer is "Maybe, but you probably don't want to do that."

A typical Pig load function is devoted to bridging external data into Pig's type system, but you don't really need to do that in Spark because it is (thankfully) not encumbered by Pig's type system. What you probably want is to use native Spark facilities (e.g., textFile) together with whatever logic from your Pig load function is needed to turn your external data into an RDD.


—
prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
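
The suggestion above (native Spark input plus the parsing logic from the loader) can be sketched roughly as follows. This assumes tab-separated text records; the Event type, its fields, and the LoaderLogic object are hypothetical stand-ins for whatever the existing Pig load function produces, and only the parsing step is carried over, with no Pig Tuple or DataBag types involved.

```scala
import scala.util.Try

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical record type replacing the Pig tuple the loader used to build.
case class Event(userId: String, timestamp: Long, action: String)

object LoaderLogic {
  // Parsing step lifted out of a (hypothetical) Pig loader: plain Scala,
  // returning None for malformed lines instead of failing the job.
  def parse(line: String): Option[Event] =
    line.split("\t", -1) match {
      case Array(user, ts, action) =>
        Try(ts.toLong).toOption.map(t => Event(user, t, action))
      case _ => None
    }

  // Native Spark input (textFile) coupled with the parsing logic above.
  def load(sc: SparkContext, path: String): RDD[Event] =
    sc.textFile(path).flatMap(line => parse(line).toList)
}
```

Keeping parse free of any Spark or Pig dependency means it can be unit-tested on plain strings and reused whether the lines come from textFile or from another input source.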
