Posted to user@spark.apache.org by Praveen Devarao <pr...@in.ibm.com> on 2016/02/01 13:03:18 UTC

Guidelines for writing SPARK packages

Hi,

        Are there any guidelines or specs for writing a Spark package? I
would like to implement a Spark package and would like to know how it
needs to be structured (implementing certain interfaces, etc.) so that it
can plug into Spark for extended functionality.

        Could anyone point me to docs or links on the above?

Thanking You

Praveen Devarao


Re: Guidelines for writing SPARK packages

Posted by Takeshi Yamamuro <li...@gmail.com>.
Hi,

A package I maintain (https://github.com/maropu/hivemall-spark) extends
the existing Spark SQL/DataFrame classes for a third-party library.
Please use it as a concrete example.
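The package adds its functionality mainly through implicit enrichment of
DataFrame rather than by subclassing it. A minimal sketch of that
pattern; the object, class, and method names below are hypothetical:

    import org.apache.spark.sql.DataFrame

    object MyExtensions {
      // "Pimp my library": an implicit class adds methods to DataFrame
      // without modifying or subclassing it.
      implicit class RichDataFrame(val df: DataFrame) extends AnyVal {
        // Hypothetical helper available on any DataFrame once
        // MyExtensions._ is imported.
        def dropDuplicateRows(): DataFrame = df.distinct()
      }
    }

    // Usage:
    //   import MyExtensions._
    //   val deduped = someDf.dropDuplicateRows()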

Thanks,
takeshi

On Tue, Feb 2, 2016 at 6:20 PM, Praveen Devarao <pr...@in.ibm.com>
wrote:

> Thanks David.
>
> I am looking at extending the Spark SQL library with a custom package,
> hence I was looking for details on any specific classes to extend or
> interfaces to implement in order to redirect calls to my module (when
> using .format).
>
> If you have any info along these lines, do share it with me; else
> debugging through the code would be the way :-)
>
> Thanking You
>
> Praveen Devarao
>


-- 
---
Takeshi Yamamuro

Re: Guidelines for writing SPARK packages

Posted by Praveen Devarao <pr...@in.ibm.com>.
Thanks David.

I am looking at extending the Spark SQL library with a custom package,
hence I was looking for details on any specific classes to extend or
interfaces to implement in order to redirect calls to my module (when
using .format).

If you have any info along these lines, do share it with me; else
debugging through the code would be the way :-)

Thanking You

Praveen Devarao
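
For reference, the hook in question is Spark's external data source API:
.format("some.package") makes Spark look for a class named DefaultSource
in that package, and for a simple read path that class typically
implements org.apache.spark.sql.sources.RelationProvider. A minimal
sketch against the Spark 1.x API; the package, relation, and column
names here are illustrative, not from the thread:

    package com.example.myformat

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{Row, SQLContext}
    import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}
    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    // .format("com.example.myformat") resolves here: Spark looks up a
    // class named "DefaultSource" inside the package passed to .format().
    class DefaultSource extends RelationProvider {
      override def createRelation(
          sqlContext: SQLContext,
          parameters: Map[String, String]): BaseRelation =
        new MyRelation(sqlContext, parameters)
    }

    // A relation exposing a fixed single-column schema and a trivial scan.
    class MyRelation(
        val sqlContext: SQLContext,
        parameters: Map[String, String])
      extends BaseRelation with TableScan {

      override def schema: StructType =
        StructType(StructField("value", StringType, nullable = false) :: Nil)

      override def buildScan(): RDD[Row] =
        sqlContext.sparkContext.parallelize(Seq(Row("hello"), Row("world")))
    }

With that class on the classpath,
sqlContext.read.format("com.example.myformat").load() returns a
DataFrame backed by MyRelation.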




Re: Guidelines for writing SPARK packages

Posted by Burak Yavuz <br...@gmail.com>.
Thanks for the reply, David; I just wanted to fix one part of your response:


> If you want to register a release for your package, you will also need
> to push the artifacts for your package to Maven Central.
>

It is NOT necessary to push to Maven Central in order to make a release.
There are many packages out there that don't publish to Maven Central,
e.g. scripts and pure Python packages.

Praveen, I would suggest taking a look at:
 - the spark-package command line tool
   (https://github.com/databricks/spark-package-cmd-tool), to get you set up
 - sbt-spark-package (https://github.com/databricks/sbt-spark-package), to
   help with building/publishing if you plan to use Scala in your package;
   a rough build sketch follows below. You could of course use Maven as
   well, but we don't have a Maven plugin for Spark Packages.
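
As a rough sketch of what sbt-spark-package usage looks like (the plugin
version, resolver URL, and package name below are illustrative; check the
repo's README for current values):

    // project/plugins.sbt -- pull in the plugin from the Spark Packages repo
    resolvers += "Spark Packages repo" at "https://dl.bintray.com/spark-packages/maven/"
    addSbtPlugin("org.spark-packages" % "sbt-spark-package" % "0.2.3")

    // build.sbt -- minimal Spark Package settings
    spName := "myorg/my-spark-package"  // must match the GitHub owner/repo
    sparkVersion := "1.6.0"             // Spark version to build against
    sparkComponents += "sql"            // adds spark-sql as a "provided" dependency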

Best,
Burak

Re: Guidelines for writing SPARK packages

Posted by David Russell <th...@gmail.com>.
Hi Praveen,

The basic requirements for releasing a Spark package on
spark-packages.org are as follows:

1. The package content must be hosted by GitHub in a public repo under
the owner's account.
2. The repo name must match the package name.
3. The master branch of the repo must contain "README.md" and "LICENSE".

Per the docs on the spark-packages.org site, an example package that
meets those requirements can be found at
https://github.com/databricks/spark-avro. My own recently released
SAMBA package also meets these requirements:
https://github.com/onetapbeyond/lambda-spark-executor.

As you can see, there is nothing in this list of requirements that
demands the implementation of specific interfaces. What you'll need to
implement will depend entirely on what you want to accomplish. If you
want to register a release for your package, you will also need to push
the artifacts for your package to Maven Central.

David




-- 
"All that is gold does not glitter, Not all those who wander are lost."
