Posted to user@phoenix.apache.org by Krishna <re...@gmail.com> on 2015/12/02 00:30:30 UTC

spark plugin with java

Hi,

Is there a working example for using the Spark plugin in Java? Specifically,
what's the Java equivalent for creating a DataFrame as shown here in Scala:

val df = sqlContext.phoenixTableAsDataFrame("TABLE1", Array("ID",
"COL1"), conf = configuration)

Re: spark plugin with java

Posted by Josh Mahonin <jm...@gmail.com>.
It does. Under the hood, the DataFrame/RDD makes use of the
PhoenixInputFormat, which derives split information from the Phoenix query
planner and passes those splits back to Spark to use for its
parallelization.

After you have the RDD / DataFrame handle, you're also free to apply Spark's
repartition() operation as needed.
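
For illustration, here is a minimal Java sketch of inspecting and adjusting
that parallelism. This is a sketch only: "sqlContext" is assumed to already
exist, and the table name and ZooKeeper quorum are placeholders.

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

// Load a Phoenix-backed DataFrame (Spark 1.4+ read API).
DataFrame df = sqlContext.read()
    .format("org.apache.phoenix.spark")
    .option("table", "TABLE1")
    .option("zkUrl", "<phoenix-server:2181>")
    .load();

// The initial partition count mirrors the splits that PhoenixInputFormat
// derived from the Phoenix query planner.
int numSplits = df.rdd().partitions().length;

// repartition() returns a new DataFrame and incurs a shuffle; useful when
// the Phoenix split count is a poor fit for the downstream workload.
DataFrame rebalanced = df.repartition(numSplits * 2);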

On Wed, Dec 2, 2015 at 2:56 PM, Krishna <re...@gmail.com> wrote:

> Yes, I will create new tickets for any issues that I run into.
> Another question: for now, I'm pursuing the option of creating a DataFrame
> as shown in my previous email. How does Spark handle parallelization in
> this case? Does it use Phoenix metadata on splits?
>
>
> On Wed, Dec 2, 2015 at 11:02 AM, Josh Mahonin <jm...@gmail.com> wrote:
>
>> Hi Krishna,
>>
>> That's great to hear. You're right, the plugin itself should be backwards
>> compatible to Spark 1.3.1 and should work with any version of Phoenix past
>> 4.4.0, though I can't guarantee that will be the case forever. As well, I
>> don't know how much usage there is across the board of the Java API and
>> DataFrames; you may in fact be the first. If you encounter any errors with
>> it, could you please file a JIRA with any stack traces you see?
>>
>> Since Spark is a quickly changing project, it often updates internal
>> functionality that we lag behind on supporting, so there's no direct
>> mapping between specific Phoenix versions and specific Spark versions.
>> Essentially, we add new support as fast as we get patches.
>>
>> My general recommendation is to stay back a major version on Spark if
>> possible, but if you need to use the latest Spark releases, try to use the
>> latest Phoenix release as well. The DataFrame support in Phoenix, for
>> instance, has had many recent patches and improvements that older
>> versions are missing.
>>
>> Thanks,
>>
>> Josh
>>
>> On Wed, Dec 2, 2015 at 1:40 PM, Krishna <re...@gmail.com> wrote:
>>
>>> Yes, that works for Spark 1.4.x. The website says Spark 1.3.1+ for the
>>> Spark plugin; is that accurate?
>>>
>>> For Spark 1.3.1, I created a DataFrame as follows (could not use the
>>> plugin):
>>>         Map<String, String> options = new HashMap<String, String>();
>>>         options.put("url", PhoenixRuntime.JDBC_PROTOCOL +
>>>             PhoenixRuntime.JDBC_PROTOCOL_SEPARATOR + zkQuorum);
>>>         options.put("dbtable", "TABLE_NAME");
>>>
>>>         SQLContext sqlContext = new SQLContext(sc);
>>>         DataFrame jdbcDF = sqlContext.load("jdbc", options)
>>>             .filter("COL_NAME > SOME_VALUE");
>>>
>>> Also, it isn't immediately obvious which version of Spark was used in
>>> building the Phoenix artifacts available on Maven. Maybe it's worth
>>> putting it on the website. Let me know if the mapping below is incorrect.
>>>
>>> Phoenix 4.4.x <--> Spark 1.4.0
>>> Phoenix 4.5.x <--> Spark 1.5.0
>>> Phoenix 4.6.x <--> Spark 1.5.0
>>>
>>>
>>> On Tue, Dec 1, 2015 at 7:05 PM, Josh Mahonin <jm...@gmail.com> wrote:
>>>
>>> > Hi Krishna,
>>> >
>>> > I've not tried it in Java at all, but as of Spark 1.4+ the DataFrame
>>> > API should be unified between Scala and Java, so the following may
>>> > work for you:
>>> >
>>> > DataFrame df = sqlContext.read()
>>> >     .format("org.apache.phoenix.spark")
>>> >     .option("table", "TABLE1")
>>> >     .option("zkUrl", "<phoenix-server:2181>")
>>> >     .load();
>>> >
>>> > Note that 'zkUrl' must be set to your Phoenix URL, and passing a 'conf'
>>> > parameter isn't supported. Please let us know back here if this works
>>> > out for you; I'd love to update the documentation and unit tests if it
>>> > works.
>>> >
>>> > Josh
>>> >
>>> > On Tue, Dec 1, 2015 at 6:30 PM, Krishna <re...@gmail.com> wrote:
>>> >
>>> >> Hi,
>>> >>
>>> >> Is there a working example for using the Spark plugin in Java?
>>> >> Specifically, what's the Java equivalent for creating a DataFrame as
>>> >> shown here in Scala:
>>> >>
>>> >> val df = sqlContext.phoenixTableAsDataFrame("TABLE1", Array("ID",
>>> >> "COL1"), conf = configuration)
>>> >>
>>> >>
>>> >
>>>
>>
>>
>

Re: spark plugin with java

Posted by Krishna <re...@gmail.com>.
Yes, I will create new tickets for any issues that I run into.
Another question: for now, I'm pursuing the option of creating a DataFrame
as shown in my previous email. How does Spark handle parallelization in
this case? Does it use Phoenix metadata on splits?


On Wed, Dec 2, 2015 at 11:02 AM, Josh Mahonin <jm...@gmail.com> wrote:

> Hi Krishna,
>
> That's great to hear. You're right, the plugin itself should be backwards
> compatible to Spark 1.3.1 and should work with any version of Phoenix past
> 4.4.0, though I can't guarantee that will be the case forever. As well, I
> don't know how much usage there is across the board of the Java API and
> DataFrames; you may in fact be the first. If you encounter any errors with
> it, could you please file a JIRA with any stack traces you see?
>
> Since Spark is a quickly changing project, it often updates internal
> functionality that we lag behind on supporting, so there's no direct
> mapping between specific Phoenix versions and specific Spark versions.
> Essentially, we add new support as fast as we get patches.
>
> My general recommendation is to stay back a major version on Spark if
> possible, but if you need to use the latest Spark releases, try to use the
> latest Phoenix release as well. The DataFrame support in Phoenix, for
> instance, has had many recent patches and improvements that older
> versions are missing.
>
> Thanks,
>
> Josh
>
> On Wed, Dec 2, 2015 at 1:40 PM, Krishna <re...@gmail.com> wrote:
>
>> Yes, that works for Spark 1.4.x. The website says Spark 1.3.1+ for the
>> Spark plugin; is that accurate?
>>
>> For Spark 1.3.1, I created a DataFrame as follows (could not use the
>> plugin):
>>         Map<String, String> options = new HashMap<String, String>();
>>         options.put("url", PhoenixRuntime.JDBC_PROTOCOL +
>>             PhoenixRuntime.JDBC_PROTOCOL_SEPARATOR + zkQuorum);
>>         options.put("dbtable", "TABLE_NAME");
>>
>>         SQLContext sqlContext = new SQLContext(sc);
>>         DataFrame jdbcDF = sqlContext.load("jdbc", options)
>>             .filter("COL_NAME > SOME_VALUE");
>>
>> Also, it isn't immediately obvious which version of Spark was used in
>> building the Phoenix artifacts available on Maven. Maybe it's worth
>> putting it on the website. Let me know if the mapping below is incorrect.
>>
>> Phoenix 4.4.x <--> Spark 1.4.0
>> Phoenix 4.5.x <--> Spark 1.5.0
>> Phoenix 4.6.x <--> Spark 1.5.0
>>
>>
>> On Tue, Dec 1, 2015 at 7:05 PM, Josh Mahonin <jm...@gmail.com> wrote:
>>
>> > Hi Krishna,
>> >
>> > I've not tried it in Java at all, but as of Spark 1.4+ the DataFrame
>> > API should be unified between Scala and Java, so the following may
>> > work for you:
>> >
>> > DataFrame df = sqlContext.read()
>> >     .format("org.apache.phoenix.spark")
>> >     .option("table", "TABLE1")
>> >     .option("zkUrl", "<phoenix-server:2181>")
>> >     .load();
>> >
>> > Note that 'zkUrl' must be set to your Phoenix URL, and passing a 'conf'
>> > parameter isn't supported. Please let us know back here if this works
>> > out for you; I'd love to update the documentation and unit tests if it
>> > works.
>> >
>> > Josh
>> >
>> > On Tue, Dec 1, 2015 at 6:30 PM, Krishna <re...@gmail.com> wrote:
>> >
>> >> Hi,
>> >>
>> >> Is there a working example for using the Spark plugin in Java?
>> >> Specifically, what's the Java equivalent for creating a DataFrame as
>> >> shown here in Scala:
>> >>
>> >> val df = sqlContext.phoenixTableAsDataFrame("TABLE1", Array("ID",
>> >> "COL1"), conf = configuration)
>> >>
>> >>
>> >
>>
>
>

Re: spark plugin with java

Posted by Josh Mahonin <jm...@gmail.com>.
Hi Krishna,

That's great to hear. You're right, the plugin itself should be backwards
compatible to Spark 1.3.1 and should work with any version of Phoenix past
4.4.0, though I can't guarantee that will be the case forever. As well, I
don't know how much usage there is across the board of the Java API and
DataFrames; you may in fact be the first. If you encounter any errors with
it, could you please file a JIRA with any stack traces you see?

Since Spark is a quickly changing project, it often updates internal
functionality that we lag behind on supporting, so there's no direct
mapping between specific Phoenix versions and specific Spark versions.
Essentially, we add new support as fast as we get patches.

My general recommendation is to stay back a major version on Spark if
possible, but if you need to use the latest Spark releases, try to use the
latest Phoenix release as well. The DataFrame support in Phoenix, for
instance, has had many recent patches and improvements that older versions
are missing.

Thanks,

Josh

On Wed, Dec 2, 2015 at 1:40 PM, Krishna <re...@gmail.com> wrote:

> Yes, that works for Spark 1.4.x. The website says Spark 1.3.1+ for the
> Spark plugin; is that accurate?
>
> For Spark 1.3.1, I created a DataFrame as follows (could not use the
> plugin):
>         Map<String, String> options = new HashMap<String, String>();
>         options.put("url", PhoenixRuntime.JDBC_PROTOCOL +
>             PhoenixRuntime.JDBC_PROTOCOL_SEPARATOR + zkQuorum);
>         options.put("dbtable", "TABLE_NAME");
>
>         SQLContext sqlContext = new SQLContext(sc);
>         DataFrame jdbcDF = sqlContext.load("jdbc", options)
>             .filter("COL_NAME > SOME_VALUE");
>
> Also, it isn't immediately obvious which version of Spark was used in
> building the Phoenix artifacts available on Maven. Maybe it's worth
> putting it on the website. Let me know if the mapping below is incorrect.
>
> Phoenix 4.4.x <--> Spark 1.4.0
> Phoenix 4.5.x <--> Spark 1.5.0
> Phoenix 4.6.x <--> Spark 1.5.0
>
>
> On Tue, Dec 1, 2015 at 7:05 PM, Josh Mahonin <jm...@gmail.com> wrote:
>
> > Hi Krishna,
> >
> > I've not tried it in Java at all, but as of Spark 1.4+ the DataFrame
> > API should be unified between Scala and Java, so the following may
> > work for you:
> >
> > DataFrame df = sqlContext.read()
> >     .format("org.apache.phoenix.spark")
> >     .option("table", "TABLE1")
> >     .option("zkUrl", "<phoenix-server:2181>")
> >     .load();
> >
> > Note that 'zkUrl' must be set to your Phoenix URL, and passing a 'conf'
> > parameter isn't supported. Please let us know back here if this works
> > out for you; I'd love to update the documentation and unit tests if it
> > works.
> >
> > Josh
> >
> > On Tue, Dec 1, 2015 at 6:30 PM, Krishna <re...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> Is there a working example for using the Spark plugin in Java?
> >> Specifically, what's the Java equivalent for creating a DataFrame as
> >> shown here in Scala:
> >>
> >> val df = sqlContext.phoenixTableAsDataFrame("TABLE1", Array("ID",
> >> "COL1"), conf = configuration)
> >>
> >>
> >
>

Re: spark plugin with java

Posted by Krishna <re...@gmail.com>.
Yes, that works for Spark 1.4.x. The website says Spark 1.3.1+ for the
Spark plugin; is that accurate?

For Spark 1.3.1, I created a DataFrame as follows (could not use the
plugin):
        Map<String, String> options = new HashMap<String, String>();
        options.put("url", PhoenixRuntime.JDBC_PROTOCOL +
            PhoenixRuntime.JDBC_PROTOCOL_SEPARATOR + zkQuorum);
        options.put("dbtable", "TABLE_NAME");

        SQLContext sqlContext = new SQLContext(sc);
        DataFrame jdbcDF = sqlContext.load("jdbc", options)
            .filter("COL_NAME > SOME_VALUE");
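
For anyone copying this, a self-contained version of the above, assuming a
JavaSparkContext named "sc" and a "zkQuorum" string (the table and column
names are placeholders):

import java.util.HashMap;
import java.util.Map;

import org.apache.phoenix.util.PhoenixRuntime;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

// Build the options for Spark's generic JDBC data source, pointing it at
// the Phoenix JDBC URL (jdbc:phoenix:<quorum>).
Map<String, String> options = new HashMap<String, String>();
options.put("url", PhoenixRuntime.JDBC_PROTOCOL
    + PhoenixRuntime.JDBC_PROTOCOL_SEPARATOR + zkQuorum);
options.put("dbtable", "TABLE_NAME");

// sqlContext.load(...) is the Spark 1.3-era API; Spark 1.4+ deprecates it
// in favor of sqlContext.read().format("jdbc").options(...).load().
SQLContext sqlContext = new SQLContext(sc);
DataFrame jdbcDF = sqlContext.load("jdbc", options)
    .filter("COL_NAME > SOME_VALUE");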

Also, it isn't immediately obvious which version of Spark was used in
building the Phoenix artifacts available on Maven. Maybe it's worth putting
it on the website. Let me know if the mapping below is incorrect.

Phoenix 4.4.x <--> Spark 1.4.0
Phoenix 4.5.x <--> Spark 1.5.0
Phoenix 4.6.x <--> Spark 1.5.0


On Tue, Dec 1, 2015 at 7:05 PM, Josh Mahonin <jm...@gmail.com> wrote:

> Hi Krishna,
>
> I've not tried it in Java at all, but as of Spark 1.4+ the DataFrame API
> should be unified between Scala and Java, so the following may work for you:
>
> DataFrame df = sqlContext.read()
>     .format("org.apache.phoenix.spark")
>     .option("table", "TABLE1")
>     .option("zkUrl", "<phoenix-server:2181>")
>     .load();
>
> Note that 'zkUrl' must be set to your Phoenix URL, and passing a 'conf'
> parameter isn't supported. Please let us know back here if this works out
> for you; I'd love to update the documentation and unit tests if it works.
>
> Josh
>
> On Tue, Dec 1, 2015 at 6:30 PM, Krishna <re...@gmail.com> wrote:
>
>> Hi,
>>
>> Is there a working example for using the Spark plugin in Java? Specifically,
>> what's the Java equivalent for creating a DataFrame as shown here in Scala:
>>
>> val df = sqlContext.phoenixTableAsDataFrame("TABLE1", Array("ID", "COL1"), conf = configuration)
>>
>>
>

Re: spark plugin with java

Posted by Josh Mahonin <jm...@gmail.com>.
Hi Krishna,

I've not tried it in Java at all, but as of Spark 1.4+ the DataFrame API
should be unified between Scala and Java, so the following may work for you:

DataFrame df = sqlContext.read()
    .format("org.apache.phoenix.spark")
    .option("table", "TABLE1")
    .option("zkUrl", "<phoenix-server:2181>")
    .load();

Note that 'zkUrl' must be set to your Phoenix URL, and passing a 'conf'
parameter isn't supported. Please let us know back here if this works out
for you; I'd love to update the documentation and unit tests if it works.
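
If it helps, here is a self-contained sketch of the surrounding boilerplate;
the app name, table, and quorum below are placeholder values:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

// Minimal Spark setup for the snippet above (Spark 1.4-era APIs).
SparkConf conf = new SparkConf().setAppName("phoenix-spark-java");
JavaSparkContext sc = new JavaSparkContext(conf);
SQLContext sqlContext = new SQLContext(sc);

DataFrame df = sqlContext.read()
    .format("org.apache.phoenix.spark")
    .option("table", "TABLE1")
    .option("zkUrl", "<phoenix-server:2181>")
    .load();

// Select the same columns as the Scala example and materialize a few rows.
df.select("ID", "COL1").show();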

Josh

On Tue, Dec 1, 2015 at 6:30 PM, Krishna <re...@gmail.com> wrote:

> Hi,
>
> Is there a working example for using the Spark plugin in Java? Specifically,
> what's the Java equivalent for creating a DataFrame as shown here in Scala:
>
> val df = sqlContext.phoenixTableAsDataFrame("TABLE1", Array("ID", "COL1"), conf = configuration)
>
>
