Posted to user@phoenix.apache.org by Saif Addin <sa...@gmail.com> on 2018/09/12 20:32:38 UTC

Missing content in phoenix after writing from Spark

Hi all,

We're trying to write tables with all string columns from Spark.
We are not using the Spark Connector; instead, we are directly writing byte
arrays from RDDs.

The process works fine: HBase receives the data correctly, and the content
is consistent.

However, when reading the table from Phoenix, we notice that the first
character of each string is missing. This sounds like a byte-encoding issue,
but we're at a loss. We're using PVarchar to generate the bytes.

Here's the snippet of code creating the RDD:

val tdd = pdd.flatMap(x => {
  val rowKey = PVarchar.INSTANCE.toBytes(x._1)
  for(i <- 0 until cols.length) yield {
    // other stuff for other columns ...
    // ...
    (rowKey, (column1, column2, column3))
  }
})

...

We then create the following output to be written to HBase:

val output = tdd.map(x => {
    val rowKeyByte: Array[Byte] = x._1
    val immutableRowKey = new ImmutableBytesWritable(rowKeyByte)

    val kv = new KeyValue(rowKeyByte,
      PVarchar.INSTANCE.toBytes(column1),
      PVarchar.INSTANCE.toBytes(column2),
      PVarchar.INSTANCE.toBytes(column3)
    )

    (immutableRowKey, kv)
})
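
For clarity (if I am reading the HBase API right), the four-argument KeyValue
constructor we are calling resolves to (row, family, qualifier, value), so
effectively we are building something like the sketch below, where the family
and qualifier are placeholders:

import org.apache.hadoop.hbase.KeyValue
import org.apache.hadoop.hbase.util.Bytes
import org.apache.phoenix.schema.types.PVarchar

// Sketch of the explicit (row, family, qualifier, value) form of the call above.
// "0" and "COL1" are placeholder family/qualifier names; rowKeyByte and column1
// come from the snippet above.
val kv = new KeyValue(
  rowKeyByte,                          // row key
  Bytes.toBytes("0"),                  // column family (placeholder)
  Bytes.toBytes("COL1"),               // column qualifier (placeholder)
  PVarchar.INSTANCE.toBytes(column1))  // cell value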

By the way, we are using *KryoSerializer* in order to be able to serialize
all the classes necessary for HBase (KeyValue, BytesWritable, etc.).
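
For reference, the serializer setup looks roughly like this (a minimal
sketch; the registered class list is trimmed to what this snippet touches):

import org.apache.spark.SparkConf
import org.apache.hadoop.hbase.KeyValue
import org.apache.hadoop.hbase.io.ImmutableBytesWritable

// Sketch: register the HBase classes that travel through Spark with Kryo.
val sparkConf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(
    classOf[ImmutableBytesWritable],
    classOf[KeyValue]))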

The key of this table is the one missing data when queried from Phoenix, so
we suspect something is wrong with the byte serialization.

Any ideas? Appreciated!
Saif

Re: Missing content in phoenix after writing from Spark

Posted by Saif Addin <sa...@gmail.com>.
Thanks. We've hit the next problem: we do need to append data, but the
current documentation says only Overwrite mode is supported, right? In that
case we'll have to fall back to writing through RDDs, correct?
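
If so, the fallback we have in mind is issuing plain JDBC UPSERTs from the
executors, roughly like the sketch below (the RDD, table, column names, and
ZooKeeper host are placeholders):

import java.sql.DriverManager

// Sketch: append through Phoenix JDBC UPSERTs instead of raw HBase writes.
// `rows` is a placeholder RDD[(String, String, String, String)];
// MY_TABLE, the column names, and the ZooKeeper host are placeholders too.
rows.foreachPartition { part =>
  val conn = DriverManager.getConnection("jdbc:phoenix:zookeeper-host:2181")
  val stmt = conn.prepareStatement(
    "UPSERT INTO MY_TABLE (ID, COL1, COL2, COL3) VALUES (?, ?, ?, ?)")
  try {
    part.foreach { case (id, c1, c2, c3) =>
      stmt.setString(1, id)
      stmt.setString(2, c1)
      stmt.setString(3, c2)
      stmt.setString(4, c3)
      stmt.executeUpdate()
    }
    conn.commit() // Phoenix buffers mutations until commit
  } finally {
    stmt.close()
    conn.close()
  }
}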

On Mon, Sep 17, 2018 at 8:45 PM Josh Elser <el...@apache.org> wrote:

> As I said earlier, the expectation is that you use the
> phoenix-client.jar and phoenix-spark2.jar for the phoenix-spark
> integration with spark2.
>
> You do not need to reference all of these jars by hand. We create the
> jars with all of the necessary dependencies bundled to specifically
> avoid creating this problem for users.
>
> On 9/17/18 3:27 PM, Saif Addin wrote:
> > Thanks for the patience, sorry maybe I sent incomplete information. We
> > are loading the following jars and still getting: */executor 1):
> > java.lang.NoClassDefFoundError: Could not initialize class
> > org.apache.phoenix.query.QueryServicesOptions/*
> > */
> > /*
> >
> http://central.maven.org/maven2/org/apache/hbase/hbase-client/2.1.0/hbase-client-2.1.0.jar
> >
> http://central.maven.org/maven2/org/apache/hbase/hbase-common/2.1.0/hbase-common-2.1.0.jar
> >
> http://central.maven.org/maven2/org/apache/hbase/hbase-hadoop-compat/2.1.0/hbase-hadoop-compat-2.1.0.jar
> >
> http://central.maven.org/maven2/org/apache/hbase/hbase-mapreduce/2.1.0/hbase-mapreduce-2.1.0.jar
> >
> http://central.maven.org/maven2/org/apache/hbase/thirdparty/hbase-shaded-miscellaneous/2.1.0/hbase-shaded-miscellaneous-2.1.0.jar
> >
> http://central.maven.org/maven2/org/apache/hbase/hbase-protocol/2.1.0/hbase-protocol-2.1.0.jar
> >
> http://central.maven.org/maven2/org/apache/hbase/hbase-protocol-shaded/2.1.0/hbase-protocol-shaded-2.1.0.jar
> >
> http://central.maven.org/maven2/org/apache/hbase/thirdparty/hbase-shaded-protobuf/2.1.0/hbase-shaded-protobuf-2.1.0.jar
> >
> http://central.maven.org/maven2/org/apache/hbase/thirdparty/hbase-shaded-netty/2.1.0/hbase-shaded-netty-2.1.0.jar
> >
> http://central.maven.org/maven2/org/apache/hbase/hbase-server/2.1.0/hbase-server-2.1.0.jar
> >
> http://central.maven.org/maven2/org/apache/hbase/hbase-hadoop2-compat/2.1.0/hbase-hadoop2-compat-2.1.0.jar
> >
> http://central.maven.org/maven2/org/apache/hbase/hbase-metrics/2.1.0/hbase-metrics-2.1.0.jar
> >
> http://central.maven.org/maven2/org/apache/hbase/hbase-metrics-api/2.1.0/hbase-metrics-api-2.1.0.jar
> >
> http://central.maven.org/maven2/org/apache/hbase/hbase-zookeeper/2.1.0/hbase-zookeeper-2.1.0.jar
> >
> >
> http://central.maven.org/maven2/org/apache/phoenix/phoenix-spark/5.0.0-HBase-2.0/phoenix-spark-5.0.0-HBase-2.0.jar
> >
> http://central.maven.org/maven2/org/apache/phoenix/phoenix-core/5.0.0-HBase-2.0/phoenix-core-5.0.0-HBase-2.0.jar
> >
> http://central.maven.org/maven2/org/apache/phoenix/phoenix-queryserver/5.0.0-HBase-2.0/phoenix-queryserver-5.0.0-HBase-2.0.jar
> >
> http://central.maven.org/maven2/org/apache/phoenix/phoenix-queryserver-client/5.0.0-HBase-2.0/phoenix-queryserver-client-5.0.0-HBase-2.0.jar
> >
> >
> http://central.maven.org/maven2/org/apache/twill/twill-zookeeper/0.13.0/twill-zookeeper-0.13.0.jar
> >
> http://central.maven.org/maven2/org/apache/twill/twill-discovery-core/0.13.0/twill-discovery-core-0.13.0.jar
> >
> > Not sure which one I could be missing??
> >
> > On Fri, Sep 14, 2018 at 7:34 PM Josh Elser <elserj@apache.org
> > <ma...@apache.org>> wrote:
> >
> >     Uh, you're definitely not using the right JARs :)
> >
> >     You'll want the phoenix-client.jar for the Phoenix JDBC driver and
> the
> >     phoenix-spark.jar for the Phoenix RDD.
> >
> >     On 9/14/18 1:08 PM, Saif Addin wrote:
> >      > Hi, I am attempting to make connection with Spark but no success
> >     so far.
> >      >
> >      > For writing into Phoenix, I am trying this:
> >      >
> >      > tdd.toDF("ID", "COL1", "COL2",
> >      > "COL3").write.format("org.apache.phoenix.spark").option("zkUrl",
> >      > "zookeper-host-url:2181").option("table",
> >      > htablename).mode("overwrite").save()
> >      >
> >      > But getting:
> >      > *java.sql.SQLException: ERROR 103 (08004): Unable to establish
> >     connection.*
> >      > *
> >      > *
> >      > For reading, on the other hand, attempting this:
> >      >
> >      > val hbConf = HBaseConfiguration.create()
> >      > val hbaseSitePath = "/etc/hbase/conf/hbase-site.xml"
> >      > hbConf.addResource(new Path(hbaseSitePath))
> >      >
> >      > spark.sqlContext.phoenixTableAsDataFrame("VISTA_409X68",
> >     Array("ID"),
> >      > conf = hbConf)
> >      >
> >      > Gets me
> >      > *java.lang.NoClassDefFoundError: Could not initialize class
> >      > org.apache.phoenix.query.QueryServicesOptions*
> >      > *
> >      > *
> >      > I have added phoenix-queryserver-5.0.0-HBase-2.0.jar and
> >      > phoenix-queryserver-client-5.0.0-HBase-2.0.jar
> >      > Any thoughts? I have an hbase-site.xml file with more
> >     configuration but
> >      > not sure how to get it to be read in the saving instance.
> >      > Thanks
> >      >
> >      > On Thu, Sep 13, 2018 at 11:38 AM Josh Elser <elserj@apache.org
> >     <ma...@apache.org>
> >      > <mailto:elserj@apache.org <ma...@apache.org>>> wrote:
> >      >
> >      >     Pretty sure we ran tests with Spark 2.3 with Phoenix 5.0. Not
> >     sure if
> >      >     Spark has already moved beyond that.
> >      >
> >      >     On 9/12/18 11:00 PM, Saif Addin wrote:
> >      >      > Thanks, we'll try Spark Connector then. Thought it didn't
> >     support
> >      >     newest
> >      >      > Spark Versions
> >      >      >
> >      >      > On Wed, Sep 12, 2018 at 11:03 PM Jaanai Zhang
> >      >     <cloud.poster@gmail.com <ma...@gmail.com>
> >     <mailto:cloud.poster@gmail.com <ma...@gmail.com>>
> >      >      > <mailto:cloud.poster@gmail.com
> >     <ma...@gmail.com> <mailto:cloud.poster@gmail.com
> >     <ma...@gmail.com>>>>
> >      >     wrote:
> >      >      >
> >      >      >     It seems columns data missing mapping information of
> the
> >      >     schema. if
> >      >      >     you want to use this way to write HBase table,  you
> >     can create an
> >      >      >     HBase table and uses Phoenix mapping it.
> >      >      >
> >      >      >     ----------------------------------------
> >      >      >         Jaanai Zhang
> >      >      >         Best regards!
> >      >      >
> >      >      >
> >      >      >
> >      >      >     Thomas D'Silva <tdsilva@salesforce.com
> >     <ma...@salesforce.com>
> >      >     <mailto:tdsilva@salesforce.com <mailto:tdsilva@salesforce.com
> >>
> >      >      >     <mailto:tdsilva@salesforce.com
> >     <ma...@salesforce.com>
> >      >     <mailto:tdsilva@salesforce.com
> >     <ma...@salesforce.com>>>> 于2018年9月13日周四 上午6:03写道:
> >      >      >
> >      >      >         Is there a reason you didn't use the
> >     spark-connector to
> >      >      >         serialize your data?
> >      >      >
> >      >      >         On Wed, Sep 12, 2018 at 2:28 PM, Saif Addin
> >      >     <saif1988@gmail.com <ma...@gmail.com>
> >     <mailto:saif1988@gmail.com <ma...@gmail.com>>
> >      >      >         <mailto:saif1988@gmail.com
> >     <ma...@gmail.com> <mailto:saif1988@gmail.com
> >     <ma...@gmail.com>>>>
> >      >     wrote:
> >      >      >
> >      >      >             Thank you Josh! That was helpful. Indeed,
> >     there was a
> >      >     salt
> >      >      >             bucket on the table, and the key-column now
> shows
> >      >     correctly.
> >      >      >
> >      >      >             However, the problem still persists in that
> >     the rest
> >      >     of the
> >      >      >             columns show as completely empty on Phoenix
> >     (appear
> >      >      >             correctly on Hbase). We'll be looking into
> >     this but
> >      >     if you
> >      >      >             have any further advice, appreciated.
> >      >      >
> >      >      >             Saif
> >      >      >
> >      >      >             On Wed, Sep 12, 2018 at 5:50 PM Josh Elser
> >      >      >             <elserj@apache.org <ma...@apache.org>
> >     <mailto:elserj@apache.org <ma...@apache.org>>
> >      >     <mailto:elserj@apache.org <ma...@apache.org>
> >     <mailto:elserj@apache.org <ma...@apache.org>>>> wrote:
> >      >      >
> >      >      >                 Reminder: Using Phoenix internals forces
> >     you to
> >      >      >                 understand exactly how
> >      >      >                 the version of Phoenix that you're using
> >     serializes
> >      >      >                 data. Is there a
> >      >      >                 reason you're not using SQL to interact
> >     with Phoenix?
> >      >      >
> >      >      >                 Sounds to me that Phoenix is expecting
> >     more data
> >      >     at the
> >      >      >                 head of your
> >      >      >                 rowkey. Maybe a salt bucket that you've
> >     defined
> >      >     on the
> >      >      >                 table but not
> >      >      >                 created?
> >      >      >
> >      >      >                 On 9/12/18 4:32 PM, Saif Addin wrote:
> >      >      >                  > Hi all,
> >      >      >                  >
> >      >      >                  > We're trying to write tables with all
> >     string
> >      >     columns
> >      >      >                 from spark.
> >      >      >                  > We are not using the Spark Connector,
> >     instead
> >      >     we are
> >      >      >                 directly writing
> >      >      >                  > byte arrays from RDDs.
> >      >      >                  >
> >      >      >                  > The process works fine, and Hbase
> >     receives the
> >      >     data
> >      >      >                 correctly, and
> >      >      >                  > content is consistent.
> >      >      >                  >
> >      >      >                  > However reading the table from Phoenix,
> we
> >      >     notice the
> >      >      >                 first character of
> >      >      >                  > strings are missing. This sounds like
> >     it's a byte
> >      >      >                 encoding issue, but
> >      >      >                  > we're at loss. We're using PVarchar to
> >      >     generate bytes.
> >      >      >                  >
> >      >      >                  > Here's the snippet of code creating the
> >     RDD:
> >      >      >                  >
> >      >      >                  > val tdd = pdd.flatMap(x => {
> >      >      >                  >    val rowKey =
> >     PVarchar.INSTANCE.toBytes(x._1)
> >      >      >                  >    for(i <- 0 until cols.length) yield {
> >      >      >                  >      other stuff for other columns ...
> >      >      >                  >      ...
> >      >      >                  >      (rowKey, (column1, column2,
> column3))
> >      >      >                  >    }
> >      >      >                  > })
> >      >      >                  >
> >      >      >                  > ...
> >      >      >                  >
> >      >      >                  > We then create the following output to
> >     be written
> >      >      >                 down in Hbase
> >      >      >                  >
> >      >      >                  > val output = tdd.map(x => {
> >      >      >                  >      val rowKeyByte: Array[Byte] = x._1
> >      >      >                  >      val immutableRowKey = new
> >      >      >                 ImmutableBytesWritable(rowKeyByte)
> >      >      >                  >
> >      >      >                  >      val kv = new KeyValue(rowKeyByte,
> >      >      >                  >
> >     PVarchar.INSTANCE.toBytes(column1),
> >      >      >                  >
> >     PVarchar.INSTANCE.toBytes(column2),
> >      >      >                  >
> PVarchar.INSTANCE.toBytes(column3)
> >      >      >                  >      )
> >      >      >                  >      (immutableRowKey, kv)
> >      >      >                  > })
> >      >      >                  >
> >      >      >                  > By the way, we are using
> >     *KryoSerializer* in
> >      >     order to
> >      >      >                 be able to
> >      >      >                  > serialize all classes necessary for
> Hbase
> >      >     (KeyValue,
> >      >      >                 BytesWritable, etc).
> >      >      >                  >
> >      >      >                  > The key of this table is the one
> >     missing data when
> >      >      >                 queried from Phoenix.
> >      >      >                  > So we guess something is wrong with the
> >     byte ser.
> >      >      >                  >
> >      >      >                  > Any ideas? Appreciated!
> >      >      >                  > Saif
> >      >      >
> >      >      >
> >      >
> >
>

Re: Missing content in phoenix after writing from Spark

Posted by Josh Elser <el...@apache.org>.
As I said earlier, the expectation is that you use the 
phoenix-client.jar and phoenix-spark2.jar for the phoenix-spark 
integration with spark2.

You do not need to reference all of these jars by hand. We create the 
jars with all of the necessary dependencies bundled to specifically 
avoid creating this problem for users.
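
For example, something along these lines in spark-defaults.conf should be
enough (the paths below are just placeholders for wherever your Phoenix
install lives):

# spark-defaults.conf (sketch)
spark.driver.extraClassPath    /path/to/phoenix-client.jar:/path/to/phoenix-spark2.jar
spark.executor.extraClassPath  /path/to/phoenix-client.jar:/path/to/phoenix-spark2.jar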

On 9/17/18 3:27 PM, Saif Addin wrote:
> Thanks for the patience, sorry maybe I sent incomplete information. We 
> are loading the following jars and still getting: */executor 1): 
> java.lang.NoClassDefFoundError: Could not initialize class 
> org.apache.phoenix.query.QueryServicesOptions/*
> */
> /*
> http://central.maven.org/maven2/org/apache/hbase/hbase-client/2.1.0/hbase-client-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/hbase-common/2.1.0/hbase-common-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/hbase-hadoop-compat/2.1.0/hbase-hadoop-compat-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/hbase-mapreduce/2.1.0/hbase-mapreduce-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/thirdparty/hbase-shaded-miscellaneous/2.1.0/hbase-shaded-miscellaneous-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/hbase-protocol/2.1.0/hbase-protocol-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/hbase-protocol-shaded/2.1.0/hbase-protocol-shaded-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/thirdparty/hbase-shaded-protobuf/2.1.0/hbase-shaded-protobuf-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/thirdparty/hbase-shaded-netty/2.1.0/hbase-shaded-netty-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/hbase-server/2.1.0/hbase-server-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/hbase-hadoop2-compat/2.1.0/hbase-hadoop2-compat-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/hbase-metrics/2.1.0/hbase-metrics-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/hbase-metrics-api/2.1.0/hbase-metrics-api-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/hbase-zookeeper/2.1.0/hbase-zookeeper-2.1.0.jar
> 
> http://central.maven.org/maven2/org/apache/phoenix/phoenix-spark/5.0.0-HBase-2.0/phoenix-spark-5.0.0-HBase-2.0.jar
> http://central.maven.org/maven2/org/apache/phoenix/phoenix-core/5.0.0-HBase-2.0/phoenix-core-5.0.0-HBase-2.0.jar
> http://central.maven.org/maven2/org/apache/phoenix/phoenix-queryserver/5.0.0-HBase-2.0/phoenix-queryserver-5.0.0-HBase-2.0.jar
> http://central.maven.org/maven2/org/apache/phoenix/phoenix-queryserver-client/5.0.0-HBase-2.0/phoenix-queryserver-client-5.0.0-HBase-2.0.jar
> 
> http://central.maven.org/maven2/org/apache/twill/twill-zookeeper/0.13.0/twill-zookeeper-0.13.0.jar
> http://central.maven.org/maven2/org/apache/twill/twill-discovery-core/0.13.0/twill-discovery-core-0.13.0.jar
> 
> Not sure which one I could be missing??
> 
> On Fri, Sep 14, 2018 at 7:34 PM Josh Elser <elserj@apache.org 
> <ma...@apache.org>> wrote:
> 
>     Uh, you're definitely not using the right JARs :)
> 
>     You'll want the phoenix-client.jar for the Phoenix JDBC driver and the
>     phoenix-spark.jar for the Phoenix RDD.
> 
>     On 9/14/18 1:08 PM, Saif Addin wrote:
>      > Hi, I am attempting to make connection with Spark but no success
>     so far.
>      >
>      > For writing into Phoenix, I am trying this:
>      >
>      > tdd.toDF("ID", "COL1", "COL2",
>      > "COL3").write.format("org.apache.phoenix.spark").option("zkUrl",
>      > "zookeper-host-url:2181").option("table",
>      > htablename).mode("overwrite").save()
>      >
>      > But getting:
>      > *java.sql.SQLException: ERROR 103 (08004): Unable to establish
>     connection.*
>      > *
>      > *
>      > For reading, on the other hand, attempting this:
>      >
>      > val hbConf = HBaseConfiguration.create()
>      > val hbaseSitePath = "/etc/hbase/conf/hbase-site.xml"
>      > hbConf.addResource(new Path(hbaseSitePath))
>      >
>      > spark.sqlContext.phoenixTableAsDataFrame("VISTA_409X68",
>     Array("ID"),
>      > conf = hbConf)
>      >
>      > Gets me
>      > *java.lang.NoClassDefFoundError: Could not initialize class
>      > org.apache.phoenix.query.QueryServicesOptions*
>      > *
>      > *
>      > I have added phoenix-queryserver-5.0.0-HBase-2.0.jar and
>      > phoenix-queryserver-client-5.0.0-HBase-2.0.jar
>      > Any thoughts? I have an hbase-site.xml file with more
>     configuration but
>      > not sure how to get it to be read in the saving instance.
>      > Thanks
>      >
>      > On Thu, Sep 13, 2018 at 11:38 AM Josh Elser <elserj@apache.org
>     <ma...@apache.org>
>      > <mailto:elserj@apache.org <ma...@apache.org>>> wrote:
>      >
>      >     Pretty sure we ran tests with Spark 2.3 with Phoenix 5.0. Not
>     sure if
>      >     Spark has already moved beyond that.
>      >
>      >     On 9/12/18 11:00 PM, Saif Addin wrote:
>      >      > Thanks, we'll try Spark Connector then. Thought it didn't
>     support
>      >     newest
>      >      > Spark Versions
>      >      >
>      >      > On Wed, Sep 12, 2018 at 11:03 PM Jaanai Zhang
>      >     <cloud.poster@gmail.com <ma...@gmail.com>
>     <mailto:cloud.poster@gmail.com <ma...@gmail.com>>
>      >      > <mailto:cloud.poster@gmail.com
>     <ma...@gmail.com> <mailto:cloud.poster@gmail.com
>     <ma...@gmail.com>>>>
>      >     wrote:
>      >      >
>      >      >     It seems columns data missing mapping information of the
>      >     schema. if
>      >      >     you want to use this way to write HBase table,  you
>     can create an
>      >      >     HBase table and uses Phoenix mapping it.
>      >      >
>      >      >     ----------------------------------------
>      >      >         Jaanai Zhang
>      >      >         Best regards!
>      >      >
>      >      >
>      >      >
>      >      >     Thomas D'Silva <tdsilva@salesforce.com
>     <ma...@salesforce.com>
>      >     <mailto:tdsilva@salesforce.com <ma...@salesforce.com>>
>      >      >     <mailto:tdsilva@salesforce.com
>     <ma...@salesforce.com>
>      >     <mailto:tdsilva@salesforce.com
>     <ma...@salesforce.com>>>> 于2018年9月13日周四 上午6:03写道:
>      >      >
>      >      >         Is there a reason you didn't use the
>     spark-connector to
>      >      >         serialize your data?
>      >      >
>      >      >         On Wed, Sep 12, 2018 at 2:28 PM, Saif Addin
>      >     <saif1988@gmail.com <ma...@gmail.com>
>     <mailto:saif1988@gmail.com <ma...@gmail.com>>
>      >      >         <mailto:saif1988@gmail.com
>     <ma...@gmail.com> <mailto:saif1988@gmail.com
>     <ma...@gmail.com>>>>
>      >     wrote:
>      >      >
>      >      >             Thank you Josh! That was helpful. Indeed,
>     there was a
>      >     salt
>      >      >             bucket on the table, and the key-column now shows
>      >     correctly.
>      >      >
>      >      >             However, the problem still persists in that
>     the rest
>      >     of the
>      >      >             columns show as completely empty on Phoenix
>     (appear
>      >      >             correctly on Hbase). We'll be looking into
>     this but
>      >     if you
>      >      >             have any further advice, appreciated.
>      >      >
>      >      >             Saif
>      >      >
>      >      >             On Wed, Sep 12, 2018 at 5:50 PM Josh Elser
>      >      >             <elserj@apache.org <ma...@apache.org>
>     <mailto:elserj@apache.org <ma...@apache.org>>
>      >     <mailto:elserj@apache.org <ma...@apache.org>
>     <mailto:elserj@apache.org <ma...@apache.org>>>> wrote:
>      >      >
>      >      >                 Reminder: Using Phoenix internals forces
>     you to
>      >      >                 understand exactly how
>      >      >                 the version of Phoenix that you're using
>     serializes
>      >      >                 data. Is there a
>      >      >                 reason you're not using SQL to interact
>     with Phoenix?
>      >      >
>      >      >                 Sounds to me that Phoenix is expecting
>     more data
>      >     at the
>      >      >                 head of your
>      >      >                 rowkey. Maybe a salt bucket that you've
>     defined
>      >     on the
>      >      >                 table but not
>      >      >                 created?
>      >      >
>      >      >                 On 9/12/18 4:32 PM, Saif Addin wrote:
>      >      >                  > Hi all,
>      >      >                  >
>      >      >                  > We're trying to write tables with all
>     string
>      >     columns
>      >      >                 from spark.
>      >      >                  > We are not using the Spark Connector,
>     instead
>      >     we are
>      >      >                 directly writing
>      >      >                  > byte arrays from RDDs.
>      >      >                  >
>      >      >                  > The process works fine, and Hbase
>     receives the
>      >     data
>      >      >                 correctly, and
>      >      >                  > content is consistent.
>      >      >                  >
>      >      >                  > However reading the table from Phoenix, we
>      >     notice the
>      >      >                 first character of
>      >      >                  > strings are missing. This sounds like
>     it's a byte
>      >      >                 encoding issue, but
>      >      >                  > we're at loss. We're using PVarchar to
>      >     generate bytes.
>      >      >                  >
>      >      >                  > Here's the snippet of code creating the
>     RDD:
>      >      >                  >
>      >      >                  > val tdd = pdd.flatMap(x => {
>      >      >                  >    val rowKey =
>     PVarchar.INSTANCE.toBytes(x._1)
>      >      >                  >    for(i <- 0 until cols.length) yield {
>      >      >                  >      other stuff for other columns ...
>      >      >                  >      ...
>      >      >                  >      (rowKey, (column1, column2, column3))
>      >      >                  >    }
>      >      >                  > })
>      >      >                  >
>      >      >                  > ...
>      >      >                  >
>      >      >                  > We then create the following output to
>     be written
>      >      >                 down in Hbase
>      >      >                  >
>      >      >                  > val output = tdd.map(x => {
>      >      >                  >      val rowKeyByte: Array[Byte] = x._1
>      >      >                  >      val immutableRowKey = new
>      >      >                 ImmutableBytesWritable(rowKeyByte)
>      >      >                  >
>      >      >                  >      val kv = new KeyValue(rowKeyByte,
>      >      >                  >         
>     PVarchar.INSTANCE.toBytes(column1),
>      >      >                  >         
>     PVarchar.INSTANCE.toBytes(column2),
>      >      >                  >        PVarchar.INSTANCE.toBytes(column3)
>      >      >                  >      )
>      >      >                  >      (immutableRowKey, kv)
>      >      >                  > })
>      >      >                  >
>      >      >                  > By the way, we are using
>     *KryoSerializer* in
>      >     order to
>      >      >                 be able to
>      >      >                  > serialize all classes necessary for Hbase
>      >     (KeyValue,
>      >      >                 BytesWritable, etc).
>      >      >                  >
>      >      >                  > The key of this table is the one
>     missing data when
>      >      >                 queried from Phoenix.
>      >      >                  > So we guess something is wrong with the
>     byte ser.
>      >      >                  >
>      >      >                  > Any ideas? Appreciated!
>      >      >                  > Saif
>      >      >
>      >      >
>      >
> 

Re: Missing content in phoenix after writing from Spark

Posted by Saif Addin <sa...@gmail.com>.
Thanks for the patience; sorry, maybe I sent incomplete information. We are
loading the following jars and still getting: *executor 1):
java.lang.NoClassDefFoundError: Could not initialize class
org.apache.phoenix.query.QueryServicesOptions*

http://central.maven.org/maven2/org/apache/hbase/hbase-client/2.1.0/hbase-client-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/hbase-common/2.1.0/hbase-common-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/hbase-hadoop-compat/2.1.0/hbase-hadoop-compat-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/hbase-mapreduce/2.1.0/hbase-mapreduce-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/thirdparty/hbase-shaded-miscellaneous/2.1.0/hbase-shaded-miscellaneous-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/hbase-protocol/2.1.0/hbase-protocol-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/hbase-protocol-shaded/2.1.0/hbase-protocol-shaded-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/thirdparty/hbase-shaded-protobuf/2.1.0/hbase-shaded-protobuf-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/thirdparty/hbase-shaded-netty/2.1.0/hbase-shaded-netty-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/hbase-server/2.1.0/hbase-server-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/hbase-hadoop2-compat/2.1.0/hbase-hadoop2-compat-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/hbase-metrics/2.1.0/hbase-metrics-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/hbase-metrics-api/2.1.0/hbase-metrics-api-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/hbase-zookeeper/2.1.0/hbase-zookeeper-2.1.0.jar

http://central.maven.org/maven2/org/apache/phoenix/phoenix-spark/5.0.0-HBase-2.0/phoenix-spark-5.0.0-HBase-2.0.jar
http://central.maven.org/maven2/org/apache/phoenix/phoenix-core/5.0.0-HBase-2.0/phoenix-core-5.0.0-HBase-2.0.jar
http://central.maven.org/maven2/org/apache/phoenix/phoenix-queryserver/5.0.0-HBase-2.0/phoenix-queryserver-5.0.0-HBase-2.0.jar
http://central.maven.org/maven2/org/apache/phoenix/phoenix-queryserver-client/5.0.0-HBase-2.0/phoenix-queryserver-client-5.0.0-HBase-2.0.jar

http://central.maven.org/maven2/org/apache/twill/twill-zookeeper/0.13.0/twill-zookeeper-0.13.0.jar
http://central.maven.org/maven2/org/apache/twill/twill-discovery-core/0.13.0/twill-discovery-core-0.13.0.jar

Not sure which one I could be missing?

On Fri, Sep 14, 2018 at 7:34 PM Josh Elser <el...@apache.org> wrote:

> Uh, you're definitely not using the right JARs :)
>
> You'll want the phoenix-client.jar for the Phoenix JDBC driver and the
> phoenix-spark.jar for the Phoenix RDD.
>
> On 9/14/18 1:08 PM, Saif Addin wrote:
> > Hi, I am attempting to make connection with Spark but no success so far.
> >
> > For writing into Phoenix, I am trying this:
> >
> > tdd.toDF("ID", "COL1", "COL2",
> > "COL3").write.format("org.apache.phoenix.spark").option("zkUrl",
> > "zookeper-host-url:2181").option("table",
> > htablename).mode("overwrite").save()
> >
> > But getting:
> > *java.sql.SQLException: ERROR 103 (08004): Unable to establish
> connection.*
> > *
> > *
> > For reading, on the other hand, attempting this:
> >
> > val hbConf = HBaseConfiguration.create()
> > val hbaseSitePath = "/etc/hbase/conf/hbase-site.xml"
> > hbConf.addResource(new Path(hbaseSitePath))
> >
> > spark.sqlContext.phoenixTableAsDataFrame("VISTA_409X68", Array("ID"),
> > conf = hbConf)
> >
> > Gets me
> > *java.lang.NoClassDefFoundError: Could not initialize class
> > org.apache.phoenix.query.QueryServicesOptions*
> > *
> > *
> > I have added phoenix-queryserver-5.0.0-HBase-2.0.jar and
> > phoenix-queryserver-client-5.0.0-HBase-2.0.jar
> > Any thoughts? I have an hbase-site.xml file with more configuration but
> > not sure how to get it to be read in the saving instance.
> > Thanks
> >
> > On Thu, Sep 13, 2018 at 11:38 AM Josh Elser <elserj@apache.org
> > <ma...@apache.org>> wrote:
> >
> >     Pretty sure we ran tests with Spark 2.3 with Phoenix 5.0. Not sure if
> >     Spark has already moved beyond that.
> >
> >     On 9/12/18 11:00 PM, Saif Addin wrote:
> >      > Thanks, we'll try Spark Connector then. Thought it didn't support
> >     newest
> >      > Spark Versions
> >      >
> >      > On Wed, Sep 12, 2018 at 11:03 PM Jaanai Zhang
> >     <cloud.poster@gmail.com <ma...@gmail.com>
> >      > <mailto:cloud.poster@gmail.com <ma...@gmail.com>>>
> >     wrote:
> >      >
> >      >     It seems columns data missing mapping information of the
> >     schema. if
> >      >     you want to use this way to write HBase table,  you can
> create an
> >      >     HBase table and uses Phoenix mapping it.
> >      >
> >      >     ----------------------------------------
> >      >         Jaanai Zhang
> >      >         Best regards!
> >      >
> >      >
> >      >
> >      >     Thomas D'Silva <tdsilva@salesforce.com
> >     <ma...@salesforce.com>
> >      >     <mailto:tdsilva@salesforce.com
> >     <ma...@salesforce.com>>> 于2018年9月13日周四 上午6:03写道:
> >      >
> >      >         Is there a reason you didn't use the spark-connector to
> >      >         serialize your data?
> >      >
> >      >         On Wed, Sep 12, 2018 at 2:28 PM, Saif Addin
> >     <saif1988@gmail.com <ma...@gmail.com>
> >      >         <mailto:saif1988@gmail.com <ma...@gmail.com>>>
> >     wrote:
> >      >
> >      >             Thank you Josh! That was helpful. Indeed, there was a
> >     salt
> >      >             bucket on the table, and the key-column now shows
> >     correctly.
> >      >
> >      >             However, the problem still persists in that the rest
> >     of the
> >      >             columns show as completely empty on Phoenix (appear
> >      >             correctly on Hbase). We'll be looking into this but
> >     if you
> >      >             have any further advice, appreciated.
> >      >
> >      >             Saif
> >      >
> >      >             On Wed, Sep 12, 2018 at 5:50 PM Josh Elser
> >      >             <elserj@apache.org <ma...@apache.org>
> >     <mailto:elserj@apache.org <ma...@apache.org>>> wrote:
> >      >
> >      >                 Reminder: Using Phoenix internals forces you to
> >      >                 understand exactly how
> >      >                 the version of Phoenix that you're using
> serializes
> >      >                 data. Is there a
> >      >                 reason you're not using SQL to interact with
> Phoenix?
> >      >
> >      >                 Sounds to me that Phoenix is expecting more data
> >     at the
> >      >                 head of your
> >      >                 rowkey. Maybe a salt bucket that you've defined
> >     on the
> >      >                 table but not
> >      >                 created?
> >      >
> >      >                 On 9/12/18 4:32 PM, Saif Addin wrote:
> >      >                  > Hi all,
> >      >                  >
> >      >                  > We're trying to write tables with all string
> >     columns
> >      >                 from spark.
> >      >                  > We are not using the Spark Connector, instead
> >     we are
> >      >                 directly writing
> >      >                  > byte arrays from RDDs.
> >      >                  >
> >      >                  > The process works fine, and Hbase receives the
> >     data
> >      >                 correctly, and
> >      >                  > content is consistent.
> >      >                  >
> >      >                  > However reading the table from Phoenix, we
> >     notice the
> >      >                 first character of
> >      >                  > strings are missing. This sounds like it's a
> byte
> >      >                 encoding issue, but
> >      >                  > we're at loss. We're using PVarchar to
> >     generate bytes.
> >      >                  >
> >      >                  > Here's the snippet of code creating the RDD:
> >      >                  >
> >      >                  > val tdd = pdd.flatMap(x => {
> >      >                  >    val rowKey = PVarchar.INSTANCE.toBytes(x._1)
> >      >                  >    for(i <- 0 until cols.length) yield {
> >      >                  >      other stuff for other columns ...
> >      >                  >      ...
> >      >                  >      (rowKey, (column1, column2, column3))
> >      >                  >    }
> >      >                  > })
> >      >                  >
> >      >                  > ...
> >      >                  >
> >      >                  > We then create the following output to be
> written
> >      >                 down in Hbase
> >      >                  >
> >      >                  > val output = tdd.map(x => {
> >      >                  >      val rowKeyByte: Array[Byte] = x._1
> >      >                  >      val immutableRowKey = new
> >      >                 ImmutableBytesWritable(rowKeyByte)
> >      >                  >
> >      >                  >      val kv = new KeyValue(rowKeyByte,
> >      >                  >          PVarchar.INSTANCE.toBytes(column1),
> >      >                  >          PVarchar.INSTANCE.toBytes(column2),
> >      >                  >        PVarchar.INSTANCE.toBytes(column3)
> >      >                  >      )
> >      >                  >      (immutableRowKey, kv)
> >      >                  > })
> >      >                  >
> >      >                  > By the way, we are using *KryoSerializer* in
> >     order to
> >      >                 be able to
> >      >                  > serialize all classes necessary for Hbase
> >     (KeyValue,
> >      >                 BytesWritable, etc).
> >      >                  >
> >      >                  > The key of this table is the one missing data
> when
> >      >                 queried from Phoenix.
> >      >                  > So we guess something is wrong with the byte
> ser.
> >      >                  >
> >      >                  > Any ideas? Appreciated!
> >      >                  > Saif
> >      >
> >      >
> >
>

Re: Missing content in phoenix after writing from Spark

Posted by Josh Elser <el...@apache.org>.
Please retain the mailing list in your replies.

On 9/17/18 2:32 PM, Saif Addin wrote:
> Thanks for the patience, sorry I sent incomplete information. We are 
> loading the following jars and still getting: */executor 1): 
> java.lang.NoClassDefFoundError: Could not initialize class 
> org.apache.phoenix.query.QueryServicesOptions/*
> */
> /*
> http://central.maven.org/maven2/org/apache/hbase/hbase-client/2.1.0/hbase-client-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/hbase-common/2.1.0/hbase-common-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/hbase-hadoop-compat/2.1.0/hbase-hadoop-compat-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/hbase-mapreduce/2.1.0/hbase-mapreduce-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/thirdparty/hbase-shaded-miscellaneous/2.1.0/hbase-shaded-miscellaneous-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/hbase-protocol/2.1.0/hbase-protocol-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/hbase-protocol-shaded/2.1.0/hbase-protocol-shaded-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/thirdparty/hbase-shaded-protobuf/2.1.0/hbase-shaded-protobuf-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/thirdparty/hbase-shaded-netty/2.1.0/hbase-shaded-netty-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/hbase-server/2.1.0/hbase-server-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/hbase-hadoop2-compat/2.1.0/hbase-hadoop2-compat-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/hbase-metrics/2.1.0/hbase-metrics-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/hbase-metrics-api/2.1.0/hbase-metrics-api-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/hbase-zookeeper/2.1.0/hbase-zookeeper-2.1.0.jar
> 
> http://central.maven.org/maven2/org/apache/phoenix/phoenix-spark/5.0.0-HBase-2.0/phoenix-spark-5.0.0-HBase-2.0.jar
> http://central.maven.org/maven2/org/apache/phoenix/phoenix-core/5.0.0-HBase-2.0/phoenix-core-5.0.0-HBase-2.0.jar
> http://central.maven.org/maven2/org/apache/phoenix/phoenix-queryserver/5.0.0-HBase-2.0/phoenix-queryserver-5.0.0-HBase-2.0.jar
> http://central.maven.org/maven2/org/apache/phoenix/phoenix-queryserver-client/5.0.0-HBase-2.0/phoenix-queryserver-client-5.0.0-HBase-2.0.jar
> 
> http://central.maven.org/maven2/org/apache/twill/twill-zookeeper/0.13.0/twill-zookeeper-0.13.0.jar
> http://central.maven.org/maven2/org/apache/twill/twill-discovery-core/0.13.0/twill-discovery-core-0.13.0.jar
> 
> Not sure which one I could be missing
> 
> On Fri, Sep 14, 2018 at 7:34 PM Josh Elser <elserj@apache.org 
> <ma...@apache.org>> wrote:
> 
>     Uh, you're definitely not using the right JARs :)
> 
>     You'll want the phoenix-client.jar for the Phoenix JDBC driver and the
>     phoenix-spark.jar for the Phoenix RDD.
> 
>     On 9/14/18 1:08 PM, Saif Addin wrote:
>      > Hi, I am attempting to make connection with Spark but no success
>     so far.
>      >
>      > For writing into Phoenix, I am trying this:
>      >
>      > tdd.toDF("ID", "COL1", "COL2",
>      > "COL3").write.format("org.apache.phoenix.spark").option("zkUrl",
>      > "zookeper-host-url:2181").option("table",
>      > htablename).mode("overwrite").save()
>      >
>      > But getting:
>      > *java.sql.SQLException: ERROR 103 (08004): Unable to establish
>     connection.*
>      > *
>      > *
>      > For reading, on the other hand, attempting this:
>      >
>      > val hbConf = HBaseConfiguration.create()
>      > val hbaseSitePath = "/etc/hbase/conf/hbase-site.xml"
>      > hbConf.addResource(new Path(hbaseSitePath))
>      >
>      > spark.sqlContext.phoenixTableAsDataFrame("VISTA_409X68",
>     Array("ID"),
>      > conf = hbConf)
>      >
>      > Gets me
>      > *java.lang.NoClassDefFoundError: Could not initialize class
>      > org.apache.phoenix.query.QueryServicesOptions*
>      > *
>      > *
>      > I have added phoenix-queryserver-5.0.0-HBase-2.0.jar and
>      > phoenix-queryserver-client-5.0.0-HBase-2.0.jar
>      > Any thoughts? I have an hbase-site.xml file with more
>     configuration but
>      > not sure how to get it to be read in the saving instance.
>      > Thanks
>      >
>      > On Thu, Sep 13, 2018 at 11:38 AM Josh Elser <elserj@apache.org
>     <ma...@apache.org>
>      > <mailto:elserj@apache.org <ma...@apache.org>>> wrote:
>      >
>      >     Pretty sure we ran tests with Spark 2.3 with Phoenix 5.0. Not
>     sure if
>      >     Spark has already moved beyond that.
>      >
>      >     On 9/12/18 11:00 PM, Saif Addin wrote:
>      >      > Thanks, we'll try Spark Connector then. Thought it didn't
>     support
>      >     newest
>      >      > Spark Versions
>      >      >
>      >      > On Wed, Sep 12, 2018 at 11:03 PM Jaanai Zhang
>      >     <cloud.poster@gmail.com <ma...@gmail.com>
>     <mailto:cloud.poster@gmail.com <ma...@gmail.com>>
>      >      > <mailto:cloud.poster@gmail.com
>     <ma...@gmail.com> <mailto:cloud.poster@gmail.com
>     <ma...@gmail.com>>>>
>      >     wrote:
>      >      >
>      >      >     It seems columns data missing mapping information of the
>      >     schema. if
>      >      >     you want to use this way to write HBase table,  you
>     can create an
>      >      >     HBase table and uses Phoenix mapping it.
>      >      >
>      >      >     ----------------------------------------
>      >      >         Jaanai Zhang
>      >      >         Best regards!
>      >      >
>      >      >
>      >      >
>      >      >     Thomas D'Silva <tdsilva@salesforce.com
>     <ma...@salesforce.com>
>      >     <mailto:tdsilva@salesforce.com <ma...@salesforce.com>>
>      >      >     <mailto:tdsilva@salesforce.com
>     <ma...@salesforce.com>
>      >     <mailto:tdsilva@salesforce.com
>     <ma...@salesforce.com>>>> 于2018年9月13日周四 上午6:03写道:
>      >      >
>      >      >         Is there a reason you didn't use the
>     spark-connector to
>      >      >         serialize your data?
>      >      >
>      >      >         On Wed, Sep 12, 2018 at 2:28 PM, Saif Addin
>      >     <saif1988@gmail.com <ma...@gmail.com>
>     <mailto:saif1988@gmail.com <ma...@gmail.com>>
>      >      >         <mailto:saif1988@gmail.com
>     <ma...@gmail.com> <mailto:saif1988@gmail.com
>     <ma...@gmail.com>>>>
>      >     wrote:
>      >      >
>      >      >             Thank you Josh! That was helpful. Indeed,
>     there was a
>      >     salt
>      >      >             bucket on the table, and the key-column now shows
>      >     correctly.
>      >      >
>      >      >             However, the problem still persists in that
>     the rest
>      >     of the
>      >      >             columns show as completely empty on Phoenix
>     (appear
>      >      >             correctly on Hbase). We'll be looking into
>     this but
>      >     if you
>      >      >             have any further advice, appreciated.
>      >      >
>      >      >             Saif
>      >      >
>      >      >             On Wed, Sep 12, 2018 at 5:50 PM Josh Elser
>      >      >             <elserj@apache.org <ma...@apache.org>
>     <mailto:elserj@apache.org <ma...@apache.org>>
>      >     <mailto:elserj@apache.org <ma...@apache.org>
>     <mailto:elserj@apache.org <ma...@apache.org>>>> wrote:
>      >      >
>      >      >                 Reminder: Using Phoenix internals forces
>     you to
>      >      >                 understand exactly how
>      >      >                 the version of Phoenix that you're using
>     serializes
>      >      >                 data. Is there a
>      >      >                 reason you're not using SQL to interact
>     with Phoenix?
>      >      >
>      >      >                 Sounds to me that Phoenix is expecting
>     more data
>      >     at the
>      >      >                 head of your
>      >      >                 rowkey. Maybe a salt bucket that you've
>     defined
>      >     on the
>      >      >                 table but not
>      >      >                 created?
>      >      >
>      >      >                 On 9/12/18 4:32 PM, Saif Addin wrote:
>      >      >                  > Hi all,
>      >      >                  >
>      >      >                  > We're trying to write tables with all
>     string
>      >     columns
>      >      >                 from spark.
>      >      >                  > We are not using the Spark Connector,
>     instead
>      >     we are
>      >      >                 directly writing
>      >      >                  > byte arrays from RDDs.
>      >      >                  >
>      >      >                  > The process works fine, and Hbase
>     receives the
>      >     data
>      >      >                 correctly, and
>      >      >                  > content is consistent.
>      >      >                  >
>      >      >                  > However reading the table from Phoenix, we
>      >     notice the
>      >      >                 first character of
>      >      >                  > strings are missing. This sounds like
>     it's a byte
>      >      >                 encoding issue, but
>      >      >                  > we're at loss. We're using PVarchar to
>      >     generate bytes.
>      >      >                  >
>      >      >                  > Here's the snippet of code creating the
>     RDD:
>      >      >                  >
>      >      >                  > val tdd = pdd.flatMap(x => {
>      >      >                  >    val rowKey =
>     PVarchar.INSTANCE.toBytes(x._1)
>      >      >                  >    for(i <- 0 until cols.length) yield {
>      >      >                  >      other stuff for other columns ...
>      >      >                  >      ...
>      >      >                  >      (rowKey, (column1, column2, column3))
>      >      >                  >    }
>      >      >                  > })
>      >      >                  >
>      >      >                  > ...
>      >      >                  >
>      >      >                  > We then create the following output to
>     be written
>      >      >                 down in Hbase
>      >      >                  >
>      >      >                  > val output = tdd.map(x => {
>      >      >                  >      val rowKeyByte: Array[Byte] = x._1
>      >      >                  >      val immutableRowKey = new
>      >      >                 ImmutableBytesWritable(rowKeyByte)
>      >      >                  >
>      >      >                  >      val kv = new KeyValue(rowKeyByte,
>      >      >                  >         
>     PVarchar.INSTANCE.toBytes(column1),
>      >      >                  >         
>     PVarchar.INSTANCE.toBytes(column2),
>      >      >                  >        PVarchar.INSTANCE.toBytes(column3)
>      >      >                  >      )
>      >      >                  >      (immutableRowKey, kv)
>      >      >                  > })
>      >      >                  >
>      >      >                  > By the way, we are using
>     *KryoSerializer* in
>      >     order to
>      >      >                 be able to
>      >      >                  > serialize all classes necessary for Hbase
>      >     (KeyValue,
>      >      >                 BytesWritable, etc).
>      >      >                  >
>      >      >                  > The key of this table is the one
>     missing data when
>      >      >                 queried from Phoenix.
>      >      >                  > So we guess something is wrong with the
>     byte ser.
>      >      >                  >
>      >      >                  > Any ideas? Appreciated!
>      >      >                  > Saif
>      >      >
>      >      >
>      >
> 

Re: Missing content in phoenix after writing from Spark

Posted by Saif Addin <sa...@gmail.com>.
Hi, I am attempting to make a connection from Spark, but no success so far.

For writing into Phoenix, I am trying this:

tdd.toDF("ID", "COL1", "COL2",
"COL3").write.format("org.apache.phoenix.spark").option("zkUrl",
"zookeper-host-url:2181").option("table",
htablename).mode("overwrite").save()

But getting:
*java.sql.SQLException: ERROR 103 (08004): Unable to establish connection.*

For reading, on the other hand, attempting this:

val hbConf = HBaseConfiguration.create()
val hbaseSitePath = "/etc/hbase/conf/hbase-site.xml"
hbConf.addResource(new Path(hbaseSitePath))

spark.sqlContext.phoenixTableAsDataFrame("VISTA_409X68", Array("ID"), conf
= hbConf)

Gets me
*java.lang.NoClassDefFoundError: Could not initialize class
org.apache.phoenix.query.QueryServicesOptions*

I have added phoenix-queryserver-5.0.0-HBase-2.0.jar and
phoenix-queryserver-client-5.0.0-HBase-2.0.jar

Any thoughts? I have an hbase-site.xml file with more configuration, but I am
not sure how to get it picked up when saving.
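
In case it helps, this is roughly what I have in mind for the RDD route,
passing the HBase configuration explicitly so that hbase-site.xml is honoured
on the write path (a sketch; I believe phoenix-spark's saveToPhoenix accepts a
Configuration, and the RDD and column list below are placeholders):

import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.phoenix.spark._ // adds saveToPhoenix to RDDs of tuples

val hbConf = HBaseConfiguration.create()
hbConf.addResource(new Path("/etc/hbase/conf/hbase-site.xml"))

// Sketch: `rows` is a placeholder RDD[(String, String, String, String)]
// holding (ID, COL1, COL2, COL3) values for the target table.
rows.saveToPhoenix("VISTA_409X68", Seq("ID", "COL1", "COL2", "COL3"), hbConf)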

Thanks

On Thu, Sep 13, 2018 at 11:38 AM Josh Elser <el...@apache.org> wrote:

> Pretty sure we ran tests with Spark 2.3 with Phoenix 5.0. Not sure if
> Spark has already moved beyond that.
>
> On 9/12/18 11:00 PM, Saif Addin wrote:
> > Thanks, we'll try Spark Connector then. Thought it didn't support newest
> > Spark Versions
> >
> > On Wed, Sep 12, 2018 at 11:03 PM Jaanai Zhang <cloud.poster@gmail.com
> > <ma...@gmail.com>> wrote:
> >
> >     It seems columns data missing mapping information of the schema. if
> >     you want to use this way to write HBase table,  you can create an
> >     HBase table and uses Phoenix mapping it.
> >
> >     ----------------------------------------
> >         Jaanai Zhang
> >         Best regards!
> >
> >
> >
> >     Thomas D'Silva <tdsilva@salesforce.com
> >     <ma...@salesforce.com>> 于2018年9月13日周四 上午6:03写道:
> >
> >         Is there a reason you didn't use the spark-connector to
> >         serialize your data?
> >
> >         On Wed, Sep 12, 2018 at 2:28 PM, Saif Addin <saif1988@gmail.com
> >         <ma...@gmail.com>> wrote:
> >
> >             Thank you Josh! That was helpful. Indeed, there was a salt
> >             bucket on the table, and the key-column now shows correctly.
> >
> >             However, the problem still persists in that the rest of the
> >             columns show as completely empty on Phoenix (appear
> >             correctly on Hbase). We'll be looking into this but if you
> >             have any further advice, appreciated.
> >
> >             Saif
> >
> >             On Wed, Sep 12, 2018 at 5:50 PM Josh Elser
> >             <elserj@apache.org <ma...@apache.org>> wrote:
> >
> >                 Reminder: Using Phoenix internals forces you to
> >                 understand exactly how
> >                 the version of Phoenix that you're using serializes
> >                 data. Is there a
> >                 reason you're not using SQL to interact with Phoenix?
> >
> >                 Sounds to me that Phoenix is expecting more data at the
> >                 head of your
> >                 rowkey. Maybe a salt bucket that you've defined on the
> >                 table but not
> >                 created?
> >
> >                 On 9/12/18 4:32 PM, Saif Addin wrote:
> >                  > Hi all,
> >                  >
> >                  > We're trying to write tables with all string columns
> >                 from spark.
> >                  > We are not using the Spark Connector, instead we are
> >                 directly writing
> >                  > byte arrays from RDDs.
> >                  >
> >                  > The process works fine, and Hbase receives the data
> >                 correctly, and
> >                  > content is consistent.
> >                  >
> >                  > However reading the table from Phoenix, we notice the
> >                 first character of
> >                  > strings are missing. This sounds like it's a byte
> >                 encoding issue, but
> >                  > we're at loss. We're using PVarchar to generate bytes.
> >                  >
> >                  > Here's the snippet of code creating the RDD:
> >                  >
> >                  > val tdd = pdd.flatMap(x => {
> >                  >    val rowKey = PVarchar.INSTANCE.toBytes(x._1)
> >                  >    for(i <- 0 until cols.length) yield {
> >                  >      other stuff for other columns ...
> >                  >      ...
> >                  >      (rowKey, (column1, column2, column3))
> >                  >    }
> >                  > })
> >                  >
> >                  > ...
> >                  >
> >                  > We then create the following output to be written
> >                 down in Hbase
> >                  >
> >                  > val output = tdd.map(x => {
> >                  >      val rowKeyByte: Array[Byte] = x._1
> >                  >      val immutableRowKey = new
> >                 ImmutableBytesWritable(rowKeyByte)
> >                  >
> >                  >      val kv = new KeyValue(rowKeyByte,
> >                  >          PVarchar.INSTANCE.toBytes(column1),
> >                  >          PVarchar.INSTANCE.toBytes(column2),
> >                  >        PVarchar.INSTANCE.toBytes(column3)
> >                  >      )
> >                  >      (immutableRowKey, kv)
> >                  > })
> >                  >
> >                  > By the way, we are using *KryoSerializer* in order to
> >                 be able to
> >                  > serialize all classes necessary for Hbase (KeyValue,
> >                 BytesWritable, etc).
> >                  >
> >                  > The key of this table is the one missing data when
> >                 queried from Phoenix.
> >                  > So we guess something is wrong with the byte ser.
> >                  >
> >                  > Any ideas? Appreciated!
> >                  > Saif
> >
> >
>

Re: Missing content in phoenix after writing from Spark

Posted by Josh Elser <el...@apache.org>.
Pretty sure we ran tests with Spark 2.3 with Phoenix 5.0. Not sure if 
Spark has already moved beyond that.

On 9/12/18 11:00 PM, Saif Addin wrote:
> Thanks, we'll try Spark Connector then. Thought it didn't support newest 
> Spark Versions
> 
> On Wed, Sep 12, 2018 at 11:03 PM Jaanai Zhang <cloud.poster@gmail.com 
> <ma...@gmail.com>> wrote:
> 
>     It seems columns data missing mapping information of the schema. if
>     you want to use this way to write HBase table,  you can create an
>     HBase table and uses Phoenix mapping it.
> 
>     ----------------------------------------
>         Jaanai Zhang
>         Best regards!
> 
> 
> 
>     Thomas D'Silva <tdsilva@salesforce.com
>     <ma...@salesforce.com>> 于2018年9月13日周四 上午6:03写道:
> 
>         Is there a reason you didn't use the spark-connector to
>         serialize your data?
> 
>         On Wed, Sep 12, 2018 at 2:28 PM, Saif Addin <saif1988@gmail.com
>         <ma...@gmail.com>> wrote:
> 
>             Thank you Josh! That was helpful. Indeed, there was a salt
>             bucket on the table, and the key-column now shows correctly.
> 
>             However, the problem still persists in that the rest of the
>             columns show as completely empty on Phoenix (appear
>             correctly on Hbase). We'll be looking into this but if you
>             have any further advice, appreciated.
> 
>             Saif
> 
>             On Wed, Sep 12, 2018 at 5:50 PM Josh Elser
>             <elserj@apache.org <ma...@apache.org>> wrote:
> 
>                 Reminder: Using Phoenix internals forces you to
>                 understand exactly how
>                 the version of Phoenix that you're using serializes
>                 data. Is there a
>                 reason you're not using SQL to interact with Phoenix?
> 
>                 Sounds to me that Phoenix is expecting more data at the
>                 head of your
>                 rowkey. Maybe a salt bucket that you've defined on the
>                 table but not
>                 created?
> 
>                 On 9/12/18 4:32 PM, Saif Addin wrote:
>                  > Hi all,
>                  >
>                  > We're trying to write tables with all string columns
>                 from spark.
>                  > We are not using the Spark Connector, instead we are
>                 directly writing
>                  > byte arrays from RDDs.
>                  >
>                  > The process works fine, and Hbase receives the data
>                 correctly, and
>                  > content is consistent.
>                  >
>                  > However reading the table from Phoenix, we notice the
>                 first character of
>                  > strings are missing. This sounds like it's a byte
>                 encoding issue, but
>                  > we're at loss. We're using PVarchar to generate bytes.
>                  >
>                  > Here's the snippet of code creating the RDD:
>                  >
>                  > val tdd = pdd.flatMap(x => {
>                  >    val rowKey = PVarchar.INSTANCE.toBytes(x._1)
>                  >    for(i <- 0 until cols.length) yield {
>                  >      other stuff for other columns ...
>                  >      ...
>                  >      (rowKey, (column1, column2, column3))
>                  >    }
>                  > })
>                  >
>                  > ...
>                  >
>                  > We then create the following output to be written
>                 down in Hbase
>                  >
>                  > val output = tdd.map(x => {
>                  >      val rowKeyByte: Array[Byte] = x._1
>                  >      val immutableRowKey = new
>                 ImmutableBytesWritable(rowKeyByte)
>                  >
>                  >      val kv = new KeyValue(rowKeyByte,
>                  >          PVarchar.INSTANCE.toBytes(column1),
>                  >          PVarchar.INSTANCE.toBytes(column2),
>                  >        PVarchar.INSTANCE.toBytes(column3)
>                  >      )
>                  >      (immutableRowKey, kv)
>                  > })
>                  >
>                  > By the way, we are using *KryoSerializer* in order to
>                 be able to
>                  > serialize all classes necessary for Hbase (KeyValue,
>                 BytesWritable, etc).
>                  >
>                  > The key of this table is the one missing data when
>                 queried from Phoenix.
>                  > So we guess something is wrong with the byte ser.
>                  >
>                  > Any ideas? Appreciated!
>                  > Saif
> 
> 

Re: Missing content in phoenix after writing from Spark

Posted by Saif Addin <sa...@gmail.com>.
Thanks, we'll try the Spark Connector then. We thought it didn't support the
newest Spark versions.

On Wed, Sep 12, 2018 at 11:03 PM Jaanai Zhang <cl...@gmail.com>
wrote:

> It seems columns data missing mapping information of the schema. if you
> want to use this way to write HBase table,  you can create an HBase table
> and uses Phoenix mapping it.
>
> ----------------------------------------
>    Jaanai Zhang
>    Best regards!
>
>
>
> Thomas D'Silva <td...@salesforce.com> 于2018年9月13日周四 上午6:03写道:
>
>> Is there a reason you didn't use the spark-connector to serialize your
>> data?
>>
>> On Wed, Sep 12, 2018 at 2:28 PM, Saif Addin <sa...@gmail.com> wrote:
>>
>>> Thank you Josh! That was helpful. Indeed, there was a salt bucket on the
>>> table, and the key-column now shows correctly.
>>>
>>> However, the problem still persists in that the rest of the columns show
>>> as completely empty on Phoenix (appear correctly on Hbase). We'll be
>>> looking into this but if you have any further advice, appreciated.
>>>
>>> Saif
>>>
>>> On Wed, Sep 12, 2018 at 5:50 PM Josh Elser <el...@apache.org> wrote:
>>>
>>>> Reminder: Using Phoenix internals forces you to understand exactly how
>>>> the version of Phoenix that you're using serializes data. Is there a
>>>> reason you're not using SQL to interact with Phoenix?
>>>>
>>>> Sounds to me that Phoenix is expecting more data at the head of your
>>>> rowkey. Maybe a salt bucket that you've defined on the table but not
>>>> created?
>>>>
>>>> On 9/12/18 4:32 PM, Saif Addin wrote:
>>>> > Hi all,
>>>> >
>>>> > We're trying to write tables with all string columns from spark.
>>>> > We are not using the Spark Connector, instead we are directly writing
>>>> > byte arrays from RDDs.
>>>> >
>>>> > The process works fine, and Hbase receives the data correctly, and
>>>> > content is consistent.
>>>> >
>>>> > However reading the table from Phoenix, we notice the first character
>>>> of
>>>> > strings are missing. This sounds like it's a byte encoding issue, but
>>>> > we're at loss. We're using PVarchar to generate bytes.
>>>> >
>>>> > Here's the snippet of code creating the RDD:
>>>> >
>>>> > val tdd = pdd.flatMap(x => {
>>>> >    val rowKey = PVarchar.INSTANCE.toBytes(x._1)
>>>> >    for(i <- 0 until cols.length) yield {
>>>> >      other stuff for other columns ...
>>>> >      ...
>>>> >      (rowKey, (column1, column2, column3))
>>>> >    }
>>>> > })
>>>> >
>>>> > ...
>>>> >
>>>> > We then create the following output to be written down in Hbase
>>>> >
>>>> > val output = tdd.map(x => {
>>>> >      val rowKeyByte: Array[Byte] = x._1
>>>> >      val immutableRowKey = new ImmutableBytesWritable(rowKeyByte)
>>>> >
>>>> >      val kv = new KeyValue(rowKeyByte,
>>>> >          PVarchar.INSTANCE.toBytes(column1),
>>>> >          PVarchar.INSTANCE.toBytes(column2),
>>>> >        PVarchar.INSTANCE.toBytes(column3)
>>>> >      )
>>>> >      (immutableRowKey, kv)
>>>> > })
>>>> >
>>>> > By the way, we are using *KryoSerializer* in order to be able to
>>>> > serialize all classes necessary for Hbase (KeyValue, BytesWritable,
>>>> etc).
>>>> >
>>>> > The key of this table is the one missing data when queried from
>>>> Phoenix.
>>>> > So we guess something is wrong with the byte ser.
>>>> >
>>>> > Any ideas? Appreciated!
>>>> > Saif
>>>>
>>>
>>

Re: Missing content in phoenix after writing from Spark

Posted by Jaanai Zhang <cl...@gmail.com>.
It seems the column data is missing the schema mapping information. If you
want to write the HBase table this way, you can create an HBase table and
map it with Phoenix.
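
For example, something roughly like the following (only a sketch -- the
table name "my_table", family "cf", the qualifier names and the ZooKeeper
host are placeholders, and the Phoenix client jar is assumed to be on the
classpath):

import java.sql.DriverManager

// Connect through the Phoenix JDBC driver; the URL format is
// jdbc:phoenix:<zookeeper quorum>.
val conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")
val stmt = conn.createStatement()

// Map the existing HBase table "my_table" (column family "cf") into Phoenix.
// The quoted identifiers must match the HBase table / family / qualifier
// names exactly.
stmt.execute(
  """CREATE VIEW IF NOT EXISTS "my_table" (
    |  "pk" VARCHAR PRIMARY KEY,
    |  "cf"."column1" VARCHAR,
    |  "cf"."column2" VARCHAR,
    |  "cf"."column3" VARCHAR
    |)""".stripMargin)

stmt.close()
conn.close()

After that, SELECT * FROM "my_table" should show the existing rows, as long
as the row key and cell bytes match the declared types.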

----------------------------------------
   Jaanai Zhang
   Best regards!



Thomas D'Silva <td...@salesforce.com> 于2018年9月13日周四 上午6:03写道:

> Is there a reason you didn't use the spark-connector to serialize your
> data?
>
> On Wed, Sep 12, 2018 at 2:28 PM, Saif Addin <sa...@gmail.com> wrote:
>
>> Thank you Josh! That was helpful. Indeed, there was a salt bucket on the
>> table, and the key-column now shows correctly.
>>
>> However, the problem still persists in that the rest of the columns show
>> as completely empty on Phoenix (appear correctly on Hbase). We'll be
>> looking into this but if you have any further advice, appreciated.
>>
>> Saif
>>
>> On Wed, Sep 12, 2018 at 5:50 PM Josh Elser <el...@apache.org> wrote:
>>
>>> Reminder: Using Phoenix internals forces you to understand exactly how
>>> the version of Phoenix that you're using serializes data. Is there a
>>> reason you're not using SQL to interact with Phoenix?
>>>
>>> Sounds to me that Phoenix is expecting more data at the head of your
>>> rowkey. Maybe a salt bucket that you've defined on the table but not
>>> created?
>>>
>>> On 9/12/18 4:32 PM, Saif Addin wrote:
>>> > Hi all,
>>> >
>>> > We're trying to write tables with all string columns from spark.
>>> > We are not using the Spark Connector, instead we are directly writing
>>> > byte arrays from RDDs.
>>> >
>>> > The process works fine, and Hbase receives the data correctly, and
>>> > content is consistent.
>>> >
>>> > However reading the table from Phoenix, we notice the first character
>>> of
>>> > strings are missing. This sounds like it's a byte encoding issue, but
>>> > we're at loss. We're using PVarchar to generate bytes.
>>> >
>>> > Here's the snippet of code creating the RDD:
>>> >
>>> > val tdd = pdd.flatMap(x => {
>>> >    val rowKey = PVarchar.INSTANCE.toBytes(x._1)
>>> >    for(i <- 0 until cols.length) yield {
>>> >      other stuff for other columns ...
>>> >      ...
>>> >      (rowKey, (column1, column2, column3))
>>> >    }
>>> > })
>>> >
>>> > ...
>>> >
>>> > We then create the following output to be written down in Hbase
>>> >
>>> > val output = tdd.map(x => {
>>> >      val rowKeyByte: Array[Byte] = x._1
>>> >      val immutableRowKey = new ImmutableBytesWritable(rowKeyByte)
>>> >
>>> >      val kv = new KeyValue(rowKeyByte,
>>> >          PVarchar.INSTANCE.toBytes(column1),
>>> >          PVarchar.INSTANCE.toBytes(column2),
>>> >        PVarchar.INSTANCE.toBytes(column3)
>>> >      )
>>> >      (immutableRowKey, kv)
>>> > })
>>> >
>>> > By the way, we are using *KryoSerializer* in order to be able to
>>> > serialize all classes necessary for Hbase (KeyValue, BytesWritable,
>>> etc).
>>> >
>>> > The key of this table is the one missing data when queried from
>>> Phoenix.
>>> > So we guess something is wrong with the byte ser.
>>> >
>>> > Any ideas? Appreciated!
>>> > Saif
>>>
>>
>

Re: Missing content in phoenix after writing from Spark

Posted by Thomas D'Silva <td...@salesforce.com>.
Is there a reason you didn't use the spark-connector to serialize your data?
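
For reference, a write through the phoenix-spark integration looks roughly
like this (sketch only -- OUTPUT_TABLE is assumed to already exist in
Phoenix with matching column names, and zk-host:2181 stands in for your
ZooKeeper quorum):

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("phoenix-write-sketch").getOrCreate()
import spark.implicits._

// Column names must match the Phoenix table's columns.
val df = Seq(("k1", "v1"), ("k2", "v2")).toDF("ID", "COL1")

df.write
  .format("org.apache.phoenix.spark")
  .mode(SaveMode.Overwrite)
  .option("table", "OUTPUT_TABLE")
  .option("zkUrl", "zk-host:2181")
  .save()

That path goes through Phoenix's own serialization, so things like the salt
byte and column qualifiers are handled for you.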

On Wed, Sep 12, 2018 at 2:28 PM, Saif Addin <sa...@gmail.com> wrote:

> Thank you Josh! That was helpful. Indeed, there was a salt bucket on the
> table, and the key-column now shows correctly.
>
> However, the problem still persists in that the rest of the columns show
> as completely empty on Phoenix (appear correctly on Hbase). We'll be
> looking into this but if you have any further advice, appreciated.
>
> Saif
>
> On Wed, Sep 12, 2018 at 5:50 PM Josh Elser <el...@apache.org> wrote:
>
>> Reminder: Using Phoenix internals forces you to understand exactly how
>> the version of Phoenix that you're using serializes data. Is there a
>> reason you're not using SQL to interact with Phoenix?
>>
>> Sounds to me that Phoenix is expecting more data at the head of your
>> rowkey. Maybe a salt bucket that you've defined on the table but not
>> created?
>>
>> On 9/12/18 4:32 PM, Saif Addin wrote:
>> > Hi all,
>> >
>> > We're trying to write tables with all string columns from spark.
>> > We are not using the Spark Connector, instead we are directly writing
>> > byte arrays from RDDs.
>> >
>> > The process works fine, and Hbase receives the data correctly, and
>> > content is consistent.
>> >
>> > However reading the table from Phoenix, we notice the first character
>> of
>> > strings are missing. This sounds like it's a byte encoding issue, but
>> > we're at loss. We're using PVarchar to generate bytes.
>> >
>> > Here's the snippet of code creating the RDD:
>> >
>> > val tdd = pdd.flatMap(x => {
>> >    val rowKey = PVarchar.INSTANCE.toBytes(x._1)
>> >    for(i <- 0 until cols.length) yield {
>> >      other stuff for other columns ...
>> >      ...
>> >      (rowKey, (column1, column2, column3))
>> >    }
>> > })
>> >
>> > ...
>> >
>> > We then create the following output to be written down in Hbase
>> >
>> > val output = tdd.map(x => {
>> >      val rowKeyByte: Array[Byte] = x._1
>> >      val immutableRowKey = new ImmutableBytesWritable(rowKeyByte)
>> >
>> >      val kv = new KeyValue(rowKeyByte,
>> >          PVarchar.INSTANCE.toBytes(column1),
>> >          PVarchar.INSTANCE.toBytes(column2),
>> >        PVarchar.INSTANCE.toBytes(column3)
>> >      )
>> >      (immutableRowKey, kv)
>> > })
>> >
>> > By the way, we are using *KryoSerializer* in order to be able to
>> > serialize all classes necessary for Hbase (KeyValue, BytesWritable,
>> etc).
>> >
>> > The key of this table is the one missing data when queried from
>> Phoenix.
>> > So we guess something is wrong with the byte ser.
>> >
>> > Any ideas? Appreciated!
>> > Saif
>>
>

Re: Missing content in phoenix after writing from Spark

Posted by Saif Addin <sa...@gmail.com>.
Thank you Josh! That was helpful. Indeed, there was a salt bucket on the
table, and the key column now shows correctly.

However, the problem persists: the rest of the columns show as completely
empty in Phoenix (they appear correctly in HBase). We'll keep looking into
this, but if you have any further advice, it's appreciated.
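
One guess we still have to verify: if the table was created on Phoenix
4.10+ / 5.0, non-PK columns are stored under encoded column qualifiers by
default, so cells written with the literal qualifier bytes would not be
picked up. Something like this (sketch only, names made up) should keep the
one-to-one qualifier mapping:

import java.sql.DriverManager

val conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")

// COLUMN_ENCODED_BYTES = 0 disables qualifier encoding, so the HBase
// qualifier for CF.COLUMN1 is literally the bytes of "COLUMN1".
conn.createStatement().execute(
  """CREATE TABLE IF NOT EXISTS MY_TABLE (
    |  ID VARCHAR PRIMARY KEY,
    |  CF.COLUMN1 VARCHAR,
    |  CF.COLUMN2 VARCHAR,
    |  CF.COLUMN3 VARCHAR
    |) COLUMN_ENCODED_BYTES = 0""".stripMargin)

conn.close()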

Saif

On Wed, Sep 12, 2018 at 5:50 PM Josh Elser <el...@apache.org> wrote:

> Reminder: Using Phoenix internals forces you to understand exactly how
> the version of Phoenix that you're using serializes data. Is there a
> reason you're not using SQL to interact with Phoenix?
>
> Sounds to me that Phoenix is expecting more data at the head of your
> rowkey. Maybe a salt bucket that you've defined on the table but not
> created?
>
> On 9/12/18 4:32 PM, Saif Addin wrote:
> > Hi all,
> >
> > We're trying to write tables with all string columns from spark.
> > We are not using the Spark Connector, instead we are directly writing
> > byte arrays from RDDs.
> >
> > The process works fine, and Hbase receives the data correctly, and
> > content is consistent.
> >
> > However reading the table from Phoenix, we notice the first character of
> > strings are missing. This sounds like it's a byte encoding issue, but
> > we're at loss. We're using PVarchar to generate bytes.
> >
> > Here's the snippet of code creating the RDD:
> >
> > val tdd = pdd.flatMap(x => {
> >    val rowKey = PVarchar.INSTANCE.toBytes(x._1)
> >    for(i <- 0 until cols.length) yield {
> >      other stuff for other columns ...
> >      ...
> >      (rowKey, (column1, column2, column3))
> >    }
> > })
> >
> > ...
> >
> > We then create the following output to be written down in Hbase
> >
> > val output = tdd.map(x => {
> >      val rowKeyByte: Array[Byte] = x._1
> >      val immutableRowKey = new ImmutableBytesWritable(rowKeyByte)
> >
> >      val kv = new KeyValue(rowKeyByte,
> >          PVarchar.INSTANCE.toBytes(column1),
> >          PVarchar.INSTANCE.toBytes(column2),
> >        PVarchar.INSTANCE.toBytes(column3)
> >      )
> >      (immutableRowKey, kv)
> > })
> >
> > By the way, we are using *KryoSerializer* in order to be able to
> > serialize all classes necessary for Hbase (KeyValue, BytesWritable, etc).
> >
> > The key of this table is the one missing data when queried from Phoenix.
> > So we guess something is wrong with the byte ser.
> >
> > Any ideas? Appreciated!
> > Saif
>

Re: Missing content in phoenix after writing from Spark

Posted by Josh Elser <el...@apache.org>.
Reminder: Using Phoenix internals forces you to understand exactly how 
the version of Phoenix that you're using serializes data. Is there a 
reason you're not using SQL to interact with Phoenix?

Sounds to me like Phoenix is expecting more data at the head of your 
rowkey. Maybe a salt bucket that you've defined on the table but aren't 
writing in your row keys?
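
If salting is the culprit: a salted table prepends one salt byte to every
row key, so anything that writes raw row keys has to account for it. Rough
sketch with made-up names:

import java.sql.DriverManager

val conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")

// A table declared like this stores its row keys as [1 salt byte][ID bytes],
// so raw writes that omit the salt byte read back with the key shifted by
// one byte -- i.e. the first character of the key appears to be missing.
conn.createStatement().execute(
  """CREATE TABLE IF NOT EXISTS SALTED_EXAMPLE (
    |  ID VARCHAR PRIMARY KEY,
    |  CF.COL1 VARCHAR
    |) SALT_BUCKETS = 4""".stripMargin)

conn.close()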

On 9/12/18 4:32 PM, Saif Addin wrote:
> Hi all,
> 
> We're trying to write tables with all string columns from spark.
> We are not using the Spark Connector, instead we are directly writing 
> byte arrays from RDDs.
> 
> The process works fine, and Hbase receives the data correctly, and 
> content is consistent.
> 
> However reading the table from Phoenix, we notice the first character of 
> strings are missing. This sounds like it's a byte encoding issue, but 
> we're at loss. We're using PVarchar to generate bytes.
> 
> Here's the snippet of code creating the RDD:
> 
> val tdd = pdd.flatMap(x => {
>    val rowKey = PVarchar.INSTANCE.toBytes(x._1)
>    for(i <- 0 until cols.length) yield {
>      other stuff for other columns ...
>      ...
>      (rowKey, (column1, column2, column3))
>    }
> })
> 
> ...
> 
> We then create the following output to be written down in Hbase
> 
> val output = tdd.map(x => {
>      val rowKeyByte: Array[Byte] = x._1
>      val immutableRowKey = new ImmutableBytesWritable(rowKeyByte)
> 
>      val kv = new KeyValue(rowKeyByte,
>          PVarchar.INSTANCE.toBytes(column1),
>          PVarchar.INSTANCE.toBytes(column2),
>        PVarchar.INSTANCE.toBytes(column3)
>      )
>      (immutableRowKey, kv)
> })
> 
> By the way, we are using *KryoSerializer* in order to be able to 
> serialize all classes necessary for Hbase (KeyValue, BytesWritable, etc).
> 
> The key of this table is the one missing data when queried from Phoenix. 
> So we guess something is wrong with the byte ser.
> 
> Any ideas? Appreciated!
> Saif