Posted to dev@iceberg.apache.org by Saisai Shao <sa...@gmail.com> on 2019/08/07 03:58:53 UTC

Two newbie questions about Iceberg

Hi team,

I just ran into some issues while trying Iceberg with the quick start
guide. I'm not sure whether it's appropriate to send this to the @dev
mailing list (there doesn't seem to be a user mailing list).

One issue is that Iceberg currently does not seem to run with an embedded
metastore; it throws the exception below. Is this intentional behavior
(forcing the use of a remote HMS), or just a bug?

Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Unable to
update transaction database java.sql.SQLSyntaxErrorException: Table/View
'HIVE_LOCKS' does not exist.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
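
For reference, pointing Spark at a standalone HMS (rather than the
embedded Derby one) looks roughly like the sketch below; the thrift URI
and warehouse path are placeholders, and depending on the Spark version
these settings may instead need to live in hive-site.xml or be prefixed
with spark.hadoop:

import org.apache.spark.sql.SparkSession;

public class RemoteMetastoreSketch {
  public static void main(String[] args) {
    // Point Spark SQL at a standalone Hive metastore instead of the
    // embedded Derby one. The URI and warehouse directory are placeholders.
    SparkSession spark = SparkSession.builder()
        .appName("iceberg-quickstart")
        .master("local[*]")
        .config("hive.metastore.uris", "thrift://localhost:9083")
        .config("spark.sql.warehouse.dir", "/tmp/warehouse")
        .enableHiveSupport()
        .getOrCreate();

    spark.sql("SHOW DATABASES").show();
    spark.stop();
  }
}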

Related to this, it seems that Iceberg currently binds only to HMS as its
catalog, which is fine for production use. But I'm wondering whether we
could have a simple catalog, like Spark's in-memory catalog, so that it's
easy for users to test and play around. Are there any concerns or plans
here?
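
As an aside, Iceberg's path-based HadoopTables API looks like it can
create and load tables against a plain directory with no metastore at
all; if my reading of the code is right, a user could play with something
like the sketch below (the table location is just an example path):

import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.Table;
import org.apache.iceberg.hadoop.HadoopTables;
import org.apache.iceberg.types.Types;

public class HadoopTablesSketch {
  public static void main(String[] args) {
    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.optional(2, "data", Types.StringType.get()));

    // Path-based tables keep their metadata under the table location,
    // so no Hive metastore is involved. The location is just an example.
    HadoopTables tables = new HadoopTables(new Configuration());
    Table table = tables.create(schema, PartitionSpec.unpartitioned(),
        "file:/tmp/iceberg/test_table");
    System.out.println(table.schema().asStruct());

    Table reloaded = tables.load("file:/tmp/iceberg/test_table");
    System.out.println(reloaded.location());
  }
}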

Best regards,
Saisai

Re: Two newbie questions about Iceberg

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
Great, thanks for working on this, Saisai!

On Thu, Aug 8, 2019 at 7:38 PM Saisai Shao <sa...@gmail.com> wrote:

> I'm still looking into this, to figure out a way to add HIVE_LOCKS table
> in the Spark side. Anyway I will create an issue first to track this.
>
> Best regards,
> Saisai
>
> Ryan Blue <rb...@netflix.com> 于2019年8月9日周五 上午4:58写道:
>
>> Any ideas on how to fix this? Can we create the HIVE_LOCKS table if it is
>> missing automatically?
>>
>> On Wed, Aug 7, 2019 at 7:13 PM Saisai Shao <sa...@gmail.com>
>> wrote:
>>
>>> Thanks guys for your reply.
>>>
>>> I didn't do anything special, I don't even have a configured Hive. I
>>> just simply put the iceberg (assembly) jar into Spark and start a local
>>> Spark process. I think the built-in Hive version of Spark is 1.2.1-spark
>>> (has a slight pom change), and all the configurations related to
>>> SparkSQL/Hive are default. I guess the reason is like Anton mentioned, I
>>> will take a try by creating all tables (HIVE_LOCKS) using script. But I
>>> think we should fix it, this potentially stops user to do a quick start by
>>> using local spark.
>>>
>>>  think the reason why it works in tests is because we create all tables
>>>> (including HIVE_LOCKS) using a script
>>>>
>>>
>>> Best regards,
>>> Saisai
>>>
>>> Anton Okolnychyi <ao...@apple.com> 于2019年8月7日周三 下午11:56写道:
>>>
>>>> I think the reason why it works in tests is because we create all
>>>> tables (including HIVE_LOCKS) using a script. I am not sure lock tables are
>>>> always created in embedded mode.
>>>>
>>>> > On 7 Aug 2019, at 16:49, Ryan Blue <rb...@netflix.com> wrote:
>>>> >
>>>> > This is the right list. Iceberg is fairly low in the stack, so most
>>>> questions are probably dev questions.
>>>> >
>>>> > I'm surprised that this doesn't work with an embedded metastore
>>>> because we use an embedded metastore in tests:
>>>> https://github.com/apache/incubator-iceberg/blob/master/hive/src/test/java/org/apache/iceberg/hive/TestHiveMetastore.java
>>>> >
>>>> > But we are also using Hive 1.2.1 and a metastore schema for 3.1.0. I
>>>> wonder if a newer version of Hive would avoid this problem? What version
>>>> are you linking with?
>>>> >
>>>> > On Tue, Aug 6, 2019 at 8:59 PM Saisai Shao <sa...@gmail.com>
>>>> wrote:
>>>> > Hi team,
>>>> >
>>>> > I just met some issues when trying Iceberg with quick start guide.
>>>> Not sure if it is proper to send this to @dev mail list (seems there's no
>>>> user mail list).
>>>> >
>>>> > One issue is that seems current Iceberg cannot run with embedded
>>>> metastore. It will throw an exception. Is this an on-purpose behavior
>>>> (force to use remote HMS), or just a bug?
>>>> >
>>>> > Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Unable
>>>> to update transaction database java.sql.SQLSyntaxErrorException: Table/View
>>>> 'HIVE_LOCKS' does not exist.
>>>> > at
>>>> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown
>>>> Source)
>>>> > at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown
>>>> Source)
>>>> > at
>>>> org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown
>>>> Source)
>>>> >
>>>> > Followed by this issue, seems like current Iceberg only binds to HMS
>>>> as catalog, this is fine for production usage. But I'm wondering if we
>>>> could have a simple catalog like in-memory catalog as Spark, so that it is
>>>> easy for user to test and play. Is there any concern or plan?
>>>> >
>>>> > Best regards,
>>>> > Saisai
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Ryan Blue
>>>> > Software Engineer
>>>> > Netflix
>>>>
>>>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>

-- 
Ryan Blue
Software Engineer
Netflix

Re: Two newbie questions about Iceberg

Posted by Saisai Shao <sa...@gmail.com>.
I'm still looking into this, trying to figure out a way to add the
HIVE_LOCKS table on the Spark side. In any case, I will create an issue
first to track this.
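
As a first step I'm experimenting with simply detecting whether the lock
table exists in the embedded Derby database; roughly something like this,
assuming the default metastore_db connection URL that Hive falls back to
when nothing is configured:

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class CheckHiveLocks {
  public static void main(String[] args) throws Exception {
    // Default embedded-Derby URL used when no metastore is configured;
    // adjust databaseName if the metastore_db directory lives elsewhere.
    String url = "jdbc:derby:;databaseName=metastore_db;create=true";

    try (Connection conn = DriverManager.getConnection(url)) {
      DatabaseMetaData meta = conn.getMetaData();
      try (ResultSet rs = meta.getTables(null, null, "HIVE_LOCKS", null)) {
        if (rs.next()) {
          System.out.println("HIVE_LOCKS exists");
        } else {
          System.out.println("HIVE_LOCKS is missing; the transaction "
              + "schema was never initialized");
        }
      }
    }
  }
}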

Best regards,
Saisai

Ryan Blue <rb...@netflix.com> 于2019年8月9日周五 上午4:58写道:

> Any ideas on how to fix this? Can we create the HIVE_LOCKS table if it is
> missing automatically?
>
> On Wed, Aug 7, 2019 at 7:13 PM Saisai Shao <sa...@gmail.com> wrote:
>
>> Thanks guys for your reply.
>>
>> I didn't do anything special, I don't even have a configured Hive. I just
>> simply put the iceberg (assembly) jar into Spark and start a local Spark
>> process. I think the built-in Hive version of Spark is 1.2.1-spark (has a
>> slight pom change), and all the configurations related to SparkSQL/Hive are
>> default. I guess the reason is like Anton mentioned, I will take a try by
>> creating all tables (HIVE_LOCKS) using script. But I think we should fix
>> it, this potentially stops user to do a quick start by using local spark.
>>
>>  think the reason why it works in tests is because we create all tables
>>> (including HIVE_LOCKS) using a script
>>>
>>
>> Best regards,
>> Saisai
>>
>> Anton Okolnychyi <ao...@apple.com> 于2019年8月7日周三 下午11:56写道:
>>
>>> I think the reason why it works in tests is because we create all tables
>>> (including HIVE_LOCKS) using a script. I am not sure lock tables are always
>>> created in embedded mode.
>>>
>>> > On 7 Aug 2019, at 16:49, Ryan Blue <rb...@netflix.com> wrote:
>>> >
>>> > This is the right list. Iceberg is fairly low in the stack, so most
>>> questions are probably dev questions.
>>> >
>>> > I'm surprised that this doesn't work with an embedded metastore
>>> because we use an embedded metastore in tests:
>>> https://github.com/apache/incubator-iceberg/blob/master/hive/src/test/java/org/apache/iceberg/hive/TestHiveMetastore.java
>>> >
>>> > But we are also using Hive 1.2.1 and a metastore schema for 3.1.0. I
>>> wonder if a newer version of Hive would avoid this problem? What version
>>> are you linking with?
>>> >
>>> > On Tue, Aug 6, 2019 at 8:59 PM Saisai Shao <sa...@gmail.com>
>>> wrote:
>>> > Hi team,
>>> >
>>> > I just met some issues when trying Iceberg with quick start guide. Not
>>> sure if it is proper to send this to @dev mail list (seems there's no user
>>> mail list).
>>> >
>>> > One issue is that seems current Iceberg cannot run with embedded
>>> metastore. It will throw an exception. Is this an on-purpose behavior
>>> (force to use remote HMS), or just a bug?
>>> >
>>> > Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Unable
>>> to update transaction database java.sql.SQLSyntaxErrorException: Table/View
>>> 'HIVE_LOCKS' does not exist.
>>> > at
>>> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown
>>> Source)
>>> > at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown
>>> Source)
>>> > at
>>> org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown
>>> Source)
>>> >
>>> > Followed by this issue, seems like current Iceberg only binds to HMS
>>> as catalog, this is fine for production usage. But I'm wondering if we
>>> could have a simple catalog like in-memory catalog as Spark, so that it is
>>> easy for user to test and play. Is there any concern or plan?
>>> >
>>> > Best regards,
>>> > Saisai
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > Ryan Blue
>>> > Software Engineer
>>> > Netflix
>>>
>>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>

Re: Two newbie questions about Iceberg

Posted by Ryan Blue <rb...@netflix.com>.
Any ideas on how to fix this? Can we create the HIVE_LOCKS table
automatically if it is missing?
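
One possible route (untested, and the class lives in Hive's internals
with a signature that changes across versions) would be Hive's own
TxnDbUtil helper, which creates the transaction tables, including
HIVE_LOCKS, in an embedded Derby database:

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.txn.TxnDbUtil;

public class CreateTxnTablesSketch {
  public static void main(String[] args) throws Exception {
    // NOTE: TxnDbUtil is a Hive-internal utility; in Hive 1.x/2.x prepDb()
    // takes no arguments, in Hive 3.x it takes a Configuration. This is a
    // sketch of the idea, not something we would ship as-is.
    HiveConf conf = new HiveConf();
    TxnDbUtil.setConfValues(conf);   // points the utility at embedded Derby
    TxnDbUtil.prepDb();              // creates TXNS, HIVE_LOCKS, etc.
  }
}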

On Wed, Aug 7, 2019 at 7:13 PM Saisai Shao <sa...@gmail.com> wrote:

> Thanks guys for your reply.
>
> I didn't do anything special, I don't even have a configured Hive. I just
> simply put the iceberg (assembly) jar into Spark and start a local Spark
> process. I think the built-in Hive version of Spark is 1.2.1-spark (has a
> slight pom change), and all the configurations related to SparkSQL/Hive are
> default. I guess the reason is like Anton mentioned, I will take a try by
> creating all tables (HIVE_LOCKS) using script. But I think we should fix
> it, this potentially stops user to do a quick start by using local spark.
>
>  think the reason why it works in tests is because we create all tables
>> (including HIVE_LOCKS) using a script
>>
>
> Best regards,
> Saisai
>
> Anton Okolnychyi <ao...@apple.com> 于2019年8月7日周三 下午11:56写道:
>
>> I think the reason why it works in tests is because we create all tables
>> (including HIVE_LOCKS) using a script. I am not sure lock tables are always
>> created in embedded mode.
>>
>> > On 7 Aug 2019, at 16:49, Ryan Blue <rb...@netflix.com> wrote:
>> >
>> > This is the right list. Iceberg is fairly low in the stack, so most
>> questions are probably dev questions.
>> >
>> > I'm surprised that this doesn't work with an embedded metastore because
>> we use an embedded metastore in tests:
>> https://github.com/apache/incubator-iceberg/blob/master/hive/src/test/java/org/apache/iceberg/hive/TestHiveMetastore.java
>> >
>> > But we are also using Hive 1.2.1 and a metastore schema for 3.1.0. I
>> wonder if a newer version of Hive would avoid this problem? What version
>> are you linking with?
>> >
>> > On Tue, Aug 6, 2019 at 8:59 PM Saisai Shao <sa...@gmail.com>
>> wrote:
>> > Hi team,
>> >
>> > I just met some issues when trying Iceberg with quick start guide. Not
>> sure if it is proper to send this to @dev mail list (seems there's no user
>> mail list).
>> >
>> > One issue is that seems current Iceberg cannot run with embedded
>> metastore. It will throw an exception. Is this an on-purpose behavior
>> (force to use remote HMS), or just a bug?
>> >
>> > Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Unable
>> to update transaction database java.sql.SQLSyntaxErrorException: Table/View
>> 'HIVE_LOCKS' does not exist.
>> > at
>> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown
>> Source)
>> > at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown
>> Source)
>> > at
>> org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown
>> Source)
>> >
>> > Followed by this issue, seems like current Iceberg only binds to HMS as
>> catalog, this is fine for production usage. But I'm wondering if we could
>> have a simple catalog like in-memory catalog as Spark, so that it is easy
>> for user to test and play. Is there any concern or plan?
>> >
>> > Best regards,
>> > Saisai
>> >
>> >
>> >
>> >
>> > --
>> > Ryan Blue
>> > Software Engineer
>> > Netflix
>>
>>

-- 
Ryan Blue
Software Engineer
Netflix

Re: Two newbie questions about Iceberg

Posted by Saisai Shao <sa...@gmail.com>.
Thanks, everyone, for your replies.

I didn't do anything special; I don't even have Hive configured. I simply
put the Iceberg (assembly) jar into Spark and started a local Spark
process. I believe Spark's built-in Hive version is 1.2.1-spark (with a
slight pom change), and all Spark SQL/Hive configurations are at their
defaults. I guess the reason is what Anton mentioned, so I will try
creating all of the tables (including HIVE_LOCKS) using the script (a
rough sketch of what I have in mind follows the quote below). But I think
we should fix this, since it potentially blocks users from doing a quick
start with local Spark.

> I think the reason why it works in tests is because we create all tables
> (including HIVE_LOCKS) using a script
>
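
Something along these lines is what I have in mind; the script path is
hypothetical and the statement splitting is deliberately naive (the real
Hive schema scripts would need a proper runner such as Derby's ij):

import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RunTxnSchemaScript {
  public static void main(String[] args) throws Exception {
    // Hypothetical location of the Hive transaction schema DDL for Derby;
    // the real file ships with Hive (e.g. hive-txn-schema-*.derby.sql).
    String scriptPath = "/path/to/hive-txn-schema.derby.sql";
    String url = "jdbc:derby:;databaseName=metastore_db;create=true";

    String script = new String(Files.readAllBytes(Paths.get(scriptPath)));

    // Drop comment lines, then split on ';' -- good enough for the simple
    // CREATE TABLE statements in the txn schema, not a general SQL parser.
    StringBuilder cleaned = new StringBuilder();
    for (String line : script.split("\n")) {
      if (!line.trim().startsWith("--")) {
        cleaned.append(line).append('\n');
      }
    }

    try (Connection conn = DriverManager.getConnection(url);
         Statement stmt = conn.createStatement()) {
      for (String sql : cleaned.toString().split(";")) {
        String trimmed = sql.trim();
        if (!trimmed.isEmpty()) {
          stmt.execute(trimmed);
        }
      }
    }
  }
}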

Best regards,
Saisai

Anton Okolnychyi <ao...@apple.com> 于2019年8月7日周三 下午11:56写道:

> I think the reason why it works in tests is because we create all tables
> (including HIVE_LOCKS) using a script. I am not sure lock tables are always
> created in embedded mode.
>
> > On 7 Aug 2019, at 16:49, Ryan Blue <rb...@netflix.com> wrote:
> >
> > This is the right list. Iceberg is fairly low in the stack, so most
> questions are probably dev questions.
> >
> > I'm surprised that this doesn't work with an embedded metastore because
> we use an embedded metastore in tests:
> https://github.com/apache/incubator-iceberg/blob/master/hive/src/test/java/org/apache/iceberg/hive/TestHiveMetastore.java
> >
> > But we are also using Hive 1.2.1 and a metastore schema for 3.1.0. I
> wonder if a newer version of Hive would avoid this problem? What version
> are you linking with?
> >
> > On Tue, Aug 6, 2019 at 8:59 PM Saisai Shao <sa...@gmail.com>
> wrote:
> > Hi team,
> >
> > I just met some issues when trying Iceberg with quick start guide. Not
> sure if it is proper to send this to @dev mail list (seems there's no user
> mail list).
> >
> > One issue is that seems current Iceberg cannot run with embedded
> metastore. It will throw an exception. Is this an on-purpose behavior
> (force to use remote HMS), or just a bug?
> >
> > Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Unable to
> update transaction database java.sql.SQLSyntaxErrorException: Table/View
> 'HIVE_LOCKS' does not exist.
> > at
> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown
> Source)
> > at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
> > at
> org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown
> Source)
> >
> > Followed by this issue, seems like current Iceberg only binds to HMS as
> catalog, this is fine for production usage. But I'm wondering if we could
> have a simple catalog like in-memory catalog as Spark, so that it is easy
> for user to test and play. Is there any concern or plan?
> >
> > Best regards,
> > Saisai
> >
> >
> >
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix
>
>

Re: Two newbie questions about Iceberg

Posted by Anton Okolnychyi <ao...@apple.com>.
I think the reason why it works in tests is because we create all tables (including HIVE_LOCKS) using a script. I am not sure lock tables are always created in embedded mode.

> On 7 Aug 2019, at 16:49, Ryan Blue <rb...@netflix.com> wrote:
> 
> This is the right list. Iceberg is fairly low in the stack, so most questions are probably dev questions.
> 
> I'm surprised that this doesn't work with an embedded metastore because we use an embedded metastore in tests: https://github.com/apache/incubator-iceberg/blob/master/hive/src/test/java/org/apache/iceberg/hive/TestHiveMetastore.java
> 
> But we are also using Hive 1.2.1 and a metastore schema for 3.1.0. I wonder if a newer version of Hive would avoid this problem? What version are you linking with?
> 
> On Tue, Aug 6, 2019 at 8:59 PM Saisai Shao <sa...@gmail.com> wrote:
> Hi team, 
> 
> I just met some issues when trying Iceberg with quick start guide. Not sure if it is proper to send this to @dev mail list (seems there's no user mail list).
> 
> One issue is that seems current Iceberg cannot run with embedded metastore. It will throw an exception. Is this an on-purpose behavior (force to use remote HMS), or just a bug?
> 
> Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Unable to update transaction database java.sql.SQLSyntaxErrorException: Table/View 'HIVE_LOCKS' does not exist.
> at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
> at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
> at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
> 
> Followed by this issue, seems like current Iceberg only binds to HMS as catalog, this is fine for production usage. But I'm wondering if we could have a simple catalog like in-memory catalog as Spark, so that it is easy for user to test and play. Is there any concern or plan?
> 
> Best regards,
> Saisai
> 
> 
> 
> 
> -- 
> Ryan Blue
> Software Engineer
> Netflix


Re: Two newbie questions about Iceberg

Posted by Ryan Blue <rb...@netflix.com>.
This is the right list. Iceberg is fairly low in the stack, so most
questions are probably dev questions.

I'm surprised that this doesn't work with an embedded metastore because we
use an embedded metastore in tests:
https://github.com/apache/incubator-iceberg/blob/master/hive/src/test/java/org/apache/iceberg/hive/TestHiveMetastore.java
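
For reference, the rough shape of such a test setup is sketched below.
This is not the actual TestHiveMetastore code, just the general pattern
of starting the metastore Thrift service in-process against an embedded
Derby database and pointing clients at it; the port and paths are
arbitrary:

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStore;
import org.apache.hadoop.hive.shims.ShimLoader;

public class EmbeddedMetastoreSketch {
  public static void main(String[] args) throws Exception {
    int port = 9083;  // any free port works for a test metastore

    HiveConf conf = new HiveConf();
    // Embedded Derby database and warehouse in scratch directories;
    // both paths are examples only.
    conf.set("javax.jdo.option.ConnectionURL",
        "jdbc:derby:;databaseName=/tmp/test_metastore_db;create=true");
    conf.set("hive.metastore.warehouse.dir", "file:/tmp/test_warehouse");

    // startMetaStore blocks while serving, so run it on a daemon thread.
    Thread server = new Thread(() -> {
      try {
        HiveMetaStore.startMetaStore(port,
            ShimLoader.getHadoopThriftAuthBridge(), conf);
      } catch (Throwable t) {
        t.printStackTrace();
      }
    });
    server.setDaemon(true);
    server.start();

    // Clients (Iceberg's HiveCatalog, Spark, etc.) then connect with
    // hive.metastore.uris = thrift://localhost:9083
  }
}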

But we are also using Hive 1.2.1 and a metastore schema for 3.1.0. I wonder
if a newer version of Hive would avoid this problem? What version are you
linking with?

On Tue, Aug 6, 2019 at 8:59 PM Saisai Shao <sa...@gmail.com> wrote:

> Hi team,
>
> I just met some issues when trying Iceberg with quick start guide. Not
> sure if it is proper to send this to @dev mail list (seems there's no user
> mail list).
>
> One issue is that seems current Iceberg cannot run with embedded
> metastore. It will throw an exception. Is this an on-purpose behavior
> (force to use remote HMS), or just a bug?
>
> Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Unable to
> update transaction database java.sql.SQLSyntaxErrorException: Table/View
> 'HIVE_LOCKS' does not exist.
> at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown
> Source)
> at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
> at
> org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown
> Source)
>
> Followed by this issue, seems like current Iceberg only binds to HMS as
> catalog, this is fine for production usage. But I'm wondering if we could
> have a simple catalog like in-memory catalog as Spark, so that it is easy
> for user to test and play. Is there any concern or plan?
>
> Best regards,
> Saisai
>
>
>

-- 
Ryan Blue
Software Engineer
Netflix