Posted to dev@hudi.apache.org by Pratyaksh Sharma <pr...@gmail.com> on 2019/11/06 07:01:40 UTC

[Discuss] Creation of database in Hive

Hi,

While doing Hive sync using HiveSyncTool, we first check whether the target
table exists in Hive. If it does not, we try to create it. However, if the
database itself does not exist, this flow does not create the database before
creating the Hive table, which results in an exception like the one below:

org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10072]: Database does not exist: test_db
    at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:335)
    at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:199)
    at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:262)
    at org.apache.hive.service.cli.operation.Operation.run(Operation.java:247)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:575)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:561)
    at sun.reflect.GeneratedMethodAccessor108.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
    at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
    at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
    at com.sun.proxy.$Proxy68.executeStatementAsync(Unknown Source)
    at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:566)
    at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557)
    at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
    ... 3 more
Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Database does not exist: test_db
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.getDatabase(BaseSemanticAnalyzer.java:2154)


So I wanted to discuss whether we should try creating the database first in
this case, using a query like:

CREATE DATABASE|SCHEMA [IF NOT EXISTS] <database name>
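
For illustration, here is a minimal sketch of what issuing that statement could
look like in Java over a plain JDBC connection to HiveServer2. This is not
existing HiveSyncTool code: the class and method names (CreateDatabaseSketch,
createDatabaseSql, createDatabaseIfMissing) are hypothetical, and the
name check is an added assumption so that the concatenated statement stays
safe.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper, not part of Hudi: shows one way to create the
// database before the table is created.
final class CreateDatabaseSketch {

    // Builds the HiveQL statement. Assumption: database names are limited to
    // letters, digits and underscores, so no identifier quoting is needed.
    public static String createDatabaseSql(String databaseName) {
        if (databaseName == null || !databaseName.matches("[A-Za-z0-9_]+")) {
            throw new IllegalArgumentException("Unexpected database name: " + databaseName);
        }
        return "CREATE DATABASE IF NOT EXISTS " + databaseName;
    }

    // Runs the statement on an already-open JDBC connection. IF NOT EXISTS
    // makes the call a no-op when the database is already present.
    public static void createDatabaseIfMissing(Connection connection, String databaseName)
            throws SQLException {
        try (Statement stmt = connection.createStatement()) {
            stmt.execute(createDatabaseSql(databaseName));
        }
    }

    public static void main(String[] args) {
        // Only prints the statement; no Hive connection is opened here.
        System.out.println(createDatabaseSql("test_db"));
    }
}
```

In such a sketch, createDatabaseIfMissing would be called once before the
table-existence check, with the same connection the sync already uses.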

Re: [Discuss] Creation of database in Hive

Posted by Pratyaksh Sharma <pr...@gmail.com>.
Ok, that is a valid reason.

On Thu, Nov 7, 2019 at 2:03 AM Bhavani Sudha <bh...@gmail.com>
wrote:

> Ah okay. That is a valid concern. Didn't think about admin management for
> Hive dbs.
>
> Thanks,
> Sudha

Re: [Discuss] Creation of database in Hive

Posted by Bhavani Sudha <bh...@gmail.com>.
Ah okay. That is a valid concern. Didn't think about admin management for
Hive dbs.

Thanks,
Sudha

On Wed, Nov 6, 2019 at 12:28 PM Balaji Varadarajan <vb...@apache.org>
wrote:

> I have a different opinion on this. Usually, in production deployments
> (at least the ones I am aware of), the database is managed at the
> org/group level. Privacy policies like ACLs are usually applied at the
> database level and need first-level management by admins. With such a
> setup, it feels safer to have database creation done through a separate
> process and let Hudi Hive sync only alter/create tables (the current setup).
>
> Open to hearing others' thoughts.
>
> Regards,
> Balaji.V

Re: [Discuss] Creation of database in Hive

Posted by Balaji Varadarajan <vb...@apache.org>.
I have a different opinion on this. Usually, in production deployments
(at least the ones I am aware of), the database is managed at the
org/group level. Privacy policies like ACLs are usually applied at the
database level and need first-level management by admins. With such a
setup, it feels safer to have database creation done through a separate
process and let Hudi Hive sync only alter/create tables (the current setup).

Open to hearing others' thoughts.

Regards,
Balaji.V

On Wed, Nov 6, 2019 at 12:01 PM Bhavani Sudha <bh...@gmail.com>
wrote:

> +1 I think we should create db if it does not exist.

Re: [Discuss] Creation of database in Hive

Posted by Bhavani Sudha <bh...@gmail.com>.
+1 I think we should create db if it does not exist.
