Posted to user@spark.apache.org by Deenar Toraskar <de...@gmail.com> on 2016/01/27 22:42:35 UTC
Re: hivethriftserver2 problems on upgrade to 1.6.0

James

The problem you are facing is due to a feature introduced in Spark 1.6:
multi-session mode. If you want to see temporary tables across sessions,
set spark.sql.hive.thriftServer.singleSession=true


   - From Spark 1.6, by default the Thrift server runs in multi-session
   mode, which means each JDBC/ODBC connection owns its own copy of the SQL
   configuration and temporary function registry. Cached tables are still
   shared, though. If you prefer to run the Thrift server in the old
   single-session mode, please set the option
   spark.sql.hive.thriftServer.singleSession to true. You may either add
   this option to spark-defaults.conf, or pass it to start-thriftserver.sh
   via --conf:

./sbin/start-thriftserver.sh \
     --conf spark.sql.hive.thriftServer.singleSession=true \
     ...
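
Since you start the server from code with startWithContext rather than
start-thriftserver.sh, the same option can probably be set on the SparkConf
before the context is created. This is only a minimal sketch, under the
assumption that Spark 1.6 reads the option from the SparkContext
configuration. The object name, the app name, and passing the parquet path
as args(0) are placeholders of mine, not anything from your setup:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

// Hypothetical wrapper object, for illustration only.
object SingleSessionThriftServer {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("single-session-thrift-server") // placeholder app name
      // Assumption: the option must be present before the SparkContext exists.
      .set("spark.sql.hive.thriftServer.singleSession", "true")

    // getOrCreate reuses an already running context if there is one,
    // in which case settings in this conf may not be applied.
    val hiveContext = new HiveContext(SparkContext.getOrCreate(conf))

    // Register the temporary table on the context handed to the Thrift server.
    val thing = hiveContext.read.parquet(args(0)) // path supplied by the caller
    thing.registerTempTable("thing")

    HiveThriftServer2.startWithContext(hiveContext)
  }
}
```

If that works as expected, a JDBC connection should then see "thing" next to
the metastore tables, because all connections share the one session.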


On 25 January 2016 at 15:06, james.green9@baesystems.com <
james.green9@baesystems.com> wrote:

> On upgrade from 1.5.0 to 1.6.0 I have a problem with the
> hivethriftserver2. I have this code:
>
> val hiveContext = new HiveContext(SparkContext.getOrCreate(conf))
> val thing = hiveContext.read.parquet("hdfs://dkclusterm1.imp.net:8020/user/jegreen1/ex208")
> thing.registerTempTable("thing")
>
> HiveThriftServer2.startWithContext(hiveContext)
>
> When I start things up on the cluster my hive-site.xml is found – I can
> see that the metastore connects:
>
> INFO  metastore - Trying to connect to metastore with URI thrift://dkclusterm2.imp.net:9083
> INFO  metastore - Connected to metastore.
>
> But then later on the thrift server seems not to connect to the remote
> hive metastore but to start a derby instance instead:
>
> INFO  AbstractService - Service:CLIService is started.
> INFO  ObjectStore - ObjectStore, initialize called
> INFO  Query - Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
> INFO  MetaStoreDirectSql - Using direct SQL, underlying DB is DERBY
> INFO  ObjectStore - Initialized ObjectStore
> INFO  HiveMetaStore - 0: get_databases: default
> INFO  audit - ugi=jegreen1      ip=unknown-ip-addr      cmd=get_databases: default
> INFO  HiveMetaStore - 0: Shutting down the object store...
> INFO  audit - ugi=jegreen1      ip=unknown-ip-addr      cmd=Shutting down the object store...
> INFO  HiveMetaStore - 0: Metastore shutdown complete.
> INFO  audit - ugi=jegreen1      ip=unknown-ip-addr      cmd=Metastore shutdown complete.
> INFO  AbstractService - Service:ThriftBinaryCLIService is started.
> INFO  AbstractService - Service:HiveServer2 is started.
>
> So if I connect to this with JDBC I can see all the tables on the hive
> server – but not anything temporary – I guess they are going to derby.
>
> I see someone on the databricks website is also having this problem.
>
> Thanks
>
> James
>
> *From:* patcharee [mailto:Patcharee.Thongtra@uni.no]
> *Sent:* 25 January 2016 14:31
> *To:* user@spark.apache.org
> *Cc:* Eirik Thorsnes
> *Subject:* streaming textFileStream problem - got only ONE line
>
>
>
> Hi,
>
> My streaming application receives data from the file system and just
> prints the input count at every 1-second interval, as in the code below:
>
> val sparkConf = new SparkConf()
> val ssc = new StreamingContext(sparkConf, Milliseconds(interval_ms))
> val lines = ssc.textFileStream(args(0))
> lines.count().print()
>
> The problem is that sometimes the data received from ssc.textFileStream is
> ONLY ONE line, when in fact there are multiple lines in the new file found
> in that interval. See the log below, which shows three intervals. In the
> 2nd interval, the new file is
> hdfs://helmhdfs/user/patcharee/cerdata/datetime_19617.txt. This file
> contains 6288 lines, but ssc.textFileStream returns ONLY ONE line (the
> header).
>
> Any ideas/suggestions what the problem is?
>
>
> -----------------------------------------------------------------------------------------
> SPARK LOG
>
> -----------------------------------------------------------------------------------------
>
> 16/01/25 15:11:11 INFO FileInputDStream: Cleared 1 old files that were
> older than 1453731011000 ms: 1453731010000 ms
> 16/01/25 15:11:11 INFO FileInputDStream: Cleared 0 old files that were
> older than 1453731011000 ms:
> 16/01/25 15:11:12 INFO FileInputDStream: Finding new files took 4 ms
> 16/01/25 15:11:12 INFO FileInputDStream: New files at time 1453731072000
> ms:
> hdfs://helmhdfs/user/patcharee/cerdata/datetime_19616.txt
> -------------------------------------------
> Time: 1453731072000 ms
> -------------------------------------------
> 6288
>
> 16/01/25 15:11:12 INFO FileInputDStream: Cleared 1 old files that were
> older than 1453731012000 ms: 1453731011000 ms
> 16/01/25 15:11:12 INFO FileInputDStream: Cleared 0 old files that were
> older than 1453731012000 ms:
> 16/01/25 15:11:13 INFO FileInputDStream: Finding new files took 4 ms
> 16/01/25 15:11:13 INFO FileInputDStream: New files at time 1453731073000
> ms:
> hdfs://helmhdfs/user/patcharee/cerdata/datetime_19617.txt
> -------------------------------------------
> Time: 1453731073000 ms
> -------------------------------------------
> 1
>
> 16/01/25 15:11:13 INFO FileInputDStream: Cleared 1 old files that were
> older than 1453731013000 ms: 1453731012000 ms
> 16/01/25 15:11:13 INFO FileInputDStream: Cleared 0 old files that were
> older than 1453731013000 ms:
> 16/01/25 15:11:14 INFO FileInputDStream: Finding new files took 3 ms
> 16/01/25 15:11:14 INFO FileInputDStream: New files at time 1453731074000
> ms:
> hdfs://helmhdfs/user/patcharee/cerdata/datetime_19618.txt
> -------------------------------------------
> Time: 1453731074000 ms
> -------------------------------------------
> 6288
>
>
> Thanks,
> Patcharee