You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by java8964 <ja...@hotmail.com> on 2015/08/25 22:41:07 UTC

SparkSQL problem with IBM BigInsight V3

Hi, On our production environment, we have a unique problems related to Spark SQL, and I wonder if anyone can give me some idea what is the best way to handle this.
Our production Hadoop cluster is IBM BigInsight Version 3, which comes with Hadoop 2.2.0 and Hive 0.12.
Right now, we build spark 1.3.1 ourselves and point to the above versions during the build.
Now, here is the problem related to Spark SQL that it cannot query partitioned Hive tables. It has no problem to query non-partitioned Hive tables in Spark SQL.
The error in the Spark SQL for querying partitioned Hive tables like following:
javax.jdo.JDODataStoreException: Error executing SQL query "select PARTITIONS.PART_ID from PARTITIONS  inner join TBLS on PARTITIONS.TBL_ID = TBLS.TBL_ID   inner join DBS on TBLS.DB_ID = DBS.DB_ID  where TBLS.TBL_NAME = ? and DBS.NAME = ?".	at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451)	at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)        ...............NestedThrowablesStackTrace:com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-204, SQLSTATE=42704, SQLERRMC=CATALOG.PARTITIONS, DRIVER=4.17.36
The Hive metadata of BigInsight V3 is stored in DB2 (Don't ask me why, as it is from IBM), and the above error from DB2 simple means "Table NOT FOUND".If I change the above query like following:
select PARTITIONS.PART_ID from HIVE.PARTITIONS as PARTITIONS inner join HIVE.TBLS as TBLS  on PARTITIONS.TBL_ID = TBLS.TBL_ID   inner join HIVE.DBS as DBS on TBLS.DB_ID = DBS.DB_ID  where TBLS.TBL_NAME = ? and DBS.NAME = ?
and the query will work without any problem. My guess is that IBM changed some part of Hive, to make it can use DB2 as the underline database for Hive. In DB2, it has DB instance, schema and objects. In fact, table "PARTITIONS", "TBLS" and "DBS" are all existed in the DB2, but under "HIVE" schema.
Funny thing is that for unpartitioned table, the Spark SQL just works fine with DB2 as Hive metadata store.
So my options are:
1) Wait for IBM V4.0, which will include Spark, and they will make it work, but don't know when that will happen.2) Build Spark with the Hive jar provided from IBM BigInsight, assume these hive jars will work with DB2?3) Modify some part of Spark SQL code, to make it works with DB2?
My feeling is option 3 is the best, but not sure where to start. 
Thanks
Yong
db2 => select schemaname from syscat.schemata
SCHEMANAME--------------------------------------------------------------------------------------------------------------------------------......HIVE......
db2 => list tables for schema hive
Table/View                      Schema          Type  Creation time------------------------------- --------------- ----- --------------------------BUCKETING_COLS                  HIVE            T     2015-08-05-00.09.08.676983CDS                             HIVE            T     2015-08-05-00.08.38.861789COLUMNS                         HIVE            T     2015-08-05-00.08.56.542476COLUMNS_V2                      HIVE            T     2015-08-05-00.08.36.270223DATABASE_PARAMS                 HIVE            T     2015-08-05-00.08.32.453663DBS                             HIVE            T     2015-08-05-00.08.29.642279DB_PRIVS                        HIVE            T     2015-08-05-00.08.41.411732DELEGATION_TOKENS               HIVE            T     2015-08-05-00.41.45.202784GLOBAL_PRIVS                    HIVE            T     2015-08-05-00.08.52.636188IDXS                            HIVE            T     2015-08-05-00.08.43.117673INDEX_PARAMS                    HIVE            T     2015-08-05-00.08.44.636557MASTER_KEYS                     HIVE            T     2015-08-05-00.41.43.849242NUCLEUS_TABLES                  HIVE            T     2015-08-05-00.09.11.451975PARTITIONS                      HIVE            T     2015-08-05-00.08.45.919837PARTITION_EVENTS                HIVE            T     2015-08-05-00.08.55.244342PARTITION_KEYS                  HIVE            T     2015-08-05-00.09.01.802570PARTITION_KEY_VALS              HIVE            T     2015-08-05-00.08.40.103345PARTITION_PARAMS                HIVE            T     2015-08-05-00.08.53.992383PART_COL_PRIVS                  HIVE            T     2015-08-05-00.09.03.225567PART_COL_STATS                  HIVE            T     2015-08-05-00.41.40.711274PART_PRIVS                      HIVE            T     2015-08-05-00.08.48.542585ROLES                           HIVE            T     2015-08-05-00.08.57.810737ROLE_MAP                        HIVE            T     2015-08-05-00.08.49.984015SDS                             HIVE            T     2015-08-05-00.09.04.575646SD_PARAMS                       HIVE            T     2015-08-05-00.09.12.710014SEQUENCE_TABLE                  HIVE            T     2015-08-05-00.09.06.135560SERDES                          HIVE            T     2015-08-05-00.08.47.261431SERDE_PARAMS                    HIVE            T     2015-08-05-00.08.34.995423SKEWED_COL_NAMES                HIVE            T     2015-08-05-00.41.25.778773SKEWED_COL_VALUE_LOC_MAP        HIVE            T     2015-08-05-00.41.28.612221SKEWED_STRING_LIST              HIVE            T     2015-08-05-00.41.21.171138SKEWED_STRING_LIST_VALUES       HIVE            T     2015-08-05-00.41.23.053981SKEWED_VALUES                   HIVE            T     2015-08-05-00.41.31.503723SORT_COLS                       HIVE            T     2015-08-05-00.08.37.528587TABLE_PARAMS                    HIVE            T     2015-08-05-00.09.07.475611TAB_COL_STATS                   HIVE            T     2015-08-05-00.41.37.720132TBLS                            HIVE            T     2015-08-05-00.08.59.133932TBL_COL_PRIVS                   HIVE            T     2015-08-05-00.08.33.695474TBL_PRIVS                       HIVE            T     2015-08-05-00.08.31.162446TYPES                           HIVE            T     2015-08-05-00.08.51.161305TYPE_FIELDS                     HIVE            T     2015-08-05-00.09.10.043541VERSION                         HIVE            T     2015-08-05-00.41.46.519233