Posted to issues@carbondata.apache.org by "wupeng (Jira)" <ji...@apache.org> on 2019/10/16 10:36:00 UTC

[jira] [Updated] (CARBONDATA-3549) How to build carbondata-1.6.0 with spark-2.1.1

     [ https://issues.apache.org/jira/browse/CARBONDATA-3549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wupeng updated CARBONDATA-3549:
-------------------------------
    Affects Version/s: 1.6.0
          Description: 
I'm building carbondata-1.6.0-rc3 with spark-2.1.1, and the build fails with the following error:
{code:java}
[ERROR] /carbondata-root-1.6.0/integration/spark2/src/main/spark2.1/org/apache/spark/sql/hive/CreateCarbonSourceTableAsSelectCommand.scala:153: error: scrutinee is incompatible with pattern type;
[INFO]  found   : org.apache.spark.sql.execution.datasources.HadoopFsRelation
[INFO]  required: Unit
[INFO]  case fs:HadoopFsRelation if table.partitionColumnNames.nonEmpty &&
[INFO]          ^
[WARNING] three warnings found
{code}
Finally I found the cause: in spark-2.1.1, org.apache.spark.sql.execution.datasources.DataSource#write returns Unit, whereas in spark-2.1.0 it returned a BaseRelation.

spark-2.1.0: 
{code:java}
/** Writes the given [[DataFrame]] out to this [[DataSource]]. */
def write(
    mode: SaveMode,
    data: DataFrame): BaseRelation = {
  if (data.schema.map(_.dataType).exists(_.isInstanceOf[CalendarIntervalType])) {
    throw new AnalysisException("Cannot save interval data type into external storage.")
  }
{code}
spark-2.1.1 
{code:java}
/**
 * Writes the given [[DataFrame]] out to this [[DataSource]].
 */
def write(mode: SaveMode, data: DataFrame): Unit = {
  if (data.schema.map(_.dataType).exists(_.isInstanceOf[CalendarIntervalType])) {
    throw new AnalysisException("Cannot save interval data type into external storage.")
  }
{code}
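The signature change can be reproduced with a few lines of plain Scala (no Spark needed); Relation here is a hypothetical stand-in for HadoopFsRelation, and the two write stubs only mimic the shape of the two Spark versions:
{code:java}
class Relation

def writeRelation(): Relation = new Relation // 2.1.0-style: returns the relation
def writeUnit(): Unit = ()                   // 2.1.1-style: returns Unit

// Matching the 2.1.0-style result compiles and matches as expected.
val ok = writeRelation() match {
  case _: Relation => "matched"
}

// The 2.1.1-style result cannot be matched the same way; the compiler
// rejects it with the error from the build log:
// writeUnit() match {
//   case _: Relation => "matched" // error: scrutinee is incompatible with pattern type
// }
{code}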
So when we build carbondata with spark-2.1.1, the following code no longer compiles, because result is Unit in spark-2.1.1 and cannot be matched against HadoopFsRelation:
{code:java}
val result = try {
  // dataSource.write(mode, df)
  dataSource.writeAndRead(mode, df)
} catch {
  case ex: AnalysisException =>
    logError(s"Failed to write to table $tableName in $mode mode", ex)
    throw ex
}
result match {
  case fs: HadoopFsRelation if table.partitionColumnNames.nonEmpty &&
                               sparkSession.sqlContext.conf.manageFilesourcePartitions =>
    // Need to recover partitions into the metastore so our saved data is visible.
    sparkSession.sessionState.executePlan(
      AlterTableRecoverPartitionsCommand(table.identifier)).toRdd
  case _ =>
}
{code}
I checked DataSource in spark-2.1.1 and found that the old write behaviour (returning the relation) is now provided by writeAndRead.
So I modified org.apache.spark.sql.hive.CreateCarbonSourceTableAsSelectCommand at line 146, changing dataSource.write(mode, df) to dataSource.writeAndRead(mode, df).
After that the build succeeded.
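If one instead wanted a single source tree to tolerate both shapes of the method, a reflective lookup is one option. This is only a sketch with stand-in types (Source and Relation are hypothetical, not Spark API), not the fix that was applied:
{code:java}
class Relation

// Stand-ins mimicking the 2.1.1 DataSource shape; not Spark API.
class Source {
  def write(): Unit = ()                      // write returns Unit in 2.1.1
  def writeAndRead(): Relation = new Relation // relation-returning variant
}

val src = new Source

// Prefer writeAndRead when it exists; fall back to write otherwise.
val result: Any =
  try src.getClass.getMethod("writeAndRead").invoke(src)
  catch { case _: NoSuchMethodException => src.write() }

// With result typed as Any, the class pattern compiles on both versions.
val gotRelation = result match {
  case _: Relation => true  // relation came back: partitions can be recovered
  case _           => false
}
{code}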
> How to build carbondata-1.6.0 with spark-2.1.1
> ----------------------------------------------
>
>                 Key: CARBONDATA-3549
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3549
>             Project: CarbonData
>          Issue Type: Improvement
>    Affects Versions: 1.6.0
>            Reporter: wupeng
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.3.4#803005)