Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/07/02 07:53:04 UTC

[GitHub] [iceberg] zhangdove opened a new issue #1160: NoSuchTableException: Table does not exist

zhangdove opened a new issue #1160:
URL: https://github.com/apache/iceberg/issues/1160


   I have some test code that uses `removeOrphanFiles`, and it throws a NoSuchTableException.
   
   ```scala
     import java.util.{ArrayList, List}  // java.util.List is used for the schema columns below

     import org.apache.hadoop.conf.Configuration
     import org.apache.iceberg.{Schema, Table}
     import org.apache.iceberg.actions.Actions
     import org.apache.iceberg.catalog.{Namespace, TableIdentifier}
     import org.apache.iceberg.hadoop.HadoopCatalog
     import org.apache.iceberg.types.Types
     import org.apache.spark.sql.SparkSession

     case class TwoColumnRecord(id: String, name: String)
   
     def testCode(spark: SparkSession): Unit = {
       val schemaName = "testDb"
       val tableName = "testTb"
   
       val conf: Configuration = new Configuration(spark.sparkContext.hadoopConfiguration)
       val catalog: HadoopCatalog = new HadoopCatalog(conf, conf.get("fs.defaultFS") + "/iceberg/warehouse")
   
       // 1. create iceberg table by hadoopCatalog
       val nameSpace = Namespace.of(schemaName)
       val tableIdentifier: TableIdentifier = TableIdentifier.of(nameSpace, tableName)
       val columns: List[Types.NestedField] = new ArrayList[Types.NestedField]
       columns.add(Types.NestedField.of(1, true, "id", Types.StringType.get, "id doc"))
       columns.add(Types.NestedField.of(2, true, "name", Types.StringType.get, "name doc"))
       val schema: Schema = new Schema(columns)
       val table: Table = catalog.createTable(tableIdentifier, schema)
   
       // 2. create DataFrame
       val df = spark.createDataFrame(Seq(TwoColumnRecord("1", "iceberg"), TwoColumnRecord("2", "spark"))).toDF()
       // 3. write data to iceberg table
       df.write.format("iceberg").mode("append").save(table.location())
       Thread.sleep(1000)
       // 4. write data by parquet to path of data
       df.write.format("parquet").mode("append").save(table.location() + "/data/")
   
       // 5. removeOrphanFiles
       Thread.sleep(1000)
       val actions: Actions = Actions.forTable(table)
       val removeFileList = actions.removeOrphanFiles().olderThan(System.currentTimeMillis()).execute()
       // throw Exception and exit
     }
   ```
   The expected result is a normal exit with some orphan files deleted. However, I get this error:
   ```java
   Exception in thread "main" org.apache.iceberg.exceptions.NoSuchTableException: Table does not exist: testDb.testTb
   	at org.apache.iceberg.BaseMetastoreCatalog.loadMetadataTable(BaseMetastoreCatalog.java:153)
   	at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:139)
   	at org.apache.iceberg.spark.source.IcebergSource.findTable(IcebergSource.java:148)
   	at org.apache.iceberg.spark.source.IcebergSource.getTableAndResolveHadoopConfiguration(IcebergSource.java:177)
   	at org.apache.iceberg.spark.source.IcebergSource.createReader(IcebergSource.java:80)
   	at org.apache.iceberg.spark.source.IcebergSource.createReader(IcebergSource.java:74)
   	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation$SourceHelpers.createReader(DataSourceV2Relation.scala:155)
   	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation$.create(DataSourceV2Relation.scala:172)
   	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:204)
   	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
   	at org.apache.iceberg.actions.RemoveOrphanFilesAction.buildValidDataFileDF(RemoveOrphanFilesAction.java:161)
   	at org.apache.iceberg.actions.RemoveOrphanFilesAction.execute(RemoveOrphanFilesAction.java:139)
   	at com.dove.iceberg.IcebergIssues$.testCode(IcebergIssues.scala:64)
   	at com.dove.iceberg.IcebergIssues$.main(IcebergIssues.scala:29)
   ```
   I checked the files in HDFS:
   ```bash
   [root@hadoop39 ~]# hdfs dfs -ls /iceberg/warehouse/testDb/testTb/*/*
   -rw-r--r--   3 hdfs supergroup        645 2020-07-02 15:39 /iceberg/warehouse/testDb/testTb/data/00000-0-dba22cf3-b96c-467c-8bf3-01dd7d2f45c6-00000.parquet
   -rw-r--r--   3 hdfs supergroup        630 2020-07-02 15:39 /iceberg/warehouse/testDb/testTb/data/00001-1-1c2caf36-bc08-492d-8f1b-cc0fe83795ab-00000.parquet
   -rw-r--r--   3 hdfs supergroup          0 2020-07-02 15:39 /iceberg/warehouse/testDb/testTb/data/_SUCCESS
   -rw-r--r--   3 hdfs supergroup        607 2020-07-02 15:39 /iceberg/warehouse/testDb/testTb/data/part-00000-7c56f3e9-3b0f-48db-ad77-340ea302074c-c000.snappy.parquet
   -rw-r--r--   3 hdfs supergroup        589 2020-07-02 15:39 /iceberg/warehouse/testDb/testTb/data/part-00001-7c56f3e9-3b0f-48db-ad77-340ea302074c-c000.snappy.parquet
   -rw-r--r--   3 hdfs supergroup       4221 2020-07-02 15:39 /iceberg/warehouse/testDb/testTb/metadata/1c330540-e731-4804-b1d4-4ace0952ea0a-m0.avro
   -rw-r--r--   3 hdfs supergroup       2544 2020-07-02 15:39 /iceberg/warehouse/testDb/testTb/metadata/snap-7685756080210806989-1-1c330540-e731-4804-b1d4-4ace0952ea0a.avro
   -rw-r--r--   3 hdfs supergroup        762 2020-07-02 15:39 /iceberg/warehouse/testDb/testTb/metadata/v1.metadata.json
   -rw-r--r--   3 hdfs supergroup       1503 2020-07-02 15:39 /iceberg/warehouse/testDb/testTb/metadata/v2.metadata.json
   -rw-r--r--   3 hdfs supergroup          1 2020-07-02 15:39 /iceberg/warehouse/testDb/testTb/metadata/version-hint.text
   ```
   The Iceberg table was created successfully and the data was written successfully.
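   
   To double-check that the table itself can be loaded through the HadoopCatalog, a minimal check (reusing `catalog` and `tableIdentifier` from the code above) could be:
   ```scala
     // Reload the table through the same HadoopCatalog; this is expected to work,
     // which would suggest the exception comes from how the action loads the
     // metadata table through Spark rather than from the catalog itself.
     val reloaded: Table = catalog.loadTable(tableIdentifier)
     println(reloaded.location())  // e.g. hdfs://.../iceberg/warehouse/testDb/testTb
   ```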
   
   



[GitHub] [iceberg] rdblue closed issue #1160: NoSuchTableException: Table does not exist

Posted by GitBox <gi...@apache.org>.
rdblue closed issue #1160:
URL: https://github.com/apache/iceberg/issues/1160


   



[GitHub] [iceberg] zhangdove commented on issue #1160: NoSuchTableException: Table does not exist

Posted by GitBox <gi...@apache.org>.
zhangdove commented on issue #1160:
URL: https://github.com/apache/iceberg/issues/1160#issuecomment-652859150


   I have added a PR: [1161](https://github.com/apache/iceberg/pull/1161).
   Could someone review it at a convenient time?



[GitHub] [iceberg] rdblue commented on issue #1160: NoSuchTableException: Table does not exist

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #1160:
URL: https://github.com/apache/iceberg/issues/1160#issuecomment-653621430


   To summarize the fix from the PR: the issue is that a table loaded by the HadoopCatalog reports a table name rather than a location. When the Spark 2.4 IcebergSource tries to use that table name to load a metadata table, it uses the HiveCatalog instead of a HadoopCatalog (because the HiveCatalog is configured and the HadoopCatalog is not). The fix is to convert tables loaded by a Hadoop catalog (name starting with `hadoop.`) to paths so that HadoopTables is used to load the metadata.
   
   We should also consider adding configuration for the IcebergSource. Maybe we could configure it to use a HadoopCatalog.
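   
   Roughly, the idea could be sketched like this (a hypothetical illustration, not the actual code from the PR; `metadataTableIdentifier` is a made-up helper name, and the `#` suffix is the path form HadoopTables uses to address metadata tables):
   ```scala
     // Sketch: decide how a metadata table (e.g. "files") is addressed.
     // Hadoop-catalog tables fall back to the table location so that Spark
     // resolves them with HadoopTables; other tables keep the name form.
     def metadataTableIdentifier(tableName: String, tableLocation: String, metaTable: String): String = {
       if (tableName.startsWith("hadoop.")) {
         // path form, e.g. "hdfs://nn:8020/iceberg/warehouse/testDb/testTb#files"
         s"$tableLocation#$metaTable"
       } else {
         // name form, e.g. "testDb.testTb.files"
         s"$tableName.$metaTable"
       }
     }

     // used roughly like:
     // spark.read.format("iceberg").load(metadataTableIdentifier(name, location, "files"))
   ```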

