Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/03/02 17:57:54 UTC
[GitHub] [iceberg] rlcyf opened a new issue #2289: data is not updated in spark-shell
rlcyf opened a new issue #2289:
URL: https://github.com/apache/iceberg/issues/2289
spark 3.0.1
iceberg 0.11
```
# push one record to Kafka
bin/kafka-console-producer.sh --topic test --bootstrap-server localhost:9092
> {"user_id":1}
```
```
// consume the topic with Structured Streaming; the write to Iceberg succeeds
import java.util.concurrent.TimeUnit
import org.apache.spark.sql.streaming.Trigger

val tableIdentifier: String = ...
data.writeStream
  .format("iceberg")
  .outputMode("append")
  .trigger(Trigger.ProcessingTime(1, TimeUnit.MINUTES))
  .option("path", tableIdentifier)
  .option("checkpointLocation", checkpointPath)
  .start()
```
When I execute a query in spark-shell:
```
bin/spark-shell --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions --conf spark.sql.catalog.prod=org.apache.iceberg.spark.SparkCatalog --conf spark.sql.catalog.prod.type=hive --conf spark.sql.catalog.prod.warehouse=hdfs://localhost:9000/prod --conf spark.sql.warehouse.dir=hdfs://localhost:9000/prod
spark.sql("select * from prod.db.sample").count
res0: Long = 1
# count in Trino
trino:db> select count(1) from prod.db.sample;
1
(1 rows)
```
```
# push one more record
bin/kafka-console-producer.sh --topic test --bootstrap-server localhost:9092
> {"user_id":1}
```
```
spark.sql("select * from prod.db.sample").count
res0: Long = 1
# count in Trino
trino:db> select count(1) from prod.db.sample;
2
(1 rows)
```
In Trino, the correct result is returned in real time, but spark-shell still returns the old count.
When I close spark-shell and restart it:
```
spark.sql("select * from prod.db.sample").count
res0: Long = 2
```
The result is correct.
There is another situation: if I wait for some time after inserting the data (I don't know how long) and query again in the same session, the result is also correct.
Has a merge or compaction happened in the meantime?
How can I configure spark-shell so that queries return up-to-date data in real time?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] rlcyf closed issue #2289: data is not updated in spark-shell
Posted by GitBox <gi...@apache.org>.
rlcyf closed issue #2289:
URL: https://github.com/apache/iceberg/issues/2289
[GitHub] [iceberg] rlcyf commented on issue #2289: data is not updated in spark-shell
Posted by GitBox <gi...@apache.org>.
rlcyf commented on issue #2289:
URL: https://github.com/apache/iceberg/issues/2289#issuecomment-789466988
> 1. The `CachingCatalog` cache is used by default for SQL queries. It can be turned off by adding the following parameter when launching spark-shell:
>
> ```
> --conf "spark.sql.catalog.hadoop_prod.cache-enabled=false"
> ```
>
> 2. The other way is to refresh the table before querying it:
>
> ```
> spark.sql("refresh table prod.db.tb")
> spark.sql("select * from prod.db.tb")
> ```
Thanks!
[GitHub] [iceberg] zhangdove commented on issue #2289: data is not updated in spark-shell
Posted by GitBox <gi...@apache.org>.
zhangdove commented on issue #2289:
URL: https://github.com/apache/iceberg/issues/2289#issuecomment-789408931
1. The `CachingCatalog` cache is used by default for SQL queries. It can be turned off by adding the following parameter when launching spark-shell:
```
--conf "spark.sql.catalog.hadoop_prod.cache-enabled=false"
```
2. The other way is to refresh the table before querying it:
```
spark.sql("refresh table prod.db.tb")
spark.sql("select * from prod.db.tb")
```
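A note on names: the snippets above use a catalog called `hadoop_prod` and a table `prod.db.tb`, while the original report registers the catalog as `prod` and queries `prod.db.sample`. The cache property is set per catalog, with a key of the form `spark.sql.catalog.<catalog-name>.cache-enabled`, so for the reporter's setup the launch command would presumably look like the sketch below. All flags except the last one are copied from the issue; the last line is the only addition and assumes the catalog name `prod`.

```shell
# Launch spark-shell with the Iceberg catalog cache disabled for the "prod" catalog.
# Assumption: the catalog is registered as "prod", as in the original report.
bin/spark-shell \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.prod=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.prod.type=hive \
  --conf spark.sql.catalog.prod.warehouse=hdfs://localhost:9000/prod \
  --conf spark.sql.warehouse.dir=hdfs://localhost:9000/prod \
  --conf spark.sql.catalog.prod.cache-enabled=false
```

With the cache disabled, each query loads the current table metadata, at the cost of an extra catalog lookup per query. If that overhead matters, keeping the cache and running `refresh table prod.db.sample` before reading is the alternative described in point 2.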