You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@bahir.apache.org by GitBox <gi...@apache.org> on 2019/02/05 00:46:01 UTC

[GitHub] ckadner commented on issue #75: [BAHIR-186] Reduce size of sql-cloudant test database

ckadner commented on issue #75: [BAHIR-186] Reduce size of sql-cloudant test database
URL: https://github.com/apache/bahir/pull/75#issuecomment-460471642
 
 
   @emlaver -- your changes look good and help speeding up the test execution 👍 
   
   I do have an issue that predates your changes though. After running `mvn clean install` and a few subsequent `mvn test -pl sql-cloudant` I keep running into the following test error in `CloudantChangesDFSuite`:
   
   ```
   - load data and verify deleted doc is not in results *** FAILED ***
     org.apache.bahir.cloudant.common.CloudantException: Error retrieving _changes feed data from database 'n_flight' with response code 401: {"error":"unauthorized","reason":"Name or password is incorrect."}
   ```
   
   When I run only the `CloudantChangesDFSuite` independently I do not get that error.
   ``` 
   $ mvn scalatest:test -pl sql-cloudant -Dsuites='org.apache.bahir.cloudant.CloudantChangesDFSuite'
   
   Run starting. Expected test count is: 9
   CloudantChangesDFSuite:
   
   Sql-cloudant tests that require Cloudant databases have been enabled by
   the environment variables CLOUDANT_USER and CLOUDANT_PASSWORD.
           
   - load and save data from Cloudant database
   - load and count data from Cloudant search index
   - load data and verify deleted doc is not in results
   - load data and count rows in filtered dataframe
   - save filtered dataframe to database
   - save dataframe to database using createDBOnSave=true option
   - load and count data from view
   - load data from view with MapReduce function
   - load data and verify total count of selector, filter, and view option
   Run completed in 2 minutes, 40 seconds.
   Total number of tests run: 9
   Suites: completed 1, aborted 0
   Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
   All tests passed.
   ```
   
   Maven log (console output) for all unit tests in SQL-Cloudant:
   ```
   $ mvn test -pl sql-cloudant
   
   ...
   [INFO] --- scalatest-maven-plugin:1.0:test (test) @ spark-sql-cloudant_2.11 ---
   Discovery starting.
   
   Sql-cloudant tests that require Cloudant databases have been enabled by
   the environment variables CLOUDANT_USER and CLOUDANT_PASSWORD.
    
   ...
   
   CloudantChangesDFSuite:
   - load and save data from Cloudant database
   - load and count data from Cloudant search index
   - load data and verify deleted doc is not in results *** FAILED ***
     org.apache.bahir.cloudant.common.CloudantException: Error retrieving _changes feed data from database 'n_flight' with response code 401: {"error":"unauthorized","reason":"Name or password is incorrect."}
     at org.apache.bahir.cloudant.DefaultSource.create(DefaultSource.scala:162)
     at org.apache.bahir.cloudant.DefaultSource.createRelation(DefaultSource.scala:95)
     at org.apache.bahir.cloudant.DefaultSource.createRelation(DefaultSource.scala:87)
     at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:341)
     at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:174)
     at org.apache.bahir.cloudant.CloudantChangesDFSuite$$anonfun$8.apply$mcV$sp(CloudantChangesDFSuite.scala:67)
     at org.apache.bahir.cloudant.CloudantChangesDFSuite$$anonfun$8.apply(CloudantChangesDFSuite.scala:62)
     at org.apache.bahir.cloudant.CloudantChangesDFSuite$$anonfun$8.apply(CloudantChangesDFSuite.scala:62)
     ...
   - load data and count rows in filtered dataframe
   - save filtered dataframe to database
   - save dataframe to database using createDBOnSave=true option
   - load and count data from view
   - load data from view with MapReduce function
   - load data and verify total count of selector, filter, and view option
   
   ...
   
   Run completed in 3 minutes, 34 seconds.
   Total number of tests run: 25
   Suites: completed 6, aborted 0
   Tests: succeeded 24, failed 1, canceled 0, ignored 0, pending 0
   *** 1 TEST FAILED ***
   [INFO] ------------------------------------------------------------------------
   [INFO] BUILD FAILURE
   ```
   
   However, when I take a look at the `sql-cloudant/target/unit-tests.log` I don't find any errors logged during the execution of that test case:
   ```
   ===== TEST OUTPUT FOR org.apache.bahir.cloudant.CloudantChangesDFSuite: 'load data and verify deleted doc is not in results' =====
   
   19/02/04 15:07:11.114 ScalaTest-main-running-CloudantChangesDFSuite INFO SharedState: loading hive config file: jar:file:/Users/ckadner/.m2/repository/org/apache/spark/spark-sql_2.11/2.3.2/spark-sql_2.11-2.3.2-tests.jar!/hive-site.xml
   19/02/04 15:07:11.130 ScalaTest-main-running-CloudantChangesDFSuite INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/Users/ckadner/Projects/bahir_apache_merges/sql-cloudant/spark-warehouse/').
   19/02/04 15:07:11.130 ScalaTest-main-running-CloudantChangesDFSuite INFO SharedState: Warehouse path is 'file:/Users/ckadner/Projects/bahir_apache_merges/sql-cloudant/spark-warehouse/'.
   19/02/04 15:07:11.132 ScalaTest-main-running-CloudantChangesDFSuite INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
   19/02/04 15:07:11.288 ScalaTest-main-running-CloudantChangesDFSuite INFO DefaultSource: Loading data from Cloudant using https://testy-spark-connector.cloudant.com/n_flight/_changes?include_docs=true&feed=normal&seq_interval=200&timeout=60000&filter=_selector
   19/02/04 15:07:11.292 streaming-start INFO ReceiverTracker: Starting 1 receivers
   19/02/04 15:07:11.292 streaming-start INFO ReceiverTracker: ReceiverTracker started
   19/02/04 15:07:11.292 streaming-start INFO PluggableInputDStream: Slide time = 8000 ms
   19/02/04 15:07:11.293 streaming-start INFO PluggableInputDStream: Storage level = Memory Deserialized 1x Replicated
   19/02/04 15:07:11.293 streaming-start INFO PluggableInputDStream: Checkpoint interval = null
   19/02/04 15:07:11.293 streaming-start INFO PluggableInputDStream: Remember interval = 8000 ms
   19/02/04 15:07:11.293 streaming-start INFO PluggableInputDStream: Initialized and validated org.apache.spark.streaming.dstream.PluggableInputDStream@3fb41f46
   19/02/04 15:07:11.293 streaming-start INFO ForEachDStream: Slide time = 8000 ms
   19/02/04 15:07:11.293 streaming-start INFO ForEachDStream: Storage level = Serialized 1x Replicated
   19/02/04 15:07:11.293 streaming-start INFO ForEachDStream: Checkpoint interval = null
   19/02/04 15:07:11.293 streaming-start INFO ForEachDStream: Remember interval = 8000 ms
   19/02/04 15:07:11.293 streaming-start INFO ForEachDStream: Initialized and validated org.apache.spark.streaming.dstream.ForEachDStream@14cb05dd
   19/02/04 15:07:11.293 streaming-start INFO RecurringTimer: Started timer for JobGenerator at time 1549321632000
   19/02/04 15:07:11.294 streaming-start INFO JobGenerator: Started JobGenerator at 1549321632000 ms
   19/02/04 15:07:11.294 streaming-start INFO JobScheduler: Started JobScheduler
   19/02/04 15:07:11.294 ScalaTest-main-running-CloudantChangesDFSuite INFO StreamingContext: StreamingContext started
   19/02/04 15:07:11.311 dag-scheduler-event-loop INFO DAGScheduler: Got job 0 (start at DefaultSource.scala:155) with 1 output partitions
   19/02/04 15:07:11.311 dag-scheduler-event-loop INFO DAGScheduler: Final stage: ResultStage 0 (start at DefaultSource.scala:155)
   19/02/04 15:07:11.311 dag-scheduler-event-loop INFO DAGScheduler: Parents of final stage: List()
   19/02/04 15:07:11.311 dag-scheduler-event-loop INFO DAGScheduler: Missing parents: List()
   19/02/04 15:07:11.311 dispatcher-event-loop-2 INFO ReceiverTracker: Receiver 0 started
   19/02/04 15:07:11.311 dag-scheduler-event-loop INFO DAGScheduler: Submitting ResultStage 0 (Receiver 0 ParallelCollectionRDD[1] at makeRDD at ReceiverTracker.scala:613), which has no missing parents
   19/02/04 15:07:11.327 dag-scheduler-event-loop INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 62.3 KB, free 1638.5 MB)
   19/02/04 15:07:11.329 dag-scheduler-event-loop INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 21.8 KB, free 1638.5 MB)
   19/02/04 15:07:11.329 dispatcher-event-loop-3 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on ckadner-mbp.hsd1.ca.comcast.net:62424 (size: 21.8 KB, free: 1638.6 MB)
   19/02/04 15:07:11.329 dag-scheduler-event-loop INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1039
   19/02/04 15:07:11.330 dag-scheduler-event-loop INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (Receiver 0 ParallelCollectionRDD[1] at makeRDD at ReceiverTracker.scala:613) (first 15 tasks are for partitions Vector(0))
   19/02/04 15:07:11.330 dag-scheduler-event-loop INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
   19/02/04 15:07:11.332 dispatcher-event-loop-0 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 9000 bytes)
   19/02/04 15:07:11.332 Executor task launch worker for task 0 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
   19/02/04 15:07:11.341 Executor task launch worker for task 0 INFO RecurringTimer: Started timer for BlockGenerator at time 1549321631400
   19/02/04 15:07:11.341 Executor task launch worker for task 0 INFO BlockGenerator: Started BlockGenerator
   19/02/04 15:07:11.342 Thread-35 INFO BlockGenerator: Started block pushing thread
   19/02/04 15:07:11.342 dispatcher-event-loop-0 INFO ReceiverTracker: Registered receiver for stream 0 from ckadner-mbp.hsd1.ca.comcast.net:62423
   19/02/04 15:07:11.342 Executor task launch worker for task 0 INFO ReceiverSupervisorImpl: Starting receiver 0
   19/02/04 15:07:11.342 Executor task launch worker for task 0 INFO ReceiverSupervisorImpl: Called receiver 0 onStart
   19/02/04 15:07:11.342 Executor task launch worker for task 0 INFO ReceiverSupervisorImpl: Waiting for receiver to be stopped
   19/02/04 15:07:12.003 JobGenerator INFO JobScheduler: Added jobs for time 1549321632000 ms
   19/02/04 15:07:12.003 JobScheduler INFO JobScheduler: Starting job streaming job 1549321632000 ms.0 from job set of time 1549321632000 ms
   19/02/04 15:07:12.051 streaming-job-executor-0 INFO SparkContext: Starting job: json at DefaultSource.scala:150
   19/02/04 15:07:12.052 streaming-job-executor-0 INFO DAGScheduler: Job 1 finished: json at DefaultSource.scala:150, took 0.000025 s
   19/02/04 15:07:12.063 dispatcher-event-loop-1 INFO ReceiverTracker: Sent stop signal to all 1 receivers
   19/02/04 15:07:12.063 dispatcher-event-loop-2 INFO ReceiverSupervisorImpl: Received stop signal
   19/02/04 15:07:12.063 dispatcher-event-loop-2 INFO ReceiverSupervisorImpl: Stopping receiver with message: Stopped by driver: 
   19/02/04 15:07:12.063 dispatcher-event-loop-2 INFO ReceiverSupervisorImpl: Called receiver onStop
   19/02/04 15:07:12.063 dispatcher-event-loop-2 INFO ReceiverSupervisorImpl: Deregistering receiver 0
   19/02/04 15:07:12.064 dispatcher-event-loop-3 ERROR ReceiverTracker: Deregistered receiver for stream 0: Stopped by driver
   19/02/04 15:07:12.064 dispatcher-event-loop-2 INFO ReceiverSupervisorImpl: Stopped receiver 0
   19/02/04 15:07:12.064 dispatcher-event-loop-2 INFO BlockGenerator: Stopping BlockGenerator
   19/02/04 15:07:12.402 dispatcher-event-loop-2 INFO RecurringTimer: Stopped timer for BlockGenerator after time 1549321632400
   19/02/04 15:07:12.402 dispatcher-event-loop-2 INFO BlockGenerator: Waiting for block pushing thread to terminate
   19/02/04 15:07:12.406 Thread-35 INFO BlockGenerator: Pushing out the last 0 blocks
   19/02/04 15:07:12.406 Thread-35 INFO BlockGenerator: Stopped block pushing thread
   19/02/04 15:07:12.406 dispatcher-event-loop-2 INFO BlockGenerator: Stopped BlockGenerator
   19/02/04 15:07:12.406 Executor task launch worker for task 0 INFO ReceiverSupervisorImpl: Stopped receiver without error
   19/02/04 15:07:12.407 Executor task launch worker for task 0 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 751 bytes result sent to driver
   19/02/04 15:07:12.407 task-result-getter-0 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1076 ms on localhost (executor driver) (1/1)
   19/02/04 15:07:12.407 task-result-getter-0 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
   19/02/04 15:07:12.408 dag-scheduler-event-loop INFO DAGScheduler: ResultStage 0 (start at DefaultSource.scala:155) finished in 1.096 s
   19/02/04 15:07:12.408 streaming-job-executor-0 INFO ReceiverTracker: All of the receivers have deregistered successfully
   19/02/04 15:07:12.408 streaming-job-executor-0 INFO ReceiverTracker: ReceiverTracker stopped
   19/02/04 15:07:12.408 streaming-job-executor-0 INFO JobGenerator: Stopping JobGenerator immediately
   19/02/04 15:07:12.409 streaming-job-executor-0 INFO RecurringTimer: Stopped timer for JobGenerator after time 1549321632000
   19/02/04 15:07:12.409 streaming-job-executor-0 INFO JobGenerator: Stopped JobGenerator
   19/02/04 15:07:12.687 Spark Context Cleaner INFO ContextCleaner: Cleaned accumulator 470
   <<19/02/04 15:07:12. ... Spark Context Cleaner INFO ContextCleaner: Cleaned accumulator ...>>
   19/02/04 15:07:12.688 Spark Context Cleaner INFO ContextCleaner: Cleaned accumulator 479
   19/02/04 15:07:12.690 dispatcher-event-loop-3 INFO BlockManagerInfo: Removed broadcast_0_piece0 on ckadner-mbp.hsd1.ca.comcast.net:62424 in memory (size: 21.8 KB, free: 1638.6 MB)
   19/02/04 15:07:12.693 Spark Context Cleaner INFO ContextCleaner: Cleaned accumulator 467
   <<19/02/04 15:07:12. ... Spark Context Cleaner INFO ContextCleaner: Cleaned accumulator ...>>
   19/02/04 15:07:12.693 Spark Context Cleaner INFO ContextCleaner: Cleaned accumulator 486
   19/02/04 15:07:14.412 streaming-job-executor-0 INFO JobScheduler: Stopped JobScheduler
   19/02/04 15:07:14.413 streaming-job-executor-0 INFO StreamingContext: StreamingContext stopped successfully
   19/02/04 15:07:14.415 ScalaTest-main-running-CloudantChangesDFSuite INFO CloudantChangesDFSuite: 
   
   ===== FINISHED org.apache.bahir.cloudant.CloudantChangesDFSuite: 'load data and verify deleted doc is not in results' =====
   ```
   
   I do however find that error message earlier in the log during the test case `incorrect password throws an error message for changes receiver` where it is expected:
   ```
   ===== TEST OUTPUT FOR org.apache.bahir.cloudant.CloudantOptionSuite: 'incorrect password throws an error message for changes receiver' =====
   ...
   19/02/04 15:06:32.751 Cloudant Receiver ERROR CookieInterceptor: Credentials are incorrect for server https://testy-spark-connector.cloudant.com:443/_session, cookie authentication will not be attempted again by this interceptor object
   19/02/04 15:06:33.909 Cloudant Receiver WARN ReceiverSupervisorImpl: Reported error Error retrieving _changes feed data from database 'n_flight' with response code 401: {"error":"unauthorized","reason":"Name or password is incorrect."} - org.apache.bahir.cloudant.common.CloudantException: Error retrieving _changes feed data from database 'n_flight' with response code 401: {"error":"unauthorized","reason":"Name or password is incorrect."}
   19/02/04 15:06:33.912 dispatcher-event-loop-1 WARN ReceiverTracker: Error reported by receiver for stream 0: Error retrieving _changes feed data from database 'n_flight' with response code 401: {"error":"unauthorized","reason":"Name or password is incorrect."} - org.apache.bahir.cloudant.common.CloudantException: Error retrieving _changes feed data from database 'n_flight' with response code 401: {"error":"unauthorized","reason":"Name or password is incorrect."}
   	at org.apache.bahir.cloudant.internal.ChangesReceiver.org$apache$bahir$cloudant$internal$ChangesReceiver$$receive(ChangesReceiver.scala:79)
   	at org.apache.bahir.cloudant.internal.ChangesReceiver$$anon$1.run(ChangesReceiver.scala:37)
   
   ...
   19/02/04 15:06:44.445 ScalaTest-main-running-CloudantOptionSuite INFO CloudantOptionSuite: 
   
   ===== FINISHED org.apache.bahir.cloudant.CloudantOptionSuite: 'incorrect password throws an error message for changes receiver' =====
   ```
   
   
   When I look at the sources here https://github.com/apache/bahir/blob/c51853d135ad2d9da67804259f4ed0e29223afb3/sql-cloudant/src/main/scala/org/apache/bahir/cloudant/DefaultSource.scala#L159-L163
   
   ...it appears that this error is a fluke? The test case `load data and verify deleted doc is not in results` expects the schema to be empty but the `DefaultSource` treats that as an error and throws up the last `receiverErrorMsg` which in this case comes from the earlier run test case `incorrect password throws an error message for changes receiver`. 
   
   At least I thought that at first sight, but the test case does not seem to expect an empty schema 
   https://github.com/apache/bahir/blob/c51853d135ad2d9da67804259f4ed0e29223afb3/sql-cloudant/src/test/scala/org/apache/bahir/cloudant/CloudantChangesDFSuite.scala#L67-L69
   
   So I am not sure what is happening. But you might? :-)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services