Posted to dev@bahir.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/07/24 04:05:01 UTC

[jira] [Commented] (BAHIR-110) Implement _changes API for non-streaming receiver

    [ https://issues.apache.org/jira/browse/BAHIR-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097919#comment-16097919 ] 

ASF GitHub Bot commented on BAHIR-110:
--------------------------------------

Github user emlaver commented on the issue:

    https://github.com/apache/bahir/pull/45
  
    @mayya-sharipova Thank you for all the time you've spent reviewing this work.
    The storageLevel option was not working as expected and was also in the wrong section of the README.  I’ve renamed the option to `cloudant.storageLevel` (it only takes effect when the _changes endpoint option is set), updated the README, and tested and verified that the option works with both spark-submit and SparkSession’s config (see 052abb4).  I've run the Scala tests and examples, and they all passed.  Could you please review and approve these changes?


> Implement _changes API for non-streaming receiver
> -------------------------------------------------
>
>                 Key: BAHIR-110
>                 URL: https://issues.apache.org/jira/browse/BAHIR-110
>             Project: Bahir
>          Issue Type: Improvement
>            Reporter: Esteban Laver
>   Original Estimate: 216h
>  Remaining Estimate: 216h
>
> Today we use the _changes API for the Spark streaming receiver and the _all_docs API for the non-streaming receiver. The _all_docs API supports parallel reads (using offset and range), but the _changes API still performs better in most cases (even with only single-threaded support).
> With this ticket we want to:
> a) implement _changes API for non-streaming receivers
> b) allow customers to pick either _all_docs (default) or _changes API endpoint, with documentation about pros and cons
> _changes performance details:
> Successfully loaded Cloudant docs (using a local cloudant-developer Docker image) into Spark (local standalone) with the following database sizes: 15 GB (8 1/2 mins), 20 GB (17 mins), 46 GB (25 mins), and 75 GB (48 1/2 mins).
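
To make the load timings above easier to compare, here is a minimal sketch that converts each quoted (size, time) pair into an average rate. The GB/min figures are derived here, not reported in the issue, and the function name is hypothetical. Each figure comes from a single run against a local Docker image, so treat the rates as rough indications only.

```python
# Load timings quoted in the issue: (database size in GB, load time in minutes).
# "8 1/2 mins" and "48 1/2 mins" are written as 8.5 and 48.5.
timings = [(15, 8.5), (20, 17.0), (46, 25.0), (75, 48.5)]


def throughput_gb_per_min(size_gb, minutes):
    """Average load rate in GB per minute for one run (hypothetical helper)."""
    return size_gb / minutes


# Derived rates, rounded to two decimals for readability.
rates = [round(throughput_gb_per_min(size, mins), 2) for size, mins in timings]

for (size_gb, minutes), rate in zip(timings, rates):
    print(f"{size_gb} GB in {minutes} min: {rate} GB/min")
```

By this arithmetic the four runs fall between roughly 1.2 and 1.8 GB/min, with no clear trend as the database grows.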



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)