Posted to dev@bahir.apache.org by "Luciano Resende (JIRA)" <ji...@apache.org> on 2017/07/26 17:26:02 UTC
[jira] [Resolved] (BAHIR-110) Implement _changes API for non-streaming receiver
[ https://issues.apache.org/jira/browse/BAHIR-110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Luciano Resende resolved BAHIR-110.
-----------------------------------
Resolution: Fixed
Assignee: Esteban Laver
Fix Version/s: Spark-2.2.0
> Implement _changes API for non-streaming receiver
> -------------------------------------------------
>
> Key: BAHIR-110
> URL: https://issues.apache.org/jira/browse/BAHIR-110
> Project: Bahir
> Issue Type: Improvement
> Reporter: Esteban Laver
> Assignee: Esteban Laver
> Fix For: Spark-2.2.0
>
> Original Estimate: 216h
> Remaining Estimate: 216h
>
> Today we use the _changes API for the Spark streaming receiver and the _all_docs API for the non-streaming receiver. The _all_docs API supports parallel reads (using offset and range), but the _changes API still performs better in most cases, even with only single-threaded support.
> With this ticket we want to:
> a) implement _changes API for non-streaming receivers
> b) allow customers to pick either the _all_docs (default) or the _changes API endpoint, with documentation of the pros and cons of each
> _changes performance details:
> Successfully loaded Cloudant docs (using a local cloudant-developer Docker image) into Spark (local standalone) for the following database sizes: 15 GB (8 1/2 mins), 20 GB (17 mins), 46 GB (25 mins), and 75 GB (48 1/2 mins).
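The difference between the two endpoints described in the ticket can be illustrated by the shape of the HTTP requests each one implies. The following is a minimal sketch, not the Bahir implementation: it assumes that the "offset and range" mentioned above correspond to the CouchDB-style `skip` and `limit` query parameters, and the host, database name, document count, and page size are hypothetical values chosen for illustration.

```python
from urllib.parse import urlencode


def all_docs_urls(base_url, db, total_docs, page_size):
    """Build one _all_docs URL per page so that pages could be fetched in parallel.

    Assumption: paging uses the CouchDB-style `skip` and `limit` parameters.
    """
    urls = []
    for skip in range(0, total_docs, page_size):
        query = urlencode({"include_docs": "true", "limit": page_size, "skip": skip})
        urls.append(f"{base_url}/{db}/_all_docs?{query}")
    return urls


def changes_url(base_url, db, since="0"):
    """Build a single _changes URL; the feed is read sequentially from `since`."""
    query = urlencode({"include_docs": "true", "feed": "normal", "since": since})
    return f"{base_url}/{db}/_changes?{query}"


if __name__ == "__main__":
    # Hypothetical local instance and database name.
    base, db = "http://localhost:8080", "sales"
    for url in all_docs_urls(base, db, total_docs=250, page_size=100):
        print(url)
    print(changes_url(base, db))
```

In this sketch the _all_docs read is split into several independent requests that separate workers could issue concurrently, while the _changes read is one sequential request, which matches the single-threaded behavior noted in the description.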
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)