Posted to dev@couchdb.apache.org by Garren Smith <ga...@apache.org> on 2020/05/11 11:35:13 UTC

Re: Partition query endpoints in CouchDB 4.0

Coming back to this. I still think we should support it fully in 4.x so
that anyone using it in 3.x will not experience any API changes when moving
to 4.x. Once more people have used it in 3.x, we can decide whether to
deprecate it for 5.x or to add more features to it.
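For reference, a minimal sketch of the CouchDB 3.x request shapes that "no API changes" would mean keeping intact in 4.x: creating a database with `?partitioned=true`, and issuing the same query either globally or through `/_partition/{partition}/`. The paths follow the 3.x partitioned-database docs as I understand them; the helper names (`create_db_path`, `query_path`) are hypothetical and only build path strings, they don't talk to a server.

```python
from typing import Optional
from urllib.parse import quote


def _segment(value: str) -> str:
    # Percent-encode a single path segment; database names may contain "/"
    return quote(value, safe="")


def create_db_path(db: str, partitioned: bool = False) -> str:
    # Target of PUT /{db}; in 3.x, ?partitioned=true creates a partitioned database
    path = "/" + _segment(db)
    return path + "?partitioned=true" if partitioned else path


def query_path(db: str, suffix: str, partition: Optional[str] = None) -> str:
    # Global form:      /{db}/{suffix}
    # Partitioned form: /{db}/_partition/{partition}/{suffix}
    # `suffix` is passed through as-is, e.g. "_all_docs", "_find",
    # or "_design/{ddoc}/_view/{view}"
    base = "/" + _segment(db)
    if partition is not None:
        base += "/_partition/" + _segment(partition)
    return base + "/" + suffix


if __name__ == "__main__":
    print(create_db_path("orders", partitioned=True))  # /orders?partitioned=true
    for p in (None, "user42"):
        print(query_path("orders", "_design/by_date/_view/recent", p))
```

Under the proposal quoted below (point 4), both forms would keep working against the same view; only the scope of the result differs.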

On Tue, Apr 21, 2020 at 11:01 PM Robert Samuel Newson <rn...@apache.org>
wrote:

> On Adam's point that the partitioned query API encourages good choices
> ("discourages hot spots"), that's only true for folks that read the
> documentation, which in my experience is a low percentage of folks. I've
> encountered a heavy user of partitioned dbs that had precisely four
> partitions in mind, for millions of docs (they chose "doc_type" as their
> partition value).
>
> My view for 4.0 is:
>
> 1) ignore the partitioned flag when creating databases
>

I don't think we should ignore it.

> 2) the "partitioned" property is no longer reported in GET /dbname
>

I would prefer we report the partitioned flag. It seems confusing to not
report a setting a user intentionally set.

> 3) the various _partition endpoints still work
> 4) all views work either "global" or "partitioned" depending on the
> endpoint used.
>
> For 5.0 I'm +0 on removing the _partition endpoints, but we can take that
> vote at the time based on contemporary feedback.
>
> B.
>
> > On 21 Apr 2020, at 21:35, Robert Samuel Newson <rn...@apache.org>
> > wrote:
> >
> > Hi,
> >
> > Good points on both sides of this. One thing we can hopefully get
> > agreement on is the ?partitioned=true flag on creation and, deeper, the
> > lack of distinction between the two "types" of database going forward?
> >
> > B.
> >
> >> On 21 Apr 2020, at 18:51, Garren Smith <ga...@apache.org> wrote:
> >>
> >> I'm on the fence when it comes to removing it. In terms of the original
> >> plan of making querying faster by querying fewer shards, that obviously
> >> isn't needed. But I think it does create a nice mental model/design
> >> pattern when building an application in CouchDB. Splitting your data
> >> into partitions that contain similar documents makes sense. And once we
> >> are on FDB, it would be awesome to see if we could have a changes feed
> >> per partition. That would be a really nice feature.
> >>
> >> Cheers
> >> Garren
> >>
> >>
> >> On Tue, Apr 21, 2020 at 5:51 PM Adam Kocoloski <ko...@apache.org>
> >> wrote:
> >>
> >>> I think it’s difficult to make a call when 3.0 is still so new.
> >>>
> >>> The case for deprecation here is basically less code to maintain,
> >>> right? It’s not like a user of partitioned databases is causing pain
> >>> for an FDB-based CouchDB; if anything, there’s a second-order benefit
> >>> because the partitioning discourages hot spots from forming in the
> >>> (range-partitioned) FDB keyspace.
> >>>
> >>> Cheers, Adam
> >>>
> >>>
> >>>> On Apr 20, 2020, at 11:51 PM, Kyle Snavely <kj...@gmail.com>
> >>>> wrote:
> >>>>
> >>>> My two cents is the same. Let's allow 3.* users to migrate to 4.*
> >>>> without needing to, e.g., change the PQ part of their application,
> >>>> and remove the PQ endpoints in 5.0.
> >>>>
> >>>> Best,
> >>>> Kyle
> >>>>
> >>>> On Mon, Apr 20, 2020, 4:16 PM Ilya Khlopotov <ii...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> Given that it is unlikely that many people are using it, and that it
> >>>>> is a no-op in the FDB world, I think we should deprecate and remove
> >>>>> the _partition endpoint.
> >>>>>
> >>>>> On 2020/04/20 21:04:58, Robert Samuel Newson <rn...@apache.org>
> >>>>> wrote:
> >>>>>> Hi All,
> >>>>>>
> >>>>>> I'd like to get views on whether we should preserve the _partition
> >>>>>> endpoints in CouchDB 4.0 or remove them. In CouchDB 4.0 all _view
> >>>>>> and _find queries will automatically benefit from the same
> >>>>>> performance boost that the "partitioned database" feature brings, by
> >>>>>> virtue of FoundationDB.
> >>>>>>
> >>>>>> If we're preserving it, are we also deprecating it (so it's gone in
> >>>>>> 5.0)?
> >>>>>>
> >>>>>> If we're ditching it, what will the endpoint return instead (404 Not
> >>>>>> Found, 410 Gone?)
> >>>>>>
> >>>>>> Thoughts welcome.
> >>>>>>
> >>>>>> B.
> >>>>>
> >>>
> >>>
> >
>
>
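Robert's example above (four "doc_type" partitions holding millions of docs) is the hot-spot failure mode the thread keeps coming back to. One rough way to spot it before real traffic arrives is to measure how existing doc IDs spread across partition keys. A minimal sketch, assuming 3.x-style partitioned IDs of the form `partition:docid`; the helper names and the "largest partition share" metric are illustrative, not a CouchDB feature, and the sample data is made up.

```python
from collections import Counter
from typing import Iterable, Tuple


def partition_of(doc_id: str) -> str:
    # In a 3.x partitioned database the partition key is the part of _id before the first ":"
    key, sep, _ = doc_id.partition(":")
    if not sep or not key:
        raise ValueError(f"not a partitioned doc id: {doc_id!r}")
    return key


def partition_skew(doc_ids: Iterable[str]) -> Tuple[int, float]:
    """Return (number of partitions, share of docs in the largest partition)."""
    # Skip _design/ and _local/ docs, which are not tied to a partition
    counts = Counter(partition_of(d) for d in doc_ids if not d.startswith("_"))
    total = sum(counts.values())
    if total == 0:
        return 0, 0.0
    return len(counts), max(counts.values()) / total


if __name__ == "__main__":
    # Partitioned by doc_type: a handful of very large partitions
    by_type = ([f"order:{i}" for i in range(700)] + [f"user:{i}" for i in range(200)]
               + [f"invoice:{i}" for i in range(90)] + [f"note:{i}" for i in range(10)])
    # Partitioned by user: many small partitions
    by_user = [f"user{i % 250}:doc{i}" for i in range(1000)]
    print(partition_skew(by_type))  # (4, 0.7)
    print(partition_skew(by_user))  # (250, 0.004)
```

A high largest-partition share with few partitions is the "doc_type as partition key" pattern; many small partitions is closer to what the partitioned query API was meant to encourage.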

Re: Partition query endpoints in CouchDB 4.0

Posted by Joshua Mintz <mi...@gmail.com>.
As a data point: during my work with clients, I have seen partition
queries be successful in guiding more thoughtful data model design. Also,
in my experience, the guiding principles of correct partitioning are more
approachable for people moving from other database systems like SQL Server,
PostgreSQL, Oracle, and Apache Cassandra.

To Robert's point, I have also seen folks make questionable attempts at a
partitioned data model. But their pre-partitioning models may have been
equally bad, worse, or not really considered.

Also, a partitioned changes feed has been a not uncommon request.

-j
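Without a per-partition changes feed, the fallback is to read the global `_changes` feed and keep only rows whose ID carries the partition prefix. A minimal client-side sketch, assuming the usual `_changes` JSON shape (a `results` list of rows with `seq`, `id` and `changes`); `changes_for_partition` is a hypothetical helper, and the sample response is hand-written with shortened `seq` values. Every change is still read and discarded client-side, which is the cost a built-in per-partition feed would avoid.

```python
import json
from typing import Any, Dict, Iterator


def changes_for_partition(changes: Dict[str, Any], partition: str) -> Iterator[Dict[str, Any]]:
    # Emulate a per-partition feed by filtering a global _changes response
    # on the "<partition>:" prefix of each row's doc id
    prefix = partition + ":"
    for row in changes.get("results", []):
        if row.get("id", "").startswith(prefix):
            yield row


if __name__ == "__main__":
    # Hand-written sample in the shape of a _changes response
    sample = json.loads("""
    {"results": [
        {"seq": "1-a", "id": "alice:profile", "changes": [{"rev": "1-x"}]},
        {"seq": "2-b", "id": "bob:profile",   "changes": [{"rev": "1-y"}]},
        {"seq": "3-c", "id": "alice:note1",   "changes": [{"rev": "1-z"}]}
     ],
     "last_seq": "3-c"}
    """)
    print([row["id"] for row in changes_for_partition(sample, "alice")])
    # ['alice:profile', 'alice:note1']
```

Including the ":" in the prefix matters: without it, partition "ali" would also match "alice:..." rows.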

Re: Partition query endpoints in CouchDB 4.0

Posted by Glynn Bird <gl...@gmail.com>.
I've worked with folks using partitioned databases, so I thought I'd share
my experience of them here:

- Partitioned databases can definitely give a performance boost (in CouchDB
versions before 4.0) to use cases where the main "read" pattern can be
directed to a single partition. In such cases, only a fraction of the
shards are exercised in answering the query, so there are scalability
benefits there.
- Not everyone who wanted to migrate from non-partitioned to partitioned
databases ended up doing so: migrating involves mutating the document _id,
and replication can't help. Plus, having to rethink indexing and access
patterns is too much for some. It seemed much better suited to "green
field" projects.
- In some cases partitioned databases made performance worse, by directing
a large proportion of traffic to one or a handful of partitions. This may
not be obvious at the design stage; you only find out when real-world
traffic arrives!
- It would have been nice to have a "per partition changes feed", which
would allow a "one partition per user" model, with all the data in the
same database for reporting purposes.
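To make the migration point concrete: moving to a partitioned database means every document gets a new `_id` of the form `partition:docid`, which is why replication alone can't do it. A minimal sketch of the per-document transform a migration script might apply before writing into the new database (e.g. via `_bulk_docs`); `to_partitioned` and its `partition_key_field` parameter are hypothetical, and choosing that field is exactly the design decision that can create hot spots. It does not rewrite references between documents, which a real migration would also have to handle.

```python
import copy
from typing import Any, Dict


def to_partitioned(doc: Dict[str, Any], partition_key_field: str) -> Dict[str, Any]:
    """Return a copy of `doc` re-keyed as "<partition>:<old _id>"."""
    if doc["_id"].startswith("_"):
        raise ValueError("_design/ and _local/ docs keep their ids")
    partition = str(doc[partition_key_field])
    # Partition names may not be empty, start with "_", or contain the ":" separator
    if not partition or partition.startswith("_") or ":" in partition:
        raise ValueError(f"invalid partition key: {partition!r}")
    new_doc = copy.deepcopy(doc)
    # A new _id means a brand-new document in the target db, so drop the old revision
    new_doc.pop("_rev", None)
    new_doc["_id"] = f"{partition}:{doc['_id']}"
    return new_doc


if __name__ == "__main__":
    old = {"_id": "order-1", "_rev": "3-abc", "user": "alice", "total": 9}
    print(to_partitioned(old, "user"))
    # {'_id': 'alice:order-1', 'user': 'alice', 'total': 9}
```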


