Posted to dev@distributedlog.apache.org by Jon Derrick <jo...@gmail.com> on 2016/09/12 09:46:06 UTC

Re: question about DL namespace

Sijie, thank you for your comments.

I'd like to propose introducing a `NamespaceResolver`.

What does a namespace resolver do? A namespace resolver resolves a log
stream name into a metadata location path, so that DL knows where to find
the metadata of a log stream. The resolver is also responsible for
validating the stream name and managing the hierarchy of streams.

A NamespaceResolver interface would look like this:

public interface NamespaceResolver {

    /** validate if the stream name is okay */
    boolean validateStreamName(String streamName);

    /** resolve the stream name into the location path of the metadata */
    String resolveStreamPath(String streamName);

}

So a filesystem-like namespace resolver will only accept the absolute
file-like paths as the stream names and a kafka-like (what Khurrum
mentioned) namespace resolver will probably accept names like
'<stream>/<partition>'.
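
To make the two cases above concrete, here is a rough sketch of what such
resolvers might look like. It is only an illustration, not existing DL code:
the class names `FileSystemLikeResolver` and `KafkaLikeResolver`, the
`rootPath` constructor argument, the name patterns, and the `partitions`
path segment are all assumptions made for this example.

```java
import java.util.regex.Pattern;

// Mirrors the interface proposed above (package-private so the sketch fits in one file).
interface NamespaceResolver {
    boolean validateStreamName(String streamName);
    String resolveStreamPath(String streamName);
}

// Hypothetical filesystem-like resolver: accepts only absolute, normalized paths.
final class FileSystemLikeResolver implements NamespaceResolver {
    // One or more "/segment" components; rejects empty segments and trailing slashes.
    private static final Pattern ABSOLUTE_PATH = Pattern.compile("(/[^/]+)+");

    private final String rootPath; // assumed metadata root of the namespace

    FileSystemLikeResolver(String rootPath) {
        this.rootPath = rootPath;
    }

    @Override
    public boolean validateStreamName(String streamName) {
        if (streamName == null || !ABSOLUTE_PATH.matcher(streamName).matches()) {
            return false;
        }
        // Reject "." and ".." so a name cannot escape the namespace root.
        for (String segment : streamName.substring(1).split("/")) {
            if (segment.equals(".") || segment.equals("..")) {
                return false;
            }
        }
        return true;
    }

    @Override
    public String resolveStreamPath(String streamName) {
        if (!validateStreamName(streamName)) {
            throw new IllegalArgumentException("Invalid stream name: " + streamName);
        }
        return rootPath + streamName;
    }
}

// Hypothetical kafka-like resolver: accepts "<stream>/<partition>" with a numeric partition.
final class KafkaLikeResolver implements NamespaceResolver {
    private static final Pattern STREAM_PARTITION =
            Pattern.compile("[A-Za-z0-9][A-Za-z0-9._-]*/[0-9]+");

    private final String rootPath; // assumed metadata root of the namespace

    KafkaLikeResolver(String rootPath) {
        this.rootPath = rootPath;
    }

    @Override
    public boolean validateStreamName(String streamName) {
        return streamName != null && STREAM_PARTITION.matcher(streamName).matches();
    }

    @Override
    public String resolveStreamPath(String streamName) {
        if (!validateStreamName(streamName)) {
            throw new IllegalArgumentException("Invalid stream name: " + streamName);
        }
        String[] parts = streamName.split("/");
        // Assumed layout: <root>/<stream>/partitions/<partition>
        return rootPath + "/" + parts[0] + "/partitions/" + parts[1];
    }
}
```

With a root of `/messaging/dl` (made up for this example), `/logs/app` would
resolve to `/messaging/dl/logs/app` under the filesystem-like resolver, and
`orders/3` would resolve to `/messaging/dl/orders/partitions/3` under the
kafka-like one.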

A namespace resolver will be added to the namespace metadata binding and
loaded via reflection.
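
As a rough sketch of the reflection part, assuming the metadata binding
stores the resolver's fully qualified class name and that every resolver
has a no-argument constructor (both are assumptions, as are the helper
`NamespaceResolvers.load` and the example `FlatResolver`), loading could
look something like this:

```java
// Mirrors the interface proposed above (package-private so the sketch fits in one file).
interface NamespaceResolver {
    boolean validateStreamName(String streamName);
    String resolveStreamPath(String streamName);
}

// Hypothetical resolver that keeps a flat layout: names may not contain '/'.
final class FlatResolver implements NamespaceResolver {
    public FlatResolver() {
        // A no-argument constructor is assumed so the class can be created reflectively.
    }

    @Override
    public boolean validateStreamName(String streamName) {
        return streamName != null && !streamName.isEmpty() && streamName.indexOf('/') < 0;
    }

    @Override
    public String resolveStreamPath(String streamName) {
        if (!validateStreamName(streamName)) {
            throw new IllegalArgumentException("Invalid stream name: " + streamName);
        }
        // Path relative to the namespace root (an assumption of this sketch).
        return "/" + streamName;
    }
}

// Hypothetical helper that instantiates a resolver from its class name.
final class NamespaceResolvers {
    private NamespaceResolvers() {
    }

    static NamespaceResolver load(String className) {
        try {
            // asSubclass fails fast if the class does not implement NamespaceResolver.
            Class<? extends NamespaceResolver> cls =
                    Class.forName(className).asSubclass(NamespaceResolver.class);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException(
                    "Cannot load namespace resolver: " + className, e);
        }
    }
}
```

A caller would then do something like
`NamespaceResolvers.load(resolverClassNameFromBinding)`, and a missing class
or a class that does not implement the interface surfaces as an
`IllegalArgumentException` when the namespace is opened rather than later.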

Any thoughts? I will send out a pull request soon.

- jd


On Tue, Aug 23, 2016 at 9:07 AM, Khurrum Nasim <kh...@gmail.com>
wrote:

> On Thu, Aug 18, 2016 at 2:30 AM, Sijie Guo <si...@apache.org> wrote:
>
> > Jon,
> >
> > Sorry for the late response. This is a very good question. Comments inline.
> >
> > Sijie
> >
> > On Monday, August 15, 2016, Jon Derrick <jo...@gmail.com>
> > wrote:
> >
> > > Hello all,
> > >
> > > I read the DistributedLog code closely and found that the DL namespace
> > > is a flat namespace. That could become a problem if a lot of streams
> > > are created under the same namespace. I am very curious about the
> > > thinking behind that. Here are some questions:
> > >
> > > - How many streams can a namespace support?
> >
> >
> > The maximum number of streams we have had for a single namespace is more
> > than 30k. But yup, you are right. It is limited by the number of children
> > that a znode can have.
> >
> > >
> > >
> > > It seems to be bound by the limit on the number of children that a
> > > ZooKeeper znode can have. What's the maximum number of logs you have?
> > > - Why not choose a tree representation? Then it might be easier to
> > > organize streams. For example, if I want to use multiple DL streams as
> > > partitions, I can easily organize them together under the same znode.
> >
> >
> > We don't want DL to focus on partitions. We let applications decide how
> > to partition. So we chose a simple way to start. However, I don't think
> > it has to be just a flat namespace. You probably already noticed that
> > there is another namespace implementation to support hierarchy.
> >
> > If you would like to support a filesystem-like namespace, I would suggest
> > adding a namespace type to the metadata binding, so that it can support
> > different types of namespaces. Does that meet your requirements?
> >
>
> +1 for supporting different types of namespaces. I want to organize a Kafka
> topic in the following format:
>
> namespace/topic/partitions      : storing all the partitions
> namespace/topic/partitions/N    : storing the given partition `N`
> namespace/topic/subscriptions   : storing all the subscriptions
> namespace/topic/subscriptions/S : storing the information of subscription `S`
>
> Both `namespace/topic/partitions/N` and `namespace/topic/subscriptions/S`
> are DL streams.
>
> So it would be easier for me to manage the streams if I could customize
> the namespace layout.
>
> - KN
>
>
> >
> >
> > > - Also, if it is a tree-like namespace, it might be easier to implement
> > > a filesystem over the streams. Each file can be backed by one DL stream.
> > > That way, I can also use DL as long-term storage.
> > >
> > > Any thoughts? Appreciate your comments.
> > >
> > >
> > > --
> > > - jderrick
> > >
> >
>



-- 
- jderrick

Re: question about DL namespace

Posted by Khurrum Nasim <kh...@gmail.com>.
+1 for the interface.

- KN

Re: question about DL namespace

Posted by Sijie Guo <si...@apache.org>.
This sounds reasonable to me. Look forward to your contribution.

- Sijie
