Posted to user@bookkeeper.apache.org by "Daniel S. Kim" <ks...@gmail.com> on 2011/12/21 00:13:09 UTC

Hedwig Subproject::Hubs owning some topics that doesn't exist.

Hi everyone,


Since there is no Hedwig API to delete a topic, I used the ZooKeeper Java
client to delete the znodes associated with that topic. For example, I
delete /hedwig/standalone/topics/mytopic and its child znode(s) in order to
delete a topic named "mytopic". As a result, the bookies would delete the data
associated with this topic as well. However, I can still publish and
subscribe to "mytopic" at this point, even though there are no znodes
or data for this topic. The hub keeps everything cached and believes that the
topic is still "alive". The documentation says that a hub will lose its topic
ownership if the hub is overloaded or the periodic redistribution kicks in.
Is there any way to tell the hub to forget about the topic?
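
For reference, a minimal sketch of the znode removal described above. ZooKeeper
refuses to delete a znode that still has children, so a topic's subtree has to
be removed children first. To stay self-contained the sketch does not use the
ZooKeeper client: ZnodeStore, InMemoryZnodeStore and TopicZnodeCleaner are
hypothetical names, and the child names "childA" and "childB" are placeholders,
not Hedwig's real layout. With a real client the two interface methods would map
onto its getChildren and delete calls. Note that this only removes metadata; as
the post says, it does not tell the hub anything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for the two calls a recursive delete needs. */
interface ZnodeStore {
    List<String> getChildren(String path);
    void delete(String path);
}

/** Toy in-memory znode tree that, like ZooKeeper, refuses to delete a node with children. */
class InMemoryZnodeStore implements ZnodeStore {
    // path -> names of its direct children
    private final Map<String, List<String>> children = new TreeMap<>();

    /** Creates the node and any missing ancestors. */
    void create(String path) {
        children.putIfAbsent(path, new ArrayList<>());
        int slash = path.lastIndexOf('/');
        if (slash > 0) {
            String parent = path.substring(0, slash);
            String name = path.substring(slash + 1);
            create(parent);
            if (!children.get(parent).contains(name)) {
                children.get(parent).add(name);
            }
        }
    }

    boolean exists(String path) {
        return children.containsKey(path);
    }

    @Override
    public List<String> getChildren(String path) {
        return new ArrayList<>(children.get(path));
    }

    @Override
    public void delete(String path) {
        if (!children.get(path).isEmpty()) {
            throw new IllegalStateException("znode not empty: " + path);
        }
        children.remove(path);
        int slash = path.lastIndexOf('/');
        if (slash > 0) {
            children.get(path.substring(0, slash)).remove(path.substring(slash + 1));
        }
    }
}

class TopicZnodeCleaner {
    /** Deletes path and everything below it, children first; returns the deletion order. */
    static List<String> deleteRecursively(ZnodeStore store, String path) {
        List<String> deleted = new ArrayList<>();
        deleteInto(store, path, deleted);
        return deleted;
    }

    private static void deleteInto(ZnodeStore store, String path, List<String> deleted) {
        for (String child : store.getChildren(path)) {
            deleteInto(store, path + "/" + child, deleted);
        }
        store.delete(path); // safe now: all children are gone
        deleted.add(path);
    }
}
```

Calling TopicZnodeCleaner.deleteRecursively(store, "/hedwig/standalone/topics/mytopic")
removes the leaves first and the topic znode last, and leaves the parent
/hedwig/standalone/topics in place.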


Regards,


Daniel S. Kim

Re: Hedwig Subproject::Hubs owning some topics that doesn't exist.

Posted by Ivan Kelly <iv...@apache.org>.
On Thu, Dec 22, 2011 at 10:13:13AM -0600, Daniel S. Kim wrote:
> I thought the messages persisted in the bookies even after someone consumes
> them. I had one test topic with one publisher and one subscriber. I published
> about 5 messages to the topic. I subscribed and consumed the messages from my
> listener, which just prints out each message along with its sequence number.
> When I get rid of this listener and start another one, the new listener
> gets all the previous messages from the topic. How is this possible if
> the messages are not being piled up somewhere (the bookies)? Does the hub keep
> all the messages? I am somewhat confused about how consuming messages gets rid
> of old messages. My understanding is that they persist in the bookies. Correct
> me if I am wrong.
The cleanup is lazy. Messages are stored in ledgers, one ledger per
topic. To clean up a message, you have to delete the ledger, so
obviously we can't do it for each individual message. Have a look at
MessageConsumedTask in
hedwig-server/src/main/java/org/apache/hedwig/server/subscriptions/AbstractSubscriptionManager.java
This runs every minute by default.

That said, you shouldn't be getting the consumed messages again if you
are using the same subscription id. If the subscription id is
different, then you're getting the messages which haven't been cleaned
up yet, which is fine, as you will also get all messages after that in
order.
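
To make the two points above concrete, here is a toy model, not Hedwig code.
ToyTopicLog and its methods are hypothetical, and two simplifications are
assumed: delivering a message also marks it consumed, and cleanup trims an
in-memory list where the real cleanup works on ledgers. It shows that the same
subscription id never sees a consumed message twice, while a new subscription
id sees whatever the periodic cleanup has not removed yet.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of one topic's message log with lazy cleanup; not Hedwig code. */
class ToyTopicLog {
    private final List<String> retained = new ArrayList<>();        // messages not yet cleaned up
    private long firstRetainedSeq = 1;                              // seq id of retained.get(0)
    private long nextSeq = 1;                                       // seq id the next publish gets
    private final Map<String, Long> lastConsumed = new HashMap<>(); // subscription id -> last consumed seq id

    /** Appends a message and returns its sequence id. */
    long publish(String body) {
        retained.add(body);
        return nextSeq++;
    }

    /** Returns every retained message the subscription has not consumed, and marks them consumed. */
    List<String> deliver(String subscriptionId) {
        long last = lastConsumed.getOrDefault(subscriptionId, firstRetainedSeq - 1);
        long from = Math.max(last + 1, firstRetainedSeq);
        List<String> out = new ArrayList<>();
        for (long seq = from; seq < nextSeq; seq++) {
            out.add(retained.get((int) (seq - firstRetainedSeq)));
        }
        lastConsumed.put(subscriptionId, nextSeq - 1);
        return out;
    }

    /** The periodic task: drops messages that every known subscription has consumed. */
    int runCleanup() {
        if (lastConsumed.isEmpty()) {
            return 0;
        }
        long minConsumed = Collections.min(lastConsumed.values());
        int drop = (int) Math.max(0, minConsumed - firstRetainedSeq + 1);
        retained.subList(0, drop).clear();
        firstRetainedSeq += drop;
        return drop;
    }
}
```

In this model, publishing five messages, consuming them as subscription "A",
and then delivering to a new subscription "B" before runCleanup() has run
returns all five again. After runCleanup(), a further new subscription only
sees messages published from then on.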

> 
> Also, I would like to contribute by adding a delete method (if that is
> possible), topic eviction, etc. However, I feel that I need to study the
> system first, and I am not seeing very much information at
> http://zookeeper.apache.org/bookkeeper/docs/trunk/hedwigDesign.html. Is
> there any other design documentation with more details? Where is the best
> place to learn how Hedwig is built without digging through all of the code?
There are some extra docs on the wiki:
https://cwiki.apache.org/confluence/display/BOOKKEEPER/HedWig

-Ivan

Re: Hedwig Subproject::Hubs owning some topics that doesn't exist.

Posted by "Daniel S. Kim" <ks...@gmail.com>.
I thought the messages persisted in the bookies even after someone consumes
them. I had one test topic with one publisher and one subscriber. I published
about 5 messages to the topic. I subscribed and consumed the messages from my
listener, which just prints out each message along with its sequence number.
When I get rid of this listener and start another one, the new listener
gets all the previous messages from the topic. How is this possible if
the messages are not being piled up somewhere (the bookies)? Does the hub keep
all the messages? I am somewhat confused about how consuming messages gets rid
of old messages. My understanding is that they persist in the bookies. Correct
me if I am wrong.

Also, I would like to contribute by adding a delete method (if that is
possible), topic eviction, etc. However, I feel that I need to study the
system first, and I am not seeing very much information at
http://zookeeper.apache.org/bookkeeper/docs/trunk/hedwigDesign.html. Is
there any other design documentation with more details? Where is the best
place to learn how Hedwig is built without digging through all of the code?



Regards,

On Thu, Dec 22, 2011 at 9:56 AM, Ivan Kelly <iv...@apache.org> wrote:

> On Thu, Dec 22, 2011 at 09:25:57AM -0600, Daniel S. Kim wrote:
> > When I say "delete", I mean that I want everything about that topic to
> > be gone. The reason is that I need topic management to see whether topics
> > are being used or not. If a topic has not been used for a while, I expire
> > it and kill it. This is what I should do to save resources. Imagine a large
> > number of Hedwig users who start new topics, send messages, etc. All this
> > data builds up eventually (and I believe there is no eviction mechanism or
> > policy yet). Even though Hedwig lets users keep messages persistently, I
> > don't think they should persist when the user wants them gone.
> The only reason data should build up like this is if there is a user
> subscribed to a topic, and it hasn't consumed all messages
> published to the topic. Otherwise it should be safe to periodically
> garbage collect topics which have no subscribers, but I don't
> think we do this at the moment. It would be great if you could
> contribute this ;)
>
> Where exactly are you seeing the problem? Is the ZooKeeper data
> getting too big, or is the problem in BookKeeper, etc?
>
> >
> > Since you said it would possibly break some of the guarantees, I will have
> > to look into it more. If my memory is correct, Ben Reed said that adding an
> > administrative Hedwig function to delete a topic should not be too
> > complicated. If it is indeed complicated to achieve this functionality
> > without breaking the guarantees, I will have to wait or build something
> > around it. I need to know a little bit more about the Hedwig hub
> > redistribution and how it works, whether it is configurable, etc. Where
> > should I start (i.e., which Java packages or classes deal with this)?
> hedwig-server/src/main/java/org/apache/hedwig/server/topic
> &
> hedwig-server/src/main/java/org/apache/hedwig/server/subscriptions
> should cover most of what you're interested in.
>
> -Ivan
>



-- 
Daniel S. Kim

Re: Hedwig Subproject::Hubs owning some topics that doesn't exist.

Posted by Ivan Kelly <iv...@apache.org>.
On Thu, Dec 22, 2011 at 09:25:57AM -0600, Daniel S. Kim wrote:
> When I say "delete", I mean that I want everything about that topic to
> be gone. The reason is that I need topic management to see whether topics
> are being used or not. If a topic has not been used for a while, I expire
> it and kill it. This is what I should do to save resources. Imagine a large
> number of Hedwig users who start new topics, send messages, etc. All this
> data builds up eventually (and I believe there is no eviction mechanism or
> policy yet). Even though Hedwig lets users keep messages persistently, I
> don't think they should persist when the user wants them gone.
The only reason data should build up like this is if there is a user
subscribed to a topic, and it hasn't consumed all messages
published to the topic. Otherwise it should be safe to periodically
garbage collect topics which have no subscribers, but I don't
think we do this at the moment. It would be great if you could
contribute this ;)
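
As a rough illustration of that rule, the sketch below picks garbage
collection candidates from a snapshot of topics. Everything in it is
hypothetical (TopicSnapshot, TopicGcPlanner, the sequence-id bookkeeping);
Hedwig does not provide this, and a real implementation would have to read
subscription state from the hub's own metadata.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical snapshot of one topic's state; not a Hedwig class. */
class TopicSnapshot {
    final long lastPublishedSeq;
    final Map<String, Long> consumedBySubscriber; // subscriber id -> last consumed seq id

    TopicSnapshot(long lastPublishedSeq, Map<String, Long> consumedBySubscriber) {
        this.lastPublishedSeq = lastPublishedSeq;
        this.consumedBySubscriber = new HashMap<>(consumedBySubscriber);
    }

    boolean hasSubscribers() {
        return !consumedBySubscriber.isEmpty();
    }

    /** Number of messages that must be kept because the slowest subscriber has not consumed them. */
    long backlog() {
        if (!hasSubscribers()) {
            return 0; // nobody is waiting for anything
        }
        return lastPublishedSeq - Collections.min(consumedBySubscriber.values());
    }
}

class TopicGcPlanner {
    /** Returns the names of topics with no subscribers, in sorted order. */
    static List<String> collectable(Map<String, TopicSnapshot> topics) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, TopicSnapshot> e : new TreeMap<>(topics).entrySet()) {
            if (!e.getValue().hasSubscribers()) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

A topic with a slow subscriber reports a non-zero backlog() and is never
returned by collectable(); only topics without any subscriber are candidates.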

Where exactly are you seeing the problem? Is the ZooKeeper data
getting too big, or is the problem in BookKeeper, etc?

> 
> Since you said it would possibly break some of the guarantees, I will have
> to look into it more. If my memory is correct, Ben Reed said that adding an
> administrative Hedwig function to delete a topic should not be too
> complicated. If it is indeed complicated to achieve this functionality
> without breaking the guarantees, I will have to wait or build something
> around it. I need to know a little bit more about the Hedwig hub
> redistribution and how it works, whether it is configurable, etc. Where
> should I start (i.e., which Java packages or classes deal with this)?
hedwig-server/src/main/java/org/apache/hedwig/server/topic
& 
hedwig-server/src/main/java/org/apache/hedwig/server/subscriptions
should cover most of what you're interested in.

-Ivan

Re: Hedwig Subproject::Hubs owning some topics that doesn't exist.

Posted by "Daniel S. Kim" <ks...@gmail.com>.
When I say "delete", I mean that I want everything about that topic to
be gone. The reason is that I need topic management to see whether topics
are being used or not. If a topic has not been used for a while, I expire
it and kill it. This is what I should do to save resources. Imagine a large
number of Hedwig users who start new topics, send messages, etc. All this
data builds up eventually (and I believe there is no eviction mechanism or
policy yet). Even though Hedwig lets users keep messages persistently, I
don't think they should persist when the user wants them gone.

Since you said it would possibly break some of the guarantees, I will have
to look into it more. If my memory is correct, Ben Reed said that adding an
administrative Hedwig function to delete a topic should not be too
complicated. If it is indeed complicated to achieve this functionality
without breaking the guarantees, I will have to wait or build something
around it. I need to know a little bit more about the Hedwig hub
redistribution and how it works, whether it is configurable, etc. Where
should I start (i.e., which Java packages or classes deal with this)?



Regards,

On Thu, Dec 22, 2011 at 3:40 AM, Ivan Kelly <iv...@apache.org> wrote:

> Sorry, this isn't the model which Hedwig uses at the moment. Changing
> it, apart from being a large amount of work, would also break some of
> the guarantees which Hedwig currently has. For example, if you create
> a topic, someone subscribes and then disconnects, messages are
> published to the topic, the topic is deleted and created again,
> and more messages are published, should the subscriber get all the
> messages when he connects again? Only those from the second topic
> creation? Only the first ones, since that was the topic he subscribed
> to?
>
> I'm sure there are a lot of other cases similar to this.
>
> -Ivan
>



-- 
Daniel S. Kim

Re: Hedwig Subproject::Hubs owning some topics that doesn't exist.

Posted by Ivan Kelly <iv...@apache.org>.
Sorry, this isn't the model which Hedwig uses at the moment. Changing
it, apart from being a large amount of work, would also break some of
the guarantees which Hedwig currently has. For example, if you create
a topic, someone subscribes and then disconnects, messages are
published to the topic, the topic is deleted and created again,
and more messages are published, should the subscriber get all the
messages when he connects again? Only those from the second topic
creation? Only the first ones, since that was the topic he subscribed
to?

I'm sure there are a lot of other cases similar to this.

-Ivan

Re: Hedwig Subproject::Hubs owning some topics that doesn't exist.

Posted by "Daniel S. Kim" <ks...@gmail.com>.
I want to make topics into manageable objects. I want to be able to
create a topic (without having to publish something) or delete a topic. I
do know and understand that there is no API to do these. For example, if I
try to publish something to a non-existent topic, I want it to fail
instead of creating a new topic and publishing to it.

Also, I want to save resources (memory and hard disk in this case) by
deleting unwanted topics. Right now the only way to delete a topic is to
delete the znodes associated with the topic (e.g.,
/hedwig/standalone/topics/<topic_name>), then wait for the Hedwig hub
redistribution, when the hubs realize that the topic is gone. During this
window (from deleting the znodes to the hub redistribution happening),
publishers and subscribers can still publish and receive messages. I don't
want them to be able to publish or subscribe in the meantime.

My only solution right now is to make my application check whether the topic
znode exists. This brings down my application's performance. This is
why I am hoping for Hedwig to at least add a deleteTopic() method or
something similar.
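
One way to soften the cost of that workaround, sketched under assumptions: cache
the result of the existence check for a short time instead of asking ZooKeeper
before every publish. CachedTopicExistence is a hypothetical helper, not part of
Hedwig. The lookup is passed in as a Predicate so that the sketch stays
self-contained; in an application it would wrap the ZooKeeper client's exists
call on the topic znode. The clock is injected only to keep the example
testable.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.Predicate;

/** Hypothetical helper: remembers "does this topic znode exist?" answers for ttlMillis. */
class CachedTopicExistence {
    private static final class Entry {
        final boolean exists;
        final long checkedAtMillis;

        Entry(boolean exists, long checkedAtMillis) {
            this.exists = exists;
            this.checkedAtMillis = checkedAtMillis;
        }
    }

    private final Predicate<String> lookup; // the expensive check, e.g. a ZooKeeper round trip
    private final long ttlMillis;
    private final LongSupplier clock;       // current time in milliseconds
    private final Map<String, Entry> cache = new ConcurrentHashMap<>();

    CachedTopicExistence(Predicate<String> lookup, long ttlMillis, LongSupplier clock) {
        this.lookup = lookup;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached answer if it is younger than the TTL, otherwise repeats the lookup. */
    boolean exists(String topic) {
        long now = clock.getAsLong();
        Entry entry = cache.get(topic);
        if (entry == null || now - entry.checkedAtMillis >= ttlMillis) {
            entry = new Entry(lookup.test(topic), now);
            cache.put(topic, entry);
        }
        return entry.exists;
    }
}
```

The trade-off is the same window described above, only shorter and under the
application's control: for up to ttlMillis after the znodes are deleted, the
cache will still report the topic as existing.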

On Wed, Dec 21, 2011 at 11:58 AM, Ivan Kelly <iv...@yahoo-inc.com> wrote:

> Nor is there an API to create a topic. A topic in Hedwig is never created
> or destroyed explicitly by the client. Why do you want to delete the topic?
> What behaviour are you expecting?
>
> -Ivan
>


-- 
Daniel S. Kim

Re: Hedwig Subproject::Hubs owning some topics that doesn't exist.

Posted by Ivan Kelly <iv...@yahoo-inc.com>.
Nor is there an API to create a topic. A topic in Hedwig is never created or destroyed explicitly by the client. Why do you want to delete the topic? What behaviour are you expecting?

-Ivan

