Posted to user@cassandra.apache.org by Christian Decker <de...@gmail.com> on 2010/11/14 01:11:46 UTC

Rows missing after new node bootstrapped

Hi all,

I'm having some doubts about the current state of my cluster. I started with
one node, filled it with some 10 million rows, then flushed and compacted
the node. Then I ran a small Pig script that read an index and fetched the
matching rows; no problem up to this point. Next I added a new node with
AutoBootStrap turned on. It all seemed to work: the new node chose a token to
take over some of the first node's responsibilities, appeared to transfer all
the relevant data, and everything looked fine. But if I run the Pig script
again, it produces many empty rows, which leads me to believe that these rows
were read from the new node, which doesn't yet have the corresponding data.
This puzzles me, since I thought the bootstrap would transfer the needed
data. Will the empty rows eventually go away, or have I done
something terribly wrong?

Regards,
Chris
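
For readers following the thread: once a new node claims a token, some keys
change owner, so reads for them are routed to the new node. A minimal sketch
of that idea, assuming a simplified MD5-based ring loosely modelled on
RandomPartitioner. The names token_for and owner, the node names, and the
chosen token are illustrative assumptions, not Cassandra's actual code.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # assumed token space for this simplified ring


def token_for(key: str) -> int:
    # Hash the row key to a position on the ring (simplified, MD5-based).
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % RING_SIZE


def owner(ring: dict, key: str) -> str:
    # ring maps token -> node name. A key belongs to the first node whose
    # token is >= the key's token, wrapping around to the lowest token.
    tokens = sorted(ring)
    i = bisect.bisect_left(tokens, token_for(key))
    return ring[tokens[i % len(tokens)]]


keys = ["row%d" % i for i in range(1000)]

ring = {0: "node1"}                       # single-node cluster
before = {k: owner(ring, k) for k in keys}

ring[RING_SIZE // 2] = "node2"            # new node picks a token mid-ring
after = {k: owner(ring, k) for k in keys}

# Keys whose owner changed are the ones the new node must have received
# during bootstrap for reads to keep returning data.
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys now belong to node2")
```

If the data for the moved keys has not arrived on the new owner, a read
routed there would come back empty, which matches the symptom described.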

Re: Rows missing after new node bootstrapped

Posted by "Stump.J.Xu / 徐隽" <st...@gmail.com>.
Which version are you using? I ran into a different situation and have not
found a solution yet. When a batch of erroneous data was inserted into
Cassandra, normal data reads seemed to be disrupted.

Stump Xu





-- 
徐隽
Tel: 13818213223
Email: stumpxu@gmail.com
国家反计算机入侵和防病毒研究中心
信息网络安全公安部重点实验室
公安部第三研究所信息网络安全研发中心
上海辰星电子数据司法鉴定中心

Re: Rows missing after new node bootstrapped

Posted by Jonathan Ellis <jb...@gmail.com>.
I think you'll need to show us how to reproduce without your custom
LoadFunc, e.g., with normal index scans outside of pig.

On Wed, Nov 17, 2010 at 3:56 PM, Christian Decker
<de...@gmail.com> wrote:
> On Tue, Nov 16, 2010 at 6:58 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>
>> I'm pretty sure that "reading an index" and "using pig" are not
>> compatible right now.  the m/r support that pig builds on always does
>> sequential-scan range queries.
>
> Yes it does, I have a specialized LoadFunc to read and load manually
> maintained indices (pre-0.7 style), and it works like a charm as long as I
> don't do nodetool loadbalance or add new nodes to the cluster.
>>
>> can you see the missing rows if you do a normal get_slice query for it
>> without pig?
>
> They are empty, I suspect that the "eventual" in "eventual consistency" hit
> me in the head, the empty rows are disappearing at an incredibly slow rate,
> I guess it's repairing in the background, but it's taking forever
> (100'000'000 rows in the cluster, 2 nodes added and after 3 days it's still
> not done migrating to the new nodes).
>
> Could this actually be the case?
>
> Regards,
> Chris
>
> B.T.W.: M/R and indices might mix well if we can just fetch the size of the
> index, and then we could create the splits telling them to "fetch from index
> starting from col n and fetch a max of m" any plans on implementing it?



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: Rows missing after new node bootstrapped

Posted by Christian Decker <de...@gmail.com>.
On Tue, Nov 16, 2010 at 6:58 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> I'm pretty sure that "reading an index" and "using pig" are not
> compatible right now.  the m/r support that pig builds on always does
> sequential-scan range queries.
>
Yes, it does. I have a specialized LoadFunc to read and load manually
maintained indices (pre-0.7 style), and it works like a charm as long as I
don't do nodetool loadbalance or add new nodes to the cluster.

>
> can you see the missing rows if you do a normal get_slice query for it
> without pig?
>
They are empty. I suspect that the "eventual" in "eventual consistency" hit
me in the head: the empty rows are disappearing at an incredibly slow rate.
I guess it's repairing in the background, but it's taking forever
(100'000'000 rows in the cluster, 2 nodes added, and after 3 days it's still
not done migrating to the new nodes).

Could this actually be the case?

Regards,
Chris

B.T.W.: M/R and indices might mix well if we could just fetch the size of the
index. We could then create the splits by telling each one to "fetch from the
index starting from col n and fetch a max of m". Any plans on implementing it?
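
A rough sketch of the split idea, assuming the index size is already known
and each split is described as a (start, count) pair. The function name
index_splits and its parameters are hypothetical; how a numeric offset would
map onto an actual slice query is left open here.

```python
def index_splits(index_size: int, split_size: int):
    """Return (start, count) pairs that together cover every index column.

    Each pair reads as: "fetch from the index starting from column `start`
    and fetch a maximum of `count` columns".
    """
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    return [
        (start, min(split_size, index_size - start))
        for start in range(0, index_size, split_size)
    ]


# Example: an index with 10 columns, split into chunks of at most 4.
for start, count in index_splits(index_size=10, split_size=4):
    print("fetch from index starting from col %d, max %d" % (start, count))
```

The last split is simply shorter when the index size is not a multiple of
the split size, so no column is read twice and none is skipped.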


Re: Rows missing after new node bootstrapped

Posted by Jonathan Ellis <jb...@gmail.com>.
I'm pretty sure that "reading an index" and "using Pig" are not
compatible right now. The M/R support that Pig builds on always does
sequential-scan range queries.

Can you see the missing rows if you do a normal get_slice query for them
without Pig?
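
One way to run such a check, sketched in Python. The fetch_row callable is a
hypothetical stand-in for whatever client call issues the get_slice; the
in-memory dictionary below only simulates a cluster, so nothing here talks
to Cassandra.

```python
def find_empty_rows(keys, fetch_row):
    """Return the keys for which fetch_row(key) yields no columns.

    fetch_row is assumed to return a mapping of column name -> value,
    or None / an empty mapping when the row has no data.
    """
    empty = []
    for key in keys:
        columns = fetch_row(key)
        if not columns:
            empty.append(key)
    return empty


# Simulated store: k2 exists but has no columns, k4 is absent entirely.
fake_store = {"k1": {"col": "v"}, "k2": {}, "k3": {"col": "v"}}

missing = find_empty_rows(["k1", "k2", "k3", "k4"], fake_store.get)
print(missing)
```

Feeding it the keys taken from the manually maintained index would show
whether the rows are empty independently of the custom LoadFunc.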

On Mon, Nov 15, 2010 at 7:03 AM, Christian Decker
<de...@gmail.com> wrote:
> I'm using tag cassandra-0.7.0-beta3. I wouldn't know why I need range scans
> since I perform a multi_get on the indexed keys.




Re: Rows missing after new node bootstrapped

Posted by Christian Decker <de...@gmail.com>.
I'm using tag cassandra-0.7.0-beta3. I wouldn't know why I need range scans
since I perform a multi_get on the indexed keys.

Regards,
Chris

On Sun, Nov 14, 2010 at 9:51 AM, Jonathan Ellis <jb...@gmail.com> wrote:

> Are you using a version with working range scans?

Re: Rows missing after new node bootstrapped

Posted by Jonathan Ellis <jb...@gmail.com>.
Are you using a version with working range scans?



