You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by Andrew Baker <ba...@gmail.com> on 2016/11/30 15:23:02 UTC

Re: Inserting list data

Sorry this is so long after the initial. I wrote a dtest to try to make
this happen here:
https://github.com/bakerag1/cassandra-dtest/blob/master/collection_update_test.py

This is my first dtest and my second python script, so I am not overly
confident that it is doing a good job of this test, so if any dtest expert
out there could look it over, I would appreciate it.

I couldn't detect this happening with 5 threads concurrently creating and
updating the same 200 records, with List<String> and List<UDT>.

We are not using batches in our case, but we are using prepared statements.
I will add that to the test and let you know.

-Andrew

On Fri, Oct 14, 2016 at 11:00 PM Russell Spitzer <ru...@gmail.com>
wrote:

> Are you sure you aren't using batches? These will assign the same
> timestamp to your inserts which can lead to unexpected behaviors.
>
> On Fri, Oct 14, 2016 at 9:45 PM Vladimir Yudovin <vl...@winguzone.com>
> wrote:
>
> Did you try the same quires with Java driver without using prepared
> statements?
>
>
> Best regards, Vladimir Yudovin,
>
>
> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra on
> Azure and SoftLayer.Launch your cluster in minutes.*
>
>
> ---- On Fri, 14 Oct 2016 15:13:38 -0400*Aoi Kadoya <cadyan.aoi@gmail.com
> <ca...@gmail.com>>* wrote ----
>
> Hi Vladimir,
>
> In fact I am having difficulty to reproduce this issue by cqlsh.
> I was reported this issue by one of our developers and he is using his
> client application that uses cassandra java driver 3.0.3. (we're using
> DSE5.0.1)
>
> <here is how his application log look like>
>
> app A:
> 2016-10-11 13:28:23,014 [TRACE] [core.QueryLogger.NORMAL] [cluster1]
> [HOST1/IP1:9042] Query completed normally, took 5 ms: [8 bound values]
> INSERT INTO global.table_name
> ("id","alert_to","alert_emails","created_by","created_date","alert_level","updated_by","updated_date")
>
> VALUES (?,?,?,?,?,?,?,?);
> [id:25712, alert_to:[2], alert_emails:NULL,
> created_by:'service-worker:ec45afd2-c40a-44d9-a2a1-7416409be6e2',
> created_date:1476160103007, alert_level:2, updated_by:NULL,
> updated_date:NULL]
>
> app B:
> 2016-10-11 13:28:23,014 [TRACE] [core.QueryLogger.NORMAL] [cluster1]
> [HOST2/IP2:9042] Query completed normally, took 6 ms: [8 bound values]
> INSERT INTO global.table_name
> ("alert_to","alert_emails","created_date","id","created_by","updated_by","updated_date","alert_level")
>
> VALUES (?,?,?,?,?,?,?,?);
> [alert_to:[1], alert_emails:NULL, created_date:1476160103007,
> id:25712,
> created_by:'service-worker:ec45afd2-c40a-44d9-a2a1-7416409be6e2',
> updated_by:NULL, updated_date:NULL, alert_level:1]
>
> <and the data on cassandra>
> <TYPES>
> id bigint,
> alert_emails list,
> alert_level int,
> alert_to list,
> created_by text,
> created_date timestamp,
> updated_by text,
> updated_date timestamp,
> PRIMARY KEY (id)
>
>
> SELECT id, alert_level, alert_to FROM global.table_name WHERE id=25712;
> | id | alert_level | alert_to |
> | 25712 | 2 | [2, 1] |
>
> but when I threw the queries like below from cqlsh from different
> nodes at the same time in my testing environment, the data(alert_to)
> was just [1], which is expected behavior.
>
> on host 1
> cqlsh> INSERT INTO global.table_name
> ("id","alert_to","alert_emails","created_by","created_date","alert_level","updated_by","updated_date")
>
> VALUES
> (25712,[2],NULL,'service-worker:ec45afd2-c40a-44d9-a2a1-7416409be6e2',1476160103007,2,NULL,NULL);
>
> on host2
> cqlsh> INSERT INTO global.table_name
> ("alert_to","alert_emails","created_date","id","created_by","updated_by","updated_date","alert_level")
>
> VALUES
> ([1],NULL,1476160103007,25712,'service-worker:ec45afd2-c40a-44d9-a2a1-7416409be6e2',NULL,NULL,1);
>
>
>
> so I wonder if this is something wrong with java driver but I cannot
> figure out the way to break this down further.
>
>
> @Andrew
> we're not using UDT..but appreciate if you could share your case, too.
>
> Thanks,
> Aoi
>
> 2016-10-13 11:26 GMT-07:00 Andrew Baker <ba...@gmail.com>:
> > I saw evidence of this behavior, but when we created a test to try to
> make
> > it happen it never did, we assumed it was UDT related and lost interest,
> > since it didn't have a big impact. I will try to carve some time to look
> > into this some more and let you know if I find anything.
> >
> > On Wed, Oct 12, 2016 at 9:24 PM Vladimir Yudovin <vl...@winguzone.com>
> > wrote:
> >>
> >> The data is actually appended. not overwritten.
> >> Strange, can you send exactly operators?
> >>
> >> Here is example I do:
> >> CREATE KEYSPACE events WITH replication = {'class': 'SimpleStrategy',
> >> 'replication_factor': 1};
> >> CREATE TABLE events.data (id int primary key, events list<text>);
> >> INSERT INTO events.data (id, events) VALUES ( 0, ['a']);
> >> SELECT * FROM events.data ;
> >> id | events
> >> ----+--------
> >> 0 | ['a']
> >>
> >> (1 rows)
> >>
> >> INSERT INTO events.data (id, events) VALUES ( 0, ['b']);
> >> SELECT * FROM events.data ;
> >> id | events
> >> ----+--------
> >> 0 | ['b']
> >>
> >> (1 rows)
> >>
> >> As you see, 'a' was overwritten by 'b'
> >>
> >>
> >> Best regards, Vladimir Yudovin,
> >> Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer.
> >> Launch your cluster in minutes.
> >>
> >>
> >> ---- On Wed, 12 Oct 2016 23:58:23 -0400Aoi Kadoya <ca...@gmail.com>
>
> >> wrote ----
> >>
> >> yes, that's what I thought. but, when I use these forms,
> >> INSERT ... ['A']
> >> INSERT ... ['B']
> >>
> >> The data is actually appended. not overwritten.
> >> so I guess this is something unexpected?
> >>
> >> Thanks,
> >> Aoi
> >>
> >> 2016-10-12 20:55 GMT-07:00 Vladimir Yudovin <vl...@winguzone.com>:
> >> > If you use form
> >> > INSERT ... ['A']
> >> > INSERT ... ['B']
> >> >
> >> > latest INSERT will overwrite first, because this insert the whole
> list.
> >> > It's
> >> > better to use UPDATE like:
> >> > UPDATE ... SET events = events + ['A']
> >> > UPDATE ... SET events = events + ['B']
> >> > These operations add new elements to the end of existing list.
> >> >
> >> >
> >> > From here
> >> > https://docs.datastax.com/en/cql/3.0/cql/cql_using/use_list_t.html
> >> > :
> >> >
> >> > These update operations are implemented internally without any
> >> > read-before-write. Appending and prepending a new element to the list
> >> > writes
> >> > only the new element.
> >> >
> >> >
> >> > Best regards, Vladimir Yudovin,
> >> > Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer.
> >> > Launch your cluster in minutes.
> >> >
> >> >
> >> > ---- On Wed, 12 Oct 2016 17:39:46 -0400Aoi Kadoya <
> cadyan.aoi@gmail.com>
> >> > wrote ----
> >> >
> >> > Hi,
> >> >
> >> > When inserting different data into a list type column from different
> >> > clients at the same time, is data supposed to be combined into one
> >> > list?
> >> >
> >> > For example, if these 2 queries were requested from clients at the
> >> > same timing, how events list should look like after?
> >> >
> >> > INSERT INTO cycling.upcoming_calendar (year, month, events) VALUES
> >> > (2015, 06, ['A']);
> >> > INSERT INTO cycling.upcoming_calendar (year, month, events) VALUES
> >> > (2015, 06, ['B']);
> >> >
> >> > In my understanding, each operation should be treated as atomic,
> which
> >> > makes me think that even if client throw the queries at the same
> time,
> >> > cassandra would take them separately and the last insert would update
> >> > the events list. (= data should be either ['A'] or ['B'])
> >> >
> >> > In my environment, I found that some data was saved as like ['A',B']
> >> > in the case like above.
> >> > Is this expected behavior of list data type?
> >> >
> >> > I am still new to cassandra and trying to make myself understood how
> >> > this happened.
> >> > Appreciate if you could help me with figuring this out!
> >> >
> >> > Thanks,
> >> > Aoi
> >> >
> >> >
>
>