Posted to dev@cassandra.apache.org by dragos cernahoschi <dr...@gmail.com> on 2010/11/08 17:05:56 UTC

CASSANDRA-1472 (bitmap indexes)

Hi,

I've got an exception during the following test:

test machine: Core 2 Duo 2.93 GHz, 2 GB RAM, Ubuntu 10.04

test scenario:
- 1 column family
- about 15 columns
- 7 indexed columns (bitmap)
- 26 million rows (insert operation went fine)
- Thrift "query" on 3 of the indexed columns with get_indexed_slices (count: 100)
- got the following exception:

10/11/08 17:52:40 ERROR service.AbstractCassandraDaemon: Fatal exception in thread Thread[ReadStage:3,5,main]
java.io.IOError: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-814-4-Bitidx.db (Too many open files)
    at org.apache.cassandra.io.sstable.bitidx.SegmentIterator.open(SegmentIterator.java:78)
    at org.apache.cassandra.io.sstable.bitidx.BitmapIndexReader.openBin(BitmapIndexReader.java:226)
    at org.apache.cassandra.io.sstable.bitidx.BitmapIndexReader.iterator(BitmapIndexReader.java:214)
    at org.apache.cassandra.io.sstable.SSTableReader.scan(SSTableReader.java:523)
    at org.apache.cassandra.db.secindex.KeysBitmapIndex.iterator(KeysBitmapIndex.java:103)
    at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1371)
    at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:41)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:51)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-814-4-Bitidx.db (Too many open files)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:106)
    at org.apache.avro.file.SeekableFileInput.<init>(SeekableFileInput.java:29)
    at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:38)
    at org.apache.cassandra.io.sstable.bitidx.SegmentIterator.open(SegmentIterator.java:72)
    ... 10 more
10/11/08 17:52:40 ERROR service.AbstractCassandraDaemon: Fatal exception in thread Thread[ReadStage:2,5,main]
java.io.IOError: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-1018-Index.db (Too many open files)
    at org.apache.cassandra.io.util.BufferedSegmentedFile.getSegment(BufferedSegmentedFile.java:68)
    at org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:129)
    at org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:1)
    at org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:455)
    at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:572)
    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:49)
    at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:72)
    at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:84)
    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1190)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1082)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1052)
    at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1378)
    at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:41)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:51)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-1018-Index.db (Too many open files)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:98)
    at org.apache.cassandra.io.util.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:142)
    at org.apache.cassandra.io.util.BufferedSegmentedFile.getSegment(BufferedSegmentedFile.java:62)
    ... 16 more

The same test worked fine with 1 million rows.
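The "Too many open files" errors above are file-descriptor exhaustion. Stu Hood's reply in this thread attributes it to the CASSANDRA-1472 bitmap index opening up to 128 file handles per SSTable in the worst case (GT/LT expressions), with about 500 SSTables on disk at the time. A back-of-the-envelope check in Python, assuming that 128-handle figure (an upper bound, since the thread notes only EQ expressions were used); the helper names are hypothetical, and `resource.getrlimit` reports the limit of the Python process running it, not of the Cassandra JVM (for that, check `ulimit -n` in the shell that launches Cassandra).

```python
import resource

# Figure quoted in this thread for the CASSANDRA-1472 patch: in the worst case
# (GT/LT expressions) one handle per index bucket, up to 128 per SSTable.
HANDLES_PER_SSTABLE = 128


def worst_case_handles(sstable_count, per_sstable=HANDLES_PER_SSTABLE):
    """Upper bound on descriptors one bitmap-index query might open."""
    return sstable_count * per_sstable


def current_nofile_limit():
    """Soft RLIMIT_NOFILE of this Python process; None means unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return None if soft == resource.RLIM_INFINITY else soft


def fits(sstable_count, limit):
    """True if the worst case stays below the given descriptor limit."""
    return limit is None or worst_case_handles(sstable_count) < limit


if __name__ == "__main__":
    for n in (7, 32, 500):
        # 1024 is a common default soft limit; 2**16 is the suggested minimum.
        print(n, worst_case_handles(n), fits(n, 1024), fits(n, 2 ** 16))
```

With ~500 SSTables the bound is 64,000, just under 2^16 = 65,536, which fits the suggestion to raise the limit to at least 2^16; after compaction brought the count down to 7 it drops to 896. The bound covers a single query only and ignores data and index files held open by concurrent reads and compaction.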

Re: CASSANDRA-1472 (bitmap indexes)

Posted by Stu Hood <st...@rackspace.com>.
Interesting... thanks for the report! I'll see if I can reproduce.

-----Original Message-----
From: "dragos cernahoschi" <dr...@gmail.com>
Sent: Tuesday, November 9, 2010 10:14am
To: dev@cassandra.apache.org
Subject: Re: CASSANDRA-1472 (bitmap indexes)

Meanwhile, the number of SSTables has dropped to just 7. Initially the
compaction thread hit the same "too many open files" problem and
couldn't do any compaction.

But I'm still not able to run my tests: TimedOutException :(

On Tue, Nov 9, 2010 at 5:51 PM, Stu Hood <st...@rackspace.com> wrote:

> Hmm, 500 sstables is definitely a degenerate case: did you disable
> compaction? By default, Cassandra strives to keep the sstable count below
> ~32, since accesses to separate sstables require seeks.
>
> In this case, the query will seek 500 times to check the secondary index
> for each sstable: if it finds matches it will need to seek to find them in
> the primary index, and seek again for the data file.
>
> -----Original Message-----
> From: "dragos cernahoschi" <dr...@gmail.com>
> Sent: Tuesday, November 9, 2010 5:33am
> To: dev@cassandra.apache.org
> Subject: Re: CASSANDRA-1472 (bitmap indexes)
>
> There are about 500 SSTables (12 GB of data, including index data,
> statistics, ...). The source data file had about 3 GB / 26 million rows.
>
> I only test with EQ expressions for now.
>
> Increasing the file limit resolved the problem, but now I'm getting
> TimedOutException(s) from Thrift when "querying", even with a slice size of 1.
> Is my machine (Core 2 Duo 2.93 GHz, 2 GB RAM, Ubuntu 10.04) too small for
> such a test?
>
> I really have some interesting sets of data to test indexes with and I want
> to make a comparison between ordinary indexes and bitmap indexes.
>
> Thank you,
> Dragos
>
> On Mon, Nov 8, 2010 at 6:42 PM, Stu Hood <st...@rackspace.com> wrote:
>
> > Dragos,
> >
> > How many SSTables did you have on disk, and were any of your index
> > expressions GT(E)/LT(E)?
> >
> > I expect that you are bumping into a limitation of the current
> > implementation: it opens up to 128 file-handles per SSTable in the worst
> > case for a GT/LT query (one per index bucket).
> >
> > A future version might remove that requirement, but for now, you should
> > probably bump the file handle limit on your machine to at least 2^16.
> >
> > Thanks,
> > Stu
> >
> >
> > -----Original Message-----
> > From: "dragos cernahoschi" <dr...@gmail.com>
> > Sent: Monday, November 8, 2010 10:05am
> > To: dev@cassandra.apache.org
> > Subject: CASSANDRA-1472 (bitmap indexes)
>
>
>
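Stu's explanation quoted above (one seek per SSTable to probe its secondary index, then a seek into the primary index and another into the data file for each match) can be turned into a rough estimate. A sketch under stated assumptions: a page of 100 matching rows (the `count` in the original test), no cache hits, and roughly 10 ms per random seek on a single desktop disk; none of these figures were measured in the thread, and the function names are hypothetical.

```python
def estimate_seeks(sstable_count, matching_rows):
    """Rough seeks for one index query, per the reasoning quoted above:
    one seek per SSTable to probe its secondary index, then one seek into
    the primary index and one into the data file for each matching row.
    Ignores caches and rows whose columns are spread over several SSTables.
    """
    return sstable_count + 2 * matching_rows


def estimate_seconds(seeks, ms_per_seek=10.0):
    """Wall-clock estimate; 10 ms per random seek is an assumed disk figure."""
    return seeks * ms_per_seek / 1000.0


if __name__ == "__main__":
    for sstables in (500, 32, 7):
        seeks = estimate_seeks(sstables, matching_rows=100)
        print(sstables, "sstables ->", seeks, "seeks, ~", estimate_seconds(seeks), "s")
```

Under these assumptions the SSTable probes dominate at 500 SSTables (700 seeks, about 7 s), while after compaction to 7 SSTables the per-row seeks dominate (207 seeks).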



Re: CASSANDRA-1472 (bitmap indexes)

Posted by dragos cernahoschi <dr...@gmail.com>.
I've tried to reproduce my test data and the failing queries with stress.py.

So, I've slightly modified stress.py and added 2 more indexes for
insertion. The indexedrangeslice query is also performed on 3 indexes. The
insert is done using a uniform distribution of values.

Then:

1. python contrib/py_stress/stress.py -r -C 32 -x keys
2. python contrib/py_stress/stress.py -C 32 -o indexedrangeslice -t 3

The queries fail as shown in the attachment: not on the first query, but on
the 3rd, 4th, ... and not always the same one.

Dragos

On Mon, Nov 22, 2010 at 9:39 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> Let's start with the low-hanging fruit: can you give steps to reproduce
> queries that fail right away?

Re: CASSANDRA-1472 (bitmap indexes)

Posted by Jonathan Ellis <jb...@gmail.com>.
Let's start with the low-hanging fruit: can you give steps to reproduce
queries that fail right away?

On Wed, Nov 17, 2010 at 10:37 AM, dragos cernahoschi
<dr...@gmail.com> wrote:
> Back. I've tested the keys index pagination once again on 0.7 head, with a
> smaller data set: 1 million rows. It seems there are still some issues:
>
> 1. *test*: query on one column, count: 1000, expected number of distinct
> results: 48251
>    *result*: 5 pages of 1000 results; then, after the 6th page, the results
> begin to repeat. I would expect repetition to begin only after the 48,251st
> row.
>
> 2. *test*: query on 3 columns, count: 10 (count 100 and count 1000 failed
> with time out)
>    *result*: 1 page of 10 results, then second page => time out
>
> 3. There are queries with combinations of 2 or 3 columns that fail right
> away with time out (count 10, 100).
>
> Dragos



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: CASSANDRA-1472 (bitmap indexes)

Posted by dragos cernahoschi <dr...@gmail.com>.
Back. I've tested the keys index pagination once again on 0.7 head, with a
smaller data set: 1 million rows. It seems there are still some issues:

1. *test*: query on one column, count: 1000, expected number of distinct
results: 48251
    *result*: 5 pages of 1000 results; then, after the 6th page, the results
begin to repeat. I would expect repetition to begin only after the 48,251st
row.

2. *test*: query on 3 columns, count: 10 (count 100 and count 1000 failed with
time out)
    *result*: 1 page of 10 results, then second page => time out

3. There are queries with combinations of 2 or 3 columns that fail right away
with time out (count 10, 100).

Dragos


On Mon, Nov 15, 2010 at 2:29 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> On Mon, Nov 15, 2010 at 5:57 AM, dragos cernahoschi
> <dr...@gmail.com> wrote:
> > I've tested the 0.7-beta3 branch index feature without the 1472 patch.
> > Queries on more than one column work better than with the patched version,
> > but definitely not correctly.
>
> Please test 0.7 branch head: as you can see from the CHANGES, there
> have been a lot of fixes.
>
> > 1.
> > 2.
> > 4.
>
> Should be fixed in head.
>
> > 3. Is there an example of the pagination feature (without knowing the
> > expected number of rows)?
>
> The same way you paginate through range slices or columns within a row:
> set start to the last result you got from the previous query.
>
> > Will get_indexed_slices return an empty list when there are no more
> > results?
>
> No, all queries are start-inclusive.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>

Re: CASSANDRA-1472 (bitmap indexes)

Posted by Jonathan Ellis <jb...@gmail.com>.
On Mon, Nov 15, 2010 at 5:57 AM, dragos cernahoschi
<dr...@gmail.com> wrote:
> I've tested the 0.7-beta3 branch index feature without the 1472 patch.
> Queries on more than one column work better than with the patched version,
> but definitely not correctly.

Please test 0.7 branch head: as you can see from the CHANGES, there
have been a lot of fixes.

> 1.
> 2.
> 4.

Should be fixed in head.

> 3. Is there an example of the pagination feature (without knowing the
> expected number of rows)?

The same way you paginate through range slices or columns within a row:
set start to the last result you got from the previous query.

> Will get_indexed_slices return an empty list when there are no more
> results?

No, all queries are start-inclusive.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
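Jonathan's answer amounts to start-inclusive paging: pass the last key of the previous page as the next start key and discard the first row of each later page, since it repeats that key. A minimal Python sketch under that reading; `make_fake_index_query` is a hypothetical in-memory stand-in for `get_indexed_slices` (not the Thrift API), and the stopping rule (a page shorter than `count` ends the scan) is an assumption consistent with the answer that no empty list is returned.

```python
def make_fake_index_query(ordered_keys):
    """Hypothetical stand-in for an index query over rows already in ring order.

    As described in this thread for get_indexed_slices, start_key is
    inclusive; an empty start_key here means "from the beginning".
    """
    def query(start_key, count):
        i = 0 if start_key == "" else ordered_keys.index(start_key)
        return ordered_keys[i:i + count]
    return query


def paginate(query, count):
    """Yield every matching key exactly once from a start-inclusive query."""
    if count < 2:
        # With count == 1 each later page would be just the start key: no progress.
        raise ValueError("count must be at least 2")
    start, first_page = "", True
    while True:
        page = query(start, count)
        # After the first page, page[0] repeats the previous page's last key.
        yield from (page if first_page else page[1:])
        if len(page) < count:
            return  # a short page means nothing follows it
        start, first_page = page[-1], False


if __name__ == "__main__":
    keys = [f"key{i:02d}" for i in range(25)]
    result = list(paginate(make_fake_index_query(keys), count=10))
    print(len(result), result == keys)  # 25 True
```

Each page after the first yields at most `count - 1` new rows, which is why a page size of 1 can never advance.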

Re: CASSANDRA-1472 (bitmap indexes)

Posted by dragos cernahoschi <dr...@gmail.com>.
I've tested the 0.7-beta3 branch index feature without the 1472 patch.
Queries on more than one column work better than with the patched version,
but definitely not correctly.

1.
- query on 3 columns, start key 1, row count 1 => no results
- query on same columns, start key 1, row count 10 => 8 results

2.
- same query, start key 1, row count 2 => 1 result
- query again, start key = max (keys from prev query) + 1, row count 2
=> *time out, infinite cycle*

3. Is there an example of the pagination feature (without knowing the
expected number of rows)?

Will get_indexed_slices return an empty list when there are no more
results?

- query on 1 column, start key 1, row count 1000 => ok
- same query, start key = max (keys from prev query) + 1, row count 1000 =>
ok
...
- *at some point the max (keys from prev query) < startkey and my pagination
loop runs forever*

Maybe I'm missing something here.

4.
- query on 1 column, row count 1000 => ok
- query on 3 columns, row count 100 => time out (there is no infinite loop,
the thread eventually terminates)

Dragos
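In items 2 and 3 above, the next start key is computed as max(keys from prev query) + 1. If the keyspace uses RandomPartitioner (an assumption: the partitioner is not stated in this thread, though it was the usual default), rows come back in token order rather than key order, so the largest key on a page is generally not where the page ended. Restarting from it can skip rows or jump back to rows already seen, which would be consistent with the loop described above. A small Python sketch; the token function reflects my understanding of RandomPartitioner (MD5 digest read as a signed integer, then made non-negative) and the helper names are hypothetical.

```python
import hashlib


def random_partitioner_token(key: bytes) -> int:
    """RandomPartitioner-style token (assumed): the MD5 digest read as a
    signed 128-bit big-endian integer, then made non-negative."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def ring_order(keys):
    """Sort integer row keys (stored as decimal strings) by token, i.e. the
    order in which a range or index scan would return them."""
    return sorted(keys, key=lambda k: random_partitioner_token(str(k).encode()))


if __name__ == "__main__":
    keys = list(range(1, 21))
    page = ring_order(keys)[:5]
    # Correct next start: the last key of the page (start-inclusive).
    # The approach in item 3: max(page) + 1, which ignores ring order.
    print("page:", page)
    print("next start (last key):", page[-1], " max + 1:", max(page) + 1)
```

Using the last key returned (and dropping it from the next page) works regardless of partitioner, because it follows whatever order the server scans in.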

On Sun, Nov 14, 2010 at 2:34 AM, Stu Hood <st...@gmail.com> wrote:

> > Is it worth testing 0.7-branch-without-1472 to make sure of that?
> Dragos: if you have time, this would be helpful. If you already have a KEYS
> index created, you shouldn't need to re-load the data, as the file format
> hasn't changed.
>
> Thanks,
> Stu
>
> On Sat, Nov 13, 2010 at 4:40 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>
> > Is it worth testing 0.7-branch-without-1472 to make sure of that?
> >
> > On Fri, Nov 12, 2010 at 10:28 AM, Stu Hood <st...@gmail.com> wrote:
> > > Great, thanks for the variable, Dragos: I'm fairly sure I broke this in
> > > the refactoring I did in 1472 to fit in a second index type.
> > >
> > >
> > > On Fri, Nov 12, 2010 at 4:03 AM, dragos cernahoschi <dragos.cernahoschi@gmail.com> wrote:
> > >
> > >> I confirm: the KEYS indexes have the same behavior as the KEYS_BITMAP
> > >> indexes: they time out or succeed on the same queries.
> > >>
> > >> By the way, inserting my data set with KEYS_BITMAP is much faster than
> > >> with KEYS (about 5.5 times) and less GC-intensive.
> > >>
> > >> Dragos
> > >>
> > >> On Tue, Nov 9, 2010 at 8:05 PM, Stu Hood <st...@rackspace.com> wrote:
> > >>
> > >> > Interesting, thanks for the info.
> > >> >
> > >> > Perhaps the limitation is that index queries involving multiple
> > >> > clauses are currently implemented using brute-force filtering rather
> > >> > than an index join? The bitmap indexes have native support for this
> > >> > type of join, but it's not being used yet.
> > >> >
> > >> > To confirm: have you tried the same scenario with KEYS indexes? They
> > >> > use the same codepath for multiple index expressions, and should
> > >> > experience the same timeouts. Also, can you rerun the KEYS_BITMAP test
> > >> > with DEBUG logging enabled, to ensure that we aren't going into some
> > >> > kind of infinite loop?
> > >> >
> > >> > Thanks for the help,
> > >> > Stu
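Stu's hypothesis quoted above contrasts two ways of evaluating several EQ clauses: picking candidates from one index and filtering them row by row, versus joining the indexes themselves. A toy Python model of both plans, using Python integers as bitsets; it illustrates the idea only and is not the CASSANDRA-1472 implementation, file format, or query planner (which index the real code drives from is not stated here, so the sketch simply uses the first clause). All names are hypothetical.

```python
import random
from collections import defaultdict


def build_bitmap_index(rows, column):
    """value -> Python int used as a bitset; bit i is set if row i has value."""
    index = defaultdict(int)
    for row_id, row in enumerate(rows):
        index[row[column]] |= 1 << row_id
    return index


def bits_to_ids(bits):
    """Row ids of the set bits, in ascending order."""
    return [i for i in range(bits.bit_length()) if bits >> i & 1]


def bitmap_join(indexes, clauses):
    """Index join: AND the bitmaps of all EQ clauses, then read only the hits."""
    result = None
    for column, value in clauses:
        bits = indexes[column].get(value, 0)
        result = bits if result is None else result & bits
    return bits_to_ids(result or 0)


def filter_join(rows, indexes, clauses):
    """Brute-force plan: one index picks candidates, every candidate is read
    and checked against the remaining clauses. Returns (hits, rows_read)."""
    (column, value), rest = clauses[0], clauses[1:]
    candidates = bits_to_ids(indexes[column].get(value, 0))
    hits = [i for i in candidates if all(rows[i][c] == v for c, v in rest)]
    return hits, len(candidates)


if __name__ == "__main__":
    rng = random.Random(0)
    # Cardinalities 22, 17 and 10 mirror the columns described in this thread.
    rows = [{"a": rng.randrange(22), "b": rng.randrange(17), "c": rng.randrange(10)}
            for _ in range(20_000)]
    indexes = {col: build_bitmap_index(rows, col) for col in "abc"}
    clauses = [("a", 3), ("b", 5), ("c", 7)]
    hits, rows_read = filter_join(rows, indexes, clauses)
    assert hits == bitmap_join(indexes, clauses)
    print(len(hits), "matches; the filtering plan read", rows_read, "rows")
```

Both plans return the same rows; the difference is cost. The filtering plan reads every row matching its driving clause, while the bitmap plan touches only the bitmaps until the final intersection is known.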
> > >> >
> > >> > -----Original Message-----
> > >> > From: "dragos cernahoschi" <dr...@gmail.com>
> > >> > Sent: Tuesday, November 9, 2010 11:50am
> > >> > To: dev@cassandra.apache.org
> > >> > Subject: Re: CASSANDRA-1472 (bitmap indexes)
> > >> >
> > >> > I'm running the query on three columns with cardinalities 22, 17 and 10.
> > >> > Interestingly, when combining columns with these cardinalities:
> > >> >
> > >> > 22 + 17 => no exception
> > >> > 22 + 10 => no exception
> > >> > 10 + 17 => timed out exception
> > >> > 22 + 17 + 10 => timed out exception
> > >> >
> > >> >
> > >> > On Tue, Nov 9, 2010 at 6:29 PM, Stu Hood <st...@rackspace.com> wrote:
> > >> >
> > >> > > Can you tell me a little bit about your key distribution? How many
> > >> > > unique values are indexed (the cardinality)?
> > >> > >
> > >> > > Until the OrBiC projection I mention on 1472 is implemented, the
> > >> > > bitmap secondary indexes will perform terribly for high-cardinality
> > >> > > datasets.
> > >> > >
> > >> > > Thanks!
> > >> > > > >    at
> > >> > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >>
> >
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:51)
> > >> > > > >    at
> > >> > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > >> > > > >    at
> > >> > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > >> > > > >    at java.lang.Thread.run(Thread.java:662)
> > >> > > > > Caused by: java.io.FileNotFoundException:
> > >> > > > > /home/dragos/cassandra/data/keyspace/visit-e-1018-Index.db
> (Too
> > >> many
> > >> > > open
> > >> > > > > files)
> > >> > > > >    at java.io.RandomAccessFile.open(Native Method)
> > >> > > > >    at
> java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
> > >> > > > >    at
> java.io.RandomAccessFile.<init>(RandomAccessFile.java:98)
> > >> > > > >    at
> > >> > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >>
> >
> org.apache.cassandra.io.util.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:142)
> > >> > > > >    at
> > >> > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >>
> >
> org.apache.cassandra.io.util.BufferedSegmentedFile.getSegment(BufferedSegmentedFile.java:62)
> > >> > > > >    ... 16 more
> > >> > > > >
> > >> > > > > The same test worked fine with 1 million rows.
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder of Riptano, the source for professional Cassandra support
> > http://riptano.com
> >
>
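
The FileNotFoundException "(Too many open files)" in these traces is the JVM running out of file descriptors. As a minimal sketch, assuming a HotSpot or OpenJDK JVM on a Unix-like OS, the com.sun.management.UnixOperatingSystemMXBean extension can report how close a process is to that limit. The class FdHeadroom, its methods and the 90% warning threshold are hypothetical and not part of Cassandra.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class FdHeadroom {

    /** Returns {open, max} file-descriptor counts, or null if this JVM does not expose them. */
    static long[] descriptorCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount()};
        }
        return null; // e.g. Windows, or a JVM without the com.sun.management extension
    }

    /** True when at least the given fraction of a known (positive) limit is already in use. */
    static boolean nearLimit(long open, long max, double fraction) {
        return max > 0 && open >= (long) (max * fraction);
    }

    public static void main(String[] args) {
        long[] counts = descriptorCounts();
        if (counts == null) {
            System.out.println("descriptor counts are not available on this JVM");
            return;
        }
        long open = counts[0];
        long max = counts[1];
        System.out.printf("open=%d max=%d headroom=%d%n", open, max, max - open);
        if (nearLimit(open, max, 0.9)) {
            System.out.println("warning: over 90% of the descriptor limit is in use");
        }
    }
}
```

The limit itself is set outside the JVM, for example with `ulimit -n` in the shell that launches Cassandra; the value suggested on this thread was at least 2^16.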

Re: CASSANDRA-1472 (bitmap indexes)

Posted by dragos cernahoschi <dr...@gmail.com>.
No problem. I'll do the test on Monday.

On Nov 14, 2010 2:35 AM, "Stu Hood" <st...@gmail.com> wrote:

> > Is it worth testing 0.7-branch-without-1472 to make sure of that?
> Dragos: if you have time, this would be helpful. If you already have a KEYS
> index created, you shouldn't need to re-load the data, as the file format
> hasn't changed.
>
> Thanks,
> Stu
>
>
> On Sat, Nov 13, 2010 at 4:40 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>
> > Is it worth testing 0...

Re: CASSANDRA-1472 (bitmap indexes)

Posted by Stu Hood <st...@gmail.com>.
> Is it worth testing 0.7-branch-without-1472 to make sure of that?
Dragos: if you have time, this would be helpful. If you already have a KEYS
index created, you shouldn't need to re-load the data, as the file format
hasn't changed.

Thanks,
Stu

On Sat, Nov 13, 2010 at 4:40 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> Is it worth testing 0.7-branch-without-1472 to make sure of that?
>
> On Fri, Nov 12, 2010 at 10:28 AM, Stu Hood <st...@gmail.com> wrote:
> > Great, thanks for the variable, Dragos: I'm fairly sure I broke this in the
> > refactoring I did in 1472 to fit in a second index type.
> >
> >
> > On Fri, Nov 12, 2010 at 4:03 AM, dragos cernahoschi <dragos.cernahoschi@gmail.com> wrote:
> >
> >> I confirm: the KEYS indexes have the same behavior as the KEYS_BITMAP
> >> indexes: time out/succeed on the same queries.
> >>
> >> By the way, the insert of my data set with KEYS_BITMAP is much faster than
> >> KEYS (about 5.5 times) and less gc intensive.
> >>
> >> Dragos
> >>
> >> On Tue, Nov 9, 2010 at 8:05 PM, Stu Hood <st...@rackspace.com> wrote:
> >>
> >> > Interesting, thanks for the info.
> >> >
> >> > Perhaps the limitation is that index queries involving multiple clauses are
> >> > currently implemented using brute-force filtering rather than an index join?
> >> > The bitmap indexes have native support for this type of join, but it's not
> >> > being used yet.
> >> >
> >> > To confirm: have you tried the same scenario with KEYS indexes? They use
> >> > the same codepath for multiple index expressions, and should experience the
> >> > same timeouts. Also, can you rerun the KEYS_BITMAP test with DEBUG logging
> >> > enabled, to ensure that we aren't going into some kind of infinite loop?
> >> >
> >> > Thanks for the help,
> >> > Stu
> >> >
> >> > -----Original Message-----
> >> > From: "dragos cernahoschi" <dr...@gmail.com>
> >> > Sent: Tuesday, November 9, 2010 11:50am
> >> > To: dev@cassandra.apache.org
> >> > Subject: Re: CASSANDRA-1472 (bitmap indexes)
> >> >
> >> > I'm running the query on three columns with cardinalities: 22, 17 and 10.
> >> > Interesting, if combining columns with cardinalities:
> >> >
> >> > 22 + 17 => no exception
> >> > 22 + 10 => no exception
> >> > 10 + 17 => timed out exception
> >> > 22 + 17 + 10 => timed out exception
> >> >
> >> >
> >> > On Tue, Nov 9, 2010 at 6:29 PM, Stu Hood <st...@rackspace.com> wrote:
> >> >
> >> > > Can you tell me a little bit about your key distribution? How many unique
> >> > > values are indexed (the cardinality)?
> >> > >
> >> > > Until the OrBiC projection I mention on 1472 is implemented, the bitmap
> >> > > secondary indexes will perform terribly for high cardinality datasets.
> >> > >
> >> > > Thanks!
> >> > >
> >> > >
> >> > > -----Original Message-----
> >> > > From: "dragos cernahoschi" <dr...@gmail.com>
> >> > > Sent: Tuesday, November 9, 2010 10:14am
> >> > > To: dev@cassandra.apache.org
> >> > > Subject: Re: CASSANDRA-1472 (bitmap indexes)
> >> > >
> >> > > Meantime the number of SSTable(s) reduced to just 7. Initially the
> >> > > compaction thread suffered the same problem of "too many open files" and
> >> > > couldn't do any compaction.
> >> > >
> >> > > But I'm still not able to run my tests: TimedOutException :(
> >> > >
> >> > > On Tue, Nov 9, 2010 at 5:51 PM, Stu Hood <st...@rackspace.com> wrote:
> >> > >
> >> > > > Hmm, 500 sstables is definitely a degenerate case: did you disable
> >> > > > compaction? By default, Cassandra strives to keep the sstable count below
> >> > > > ~32, since accesses to separate sstables require seeks.
> >> > > >
> >> > > > In this case, the query will seek 500 times to check the secondary index
> >> > > > for each sstable: if it finds matches it will need to seek to find them in
> >> > > > the primary index, and seek again for the data file.
> >> > > >
> >> > > > -----Original Message-----
> >> > > > From: "dragos cernahoschi" <dr...@gmail.com>
> >> > > > Sent: Tuesday, November 9, 2010 5:33am
> >> > > > To: dev@cassandra.apache.org
> >> > > > Subject: Re: CASSANDRA-1472 (bitmap indexes)
> >> > > >
> >> > > > There are about 500 SSTables (12GB of data including index data,
> >> > > > statistics...) The source data file had about 3GB/26 million rows.
> >> > > >
> >> > > > I only test with EQ expressions for now.
> >> > > >
> >> > > > Increasing the file limit resolved the problem, but now I'm getting
> >> > > > TimedOutException(s) from thrift when "querying" even with slice size of 1.
> >> > > > Is my machine too small (core 2 duo 2.93 2GB RAM Ubuntu 10.04) for such a
> >> > > > test?
> >> > > >
> >> > > > I really have some interesting sets of data to test indexes with and I want
> >> > > > to make a comparison between ordinary indexes and bitmap indexes.
> >> > > >
> >> > > > Thank you,
> >> > > > Dragos
> >> > > >
> >> > > > On Mon, Nov 8, 2010 at 6:42 PM, Stu Hood <st...@rackspace.com> wrote:
> >> > > >
> >> > > > > Dragos,
> >> > > > >
> >> > > > > How many SSTables did you have on disk, and were any of your index
> >> > > > > expressions GT(E)/LT(E)?
> >> > > > >
> >> > > > > I expect that you are bumping into a limitation of the current
> >> > > > > implementation: it opens up to 128 file-handles per SSTable in the worst
> >> > > > > case for a GT/LT query (one per index bucket).
> >> > > > >
> >> > > > > A future version might remove that requirement, but for now, you should
> >> > > > > probably bump the file handle limit on your machine to at least 2^16.
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Stu
> >> > > > >
> >> > > > >
> >> > > > > -----Original Message-----
> >> > > > > From: "dragos cernahoschi" <dr...@gmail.com>
> >> > > > > Sent: Monday, November 8, 2010 10:05am
> >> > > > > To: dev@cassandra.apache.org
> >> > > > > Subject: CASSANDRA-1472 (bitmap indexes)
> >> > > > >
> >> > > > > Hi,
> >> > > > >
> >> > > > > I've got an exception during the following test:
> >> > > > >
> >> > > > > test machine: core 2 duo 2.93 2GB RAM Ubuntu 10.04
> >> > > > >
> >> > > > > test scenario:
> >> > > > > - 1 column family
> >> > > > > - about 15 columns
> >> > > > > - 7 indexed columns (bitmap)
> >> > > > > - 26 million rows (insert operation went fine)
> >> > > > > - thrift "query" on 3 of the indexed columns with get_indexed_slices (count: 100)
> >> > > > > - got the following exception:
> >> > > > >
> >> > > > > 10/11/08 17:52:40 ERROR service.AbstractCassandraDaemon: Fatal exception in thread Thread[ReadStage:3,5,main]
> >> > > > > java.io.IOError: java.io.FileNotFoundException:
> >> > > > > /home/dragos/cassandra/data/keyspace/visit-e-814-4-Bitidx.db (Too many open files)
> >> > > > >    at org.apache.cassandra.io.sstable.bitidx.SegmentIterator.open(SegmentIterator.java:78)
> >> > > > >    at org.apache.cassandra.io.sstable.bitidx.BitmapIndexReader.openBin(BitmapIndexReader.java:226)
> >> > > > >    at org.apache.cassandra.io.sstable.bitidx.BitmapIndexReader.iterator(BitmapIndexReader.java:214)
> >> > > > >    at org.apache.cassandra.io.sstable.SSTableReader.scan(SSTableReader.java:523)
> >> > > > >    at org.apache.cassandra.db.secindex.KeysBitmapIndex.iterator(KeysBitmapIndex.java:103)
> >> > > > >    at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1371)
> >> > > > >    at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:41)
> >> > > > >    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:51)
> >> > > > >    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >> > > > >    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >> > > > >    at java.lang.Thread.run(Thread.java:662)
> >> > > > > Caused by: java.io.FileNotFoundException:
> >> > > > > /home/dragos/cassandra/data/keyspace/visit-e-814-4-Bitidx.db (Too many open files)
> >> > > > >    at java.io.FileInputStream.open(Native Method)
> >> > > > >    at java.io.FileInputStream.<init>(FileInputStream.java:106)
> >> > > > >    at org.apache.avro.file.SeekableFileInput.<init>(SeekableFileInput.java:29)
> >> > > > >    at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:38)
> >> > > > >    at org.apache.cassandra.io.sstable.bitidx.SegmentIterator.open(SegmentIterator.java:72)
> >> > > > >    ... 10 more
> >> > > > > 10/11/08 17:52:40 ERROR service.AbstractCassandraDaemon: Fatal exception in thread Thread[ReadStage:2,5,main]
> >> > > > > java.io.IOError: java.io.FileNotFoundException:
> >> > > > > /home/dragos/cassandra/data/keyspace/visit-e-1018-Index.db (Too many open files)
> >> > > > >    at org.apache.cassandra.io.util.BufferedSegmentedFile.getSegment(BufferedSegmentedFile.java:68)
> >> > > > >    at org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:129)
> >> > > > >    at org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:1)
> >> > > > >    at org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:455)
> >> > > > >    at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:572)
> >> > > > >    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:49)
> >> > > > >    at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:72)
> >> > > > >    at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:84)
> >> > > > >    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1190)
> >> > > > >    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1082)
> >> > > > >    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1052)
> >> > > > >    at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1378)
> >> > > > >    at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:41)
> >> > > > >    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:51)
> >> > > > >    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >> > > > >    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >> > > > >    at java.lang.Thread.run(Thread.java:662)
> >> > > > > Caused by: java.io.FileNotFoundException:
> >> > > > > /home/dragos/cassandra/data/keyspace/visit-e-1018-Index.db (Too many open files)
> >> > > > >    at java.io.RandomAccessFile.open(Native Method)
> >> > > > >    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
> >> > > > >    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:98)
> >> > > > >    at org.apache.cassandra.io.util.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:142)
> >> > > > >    at org.apache.cassandra.io.util.BufferedSegmentedFile.getSegment(BufferedSegmentedFile.java:62)
> >> > > > >    ... 16 more
> >> > > > >
> >> > > > > The same test worked fine with 1 million rows.
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>
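
Stu's estimates quoted above can be turned into rough arithmetic: up to 128 file handles per SSTable for a GT/LT bitmap query (one per index bucket), and, for an index query, one secondary-index seek per SSTable followed by a primary-index seek and a data-file seek for each match. The sketch below is a simplified model built only on those statements; the class SSTableCostEstimate and its methods are hypothetical. Dragos reported using only EQ expressions, so the 128-bucket figure is a worst-case bound rather than what his queries necessarily hit.

```java
public class SSTableCostEstimate {

    // Worst case described on the thread for a GT/LT bitmap query: one handle per index bucket.
    static final int BUCKETS_PER_SSTABLE = 128;

    /** Upper bound on file handles one GT/LT bitmap-index query may hold open at once. */
    static long worstCaseHandles(int sstables) {
        return (long) sstables * BUCKETS_PER_SSTABLE;
    }

    /**
     * Simplified seek count for an index query: one seek per sstable to probe its
     * secondary index, then one primary-index seek and one data-file seek per matching row.
     * Ignores caching, bloom filters and rows spread over several sstables.
     */
    static long estimatedSeeks(int sstables, long matchingRows) {
        return sstables + 2 * matchingRows;
    }

    public static void main(String[] args) {
        long suggestedLimit = 1L << 16; // the "at least 2^16" handle limit suggested on the thread
        for (int sstables : new int[] {500, 32, 7}) {
            long handles = worstCaseHandles(sstables);
            System.out.printf("%d sstables: up to %d handles (%s 2^16), ~%d seeks for 100 matches%n",
                    sstables, handles, handles <= suggestedLimit ? "within" : "above",
                    estimatedSeeks(sstables, 100));
        }
    }
}
```

With about 500 SSTables the worst case is 500 × 128 = 64,000 handles, just under 2^16 = 65,536, before counting descriptors for the data and primary-index files or for concurrent reads (two ReadStage threads failed in the traces). At the ~32 SSTables Cassandra normally aims for, the bound drops to 4,096, and at the 7 SSTables left after compaction it is 896.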

Re: CASSANDRA-1472 (bitmap indexes)

Posted by Jonathan Ellis <jb...@gmail.com>.
Is it worth testing 0.7-branch-without-1472 to make sure of that?

On Fri, Nov 12, 2010 at 10:28 AM, Stu Hood <st...@gmail.com> wrote:
> Great, thanks for the variable, Dragos: I'm fairly sure I broke this in the
> refactoring I did in 1472 to fit in a second index type.
>
>
> On Fri, Nov 12, 2010 at 4:03 AM, dragos cernahoschi <dragos.cernahoschi@gmail.com> wrote:
>
>> I confirm: the KEYS indexes have the same behavior as the KEYS_BITMAP
>> indexes: time out/succeed on the same queries.
>>
>> By the way, the insert of my data set with KEYS_BITMAP is much faster than
>> KEYS (about 5.5 times) and less gc intensive.
>>
>> Dragos
>>
>> On Tue, Nov 9, 2010 at 8:05 PM, Stu Hood <st...@rackspace.com> wrote:
>>
>> > Interesting, thanks for the info.
>> >
>> > Perhaps the limitation is that index queries involving multiple clauses are
>> > currently implemented using brute-force filtering rather than an index join?
>> > The bitmap indexes have native support for this type of join, but it's not
>> > being used yet.
>> >
>> > To confirm: have you tried the same scenario with KEYS indexes? They use
>> > the same codepath for multiple index expressions, and should experience the
>> > same timeouts. Also, can you rerun the KEYS_BITMAP test with DEBUG logging
>> > enabled, to ensure that we aren't going into some kind of infinite loop?
>> >
>> > Thanks for the help,
>> > Stu
>> >
>> > -----Original Message-----
>> > From: "dragos cernahoschi" <dr...@gmail.com>
>> > Sent: Tuesday, November 9, 2010 11:50am
>> > To: dev@cassandra.apache.org
>> > Subject: Re: CASSANDRA-1472 (bitmap indexes)
>> >
>> > I'm running the query on three columns with cardinalities: 22, 17 and 10.
>> > Interesting, if combining columns with cardinalities:
>> >
>> > 22 + 17 => no exception
>> > 22 + 10 => no exception
>> > 10 + 17 => timed out exception
>> > 22 + 17 + 10 => timed out exception
>> >
>> >
>> > On Tue, Nov 9, 2010 at 6:29 PM, Stu Hood <st...@rackspace.com> wrote:
>> >
>> > > Can you tell me a little bit about your key distribution? How many unique
>> > > values are indexed (the cardinality)?
>> > >
>> > > Until the OrBiC projection I mention on 1472 is implemented, the bitmap
>> > > secondary indexes will perform terribly for high cardinality datasets.
>> > >
>> > > Thanks!
>> > >
>> > >
>> > > -----Original Message-----
>> > > From: "dragos cernahoschi" <dr...@gmail.com>
>> > > Sent: Tuesday, November 9, 2010 10:14am
>> > > To: dev@cassandra.apache.org
>> > > Subject: Re: CASSANDRA-1472 (bitmap indexes)
>> > >
>> > > Meantime the number of SSTable(s) reduced to just 7. Initially the
>> > > compaction thread suffered the same problem of "too many open files" and
>> > > couldn't do any compaction.
>> > >
>> > > But I'm still not able to run my tests: TimedOutException :(
>> > >
>> > > On Tue, Nov 9, 2010 at 5:51 PM, Stu Hood <st...@rackspace.com> wrote:
>> > >
>> > > > Hmm, 500 sstables is definitely a degenerate case: did you disable
>> > > > compaction? By default, Cassandra strives to keep the sstable count below
>> > > > ~32, since accesses to separate sstables require seeks.
>> > > >
>> > > > In this case, the query will seek 500 times to check the secondary index
>> > > > for each sstable: if it finds matches it will need to seek to find them in
>> > > > the primary index, and seek again for the data file.
>> > > >
>> > > > -----Original Message-----
>> > > > From: "dragos cernahoschi" <dr...@gmail.com>
>> > > > Sent: Tuesday, November 9, 2010 5:33am
>> > > > To: dev@cassandra.apache.org
>> > > > Subject: Re: CASSANDRA-1472 (bitmap indexes)
>> > > >
>> > > > There are about 500 SSTables (12GB of data including index data,
>> > > > statistics...) The source data file had about 3GB/26 million rows.
>> > > >
>> > > > I only test with EQ expressions for now.
>> > > >
>> > > > Increasing the file limit resolved the problem, but now I'm getting
>> > > > TimedOutException(s) from thrift when "querying" even with slice size of 1.
>> > > > Is my machine too small (core 2 duo 2.93 2GB RAM Ubuntu 10.04) for such a
>> > > > test?
>> > > >
>> > > > I really have some interesting sets of data to test indexes with and I want
>> > > > to make a comparison between ordinary indexes and bitmap indexes.
>> > > >
>> > > > Thank you,
>> > > > Dragos
>> > > >
>> > > > On Mon, Nov 8, 2010 at 6:42 PM, Stu Hood <st...@rackspace.com> wrote:
>> > > >
>> > > > > Dragos,
>> > > > >
>> > > > > How many SSTables did you have on disk, and were any of your index
>> > > > > expressions GT(E)/LT(E)?
>> > > > >
>> > > > > I expect that you are bumping into a limitation of the current
>> > > > > implementation: it opens up to 128 file-handles per SSTable in the worst
>> > > > > case for a GT/LT query (one per index bucket).
>> > > > >
>> > > > > A future version might remove that requirement, but for now, you should
>> > > > > probably bump the file handle limit on your machine to at least 2^16.
>> > > > >
>> > > > > Thanks,
>> > > > > Stu
>> > > > >
>> > > > >
>> > > > > -----Original Message-----
>> > > > > From: "dragos cernahoschi" <dr...@gmail.com>
>> > > > > Sent: Monday, November 8, 2010 10:05am
>> > > > > To: dev@cassandra.apache.org
>> > > > > Subject: CASSANDRA-1472 (bitmap indexes)
>> > > > >
>> > > > > Hi,
>> > > > >
>> > > > > I've got an exception during the following test:
>> > > > >
>> > > > > test machine: core 2 duo 2.93 2GB RAM Ubuntu 10.04
>> > > > >
>> > > > > test scenario:
>> > > > > - 1 column family
>> > > > > - about 15 columns
>> > > > > - 7 indexed columns (bitmap)
>> > > > > - 26 million rows (insert operation went fine)
>> > > > > - thrift "query" on 3 of the indexed columns with get_indexed_slices (count: 100)
>> > > > > - got the following exception:
>> > > > >
>> > > > > 10/11/08 17:52:40 ERROR service.AbstractCassandraDaemon: Fatal exception in thread Thread[ReadStage:3,5,main]
>> > > > > java.io.IOError: java.io.FileNotFoundException:
>> > > > > /home/dragos/cassandra/data/keyspace/visit-e-814-4-Bitidx.db (Too many open files)
>> > > > >    at org.apache.cassandra.io.sstable.bitidx.SegmentIterator.open(SegmentIterator.java:78)
>> > > > >    at org.apache.cassandra.io.sstable.bitidx.BitmapIndexReader.openBin(BitmapIndexReader.java:226)
>> > > > >    at org.apache.cassandra.io.sstable.bitidx.BitmapIndexReader.iterator(BitmapIndexReader.java:214)
>> > > > >    at org.apache.cassandra.io.sstable.SSTableReader.scan(SSTableReader.java:523)
>> > > > >    at org.apache.cassandra.db.secindex.KeysBitmapIndex.iterator(KeysBitmapIndex.java:103)
>> > > > >    at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1371)
>> > > > >    at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:41)
>> > > > >    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:51)
>> > > > >    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> > > > >    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> > > > >    at java.lang.Thread.run(Thread.java:662)
>> > > > > Caused by: java.io.FileNotFoundException:
>> > > > > /home/dragos/cassandra/data/keyspace/visit-e-814-4-Bitidx.db (Too many open files)
>> > > > >    at java.io.FileInputStream.open(Native Method)
>> > > > >    at java.io.FileInputStream.<init>(FileInputStream.java:106)
>> > > > >    at org.apache.avro.file.SeekableFileInput.<init>(SeekableFileInput.java:29)
>> > > > >    at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:38)
>> > > > >    at org.apache.cassandra.io.sstable.bitidx.SegmentIterator.open(SegmentIterator.java:72)
>> > > > >    ... 10 more
>> > > > > 10/11/08 17:52:40 ERROR service.AbstractCassandraDaemon: Fatal exception in thread Thread[ReadStage:2,5,main]
>> > > > > java.io.IOError: java.io.FileNotFoundException:
>> > > > > /home/dragos/cassandra/data/keyspace/visit-e-1018-Index.db (Too many open files)
>> > > > >    at org.apache.cassandra.io.util.BufferedSegmentedFile.getSegment(BufferedSegmentedFile.java:68)
>> > > > >    at org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:129)
>> > > > >    at org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:1)
>> > > > >    at org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:455)
>> > > > >    at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:572)
>> > > > >    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:49)
>> > > > >    at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:72)
>> > > > >    at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:84)
>> > > > >    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1190)
>> > > > >    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1082)
>> > > > >    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1052)
>> > > > >    at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1378)
>> > > > >    at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:41)
>> > > > >    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:51)
>> > > > >    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> > > > >    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> > > > >    at java.lang.Thread.run(Thread.java:662)
>> > > > > Caused by: java.io.FileNotFoundException:
>> > > > > /home/dragos/cassandra/data/keyspace/visit-e-1018-Index.db (Too many open files)
>> > > > >    at java.io.RandomAccessFile.open(Native Method)
>> > > > >    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
>> > > > >    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:98)
>> > > > >    at org.apache.cassandra.io.util.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:142)
>> > > > >    at org.apache.cassandra.io.util.BufferedSegmentedFile.getSegment(BufferedSegmentedFile.java:62)
>> > > > >    ... 16 more
>> > > > >
>> > > > > The same test worked fine with 1 million rows.
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
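
Stu's remark, quoted above, that multi-clause index queries are currently answered by brute-force filtering rather than an index join can be illustrated with a toy in-memory model. This is a minimal sketch using java.util.BitSet, not the CASSANDRA-1472 code: the class BitmapJoinSketch, its methods and the sample columns are hypothetical. filterScan uses one index and then checks every candidate row against the second clause; bitmapAnd intersects the two per-value bitmaps without touching row data.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BitmapJoinSketch {

    /** Toy bitmap index: one BitSet of row ids per distinct column value. */
    static Map<String, BitSet> buildIndex(String[] column) {
        Map<String, BitSet> index = new HashMap<>();
        for (int row = 0; row < column.length; row++) {
            index.computeIfAbsent(column[row], v -> new BitSet()).set(row);
        }
        return index;
    }

    /** Brute-force style: use the first index, then read and filter every candidate row. */
    static List<Integer> filterScan(Map<String, BitSet> firstIndex, String firstValue,
                                    String[] secondColumn, String secondValue) {
        List<Integer> out = new ArrayList<>();
        BitSet candidates = firstIndex.getOrDefault(firstValue, new BitSet());
        for (int row = candidates.nextSetBit(0); row >= 0; row = candidates.nextSetBit(row + 1)) {
            if (secondColumn[row].equals(secondValue)) { // stands in for fetching the row
                out.add(row);
            }
        }
        return out;
    }

    /** Index join: intersect the two bitmaps, then list the surviving row ids. */
    static List<Integer> bitmapAnd(Map<String, BitSet> a, String va, Map<String, BitSet> b, String vb) {
        BitSet result = (BitSet) a.getOrDefault(va, new BitSet()).clone();
        result.and(b.getOrDefault(vb, new BitSet()));
        List<Integer> out = new ArrayList<>();
        for (int row = result.nextSetBit(0); row >= 0; row = result.nextSetBit(row + 1)) {
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] country = {"ro", "us", "ro", "de", "ro"};
        String[] browser = {"ff", "ff", "ie", "ff", "ff"};
        Map<String, BitSet> byCountry = buildIndex(country);
        Map<String, BitSet> byBrowser = buildIndex(browser);
        System.out.println(filterScan(byCountry, "ro", browser, "ff"));
        System.out.println(bitmapAnd(byCountry, "ro", byBrowser, "ff"));
        System.out.println("bitmaps for country: " + byCountry.size());
    }
}
```

Both calls print [0, 4]. The index map holds one bitmap per distinct value (3 for the sample country column), so its size grows with the column's cardinality, consistent with Stu's note that cardinality matters for the bitmap indexes.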

Re: CASSANDRA-1472 (bitmap indexes)

Posted by Stu Hood <st...@gmail.com>.
Great, thanks for the variable, Dragos: I'm fairly sure I broke this in the
refactoring I did in 1472 to fit in a second index type.


On Fri, Nov 12, 2010 at 4:03 AM, dragos cernahoschi <dragos.cernahoschi@gmail.com> wrote:

> I confirm: the KEYS indexes have the same behavior as the KEYS_BITMAP
> indexes: time out/succeed on the same queries.
>
> By the way, the insert of my data set with KEYS_BITMAP is much faster than
> KEYS (about 5.5 times) and less gc intensive.
>
> Dragos
>
> On Tue, Nov 9, 2010 at 8:05 PM, Stu Hood <st...@rackspace.com> wrote:
>
> > Interesting, thanks for the info.
> >
> > Perhaps the limitation is that index queries involving multiple clauses are
> > currently implemented using brute-force filtering rather than an index join?
> > The bitmap indexes have native support for this type of join, but it's not
> > being used yet.
> >
> > To confirm: have you tried the same scenario with KEYS indexes? They use
> > the same codepath for multiple index expressions, and should experience the
> > same timeouts. Also, can you rerun the KEYS_BITMAP test with DEBUG logging
> > enabled, to ensure that we aren't going into some kind of infinite loop?
> >
> > Thanks for the help,
> > Stu
> >
> > -----Original Message-----
> > From: "dragos cernahoschi" <dr...@gmail.com>
> > Sent: Tuesday, November 9, 2010 11:50am
> > To: dev@cassandra.apache.org
> > Subject: Re: CASSANDRA-1472 (bitmap indexes)
> >
> > I'm running the query on three columns with cardinalities: 22, 17 and 10.
> > Interesting, if combining columns with cardinalities:
> >
> > 22 + 17 => no exception
> > 22 + 10 => no exception
> > 10 + 17 => timed out exception
> > 22 + 17 + 10 => timed out exception
> >
> >
> > On Tue, Nov 9, 2010 at 6:29 PM, Stu Hood <st...@rackspace.com> wrote:
> >
> > > Can you tell me a little bit about your key distribution? How many
> unique
> > > values are indexed (the cardinality)?
> > >
> > > Until the OrBiC projection I mention on 1472 is implemented, the bitmap
> > > secondary indexes will perform terribly for high cardinality datasets.
> > >
> > > Thanks!
> > >
> > >
> > > -----Original Message-----
> > > From: "dragos cernahoschi" <dr...@gmail.com>
> > > Sent: Tuesday, November 9, 2010 10:14am
> > > To: dev@cassandra.apache.org
> > > Subject: Re: CASSANDRA-1472 (bitmap indexes)
> > >
> > > Meantime the number of SSTable(s) reduced to just 7. Initially the
> > > compaction thread suffered the same problem of "too many open files"
> and
> > > couldn't do any compaction.
> > >
> > > But I'm still not able to run my tests: TimedOutException :(
> > >
> > > On Tue, Nov 9, 2010 at 5:51 PM, Stu Hood <st...@rackspace.com>
> wrote:
> > >
> > > > Hmm, 500 sstables is definitely a degenerate case: did you disable
> > > > compaction? By default, Cassandra strives to keep the sstable count
> > below
> > > > ~32, since accesses to separate sstables require seeks.
> > > >
> > > > In this case, the query will seek 500 times to check the secondary
> > index
> > > > for each sstable: if it finds matches it will need to seek to find
> them
> > > in
> > > > the primary index, and seek again for the data file.
> > > >
> > > > -----Original Message-----
> > > > From: "dragos cernahoschi" <dr...@gmail.com>
> > > > Sent: Tuesday, November 9, 2010 5:33am
> > > > To: dev@cassandra.apache.org
> > > > Subject: Re: CASSANDRA-1472 (bitmap indexes)
> > > >
> > > > There are about 500 SSTables (12GB of data including index data,
> > > > statistics...) The source data file had about 3GB/26 million rows.
> > > >
> > > > I only test with EQ expressions for now.
> > > >
> > > > Increasing the file limit resolved the problem, but now I'm getting
> > > > TimedOutException(s) from thrift when "querying" even with slice size
> > of
> > > 1.
> > > > Is my machine too small (core 2 duo 2.93 2GB RAM Ubuntu 10.04) for
> such
> > a
> > > > test?
> > > >
> > > > I really have some interesting sets of data to test indexes with and
> I
> > > want
> > > > to make a comparison between ordinary indexes and bitmap indexes.
> > > >
> > > > Thank you,
> > > > Dragos
> > > >
> > > > On Mon, Nov 8, 2010 at 6:42 PM, Stu Hood <st...@rackspace.com>
> > wrote:
> > > >
> > > > > Dragos,
> > > > >
> > > > > How many SSTables did you have on disk, and were any of your index
> > > > > expressions GT(E)/LT(E)?
> > > > >
> > > > > I expect that you are bumping into a limitation of the current
> > > > > implementation: it opens up to 128 file-handles per SSTable in the
> > > worst
> > > > > case for a GT/LT query (one per index bucket).
> > > > >
> > > > > A future version might remove that requirement, but for now, you
> > should
> > > > > probably bump the file handle limit on your machine to at least
> 2^16.
> > > > >
> > > > > Thanks,
> > > > > Stu
> > > > >
> > > > >
> > > > > -----Original Message-----
> > > > > From: "dragos cernahoschi" <dr...@gmail.com>
> > > > > Sent: Monday, November 8, 2010 10:05am
> > > > > To: dev@cassandra.apache.org
> > > > > Subject: CASSANDRA-1472 (bitmap indexes)
> > > > >
> > > > > Hi,
> > > > >
> > > > > I've got an exception during the following test:
> > > > >
> > > > > test machine: core 2 duo 2.93 2GB RAM Ubuntu 10.04
> > > > >
> > > > > test scenario:
> > > > > - 1 column family
> > > > > - about 15 columns
> > > > > - 7 indexed columns (bitmap)
> > > > > - 26 million rows (insert operation went fine)
> > > > > - thrift "query" on 3 of the indexed columns with
> get_indexed_slices
> > > > > (count:
> > > > > 100)
> > > > > - got the following exception:
> > > > >
> > > > > 10/11/08 17:52:40 ERROR service.AbstractCassandraDaemon: Fatal
> > > exception
> > > > in
> > > > > thread Thread[ReadStage:3,5,main]
> > > > > java.io.IOError: java.io.FileNotFoundException:
> > > > > /home/dragos/cassandra/data/keyspace/visit-e-814-4-Bitidx.db (Too
> > many
> > > > open
> > > > > files)
> > > > >    at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cassandra.io.sstable.bitidx.SegmentIterator.open(SegmentIterator.java:78)
> > > > >    at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cassandra.io.sstable.bitidx.BitmapIndexReader.openBin(BitmapIndexReader.java:226)
> > > > >    at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cassandra.io.sstable.bitidx.BitmapIndexReader.iterator(BitmapIndexReader.java:214)
> > > > >    at
> > > > >
> > > >
> > >
> >
> org.apache.cassandra.io.sstable.SSTableReader.scan(SSTableReader.java:523)
> > > > >    at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cassandra.db.secindex.KeysBitmapIndex.iterator(KeysBitmapIndex.java:103)
> > > > >    at
> > > > >
> > > >
> > >
> >
> org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1371)
> > > > >    at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:41)
> > > > >    at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:51)
> > > > >    at
> > > > >
> > > > >
> > > >
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > > > >    at
> > > > >
> > > > >
> > > >
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > > > >    at java.lang.Thread.run(Thread.java:662)
> > > > > Caused by: java.io.FileNotFoundException:
> > > > > /home/dragos/cassandra/data/keyspace/visit-e-814-4-Bitidx.db (Too
> > many
> > > > open
> > > > > files)
> > > > >    at java.io.FileInputStream.open(Native Method)
> > > > >    at java.io.FileInputStream.<init>(FileInputStream.java:106)
> > > > >    at
> > > > >
> > >
> org.apache.avro.file.SeekableFileInput.<init>(SeekableFileInput.java:29)
> > > > >    at
> > > org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:38)
> > > > >    at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cassandra.io.sstable.bitidx.SegmentIterator.open(SegmentIterator.java:72)
> > > > >    ... 10 more
> > > > > 10/11/08 17:52:40 ERROR service.AbstractCassandraDaemon: Fatal
> > > exception
> > > > in
> > > > > thread Thread[ReadStage:2,5,main]
> > > > > java.io.IOError: java.io.FileNotFoundException:
> > > > > /home/dragos/cassandra/data/keyspace/visit-e-1018-Index.db (Too
> many
> > > open
> > > > > files)
> > > > >    at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cassandra.io.util.BufferedSegmentedFile.getSegment(BufferedSegmentedFile.java:68)
> > > > >    at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:129)
> > > > >    at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:1)
> > > > >    at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:455)
> > > > >    at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:572)
> > > > >    at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:49)
> > > > >    at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:72)
> > > > >    at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:84)
> > > > >    at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1190)
> > > > >    at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1082)
> > > > >    at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1052)
> > > > >    at
> > > > >
> > > >
> > >
> >
> org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1378)
> > > > >    at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:41)
> > > > >    at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:51)
> > > > >    at
> > > > >
> > > > >
> > > >
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > > > >    at
> > > > >
> > > > >
> > > >
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > > > >    at java.lang.Thread.run(Thread.java:662)
> > > > > Caused by: java.io.FileNotFoundException:
> > > > > /home/dragos/cassandra/data/keyspace/visit-e-1018-Index.db (Too
> many
> > > open
> > > > > files)
> > > > >    at java.io.RandomAccessFile.open(Native Method)
> > > > >    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
> > > > >    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:98)
> > > > >    at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cassandra.io.util.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:142)
> > > > >    at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cassandra.io.util.BufferedSegmentedFile.getSegment(BufferedSegmentedFile.java:62)
> > > > >    ... 16 more
> > > > >
> > > > > The same test worked fine with 1 million rows.
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > >
> > >
> > >
> >
> >
> >
>

Re: CASSANDRA-1472 (bitmap indexes)

Posted by dragos cernahoschi <dr...@gmail.com>.
I confirm: the KEYS indexes have the same behavior as the KEYS_BITMAP
indexes: time out/succeed on the same queries.

By the way, the insert of my data set with KEYS_BITMAP is much faster than
KEYS (about 5.5 times) and less gc intensive.

Dragos

On Tue, Nov 9, 2010 at 8:05 PM, Stu Hood <st...@rackspace.com> wrote:

> Interesting, thanks for the info.
>
> Perhaps the limitation is that index queries involving multiple clauses are
> currently implemented using brute-force filtering rather than an index join?
> The bitmap indexes have native support for this type of join, but it's not
> being used yet.
>
> To confirm: have you tried the same scenario with KEYS indexes? They use
> the same codepath for multiple index expressions, and should experience the
> same timeouts. Also, can you rerun the KEYS_BITMAP test with DEBUG logging
> enabled, to ensure that we aren't going into some kind of infinite loop?
>
> Thanks for the help,
> Stu
>
> -----Original Message-----
> From: "dragos cernahoschi" <dr...@gmail.com>
> Sent: Tuesday, November 9, 2010 11:50am
> To: dev@cassandra.apache.org
> Subject: Re: CASSANDRA-1472 (bitmap indexes)
>
> I'm running the query on three columns with cardinalities: 22, 17 and 10.
> Interesting, if combining columns with cardinalities:
>
> 22 + 17 => no exception
> 22 + 10 => no exception
> 10 + 17 => timed out exception
> 22 + 17 + 10 => timed out exception
>
>
> On Tue, Nov 9, 2010 at 6:29 PM, Stu Hood <st...@rackspace.com> wrote:
>
> > Can you tell me a little bit about your key distribution? How many unique
> > values are indexed (the cardinality)?
> >
> > Until the OrBiC projection I mention on 1472 is implemented, the bitmap
> > secondary indexes will perform terribly for high cardinality datasets.
> >
> > Thanks!
> >
> >
> > -----Original Message-----
> > From: "dragos cernahoschi" <dr...@gmail.com>
> > Sent: Tuesday, November 9, 2010 10:14am
> > To: dev@cassandra.apache.org
> > Subject: Re: CASSANDRA-1472 (bitmap indexes)
> >
> > Meantime the number of SSTable(s) reduced to just 7. Initially the
> > compaction thread suffered the same problem of "too many open files" and
> > couldn't do any compaction.
> >
> > But I'm still not able to run my tests: TimedOutException :(
> >
> > On Tue, Nov 9, 2010 at 5:51 PM, Stu Hood <st...@rackspace.com> wrote:
> >
> > > Hmm, 500 sstables is definitely a degenerate case: did you disable
> > > compaction? By default, Cassandra strives to keep the sstable count
> > > below ~32, since accesses to separate sstables require seeks.
> > >
> > > In this case, the query will seek 500 times to check the secondary
> > > index for each sstable: if it finds matches it will need to seek to
> > > find them in the primary index, and seek again for the data file.
> > >
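The seek pattern in the quoted reply above can be turned into a rough upper bound. What follows is a minimal Java sketch under stated assumptions. `SeekEstimate` and `worstCaseSeeks` are hypothetical names, not Cassandra code. The sketch assumes each SSTable's secondary index is probed once and each match costs exactly two more seeks (primary index, then data file). It ignores caches, bloom filters and index sampling.

```java
public class SeekEstimate {

    // Hypothetical helper (not part of Cassandra): upper bound on disk seeks
    // for one indexed read. "matches" counts (row, SSTable) pairs reported by
    // the secondary indexes; each costs a primary-index seek plus a data-file seek.
    public static long worstCaseSeeks(int sstables, long matches) {
        long secondaryIndexProbes = sstables; // one probe per SSTable
        long rowLookups = 2L * matches;       // primary index + data file per match
        return secondaryIndexProbes + rowLookups;
    }

    public static void main(String[] args) {
        long matches = 100; // assumption: the 100 requested rows each match in one SSTable
        for (int sstables : new int[] {500, 32, 7}) {
            System.out.println(sstables + " SSTables -> at most "
                    + worstCaseSeeks(sstables, matches) + " seeks");
        }
    }
}
```

With 100 matches, this gives 700 seeks for 500 SSTables and 207 for the 7 SSTables left after compaction. That explains why the SSTable count matters first. It does not explain the later timeouts, which persisted at 7 SSTables.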
> > > -----Original Message-----
> > > From: "dragos cernahoschi" <dr...@gmail.com>
> > > Sent: Tuesday, November 9, 2010 5:33am
> > > To: dev@cassandra.apache.org
> > > Subject: Re: CASSANDRA-1472 (bitmap indexes)
> > >
> > > There are about 500 SSTables (12GB of data including index data,
> > > statistics...) The source data file had about 3GB/26 million rows.
> > >
> > > I only test with EQ expressions for now.
> > >
> > > Increasing the file limit resolved the problem, but now I'm getting
> > > TimedOutException(s) from thrift when "querying" even with slice size
> > > of 1. Is my machine too small (core 2 duo 2.93 2GB RAM Ubuntu 10.04)
> > > for such a test?
> > >
> > > I really have some interesting sets of data to test indexes with and I
> > > want to make a comparison between ordinary indexes and bitmap indexes.
> > >
> > > Thank you,
> > > Dragos
> > >
> > > On Mon, Nov 8, 2010 at 6:42 PM, Stu Hood <st...@rackspace.com> wrote:
> > >
> > > > Dragos,
> > > >
> > > > How many SSTables did you have on disk, and were any of your index
> > > > expressions GT(E)/LT(E)?
> > > >
> > > > I expect that you are bumping into a limitation of the current
> > > > implementation: it opens up to 128 file-handles per SSTable in the
> > > > worst case for a GT/LT query (one per index bucket).
> > > >
> > > > A future version might remove that requirement, but for now, you
> > > > should probably bump the file handle limit on your machine to at
> > > > least 2^16.
> > > >
> > > > Thanks,
> > > > Stu
> > > >
> > > >
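The file-handle arithmetic behind the 2^16 suggestion in the quoted reply above can be checked directly. This is a minimal sketch, and it rests on several assumptions:

* It uses the worst case described there: one handle per index bucket, with up to 128 buckets per SSTable, for GT/LT expressions only.
* It uses the ~500 SSTables reported elsewhere in the thread. Dragos only used EQ expressions, so his runs were probably below this bound.
* `FileHandleBudget` is a hypothetical name.
* The descriptor limit is read through `com.sun.management.UnixOperatingSystemMXBean`, which HotSpot/OpenJDK provide on Unix-like systems. Other JVMs may not, hence the -1 fallback.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class FileHandleBudget {

    // Worst case from the quoted reply: up to 128 index buckets per SSTable,
    // one open handle per bucket, for a GT/LT expression.
    public static final int MAX_BUCKETS_PER_SSTABLE = 128;

    // Hypothetical helper: handles needed for the bitmap bins alone.
    public static long worstCaseHandles(int sstables, int bucketsPerSSTable) {
        return (long) sstables * bucketsPerSSTable;
    }

    // Returns the JVM process's descriptor limit, or -1 if this JVM/platform
    // does not expose it through the Unix-specific MXBean.
    public static long maxFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getMaxFileDescriptorCount();
        }
        return -1;
    }

    public static void main(String[] args) {
        long needed = worstCaseHandles(500, MAX_BUCKETS_PER_SSTABLE);
        long limit = maxFileDescriptors();
        System.out.println("worst case for 500 SSTables: " + needed + " handles");
        System.out.println("suggested limit 2^16 = " + (1L << 16));
        System.out.println("current JVM limit: " + (limit < 0 ? "unknown" : String.valueOf(limit)));
    }
}
```

That is 500 × 128 = 64,000 handles for the bitmap bins alone, just under 2^16 = 65,536. The real total is higher, because Data/Index files, the commit log and sockets also consume descriptors.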
> > > > -----Original Message-----
> > > > From: "dragos cernahoschi" <dr...@gmail.com>
> > > > Sent: Monday, November 8, 2010 10:05am
> > > > To: dev@cassandra.apache.org
> > > > Subject: CASSANDRA-1472 (bitmap indexes)
> > > >
> > > > Hi,
> > > >
> > > > I've got an exception during the following test:
> > > >
> > > > test machine: core 2 duo 2.93 2GB RAM Ubuntu 10.04
> > > >
> > > > test scenario:
> > > > - 1 column family
> > > > - about 15 columns
> > > > - 7 indexed columns (bitmap)
> > > > - 26 million rows (insert operation went fine)
> > > > - thrift "query" on 3 of the indexed columns with get_indexed_slices (count: 100)
> > > > - got the following exception:
> > > >
> > > > 10/11/08 17:52:40 ERROR service.AbstractCassandraDaemon: Fatal exception in thread Thread[ReadStage:3,5,main]
> > > > java.io.IOError: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-814-4-Bitidx.db (Too many open files)
> > > >    at org.apache.cassandra.io.sstable.bitidx.SegmentIterator.open(SegmentIterator.java:78)
> > > >    at org.apache.cassandra.io.sstable.bitidx.BitmapIndexReader.openBin(BitmapIndexReader.java:226)
> > > >    at org.apache.cassandra.io.sstable.bitidx.BitmapIndexReader.iterator(BitmapIndexReader.java:214)
> > > >    at org.apache.cassandra.io.sstable.SSTableReader.scan(SSTableReader.java:523)
> > > >    at org.apache.cassandra.db.secindex.KeysBitmapIndex.iterator(KeysBitmapIndex.java:103)
> > > >    at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1371)
> > > >    at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:41)
> > > >    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:51)
> > > >    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > > >    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > > >    at java.lang.Thread.run(Thread.java:662)
> > > > Caused by: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-814-4-Bitidx.db (Too many open files)
> > > >    at java.io.FileInputStream.open(Native Method)
> > > >    at java.io.FileInputStream.<init>(FileInputStream.java:106)
> > > >    at org.apache.avro.file.SeekableFileInput.<init>(SeekableFileInput.java:29)
> > > >    at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:38)
> > > >    at org.apache.cassandra.io.sstable.bitidx.SegmentIterator.open(SegmentIterator.java:72)
> > > >    ... 10 more
> > > > 10/11/08 17:52:40 ERROR service.AbstractCassandraDaemon: Fatal exception in thread Thread[ReadStage:2,5,main]
> > > > java.io.IOError: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-1018-Index.db (Too many open files)
> > > >    at org.apache.cassandra.io.util.BufferedSegmentedFile.getSegment(BufferedSegmentedFile.java:68)
> > > >    at org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:129)
> > > >    at org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:1)
> > > >    at org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:455)
> > > >    at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:572)
> > > >    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:49)
> > > >    at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:72)
> > > >    at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:84)
> > > >    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1190)
> > > >    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1082)
> > > >    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1052)
> > > >    at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1378)
> > > >    at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:41)
> > > >    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:51)
> > > >    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > > >    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > > >    at java.lang.Thread.run(Thread.java:662)
> > > > Caused by: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-1018-Index.db (Too many open files)
> > > >    at java.io.RandomAccessFile.open(Native Method)
> > > >    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
> > > >    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:98)
> > > >    at org.apache.cassandra.io.util.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:142)
> > > >    at org.apache.cassandra.io.util.BufferedSegmentedFile.getSegment(BufferedSegmentedFile.java:62)
> > > >    ... 16 more
> > > >
> > > > The same test worked fine with 1 million rows.
>

Re: CASSANDRA-1472 (bitmap indexes)

Posted by dragos cernahoschi <dr...@gmail.com>.
Welcome.

It seems exactly that: when running one of the queries that generates a
timed out exception, cassandra enters some kind of infinite loop.

Trace:

DEBUG 12:15:22,307 scan
DEBUG 12:15:22,348 restricted ranges for query [78703492656118554854272571946195123045,0] are [[78703492656118554854272571946195123045,0]]
DEBUG 12:15:22,348 scan ranges are [78703492656118554854272571946195123045,0]
DEBUG 12:15:22,380 reading org.apache.cassandra.db.IndexScanCommand@1544e44 from 110@localhost/127.0.0.1
DEBUG 12:15:22,402 For operator EQ on Lynx 2.7 in rows (1481600,3203072): bins (12,12) of #<BitmapIndexReader [null] on /home/dragos/cassandra/data/keyspace/visit-e-1097-1-Bitidx.db>
DEBUG 12:15:22,422 For operator EQ on Lynx 2.7 in rows (1852032,4003840): bins (12,12) of #<BitmapIndexReader [null] on /home/dragos/cassandra/data/keyspace/visit-e-1103-1-Bitidx.db>
DEBUG 12:15:22,423 For operator EQ on Lynx 2.7 in rows (718336,1551616): bins (12,12) of #<BitmapIndexReader [null] on /home/dragos/cassandra/data/keyspace/visit-e-1112-1-Bitidx.db>
DEBUG 12:15:22,423 For operator EQ on Lynx 2.7 in rows (1482112,3203072): bins (12,12) of #<BitmapIndexReader [null] on /home/dragos/cassandra/data/keyspace/visit-e-1108-1-Bitidx.db>
DEBUG 12:15:22,424 For operator EQ on Lynx 2.7 in rows (370432,800768): bins (12,12) of #<BitmapIndexReader [null] on /home/dragos/cassandra/data/keyspace/visit-e-1109-1-Bitidx.db>
DEBUG 12:15:22,424 For operator EQ on Lynx 2.7 in rows (5755392,12436992): bins (12,12) of #<BitmapIndexReader [null] on /home/dragos/cassandra/data/keyspace/visit-e-1095-1-Bitidx.db>
DEBUG 12:15:22,425 For operator EQ on Lynx 2.7 in rows (369664,800768): bins (12,12) of #<BitmapIndexReader [null] on /home/dragos/cassandra/data/keyspace/visit-e-1110-1-Bitidx.db>
DEBUG 12:15:22,515 collecting 0 of 2147483647: 62726f77736572:false:8@0
DEBUG 12:15:22,515 collecting 1 of 2147483647: 636f6e6e656374696f6e:false:3@0
DEBUG 12:15:22,515 collecting 2 of 2147483647: 636f756e747279:false:7@0
DEBUG 12:15:22,516 collecting 3 of 2147483647: 646f6d61696e:false:15@0
DEBUG 12:15:22,518 collecting 4 of 2147483647: 6475726174696f6e:false:3@0
DEBUG 12:15:22,521 collecting 5 of 2147483647: 6c696e65:false:4@0
DEBUG 12:15:22,521 collecting 6 of 2147483647: 6f73:false:12@0
DEBUG 12:15:22,521 collecting 7 of 2147483647: 7069:false:3@0
DEBUG 12:15:22,521 collecting 8 of 2147483647: 74696d657374616d70:false:10@0
DEBUG 12:15:22,522 collecting 9 of 2147483647: 75736572:false:15@0
DEBUG 12:15:22,522 collecting 10 of 2147483647: 7a6970:false:5@0
DEBUG 12:15:22,523 collecting 0 of 2147483647: 62726f77736572:false:8@0
DEBUG 12:15:22,524 collecting 1 of 2147483647: 636f6e6e656374696f6e:false:3@0
DEBUG 12:15:22,524 collecting 2 of 2147483647: 636f756e747279:false:7@0
DEBUG 12:15:22,524 collecting 3 of 2147483647: 646f6d61696e:false:15@0
DEBUG 12:15:22,524 collecting 4 of 2147483647: 6475726174696f6e:false:3@0
DEBUG 12:15:22,525 collecting 5 of 2147483647: 6c696e65:false:4@0
DEBUG 12:15:22,525 collecting 6 of 2147483647: 6f73:false:19@0
DEBUG 12:15:22,525 collecting 7 of 2147483647: 7069:false:3@0

...

goes forever.
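The column names in the "collecting" lines above are printed as hex-encoded bytes. A small decoder makes the trace readable; `HexColumnNames` and `decodeHex` are hypothetical helpers that use only the JDK. The eleven names decode to browser, connection, country, domain, duration, line, os, pi, timestamp, user and zip. The counter restarts at 0 after index 10, apparently once per row read. The 2147483647 in every line equals Integer.MAX_VALUE. Reading it as the unbounded column count of the slice predicate is an assumption.

```java
import java.nio.charset.StandardCharsets;

public class HexColumnNames {

    // Decodes a hex string such as "6f73" into the UTF-8 text it encodes.
    public static String decodeHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd-length hex: " + hex);
        }
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            // Each pair of hex digits is one byte of the column name.
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Column names copied from the DEBUG trace above.
        String[] names = {"62726f77736572", "636f6e6e656374696f6e", "636f756e747279",
                "646f6d61696e", "6475726174696f6e", "6c696e65", "6f73", "7069",
                "74696d657374616d70", "75736572", "7a6970"};
        for (String n : names) {
            System.out.println(n + " -> " + decodeHex(n));
        }
        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);
    }
}
```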

I'll try the KEYS indexes on the same scenario and let you know.

Dragos

On Tue, Nov 9, 2010 at 8:05 PM, Stu Hood <st...@rackspace.com> wrote:

> Interesting, thanks for the info.
>
> Perhaps the limitation is that index queries involving multiple clauses are
> currently implemented using brute-force filtering rather than an index join?
> The bitmap indexes have native support for this type of join, but it's not
> being used yet.
>
> To confirm: have you tried the same scenario with KEYS indexes? They use
> the same codepath for multiple index expressions, and should experience the
> same timeouts. Also, can you rerun the KEYS_BITMAP test with DEBUG logging
> enabled, to ensure that we aren't going into some kind of infinite loop?
>
> Thanks for the help,
> Stu
>

Re: CASSANDRA-1472 (bitmap indexes)

Posted by Stu Hood <st...@rackspace.com>.
Interesting, thanks for the info.

Perhaps the limitation is that index queries involving multiple clauses are currently implemented using brute-force filtering rather than an index join? The bitmap indexes have native support for this type of join, but it's not being used yet.
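Here is a toy illustration of the two strategies Stu contrasts. It assumes equality-encoded bitmaps held in java.util.BitSet, and the class, method names, and sample rows are all hypothetical. This is not the CASSANDRA-1472 code. An index join ANDs one bitmap per EQ clause. Brute-force filtering walks the rows matched by one clause and re-checks the other columns row by row.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: in-memory equality-encoded bitmap indexes, not the CASSANDRA-1472 code.
public class BitmapJoinSketch {

    /** One column's index: one BitSet per distinct value; bit r is set when row r holds that value. */
    static final class BitmapIndex {
        private final Map<String, BitSet> bitmaps = new HashMap<>();

        void add(int row, String value) {
            bitmaps.computeIfAbsent(value, v -> new BitSet()).set(row);
        }

        /** Rows where column == value, returned as a copy so callers may modify it. */
        BitSet eq(String value) {
            BitSet b = bitmaps.get(value);
            return b == null ? new BitSet() : (BitSet) b.clone();
        }
    }

    /** Index join: intersect one bitmap per EQ clause; no row data is touched. */
    static BitSet and(BitSet first, BitSet... rest) {
        BitSet result = (BitSet) first.clone(); // never mutate the caller's bitmap
        for (BitSet b : rest) {
            result.and(b);
        }
        return result;
    }

    /** Brute-force filtering: walk the rows matched by one clause and re-check the other columns. */
    static BitSet scanAndFilter(BitSet candidates, String[][] rows, int[] cols, String[] values) {
        BitSet result = new BitSet();
        for (int r = candidates.nextSetBit(0); r >= 0; r = candidates.nextSetBit(r + 1)) {
            boolean matches = true;
            for (int i = 0; i < cols.length && matches; i++) {
                // Stands in for reading the candidate row from disk and checking one more clause.
                matches = rows[r][cols[i]].equals(values[i]);
            }
            if (matches) {
                result.set(r);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"ro", "web", "m"},
            {"ro", "app", "f"},
            {"de", "web", "f"},
            {"ro", "web", "f"},
        };
        BitmapIndex country = new BitmapIndex();
        BitmapIndex channel = new BitmapIndex();
        BitmapIndex gender = new BitmapIndex();
        for (int r = 0; r < rows.length; r++) {
            country.add(r, rows[r][0]);
            channel.add(r, rows[r][1]);
            gender.add(r, rows[r][2]);
        }
        BitSet joined = and(country.eq("ro"), channel.eq("web"), gender.eq("f"));
        BitSet filtered = scanAndFilter(country.eq("ro"), rows, new int[] {1, 2}, new String[] {"web", "f"});
        System.out.println(joined + " " + filtered); // prints {3} {3}
    }
}
```

Both paths return the same rows. The join only combines bitmaps. The filtering path has to look at every candidate row, and in Cassandra that means reading the row. The second stack trace in this thread shows this happening: ColumnFamilyStore.scan calls getColumnFamily to do it.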

To confirm: have you tried the same scenario with KEYS indexes? They use the same codepath for multiple index expressions, and should experience the same timeouts. Also, can you rerun the KEYS_BITMAP test with DEBUG logging enabled, to ensure that we aren't going into some kind of infinite loop?

Thanks for the help,
Stu

-----Original Message-----
From: "dragos cernahoschi" <dr...@gmail.com>
Sent: Tuesday, November 9, 2010 11:50am
To: dev@cassandra.apache.org
Subject: Re: CASSANDRA-1472 (bitmap indexes)

I'm running the query on three columns with cardinalities: 22, 17 and 10.
Interesting, if combining columns with cardinalities:

22 + 17 => no exception
22 + 10 => no exception
10 + 17 => timed out exception
22 + 17 + 10 => timed out exception


On Tue, Nov 9, 2010 at 6:29 PM, Stu Hood <st...@rackspace.com> wrote:

> Can you tell me a little bit about your key distribution? How many unique
> values are indexed (the cardinality)?
>
> Until the OrBiC projection I mention on 1472 is implemented, the bitmap
> secondary indexes will perform terribly for high cardinality datasets.
>
> Thanks!
>
>
> -----Original Message-----
> From: "dragos cernahoschi" <dr...@gmail.com>
> Sent: Tuesday, November 9, 2010 10:14am
> To: dev@cassandra.apache.org
> Subject: Re: CASSANDRA-1472 (bitmap indexes)
>
> Meantime the number of SSTable(s) reduced to just 7. Initially the
> compaction thread suffered the same problem of "too many open files" and
> couldn't do any compaction.
>
> But I'm still not able to run my tests: TimedOutException :(
>
> On Tue, Nov 9, 2010 at 5:51 PM, Stu Hood <st...@rackspace.com> wrote:
>
> > Hmm, 500 sstables is definitely a degenerate case: did you disable
> > compaction? By default, Cassandra strives to keep the sstable count below
> > ~32, since accesses to separate sstables require seeks.
> >
> > In this case, the query will seek 500 times to check the secondary index
> > for each sstable: if it finds matches it will need to seek to find them in
> > the primary index, and seek again for the data file.
> >
> > -----Original Message-----
> > From: "dragos cernahoschi" <dr...@gmail.com>
> > Sent: Tuesday, November 9, 2010 5:33am
> > To: dev@cassandra.apache.org
> > Subject: Re: CASSANDRA-1472 (bitmap indexes)
> >
> > There are about 500 SSTables (12GB of data including index data,
> > statistics...) The source data file had about 3GB/26 million rows.
> >
> > I only test with EQ expressions for now.
> >
> > Increasing the file limit resolved the problem, but now I'm getting
> > TimedOutException(s) from thrift when "querying" even with slice size of 1.
> > Is my machine too small (core 2 duo 2.93 2GB RAM Ubuntu 10.04) for such a
> > test?
> >
> > I really have some interesting sets of data to test indexes with and I want
> > to make a comparison between ordinary indexes and bitmap indexes.
> >
> > Thank you,
> > Dragos
> >
> > On Mon, Nov 8, 2010 at 6:42 PM, Stu Hood <st...@rackspace.com> wrote:
> >
> > > Dragos,
> > >
> > > How many SSTables did you have on disk, and were any of your index
> > > expressions GT(E)/LT(E)?
> > >
> > > I expect that you are bumping into a limitation of the current
> > > implementation: it opens up to 128 file-handles per SSTable in the worst
> > > case for a GT/LT query (one per index bucket).
> > >
> > > A future version might remove that requirement, but for now, you should
> > > probably bump the file handle limit on your machine to at least 2^16.
> > >
> > > Thanks,
> > > Stu
> > >
> > >
> > > -----Original Message-----
> > > From: "dragos cernahoschi" <dr...@gmail.com>
> > > Sent: Monday, November 8, 2010 10:05am
> > > To: dev@cassandra.apache.org
> > > Subject: CASSANDRA-1472 (bitmap indexes)
> > >
> > > Hi,
> > >
> > > I've got an exception during the following test:
> > >
> > > test machine: core 2 duo 2.93 2GB RAM Ubuntu 10.04
> > >
> > > test scenario:
> > > - 1 column family
> > > - about 15 columns
> > > - 7 indexed columns (bitmap)
> > > - 26 million rows (insert operation went fine)
> > > - thrift "query" on 3 of the indexed columns with get_indexed_slices (count: 100)
> > > - got the following exception:
> > >
> > > 10/11/08 17:52:40 ERROR service.AbstractCassandraDaemon: Fatal exception in thread Thread[ReadStage:3,5,main]
> > > java.io.IOError: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-814-4-Bitidx.db (Too many open files)
> > >    at org.apache.cassandra.io.sstable.bitidx.SegmentIterator.open(SegmentIterator.java:78)
> > >    at org.apache.cassandra.io.sstable.bitidx.BitmapIndexReader.openBin(BitmapIndexReader.java:226)
> > >    at org.apache.cassandra.io.sstable.bitidx.BitmapIndexReader.iterator(BitmapIndexReader.java:214)
> > >    at org.apache.cassandra.io.sstable.SSTableReader.scan(SSTableReader.java:523)
> > >    at org.apache.cassandra.db.secindex.KeysBitmapIndex.iterator(KeysBitmapIndex.java:103)
> > >    at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1371)
> > >    at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:41)
> > >    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:51)
> > >    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > >    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > >    at java.lang.Thread.run(Thread.java:662)
> > > Caused by: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-814-4-Bitidx.db (Too many open files)
> > >    at java.io.FileInputStream.open(Native Method)
> > >    at java.io.FileInputStream.<init>(FileInputStream.java:106)
> > >    at org.apache.avro.file.SeekableFileInput.<init>(SeekableFileInput.java:29)
> > >    at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:38)
> > >    at org.apache.cassandra.io.sstable.bitidx.SegmentIterator.open(SegmentIterator.java:72)
> > >    ... 10 more
> > > 10/11/08 17:52:40 ERROR service.AbstractCassandraDaemon: Fatal exception in thread Thread[ReadStage:2,5,main]
> > > java.io.IOError: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-1018-Index.db (Too many open files)
> > >    at org.apache.cassandra.io.util.BufferedSegmentedFile.getSegment(BufferedSegmentedFile.java:68)
> > >    at org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:129)
> > >    at org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:1)
> > >    at org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:455)
> > >    at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:572)
> > >    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:49)
> > >    at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:72)
> > >    at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:84)
> > >    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1190)
> > >    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1082)
> > >    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1052)
> > >    at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1378)
> > >    at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:41)
> > >    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:51)
> > >    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > >    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > >    at java.lang.Thread.run(Thread.java:662)
> > > Caused by: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-1018-Index.db (Too many open files)
> > >    at java.io.RandomAccessFile.open(Native Method)
> > >    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
> > >    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:98)
> > >    at org.apache.cassandra.io.util.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:142)
> > >    at org.apache.cassandra.io.util.BufferedSegmentedFile.getSegment(BufferedSegmentedFile.java:62)
> > >    ... 16 more
> > >
> > > The same test worked fine with 1 million rows.



Re: CASSANDRA-1472 (bitmap indexes)

Posted by dragos cernahoschi <dr...@gmail.com>.
I'm running the query on three columns with cardinalities 22, 17 and 10.
Interestingly, when combining columns with these cardinalities:

22 + 17 => no exception
22 + 10 => no exception
10 + 17 => timed out exception
22 + 17 + 10 => timed out exception


On Tue, Nov 9, 2010 at 6:29 PM, Stu Hood <st...@rackspace.com> wrote:

> Can you tell me a little bit about your key distribution? How many unique
> values are indexed (the cardinality)?
>
> Until the OrBiC projection I mention on 1472 is implemented, the bitmap
> secondary indexes will perform terribly for high cardinality datasets.
>
> Thanks!
>
>
> -----Original Message-----
> From: "dragos cernahoschi" <dr...@gmail.com>
> Sent: Tuesday, November 9, 2010 10:14am
> To: dev@cassandra.apache.org
> Subject: Re: CASSANDRA-1472 (bitmap indexes)
>
> Meantime the number of SSTable(s) reduced to just 7. Initially the
> compaction thread suffered the same problem of "too many open files" and
> couldn't do any compaction.
>
> But I'm still not able to run my tests: TimedOutException :(
>
> On Tue, Nov 9, 2010 at 5:51 PM, Stu Hood <st...@rackspace.com> wrote:
>
> > Hmm, 500 sstables is definitely a degenerate case: did you disable
> > compaction? By default, Cassandra strives to keep the sstable count below
> > ~32, since accesses to separate sstables require seeks.
> >
> > In this case, the query will seek 500 times to check the secondary index
> > for each sstable: if it finds matches it will need to seek to find them in
> > the primary index, and seek again for the data file.
> >
> > -----Original Message-----
> > From: "dragos cernahoschi" <dr...@gmail.com>
> > Sent: Tuesday, November 9, 2010 5:33am
> > To: dev@cassandra.apache.org
> > Subject: Re: CASSANDRA-1472 (bitmap indexes)
> >
> > There are about 500 SSTables (12GB of data including index data,
> > statistics...) The source data file had about 3GB/26 million rows.
> >
> > I only test with EQ expressions for now.
> >
> > Increasing the file limit resolved the problem, but now I'm getting
> > TimedOutException(s) from thrift when "querying" even with slice size of 1.
> > Is my machine too small (core 2 duo 2.93 2GB RAM Ubuntu 10.04) for such a
> > test?
> >
> > I really have some interesting sets of data to test indexes with and I want
> > to make a comparison between ordinary indexes and bitmap indexes.
> >
> > Thank you,
> > Dragos
> >
> > On Mon, Nov 8, 2010 at 6:42 PM, Stu Hood <st...@rackspace.com> wrote:
> >
> > > Dragos,
> > >
> > > How many SSTables did you have on disk, and were any of your index
> > > expressions GT(E)/LT(E)?
> > >
> > > I expect that you are bumping into a limitation of the current
> > > implementation: it opens up to 128 file-handles per SSTable in the worst
> > > case for a GT/LT query (one per index bucket).
> > >
> > > A future version might remove that requirement, but for now, you should
> > > probably bump the file handle limit on your machine to at least 2^16.
> > >
> > > Thanks,
> > > Stu
> > >
> > >
> > > -----Original Message-----
> > > From: "dragos cernahoschi" <dr...@gmail.com>
> > > Sent: Monday, November 8, 2010 10:05am
> > > To: dev@cassandra.apache.org
> > > Subject: CASSANDRA-1472 (bitmap indexes)
> > >
> > > Hi,
> > >
> > > I've got an exception during the following test:
> > >
> > > test machine: core 2 duo 2.93 2GB RAM Ubuntu 10.04
> > >
> > > test scenario:
> > > - 1 column family
> > > - about 15 columns
> > > - 7 indexed columns (bitmap)
> > > - 26 million rows (insert operation went fine)
> > > - thrift "query" on 3 of the indexed columns with get_indexed_slices (count: 100)
> > > - got the following exception:
> > >
> > > 10/11/08 17:52:40 ERROR service.AbstractCassandraDaemon: Fatal exception in thread Thread[ReadStage:3,5,main]
> > > java.io.IOError: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-814-4-Bitidx.db (Too many open files)
> > >    at org.apache.cassandra.io.sstable.bitidx.SegmentIterator.open(SegmentIterator.java:78)
> > >    at org.apache.cassandra.io.sstable.bitidx.BitmapIndexReader.openBin(BitmapIndexReader.java:226)
> > >    at org.apache.cassandra.io.sstable.bitidx.BitmapIndexReader.iterator(BitmapIndexReader.java:214)
> > >    at org.apache.cassandra.io.sstable.SSTableReader.scan(SSTableReader.java:523)
> > >    at org.apache.cassandra.db.secindex.KeysBitmapIndex.iterator(KeysBitmapIndex.java:103)
> > >    at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1371)
> > >    at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:41)
> > >    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:51)
> > >    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > >    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > >    at java.lang.Thread.run(Thread.java:662)
> > > Caused by: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-814-4-Bitidx.db (Too many open files)
> > >    at java.io.FileInputStream.open(Native Method)
> > >    at java.io.FileInputStream.<init>(FileInputStream.java:106)
> > >    at org.apache.avro.file.SeekableFileInput.<init>(SeekableFileInput.java:29)
> > >    at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:38)
> > >    at org.apache.cassandra.io.sstable.bitidx.SegmentIterator.open(SegmentIterator.java:72)
> > >    ... 10 more
> > > 10/11/08 17:52:40 ERROR service.AbstractCassandraDaemon: Fatal exception in thread Thread[ReadStage:2,5,main]
> > > java.io.IOError: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-1018-Index.db (Too many open files)
> > >    at org.apache.cassandra.io.util.BufferedSegmentedFile.getSegment(BufferedSegmentedFile.java:68)
> > >    at org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:129)
> > >    at org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:1)
> > >    at org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:455)
> > >    at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:572)
> > >    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:49)
> > >    at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:72)
> > >    at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:84)
> > >    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1190)
> > >    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1082)
> > >    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1052)
> > >    at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1378)
> > >    at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:41)
> > >    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:51)
> > >    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > >    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > >    at java.lang.Thread.run(Thread.java:662)
> > > Caused by: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-1018-Index.db (Too many open files)
> > >    at java.io.RandomAccessFile.open(Native Method)
> > >    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
> > >    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:98)
> > >    at org.apache.cassandra.io.util.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:142)
> > >    at org.apache.cassandra.io.util.BufferedSegmentedFile.getSegment(BufferedSegmentedFile.java:62)
> > >    ... 16 more
> > >
> > > The same test worked fine with 1 million rows.

Re: CASSANDRA-1472 (bitmap indexes)

Posted by Stu Hood <st...@rackspace.com>.
Can you tell me a little bit about your key distribution? How many unique values are indexed (the cardinality)?

Until the OrBiC projection I mention on 1472 is implemented, the bitmap secondary indexes will perform terribly for high cardinality datasets.
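To see why the number of distinct values matters, here is a naive size model. It assumes the textbook equality-encoded layout: one bitmap per distinct value, one bit per row, no compression. It is an illustration only and does not describe the CASSANDRA-1472 format, which per Stu's Nov 8 reply groups values into index buckets. The class name and the 1,000,000 cardinality are hypothetical.

```java
// Hypothetical back-of-the-envelope model: equality-encoded, uncompressed bitmaps.
public class BitmapSizeSketch {

    /** Bytes needed for one bitmap per distinct value, one bit per row, no compression. */
    static long uncompressedBytes(long rows, long cardinality) {
        long bits = rows * cardinality;
        return (bits + 7) / 8; // round up to whole bytes
    }

    public static void main(String[] args) {
        long rows = 26_000_000L; // row count from the test in this thread
        long[] cardinalities = {10, 17, 22, 1_000_000}; // the last value is a made-up high-cardinality case
        for (long c : cardinalities) {
            System.out.println("cardinality " + c + ": " + c + " bitmaps, "
                    + uncompressedBytes(rows, c) + " bytes uncompressed");
        }
    }
}
```

At 26 million rows, the 22-value column alone comes to 71,500,000 bytes in this model, and the total grows linearly with cardinality. Real implementations compress sparse bitmaps, so treat these figures as a naive baseline rather than a prediction.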

Thanks!


-----Original Message-----
From: "dragos cernahoschi" <dr...@gmail.com>
Sent: Tuesday, November 9, 2010 10:14am
To: dev@cassandra.apache.org
Subject: Re: CASSANDRA-1472 (bitmap indexes)

Meantime the number of SSTable(s) reduced to just 7. Initially the
compaction thread suffered the same problem of "too many open files" and
couldn't do any compaction.

But I'm still not able to run my tests: TimedOutException :(

On Tue, Nov 9, 2010 at 5:51 PM, Stu Hood <st...@rackspace.com> wrote:

> Hmm, 500 sstables is definitely a degenerate case: did you disable
> compaction? By default, Cassandra strives to keep the sstable count below
> ~32, since accesses to separate sstables require seeks.
>
> In this case, the query will seek 500 times to check the secondary index
> for each sstable: if it finds matches it will need to seek to find them in
> the primary index, and seek again for the data file.
>
> -----Original Message-----
> From: "dragos cernahoschi" <dr...@gmail.com>
> Sent: Tuesday, November 9, 2010 5:33am
> To: dev@cassandra.apache.org
> Subject: Re: CASSANDRA-1472 (bitmap indexes)
>
> There are about 500 SSTables (12GB of data including index data,
> statistics...) The source data file had about 3GB/26 million rows.
>
> I only test with EQ expressions for now.
>
> Increasing the file limit resolved the problem, but now I'm getting
> TimedOutException(s) from thrift when "querying" even with slice size of 1.
> Is my machine too small (core 2 duo 2.93 2GB RAM Ubuntu 10.04) for such a
> test?
>
> I really have some interesting sets of data to test indexes with and I want
> to make a comparison between ordinary indexes and bitmap indexes.
>
> Thank you,
> Dragos
>
> On Mon, Nov 8, 2010 at 6:42 PM, Stu Hood <st...@rackspace.com> wrote:
>
> > Dragos,
> >
> > How many SSTables did you have on disk, and were any of your index
> > expressions GT(E)/LT(E)?
> >
> > I expect that you are bumping into a limitation of the current
> > implementation: it opens up to 128 file-handles per SSTable in the worst
> > case for a GT/LT query (one per index bucket).
> >
> > A future version might remove that requirement, but for now, you should
> > probably bump the file handle limit on your machine to at least 2^16.
> >
> > Thanks,
> > Stu
> >
> >
> > -----Original Message-----
> > From: "dragos cernahoschi" <dr...@gmail.com>
> > Sent: Monday, November 8, 2010 10:05am
> > To: dev@cassandra.apache.org
> > Subject: CASSANDRA-1472 (bitmap indexes)
> >
> > Hi,
> >
> > I've got an exception during the following test:
> >
> > test machine: core 2 duo 2.93 2GB RAM Ubuntu 10.04
> >
> > test scenario:
> > - 1 column family
> > - about 15 columns
> > - 7 indexed columns (bitmap)
> > - 26 million rows (insert operation went fine)
> > - thrift "query" on 3 of the indexed columns with get_indexed_slices (count: 100)
> > - got the following exception:
> >
> > 10/11/08 17:52:40 ERROR service.AbstractCassandraDaemon: Fatal exception in thread Thread[ReadStage:3,5,main]
> > java.io.IOError: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-814-4-Bitidx.db (Too many open files)
> >    at org.apache.cassandra.io.sstable.bitidx.SegmentIterator.open(SegmentIterator.java:78)
> >    at org.apache.cassandra.io.sstable.bitidx.BitmapIndexReader.openBin(BitmapIndexReader.java:226)
> >    at org.apache.cassandra.io.sstable.bitidx.BitmapIndexReader.iterator(BitmapIndexReader.java:214)
> >    at org.apache.cassandra.io.sstable.SSTableReader.scan(SSTableReader.java:523)
> >    at org.apache.cassandra.db.secindex.KeysBitmapIndex.iterator(KeysBitmapIndex.java:103)
> >    at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1371)
> >    at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:41)
> >    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:51)
> >    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >    at java.lang.Thread.run(Thread.java:662)
> > Caused by: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-814-4-Bitidx.db (Too many open files)
> >    at java.io.FileInputStream.open(Native Method)
> >    at java.io.FileInputStream.<init>(FileInputStream.java:106)
> >    at org.apache.avro.file.SeekableFileInput.<init>(SeekableFileInput.java:29)
> >    at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:38)
> >    at org.apache.cassandra.io.sstable.bitidx.SegmentIterator.open(SegmentIterator.java:72)
> >    ... 10 more
> > 10/11/08 17:52:40 ERROR service.AbstractCassandraDaemon: Fatal exception in thread Thread[ReadStage:2,5,main]
> > java.io.IOError: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-1018-Index.db (Too many open files)
> >    at org.apache.cassandra.io.util.BufferedSegmentedFile.getSegment(BufferedSegmentedFile.java:68)
> >    at org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:129)
> >    at org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:1)
> >    at org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:455)
> >    at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:572)
> >    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:49)
> >    at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:72)
> >    at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:84)
> >    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1190)
> >    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1082)
> >    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1052)
> >    at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1378)
> >    at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:41)
> >    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:51)
> >    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >    at java.lang.Thread.run(Thread.java:662)
> > Caused by: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-1018-Index.db (Too many open files)
> >    at java.io.RandomAccessFile.open(Native Method)
> >    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
> >    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:98)
> >    at org.apache.cassandra.io.util.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:142)
> >    at org.apache.cassandra.io.util.BufferedSegmentedFile.getSegment(BufferedSegmentedFile.java:62)
> >    ... 16 more
> >
> > The same test worked fine with 1 million rows.



Re: CASSANDRA-1472 (bitmap indexes)

Posted by dragos cernahoschi <dr...@gmail.com>.
Meanwhile, the number of SSTables has dropped to just 7. Initially the
compaction thread hit the same "too many open files" problem and
couldn't do any compaction.
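Stu's suggestion to raise the limit to at least 2^16 (in his quoted Nov 8 reply) matches simple arithmetic: his worst case of 128 handles per SSTable times the roughly 500 SSTables on disk. The sketch below (class name hypothetical) does that arithmetic. On HotSpot/OpenJDK for Unix, it also prints the file-descriptor usage and limit of the JVM it runs in, using com.sun.management.UnixOperatingSystemMXBean. The same values are exposed over JMX as the OpenFileDescriptorCount and MaxFileDescriptorCount attributes of java.lang:type=OperatingSystem, which is one way to watch a running node.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper; 128 per SSTable is Stu Hood's worst-case estimate for GT/LT queries.
public class FileHandleBudget {

    static long worstCaseHandles(long sstables, long handlesPerSSTable) {
        return sstables * handlesPerSSTable;
    }

    public static void main(String[] args) {
        long needed = worstCaseHandles(500, 128); // ~500 SSTables on disk before compaction caught up
        long suggested = 1L << 16;                // the 2^16 limit suggested in the thread
        System.out.println("worst case: " + needed + " handles, suggested limit: " + suggested);

        // Reports only the JVM running this code; for a Cassandra node, read the same attributes over JMX.
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            System.out.println("open fds: " + unix.getOpenFileDescriptorCount()
                    + ", max fds: " + unix.getMaxFileDescriptorCount());
        } else {
            System.out.println("file descriptor counts are not exposed on this JVM/OS");
        }
    }
}
```

500 x 128 = 64,000 fits under 65,536 with little headroom. The per-SSTable term is the one compaction shrinks, which is consistent with the errors disappearing once the SSTable count came down.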

But I'm still not able to run my tests: TimedOutException :(
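Stu's description of the cost (quoted from his Nov 9 reply) can be turned into a rough per-query seek count. Each SSTable's secondary index gets one probe, and every row read costs one primary-index seek and one data-file seek. The sketch below implements that model. The class name is hypothetical, and the 100-row figure is simply the count used in the test query, not a measurement.

```java
// Hypothetical cost model following Stu Hood's description; ignores caching and sequential reads.
public class SeekEstimate {

    /**
     * One seek per SSTable to probe its secondary index, then one primary-index
     * seek and one data-file seek for every row that has to be read.
     */
    static long seeks(long sstables, long rowsRead) {
        return sstables + 2 * rowsRead;
    }

    public static void main(String[] args) {
        long rowsRead = 100; // the count used in the test query; real queries may read many more rows
        System.out.println("500 SSTables: " + seeks(500, rowsRead) + " seeks");
        System.out.println("  7 SSTables: " + seeks(7, rowsRead) + " seeks");
    }
}
```

Compacting from about 500 down to 7 SSTables removes most of the per-SSTable probes, but it leaves the per-row term unchanged. Stu suspects in a later reply that multi-clause queries are answered by filtering. If so, `rowsRead` has to count every candidate row read from disk, including the rows the other clauses then discard, so it can be far larger than the number of rows returned.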

On Tue, Nov 9, 2010 at 5:51 PM, Stu Hood <st...@rackspace.com> wrote:

> Hmm, 500 sstables is definitely a degenerate case: did you disable
> compaction? By default, Cassandra strives to keep the sstable count below
> ~32, since accesses to separate sstables require seeks.
>
> In this case, the query will seek 500 times to check the secondary index
> for each sstable: if it finds matches it will need to seek to find them in
> the primary index, and seek again for the data file.
>
> -----Original Message-----
> From: "dragos cernahoschi" <dr...@gmail.com>
> Sent: Tuesday, November 9, 2010 5:33am
> To: dev@cassandra.apache.org
> Subject: Re: CASSANDRA-1472 (bitmap indexes)
>
> There are about 500 SSTables (12GB of data including index data,
> statistics...) The source data file had about 3GB/26 million rows.
>
> I only test with EQ expressions for now.
>
> Increasing the file limit resolved the problem, but now I'm getting
> TimedOutException(s) from thrift when "querying" even with slice size of 1.
> Is my machine too small (core 2 duo 2.93 2GB RAM Ubuntu 10.04) for such a
> test?
>
> I really have some interesting sets of data to test indexes with and I want
> to make a comparison between ordinary indexes and bitmap indexes.
>
> Thank you,
> Dragos
>
> On Mon, Nov 8, 2010 at 6:42 PM, Stu Hood <st...@rackspace.com> wrote:
>
> > Dragos,
> >
> > How many SSTables did you have on disk, and were any of your index
> > expressions GT(E)/LT(E)?
> >
> > I expect that you are bumping into a limitation of the current
> > implementation: it opens up to 128 file-handles per SSTable in the worst
> > case for a GT/LT query (one per index bucket).
> >
> > A future version might remove that requirement, but for now, you should
> > probably bump the file handle limit on your machine to at least 2^16.
> >
> > Thanks,
> > Stu
> >
> >
> > -----Original Message-----
> > From: "dragos cernahoschi" <dr...@gmail.com>
> > Sent: Monday, November 8, 2010 10:05am
> > To: dev@cassandra.apache.org
> > Subject: CASSANDRA-1472 (bitmap indexes)
> >
> > Hi,
> >
> > I've got an exception during the following test:
> >
> > test machine: core 2 duo 2.93 2GB RAM Ubuntu 10.04
> >
> > test scenario:
> > - 1 column family
> > - about 15 columns
> > - 7 indexed columns (bitmap)
> > - 26 million rows (insert operation went fine)
> > - thrift "query" on 3 of the indexed columns with get_indexed_slices (count: 100)
> > - got the following exception:
> >
> > 10/11/08 17:52:40 ERROR service.AbstractCassandraDaemon: Fatal exception in thread Thread[ReadStage:3,5,main]
> > java.io.IOError: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-814-4-Bitidx.db (Too many open files)
> >    at org.apache.cassandra.io.sstable.bitidx.SegmentIterator.open(SegmentIterator.java:78)
> >    at org.apache.cassandra.io.sstable.bitidx.BitmapIndexReader.openBin(BitmapIndexReader.java:226)
> >    at org.apache.cassandra.io.sstable.bitidx.BitmapIndexReader.iterator(BitmapIndexReader.java:214)
> >    at org.apache.cassandra.io.sstable.SSTableReader.scan(SSTableReader.java:523)
> >    at org.apache.cassandra.db.secindex.KeysBitmapIndex.iterator(KeysBitmapIndex.java:103)
> >    at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1371)
> >    at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:41)
> >    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:51)
> >    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >    at java.lang.Thread.run(Thread.java:662)
> > Caused by: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-814-4-Bitidx.db (Too many open files)
> >    at java.io.FileInputStream.open(Native Method)
> >    at java.io.FileInputStream.<init>(FileInputStream.java:106)
> >    at org.apache.avro.file.SeekableFileInput.<init>(SeekableFileInput.java:29)
> >    at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:38)
> >    at org.apache.cassandra.io.sstable.bitidx.SegmentIterator.open(SegmentIterator.java:72)
> >    ... 10 more
> > 10/11/08 17:52:40 ERROR service.AbstractCassandraDaemon: Fatal exception in thread Thread[ReadStage:2,5,main]
> > java.io.IOError: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-1018-Index.db (Too many open files)
> >    at org.apache.cassandra.io.util.BufferedSegmentedFile.getSegment(BufferedSegmentedFile.java:68)
> >    at org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:129)
> >    at org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:1)
> >    at org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:455)
> >    at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:572)
> >    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:49)
> >    at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:72)
> >    at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:84)
> >    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1190)
> >    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1082)
> >    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1052)
> >    at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1378)
> >    at
> >
> >
> org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:41)
> >    at
> >
> >
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:51)
> >    at
> >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >    at
> >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >    at java.lang.Thread.run(Thread.java:662)
> > Caused by: java.io.FileNotFoundException:
> > /home/dragos/cassandra/data/keyspace/visit-e-1018-Index.db (Too many open
> > files)
> >    at java.io.RandomAccessFile.open(Native Method)
> >    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
> >    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:98)
> >    at
> >
> >
> org.apache.cassandra.io.util.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:142)
> >    at
> >
> >
> org.apache.cassandra.io.util.BufferedSegmentedFile.getSegment(BufferedSegmentedFile.java:62)
> >    ... 16 more
> >
> > The same test worked fine with 1 million rows.
> >
> >
> >
>
>
>

Re: CASSANDRA-1472 (bitmap indexes)

Posted by Stu Hood <st...@rackspace.com>.
Hmm, 500 sstables is definitely a degenerate case: did you disable compaction? By default, Cassandra strives to keep the sstable count below ~32, since accesses to separate sstables require seeks.

In this case, the query has to seek 500 times just to check the secondary index of each sstable. Wherever it finds matches, it needs another seek to locate them in the primary index, and a further seek into the data file.
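To make the cost concrete, here is a rough back-of-the-envelope model of the seeks described above. It assumes one seek into each sstable's secondary index, plus one primary-index seek and one data-file seek per matching row. The class and method names are hypothetical, and the model ignores the key cache, the OS page cache and sequential reads, so treat it as an upper-bound sketch rather than Cassandra's actual read path.

```java
/**
 * Hypothetical seek model for one indexed query (not Cassandra code).
 * Ignores caches and sequential reads, so it is only an upper-bound sketch.
 */
public class SeekEstimate {

    /**
     * @param sstables number of sstables whose secondary index must be checked
     * @param matches  total matching rows found across all sstables
     * @return one secondary-index seek per sstable, plus a primary-index
     *         seek and a data-file seek for every match
     */
    static long estimateSeeks(long sstables, long matches) {
        if (sstables < 0 || matches < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        return sstables + 2 * matches;
    }

    public static void main(String[] args) {
        // ~500 sstables as reported in this thread vs. the ~32 compaction aims for,
        // both returning a page of 100 rows (count: 100 in the original test)
        System.out.println(estimateSeeks(500, 100)); // 700
        System.out.println(estimateSeeks(32, 100));  // 232
    }
}
```

Under this model the per-sstable index checks are a fixed cost paid on every query, which is why letting compaction keep the sstable count low matters even for small pages.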

-----Original Message-----
From: "dragos cernahoschi" <dr...@gmail.com>
Sent: Tuesday, November 9, 2010 5:33am
To: dev@cassandra.apache.org
Subject: Re: CASSANDRA-1472 (bitmap indexes)

There are about 500 SSTables (12GB of data, including index data, statistics,
etc.). The source data file was about 3GB / 26 million rows.

I'm only testing with EQ expressions for now.

Raising the open-file limit resolved the problem, but now I'm getting
TimedOutExceptions from Thrift when querying, even with a slice size of 1.
Is my machine (Core 2 Duo 2.93GHz, 2GB RAM, Ubuntu 10.04) too small for
such a test?

I have some interesting data sets to test indexes with, and I want to
compare ordinary indexes with bitmap indexes.

Thank you,
Dragos

On Mon, Nov 8, 2010 at 6:42 PM, Stu Hood <st...@rackspace.com> wrote:

> Dragos,
>
> How many SSTables did you have on disk, and were any of your index
> expressions GT(E)/LT(E)?
>
> I expect that you are bumping into a limitation of the current
> implementation: in the worst case, a GT/LT query opens up to 128 file
> handles per SSTable (one per index bucket).
>
> A future version might remove that requirement, but for now you should
> probably raise the file handle limit on your machine to at least 2^16
> (65,536).
>
> Thanks,
> Stu



Re: CASSANDRA-1472 (bitmap indexes)

Posted by dragos cernahoschi <dr...@gmail.com>.
There are about 500 SSTables (12GB of data, including index data, statistics,
etc.). The source data file was about 3GB / 26 million rows.

I'm only testing with EQ expressions for now.

Raising the open-file limit resolved the problem, but now I'm getting
TimedOutExceptions from Thrift when querying, even with a slice size of 1.
Is my machine (Core 2 Duo 2.93GHz, 2GB RAM, Ubuntu 10.04) too small for
such a test?

I have some interesting data sets to test indexes with, and I want to
compare ordinary indexes with bitmap indexes.

Thank you,
Dragos
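When raising the open-file limit, it helps to confirm what the JVM actually sees. The sketch below reads the open and maximum file-descriptor counts through `com.sun.management.UnixOperatingSystemMXBean`, which HotSpot/OpenJDK JVMs expose on Unix-like systems. It only reports on the JVM it runs in; for a running Cassandra node, the same `OpenFileDescriptorCount` and `MaxFileDescriptorCount` attributes of the `java.lang:type=OperatingSystem` MBean can be read over JMX (for example with jconsole). The class and method names are illustrative only.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Illustrative helper (hypothetical name) reporting this JVM's file-descriptor usage. */
public class FdUsage {

    /** Returns {open, max} descriptors, or null if the platform bean does not expose them. */
    static long[] fileDescriptorUsage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] { unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount() };
        }
        return null; // e.g. Windows, or a JVM without the com.sun.management extension
    }

    public static void main(String[] args) {
        long[] fd = fileDescriptorUsage();
        if (fd == null || fd[1] <= 0) {
            System.out.println("File descriptor counts are not available on this JVM/OS");
        } else {
            System.out.printf("open=%d max=%d (%.1f%% used)%n", fd[0], fd[1], 100.0 * fd[0] / fd[1]);
        }
    }
}
```

If `max` still shows the old value after raising the limit (e.g. with `ulimit -n` or `/etc/security/limits.conf`), the new limit most likely did not apply to the shell or user that launched the JVM.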


RE: CASSANDRA-1472 (bitmap indexes)

Posted by Stu Hood <st...@rackspace.com>.
Dragos,

How many SSTables did you have on disk, and were any of your index expressions GT(E)/LT(E)?

I expect that you are bumping into a limitation of the current implementation: in the worst case, a GT/LT query opens up to 128 file handles per SSTable (one per index bucket).

A future version might remove that requirement, but for now you should probably raise the file handle limit on your machine to at least 2^16 (65,536).

Thanks,
Stu
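These figures translate into a simple handle budget. The sketch below multiplies them out using the ~500 sstables reported elsewhere in this thread. The EQ variant is a hypothetical assumption (one index bucket opened per EQ expression per sstable), and both counts cover bitmap-index files only; the primary index and data files need handles of their own. Class and method names are illustrative.

```java
/** Illustrative file-handle budget for bitmap-index scans (hypothetical model). */
public class HandleBudget {

    /** Worst case for GT/LT as described above: one handle per index bucket, per sstable. */
    static long worstCaseRangeHandles(long sstables, long bucketsPerSSTable) {
        return sstables * bucketsPerSSTable;
    }

    /** Assumption, not confirmed in the thread: one bucket per EQ expression per sstable. */
    static long eqHandles(long sstables, long eqExpressions) {
        return sstables * eqExpressions;
    }

    public static void main(String[] args) {
        long limit = 1L << 16; // 2^16 = 65536, the limit suggested above
        long worst = worstCaseRangeHandles(500, 128);
        System.out.println(worst + " <= " + limit + ": " + (worst <= limit)); // 64000 <= 65536: true
        // 3 EQ expressions, as in the original test
        System.out.println(eqHandles(500, 3)); // 1500
    }
}
```

Even the EQ estimate exceeds the 1024 soft limit that many Linux installations use by default, which is consistent with the error appearing at 26 million rows but not at 1 million, when there were far fewer sstables.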

