
Posted to derby-user@db.apache.org by Andrew Shuttlewood <an...@futureroute.co.uk> on 2005/05/16 13:53:13 UTC

Bulk Import

I've looked at the Bulk Import stuff, and I'm curious as to whether it's
possible to achieve the same effect without having to format the CSV
file in a special way?

Ideally, I'd like to be able to switch the database into a creation
mode. I don't mind giving up concurrent access, transactions or anything
like that; I'm purely interested in creating a database as quickly as
possible, with as wide a range of types as possible, without throwing
away (too much) SQL compatibility.

Also, is it possible to get some notification when a database is
recovering? It just seems to take forever in the boot stage and progress
information would be desirable.

Finally, we wish to enumerate metadata for an arbitrary SQL query in the
fastest possible way - what is the suggested way of achieving this? We
basically need to know the column names and types of a given query
(preferably without having to parse the SQL ourselves!). I assume that
Derby already knows this, but at the moment we actually execute the
query (which has its own performance bottleneck).

Sorry for the laundry list of questions, just wondering if there are any
good answers :)

Re: Bulk Import

Posted by Mamta Satoor <ms...@gmail.com>.

Hi Andrew,

For your metadata question, the JDBC metadata calls can help.
ResultSet.getMetaData returns a ResultSetMetaData object, which has
methods such as getColumnName and getColumnType for retrieving the
column information.

Also, _I think_ that import into a table will be much faster if the
table has no indexes defined on it. But maybe someone more familiar
with import/export can help you better.

Mamta
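
On the CSV format point, Derby's documented import procedure,
SYSCS_UTIL.SYSCS_IMPORT_TABLE, takes the column delimiter and the
character (quote) delimiter as arguments, so a file that uses something
other than commas and double quotes may not need reformatting. The
following is a minimal sketch of calling it over JDBC, not a tuned bulk
loader. The class name CsvImport and the schema, table and file names in
the usage line are hypothetical.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Types;

final class CsvImport {
    // Arguments: schema, table, file, column delimiter,
    // character delimiter, codeset, replace flag.
    static final String CALL =
        "CALL SYSCS_UTIL.SYSCS_IMPORT_TABLE(?, ?, ?, ?, ?, ?, ?)";

    /**
     * Appends the rows of a delimited file to an existing table.
     * Passing null for either delimiter selects Derby's default
     * (comma for columns, double quote for character data).
     */
    static void importTable(Connection conn, String schema, String table,
                            String file, String columnDelimiter,
                            String characterDelimiter) throws SQLException {
        try (CallableStatement cs = conn.prepareCall(CALL)) {
            cs.setString(1, schema);             // e.g. "APP" (hypothetical)
            cs.setString(2, table);
            cs.setString(3, file);
            cs.setString(4, columnDelimiter);
            cs.setString(5, characterDelimiter);
            cs.setNull(6, Types.VARCHAR);        // codeset: null = JVM default
            cs.setShort(7, (short) 0);           // 0 = append, non-zero = replace
            cs.execute();
        }
    }
}
```

For example, CsvImport.importTable(conn, "APP", "ORDERS", "orders.txt", ";", "'")
would load a semicolon-separated file whose strings are single-quoted.
Whether dropping indexes first actually speeds this up is worth
measuring on your own data.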

Re: Bulk Import

Posted by Andrew Shuttlewood <an...@futureroute.co.uk>.

On Mon, 2005-05-16 at 12:53 +0100, Andrew Shuttlewood wrote:
> I've looked at the Bulk Import stuff, and I'm curious as to whether it's
> possible to achieve the same effects but avoiding having to format the
> CSV file in a special way?
> 
> Ideally, I'd like to be able to switch the database into a creation mode
> - I don't mind giving up concurrent access, transactions or anything
> like that, I'm purely interested in creating a database with as much
> speed as possible with a varying range of types as far as possible,
> without throwing away (too much) SQL compatibility.

Okay, having poked a bit through Derby, I've seen SYSCS_BULK_INSERT.

It's (of course) not mentioned at all in the documentation. Will this be
left in? There's not much mention in the documentation about VTI support
at all, despite how useful it would be to keep this in. I'm going to
explore. Also, from the incubating code I have, it doesn't perform the
LOCK TABLE call that IMPOR

> 
> Finally, we wish to enumerate metadata for an arbitrary SQL query in the
> fastest possible way - what is the suggested way of achieving this? We
> basically need to know the column names and types of a given query
> (preferably without having to parse the SQL ourselves!). I assume that
> Derby already knows this, but at the moment we actually execute the
> query (which has its own performance bottleneck).

Are there any ideas about how to do this faster? I'm not sure how to
even begin to approach this. The reasoning is that I might permit the
user to enter a complex query (including ORDER BY, for example). At the
moment I have to execute it in order to retrieve the metadata, but I
don't want to actually PULL any data unless it's necessary.
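
One option worth trying is the standard JDBC call
PreparedStatement.getMetaData(), which describes the result set of a
prepared statement without executing it. The query still has to be
compiled, and the JDBC API allows a driver to return null here, so treat
this as a minimal sketch rather than a guaranteed fix. The class name
QueryColumns and its describe helpers are illustrative only.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class QueryColumns {
    /** Formats each column as "NAME TYPE", for example "ID INTEGER". */
    static List<String> describe(ResultSetMetaData md) throws SQLException {
        List<String> columns = new ArrayList<>();
        // JDBC column indexes start at 1.
        for (int i = 1; i <= md.getColumnCount(); i++) {
            columns.add(md.getColumnName(i) + " " + md.getColumnTypeName(i));
        }
        return columns;
    }

    /** Prepares the query but never calls executeQuery(), so no rows are fetched. */
    static List<String> describe(Connection conn, String sql) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ResultSetMetaData md = ps.getMetaData();
            if (md == null) {
                // Not a query, or the driver cannot describe it before execution.
                throw new SQLException("No result set metadata for: " + sql);
            }
            return describe(md);
        }
    }
}
```

Calling QueryColumns.describe(conn, userSql) on the user's query, ORDER BY
included, should then give the column names and types without any sort
or fetch taking place.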