Posted to user@storm.apache.org by Sa Li <sa...@gmail.com> on 2014/10/28 20:10:13 UTC

replace insert by copy command in trident postgresql state

Hi, all

I have developed a Trident topology that uses a KafkaSpout to consume JSON
data from Kafka, and persistentAggregate does the writing:

 topology.newStream("topictestspout", kafkaSpout)
         .each(new Fields("str"),
               new JsonObjectParse(),
               new Fields("userid", "event"))
         .groupBy(new Fields("userid"))
         .persistentAggregate(PostgresqlState.newFactory(config),
                              new Fields("userid", "event"),
                              new EventUpdater(),
                              new Fields("eventword"))
         .parallelismHint(16);

Basically, we want to write the data from Kafka into PostgreSQL. The current
writer does this job, but I suspect it is slow, since it performs multi-row
inserts within a transaction, something like this in the multiPut of
PostgresqlState:

query =
  WITH new_values (userid, event) AS (VALUES (?,?), (?,?), (?,?), .....)
  INSERT INTO test.state (userid, event)
  SELECT userid, event FROM new_values

I think inserts are too slow and the COPY command is much faster. Does
anyone have experience with turning the stream data into an in-memory STDIN
file and then copying the bulk data directly into a PostgreSQL table?
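
To make the question concrete, here is a rough sketch of the in-memory part.
It is only an illustration under assumptions: the CopyPayload class and its
methods are hypothetical names, not part of PostgresqlState, and each row is
assumed to be a (userid, event) pair of strings. It builds the payload in
COPY's default text format (tab-separated columns, one row per line, \N for
NULL). The call that would send it to the server through the PostgreSQL JDBC
driver's CopyManager is shown only as a comment, so that the block compiles
without the driver on the classpath.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.List;

// Hypothetical helper (not part of PostgresqlState): builds a COPY payload in memory.
final class CopyPayload {

    // Escape one value for COPY's default text format: backslash, tab,
    // newline and carriage return are backslash-escaped; NULL becomes \N.
    static String escape(String value) {
        if (value == null) {
            return "\\N";
        }
        StringBuilder sb = new StringBuilder(value.length() + 8);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\t': sb.append("\\t"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // One line per row, columns separated by tabs.
    static String toCopyText(List<String[]> rows) {
        StringBuilder sb = new StringBuilder();
        for (String[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                if (i > 0) {
                    sb.append('\t');
                }
                sb.append(escape(row[i]));
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    // Wrap the payload in a Reader, which can play the role of STDIN.
    static Reader toReader(List<String[]> rows) {
        return new StringReader(toCopyText(rows));
    }

    // Assumed usage with the PostgreSQL JDBC driver (needs pgjdbc on the classpath):
    //   CopyManager copy = connection.unwrap(org.postgresql.PGConnection.class).getCopyAPI();
    //   long n = copy.copyIn("COPY test.state (userid, event) FROM STDIN",
    //                        CopyPayload.toReader(rows));
}
```

Note that COPY only appends rows. If the state has to update rows for an
existing userid, the batch would probably need to go into a staging table
first and be merged from there.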


Thanks

Alec