Posted to user@ignite.apache.org by Ferry Syafei Sapei <fe...@googlemail.com> on 2016/01/19 16:02:28 UTC

Grouping cache when loading data using CacheStore

I have a CSV file with the following structure:

accountNumber,accountProperty1,accountProperty2,billNumber,billProperty1,billProperty2
100,property11,property12,100700,billProperty11,billProperty12
100,property11,property12,100700,billProperty21,billProperty22

I would like to import the file and fill in the cache with the following object structure:
class AccountInformation
	int accountNumber
	String accountProperty1
	String accountProperty2
	List<Bill> bills

class Bill
	int billNumber
	String billProperty1
	String billProperty2

I have tried using IgniteDataStreamer and StreamVisitor. The file is read line by line and each line is added to the data streamer. In the data streamer, I can check whether the account information already exists. If it does, I just add the new bill to the existing account and replace the cache entry for that account.

How can I achieve the same result using CacheStore?	
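
The "check whether the account exists, then add the bill" step described above can be sketched in plain Java, independent of any Ignite API. This is only a minimal illustration: the classes follow the structure given in the question, while the name CsvAccountGrouper, the constructors, and the assumption that fields never contain embedded commas are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Value classes mirroring the structure from the question (constructors are assumed).
class Bill {
    final int billNumber;
    final String billProperty1;
    final String billProperty2;

    Bill(int billNumber, String billProperty1, String billProperty2) {
        this.billNumber = billNumber;
        this.billProperty1 = billProperty1;
        this.billProperty2 = billProperty2;
    }
}

class AccountInformation {
    final int accountNumber;
    final String accountProperty1;
    final String accountProperty2;
    final List<Bill> bills = new ArrayList<>();

    AccountInformation(int accountNumber, String accountProperty1, String accountProperty2) {
        this.accountNumber = accountNumber;
        this.accountProperty1 = accountProperty1;
        this.accountProperty2 = accountProperty2;
    }
}

// Hypothetical helper: merges CSV data rows (header already skipped) by accountNumber.
class CsvAccountGrouper {
    static Map<Integer, AccountInformation> group(Iterable<String> rows) {
        Map<Integer, AccountInformation> accounts = new LinkedHashMap<>();
        for (String row : rows) {
            // Naive split: assumes no quoted fields or embedded commas.
            String[] f = row.split(",", -1);
            int accountNumber = Integer.parseInt(f[0].trim());
            // Create the account on first sight, otherwise reuse the existing one.
            AccountInformation account = accounts.computeIfAbsent(
                accountNumber, n -> new AccountInformation(n, f[1], f[2]));
            account.bills.add(new Bill(Integer.parseInt(f[3].trim()), f[4], f[5]));
        }
        return accounts;
    }

    public static void main(String[] args) {
        Map<Integer, AccountInformation> accounts = group(Arrays.asList(
            "100,property11,property12,100700,billProperty11,billProperty12",
            "100,property11,property12,100700,billProperty21,billProperty22"));
        for (AccountInformation a : accounts.values())
            System.out.println(a.accountNumber + " has " + a.bills.size() + " bill(s)");
    }
}
```

The resulting map holds one entry per account, so each entry could be written to a cache as a single key/value pair.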

Re: Grouping cache when loading data using CacheStore

Posted by Dmitriy Setrakyan <ds...@apache.org>.
The CacheStore implementation using JDBC is documented here:
https://apacheignite.readme.io/docs/data-loading

You can adapt this example to implement the same thing over a CSV file.

D.


Re: Grouping cache when loading data using CacheStore

Posted by Denis Magda <dm...@gridgain.com>.
Hi,

CacheLoadOnlyStoreAdapter [1] fits your case well, because it was
deliberately designed for cases where fast loading from raw files, CSV or
other resources is needed.

Unfortunately, there is currently no example in Ignite that shows how to use
this adapter.
I've created a ticket to make sure that one will appear in the future [2].

For now, my suggestion is to refer to the following tests, which can be used
as a reference for CacheLoadOnlyStoreAdapter usage [3].

[1]
https://github.com/apache/ignite/blob/b3d347e35a254928fd1c4a0473f1b17d642c72f3/modules/core/src/main/java/org/apache/ignite/cache/store/CacheLoadOnlyStoreAdapter.java 

[2] https://issues.apache.org/jira/browse/IGNITE-2415

[3]
https://github.com/apache/ignite/blob/8d77c18c7004f40e9b48fa19e9abd5d893967449/modules/core/src/test/java/org/apache/ignite/cache/store/GridCacheLoadOnlyStoreAdapterSelfTest.java
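
As far as I can tell from the source in [1], the adapter asks a subclass for two things: an input iterator over raw records, and a parse step that turns one record into one key/value pair. Because parse sees a single record at a time, rows belonging to the same account would have to be grouped before they reach it. The sketch below mirrors that split in plain Java so that it runs without Ignite on the classpath; GroupedCsvSource and the "bill as a colon-joined string" value are hypothetical simplifications, and this is not the adapter itself.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a store subclass: same two-step shape, no Ignite types.
class GroupedCsvSource {
    private final List<String> rows;

    GroupedCsvSource(List<String> rows) {
        this.rows = rows;
    }

    // Counterpart of the input-iterator step: yields one record per account,
    // where a record is the list of all CSV rows sharing that accountNumber.
    Iterator<List<String[]>> inputIterator() {
        Map<String, List<String[]>> byAccount = new LinkedHashMap<>();
        for (String row : rows) {
            String[] f = row.split(",", -1); // assumes no embedded commas
            byAccount.computeIfAbsent(f[0], k -> new ArrayList<>()).add(f);
        }
        return byAccount.values().iterator();
    }

    // Counterpart of the parse step: one grouped record becomes one key/value pair.
    Map.Entry<Integer, List<String>> parse(List<String[]> rec) {
        List<String> bills = new ArrayList<>();
        for (String[] f : rec)
            bills.add(f[3] + ":" + f[4] + ":" + f[5]);
        Integer key = Integer.valueOf(rec.get(0)[0]);
        return new AbstractMap.SimpleImmutableEntry<>(key, bills);
    }

    public static void main(String[] args) {
        GroupedCsvSource src = new GroupedCsvSource(Arrays.asList(
            "100,property11,property12,100700,billProperty11,billProperty12",
            "100,property11,property12,100700,billProperty21,billProperty22"));
        for (Iterator<List<String[]>> it = src.inputIterator(); it.hasNext(); ) {
            Map.Entry<Integer, List<String>> e = src.parse(it.next());
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Note that grouping unsorted rows this way keeps every row in memory until the iterator is exhausted, which may or may not be acceptable for a large file.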




Re: Grouping cache when loading data using CacheStore

Posted by Ferry Syafei Sapei <fe...@googlemail.com>.
Importing the CSV into an H2 database will require a huge amount of memory, since the file is big and contains a lot of redundant data. Some rows should be aggregated since they belong to an object with the same key (e.g. accountNumber). Moreover, the rows are not sorted by accountNumber.

Could you propose another solution instead of using the H2 database?

I have tried storing the CSV in IGFS and performing a map-reduce. I instantiate an object for each row in the job, but in the reduce method of the task, I get all the instantiated objects. They are not grouped by accountNumber. Is there a way to get a grouped object in the reduce method?
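
If the reduce method receives the per-row objects as one flat list, one option is to do the grouping inside reduce itself. The following is a minimal plain-Java sketch of that step only; RowResult and ReduceGrouping are hypothetical names, and in a real task the list would first have to be extracted from whatever result wrappers the compute framework hands over.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical per-row object produced by a job.
class RowResult {
    final int accountNumber;
    final int billNumber;

    RowResult(int accountNumber, int billNumber) {
        this.accountNumber = accountNumber;
        this.billNumber = billNumber;
    }
}

class ReduceGrouping {
    // Groups a flat, unsorted list of row results by accountNumber,
    // keeping only the bill numbers to stay short.
    static Map<Integer, List<Integer>> billsByAccount(List<RowResult> jobResults) {
        return jobResults.stream().collect(Collectors.groupingBy(
            r -> r.accountNumber,
            TreeMap::new,
            Collectors.mapping(r -> r.billNumber, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<RowResult> flat = Arrays.asList(
            new RowResult(200, 1), new RowResult(100, 2), new RowResult(200, 3));
        System.out.println(billsByAccount(flat));
    }
}
```

The input order does not matter here, which fits the case where rows are not sorted by accountNumber.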

 
> Am 20.01.2016 um 07:21 schrieb Alexey Kuznetsov <ak...@gridgain.com>:
> 
> Ferry,
> 
> I would like to propose the following workaround:
> 1) Import your CSV into an H2 database, see: http://www.h2database.com/html/tutorial.html#csv
> 2) Use the Apache Ignite Schema Import Utility to generate POJO classes and XML/Java configuration,
> see https://apacheignite.readme.io/docs/automatic-persistence
> 3) Use CacheJdbcPojoStoreFactory / CacheJdbcPojoStore to load your data into the cache.
> 
> Will this work for you?


Re: Grouping cache when loading data using CacheStore

Posted by Alexey Kuznetsov <ak...@gridgain.com>.
Ferry,

I would like to propose the following workaround:
1) Import your CSV into an H2 database, see:
http://www.h2database.com/html/tutorial.html#csv
2) Use the Apache Ignite Schema Import Utility to generate POJO classes and
XML/Java configuration,
see https://apacheignite.readme.io/docs/automatic-persistence
3) Use CacheJdbcPojoStoreFactory / CacheJdbcPojoStore to load your data
into the cache.

Will this work for you?






-- 
Alexey Kuznetsov
GridGain Systems
www.gridgain.com