Posted to issues@carbondata.apache.org by "xubo245 (JIRA)" <ji...@apache.org> on 2018/11/13 09:13:00 UTC

[jira] [Updated] (CARBONDATA-2951) CSDK: Provide C++ interface for SDK

     [ https://issues.apache.org/jira/browse/CARBONDATA-2951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xubo245 updated CARBONDATA-2951:
--------------------------------
    Description: 
CSDK:  Provide C++ interface for SDK
1. Provide CarbonReader for the SDK so that carbon data can be read from C++
	##features/interfaces
	1.1.	create CarbonReader
	1.2.	hasNext()
	1.3.	readNextRow()
	1.4.	close()
	1.5.	support OBS (AK/SK/Endpoint)
	1.6.	support batch read (withBatch, readNextBatchRow)
	1.7.	support vector read (default) and carbonrecordreader (withRowRecordReader)
	1.8.	projection
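
The call pattern in 1.2-1.6 can be sketched as below. This is only an illustration: MockCarbonReader, Row, and countRows are hypothetical names that keep rows in memory, and nothing here calls the CSDK or reads a carbon store. The batch size passed to the constructor stands in for what withBatch would configure.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a reader: rows are held in memory so the
// hasNext/readNextRow/close pattern can be shown without a carbon store.
using Row = std::vector<std::string>;

class MockCarbonReader {
public:
    MockCarbonReader(std::vector<Row> rows, std::size_t batchSize)
        : rows_(std::move(rows)), batchSize_(batchSize) {
        if (batchSize_ == 0) throw std::invalid_argument("batchSize must be > 0");
    }

    // True while the reader is open and unread rows remain.
    bool hasNext() const { return !closed_ && pos_ < rows_.size(); }

    // Returns the next row; throws if the reader is exhausted or closed.
    Row readNextRow() {
        if (!hasNext()) throw std::out_of_range("no more rows");
        return rows_[pos_++];
    }

    // Returns up to batchSize rows in a single call.
    std::vector<Row> readNextBatchRow() {
        if (!hasNext()) throw std::out_of_range("no more rows");
        std::vector<Row> batch;
        while (hasNext() && batch.size() < batchSize_) {
            batch.push_back(rows_[pos_++]);
        }
        return batch;
    }

    void close() { closed_ = true; }

private:
    std::vector<Row> rows_;
    std::size_t batchSize_;
    std::size_t pos_ = 0;
    bool closed_ = false;
};

// Drains the reader row by row, closes it, and returns the row count.
std::size_t countRows(MockCarbonReader& reader) {
    std::size_t n = 0;
    while (reader.hasNext()) {
        reader.readNextRow();
        ++n;
    }
    reader.close();
    return n;
}
```

A caller would loop on hasNext() and call close() once at the end, as countRows does; the batch variant returns several rows per call instead of one.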
	
	##support data types:
	 String, Long, Varchar(string), Short, Int, Date(int), Timestamp(long), Boolean, Decimal(string), Float
	 Array<String>: supported in carbonrecordreader, not supported in vectorreader
	 byte: supported in Java RowUtil, not in the C++ carbon reader
	 
	## Schema and data
	 Create table tbl_email_form_to_for_XX( 
		Event_Time Timestamp,
		Ingestion_Time Timestamp,
		From_Email String,
		To_Email String,
		From_To_type String,
		Event_ID String
		) using carbon options(path 'obs://X/tbl_email_form_to_for_XX')
		ETL: 6 columns from an 18-column table
		
		example data:
		from_email_36550_phillip.allen@enron.com	to_email_36550_stagecoachmama@hotmail.com	from_to	<29...@thyme>	1538015497000000	9755149200000

2. The performance should reach X million records/s/node

3. Provide CarbonWriter for the SDK so that carbon data can be written from C++
	##features/interfaces
	3.1.	create CarbonWriter, including creating the schema (withCsvInput), setting outputPath, and build
	3.2.	write()
	3.3.	close()
	3.4.	support OBS(AK/SK/Endpoint)(withHadoopConf)
	3.5.	writtenBy
	3.6.	support withTableProperty, withLoadOption, taskNo, uniqueIdentifier, withThreadSafe, withBlockSize, withBlockletSize, localDictionaryThreshold, enableLocalDictionary in C++ SDK (PR2899 to be reviewed)
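
The builder flow in 3.1-3.5 (schema, output path, writtenBy, build, then write and close) might look roughly like this in C++. MockCarbonWriter and MockCarbonWriterBuilder are hypothetical classes written for this sketch; they only buffer rows in memory and are not the CSDK interface. The method names are borrowed from the list above, but their signatures are assumptions.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory writer: checks row width and open/closed state only.
class MockCarbonWriter {
public:
    MockCarbonWriter(std::vector<std::string> columns, std::string outputPath,
                     std::string writtenBy)
        : columns_(std::move(columns)),
          outputPath_(std::move(outputPath)),
          writtenBy_(std::move(writtenBy)) {}

    // Buffers one row; rejects writes after close() or rows of the wrong width.
    void write(const std::vector<std::string>& row) {
        if (closed_) throw std::logic_error("writer is closed");
        if (row.size() != columns_.size())
            throw std::invalid_argument("row width does not match schema");
        rows_.push_back(row);
    }

    void close() { closed_ = true; }

    std::size_t rowCount() const { return rows_.size(); }
    const std::string& outputPath() const { return outputPath_; }
    const std::string& writtenBy() const { return writtenBy_; }

private:
    std::vector<std::string> columns_;
    std::string outputPath_;
    std::string writtenBy_;
    std::vector<std::vector<std::string>> rows_;
    bool closed_ = false;
};

// Hypothetical builder: collects settings, then validates them in build().
class MockCarbonWriterBuilder {
public:
    MockCarbonWriterBuilder& withCsvInput(std::vector<std::string> columns) {
        columns_ = std::move(columns);
        return *this;
    }
    MockCarbonWriterBuilder& outputPath(std::string path) {
        outputPath_ = std::move(path);
        return *this;
    }
    MockCarbonWriterBuilder& writtenBy(std::string appName) {
        writtenBy_ = std::move(appName);
        return *this;
    }

    // Fails early when the schema or output path is missing.
    MockCarbonWriter build() const {
        if (columns_.empty()) throw std::logic_error("schema is required");
        if (outputPath_.empty()) throw std::logic_error("outputPath is required");
        return MockCarbonWriter(columns_, outputPath_, writtenBy_);
    }

private:
    std::vector<std::string> columns_;
    std::string outputPath_;
    std::string writtenBy_;
};
```

Each setter returns the builder so the calls can be chained, and build() is the single place where missing settings are reported.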
	
	##Data types:
	   Carbon needs to support basic data types, including string, float, double, int, long, date, timestamp, bool, array<String>.
	   For other types, we can convert:
	      char array => carbon string
	      enum => carbon string
	      set and list => carbon array<String>
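
The three conversions above could be done on the C++ side before a row is handed to the writer. The sketch below is one possible shape, not SDK code: MailDirection is a made-up enum, all function names are hypothetical, and a carbon string and a carbon array<String> are represented as std::string and std::vector<std::string>.

```cpp
#include <cstddef>
#include <cstring>
#include <list>
#include <set>
#include <string>
#include <vector>

// Hypothetical enum, used only for illustration.
enum class MailDirection { From, To };

// char array => carbon string: copy up to the first NUL, or the whole
// buffer when it contains no NUL terminator.
std::string charArrayToCarbonString(const char* chars, std::size_t capacity) {
    const void* nul = std::memchr(chars, '\0', capacity);
    const std::size_t len =
        nul ? static_cast<std::size_t>(static_cast<const char*>(nul) - chars)
            : capacity;
    return std::string(chars, len);
}

// enum => carbon string: map each enumerator to its name.
std::string enumToCarbonString(MailDirection d) {
    switch (d) {
        case MailDirection::From: return "From";
        case MailDirection::To:   return "To";
    }
    return "Unknown";  // only reached for values outside the enumerators
}

// set and list => carbon array<String>: copy elements in iteration order.
template <typename Container>
std::vector<std::string> toCarbonStringArray(const Container& values) {
    return std::vector<std::string>(values.begin(), values.end());
}
```

Note that a std::set yields its elements in sorted order, so the resulting array follows that order rather than insertion order.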

	##performance
	Write performance is not a requirement for now
	
4. read schema function
	readSchema
	getVersionDetails => TODO

5. support carbonproperties
	5.1 addProperty
	5.2 getProperty
	
6. TODO:
	6.1. getVersionDetails
	6.2. update SDK/CSDK reader doc
	6.3. support byte (write and read)
	6.4. support long string columns
	6.5. support sortBy
	6.6. support withCsvInput(Schema schema); create schema (Java)
	6.7. optimize the write doc
			/**
			* Create a {@link CarbonWriterBuilder} to build a {@link CarbonWriter}
			*/
			public static CarbonWriterBuilder builder() {
				return new CarbonWriterBuilder();
			}

  was:
CSDK: Provide C++ interface for SDK
1.Provide CarbonReader for SDK, it can read carbon data in C++ language
2.Provide CarbonWriter for SDK, it can write carbon data in C++ language


> CSDK: Provide C++ interface for SDK
> -----------------------------------
>
>                 Key: CARBONDATA-2951
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2951
>             Project: CarbonData
>          Issue Type: Task
>          Components: other
>    Affects Versions: 1.5.0
>            Reporter: xubo245
>            Assignee: xubo245
>            Priority: Critical
>             Fix For: NONE
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)