Posted to issues@phoenix.apache.org by "Swaroopa Kadam (JIRA)" <ji...@apache.org> on 2018/09/14 18:35:00 UTC

[jira] [Commented] (PHOENIX-4872) BulkLoad has bug when loading on single-cell-array-with-offsets table.

    [ https://issues.apache.org/jira/browse/PHOENIX-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16615215#comment-16615215 ] 

Swaroopa Kadam commented on PHOENIX-4872:
-----------------------------------------

I am using the test case below to try to reproduce the issue.

The test passes just fine, though. Could you please tell me whether anything is missing from the test, [~mini666]?
{code:java}
@Test
public void testImportInSingleCellArrayWithOffsetsTable() throws Exception {
    String tableName = generateUniqueName();
    Statement stmt = conn.createStatement();
    // Immutable table with three column families (default, CF1, CF2) stored as SCAWO.
    stmt.execute("CREATE IMMUTABLE TABLE S.TABLE12 (ID INTEGER NOT NULL PRIMARY KEY, NAME VARCHAR, T DATE, CF1.T2 DATE, CF2.T3 DATE)" +
            " IMMUTABLE_STORAGE_SCHEME=SINGLE_CELL_ARRAY_WITH_OFFSETS");
    PhoenixConnection phxConn = conn.unwrap(PhoenixConnection.class);
    PTable table = phxConn.getTable(new PTableKey(null, "S.TABLE12"));
    assertEquals(PTable.ImmutableStorageScheme.SINGLE_CELL_ARRAY_WITH_OFFSETS, table.getImmutableStorageScheme());

    // Write the CSV input; each ID appears several times with a different CF1.T2 value.
    FileSystem fs = FileSystem.get(getUtility().getConfiguration());
    FSDataOutputStream outputStream = fs.create(new Path("/tmp/inputSCAWO.csv"));
    PrintWriter printWriter = new PrintWriter(outputStream);
    printWriter.println("1,Name 1,1970/01/01,1970/02/01,1970/03/01");
    printWriter.println("2,Name 2,1970/01/02,1970/02/02,1970/03/02");
    printWriter.println("1,Name 1,1970/01/01,1970/02/03,1970/03/01");
    printWriter.println("2,Name 2,1970/01/02,1970/02/04,1970/03/02");
    printWriter.println("1,Name 1,1970/01/01,1970/02/05,1970/03/01");
    printWriter.println("2,Name 2,1970/01/02,1970/02/06,1970/03/02");
    printWriter.println("1,Name 1,1970/01/01,1970/02/07,1970/03/01");
    printWriter.println("2,Name 2,1970/01/02,1970/02/08,1970/03/02");
    printWriter.close();

    // Run the bulk load against the table created above.
    CsvBulkLoadTool csvBulkLoadTool = new CsvBulkLoadTool();
    csvBulkLoadTool.setConf(new Configuration(getUtility().getConfiguration()));
    csvBulkLoadTool.getConf().set(DATE_FORMAT_ATTRIB,"yyyy/MM/dd");
    int exitCode = csvBulkLoadTool.run(new String[] {
            "--input", "/tmp/inputSCAWO.csv",
            "--table", "table12",
            "--schema", "s",
            "--zookeeper", zkQuorum});
    assertEquals(0, exitCode);

    // A GROUP BY query over the bulk-loaded data should return one row per name.
    ResultSet rs = stmt.executeQuery("SELECT name, max(CF1.T2) FROM s.table12 GROUP BY name");
    assertTrue(rs.next());
    assertEquals("Name 1", rs.getString(1));
    assertEquals(DateUtil.parseDate("1970-02-07"), rs.getDate(2));
    assertTrue(rs.next());
    assertEquals("Name 2", rs.getString(1));
    assertEquals(DateUtil.parseDate("1970-02-08"), rs.getDate(2));
    assertFalse(rs.next());

    rs.close();
    stmt.close();
}

{code}

> BulkLoad has bug when loading on single-cell-array-with-offsets table.
> ----------------------------------------------------------------------
>
>                 Key: PHOENIX-4872
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4872
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.11.0, 4.12.0, 4.13.0, 4.14.0
>            Reporter: JeongMin Ju
>            Assignee: Swaroopa Kadam
>            Priority: Critical
>
> CsvBulkLoadTool creates incorrect data for SCAWO (SingleCellArrayWithOffsets) tables.
> Every Phoenix table needs a marker (empty) column, but CsvBulkLoadTool does not create that column for SCAWO tables.
> If you check the data through the HBase shell, you can see that the corresponding column is missing.
> When rows are written with an UPSERT query, the column is created normally:
> {code:java}
> column=0:\x00\x00\x00\x00, timestamp=1535420036372, value=x
> {code}
> Because the column shown above is missing, every GROUP BY query returns an empty result.
> This is because "families":
> {"0": ["\\x00\\x00\\x00\\x00"]}
> is added to the Scan object as the columns to read.
> Because CsvBulkLoadTool has not created that column, the scan returns nothing.
>
> This problem affects only tables with multiple column families. Tables with a single column family happen to work,
> because "families": {"0": ["ALL"]} is added to the Scan object for them instead.
>  
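The scan behavior described in the quoted report can be illustrated with a toy model. The sketch below uses plain Java collections only; it does not call HBase or Phoenix, and the class name, the helper methods, and the "packed" column names are all hypothetical. It assumes, as the report states, that a scan restricted to the marker column returns only rows that physically store that column, while a whole-family ("ALL") scan returns any row that stores something.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model only: plain Java collections, no HBase or Phoenix classes.
final class MarkerColumnScanSketch {

    // Label for the empty (marker) column from the report: family "0", qualifier \x00\x00\x00\x00.
    static final String MARKER = "0:\\x00\\x00\\x00\\x00";

    // Builds a two-row table; each row maps to the set of columns physically stored for it.
    // The "packed" column names are hypothetical stand-ins for the per-family array cells.
    static Map<Integer, Set<String>> table(boolean withMarker) {
        Map<Integer, Set<String>> rows = new TreeMap<>();
        for (int id = 1; id <= 2; id++) {
            Set<String> columns =
                    new HashSet<>(Arrays.asList("0:packed", "CF1:packed", "CF2:packed"));
            if (withMarker) {
                columns.add(MARKER);
            }
            rows.put(id, columns);
        }
        return rows;
    }

    // Returns the keys of rows that store the requested column.
    // A null request models a whole-family scan ("ALL") and matches any non-empty row.
    static List<Integer> scan(Map<Integer, Set<String>> rows, String requestedColumn) {
        List<Integer> hits = new ArrayList<>();
        for (Map.Entry<Integer, Set<String>> row : rows.entrySet()) {
            Set<String> columns = row.getValue();
            boolean matches = requestedColumn == null
                    ? !columns.isEmpty()
                    : columns.contains(requestedColumn);
            if (matches) {
                hits.add(row.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(scan(table(true), MARKER));  // rows written with the marker column
        System.out.println(scan(table(false), MARKER)); // rows missing the marker column
        System.out.println(scan(table(false), null));   // whole-family scan over the same rows
    }
}
```

In this model the marker-restricted scan finds both rows only when the marker column was written, and the whole-family scan finds them either way. That matches the report's claim that the single-column-family case works while the multi-family case returns nothing.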



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)