Posted to user@drill.apache.org by jinhwan choi <ch...@gmail.com> on 2018/11/13 04:39:56 UTC

Fail to read parquet file with wmi log

Hello,

I am using Drill 1.14 (the Parquet metadata in the error output below reports drill.version=1.14.0).

It generally works fine, but Drill fails to read data from a Parquet file built from a WMI log (JSON).

Drill can read several columns and count the rows correctly, but a query that includes certain columns fails.

I built the Parquet file using CTAS from the attached JSON file (WMI log).



■CTAS completed without errors

0: jdbc:drill:drillbit=localhost> CREATE TABLE dfs.tmp.`/test` as
. . . . . . . . . . . . . . . . > select * from dfs.root.`/home/coyote/temp/test.json`;
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 0_0       | 3910                       |
+-----------+----------------------------+
1 row selected (0.98 seconds)



■COUNT(*) query completed without errors

0: jdbc:drill:drillbit=localhost> select count(*) from dfs.tmp.`/test`;
+---------+
| EXPR$0  |
+---------+
| 3910    |
+---------+
1 row selected (0.137 seconds)



■Selecting some columns completed without errors

0: jdbc:drill:drillbit=localhost> select EventIdentifier, Category from dfs.tmp.`/test` limit 10;
+------------------+-----------+
| EventIdentifier  | Category  |
+------------------+-----------+
| 5156             | 12810     |
| 5156             | 12810     |
| 5156             | 12810     |
| 5156             | 12810     |
| 5156             | 12810     |
| 5156             | 12810     |
| 5156             | 12810     |
| 5156             | 12810     |
| 5156             | 12810     |
| 5156             | 12810     |
+------------------+-----------+
10 rows selected (0.127 seconds)



■Selecting the Message column failed with an error

0: jdbc:drill:drillbit=localhost> select Message from dfs.tmp.`/test` limit 10;
Error: INTERNAL_ERROR ERROR: Error in parquet record reader.
Message:
Hadoop path: /tmp/test/0_0_0.parquet
Total records read: 0
Row group index: 0
Records in row group: 3910
Parquet Metadata: ParquetMetaData{FileMetaData{schema: message root {
  optional binary TimeWritten (UTF8);
  optional binary Category (UTF8);
  optional binary EventIdentifier (UTF8);
  optional binary TimeGenerated (UTF8);
  optional binary User (UTF8);
  optional binary Message (UTF8);
  optional binary EventType (UTF8);
  optional binary SourceName (UTF8);
  optional binary Data (UTF8);
  optional binary EventCode (UTF8);
  optional binary Type (UTF8);
  optional binary ComputerName (UTF8);
  optional binary InsertionStrings (UTF8);
  optional binary CategoryString (UTF8);
  optional binary RecordNumber (UTF8);
  optional binary Logfile (UTF8);
}
, metadata: {drill-writer.version=2, drill.version=1.14.0}}, blocks:
[BlockMetaData{3910, 3750947 [ColumnMetaData{SNAPPY [TimeWritten] optional
binary TimeWritten (UTF8)  [RLE, PLAIN, BIT_PACKED], 4},
ColumnMetaData{SNAPPY [Category] optional binary Category (UTF8)  [RLE,
PLAIN, BIT_PACKED], 31073}, ColumnMetaData{SNAPPY [EventIdentifier]
optional binary EventIdentifier (UTF8)  [RLE, PLAIN, BIT_PACKED], 33034},
ColumnMetaData{SNAPPY [TimeGenerated] optional binary TimeGenerated (UTF8)
[RLE, PLAIN, BIT_PACKED], 36120}, ColumnMetaData{SNAPPY [User] optional
binary User (UTF8)  [RLE, PLAIN, BIT_PACKED], 67189}, ColumnMetaData{SNAPPY
[Message] optional binary Message (UTF8)  [RLE, PLAIN, BIT_PACKED], 67968},
ColumnMetaData{SNAPPY [EventType] optional binary EventType (UTF8)  [RLE,
PLAIN, BIT_PACKED], 244498}, ColumnMetaData{SNAPPY [SourceName] optional
binary SourceName (UTF8)  [RLE, PLAIN, BIT_PACKED], 245517},
ColumnMetaData{SNAPPY [Data] optional binary Data (UTF8)  [RLE, PLAIN,
BIT_PACKED], 252880}, ColumnMetaData{SNAPPY [EventCode] optional binary
EventCode (UTF8)  [RLE, PLAIN, BIT_PACKED], 253659}, ColumnMetaData{SNAPPY
[Type] optional binary Type (UTF8)  [RLE, PLAIN, BIT_PACKED], 256745},
ColumnMetaData{SNAPPY [ComputerName] optional binary ComputerName (UTF8)
[RLE, PLAIN, BIT_PACKED], 260401}, ColumnMetaData{SNAPPY [InsertionStrings]
optional binary InsertionStrings (UTF8)  [RLE, PLAIN, BIT_PACKED], 263066},
ColumnMetaData{SNAPPY [CategoryString] optional binary CategoryString
(UTF8)  [RLE, PLAIN, BIT_PACKED], 318281}, ColumnMetaData{SNAPPY
[RecordNumber] optional binary RecordNumber (UTF8)  [RLE, PLAIN,
BIT_PACKED], 324803}, ColumnMetaData{SNAPPY [Logfile] optional binary
Logfile (UTF8)  [RLE, PLAIN, BIT_PACKED], 342492}]}]}



Fragment 0:0

[Error Id: b32fdbc9-e346-48cf-87c5-6aaec09a8d8a on tst-nss-hdn06:31010] (state=,code=0)
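The trailing number in each ColumnMetaData entry above can be read as a rough map of the file. The sketch below is not part of the original report. It assumes (the output does not say so explicitly) that this number is the byte offset at which the column's chunk starts, so the gap to the next entry approximates the chunk's compressed size. The names OFFSETS and approx_chunk_sizes are made up for illustration.

```python
# (column name, trailing number of its ColumnMetaData entry), copied from the
# error output above. ASSUMPTION: the number is the chunk's starting offset.
OFFSETS = [
    ("TimeWritten", 4), ("Category", 31073), ("EventIdentifier", 33034),
    ("TimeGenerated", 36120), ("User", 67189), ("Message", 67968),
    ("EventType", 244498), ("SourceName", 245517), ("Data", 252880),
    ("EventCode", 253659), ("Type", 256745), ("ComputerName", 260401),
    ("InsertionStrings", 263066), ("CategoryString", 318281),
    ("RecordNumber", 324803), ("Logfile", 342492),
]


def approx_chunk_sizes(offsets):
    """Approximate each chunk's size as the gap to the next chunk's offset."""
    sizes = {}
    for (name, start), (_, next_start) in zip(offsets, offsets[1:]):
        sizes[name] = next_start - start
    # The last column is omitted: no following offset is reported for it.
    return sizes


if __name__ == "__main__":
    sizes = approx_chunk_sizes(OFFSETS)
    # Print columns from largest to smallest approximate chunk size.
    for name, size in sorted(sizes.items(), key=lambda item: item[1], reverse=True):
        print(f"{name:18} {size:>8}")
```

Under that assumption the Message chunk spans 176530 bytes, the largest gap in the list. The last column, Logfile, has no following offset and is left out. This does not by itself explain the failure, but it may help when describing the data in a bug report.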



■select * query also failed with an error

0: jdbc:drill:drillbit=localhost> select * from dfs.tmp.`/test` limit 10;
Error: INTERNAL_ERROR ERROR: Error in parquet record reader.
Message:
Hadoop path: /tmp/test/0_0_0.parquet
Total records read: 0
Row group index: 0
Records in row group: 3910
Parquet Metadata: ParquetMetaData{FileMetaData{schema: message root {
  optional binary TimeWritten (UTF8);
  optional binary Category (UTF8);
  optional binary EventIdentifier (UTF8);
  optional binary TimeGenerated (UTF8);
  optional binary User (UTF8);
  optional binary Message (UTF8);
  optional binary EventType (UTF8);
  optional binary SourceName (UTF8);
  optional binary Data (UTF8);
  optional binary EventCode (UTF8);
  optional binary Type (UTF8);
  optional binary ComputerName (UTF8);
  optional binary InsertionStrings (UTF8);
  optional binary CategoryString (UTF8);
  optional binary RecordNumber (UTF8);
  optional binary Logfile (UTF8);
}
, metadata: {drill-writer.version=2, drill.version=1.14.0}}, blocks:
[BlockMetaData{3910, 3750947 [ColumnMetaData{SNAPPY [TimeWritten] optional
binary TimeWritten (UTF8)  [RLE, PLAIN, BIT_PACKED], 4},
ColumnMetaData{SNAPPY [Category] optional binary Category (UTF8)  [RLE,
PLAIN, BIT_PACKED], 31073}, ColumnMetaData{SNAPPY [EventIdentifier]
optional binary EventIdentifier (UTF8)  [RLE, PLAIN, BIT_PACKED], 33034},
ColumnMetaData{SNAPPY [TimeGenerated] optional binary TimeGenerated (UTF8)
[RLE, PLAIN, BIT_PACKED], 36120}, ColumnMetaData{SNAPPY [User] optional
binary User (UTF8)  [RLE, PLAIN, BIT_PACKED], 67189}, ColumnMetaData{SNAPPY
[Message] optional binary Message (UTF8)  [RLE, PLAIN, BIT_PACKED], 67968},
ColumnMetaData{SNAPPY [EventType] optional binary EventType (UTF8)  [RLE,
PLAIN, BIT_PACKED], 244498}, ColumnMetaData{SNAPPY [SourceName] optional
binary SourceName (UTF8)  [RLE, PLAIN, BIT_PACKED], 245517},
ColumnMetaData{SNAPPY [Data] optional binary Data (UTF8)  [RLE, PLAIN,
BIT_PACKED], 252880}, ColumnMetaData{SNAPPY [EventCode] optional binary
EventCode (UTF8)  [RLE, PLAIN, BIT_PACKED], 253659}, ColumnMetaData{SNAPPY
[Type] optional binary Type (UTF8)  [RLE, PLAIN, BIT_PACKED], 256745},
ColumnMetaData{SNAPPY [ComputerName] optional binary ComputerName (UTF8)
[RLE, PLAIN, BIT_PACKED], 260401}, ColumnMetaData{SNAPPY [InsertionStrings]
optional binary InsertionStrings (UTF8)  [RLE, PLAIN, BIT_PACKED], 263066},
ColumnMetaData{SNAPPY [CategoryString] optional binary CategoryString
(UTF8)  [RLE, PLAIN, BIT_PACKED], 318281}, ColumnMetaData{SNAPPY
[RecordNumber] optional binary RecordNumber (UTF8)  [RLE, PLAIN,
BIT_PACKED], 324803}, ColumnMetaData{SNAPPY [Logfile] optional binary
Logfile (UTF8)  [RLE, PLAIN, BIT_PACKED], 342492}]}]}



Fragment 0:0

[Error Id: 4966ef0e-15ae-42e9-bc2d-91d0660a62c1 on tst-nss-hdn06:31010] (state=,code=0)
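Since EventIdentifier and Category read fine while Message does not, one way to narrow down which columns trigger the error is to query each column on its own. The sketch below is illustrative and not from the original report: it only prints one single-column query per column in the schema above, for pasting into sqlline. The helper name probe_queries is hypothetical.

```python
# Column names copied from the Parquet schema in the error output above.
COLUMNS = [
    "TimeWritten", "Category", "EventIdentifier", "TimeGenerated",
    "User", "Message", "EventType", "SourceName", "Data", "EventCode",
    "Type", "ComputerName", "InsertionStrings", "CategoryString",
    "RecordNumber", "Logfile",
]

# Table created by the CTAS statement in this report.
TABLE = "dfs.tmp.`/test`"


def probe_queries(columns, table=TABLE, limit=10):
    """Return one single-column SELECT statement per column."""
    # Backticks quote each identifier, in case a name such as User or Type
    # is treated as a reserved word.
    return [f"select `{col}` from {table} limit {limit};" for col in columns]


if __name__ == "__main__":
    for query in probe_queries(COLUMNS):
        print(query)
```

Running each printed statement separately and noting which ones return the parquet record reader error gives a list of affected columns to include in a bug report.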



I have attached the sample files and the drillbit log.

Thanks.

Regards,
Choi

Re: Fail to read parquet file with wmi log

Posted by Kunal Khatua <ku...@apache.org>.
Hi Choi

This looks like a bug. Could you please file a bug in the Apache Drill JIRA system?

https://issues.apache.org/jira/issues/?jql=project%20%3D%20DRILL%20ORDER%20BY%20key%20DESC%2C%20cf%5B10010%5D%20DESC%2C%20priority%20DESC%2C%20updated%20DESC

(Click on the red "Create" button).

The data is a little hard to describe, so it would help if you could provide the sample Parquet file as well.

Thanks
Kunal