Posted to dev@drill.apache.org by Jean-Claude Cote <jc...@gmail.com> on 2018/10/31 00:44:50 UTC

msgpack reading schema files checksum error

I'm writing a msgpack reader that supports schema validation. The msgpack
reader can discover the schema and store the result in a file
named .schema.proto alongside the data files. There is also an additional
..schema.proto.crc file, created, I believe, by the Hadoop file system.

Even though the reader can discover the schema on its own, I would like to
be able to edit the file manually. However, when I do, the checksum no
longer matches and my reader fails to load the file.

My question is: how can I read a file while ignoring the checksum file? Or
how difficult is it to produce these checksum files myself?

I save the file like so:
    try (FSDataOutputStream out = fileSystem.create(schemaLocation, true)) {

This call fails if I modify the schema file manually:
      try (FSDataInputStream in = fileSystem.open(schemaLocation)) {
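
As a rough sketch of what I am hoping for (not what the reader does today),
I believe checksummed file systems such as LocalFileSystem honor
FileSystem.setVerifyChecksum(), so the read could perhaps look like this:

    import org.apache.hadoop.fs.FSDataInputStream;

    // Sketch only; fileSystem and schemaLocation are the same as above.
    // setVerifyChecksum(false) is a no-op on file systems that keep no .crc
    // files and turns the check off on those that do (e.g. LocalFileSystem).
    fileSystem.setVerifyChecksum(false);
    try (FSDataInputStream in = fileSystem.open(schemaLocation)) {
        // parse the schema as before
    }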

Thank you
jc

Re: msgpack reading schema files checksum error

Posted by Padma Penumarthy <pe...@gmail.com>.
How are you modifying the file manually? Are you copying it to the local
file system, making changes, and copying it back to HDFS?

Thanks
Padma


Re: msgpack reading schema files checksum error

Posted by Jean-Claude Cote <jc...@gmail.com>.
I think the mistake is at my end. If I delete the .crc file, the reader can
still read the schema file. I must have forgotten to remove the .crc file
after I manually tweaked the schema file.
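
For the record, the cleanup would look roughly like this (a sketch only,
assuming fileSystem is a ChecksumFileSystem such as LocalFileSystem;
fileSystem and schemaLocation are the same variables as in my first
message):

    import org.apache.hadoop.fs.ChecksumFileSystem;
    import org.apache.hadoop.fs.Path;

    // Sketch only: after hand-editing .schema.proto, drop the stale checksum
    // file so the next open() no longer fails verification.
    if (fileSystem instanceof ChecksumFileSystem) {
        // getChecksumFile() maps .schema.proto to ..schema.proto.crc
        Path crc = ((ChecksumFileSystem) fileSystem).getChecksumFile(schemaLocation);
        if (fileSystem.exists(crc)) {
            fileSystem.delete(crc, false); // false = non-recursive
        }
    }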

I sometimes tweak the schema file manually. For example, if an array
contains mixed data types, I might say in the schema that it is an array
of varbinary instead of what was detected, say an array of bigint.

thanks Paul
jc


Re: msgpack reading schema files checksum error

Posted by Paul Rogers <pa...@yahoo.com.INVALID>.
Looks like Google found a couple of hits: [1] and [2]


I'm not an expert here, but I wonder if you can just remove the .crc file. I've never had Drill or HDFS complain when asked to read a local file without the .crc file...

Thanks,
- Paul

[1] https://stackoverflow.com/questions/49375908/hadoop-copytolocal-creates-crc-files

[2] https://community.hortonworks.com/questions/19449/hadoop-localfilesystem-checksum-calculation.html
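
If it helps, my reading of those links is that LocalFileSystem is just a
checksummed wrapper around RawLocalFileSystem, so a rough sketch of two ways
around the .crc bookkeeping would be:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.LocalFileSystem;

    // Sketch only: either stop writing .crc files, or bypass them entirely.
    LocalFileSystem localFs = FileSystem.getLocal(new Configuration());
    localFs.setWriteChecksum(false);               // create() stops emitting .crc files
    FileSystem rawFs = localFs.getRawFileSystem(); // the raw FS never checks .crc files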

 
