Posted to dev@hive.apache.org by Juan Rodríguez Hortalá <ju...@gmail.com> on 2017/05/05 21:43:36 UTC

How to create a patch that contains a binary file

Hi,

For HIVE-16539 I created a patch that adds a new ORC file, using `git diff
--no-prefix` as specified in
https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-CreatingaPatch.
The corresponding Jenkins build
<http://104.198.109.242/logs/PreCommit-HIVE-Build-5048/failed/238_UTBatch_itests__hive-blobstore_2_tests/logs/hive.log>
is failing with

2017-05-05T10:00:30,151 ERROR [4dda13e3-e900-4d86-a654-bca8c14720cd
main] ql.Driver: FAILED: SemanticException Line 3:23 Invalid path
''../../data/files/part.orc'': No files matching path
file:/home/hiveptest/35.188.114.194-hiveptest-1/apache-github-source-source/data/files/part.orc
org.apache.hadoop.hive.ql.parse.SemanticException: Line 3:23 Invalid
path ''../../data/files/part.orc'': No files matching path
file:/home/hiveptest/35.188.114.194-hiveptest-1/apache-github-source-source/data/files/part.orc


I think this is because the patch is not creating the ORC file
correctly when it is applied. When I apply the patch locally on an
updated clone of https://github.com/apache/hive.git in master, the
patch applies ok but the resulting file data/files/part.orc is
different from the original file I used to build the patch, and when I
try to load it into a table in a local hive instance I get "FAILED:
SemanticException Unable to load data to destination table. Error: The
file that you are trying to load does not match the file format of the
destination table". Similarly, `hive --service orcfiledump
data/files/part.orc` fails with "Exception in thread "main"
java.lang.IndexOutOfBoundsException".

So it looks like the patch is malformed for the ORC file because it is
binary. Should I use bsdiff to build the patch instead? What is the
expected way for building patches involving binary files?


Thanks,


Juan

Re: How to create a patch that contains a binary file

Posted by Juan Rodríguez Hortalá <ju...@gmail.com>.
Hi Owen,

That worked just fine, and I can now apply the patch with `git apply` and
the ORC file is ok.

Thanks a lot for your help.

Greetings,

Juan


On Fri, May 5, 2017 at 2:55 PM, Owen O'Malley <ow...@gmail.com>
wrote:

> Try:
>
> % git format-patch --stdout HEAD^ > HIVE-1234.1.patch
>
> That will generate a git format patch that should preserve the binary file.
>
> .. Owen

Re: How to create a patch that contains a binary file

Posted by Owen O'Malley <ow...@gmail.com>.
Try:

% git format-patch --stdout HEAD^ > HIVE-1234.1.patch

That will generate a git format patch that should preserve the binary file.

.. Owen
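
A quick way to check the suggestion above before uploading a patch is to round-trip it in a throwaway repository. The sketch below is only an illustration under stated assumptions: the repository, the committer identity, and the bytes written to data/files/part.orc are made-up stand-ins (not a real ORC file), and HIVE-1234.1.patch is the placeholder name from the command above. It commits a file containing NUL bytes, exports the commit with `git format-patch --stdout HEAD^`, applies the result with `git apply` to a checkout of the parent commit, and compares the two files byte for byte.

```shell
#!/bin/sh
# Round-trip check: does a patch made with `git format-patch` recreate a
# binary file byte for byte? All names and contents here are illustrative.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Throwaway source repository with one base commit.
git init -q "$work/src"
cd "$work/src"
git config user.email "dev@example.invalid"   # hypothetical identity
git config user.name "Example Dev"
echo "base" > README
git add README
git commit -q -m "base"

# Stand-in binary file; the NUL bytes make git treat it as binary.
mkdir -p data/files
printf 'ORC\000\001\002\377\000binary-payload\000' > data/files/part.orc
git add data/files/part.orc
git commit -q -m "HIVE-1234: add binary test file"

# Export the last commit as a patch, as suggested in the message above.
git format-patch --stdout HEAD^ > "$work/HIVE-1234.1.patch"

# Apply the patch to a separate checkout of the parent commit.
git clone -q "$work/src" "$work/dst"
cd "$work/dst"
git checkout -q HEAD^
git apply "$work/HIVE-1234.1.patch"

# Compare the recreated file with the original.
if cmp -s "$work/src/data/files/part.orc" data/files/part.orc; then
  result="identical"
else
  result="different"
fi
echo "$result"
```

The script prints `identical` when the applied file matches the original and `different` otherwise. If the project requires `git diff` output rather than a format-patch file, `git diff --binary` is the documented option for embedding binary contents in a diff; whether the Hive precommit job accepts that form is not confirmed in this thread.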