Posted to dev@hivemall.apache.org by Makoto Yui <my...@apache.org> on 2016/11/29 12:53:14 UTC

Importing History or Not at Initial Code Dump

Hi,

When performing the initial code dump [1], the choice of whether to
import history is left to the community.
[1] http://incubator.apache.org/guides/mentor.html#initial-import-code-dump

I'm considering importing from a depth-1 shallow copy of the master
branch, because cloning the Hivemall repository takes a long time due
to large binary files that were committed in the past.

Thoughts? > Takeshi, Kai

$ git_find_big.sh
(downloaded from
https://confluence.atlassian.com/bitbucket/maintaining-a-git-repository-321848291.html
)

All sizes are in kB's. The pack column is the size of the object,
compressed, inside the pack file.
size   pack   SHA                                       location
14705  13419  2024b5df95e5972b16e5da6b063f4f1e65e96421  target/hivemall-fat.jar
13761  12515  84dbfe3fee95557342446fb3a4a9aee9f892dc37  target/hivemall-fat.jar
8898   8064   4bca62df38c5c506dc47627a249dce2fb4096f1b  lib/hive/hive-exec-0.12.0.jar
8348   7935   d2a3efab63b5a21ebf0a665b3103cdec25bbd367  target/hivemall-nlp-with-dependencies.jar
6109   5558   b3890a58ebc4457f6592f02c76ac147d9a8f961e  lib/hive-exec-0.11.0.jar
4490   4472   9b01e9abea6a3636a0ade1cf4a889e83b177e32b  lib/lucene-analyzers-kuromoji-5.3.1.jar
3778   3508   32da99d5caad1fd7d199fa41acbe46af7e078603  lib/hadoop-core-0.20.2-cdh3u6.jar
3447   3122   d3a3f74edcf5455eb3cf480319296e2db8eb7574  lib/hive/hive-exec-0.9.0.jar
2301   2095   9ffa9173b103500ffe1d28321d08ddb5a8ed6df8  lib/lucene-core-5.3.1.jar
2042   1862   28740e444d5071d3d03027a33e38bd3e69992fb2  target/hivemall-with-dependencies.jar
1766   1677   103b588e15f6b7b44368a216cb4c4ed4105f727b  lib/source/lucene-core-5.3.1-sources.jar
1526   1373   a8713840cca091fc21a54f75dad8260ed2d810bd  lib/lucene-analyzers-common-5.3.1.jar
1493   1340   4a87ce9173e27913c69cd06f6fa300e40471e842  target/hivemall-fat.jar
1490   1395   a0aab7c42b1f7a7d1ddfff64eef22540b6a00dd6  lib/source/lucene-analyzers-common-5.3.1-sources.jar
1425   1305   5f109a2bdf6b8d75a4488cd97d5f03f51c37f946  target/hivemall-mixserv.jar
1409   1300   b04c08cf7c63229f2ca5f31574888bb00ba86790  lib/source/netty-all-4.0.23.Final-sources.jar
1391   1383   b8d432e6a3c0074951abd35caf0a777caf47afbf  xgboost/lib/xgboost4j_0.60-0.10.jar
1359   857    4e8fb11de168b0425de9755f2cfa0b0a4b4eefd2  target/hivemall-all.jar
1356   1212   5d28e1dd9e411a26fe6437c1c77e81ad87325370  target/hivemall-fat.jar
1331   1258   89db746fcb20be1e13a23c79a7f5334533e1ad22  target/hivemall-with-dependencies.jar
1265   1219   7482e31f85c6605de15dba63175a110f51c03de6  lib/deprecated/hive-exec-0.8.1.jar
1205   1051   c831489cd99ab87d95dd7a11f153ab318c5c0e6c  lib/optional/mockito-all-1.10.19.jar
1198   1130   ced3a5d79beedfc5ff237f901b953a09b963b9f0  target/hivemall-with-dependencies.jar
1190   1016   695078e93df73a2d994ef98ec27be4a6207d0706  lib/optional/guava-r09-jarjar.jar
1146   1024   1b4275262689be192ffc1e8f596eb19b44a0d6a3  target/hivemall-fat.jar
...
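
A listing like the one above can be approximated with git plumbing
alone. The following is only a sketch, not the git_find_big.sh script
itself: the function name list_big_blobs is hypothetical, sizes are
uncompressed bytes rather than kB, and the pack column is not
reproduced.

```shell
#!/bin/sh
# list_big_blobs [N]
# Print the N largest blobs (default 10) reachable from any ref,
# biggest first, as: <size in bytes> <SHA> <path>.
list_big_blobs() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
    awk '$1 == "blob" { print substr($0, 6) }' |   # drop the leading "blob "
    sort -k1,1nr |
    head -n "${1:-10}"
}

# Example, run inside a clone:
#   list_big_blobs 25
```

Because every ref is walked, files that were deleted long ago but are
still present in history show up as well, which is the point of the
exercise here.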

We can rewrite commit history as follows but it requires existing pull
requests to be rebased.

$ git filter-branch --index-filter \
    'git rm -r --cached --ignore-unmatch lib/ target/*.jar' \
    --prune-empty -- --all
$ rm -rf .git/refs/original/
$ git reflog expire --expire=now --all
$ git gc --aggressive --prune=now
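
One way to check the outcome of such a rewrite is to ask git whether any
commit still touches the removed paths. This is a minimal sketch under
assumptions: history_is_clean is a hypothetical helper name (it only
wraps git log), and the paths in the usage comment are the ones from the
filter above.

```shell
#!/bin/sh
# history_is_clean PATHSPEC...
# Succeeds when no commit reachable from any ref touches the given paths.
history_is_clean() {
  [ -z "$(git log --all --oneline -- "$@")" ]
}

# Example, run inside the rewritten clone:
#   history_is_clean lib/ 'target/*.jar' && echo "binaries are gone"
#   git count-objects -vH    # the size-pack line shows the packed size
```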

Also, I'm asking the ASF INFRA team whether the Hivemall GitHub
repository can be transferred to the ASF account:
https://issues.apache.org/jira/browse/INFRA-12995

Thanks,
Makoto

Re: Importing History or Not at Initial Code Dump

Posted by Takeshi Yamamuro <li...@gmail.com>.
Great work!

// maropu

On Thu, Dec 1, 2016 at 2:29 PM, Makoto Yui <yu...@gmail.com> wrote:

> Hi,
>
> Done the initial code dump!
> https://github.com/apache/incubator-hivemall
>
> Let's move development (Pull requests) to the ASF repository.
>
> I'll update the project status page soon (and Dec report).
>
> Thanks,
> Makoto



-- 
---
Takeshi Yamamuro

Re: Importing History or Not at Initial Code Dump

Posted by Makoto Yui <yu...@gmail.com>.
Hi,

Done the initial code dump!
https://github.com/apache/incubator-hivemall

Let's move development (Pull requests) to the ASF repository.

I'll update the project status page soon (and Dec report).

Thanks,
Makoto


2016-11-30 21:04 GMT+09:00 Makoto Yui <yu...@gmail.com>:
> I'm considering importing https://github.com/myui/incubator-hivemall
> into the ASF repository tomorrow.
> Let me know if it's NOT okay.
>
> The GitHub tag/release issue is my concern, though:
> https://lists.apache.org/thread.html/db78e1f8fc121d8e6b016d2f61d06ccafebf9fd30b4ec00883c78557@%3Clegal-discuss.apache.org%3E
>
> I would like to retain the past git tags to keep track of changes.
>
> Thanks,
> Makoto

Re: Importing History or Not at Initial Code Dump

Posted by Makoto Yui <yu...@gmail.com>.
I'm considering importing https://github.com/myui/incubator-hivemall
into the ASF repository tomorrow.
Let me know if it's NOT okay.

The GitHub tag/release issue is my concern, though:
https://lists.apache.org/thread.html/db78e1f8fc121d8e6b016d2f61d06ccafebf9fd30b4ec00883c78557@%3Clegal-discuss.apache.org%3E

I would like to retain the past git tags to keep track of changes.

Thanks,
Makoto

2016-11-30 20:35 GMT+09:00 Makoto Yui <yu...@gmail.com>:
> I'm considering proceeding in the following way, because git push does
> not work from a shallow copy (maybe due to the ASF git server
> version/configuration).

Re: Importing History or Not at Initial Code Dump

Posted by Makoto Yui <yu...@gmail.com>.
I'm considering proceeding in the following way, because git push does
not work from a shallow copy (maybe due to the ASF git server
version/configuration).

You can find the tested repository at https://github.com/myui/incubator-hivemall

$ git clone https://github.com/myui/hivemall.git incubator-hivemall
$ cd incubator-hivemall
$ git filter-branch --index-filter \
    'git rm -r --cached --ignore-unmatch lib/ target/*.jar' \
    --tag-name-filter cat --prune-empty -- --all
$ rm -rf .git/refs/original/
$ git reflog expire --expire=now --all
$ git gc --aggressive --prune=now
$ git remote set-url origin https://github.com/myui/incubator-hivemall.git
$ git push -f -u origin master
$ git push origin --tags --force

$ git clone https://github.com/myui/incubator-hivemall.git
$ cd incubator-hivemall
$ git_find_big.sh | head -10

All sizes are in kB's. The pack column is the size of the object,
compressed, inside the pack file.
size  pack  SHA                                       location
1391  1383  b8d432e6a3c0074951abd35caf0a777caf47afbf  xgboost/lib/xgboost4j_0.60-0.10.jar
765   303   11c617713ee2ad3f847aee7627ee8639c5a79667  core/src/test/resources/hivemall/mf/ml1k.train
639   613   de4e32983604238bc72fe3f6cb6beea76fde0e8d  src/site/resources/images/hivemall_overview_bg.png
382   117   8b66187fe067c3aa389ce8c98108f349ceae159c  src/site/resources/fonts/fontawesome-webfont.svg
220   192   04d8605fd8daaafa72a2b6dfa2a2d48c75c57a10  src/site/resources/images/asf_bg.png
194   186   fb29a3d2ee04b7981463de89a77ccc7436f4ad9a  docs/gitbook/resources/images/techstack.png
191   76    e00b1127f6fb4fdcc1606a20b05e16b5456acacc  core/src/test/resources/hivemall/mf/ml1k.test
149   88    f221e50a2ef60738ba30932d834530cdfe55cb3e  src/site/resources/fonts/fontawesome-webfont.ttf

2016-11-30 14:31 GMT+09:00 Makoto Yui <yu...@gmail.com>:
> Hi Takeshi,
>
> I was about to perform the initial code dump, but stopped.
>
> Be aware that almost all commit hashes will change when rewriting the
> Git history with [1].
> [1] git filter-branch --index-filter \
>       'git rm -r --cached --ignore-unmatch lib/ target/*.jar' \
>       --prune-empty -- --all
>
> So, I'm considering making a shallow copy limited to a depth of 100-300
> or so (that does not include the large binaries).
>
> Thanks,
> Makoto

Re: Importing History or Not at Initial Code Dump

Posted by Makoto Yui <yu...@gmail.com>.
Hi Takeshi,

I was about to perform the initial code dump, but stopped.

Be aware that almost all commit hashes will change when rewriting the
Git history with [1].
[1] git filter-branch --index-filter \
      'git rm -r --cached --ignore-unmatch lib/ target/*.jar' \
      --prune-empty -- --all

So, I'm considering making a shallow copy limited to a depth of 100-300
or so (that does not include the large binaries).
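
A depth-limited copy of this kind can be made with the --depth option of
git clone. The sketch below is illustrative only: shallow_clone is a
hypothetical wrapper, and the destination directory name in the usage
comment is an assumption.

```shell
#!/bin/sh
# shallow_clone URL DEST DEPTH
# Clone only the most recent DEPTH commits of the remote's default branch.
shallow_clone() {
  git clone --quiet --depth "$3" "$1" "$2"
}

# Example (300 is one value from the 100-300 range mentioned above):
#   shallow_clone https://github.com/myui/hivemall.git hivemall-shallow 300
#   git -C hivemall-shallow rev-list --count HEAD
```

Commits older than the cut-off are simply absent from the clone, so any
large binaries that exist only in those older commits are not fetched.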

Thanks,
Makoto

2016-11-30 2:44 GMT+09:00 Takeshi Yamamuro <li...@gmail.com>:
> Hi, all
>
> I also have no strong opinion, though it seems better to keep as
> much activity (that is, commit logs) as possible there.
> I'm afraid that a short activity log might make newcomers think that
> Hivemall is inactive.
>
> As for the rebasing, it's not tough to rebase #285 (this is my own PR).
> So, rewriting the logs sounds good to me.
>
> // maropu

Re: Importing History or Not at Initial Code Dump

Posted by Takeshi Yamamuro <li...@gmail.com>.
Hi, all

I also have no strong opinion, though it seems better to keep as
much activity (that is, commit logs) as possible there.
I'm afraid that a short activity log might make newcomers think that
Hivemall is inactive.

As for the rebasing, it's not tough to rebase #285 (this is my own PR).
So, rewriting the logs sounds good to me.

// maropu

On Tue, Nov 29, 2016 at 11:24 PM, Makoto Yui <yu...@gmail.com> wrote:

> Kai,
>
> 2016-11-29 22:35 GMT+09:00 Kai Sasaki <sa...@treasure-data.com>:
> > Currently we have 6 PRs and some of them (especially #285, #336 and #385)
> > are relatively large.
> > It might cause somewhat troublesome rebasing.
>
> Yes, that's my concern.
>
> But such large PRs would be better contributed under the Apache
> Incubation process.
> I'm considering inviting some of their authors to become Hivemall
> committers.
>
> Another concern is moving GitHub stars/watchers, as seen in [1].
> [1] https://issues.apache.org/jira/browse/INFRA-12995
>
> > Do you think some of them are not ready to be merged? I think merging
> > some of them before reflogging history can make migrating work easy.
> > But if they are not ready, it's okay. We can work on rebasing after
> > this work.
>
> I'm currently reviewing #385, but it needs to be revised in several parts.
> Also, #336 requires large refactoring.
>
> So, it's better to do the initial code dump first.
>
> A shallow-copied repository can be pushed with git v1.9 and later
> (I'm not sure about the ASF git version, though).
> http://blogs.atlassian.com/2014/05/handle-big-repositories-git/
>
> Thanks,
> Makoto
>



-- 
---
Takeshi Yamamuro

Re: Importing History or Not at Initial Code Dump

Posted by Makoto Yui <yu...@gmail.com>.
Kai,

2016-11-29 22:35 GMT+09:00 Kai Sasaki <sa...@treasure-data.com>:
> Currently we have 6 PRs and some of them (especially #285, #336 and #385)
> are relatively large.
> It might cause somewhat troublesome rebasing.

Yes, that's my concern.

But such large PRs would be better contributed under the Apache
Incubation process.
I'm considering inviting some of their authors to become Hivemall
committers.

Another concern is moving GitHub stars/watchers, as seen in [1].
[1] https://issues.apache.org/jira/browse/INFRA-12995

> Do you think some of them are not ready to be merged? I think merging some
> of them before reflogging history
> can make migrating work easy. But if they are not ready, it's okay. We can
> work on rebasing after this work.

I'm currently reviewing #385, but it needs to be revised in several parts.
Also, #336 requires large refactoring.

So, it's better to do the initial code dump first.

A shallow-copied repository can be pushed with git v1.9 and later
(I'm not sure about the ASF git version, though).
http://blogs.atlassian.com/2014/05/handle-big-repositories-git/

Thanks,
Makoto

Re: Importing History or Not at Initial Code Dump

Posted by Kai Sasaki <sa...@treasure-data.com>.
Thanks

I have no preference on whether to import history, because both options
have reasonable pros/cons.
So I agree with you. Shallow copying and reflogging sound good.

https://github.com/myui/hivemall/pulls

Currently we have 6 PRs and some of them (especially #285, #336 and #385)
are relatively large.
This might make rebasing somewhat troublesome.

Do you think some of them are not ready to be merged? I think merging some
of them before reflogging history
can make migrating work easy. But if they are not ready, it's okay. We can
work on rebasing after this work.
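
For reference, rebasing a pull-request branch after the history has been
rewritten can be sketched with git rebase --onto. This is an
illustration under assumptions, not a procedure anyone in this thread
prescribed: rebase_pr is a hypothetical wrapper, and the branch and
remote names in the usage comment are placeholders.

```shell
#!/bin/sh
# rebase_pr NEW_BASE OLD_BASE BRANCH
# Replay the commits in OLD_BASE..BRANCH on top of NEW_BASE.
# OLD_BASE is the pre-rewrite commit the branch forked from;
# NEW_BASE is its counterpart in the rewritten history.
rebase_pr() {
  git rebase --quiet --onto "$1" "$2" "$3"
}

# Example (names are placeholders):
#   old_base=$(git merge-base old-master my-feature)
#   rebase_pr upstream/master "$old_base" my-feature
```

Only the commits unique to the branch are replayed, so the old history
with the large binaries is not dragged back in. Conflicts can still
occur and would have to be resolved by hand.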

Kai

On Tue, Nov 29, 2016 at 9:53 PM, Makoto Yui <my...@apache.org> wrote:

> Hi,
>
> When performing the initial code dump [1], the choice of whether to
> import history is left to the community.
> [1] http://incubator.apache.org/guides/mentor.html#initial-import-code-dump
>
> I'm considering importing from a depth-1 shallow copy of the master
> branch, because cloning the Hivemall repository takes a long time due
> to large binary files that were committed in the past.
>
> Thoughts? > Takeshi, Kai