Posted to server-dev@james.apache.org by Mihai Soloi <mi...@gmail.com> on 2012/07/06 19:49:58 UTC

Hbase Lucene Index(HBaluin) for James Mailbox Feedback

Hello everybody,

This is just a report on what I have been up to lately.

I tried to get the tests in Lucene's trunk working with the
HBaseDirectory implementation. There are 1700+ tests and a lot of them
are failing at the moment. I had a lot of trouble figuring out how to
run them and how to include my implementation in them. Now you can easily
run them by following the README.txt file in the project. Running their
tests has helped me track down issues with my code.

Another project I've spent some time on is HBase, trying to get the
HBASE-3529 patch working. I had to rewrite some of the code, but their
HDFSDirectoryTest is now passing, and I am currently working on getting
the CoprocessorsTests to pass as well. I will upload the code to my GitHub.

Have a nice weekend,
Mihai

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org


Re: Hbase Lucene Index(HBaluin) for James Mailbox Feedback

Posted by Eric Charles <er...@apache.org>.
On 07/17/2012 11:43 PM, Mihai Soloi wrote:
> On 17.07.2012 13:45, Eric Charles wrote:
>> Hi Mihai,
>>
>> Not sure, but I think the tests loop forever because ZooKeeper can't
>> be reached (will check it).
>
> I will check it too.

Looking at your code, I just realized that the tests assume an external
HBase cluster is running. I had assumed the tests were starting a
MiniHBaseCluster (I think you tried that, judging by the commented-out
code). Don't hesitate to ping here for any help on this :)

>>
>> You mention the Lucene ant testing setup. Do you have a link on
>> how to achieve this? I know the classical unit tests, but I don't see
>> how you can run those tests against an arbitrary Lucene implementation.
>>
> It's all described in the README.txt of the project. Does it not work?

Oh, I have only now seen your README. It is very well described and it
sounds logical. Will try it. I'm a big fan of good READMEs, so shame on
me for not having looked at it :).

>> ...and yes, there are definitely a bunch of good ideas and thoughts to
>> take from the previous discussions around search and HBase :)
>
> I will focus on getting Jason's much better version working and
> seeing whether it can be applied to the mailbox case.
>

I think Jason's approach was a bit different from the JSON approach you 
have taken based on the InfoQ articles.

Just my 2 cents, but take care to define a project scope and focus on
it. I could throw a few other implementation attempts at you, such as
http://bizosyshsearch.sourceforge.net/, and you would have work
for the next year(s).

For now, as Ioan asked, I think the existing code must be polished (at
least at the unit-test level).

Thx again, Eric


-- 
eric | http://about.echarles.net | @echarles




Re: Hbase Lucene Index(HBaluin) for James Mailbox Feedback

Posted by Mihai Soloi <mi...@gmail.com>.
On 17.07.2012 13:45, Eric Charles wrote:
> Hi Mihai,
>
> Not sure, but I think the tests loop forever because ZooKeeper can't
> be reached (will check it).

I will check it too.
>
> You mention the Lucene ant testing setup. Do you have a link on
> how to achieve this? I know the classical unit tests, but I don't see
> how you can run those tests against an arbitrary Lucene implementation.
>
It's all described in the README.txt of the project. Does it not work?
> ...and yes, there are definitely a bunch of good ideas and thoughts to
> take from the previous discussions around search and HBase :)

I will focus on getting Jason's much better version working and
seeing whether it can be applied to the mailbox case.

Thank you,
Mihai





Re: Hbase Lucene Index(HBaluin) for James Mailbox Feedback

Posted by Eric Charles <er...@apache.org>.
Hi Mihai,

Not sure, but I think the tests loop forever because ZooKeeper can't
be reached (will check it).
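
If an unreachable ZooKeeper is indeed the cause, a test setup could fail
fast instead of hanging. The following is only a minimal sketch using the
plain JDK, not code from the HBaluin project: the class name
ZooKeeperProbe and its methods are hypothetical, and host and port are
assumptions (2181 is ZooKeeper's default client port). It only checks that
a TCP connection can be opened, not that ZooKeeper is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: bounded reachability check for a host:port pair. */
final class ZooKeeperProbe {

    /** Returns true if a TCP connection succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable host: treat as unreachable.
            return false;
        }
    }

    /** Polls until reachable or until maxWaitMillis has elapsed; never waits forever. */
    static boolean waitFor(String host, int port, int timeoutMillis, long maxWaitMillis) {
        long deadline = System.currentTimeMillis() + maxWaitMillis;
        do {
            if (isReachable(host, port, timeoutMillis)) {
                return true;
            }
            try {
                Thread.sleep(100); // short pause between attempts
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        } while (System.currentTimeMillis() < deadline);
        return false;
    }
}
```

A test fixture could call ZooKeeperProbe.waitFor("localhost", 2181, 1000,
10000) in its setup and abort with a clear message when it returns false.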

You mention the Lucene ant testing setup. Do you have a link on how
to achieve this? I know the classical unit tests, but I don't see how
you can run those tests against an arbitrary Lucene implementation.

...and yes, there are definitely a bunch of good ideas and thoughts to
take from the previous discussions around search and HBase :)

Thx, Eric



-- 
eric | http://about.echarles.net | @echarles




Re: Hbase Lucene Index(HBaluin) for James Mailbox Feedback

Posted by Mihai Soloi <mi...@gmail.com>.
Hi Eric,

Thank you for reviewing my code; please see my responses inline. This past
week I've been working on improving my knowledge of Hadoop, HBase and
Lucene in the GitHub project HadoopTDG.

On Mon, Jul 16, 2012 at 6:03 PM, Eric Charles <er...@apache.org> wrote:

> Hi Mihai,
>
> Thx for the update.
> I have cloned your repo and run some tests (Ubuntu, JDK 7). I stopped
> the tests because they were taking too long (a recurring issue with
> Hadoop/HBase clusters; maybe an area to improve sooner rather than
> later, otherwise the tests will become unusable).
>
> See other comments inline,
>
> Eric
>
>
> On 07/06/2012 07:49 PM, Mihai Soloi wrote:
>
>> Hello everybody,
>>
>> This is just a report on what I have been up to lately.
>>
>> I tried to get the tests in Lucene's trunk working with the
>> HBaseDirectory implementation. There are 1700+ tests and a lot of them
>> are failing at the moment. I had a lot of trouble figuring out how to
>> run them and how to include my implementation in them. Now you can easily
>> run them by following the README.txt file in the project. Running their
>> tests has helped me track down issues with my code.
>>
>>
> You mean you ran the Lucene source tests with the HBaseDirectory impl?
> How did you do it? (just curious)
>

Yes, I just packaged the HBaluin project with all its dependencies and ran
it using the Lucene ant testing setup.
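
For readers wondering how a test suite can be pointed at a different
implementation at all, one common pattern is to choose the implementation
class by name at run time. The sketch below is only an illustration of
that pattern with the plain JDK; it is not Lucene's test harness and not
what the README describes. FileStore, InMemoryStore, StoreFactory and the
property name example.store.impl are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a storage abstraction such as a Lucene Directory. */
interface FileStore {
    void write(String name, byte[] data);
    byte[] read(String name);
}

/** Default implementation that keeps files in memory. */
class InMemoryStore implements FileStore {
    private final Map<String, byte[]> files = new HashMap<String, byte[]>();

    public void write(String name, byte[] data) {
        files.put(name, data.clone());
    }

    public byte[] read(String name) {
        byte[] data = files.get(name);
        return data == null ? null : data.clone();
    }
}

/** Creates the store named by a system property, so tests need no code changes. */
final class StoreFactory {
    static final String PROPERTY = "example.store.impl"; // hypothetical property name

    static FileStore newStore() {
        String className = System.getProperty(PROPERTY, InMemoryStore.class.getName());
        try {
            // The chosen class must have a no-argument constructor.
            return (FileStore) Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }
}
```

With such a factory, the same tests run against another implementation by
passing -Dexample.store.impl=<class name> to the JVM, provided the class
is on the classpath.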

>
> Btw, there is also a perf test project in Lucene that could be used to
> assess the HBaluin performance.


Thank you, I am aware of that, but the benchmarks don't start unless all
the tests pass, so it's still a work in progress.

>
>
>> Another project I've spent some time on is HBase, trying to get the
>> HBASE-3529 patch working. I had to rewrite some of the code, but their
>> HDFSDirectoryTest is now passing, and I am currently working on getting
>> the CoprocessorsTests to pass as well. I will upload the code to my GitHub.
>>
>>
> Is this in the apache-extra git repo?
>

No, no, I am using my GitHub repo for that. I've been in contact with
Jason Rutherglen, asking for more details about his implementation of
HBASE-3529 and of the modified HDFSDirectory, and he replied detailing
his problems with the project. There is also a detailed copy of his
discussion with Ted Dunning on the implementation of the project in [1].

[1]
http://mail-archives.apache.org/mod_mbox/hbase-user/201102.mbox/%3CAANLkTinZwuyBea=bM2EP0A2hEBDBuRCwutowmOqWWZ5D@mail.gmail.com%3E

Re: Hbase Lucene Index(HBaluin) for James Mailbox Feedback

Posted by Eric Charles <er...@apache.org>.
Hi Mihai,

Thx for the update.
I have cloned your repo and run some tests (Ubuntu, JDK 7). I stopped
the tests because they were taking too long (a recurring issue with
Hadoop/HBase clusters; maybe an area to improve sooner rather than
later, otherwise the tests will become unusable).

See other comments inline,

Eric

On 07/06/2012 07:49 PM, Mihai Soloi wrote:
> Hello everybody,
>
> This is just a report on what I have been up to lately.
>
> I tried to get the tests in Lucene's trunk working with the
> HBaseDirectory implementation. There are 1700+ tests and a lot of them
> are failing at the moment. I had a lot of trouble figuring out how to
> run them and how to include my implementation in them. Now you can easily
> run them by following the README.txt file in the project. Running their
> tests has helped me track down issues with my code.
>

You mean you ran the Lucene source tests with the HBaseDirectory impl?
How did you do it? (just curious)

Btw, there is also a perf test project in Lucene that could be used to
assess the HBaluin performance.

> Another project I've spent some time on is HBase, trying to get the
> HBASE-3529 patch working. I had to rewrite some of the code, but their
> HDFSDirectoryTest is now passing, and I am currently working on getting
> the CoprocessorsTests to pass as well. I will upload the code to my GitHub.
>

Is this in the apache-extra git repo?

> Have a nice weekend,
> Mihai

-- 
eric | http://about.echarles.net | @echarles
