Posted to user@hbase.apache.org by "shubhendu.singh" <sh...@bizosys.com> on 2014/04/16 12:33:12 UTC

Releasing HSearch 1.0 - Search and Analytics Engine on hadoop/hbase

Since the first release of *HSearch* (announced on this list) in December
2010, we have been improving it based on customer feedback from production
deployments.
Recently, we added the ability to store and analyze structured data in
addition to unstructured data.
With these changes, we are now releasing a new version, named 1.0.

The software is available on GitHub at
https://github.com/bizosys?tab=repositories

Some key features of HSearch:
- Fast queries on large datasets: queries typically return in milliseconds
  on terabytes of data.
- Multiple data structures are used for storing the data, depending on the
  nature of the data.
- LRU cache layer for frequently accessed data.
- 5MB index cells to co-locate data by business entities, and secondary
  rollup indexes for fast filtering on large datasets.
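
For readers unfamiliar with the LRU idea mentioned in the feature list, here
is a minimal, generic sketch in Java built on java.util.LinkedHashMap. It is
only an illustration of least-recently-used eviction and is not taken from
the HSearch code base; the class name LruCacheSketch and the row keys are
made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a generic LRU cache, not HSearch's actual cache layer.
class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCacheSketch(int capacity) {
        // accessOrder = true: iteration runs from least to most recently accessed.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put; returning true evicts the least recently used entry.
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCacheSketch<String, String> cache = new LruCacheSketch<>(2);
        cache.put("row-a", "cells-a");
        cache.put("row-b", "cells-b");
        cache.get("row-a");            // touch row-a so row-b becomes the eldest
        cache.put("row-c", "cells-c"); // exceeds capacity, evicts row-b
        System.out.println(cache.keySet()); // prints [row-a, row-c]
    }
}
```

Frequently read entries stay in memory while entries that have not been
touched recently are dropped first once the capacity is reached.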

Check out HSearch at http://hadoopsearch.net and do download and try it out.
Send your feedback and questions to hsearch@bizosys.com
 
Regards 
Shubhendu Shekhar Singh 
HSearch Committer 




--
View this message in context: http://apache-hbase.679495.n3.nabble.com/Releasing-HSearch-1-0-Search-and-Analytics-Engine-on-hadoop-hbase-tp4058295.html
Sent from the HBase User mailing list archive at Nabble.com.

Re: Releasing HSearch 1.0 - Search and Analytics Engine on hadoop/hbase

Posted by "shubhendu.singh" <sh...@bizosys.com>.
Hi Sai Pavan Gadde,

The download link and getting-started guide are here:
http://www.hadoopsearch.net/hsearch.html

Regards,
Shubhendu






Re: Releasing HSearch 1.0 - Search and Analytics Engine on hadoop/hbase

Posted by Sai Pavan Gadde <id...@gmail.com>.
Hi Shubhendu Shekhar Singh,

Could you please provide the exact download link for HSearch? I would like
to use and explore it.






-- 
*THANKS & REGARDS,*
G.SAI PAVAN,
CCDH4 CERTIFIED,
Ph: 8121914494,
*www.bigdatatrendz.com <http://www.bigdatatrendz.com>*
linkedin profile <http://in.linkedin.com/pub/gadde-sai-pavan/38/44b/453/>
HYDERABAD.

Re: Releasing HSearch 1.0 - Search and Analytics Engine on hadoop/hbase

Posted by Ted Yu <yu...@gmail.com>.
There is overlap, in terms of hbase version, in the hadooplib_XX list.

If possible, it would be better to separate hadooplib_ into two categories
of modules: one for hadoop and one for hbase.
This way, users can combine one module from each category to fit the actual
distro they're using.

Cheers


On Fri, Apr 18, 2014 at 6:59 AM, shubhendu.singh <
shubhendu.singh@bizosys.com> wrote:

> Yes, you are correct.
>
> hadooplib_12 is based on the HDP 1.3 distribution (hadoop-1.2.0 and
> hbase-0.94.6.1)
> hadooplib_94 is based on the CDH 4.5 distribution (hadoop-2.0.0 and
> hbase-0.94.6)
> hadooplib_96 is based on the HDP 2.0.6 distribution (hadoop-2.2.0 and
> hbase-0.96.0)
>
> Thanks,
> Shubhendu

Re: Releasing HSearch 1.0 - Search and Analytics Engine on hadoop/hbase

Posted by "shubhendu.singh" <sh...@bizosys.com>.
Yes, you are correct.

hadooplib_12 is based on the HDP 1.3 distribution (hadoop-1.2.0 and
hbase-0.94.6.1)
hadooplib_94 is based on the CDH 4.5 distribution (hadoop-2.0.0 and
hbase-0.94.6)
hadooplib_96 is based on the HDP 2.0.6 distribution (hadoop-2.2.0 and
hbase-0.96.0)
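
To make the mapping concrete, here is a small Java sketch that looks up a
compatibility module from the three combinations listed above. The class
CompatModuleSketch and the pick() helper are hypothetical and not part of
HSearch, and exact version matching is an assumption made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative lookup over the three compatibility modules listed in this thread.
class CompatModuleSketch {
    // module name -> {distribution, hadoop version, hbase version}, as listed above
    static final Map<String, String[]> MODULES = new LinkedHashMap<>();
    static {
        MODULES.put("hadooplib_12", new String[] {"HDP 1.3", "1.2.0", "0.94.6.1"});
        MODULES.put("hadooplib_94", new String[] {"CDH 4.5", "2.0.0", "0.94.6"});
        MODULES.put("hadooplib_96", new String[] {"HDP 2.0.6", "2.2.0", "0.96.0"});
    }

    // Hypothetical helper: first module whose hadoop and hbase versions both match exactly.
    static Optional<String> pick(String hadoopVersion, String hbaseVersion) {
        for (Map.Entry<String, String[]> e : MODULES.entrySet()) {
            String[] v = e.getValue();
            if (v[1].equals(hadoopVersion) && v[2].equals(hbaseVersion)) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // A listed combination resolves to its module.
        System.out.println(pick("2.2.0", "0.96.0").orElse("no bundled module"));
        // A combination outside the list has no bundled module.
        System.out.println(pick("2.2.0", "0.94.6").orElse("no bundled module"));
    }
}
```

The second lookup shows the limitation of bundling one Hadoop version with
one HBase version per module: any other pairing has no match, which is the
case that separate Hadoop and HBase module categories would cover.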


Thanks,
Shubhendu





Re: Releasing HSearch 1.0 - Search and Analytics Engine on hadoop/hbase

Posted by Ted Yu <yu...@gmail.com>.
In several Filters under src/compatibility/hadooplib_12, I see the following
method (e.g. in
src/compatibility/hadooplib_12/storage/HSearchBytesFilter.java):

  @Override
  public final void filterRow(final List<KeyValue> kvL) {

I guess hadooplib_12 is used with HBase 0.94.

Cheers

On Thu, Apr 17, 2014 at 4:00 AM, shubhendu.singh <
shubhendu.singh@bizosys.com> wrote:

> A filter that uses filterRow() to filter out an entire row, or
> filterRow(List) to modify the final list of included values, must also
> override the hasFilterRow() method to return true.
>
> The framework uses this flag to ensure that a given filter is compatible
> with the selected scan parameters. In particular, these filter methods
> collide with the scanner’s batch mode: when the scanner uses batches to
> ship partial rows to the client, these methods are not called for every
> batch, but only at the actual end of the current row.
>
> In HBase 0.96 the filterRow(List) method is deprecated. Its javadoc warns:
> please do not override this method. Instead override filterRowCells(List).
> This is for transition from 0.94 -> 0.96.
>
> HSearchScalarFilter uses filterRowCells(List<Cell> cellL), which is why
> hasFilterRow() returns true.
>
> Thanks,
> Shubhendu

Re: Releasing HSearch 1.0 - Search and Analytics Engine on hadoop/hbase

Posted by "shubhendu.singh" <sh...@bizosys.com>.
As of now, we index data from flat files (CSV, TSV, etc.) loaded on Hadoop,
or from data loaded in an HBase table.

Thanks,
Shubhendu




Re: Releasing HSearch 1.0 - Search and Analytics Engine on hadoop/hbase

Posted by Flavio Pompermaier <po...@okkam.it>.
Is HSearch also able to index JSON fields? That would be awesome :)
On Apr 17, 2014 1:01 PM, "shubhendu.singh" <sh...@bizosys.com>
wrote:

> A filter that uses filterRow() to filter out an entire row, or
> filterRow(List) to modify the final list of included values, must also
> override the hasFilterRow() method to return true.
>
> The framework uses this flag to ensure that a given filter is compatible
> with the selected scan parameters. In particular, these filter methods
> collide with the scanner’s batch mode: when the scanner uses batches to
> ship partial rows to the client, these methods are not called for every
> batch, but only at the actual end of the current row.
>
> In HBase 0.96 the filterRow(List) method is deprecated. Its javadoc warns:
> please do not override this method. Instead override filterRowCells(List).
> This is for transition from 0.94 -> 0.96.
>
> HSearchScalarFilter uses filterRowCells(List<Cell> cellL), which is why
> hasFilterRow() returns true.
>
> Thanks,
> Shubhendu

Re: Releasing HSearch 1.0 - Search and Analytics Engine on hadoop/hbase

Posted by "shubhendu.singh" <sh...@bizosys.com>.
A filter that uses filterRow() to filter out an entire row, or
filterRow(List) to modify the final list of included values, must also
override the hasFilterRow() method to return true.

The framework uses this flag to ensure that a given filter is compatible
with the selected scan parameters. In particular, these filter methods
collide with the scanner’s batch mode: when the scanner uses batches to
ship partial rows to the client, these methods are not called for every
batch, but only at the actual end of the current row.

In HBase 0.96 the filterRow(List) method is deprecated. Its javadoc warns:
please do not override this method. Instead override filterRowCells(List).
This is for transition from 0.94 -> 0.96.

HSearchScalarFilter uses filterRowCells(List<Cell> cellL), which is why
hasFilterRow() returns true.
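
The contract described above can be sketched with a few stand-in classes.
This is a simplified model and not the HBase API: RowFilterSketch,
DropEmptyCells and ScanSketch are hypothetical names, cells are plain
strings, and the batch check only loosely mimics what the framework does
when a row-level filter is combined with batching.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins, NOT the HBase classes: they only model the contract above.
abstract class RowFilterSketch {
    // Row-level hook, called once with all cells collected for the current row.
    void filterRowCells(List<String> cells) { }

    // Must return true when the filter relies on a row-level hook.
    boolean hasFilterRow() { return false; }
}

// Hypothetical filter that drops empty cells at the end of each row.
class DropEmptyCells extends RowFilterSketch {
    @Override
    void filterRowCells(List<String> cells) { cells.removeIf(String::isEmpty); }

    @Override
    boolean hasFilterRow() { return true; }
}

class ScanSketch {
    // Batching ships partial rows, so a filter that needs the whole row
    // cannot be combined with it; the flag lets the scan reject that case.
    static List<String> scanRow(List<String> row, RowFilterSketch filter, int batch) {
        if (batch > 0 && filter.hasFilterRow()) {
            throw new IllegalStateException("row-level filter cannot be used with batching");
        }
        List<String> cells = new ArrayList<>(row);
        filter.filterRowCells(cells);
        return cells;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("a", "", "b");
        System.out.println(scanRow(row, new DropEmptyCells(), 0)); // prints [a, b]
        try {
            scanRow(row, new DropEmptyCells(), 2);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

If DropEmptyCells left hasFilterRow() at its default of false, the batched
scan would be accepted and the row-level hook would see incomplete rows,
which is why the flag has to be overridden together with the hook.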

Thanks,
Shubhendu





Re: Releasing HSearch 1.0 - Search and Analytics Engine on hadoop/hbase

Posted by Ted Yu <yu...@gmail.com>.
Looking at some of the Filter classes,
e.g. src/compatibility/hadooplib_96/storage/HSearchScalarFilter.java, I saw:
  public final boolean hasFilterRow() {
    return true;
  }
However, there is no filterRow() defined in the Filter.

See javadoc for hasFilterRow():

  /**
   * Primarily used to check for conflicts with scans(such as scans that do not read a full row at a
   * time).
   *
   * @return True if this filter actively uses filterRow(List) or filterRow().
   */
  abstract public boolean hasFilterRow();


Is there something I missed?


On Wed, Apr 16, 2014 at 8:32 AM, Ted Yu <yu...@gmail.com> wrote:

> I looked at hsearch-core where I found:
>
> $ ls src/compatibility/hadooplib_
> hadooplib_12/ hadooplib_94/ hadooplib_96/
>
> From a brief look, hadooplib_94/hbase provides a wrapper for HBase 0.94
> and hadooplib_96/hbase provides a wrapper for HBase 0.96.
> Similarly, I found some Coprocessors under
> src/compatibility/hadooplib_94/storage/
> and src/compatibility/hadooplib_96/storage/.
>
> It seems hbaselib_94 and hbaselib_96 would be names more reflective of
> what the classes do.
>
> Cheers

Re: Releasing HSearch 1.0 - Search and Analytics Engine on hadoop/hbase

Posted by Ted Yu <yu...@gmail.com>.
I looked at hsearch-core where I found:

$ ls src/compatibility/hadooplib_
hadooplib_12/ hadooplib_94/ hadooplib_96/

From a brief look, hadooplib_94/hbase provides a wrapper for HBase 0.94 and
hadooplib_96/hbase provides a wrapper for HBase 0.96.
Similarly, I found some Coprocessors under
src/compatibility/hadooplib_94/storage/
and src/compatibility/hadooplib_96/storage/.

It seems hbaselib_94 and hbaselib_96 would be names more reflective of what
the classes do.

Cheers

