Posted to user@hbase.apache.org by yo...@wipro.com on 2012/07/16 08:21:15 UTC

RE: DATA UPLOADTION

Hello Debarshi,

Please suggest which tool I should use for these operations over Hadoop DFS.

Regards
Yogesh Kumar

________________________________
From: Debarshi Basak [debarshi.basak@tcs.com]
Sent: Monday, July 16, 2012 11:25 AM
To: user@hive.apache.org
Cc: user@hive.apache.org; user@hbase.apache.org
Subject: Re: DATA UPLOADTION

Hive is not the right way to go about it if you are planning to do search-style operations.


Debarshi Basak
Tata Consultancy Services
Mailto: debarshi.basak@tcs.com
Website: http://www.tcs.com
____________________________________________
Experience certainty. IT Services
Business Solutions
Outsourcing
____________________________________________

----- wrote: -----
To: <us...@hive.apache.org>
From: <yo...@wipro.com>
Date: 07/16/2012 09:11AM
cc: <us...@hbase.apache.org>
Subject: DATA UPLOADTION

Hi all,

I have data consisting of flat files, log files, images and .xls files, totalling many GB.

I need to run operations such as searching and querying over that raw data, and to generate reports.
It is impossible to create tables manually for all of it. Is there another way, or a way to manage it using Hive or HBase?

Please suggest how I can perform these operations. I want to use Hadoop DFS, and the files have already been uploaded to HDFS (single user).


Thanks & Regards
Yogesh Kumar

Please do not print this email unless it is absolutely necessary.

The information contained in this electronic message and any attachments to this message are intended for the exclusive use of the addressee(s) and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments.

WARNING: Computer viruses can be transmitted via email. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email.

www.wipro.com

=====-----=====-----=====
Notice: The information contained in this e-mail
message and/or attachments to it may contain
confidential or privileged information. If you are
not the intended recipient, any dissemination, use,
review, distribution, printing or copying of the
information contained in this e-mail message
and/or attachments to it are strictly prohibited. If
you have received this communication in error,
please notify us by reply e-mail or telephone and
immediately and permanently delete the message
and any attachments. Thank you

Re: DATA UPLOADTION

Posted by "Gesli, Nicole" <Ni...@memorylane.com>.
For the Hive query approach, check the string functions (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringFunctions) or write your own (UDF), if needed. It depends on what you are trying to get. Example:

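-- LOCATE takes (substr, str); the third argument of SUBSTR is a length, not an end position.
SELECT TRIM(SUBSTR(data,
                   LOCATE(' this ', LOWER(data)),
                   LOCATE(' that ', LOWER(data)) - LOCATE(' this ', LOWER(data)) + 6)) AS my_string
FROM   log_table
WHERE  LOWER(data) LIKE '% this %and% that %'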
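
As a further sketch of a condition-based query over a single string column: the example below assumes the log_table(data STRING) table from this thread and a hypothetical line layout of "<date> <level> <message>", which is an assumption made for illustration and not something stated in the thread. It uses the built-in regexp_extract string function and RLIKE to count lines per log level. Backslashes are doubled because they sit inside Hive string literals.

```sql
-- Assumed (hypothetical) line layout: "<date> <level> <message>"
-- Group 2 of the pattern is the second whitespace-separated token (the level).
SELECT regexp_extract(data, '^(\\S+)\\s+(\\S+)', 2) AS log_level,
       COUNT(*)                                     AS line_count
FROM   log_table
WHERE  data RLIKE '^\\S+\\s+(ERROR|WARN)\\s'
GROUP  BY regexp_extract(data, '^(\\S+)\\s+(\\S+)', 2);
```

If the lines follow a different layout, only the two regular expressions would need to change.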


From: Bejoy KS <be...@yahoo.com>
Reply-To: "user@hive.apache.org" <us...@hive.apache.org>, "bejoy_ks@yahoo.com" <be...@yahoo.com>
Date: Monday, July 16, 2012 11:39 PM
To: "user@hive.apache.org" <us...@hive.apache.org>
Subject: Re: DATA UPLOADTION

Hi Yogesh

You can connect reporting tools such as Tableau, MicroStrategy, etc. directly to Hive.

If you are looking for static reports based on aggregate data, you can process the data in Hive, move the resulting data into an RDBMS, and use common reporting tools on top of that. I know quite a few projects that follow this model.
Regards
Bejoy KS

Sent from handheld, please excuse typos.
________________________________
From: <yo...@wipro.com>
Date: Tue, 17 Jul 2012 06:33:43 +0000
To: <us...@hive.apache.org>; <be...@yahoo.com>
ReplyTo: user@hive.apache.org
Subject: RE: DATA UPLOADTION

Thanks Gesli and Bejoy,

I have created tables in Hive and loaded data into them. I can run queries on them; please suggest how to generate reports from those tables.

Mr. Gesli,
If I create a table with a single string column, like ( create table Log_table( Data STRING); ), how can I perform condition-based queries over the data in Log_table?


Thanks & Regards :-)
Yogesh Kumar

________________________________
From: Gesli, Nicole [Nicole.Gesli@memorylane.com]
Sent: Monday, July 16, 2012 11:30 PM
To: user@hive.apache.org; bejoy_ks@yahoo.com
Cc: user@hbase.apache.org
Subject: Re: DATA UPLOADTION

If you are just trying to find certain text in the data files, want to run a bulk process to create reports once a day or so, and prefer to use Hive, you can create a table with a single string column. You need to either pre-process your data to replace the default column delimiter wherever it occurs, or define a column delimiter that your data does not contain. That ensures each entire line is assigned to the column rather than being cut off where the column delimiter appears. If your queries will differ for each file type (flat files, logs, xls,…), you can create a separate partition for each file type. Dump your files into the table (or table partition) folder(s), or create external table(s) if your data is already in HDFS. You can then do a "like" (faster) or "rlike" search on the table.

-Nicole

From: Bejoy KS <be...@yahoo.com>
Reply-To: "user@hive.apache.org" <us...@hive.apache.org>, "bejoy_ks@yahoo.com" <be...@yahoo.com>
Date: Monday, July 16, 2012 12:50 AM
To: "user@hive.apache.org" <us...@hive.apache.org>
Cc: "user@hbase.apache.org" <us...@hbase.apache.org>
Subject: Re: DATA UPLOADTION

Hi Yogesh

If you are looking for indexing and search-style operations, you can take a look at Lucene.

Whether you use Hive or HBase, you cannot do any operation without a table structure defined for the data. So you need to create tables for each dataset; only then can you issue queries and generate reports on that data.
Regards
Bejoy KS

Sent from handheld, please excuse typos.

Re: DATA UPLOADTION

Posted by Bejoy KS <be...@yahoo.com>.
Hi Yogesh

You can connect reporting tools such as Tableau, MicroStrategy, etc. directly to Hive.

If you are looking for static reports based on aggregate data, you can process the data in Hive, move the resulting data into an RDBMS, and use common reporting tools on top of that. I know quite a few projects that follow this model.

Regards
Bejoy KS

Sent from handheld, please excuse typos.
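
A minimal sketch of the "process in Hive, then move the result into an RDBMS" model described above. It assumes the log_table(data STRING) table mentioned elsewhere in this thread; the output path and the 'error' keyword are hypothetical placeholders. INSERT OVERWRITE DIRECTORY writes the query result as files in HDFS, which is one possible hand-off point for a later load into a relational database.

```sql
-- Hypothetical output path; log_table(data STRING) is the single-column table
-- mentioned elsewhere in this thread.
-- Writes the aggregate as files under the given HDFS directory.
INSERT OVERWRITE DIRECTORY '/reports/error_line_count'
SELECT COUNT(*) AS matching_lines
FROM   log_table
WHERE  LOWER(data) LIKE '%error%';
```

The files under that directory could then be loaded into an RDBMS table, for example with a tool such as Sqoop, and a reporting tool pointed at that table.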

RE: DATA UPLOADTION

Posted by yo...@wipro.com.
Thanks Gesli and Bejoy,

I have created tables in Hive and loaded data into them. I can run queries on them; please suggest how to generate reports from those tables.

Mr. Gesli,
If I create a table with a single string column, like ( create table Log_table( Data STRING); ), how can I perform condition-based queries over the data in Log_table?


Thanks & Regards :-)
Yogesh Kumar

Re: DATA UPLOADTION

Posted by "Gesli, Nicole" <Ni...@memorylane.com>.
If you are just trying to find certain text in the data files, want to run a bulk process to create reports once a day or so, and prefer to use Hive, you can create a table with a single string column. You need to either pre-process your data to replace the default column delimiter wherever it occurs, or define a column delimiter that your data does not contain. That ensures each entire line is assigned to the column rather than being cut off where the column delimiter appears. If your queries will differ for each file type (flat files, logs, xls,…), you can create a separate partition for each file type. Dump your files into the table (or table partition) folder(s), or create external table(s) if your data is already in HDFS. You can then do a "like" (faster) or "rlike" search on the table.

-Nicole
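
The approach described above can be sketched in HiveQL roughly as follows. This is only an illustration under stated assumptions: the table name raw_lines, the partition column file_type, and the /data/raw paths are hypothetical, and the data is assumed never to contain the Ctrl-A character that Hive uses as its default field delimiter.

```sql
-- Hypothetical names: raw_lines, file_type and the /data/raw paths are
-- placeholders for illustration only.
-- One STRING column holds each whole line. '\001' (Ctrl-A) is Hive's default
-- field delimiter and is assumed here never to occur in the data.
CREATE EXTERNAL TABLE raw_lines (data STRING)
PARTITIONED BY (file_type STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/data/raw';

-- Attach a directory that already exists in HDFS as the 'log' partition.
ALTER TABLE raw_lines ADD PARTITION (file_type = 'log')
LOCATION '/data/raw/logs';

-- Substring search with LIKE.
SELECT data
FROM   raw_lines
WHERE  file_type = 'log' AND LOWER(data) LIKE '%error%';

-- Regular-expression search with RLIKE (Java regex syntax).
SELECT data
FROM   raw_lines
WHERE  file_type = 'log' AND data RLIKE 'ERROR|FATAL';
```

The sketch applies to line-oriented text; binary inputs such as images or .xls files would presumably need converting to text before this kind of search is meaningful.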

Re: DATA UPLOADTION

Posted by "Gesli, Nicole" <Ni...@memorylane.com>.
If you are just trying to find certain text in the data files and you just want to do bulk process to create reports once a day or so, and prefer to use Hive: you can create a table with with single string column. You need to pre-process your data to replace the default column delimiter in your data. Or, you can define a column delimiter that your data does not have. That is to make sure that entire line data is assigned to the column but not cut in where the column delimiter is. If your query will be different for each file type (flat files, logs, xls,…) you can create different partitions for each file type. Dump your files into the table (or table partition) folder(s). Or you can create external table(s) if your data is already in HDFS. You can than do "like" (faster) or "rlike" search on the table.

-Nicole


Re: DATA UPLOADTION

Posted by Bejoy KS <be...@yahoo.com>.
Hi Yogesh

If you are looking for indexing and search operations, you can take a look at Lucene.

Whether you are using Hive or HBase, you cannot do any operation without having a table structure defined for the data. So you need to create tables for each dataset; only then can you go ahead and issue queries and generate reports on that data.
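
To give a feel for what "indexing and search" means here, below is a toy inverted index in Python. It is not Lucene's API and leaves out everything a real search library adds (analyzers, ranking, on-disk segments); the file names and contents are invented for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """docs maps a document id to its text; returns term -> set of ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    docs = {
        "app.log": "ERROR disk full on node1",
        "report.txt": "Monthly disk usage report",
    }
    idx = build_index(docs)
    print(sorted(search(idx, "disk")))
    print(sorted(search(idx, "disk error")))
```

The index is built once and each query then becomes a few set lookups, which is why this style suits repeated searches better than rescanning the raw files every time.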

Regards
Bejoy KS

Sent from handheld, please excuse typos.


