Posted to user@hbase.apache.org by Panshul Whisper <ou...@gmail.com> on 2013/02/07 12:22:01 UTC

MapReduce to load data in HBase

Hello,

I am trying to write MapReduce jobs to read data from JSON files and load
it into HBase tables.
Please suggest an efficient way to do it. I am trying to do it using the
Spring Data HBase template to make it thread-safe and enable table locking.

I use the Map methods to read and parse the JSON files. I use the Reduce
methods to call the HBase template and store the data into the HBase tables.
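
Roughly, this is the shape I have in mind (a sketch only; class and helper
names are illustrative, and the reduce body is where I would call the HBase
template):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class JsonLoadJob {

  // Map side: one JSON document per input line; parse it and emit
  // (rowKey, rawJson) so all values for a row meet in one reduce call.
  public static class JsonMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String rowKey = deriveRowKey(line.toString());
      ctx.write(new Text(rowKey), line);
    }
  }

  // Reduce side: store the extracted values into the HBase table.
  public static class HBaseReducer
      extends Reducer<Text, Text, NullWritable, NullWritable> {
    @Override
    protected void reduce(Text rowKey, Iterable<Text> jsons, Context ctx)
        throws IOException, InterruptedException {
      for (Text json : jsons) {
        // the thread-safe HBase template call would go here
      }
    }
  }

  // Placeholder: a real key would be derived from fields inside the JSON.
  private static String deriveRowKey(String json) {
    return Integer.toHexString(json.hashCode());
  }
}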

My questions:
1. Is this the right approach, or should I do all of the above in the Map
method?
2. How can I pass the Java object I create holding the data read from the
JSON file, which needs to be saved to the HBase table, to the Reduce method?
I can only pass the built-in data types to the reduce method from my mapper.
3. I thought of using the distributed cache for the above problem, to store
the object in the cache and pass only the key to the reduce method. But how
do I generate a unique key for each of the objects I store in the distributed
cache?

Please help me with the above. Please tell me if I am missing some detail
or overlooking something important.

Thanking You,


-- 
Regards,
Ouch Whisper
010101010101

Re: MapReduce to load data in HBase

Posted by Mohammad Tariq <do...@gmail.com>.
One correction: if your datatype is going to be used only as a value, you
don't actually need it to be comparable. But if you need it to be a key as
well, then it must be both Writable and Comparable.
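
For instance, a value-only type just implements Writable; a sketch, with an
illustrative payload field:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Value-only datatype: Writable is enough. A key type would additionally
// implement WritableComparable and define compareTo().
public class RecordWritable implements Writable {

  private String json; // illustrative payload

  public RecordWritable() { }                      // required no-arg constructor
  public RecordWritable(String json) { this.json = json; }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeUTF(json); // writeUTF caps at 64 KB; use Text for bigger payloads
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    json = in.readUTF();
  }

  public String getJson() { return json; }
}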

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Thu, Feb 7, 2013 at 4:58 PM, Mohammad Tariq <do...@gmail.com> wrote:

> Hello Panshul,
>
>     My answers:
> 1- You can serialize the entire JSON into a byte[] and store it in a
> cell. (Is it important for you to extract individual values from your JSON
> and then put them into the table?)
> 2- You can write your own datatype to pass your object to the reducer,
> but it must be Writable+Comparable. Alternatively you can use Avro.
> 3- For generating unique keys, you can use MR counters.
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>
>
> On Thu, Feb 7, 2013 at 4:52 PM, Panshul Whisper <ou...@gmail.com> wrote:
>
>> Hello,
>>
>> I am trying to write MapReduce jobs to read data from JSON files and load
>> it into HBase tables.
>> Please suggest an efficient way to do it. I am trying to do it using the
>> Spring Data HBase template to make it thread-safe and enable table locking.
>>
>> I use the Map methods to read and parse the JSON files. I use the Reduce
>> methods to call the HBase template and store the data into the HBase tables.
>>
>> My questions:
>> 1. Is this the right approach, or should I do all of the above in the Map
>> method?
>> 2. How can I pass the Java object I create holding the data read from the
>> JSON file, which needs to be saved to the HBase table, to the Reduce method?
>> I can only pass the built-in data types to the reduce method from my mapper.
>> 3. I thought of using the distributed cache for the above problem, to
>> store the object in the cache and pass only the key to the reduce method.
>> But how do I generate a unique key for each of the objects I store in the
>> distributed cache?
>>
>> Please help me with the above. Please tell me if I am missing some detail
>> or overlooking something important.
>>
>> Thanking You,
>>
>>
>> --
>> Regards,
>> Ouch Whisper
>> 010101010101
>>
>
>

Re: MapReduce to load data in HBase

Posted by Mohammad Tariq <do...@gmail.com>.
You might find these links helpful:
http://stackoverflow.com/questions/10961474/how-in-hadoop-is-the-data-put-into-map-and-reduce-functions-in-correct-types/10965026#10965026
http://stackoverflow.com/questions/13877077/how-do-i-set-an-object-as-the-value-for-map-output-in-hadoop-mapreduce/13877688#13877688
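
The short version, assuming a custom value class like the RecordWritable
sketched earlier in this thread (a fragment only): declare the map output
types on the job, or the framework cannot deserialize them between the map
and reduce phases.

Job job = new Job(conf, "json-to-hbase");          // illustrative job name
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(RecordWritable.class);  // the custom Writable value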

HTH

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Thu, Feb 7, 2013 at 5:05 PM, Panshul Whisper <ou...@gmail.com> wrote:

> Hello,
>
> Thank you for the reply.
> 1. I cannot serialize the JSON and store it as a whole. I need to extract
> individual values and store them, as I will later need to query the stored
> values in various aggregation algorithms.
> 2. Can you please point me in a direction where I can find out how to write
> a data type that is Writable+Comparable? I will look into Avro, but I prefer
> to write my own data type.
> 3. I will look into MR counters.
>
> Regards,
>
>
> On Thu, Feb 7, 2013 at 12:28 PM, Mohammad Tariq <do...@gmail.com> wrote:
>
>> Hello Panshul,
>>
>>     My answers:
>> 1- You can serialize the entire JSON into a byte[] and store it in a
>> cell. (Is it important for you to extract individual values from your JSON
>> and then put them into the table?)
>> 2- You can write your own datatype to pass your object to the reducer,
>> but it must be Writable+Comparable. Alternatively you can use Avro.
>> 3- For generating unique keys, you can use MR counters.
>>
>> Warm Regards,
>> Tariq
>> https://mtariq.jux.com/
>> cloudfront.blogspot.com
>>
>>
>> On Thu, Feb 7, 2013 at 4:52 PM, Panshul Whisper <ou...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I am trying to write MapReduce jobs to read data from JSON files and
>>> load it into HBase tables.
>>> Please suggest an efficient way to do it. I am trying to do it using the
>>> Spring Data HBase template to make it thread-safe and enable table locking.
>>>
>>> I use the Map methods to read and parse the JSON files. I use the Reduce
>>> methods to call the HBase template and store the data into the HBase tables.
>>>
>>> My questions:
>>> 1. Is this the right approach, or should I do all of the above in the Map
>>> method?
>>> 2. How can I pass the Java object I create holding the data read from
>>> the JSON file, which needs to be saved to the HBase table, to the Reduce
>>> method? I can only pass the built-in data types to the reduce method from
>>> my mapper.
>>> 3. I thought of using the distributed cache for the above problem, to
>>> store the object in the cache and pass only the key to the reduce method.
>>> But how do I generate a unique key for each of the objects I store in the
>>> distributed cache?
>>>
>>> Please help me with the above. Please tell me if I am missing some
>>> detail or overlooking something important.
>>>
>>> Thanking You,
>>>
>>>
>>> --
>>> Regards,
>>> Ouch Whisper
>>> 010101010101
>>>
>>
>>
>
>
> --
> Regards,
> Ouch Whisper
> 010101010101
>

Re: MapReduce to load data in HBase

Posted by Panshul Whisper <ou...@gmail.com>.
Hello,

Thank you for the reply.
1. I cannot serialize the JSON and store it as a whole. I need to extract
individual values and store them, as I will later need to query the stored
values in various aggregation algorithms.
2. Can you please point me in a direction where I can find out how to write a
data type that is Writable+Comparable? I will look into Avro, but I prefer to
write my own data type.
3. I will look into MR counters.

Regards,


On Thu, Feb 7, 2013 at 12:28 PM, Mohammad Tariq <do...@gmail.com> wrote:

> Hello Panshul,
>
>     My answers:
> 1- You can serialize the entire JSON into a byte[] and store it in a
> cell. (Is it important for you to extract individual values from your JSON
> and then put them into the table?)
> 2- You can write your own datatype to pass your object to the reducer,
> but it must be Writable+Comparable. Alternatively you can use Avro.
> 3- For generating unique keys, you can use MR counters.
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>
>
> On Thu, Feb 7, 2013 at 4:52 PM, Panshul Whisper <ou...@gmail.com> wrote:
>
>> Hello,
>>
>> I am trying to write MapReduce jobs to read data from JSON files and load
>> it into HBase tables.
>> Please suggest an efficient way to do it. I am trying to do it using the
>> Spring Data HBase template to make it thread-safe and enable table locking.
>>
>> I use the Map methods to read and parse the JSON files. I use the Reduce
>> methods to call the HBase template and store the data into the HBase tables.
>>
>> My questions:
>> 1. Is this the right approach, or should I do all of the above in the Map
>> method?
>> 2. How can I pass the Java object I create holding the data read from the
>> JSON file, which needs to be saved to the HBase table, to the Reduce method?
>> I can only pass the built-in data types to the reduce method from my mapper.
>> 3. I thought of using the distributed cache for the above problem, to
>> store the object in the cache and pass only the key to the reduce method.
>> But how do I generate a unique key for each of the objects I store in the
>> distributed cache?
>>
>> Please help me with the above. Please tell me if I am missing some detail
>> or overlooking something important.
>>
>> Thanking You,
>>
>>
>> --
>> Regards,
>> Ouch Whisper
>> 010101010101
>>
>
>


-- 
Regards,
Ouch Whisper
010101010101

Re: MapReduce to load data in HBase

Posted by Mohammad Tariq <do...@gmail.com>.
Hello Panshul,

    My answers:
1- You can serialize the entire JSON into a byte[] and store it in a
cell. (Is it important for you to extract individual values from your JSON
and then put them into the table?)
2- You can write your own datatype to pass your object to the reducer, but
it must be Writable+Comparable. Alternatively you can use Avro.
3- For generating unique keys, you can use MR counters.
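
For (1) and (3) together, a rough sketch using the stock HBase MR integration
(TableReducer, 0.94-era API) instead of the Spring template; the column family
and qualifier here are made up. A counter is task-local while the job runs, so
the task id is prefixed to keep row keys unique across reduce tasks:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Counter;

// Store each JSON document whole in a single cell; generate row keys
// from the task id plus an MR counter.
public class JsonTableReducer
    extends TableReducer<Text, Text, ImmutableBytesWritable> {

  @Override
  protected void reduce(Text key, Iterable<Text> jsons, Context ctx)
      throws IOException, InterruptedException {
    Counter generated = ctx.getCounter("rowkeys", "generated");
    for (Text json : jsons) {
      // task id prefix keeps keys unique across reduce tasks
      byte[] row = Bytes.toBytes(
          ctx.getTaskAttemptID().getTaskID() + "-" + generated.getValue());
      generated.increment(1);
      Put put = new Put(row);
      // whole JSON serialized into one cell; "d:json" is illustrative
      put.add(Bytes.toBytes("d"), Bytes.toBytes("json"),
          Bytes.toBytes(json.toString()));
      ctx.write(new ImmutableBytesWritable(row), put);
    }
  }
}

You would wire it in with TableMapReduceUtil.initTableReducerJob("mytable",
JsonTableReducer.class, job); the table name is again made up.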

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Thu, Feb 7, 2013 at 4:52 PM, Panshul Whisper <ou...@gmail.com> wrote:

> Hello,
>
> I am trying to write MapReduce jobs to read data from JSON files and load
> it into HBase tables.
> Please suggest an efficient way to do it. I am trying to do it using the
> Spring Data HBase template to make it thread-safe and enable table locking.
>
> I use the Map methods to read and parse the JSON files. I use the Reduce
> methods to call the HBase template and store the data into the HBase tables.
>
> My questions:
> 1. Is this the right approach, or should I do all of the above in the Map
> method?
> 2. How can I pass the Java object I create holding the data read from the
> JSON file, which needs to be saved to the HBase table, to the Reduce method?
> I can only pass the built-in data types to the reduce method from my mapper.
> 3. I thought of using the distributed cache for the above problem, to
> store the object in the cache and pass only the key to the reduce method.
> But how do I generate a unique key for each of the objects I store in the
> distributed cache?
>
> Please help me with the above. Please tell me if I am missing some detail
> or overlooking something important.
>
> Thanking You,
>
>
> --
> Regards,
> Ouch Whisper
> 010101010101
>

Re: MapReduce to load data in HBase

Posted by Panshul Whisper <ou...@gmail.com>.
I am using the MapReduce approach. I was looking into Avro to create my
own custom data types to pass from Mapper to Reducer.
With Avro I need to maintain a schema for every type of JSON file I am
receiving, and since there will be many different MapReduce jobs running,
that means a different schema for every type.
1. Since the JSON schema might change very frequently, almost 3 times every
month, is it advisable to use Avro to create custom data types? Or can I
use the distributed cache to store the Java object in the cache and pass
the key to the object to the Reducer?
2. Will there be any performance issues with using the distributed cache?
The data will be very large, and very high-speed performance is required.

Thanking You,
Regards,


On Thu, Feb 7, 2013 at 2:23 PM, Mohammad Tariq <do...@gmail.com> wrote:

> Size is not a problem; a frequently changing schema might be.
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>
>
> On Thu, Feb 7, 2013 at 6:25 PM, Panshul Whisper <ouchwhisper@gmail.com> wrote:
>
> > Hello,
> >
> > Thank you for the replies.
> >
> > I have not used Pig yet. I am looking into it. I wanted to implement both
> > approaches.
> > Are Pig scripts maintainable? Because the JSON structure that I will be
> > receiving will be changing quite often, almost 3 times a month.
> > I will be processing 24 million JSON files per month.
> > I am getting one big file with almost 3 million JSON records aggregated,
> > one record per line. I need to process this file and store all values
> > into HBase.
> >
> > Thanking You,
> >
> >
> >
> >
> > On Thu, Feb 7, 2013 at 12:59 PM, Mohammad Tariq <do...@gmail.com>
> > wrote:
> >
> > > Good point, sir. If Pig fits Panshul's requirements then it's a much
> > > better option.
> > >
> > > Warm Regards,
> > > Tariq
> > > https://mtariq.jux.com/
> > > cloudfront.blogspot.com
> > >
> > >
> > > On Thu, Feb 7, 2013 at 5:25 PM, Damien Hardy <dh...@viadeoteam.com>
> > > wrote:
> > >
> > > > Hello,
> > > > Why not use a Pig script for that?
> > > > Make the JSON file available on HDFS.
> > > > Load with
> > > > http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/builtin/JsonLoader.html
> > > > Store with
> > > > http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html
> > > >
> > > > http://pig.apache.org/docs/r0.10.0/
> > > >
> > > > Cheers,
> > > >
> > > > --
> > > > Damien
> > > >
> > >
> >
> >
> >
> > --
> > Regards,
> > Ouch Whisper
> > 010101010101
> >
>



-- 
Regards,
Ouch Whisper
010101010101

Re: MapReduce to load data in HBase

Posted by Mohammad Tariq <do...@gmail.com>.
Size is not a problem; a frequently changing schema might be.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Thu, Feb 7, 2013 at 6:25 PM, Panshul Whisper <ou...@gmail.com> wrote:

> Hello,
>
> Thank you for the replies.
>
> I have not used Pig yet. I am looking into it. I wanted to implement both
> approaches.
> Are Pig scripts maintainable? Because the JSON structure that I will be
> receiving will be changing quite often, almost 3 times a month.
> I will be processing 24 million JSON files per month.
> I am getting one big file with almost 3 million JSON records aggregated, one
> record per line. I need to process this file and store all values into HBase.
>
> Thanking You,
>
>
>
>
> On Thu, Feb 7, 2013 at 12:59 PM, Mohammad Tariq <do...@gmail.com>
> wrote:
>
> > Good point, sir. If Pig fits Panshul's requirements then it's a much
> > better option.
> >
> > Warm Regards,
> > Tariq
> > https://mtariq.jux.com/
> > cloudfront.blogspot.com
> >
> >
> > On Thu, Feb 7, 2013 at 5:25 PM, Damien Hardy <dh...@viadeoteam.com>
> > wrote:
> >
> > > Hello,
> > > Why not use a Pig script for that?
> > > Make the JSON file available on HDFS.
> > > Load with
> > > http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/builtin/JsonLoader.html
> > > Store with
> > > http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html
> > >
> > > http://pig.apache.org/docs/r0.10.0/
> > >
> > > Cheers,
> > >
> > > --
> > > Damien
> > >
> >
>
>
>
> --
> Regards,
> Ouch Whisper
> 010101010101
>

Re: MapReduce to load data in HBase

Posted by Michael Segel <mi...@hotmail.com>.
Pig is like any other scripting language, and therefore maintainable.

Your script defines the schema of the data set and then does the processing you tell it to do. 

On Feb 7, 2013, at 6:55 AM, Panshul Whisper <ou...@gmail.com> wrote:

> Hello,
> 
> Thank you for the replies.
> 
> I have not used Pig yet. I am looking into it. I wanted to implement both
> approaches.
> Are Pig scripts maintainable? Because the JSON structure that I will be
> receiving will be changing quite often, almost 3 times a month.
> I will be processing 24 million JSON files per month.
> I am getting one big file with almost 3 million JSON records aggregated, one
> record per line. I need to process this file and store all values into HBase.
> 
> Thanking You,
> 
> 
> 
> 
> On Thu, Feb 7, 2013 at 12:59 PM, Mohammad Tariq <do...@gmail.com> wrote:
> 
>> Good point, sir. If Pig fits Panshul's requirements then it's a much
>> better option.
>> 
>> Warm Regards,
>> Tariq
>> https://mtariq.jux.com/
>> cloudfront.blogspot.com
>> 
>> 
>> On Thu, Feb 7, 2013 at 5:25 PM, Damien Hardy <dh...@viadeoteam.com>
>> wrote:
>> 
>>> Hello,
>>> Why not use a Pig script for that?
>>> Make the JSON file available on HDFS.
>>> Load with
>>> http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/builtin/JsonLoader.html
>>> Store with
>>> http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html
>>>
>>> http://pig.apache.org/docs/r0.10.0/
>>>
>>> Cheers,
>>>
>>> --
>>> Damien
>>> 
>> 
> 
> 
> 
> -- 
> Regards,
> Ouch Whisper
> 010101010101

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com






Re: MapReduce to load data in HBase

Posted by Panshul Whisper <ou...@gmail.com>.
Hello,

Thank you for the replies.

I have not used Pig yet. I am looking into it. I wanted to implement both
approaches.
Are Pig scripts maintainable? Because the JSON structure that I will be
receiving will be changing quite often, almost 3 times a month.
I will be processing 24 million JSON files per month.
I am getting one big file with almost 3 million JSON records aggregated, one
record per line. I need to process this file and store all values into HBase.

Thanking You,




On Thu, Feb 7, 2013 at 12:59 PM, Mohammad Tariq <do...@gmail.com> wrote:

> Good point, sir. If Pig fits Panshul's requirements then it's a much
> better option.
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>
>
> On Thu, Feb 7, 2013 at 5:25 PM, Damien Hardy <dh...@viadeoteam.com>
> wrote:
>
> > Hello,
> > Why not use a Pig script for that?
> > Make the JSON file available on HDFS.
> > Load with
> > http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/builtin/JsonLoader.html
> > Store with
> > http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html
> >
> > http://pig.apache.org/docs/r0.10.0/
> >
> > Cheers,
> >
> > --
> > Damien
> >
>



-- 
Regards,
Ouch Whisper
010101010101

Re: MapReduce to load data in HBase

Posted by Mohammad Tariq <do...@gmail.com>.
Good point, sir. If Pig fits Panshul's requirements then it's a much
better option.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Thu, Feb 7, 2013 at 5:25 PM, Damien Hardy <dh...@viadeoteam.com> wrote:

> Hello,
> Why not use a Pig script for that?
> Make the JSON file available on HDFS.
> Load with
> http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/builtin/JsonLoader.html
> Store with
> http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html
>
> http://pig.apache.org/docs/r0.10.0/
>
> Cheers,
>
> --
> Damien
>

Re: MapReduce to load data in HBase

Posted by Damien Hardy <dh...@viadeoteam.com>.
Hello,
Why not use a Pig script for that?
Make the JSON file available on HDFS.
Load with
http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/builtin/JsonLoader.html
Store with
http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html

http://pig.apache.org/docs/r0.10.0/

Cheers,

-- 
Damien
