Posted to user@gobblin.apache.org by Abhishek Tiwari <fi...@gmail.com> on 2018/03/23 18:49:19 UTC
Re: TimeBasedPartitioner for Hive ORC Files
I think you need an implementation of TimeBasedWriterPartitioner; for an
example, refer to TimeBasedAvroWriterPartitioner.
The partitioning then happens on the Gobblin side rather than in the ORC
SerDe, so you wouldn't need to modify or extend the SerDe.
Abhishek
On Mon, Dec 11, 2017 at 7:35 PM, Prateek Gupta <pr...@myntra.com>
wrote:
> Any advice or information that would help resolve this difficulty would be
> appreciated.
>
> On 8 Dec 2017 11:28 a.m., "Prateek Gupta" <pr...@myntra.com>
> wrote:
>
>> Hi,
>>
>> How can one go about creating a TimeBasedPartitioner for Hive ORC
>> files? The class *OrcSerdeRow* is *final* and visible at *package* level
>> only!
>>
>> Thanks & Regards,
>> Prateek Gupta
>>
>
Re: TimeBasedPartitioner for Hive ORC Files
Posted by Prateek Gupta <pr...@myntra.com>.
Thanks for the response, Abhishek!
Please find below the code for our implementation of
TimeBasedOrcWriterPartitioner. The *hitch* was that we could not access the
field *realRow* of class *OrcSerdeRow* directly, as the class is *final* and
visible at *package* level only, so the code reads the field via reflection.
import java.lang.reflect.Field;
import java.util.List;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

import gobblin.writer.partitioner.TimeBasedWriterPartitioner;

public class TimeBasedOrcWriterPartitioner extends TimeBasedWriterPartitioner<Object> {

    private static final Log LOG = LogFactory.getLog(TimeBasedOrcWriterPartitioner.class);

    // Name of the non-public field in OrcSerdeRow that holds the row values.
    private static final String REAL_ROW_FIELD = "realRow";

    // Position of the timestamp column in the row (specific to our schema).
    private static final int TIMESTAMP_INDEX = 1;

    public TimeBasedOrcWriterPartitioner(gobblin.configuration.State state,
            int numBranches, int branchId) {
        super(state, numBranches, branchId);
    }

    @Override
    public long getRecordTimestamp(Object orcRecord) {
        try {
            // OrcSerdeRow is final and package-level, so read the field reflectively.
            Field f = orcRecord.getClass().getDeclaredField(REAL_ROW_FIELD);
            f.setAccessible(true);
            List<?> realRow = (List<?>) f.get(orcRecord);
            long timestamp = (Long) realRow.get(TIMESTAMP_INDEX);
            LOG.debug("Timestamp of the OrcSerdeRow: " + timestamp);
            return timestamp;
        } catch (NoSuchFieldException | IllegalAccessException e) {
            // Fail with the cause attached rather than printing the stack trace
            // and then hitting a NullPointerException on a null row.
            throw new IllegalStateException("Cannot read field " + REAL_ROW_FIELD
                    + " from " + orcRecord.getClass().getName(), e);
        }
    }
}
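
The reflective lookup used above can be tried in isolation, without Hive or
Gobblin on the classpath. The following is only a minimal sketch under stated
assumptions: StandInRow is a hypothetical stand-in for OrcSerdeRow (a final
class with a non-public realRow field), and ReflectiveTimestampDemo and
readTimestamp are illustrative names that exist in neither library. It also
assumes the row is a List with a Long timestamp at the given index, as in the
partitioner code.

```java
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for OrcSerdeRow: final, with a non-public row field.
final class StandInRow {
    Object realRow;

    StandInRow(Object realRow) {
        this.realRow = realRow;
    }
}

final class ReflectiveTimestampDemo {

    /** Reads the Long stored at timestampIndex in the record's "realRow" list. */
    static long readTimestamp(Object record, int timestampIndex) {
        try {
            Field f = record.getClass().getDeclaredField("realRow");
            f.setAccessible(true); // the field is not public
            Object value = f.get(record);
            // Check the types before casting so a schema mismatch fails clearly.
            if (!(value instanceof List)) {
                throw new IllegalStateException("realRow is not a List: " + value);
            }
            Object ts = ((List<?>) value).get(timestampIndex);
            if (!(ts instanceof Long)) {
                throw new IllegalStateException(
                        "Column " + timestampIndex + " is not a Long: " + ts);
            }
            return (Long) ts;
        } catch (NoSuchFieldException | IllegalAccessException e) {
            throw new IllegalStateException(
                    "Cannot read realRow from " + record.getClass().getName(), e);
        }
    }

    public static void main(String[] args) {
        // Sample row: an id column followed by an epoch-millisecond timestamp.
        StandInRow row = new StandInRow(Arrays.<Object>asList("order-1", 1512712680000L));
        System.out.println(readTimestamp(row, 1));
    }
}
```

Checking the types before casting is a design choice of this sketch: if the
row layout differs from what the partitioner expects, the failure names the
offending value instead of surfacing as a ClassCastException further down.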