Posted to user@spark.apache.org by durga <du...@gmail.com> on 2014/12/17 00:35:30 UTC

S3 globbing

Hi All,

I need help with the file pattern (glob) I pass to sc.textFile().

I have lots of files whose names carry an epoch millisecond timestamp,

e.g. abc_1418759383723.json

I need to consume only the files from the last hour, using that epoch
timestamp in the file name.

I tried a couple of options, but nothing seems to work for me.

If any of you have faced this issue and found a solution, please help me.

I appreciate any help.

Thanks,
D





--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/S3-globbing-tp20731.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: S3 globbing

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
Yes, you can create a Calendar instance, roll it back one hour, and use
that object's time in milliseconds.

E.g.:

import java.util.Calendar

val hourBefore = Calendar.getInstance()
hourBefore.add(Calendar.HOUR, -1)   // roll back one hour
hourBefore.getTimeInMillis          // epoch milliseconds, one hour ago
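
If you need the window to be exact rather than a prefix match, one option (a
rough sketch, assuming the files sit directly under a single S3 prefix and
the Hadoop FileSystem API is on the classpath; the bucket and path below are
placeholders) is to list the directory and filter on the parsed timestamp
before handing the matching paths to textFile:

import java.util.Calendar
import org.apache.hadoop.fs.Path

// Window: [one hour ago, now], both in epoch milliseconds
val start = Calendar.getInstance()
start.add(Calendar.HOUR, -1)
val startMs = start.getTimeInMillis
val endMs = System.currentTimeMillis()

// List the directory and keep only files whose embedded timestamp
// falls inside the window ("s3n://my-bucket/logs" is a placeholder)
val dir = new Path("s3n://my-bucket/logs")
val fs = dir.getFileSystem(sc.hadoopConfiguration)
val name = """abc_(\d{13})\.json""".r

val inRange = fs.listStatus(dir).map(_.getPath).filter { p =>
  p.getName match {
    case name(ts) => ts.toLong >= startMs && ts.toLong <= endMs
    case _        => false
  }
}

// sc.textFile accepts a comma-separated list of paths
val rdd = sc.textFile(inRange.map(_.toString).mkString(","))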



Thanks
Best Regards

On Thu, Dec 18, 2014 at 12:57 AM, durga katakam <du...@gmail.com> wrote:
>
> Hi Akhil,
>
> Thanks for your time. I appreciate .I tried this approach , but either I
> am getting less files or more files not exact hour files.
>
> Is there any way I can tell the range (between this time to this time)
>
> Thanks,
> D
>
> On Tue, Dec 16, 2014 at 11:04 PM, Akhil Das <ak...@sigmoidanalytics.com>
> wrote:
>>
>> Did you try something like:
>>
>> //Get the last hour
>> val d = (System.currentTimeMillis() - 3600 * 1000)
>> val ex = "abc_" + d.toString().substring(0,7) + "*.json"
>>
>>
>>
>> Thanks
>> Best Regards
>>
>> On Wed, Dec 17, 2014 at 5:05 AM, durga <du...@gmail.com> wrote:
>>>
>>> Hi All,
>>>
>>> I need help with regex in my sc.textFile()
>>>
>>> I have lots of files with with epoch millisecond timestamp.
>>>
>>> ex:abc_1418759383723.json
>>>
>>> Now I need to consume last one hour files using the epoch time stamp as
>>> mentioned above.
>>>
>>> I tried couple of options , nothing seems working for me.
>>>
>>> If any one of you face this issue and got a solution , please help me.
>>>
>>> Appreciating your help,
>>>
>>> Thanks,
>>> D
>>>
>>>
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/S3-globbing-tp20731.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>> For additional commands, e-mail: user-help@spark.apache.org
>>>
>>>

Re: S3 globbing

Posted by durga katakam <du...@gmail.com>.
Hi Akhil,

Thanks for your time, I appreciate it. I tried this approach, but I get
either fewer or more files than expected, not exactly the last hour's files.

Is there any way I can specify a range (from this time to this time)?

Thanks,
D

On Tue, Dec 16, 2014 at 11:04 PM, Akhil Das <ak...@sigmoidanalytics.com>
wrote:
>
> Did you try something like:
>
> //Get the last hour
> val d = (System.currentTimeMillis() - 3600 * 1000)
> val ex = "abc_" + d.toString().substring(0,7) + "*.json"
>
>
>
> Thanks
> Best Regards
>
> On Wed, Dec 17, 2014 at 5:05 AM, durga <du...@gmail.com> wrote:
>>
>> Hi All,
>>
>> I need help with regex in my sc.textFile()
>>
>> I have lots of files with with epoch millisecond timestamp.
>>
>> ex:abc_1418759383723.json
>>
>> Now I need to consume last one hour files using the epoch time stamp as
>> mentioned above.
>>
>> I tried couple of options , nothing seems working for me.
>>
>> If any one of you face this issue and got a solution , please help me.
>>
>> Appreciating your help,
>>
>> Thanks,
>> D
>>
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/S3-globbing-tp20731.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> For additional commands, e-mail: user-help@spark.apache.org
>>
>>

Re: S3 globbing

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
Did you try something like:

// Epoch milliseconds for one hour ago
val d = System.currentTimeMillis() - 3600 * 1000
// Glob on the first 7 digits of the 13-digit millisecond timestamp
val ex = "abc_" + d.toString.substring(0, 7) + "*.json"



Thanks
Best Regards

On Wed, Dec 17, 2014 at 5:05 AM, durga <du...@gmail.com> wrote:
>
> Hi All,
>
> I need help with regex in my sc.textFile()
>
> I have lots of files with with epoch millisecond timestamp.
>
> ex:abc_1418759383723.json
>
> Now I need to consume last one hour files using the epoch time stamp as
> mentioned above.
>
> I tried couple of options , nothing seems working for me.
>
> If any one of you face this issue and got a solution , please help me.
>
> Appreciating your help,
>
> Thanks,
> D
>
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/S3-globbing-tp20731.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>