You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Subhajit Purkayastha <sp...@p3si.net> on 2016/08/23 22:46:29 UTC

DataFrame Data Manipulation - Based on a timestamp column Not Working

Using spark 2.0  & scala 2.11.8, I have a DataFrame with a timestamp column

 

root

|-- ORG_ID: integer (nullable = true)

|-- HEADER_ID: integer (nullable = true)

|-- ORDER_NUMBER: integer (nullable = true)

|-- LINE_ID: integer (nullable = true)

|-- LINE_NUMBER: integer (nullable = true)

|-- ITEM_TYPE_CODE: string (nullable = true)

|-- ORGANIZATION_ID: integer (nullable = true)

|-- INVENTORY_ITEM_ID: integer (nullable = true)

|-- SCHEDULE_SHIP_DATE: timestamp (nullable = true)

|-- ORDER_QUANTITY_UOM: string (nullable = true)

|-- UNIT_SELLING_PRICE: double (nullable = true)

|-- OPEN_QUANTITY: double (nullable = true)

 

[204,94468,56721,197328,1,STANDARD,207,149,2004-01-08
23:59:59.0,Ea,1599.0,28.0]

[204,94468,56721,197331,2,STANDARD,207,151,2004-01-08
23:59:59.0,Ea,1899.05,40.0]

[204,94468,56721,197332,3,STANDARD,207,436,2004-01-08
23:59:59.0,Ea,300.0,24.0]

[204,94468,56721,197335,4,STANDARD,207,3751,2004-01-08
23:59:59.0,Ea,380.0,24.0]

 

I want to manipulate the dataframe data based on a parameter =
demand_time_fence_date

 

var demand_timefence_end_date_instance = new
MutableDateTime(planning_start_date)    

var demand_timefence_days =
demand_timefence_end_date_instance.addDays(demand_time_fence)    

val demand_timefence_end_date =
ISODateTimeFormat.yearMonthDay().print(demand_timefence_end_date_instance)

 

var filter_stmt = "from_unixtime(SCHEDULE_SHIP_DATE,'yyyy-MM-dd') >= "+
demand_timefence_end_date  

 

val sales_order_dataFrame =
sales_order_base_dataFrame.filter(filter_stmt).limit(10)

 

What is the correct syntax to pass the parameter value? 

 

The above filter statement is not working to restrict the dataset

 

Thanks,

 

Subhajit

 

 


Re: DataFrame Data Manipulation - Based on a timestamp column Not Working

Posted by Bedrytski Aliaksandr <sp...@bedryt.ski>.
Hi Subhajit,

you may try to use sql queries instead of helper methods:

> sales_order_base_dataFrame.registerTempTable("sales_orders")
>
> val result = sqlContext.sql("""
>     SELECT *
>     FROM sales_orders
>     WHERE unix_timestamp(SCHEDULE_SHIP_DATE,'_yyyy_-MM-_dd_') >=
>     unix_timestamp(demand_timefence_end_date ,'_yyyy_-MM-_dd_')
> """)

This is if demand_timefence_end_date  has '_yyyy_-MM-_dd_' date format

Regards,
--
  Bedrytski Aliaksandr
  spark@bedryt.ski



On Wed, Aug 24, 2016, at 00:46, Subhajit Purkayastha wrote:
> Using spark 2.0  & scala 2.11.8, I have a DataFrame with a
> timestamp column
>
> root
> |-- ORG_ID: integer (nullable = true)
> |-- HEADER_ID: integer (nullable = true)
> |-- ORDER_NUMBER: integer (nullable = true)
> |-- LINE_ID: integer (nullable = true)
> |-- LINE_NUMBER: integer (nullable = true)
> |-- ITEM_TYPE_CODE: string (nullable = true)
> |-- ORGANIZATION_ID: integer (nullable = true)
> |-- INVENTORY_ITEM_ID: integer (nullable = true)
> |-- SCHEDULE_SHIP_DATE: timestamp (nullable = true)
> |-- ORDER_QUANTITY_UOM: string (nullable = true)
> |-- UNIT_SELLING_PRICE: double (nullable = true)
> |-- OPEN_QUANTITY: double (nullable = true)
>
> [204,94468,56721,197328,1,STANDARD,207,149,2004-01-08
> 23:59:59.0,Ea,1599.0,28.0]
> [204,94468,56721,197331,2,STANDARD,207,151,2004-01-08
> 23:59:59.0,Ea,1899.05,40.0]
> [204,94468,56721,197332,3,STANDARD,207,436,2004-01-08
> 23:59:59.0,Ea,300.0,24.0]
> [204,94468,56721,197335,4,STANDARD,207,3751,2004-01-08
> 23:59:59.0,Ea,380.0,24.0]
>
> I want to manipulate the dataframe data based on a parameter =
> demand_time_fence_date
>
> *var* demand_timefence_end_date_instance = *new*
> MutableDateTime(planning_start_date)
> *var* demand_timefence_days =
> demand_timefence_end_date_instance.addDays(demand_time_fence)
> *val* demand_timefence_end_date = ISODateTimeFormat.yearMonthDay().pr-
> int(demand_timefence_end_date_instance)
>
> _var_ filter_stmt = "from_unixtime(SCHEDULE_SHIP_DATE,'_yyyy_-MM-
> _dd_') >= "+ demand_timefence_end_date
>
> *val* sales_order_dataFrame =
> sales_order_base_dataFrame.filter(filter_stmt).limit(10)
>
> What is the correct syntax to pass the parameter value?
>
> The above filter statement is not working to restrict the dataset
>
> Thanks,
>
> Subhajit
>
>