When I cast one of the columns in a DataFrame to the timestamp data type:
from pyspark.sql.functions import unix_timestamp

data_frame = self.spark.sparkContext.parallelize([
    ('Joe', '1995-08-01T00:00:01.000+0000'),
    ('Kent', '1995-08-01T00:00:01.000+0000'),
    ('Tim', '1995-08-01T00:00:01.000+0000')
]).toDF(['firstName', 'dob'])
format = "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
# Parse the string with the pattern above, then cast the epoch seconds to a timestamp
data_frame = data_frame.withColumn('dob', unix_timestamp('dob', format).cast('timestamp'))
I get this result:
(firstName='Joe', dob=datetime.datetime(1995, 8, 1, 2, 0, 1))
However, I would like to keep the value as it is and change only the data type, something like this:
(firstName='Joe', dob=1995-08-01T00:00:01.000+0000)
How can I do this conversion?
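To make the symptom concrete, here is a minimal sketch in plain Python (no Spark involved) of what the result above seems to show. It assumes the two-hour shift comes from the machine's local time zone being UTC+2, which the post does not state. The names `assumed_local`, `shown`, and `restored` are illustrative only.

```python
from datetime import datetime, timedelta, timezone

raw = "1995-08-01T00:00:01.000+0000"

# Parse the string, keeping its +0000 offset (%z accepts "+0000").
parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%f%z")

# Assumption: the machine that printed the result was at UTC+2.
assumed_local = timezone(timedelta(hours=2))
shown = parsed.astimezone(assumed_local)

# Both values describe the same instant; only the wall-clock rendering differs.
same_instant = parsed == shown

# Render the instant in UTC again, using the layout of the original string.
utc = parsed.astimezone(timezone.utc)
restored = (
    utc.strftime("%Y-%m-%dT%H:%M:%S.")
    + f"{utc.microsecond // 1000:03d}"
    + utc.strftime("%z")
)

print(shown)     # wall-clock time at the assumed UTC+2 offset
print(restored)  # the instant formatted like the input string
```

If that assumption holds, the timestamp itself is unchanged and only its display differs. A timestamp value carries no text format of its own, so getting the original layout back means formatting it as a string again, for example with `pyspark.sql.functions.date_format` on the Spark side.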