While using regular SQL for long time, I find Hive syntax is not easy to use, especially on the TIME part. In many existing SQL program, one can compare date time with string easily, while in Hive, you have to specifically set the data type right. So to make things easier, here is a summarized note on how to deal with Hive TIME related operations.
(Based on Hive 1.2.1) Principle: 1. TIME could be "seen" in multiple types: a. Integer (actually Big Int) b. Timestamp c. Date d. String 2. Many times we see a string that looks like a "timestamp" or "date", but it still is a string! 3. Hive has timestamp and date datetype, and they can both be compared with a string or an integer. require comparison be made at either Integer or String level. The "Timestamp" or "Date" is only a string representation of the underlying Integer data Key functions (guarantee a date type): cast(date as date) cast(timestamp as date) cast(string as date) cast(date as timestamp) cast(date as string) Examples: >> select cast('2016-09-01' as date) > 1203828; >> true >> select cast('2016-09-01' as string) > 1203828; >> NULL Key functions (maybe different date type): unix_timestamp() # 1474353532 (integer) unix_timestamp('2016-08-01:21:00:01', 'yyyy-MM-dd:hh:mm:ss') # 1470085201 (integer) select datediff('2016-09-21', '2015-02-21'); # 578 (integer) from_unixtime(1474353532) # '2016-09-20 06:38:52' (string ) to_date('2016-08-01 00:03:24'); # '2016-08-01' (string) date_add('2016-08-01', 1); # '2016-08-02' (string) date_sub('2016-08-01', 1); # '2016-07-31' (string) from_utc_timestamp('2016-09-19 16:40:16.0', 'PST'); # 2016-09-19 09:40:16.0 (timestamp) to_utc_timestamp('2016-09-19 09:40:16.0', 'PST'); # 2016-09-19 16:40:16.0 (timestamp) current_timestamp() # '2016-09-20 15:39:43.145' (timestamp) current_date() # '2016-09-20' (date) Examples: >> select 1474353532 < unix_timestamp(); -- integer vs. integer >> true >> select 1474353532 < from_unixtime(unix_timestamp()) -- integer vs. string >> NULL >> select to_date(from_unixtime(unix_timestamp())) > '2015-08-20' -- string vs. string >> true >> select to_date(from_unixtime(unix_timestamp())) > unix_timestamp() -- string vs. int >> NULL
1 Comment
|
AuthorData Magician Archives
October 2017
Categories
All
|