Magic Analytics
  • Home
  • Python
    • Pandas
    • Matplotlib
    • Interactive Visualization
    • Folium
  • Spark
    • DataFrame
  • Machine Learning
    • Classification >
      • Logistic Regression
    • Dimension Reduction
    • Model Explaination
  • Blog
  • About

Aries Research Note

Hive: it's about TIME

9/12/2016

1 Comment

 
While using regular SQL for long time, I find Hive syntax is not easy to use, especially on the TIME part. In many existing SQL program, one can compare date time with string easily, while in Hive, you have to specifically set the data type right. So to make things easier, here is a summarized note on how to deal with Hive TIME related operations.
(Based on Hive 1.2.1)

Principle:
    1. TIME could be "seen" in multiple types:
        a. Integer (actually Big Int)
        b. Timestamp
        c. Date
        d. String
     2. Many times we see a string that looks like a "timestamp" or "date", but it still is a string!
    3. Hive has timestamp and date datetype, and they can both be compared with a string or an integer.

require comparison be made at either Integer or String level. The "Timestamp" or "Date" is only a string representation of the underlying Integer data

Key functions (guarantee a date type):
    cast(date as date)
    cast(timestamp as date)
    cast(string as date)
    cast(date as timestamp)
    cast(date as string)


Examples:

>> select cast('2016-09-01' as date) > 1203828;   
>> true

>> select cast('2016-09-01' as string) > 1203828; 
>> NULL

Key functions (maybe different date type):
    unix_timestamp()                                                                                        # 1474353532  (integer)
    unix_timestamp('2016-08-01:21:00:01', 'yyyy-MM-dd:hh:mm:ss')      # 1470085201 (integer)
    select datediff('2016-09-21', '2015-02-21');                                             # 578 (integer)

    from_unixtime(1474353532)            # '2016-09-20 06:38:52'   (string )
    to_date('2016-08-01 00:03:24');        # '2016-08-01' (string)
    date_add('2016-08-01', 1);                 # '2016-08-02' (string)
    date_sub('2016-08-01', 1);                 # '2016-07-31' (string)

    from_utc_timestamp('2016-09-19 16:40:16.0', 'PST');       # 2016-09-19 09:40:16.0  (timestamp)
    to_utc_timestamp('2016-09-19 09:40:16.0', 'PST');            # 2016-09-19 16:40:16.0 (timestamp)
    current_timestamp()                                                              # '2016-09-20 15:39:43.145' (timestamp)

    current_date()                                     # '2016-09-20' (date)


Examples:

>> select 1474353532 < unix_timestamp();  -- integer vs. integer
>> true
>> select 1474353532 < from_unixtime(unix_timestamp())  -- integer vs. string
>> NULL

>> select to_date(from_unixtime(unix_timestamp())) > '2015-08-20' -- string vs. string
>> true
>> select to_date(from_unixtime(unix_timestamp())) > unix_timestamp() -- string vs. int
>> NULL
1 Comment
Car Charging Nevada link
11/6/2022 09:12:32 pm

Appreciaate your blog post

Reply



Leave a Reply.

    Author

    Data Magician

    Archives

    October 2017
    April 2017
    November 2016
    October 2016
    September 2016

    Categories

    All
    Git
    Hive
    Machine Learning
    Matplotlib
    Pandas
    Plotly
    Python
    R
    Spark

    RSS Feed

Powered by Create your own unique website with customizable templates.
  • Home
  • Python
    • Pandas
    • Matplotlib
    • Interactive Visualization
    • Folium
  • Spark
    • DataFrame
  • Machine Learning
    • Classification >
      • Logistic Regression
    • Dimension Reduction
    • Model Explaination
  • Blog
  • About