With Hive HBase-backed external tables, you can override the put timestamp by setting the hbase.put.timestamp property.
However, at this time (last tested with Hive 0.8.1), you need to drop and re-create the external table every time you want to update the put timestamp, for two reasons:
- hbase.put.timestamp is a serde property set when defining the external table, and is fixed at creation time
- in general, you can modify serde properties via ALTER TABLE, but this doesn’t work for HBase-backed tables, which Hive treats as non-native.
As an example of the second point, if you try to update the put timestamp (suppose we’re changing it to 10000005), you would run:
alter table stock SET SERDEPROPERTIES ("hbase.put.timestamp" = "10000005", "hbase.columns.mapping" = ":key,f1:name,f1:lv");
But because the table is HBase-backed (non-native, in Hive’s terms), you get this error:
FAILED: Error in metadata: Cannot use ALTER TABLE on a non-native table
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
Given the above issues, for now the only way to make this work is to drop and recreate the external table with the updated timestamp. Clearly you don’t want to do this for every put, but Hive-to-HBase inserts are a batch scenario anyway. The best fit is something like an hourly batch insert into HBase, in which case you’d assign the current time in milliseconds since the epoch, rounded to the hour.
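As a rough sketch of the hourly value, the snippet below derives an epoch-millisecond timestamp from the current time. It assumes "rounded to the hour" means truncating down to the start of the current hour; the variable names now_s and put_timestamp are only illustrative.

```shell
# Current Unix time in whole seconds.
now_s=$(date +%s)

# Truncate down to the start of the hour (integer division drops the
# remainder), then convert seconds to milliseconds.
put_timestamp=$(( now_s / 3600 * 3600 * 1000 ))

echo "$put_timestamp"
```

Every load started within the same hour gets the same value, so re-running a load within that hour writes to the same HBase version rather than creating a new one.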
Note that this put timestamp value can be parameterized within a script for creating the external table, which is useful for recurring loads.
So if we have a script to create the external table called create_external_table.hql, we’d modify the serdeproperties to add:
WITH SERDEPROPERTIES ("hbase.put.timestamp" = "${hiveconf:put_timestamp}" ...
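For illustration only, here is one way a complete create_external_table.hql might look, written out from the shell with a heredoc. The table name stock and the column mapping are taken from the ALTER TABLE example above; the column types, the hbase.table.name value, and the HQL_FILE override are assumptions, not the original script.

```shell
# Hypothetical override so the file can be written somewhere else if needed.
hql_file="${HQL_FILE:-create_external_table.hql}"

# The quoted 'EOF' delimiter stops the shell from expanding
# ${hiveconf:put_timestamp}; Hive substitutes it at run time instead.
cat > "$hql_file" <<'EOF'
-- Column types and the HBase table name are assumed for this sketch.
CREATE EXTERNAL TABLE stock (key string, name string, lv string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,f1:name,f1:lv",
  "hbase.put.timestamp" = "${hiveconf:put_timestamp}"
)
TBLPROPERTIES ("hbase.table.name" = "stock");
EOF
```

Leaving the placeholder unexpanded in the file lets the same script be reused for every load, with only the -hiveconf value changing.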
Then, for each load to HBase, you’d do the following:
- drop the external HBase table in Hive (note this doesn’t drop the HBase data, since the table is external)
- set the put_timestamp shell variable
- run create_external_table.hql, passing the timestamp via -hiveconf (command shown below)
- load the data as usual
hive -f create_external_table.hql -hiveconf put_timestamp=$put_timestamp
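Put together, the per-load steps could be wrapped in a small shell function along these lines. This is a sketch under assumptions: the function name run_hourly_load, the HIVE_BIN override (handy for a dry run with a stub such as echo), the table name stock, and the load script name passed as an argument are all hypothetical.

```shell
# Hive executable; HIVE_BIN is a hypothetical override for dry runs.
HIVE_BIN="${HIVE_BIN:-hive}"

run_hourly_load() {
    # $1: HQL file containing the usual INSERT into the HBase-backed table.
    load_hql="$1"

    # Epoch milliseconds, truncated to the start of the current hour.
    put_timestamp=$(( $(date +%s) / 3600 * 3600 * 1000 ))

    # 1. Drop the external table definition (HBase data is left alone).
    "$HIVE_BIN" -e "DROP TABLE IF EXISTS stock;" || return 1

    # 2. Re-create it with the new put timestamp.
    "$HIVE_BIN" -f create_external_table.hql \
        -hiveconf put_timestamp="$put_timestamp" || return 1

    # 3. Load the data as usual.
    "$HIVE_BIN" -f "$load_hql" || return 1
}

# Example call (load_stock.hql is an assumed file name):
#   run_hourly_load load_stock.hql
```

Stopping at the first failed step keeps a load from running against a table that was dropped but not re-created.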
Loading in this manner, I verified multiple versions are successfully created and maintained, up to the max_versions amount (specified at table creation time).