Sunday, 2 April 2017

Copy Partition table

If you want to copy existing Partition table in Hive from one cluster to another cluster or copy from one database to another database on same cluster.

Suppose you have Table t1 in database testdb and you have load data in partition table from local directory.

create table testdb.t1(a string, b string) row format delimited fields terminated by ',';
load data local inpath '/home/dheerendra/working/Data/data1' overwrite into table testdb.t1;

Insert data in Partition table.

set hive.exec.dynamic.partition.mode=nonstrict;
insert into table testdb.tp2 partition(b) select a,b from testdb.t2;

Copy data from one cluster to another cluster using  Distcp command. I my case cluster is same.

hadoop distcp   /user/hive/warehouse/testdb.db/tp /user/hive/warehouse/db1.db/

When Cluster is different then.

hadoop distcp   hdfs://cluster1/user/hive/warehouse/testdb.db/tp  hdfs://cluster2/user/hive/warehouse/db1.db/


Create same table structure on destination database.

create table db1.tp(a string) partitioned by(b string);


Use MSCK command to add partition with metastore


msck repair table tp;

If you get error like:
Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
 Take below steps:

set hive.msck.path.validation=ignore;
MSCK REPAIR TABLE table_name;

1 comment: