If you want to copy existing Partition table in Hive from one cluster to another cluster or copy from one database to another database on same cluster.
Suppose you have Table t1 in database testdb and you have load data in partition table from local directory.
create table testdb.t1(a string, b string) row format delimited fields terminated by ',';
load data local inpath '/home/dheerendra/working/Data/data1' overwrite into table testdb.t1;
Insert data in Partition table.
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table testdb.tp2 partition(b) select a,b from testdb.t2;
Suppose you have Table t1 in database testdb and you have load data in partition table from local directory.
create table testdb.t1(a string, b string) row format delimited fields terminated by ',';
load data local inpath '/home/dheerendra/working/Data/data1' overwrite into table testdb.t1;
Insert data in Partition table.
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table testdb.tp2 partition(b) select a,b from testdb.t2;
Copy data from one cluster to another cluster using Distcp command. I my case cluster is same.
hadoop distcp /user/hive/warehouse/testdb.db/tp /user/hive/warehouse/db1.db/
When Cluster is different then.
hadoop distcp hdfs://cluster1/user/hive/warehouse/testdb.db/tp hdfs://cluster2/user/hive/warehouse/db1.db/
Create same table structure on destination database.
create table db1.tp(a string) partitioned by(b string);
Use MSCK command to add partition with metastore
msck repair table tp;
If you get error like:
Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
Take below steps:
If you get error like:
Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
Take below steps:
set hive.msck.path.validation=ignore;
MSCK REPAIR TABLE table_name;
Nice post...
ReplyDelete