Apache Pig - Issues saving to Hive table from Pig
I am using HCatalog to read and write data to Hive from a Pig script as follows:
a = load 'customer' using org.apache.hcatalog.pig.HCatLoader();
b = load 'address' using org.apache.hcatalog.pig.HCatLoader();
c = join a by cmr_id, b by cmr_id;
store c into 'cmr_address_join' using org.apache.hcatalog.pig.HCatStorer();
The table definition for customer is:

cmr_id    int
name      string

For address:

addr_id   int
cmr_id    int
address   string

And for cmr_address_join:

cmr_id    int
name      string
addr_id   int
address   string
When I run this, Pig throws the following error:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1115: Column names should all be in lowercase. Invalid name found: a::cmr_id
I believe this may be because Pig is trying to match the Pig-generated field names against the Hive columns, and they do not match (a::cmr_id versus cmr_id). I think HCatStorer is expecting the alias to be cmr_id, not a::cmr_id. I wish HCatStorer ignored the alias prefix and considered only the field name.
grunt> describe c;
c: {a::cmr_id: int,a::name: chararray,b::addr_id: int,b::cmr_id: int,b::address: chararray}
Is there a way to drop the prefix of a field in Pig (i.e. the a::)? Or, if anyone has a workaround or solution, that would be great.
I know I can use the following to explicitly add aliases and make it work:
d = foreach c generate a::cmr_id as cmr_id, a::name as name, b::addr_id as addr_id, b::address as address;
store d into 'cmr_address_join' using org.apache.hcatalog.pig.HCatStorer();
But the problem is that I have many tables, each with hundreds of columns, so it becomes tedious to specify aliases as above.

Any fix would be appreciated.
You can use $0, $1 and so on to access the columns by position, and then rename them to the column names, for example: $0 as cmr_id.
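For the join in the question, a positional projection might look like the sketch below. The positions follow the `describe c` output shown above, so $3 (the duplicate b::cmr_id) is the one to skip; adjust the positions for your own schema.

```
-- project by position instead of by prefixed alias;
-- $0..$4 correspond to a::cmr_id, a::name, b::addr_id, b::cmr_id, b::address
d = foreach c generate $0 as cmr_id, $1 as name, $2 as addr_id, $4 as address;
store d into 'cmr_address_join' using org.apache.hcatalog.pig.HCatStorer();
```

This avoids typing the a::/b:: prefixes, though with hundreds of columns you would still need one entry per column; generating the projection list from the Hive table definition may be worth scripting.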