Python - Luigi target for non-existent table
I'm trying to set up a simple table existence test for a Luigi task using luigi.hive.HiveTableTarget.
I create a simple table in Hive to make sure it is there:
create table test_table (a int);
Next I set up the target with Luigi:
>>> from luigi.hive import HiveTableTarget
>>> target = HiveTableTarget(table='test_table')
>>> target.exists()
True
Great. Next I try it with a table I know doesn't exist, to make sure it returns False.
>>> target = HiveTableTarget(table='test_table_not_here')
>>> target.exists()
And it raises an exception:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/site-packages/luigi/hive.py", line 344, in exists
    return self.client.table_exists(self.table, self.database)
  File "/usr/lib/python2.6/site-packages/luigi/hive.py", line 117, in table_exists
    stdout = run_hive_cmd('use {0}; describe {1}'.format(database, table))
  File "/usr/lib/python2.6/site-packages/luigi/hive.py", line 62, in run_hive_cmd
    return run_hive(['-e', hivecmd], check_return_code)
  File "/usr/lib/python2.6/site-packages/luigi/hive.py", line 56, in run_hive
    stdout, stderr)
luigi.hive.HiveCommandError: ('Hive command: hive -e use default; describe test_table_not_here failed with error code: 17', '', '\nLogging initialized using configuration in jar:file:/opt/cloudera/parcels/cdh-5.2.0-1.cdh5.2.0.p0.36/jars/hive-common-0.13.1-cdh5.2.0.jar!/hive-log4j.properties\nOK\nTime taken: 0.822 seconds\nFAILED: SemanticException [Error 10001]: Table not found test_table_not_here\n')
(Edited for formatting and clarity.)
I don't understand the last line of the exception. Of course the table is not found; that is the whole point of an existence check. Is this the expected behavior, or do I have a configuration issue I need to work out?
Okay, it looks like this may have been a bug in the latest tagged release (1.0.19) that has been fixed on the master branch. The code responsible is:
stdout = run_hive_cmd('use {0}; describe {1}'.format(database, table))
return not "does not exist" in stdout
which was changed in master to be:
stdout = run_hive_cmd('use {0}; show tables "{1}";'.format(database, table))
return stdout and table in stdout
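The latter works fine, whereas the former throws a HiveCommandError: describe on a missing table makes the hive command exit with a non-zero code, so run_hive_cmd raises before the "does not exist" check is ever reached.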
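To make the difference concrete, here is a minimal sketch that swaps the real run_hive_cmd for a stand-in. Everything named fake_* or Fake* below is hypothetical and not part of Luigi. It assumes, as the traceback above suggests, that describe on a missing table makes the command fail, while show tables simply returns empty output.

```python
class FakeHiveCommandError(RuntimeError):
    """Hypothetical stand-in for luigi.hive.HiveCommandError."""


# Tables our pretend Hive database contains.
EXISTING_TABLES = {"test_table"}


def fake_run_hive_cmd(hivecmd):
    """Mimic the hive CLI for the two statements discussed above."""
    # The command looks like 'use <db>; <statement>'; take the statement.
    statement = hivecmd.split(";")[1].strip()
    if statement.startswith("describe "):
        table = statement[len("describe "):]
        if table not in EXISTING_TABLES:
            # Assumption: a non-zero exit code surfaces as an exception.
            raise FakeHiveCommandError("Table not found " + table)
        return "a\tint\n"
    if statement.startswith("show tables "):
        name = statement[len("show tables "):].strip('"')
        # A missing table is not an error here, just empty output.
        return name + "\n" if name in EXISTING_TABLES else ""
    raise ValueError("unsupported statement: " + statement)


def exists_via_describe(table, database="default"):
    """Shape of the 1.0.19 check: raises before the string test runs."""
    stdout = fake_run_hive_cmd("use {0}; describe {1}".format(database, table))
    return "does not exist" not in stdout


def exists_via_show_tables(table, database="default"):
    """Shape of the master check: empty output means 'not there'."""
    stdout = fake_run_hive_cmd('use {0}; show tables "{1}";'.format(database, table))
    return bool(stdout) and table in stdout
```

With this stand-in, exists_via_show_tables('test_table_not_here') returns False, while exists_via_describe('test_table_not_here') raises instead of returning.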
If you want a solution without having to update to the master branch, you could create your own target class with minimal effort:
from luigi.hive import HiveTableTarget, run_hive_cmd

class MyHiveTarget(HiveTableTarget):
    def exists(self):
        # 'show tables' returns empty output for a missing table instead of failing
        stdout = run_hive_cmd('use {0}; show tables "{1}";'.format(self.database, self.table))
        return self.table in stdout
This will produce the desired output.
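One thing to watch with the check above is that `self.table in stdout` is a substring test on the raw output. The sketch below is a slightly stricter variant, not Luigi's own code: it compares whole lines and ignores case. The helper name table_in_show_tables_output is hypothetical, and the case-insensitive comparison assumes Hive reports table names in lowercase.

```python
def table_in_show_tables_output(stdout, table):
    """Return True if `table` appears as a whole line in `show tables` output.

    Hypothetical helper for illustration; assumes one table name per line.
    """
    wanted = table.strip().lower()
    # Compare complete, stripped lines rather than searching for a substring.
    names = [line.strip().lower() for line in stdout.splitlines()]
    return wanted in names
```

Inside exists(), the last line could then become `return table_in_show_tables_output(stdout, self.table)`, if that stricter behavior is what you want.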