azure - Loading PIG output files into Hive table with some blank cells


I have loaded a 250,000-record CSV file into HDFS and have performed some ETL functions on it, such as removing any characters in a string other than 0-9, a-z and A-Z, so that it's nice and clean.

I've saved the output of this ETL to HDFS for loading into Hive. While in Hive, I created the schema for the table and set the appropriate data types for each column.

create external table pigoutputhive (
  id string,
  score int,
  viewcount int,
  owneruserid string,
  body string,
  rank int
)
row format delimited
fields terminated by ','
location '/user/admin/pigoutputetl';

When I run a simple query on the data, such as:

select * from pigoutputhive limit 100000;

the data looks as it should, and when I download it to my local machine and view it in Excel as a CSV, it also looks good.

When I try to run the following query on the same table, every field is returned as an integer, even for the string columns. See the screenshot below.

[Screenshot: Hive output showing integers in all columns]

Can anyone see where I am going wrong? Of the original 250,000 rows, there are some blanks in particular fields such as owneruserid. Do I need to tell Pig or Hive how to handle these?
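For reference, a minimal sketch of how the blank fields could be inspected and handled on the Hive side. It assumes the table uses Hive's default delimited text SerDe, and that reading empty fields as NULL is the desired behaviour. Whether the blanks are related to the integer output described above is not established, so this is something to try rather than a confirmed fix:

```sql
-- Count the rows where owneruserid is blank or NULL,
-- to see how many records are affected.
select count(*)
from pigoutputhive
where owneruserid is null or owneruserid = '';

-- Assumption: empty fields should be treated as NULL.
-- This asks the table's SerDe to read an empty field as NULL
-- instead of an empty string.
alter table pigoutputhive
set serdeproperties ('serialization.null.format' = '');
```

Because the table is external, this only changes how Hive interprets the files under '/user/admin/pigoutputetl'; the Pig output itself is left untouched.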

