r - How to create tidy data with a Dataset where values are duplicated over many rows -
sorry if large of example. looks more real life have hard time thinking of example better explain situation.
what want tidy data.frame can use medical conditions in summaries (avg) , in plots (edited) what need answered trying accomplish correctly. want row huge string values divided commas? need split more columns?
reports our data base vendor (actual data changed). the reports not give unique key. in data.frames person.id unique in , others multi rows of person.id , values.
person.id <- c("1017", "1018", "1018", "1018", "1018", "1018", "1018", "1018", "1018", "1018", "1018", "1019", "1019", "1020", "1020") med.condition <- c(na, "allergic rhinitis", "allergic rhinitis", "atopic dermatitis", "atopic dermatitis", "developmental speech", "developmental speech", "eye condition", "eye condition", "speech delay", "speech delay", "allergic reaction", na, "eczema", "obese") cond.type <- c("assessment", "assessment", na, "assessment", na, "assessment", na, "assessment", na, "assessment", na, "assessment", "assessment", "assessment", "assessment") df <- data.frame(person.id, med.condition, cond.type)
which looks like:
person.id med.condition cond.type 1 1017 na assessment 2 1018 allergic rhinitis assessment 3 1018 allergic rhinitis na 4 1018 atopic dermatitis assessment 5 1018 atopic dermatitis na 6 1018 developmental speech assessment 7 1018 developmental speech na 8 1018 eye condition assessment 9 1018 eye condition na 10 1018 speech delay assessment 11 1018 speech delay na 12 1019 allergic reaction assessment 13 1019 na assessment 14 1020 eczema assessment 15 1020 obese assessment
i want rows equal 1 person.id
do want have (only first 5 columns shown): used taplly fails @ tidy
condition1 condition2 condition3 condition4 condition5 1017 na na na na na 1018 allergic rhinitis atopic dermatitis allergic reaction developmental speech eye condition 1019 na na na na na 1020 eczema obese na na na
what do make dataset tidy?
med.condtion 1017 na 1018 "allergic rhinitis", "atopic dermatitis", "developmental speech", "eye condition", "speech delay", "allergic reaction" 1019 na 1020 "eczema" "obese"
or need think of in new way?
what tired tapply, reshape2
taplly doesn't work on example in program sorry
df2 <- data.frame(person.id, med.condition, cond.type) df2.wide <- tapply(x = df2$medical.condition, index = df2$person.id, function(x){ ux <- unique(x) c(ux, rep(x = na, 9 - length (ux))) }) df2.wide <- as.data.frame(do.call('rbind', df2.wide), stringsasfactors = false) names(promis.b.temp) <- paste0('condition', 1:9)
cols <- names(promis.b.temp) df2$med.all <- apply( df2[, cols], 1, paste, collapse = ", ")
reshape2 realized wouldn't work library(reshape2) df3 <- test %>% melt() %>% unique() %>% cast(person.id)
- am tackling issue correctly?
- will have problems when reports having filters strings?
i don't understand question. data appear "tidy".
the 2 things notice (1) duplicated values (which may or may not wanted) , (2) lack of unique id per person , medical condition.
if wanted long string separated commas (hard deal later, in opinion), can aggregate unique values in first 2 columns, this:
library(data.table) as.data.table(unique(df[1:2]))[, paste(med.condition, collapse = ","), = person.id] # person.id v1 # 1: 1017 na # 2: 1018 allergic rhinitis,atopic dermatitis,developmental speech,eye condition,speech delay # 3: 1019 allergic reaction,na # 4: 1020 eczema,obese
if wanted sequential id each person, can use getanid
"splitstackshape" package:
library(splitstackshape) getanid(as.data.table(unique(df[1:2]))
this let convert wide form, if desired, using dcast.data.table
, this:
library(splitstackshape) dcast.data.table(getanid(as.data.table(unique(df[1:2])), "person.id"), person.id ~ .id, value.var = "med.condition", fun.aggregate = function(x) paste(x, collapse = ",")) # person.id 1 2 3 4 5 # 1: 1017 na # 2: 1018 allergic rhinitis atopic dermatitis developmental speech eye condition speech delay # 3: 1019 allergic reaction na # 4: 1020 eczema obese
Comments
Post a Comment