algorithm - Naive Bayesian for rating


Suppose I have a training set with the following data:

type     |  size  |  price      |  rating  |  suggestion
---------------------------------------------------------
shirt       m        budget        0          bad
trouser     l        budget        4.2
shirt       m        expensive     2.3
....etc....

Here I am taking suggestion as the class that I need to predict when an input sample is provided. That is, when an input sample (one that is not in the training dataset) is given, I need to figure out whether it is good or bad.

I am able to understand the probability calculation based on an example I found on the internet:

Dataset: http://i.imgur.com/c0ptard.png

Calculation for an input sample: http://i.imgur.com/kggedlj.png

My doubt is that in my dataset I have a column called rating. For this column, should I do the same probability calculation as for the other columns (like in the screenshot above)? Or do I need to treat this particular column's values in some other way, for example using the mean and standard deviation?

Thank you.

The columns "size" and "price" represent categorical data (well, actually, ordinal, but that's beside the point). While you can model "rating" as a categorical value too, that may be a bad idea, and it'd be better to model this data as numerical. Here's why.

The difference between treating data as categorical and as numerical is in how different values relate to each other. Suppose you have 3 observations of x: x=12, x=13, x=1344. The question then is: how much can the probabilities P(x=12), P(x=1344) and P(x=13) differ? The answer heavily depends on what kind of data these values represent.

For example, if x denotes a user id, or anything else where the ordering is irrelevant, these probabilities can differ arbitrarily. But if x denotes, say, a pay rate, there's not much difference between 12 and 13 compared to the third value.

Treating the data as numerical helps the model infer more knowledge from it. For example, there might be no values of 4.9 in the dataset, but lots of 4.8 and 5.0. The model "interpolates" between these two, giving some probability to 4.9 even though it wasn't present in the dataset.
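To make the 4.9 example concrete, here is a minimal sketch that compares the two treatments. The `ratings` list is made up for illustration, and `categorical_prob` and `gaussian_density` are hypothetical helper names, not part of any library.

```python
from collections import Counter
from statistics import NormalDist

# Hypothetical ratings: lots of 4.8 and 5.0, but no 4.9 at all.
ratings = [4.8, 4.8, 5.0, 5.0, 4.8, 5.0]

# Categorical treatment: probability is the relative frequency of the exact value.
counts = Counter(ratings)

def categorical_prob(value):
    return counts[value] / len(ratings)

# Numerical treatment: fit a Gaussian (sample mean and standard deviation)
# and read off the density at the value.
gaussian = NormalDist.from_samples(ratings)

def gaussian_density(value):
    return gaussian.pdf(value)

print("categorical P(4.9):", categorical_prob(4.9))
print("gaussian density at 4.9:", gaussian_density(4.9))
```

The frequency estimate for the unseen value 4.9 is exactly zero, which would wipe out the whole Naive Bayes product, while the Gaussian density at 4.9 is positive because 4.9 lies between the observed values.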

So, yes, you should use a numerical distribution (Gaussian, for example) for the rating data. I'd also suggest a cleanup: apparently, 0 stands for "not rated" rather than "extremely bad", so you may want to tell the model about that (for example, by replacing 0s with the average rating).
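As a rough sketch of how this could fit together: frequency counts for type, size and price, a per-class Gaussian for rating, and 0 ratings replaced by the average of the rated rows. The training rows and their labels below are invented for illustration (they are not your dataset), the function names `clean_ratings`, `train` and `predict` are hypothetical, and no smoothing is applied to the counts.

```python
from collections import Counter, defaultdict
from statistics import NormalDist, mean

# Invented training rows: (type, size, price, rating, suggestion).
rows = [
    ("shirt", "m", "budget", 0.0, "bad"),      # 0 is read as "not rated"
    ("shirt", "m", "expensive", 2.3, "bad"),
    ("trouser", "l", "budget", 1.9, "bad"),
    ("trouser", "l", "budget", 4.2, "good"),
    ("shirt", "m", "expensive", 4.6, "good"),
    ("trouser", "l", "expensive", 3.9, "good"),
]

def clean_ratings(rows):
    """Replace 0 ("not rated") with the average of the rated rows."""
    avg = mean(r[3] for r in rows if r[3] != 0)
    return [(t, s, p, avg if r == 0 else r, label) for t, s, p, r, label in rows]

def train(rows):
    """Collect per-class priors, value counts, and a Gaussian for rating."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[-1]].append(row)
    model = {}
    for label, members in by_class.items():
        model[label] = {
            "prior": len(members) / len(rows),
            "n": len(members),
            # One counter per categorical column: type, size, price.
            "counts": [Counter(m[i] for m in members) for i in range(3)],
            # Gaussian fitted to this class's ratings (needs at least 2 rows
            # with differing ratings per class).
            "rating": NormalDist.from_samples([m[3] for m in members]),
        }
    return model

def predict(model, sample):
    """Return the class with the largest prior * likelihood product."""
    *categorical, rating = sample
    scores = {}
    for label, params in model.items():
        score = params["prior"]
        for counter, value in zip(params["counts"], categorical):
            score *= counter[value] / params["n"]   # plain frequency
        score *= params["rating"].pdf(rating)       # Gaussian density
        scores[label] = score
    return max(scores, key=scores.get)

model = train(clean_ratings(rows))
print(predict(model, ("shirt", "m", "budget", 4.5)))
print(predict(model, ("shirt", "m", "budget", 2.0)))
```

The two sample calls use the same type, size and price and differ only in rating, so any difference in the predicted class comes from the Gaussian term. On a real dataset you would probably also want to smooth the categorical counts so that a value unseen in one class does not zero out its score.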

