P(C=c | X1=x1 and ... and Xp=xp)
= P(X1=x1 and ... and Xp=xp | C=c) P(C=c) / P(X1=x1 and ... and
Xp=xp).
P(X1=x1 and ... and
Xp=xp) = SUM_c P(X1=x1 and ... and Xp=xp | C=c) P(C=c).
With just two labels 0 and 1 we have
P(X1=x1 and ... and
Xp=xp) = P(X1=x1 and ... and Xp=xp | C=0) P(C=0) + P(X1=x1 and ... and
Xp=xp | C=1) P(C=1).
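As a small numeric check of the two-label case, the sketch below
computes the denominator as a sum over classes and then applies Bayes'
rule. The function name posterior and all the numbers are made up for
illustration; they do not come from any dataset.

```python
def posterior(likelihoods, priors):
    """Return P(C=c | x) for every class c using Bayes' rule.

    likelihoods[c] = P(X1=x1 and ... and Xp=xp | C=c)
    priors[c]      = P(C=c)
    """
    # Denominator: sum over classes of likelihood times prior.
    evidence = sum(likelihoods[c] * priors[c] for c in priors)
    return {c: likelihoods[c] * priors[c] / evidence for c in priors}


# Hypothetical values for the two labels 0 and 1.
priors = {0: 0.7, 1: 0.3}
likelihoods = {0: 0.02, 1: 0.10}

post = posterior(likelihoods, priors)
print(post)
```

The two posterior probabilities sum to 1, because the denominator is
exactly the sum of the two numerators.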
All the equations above are always true mathematically. We have
not made any assumptions yet. Now assume that the features are
conditionally independent given the class:
P(X1=x1 and ... and Xp=xp | C=c) =
P(X1=x1 | C=c) ... P(Xp=xp | C=c).
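For an example with feature values y1, ..., yp, compute for each
class c the numerator of Bayes' rule, using the factorization above:
N(c) = P(C=c) PRODUCT_j P(Xj=yj | C=c).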
Each number in the product comes from one row of the matrix that we
stored for C = c. Finally, we predict the class c for which N(c)
is highest.
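The prediction rule can be sketched in a few lines of Python. This is
only an illustration under assumptions: the stored matrix for each
class is taken to be a nested list cond[c][j][v] holding P(Xj=v | C=c),
so that row j belongs to feature j. That layout, the function name
predict, and all numbers are hypothetical.

```python
from math import prod


def predict(y, priors, cond):
    """Return (best_class, scores), where scores[c] = N(c).

    y      : feature values y1..yp of one example (0-based value codes)
    priors : priors[c] = P(C=c)
    cond   : cond[c][j][v] = P(Xj=v | C=c), an assumed table layout
    """
    scores = {
        # N(c) = P(C=c) times one entry from each row j of the table for c.
        c: priors[c] * prod(cond[c][j][v] for j, v in enumerate(y))
        for c in priors
    }
    # Predict the class with the highest N(c).
    return max(scores, key=scores.get), scores


# Hypothetical model: two classes, two binary features.
priors = {0: 0.6, 1: 0.4}
cond = {
    0: [[0.8, 0.2], [0.5, 0.5]],
    1: [[0.3, 0.7], [0.1, 0.9]],
}

best, scores = predict([1, 1], priors, cond)
print(best, scores)
```

With these numbers N(0) = 0.6 * 0.2 * 0.5 = 0.06 and
N(1) = 0.4 * 0.7 * 0.9 = 0.252, so class 1 is predicted. With many
features the product can underflow, so a practical version would
probably add logarithms instead of multiplying probabilities.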