Linear discriminant analysis (LDA) is a popular statistical technique for analyzing data and distinguishing groups or classes of objects by their characteristics. LDA is a supervised learning algorithm: it learns from labeled training data how to classify new data points into one of two or more categories. It is a linear classifier, since it uses linear combinations of features for classification, which makes it well suited to high-dimensional data. The conceptual framework underlying LDA is based on Bayes' theorem and involves three steps: 1) determining the prior probabilities; 2) computing the likelihoods; and 3) calculating the posterior probabilities.
First, when using LDA we need to determine the prior probability of each class label in our dataset. The prior probability is estimated by counting how many observations in the training dataset carry each class label and dividing those counts by the total number of observations. This gives an estimate of what percentage of observations belong to each class, given the training set.
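This counting step can be sketched in a few lines of NumPy (the labels here are hypothetical, just to illustrate the calculation):

```python
import numpy as np

# hypothetical class labels from a small training set
y = np.array(["spam", "ham", "ham", "spam", "ham", "ham"])

# count occurrences of each class and divide by the dataset size
classes, counts = np.unique(y, return_counts=True)
priors = counts / counts.sum()
print(dict(zip(classes, priors)))  # ham ≈ 0.67, spam ≈ 0.33
```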
Second, after computing the prior probabilities, we compute a likelihood for each observation's feature vector under each class in the training dataset. This means measuring how well the explanatory variables predict whether an observation belongs to a particular group, given its values along those variables. LDA assumes the features within each class follow a multivariate normal distribution with a covariance matrix shared across classes, so an observation's likelihood under a class depends on its distance from that class's mean, weighted by the inverse of the shared covariance matrix. The matrix multiplication involving the inverse covariance accounts both for the predictive power of the individual variables and for their correlations with one another, producing a likelihood score for each observation and each class.
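A minimal sketch of this step, assuming two Gaussian classes with a shared covariance matrix (the data and class centers below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical 2-feature training data: class 0 centered at 0, class 1 at 2
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.repeat([0, 1], 50)

# per-class means and the pooled (shared) covariance -- the core LDA assumption
means = np.array([X[y == k].mean(axis=0) for k in (0, 1)])
pooled = sum((X[y == k] - means[k]).T @ (X[y == k] - means[k])
             for k in (0, 1)) / (len(X) - 2)
cov_inv = np.linalg.inv(pooled)

def log_likelihood(x, k):
    # Gaussian log-likelihood up to a class-independent constant:
    # distance from the class mean, weighted by the inverse covariance
    d = x - means[k]
    return -0.5 * d @ cov_inv @ d
```

A point near the origin should score higher under class 0 than under class 1, since class 0's mean is closer.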
Finally, once we have the likelihood scores for each observation under each class, we combine them with the prior probabilities from Step 1 using Bayes' theorem. This yields the posterior probability that an observation belongs to each class: a numerical estimate of its chances of membership in each group, given the values taken jointly across the explanatory variables. The observation is then assigned to the class with the highest posterior probability.
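The three steps can be put together in one small end-to-end sketch (again with invented Gaussian data; the posterior is normalized in log space for numerical stability):

```python
import numpy as np

rng = np.random.default_rng(1)
# hypothetical training data: two Gaussian classes in two dimensions
X = np.vstack([rng.normal(0, 1, (60, 2)), rng.normal(3, 1, (40, 2))])
y = np.repeat([0, 1], [60, 40])

priors = np.bincount(y) / len(y)                       # Step 1: priors
means = np.array([X[y == k].mean(axis=0) for k in (0, 1)])
pooled = sum((X[y == k] - means[k]).T @ (X[y == k] - means[k])
             for k in (0, 1)) / (len(X) - 2)
cov_inv = np.linalg.inv(pooled)

def posterior(x):
    # Step 2: Gaussian log-likelihoods under the shared covariance
    log_lik = np.array([-0.5 * (x - m) @ cov_inv @ (x - m) for m in means])
    # Step 3: Bayes' theorem -- combine with log-priors, then normalize
    unnorm = np.log(priors) + log_lik
    p = np.exp(unnorm - unnorm.max())
    return p / p.sum()

print(posterior(np.array([0.0, 0.0])))  # should strongly favor class 0
```

Assigning each point to the class with the larger posterior entry then gives the LDA classification rule.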