Why might performing dimensionality reduction using PCA be bad for a classification task?
PCA chooses the directions of greatest variance in the data, but the direction that separates your classes may be one of low variance. If the discriminative information lies along a low-variance component and you reduce the dimension to 1, i.e. keep only the first principal component, you throw away exactly the direction that solves your classification problem. The problem occurs because PCA is agnostic to the labels Y. Unfortunately, one cannot include Y in the PCA either, as this would leak label information into the features (data leakage).
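Below is a minimal sketch of this failure mode using scikit-learn. The two-feature synthetic dataset and its variances are illustrative assumptions, not data from the original discussion: the class signal lives entirely in a low-variance feature, so projecting onto the first principal component destroys it.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, size=n)
x0 = rng.normal(0.0, 10.0, size=n)       # high variance, carries no class signal
x1 = rng.normal(0.0, 0.5, size=n) + y    # low variance, carries ALL the class signal
X = np.column_stack([x0, x1])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Baseline: logistic regression on the raw two features.
clf = LogisticRegression().fit(X_tr, y_tr)
print("accuracy on raw 2-D data:", clf.score(X_te, y_te))

# Reduce to the first principal component: PCA keeps x0 (largest
# variance) and throws away x1, the feature that separates the classes.
pca = PCA(n_components=1).fit(X_tr)
clf_pca = LogisticRegression().fit(pca.transform(X_tr), y_tr)
print("accuracy on 1 PC:", clf_pca.score(pca.transform(X_te), y_te))
```

The first score should sit well above chance, while the score after the 1-component projection collapses to roughly 50%.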
Does PCA reduce variance?
PCA itself is designed to maximize the variance captured by the first components and minimize the variance left in the last components, compared to all other orthogonal transformations. The full transformation is just a rotation, so it does not reduce the total variance of the data; variance is only lost when you discard the trailing components.
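A quick numerical check of this, as a sketch with NumPy and scikit-learn (the random correlated data is an illustrative assumption):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))  # correlated features

# Keeping all components is just a rotation: total variance is unchanged.
Z = PCA(n_components=5).fit_transform(X)
print(X.var(axis=0).sum())   # total variance of the original features
print(Z.var(axis=0).sum())   # same total, redistributed across components

# Variance only drops once trailing components are discarded.
Z2 = PCA(n_components=2).fit_transform(X)
print(Z2.var(axis=0).sum())  # smaller: the discarded variance is gone
```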
Does PCA remove redundant features?
PCA does not eliminate redundant features; it creates a new set of features, each of which is a linear combination of the input features. Redundancy between inputs simply means that most of the variance gets concentrated in the leading components, which is what makes it safe to drop the trailing ones afterwards.
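A small sketch of this behaviour (the near-duplicate feature is an illustrative assumption):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
a = rng.normal(size=200)
# Second column is (almost) a duplicate of the first, i.e. redundant.
X = np.column_stack([a, 2 * a + 0.01 * rng.normal(size=200)])

pca = PCA().fit(X)
print(pca.components_.shape)          # (2, 2): no input feature was deleted
print(pca.explained_variance_ratio_)  # ~[1.0, 0.0]: redundancy is concentrated
```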
Why is feature Reduction important?
The purpose of feature reduction is to reduce the number of features (or variables) that a model must process to perform its function. With fewer dimensions, the data is less sparse relative to the volume of the feature space, so estimates drawn from it are more statistically reliable; this mitigates the curse of dimensionality in machine learning applications.
What is feature selection and dimensionality reduction?
Feature selection simply keeps or excludes given features without changing them. Dimensionality reduction, by contrast, transforms the features into a lower-dimensional space. The sketch below contrasts the two.
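A minimal side-by-side sketch using scikit-learn on the iris dataset (the choice of SelectKBest with f_classif and of iris is an illustrative assumption):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Feature selection: keep 2 of the 4 original columns, values unchanged.
X_sel = SelectKBest(f_classif, k=2).fit_transform(X, y)

# Dimensionality reduction: 2 new columns, each a mix of all 4 originals.
X_pca = PCA(n_components=2).fit_transform(X)

print(X_sel[0])  # entries taken verbatim from the original features
print(X_pca[0])  # entries that match no single original feature
```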
Why does PCA reduce accuracy?
PCA can discard information that matters for discriminating between classes, for example spatial structure in images, or low-variance directions that nonetheless separate the classes, so classification accuracy can decrease.
What is meant by redundant features?
Redundant features are features that duplicate information already present in other features, for example a measurement repeated in different units, or a column that is a (near-)linear combination of other columns. They add no new information for the model, although in engineering contexts redundancy is sometimes added deliberately as a precaution against failure or error.
Does PCA really improve the result of classification task?
Let’s see if PCA really improves the result of a classification task. To verify it, the strategy is: first train a neural network on a dataset and record its initial results; then perform PCA before classification and train the same neural network on the transformed dataset; and finally compare both results.
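A minimal sketch of such an experiment with scikit-learn; the digits dataset, the MLP settings, and the 95%-of-variance cutoff are illustrative assumptions, not the original author's setup:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Baseline: the network on the raw features.
base = make_pipeline(StandardScaler(),
                     MLPClassifier(max_iter=500, random_state=0))
base.fit(X_tr, y_tr)
print("without PCA:", base.score(X_te, y_te))

# Same network, with PCA (keeping 95% of the variance) applied first.
with_pca = make_pipeline(StandardScaler(),
                         PCA(n_components=0.95),
                         MLPClassifier(max_iter=500, random_state=0))
with_pca.fit(X_tr, y_tr)
print("with PCA:   ", with_pca.score(X_te, y_te))
```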
Is it possible to reduce the number of features in classification?
This topic is definitely one of the most interesting ones: there are algorithms able to reduce the number of features by choosing the most important ones, in a way that still represents the entire dataset well. One of the advantages pointed out by authors is that these algorithms can improve the results of the classification task, since removing noisy or irrelevant features simplifies the learning problem.
Does PCA Select some features and discard others?
Often, people make the mistake of thinking that PCA selects some features out of the dataset and discards others. In fact, the algorithm constructs a new set of features based on combinations of the old ones: every original feature contributes, with some weight, to every principal component.
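This is easy to inspect directly; a short sketch on the iris dataset (an illustrative choice):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2).fit(X)

# One row per principal component; each entry is the weight of an
# ORIGINAL feature in that component. Nothing is kept or dropped
# wholesale: all four features contribute to both components.
print(pca.components_)
```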
Is height a good property to separate features in PCA?
Clearly, the height of these vehicles is a good property for separating them. Recall, though, that PCA does not take class information into account; it only looks at the variance of each feature, on the reasonable assumption that features with high variance are more likely to yield a good split between classes. That assumption happens to hold in this example, but as discussed above, it can fail.