# Theoretical part

## Feature dimensionality reduction

Feature dimensionality reduction is an application of unsupervised learning: it reduces n-dimensional data to m-dimensional data (n > m). It can be applied to data compression and other fields.
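As a minimal illustration of the idea (the data and projection matrix here are made up for demonstration), reducing 3-dimensional points to 2 dimensions can be done with a simple linear projection:

```python
import numpy as np

# Three hypothetical 3-D points (n = 3), shape (3, 3)
X = np.array([[1.0, 2.0, 0.1],
              [3.0, 4.0, 0.2],
              [5.0, 6.0, 0.3]])

# Projection matrix onto the first two coordinate axes, shape (3, 2)
P = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

X_reduced = X @ P        # 2-D representation (m = 2)
print(X_reduced.shape)   # (3, 2)
```

PCA differs from this naive projection only in how it chooses the projection directions: it picks the ones that preserve the most variance.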

## Principal component analysis (PCA)

Principal component analysis is a commonly used feature dimensionality reduction method. For m-dimensional data A, dimensionality reduction yields n-dimensional data B (m > n) that satisfies $B = f(A)$ and $A \approx g(f(A))$, where f is the encoding function and g is the decoding function.

When performing principal component analysis, the optimization goal is $c^* = \arg\min_c ||x - g(c)||_2$, where c is the code and g is the decoding function.
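The encode/decode view above can be sketched directly with NumPy (synthetic data, for illustration only): the decoding matrix D holds the top principal directions, encoding is $f(x) = D^\top x$, and decoding is $g(c) = Dc$.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X = X - X.mean(axis=0)          # PCA assumes centered data

# Principal directions from the SVD of the centered data matrix
_, _, Vt = np.linalg.svd(X, full_matrices=False)
D = Vt[:2].T                    # decoding matrix, shape (5, 2), orthonormal columns

C = X @ D                       # encoding: c = f(x) = D^T x
X_hat = C @ D.T                 # decoding: g(c) = D c
err = np.linalg.norm(X - X_hat) # reconstruction error ||x - g(f(x))||_2
```

Among all 2-dimensional linear codes, this choice of D minimizes the reconstruction error, which is exactly the optimization goal stated above.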

# Code

## Import data set

```python
import numpy as np
import pandas as pd

digits_train = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/optdigits/optdigits.tra', header=None)
digits_test = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/optdigits/optdigits.tes', header=None)
```

## Split data and labels

```python
# The first 64 columns are pixel features; column 64 is the digit label
train_x, train_y = digits_train[np.arange(64)], digits_train[64]
test_x, test_y = digits_test[np.arange(64)], digits_test[64]
```

## Principal component analysis

```python
from sklearn.decomposition import PCA

estimator = PCA(n_components=20)
pca_train_x = estimator.fit_transform(train_x)
pca_test_x = estimator.transform(test_x)
```
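After fitting, `explained_variance_ratio_` reports how much of the original variance each component retains, which helps judge whether `n_components=20` is enough. A self-contained sketch with synthetic stand-in data (the real run would use `train_x` above):

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the 64-dimensional digit features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))

estimator = PCA(n_components=20)
X_pca = estimator.fit_transform(X)

print(X_pca.shape)                                # (200, 20)
print(estimator.explained_variance_ratio_.sum())  # fraction of variance kept
```

On real digit data the retained variance fraction is typically much higher than on this random noise, since pixel features are strongly correlated.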

## Training support vector machine

`from sklearn.svm import LinearSVC`

### Raw data

```python
svc = LinearSVC()
svc.fit(X=train_x, y=train_y)
svc.score(test_x, test_y)
```
`0.9393433500278241`

### PCA processed data

```python
svc_pca = LinearSVC()
svc_pca.fit(pca_train_x, train_y)
svc_pca.score(pca_test_x, test_y)
```
`0.91819699499165275`

Reference: [sklearn-based principal component analysis: theory and partial code implementation](https://cloud.tencent.com/developer/article/1110770), Tencent Cloud Developer Community.