I am trying to implement PCA in Python. My goal is to create a version that behaves like MATLAB's PCA implementation. However, I think I am missing a crucial point, because my tests partly produce results with the wrong sign (+/-).
Can you find a mistake in the algorithm? Why are the signs sometimes different?
An implementation of PCA based on eigenvectors:
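    # Body of the PCA function; `a` is the input data (rows = observations).
    new_array_rank = 4
    # Center each column on its mean.
    a_mean = np.mean(a, axis=0)
    a = a - a_mean
    covariance_matrix = np.cov(a.T)
    eigen_values, eigen_vectors = np.linalg.eig(covariance_matrix)
    # Sort eigenpairs by descending eigenvalue.
    new_index = np.argsort(eigen_values)[::-1]
    eigen_vectors = eigen_vectors[:, new_index]
    eigen_values = eigen_values[new_index]
    # Keep the leading components and project the centered data onto them.
    eigen_vectors = eigen_vectors[:, :new_array_rank]
    return np.dot(eigen_vectors.T, a.T).T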
My test values:
array([[ 0.13298325, 0.2896928 , 0.53589224, 0.58164269, 0.66202221, 0.95414116, 0.03040784, 0.26290471, 0.40823539, 0.37783385], [ 0.90521267, 0.86275498, 0.52696221, 0.15243867, 0.20894357, 0.19900414, 0.50607341, 0.53995902, 0.32014539, 0.98744942], [ 0.87689087, 0.04307512, 0.45065793, 0.29415066, 0.04908066, 0.98635538, 0.52091338, 0.76291385, 0.97213094, 0.48815925], [ 0.75136801, 0.85946751, 0.10508436, 0.04656418, 0.08164919, 0.88129981, 0.39666754, 0.86325704, 0.56718669, 0.76346602], [ 0.93319721, 0.5897521 , 0.75065047, 0.63916306, 0.78810679, 0.92909485, 0.23751963, 0.87552313, 0.37663086, 0.69010429], [ 0.53189229, 0.68984247, 0.46164066, 0.29953259, 0.10826334, 0.47944168, 0.93935082, 0.40331874, 0.18541041, 0.35594587], [ 0.36399075, 0.00698617, 0.61030608, 0.51136309, 0.54185601, 0.81383604, 0.50003674, 0.75414875, 0.54689801, 0.9957493 ], [ 0.27815017, 0.65417397, 0.57207255, 0.54388744, 0.89128334, 0.3512483 , 0.94441934, 0.05305929, 0.77389942, 0.93125228], [ 0.80409485, 0.2749575 , 0.22270875, 0.91869706, 0.54683128, 0.61501493, 0.7830902 , 0.72055598, 0.09363186, 0.05103846], [ 0.12357816, 0.29758902, 0.87807485, 0.94348706, 0.60896429, 0.33899019, 0.36310027, 0.02380186, 0.67207071, 0.28638936]])
My result of the PCA using eigenvectors:
array([[ 5.09548931e-01, -3.97079651e-01, -1.47555867e-01, -3.55343967e-02, -4.92125732e-01, -1.78191399e-01, -3.29543974e-02, 3.71406504e-03, 1.06404170e-01, -1.66533454e-16], [ -5.15879041e-01, 6.40833419e-01, -7.54601587e-02, -2.00776798e-01, -7.07247669e-02, 2.68582368e-01, -1.66124362e-01, 1.03414828e-01, 7.76738500e-02, 5.55111512e-17], [ -4.42659342e-01, -5.13297786e-01, -1.65477203e-01, 5.33670847e-01, 2.00194213e-01, 2.06176265e-01, 1.31558875e-01, -2.81699724e-02, 6.19571305e-02, -8.32667268e-17], [ -8.50397468e-01, 5.14319846e-02, -1.46289906e-01, 6.51133920e-02, -2.83887201e-01, -1.90516618e-01, 1.45748370e-01, 9.49464768e-02, -1.05989648e-01, 4.16333634e-17], [ -1.61040296e-01, -3.47929944e-01, -1.19871598e-01, -6.48965493e-01, 7.53188055e-02, 1.31730340e-01, 1.33229858e-01, -1.43587499e-01, -2.20913989e-02, -3.40005801e-16], [ -1.70017435e-01, 4.22573148e-01, 4.81511942e-01, 2.42170125e-01, -1.18575764e-01, -6.87250591e-02, -1.20660307e-01, -2.22865482e-01, -1.73666882e-02, -1.52655666e-16], [ 6.90841779e-02, -2.86233901e-01, -4.16612350e-01, 9.38935057e-03, 3.02325120e-01, -1.61783482e-01, -3.55465509e-01, 1.15323059e-02, -5.04619674e-02, 4.71844785e-16], [ 5.26189089e-01, 6.81324113e-01, -2.89960115e-01, 2.01781673e-02, 3.03159463e-01, -2.11777986e-01, 2.25937548e-01, -5.49219872e-05, 3.66268329e-02, -1.11022302e-16], [ 6.68680313e-02, -2.99715813e-01, 8.53428694e-01, -1.30066853e-01, 2.31410283e-01, -1.02860624e-01, 1.95449586e-02, 1.30218425e-01, 1.68059569e-02, 2.22044605e-16], [ 9.68303353e-01, 4.80944309e-02, 2.62865615e-02, 1.44821658e-01, -1.47094421e-01, 3.07366196e-01, 1.91849667e-02, 5.08517759e-02, -1.03558238e-01, 1.38777878e-16]])
Test result for the same data using MATLAB's pca function:
array([[ -5.09548931e-01, 3.97079651e-01, 1.47555867e-01, 3.55343967e-02, -4.92125732e-01, -1.78191399e-01, -3.29543974e-02, -3.71406504e-03, -1.06404170e-01, -0.00000000e+00], [ 5.15879041e-01, -6.40833419e-01, 7.54601587e-02, 2.00776798e-01, -7.07247669e-02, 2.68582368e-01, -1.66124362e-01, -1.03414828e-01, -7.76738500e-02, -0.00000000e+00], [ 4.42659342e-01, 5.13297786e-01, 1.65477203e-01, -5.33670847e-01, 2.00194213e-01, 2.06176265e-01, 1.31558875e-01, 2.81699724e-02, -6.19571305e-02, -0.00000000e+00], [ 8.50397468e-01, -5.14319846e-02, 1.46289906e-01, -6.51133920e-02, -2.83887201e-01, -1.90516618e-01, 1.45748370e-01, -9.49464768e-02, 1.05989648e-01, -0.00000000e+00], [ 1.61040296e-01, 3.47929944e-01, 1.19871598e-01, 6.48965493e-01, 7.53188055e-02, 1.31730340e-01, 1.33229858e-01, 1.43587499e-01, 2.20913989e-02, -0.00000000e+00], [ 1.70017435e-01, -4.22573148e-01, -4.81511942e-01, -2.42170125e-01, -1.18575764e-01, -6.87250591e-02, -1.20660307e-01, 2.22865482e-01, 1.73666882e-02, -0.00000000e+00], [ -6.90841779e-02, 2.86233901e-01, 4.16612350e-01, -9.38935057e-03, 3.02325120e-01, -1.61783482e-01, -3.55465509e-01, -1.15323059e-02, 5.04619674e-02, -0.00000000e+00], [ -5.26189089e-01, -6.81324113e-01, 2.89960115e-01, -2.01781673e-02, 3.03159463e-01, -2.11777986e-01, 2.25937548e-01, 5.49219872e-05, -3.66268329e-02, -0.00000000e+00], [ -6.68680313e-02, 2.99715813e-01, -8.53428694e-01, 1.30066853e-01, 2.31410283e-01, -1.02860624e-01, 1.95449586e-02, -1.30218425e-01, -1.68059569e-02, -0.00000000e+00], [ -9.68303353e-01, -4.80944309e-02, -2.62865615e-02, -1.44821658e-01, -1.47094421e-01, 3.07366196e-01, 1.91849667e-02, -5.08517759e-02, 1.03558238e-01, -0.00000000e+00]])
The sign and other normalization choices for eigenvectors are arbitrary. MATLAB and NumPy normalize the eigenvectors in the same way (to unit length), but the sign is arbitrary and can depend on details of the linear algebra library that is used.
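To make the sign ambiguity concrete, here is a minimal sketch. The 2x2 matrix is made up for illustration and is not taken from the question, and it uses `np.linalg.eigh` (the routine for symmetric matrices) rather than the `np.linalg.eig` call in the question. It checks that if `v` is a unit-length eigenvector, then `-v` is an equally valid one:

```python
import numpy as np

# Small symmetric matrix standing in for a covariance matrix
# (illustrative values, not from the question).
cov = np.array([[2.0, 1.0],
                [1.0, 3.0]])

eigen_values, eigen_vectors = np.linalg.eigh(cov)

v = eigen_vectors[:, 0]   # first eigenvector (a column of the result)
lam = eigen_values[0]     # its eigenvalue

# Both v and -v satisfy cov @ x == lam * x ...
plus_ok = np.allclose(cov @ v, lam * v)
minus_ok = np.allclose(cov @ (-v), lam * (-v))
# ... and both have the same (unit) length.
same_norm = bool(np.isclose(np.linalg.norm(v), np.linalg.norm(-v)))

print(plus_ok, minus_ok, same_norm)
```

Since nothing in the eigenvalue problem prefers one of the two, a library is free to return either, and the data projected onto that vector changes sign along with it.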
When I wrote the NumPy equivalent of MATLAB's princomp, I just normalized the sign of the eigenvectors when I compared them to those of MATLAB in my unit tests.
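One way such a sign normalization could look is sketched below. The answer does not say which convention was used, so the rule here (flip each column so that its largest-magnitude entry is positive) and the helper name `normalize_signs` are assumptions for illustration only:

```python
import numpy as np

def normalize_signs(vectors):
    """Flip each column so its largest-magnitude entry is positive.

    Hypothetical helper; the convention is an assumption, not MATLAB's rule.
    """
    vectors = np.asarray(vectors, dtype=float)
    # Row index of the largest-magnitude entry in each column.
    pivot_rows = np.argmax(np.abs(vectors), axis=0)
    pivots = vectors[pivot_rows, np.arange(vectors.shape[1])]
    # -1 for columns whose pivot is negative, +1 otherwise.
    signs = np.where(pivots < 0, -1.0, 1.0)
    return vectors * signs

# Two results that differ only in the sign of the first column.
mine = np.array([[0.6, -0.8],
                 [0.8,  0.6]])
theirs = mine * np.array([-1.0, 1.0])

signs_agree = np.allclose(normalize_signs(mine), normalize_signs(theirs))
print(signs_agree)
```

Applying the same helper to both outputs before comparing them (for example with `np.allclose`) makes a unit test insensitive to per-column sign flips. The same idea works for the columns of the projected data.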