pytranskit.optrans.decomposition package

CCA

class pytranskit.optrans.decomposition.cca.CCA(n_components=1, scale=True, max_iter=500, tol=1e-06, copy=True)

Bases: object

Canonical Correlation Analysis.

This is a wrapper for scikit-learn’s CCA class, which allows it to be used in a similar manner to PLDA and PCA.

Parameters:

n_components (int (default=1)) – Number of components to keep.
scale (bool (default=True)) – Whether to scale the data.
max_iter (int (default=500)) – The maximum number of iterations of the NIPALS inner loop.
tol (float (default=1e-6)) – The tolerance used in the iterative algorithm.
copy (bool (default=True)) – Whether the deflation should be done on a copy. Leave the default value as True unless you don’t care about side effects.
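
As a rough illustration of what scale=True typically means for CCA-style methods, the sketch below centers each column and divides it by its sample standard deviation. The helper name center_and_scale is hypothetical, and both the use of ddof=1 and the guard for constant columns are assumptions modeled on common practice; this page does not confirm how the wrapped estimator scales its inputs.

```python
import numpy as np

def center_and_scale(A, scale=True):
    """Center the columns of A; optionally scale them to unit sample variance."""
    A = np.asarray(A, dtype=float)
    mean = A.mean(axis=0)
    centered = A - mean
    if not scale:
        return centered, mean, np.ones(A.shape[1])
    std = centered.std(axis=0, ddof=1)
    std[std == 0.0] = 1.0  # assumption: leave constant columns unscaled
    return centered / std, mean, std

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_scaled, x_mean, x_std = center_and_scale(X)
```

After scaling, every column has zero mean and unit sample variance, so features measured in very different units contribute comparably.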

components_

X block weights vectors.

Type:: array, shape (n_components, n_features)

components_y_

Y block weights vectors.

Type:: array, shape (n_components, n_targets)

explained_variance_

The amount of variance explained by each of the selected weights for the X data.

Type:: array, shape (n_components,)

explained_variance_y_

The amount of variance explained by each of the selected weights for the Y data.

Type:: array, shape (n_components,)

mean_

Per-feature empirical mean of X, estimated from the training set.

Type:: array, shape (n_features,)

mean_y_

Per-feature empirical mean of Y, estimated from the training set.

Type:: array, shape (n_targets,)

n_components_

The number of components.

Type:: int

References

scikit-learn’s documentation on CCA: http://scikit-learn.org/stable/modules/generated/sklearn.cross_decomposition.CCA.html
Jacob A. Wegelin. A survey of Partial Least Squares (PLS) methods, with emphasis on the two-block case. Technical Report 371, Department of Statistics, University of Washington, Seattle, 2000.

fit(X, Y)

Fit model to data.

Parameters:

X (array, shape (n_samples, n_features)) – Training vectors, where n_samples is the number of samples and n_features is the number of predictors.
Y (array, shape (n_samples, n_targets)) – Target vectors, where n_samples is the number of samples and n_targets is the number of response variables.

fit_transform(X, Y)

Learn and apply the dimension reduction on the train data.

Parameters:

X (array, shape (n_samples, n_features)) – Training vectors, where n_samples is the number of samples and n_features is the number of predictors.
Y (array, shape (n_samples, n_targets)) – Target vectors, where n_samples is the number of samples and n_targets is the number of response variables.

Returns:

X_new (array, shape (n_samples, n_components)) – Transformed X data.
Y_new (array, shape (n_samples, n_components)) – Transformed Y data.

inverse_transform(X, Y=None)

Transform data back to its original space.

Note: This is not exact!

Parameters:

X (array, shape (n_samples, n_components)) – Transformed X data.
Y (array, shape (n_samples, n_components) or None (default=None)) – Transformed Y data. If Y=None, only the X data are transformed back to the original space.

Returns:

X_original (array, shape (n_samples, n_features)) – X data transformed back into original space.
Y_original (array, shape (n_samples, n_targets)) – Y data transformed back into original space. If Y=None, only X_original is returned.

score(X, Y)

Return Pearson product-moment correlation coefficients for each component.

The values of R are between -1 and 1, inclusive.

Note: This is different from sklearn.cross_decomposition.CCA.score(), which returns the coefficient of determination of the prediction.

Parameters:

X (array, shape (n_samples, n_features)) – Input X data.
Y (array, shape (n_samples, n_targets)) – Input Y data.

Returns:

score – Pearson product-moment correlation coefficients. If n_components=1, a single value is returned, else an array of correlation coefficients is returned.

Return type:

float or array, shape (n_components,)
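
The per-component correlation described above can be sketched with NumPy alone. The function name componentwise_pearson is hypothetical, and the sketch assumes the inputs are the already-transformed X and Y scores, each of shape (n_samples, n_components); it is not the wrapper's actual implementation.

```python
import numpy as np

def componentwise_pearson(X_new, Y_new):
    """Pearson r between matching columns of two (n_samples, n_components) arrays."""
    X_new = np.asarray(X_new, dtype=float)
    Y_new = np.asarray(Y_new, dtype=float)
    xc = X_new - X_new.mean(axis=0)
    yc = Y_new - Y_new.mean(axis=0)
    r = (xc * yc).sum(axis=0) / np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum(axis=0))
    # Mirror the documented return convention: a scalar for a single component
    return float(r[0]) if r.size == 1 else r

t = np.linspace(0.0, 1.0, 20)
X_new = np.column_stack([t, np.sin(6 * t)])
Y_new = np.column_stack([2 * t + 1, -np.sin(6 * t)])
r = componentwise_pearson(X_new, Y_new)
```

Here the first pair of columns is perfectly positively correlated and the second perfectly negatively correlated, which covers both ends of the documented range.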

transform(X, Y=None)

Apply the dimension reduction learned on the train data.

Parameters:

X (array, shape (n_samples, n_features)) – Input X data.
Y (array, shape (n_samples, n_targets) or None (default=None)) – Input Y data. If Y=None, then only the transformed X data are returned.

Returns:

X_new (array, shape (n_samples, n_components)) – Transformed X data.
Y_new (array, shape (n_samples, n_components)) – Transformed Y data. If Y=None, only X_new is returned.

pytranskit.optrans.decomposition.cca.CanonCorr: alias of CCA

pytranskit.optrans.decomposition.cca.check_array(array, ndim=None, dtype='numeric', force_all_finite=True, force_strictly_positive=False)

Input validation on an array, list, or similar.

Parameters:

array (object) – Input object to check/convert
ndim (int or None (default=None)) – Number of dimensions that array should have. If None, the dimensions are not checked
dtype (string, type, list of types or None (default='numeric')) – Data type of result. If None, the dtype of the input is preserved. If ‘numeric’, dtype is preserved unless array.dtype is object. If dtype is a list of types, conversion on the first type is only performed if the dtype of the input is not in the list.
force_all_finite (boolean (default=True)) – Whether to raise an error on np.inf and np.nan in array
force_strictly_positive (boolean (default=False)) – Whether to raise an error if any array elements are <= 0

Returns:

array_converted – The converted and validated array.

Return type:

object
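
The checks listed above can be approximated as follows. This is a hypothetical re-implementation for illustration only: the name check_array_sketch, the choice of ValueError, and the error messages are assumptions, and the dtype handling described above is omitted by always converting to float.

```python
import numpy as np

def check_array_sketch(array, ndim=None, force_all_finite=True,
                       force_strictly_positive=False):
    """Validate an array-like and return it as a float ndarray."""
    arr = np.asarray(array, dtype=float)
    if ndim is not None and arr.ndim != ndim:
        raise ValueError(f"Expected {ndim} dimensions, got {arr.ndim}.")
    if force_all_finite and not np.all(np.isfinite(arr)):
        raise ValueError("Array contains NaN or infinity.")
    if force_strictly_positive and np.any(arr <= 0):
        raise ValueError("Array contains elements <= 0.")
    return arr

validated = check_array_sketch([[1, 2], [3, 4]], ndim=2)
```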

PLDA

class pytranskit.optrans.decomposition.plda.BaseEstimator

Bases: object

Base class for all estimators in scikit-learn.

Notes

All estimators should specify all the parameters that can be set at the class level in their __init__ as explicit keyword arguments (no *args or **kwargs).

get_params(deep=True)

Get parameters for this estimator.

Parameters:: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:: params – Parameter names mapped to their values.
Return type:: dict

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:: **params (dict) – Estimator parameters.
Returns:: self – Estimator instance.
Return type:: estimator instance
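
The <component>__<parameter> convention can be illustrated with a small stand-alone class. TinyEstimator is hypothetical and only mimics the documented behavior; it is not how BaseEstimator discovers its parameters.

```python
class TinyEstimator:
    """Hypothetical estimator that follows the documented parameter convention."""

    def __init__(self, alpha=1.0, inner=None):
        self.alpha = alpha
        self.inner = inner

    def get_params(self, deep=True):
        params = {"alpha": self.alpha, "inner": self.inner}
        if deep and hasattr(self.inner, "get_params"):
            # Expose the sub-estimator's parameters under a prefixed name
            for key, value in self.inner.get_params(deep=True).items():
                params[f"inner__{key}"] = value
        return params

    def set_params(self, **params):
        for name, value in params.items():
            head, sep, tail = name.partition("__")
            if sep:
                # Nested name: delegate everything after "__" to the sub-estimator
                getattr(self, head).set_params(**{tail: value})
            elif head in ("alpha", "inner"):
                setattr(self, head, value)
            else:
                raise ValueError(f"Invalid parameter {head!r}.")
        return self

outer = TinyEstimator(alpha=0.5, inner=TinyEstimator(alpha=2.0))
outer.set_params(alpha=1.5, inner__alpha=3.0)
```

Returning self from set_params is what allows calls to be chained, for example set_params(...) followed directly by a fit call.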

class pytranskit.optrans.decomposition.plda.PLDA(alpha=1.0, n_components=None)

Bases: BaseEstimator

Penalized Linear Discriminant Analysis.

This is both a dimensionality reduction method and a linear classifier.

Parameters:

alpha (scalar (default=1.)) – Parameter that controls the proportion of LDA vs PCA. If alpha=0, PLDA functions like LDA. If alpha is large, PLDA functions more like PCA.
n_components (int or None (default=None)) – Number of components to keep. If n_components is not set, all components are kept: n_components == min(n_samples, n_features).
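
One formulation consistent with the described role of alpha is the generalized eigenproblem S_T v = λ (S_W + alpha·I) v, where S_T is the total scatter and S_W the within-class scatter. This is an assumption made for illustration; this page does not state the exact objective that the class optimizes, and the function name plda_directions is hypothetical. Under this formulation, a very large alpha makes the right-hand side nearly proportional to the identity, so the leading direction approaches the leading PCA direction. With alpha=0, the sketch would additionally require S_W to be invertible.

```python
import numpy as np

def plda_directions(X, y, alpha=1.0):
    """Solve S_T v = lam * (S_W + alpha * I) v; return unit directions as rows."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    Xc = X - X.mean(axis=0)
    S_T = Xc.T @ Xc                       # total scatter
    S_W = np.zeros_like(S_T)              # within-class scatter
    for label in np.unique(y):
        Xk = X[y == label]
        Xk = Xk - Xk.mean(axis=0)
        S_W += Xk.T @ Xk
    B = S_W + alpha * np.eye(X.shape[1])  # penalized within-class scatter
    # Reduce to a standard symmetric problem via Cholesky: B = L @ L.T
    L_inv = np.linalg.inv(np.linalg.cholesky(B))
    lam, U = np.linalg.eigh(L_inv @ S_T @ L_inv.T)
    V = L_inv.T @ U                       # map back to the original coordinates
    order = np.argsort(lam)[::-1]         # largest eigenvalue first
    V = V[:, order] / np.linalg.norm(V[:, order], axis=0)
    return lam[order], V.T

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3)) * np.array([5.0, 2.0, 1.0])
y = np.repeat([0, 1], 30)

_, pca_like = plda_directions(X, y, alpha=1e8)   # large alpha
Xc = X - X.mean(axis=0)
_, pcs = np.linalg.eigh(Xc.T @ Xc)               # columns in ascending eigenvalue order
alignment = abs(pca_like[0] @ pcs[:, -1])        # near 1 when the directions coincide
```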

components_

Axes in the feature space. The components are sorted by the explained variance.

Type:: array, shape (n_components, n_features)

explained_variance_

The amount of variance explained by each of the selected components.

Type:: array, shape (n_components,)

explained_variance_ratio_

Proportion of variance explained by each of the selected components. If n_components is not set then all components are stored and the sum of explained variance ratios is equal to 1.0.

Type:: array, shape (n_components,)

mean_

Per-feature empirical mean, estimated from the training set.

Type:: array, shape (n_features,)

n_components_

The number of components.

Type:: int

coef_

Weight vector(s).

Type:: array, shape (n_features,) or (n_classes, n_features)

intercept_

Intercept term.

Type:: array, shape (n_features,)

class_means_

Class means, estimated from the training set.

Type:: array, shape (n_classes, n_features)

classes_

Unique class labels.

Type:: array, shape (n_classes,)

References

W. Wang et al. Penalized Fisher Discriminant Analysis and its Application to Image-Based Morphometry. Pattern Recognit. Lett., 32(15):2128-35, 2011

decision_function(X)

Predict confidence scores for samples.

The confidence score for a sample is the signed distance of that sample to the hyperplane.

Parameters:: X (array, shape (n_samples, n_features)) – Input data.
Returns:: scores – Confidence scores per (sample, class) combination. In the binary case, the confidence score for self.classes_[1], where >0 means this class would be predicted.
Return type:: array, shape (n_samples,) if n_classes == 2, else (n_samples, n_classes)
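
For the binary case, the relationship between scores and predicted labels can be sketched as below. The weight vector, intercept, and class labels are made-up values standing in for coef_, intercept_, and classes_, and the function name binary_decision_scores is hypothetical. The score X @ coef + intercept is proportional to the signed distance from the hyperplane; dividing by the norm of the weight vector would give the geometric distance without changing any sign.

```python
import numpy as np

def binary_decision_scores(X, coef, intercept):
    """Linear score per sample; positive values favor the second class."""
    X = np.asarray(X, dtype=float)
    return X @ coef + intercept

coef = np.array([1.0, -1.0])     # hypothetical weight vector
intercept = 0.5                  # hypothetical intercept
classes = np.array(["a", "b"])   # stand-in for classes_

X = np.array([[2.0, 0.0], [0.0, 2.0], [0.0, 0.5]])
scores = binary_decision_scores(X, coef, intercept)
labels = classes[(scores > 0).astype(int)]  # classes[1] only when score > 0
```

The third sample lies exactly on the hyperplane (score 0), so under the strict >0 rule it is assigned to the first class.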

fit(X, y)

Fit PLDA model according to the given training data and parameters.

Parameters:

X (array, shape (n_samples, n_features)) – Training data.
y (array, shape (n_samples,)) – Target values.

fit_transform(X, y)

Fit the model with X and transform X.

Parameters:

X (array, shape (n_samples, n_features)) – Training data.
y (array, shape (n_samples,)) – Target values.

Returns:

X_new – Transformed data.

Return type:

array, shape (n_samples, n_components)

inverse_transform(X)

Transform data back to its original space.

Note: If n_components is less than the maximum, information will be lost, so reconstructed data will not exactly match the original data.

Parameters:: X (array, shape (n_samples, n_components)) – New data.
Returns:: X_original – Data transformed back into original space.
Return type:: array, shape (n_samples, n_features)
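
The information loss mentioned in the note can be demonstrated with a generic linear projection. The sketch assumes orthonormal component rows (taken here from an SVD as a stand-in for components_) and hypothetical helpers project and reconstruct; the actual class may invert its transform differently, for example if its components are not orthonormal.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
mean = X.mean(axis=0)

# Orthonormal axes from an SVD of the centered data (stand-in for components_)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)

def project(X, components, mean):
    """Map data into the reduced space."""
    return (X - mean) @ components.T

def reconstruct(X_new, components, mean):
    """Map reduced data back into the original feature space."""
    return X_new @ components + mean

full = reconstruct(project(X, Vt, mean), Vt, mean)            # all 4 components
partial = reconstruct(project(X, Vt[:2], mean), Vt[:2], mean)  # only 2 components

full_error = np.abs(full - X).max()        # round-off only
partial_error = np.abs(partial - X).max()  # noticeably larger
```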

predict(X)

Predict class labels for samples in X.

Parameters:: X (array, shape (n_samples, n_features)) – Input data.
Returns:: C – Predicted class label per sample.
Return type:: array, shape (n_samples,)

predict_log_proba(X)

Estimate log probability.

Parameters:: X (array, shape (n_samples, n_features)) – Input data.
Returns:: C – Estimated log probabilities.
Return type:: array, shape (n_samples, n_classes)

predict_proba(X)

Estimate probability.

Parameters:: X (array, shape (n_samples, n_features)) – Input data.
Returns:: C – Estimated probabilities.
Return type:: array, shape (n_samples, n_classes)

predict_transformed(X_trans)

Predict class labels for data that have already been transformed by self.transform(X).

This is useful for plotting classification boundaries. Note: Due to arithmetic discrepancies, this may return slightly different class labels from self.predict(X).

Parameters:: X_trans (array, shape (n_samples, n_components)) – Test samples that have already been transformed into PLDA space.
Returns:: y – Predicted class labels for X_trans.
Return type:: array, shape (n_samples,)

score(X, y, sample_weight=None)

Returns the mean accuracy on the given test data and labels.

Parameters:

X (array, shape (n_samples, n_features)) – Test samples.
y (array, shape (n_samples,)) – True labels for X.
sample_weight (array, shape (n_samples,), optional) – Sample weights.

Returns:

score – Mean accuracy of self.predict(X) w.r.t. y.

Return type:

float
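
Mean accuracy, with and without sample weights, amounts to a (weighted) average of per-sample correctness. The helper mean_accuracy below is a hypothetical stand-alone sketch that takes predictions directly rather than calling self.predict(X).

```python
import numpy as np

def mean_accuracy(y_true, y_pred, sample_weight=None):
    """Fraction of correct predictions, optionally weighted per sample."""
    correct = np.asarray(y_true) == np.asarray(y_pred)
    return float(np.average(correct, weights=sample_weight))

y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]

unweighted = mean_accuracy(y_true, y_pred)                          # 3 of 4 correct
weighted = mean_accuracy(y_true, y_pred, sample_weight=[1, 1, 2, 1])  # the miss counts double
```

Because the misclassified sample carries twice the weight of the others, the weighted score (3/5) is lower than the unweighted one (3/4).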

transform(X)

Transform data.

Parameters:: X (array, shape (n_samples, n_features)) – Input data.
Returns:: X_new – Transformed data.
Return type:: array, shape (n_samples, n_components)

pytranskit.optrans.decomposition.plda.accuracy_score(y_true, y_pred, *, normalize=True, sample_weight=None)

Accuracy classification score.

In multilabel classification, this function computes subset accuracy: the set of labels predicted for a sample must exactly match the corresponding set of labels in y_true.

See also

balanced_accuracy_score: Compute the balanced accuracy to deal with imbalanced datasets.
jaccard_score: Compute the Jaccard similarity coefficient score.
hamming_loss: Compute the average Hamming loss or Hamming distance between two sets of samples.
zero_one_loss: Compute the Zero-one classification loss. By default, the function will return the percentage of imperfectly predicted subsets.

Notes

In binary classification, this function is equal to the jaccard_score function.

Examples

>>> from sklearn.metrics import accuracy_score
>>> y_pred = [0, 2, 1, 3]
>>> y_true = [0, 1, 2, 3]
>>> accuracy_score(y_true, y_pred)
0.5
>>> accuracy_score(y_true, y_pred, normalize=False)
2

In the multilabel case with binary label indicators:

>>> import numpy as np
>>> accuracy_score(np.array([[0, 1], [1, 1]]), np.ones((2, 2)))
0.5

pytranskit.optrans.decomposition.plda.check_array(array, ndim=None, dtype='numeric', force_all_finite=True, force_strictly_positive=False)

Input validation on an array, list, or similar.

Parameters:

array (object) – Input object to check/convert
ndim (int or None (default=None)) – Number of dimensions that array should have. If None, the dimensions are not checked
dtype (string, type, list of types or None (default='numeric')) – Data type of result. If None, the dtype of the input is preserved. If ‘numeric’, dtype is preserved unless array.dtype is object. If dtype is a list of types, conversion on the first type is only performed if the dtype of the input is not in the list.
force_all_finite (boolean (default=True)) – Whether to raise an error on np.inf and np.nan in array
force_strictly_positive (boolean (default=False)) – Whether to raise an error if any array elements are <= 0

Returns:

array_converted – The converted and validated array.

Return type:

object

pytranskit.optrans.decomposition.plda.eigh(a, b=None, lower=True, eigvals_only=False, overwrite_a=False, overwrite_b=False, turbo=True, eigvals=None, type=1, check_finite=True, subset_by_index=None, subset_by_value=None, driver=None)

Solve a standard or generalized eigenvalue problem for a complex Hermitian or real symmetric matrix.

Find the eigenvalues array w and, optionally, the eigenvectors array v of array a, where b is positive definite, such that every eigenvalue λ (the i-th entry of w) and its eigenvector vi (the i-th column of v) satisfy:

              a @ vi = λ * b @ vi
vi.conj().T @ a @ vi = λ
vi.conj().T @ b @ vi = 1

In the standard problem, b is assumed to be the identity matrix.

Parameters:

a ((M, M) array_like) – A complex Hermitian or real symmetric matrix whose eigenvalues and eigenvectors will be computed.
b ((M, M) array_like, optional) – A complex Hermitian or real symmetric positive definite matrix. If omitted, the identity matrix is assumed.
lower (bool, optional) – Whether the pertinent array data is taken from the lower or upper triangle of a and, if applicable, b. (Default: lower)
eigvals_only (bool, optional) – Whether to calculate only eigenvalues and no eigenvectors. (Default: both are calculated)
subset_by_index (iterable, optional) – If provided, this two-element iterable defines the start and the end indices of the desired eigenvalues (ascending order and 0-indexed). To return only the second smallest to fifth smallest eigenvalues, [1, 4] is used. [n-3, n-1] returns the largest three. Only available with “evr”, “evx”, and “gvx” drivers. The entries are directly converted to integers via int().
subset_by_value (iterable, optional) – If provided, this two-element iterable defines the half-open interval (a, b]; only the eigenvalues within this interval, if any, are returned. Only available with “evr”, “evx”, and “gvx” drivers. Use np.inf for the unconstrained ends.
driver (str, optional) – Defines which LAPACK driver should be used. Valid options are “ev”, “evd”, “evr”, “evx” for standard problems and “gv”, “gvd”, “gvx” for generalized (where b is not None) problems. See the Notes section.
type (int, optional) –
For the generalized problems, this keyword specifies the problem type to be solved for w and v (only takes 1, 2, 3 as possible inputs):
```
1 =>     a @ v = w @ b @ v
2 => a @ b @ v = w @ v
3 => b @ a @ v = w @ v
```
This keyword is ignored for standard problems.
overwrite_a (bool, optional) – Whether to overwrite data in a (may improve performance). Default is False.
overwrite_b (bool, optional) – Whether to overwrite data in b (may improve performance). Default is False.
check_finite (bool, optional) – Whether to check that the input matrices contain only finite numbers. Disabling may give a performance gain, but may result in problems (crashes, non-termination) if the inputs do contain infinities or NaNs.
turbo (bool, optional) – Deprecated since v1.5.0; use the driver='gvd' keyword instead. Use the divide-and-conquer algorithm (faster but expensive in memory; only for the generalized eigenvalue problem and only if the full set of eigenvalues is requested). Has no significant effect if eigenvectors are not requested.
eigvals (tuple (lo, hi), optional) – Deprecated since v1.5.0; use the subset_by_index keyword instead. Indexes of the smallest and largest (in ascending order) eigenvalues and corresponding eigenvectors to be returned: 0 <= lo <= hi <= M-1. If omitted, all eigenvalues and eigenvectors are returned.

Returns:

w ((N,) ndarray) – The N (1<=N<=M) selected eigenvalues, in ascending order, each repeated according to its multiplicity.
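v ((M, N) ndarray) – The normalized eigenvector corresponding to the eigenvalue w[i] is the column v[:, i]. Only returned if eigvals_only == False.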

Raises:

LinAlgError – If the eigenvalue computation does not converge, an error occurred, or the b matrix is not positive definite. Note that if the input matrices are not symmetric or Hermitian, no error will be reported but the results will be wrong.
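
A short check of the relations stated above, using scipy.linalg.eigh directly (presumably the function re-exported here, given the matching signature) on small random matrices. The matrices are made up for illustration, and the example assumes SciPy 1.5 or later so that subset_by_index is available.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
a = M + M.T                      # real symmetric
N = rng.normal(size=(5, 5))
b = N @ N.T + 5 * np.eye(5)      # symmetric positive definite

# Generalized problem (type=1): a @ v[:, i] = w[i] * b @ v[:, i]
w, v = eigh(a, b)

# Column i of (b @ v) is scaled by w[i] through broadcasting
residual = np.abs(a @ v - (b @ v) * w).max()
b_normalized = np.allclose(v.T @ b @ v, np.eye(5))

# Largest three eigenvalues only: indices [n-3, n-1] with n = 5
w_top = eigh(a, b, eigvals_only=True, subset_by_index=[2, 4])
```

Requesting a subset avoids computing eigenpairs that will be discarded, which matters when only a few leading components are kept.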