Cumulative Distribution Transform Nearest Subspace (CDT-NS) Classifier

This tutorial will demonstrate how to use the CDT-NS classifier for 1D data in the PyTransKit package.

Class: CDT_NS

Functions:

Constructor function: cdt_ns_obj = CDT_NS(num_classes, rm_edge)

Inputs:
----------------
num_classes : integer value
    Total number of classes in the dataset.
rm_edge : boolean
    If True, the first and last points of the CDTs are removed.

Outputs:
----------------
cdt_ns_obj : class object
    Instance of the class CDT_NS.

Fit function: cdt_ns_obj.fit(Xtrain, Ytrain, no_deform_model)

Inputs:
----------------
Xtrain : array-like, shape (n_samples, n_columns)
    1D data for training.
Ytrain : ndarray of shape (n_samples,)
    Labels of the training samples.
no_deform_model : boolean, default = False
    If True, no deformation model is added.

Predict function: preds = cdt_ns_obj.predict(Xtest, use_gpu)

Inputs:
----------------
Xtest : array-like, shape (n_samples, n_columns)
    1D data for testing.
use_gpu : boolean, default = False
    If True, the GPU is used for calculations.

Outputs:
----------------
preds : 1d array, shape (n_samples,)
   Predicted labels for test samples.
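
To make the rm_edge option more concrete, here is a minimal sketch of a 1D CDT with respect to a uniform reference, written with NumPy only. It is not the PyTransKit implementation: the function name cdt_sketch, the eps offset, and the choice of reference are assumptions made for illustration. The signal is normalized to a density, its cumulative sum is taken, and the inverse of that cumulative function is sampled on evenly spaced levels.

```python
import numpy as np

def cdt_sketch(signal, x, rm_edge=False, eps=1e-8):
    """Approximate CDT of a non-negative 1D signal w.r.t. a uniform reference.

    Hypothetical helper for illustration; not part of PyTransKit.
    """
    density = np.asarray(signal, dtype=float) + eps  # keep the density strictly positive
    density = density / density.sum()                # normalize to unit total mass
    cdf = np.cumsum(density)                         # strictly increasing, ends near 1
    u = np.linspace(0.0, 1.0, len(x))                # CDF levels of the uniform reference
    transform = np.interp(u, cdf, x)                 # inverse CDF sampled at those levels
    return transform[1:-1] if rm_edge else transform

# Two Gaussian bumps that differ only by a translation of 0.5
x = np.linspace(-5.0, 5.0, 201)
bump = lambda center: np.exp(-(x - center) ** 2 / (2 * 0.5 ** 2))

cdt_a = cdt_sketch(bump(0.0), x, rm_edge=True)
cdt_b = cdt_sketch(bump(0.5), x, rm_edge=True)
shift = cdt_b - cdt_a  # approximately constant and equal to the translation
```

In this sketch a translation of the signal shows up as a constant offset in the interior of the transform, while the first and last samples stay at or near the grid limits regardless of the translation. That is one plausible motivation for dropping the edge points, although the package may have other reasons.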

Example

The following example demonstrates how to:

* create and initialize an instance of the class CDT_NS
* train the model with 1D training samples
* apply the model to predict class labels of 1D test samples

This example uses a synthetic 1D dataset stored in the data folder. The dataset contains two classes:

* Class 0: translated versions of a Gaussian signal
* Class 1: translated versions of the sum of two Gaussian signals
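
For readers without access to the data folder, signals of this kind can be approximated with a few lines of NumPy. The sketch below is not how the shipped dataset was generated: the grid, the Gaussian width, the shift range, and the function names are all assumptions chosen for illustration.

```python
import numpy as np

def gaussian(x, center, sigma):
    """Unnormalized Gaussian bump on the grid x."""
    return np.exp(-(x - center) ** 2 / (2.0 * sigma ** 2))

def make_synthetic_1d(n_per_class, n_columns=201, seed=0):
    """Hypothetical generator: class 0 is one bump, class 1 is two bumps."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-1.0, 1.0, n_columns)
    signals, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            shift = rng.uniform(-0.4, 0.4)  # random translation (assumed range)
            if label == 0:
                s = gaussian(x, shift, 0.05)
            else:
                s = gaussian(x, shift - 0.2, 0.05) + gaussian(x, shift + 0.2, 0.05)
            signals.append(s)
            labels.append(label)
    return np.asarray(signals), np.asarray(labels)

X_demo, y_demo = make_synthetic_1d(n_per_class=5)
print(X_demo.shape, y_demo)
```

The returned arrays follow the same layout as the rest of this tutorial: samples of shape (n_samples, n_columns) and labels of shape (n_samples,).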

Import some Python libraries

[1]:

import numpy as np
from sklearn.metrics import accuracy_score
from pathlib import Path
import sys
sys.path.append('../')
from pytranskit.classification.utils import *

use_gpu = False

Import CDT-NS class from PyTransKit package

[2]:

from pytranskit.classification.cdt_ns import CDT_NS

Load dataset

To load the data we use the load_data_1D function from the pytranskit/classification/utils.py script. It takes the name and directory of the dataset and the total number of classes as input, and returns the train and test samples as two separate 2D arrays of shape (n_samples, n_columns), along with the corresponding class labels. Users can use their own implementation to load data, as long as the output arrays follow the same layout.

[3]:

datadir = './data'
dataset = 'synthetic_1D'
num_classes = 2          # total number of classes in the dataset
(x_train, y_train), (x_test, y_test) = load_data_1D(dataset, num_classes, datadir)  # load_data_1D function from utils.py

loading data from mat files
x_train.shape (1400, 201) x_test.shape (600, 201)
saved to ./data/synthetic_1D/dataset.hdf5
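
When a custom loader is used instead, a quick sanity check on its output can catch layout problems early. The helper below is a hypothetical sketch, not part of PyTransKit, and it assumes that labels are integers from 0 to num_classes - 1, as in this example.

```python
import numpy as np

def check_dataset(x_train, y_train, x_test, y_test, num_classes):
    """Raise ValueError if the arrays do not match the expected layout."""
    x_train, x_test = np.asarray(x_train), np.asarray(x_test)
    y_train, y_test = np.asarray(y_train), np.asarray(y_test)
    if x_train.ndim != 2 or x_test.ndim != 2:
        raise ValueError("samples must have shape (n_samples, n_columns)")
    if x_train.shape[1] != x_test.shape[1]:
        raise ValueError("train and test samples must have the same n_columns")
    for name, x, y in (("train", x_train, y_train), ("test", x_test, y_test)):
        if y.shape != (x.shape[0],):
            raise ValueError(f"{name} labels must have shape (n_samples,)")
        # Assumption: labels are integers 0 .. num_classes - 1
        if not np.isin(y, np.arange(num_classes)).all():
            raise ValueError(f"{name} labels fall outside 0..{num_classes - 1}")

# Toy arrays standing in for the output of a user-defined loader
check_dataset(np.zeros((4, 201)), np.array([0, 1, 0, 1]),
              np.zeros((2, 201)), np.array([1, 0]), num_classes=2)
```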

In this example we use 512 randomly chosen samples per class to train the model, selected with the take_train_samples function from the utils.py script. Users can substitute their own script.

[4]:

n_samples_perclass = 512  # total number of training samples per class used in this example
x_train_sub, y_train_sub = take_train_samples(x_train, y_train, n_samples_perclass,
                                              num_classes, repeat=0) # function from utils.py
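
A user-written replacement for this step could look like the sketch below. It is not the take_train_samples implementation (in particular, it does not model the repeat argument); the function name and the seed parameter are assumptions.

```python
import numpy as np

def subsample_per_class(x, y, n_per_class, num_classes, seed=0):
    """Draw n_per_class samples without replacement from each class."""
    rng = np.random.default_rng(seed)
    picked = []
    for c in range(num_classes):
        idx = np.flatnonzero(y == c)  # indices of all samples in class c
        # rng.choice raises ValueError if the class has too few samples
        picked.append(rng.choice(idx, size=n_per_class, replace=False))
    picked = np.concatenate(picked)
    return x[picked], y[picked]

# Toy data: 20 samples with 2 columns, 10 per class
x_toy = np.arange(40).reshape(20, 2)
y_toy = np.repeat([0, 1], 10)
x_sub, y_sub = subsample_per_class(x_toy, y_toy, n_per_class=3, num_classes=2)
```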

Create an instance of CDT_NS class

[5]:

cdt_ns_obj = CDT_NS(num_classes, rm_edge=True)

Training phase

This function takes the training samples and labels as input and stores the basis vectors for each class in a private variable. This variable is used by the predict function in the testing phase.

[6]:

print(x_train_sub.shape)
cdt_ns_obj.fit(x_train_sub, y_train_sub)

(1024, 201)

Calculating CDTs for training data ...
Generating basis vectors for each class ...
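
The idea of per-class basis vectors can be sketched with an SVD of each class's transformed samples. This is an illustration under stated assumptions, not the code inside CDT_NS: the function name class_bases and the 0.99 energy threshold are hypothetical, and the deformation model controlled by no_deform_model is left out.

```python
import numpy as np

def class_bases(features, labels, num_classes, energy=0.99):
    """Return one orthonormal basis (rows) per class, found via SVD."""
    bases = []
    for c in range(num_classes):
        class_feats = features[labels == c]  # shape (n_c, d)
        # Rows of vt are orthonormal directions spanning the class samples
        _, s, vt = np.linalg.svd(class_feats, full_matrices=False)
        cum_energy = np.cumsum(s) / np.sum(s)
        # Keep the smallest number of directions reaching the energy threshold
        k = int(np.searchsorted(cum_energy, energy) + 1)
        bases.append(vt[:k])
    return bases

# Toy features: class 0 spans two directions, class 1 spans one
feats = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 0, 0], [1, -1, 0, 0],
                  [0, 0, 1, 0], [0, 0, 2, 0], [0, 0, 3, 0]], dtype=float)
labs = np.array([0, 0, 0, 0, 1, 1, 1])
bases = class_bases(feats, labs, num_classes=2)
```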

Testing phase

The predict function takes the test samples as input and returns the predicted class labels.

[7]:

preds = cdt_ns_obj.predict(x_test, use_gpu)


Calculating CDTs for testing samples ...
Finding nearest subspace for each test sample ...
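
The nearest-subspace step reported in this output can be illustrated as follows: each test vector is projected onto every class subspace, and the class with the smallest residual is chosen. The sketch assumes each basis has orthonormal rows; the function name is hypothetical and the code is not taken from the package.

```python
import numpy as np

def predict_nearest_subspace(features, bases):
    """Assign each row of features to the class whose subspace is closest."""
    dists = []
    for basis in bases:                      # basis: (k, d), orthonormal rows
        proj = features @ basis.T @ basis    # orthogonal projection onto the subspace
        dists.append(np.linalg.norm(features - proj, axis=1))
    return np.argmin(np.stack(dists, axis=1), axis=1)

# Toy subspaces in 3D: class 0 is the xy-plane, class 1 is the z-axis
toy_bases = [np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),
             np.array([[0.0, 0.0, 1.0]])]
toy_feats = np.array([[1.0, 2.0, 0.1],   # nearly in the xy-plane
                      [0.1, 0.0, 3.0]])  # nearly on the z-axis
toy_preds = predict_nearest_subspace(toy_feats, toy_bases)
```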

[8]:

print('\nTest accuracy: {}%'.format(100*accuracy_score(y_test, preds)))


Test accuracy: 100.0%
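
Beyond a single accuracy figure, a per-class breakdown can help when results are less clean than here. The following NumPy sketch builds a confusion matrix from true and predicted labels; the function name and the toy labels are illustrative assumptions, not output from this notebook.

```python
import numpy as np

def confusion_counts(y_true, y_pred, num_classes):
    """counts[i, j] = number of samples with true class i predicted as class j."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    counts = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(counts, (y_true, y_pred), 1)  # accumulate repeated index pairs
    return counts

# Toy labels, not the predictions from this tutorial
toy_true = [0, 0, 1, 1, 1]
toy_pred = [0, 1, 1, 1, 0]
counts = confusion_counts(toy_true, toy_pred, num_classes=2)
overall_accuracy = np.trace(counts) / counts.sum()
per_class_recall = np.diag(counts) / counts.sum(axis=1)
```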
