Feature selection

Out:

Iteration 0:
   global_best: 0.956   iteration_best: 0.956
Iteration 1:
   global_best: 0.963   iteration_best: 0.963
Iteration 2:
   global_best: 0.963   iteration_best: 0.960
Iteration 3:
   global_best: 0.963   iteration_best: 0.958
Iteration 4:
   global_best: 0.963   iteration_best: 0.954
Iteration 5:
   global_best: 0.963   iteration_best: 0.956
Iteration 6:
   global_best: 0.963   iteration_best: 0.960
Iteration 7:
   global_best: 0.963   iteration_best: 0.960
Iteration 8:
   global_best: 0.963   iteration_best: 0.960
Iteration 9:
   global_best: 0.963   iteration_best: 0.963
Iteration 10:
   global_best: 0.963   iteration_best: 0.960
Iteration 11:
   global_best: 0.963   iteration_best: 0.956
Iteration 12:
   global_best: 0.963   iteration_best: 0.960
Iteration 13:
   global_best: 0.963   iteration_best: 0.958
Iteration 14:
   global_best: 0.963   iteration_best: 0.960
Iteration 15:
   global_best: 0.963   iteration_best: 0.956
Iteration 16:
   global_best: 0.963   iteration_best: 0.956
Iteration 17:
   global_best: 0.963   iteration_best: 0.952
Iteration 18:
   global_best: 0.963   iteration_best: 0.960
Iteration 19:
   global_best: 0.965   iteration_best: 0.965

Iteration completed
==========================
Exit code 0: Algortihm reached the maximum limit of 20 iterations
Elapsed time 24.399
20 iterations
Best selection: ['mean texture', 'mean perimeter', 'mean area', 'mean smoothness', 'mean concavity', 'mean fractal dimension', 'texture error', 'perimeter error', 'compactness error', 'concavity error', 'fractal dimension error', 'worst perimeter', 'worst area', 'worst smoothness', 'worst fractal dimension']
Best evaluation: 0.9648251423260138

Test accuracy
--------------------------
All columns: 0.965
Solution:  0.974
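
The log above is consistent with global_best being the running maximum of iteration_best: it changes only when an iteration beats every earlier one. The following is a minimal sketch in plain Python that checks this against the iteration_best values copied from the output. The helper name running_best is hypothetical and is not part of psopt; how psopt tracks the value internally is not shown here.

```python
from itertools import accumulate

# iteration_best values copied from the log above, iterations 0..19
iteration_best = [
    0.956, 0.963, 0.960, 0.958, 0.954, 0.956, 0.960, 0.960, 0.960, 0.963,
    0.960, 0.956, 0.960, 0.958, 0.960, 0.956, 0.956, 0.952, 0.960, 0.965,
]


def running_best(scores):
    """Best score seen so far at each iteration (hypothetical helper, not psopt API)."""
    return list(accumulate(scores, max))


global_best = running_best(iteration_best)

# iterations at which the best-so-far value strictly improved
improved_at = [
    i for i in range(1, len(global_best)) if global_best[i] > global_best[i - 1]
]
print(improved_at)  # [1, 19]
```

Under this reading, most iterations in the run did not improve on the best selection found so far, and the final improvement arrived only at the last iteration.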

from psopt import Combination

# to run this, make sure to have scikit-learn installed
from sklearn.ensemble import RandomForestClassifier as RFC
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_validate as cv
from sklearn.datasets import load_breast_cancer


def main():
    # loading breast cancer dataset
    dataset = load_breast_cancer()

    seed = 5

    # train-test split
    train_x, test_x, train_Y, test_Y = train_test_split(
        dataset.data, dataset.target, test_size=0.2, random_state=seed
    )

    # create objective function
    def evaluate(solution):
        results = cv(RFC(n_estimators=10), train_x[:, solution], train_Y, cv=3)
        return results["test_score"].mean()

    # instantiate optimizer
    opt = Combination(evaluate, list(range(train_x.shape[1])), labels=dataset.feature_names)

    # maximize obj function
    result = opt.maximize(selection_size=15, verbose=True, max_iter=20, seed=seed)

    # labels were passed to the optimizer, so result.solution holds feature names;
    # map them back to column indices (without labels it would already hold the indices)
    solution = [
        i
        for i in range(len(dataset.feature_names))
        if dataset.feature_names[i] in result.solution
    ]

    # ======================== COMPARISON ========================

    original = RFC().fit(train_x, train_Y)
    optimized = RFC().fit(train_x[:, solution], train_Y)

    print("\nTest accuracy\n--------------------------")
    print("All columns: {:.3f}".format(original.score(test_x, test_Y)))
    print("Solution:  {:.3f}".format(optimized.score(test_x[:, solution], test_Y)))


if __name__ == "__main__":
    main()
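
The script maps the selected feature names back to column indices by scanning every name in dataset.feature_names. As an alternative sketch, the same mapping can be done with a dictionary lookup. The function name labels_to_indices and the short lists below are illustrative assumptions, not part of psopt or scikit-learn; the four names are the first four feature names of the breast cancer dataset.

```python
def labels_to_indices(all_labels, selected_labels):
    """Map selected label names to their column positions, in ascending order.

    Hypothetical helper for illustration; raises KeyError on an unknown label.
    """
    position = {label: i for i, label in enumerate(all_labels)}
    return sorted(position[label] for label in selected_labels)


# illustrative data: a small subset of feature names and a made-up selection
feature_names = ["mean radius", "mean texture", "mean perimeter", "mean area"]
selected = ["mean area", "mean texture"]

indices = labels_to_indices(feature_names, selected)
print(indices)  # [1, 3]
```

Unlike the list comprehension in the script, this version fails loudly if a selected label is missing from the full list, which may or may not be the behavior you want.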

Total running time of the script: ( 0 minutes 24.795 seconds)
