Dimensionality Analysis - Machine Learning

Feature Selection


  • Eliminate irrelevant or redundant features based on analysis
  • Select a subset of relevant features
  • Types
    • Filter Methods
      • Basic Filter Methods
        • Constant Features
        • Quasi Constant Features
        • Duplicate Features
      • Correlation Filter Methods => Measure how the target attribute changes as a feature's values change
        • Pearson Correlation Coefficient
        • Spearman's Rank Correlation Coefficient
        • Kendall's Rank Correlation Coefficient
      • Statistical & Ranking Filter Methods
        • Mutual Information
        • Chi Square Score
        • ANOVA Univariate
        • Univariate ROC-AUC / RMSE
    • Wrapper Methods
      • Search Methods
        • Forward Feature Selection
        • Backward Feature Elimination
        • Exhaustive Feature Selection
      • Sequential Floating
        • Sequential Floating Forward Selection
        • Sequential Floating Backward Selection
      • Other Search
        • Bidirectional Search
    • Embedded Methods
      • Regularization
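        • LASSO (L1 penalty, can shrink coefficients exactly to zero)
        • Ridge (L2 penalty, shrinks coefficients but does not set them to zero)
        • Elastic Net (combines L1 and L2 penalties)
      • Tree-Based Importance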
        • Feature Importance
    • Hybrid Methods
      • Filter & Wrapper Methods
      • Embedded & Wrapper Methods
        • Recursive Feature Elimination
        • Recursive Feature Addition
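
A minimal plain-Python sketch of the three basic filter checks listed above (constant, quasi-constant and duplicate features). It assumes the data is stored as a dict that maps each column name to a list of values; the function names, the toy columns and the 0.95 quasi-constant threshold are illustrative assumptions, not fixed conventions.

```python
from collections import Counter


def constant_features(data):
    """Names of columns that hold one single value in every row."""
    return [name for name, values in data.items() if len(set(values)) == 1]


def quasi_constant_features(data, threshold=0.95):
    """Names of columns whose most frequent value covers at least `threshold`
    of the rows (constant columns are reported too)."""
    result = []
    for name, values in data.items():
        top_count = Counter(values).most_common(1)[0][1]
        if top_count / len(values) >= threshold:
            result.append(name)
    return result


def duplicate_features(data):
    """Names of columns whose values exactly repeat an earlier column."""
    seen = set()
    duplicates = []
    for name, values in data.items():
        key = tuple(values)
        if key in seen:
            duplicates.append(name)
        else:
            seen.add(key)
    return duplicates


# Hypothetical toy data: 20 rows per column.
ages = [23, 35, 31, 44, 52, 29, 38, 41, 27, 33,
        36, 48, 25, 30, 39, 45, 50, 28, 34, 42]
data = {
    "age": ages,
    "country": ["IN"] * 20,        # constant
    "is_admin": [0] * 19 + [1],    # 19 of 20 rows are 0 -> quasi-constant
    "age_copy": list(ages),        # exact duplicate of "age"
}

print(constant_features(data))        # ['country']
print(quasi_constant_features(data))  # ['country', 'is_admin']
print(duplicate_features(data))       # ['age_copy']
```

The quasi-constant threshold is a judgment call and is worth tuning per dataset.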
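
The correlation filter can be sketched the same way: score each feature by its correlation with the target and keep those above a threshold. This plain-Python sketch covers Pearson's coefficient and Spearman's rank coefficient (Pearson applied to ranks, with tied values given their average rank); Kendall's coefficient is left out. The helper names, toy data and thresholds are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # average of positions i..j, 1-based
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def correlation_filter(features, target, threshold=0.5, method=pearson):
    """Keep features whose |correlation with the target| >= threshold."""
    return [name for name, values in features.items()
            if abs(method(values, target)) >= threshold]


# Hypothetical toy data.
target = [1, 2, 3, 4, 5, 6]
features = {
    "linear": [2, 4, 6, 8, 10, 12],       # exactly linear in the target
    "monotonic": [1, 4, 9, 16, 25, 36],   # increasing, but not linear
    "noise": [4, 1, 6, 2, 5, 3],          # weakly related
}

print(correlation_filter(features, target, threshold=0.99))                   # ['linear']
print(correlation_filter(features, target, threshold=0.99, method=spearman))  # ['linear', 'monotonic']
```

With the strict 0.99 threshold, Pearson keeps only the exactly linear feature, while Spearman also keeps the squared feature because it rises monotonically with the target.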
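
Forward Feature Selection, one of the wrapper search methods above, starts from an empty set and greedily adds the feature that most improves a model's score, stopping when no remaining feature helps enough. This NumPy sketch assumes a least-squares linear regression as the wrapped model, validation R² as the score and a hypothetical stopping tolerance `tol`; any other model and metric could be plugged in instead.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination R^2."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def design_matrix(X, cols):
    """Intercept column followed by the selected feature columns."""
    return np.column_stack([np.ones(len(X))] + [X[:, c] for c in cols])


def validation_score(X_train, y_train, X_val, y_val, cols):
    """Fit least-squares regression on `cols`; return R^2 on the validation set."""
    coef, *_ = np.linalg.lstsq(design_matrix(X_train, cols), y_train, rcond=None)
    return r2_score(y_val, design_matrix(X_val, cols) @ coef)


def forward_selection(X_train, y_train, X_val, y_val, tol=0.01):
    """Greedily add the best feature until the score gain drops below tol."""
    selected = []
    remaining = list(range(X_train.shape[1]))
    best_score = validation_score(X_train, y_train, X_val, y_val, selected)
    while remaining:
        # Score every candidate subset "selected + one more feature".
        scores = {c: validation_score(X_train, y_train, X_val, y_val, selected + [c])
                  for c in remaining}
        best_col = max(scores, key=scores.get)
        if scores[best_col] - best_score < tol:
            break  # no remaining feature improves the score enough
        selected.append(best_col)
        remaining.remove(best_col)
        best_score = scores[best_col]
    return selected, best_score


# Hypothetical synthetic data: only columns 0 and 2 drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.1, size=300)

selected, score = forward_selection(X[:200], y[:200], X[200:], y[200:])
print(selected, round(score, 3))
```

Backward Feature Elimination is the mirror image: start from all features and repeatedly drop the one whose removal hurts the score least.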
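
LASSO works as an embedded selector because its L1 penalty can push some coefficients exactly to zero while the model is being fitted. This sketch minimises the standard LASSO objective with cyclic coordinate descent and soft-thresholding in NumPy; the synthetic data, `alpha=0.5` and the iteration count are illustrative assumptions, and in practice a library implementation would normally be used.

```python
import numpy as np


def soft_threshold(rho, alpha):
    """Shrink rho towards zero by alpha; anything in [-alpha, alpha] becomes 0."""
    return np.sign(rho) * max(abs(rho) - alpha, 0.0)


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent.

    Assumes the columns of X and y are already centred, so no intercept is fitted.
    """
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n          # (1/n) * ||x_j||^2 per column
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: prediction error with feature j left out.
            r_j = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r_j / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
    return w


# Hypothetical synthetic data: only columns 0 and 2 drive the target.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.1, size=200)
X = X - X.mean(axis=0)
y = y - y.mean()

w = lasso_coordinate_descent(X, y, alpha=0.5)
kept = np.flatnonzero(w)                       # indices of non-zero weights
print(np.round(w, 2), kept)
```

Features whose weight ends at exactly 0 are the eliminated ones. Ridge's L2 penalty has no such thresholding step, which is why it shrinks weights without removing features.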

Feature Extraction


  • Used when the original raw data cannot be used directly and must be transformed into a suitable form
    • Text, Images, Geospatial data, Date and Time, Web data, Sensor data
  • Create a new, smaller set of features that captures most of the useful information in the raw data
  • Combine two or more features into new ones
  • Types
    • Dimensionality Reduction
      • Principal Component Analysis (PCA)
        • Can help reduce overfitting by lowering the number of features
        • Steps to find the principal components (PCs)
          • Find principal components by looking at the data from different directions (views)
          • Number of principal components ≤ Number of attributes
          • PC1 captures the most variance, so it is given higher priority than PC2
          • PC1 and PC2 must be orthogonal => Uncorrelated with each other
        • Steps to solve a problem (two attributes X and Y)
          • Given X and Y, find the mean of each
          • Find the covariance matrix C, of size 2 × 2 (one row and one column per attribute)
          • Find the eigenvalues λ as the roots of the characteristic equation det(C - λI) = 0
          • Find the eigenvector V corresponding to each eigenvalue using CV = λV
            • Set one component equal to 1 and solve for the other to get a temporary (unnormalized) eigenvector
            • Divide both components by the vector's length (the square root of the sum of their squares) to get the final unit eigenvector
      • Independent Component Analysis (ICA)
      • Linear Discriminant Analysis (LDA)
      • Locally Linear Embedding (LLE)
      • t-distributed Stochastic Neighbor Embedding (t-SNE)
    • Heuristic Search Algorithms
      • Genetic Algorithm
    • Feature Importance
      • Permutation Importance
    • Deep Learning
      • Autoencoders
    • Factor Analysis
    • Singular Value Decomposition (SVD)
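
The PCA hand-calculation steps above can be reproduced for two attributes in plain Python. This sketch assumes the sample covariance (dividing by n - 1; some texts divide by n), sets the first eigenvector component to 1, and assumes the off-diagonal covariance is non-zero. The data points are made up so that C works out to [[2, 1], [1, 2]], giving eigenvalues 3 and 1.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance_matrix(xs, ys):
    """2 x 2 sample covariance matrix (divides by n - 1)."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    cxx = sum(a * a for a in dx) / (n - 1)
    cyy = sum(b * b for b in dy) / (n - 1)
    cxy = sum(a * b for a, b in zip(dx, dy)) / (n - 1)
    return [[cxx, cxy], [cxy, cyy]]


def eigenvalues_2x2(c):
    """Roots of det(C - lambda*I) = 0 for a symmetric 2 x 2 matrix, largest first.

    det(C - lambda*I) = lambda^2 - (a + d)*lambda + (a*d - b^2); the quadratic
    formula's discriminant simplifies to (a - d)^2 + 4*b^2, which is never negative.
    """
    a, b, d = c[0][0], c[0][1], c[1][1]
    root = math.sqrt((a - d) ** 2 + 4 * b * b)
    return (a + d + root) / 2, (a + d - root) / 2


def unit_eigenvector_2x2(c, lam):
    """Solve (C - lambda*I)V = 0 with the first component set to 1, then normalise.

    Assumes the off-diagonal entry c[0][1] is non-zero.
    """
    v1 = 1.0
    v2 = (lam - c[0][0]) / c[0][1]     # from (c00 - lambda)*v1 + c01*v2 = 0
    length = math.sqrt(v1 * v1 + v2 * v2)
    return v1 / length, v2 / length


# Hypothetical data points.
xs = [1, 1, 4, 2]
ys = [1, 4, 4, 3]

C = covariance_matrix(xs, ys)          # [[2.0, 1.0], [1.0, 2.0]]
lam1, lam2 = eigenvalues_2x2(C)        # 3.0, 1.0
pc1 = unit_eigenvector_2x2(C, lam1)    # (0.7071..., 0.7071...)
pc2 = unit_eigenvector_2x2(C, lam2)    # (0.7071..., -0.7071...)

# Project the mean-centred points onto PC1: 2 attributes reduced to 1.
mx, my = mean(xs), mean(ys)
scores = [(x - mx) * pc1[0] + (y - my) * pc1[1] for x, y in zip(xs, ys)]
print(C, (lam1, lam2), pc1, pc2, scores)
```

The sample variance of the projected PC1 scores equals λ1 = 3, and PC1 and PC2 come out orthogonal, matching the properties listed above.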
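
Permutation importance scores a feature by how much a fitted model's score drops when that feature's values are randomly shuffled, which breaks its relationship with the target. A minimal NumPy sketch, assuming a least-squares regression model, R² as the score, and synthetic data in which only columns 0 and 2 matter; the function names are hypothetical.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination R^2."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in R^2 when one column of X is shuffled; the model is not refit."""
    rng = np.random.default_rng(seed)
    baseline = r2_score(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break column j's link to y
            drops.append(baseline - r2_score(y, predict(X_perm)))
        importances[j] = np.mean(drops)
    return importances


# Hypothetical synthetic data: only columns 0 and 2 drive the target.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.1, size=300)


def add_intercept(M):
    return np.column_stack([np.ones(len(M)), M])


# Hypothetical model: ordinary least squares fitted on the first 200 rows.
coef, *_ = np.linalg.lstsq(add_intercept(X[:200]), y[:200], rcond=None)


def predict(M):
    return add_intercept(M) @ coef


importances = permutation_importance(predict, X[200:], y[200:])
print(np.round(importances, 3))
```

Scoring on held-out rows (here the last 100) rather than the training rows gives a more honest picture of which features the model actually relies on.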
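
PCA and Singular Value Decomposition are closely related: the right singular vectors of the centred data matrix are the principal components, and each squared singular value divided by n - 1 equals the matching eigenvalue of the sample covariance matrix. This NumPy sketch computes both routes on hypothetical data and keeps two components.

```python
import numpy as np

# Hypothetical data: 100 samples of 3 attributes, where the third attribute is
# almost the average of the first two (so about 2 dimensions carry the information).
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 2))
third = 0.5 * A[:, 0] + 0.5 * A[:, 1] + rng.normal(scale=0.05, size=100)
X = np.column_stack([A, third])

Xc = X - X.mean(axis=0)                      # centre each attribute

# Route 1: eigen-decomposition of the sample covariance matrix (divides by n - 1).
C = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)         # eigh returns ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

# Route 2: SVD of the centred data, Xc = U @ diag(S) @ Vt.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_eigvals = S ** 2 / (len(Xc) - 1)         # squared singular values -> variances

explained = eigvals / eigvals.sum()          # share of variance per component
Z = Xc @ Vt[:2].T                            # keep 2 components: 3 attributes -> 2
print(np.round(explained, 4), Z.shape)
```

Both routes give the same components up to sign; the SVD route avoids forming the covariance matrix explicitly.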