Libraries - Machine Learning

Basic


  • Library > Modules (.py) > Functions & Global Variables
  • JSON
    • import json => Import
    • var = {"var1": val1, "var2": val2} => Initialize
    • json.loads(text) => Load data from a JSON string (text), returns a Python object
    • Write in File
          with open("fileName", "w") as f:
              json.dump(var, f)
      
  • Pickles
  • Other
    • Quandl
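
The JSON and pickle entries above can be tried with a small round trip. This is a minimal sketch: the `record` dict and the file name `record.json` are made up for illustration, and a temporary directory is used so nothing is left behind in the working folder.

```python
import json
import os
import pickle
import tempfile

# Hypothetical sample data; any JSON-serializable dict works here.
record = {"name": "iris", "rows": 150, "tags": ["flowers", "demo"]}

# dumps: Python object -> JSON string; loads: JSON string -> Python object.
text = json.dumps(record)
parsed = json.loads(text)

# dump / load do the same thing through an open file object.
path = os.path.join(tempfile.mkdtemp(), "record.json")
with open(path, "w") as f:
    json.dump(record, f)
with open(path) as f:
    from_file = json.load(f)

# pickle follows the same dumps / loads pattern, but produces bytes.
blob = pickle.dumps(record)
restored = pickle.loads(blob)

print(parsed == record, from_file == record, restored == record)
```

JSON is text and readable by other languages; pickle is Python-specific and should only be loaded from trusted sources.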

Numpy


  • Basic
    • For working with arrays, faster than Python lists
    • import numpy as np => Import
    • dir(np) => Gives all functions that can be used
  • Commands
    • Initialize
      • np.array([val1, val2, val3]) => Initialize an array
      • np.array([val1, val2, val3], dtype="int32") => Initialize an array with a datatype
      • np.array([[val1, val2, val3], [val4, val5, val6]]) => Initialize a 2D array
      • np.zeros(N) => Initialize array with all values equal to 0 having N elements
      • np.zeros((N, M)) => Initialize array with all values equal to 0 having N rows and M columns
      • np.ones(N) => Initialize array with all values equal to 1 having N elements
      • np.empty(N) => Initialize array of N elements without setting the values (contents are arbitrary)
    • var.shape => Returns a tuple of dimensions, (rows, columns) for a 2D array
    • var.reshape(N1, N2) => Returns the array reshaped to N1 rows and N2 columns (N1 * N2 must equal the number of elements)
    • var.dtype => Returns datatype
    • var[N1:N2] => Slice from N1 up to (not including) N2, returns a view of var
    • var1 = var[N1:N2].copy() => Slice from N1 to N2 and store an independent copy in var1
    • var.sum(axis=1) => Returns array with one sum per row, axis=0 gives one sum per column
    • var.dot(var1) => Returns dot product, dimensions must be compatible
    • np.cross(var, var1) => Returns cross product, dimensions must be compatible
    • var.transpose() => Returns transpose
    • np.arange(N) => Initialize an array with the N values 0 to N-1
    • Sorting
      • np.sort(var) => Sorts along the last axis (default axis=-1)
      • np.sort(var, axis=0) => Sorts on axis 0
      • np.sort(var, axis=0, kind="mergesort") => Sorts on axis 0, using given algorithm
      • np.argsort(var) => Returns the indices that would sort the array
      • np.argmin(var) => Returns the index of the minimum element
      • np.argmax(var) => Returns the index of the maximum element
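
A short sketch of several commands from the list above, using small made-up arrays. The variable names are only for illustration; the comments state what each call returns.

```python
import numpy as np

# arange + reshape: the values 0..5 arranged as 2 rows and 3 columns.
a = np.arange(6).reshape(2, 3)      # [[0, 1, 2], [3, 4, 5]]
row_sums = a.sum(axis=1)            # one sum per row
col_sums = a.sum(axis=0)            # one sum per column
gram = a.dot(a.transpose())         # (2, 3) dot (3, 2) gives a (2, 2) result

# A plain slice is a view: writing to it also changes the original.
base = np.arange(5)
view = base[1:3]
view[0] = 99                        # base[1] is now 99 as well

# .copy() gives an independent array: the original is left alone.
safe = base[1:3].copy()
safe[0] = -1                        # base is unchanged by this

# argsort / argmin / argmax return positions, not values.
v = np.array([30, 10, 20])
order = np.argsort(v)               # positions that would sort v
smallest_at = np.argmin(v)          # position of the smallest value
largest_at = np.argmax(v)           # position of the largest value

print(a.shape, row_sums, col_sums)
print(base, safe)
print(order, v[order], smallest_at, largest_at)
```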

Pandas


  • CSV File
        from pandas import read_csv
        data = read_csv("fileName.csv") # Read CSV file and put it in RAM
        data = read_csv("URL") # Read CSV file
        data = read_csv(r"filePath") # Read CSV file
        print(data.shape) # Prints number of (rows, columns); shape is an attribute, not a method
        print(data.dtypes) # Prints data types of columns
        print(data.describe()) # Prints summary of data
        print(data.head(N)) # Prints first "N" rows, Default N = 5
        print(data.tail(N)) # Prints last "N" rows, Default N = 5
        print(data.iloc[N1, N2]) # Prints the value at row "N1", column "N2" (by position)
        print(data.iloc[N1:N2, N3:N4]) # Prints rows "N1" to "N2" and columns "N3" to "N4" (end positions excluded)
    
  • DataFrame
        import pandas as pd
        pd.DataFrame() # Create an empty data frame
        pd.DataFrame([[val1, val2], [val3, val4]]) # Create a data frame from a list of rows
        pd.DataFrame([[val1, val2], [val3, val4]], columns=["val5", "val6"]) # Create a data frame and give names to columns
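
The pandas calls above can be tried without a file on disk. In this sketch the CSV content is invented and `io.StringIO` stands in for a real file name or URL; `read_csv` accepts any of these.

```python
import io

import pandas as pd

# Hypothetical CSV content; io.StringIO acts as an in-memory file.
csv_text = (
    "name,age,city\n"
    "Ada,36,London\n"
    "Linus,28,Helsinki\n"
    "Grace,45,New York\n"
)
data = pd.read_csv(io.StringIO(csv_text))

print(data.shape)             # (rows, columns) as a tuple
print(data.dtypes)            # one data type per column
print(data.head(2))           # first two rows

# iloc selects by position: a single cell, then a block of rows and columns.
cell = data.iloc[0, 1]        # row 0, column 1 (the age of the first person)
block = data.iloc[0:2, 1:3]   # rows 0-1 and columns 1-2, end positions excluded

# A data frame built from a list of rows, with named columns.
frame = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
print(cell)
print(block)
print(frame)
```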
    

Matplotlib


    import matplotlib
    from matplotlib import pyplot as plt
    from matplotlib import style

    plt.plot([val1, val2], [val3, val4], linewidth=N, label="val") # Line Graph, linewidth is a number
    plt.bar([val1, val2], [val3, val4]) # Bar Graph, For categorical data
    plt.pie([val1, val2], labels=["val3", "val4"], colors=["val5", "val6"], startangle=90, explode=(0, 0.1)) # Pie Chart
    plt.hist([val1, val2]) # Histogram Graph, Data considered on x-axis
    plt.boxplot([val1, val2]) # Boxplot, Gives summary
    plt.violinplot([val1, val2]) # Violinplot, Gives summary & probability density
    plt.scatter([val1, val2], [val3, val4]) # Scatter plot

    plt.legend()
    plt.title("value") # Display at top of graph
    plt.xlabel("value") # Display at bottom of graph
    plt.ylabel("value") # Display at left of graph
    plt.show() # Display graph

    style.use("ggplot") # Apply a predefined style; call it before plotting so it takes effect
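
Put together, the calls above might look like the sketch below. The data and the output file name `squares.png` are invented for illustration. The `Agg` backend and `plt.savefig` are used here instead of `plt.show()` as an assumption that the script may run without a display; on a desktop, `plt.show()` works as listed above.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, renders to files only

from matplotlib import pyplot as plt
from matplotlib import style

style.use("ggplot")    # choose the style before drawing anything

# Hypothetical data: x values and their squares.
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

plt.plot(x, y, linewidth=2, label="squares")  # line graph
plt.title("Squares")                          # shown at the top
plt.xlabel("x")                               # shown at the bottom
plt.ylabel("x squared")                       # shown at the left
plt.legend()                                  # uses the label given to plot()

# Save to a temporary folder instead of opening a window.
out_path = os.path.join(tempfile.mkdtemp(), "squares.png")
plt.savefig(out_path)
print(out_path)
```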

Tensorflow


Scikit Learn


  • Basic
    • Simple and efficient tool for data mining and data analysis
    • Built on NumPy, SciPy, Matplotlib
    • Design => Primarily three types of objects
      • Estimators
        • It estimates some parameter based on a dataset
        • It has a fit method
        • Example - imputer
      • Transformers
        • Takes input and returns output based on the learnings from fit
        • It also has a convenience function called fit_transform()
      • Predictors
        • fit and predict are two common functions
        • It also provides a score function that evaluates the predictions
        • Example - LinearRegression model
  • Commands
    • CSV
      • pd.read_csv("data.csv") => Returns a data frame
      • var.to_csv("data.csv") => Export
        • var.to_csv("data.csv", index=False) => Export without index
    • var["val"].dtype => Returns datatype of that column
    • var["val"].astype("category") => Converts datatype into category
    • pd.merge(var1, var2, right_on="val1", left_on="val2") => Merges two data frames, matching column "val2" of var1 with column "val1" of var2
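
The estimator, transformer, and predictor roles and the pandas commands above can be seen together in one small sketch. The two tables and their column names are invented for illustration. `SimpleImputer` is used as one concrete imputer and `LinearRegression` as the predictor; other scikit-learn classes follow the same pattern.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression

# Hypothetical tables that share a key under different column names.
sizes = pd.DataFrame(
    {"house_id": [1, 2, 3, 4], "area": [50.0, np.nan, 100.0, 125.0]}
)
prices = pd.DataFrame(
    {"id": [1, 2, 3, 4], "price": [100.0, 150.0, 200.0, 250.0]}
)

# left_on names the key column of the first frame, right_on of the second.
merged = pd.merge(sizes, prices, left_on="house_id", right_on="id")

# Estimator + transformer: fit learns the column mean,
# transform fills missing values with it; fit_transform does both.
imputer = SimpleImputer(strategy="mean")
X = imputer.fit_transform(merged[["area"]])

# Predictor: fit learns the coefficients, predict applies them,
# score returns R^2 for the given data.
y = merged["price"]
model = LinearRegression()
model.fit(X, y)
predictions = model.predict(X)
r2 = model.score(X, y)

# to_csv without a path returns the CSV text; index=False drops the index.
csv_text = merged.to_csv(index=False)

print(imputer.statistics_)   # the learned mean of the "area" column
print(predictions, r2)
print(csv_text)
```

Note that `index=False` takes the boolean `False`; the string `"False"` is truthy and would keep the index.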