Manav Goyal

Libraries - Machine Learning

Basic
Numpy
Pandas
Matplotlib
Tensorflow
Scikit Learn

Basic

Library > Modules (.py) > Functions & Global Variables
JSON
- import json => Import
- var = {"var1": val1, "var2": val2} => Initialize a dict
- json.dumps(var) => Serialize the dict to a JSON string
- json.loads(string) => Parse a JSON string into Python data
- Write in File
```
    with open("fileName", "w") as f:
        json.dump(var, f)  # Serialize var as JSON into the open file
```
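A minimal round-trip sketch of the JSON calls above, using only the standard library. The file name `example.json` and the sample keys are placeholders, not part of the original notes.

```python
import json
import os
import tempfile

# Sample data; the keys and values are placeholders.
var = {"var1": 1, "var2": [2, 3]}

# dumps/loads work on strings.
text = json.dumps(var)
parsed = json.loads(text)

# dump/load work on open file objects.
path = os.path.join(tempfile.mkdtemp(), "example.json")
with open(path, "w") as f:
    json.dump(var, f)
with open(path) as f:
    from_file = json.load(f)
```

Note the naming pattern: the functions ending in "s" (dumps, loads) handle strings, the ones without it handle files.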
Pickles
Other
- Quandl

Numpy

Basic
- For working with arrays; faster than plain Python lists
- import numpy as np => Import
- dir(np) => Lists all names (functions, classes, constants) available in the module
Commands
- Initialize
  - np.array([val1, val2, val3]) => Initialize an array
  - np.array([val1, val2, val3], dtype="int32") => Initialize an array with a datatype
  - np.array([[val1, val2, val3], [val4, val5, val6]]) => Initialize a 2D array
  - np.zeros(N) => Initialize array with all values equal to 0 having N elements
  - np.zeros((N, M)) => Initialize array with all values equal to 0 having N rows and M columns
  - np.ones(N) => Initialize array with all values equal to 1 having N elements
  - np.empty(N) => Allocate an array of N elements without initializing it; the values are arbitrary (whatever was in memory), not random
- var.shape => Returns a tuple of dimensions, (rows, columns) for a 2D array
- var.reshape(N1, N2) => Returns the data reshaped to N1 rows and N2 columns (var itself is unchanged)
- var.dtype => Returns datatype
- var[N1:N2] => Slice from N1 up to, but not including, N2; the result is a view on var
- var1 = var[N1:N2].copy() => Slice from N1 to N2 and store an independent copy in var1
- var.sum(axis=1) => Returns one sum per row; axis=0 returns one sum per column
- var.dot(var1) => Returns dot product, Dimension should be appropriate
- np.cross(var, var1) => Returns cross product, Dimension should be appropriate
- var.transpose() => Returns transpose
- np.arange(N) => Initialize an array of the integers 0 to N-1
- Sorting
  - np.sort(var) => Returns a sorted copy, sorted along the last axis (default axis=-1)
  - np.sort(var, axis=0) => Sorts along axis 0
  - np.sort(var, axis=0, kind="mergesort") => Sorts along axis 0, using the given algorithm
  - np.argsort(var) => Returns the indices that would sort the array
  - np.argmin(var) => Returns the index of the minimum element
  - np.argmax(var) => Returns the index of the maximum element
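A short sketch of the commands above on small sample arrays (the values are placeholders). It illustrates the axis argument of sum, the difference between a slice and a copy, and how argsort, argmin and argmax return positions rather than values.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

row_sums = a.sum(axis=1)         # one value per row: [3, 12]
col_sums = a.sum(axis=0)         # one value per column: [3, 5, 7]

# A plain slice is a view: writing to it changes the original.
view = a[0:1]
view[0, 0] = 99                  # a[0, 0] is now 99 as well

# .copy() makes an independent array.
safe = a[0:1].copy()
safe[0, 0] = -1                  # a is unaffected

# Sorting helpers return index positions.
b = np.array([30, 10, 20])
order = np.argsort(b)            # indices that sort b: [1, 2, 0]
sorted_b = b[order]              # [10, 20, 30]
lowest = np.argmin(b)            # index of the smallest value: 1
highest = np.argmax(b)           # index of the largest value: 0
```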

Pandas

CSV File

    from pandas import read_csv
    data = read_csv("fileName.csv") # Read CSV file and put it in RAM
    data = read_csv("URL") # Read CSV file
    data = read_csv(r"filePath") # Read CSV file
    print(data.shape) # Prints number of (rows, columns); shape is an attribute, not a method
    print(data.dtypes) # Prints data types of columns
    print(data.describe()) # Prints summary of data
    print(data.head(N)) # Prints first "N" rows, Default N = 5
    print(data.tail(N)) # Prints last "N" rows, Default N = 5
    print(data.iloc[N1, N2]) # Prints the value at row "N1", column "N2" (by position)
    print(data.iloc[N1:N2, N3:N4]) # Prints rows "N1" to "N2" and columns "N3" to "N4" (end positions excluded)
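A minimal sketch of the calls above. To keep it self-contained, an in-memory string stands in for a CSV file on disk; the column names and values are made-up sample data.

```python
from io import StringIO

import pandas as pd

# Inline CSV text stands in for a real file (hypothetical sample data).
csv_text = "name,age,city\nAsha,31,Pune\nRavi,25,Delhi\nMeera,42,Kochi\n"
data = pd.read_csv(StringIO(csv_text))

n_rows, n_cols = data.shape        # attribute, not a method: (3, 3)
first_two = data.head(2)           # first two rows
cell = data.iloc[1, 0]             # row 1, column 0: "Ravi"
block = data.iloc[0:2, 1:3]        # rows 0-1 and columns 1-2 (end excluded)
```

iloc selects by integer position, so it works the same way regardless of the column names or index labels.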

DataFrame

    import pandas as pd
    pd.DataFrame() # Create an empty data frame
    pd.DataFrame([[val1, val2], [val3, val4]]) # Create a data frame; each inner list is one row
    pd.DataFrame([[val1, val2], [val3, val4]], columns=["val5", "val6"]) # Create a data frame and give names to columns
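A small sketch of two common ways to build a data frame, from a list of rows and from a dict of columns. The column names "a" and "b" and the values are placeholders.

```python
import pandas as pd

# Each inner list is one row.
rows = [[1, 2], [3, 4]]
df = pd.DataFrame(rows, columns=["a", "b"])

# A dict maps each column name to that column's values.
df2 = pd.DataFrame({"a": [1, 3], "b": [2, 4]})
```

Both forms produce the same 2x2 frame. Passing two separate lists positionally, as in pd.DataFrame([1, 2], [3, 4]), would instead treat the second list as the row index.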

Matplotlib

    import matplotlib
    from matplotlib import pyplot as plt
    from matplotlib import style

    plt.plot([val1, val2], [val3, val4], linewidth=N, label="val") # Line Graph; linewidth is a number
    plt.bar([val1, val2], [val3, val4]) # Bar Graph, For categorical data
    plt.pie([val1, val2], labels=["val3", "val4"], colors=["val5", "val6"], startangle=90, explode=(0, 0.1)) # Pie Chart
    plt.hist([val1, val2]) # Histogram Graph, Data considered on x-axis
    plt.boxplot([val1, val2]) # Boxplot, Gives summary
    plt.violinplot([val1, val2]) # Violinplot, Gives summary & probability density
    plt.scatter([val1, val2], [val3, val4]) # Scatter plot

    plt.legend()
    plt.title("value") # Display at top of graph
    plt.xlabel("value") # Display at bottom of graph
    plt.ylabel("value") # Display at left of graph
    plt.show() # Display graph

    style.use("ggplot") # Apply the "ggplot" style; call this before plotting so it takes effect
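A minimal end-to-end sketch that puts the calls above in a working order: style first, then the plot, then labels, then output. It assumes a non-interactive "Agg" backend and writes to a temporary file instead of calling plt.show(), so it runs without a display; the data values, label text and file name are placeholders.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
from matplotlib import pyplot as plt
from matplotlib import style

style.use("ggplot")    # set the style before creating the plot

x = [1, 2, 3, 4]
y = [10, 20, 15, 30]   # sample values

plt.plot(x, y, linewidth=2, label="sales")
plt.title("Sample line graph")
plt.xlabel("quarter")
plt.ylabel("units")
plt.legend()           # uses the label given to plt.plot

# Save to a file; in an interactive session plt.show() would display it instead.
path = os.path.join(tempfile.mkdtemp(), "line.png")
plt.savefig(path)
```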

Tensorflow

Scikit Learn

Basic
- Simple and efficient tool for data mining and data analysis
- Built on NumPy, SciPy, Matplotlib
- Design => Primarily three types of objects
  - Estimators
    - It estimates some parameter based on a dataset
    - It has a fit method, which learns from the data
    - Example - imputer (e.g. SimpleImputer)
  - Transformers
    - transform() takes input and returns output based on what fit() learned
    - It also has a convenience function called fit_transform()
  - Predictors
    - fit and predict are the two common functions
    - It also provides a score function, which evaluates the predictions
    - Example - LinearRegression model
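The three roles above can be sketched with two scikit-learn classes: SimpleImputer (fit and transform) and LinearRegression (fit, predict and score). The toy data below is an assumption chosen so the result is easy to check by hand: the missing value is filled with the mean of 1 and 3, after which y = 2x + 1 fits exactly.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression

# Toy data: one feature with a missing value.
X = np.array([[1.0], [np.nan], [3.0]])
y = np.array([3.0, 5.0, 7.0])

# Estimator + transformer: fit learns the column mean, transform fills NaN with it.
imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)   # same as imputer.fit(X).transform(X)

# Predictor: fit learns the line, predict applies it, score returns R^2.
model = LinearRegression()
model.fit(X_filled, y)
pred = model.predict(np.array([[4.0]]))
r2 = model.score(X_filled, y)
```

Training on the same rows that are scored, as done here, only checks the mechanics; a real evaluation would score on held-out data.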
Commands
- CSV
  - pd.read_csv("data.csv") => Returns a data frame
  - var.to_csv("data.csv") => Export
    - var.to_csv("data.csv", index=False) => Export without index (pass the boolean False, not the string "False")
- var["val"].dtype => Returns datatype of that column
- var["val"].astype("category") => Converts datatype into category
- pd.merge(var1, var2, right_on="val1", left_on="val2") => Merges two data frames, matching column "val2" of var1 (left) against column "val1" of var2 (right)
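A small sketch of the pandas commands above on two made-up frames. The frame names, column names and values are hypothetical; the CSV is written to an in-memory buffer rather than "data.csv" so the example leaves no file behind.

```python
from io import StringIO

import pandas as pd

left = pd.DataFrame({"emp_id": [1, 2, 3], "dept": ["hr", "it", "hr"]})
right = pd.DataFrame({"id": [1, 2, 4], "salary": [100, 200, 400]})

# Convert a text column with few distinct values to the category dtype.
left["dept"] = left["dept"].astype("category")

# Inner join (the default): keep only rows whose keys match on both sides.
merged = pd.merge(left, right, left_on="emp_id", right_on="id")

# index=False omits the row index from the output.
buffer = StringIO()
merged.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

Because the key columns have different names, both "emp_id" and "id" appear in the merged frame; rows with emp_id 3 and id 4 are dropped since they have no match.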