Basic
- Library > Modules (.py) > Functions & Global Variables
- JSON
- Pickles
- Other
Numpy
- Basic
- For working with arrays; faster than plain Python lists
import numpy as np
=> Import
dir(np)
=> Lists all names (functions, classes, constants) available in the module
- Commands
- Initialize
np.array([val1, val2, val3])
=> Initialize an array
np.array([val1, val2, val3], dtype="int32")
=> Initialize an array with a datatype
np.array([[val1, val2, val3], [val4, val5, val6]])
=> Initialize a 2D array
np.zeros(N)
=> Initialize array with all values equal to 0 having N elements
np.zeros((N, M))
=> Initialize a 2D array with all values equal to 0 having N rows and M columns
np.ones(N)
=> Initialize array with all values equal to 1 having N elements
np.empty(N)
=> Initialize array with N elements without setting their values (contents are arbitrary, not random)
var.shape
=> Returns a tuple with the size of each dimension, e.g. (rows, columns) for a 2D array
var.reshape(N1, N2)
=> Returns the array with the given rows and columns; var itself keeps its shape, and N1 * N2 must equal the number of elements
var.dtype
=> Returns datatype
var[N1:N2]
=> Slice from index N1 up to, but not including, N2; the result is a view that shares data with var
var1 = var[N1:N2].copy()
=> Slice from N1 to N2 and copy without reference in var1
var.sum(axis=1)
=> Returns an array with the sum of each row; use axis=0 for the sum of each column
var.dot(var1)
=> Returns the dot product; the dimensions must be compatible
np.cross(var, var1)
=> Returns the cross product; the dimensions must be compatible
var.transpose()
=> Returns transpose
np.arange(N)
=> Initialize an array with the N values 0 to N-1
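A minimal sketch of the commands above on small, made-up arrays. The names grid, flat and the sample values are only for illustration.

```python
import numpy as np

# 2D array with an explicit datatype
grid = np.array([[1, 2, 3], [4, 5, 6]], dtype="int32")
print(grid.shape)   # (2, 3)
print(grid.dtype)   # int32

# axis=1 adds across each row, axis=0 adds down each column
row_sums = grid.sum(axis=1)   # [ 6 15]
col_sums = grid.sum(axis=0)   # [5 7 9]

# Dot product of a (2, 3) array with its (3, 2) transpose gives a (2, 2) array
products = grid.dot(grid.transpose())   # [[14 32] [32 77]]

# reshape returns a reshaped array; flat itself keeps shape (6,)
flat = np.arange(6)             # [0 1 2 3 4 5]
reshaped = flat.reshape(2, 3)

# A plain slice is a view on the same data; .copy() makes it independent
view = flat[1:4]
independent = flat[1:4].copy()
flat[1] = 99
print(view[0])          # 99, the view follows the change
print(independent[0])   # 1, the copy does not
```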
- Sorting
np.sort(var)
=> Returns a sorted copy, sorted along the last axis (axis 1 for a 2D array)
np.sort(var, axis=0)
=> Sorts on axis 0
np.sort(var, axis=0, kind="mergesort")
=> Sorts on axis 0, using given algorithm
np.argsort(var)
=> Returns the indices that would sort the array
np.argmin(var)
=> Returns the index of the minimum element
np.argmax(var)
=> Returns the index of the maximum element
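A short sketch of the sorting commands on made-up data, to show the difference between sorted values and the indices returned by argsort, argmin and argmax. The array names are only for illustration.

```python
import numpy as np

scores = np.array([30, 10, 20])

sorted_scores = np.sort(scores)   # [10 20 30]; scores itself is unchanged
order = np.argsort(scores)        # [1 2 0], indices that would sort scores
lowest_at = np.argmin(scores)     # 1, index of the value 10
highest_at = np.argmax(scores)    # 0, index of the value 30

table = np.array([[3, 1], [2, 4]])
by_row = np.sort(table)           # [[1 3] [2 4]], each row sorted
by_col = np.sort(table, axis=0)   # [[2 1] [3 4]], each column sorted
```

Indexing with the argsort result, scores[order], gives the same values as np.sort(scores).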
Pandas
- CSV File
from pandas import read_csv
data = read_csv("fileName.csv") # Read CSV file and put it in RAM
data = read_csv("URL") # Read CSV file
data = read_csv(r"filePath") # Read CSV file
print(data.shape) # Prints (rows, columns); shape is an attribute, not a method
print(data.dtypes) # Prints data types of columns
print(data.describe()) # Prints summary of data
print(data.head(N)) # Prints first "N" rows, Default N = 5
print(data.tail(N)) # Prints last "N" rows, Default N = 5
print(data.iloc[N1, N2]) # Prints the value at row "N1" and column "N2" (positions start at 0)
print(data.iloc[N1:N2, N3:N4]) # Prints rows "N1" to "N2"-1 and columns "N3" to "N4"-1
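A minimal sketch of reading and inspecting a CSV. To keep it self-contained it reads from an in-memory string instead of a file; the column names and values are made up.

```python
from io import StringIO
from pandas import read_csv

# Made-up CSV text standing in for a file such as "fileName.csv"
csv_text = StringIO("name,age,city\nAnn,31,Oslo\nBob,25,Rome\nCy,40,Lima\n")
data = read_csv(csv_text)

print(data.shape)      # (3, 3)
print(data.dtypes)     # data type of each column
print(data.head(2))    # first 2 rows

first_cell = data.iloc[0, 1]     # row 0, column 1 -> 31
block = data.iloc[0:2, 1:3]      # rows 0-1 and columns 1-2 (age, city)
```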
- DataFrame
import pandas as pd
pd.DataFrame() # Create an empty data frame
pd.DataFrame([[val1, val2], [val3, val4]]) # Create a data frame; each inner list is one row
pd.DataFrame([[val1, val2], [val3, val4]], columns=["val5", "val6"]) # Create a data frame and give names to columns
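A small sketch of building data frames from a list of rows. The values and column names are made up for illustration.

```python
import pandas as pd

empty = pd.DataFrame()   # no rows, no columns

# Each inner list is one row; columns= names the columns
frame = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
print(frame.shape)         # (2, 2)
print(frame["b"].tolist()) # [2, 4]
```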
Matplotlib
import matplotlib
from matplotlib import pyplot as plt
from matplotlib import style
plt.plot([val1, val2], [val3, val4], linewidth=N, label="val") # Line Graph, linewidth is a number
plt.bar([val1, val2], [val3, val4]) # Bar Graph, For categorical data
plt.pie([val1, val2], labels=["val3", "val4"], colors=["val5", "val6"], startangle=90, explode=(0, 0.1)) # Pie Chart
plt.hist([val1, val2]) # Histogram Graph, Data considered on x-axis
plt.boxplot([val1, val2]) # Boxplot, Gives summary
plt.violinplot([val1, val2]) # Violinplot, Gives summary & probability density
plt.scatter([val1, val2], [val3, val4]) # Scatter plot
plt.legend()
plt.title("value") # Display at top of graph
plt.xlabel("value") # Display at bottom of graph
plt.ylabel("value") # Display at left of graph
plt.show() # Display graph
style.use("ggplot") # Apply the "ggplot" style, Call before plotting
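A minimal sketch that puts the calls above in a working order: style first, then the plot, labels and legend. It assumes a non-interactive "Agg" backend and saves to an in-memory buffer so it runs without a display; in an interactive session plt.show() would be used instead. The data values are made up.

```python
from io import BytesIO

import matplotlib
matplotlib.use("Agg")  # assumption: no display available
from matplotlib import pyplot as plt
from matplotlib import style

style.use("ggplot")  # set the style before drawing

plt.plot([1, 2, 3], [2, 4, 1], linewidth=2, label="series A")  # line graph
plt.title("Example")   # shown at the top
plt.xlabel("x")        # shown at the bottom
plt.ylabel("y")        # shown at the left
plt.legend()           # uses the label= given to plot

axes = plt.gca()       # the axes the calls above drew on
buffer = BytesIO()
plt.savefig(buffer, format="png")  # stand-in for plt.show()
plt.close()
```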
Tensorflow
Scikit Learn
- Basic
- Simple and efficient tool for data mining and data analysis
- Built on NumPy, SciPy, Matplotlib
- Design => Primarily three types of object
- Estimators
- Estimates some parameters based on a dataset
- It has a fit method; an estimator that also changes data has a transform method too
- Example - imputer
- Transformers
- Its transform method takes input data and returns transformed data, using what was learned in fit
- It also has a convenience function called fit_transform()
- Predictors
- fit and predict are the two common methods
- It also provides a score method that evaluates the predictions
- Example - LinearRegression model
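A rough sketch of the three object types, assuming SimpleImputer as the imputer mentioned above and LinearRegression as the predictor. The data is made up, and scoring on the training data is only to show the call.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression

# Made-up data with one missing value
X = np.array([[1.0], [2.0], [np.nan], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])

# Estimator + transformer: fit learns the column mean, transform fills the gap
imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)   # same as fit(X) followed by transform(X)

# Predictor: fit learns the model, predict applies it, score evaluates it
model = LinearRegression()
model.fit(X_filled, y)
predictions = model.predict(X_filled)
r2 = model.score(X_filled, y)         # R^2 for regression models
```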
- Commands
- CSV
pd.read_csv("data.csv")
=> Returns a data frame
var.to_csv("data.csv")
=> Export
var.to_csv("data.csv", index=False)
=> Export without index
var["val"].dtype
=> Returns datatype of that column
var["val"].astype("category")
=> Returns the column converted to the category datatype; assign it back to keep the change
pd.merge(var1, var2, right_on="val1", left_on="val2")
=> Merges two data frames, matching column val2 of var1 with column val1 of var2
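A short sketch of the merge, astype and to_csv calls on two made-up data frames. It writes to an in-memory buffer instead of "data.csv" so it leaves no file behind; the frame and column names are only for illustration.

```python
from io import StringIO

import pandas as pd

people = pd.DataFrame({"person_id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
orders = pd.DataFrame({"buyer": [1, 1, 3], "item": ["pen", "ink", "cup"]})

# left_on names a column of the first frame, right_on a column of the second
merged = pd.merge(people, orders, left_on="person_id", right_on="buyer")
# Default is an inner join: Bob has no orders, so 3 rows remain

# astype returns a new column; assign it back to keep the change
merged["item"] = merged["item"].astype("category")

# index=False (the boolean, not the string "False") leaves out the row index
buffer = StringIO()
merged.to_csv(buffer, index=False)
header = buffer.getvalue().splitlines()[0]   # person_id,name,buyer,item
```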