Neural Network - Artificial Intelligence
- Input Data > Meaning > Learning and Improvement
- NLP (Natural Language Processing)
- How humans communicate using natural language
- Pipeline
- Sentence segmentation > Word tokenization > Stemming > Lemmatization > Stop-word removal
- Applications
- Speech Recognition, Sentiment Analysis, Machine Translation, Chatbots
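
The pipeline stages listed above can be sketched in plain Python. This is only a toy illustration, not a production approach: the stop-word set, the lemma lookup table, and the suffix-stripping stemmer are all hypothetical stand-ins for what a real NLP library would provide.

```python
import re

# Assumption: tiny illustrative word lists, not a real linguistic resource.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "be"}
LEMMAS = {"is": "be", "are": "be", "was": "be", "were": "be", "mice": "mouse"}

def segment_sentences(text):
    """Split text after sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    """Lowercase the sentence and extract word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())

def stem(word):
    """Crude suffix stripping; keeps at least three characters of the word."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def lemmatize(word):
    """Map a word to its dictionary form using the lookup table."""
    return LEMMAS.get(word, word)

def pipeline(text):
    """Run every stage in the order given in the notes, one token list per sentence."""
    result = []
    for sentence in segment_sentences(text):
        tokens = [lemmatize(stem(token)) for token in tokenize(sentence)]
        result.append([t for t in tokens if t not in STOP_WORDS])
    return result

print(pipeline("The cats were chasing mice. Dogs barked loudly!"))
# [['cat', 'chas', 'mouse'], ['dog', 'bark', 'loudly']]
```

Note how the stemmer turns "chasing" into the non-word "chas", while the lemma lookup maps "mice" to the real word "mouse". That is the practical difference between stemming and lemmatization.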
- NLU (Natural Language Understanding)
- Understanding what users say and their intent
- Challenges
- Lexical Ambiguity
- Syntactic (Structure) Ambiguity
- Semantic (Meaning) Ambiguity
- Pragmatic (Interpretation) Ambiguity
- NLG (Natural Language Generation)
- It should be Intelligent and Conversational
- Deal with structured data
- Text/Structure planning
- Corpus => Complete knowledge base
- Machine Learning
- Supervised
- Training Data
- Both Inputs & Outputs
- Classification
- Naive Bayes algorithm
- Unsupervised
- Only Inputs
- Clustering
- K-Means
- Reinforcement
- Reward/Penalty
- Q-Learning
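
As a minimal sketch of the reward/penalty idea behind Q-Learning, the standard tabular update nudges the value of a state-action pair toward the observed reward plus the discounted best value of the next state. The state names, action names, and the `alpha`/`gamma` defaults below are assumptions chosen for illustration.

```python
def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    q          : dict mapping state -> dict mapping action -> estimated value
    alpha      : learning rate (how far to move toward the new estimate)
    gamma      : discount factor for future rewards
    """
    best_next = max(q[next_state].values())   # value of the greedy next action
    td_target = reward + gamma * best_next    # reward now + discounted future
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]

# Hypothetical two-state table: moving right from s1 is already known to pay 1.0.
q_table = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 0.0, "right": 1.0},
}
new_value = q_update(q_table, "s0", "right", reward=0.0, next_state="s1")
print(new_value)  # 0.45
```

A positive reward raises the estimate and a negative reward (penalty) lowers it. Repeating the update over many steps propagates value backwards from rewarding states.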
- Genetic Algorithm
- Abstraction of real biological evolution
- Solve complex problems (e.g., NP-hard problems)
- Focus on optimization
- Population of possible solutions for a given problem
- From a group of individuals, the fittest survive
- Phenotype > Encode > Genotype > Decode > Phenotype
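
The ideas above (a population of candidate solutions, survival of the fittest, and the phenotype/genotype round trip) can be sketched as a small genetic algorithm. Everything specific here is an assumption for illustration: phenotypes are integers from 0 to 31, genotypes are 5-bit lists, and the `fitness` function is a made-up objective that peaks at 21.

```python
import random

N_BITS = 5  # assumption: phenotypes are integers in 0..31

def encode(x):
    """Phenotype (integer) -> genotype (list of bits, most significant first)."""
    return [(x >> i) & 1 for i in reversed(range(N_BITS))]

def decode(bits):
    """Genotype (list of bits) -> phenotype (integer)."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

def fitness(x):
    """Hypothetical objective to maximize; the best phenotype is 21."""
    return -(x - 21) ** 2

def evolve(generations=30, pop_size=8, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    population = [encode(rng.randrange(2 ** N_BITS)) for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        # Selection: rank by fitness and keep the better half unchanged.
        population.sort(key=lambda g: fitness(decode(g)), reverse=True)
        history.append(fitness(decode(population[0])))
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            mum, dad = rng.sample(survivors, 2)
            cut = rng.randrange(1, N_BITS)            # single-point crossover
            child = mum[:cut] + dad[cut:]
            child = [1 - b if rng.random() < mutation_rate else b
                     for b in child]                  # bit-flip mutation
            children.append(child)
        population = survivors + children
    best = max(population, key=lambda g: fitness(decode(g)))
    return decode(best), history

best, history = evolve()
print(best, history[-1])
```

Because the survivors are carried over unchanged, the best fitness never decreases from one generation to the next. This is one simple design choice among many possible selection schemes.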
- Learning Algorithms
- Outlier Detection
- IQR (Interquartile Range) method
- Outliers are identified using the lower and upper limits based on the IQR and filtered using boolean indexing
- Percentile method
- Outliers are identified using a high percentile (e.g., 99th percentile) and filtered using boolean indexing
- Z-score method
- Z-scores are calculated using stats.zscore(). Values within a specified range (e.g., ±3 standard deviations) are retained in a new DataFrame.
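
The three outlier-detection methods above can be sketched with pandas and SciPy. The sample data and the column name `value` are made up for illustration, and the 1.5 × IQR multiplier is the conventional choice rather than something stated in these notes.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data: 19 ordinary readings and one obvious outlier (95).
df = pd.DataFrame({"value": [10, 12, 11, 13, 12, 11, 10, 12, 13, 11,
                             12, 10, 11, 13, 12, 11, 10, 12, 11, 95]})

# IQR method: keep rows inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1 = df["value"].quantile(0.25)
q3 = df["value"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_filtered = df[(df["value"] >= lower) & (df["value"] <= upper)]

# Percentile method: drop rows above the 99th percentile.
upper_limit = df["value"].quantile(0.99)
percentile_filtered = df[df["value"] <= upper_limit]

# Z-score method: keep rows within 3 standard deviations of the mean.
z_scores = np.abs(stats.zscore(df["value"].to_numpy()))
zscore_filtered = df[z_scores < 3]

print(len(df), len(iqr_filtered), len(percentile_filtered), len(zscore_filtered))
# 20 19 19 19
```

Each filter builds a boolean mask and indexes the DataFrame with it, so the original `df` is left untouched and each method yields a new DataFrame.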