Glossary

**Machine Learning (ML)**: A field of artificial intelligence that uses statistical techniques to give computer systems the ability to learn from data and make decisions without being explicitly programmed.

**Margin**: In classification tasks, the margin refers to the distance between the decision boundary and the nearest data points from any class, with larger margins generally indicating better generalization.

**Markov Chain**: A stochastic model that describes a sequence of possible events where the probability of each event depends only on the state attained in the previous event, often used in modeling random processes.
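
A minimal sketch of simulating a Markov chain, using a hypothetical two-state "sunny/rainy" weather model with an illustrative transition matrix:

```python
import numpy as np

# Transition matrix: rows are current states, columns are next states.
# States: 0 = "sunny", 1 = "rainy" (hypothetical example).
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

rng = np.random.default_rng(0)
state = 0
trajectory = [state]
for _ in range(10):
    # The next state depends only on the current state (the Markov property).
    state = int(rng.choice(2, p=P[state]))
    trajectory.append(state)
print(trajectory)
```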

**Markov Decision Process (MDP)**: A mathematical framework used to model decision-making in situations where outcomes are partly random and partly under the control of a decision-maker, commonly used in reinforcement learning.

**Matrix Factorization**: A technique used in machine learning to decompose a matrix into two or more matrices, capturing latent factors that explain the relationships in the data, often used in recommendation systems.
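
As a minimal sketch, assuming a toy ratings matrix in which zeros mark unobserved entries and illustrative hyperparameters, the latent factors can be fit by gradient descent on the observed entries only:

```python
import numpy as np

rng = np.random.default_rng(0)
R = np.array([[5, 3, 0],
              [4, 0, 1],
              [1, 1, 5]], dtype=float)    # 0 marks an unobserved rating
mask = R > 0
k = 2                                      # number of latent factors
U = rng.normal(scale=0.1, size=(3, k))     # user factors
V = rng.normal(scale=0.1, size=(3, k))     # item factors

lr = 0.02
for _ in range(3000):
    E = (R - U @ V.T) * mask               # error on observed entries only
    U += lr * E @ V                        # gradient step on user factors
    E = (R - U @ V.T) * mask
    V += lr * E.T @ U                      # gradient step on item factors
print(np.round(U @ V.T, 2))                # reconstructed ratings, including the missing cells
```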

**Max Pooling**: A down-sampling technique used in Convolutional Neural Networks (CNNs) that reduces the dimensionality of feature maps by taking the maximum value within a defined window, helping to make the model more invariant to small translations in the input.
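
A minimal NumPy sketch of 2x2 max pooling with stride 2 (the feature-map values are hypothetical):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a 2-D feature map."""
    h, w = x.shape
    # Trim odd edges, split into 2x2 blocks, and take the max of each block.
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 1],
                 [0, 1, 5, 6],
                 [2, 2, 7, 8]])
print(max_pool_2x2(fmap))
# [[4 2]
#  [2 8]]
```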

**Maximum Likelihood Estimation (MLE)**: A method of estimating the parameters of a statistical model by maximizing the likelihood function, which measures how well the model explains the observed data.
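
For a Gaussian model the MLE has a closed form: the sample mean and the biased (1/n) sample variance. A minimal sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=1000)

# Maximizing the Gaussian likelihood yields the sample mean and the
# biased (1/n) sample variance as the parameter estimates.
mu_mle = data.mean()
sigma_mle = np.sqrt(((data - mu_mle) ** 2).mean())
print(mu_mle, sigma_mle)   # close to the true values 2.0 and 1.5
```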

**Mean Absolute Error (MAE)**: A loss function used in regression tasks that measures the average absolute difference between predicted values and actual values, providing a straightforward measure of average prediction error.

**Mean Squared Error (MSE)**: A common loss function used in regression tasks that calculates the average of the squared differences between predicted and actual values, penalizing larger errors more heavily.
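
A minimal sketch computing both of the losses above on hypothetical predictions; note how squaring gives the single largest error a bigger share of the MSE than of the MAE:

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

mae = np.abs(y_true - y_pred).mean()    # average absolute error
mse = ((y_true - y_pred) ** 2).mean()   # squaring penalizes large errors more
print(mae, mse)                         # 0.5 0.375
```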

**Median Absolute Deviation (MAD)**: A robust measure of statistical dispersion that calculates the median of the absolute deviations from the median of the dataset, often used to detect outliers.
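
A minimal sketch of MAD-based outlier detection; the 0.6745 scaling factor (which makes MAD comparable to the standard deviation under Gaussian data) and the cutoff of 3 are conventional but adjustable choices:

```python
import numpy as np

x = np.array([1.0, 1.2, 0.9, 1.1, 10.0])   # 10.0 is an outlier
med = np.median(x)
mad = np.median(np.abs(x - med))             # median absolute deviation

# Flag points more than ~3 scaled MADs from the median.
robust_z = 0.6745 * (x - med) / mad
print(x[np.abs(robust_z) > 3.0])             # [10.]
```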

**Mini-Batch Gradient Descent**: An optimization algorithm that combines aspects of both stochastic gradient descent (SGD) and batch gradient descent, updating the model weights based on small, random subsets of the training data.
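
A minimal sketch on a synthetic linear-regression problem; the learning rate and batch size are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + rng.normal(scale=0.1, size=200)

w = np.zeros(3)
lr, batch_size = 0.1, 32
for epoch in range(50):
    idx = rng.permutation(len(X))              # reshuffle the data each epoch
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]      # indices of one mini-batch
        grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)   # MSE gradient on the batch
        w -= lr * grad
print(w)   # approximately [2.0, -1.0, 0.5]
```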

**Minimum Description Length (MDL)**: A principle in information theory and statistics that seeks to minimize the total length of the description of a dataset, including both the model and the data encoded with the model, often used in model selection.

**Mixture Model**: A probabilistic model that represents a distribution as a combination of multiple simpler distributions, often used in clustering and density estimation tasks.

**Mode Collapse**: A problem in Generative Adversarial Networks (GANs) where the generator produces limited variations of outputs, leading to a lack of diversity in the generated data.

**Model Drift**: The phenomenon where the performance of a machine learning model degrades over time as the underlying data distribution changes, requiring model updates or retraining.

**Model Ensemble**: A method of combining multiple models to improve overall prediction accuracy, often used in techniques like bagging, boosting, and stacking.

**Model Interpretability**: The degree to which a human can understand and explain the predictions made by a machine learning model, often contrasted with model complexity.

**Model Overfitting**: A scenario where a machine learning model performs well on the training data but poorly on new, unseen data because it has learned to model the noise in the training data rather than the underlying patterns.

**Model Regularization**: Techniques used to prevent overfitting in machine learning models by adding a penalty to the loss function for model complexity, encouraging simpler models that generalize better.

**Monte Carlo Simulation**: A computational technique that uses repeated random sampling to estimate numerical quantities such as expectations, integrals, or probability distributions, often used in risk analysis and decision-making.
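
A classic minimal example: estimating π by sampling random points in a square and counting the fraction that land inside the unit circle:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
points = rng.uniform(-1, 1, size=(n, 2))       # random points in the square [-1, 1]^2
inside = (points ** 2).sum(axis=1) <= 1.0      # True for points inside the unit circle
print(4 * inside.mean())                       # approximately pi
```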

**Multi-Armed Bandit**: A problem in reinforcement learning where an agent must repeatedly choose between multiple options (arms) with unknown reward distributions, balancing exploration and exploitation to maximize cumulative reward.
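
A minimal sketch of the epsilon-greedy strategy on three hypothetical Bernoulli arms:

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = [0.2, 0.5, 0.8]                 # unknown to the agent
counts = np.zeros(3)
estimates = np.zeros(3)                      # running estimate of each arm's mean reward
epsilon = 0.1

for t in range(10_000):
    if rng.random() < epsilon:
        arm = int(rng.integers(3))           # explore: pick a random arm
    else:
        arm = int(np.argmax(estimates))      # exploit: pick the best arm so far
    reward = float(rng.random() < true_means[arm])   # Bernoulli reward
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]   # incremental mean

print(estimates)   # approaches [0.2, 0.5, 0.8]
```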

**Multi-Class Classification**: A type of classification problem where the goal is to categorize instances into one of three or more classes, as opposed to binary classification where there are only two classes.

**Multi-Label Classification**: A classification task where each instance can be assigned multiple labels, rather than just one, often used in text classification and image tagging.

**Multicollinearity**: A condition in regression analysis where independent variables are highly correlated, making it difficult to isolate the effect of each variable and leading to unreliable coefficient estimates.

**Multimodal Learning**: A machine learning approach that integrates and processes information from multiple data modalities, such as text, images, and audio, to make predictions or generate outputs.

**Multinomial Logistic Regression**: A generalization of logistic regression used to predict a categorical outcome with more than two possible classes, often used in multi-class classification tasks.
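
The core of multinomial logistic regression is the softmax function, which turns a vector of class scores into a probability distribution over classes; a minimal sketch with hypothetical logits:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical class scores (logits) for 2 samples over 3 classes,
# e.g. the output of X @ W + b in a fitted model.
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2,  3.0]])
print(softmax(logits))   # each row is a probability distribution over the classes
```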

**Mutual Information**: A measure of the mutual dependence between two variables, quantifying the amount of information obtained about one variable through observing the other, often used in feature selection.
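
A minimal sketch computing I(X; Y) from a hypothetical joint distribution of two binary variables:

```python
import numpy as np

# Hypothetical joint distribution of two binary variables X and Y.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p(x)
p_y = p_xy.sum(axis=0, keepdims=True)   # marginal p(y)

# I(X; Y) = sum over x, y of p(x, y) * log(p(x, y) / (p(x) * p(y)))
mi = (p_xy * np.log(p_xy / (p_x * p_y))).sum()
print(mi)   # about 0.19 nats; positive because X and Y are dependent
```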

**Momentum**: An optimization technique that accelerates gradient descent by accumulating an exponentially decaying sum of past gradients (a velocity vector), helping to smooth out oscillations and speed up convergence along consistent descent directions.
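
A minimal sketch of the classical (heavy-ball) momentum update on an ill-conditioned quadratic, where plain gradient descent tends to oscillate; the learning rate and momentum coefficient are illustrative:

```python
import numpy as np

def grad(w):
    """Gradient of a simple quadratic bowl f(w) = 0.5 * w @ A @ w."""
    A = np.diag([10.0, 1.0])   # ill-conditioned: curvatures differ by 10x
    return A @ w

w = np.array([1.0, 1.0])
v = np.zeros(2)
lr, beta = 0.05, 0.9
for _ in range(100):
    v = beta * v + grad(w)     # accumulate a velocity from past gradients
    w -= lr * v
print(w)                       # approaches the minimum at [0, 0]
```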

**Markov Blanket**: In a Bayesian network, the Markov blanket of a node is the set of nodes that shields it from the rest of the network, containing its parents, children, and the parents of its children.

**Matrix Decomposition**: The process of breaking down a matrix into simpler, constituent matrices, often used in techniques like Singular Value Decomposition (SVD) for dimensionality reduction and data compression.

**Manifold Learning**: A type of unsupervised learning that seeks to uncover the low-dimensional structure of high-dimensional data by assuming the data lies on a manifold, often used in dimensionality reduction techniques like t-SNE and Isomap.

**Meta-Learning**: A machine learning approach where models are trained to learn how to learn, enabling them to adapt quickly to new tasks with limited data, often used in few-shot learning.

**Model Calibration**: The process of adjusting the output probabilities of a machine learning model to better reflect the true likelihood of outcomes, often improving the reliability of probabilistic predictions.

**Markov Property**: The property of a stochastic process where the future state depends only on the current state and not on the sequence of events that preceded it, fundamental in Markov chains and Markov decision processes.

**Marginal Likelihood**: The probability of the observed data under a statistical model, integrating over all possible values of the model parameters, often used in Bayesian model selection.

**Maximum A Posteriori (MAP) Estimation**: A method of estimating the mode of the posterior distribution of a model's parameters, combining prior knowledge with observed data, often used in Bayesian inference.
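
A minimal sketch for a Bernoulli likelihood with a Beta prior, where the MAP estimate is the mode of the posterior Beta distribution (the prior pseudo-counts are hypothetical choices):

```python
# MAP estimate of a coin's heads probability with a Beta(a, b) prior.
heads, tails = 7, 3
a, b = 2.0, 2.0   # hypothetical prior pseudo-counts

# The posterior is Beta(a + heads, b + tails); its mode is the MAP estimate.
map_estimate = (heads + a - 1) / (heads + tails + a + b - 2)
print(map_estimate)   # about 0.667, pulled from the MLE 0.7 toward the prior mean 0.5
```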

**Manhattan Distance**: A distance metric that calculates the distance between two points by summing the absolute differences of their coordinates, often used in machine learning algorithms that rely on distance calculations, such as k-nearest neighbors.
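
A one-line NumPy sketch with hypothetical points:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.0])
manhattan = np.abs(a - b).sum()   # |1-4| + |2-0| + |3-3| = 5.0
print(manhattan)
```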

**Multivariate Gaussian Distribution**: A generalization of the Gaussian distribution to multiple variables, describing the joint distribution of multiple correlated random variables, often used in probabilistic modeling and pattern recognition.
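
A minimal sketch sampling from a hypothetical two-dimensional Gaussian with correlated components and checking the empirical correlation:

```python
import numpy as np

mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])   # positive correlation between the two dimensions

rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mean, cov, size=5000)
print(np.corrcoef(samples.T))   # off-diagonal entries close to 0.8
```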

**Maximum Entropy Principle**: A principle that suggests selecting the probability distribution with the highest entropy among those that satisfy certain constraints, often used in natural language processing and statistical modeling.

**Mixed Integer Programming (MIP)**: An optimization technique for finding the best solution to a problem in which some decision variables are constrained to be integers, often used in operations research and resource allocation problems.

**Marginalization**: The process of summing or integrating out a subset of variables in a probability distribution, often used in Bayesian inference to compute the marginal likelihood or to make predictions based on a subset of variables.
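
A minimal sketch marginalizing a hypothetical discrete joint distribution p(x, y) over each variable in turn:

```python
import numpy as np

# Hypothetical joint distribution p(x, y) over two discrete variables.
p_xy = np.array([[0.10, 0.20],
                 [0.30, 0.40]])

p_x = p_xy.sum(axis=1)   # marginalize out y: p(x) = sum over y of p(x, y)
p_y = p_xy.sum(axis=0)   # marginalize out x: p(y) = sum over x of p(x, y)
print(p_x, p_y)          # [0.3 0.7] [0.4 0.6]
```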

**Matrix Completion**: The task of filling in missing entries of a partially observed matrix, often used in collaborative filtering and recommendation systems to predict user preferences for unrated items.

**Mean Reciprocal Rank (MRR)**: A metric used to evaluate the effectiveness of a search or ranking algorithm by averaging the reciprocal ranks of the correct results across multiple queries.
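
A minimal sketch, assuming the rank of the first correct result is known for each query (with None when nothing correct was retrieved):

```python
# Rank of the first correct result per query (1 = top position);
# None means no correct result was retrieved, contributing 0 to the average.
ranks = [1, 3, None, 2]
mrr = sum(1.0 / r for r in ranks if r is not None) / len(ranks)
print(mrr)   # (1 + 1/3 + 0 + 1/2) / 4 ≈ 0.458
```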

**Mask R-CNN**: An extension of the Faster R-CNN model that adds a branch for predicting segmentation masks on each region of interest, enabling instance segmentation in addition to object detection.

**Mutual Information Maximization**: A technique in unsupervised learning that seeks to maximize the mutual information between learned representations and input data, often used in representation learning and clustering tasks.