Glossary

**Calibration**: The process of adjusting the output of a model so that its predicted probabilities better reflect the true likelihood of outcomes, often used in classification tasks.
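
A quick check is to bin the predicted probabilities and compare each bin's mean prediction with the observed positive rate; a minimal NumPy sketch with made-up predictions and labels:

```python
import numpy as np

# Hypothetical predicted probabilities and true binary labels.
probs = np.array([0.1, 0.2, 0.3, 0.6, 0.7, 0.8, 0.9, 0.95])
labels = np.array([0, 0, 1, 0, 1, 1, 1, 1])

# Bucket predictions into equal-width bins and compare the mean
# prediction with the observed frequency of positives in each bin.
bins = np.linspace(0.0, 1.0, 5)           # 4 equal-width bins
bin_ids = np.digitize(probs, bins[1:-1])  # bin index per prediction
for b in range(len(bins) - 1):
    mask = bin_ids == b
    if mask.any():
        print(f"bin {b}: mean predicted={probs[mask].mean():.2f}, "
              f"observed positive rate={labels[mask].mean():.2f}")
```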

**Capacity**: The ability of a model to fit a wide variety of functions; a high-capacity network can memorize its training data and overfit, while a low-capacity one may underfit.

**Catastrophic Forgetting**: A phenomenon in neural networks where learning new information leads to the forgetting of previously learned information, especially in continuous learning environments.

**Categorical Data**: Data that can be divided into distinct categories or groups, often represented as labels or discrete values in machine learning.

**Centroid**: The center of a cluster in clustering algorithms like K-means, representing the mean position of all the points in the cluster.
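
The centroid is just the per-dimension mean of a cluster's points; a minimal NumPy sketch with invented points:

```python
import numpy as np

# Hypothetical 2-D points assigned to one cluster.
cluster_points = np.array([[1.0, 2.0],
                           [2.0, 1.5],
                           [1.5, 2.5]])

centroid = cluster_points.mean(axis=0)  # mean position of the points
print(centroid)  # -> [1.5  2.0]
```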

**Classification**: The task of predicting the category or class of a given input, commonly used in supervised learning where the model assigns a label to each input.

**Clustering**: The task of grouping similar data points together based on their features, commonly used in unsupervised learning to identify patterns or structures in the data.

**Collaborative Filtering**: A technique used in recommendation systems to predict a user's preferences based on the preferences of similar users.
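
A minimal user-based sketch, with an invented ratings matrix: cosine similarity between users' rating vectors weights their known ratings to predict a missing one.

```python
import numpy as np

# Rows = users, columns = items; 0 marks an unrated item.
ratings = np.array([[5.0, 4.0, 0.0],
                    [4.0, 5.0, 3.0],
                    [1.0, 2.0, 5.0]])

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Predict user 0's rating of item 2 as a similarity-weighted
# average of the other users' ratings for that item.
target_user, target_item = 0, 2
sims = np.array([cosine(ratings[target_user], ratings[u])
                 for u in (1, 2)])
known = ratings[[1, 2], target_item]
prediction = (sims @ known) / sims.sum()
print(prediction)  # closer to user 1's rating, the more similar user
```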

**Confusion Matrix**: A table used to evaluate the performance of a classification model, showing the true positives, true negatives, false positives, and false negatives.
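
For binary labels the four cells can be counted directly; a sketch with illustrative arrays:

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
tn = np.sum((y_true == 0) & (y_pred == 0))  # true negatives
fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")   # TP=3 TN=3 FP=1 FN=1
```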

**Convolutional Neural Network (CNN)**: A type of deep learning model particularly effective for processing grid-like data such as images, using convolutional layers to detect patterns.

**Cross-Validation**: A technique used to assess the performance of a model by dividing the data into training and validation sets multiple times, ensuring the model generalizes well to unseen data.
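
A k-fold loop trains on k − 1 folds and validates on the held-out fold, rotating through all folds; a sketch using scikit-learn (assumed installed) on synthetic data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))             # synthetic features
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # synthetic labels

scores = []
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, val_idx in kf.split(X):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[val_idx], y[val_idx]))
print(f"mean accuracy over 5 folds: {np.mean(scores):.2f}")
```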

**Curse of Dimensionality**: The phenomenon where the performance of machine learning algorithms degrades as the number of features in the data increases, often due to the sparsity of data in high-dimensional spaces.

**Cumulative Gain**: A measure used in information retrieval to assess the effectiveness of a model by calculating the gain accumulated at each rank in a list of results.
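
Cumulative gain at rank k is the sum of the relevance scores of the top k results; the discounted variant (DCG) down-weights each gain by a log of its rank. A sketch with invented relevance scores:

```python
import numpy as np

relevances = np.array([3, 2, 3, 0, 1])  # graded relevance, in ranked order

def cumulative_gain(rel, k):
    return rel[:k].sum()

def dcg(rel, k):
    ranks = np.arange(1, k + 1)
    return np.sum(rel[:k] / np.log2(ranks + 1))

print(cumulative_gain(relevances, 3))  # 3 + 2 + 3 = 8
print(round(dcg(relevances, 3), 3))    # 3/log2(2) + 2/log2(3) + 3/log2(4)
```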

**Covariance**: A measure of how much two random variables change together, often used in statistics to understand the relationship between variables.
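
Sample covariance averages the product of deviations from each variable's mean (with an n − 1 denominator); a NumPy sketch:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

# Manual sample covariance: mean product of deviations, n - 1 denominator.
manual = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)
print(manual)              # 3.333...
print(np.cov(x, y)[0, 1])  # same value from NumPy's covariance matrix
```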

**Cost Function**: A function that quantifies the error between the predicted output and the actual output of a model, guiding the optimization process during training.
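
Mean squared error is one common cost function for regression; a minimal sketch:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average squared gap between prediction and truth."""
    return np.mean((y_true - y_pred) ** 2)

print(mse(np.array([3.0, 5.0, 2.0]), np.array([2.5, 5.0, 4.0])))  # ~1.417
```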

**Convergence**: The point during training when a machine learning model's performance stabilizes and further training does not significantly improve accuracy or reduce error.

**Contextual Bandits**: A variant of the multi-armed bandit problem in which the agent observes a context (state) before choosing an action and aims to maximize cumulative reward, with the key challenge being the balance between exploration and exploitation.

**Chaining**: A technique in which the output of one model is used as the input to another, as in classifier chains for multi-label classification and multi-stage prediction pipelines.

**Causal Inference**: The process of drawing conclusions about the causal relationships between variables, often using statistical techniques to distinguish correlation from causation.

**Cold Start Problem**: A challenge in recommendation systems where the system cannot make accurate predictions due to a lack of sufficient data, such as when new users or items are introduced.

**Confidence Interval**: A range of values that is likely to contain the true value of an estimated parameter, providing a measure of the uncertainty associated with the estimate.
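
For a sample mean whose sampling distribution is roughly normal, a 95% interval is the mean ± 1.96 standard errors; a sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(42)
sample = rng.normal(loc=10.0, scale=2.0, size=200)

mean = sample.mean()
stderr = sample.std(ddof=1) / np.sqrt(len(sample))  # standard error of the mean
low, high = mean - 1.96 * stderr, mean + 1.96 * stderr
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```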

**Confounding Variable**: A variable that influences both the independent and dependent variables in an analysis, potentially leading to a spurious association between them.

**Conjugate Gradient**: An optimization algorithm used to solve large-scale linear systems and to train certain types of machine learning models, especially those involving quadratic functions.
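
For a symmetric positive-definite system Ax = b, the textbook iteration fits in a few lines; a minimal NumPy sketch:

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10):
    """Solve Ax = b for symmetric positive-definite A."""
    x = np.zeros_like(b)
    r = b - A @ x          # residual
    p = r.copy()           # search direction
    rs_old = r @ r
    for _ in range(len(b)):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)      # optimal step along p
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p  # new A-conjugate direction
        rs_old = rs_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b))  # matches np.linalg.solve(A, b)
```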

**Constraint Satisfaction Problem (CSP)**: A problem where the goal is to find a solution that satisfies a set of constraints, often used in scheduling, resource allocation, and configuration problems.

**Content-Based Filtering**: A recommendation system approach that suggests items similar to those a user has liked in the past, based on the content or features of the items.

**Correlation**: A statistical measure that describes the strength and direction of a relationship between two variables, often used to identify patterns in data.

**Cross-Entropy Loss**: A loss function commonly used in classification tasks, measuring the difference between the true label distribution and the predicted distribution.
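
For a one-hot true label y and predicted distribution p, the loss is −Σᵢ yᵢ log pᵢ; a minimal sketch:

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """-sum(y * log(p)), with clipping to avoid log(0)."""
    p = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(p))

y = np.array([0.0, 1.0, 0.0])  # one-hot true class
p = np.array([0.1, 0.7, 0.2])  # predicted distribution
print(cross_entropy(y, p))     # -log(0.7) ≈ 0.357
```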

**Crowdsourcing**: The practice of obtaining input, ideas, or content by soliciting contributions from a large group of people, typically from an online community, often used to gather labeled data for machine learning.

**Cuckoo Search**: A nature-inspired optimization algorithm based on the brood parasitism behavior of some cuckoo species, often used for solving complex optimization problems.

**Cutoff Threshold**: The value at which a decision is made to classify an instance as belonging to one class or another, often used in binary classification tasks.

**CycleGAN**: A type of Generative Adversarial Network (GAN) designed for image-to-image translation tasks, where the goal is to learn mappings between two domains without paired examples.

**Cython**: A programming language that makes writing C extensions for Python as easy as Python itself, often used to improve the performance of Python code.

**Covariate Shift**: A situation in machine learning where the distribution of the input features changes between the training and testing phases, potentially leading to model performance degradation.

**Compositionality**: The principle that the meaning of a complex expression (such as a sentence) is determined by its structure and the meanings of its parts, relevant in natural language processing and symbolic AI.

**Contextual Embeddings**: Word embeddings that capture the context-dependent meanings of words, often generated using models like BERT, which take into account the surrounding words in a sentence.

**Conditional Independence**: A situation where two variables are independent given the value of a third variable, often used in probabilistic graphical models to simplify the representation of joint distributions.

**Concept Drift**: The change in the statistical properties of the target variable over time, which can affect the performance of a model if not properly addressed.

**Convex Function**: A function where the line segment between any two points on the graph of the function lies above or on the graph, important in optimization because convex functions have global minima that can be efficiently found.

**Cramming**: A term used in machine learning to describe the practice of training a model to perfectly fit the training data, often leading to overfitting and poor generalization to new data.

**Co-Training**: A semi-supervised learning algorithm that trains two classifiers on two different views of the data, each classifier helping to improve the other by labeling data points.

**Confidence Level**: The probability that a confidence interval contains the true value of an estimated parameter, often used in statistical analysis to express the degree of certainty in a measurement.

**Critical Path Method (CPM)**: A project management technique that identifies the sequence of crucial and interdependent steps that determine the minimum completion time for a project, relevant in scheduling and optimization.

**Convolution**: A mathematical operation used in Convolutional Neural Networks (CNNs) to combine two functions, often applied to images to detect patterns like edges, textures, and shapes.
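
In one dimension, a convolution slides a (flipped) kernel along a signal and sums elementwise products; NumPy's convolve shows the idea, which CNNs extend to 2-D with learned kernels:

```python
import numpy as np

signal = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
kernel = np.array([1.0, -1.0])  # a simple difference (edge-like) filter

# 'valid' keeps only positions where the kernel fully overlaps the signal,
# yielding the successive differences [1, 2, 4, 8].
print(np.convolve(signal, kernel, mode="valid"))
```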

**Convex Hull**: The smallest convex set that contains a given set of points, often used in computational geometry and optimization problems.
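
SciPy (assumed installed) computes hulls directly; a sketch with random 2-D points:

```python
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(0)
points = rng.random((20, 2))  # 20 random points in the unit square

hull = ConvexHull(points)
print(hull.vertices)          # indices of the points on the hull boundary
```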

**Curvature**: A measure of how much a curve deviates from being a straight line, relevant in optimization and the analysis of nonlinear models.

**Collaborative Tagging**: A system where users collectively add tags to items, often used in recommendation systems and information retrieval to enhance search and discovery.

**Cluster Analysis**: The process of partitioning a set of objects into subsets (clusters) so that objects in the same cluster are more similar to each other than to those in other clusters, used in data mining and pattern recognition.

**Confounding Factor**: See *Confounding Variable*.

**Conditional Probability**: The probability of an event occurring given that another event has already occurred, foundational in Bayesian inference and probabilistic modeling.
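
The definition P(A | B) = P(A and B) / P(B) can be checked by counting outcomes; a sketch with a fair six-sided die:

```python
# Fair die: A = "roll is even", B = "roll is greater than 3".
outcomes = [1, 2, 3, 4, 5, 6]
B = [o for o in outcomes if o > 3]        # {4, 5, 6}
A_and_B = [o for o in B if o % 2 == 0]    # {4, 6}

p_given = len(A_and_B) / len(B)           # P(A | B) = 2/3
print(p_given)
```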

**Convolutional Layer**: A layer in a Convolutional Neural Network (CNN) that applies convolution operations to the input, capturing local spatial patterns in data like images.

**Convex Optimization**: A subfield of optimization focused on convex functions, where any local minimum is also a global minimum, allowing for efficient solutions to complex problems.

**Content Addressable Memory (CAM)**: A type of computer memory that retrieves data based on its content rather than its address, used in associative memory systems.

**Canonical Correlation Analysis (CCA)**: A method of inferring relationships between two sets of variables by finding linear combinations of the variables that are maximally correlated with each other.

**Clique**: In graph theory, a subset of vertices of an undirected graph such that every two distinct vertices in the clique are adjacent, often used in the study of social networks and in optimization problems.

**Complexity Theory**: The study of the computational complexity of problems and algorithms, which classifies problems based on the resources required to solve them, such as time and space.

**Conditional GAN (cGAN)**: A type of Generative Adversarial Network (GAN) where both the generator and discriminator are conditioned on some additional information, such as class labels, to guide the generation process.