Glossary

**Sampling**: The process of selecting a subset of data from a larger dataset for analysis, often used in statistics and machine learning to estimate properties of the population without analyzing the entire dataset.

**Scikit-Learn**: A popular open-source Python library that provides simple and efficient tools for data mining and data analysis, built on NumPy, SciPy, and matplotlib.
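
For illustration, a minimal end-to-end use of the library's estimator API (fit, predict, score), which is its central convention; the dataset and model choice here are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# every scikit-learn estimator follows the same fit/predict/score pattern
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(clf.score(X_test, y_test))  # mean accuracy on held-out data
```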

**Self-Supervised Learning**: A type of machine learning where the model is trained on a task that does not require labeled data, using the data itself to generate labels, often used for pre-training models.

**Semantic Segmentation**: A computer vision task that involves classifying each pixel in an image into a predefined category, often used in image analysis to understand the context and contents of a scene.

**Sequential Model**: A type of machine learning model that processes input data in sequential order, often used in time series analysis and natural language processing; recurrent architectures such as RNNs and LSTMs are common examples.

**Sigmoid Function**: An activation function used in neural networks that outputs a value between 0 and 1, often used in binary classification tasks to map predictions to probabilities.
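
A one-line NumPy sketch of the function, σ(x) = 1 / (1 + e^(−x)):

```python
import numpy as np

def sigmoid(x):
    # maps any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-2.0, 0.0, 2.0])))  # ~[0.119, 0.5, 0.881]
```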

**Silhouette Score**: A metric used to evaluate the quality of clustering, measuring how similar a data point is to its own cluster compared to other clusters, with values ranging from -1 to 1.
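
A short sketch using scikit-learn's `silhouette_score` on an illustrative k-means clustering of synthetic blob data:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(silhouette_score(X, labels))  # close to 1: tight, well-separated clusters
```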

**Simplex Method**: A popular algorithm used in linear programming to find the optimal solution to linear optimization problems, often used in operations research and economics.
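
A small linear program solved with SciPy's `linprog`; note that recent SciPy versions default to the HiGHS solvers rather than the classic simplex implementation. The problem below is illustrative:

```python
from scipy.optimize import linprog

# maximize 3x + 2y  subject to  x + y <= 4,  x + 3y <= 6,  x, y >= 0
# linprog minimizes, so we negate the objective
res = linprog(c=[-3, -2], A_ub=[[1, 1], [1, 3]], b_ub=[4, 6],
              bounds=[(0, None), (0, None)])
print(res.x, -res.fun)  # optimal point and maximized objective value
```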

**Simulated Annealing**: An optimization technique inspired by the annealing process in metallurgy, where a system is gradually cooled to find the global minimum of a function, often used in combinatorial optimization problems.
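
A minimal, self-contained sketch of the algorithm; the test function, neighbor step, and cooling schedule are illustrative choices:

```python
import math
import random

def simulated_annealing(f, x, temp=10.0, cooling=0.995, steps=5000):
    best = x
    for _ in range(steps):
        candidate = x + random.uniform(-1, 1)        # random neighbor
        delta = f(candidate) - f(x)
        # always accept improvements; accept worse moves with
        # probability e^(-delta/T), which shrinks as the system cools
        if delta < 0 or random.random() < math.exp(-delta / temp):
            x = candidate
        if f(x) < f(best):
            best = x
        temp *= cooling                              # gradually lower the temperature
    return best

# multimodal function with its global minimum near x ~ -1.3
print(simulated_annealing(lambda x: x * x + 10 * math.sin(x), x=8.0))
```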

**Singular Value Decomposition (SVD)**: A matrix factorization technique that decomposes a matrix into the product of two orthogonal matrices and a diagonal matrix of singular values (A = UΣVᵀ), often used in dimensionality reduction, data compression, and recommendation systems.
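
A short NumPy sketch computing the decomposition and a low-rank approximation; the matrix is random and purely illustrative:

```python
import numpy as np

A = np.random.RandomState(0).rand(5, 3)
U, s, Vt = np.linalg.svd(A, full_matrices=False)  # A = U @ diag(s) @ Vt

# rank-2 approximation: keep only the two largest singular values
A2 = U[:, :2] @ np.diag(s[:2]) @ Vt[:2, :]
print(np.linalg.norm(A - A2))  # reconstruction error
```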

**Softmax Function**: An activation function used in neural networks, particularly in the output layer for multi-class classification, that converts logits into probabilities by normalizing them across classes.
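
A NumPy sketch of the standard numerically stable formulation (subtracting the maximum logit before exponentiating):

```python
import numpy as np

def softmax(logits):
    # subtracting the max leaves the result unchanged but avoids overflow
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()  # probabilities summing to 1

print(softmax(np.array([2.0, 1.0, 0.1])))  # ~[0.659, 0.242, 0.099]
```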

**Sparse Representation**: A way of representing data such that most elements are zero or near-zero, often used to reduce computational complexity and storage requirements in machine learning models.

**Spectral Clustering**: A clustering technique that uses the eigenvalues of a similarity matrix to perform dimensionality reduction before applying k-means or another clustering algorithm, often used for finding clusters in non-linearly separable data.
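
A sketch using scikit-learn's `SpectralClustering` on the classic two-moons dataset, where distance-based methods such as plain k-means struggle:

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons

# two interleaving half-circles: not linearly separable
X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)
labels = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                            random_state=0).fit_predict(X)
```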

**Splitting Criterion**: The rule or method used to decide how to split data at each node in a decision tree, commonly using metrics like Gini impurity or information gain to determine the best split.
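
A small sketch computing Gini impurity, 1 − Σ p_k², for the labels at a node:

```python
import numpy as np

def gini_impurity(labels):
    # 0 for a pure node; higher values mean more class mixing
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_impurity([0, 0, 0, 0]))  # 0.0  (pure node)
print(gini_impurity([0, 0, 1, 1]))  # 0.5  (maximally mixed for two classes)
```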

**Stacked Autoencoder**: A type of neural network used for unsupervised learning, where multiple autoencoders are stacked to form a deep network, often used for feature extraction and dimensionality reduction.

**Stacking**: An ensemble learning technique where multiple models are trained, and their predictions are combined by a meta-model to produce a final prediction, improving accuracy by leveraging the strengths of different models.
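
A sketch using scikit-learn's `StackingClassifier`; the particular base models and meta-model here are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)), ("svc", SVC())],
    final_estimator=LogisticRegression(),  # meta-model combining base predictions
)
print(stack.fit(X, y).score(X, y))
```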

**Stochastic Gradient Descent (SGD)**: An optimization algorithm that updates model parameters using the gradient of the loss function computed on a single training example or a small batch, often used in training large-scale machine learning models.
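
A minimal from-scratch sketch fitting a one-variable linear model with single-example updates; the learning rate and step count are illustrative:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(100)
y = 3.0 * X + 1.0 + 0.1 * rng.randn(100)   # noisy line y = 3x + 1

w, b, lr = 0.0, 0.0, 0.1
for step in range(2000):
    i = rng.randint(len(X))                # one random training example per step
    err = (w * X[i] + b) - y[i]
    w -= lr * err * X[i]                   # gradient of 0.5 * err**2 w.r.t. w
    b -= lr * err                          # ... and w.r.t. b

print(w, b)                                # should approach 3 and 1
```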

**Stop Words**: Commonly used words in a language that are often removed during text preprocessing in natural language processing tasks, as they are generally considered to add little value to the analysis (e.g., "and", "the", "is").
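
A toy sketch of stop-word filtering; real pipelines typically use a curated list from a library such as NLTK or spaCy:

```python
# words like "and", "the", "is" carry little topical signal
stop_words = {"and", "the", "is", "a", "of"}

tokens = "the cat and the dog is on the mat".split()
filtered = [t for t in tokens if t not in stop_words]
print(filtered)  # ['cat', 'dog', 'on', 'mat']
```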

**Structured Data**: Data that is organized in a predefined format, such as tables with rows and columns, often used in relational databases and easier to analyze using traditional machine learning techniques.

**Subsampling**: A technique used to reduce the size of a dataset by selecting a representative subset, often used to speed up computations and reduce memory usage in large datasets.

**Support Vector Machine (SVM)**: A supervised learning algorithm used for classification and regression tasks, which finds the hyperplane that best separates the data into classes by maximizing the margin between them.
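
A short scikit-learn sketch; the kernel and regularization parameter C shown are illustrative defaults:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)
# C trades margin width against misclassification of training points
clf = SVC(kernel="rbf", C=1.0).fit(X, y)
print(clf.score(X, y))
```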

**Supervised Learning**: A type of machine learning where the model is trained on a labeled dataset, learning to map input features to the correct output labels, commonly used in tasks like classification and regression.

**Survival Analysis**: A statistical method used to analyze the expected duration of time until one or more events happen, such as death in biological organisms or failure in mechanical systems, often used in medical research and reliability engineering.

**Swarm Intelligence**: A field of artificial intelligence inspired by the collective behavior of decentralized, self-organized systems, such as bird flocking or fish schooling, often used in optimization and robotics.

**Synthetic Data**: Data that is artificially generated rather than collected from real-world events, often used to augment training datasets in machine learning when real data is scarce or privacy is a concern.

**Symmetric Matrix**: A square matrix that is equal to its transpose, often encountered in optimization problems and spectral methods in machine learning.

**Systematic Sampling**: A statistical method where samples are selected at regular intervals from an ordered dataset, often used as an alternative to random sampling to ensure that the sample is spread evenly across the population.
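
A minimal NumPy sketch: pick a random starting offset, then take every k-th element of the ordered data:

```python
import numpy as np

data = np.arange(1000)         # ordered population
k = 10                         # sampling interval
start = np.random.randint(k)   # random starting offset in [0, k)
sample = data[start::k]        # 100 items spread evenly across the population
```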

**SMOTE (Synthetic Minority Over-sampling Technique)**: A technique used to address imbalanced datasets by generating synthetic examples of the minority class, improving the performance of classifiers on underrepresented classes.
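
A sketch using the third-party imbalanced-learn package (assumed installed, imported as `imblearn`):

```python
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# deliberately imbalanced: roughly 90% class 0, 10% class 1
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
print(Counter(y))

X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y_res))  # classes balanced via synthetic minority samples
```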

**Shapley Value**: A concept from cooperative game theory used to fairly distribute the total gains to players based on their contributions, often used in explainable AI to attribute the impact of features on a model’s predictions.

**Sparse Matrix**: A matrix in which most of the elements are zero, often used in large-scale machine learning problems to save space and reduce computation time.
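
A SciPy sketch comparing dense storage with compressed sparse row (CSR) storage for a mostly-zero matrix:

```python
import numpy as np
from scipy.sparse import csr_matrix

dense = np.zeros((1000, 1000))
dense[0, 1] = 5.0
dense[42, 7] = 3.0

sparse = csr_matrix(dense)  # stores only the 2 nonzero entries
print(sparse.nnz)                            # number of stored values
print(dense.nbytes, sparse.data.nbytes)      # dense vs. sparse payload size
```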

**Soft Clustering**: A type of clustering where data points can belong to multiple clusters with varying degrees of membership, often used in fuzzy clustering methods.

**Sparse Coding**: A method of learning a set of basis functions to represent data as sparse linear combinations of these functions, often used in image processing and feature learning.

**Similarity Measure**: A function that quantifies how similar two objects are, often used in clustering, nearest neighbor algorithms, and information retrieval, with common measures including cosine similarity and Euclidean distance.
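
A NumPy sketch of cosine similarity, the normalized dot product of two vectors:

```python
import numpy as np

def cosine_similarity(a, b):
    # 1 = same direction, 0 = orthogonal, -1 = opposite direction
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a, b = np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 6.0])
print(cosine_similarity(a, b))  # 1.0: parallel vectors
```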

**Semantic Web**: An extension of the World Wide Web through standards set by the World Wide Web Consortium (W3C), aiming to make web content more accessible to machines by using metadata and ontologies to describe the data.

**Sensor Fusion**: The process of integrating data from multiple sensors to produce more accurate, reliable, and comprehensive information than could be obtained from any single sensor, often used in robotics, autonomous vehicles, and IoT applications.

**Spectrogram**: A visual representation of the spectrum of frequencies in a signal as it varies with time, often used in audio signal processing and speech recognition.

**Sparse Autoencoder**: A variant of autoencoders that includes a sparsity constraint on the hidden units, encouraging the model to learn a more efficient, sparse representation of the input data.

**Sample Complexity**: The amount of data needed to train a model to achieve a certain level of performance, often used to compare the efficiency of different machine learning algorithms.

**Semantic Mapping**: The process of associating input data with meaningful concepts or categories, often used in natural language processing and computer vision to bridge the gap between raw data and human-understandable labels.

**Stochastic Process**: A collection of random variables representing the evolution of a system over time, often used in modeling time series data and in reinforcement learning.

**Statistical Power**: The probability that a test will correctly reject a false null hypothesis, often used in hypothesis testing to determine the sample size needed to detect an effect.

**Stop Gradient**: A technique used in training neural networks where the gradient is not backpropagated through certain parts of the network, effectively "freezing" those parts during optimization.

**Semi-Supervised Learning**: A type of machine learning that combines a small amount of labeled data with a large amount of unlabeled data during training, improving model performance when labeled data is scarce.

**Stationarity**: A property of a time series where its statistical properties, such as mean and variance, are constant over time, often assumed in time series analysis to apply certain forecasting methods.
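
A sketch using the augmented Dickey-Fuller test from statsmodels to compare a stationary series against a random walk; both series are synthetic:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.RandomState(0)
stationary = rng.randn(500)               # white noise: constant mean and variance
random_walk = np.cumsum(rng.randn(500))   # non-stationary: variance grows over time

for series in (stationary, random_walk):
    stat, pvalue = adfuller(series)[:2]
    # small p-value -> reject the unit-root (non-stationarity) hypothesis
    print(pvalue)
```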