Glossary

**Wasserstein Distance**: A measure of the distance between two probability distributions, also known as the Earth Mover's Distance, often used in the context of Generative Adversarial Networks (GANs) to improve training stability.
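
For one-dimensional samples, the distance can be computed directly; a minimal sketch with SciPy, using illustrative Gaussian samples:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=1000)   # samples from N(0, 1)
b = rng.normal(loc=2.0, scale=1.0, size=1000)   # samples from N(2, 1)

# For two 1-D empirical distributions, the result approximates the "earth
# moving" cost; here it should be close to the difference in means (about 2).
print(wasserstein_distance(a, b))
```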

**Weak Learner**: A machine learning model that performs slightly better than random guessing, often used in ensemble methods like boosting where multiple weak learners are combined to form a strong learner.
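
A minimal sketch of combining weak learners by boosting, assuming scikit-learn >= 1.2 (where the base learner is passed as `estimator`) and synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Each weak learner is a depth-1 decision tree (a "stump"); boosting combines
# many of them into a strong ensemble.
stump = DecisionTreeClassifier(max_depth=1)
model = AdaBoostClassifier(estimator=stump, n_estimators=100, random_state=0)
model.fit(X, y)
```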

**Weight Initialization**: The process of setting the initial weights of a neural network before training begins, crucial for ensuring proper convergence during training, with common methods including Xavier initialization and He initialization.
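
A minimal NumPy sketch of the two initialization schemes mentioned above; the layer sizes are illustrative:

```python
import numpy as np

def xavier_init(fan_in, fan_out, rng=np.random.default_rng()):
    # Xavier/Glorot: variance scaled by both fan-in and fan-out,
    # suited to tanh/sigmoid activations.
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def he_init(fan_in, fan_out, rng=np.random.default_rng()):
    # He: variance scaled by fan-in only, suited to ReLU activations.
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

W1 = xavier_init(784, 256)
W2 = he_init(256, 10)
```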

**Weight Decay**: A regularization technique where a small penalty is added to the loss function based on the magnitude of the weights, helping to prevent overfitting by discouraging large weights in the model.
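
A minimal sketch of adding an L2 weight-decay penalty to a hypothetical mean-squared-error loss (`lam` is the decay coefficient); deep learning frameworks typically expose the same idea as a `weight_decay` option on the optimizer:

```python
import numpy as np

def loss_with_weight_decay(y_true, y_pred, weights, lam=1e-4):
    mse = np.mean((y_true - y_pred) ** 2)       # data-fitting term
    penalty = lam * np.sum(weights ** 2)        # discourages large weights
    return mse + penalty
```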

**Weighted Average Precision (WAP)**: A metric used to evaluate classification models, particularly in imbalanced datasets, by calculating a weighted average of the precision scores across all classes.
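
A minimal scikit-learn sketch of support-weighted precision; the labels are illustrative:

```python
from sklearn.metrics import precision_score

y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 0, 1, 0, 2]

# Each class's precision is weighted by its number of true instances (support).
print(precision_score(y_true, y_pred, average="weighted"))
```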

**Whitening**: A data preprocessing step where the input data is linearly transformed to have zero mean and unit variance, and to remove any correlations between features, often used to improve the performance of machine learning algorithms.
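
A minimal NumPy sketch of PCA whitening, one common way to perform the transformation:

```python
import numpy as np

def pca_whiten(X, eps=1e-8):
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Project onto the eigenvectors (decorrelate) and divide by the standard
    # deviation along each direction; the result has ~identity covariance.
    return (X_centered @ eigvecs) / np.sqrt(eigvals + eps)
```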

**Wide and Deep Learning**: A neural network architecture that combines linear models (wide) and deep neural networks (deep) to capture both memorization and generalization, often used in recommendation systems.

**Window Function**: A mathematical function that is applied to a signal to isolate a portion of it for analysis, often used in signal processing to reduce spectral leakage in Fourier transforms.
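
A minimal NumPy sketch of applying a Hann window before an FFT; the signal parameters are illustrative:

```python
import numpy as np

fs = 1000                                  # sample rate in Hz
t = np.arange(0, 1, 1 / fs)
signal = np.sin(2 * np.pi * 50 * t)        # 50 Hz tone

window = np.hanning(len(signal))           # Hann window tapers the edges
spectrum = np.fft.rfft(signal * window)    # windowed FFT, less spectral leakage
```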

**Word2Vec**: A group of models used to produce word embeddings, where words are represented as vectors in a continuous vector space, capturing semantic relationships between words, often used in natural language processing.
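
A minimal training sketch, assuming gensim 4.x (where the dimensionality argument is `vector_size`) and a toy corpus:

```python
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=20)
print(model.wv.most_similar("cat"))    # nearest words in the embedding space
```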

**Wrapper Method**: A feature selection technique where subsets of features are selected and evaluated based on the performance of a machine learning model, often using cross-validation to assess the effectiveness of each feature subset.
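
A minimal sketch of one wrapper method, greedy forward selection scored by cross-validation; the estimator choice and helper name are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def forward_select(X, y, n_features):
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < n_features:
        # Score every candidate subset (current selection plus one feature).
        scores = {f: cross_val_score(LogisticRegression(max_iter=1000),
                                     X[:, selected + [f]], y, cv=5).mean()
                  for f in remaining}
        best = max(scores, key=scores.get)   # feature giving the best CV score
        selected.append(best)
        remaining.remove(best)
    return selected
```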

**WGAN (Wasserstein GAN)**: A type of Generative Adversarial Network that uses the Wasserstein distance as a loss function, leading to more stable training and better quality of generated samples compared to traditional GANs.

**Windowed Time Series**: A method in time series analysis where data is divided into overlapping or non-overlapping windows, allowing for the analysis of patterns and trends within each window independently.

**Word Embedding**: A representation of words in a continuous vector space, where semantically similar words are mapped to nearby points, commonly used in natural language processing tasks.

**WaveNet**: A deep generative model for raw audio developed by DeepMind, known for its ability to produce highly realistic speech and music by autoregressively modeling the audio waveform sample by sample.

**Warm Start**: A technique in optimization and machine learning where a model is initialized with parameters from a previous run, rather than random initialization, allowing for faster convergence and improved performance.
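
A minimal scikit-learn sketch: with `warm_start=True`, refitting a random forest keeps the existing trees and only grows the newly requested ones; the data are synthetic:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

model = RandomForestClassifier(n_estimators=100, warm_start=True, random_state=0)
model.fit(X, y)                 # first fit: builds 100 trees

model.n_estimators = 150
model.fit(X, y)                 # second fit: reuses the 100 trees, adds 50 more
```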

**Weak Supervision**: A form of supervised learning where the labels used to train a model are noisy, incomplete, or generated by heuristics or weak models, often used when acquiring fully accurate labels is costly or impractical.

**Weighted Sum Model (WSM)**: A decision-making technique that combines multiple criteria by assigning weights to each criterion and calculating a weighted sum, often used in multi-criteria decision analysis.
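
A minimal sketch of the weighted sum calculation; the alternatives, criterion scores, and weights are illustrative:

```python
import numpy as np

# Rows are alternatives, columns are criteria; weights sum to 1.
scores = np.array([[7, 9, 4],
                   [6, 5, 8],
                   [9, 6, 5]])
weights = np.array([0.5, 0.3, 0.2])

wsm = scores @ weights            # weighted sum per alternative
best = int(np.argmax(wsm))        # index of the preferred alternative
```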

**Wavelet Transform**: A mathematical transform that decomposes a signal into components at various scales, allowing for analysis of both frequency and time, often used in signal processing and image compression.
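
A minimal sketch of a single-level discrete wavelet transform, assuming the PyWavelets (`pywt`) package; the wavelet choice is illustrative:

```python
import numpy as np
import pywt

signal = np.sin(np.linspace(0, 8 * np.pi, 256))

# Decompose into coarse (approximation) and fine (detail) coefficients,
# then reconstruct the original signal from them.
approx, detail = pywt.dwt(signal, "db2")
reconstructed = pywt.idwt(approx, detail, "db2")
```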

**Word Error Rate (WER)**: A common metric used to evaluate the performance of speech recognition systems, calculated as the number of word-level errors (substitutions, deletions, and insertions) divided by the total number of words in the reference transcript.
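
A minimal sketch computing WER as word-level edit distance divided by the reference length:

```python
def word_error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance between the two word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))  # 1/6
```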

**Weighted Cross-Entropy**: A variation of the cross-entropy loss function that assigns different weights to different classes, often used in imbalanced classification problems to give more importance to the minority class.
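
A minimal NumPy sketch for the binary case, where the positive (minority) class receives a larger weight; the weights and probabilities are illustrative:

```python
import numpy as np

def weighted_cross_entropy(y_true, p_pred, w_pos=5.0, w_neg=1.0, eps=1e-12):
    p = np.clip(p_pred, eps, 1 - eps)
    # Errors on the positive class are penalized w_pos / w_neg times as much.
    loss = -(w_pos * y_true * np.log(p) + w_neg * (1 - y_true) * np.log(1 - p))
    return loss.mean()

y_true = np.array([1, 0, 0, 1])
p_pred = np.array([0.7, 0.2, 0.4, 0.9])
print(weighted_cross_entropy(y_true, p_pred))
```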

**Within-Class Scatter Matrix**: A matrix used in Linear Discriminant Analysis (LDA) that measures the scatter or spread of the features within each class, helping to find the best linear combination of features that separates the classes.
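
A minimal NumPy sketch of the matrix: the scatter of each class's samples around that class's mean, summed over classes:

```python
import numpy as np

def within_class_scatter(X, y):
    n_features = X.shape[1]
    S_W = np.zeros((n_features, n_features))
    for c in np.unique(y):
        Xc = X[y == c]
        diff = Xc - Xc.mean(axis=0)     # deviations from the class mean
        S_W += diff.T @ diff
    return S_W
```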

**Window Sliding Technique**: A method, also known as the sliding window technique, where a fixed-size window is moved across text or time series data to extract features or detect patterns, often used in natural language processing and signal processing.
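
A minimal NumPy sketch of extracting fixed-size windows from a 1-D series (requires NumPy >= 1.20 for `sliding_window_view`):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

series = np.arange(10)
windows = sliding_window_view(series, window_shape=4)   # shape (7, 4), stride 1
every_other = windows[::2]                              # windows at even offsets only
```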

**WordNet**: A lexical database of English that groups words into sets of synonyms, providing definitions and information on the relationships between these synonym sets, often used in natural language processing for tasks like word sense disambiguation.
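
A minimal usage sketch with NLTK's WordNet interface (assumes the WordNet corpus has been downloaded via `nltk.download`):

```python
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)          # fetch the corpus once

for syn in wn.synsets("bank")[:3]:            # synonym sets containing "bank"
    print(syn.name(), "-", syn.definition())

# Hypernyms encode "is-a" relationships between synsets.
print(wn.synset("dog.n.01").hypernyms())
```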

**Watchdog Timer**: A hardware or software timer that resets the system or triggers an alert if the system fails to respond within a predefined time, often used in safety-critical systems to detect and recover from malfunctions.

**Whisker Plot**: A graphical representation, more commonly called a box-and-whisker plot or box plot, that shows the distribution of a dataset by displaying the minimum, first quartile, median, third quartile, and maximum values, often used in descriptive statistics to summarize data.

**Warm-Up Learning Rate**: A technique where the learning rate starts small and gradually increases to the desired value during the initial stages of training, often used to stabilize training in deep learning models.
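
A minimal sketch of a linear warm-up schedule; the target rate and step counts are illustrative:

```python
def warmup_lr(step, target_lr=1e-3, warmup_steps=1000):
    # Ramp linearly from (nearly) zero to target_lr, then hold it constant.
    if step < warmup_steps:
        return target_lr * (step + 1) / warmup_steps
    return target_lr

lrs = [warmup_lr(s) for s in range(2000)]
```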