Written by Sebastian F. Genter
L
L₀ regularization
Sparsity-inducing regularization that penalizes the count of non-zero weights. Encourages feature selection by eliminating less important parameters, often used in model compression. Computationally challenging due to non-convex nature.
L₁ loss
Robust regression loss measuring absolute differences (Mean Absolute Error). Formula: MAE = (1/n) Σᵢ |yᵢ − ŷᵢ|. Less sensitive to outliers than L₂ loss, creates piecewise linear optimization landscape.
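A minimal pure-Python comparison (toy numbers, all invented for illustration) of how the two losses react to a single outlier:

```python
# Toy comparison of L1 (MAE) and L2 (MSE) loss on data with one outlier.

def mae(y_true, y_pred):
    """Mean Absolute Error: average of |y - y_hat|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean Squared Error: average of (y - y_hat)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [1.0, 2.0, 3.0, 100.0]   # last point is an outlier
y_pred = [1.0, 2.0, 3.0, 4.0]

print(mae(y_true, y_pred))  # 24.0   -> outlier contributes linearly
print(mse(y_true, y_pred))  # 2304.0 -> outlier dominates quadratically
```

The outlier's residual of 96 enters the L₁ loss once but is squared (9216) under L₂, which is why MAE is the more robust choice for noisy targets.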
L₁ regularization
(Lasso) Adds penalty proportional to absolute weight values: λ Σᵢ |wᵢ|. Produces sparse models by driving unimportant weights to exactly zero, enabling automatic feature selection.
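A sketch of why L₁ yields exact zeros: the proximal operator of the L₁ penalty is soft thresholding, which clamps small weights to 0. The weight values below are arbitrary illustrations:

```python
# Soft-thresholding: the proximal operator of the L1 penalty.
# Weights whose magnitude falls below lam are set to exactly zero,
# which is how Lasso produces sparse models.

def soft_threshold(w, lam):
    """Shrink w toward zero by lam; clamp to 0 inside [-lam, lam]."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

weights = [2.0, -0.25, 1.5, -0.1, 3.0]
sparse = [soft_threshold(w, lam=0.5) for w in weights]
print(sparse)  # [1.5, 0.0, 1.0, 0.0, 2.5]
```

Contrast with L₂ regularization, which shrinks every weight multiplicatively but never lands exactly on zero.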
L₂ loss
(Squared Error) Quadratic loss measuring squared deviations: Σᵢ (yᵢ − ŷᵢ)². Strongly convex optimization landscape, sensitive to outliers. Basis for ordinary least squares regression.
L₂ regularization
(Ridge) Penalizes squared magnitude of weights: λ Σᵢ wᵢ². Shrinks coefficients without eliminating features, improves conditioning of ill-posed problems. Equivalent to Gaussian prior in Bayesian interpretation.
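The shrink-without-eliminate behavior is easiest to see in the one-feature, no-intercept case, where ridge regression has the scalar closed form w = Σxy / (Σx² + λ). Toy data below:

```python
# One-feature ridge regression (no intercept): the closed form
# w = sum(x*y) / (sum(x^2) + lam) shows how lambda shrinks the
# coefficient toward zero without ever making it exactly zero.

def ridge_weight(xs, ys, lam):
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]                   # true slope is 2
print(ridge_weight(xs, ys, lam=0.0))   # 2.0 (ordinary least squares)
print(ridge_weight(xs, ys, lam=14.0))  # 1.0 (heavily shrunk, still nonzero)
```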
label
Target variable being predicted in supervised learning. For image classification: object categories; for regression: continuous values. Labels form the ground truth for training and evaluation.
labeled example
Data point containing both input features and corresponding target value. Essential for supervised learning algorithms. Acquisition costs vary by domain - medical labels require expert annotation.
label leakage
When features unintentionally contain target information. Example: Including "patient cured" flag in features when predicting recovery time. Causes inflated performance metrics and failed real-world deployment.
lambda
Symbol (λ) representing regularization strength hyperparameter. Controls tradeoff between fitting training data and model simplicity. Higher λ increases regularization effect.
LaMDA
(Language Model for Dialogue Applications) Google's conversational AI focused on natural dialogue flow. Employs transformer architecture with modifications for improved coherence and factual grounding.
landmarks
Annotated reference points in visual data. Used in:
- Facial recognition (eye corners, nose tip)
- Medical imaging (anatomical markers)
- AR applications
Enables pose estimation and alignment.
language model
Statistical model learning probability distributions over token sequences. Predicts next tokens given context. Evolution: N-gram → RNN → Transformer models. Basis for modern NLP systems.
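The N-gram stage of that evolution fits in a few lines: a bigram model estimates P(next | current) from raw counts. The corpus below is a made-up toy sentence:

```python
# Minimal bigram language model: estimate P(next | current) from counts.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ran".split()

counts = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    counts[cur][nxt] += 1

def next_prob(cur, nxt):
    """P(nxt | cur) by maximum likelihood (no smoothing)."""
    total = sum(counts[cur].values())
    return counts[cur][nxt] / total if total else 0.0

print(next_prob("the", "cat"))  # 2/3: "the" is followed by "cat" twice, "mat" once
```

Modern transformer language models predict the same conditional distribution, just with learned contextual representations instead of raw counts.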
large language model
Transformer-based models with billions of parameters trained on web-scale text. Exhibit emergent abilities like reasoning and in-context learning. Examples: GPT-4, PaLM, Claude. Require distributed training infrastructure.
latent space
Lower-dimensional representation capturing data essentials. Learned through autoencoders or GANs. Enables operations like vector arithmetic on semantic concepts (king - man + woman = queen).
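The king − man + woman example can be sketched with hand-crafted toy vectors; real latent vectors are learned, not designed, so the dimensions below (gender, royalty, …) are purely illustrative:

```python
# Toy illustration of vector arithmetic in an embedding space.
# These 3-d vectors are hand-crafted for illustration only.
import math

emb = {
    "king":  [1.0, 1.0, 0.0],   # [male, royal, extra]
    "queen": [0.0, 1.0, 0.0],
    "man":   [1.0, 0.0, 0.0],
    "woman": [0.0, 0.0, 0.0],
}

def add(a, b):  return [x + y for x, y in zip(a, b)]
def sub(a, b):  return [x - y for x, y in zip(a, b)]

# king - man + woman, then find the nearest word in the space
target = add(sub(emb["king"], emb["man"]), emb["woman"])
nearest = min(emb, key=lambda w: math.dist(emb[w], target))
print(nearest)  # queen
```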
layer
Neural network component transforming inputs through parameterized operations. Types:
- Dense (linear + activation)
- Convolutional (spatial filters)
- Attention (contextual weighting)
- Normalization (distribution scaling)
Layers API (tf.layers)
Legacy TensorFlow interface for layer construction. Superseded by Keras layers but still found in older codebases. Provides basic layers like Dense, Conv2D with manual weight management.
leaf
Terminal node in decision trees producing final predictions. Contains class distribution (classification) or constant value (regression). Depth to leaves indicates model complexity.
Learning Interpretability Tool (LIT)
Interactive platform for model analysis. Features:
- Attribution visualization
- Counterfactual generation
- Dataset slicing
- Performance metrics
Supports text, image, and tabular models.
learning rate
Most critical hyperparameter controlling step size in gradient descent. Too high causes divergence; too low slows convergence. Adaptive methods (Adam) automate per-parameter adjustments.
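The divergence/convergence tradeoff shows up even on a one-dimensional quadratic. A sketch with invented step sizes:

```python
# Gradient descent on f(w) = (w - 3)^2, whose gradient is 2*(w - 3).
# lr = 0.1 converges toward the minimum at w = 3; lr = 1.1 overshoots
# and diverges, with the error growing each step.

def descend(lr, steps=20, w=0.0):
    for _ in range(steps):
        w -= lr * 2 * (w - 3)
    return w

print(descend(lr=0.1))  # close to 3 (converges)
print(descend(lr=1.1))  # huge magnitude (diverges)
```

For this quadratic the error is multiplied by |1 − 2·lr| per step, so any lr above 1.0 diverges; adaptive optimizers like Adam sidestep this tuning by rescaling each parameter's step individually.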
least squares regression
Linear modeling technique minimizing sum of squared residuals. Closed-form solution: ŵ = (XᵀX)⁻¹Xᵀy. Assumptions: linearity, homoscedasticity, independence, normality. Foundation for econometrics.
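In the single-feature case the normal equations reduce to the textbook formulas b = cov(x, y)/var(x) and a = mean(y) − b·mean(x). A sketch on an exactly linear toy dataset:

```python
# Ordinary least squares for y = a + b*x, scalar case of the
# normal-equations solution (X^T X)^-1 X^T y.

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b          # (intercept, slope)

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]          # exactly y = 1 + 2x
print(fit_line(xs, ys))            # (1.0, 2.0)
```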
Levenshtein Distance
Edit distance measuring minimum single-character operations (insert, delete, substitute) to transform strings. Used in spell check, DNA alignment, OCR correction.
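The distance is computed with the classic dynamic-programming recurrence, here in a memory-efficient two-row form:

```python
# Classic dynamic-programming Levenshtein distance.
# prev[j] holds the edit distance between a[:i-1] and b[:j].

def levenshtein(a, b):
    prev = list(range(len(b) + 1))        # distance from "" to b[:j]
    for i, ca in enumerate(a, 1):
        cur = [i]                         # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,              # delete ca
                cur[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb), # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```

The "kitten" → "sitting" example needs one substitution (k→s), one substitution (e→i), and one insertion (g), matching the computed distance of 3.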
linear
Mathematical relationship expressible through addition and scalar multiplication. Linear models have additive, non-interacting features. Contrast with nonlinear relationships requiring polynomial terms or neural networks.
linear model
Predictive function of form ŷ = w₁x₁ + w₂x₂ + … + wₙxₙ + b. Limited to modeling additive effects but highly interpretable. Basis for many statistical methods: regression, SVMs with linear kernels.
linear regression
Continuous outcome modeling using linear predictor. Evaluates feature importance through coefficients. Diagnostic metrics: R², residual plots, p-values. Foundation for econometrics.
LIT
→ See Learning Interpretability Tool (LIT)
LLM
→ See large language model
LLM evaluations (evals)
Assessment protocols for large language models covering:
- Factual accuracy (TruthfulQA)
- Reasoning (GSM8K)
- Toxicity (RealToxicityPrompts)
- Instruction following (AlpacaEval)
Require combination of automated and human assessment.
logistic regression
Probabilistic classification using sigmoid function. Decision boundary linear in feature space. Optimizes cross-entropy loss via gradient descent. Foundation for neural network classifiers.
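A single-feature sketch (toy, linearly separable data invented for illustration): the gradient of cross-entropy loss with respect to (w, b) is simply (p − y)·(x, 1), which keeps the update rule short:

```python
# Logistic regression on one feature, trained by stochastic gradient
# descent on cross-entropy loss. Toy linearly separable data.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

w, b, lr = 0.0, 0.0, 0.5
for _ in range(200):
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        # gradient of cross-entropy wrt (w, b) is (p - y) * (x, 1)
        w -= lr * (p - y) * x
        b -= lr * (p - y)

preds = [sigmoid(w * x + b) > 0.5 for x in xs]
print(preds)  # [False, False, True, True]
```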
logits
Unnormalized output values before applying final activation (softmax/sigmoid). Represent relative confidence scores. Used in loss calculations to avoid saturation issues.
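The saturation issue is why frameworks apply softmax in a numerically stable form, subtracting the maximum logit before exponentiating (which leaves the result unchanged):

```python
# Numerically stable softmax: subtracting max(logits) before exp
# avoids overflow without changing the resulting probabilities.
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1000.0, 1001.0, 1002.0])  # naive exp(1000.0) would overflow
print(probs)
print(sum(probs))  # 1.0 up to float rounding
```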
Log Loss
(Binary Cross-Entropy) Penalizes incorrect confidence estimates. Formula: −[y log(ŷ) + (1 − y) log(1 − ŷ)]. Strongly convex, provides smooth gradient flow for probabilistic models.
log-odds
Logarithm of odds ratio: log(p / (1 − p)). Linear in logistic regression. Inverse of sigmoid function. Enables interpretation of coefficients as multiplicative odds changes.
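The inverse relationship with the sigmoid is easy to verify numerically:

```python
# log-odds (logit) and sigmoid are inverses: sigmoid(logit(p)) == p.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

print(logit(0.5))           # 0.0 -- even odds map to log-odds of zero
print(sigmoid(logit(0.9)))  # 0.9 (round trip recovers the probability)
```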
Long Short-Term Memory (LSTM)
RNN variant with gated memory cells. Components:
- Forget gate (discards irrelevant info)
- Input gate (updates cell state)
- Output gate (controls exposure)
Handles long-range dependencies better than vanilla RNNs.
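The gate equations above can be sketched for a single scalar cell; the weights here are arbitrary made-up numbers, not trained values:

```python
# One LSTM cell step with scalar state, showing the gating structure.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, W):
    """W maps gate name -> [input weight, recurrent weight, bias]."""
    f = sigmoid(W["f"][0] * x + W["f"][1] * h_prev + W["f"][2])    # forget gate
    i = sigmoid(W["i"][0] * x + W["i"][1] * h_prev + W["i"][2])    # input gate
    o = sigmoid(W["o"][0] * x + W["o"][1] * h_prev + W["o"][2])    # output gate
    g = math.tanh(W["g"][0] * x + W["g"][1] * h_prev + W["g"][2])  # candidate
    c = f * c_prev + i * g        # new cell state: keep some old, add some new
    h = o * math.tanh(c)          # new hidden state: gated exposure of the cell
    return h, c

W = {k: [0.5, 0.1, 0.0] for k in "fiog"}  # arbitrary illustrative weights
h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:
    h, c = lstm_step(x, h, c, W)
print(h, c)
```

Because the cell state c is updated additively (f·c + i·g) rather than repeatedly squashed, gradients survive over many more timesteps than in a vanilla RNN.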
LoRA
(Low-Rank Adaptation) Parameter-efficient fine-tuning using rank-decomposed weight updates. Freezes pretrained weights, injects trainable low-rank matrices. Reduces memory usage by >90% versus full fine-tuning.
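A pure-Python sketch of the core idea: the effective weight is the frozen matrix W plus a low-rank update (α/r)·B·A. The tiny shapes and values below are illustrative only; the memory savings grow with the full dimension d:

```python
# LoRA sketch: frozen weight matrix W plus a trainable low-rank
# update B @ A scaled by alpha / r. Shapes and values are toy examples.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

d, r = 4, 1                       # full dimension 4, rank-1 update
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen
B = [[1.0], [0.0], [0.0], [0.0]]  # d x r, trainable
A = [[0.0, 2.0, 0.0, 0.0]]        # r x d, trainable
alpha = 1.0

delta = matmul(B, A)              # rank-1 matrix, d x d
W_eff = [[w + (alpha / r) * dv for w, dv in zip(wr, dr)]
         for wr, dr in zip(W, delta)]

full_params = d * d               # parameters if we fine-tuned W directly
lora_params = d * r + r * d       # only B and A are trained
print(W_eff[0])                   # [1.0, 2.0, 0.0, 0.0]
print(lora_params, "<", full_params)
```

At realistic scale (d in the thousands, r around 4-64) the trainable-parameter ratio 2dr/d² becomes tiny, which is where the >90% memory reduction comes from.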
loss
Quantitative measure of prediction error. Training minimizes loss through parameter updates. Common types:
- Task-specific (cross-entropy)
- Regularization (L1/L2)
- Composite (triplet loss)
loss aggregator
Combination method for multi-task learning losses:
- Weighted sum
- Dynamic weighting (uncertainty)
- GradNorm
Balances competing objectives during optimization.
loss curve
Training trajectory visualization showing loss vs iterations. Patterns indicate:
- Learning rate suitability
- Overfitting/underfitting
- Convergence
Critical for debugging model behavior.
loss function
Objective function quantifying model performance. Design considerations:
- Differentiability
- Robustness to outliers
- Scale sensitivity
- Task alignment (e.g., IoU for detection)
loss surface
High-dimensional error landscape shaped by model parameters. Optimization navigates this terrain seeking minima. Visualization techniques: PCA slices, random projections.
Low-Rank Adaptation (LoRA)
→ See LoRA
LSTM
→ See Long Short-Term Memory