Machine Learning Glossary/B

From VELEVO®.WIKI

Written by Sebastian F. Genter


B

backpropagation

The fundamental algorithm for training neural networks through gradient calculation. Comprises two phases:

  1. Forward pass: Compute predictions and loss
  2. Backward pass: Calculate gradients using chain rule from calculus

Gradients from the backward pass are then used to adjust weights across all network layers. Modern frameworks compute these gradients automatically, unlike early manual implementations.
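The two phases can be sketched for a single linear neuron (a hypothetical minimal example, not any particular framework's API):

```python
# Minimal backpropagation sketch for one linear neuron: y = w*x + b,
# with squared-error loss (y - t)^2. Gradients follow from the chain rule.

def forward(w, b, x, t):
    y = w * x + b            # forward pass: prediction
    loss = (y - t) ** 2      # forward pass: loss
    return y, loss

def backward(w, b, x, t):
    y = w * x + b
    dloss_dy = 2 * (y - t)   # chain rule: dL/dy
    grad_w = dloss_dy * x    # dL/dw = dL/dy * dy/dw
    grad_b = dloss_dy        # dL/db = dL/dy * dy/db (dy/db = 1)
    return grad_w, grad_b

# one gradient-descent step using the computed gradients
w, b, lr = 0.5, 0.0, 0.1
x, t = 2.0, 3.0
gw, gb = backward(w, b, x, t)
w, b = w - lr * gw, b - lr * gb
```

In real networks the same chain-rule bookkeeping is repeated layer by layer, which is exactly what frameworks automate.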

bagging

(Bootstrap Aggregating) Ensemble technique creating multiple models from random data subsets sampled with replacement. Key characteristics:

  • Reduces variance through model diversity
  • Aggregates predictions via voting (classification) or averaging (regression)
  • Core component of random forests (ensembles of decision trees)
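The resampling-and-averaging idea can be sketched with a deliberately trivial "model" (the sample mean; function names are illustrative, not from any library):

```python
import random

def bootstrap_sample(data, rng):
    # sample with replacement, same size as the original dataset
    return [rng.choice(data) for _ in data]

def mean_model(sample):
    # stand-in for a trained model: predicts the sample mean
    return sum(sample) / len(sample)

def bagged_prediction(data, n_models=10, seed=0):
    # train one "model" per bootstrap sample, aggregate by averaging (regression)
    rng = random.Random(seed)
    preds = [mean_model(bootstrap_sample(data, rng)) for _ in range(n_models)]
    return sum(preds) / len(preds)
```

For classification, the final line would take a majority vote over the per-model predictions instead of an average.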

bag of words

Text representation method ignoring word order while preserving frequency. Examples:

  • "the dog jumps" = [1,1,1,0,...]
  • "jumps the dog" = [1,1,1,0,...] (identical representation)

Used in early NLP systems with sparse vector encodings. Evolved into TF-IDF and n-gram variants.
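The two example sentences above can be encoded with a few lines of plain Python (the small vocabulary is hypothetical):

```python
def bag_of_words(text, vocabulary):
    # count occurrences of each vocabulary word, ignoring word order
    tokens = text.lower().split()
    return [tokens.count(word) for word in vocabulary]

vocab = ["the", "dog", "jumps", "cat"]
```

Both orderings map to the same sparse count vector, which is exactly the information bag-of-words discards.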

baseline

Reference model establishing minimum performance expectations. Common baselines:

  • Majority class classifier for imbalanced data
  • Linear regression vs deep networks
  • Previous system version in production

Helps quantify improvements from new approaches.
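A majority-class baseline, the first bullet above, fits in a few lines (class and function names are illustrative):

```python
from collections import Counter

class MajorityClassBaseline:
    # predicts the most frequent training label for every input
    def fit(self, labels):
        self.majority = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, inputs):
        return [self.majority for _ in inputs]

def accuracy(preds, labels):
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

# with 70% "spam" labels, the baseline already scores 70% accuracy
labels = ["spam"] * 7 + ["ham"] * 3
baseline = MajorityClassBaseline().fit(labels)
```

Any proposed model should beat this number before its accuracy is taken as evidence of learning.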

batch

Group of examples processed together during model training/inference. Benefits:

  • Hardware-friendly matrix operations
  • Smoother gradient estimates
  • Memory efficiency vs single-example processing
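Splitting a dataset into batches is a simple chunking operation (a generic sketch, not a specific framework's data loader):

```python
def batches(examples, batch_size):
    # yield consecutive groups of at most batch_size examples
    for i in range(0, len(examples), batch_size):
        yield examples[i:i + batch_size]
```

The last batch may be smaller than batch_size; frameworks typically either keep it or drop it via an option.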

batch inference

Parallel prediction method dividing inputs into subgroups for simultaneous processing. Particularly effective on:

  • TPU/GPU accelerators
  • Large-scale prediction tasks

Contrasts with real-time online inference requiring immediate responses.

batch normalization

Normalization technique applied to layer inputs/outputs. Key benefits:

  • Enables higher learning rates
  • Reduces internal covariate shift
  • Acts as mild regularizer

Implemented by adjusting mean/variance per batch during training.
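The per-batch mean/variance adjustment can be sketched for scalar activations (gamma and beta are the learnable scale and shift; in training they would be updated by gradient descent):

```python
def batch_normalize(batch, gamma=1.0, beta=0.0, eps=1e-5):
    # normalize a batch of scalar activations to zero mean / unit variance,
    # then apply the learnable scale (gamma) and shift (beta);
    # eps guards against division by zero for constant batches
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / (var + eps) ** 0.5 + beta for x in batch]
```

At inference time, frameworks substitute running averages of mean and variance collected during training, since a single example has no batch statistics.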

batch size

Critical hyperparameter controlling:

  • 1 (Stochastic GD): High variance updates
  • Full dataset: Computationally expensive
  • 32-512 (Common mini-batch): Balance of efficiency/stability

Affects memory usage and convergence speed.

Bayesian neural network

Probabilistic networks capturing uncertainty through:

  • Weight distributions vs fixed values
  • Predictive confidence intervals

Particularly valuable in medical diagnosis and risk-sensitive applications.

Bayesian optimization

Smart hyperparameter search strategy using:

  1. Surrogate model (e.g., a Gaussian process)
  2. Acquisition function (e.g., Upper Confidence Bound)

Efficiently explores parameter space with few evaluations.

Bellman equation

Foundational RL equation defining optimal value functions:

Q(s,a) = r(s,a) + γ·𝔼[max_{a′} Q(s′,a′)]

Forms the basis for Q-learning updates in temporal-difference learning.
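A single tabular Q-learning update moves Q(s,a) toward the Bellman target; the two states and their values below are hypothetical toy data:

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    # temporal-difference update toward the Bellman target:
    #   target = r + gamma * max over a' of Q(s', a')
    target = reward + gamma * max(q[next_state].values())
    q[state][action] += alpha * (target - q[state][action])
    return q

# toy Q-table: two states, two actions each
q = {"s0": {"left": 0.0, "right": 0.0},
     "s1": {"left": 1.0, "right": 0.5}}
q = q_update(q, "s0", "right", reward=1.0, next_state="s1")
```

Repeating this update over experienced transitions converges, under standard conditions, to the optimal Q defined by the equation above.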

BERT

(Bidirectional Encoder Representations from Transformers) Breakthrough NLP model featuring:

  • Transformer encoder architecture
  • Masked language modeling pretraining
  • Context-aware word embeddings

Revolutionized transfer learning for text tasks.

bias (ethics/fairness)

Systematic errors in ML systems categorized as:

  1. Cognitive biases (confirmation, in-group)
  2. Measurement biases (sampling, reporting)

Critical consideration in responsible AI development.

bias (math)

Model's baseline output when all features are zero. In y=wx+b:

  • b represents bias term
  • Enables model flexibility beyond origin constraints

bidirectional

Processing context from both directions in a sequence. Example, predicting the blank in "The ____ jumped":

  • Unidirectional: uses only the left context ("The")
  • Bidirectional: also uses "jumped" from the right context for a better prediction

bidirectional language model

Contextual language models that use the full sentence context, handling challenges like:

  • Pronoun resolution ("He" refers to antecedent)
  • Polysemy ("bank" as financial vs river)

bigram

Pair of consecutive tokens. Fundamental unit in:

  • Language modeling ("New York")
  • Text generation
  • Basic spelling correction
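Extracting bigrams from a token sequence is a one-liner (a generic sketch):

```python
def bigrams(tokens):
    # consecutive token pairs, e.g. ["New", "York", "City"]
    # yields ("New", "York") and ("York", "City")
    return list(zip(tokens, tokens[1:]))
```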

binary classification

Two-class prediction tasks with metrics:

  • Precision/Recall
  • ROC AUC
  • F1 Score

Common applications: Spam detection, medical diagnosis.
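Precision, recall, and F1 from the list above can be computed directly from prediction/label pairs (function name is illustrative):

```python
def precision_recall_f1(preds, labels, positive=1):
    # count true positives, false positives, false negatives
    tp = sum(p == positive and l == positive for p, l in zip(preds, labels))
    fp = sum(p == positive and l != positive for p, l in zip(preds, labels))
    fn = sum(p != positive and l == positive for p, l in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

F1 is the harmonic mean of precision and recall, so it is high only when both are.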

binary condition

Decision tree splits producing two paths. Example:

if temperature >= 100:  # degrees Celsius
    prediction = "High risk"
else:
    pass  # proceed to the next condition in the tree

binning

Converting continuous features to categorical ranges. Example:

  • Age → [0-12, 13-19, 20-64, 65+]

Trade-off: gains the ability to model nonlinear effects at the cost of increased dimensionality.
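The age example above can be implemented with simple threshold checks (the cut points mirror the example and are otherwise arbitrary):

```python
def bin_age(age):
    # map a continuous age to one of the categorical ranges
    if age <= 12:
        return "0-12"
    elif age <= 19:
        return "13-19"
    elif age <= 64:
        return "20-64"
    return "65+"
```

After binning, the single numeric feature is typically one-hot encoded, which is where the extra dimensionality comes from.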

BLEU

(Bilingual Evaluation Understudy) Translation quality metric:

  • Measures n-gram overlap (1-4 grams)
  • Penalizes short translations
  • Scale 0-1 (1=perfect match)

Limited in handling semantic equivalence.

BLEURT

(Bilingual Evaluation Understudy with Representations from Transformers) Advanced translation assessment:

  • Uses BERT embeddings
  • Understands paraphrases
  • Better human correlation

Requires pretraining on human ratings.

boosting

Ensemble method converting weak learners to strong predictors via:

  • Sequential error correction
  • Example reweighting

Famous implementations: AdaBoost, Gradient Boosted Trees

bounding box

Rectangular image coordinates specifying object location. Format:

  • Top-left (x1,y1)
  • Bottom-right (x2,y2)

Critical for object detection evaluation using IoU (Intersection over Union).
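IoU for two boxes in the (x1, y1, x2, y2) format above is a short computation (a generic sketch, not a specific library's implementation):

```python
def iou(box_a, box_b):
    # boxes as (x1, y1, x2, y2): top-left and bottom-right corners
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # clamp to zero so non-overlapping boxes give zero intersection
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

A detection is commonly counted as correct when its IoU with the ground-truth box exceeds a threshold such as 0.5.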

broadcasting

Array operation technique in NumPy/TensorFlow:

  • Automatically aligns dimensions
  • Expands smaller arrays

Example: Adding scalar to matrix without explicit replication.
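What NumPy does for the scalar-plus-matrix case can be mimicked in plain Python to show the semantics (NumPy itself never materializes the expanded array):

```python
def broadcast_add(matrix, scalar):
    # pure-Python sketch of the scalar-to-matrix broadcast:
    # the scalar is virtually expanded to the matrix's shape,
    # equivalent to np.array(matrix) + scalar
    return [[x + scalar for x in row] for row in matrix]
```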

bucketing

(See binning) Alternative term for converting continuous values to discrete ranges through:

  • Fixed intervals
  • Quantile-based divisions
  • Domain knowledge thresholds