
Machine Learning Glossary/G

From VELEVO®.WIKI

Written by Sebastian F. Genter


G

GAN

(Generative Adversarial Network) A framework where two neural networks compete: a generator creates synthetic data while a discriminator evaluates authenticity. Through adversarial training, both components improve until generated samples become indistinguishable from real data.

Gemini

Google's advanced AI ecosystem encompassing multimodal models capable of processing text, images, and audio. Includes interactive interfaces, APIs, and cloud integration tools for enterprise applications.

Gemini models

Transformer-based architectures within the Gemini system optimized for cross-modal understanding. Features include:

  • Multitask capabilities
  • Agent integration
  • Scalable deployment options
  • Enhanced safety protocols

generalization

A model's ability to perform well on unseen data beyond its training set. Measured by the difference between training and validation performance, with good generalization indicating minimal overfitting.
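
The train/validation difference mentioned above can be computed directly; a minimal sketch (function name and thresholds are illustrative):

```python
def generalization_gap(train_accuracy, val_accuracy):
    """Difference between training and validation performance.

    A small gap suggests good generalization; a large positive gap
    (training far better than validation) suggests overfitting.
    """
    return train_accuracy - val_accuracy
```

For example, 98% training accuracy against 81% validation accuracy gives a gap of 0.17, a warning sign of overfitting.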

generalization curve

Visual representation plotting training and validation metrics (typically loss) across epochs. Key patterns:

  • Converging curves indicate proper learning
  • Diverging curves suggest overfitting
  • Plateaued curves may signal underfitting
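
The patterns above can be detected with a rough heuristic on recorded loss histories; a sketch with illustrative names and thresholds:

```python
def diagnose_curves(train_loss, val_loss, rise_tol=1e-3, high_loss=0.5):
    """Classify a generalization curve by its final trend.

    Heuristic: a rising validation loss suggests overfitting; both
    curves plateaued at a high loss suggests underfitting.
    """
    gap_rising = (val_loss[-1] - val_loss[-2]) > rise_tol
    plateaued = abs(train_loss[-1] - train_loss[-2]) < rise_tol
    if gap_rising:
        return "overfitting"
    if plateaued and train_loss[-1] > high_loss:
        return "underfitting"
    return "converging"
```

In practice such a check is run on per-epoch losses, often to trigger early stopping.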

generalized linear model

Extension of linear regression supporting non-normal error distributions through link functions. Includes:

  • Logistic regression (binomial)
  • Poisson regression (count data)
  • Gamma regression (skewed continuous)
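
As a concrete instance of the binomial case, logistic regression applies a logit link so the sigmoid maps a linear predictor to a probability. A minimal from-scratch sketch for 1-D inputs (names and hyperparameters are illustrative):

```python
import math

def fit_logistic(xs, ys, lr=0.5, epochs=500):
    """Fit logistic regression (binomial GLM, logit link) by gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # inverse link: sigmoid
            gw += (p - y) * x / n                      # gradient of log-loss
            gb += (p - y) / n
        w -= lr * gw
        b -= lr * gb
    return w, b

def predict_proba(w, b, x):
    """Predicted probability of the positive class."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))
```

Swapping the link and loss (e.g. exp link with Poisson log-likelihood) yields the other GLM family members.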

generated text

Output produced by language models through autoregressive prediction. Characteristics vary by:

  • Temperature settings
  • Top-k sampling
  • Beam search parameters

Example: GPT-3 producing essay drafts from prompts.
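
The first two knobs can be sketched in a few lines; a minimal illustration of temperature scaling and top-k filtering over next-token logits (function names are illustrative):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Temperature-scaled softmax: low T sharpens toward the argmax,
    high T flattens toward uniform."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(probs, k):
    """Keep only the k most probable tokens and renormalize (top-k sampling)."""
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = sum(probs[i] for i in top)
    return {i: probs[i] / kept for i in top}
```

Sampling from the filtered, temperature-scaled distribution is what makes generated text vary between runs.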

generative adversarial network (GAN)

System architecture comprising:

  1. Generator: Creates synthetic data
  2. Discriminator: Classifies real vs. fake

Applications range from image synthesis to drug discovery.
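
The two-player loop can be shown on a toy 1-D problem with hand-derived gradients; this is an illustrative sketch, not a production GAN (the generator is linear, the discriminator logistic, and all names are assumptions):

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_toy_gan(real_mean=3.0, steps=3000, lr=0.05, seed=0):
    """1-D GAN sketch: generator G(z) = a*z + b mimics samples from
    N(real_mean, 1); discriminator D(x) = sigmoid(w*x + c) scores realness."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0      # generator parameters
    w, c = 0.0, 0.0      # discriminator parameters
    for _ in range(steps):
        xr = rng.gauss(real_mean, 1.0)   # real sample
        z = rng.gauss(0.0, 1.0)          # noise input
        xf = a * z + b                   # fake sample
        dr, df = sigmoid(w * xr + c), sigmoid(w * xf + c)
        # Discriminator ascends log D(xr) + log(1 - D(xf))
        w += lr * ((1 - dr) * xr - df * xf)
        c += lr * ((1 - dr) - df)
        # Generator ascends log D(xf) (non-saturating loss)
        df = sigmoid(w * xf + c)
        grad_xf = (1 - df) * w
        a += lr * grad_xf * z
        b += lr * grad_xf
    return a, b
```

Over training, the generator's offset b drifts toward the real mean as the discriminator loses its ability to separate the two distributions.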

generative AI

Systems producing original, coherent content across modalities:

  • Text generation (articles, code)
  • Image synthesis (DALL-E, Midjourney)
  • Audio production (music, voice cloning)

Distinguished from discriminative AI by creative output capability.

generative model

Algorithm learning data distributions to create new samples. Two approaches:

  1. Explicit density estimation (VAEs)
  2. Implicit generation (GANs, Diffusion)

Contrasts with discriminative models predicting labels from features.

generator

The creative component in GANs that transforms random noise into realistic outputs. Architectures often use transposed convolutions for image synthesis or transformer layers for text.

gini impurity

Split criterion in decision trees measuring class mixture: Gini = 1 − Σᵢ pᵢ². Ranges from 0 (pure node) to 0.5 (balanced binary classes). Alternative to entropy with faster computation.
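
The formula translates directly into code; a minimal sketch:

```python
def gini_impurity(class_counts):
    """Gini impurity of a node: 1 - sum_i p_i^2 over class proportions."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in class_counts)
```

A pure node like [10, 0] scores 0, a balanced binary node like [5, 5] scores 0.5.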

golden dataset

Curated reference data serving as:

  • Model evaluation benchmark
  • Regression testing suite
  • Performance tracking baseline

Typically hand-verified and version-controlled.

golden response

Predefined correct answer for specific inputs. Used to:

  • Validate model outputs
  • Train evaluation metrics
  • Calibrate confidence scores

Example: "2+2=4" as golden response to arithmetic queries.
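
Output validation against golden responses reduces to an exact-match check; a minimal harness sketch (names are illustrative):

```python
def validate_against_golden(model_fn, golden):
    """Run the model on each golden input and collect mismatches.

    `golden` maps inputs to their expected (golden) outputs; returns
    the inputs whose model output does not match.
    """
    return [q for q, expected in golden.items() if model_fn(q) != expected]
```

Real evaluation suites often relax exact match to normalized or semantic comparison, but the structure is the same.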

GPT

(Generative Pre-trained Transformer) Landmark LLM architecture by OpenAI using:

  • Decoder-only transformers
  • Unsupervised pre-training
  • Task-agnostic design

Later versions (GPT-3/4) demonstrate few-shot learning capabilities.

gradient

Multivariable derivative vector indicating the direction of steepest ascent. In neural networks, computed via backpropagation to update weights: θ_new = θ_old − η·∇J(θ), where η is the learning rate.
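
A gradient can be approximated numerically by central differences, which also illustrates the update rule; an illustrative sketch:

```python
def numerical_gradient(f, theta, eps=1e-6):
    """Central-difference estimate of the gradient of f at theta."""
    grad = []
    for i in range(len(theta)):
        up, down = theta[:], theta[:]
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

def gradient_step(theta, grad, lr):
    """One update: theta_new = theta_old - lr * grad."""
    return [t - lr * g for t, g in zip(theta, grad)]
```

In practice backpropagation computes exact gradients far more efficiently; numerical gradients are mainly used to verify them.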

gradient accumulation

Memory optimization technique performing parameter updates after multiple microbatch computations. Enables effective large batch training on memory-constrained devices.
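
The idea reduces to summing microbatch gradients before a single parameter update; a framework-free sketch (names are illustrative):

```python
def accumulated_update(theta, microbatches, grad_fn, lr):
    """Average gradients over several microbatches, then apply one update.

    Mimics training with an effective batch equal to all microbatches
    combined, without materializing the full batch in memory at once.
    """
    accum = [0.0] * len(theta)
    for batch in microbatches:
        g = grad_fn(theta, batch)
        accum = [a + gi for a, gi in zip(accum, g)]
    n = len(microbatches)
    return [t - lr * a / n for t, a in zip(theta, accum)]
```

In deep learning frameworks the same effect is achieved by skipping the optimizer step (and gradient zeroing) for several backward passes.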

gradient boosted (decision) trees (GBT)

Ensemble method combining weak decision trees through sequential error correction. At each iteration:

  1. Fit tree to residual errors
  2. Update predictions with shrinkage

Widely used in tabular data tasks.

gradient boosting

General ensemble algorithm minimizing loss through additive model construction. At step m: F_m(x) = F_{m−1}(x) + ν·h_m(x), where ν is the learning rate and h_m the weak learner fit at step m.
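
The residual-fitting loop above (and in the GBT entry) can be sketched with regression stumps as weak learners for squared-error loss; all names and hyperparameters are illustrative:

```python
def fit_stump(xs, residuals):
    """Best single-split regression stump (weak learner) on 1-D data."""
    best = None
    for split in xs:
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, split, lm, rm)
    _, split, lm, rm = best
    return lambda x: lm if x <= split else rm

def gradient_boost(xs, ys, rounds=20, nu=0.3):
    """Additive model F_m(x) = F_{m-1}(x) + nu * h_m(x): each stump h_m
    is fit to the current residuals (squared-error loss)."""
    f0 = sum(ys) / len(ys)
    learners = []
    predict = lambda x: f0 + sum(nu * h(x) for h in learners)
    for _ in range(rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        learners.append(fit_stump(xs, residuals))
    return predict
```

Shrinkage via ν slows each correction, trading more rounds for better generalization.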

gradient clipping

Stabilization technique constraining gradient magnitudes during backpropagation. Common methods:

  1. Value clipping: cap at ±threshold
  2. Norm clipping: scale the gradient if ‖g‖ > max_norm
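
Both methods are a few lines each; an illustrative sketch:

```python
import math

def clip_by_norm(grad, max_norm):
    """Norm clipping: rescale the whole gradient vector if its L2 norm
    exceeds max_norm, preserving its direction."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > max_norm:
        scale = max_norm / norm
        return [g * scale for g in grad]
    return grad

def clip_by_value(grad, threshold):
    """Value clipping: cap each component independently at ±threshold."""
    return [max(-threshold, min(threshold, g)) for g in grad]
```

Norm clipping is usually preferred for recurrent networks because it preserves the gradient's direction.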

gradient descent

Optimization workhorse iteratively updating parameters: θ ← θ − η·∇J(θ). Variants include:

  • Stochastic (single example)
  • Mini-batch (subset)
  • Full batch (entire dataset)
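
All three variants differ only in how many examples feed each gradient estimate; a sketch on a 1-D least-squares fit y ≈ w·x (names and hyperparameters are illustrative):

```python
import random

def gradient_descent_fit(xs, ys, lr=0.05, epochs=200, batch_size=None, seed=0):
    """Fit y ~ w*x by theta <- theta - lr * grad on squared error.

    batch_size=1 gives stochastic GD, batch_size < len(xs) mini-batch,
    batch_size=None full batch.
    """
    rng = random.Random(seed)
    w = 0.0
    n = len(xs)
    bs = batch_size or n
    for _ in range(epochs):
        idx = list(range(n))
        rng.shuffle(idx)
        for start in range(0, n, bs):
            batch = idx[start:start + bs]
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad
    return w
```

Smaller batches give noisier but cheaper updates; larger batches give smoother but costlier ones.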

graph

In TensorFlow, a dataflow representation of computations as nodes (operations) and edges (tensors). Enables:

  • Static optimization
  • Distributed execution
  • Hardware acceleration

graph execution

TensorFlow's deferred computation mode building graphs before execution. Contrasts with eager execution by enabling:

  • Automatic differentiation
  • Cross-platform deployment
  • Performance optimizations

greedy policy

Reinforcement learning strategy that always chooses the highest-valued action. While computationally efficient, it may converge to suboptimal solutions due to lack of exploration.
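
Action selection under a greedy policy is an argmax over Q-values; a sketch that also shows the common ε-greedy fix for the exploration problem (names are illustrative):

```python
import random

def greedy_action(q_values):
    """Greedy policy: always pick the action with the highest Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])

def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Epsilon-greedy variant: with probability epsilon take a random
    action (exploration), otherwise act greedily (exploitation)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy_action(q_values)
```

Setting ε = 0 recovers the pure greedy policy described above.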

groundedness

Property ensuring model outputs derive from verifiable sources. Critical for:

  • Factual QA systems
  • Legal document analysis
  • Medical diagnosis tools

Assessed through citation accuracy and reference alignment.

ground truth

Authoritative reference data representing reality. Sources include:

  • Expert annotations
  • Sensor measurements
  • Historical records

Forms basis for supervised learning and model validation.

group attribution bias

Cognitive error assuming individual traits apply to entire groups. In ML manifests as:

  • Stereotyping in training data
  • Overgeneralized feature correlations
  • Skewed sampling across demographics

Mitigated through diverse data collection.