Written by Sebastian F. Genter
H
hallucination
When generative models produce plausible-sounding but factually incorrect or nonsensical outputs. Common in large language models, such as inventing fake historical events or citing non-existent sources. For example: A model claiming "The Treaty of Verona (1822) established underwater railways" when no such treaty exists.
hashing
Technique for converting categorical features into fixed-size numerical representations using hash functions. Enables efficient handling of high-cardinality features (e.g., user IDs) by mapping them to a fixed number of buckets. Trade-off: hash collisions may map different values to the same bucket.
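A minimal sketch of feature hashing using Python's standard library; the bucket count and the MD5 digest are illustrative choices, not a prescribed scheme:

```python
import hashlib

def hash_feature(value: str, num_buckets: int = 16) -> int:
    """Map a categorical value to one of num_buckets buckets."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets

# The same input always lands in the same bucket;
# distinct inputs may collide in one bucket.
bucket = hash_feature("user_12345")
```

Production systems typically use a library implementation such as scikit-learn's FeatureHasher rather than hand-rolled hashing.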
heuristic
Rule-based approach providing practical solutions without guaranteed optimality. Common ML applications include:
- Initial feature selection
- Setting baseline performance
- Designing simple decision rules before model deployment
Example: Using "if contains 'free offer' then spam" as email filter heuristic.
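The "free offer" filter example above can be sketched as a few lines of Python; the phrase list is a hypothetical illustration, not a real spam vocabulary:

```python
def is_spam_heuristic(subject: str) -> bool:
    """Rule-based spam check: no training, no optimality guarantee."""
    spam_phrases = ["free offer", "act now", "winner"]
    subject_lower = subject.lower()
    return any(phrase in subject_lower for phrase in spam_phrases)
```

A heuristic like this often serves as the baseline a learned model must beat.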
hidden layer
Intermediate processing stages in neural networks between input and output. Each hidden layer applies:
- Weighted sum of inputs
- Nonlinear activation (ReLU, sigmoid)
- Feature transformation
Deep networks stack multiple hidden layers to learn hierarchical representations.
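The forward pass through one hidden layer can be sketched in NumPy; the layer sizes and random weights here are arbitrary placeholders:

```python
import numpy as np

def hidden_layer(x, W, b):
    """One hidden layer: weighted sum of inputs, then ReLU activation."""
    z = x @ W + b            # weighted sum of inputs
    return np.maximum(0, z)  # ReLU nonlinearity

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))  # one input example with 4 features
W = rng.normal(size=(4, 3))  # transform 4 inputs into 3 hidden units
b = np.zeros(3)
h = hidden_layer(x, W, b)    # shape (1, 3); ReLU output is never negative
```

Stacking several such calls, each feeding the next, gives the hierarchical representations mentioned above.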
hierarchical clustering
Cluster analysis method creating nested groupings through either:
- Agglomerative (bottom-up): Merge similar clusters
- Divisive (top-down): Split dissimilar clusters
Produces dendrograms showing relationships at different scales. Unlike k-means, doesn't require pre-specifying cluster count.
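A toy agglomerative (bottom-up) sketch with single linkage on 1-D points, assuming brute-force pairwise distances for clarity; real code would use a library such as scipy.cluster.hierarchy:

```python
def agglomerative(points, num_clusters):
    """Repeatedly merge the two closest clusters (single linkage)."""
    clusters = [[p] for p in points]  # start: each point is its own cluster
    while len(clusters) > num_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single-linkage distance: closest pair across clusters.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters

groups = agglomerative([1.0, 1.1, 5.0, 5.2, 9.0], num_clusters=3)
```

Recording each merge and its distance yields the dendrogram; cutting it at different heights gives clusterings at different scales without fixing the cluster count in advance.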
hill climbing
Local optimization strategy iteratively adjusting parameters to improve performance. Used for:
- Hyperparameter tuning
- Feature selection
- Architecture search
Limitation: May get stuck in local optima rather than finding global best solution.
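A minimal hill-climbing sketch over one parameter; the objective and step size are illustrative, and on a multi-peaked objective this same loop would stop at whichever local optimum is nearest the start:

```python
def hill_climb(f, x, step=0.1, max_iters=1000):
    """Greedy local search: move to a neighbor only if it improves f."""
    for _ in range(max_iters):
        neighbors = [x - step, x + step]
        best = max(neighbors, key=f)
        if f(best) <= f(x):  # no neighbor improves: stop at a local optimum
            break
        x = best
    return x

# Maximize f(x) = -(x - 3)^2; the single peak is at x = 3.
peak = hill_climb(lambda x: -(x - 3) ** 2, x=0.0)
```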
hinge loss
Loss function for maximum-margin classification, defined as max(0, 1 − y·f(x)) for label y ∈ {−1, +1} and raw score f(x). Key component in Support Vector Machines (SVMs), penalizing predictions that are:
- Correct but low-confidence (margin below 1)
- Incorrect
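A short NumPy sketch of the hinge loss max(0, 1 − y·f(x)); the labels and scores below are made-up examples:

```python
import numpy as np

def hinge_loss(y_true, scores):
    """Hinge loss for labels y in {-1, +1} and raw margin scores f(x)."""
    return np.maximum(0.0, 1.0 - y_true * scores)

y = np.array([1, 1, -1])
f = np.array([2.0, 0.4, -0.5])  # raw margin scores, not probabilities
losses = hinge_loss(y, f)       # [0.0, 0.6, 0.5]
```

Note the first prediction is confidently correct (margin 2 ≥ 1) and incurs zero loss, while the second is correct but low-confidence and is still penalized.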
historical bias
Systemic distortions inherited from training data reflecting past inequities. Manifestations include:
- Gender stereotypes in hiring models
- Racial disparities in loan approvals
- Age discrimination in marketing algorithms
Requires careful data auditing and debiasing techniques.
holdout data
Data subset excluded from training for final model evaluation. Typically split as:
- 60-80% training
- 10-20% validation
- 10-20% testing
Prevents information leakage and gives an unbiased performance estimate.
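The three-way split above can be sketched with the standard library; the 70/15/15 fractions and the seed are illustrative choices within the ranges listed:

```python
import random

def split_indices(n, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle example indices and split into train/validation/test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # shuffle before splitting
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])

train, val, test = split_indices(100)  # 70 / 15 / 15 examples
```

Keeping the test indices untouched until the final evaluation is what prevents leakage.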
host
In distributed systems, the central processor coordinating:
- Data loading
- Device synchronization
- Checkpointing
while accelerators (GPUs/TPUs) perform the parallel computations. The host acts as the orchestration layer in training pipelines.
human evaluation
Manual assessment of model outputs using criteria like:
- Fluency (for text)
- Realism (for images)
- Relevance (for recommendations)
Essential for tasks without clear quantitative metrics, like creative writing generation.
human in the loop (HITL)
Hybrid systems combining AI automation with human oversight. Common implementations:
- Validation: Humans verify critical decisions
- Active learning: Humans label uncertain cases
- Error correction: Humans fix model mistakes
Used in medical diagnosis and legal document review.
hyperparameter
Configurable settings controlling model behavior, distinct from learned parameters. Includes:
- Learning rate
- Network depth
- Regularization strength
Optimized through grid search, random search, or Bayesian methods.
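A grid-search sketch over two of the hyperparameters listed above; the validation_error function is a hypothetical stand-in for training a model and measuring its validation loss:

```python
from itertools import product

def validation_error(learning_rate, depth):
    """Toy error surface; in practice this would train and evaluate a model."""
    return (learning_rate - 0.01) ** 2 + (depth - 3) ** 2

grid = {"learning_rate": [0.001, 0.01, 0.1], "depth": [2, 3, 4]}

# Grid search: evaluate every combination, keep the best.
best = min(product(grid["learning_rate"], grid["depth"]),
           key=lambda cfg: validation_error(*cfg))
# On this toy surface the minimum sits at (0.01, 3).
```

Random search and Bayesian methods replace the exhaustive product with sampled or model-guided configurations, which scales better as the grid grows.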
hyperplane
Decision boundary in n-dimensional space separating data classes: a line in 2D, a plane in 3D. Support Vector Machines maximize the margin around the hyperplane to improve generalization. Defined by the equation w·x + b = 0, where w is the weight vector and b the bias.
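A small sketch of classifying points against the hyperplane w·x + b = 0; the weight vector and bias here are arbitrary example values, not learned ones:

```python
import numpy as np

# Hyperplane w.x + b = 0; the sign of w.x + b gives the predicted class.
w = np.array([1.0, -2.0])
b = 0.5

def classify(x):
    """Points on the positive side get +1, the other side -1."""
    return 1 if np.dot(w, x) + b >= 0 else -1

label = classify(np.array([3.0, 1.0]))  # 3 - 2 + 0.5 = 1.5, so class +1
```

An SVM chooses w and b so that this boundary sits as far as possible from the nearest training points of each class.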