Written by Sebastian F. Genter
R
RAG (Retrieval-Augmented Generation)
A hybrid AI architecture that combines:
- Information retrieval from external knowledge sources
- Generative language model capabilities
Key components:
1. Retrieval system finds relevant documents
2. Generator incorporates retrieved information into responses
Benefits:
- Reduces hallucinations
- Enables fact-checking against sources
- Allows knowledge updates without retraining
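The two components above can be sketched in a toy form. The word-overlap retriever and the prompt template here are illustrative assumptions, not a specific RAG framework:

```python
# Toy RAG sketch: retrieve the best-matching document by word overlap,
# then augment the prompt for a downstream generator.
# Corpus, scoring, and prompt format are illustrative assumptions.

def retrieve(query, corpus):
    """Return the document sharing the most (lowercased) words with the query."""
    q_words = set(query.lower().split())
    return max(corpus, key=lambda doc: len(q_words & set(doc.lower().split())))

def build_prompt(query, corpus):
    """Prepend the retrieved document as context for the generator."""
    context = retrieve(query, corpus)
    return f"Context: {context}\nQuestion: {query}\nAnswer:"

corpus = [
    "The Eiffel Tower is located in Paris, France.",
    "Mount Everest is the tallest mountain on Earth.",
]
prompt = build_prompt("Where is the Eiffel Tower?", corpus)
```

A real system would use dense vector search and a trained language model in place of these stand-ins.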
random forest
An ensemble learning method that operates by:
- Constructing multiple decision trees
- Outputting the mode (classification) or mean prediction (regression) of individual trees
Advantages:
- Handles high-dimensional data well
- Resistant to overfitting
- Provides feature importance metrics
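The two aggregation modes can be sketched as follows; the per-tree predictions are made-up placeholders, since a real forest would train each tree on a bootstrap sample with random feature subsets:

```python
from statistics import mean, mode

# Placeholder per-tree outputs; in a real forest these come from
# decision trees trained on bootstrap samples with random feature subsets.
tree_class_votes = [1, 0, 1, 1, 0]      # class labels from 5 trees
tree_reg_preds = [2.0, 2.5, 3.0, 2.5]   # numeric predictions from 4 trees

classification = mode(tree_class_votes)  # majority vote (classification)
regression = mean(tree_reg_preds)        # average prediction (regression)
```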
random policy
In reinforcement learning, a strategy that:
- Selects actions uniformly at random
- Serves as baseline for comparison
- Useful for initial exploration
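A minimal sketch of such a baseline, assuming a discrete action set:

```python
import random

def random_policy(actions):
    """Select an action uniformly at random -- a standard RL baseline."""
    return random.choice(actions)

actions = ["left", "right", "up", "down"]
chosen = random_policy(actions)  # any learned policy should beat this baseline
```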
rank (ordinality)
The position of an item in an ordered sequence. Important for:
- Ranking problems (search results, recommendations)
- Ordinal regression tasks
- Evaluation metrics like NDCG
rank (Tensor)
The number of dimensions in a tensor:
- Scalar: rank 0 (e.g., 5)
- Vector: rank 1 (e.g., [1,2,3])
- Matrix: rank 2 (e.g., [[1,2],[3,4]])
- Higher-order tensors: rank 3+
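One way to see rank as nesting depth, using plain nested lists (a simplification that assumes non-empty, uniformly nested input):

```python
def tensor_rank(x):
    """Count nesting depth of a well-formed nested list (the tensor's rank)."""
    rank = 0
    while isinstance(x, list):
        rank += 1
        x = x[0]  # assumes non-empty, uniformly nested lists
    return rank

ranks = (tensor_rank(5), tensor_rank([1, 2, 3]), tensor_rank([[1, 2], [3, 4]]))
```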
ranking
The process of ordering items by relevance/importance. Applications include:
- Search engine results
- Recommendation systems
- Ad placement
rater
A human evaluator who:
- Labels training data
- Assesses model outputs
- Provides feedback for RLHF
recall
A classification metric measuring:
- True positives / (True positives + False negatives)
- The model's ability to find all relevant instances
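The formula above, computed directly from confusion-matrix counts:

```python
def recall(tp, fn):
    """Recall = TP / (TP + FN): share of actual positives the model found."""
    return tp / (tp + fn)

# e.g. the model found 8 of 10 actually-positive instances
r = recall(tp=8, fn=2)
```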
recall at k (recall@k)
Evaluation metric measuring:
- Proportion of relevant items in top k results
- Used in recommender systems and IR
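A small sketch of the computation, with made-up ranked results and ground-truth relevance:

```python
def recall_at_k(ranked_items, relevant, k):
    """Fraction of all relevant items that appear in the top-k results."""
    hits = set(ranked_items[:k]) & set(relevant)
    return len(hits) / len(relevant)

ranked = ["a", "b", "c", "d", "e"]   # system's ordering (illustrative)
relevant = {"b", "d", "e", "f"}      # ground-truth relevant items
score = recall_at_k(ranked, relevant, k=3)  # only "b" is in the top 3
```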
recommendation system
Systems that predict user preferences for:
- Products
- Content
- Services
Main approaches:
- Collaborative filtering
- Content-based
- Hybrid
Rectified Linear Unit (ReLU)
A popular activation function:
- f(x) = max(0, x)
- Helps mitigate the vanishing gradient problem
- Computationally efficient
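The definition is short enough to state directly in code:

```python
def relu(x):
    """Rectified Linear Unit: f(x) = max(0, x)."""
    return max(0.0, x)

outputs = [relu(x) for x in (-2.0, 0.0, 3.5)]  # negative inputs are zeroed
```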
recurrent neural network
Neural networks with:
- Temporal connections
- Memory of previous inputs
- Applications in sequence modeling
reference text
Gold standard text used for:
- Model evaluation
- Training supervision
- Quality benchmarking
regression model
Models predicting continuous values:
- Linear regression
- Polynomial regression
- Neural network regression
regularization
Techniques preventing overfitting:
- L1/L2 regularization
- Dropout
- Early stopping
regularization rate
Hyperparameter controlling:
- Strength of regularization
- Tradeoff between fit and simplicity
reinforcement learning (RL)
Learning paradigm where agents:
- Interact with environment
- Receive rewards/penalties
- Optimize long-term return
RLHF (Reinforcement Learning from Human Feedback)
Training process combining:
- Human preference judgments
- Reward modeling
- Policy optimization
replay buffer
In RL, stores:
- Past experiences (state, action, reward)
- Enables experience replay
- Improves sample efficiency
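A minimal fixed-capacity buffer can be sketched with a `deque`; this is an illustrative sketch, not tied to any particular RL library:

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal fixed-capacity replay buffer (illustrative sketch)."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest experiences drop out

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Draw a random batch of past experiences for replay."""
        return random.sample(list(self.buffer), batch_size)

buf = ReplayBuffer(capacity=100)
for t in range(5):
    buf.add(state=t, action=0, reward=1.0, next_state=t + 1)
batch = buf.sample(batch_size=2)
```

Sampling randomly from the buffer breaks the temporal correlation between consecutive experiences, which is what improves sample efficiency.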
replica
Duplicate components for:
- Parallel processing
- Fault tolerance
- Load balancing
reporting bias
When available data:
- Overrepresents certain phenomena
- Underrepresents others
- Distorts model learning
representation
How data is encoded for:
- Machine processing
- Feature extraction
- Dimensionality reduction
re-ranking
Secondary ranking phase that:
- Refines initial results
- Incorporates additional signals
- Improves final ordering
return
In RL, the cumulative:
- Discounted future rewards
- Objective to maximize
- Measure of policy quality
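The discounted sum can be written out directly (gamma is the discount factor):

```python
def discounted_return(rewards, gamma=0.9):
    """G = r_0 + gamma*r_1 + gamma^2*r_2 + ... (discounted cumulative reward)."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))

g = discounted_return([1.0, 1.0, 1.0], gamma=0.5)  # 1 + 0.5 + 0.25
```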
reward
In RL, the:
- Immediate feedback signal
- Numerical evaluation of actions
- Driver of learning
ridge regularization
L2 regularization that:
- Adds squared magnitude penalty
- Prevents coefficient explosion
- Encourages small weights
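The squared-magnitude penalty added to the loss can be sketched as:

```python
def l2_penalty(weights, lam):
    """Ridge penalty: lambda * sum of squared weights, added to the loss."""
    return lam * sum(w ** 2 for w in weights)

penalty = l2_penalty([3.0, -4.0], lam=0.1)  # 0.1 * (9 + 16)
```

Because the penalty grows quadratically, large weights are punished disproportionately, which is what drives coefficients toward small values.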
RNN
Abbreviation for recurrent neural network
ROC Curve
Graphical plot showing:
- True positive rate vs false positive rate
- Across classification thresholds
- Visualizes tradeoffs
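Each point on the curve is a (FPR, TPR) pair at one threshold; a sketch of that single-point computation, with made-up labels and scores:

```python
def tpr_fpr(labels, scores, threshold):
    """True/false positive rates at one threshold; sweeping the threshold
    over all scores traces out the ROC curve."""
    tp = sum(y == 1 and s >= threshold for y, s in zip(labels, scores))
    fn = sum(y == 1 and s < threshold for y, s in zip(labels, scores))
    fp = sum(y == 0 and s >= threshold for y, s in zip(labels, scores))
    tn = sum(y == 0 and s < threshold for y, s in zip(labels, scores))
    return tp / (tp + fn), fp / (fp + tn)

labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]  # hypothetical model scores
tpr, fpr = tpr_fpr(labels, scores, threshold=0.5)
```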
role prompting
Technique where:
- LLM is assigned a specific role (e.g., "You are a helpful assistant")
- Guides response style
root
In decision trees:
- The initial node
- Contains all training data
- First split point
root directory
The top-level:
- Folder in filesystem
- Container for ML project files
- Reference point for paths
RMSE
Root Mean Squared Error:
- sqrt(mean(squared errors))
- Common regression metric
- Sensitive to outliers
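The formula, computed directly; note how the single large error dominates the result because errors are squared before averaging:

```python
import math

def rmse(y_true, y_pred):
    """sqrt(mean(squared errors))."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)

err = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])  # one outlier error of 2
```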
rotational invariance
Property where:
- Input rotations don't affect output
- Important for image models
- Achieved via data augmentation
ROUGE
Family of metrics for:
- Evaluating text summarization
- Comparing machine/human text
- Measuring n-gram overlap
Variants:
- ROUGE-N (n-gram based)
- ROUGE-L (longest common subsequence)
- ROUGE-S (skip-bigram)
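ROUGE-N recall can be sketched as clipped n-gram matching; this is a simplified single-reference version, whereas real ROUGE also supports multiple references, stemming, and other options:

```python
from collections import Counter

def rouge_n(candidate, reference, n=1):
    """Clipped n-gram recall: matched reference n-grams / total reference
    n-grams (simplified, single-reference sketch)."""
    def ngrams(text):
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    ref, cand = ngrams(reference), ngrams(candidate)
    matched = sum(min(count, cand[gram]) for gram, count in ref.items())
    return matched / sum(ref.values())

score = rouge_n("the cat sat", "the cat sat on the mat", n=1)  # 3 of 6 unigrams
```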
R-squared
Coefficient of determination:
- Measures variance explained
- Typically ranges 0-1 (higher is better); can be negative when a model fits worse than predicting the mean
- Common regression metric
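The definition 1 - SS_res / SS_tot, computed on made-up values:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

r2 = r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```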