Machine Learning Glossary/T

Written by Sebastian F. Genter


T

T5

  1. language

An abbreviation for Text-to-Text Transfer Transformer. T5 is a Transformer-based language model developed by Google. It frames all natural language processing tasks as a text-to-text problem, meaning it takes text as input and produces text as output, regardless of the specific task (e.g., translation, summarization, question answering). T5 achieved state-of-the-art results on many NLP benchmarks by being trained on a massive dataset called C4 (Colossal Clean Crawled Corpus) and using a transfer learning approach.
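
The text-to-text framing means the task is selected by a textual prefix rather than a task-specific head. A minimal sketch using the Hugging Face transformers library (an assumed implementation choice; the source names no specific library):

  # Illustrative use of Hugging Face transformers (pip install transformers).
  from transformers import T5Tokenizer, T5ForConditionalGeneration

  tokenizer = T5Tokenizer.from_pretrained("t5-small")
  model = T5ForConditionalGeneration.from_pretrained("t5-small")

  # The task is chosen purely by the text prefix -- same model, same API.
  inputs = tokenizer("translate English to German: The house is wonderful.",
                     return_tensors="pt")
  outputs = model.generate(**inputs, max_new_tokens=40)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))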

T5X

T5X is a flexible and scalable framework built in JAX and Flax for training large-scale language models, including T5, PaLM, and UL2. It is designed to handle the complexities of training large language models that require distributed computation across multiple devices and TPU Pods. T5X provides a modular design that supports various model architectures, optimizers, and training strategies for very large models.

tabular Q-learning

  1. reinforcementLearning

A specific implementation of the Q-learning algorithm where the Q-function ($Q(s, a)$) is represented as a table. In tabular Q-learning, the table has rows corresponding to each possible state ($s$) in the environment and columns corresponding to each possible action ($a$) that the agent can take. Each cell in the table, $Q(s, a)$, stores the estimated value of taking action $a$ in state $s$. This method is only feasible for environments with a relatively small and finite number of states and actions, as the size of the table grows with the product of the number of states and the number of actions. For environments with large or continuous state/action spaces, function approximation methods (like neural networks) are used instead of a table.
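
A minimal sketch of the tabular update rule, assuming a small discrete environment (all names and constants are illustrative):

  import numpy as np

  n_states, n_actions = 16, 4          # illustrative environment size
  alpha, gamma = 0.1, 0.99             # learning rate and discount factor
  Q = np.zeros((n_states, n_actions))  # one row per state, one column per action

  def q_update(s, a, r, s_next):
      # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
      td_target = r + gamma * Q[s_next].max()
      Q[s, a] += alpha * (td_target - Q[s, a])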

target

  1. fundamentals

Synonym for label. The target is the output value or category that a supervised machine learning model is trained to predict based on the input features.

target network

  1. reinforcementLearning

In some reinforcement learning algorithms, particularly value-based methods like Deep Q-Networks (DQNs), a target network is a separate copy of the main model (the network being trained) that is used to estimate the target values for the Q-function updates. The parameters of the target network are updated less frequently and more slowly than the parameters of the main network. Using a target network helps to stabilize the training process by providing a more stable target value for the agent to learn from, preventing oscillations or divergence that can occur when the same network is used for both prediction and target estimation.
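
A framework-agnostic sketch of the two common synchronization schemes, treating each network as a list of numpy weight arrays (names and constants are illustrative, not any particular library's API):

  SYNC_EVERY = 1000   # hard-update period in training steps
  TAU = 0.005         # soft-update coefficient

  def hard_update(step, online_params, target_params):
      # Every SYNC_EVERY steps, copy the online parameters into the target
      # network; between copies, the TD targets
      #   y = r + gamma * max_a' Q_target(s', a')
      # stay fixed, which stabilizes learning.
      if step % SYNC_EVERY == 0:
          for w_t, w_o in zip(target_params, online_params):
              w_t[:] = w_o

  def soft_update(online_params, target_params):
      # Alternative: Polyak averaging, so the target slowly tracks the online net.
      for w_t, w_o in zip(target_params, online_params):
          w_t[:] = TAU * w_o + (1.0 - TAU) * w_t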

task

  1. fundamentals

In machine learning, a task refers to the specific problem that a model is designed to solve. Different machine learning tasks require different types of models, algorithms, and metrics. Examples of tasks include classification, regression, clustering, machine translation, image recognition, and recommendation.

temperature

  1. generativeAI

A hyperparameter used in generative AI models, particularly large language models, that controls the randomness or creativity of the generated output. Temperature is applied to the logits of the model's output distribution; a minimal sketch follows the list below.

  • A higher temperature (e.g., > 1.0) makes the distribution flatter, increasing the probability of sampling lower-probability tokens, leading to more diverse and potentially surprising or creative output.
  • A temperature of 1.0 corresponds to standard sampling from the original distribution.
  • A lower temperature (e.g., < 1.0) makes the distribution sharper, concentrating the probability mass on the most likely tokens, resulting in more deterministic and focused, but potentially less creative, output.
  • A temperature of 0.0 typically corresponds to greedy decoding, where the model always selects the most probable token.
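
A minimal numpy sketch of temperature scaling before sampling (illustrative; real decoders typically add top-k/top-p filtering and other details):

  import numpy as np

  def sample_token(logits, temperature=1.0, rng=np.random.default_rng()):
      if temperature == 0.0:
          return int(np.argmax(logits))      # greedy decoding
      scaled = logits / temperature          # T > 1 flattens, T < 1 sharpens
      probs = np.exp(scaled - scaled.max())  # numerically stable softmax
      probs /= probs.sum()
      return int(rng.choice(len(logits), p=probs))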

temporal data

  1. fundamentals

Temporal data is data where the values are recorded over time. This type of data has a temporal dimension or time component, meaning the order or time of the observations is significant. Examples include stock prices, sensor readings from a machine over time, or sequences of words in a sentence. Analyzing and modeling temporal data often requires specialized techniques like time series analysis and sequence models such as recurrent neural networks.

Tensor

  1. TensorFlow

A fundamental data structure used in TensorFlow and other numerical computation libraries. A Tensor is a multi-dimensional array, essentially a generalization of scalars (rank 0), vectors (rank 1), and matrices (rank 2) to arbitrary numbers of dimensions (rank). Tensors are the primary units of data that flow through the computation graph in TensorFlow operations. They have a shape (the size of each dimension) and a data type (e.g., floating-point, integer).
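
A short TensorFlow illustration of rank, shape, and dtype (values are arbitrary):

  import tensorflow as tf

  scalar = tf.constant(3.0)                # rank 0, shape ()
  vector = tf.constant([1.0, 2.0, 3.0])    # rank 1, shape (3,)
  matrix = tf.constant([[1, 2], [3, 4]])   # rank 2, shape (2, 2)

  print(matrix.shape)     # (2, 2)
  print(matrix.dtype)     # int32
  print(tf.rank(matrix))  # rank as a scalar Tensor: 2
  print(tf.size(matrix))  # total element count: 4 (see "Tensor size")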

TensorBoard

  1. TensorFlow

A visualization tool provided with TensorFlow. TensorBoard enables users to visualize various aspects of the training process and model architecture, including loss curves, metrics, gradients, parameter distributions, and the computation graph. It is a crucial tool for debugging, monitoring, and understanding the behavior of TensorFlow models during development and training.
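
One common way to produce TensorBoard logs is the Keras callback; a minimal runnable sketch on random data (paths and sizes are illustrative):

  import numpy as np
  import tensorflow as tf

  # Tiny model trained on random data, purely to generate logs.
  model = tf.keras.Sequential([
      tf.keras.Input(shape=(4,)),
      tf.keras.layers.Dense(1),
  ])
  model.compile(optimizer="adam", loss="mse")
  tb = tf.keras.callbacks.TensorBoard(log_dir="logs/run1")
  model.fit(np.random.rand(64, 4), np.random.rand(64, 1),
            epochs=3, callbacks=[tb], verbose=0)
  # Inspect the run with:  tensorboard --logdir logs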

TensorFlow

  1. TensorFlow

An open-source platform for machine learning developed by Google. TensorFlow provides a comprehensive ecosystem of tools, libraries, and community resources that enable researchers and developers to build and deploy ML systems. It is designed for numerical computation using Tensors and supports deployment across various platforms (CPUs, GPUs, TPUs, mobile, edge devices). TensorFlow offers high-level APIs like Keras for ease of use, as well as lower-level APIs for more flexibility.

TensorFlow Playground

A web-based interactive visualization tool created by Google to help users understand how neural networks work. The TensorFlow Playground allows users to experiment with different neural network architectures, hyperparameters, datasets, and features, observing in real time how these choices affect the model's training and performance on simple tasks. It's a valuable educational resource for gaining intuition about neural networks.

TensorFlow Serving

  1. TensorFlow

A flexible, high-performance serving system specifically designed for machine learning models, particularly those trained with TensorFlow. TensorFlow Serving can serve multiple models or multiple versions of the same model simultaneously. It is optimized for low latency and high throughput, making it suitable for production environments that require efficient online inference.
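
Clients typically query a served model through the REST predict endpoint. A hedged sketch of such a call (host, port, model name, and input shape are all placeholders):

  import requests  # third-party HTTP client

  # TensorFlow Serving's REST API: POST /v1/models/<model_name>:predict
  payload = {"instances": [[1.0, 2.0, 3.0, 4.0]]}  # illustrative input row
  resp = requests.post(
      "http://localhost:8501/v1/models/my_model:predict",  # hypothetical endpoint
      json=payload,
  )
  print(resp.json()["predictions"])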

Tensor Processing Unit (TPU)

  1. TPU

A custom-designed accelerator chip developed by Google specifically to accelerate machine learning workloads. Tensor Processing Units (TPUs) are optimized for the matrix multiplications and other operations commonly used in neural network computations, providing significant speedups compared to traditional CPUs or GPUs for many training and inference tasks. TPUs are available in various configurations, including TPU devices, Cloud TPUs, and TPU Pods.

Tensor rank

Synonym for rank (Tensor). The Tensor rank is the number of dimensions in a Tensor.

Tensor shape

Synonym for shape (Tensor). The Tensor shape describes the number of elements along each dimension of a Tensor.

Tensor size

  1. TensorFlow

The total number of scalar elements contained within a Tensor. The Tensor size is calculated by multiplying the lengths of all the dimensions in the Tensor's shape. For example, a Tensor with shape (3, 4) has a size of 12.

TensorStore

A Python library designed for efficient storage and access of large, multi-dimensional arrays (Tensors). TensorStore provides a unified API for reading and writing Tensors to various storage backends (like local disk, Google Cloud Storage, etc.). It is designed to handle large datasets that do not fit into memory and supports features like chunking, compression, and parallel access, making it suitable for machine learning workflows involving large datasets.

termination condition

  1. reinforcementLearning

In reinforcement learning, a termination condition is a rule or set of rules that define when an episode of interaction between the agent and the environment ends. Once a termination condition is met, the current episode finishes, and a new episode typically begins. Examples of termination conditions include reaching a goal state, exceeding a maximum number of timesteps, or falling into a failure state.

test

  1. fundamentals

Synonym for test set. The test set is the subset of data used to evaluate the final performance of a trained model.

test loss

  1. fundamentals

The loss calculated on the test set. Test loss provides an unbiased evaluation of how well the trained model generalizes to new, unseen data. It is a critical metric for identifying overfitting, where the test loss will be significantly higher than the training loss.

test set

  1. fundamentals

The subset of a dataset used exclusively for evaluating the final performance of a trained model. The test set is kept separate from the training set and validation set throughout the model development process to provide an unbiased estimate of the model's ability to generalize to new data. Evaluating on the test set is the final step in the standard model evaluation workflow.

text span

  1. language

A contiguous sequence of tokens within a larger body of text. A text span can range in length from a single token to multiple sentences or paragraphs. Identifying and manipulating text spans is a common task in natural language processing, particularly in tasks like named entity recognition, question answering (where the answer is a span within a document), and text extraction.

tf.Example

  1. TensorFlow

A standard message format used in TensorFlow for representing training examples and inference examples. A tf.Example is a flexible structure that can contain various types of features, including byte strings, floating-point numbers, and integers, each with a list of values. This format is used by TensorFlow's data loading APIs and is commonly used for serializing data to disk for efficient input pipelines.
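
A minimal sketch of building and serializing one tf.Example (feature names and values are illustrative):

  import tensorflow as tf

  example = tf.train.Example(features=tf.train.Features(feature={
      "age":    tf.train.Feature(int64_list=tf.train.Int64List(value=[42])),
      "height": tf.train.Feature(float_list=tf.train.FloatList(value=[1.78])),
      "name":   tf.train.Feature(bytes_list=tf.train.BytesList(value=[b"ada"])),
  }))
  serialized = example.SerializeToString()  # bytes, ready for a TFRecord file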

tf.keras

  1. TensorFlow

A high-level API for building and training neural networks that is integrated into TensorFlow. tf.keras provides a user-friendly and modular way to define models, layers, and optimizers, making it easier to quickly prototype and experiment with different neural network architectures. It supports both sequential and functional API styles for building models and is the recommended high-level API for most users in modern TensorFlow.
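
A minimal Sequential model, compiled and trained on random data (layer sizes and data are placeholders):

  import numpy as np
  import tensorflow as tf

  # Sequential style; tf.keras also offers a functional API for non-linear graphs.
  model = tf.keras.Sequential([
      tf.keras.Input(shape=(8,)),                      # 8 input features
      tf.keras.layers.Dense(16, activation="relu"),
      tf.keras.layers.Dense(1, activation="sigmoid"),  # binary classification head
  ])
  model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
  model.fit(np.random.rand(128, 8), np.random.randint(0, 2, size=(128, 1)),
            epochs=5, verbose=0)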

threshold (for decision trees)

  1. decisionForests

In a decision tree, the threshold is the specific value of a feature that is used in an axis-aligned condition to split the data at a node. For example, if a split is based on the condition "Is Feature X > 10?", then 10 is the threshold for that split. The learning algorithm for decision trees determines the optimal feature and threshold for each split to maximize metrics like information gain.
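
A sketch of how a learner might choose a numeric threshold: scan candidate values and keep the one giving the purest split (here scored with weighted Gini impurity rather than information gain; all names are illustrative):

  import numpy as np

  def gini(labels):
      _, counts = np.unique(labels, return_counts=True)
      p = counts / counts.sum()
      return 1.0 - np.sum(p ** 2)

  def best_threshold(feature_values, labels):
      # Try each observed feature value as the split point "feature <= t".
      best_t, best_score = None, np.inf
      for t in np.unique(feature_values):
          left = labels[feature_values <= t]
          right = labels[feature_values > t]
          if len(left) == 0 or len(right) == 0:
              continue
          # Weighted impurity of the two children; lower is better.
          score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
          if score < best_score:
              best_t, best_score = t, score
      return best_t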

time series analysis

  1. temporal

A field of machine learning and statistics focused on analyzing and modeling temporal data (data collected over time). Time series analysis involves identifying patterns, trends, seasonality, and dependencies within the data to understand past behavior and make future predictions. Techniques used in time series analysis include statistical methods like ARIMA, as well as machine learning sequence models such as recurrent neural networks.

timestep

  1. reinforcementLearning

In reinforcement learning, a timestep represents a single discrete unit of time within an episode of interaction between the agent and the environment. At each timestep, the agent observes the current state, chooses and performs an action, and the environment transitions to a new state and provides a reward. An episode consists of a sequence of timesteps until a termination condition is met.

token

  1. language

In language models and natural language processing, a token is a fundamental unit of text processing. Tokens are typically words, but they can also be punctuation marks, symbols, or subword units (like prefixes or suffixes). The process of breaking down text into tokens is called tokenization. Language models process text as sequences of tokens.

top-k accuracy

  1. Metric

A metric used to evaluate the performance of multi-class classification models, particularly when there are many possible classes. Top-k accuracy measures whether the correct label is present among the top $k$ most probable predictions made by the model for a given example. For example, if $k=5$, top-5 accuracy checks if the true class label is among the top 5 classes predicted by the model. This is less strict than standard accuracy (which is equivalent to top-1 accuracy), where only the single most probable prediction is considered correct.
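
A compact numpy sketch (array shapes are illustrative):

  import numpy as np

  def top_k_accuracy(probs, labels, k=5):
      # probs: (n_examples, n_classes) predicted probabilities
      # labels: (n_examples,) integer class labels
      top_k = np.argsort(probs, axis=1)[:, -k:]      # indices of the k largest
      hits = (top_k == labels[:, None]).any(axis=1)  # true label among the top k?
      return hits.mean()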

tower

  1. TensorFlow

In TensorFlow and distributed training setups, a tower typically refers to the portion of the model that is replicated and runs on a single device (like a GPU or TPU core) to process a mini-batch of data. In a data-parallel training strategy, the total batch size is divided among multiple towers, and the gradients computed by each tower are aggregated to update the shared parameters of the model.

toxicity

  1. responsible

Toxicity in the context of machine learning and language models refers to language that is rude, disrespectful, or unreasonable, and that is likely to make someone leave a discussion or conversation. Detecting and mitigating toxicity is a significant challenge and focus area in developing responsible large language models and online platforms. LLM evaluations often include metrics to assess the level of toxicity in generated text.

TPU

See Tensor Processing Unit (TPU).

TPU chip

  1. TPU

The physical Tensor Processing Unit device. A TPU chip contains computational units optimized for machine learning workloads. Multiple TPU chips can be interconnected to form a TPU device or TPU Pod for accelerating larger training or inference tasks.

TPU device

  1. TPU

A single board or unit containing one or more TPU chips and associated hardware. A TPU device serves as a computational device for running TensorFlow or other machine learning computations. Multiple TPU devices can be linked together to form larger TPU resources like TPU Pods.

TPU node

  1. TPU

A virtual representation of a Cloud TPU resource that you create and manage in the Google Cloud environment. A TPU node corresponds to one or more TPU devices that are allocated for your use to run machine learning workloads.

TPU Pod

  1. TPU

A large-scale configuration of interconnected TPU devices designed to provide massive computational power for machine learning training and inference at scale. A TPU Pod consists of many TPU devices networked together to function as a single, powerful accelerator. TPU Pods are used to train the largest and most complex models.

TPU resource

  1. TPU

A general term referring to any allocation of Tensor Processing Unit hardware available for running machine learning computations. A TPU resource can range from a single TPU device to a large TPU Pod.

TPU slice

  1. TPU

A portion of a TPU Pod that is allocated for a specific workload. A TPU slice consists of a set of interconnected TPU devices or cores within a Pod that work together to run a user's machine learning job. The size of a TPU slice can be configured based on the computational needs of the workload.

TPU type

  1. TPU

The specific generation and configuration of a TPU resource. TPU type refers to the hardware version (e.g., TPU v2, v3, v4) and the number of cores or devices included in the resource (e.g., 2x2, 4x4). Different TPU types offer varying levels of computational performance and memory capacity.

TPU worker

  1. TPU

A job or process running on a TPU resource that performs the computational work for a machine learning task. In a distributed training setup using TPU Pods, multiple TPU workers run in parallel, each executing a portion of the training or inference workload on their assigned TPU slice.

training

  1. fundamentals

The core process in machine learning where a model learns from data to make accurate predictions or perform a specific task. Training involves feeding training set data to the model and iteratively adjusting the model's internal parameters (its weights and biases) to minimize the objective function (loss function) or maximize a performance metric. This process is typically performed using optimization algorithms like gradient descent.
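
A toy gradient-descent loop for linear regression, to make the iterate-and-update cycle concrete (all data and constants are synthetic and illustrative):

  import numpy as np

  rng = np.random.default_rng(0)
  X = rng.normal(size=(100, 3))                                    # features
  y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=100)  # targets

  w, lr = np.zeros(3), 0.05                  # parameters and learning rate
  for step in range(500):
      preds = X @ w
      grad = 2 * X.T @ (preds - y) / len(y)  # gradient of the mean squared error
      w -= lr * grad                         # gradient descent update
  # w now approximates [2.0, -1.0, 0.5]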

training loss

  1. fundamentals

The loss calculated during the training process on the training set. Training loss measures how well the model is fitting the data it is currently learning from. Monitoring training loss is essential for understanding the progress of training and can help diagnose issues like overfitting (where training loss continues to decrease while validation loss increases) or underfitting (where training loss remains high).

training-serving skew

  1. fundamentals

A problem that occurs when there is a significant difference between the way data is handled during training and the way it is handled during serving (inference). Training-serving skew can lead to degraded model performance in production compared to its performance during evaluation on the test set. Common causes include discrepancies in preprocessing logic, differences in data sources or pipelines, or changes in the data distribution between the training set and the data encountered during serving.

training set

  1. fundamentals

The largest subset of a dataset used to train a machine learning model. The training set contains the examples that the model learns from by iteratively adjusting its internal parameters. The training process aims to minimize the training loss and improve the model's ability to make accurate predictions on this data. The training set is distinct from the validation set and test set.

trajectory

  1. reinforcementLearning

In reinforcement learning, a trajectory refers to a sequence of states, actions, and rewards experienced by an agent during a single episode of interaction with its environment. A trajectory can be represented as $(s_0, a_0, r_1, s_1, a_1, r_2, s_2, \dots, s_T)$, where $s_t$ is the state at timestep $t$, $a_t$ is the action taken at timestep $t$, $r_{t+1}$ is the reward received after taking action $a_t$ and transitioning to state $s_{t+1}$, and $T$ is the final timestep of the episode. Learning from collected trajectories is fundamental to many reinforcement learning algorithms.
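
A sketch of collecting one trajectory, assuming a Gym-like environment interface (the exact step() signature here is hypothetical):

  def collect_trajectory(env, policy):
      trajectory = []            # list of (state, action, reward) tuples
      state, done = env.reset(), False
      while not done:            # run until a termination condition is met
          action = policy(state)
          state_next, reward, done = env.step(action)  # hypothetical interface
          trajectory.append((state, action, reward))
          state = state_next
      return trajectory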

transfer learning

  1. fundamentals

A machine learning technique where knowledge gained from training a model on one task is reused to improve the performance on a different, but related, task. In transfer learning, a pre-trained model (trained on a large dataset for a general task) is used as a starting point for training on a new, specific task with a smaller dataset. The pre-trained model's learned features or parameters are transferred and then typically fine-tuned on the new task. This significantly reduces the amount of data and computational resources required for the new task compared to training a model from scratch.

Transformer

  1. Transformer

A machine learning model architecture that relies heavily on the attention mechanism, particularly self-attention. Transformers were initially developed for sequence-to-sequence tasks like machine translation but have since become the dominant architecture for many natural language processing tasks, including large language models. Unlike recurrent neural networks, Transformers process input sequences in parallel, enabling faster training on modern hardware. They utilize positional encoding to incorporate information about the order of tokens in the sequence.
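
At the heart of the architecture is scaled dot-product self-attention; a minimal single-head numpy sketch (dimensions and weight matrices are illustrative):

  import numpy as np

  def self_attention(X, Wq, Wk, Wv):
      # X: (seq_len, d_model); Wq/Wk/Wv project to queries, keys, and values.
      Q, K, V = X @ Wq, X @ Wk, X @ Wv
      scores = Q @ K.T / np.sqrt(K.shape[-1])         # scaled dot products
      weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
      weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
      return weights @ V  # every position attends to every other position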

translational invariance

  1. vision

An attribute of an image model that means the model can successfully classify images even if the position of the object in the image shifts. For example, if a model is trained to recognize a cat, it should ideally still recognize the cat regardless of where it appears in the image. Convolutional neural networks achieve some degree of translational invariance through the use of convolutional layers and pooling operations, which allow them to detect features regardless of their exact location in the image. Contrast with rotational invariance.

trigram

  1. language

An N-gram of size 3. A trigram is a contiguous sequence of three tokens (typically words or characters) from a text. Trigrams are used in N-gram language models to capture local word sequences and dependencies, which can be useful for tasks like text generation or sentiment analysis.
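
A one-line sketch of extracting trigrams from a token sequence:

  tokens = "the quick brown fox jumps".split()
  trigrams = list(zip(tokens, tokens[1:], tokens[2:]))
  # [('the', 'quick', 'brown'), ('quick', 'brown', 'fox'), ('brown', 'fox', 'jumps')]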

true negative (TN)

  1. Metric

In binary classification, a true negative (TN) is an outcome where the model correctly predicted the negative class. This occurs when the actual label is negative, and the model's prediction is also negative. True negatives are one of the four possible outcomes represented in a confusion matrix and are used to calculate metrics like accuracy and specificity.

true positive (TP)

  1. Metric

In binary classification, a true positive (TP) is an outcome where the model correctly predicted the positive class. This occurs when the actual label is positive, and the model's prediction is also positive. True positives are one of the four possible outcomes represented in a confusion matrix and are fundamental to calculating metrics like precision, recall (True Positive Rate), and accuracy.

true positive rate (TPR)

  1. Metric

In binary classification, the true positive rate (TPR) is a metric that measures the proportion of actual positive examples that are correctly identified by the model. It is calculated as the number of true positives divided by the total number of actual positives (true positives + false negatives). The true positive rate is synonymous with recall. It is a key metric used in the ROC curve, where it is plotted against the false positive rate.
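
Written as a formula:

  $\text{TPR} = \frac{TP}{TP + FN}$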