Written by Sebastian F. Genter
C
calibration layer
Post-prediction adjustment aligning model outputs with observed distributions. Corrects systematic biases through:
- Platt scaling (logistic calibration)
- Isotonic regression
Critical for reliability in probabilistic forecasting.
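A minimal Platt-scaling sketch (the scores and labels here are made up for illustration): fit a logistic regression that maps raw validation-set scores to calibrated probabilities.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical uncalibrated model scores on a held-out validation set.
val_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.95, 0.6]).reshape(-1, 1)
val_labels = np.array([0, 0, 1, 1, 1, 0])

# Platt scaling: logistic regression from raw score to probability.
platt = LogisticRegression().fit(val_scores, val_labels)
calibrated = platt.predict_proba([[0.7]])[:, 1]  # calibrated P(positive)
```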
candidate generation
Initial recommendation phase filtering large catalogs to manageable options. Strategies:
- Collaborative filtering
- Content-based filtering
- Embedding similarity search
Reduces 100,000 items → 500 candidates for detailed ranking.
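A sketch of the embedding-similarity strategy, assuming precomputed item and user embeddings (random here for illustration): score the whole catalog with a dot product and keep only the top candidates.
```python
import numpy as np

rng = np.random.default_rng(0)
item_embeddings = rng.normal(size=(100_000, 64))  # hypothetical catalog embeddings
user_embedding = rng.normal(size=64)

# Score every item, then keep the 500 best for the ranking stage.
scores = item_embeddings @ user_embedding
candidates = np.argpartition(scores, -500)[-500:]  # indices of top-500 items
```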
candidate sampling
Training optimization for large output spaces:
- Evaluates all positive labels
- Samples subset of negative labels
Reduces computation in scenarios like extreme classification (millions of classes).
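A toy sketch of the idea (the class count and sample size are made up): keep the positive label and score only a sampled handful of negatives.
```python
import numpy as np

rng = np.random.default_rng(0)
num_classes = 1_000_000
positive = 42  # the true class for this training example

# Sample a few negative classes instead of scoring all of them.
negatives = rng.choice(num_classes, size=10, replace=False)
negatives = negatives[negatives != positive]
candidates = np.concatenate(([positive], negatives))
# The loss is then computed over ~11 logits instead of 1,000,000.
```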
categorical data
Discrete features with finite possible values:
- Nominal: Colors {red, blue, green}
- Ordinal: Ratings {poor, fair, good}
Encoded via one-hot, embeddings, or target encoding.
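A one-hot encoding sketch with pandas (toy data): each category value becomes its own binary column.
```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "green", "blue"]})
# One binary column per category value.
one_hot = pd.get_dummies(df, columns=["color"])
```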
causal language model
Unidirectional models predicting next token using left context:
- GPT architecture
- Masked future tokens during training
Contrasts with bidirectional models like BERT.
centroid
Cluster center in partitioning methods:
- k-means: Mean of cluster points
- k-median: Median reduces outlier sensitivity
Updated iteratively during clustering.
centroid-based clustering
Partitioning approach grouping data around central points:
- k-means (most common)
- k-medoids
- BIRCH (hierarchical variant)
Requires predefined cluster count (k).
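A minimal k-means sketch with scikit-learn (synthetic data; note that k must be chosen up front):
```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))  # synthetic 2-D points

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)  # learned centroids (means of cluster points)
print(kmeans.labels_[:10])      # cluster assignment per point
```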
chain-of-thought prompting
LLM prompting technique eliciting step-by-step reasoning: "Calculate the gravitational force between the Earth and the Moon. Show each equation and intermediate step." Elicits explicit intermediate calculations rather than a direct final answer.
chat
Conversational interface preserving dialog history:
- Context window management
- Multi-turn interaction tracking
Applications: Customer service bots, AI companions.
checkpoint
Model state preservation:
- Training: Resume from interruption
- Deployment: Version control
Contains weights, optimizer state, and metadata.
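A PyTorch-flavored sketch (the model and epoch number are placeholders): save weights plus optimizer state, then restore them to resume training.
```python
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Save weights, optimizer state, and metadata.
torch.save({"model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "epoch": 5}, "checkpoint.pt")

# Resume from the checkpoint later.
ckpt = torch.load("checkpoint.pt")
model.load_state_dict(ckpt["model"])
optimizer.load_state_dict(ckpt["optimizer"])
```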
class
Discrete prediction category:
- Binary: {spam, not_spam}
- Multiclass: {cat, dog, horse}
- Multilabel: Multiple simultaneous classes
classification model
Predictor outputting discrete labels:
- Logistic regression (probabilistic)
- SVM (maximum margin)
- Decision trees (rule-based)
Contrasts with regression models.
classification threshold
Probability cutoff converting scores to classes:
- Default 0.5 for binary
- Tuning affects precision/recall tradeoff
Example: Cancer screening favors a low threshold (high recall); spam filtering favors a higher threshold (high precision).
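A sketch of threshold tuning (labels and probabilities are made up): sweeping the cutoff shifts the precision/recall balance.
```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_prob = np.array([0.2, 0.4, 0.55, 0.7, 0.45, 0.6, 0.9, 0.1])

for threshold in (0.3, 0.5, 0.7):
    y_pred = (y_prob >= threshold).astype(int)
    # Higher thresholds trade recall for precision.
    print(threshold, precision_score(y_true, y_pred), recall_score(y_true, y_pred))
```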
classifier
(See classification model)
class-imbalanced dataset
Skewed class distribution challenges:
- 1,000,000:1 negative:positive ratio
- Mitigation: Resampling, class weights, anomaly detection
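A class-weighting sketch with scikit-learn (synthetic 100:1 data): "balanced" reweights each class inversely to its frequency.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1010, 5))
y = np.array([0] * 1000 + [1] * 10)  # 100:1 imbalance

# Rare positives get proportionally larger weight in the loss.
model = LogisticRegression(class_weight="balanced").fit(X, y)
```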
clipping
Outlier handling techniques:
- Feature clipping: Cap extreme values (e.g., ages >100 →100)
- Gradient clipping: Prevent exploding gradients
Stabilizes training and numerical computations.
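Sketches of both flavors (the age cap and tensor shapes are illustrative):
```python
import numpy as np
import torch

# Feature clipping: cap extreme values (here, ages above 100).
ages = np.array([23, 45, 310, 67])
ages = np.clip(ages, 0, 100)  # -> [23, 45, 100, 67]

# Gradient clipping (PyTorch): bound the global gradient norm per step.
model = torch.nn.Linear(4, 1)
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```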
Cloud TPU
Google's tensor processing units:
- Matrix multiplication optimization
- Pod configurations for large models
- Integrated with TensorFlow/JAX
clustering
Unsupervised grouping methods:
- Centroid-based (k-means)
- Density-based (DBSCAN)
- Hierarchical (agglomerative)
Applications: Customer segmentation, anomaly detection.
co-adaptation
Neural network pathology where:
- Neurons over-specialize to specific patterns
- Reduces generalization
Mitigated via dropout regularization.
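A dropout sketch in PyTorch (layer sizes are arbitrary): randomly zeroing activations during training discourages neurons from relying on one another.
```python
import torch

layer = torch.nn.Sequential(
    torch.nn.Linear(128, 64),
    torch.nn.ReLU(),
    torch.nn.Dropout(p=0.5),  # each unit dropped with probability 0.5 in training
    torch.nn.Linear(64, 10),
)
```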
collaborative filtering
Recommendation technique using user-item interactions:
- User-based: "Customers like you bought..."
- Item-based: "People who bought X also bought Y"
Matrix factorization approaches (SVD, ALS).
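A toy item-based sketch (ratings are made up; 0 means unrated): cosine similarity between item columns yields "people who bought X also bought Y" neighbors.
```python
import numpy as np

# Rows: users, columns: items.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

# Cosine similarity between item columns.
norms = np.linalg.norm(R, axis=0)
sim = (R.T @ R) / np.outer(norms, norms)

# Items most similar to item 0, best first.
print(np.argsort(sim[0])[::-1])
```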
concept drift
Data distribution shifts over time:
- Feature meaning changes (e.g., "fuel efficiency" standards)
- Requires model retraining/monitoring
Detection methods: Statistical process control.
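One common drift check, sketched with a two-sample Kolmogorov-Smirnov test (synthetic data; the KS test is one option alongside the statistical-process-control methods mentioned above):
```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, size=1000)  # distribution at training time
live_feature = rng.normal(loc=0.5, size=1000)   # shifted production data

# A small p-value suggests the feature's distribution has drifted.
stat, p_value = ks_2samp(train_feature, live_feature)
print(stat, p_value)
```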
condition
Decision tree splitting rule:
- Axis-aligned: single feature threshold
- Oblique: multiple feature combinations
Determines data partitioning path.
confabulation
(LLM Hallucination) Plausible but incorrect generations:
- Factual errors in summaries
- Fictional citations
Mitigation: Retrieval augmentation, grounding.
configuration
Model setup parameters:
- Hyperparameters (learning rate, layers)
- Architectural choices (optimizer type)
Managed via config files (YAML/JSON) or libraries (Gin).
confirmation bias
Human tendency favoring information confirming existing beliefs:
- Dataset collection bias
- Labeling subjectivity
- Metric selection skew
confusion matrix
Classification performance visualization:
| Actual\Predicted | Positive | Negative |
|---|---|---|
| Positive | TP | FN |
| Negative | FP | TN |
Derives metrics: Accuracy, Precision, Recall, F1.
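A sketch with scikit-learn (toy labels); note that scikit-learn orders rows and columns by label value, so class 0 comes first:
```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows: actual class, columns: predicted class (label order: 0, 1).
print(confusion_matrix(y_true, y_pred))
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred))
```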
constituency parsing
Sentence structure analysis breaking text into nested phrases:
- Noun phrases
- Verb phrases
- Prepositional phrases
Used in grammar checking, information extraction.
contextualized language embedding
Word representations adapting to context:
- "Bank" → financial vs river meanings
- BERT-style dynamic embeddings
Superior to static embeddings (Word2Vec).
context window
LLM input token capacity:
- Early models: 512 tokens
- Modern: 8k-128k tokens
Determines document processing capabilities.
continuous feature
Numerical variables with infinite possible values:
- Temperature measurements
- Sensor readings
Requires normalization/scaling for model stability.
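A standardization sketch (made-up sensor readings): rescale to zero mean and unit variance before training.
```python
import numpy as np
from sklearn.preprocessing import StandardScaler

temps = np.array([[21.5], [19.8], [30.2], [25.0]])
scaled = StandardScaler().fit_transform(temps)  # zero mean, unit variance
```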
convenience sampling
Non-probabilistic data collection:
- Quick experiments
- Potential sampling bias
Practitioners should transition to stratified or random sampling for reliable conclusions.
convergence
Training stability state when:
- Loss plateaus
- Parameter updates become negligible
Indicates model readiness (or local optimum).
convex function
Mathematical property enabling optimization:
- Bowl-shaped curve
- Single global minimum
Examples: L2 loss, logistic loss.
convex optimization
Minimizing convex functions:
- Guaranteed convergence
- Gradient descent variants
Foundation of linear models.
convex set
Geometric property where line between any two points remains within set:
- Spheres
- Cubes
Non-convex example: Star shape.
convolution
Matrix operation extracting spatial/temporal patterns:
- Kernel sliding across input
- Element-wise multiplication + summation
Basis for CNNs in image processing.
convolutional filter
Feature detector kernels:
- Edge detection: [[-1,0,1], [-1,0,1], [-1,0,1]]
- Learned during training
Depthwise separable variants reduce parameters.
convolutional layer
CNN component applying filters:
- Stride controls overlap
- Padding preserves dimensions
- Channels manage feature depth
convolutional neural network
Architecture for grid-like data:
- Convolution → Pooling → Dense
- Local connectivity → Translation invariance
Dominates computer vision tasks.
convolutional operation
Feature map calculation: $(I \ast K)(i, j) = \sum_{m}\sum_{n} I(i+m,\, j+n)\, K(m, n)$, where $I$ = input and $K$ = kernel (the cross-correlation form used by most deep learning libraries).
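A from-scratch sketch of the operation above (loop-based for clarity; real libraries vectorize this):
```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2-D 'convolution' (cross-correlation), as in CNN libraries."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Element-wise multiply the kernel with the window, then sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25.0).reshape(5, 5)
edge_kernel = np.array([[-1., 0., 1.],
                        [-1., 0., 1.],
                        [-1., 0., 1.]])
print(conv2d(image, edge_kernel))  # 3x3 feature map
```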
cost
(See loss)
co-training
Semi-supervised method using:
- Multiple views of data
- Complementary feature sets
Example: Web page classification using text + link structure.
counterfactual fairness
Fairness criterion requiring:
- Same prediction for individuals differing only in protected attributes
- "What if?" scenario analysis
Mathematically formalized through causal models.
coverage bias
Dataset incompleteness issues:
- Missing population segments
- Unrepresentative sampling
Leads to models that generalize poorly to the missing segments.
crash blossom
Ambiguous phrasing challenging NLU: "Stolen painting found by tree"
- Found near a tree? (locative reading)
- Found by the tree itself? (agentive reading, nonsensical)
Requires world knowledge for disambiguation.
critic
Reinforcement learning component:
- Estimates value functions
- DQN: Q-value approximator
Guides policy improvement through evaluation.
cross-entropy
Multiclass loss function: $H(p, q) = -\sum_{x} p(x) \log q(x)$, where $p$ is the true label distribution and $q$ the predicted distribution. Minimized during classification training.
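A numeric sketch of the formula above (toy one-hot label and predictions):
```python
import numpy as np

def cross_entropy(p_true, q_pred, eps=1e-12):
    # H(p, q) = -sum(p * log(q)); eps guards against log(0).
    return -np.sum(p_true * np.log(q_pred + eps))

p = np.array([0.0, 0.0, 1.0])  # one-hot true label (class 2)
q = np.array([0.1, 0.2, 0.7])  # predicted probabilities
print(cross_entropy(p, q))     # -log(0.7) ≈ 0.357
```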
cross-validation
Robust evaluation protocol:
- k-fold data partitioning
- Rotation of train/validation splits
Prevents overfitting to single split.
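A 5-fold sketch with scikit-learn (synthetic data): each fold serves once as the validation split.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] + rng.normal(scale=0.5, size=100) > 0).astype(int)

scores = cross_val_score(LogisticRegression(), X, y, cv=5)
print(scores.mean(), scores.std())  # average accuracy and spread across folds
```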
cumulative distribution function (CDF)
Probability analysis tool: $F(x) = P(X \le x)$, the probability that a random variable $X$ takes a value at most $x$. Used for:
- Statistical testing
- Data distribution analysis
- Quantile calculations
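An empirical-CDF sketch (standard-normal samples): the fraction of samples at or below x estimates F(x), and quantiles invert it.
```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(size=1000)

def ecdf(samples, x):
    # Fraction of samples at or below x.
    return np.mean(samples <= x)

print(ecdf(samples, 0.0))          # ≈ 0.5 for a standard normal
print(np.quantile(samples, 0.95))  # inverse-CDF (quantile) lookup
```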