What is the 4th-root rule for embedding dimensions?

The 4th-root rule suggests setting the embedding dimension to the fourth root of the number of unique categories (vocab size). For 10,000 categories that gives 10, which is a useful lower bound for categorical embeddings in tabular models.
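
As a minimal sketch of the rule above, the function below takes the fourth root of the category count and rounds it to the nearest integer. The function name `fourth_root_dim` and the choice to round (rather than floor or ceil) are assumptions made for illustration, not something the tool is confirmed to do.

```python
def fourth_root_dim(num_categories: int) -> int:
    """Embedding dimension from the 4th-root rule, rounded to the nearest integer.

    Rounding to nearest is an illustrative choice; a floor or ceiling would
    be equally defensible.
    """
    if num_categories < 1:
        raise ValueError("num_categories must be at least 1")
    # Never return less than one dimension, even for tiny vocabularies.
    return max(1, round(num_categories ** 0.25))


if __name__ == "__main__":
    for vocab in (100, 10_000, 1_000_000):
        print(f"{vocab:>9} categories -> dim {fourth_root_dim(vocab)}")
```

For 10,000 categories this returns 10, matching the example in the text.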

What is the Google rule of thumb?

Google's practical guideline takes the fourth root of the number of unique categories and multiplies it by a scaling factor. A common form used in production is dim ≈ 6 × cardinality^0.25, capped at a reasonable maximum. This tool shows both the raw 4th-root value and the ×6 scaled version so you can pick based on task complexity.
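
The scaled form can be sketched as follows. The text does not say what the "reasonable maximum" is, so the `max_dim=512` default here is purely an assumption, as are the function and parameter names.

```python
def scaled_fourth_root_dim(
    num_categories: int,
    scale: float = 6.0,   # the x6 factor quoted in the text
    max_dim: int = 512,   # ASSUMPTION: the source does not give a cap value
) -> int:
    """dim ~= scale * cardinality^0.25, clamped to the range [1, max_dim]."""
    if num_categories < 1:
        raise ValueError("num_categories must be at least 1")
    raw = scale * num_categories ** 0.25
    return int(min(max_dim, max(1, round(raw))))


if __name__ == "__main__":
    for vocab in (10_000, 1_000_000, 10**12):
        print(f"{vocab:>14} categories -> dim {scaled_fourth_root_dim(vocab)}")
```

With these defaults, 10,000 categories gives 60, and very large vocabularies are clamped to the assumed cap.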

Should I always round to a power of two?

Not required, but recommended. GPUs and TPUs are most efficient when tensor dimensions are multiples of 8 or 16, and powers of two (32, 64, 128, 256, …) satisfy that alignment automatically, which typically keeps matrix-multiply kernels close to peak throughput. The tool highlights the nearest power-of-two ceiling for that reason.
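
A small sketch of the two rounding options mentioned above: the power-of-two ceiling and the less aggressive "next multiple of 8". Both helper names are hypothetical; how the tool itself computes the ceiling is not shown in the text.

```python
def next_power_of_two(dim: int) -> int:
    """Smallest power of two that is greater than or equal to dim."""
    if dim < 1:
        raise ValueError("dim must be at least 1")
    # (dim - 1).bit_length() is the exponent of the ceiling power of two.
    return 1 << (dim - 1).bit_length()


def round_up_to_multiple(dim: int, multiple: int = 8) -> int:
    """Smallest multiple of `multiple` that is greater than or equal to dim."""
    if dim < 1 or multiple < 1:
        raise ValueError("dim and multiple must be at least 1")
    return -(-dim // multiple) * multiple  # ceiling division, then scale back


if __name__ == "__main__":
    for dim in (10, 60, 64, 190):
        print(dim, next_power_of_two(dim), round_up_to_multiple(dim))
```

Rounding to a multiple of 8 wastes fewer parameters than jumping to the next power of two (190 becomes 192 instead of 256), which may matter for very large embedding tables.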

Does dataset size affect the recommended dimension?

Yes. With more training examples you can reliably learn higher-dimensional representations. A rough heuristic: aim for at least 5–10 training examples per embedding dimension per category, so very small datasets should use smaller dimensions even if the vocabulary is large.
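
One way to read this heuristic is as a ratio, examples / (categories × dimension), that should stay at or above 5. That reading, and the helper names below, are assumptions for illustration; the tool's exact coverage check is not spelled out in the text.

```python
def coverage_ratio(num_examples: int, num_categories: int, dim: int) -> float:
    """Training examples available per embedding dimension per category."""
    if num_categories < 1 or dim < 1:
        raise ValueError("num_categories and dim must be at least 1")
    return num_examples / (num_categories * dim)


def max_dim_for_coverage(
    num_examples: int,
    num_categories: int,
    min_ratio: float = 5.0,  # lower end of the 5-10 range from the text
) -> int:
    """Largest dimension that still keeps coverage_ratio >= min_ratio (at least 1)."""
    if num_categories < 1 or min_ratio <= 0:
        raise ValueError("num_categories must be >= 1 and min_ratio > 0")
    return max(1, int(num_examples // (num_categories * min_ratio)))


if __name__ == "__main__":
    # 1M examples over 1,000 categories comfortably supports dim 32.
    print(coverage_ratio(1_000_000, 1_000, 32))
    # 100k examples over 10,000 categories supports only a very small dim.
    print(max_dim_for_coverage(100_000, 10_000))
```

Under this reading, a dataset with only ten examples per category cannot support more than two dimensions at the 5× threshold, which illustrates why small datasets call for smaller embeddings.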

Why do large language models use such big dimensions?

LLMs embed tens of thousands of sub-word tokens but also need their embeddings to encode rich syntactic and semantic structure across dozens of layers. Dimensions like 768 (BERT-base) or 4096 (LLaMA-7B) reflect both vocabulary scale and the depth of the task, not just the token count.

Can I use this for image or graph embeddings?

The formulas are derived for categorical/text embeddings and serve as a starting point. Image and graph embeddings are typically driven by architecture depth rather than vocabulary size, so treat the numbers here as a lower bound and tune empirically.

Embedding Dimension Helper

Author: Nham Vu

Enter your dataset size and unique category count to get a recommended embedding dimension, then compare it against standard model sizes.

Your Dataset

Unique categories / vocab size

Number of distinct items to embed (tokens, product IDs, user IDs, etc.)

Training examples (optional)

Used for the data-coverage check (examples per dimension per category).

Precision (for memory estimate)
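
The memory estimate for an embedding table is, in its simplest form, vocab size × dimension × bytes per element. The sketch below assumes a typical set of precision options (`fp32`, `fp16`, `bf16`, `int8`); the options the tool actually offers are not listed in the text, and the function name is hypothetical.

```python
# ASSUMPTION: illustrative precision options; the tool's real list is not shown.
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def embedding_table_bytes(vocab_size: int, dim: int, precision: str = "fp32") -> int:
    """Size of a dense vocab_size x dim embedding table, in bytes.

    Counts only the weights, not optimizer state or gradients.
    """
    if precision not in BYTES_PER_ELEMENT:
        raise ValueError(f"unknown precision: {precision!r}")
    return vocab_size * dim * BYTES_PER_ELEMENT[precision]


if __name__ == "__main__":
    # BERT-base-sized table: 30,522 tokens x 768 dimensions.
    for precision in ("fp32", "fp16", "int8"):
        size = embedding_table_bytes(30_522, 768, precision)
        print(f"{precision}: {size / 2**20:.1f} MiB")
```

During training the real footprint is usually several times larger, since gradients and optimizer state are stored alongside the weights.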

Recommended Dimensions

Enter a vocab size and click Calculate.

Standard Model Dimensions

Reference: common embedding / hidden sizes in production models.

Model	Dim	Vocab	Match


How it works

Enter the number of unique categories or vocabulary items in your dataset.
Optionally enter your total dataset size (number of training examples).
Read the recommended dimensions from the 4th-root rule and the Google rule.
Compare your recommended size against the standard model dimensions shown in the reference table.
Use the closest power of two at or above your recommendation for hardware efficiency.
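
The comparison step can be sketched as a nearest-match lookup on a log scale. The contents of the tool's reference table are not reproduced in the text, so the three entries below are an illustrative stand-in using the published hidden sizes and vocabularies of BERT-base, BERT-large, and LLaMA-7B. The function name and the log-distance matching are assumptions.

```python
import math

# ASSUMPTION: illustrative stand-in for the tool's reference table.
# Each entry is (model name, embedding/hidden dimension, vocab size).
REFERENCE_MODELS = [
    ("BERT-base", 768, 30_522),
    ("BERT-large", 1024, 30_522),
    ("LLaMA-7B", 4096, 32_000),
]


def closest_reference(dim: int) -> tuple[str, int, int]:
    """Return the reference model whose dimension is closest to dim on a log2 scale."""
    if dim < 1:
        raise ValueError("dim must be at least 1")
    return min(
        REFERENCE_MODELS,
        key=lambda model: abs(math.log2(model[1]) - math.log2(dim)),
    )


if __name__ == "__main__":
    for dim in (64, 1000, 3000):
        name, ref_dim, ref_vocab = closest_reference(dim)
        print(f"dim {dim}: closest is {name} (dim {ref_dim}, vocab {ref_vocab})")
```

A log scale is used so that 64 versus 128 counts as the same relative gap as 2048 versus 4096; a plain absolute difference would be a reasonable alternative.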

Use cases

Choose an embedding dimension for a product recommendation system with millions of SKUs.
Set entity embedding size for a tabular deep learning model.
Pick a word embedding dimension when training a custom NLP model from scratch.
Validate that a pre-trained embedding matches the complexity of your dataset.
Teach students the relationship between vocabulary size and embedding dimension.
Quickly benchmark your choice against production models like BERT or LLaMA.

Frequently Asked Questions

Last updated: 2026-07-22 · Reviewed by Nham Vu