Hugging Face Transformers

developer tools

72,029 lines. Hugging Face Transformers came to play.

Hugging Face Transformers streamlines hyperparameter search with its `Trainer` class. This guide shows how to plug in popular backends like Optuna and Weights & Biases (wandb), making advanced model tuning accessible to any developer.

Not sure yours is this good? Check it →

  • 72,029 lines (+6,900% vs. directory average)
  • 3,207 sections (+18,765% vs. directory average)
  • 1 file

Hugging Face Transformers's llms.txt Insights

Overachiever

3,207 sections. Most sites can barely manage 3. This one went all in.

War and Peace vibes

72,029 lines. They really wanted AI to understand them.

What's inside Hugging Face Transformers's llms.txt

Hugging Face Transformers's llms.txt opens with these sections:

  • Hyperparameter Search using Trainer API
  • Hyperparameter Search backend
  • How to enable Hyperparameter search in example

How does Hugging Face Transformers's llms.txt compare?

|          | Hugging Face Transformers | Directory Avg | Top Performer |
|----------|---------------------------|---------------|---------------|
| Lines    | 72,029                    | 1,029         | 163,447       |
| Sections | 3,207                     | 17            | 3,207         |

Cool table. Now the real question — where do you land? Find out →

Hugging Face Transformers's llms.txt preview

First 100 of 72,029 lines

# Hyperparameter Search using Trainer API

🤗 Transformers provides a `Trainer` class optimized for training 🤗 Transformers models, making it easier to start training without manually writing your own training loop. The `Trainer` provides an API for hyperparameter search. This doc shows how to enable it in an example.

## Hyperparameter Search backend

`Trainer` currently supports four hyperparameter search backends:
[optuna](https://optuna.org/), [sigopt](https://sigopt.com/), [raytune](https://docs.ray.io/en/latest/tune/index.html) and [wandb](https://wandb.ai/site/sweeps).

Install the backend you want to use before running a hyperparameter search:
```bash
pip install optuna  # or: sigopt, wandb, "ray[tune]"
```

## How to enable Hyperparameter search in example

Define the hyperparameter search space; each backend expects a different format.

For sigopt, see the sigopt [object_parameter](https://docs.sigopt.com/ai-module-api-references/api_reference/objects/object_parameter) docs; it looks like the following:
```py
>>> def sigopt_hp_space(trial):
...     return [
...         {"bounds": {"min": 1e-6, "max": 1e-4}, "name": "learning_rate", "type": "double"},
...         {
...             "categorical_values": ["16", "32", "64", "128"],
...             "name": "per_device_train_batch_size",
...             "type": "categorical",
...         },
...     ]
```

For optuna, see the optuna [search space configuration](https://optuna.readthedocs.io/en/stable/tutorial/10_key_features/002_configurations.html#sphx-glr-tutorial-10-key-features-002-configurations-py) docs; it looks like the following:

```py
>>> def optuna_hp_space(trial):
...     return {
...         "learning_rate": trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True),
...         "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [16, 32, 64, 128]),
...     }
```

Optuna provides multi-objective HPO. You can pass `direction` to `hyperparameter_search` and define your own `compute_objective` to return multiple objective values. The Pareto front (`List[BestRun]`) is then returned by `hyperparameter_search`; refer to the test case `TrainerHyperParameterMultiObjectOptunaIntegrationTest` in [test_trainer](https://github.com/huggingface/transformers/blob/main/tests/trainer/test_trainer.py). It looks like the following:

```py
>>> best_trials = trainer.hyperparameter_search(
...     direction=["minimize", "maximize"],
...     backend="optuna",
...     hp_space=optuna_hp_space,
...     n_trials=20,
...     compute_objective=compute_objective,
... )
```
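
The `compute_objective` passed above isn't shown in this excerpt; a minimal sketch, assuming `compute_metrics` reports `eval_loss` and `eval_accuracy` (metric names are illustrative), returning one value per entry in `direction`:

```py
>>> def compute_objective(metrics):
...     # One value per direction above: minimize eval_loss, maximize eval_accuracy
...     return [metrics["eval_loss"], metrics["eval_accuracy"]]
```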

For raytune, see the Ray Tune [search space API](https://docs.ray.io/en/latest/tune/api/search_space.html); it looks like the following:

```py
>>> from ray import tune

>>> def ray_hp_space(trial):
...     return {
...         "learning_rate": tune.loguniform(1e-6, 1e-4),
...         "per_device_train_batch_size": tune.choice([16, 32, 64, 128]),
...     }
```

For wandb, see the wandb [sweep configuration](https://docs.wandb.ai/guides/sweeps/configuration) docs; it looks like the following:

```py
>>> def wandb_hp_space(trial):
...     return {
...         "method": "random",
...         "metric": {"name": "objective", "goal": "minimize"},
...         "parameters": {
...             "learning_rate": {"distribution": "uniform", "min": 1e-6, "max": 1e-4},
...             "per_device_train_batch_size": {"values": [16, 32, 64, 128]},
...         },
...     }
```

Define a `model_init` function and pass it to the `Trainer`. For example:
```py
>>> from transformers import AutoModelForSequenceClassification

>>> def model_init(trial):
...     # model_args and config come from the surrounding example script
...     return AutoModelForSequenceClassification.from_pretrained(
...         model_args.model_name_or_path,
...         from_tf=bool(".ckpt" in model_args.model_name_or_path),
...         config=config,
...         cache_dir=model_args.cache_dir,
...         revision=model_args.model_revision,
...         token=True if model_args.use_auth_token else None,
...     )
```

Create a `Trainer` with your `model_init` function, training arguments, training and test datasets, and evaluation function:

```py
>>> trainer = Trainer(
...     model=None,
...     args=training_args,
...     train_dataset=small_train_dataset,
...     eval_dataset=small_eval_dataset,
...     compute_metrics=compute_metrics,
...     processing_class=tokenizer,
...     # (preview cuts off here, at 100 lines)
```

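The preview cuts off inside the `Trainer` constructor. For context, a hedged sketch of how the search itself is typically launched afterward, assuming the Optuna backend and the `optuna_hp_space` defined earlier (values are illustrative, not taken from the file):

```py
>>> # Illustrative only: run the search with the backend and space defined above
>>> best_trial = trainer.hyperparameter_search(
...     direction="maximize",
...     backend="optuna",
...     hp_space=optuna_hp_space,
...     n_trials=20,
... )
>>> best_trial.hyperparameters  # best learning_rate / per_device_train_batch_size found
```
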
What is llms.txt?

llms.txt is an open standard that helps AI language models understand your website. By placing a structured markdown file at /llms.txt, websites provide AI search engines like ChatGPT, Claude, and Perplexity with a clear map of their content, services, and documentation. Companies like Hugging Face Transformers use it to ensure AI accurately represents their brand when answering user queries. Read the spec.
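
For reference, a minimal llms.txt following the spec's shape might look like this (contents illustrative, not Hugging Face Transformers's actual file):

```md
# Project Name

> One-sentence summary of what the project does and who it's for.

## Docs

- [Getting started](https://example.com/docs/start.md): installation and first steps
- [API reference](https://example.com/docs/api.md): full reference documentation
```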

See who else in developer tools got the memo →

Hugging Face Transformers showed up. Where's yours?

1000+ companies didn't overthink it. 60 seconds. Go.

Check your site →
