Tips and Tricks for Working with Transformers: A Powerful Library for NLP

Bilal Muhammad

10 months ago

1. Importing transformers modules

# import only the classes you need, for example:
from transformers import AutoConfig, AutoModel, AutoTokenizer

2. Loading pre-trained models

from transformers import AutoModelForSequenceClassification, AutoTokenizer
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# the classification head on top of BERT is newly initialized, so it must be fine-tuned before use
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
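The Auto classes pick a concrete architecture by reading the checkpoint's configuration, whose "model_type" field names the model family. The plain-Python sketch below shows only that dispatch idea. It uses a hypothetical three-entry table; SEQ_CLS_CLASSES and resolve_auto_class are illustrative names, not library API. The real library keeps a much larger mapping and also loads the weights.

```python
import json

# Hypothetical, heavily simplified lookup table; the real library keeps a
# much larger mapping from config "model_type" values to model classes.
SEQ_CLS_CLASSES = {
    "bert": "BertForSequenceClassification",
    "roberta": "RobertaForSequenceClassification",
    "distilbert": "DistilBertForSequenceClassification",
}


def resolve_auto_class(config_json: str) -> str:
    """Return the class name an Auto class would pick for this config (sketch)."""
    config = json.loads(config_json)
    model_type = config["model_type"]
    if model_type not in SEQ_CLS_CLASSES:
        raise ValueError(f"Unrecognized model_type: {model_type!r}")
    return SEQ_CLS_CLASSES[model_type]


# A checkpoint's config.json records its architecture family in "model_type".
print(resolve_auto_class('{"model_type": "bert", "hidden_size": 768}'))  # BertForSequenceClassification
```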

3. Tokenizing text

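text = ["Hello, how are you?", "Transformers make NLP easier."]
# pad to the longest sequence in the batch, cut anything longer than 128 tokens, return PyTorch tensors
encoded_input = tokenizer(text, padding=True, truncation=True, max_length=128, return_tensors='pt')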
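The following plain-Python sketch makes padding=True, truncation=True and max_length concrete by operating on already-tokenized IDs. It simplifies in several stated ways. pad_and_truncate is a hypothetical helper, and the IDs are arbitrary. Pad ID 0 matches BERT's [PAD] token. Special tokens such as [CLS] and [SEP] are ignored, although the real tokenizer keeps them when it truncates.

```python
from typing import Dict, List

PAD_ID = 0  # BERT's [PAD] token id; other tokenizers use different ids


def pad_and_truncate(batch: List[List[int]], max_length: int) -> Dict[str, List[List[int]]]:
    """Mimic padding=True (pad to longest) plus truncation=True with max_length."""
    # Truncate first, so no sequence exceeds max_length.
    truncated = [ids[:max_length] for ids in batch]
    # padding=True pads to the longest sequence in the batch, not to max_length.
    target = max(len(ids) for ids in truncated)
    input_ids, attention_mask = [], []
    for ids in truncated:
        n_pad = target - len(ids)
        input_ids.append(ids + [PAD_ID] * n_pad)
        # 1 marks a real token the model should attend to, 0 marks padding.
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


result = pad_and_truncate([[5, 6, 7], [5, 6, 7, 8, 9, 10]], max_length=5)
print(result["input_ids"])       # [[5, 6, 7, 0, 0], [5, 6, 7, 8, 9]]
print(result["attention_mask"])  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```

The attention mask is what lets the model ignore the padded positions, which is why the tokenizer returns it alongside input_ids.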

4. Fine-tuning a pre-trained model

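import torch
from transformers import get_scheduler

# transformers' own AdamW is deprecated; use PyTorch's implementation instead
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
num_epochs = 3
# the scheduler is stepped once per batch, so count batches, not epochs
num_training_steps = num_epochs * len(dataloader)
scheduler = get_scheduler("linear", optimizer=optimizer, num_warmup_steps=100, num_training_steps=num_training_steps)

model.train()
for epoch in range(num_epochs):
    # dataloader: a torch DataLoader yielding dicts of tensors that include "labels"
    for batch in dataloader:
        optimizer.zero_grad()
        outputs = model(**batch)
        loss = outputs.loss  # only computed when the batch contains labels
        loss.backward()
        optimizer.step()
        scheduler.step()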
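The "linear" schedule first warms the learning rate up from zero and then decays it linearly back to zero. The sketch below computes the multiplier applied to the base learning rate at each step. It is written to mirror the shape of the library's linear warmup schedule as I understand it, and linear_warmup_factor is a hypothetical name rather than a transformers function.

```python
def linear_warmup_factor(step: int, num_warmup_steps: int, num_training_steps: int) -> float:
    """Learning-rate multiplier: linear warmup, then linear decay to zero (sketch)."""
    if step < num_warmup_steps:
        # Warmup: ramp from 0 up to the full learning rate.
        return step / max(1, num_warmup_steps)
    # Decay: fall linearly from the full rate to 0 at num_training_steps, never below 0.
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


base_lr = 2e-5
for step in (0, 50, 100, 550, 1000):
    factor = linear_warmup_factor(step, num_warmup_steps=100, num_training_steps=1000)
    print(step, factor, base_lr * factor)
```

This is why num_training_steps should match the number of scheduler.step() calls. If the value is too small, the learning rate reaches zero before training ends.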

5. Saving and loading models

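model.save_pretrained(directory_path)  # directory_path: any local folder, e.g. "./my-model"
tokenizer.save_pretrained(directory_path)  # save the tokenizer next to the model
model = AutoModelForSequenceClassification.from_pretrained(directory_path)
tokenizer = AutoTokenizer.from_pretrained(directory_path)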
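save_pretrained writes a directory that holds the model's configuration (config.json) next to its weights file, and from_pretrained rebuilds the model from that directory. The toy class below is a hypothetical stand-in, not the library's implementation, and it reproduces that round trip with two JSON files so the pattern is easy to see. In the real library the weights file is binary; recent versions write model.safetensors.

```python
import json
import os
import tempfile


class ToyModel:
    """Hypothetical stand-in that mimics the save_pretrained/from_pretrained pattern."""

    def __init__(self, config: dict, weights: dict):
        self.config = config
        self.weights = weights

    def save_pretrained(self, directory: str) -> None:
        os.makedirs(directory, exist_ok=True)
        # Architecture hyperparameters go to config.json ...
        with open(os.path.join(directory, "config.json"), "w") as f:
            json.dump(self.config, f)
        # ... and the learned parameters go to a separate weights file.
        with open(os.path.join(directory, "weights.json"), "w") as f:
            json.dump(self.weights, f)

    @classmethod
    def from_pretrained(cls, directory: str) -> "ToyModel":
        # Rebuild the object from the two files written above.
        with open(os.path.join(directory, "config.json")) as f:
            config = json.load(f)
        with open(os.path.join(directory, "weights.json")) as f:
            weights = json.load(f)
        return cls(config, weights)


with tempfile.TemporaryDirectory() as tmp:
    original = ToyModel({"num_labels": 2}, {"classifier.bias": [0.1, -0.1]})
    original.save_pretrained(tmp)
    restored = ToyModel.from_pretrained(tmp)
    print(sorted(os.listdir(tmp)))  # ['config.json', 'weights.json']
    print(restored.config == original.config and restored.weights == original.weights)  # True
```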

6. Generating text with language models

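from transformers import AutoModelForCausalLM, AutoTokenizer
# generation needs a model with a language-modeling head (e.g. GPT-2), not the classifier above
gen_tokenizer = AutoTokenizer.from_pretrained("gpt2")
gen_model = AutoModelForCausalLM.from_pretrained("gpt2")
input_text = "Once upon a time"
encoded_input = gen_tokenizer.encode(input_text, return_tensors='pt')
output = gen_model.generate(encoded_input, max_length=50, num_beams=5, early_stopping=True, pad_token_id=gen_tokenizer.eos_token_id)
generated_text = gen_tokenizer.decode(output[0], skip_special_tokens=True)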
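Setting num_beams=5 tells generate() to use beam search rather than greedy decoding. Beam search keeps the highest-scoring partial sequences at each step instead of committing to the single best next token. Below is a minimal sketch over a hypothetical toy next-token table; NEXT_TOKEN_PROBS and beam_search are illustrative, not library code. The sketch sums log-probabilities and stops once num_beams hypotheses have produced the end token, which is similar in spirit to early_stopping=True. It leaves out refinements the library applies, such as the length penalty. As with max_length in generate(), the length limit counts the prompt tokens.

```python
import math
from typing import Dict, List, Tuple

EOS = "<eos>"

# Hypothetical toy "language model": next-token probabilities given the last token.
NEXT_TOKEN_PROBS: Dict[str, Dict[str, float]] = {
    "Once": {"upon": 0.9, "more": 0.1},
    "upon": {"a": 1.0},
    "more": {EOS: 1.0},
    "a": {"time": 0.6, "hill": 0.4},
    "time": {EOS: 1.0},
    "hill": {EOS: 1.0},
}


def beam_search(prompt: List[str], num_beams: int, max_length: int) -> List[str]:
    """Keep the num_beams highest-scoring partial sequences at every step."""
    beams: List[Tuple[List[str], float]] = [(prompt, 0.0)]  # (tokens, summed log-prob)
    finished: List[Tuple[List[str], float]] = []
    while beams and len(beams[0][0]) < max_length:
        candidates = []
        for tokens, score in beams:
            for token, prob in NEXT_TOKEN_PROBS[tokens[-1]].items():
                candidates.append((tokens + [token], score + math.log(prob)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:num_beams]:
            # Sequences that emit EOS are complete; the rest keep expanding.
            (finished if tokens[-1] == EOS else beams).append((tokens, score))
        # Early stopping: quit once num_beams hypotheses have finished.
        if len(finished) >= num_beams:
            break
    best_tokens, _ = max(finished + beams, key=lambda c: c[1])
    return [t for t in best_tokens if t != EOS]


print(" ".join(beam_search(["Once"], num_beams=2, max_length=10)))  # Once upon a time
```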

7. Extracting contextual word embeddings

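import torch
from transformers import AutoModel
base_model = AutoModel.from_pretrained("bert-base-uncased")  # bare encoder, no task head
input_text = "Hello, how are you?"
encoded_input = tokenizer(input_text, return_tensors='pt')
with torch.no_grad():
    output = base_model(**encoded_input)
embeddings = output.last_hidden_state  # (batch_size, sequence_length, hidden_size)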
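last_hidden_state contains one vector per token. A common, though not the only, way to reduce it to a single sentence vector is masked mean pooling, which averages the vectors of real tokens and skips padding positions. The sketch below shows the arithmetic on plain Python lists, and mean_pool is a hypothetical helper name.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the vectors of real tokens (mask == 1), ignoring padding positions."""
    hidden_size = len(token_embeddings[0])
    summed = [0.0] * hidden_size
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            summed = [s + v for s, v in zip(summed, vector)]
            count += 1
    # Guard against an all-padding row to avoid dividing by zero.
    return [s / max(count, 1) for s in summed]


# Three positions with hidden_size 2; the last position is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
print(mean_pool(tokens, [1, 1, 0]))  # [2.0, 3.0]
```

Another common choice is the vector at the first position ([CLS]). Which of the two works better depends on the model and the task.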

8. Fine-tuning with custom datasets

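from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir='./results',          # checkpoints are written here
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    warmup_steps=500,                # warm up over the first 500 optimizer steps
    weight_decay=0.01,
    logging_dir='./logs',
)
# train_dataset / eval_dataset: tokenized datasets whose examples include a "labels" field
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()
trainer.evaluate()  # runs on eval_dataset and returns a dict of metrics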
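warmup_steps=500 is an absolute number of optimizer steps, so it is worth comparing it against the total number of steps training will take. The sketch below estimates that total for training on one device without gradient accumulation, keeping the last partial batch. total_training_steps and the 10,000-example dataset size are hypothetical.

```python
import math


def total_training_steps(num_examples: int, batch_size: int, num_epochs: int) -> int:
    """Optimizer steps on one device, no gradient accumulation, last partial batch kept."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * num_epochs


# Hypothetical dataset size; batch size and epochs mirror the TrainingArguments above.
num_examples = 10_000
total = total_training_steps(num_examples, batch_size=16, num_epochs=3)
warmup_steps = 500
print(total)                          # 1875
print(f"{warmup_steps / total:.1%}")  # 26.7%
```

With a small dataset like this one, more than a quarter of training would be spent warming up. A smaller warmup_steps value may then be more appropriate.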

9. Using pipelines for easy inference

from transformers import pipeline
classifier = pipeline('text-classification', model=model, tokenizer=tokenizer)
results = classifier(["Text 1", "Text 2", "Text 3"])  # one {'label': ..., 'score': ...} dict per input
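For a single-label classifier, the text-classification pipeline turns the model's raw logits into probabilities with a softmax and reports the top class together with its score. The sketch below illustrates that post-processing step under that assumption. postprocess is a hypothetical helper, and the logits are made up. The label names are the LABEL_0/LABEL_1 defaults used when a checkpoint's config defines no custom labels.

```python
import math
from typing import Dict, List


def postprocess(logits: List[float], id2label: Dict[int, str]) -> Dict[str, object]:
    """Turn one row of classifier logits into a {'label', 'score'} dict via softmax."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


# Default label names for a checkpoint whose config defines no custom labels.
id2label = {0: "LABEL_0", 1: "LABEL_1"}
print(postprocess([2.0, 0.0], id2label))
```

You can give the labels readable names by passing id2label and label2id when you load the model for fine-tuning. For example: AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2, id2label={0: "negative", 1: "positive"}, label2id={"negative": 0, "positive": 1}).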

10. Utilizing model-specific features and configurations

Beyond sequence classification, Transformers supports many other tasks through task-specific model classes and pipelines. Examples include token classification (such as named entity recognition), summarization, and translation. Refer to the official documentation and the model-specific examples to learn how to use each of them.

Conclusion

These tips and tricks should help you get started with the transformers library. For more detailed information and examples, refer to the official documentation, which covers much more of the library than this overview.