Fine-tuning experiment for generating news headlines from short article descriptions using facebook/bart-base and LoRA adapters.
This was built as a DAT410 Design of AI Systems course project at Chalmers University in Spring 2025 with Elvina Fahlgren. The project received 100/100, and the original course report is available at docs/report.pdf.
The project explores whether parameter-efficient fine-tuning can adapt a pretrained sequence-to-sequence model to headline generation under limited compute.
Key technical pieces:
- Fine-tuning facebook/bart-base for description-to-headline generation
- Parameter-efficient training with LoRA instead of updating all model weights
- Hugging Face transformers, datasets, evaluate, and peft
- Exploratory analysis of the HuffPost News Category Dataset
- Training and evaluation in Google Colab on an A100 GPU
- Scriptable training, evaluation, and inference entry points for reproducibility
| Area | Details |
|---|---|
| Task | Generate a headline from a short news description |
| Dataset | HuffPost News Category Dataset, about 209k articles |
| Base model | facebook/bart-base |
| Adaptation method | LoRA on BART attention projection layers |
| Trainable parameters | 442,368 of 139,862,784, about 0.32% |
| Training environment | Google Colab Pro, A100 GPU, fp16 |
| Batch size | 16 |
| Saved notebook output | Evaluation after epoch 5 |
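The trainable-parameter count in the table can be reproduced with simple arithmetic. The sketch below assumes a LoRA rank of 8 applied to the query and value projections (q_proj and v_proj) of every attention block. Neither the rank nor the exact target modules are stated in this README, so both are assumptions that happen to match the reported figure. The architecture numbers (hidden size 768, 6 encoder layers, 6 decoder layers with self-attention and cross-attention) are those of BART-base.

```python
# Assumed LoRA settings; not confirmed by this README.
ASSUMED_RANK = 8
ASSUMED_TARGETS_PER_ATTENTION = 2  # q_proj and v_proj

# BART-base architecture.
HIDDEN_SIZE = 768
ENCODER_LAYERS = 6
DECODER_LAYERS = 6

# Total parameter count as reported in the table above.
TOTAL_PARAMS = 139_862_784


def lora_trainable_params(hidden, enc_layers, dec_layers, rank, targets_per_attention):
    """Count LoRA parameters added to square (hidden x hidden) attention projections."""
    # Each encoder layer has one attention block; each decoder layer has two
    # (self-attention and cross-attention).
    attention_blocks = enc_layers + 2 * dec_layers
    # Each adapted projection gets two low-rank matrices: A (rank x hidden)
    # and B (hidden x rank).
    params_per_target = rank * hidden + hidden * rank
    return attention_blocks * targets_per_attention * params_per_target


trainable = lora_trainable_params(
    HIDDEN_SIZE, ENCODER_LAYERS, DECODER_LAYERS,
    ASSUMED_RANK, ASSUMED_TARGETS_PER_ATTENTION,
)
share = trainable / TOTAL_PARAMS
print(trainable, f"{share:.2%}")
```

Under these assumptions the count is 18 attention blocks x 2 projections x 12,288 parameters, which gives the 442,368 shown in the table.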
The committed notebook reports:
eval_loss: 4.6797
eval_bleu: 0.5041
epoch: 5.0
Important: the notebook's compute_metrics function multiplies Hugging Face's raw BLEU value by 100 before returning it. That means the reported eval_bleu: 0.5041 corresponds to raw corpus BLEU of about 0.005.
This should not be read as 0.50 raw BLEU or 50% BLEU. The result is best understood as a working fine-tuning pipeline built under course constraints, not as a model optimized for production headline quality.
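The scaling can be made explicit with a small helper. This is a minimal sketch rather than the notebook's code: the function name and the returned key names are hypothetical, and it relies only on what is stated above, namely that the logged value is raw BLEU multiplied by 100.

```python
# Factor applied by the notebook's compute_metrics, as described above.
SCALE = 100.0


def describe_logged_bleu(logged_bleu: float) -> dict:
    """Return the logged value under unambiguous names (hypothetical keys)."""
    return {
        "bleu_raw_0_to_1": logged_bleu / SCALE,  # value on Hugging Face's raw scale
        "bleu_percent": logged_bleu,             # same value on a 0-100 scale
    }


result = describe_logged_bleu(0.5041)
print(result)
```

Logging the metric under a name that carries its scale avoids the 0.50 versus 0.005 ambiguity.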
For headline generation, BLEU is also a limited metric: many valid headlines can share little exact n-gram overlap with the reference. A stronger follow-up would add ROUGE, BERTScore, qualitative examples, and a reproducible evaluation script.
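To illustrate the n-gram overlap problem, the sketch below computes clipped n-gram precision, the building block of BLEU, for two invented headlines that describe the same story. The headlines and function names are made up for this example, and the code omits BLEU's brevity penalty and any smoothing.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also occur in the reference,
    with each match count clipped to the reference count."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


# Invented example: both headlines are plausible for the same article.
reference = "city council approves new housing plan"
candidate = "local leaders back fresh plan for homes"

precisions = {n: clipped_precision(candidate, reference, n) for n in range(1, 5)}
print(precisions)
```

Only one unigram ("plan") matches and no bigram does, so unsmoothed BLEU for this pair is zero even though the candidate is a reasonable headline.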
.
|-- configs/
| `-- default.json
|-- docs/
| `-- report.pdf
|-- notebooks/
| |-- 01_final_model.ipynb
| `-- 02_load_and_explore.ipynb
|-- src/
| |-- data.py
| |-- evaluate.py
| |-- infer.py
| |-- metrics.py
| |-- modeling.py
| `-- train.py
|-- .gitignore
|-- requirements.txt
`-- README.md
Run these in order:
- notebooks/02_load_and_explore.ipynb explores the dataset, category distribution, word counts, and missing values.
- notebooks/01_final_model.ipynb loads BART, applies LoRA, preprocesses the dataset, trains, evaluates, and prints sample predictions.
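The kind of exploration described for the first notebook can be sketched with the standard library alone. This is not the notebook's code. The records below are invented, and the field names (category, headline, short_description) are assumed to match the public Kaggle file, which stores one JSON object per line.

```python
import json
from collections import Counter

# Invented sample records in JSON Lines form (one JSON object per line).
SAMPLE_LINES = [
    '{"category": "POLITICS", "headline": "Senate Passes Budget Bill", '
    '"short_description": "Lawmakers voted late on Tuesday."}',
    '{"category": "TRAVEL", "headline": "Five Quiet Beaches To Visit", '
    '"short_description": ""}',
    '{"category": "POLITICS", "headline": "Governor Signs Energy Law", '
    '"short_description": "The measure takes effect next year."}',
]

records = [json.loads(line) for line in SAMPLE_LINES]

# Category distribution.
category_counts = Counter(r["category"] for r in records)

# Missing values: treat empty or whitespace-only descriptions as missing.
missing_descriptions = sum(
    1 for r in records if not r.get("short_description", "").strip()
)

# Word counts per headline and their mean.
headline_word_counts = [len(r["headline"].split()) for r in records]
mean_headline_words = sum(headline_word_counts) / len(headline_word_counts)

print(category_counts.most_common())
print("missing descriptions:", missing_descriptions)
print("mean headline words:", round(mean_headline_words, 2))
```

Records with an empty description carry no input signal for description-to-headline training, so a check like this is a natural place to decide whether to filter them.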
The dataset is not included in this repository.
Use the Kaggle "News Category Dataset" by Rishabh Misra. The notebooks expect the JSON file at:
/content/drive/MyDrive/News_Category_Dataset_v3.json
The training notebook also writes checkpoints to:
/content/drive/MyDrive/checkpoints
The original run was done in Google Colab. The repository now also includes script entry points so the workflow can be rerun outside the notebooks.
Install dependencies:
pip install -r requirements.txt
The default config expects the Kaggle dataset at:
/content/drive/MyDrive/News_Category_Dataset_v3.json
To use a different location, edit data.dataset_path in configs/default.json.
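The path can also be changed programmatically. The sketch below assumes that the dotted name data.dataset_path refers to a nested JSON object of the form {"data": {"dataset_path": ...}}. The real schema of configs/default.json is not shown here, so the helper name and the layout are hypothetical. The demo runs against a throwaway file instead of the real config.

```python
import json
import tempfile
from pathlib import Path


def set_dataset_path(config_path, new_dataset_path):
    """Rewrite data.dataset_path in a JSON config file (assumed nested layout)."""
    config_path = Path(config_path)
    config = json.loads(config_path.read_text(encoding="utf-8"))
    # Create the "data" section if it is missing, then set the path.
    config.setdefault("data", {})["dataset_path"] = str(new_dataset_path)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demo on a temporary file so the real config is left untouched.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "default.json"
    demo_path.write_text(
        json.dumps(
            {"data": {"dataset_path": "/content/drive/MyDrive/News_Category_Dataset_v3.json"}}
        ),
        encoding="utf-8",
    )
    updated = set_dataset_path(demo_path, "data/News_Category_Dataset_v3.json")
    reloaded = json.loads(demo_path.read_text(encoding="utf-8"))

print(reloaded["data"]["dataset_path"])
```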
Train the LoRA adapters:
python -m src.train --config configs/default.json
This uses a fixed train/test split seed and writes checkpoints to:
checkpoints/bart-lora-headline
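A fixed split seed means the same examples land in the test set on every run, which keeps metrics comparable across runs. The sketch below shows the idea with the standard library only. The seed value 42 and the 10% test fraction are assumptions, and src/train.py may use a different mechanism, for example the train_test_split method of the datasets library with its seed argument.

```python
import random


def split_indices(n_examples, test_fraction, seed):
    """Deterministically split example indices into (train, test) lists."""
    indices = list(range(n_examples))
    # A private Random instance keeps the split independent of global RNG state.
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_examples * test_fraction))
    return indices[n_test:], indices[:n_test]


# Two calls with the same (assumed) seed give identical splits.
train_a, test_a = split_indices(1000, 0.1, seed=42)
train_b, test_b = split_indices(1000, 0.1, seed=42)
print(len(train_a), len(test_a), test_a == test_b)
```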
Evaluate a trained adapter:
python -m src.evaluate \
--config configs/default.json \
--adapter-path checkpoints/bart-lora-headline/final
Omit --adapter-path to evaluate the base facebook/bart-base model as a baseline.
For a quick smoke test on a smaller subset:
python -m src.evaluate \
--config configs/default.json \
--adapter-path checkpoints/bart-lora-headline/final \
--limit 100
The script writes aggregate metrics and sample predictions under results/.
Generate a headline from one description:
python -m src.infer \
--config configs/default.json \
--adapter-path checkpoints/bart-lora-headline/final \
--text "A short news article description goes here."
Omit --adapter-path to generate with the base model.
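For readers who want to call the model from Python rather than through the command line, the sketch below shows one way the same behavior could be written with transformers and peft. It is not the contents of src/infer.py: the function name, beam count, and length limits are assumptions. The libraries are imported inside the function so that defining it does not require them to be installed.

```python
def generate_headline(text, adapter_path=None, model_name="facebook/bart-base",
                      max_new_tokens=32):
    """Generate a headline for one description (hypothetical helper).

    With adapter_path=None the base model is used, mirroring the effect of
    omitting --adapter-path on the command line.
    """
    # Imported lazily so this module can be loaded without the ML stack.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    if adapter_path is not None:
        from peft import PeftModel
        # Wrap the base model with the trained LoRA adapter weights.
        model = PeftModel.from_pretrained(model, adapter_path)

    model.eval()
    # Assumed input length limit; the project config may use another value.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    # Assumed decoding settings: beam search with 4 beams.
    output_ids = model.generate(**inputs, num_beams=4, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call such as generate_headline("A short news article description goes here.", adapter_path="checkpoints/bart-lora-headline/final") would download the base model on first use and then load the adapter from the local checkpoint directory.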
- Download News_Category_Dataset_v3.json from Kaggle.
- Upload it to the Google Drive path shown above.
- Run the exploration notebook.
- Run the final model notebook on a GPU runtime.
This repository currently preserves the course-project version of the work. Before treating it as a fully reproducible ML project, these items should be cleaned up:
- Add markdown explanations inside the notebooks.
- Rerun the new script pipeline and commit a clean metric table plus sample predictions.
- Add BERTScore and a stronger baseline comparison.
- Add an inference-only demo path.
- Add a license and clearer dataset usage notes.
Group project by Harry Denell and Elvina Fahlgren for DAT410, Chalmers University.