ATRS

Official implementation of:

Lim, H., Li, X., Park, S., Li, Q., & Kim, J. (2026). Reducing contextual noise in review-based recommendation via aspect term extraction and attention modeling. Information Sciences, 735, 123078.

Overview

This repository is the official implementation of ATRS (Aspect Term-aware Recommender System), published in Information Sciences (2026).

Most review-based recommendation models process entire review bodies indiscriminately, allowing aspect-relevant signal to be diluted by surrounding context. ATRS addresses this by routing review text through a dedicated Aspect Term Extraction (ATE) stage that filters out non-aspect content before downstream encoding.

The retained aspect terms are encoded with a 1D-CNN over Word2Vec embeddings, fused with user/item ID embeddings, and passed through a self-attention block to form aspect-aware user and item representations. These are concatenated and forwarded to an MLP that predicts the rating as a continuous regression target. Quantitative comparisons against representative recommendation baselines on Amazon and Yelp datasets are reported in Experimental Results.
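Before the 1D-CNN can consume an aspect set, the set has to become a fixed-length sequence of vocabulary indices. The README names aspect_length_percentile as the padding control but does not give the exact rule, so the sketch below assumes the maximum length is the nearest-rank percentile of training-set lengths, with index 0 reserved for padding and 1 for out-of-vocabulary terms. build_vocab, percentile_length and pad_sequences are hypothetical helpers, and the plain vocabulary stands in for the Word2Vec one.

```python
import math
from collections import Counter

PAD, OOV = 0, 1  # reserved indices (assumption, not taken from the repository)


def build_vocab(aspect_sets, min_count=1):
    """Map each aspect term to an id >= 2; 0 is padding, 1 is out-of-vocabulary."""
    counts = Counter(term for terms in aspect_sets for term in terms)
    vocab = {}
    for term, n in sorted(counts.items()):  # sorted for deterministic ids
        if n >= min_count:
            vocab[term] = len(vocab) + 2
    return vocab


def percentile_length(aspect_sets, percentile):
    """Nearest-rank percentile of the aspect-set lengths (percentile in 0..100)."""
    lengths = sorted(len(terms) for terms in aspect_sets)
    rank = max(1, math.ceil(percentile / 100 * len(lengths)))
    return lengths[rank - 1]


def pad_sequences(aspect_sets, vocab, max_len):
    """Convert term lists to id lists, truncating or right-padding to max_len."""
    out = []
    for terms in aspect_sets:
        ids = [vocab.get(term, OOV) for term in terms][:max_len]
        out.append(ids + [PAD] * (max_len - len(ids)))
    return out


if __name__ == "__main__":
    train = [["sound", "strings"], ["price"], ["tuner", "sound", "case", "price"]]
    vocab = build_vocab(train)
    max_len = percentile_length(train, 50)
    print(pad_sequences(train, vocab, max_len))  # [[4, 5], [3, 0], [6, 4]]
```

Capping length at a percentile rather than the maximum keeps a few very long aspect sets from inflating every padded sequence; the cost is that terms beyond the cap are truncated.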

Repository Structure

├── data/
│   ├── raw/                        # Source datasets — place {fname}.{raw_ext} here
│   ├── processed/                  # Pipeline parquet caches (preprocessed / aspects)
│   └── ate_output/                 # PyABSA workspace + extraction JSON
│       └── .pyabsa/                # Contained pyabsa CWD: checkpoints/, checkpoints.json, result JSON
│
├── model/
│   ├── atrs.py                     # ATRS architecture, trainer, predictor
│   ├── ATRS Architecture.png       # Architecture diagram
│   └── save/                       # Best checkpoint per dataset (best.pth)
│
├── src/
│   ├── config.yaml                 # Single source of truth for all hyperparameters
│   ├── data_processing.py          # DataProcessor pipeline + Dataset/DataLoader factory
│   ├── aspect_extraction.py        # ATExtractor — PyABSA wrapper for aspect term extraction
│   ├── preprocessing.py            # Review-text cleaning and row filters
│   ├── path.py                     # Project path constants (auto-creates runtime folders)
│   └── utils.py                    # Generic helpers — I/O, metrics, seeding
│
├── main.py                         # Entry point: data preparation → train → test
├── requirements.txt
└── README.md

Model Description

ATRS consists of two sequential modules. Aspect extraction runs in src/aspect_extraction.py (orchestrated by src/data_processing.py); the recommender network is in model/atrs.py. The full architecture is illustrated in model/ATRS Architecture.png.

1. Aspect Term Extraction Module

A pretrained Transformer encoder (PyABSA's English ATE checkpoint, FAST-LCF-ATEPC over DeBERTa-v3-base) reads each cleaned review and emits BIO-tagged aspect terms. Per-row aspect lists are then aggregated into per-user and per-item aspect sets, which become the inputs to the RS module.
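The aggregation step can be sketched in plain Python. aggregate_aspects is a hypothetical helper, not the repository's code; whether ATRS de-duplicates terms per user or item is not stated, so the sketch makes it a switch (on by default, keeping first-seen order).

```python
from collections import defaultdict


def aggregate_aspects(rows, key, unique=True):
    """Flatten per-review aspect lists into one list per id.

    rows:   iterable of dicts holding `key` (e.g. "user_id" or "parent_asin")
            and an "aspect" list extracted from that review.
    unique: drop repeated terms per id, keeping first-seen order (assumption).
    """
    grouped = defaultdict(list)
    seen = defaultdict(set)
    for row in rows:
        ident = row[key]
        bucket = grouped[ident]  # touch the key so ids with no aspects still appear
        for term in row["aspect"]:
            if unique and term in seen[ident]:
                continue
            seen[ident].add(term)
            bucket.append(term)
    return dict(grouped)


if __name__ == "__main__":
    rows = [
        {"user_id": "u1", "parent_asin": "i1", "aspect": ["sound", "price"]},
        {"user_id": "u1", "parent_asin": "i2", "aspect": ["sound", "strings"]},
        {"user_id": "u2", "parent_asin": "i1", "aspect": ["price"]},
    ]
    print(aggregate_aspects(rows, "user_id"))      # {'u1': ['sound', 'price', 'strings'], 'u2': ['price']}
    print(aggregate_aspects(rows, "parent_asin"))  # {'i1': ['sound', 'price'], 'i2': ['sound', 'strings']}
```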

2. Recommender System Module

Each user and item aspect set is tokenized over a Word2Vec-trained vocabulary, encoded by a 1D-CNN (AspectEncoder), and concatenated with a learned ID embedding. The fused vector is projected and passed through a multi-head self-attention + FFN block (SelfAttentionBlock) to yield aspect-aware user and item representations. Their concatenation is fed to an MLP regressor (ATRS.regressor) that outputs the predicted rating.
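A compact PyTorch sketch of the RS module's data flow follows. The names AspectEncoder and SelfAttentionBlock mirror the README, but their internals are assumptions, as are the hypothetical Tower and ATRSSketch classes, every layer size, and the sequence the attention runs over. The README only says the fused vector is projected and attended over, so this sketch keeps the per-position CNN features, broadcasts the ID embedding to every position, attends over positions, and mean-pools the non-padding positions. Embeddings are randomly initialised here rather than loaded from Word2Vec.

```python
import torch
import torch.nn as nn


class AspectEncoder(nn.Module):
    """1D-CNN over embedded aspect-term ids; returns per-position features."""

    def __init__(self, vocab_size, emb_dim=100, n_filters=64, kernel_size=3):
        super().__init__()
        # ATRS uses Word2Vec-trained vectors; this sketch initialises randomly.
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size, padding=kernel_size // 2)

    def forward(self, ids):                                # ids: (B, L)
        x = self.emb(ids).transpose(1, 2)                  # (B, emb_dim, L)
        return torch.relu(self.conv(x)).transpose(1, 2)    # (B, L, n_filters)


class SelfAttentionBlock(nn.Module):
    """Multi-head self-attention + FFN, each with a residual connection and LayerNorm."""

    def __init__(self, dim, n_heads=4, ff_dim=128, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, dropout=dropout, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, ff_dim), nn.ReLU(), nn.Linear(ff_dim, dim))
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):                                  # x: (B, L, dim)
        attended, _ = self.attn(x, x, x, need_weights=False)
        x = self.norm1(x + attended)
        return self.norm2(x + self.ffn(x))


class Tower(nn.Module):
    """One side (user or item): aspect CNN + ID embedding -> projection -> attention -> pooling."""

    def __init__(self, vocab_size, n_ids, id_dim=32, n_filters=64, dim=64):
        super().__init__()
        self.encoder = AspectEncoder(vocab_size, n_filters=n_filters)
        self.id_emb = nn.Embedding(n_ids, id_dim)
        self.proj = nn.Linear(n_filters + id_dim, dim)
        self.block = SelfAttentionBlock(dim)

    def forward(self, ids, aspect_ids):
        feats = self.encoder(aspect_ids)                                      # (B, L, n_filters)
        id_vec = self.id_emb(ids).unsqueeze(1).expand(-1, feats.size(1), -1)  # (B, L, id_dim)
        # Padding positions still take part in attention; only pooling ignores them.
        x = self.block(self.proj(torch.cat([feats, id_vec], dim=-1)))         # (B, L, dim)
        mask = (aspect_ids != 0).unsqueeze(-1).float()                        # 0 at padding
        return (x * mask).sum(1) / mask.sum(1).clamp(min=1.0)                 # (B, dim)


class ATRSSketch(nn.Module):
    def __init__(self, vocab_size, n_users, n_items, dim=64):
        super().__init__()
        self.user = Tower(vocab_size, n_users, dim=dim)
        self.item = Tower(vocab_size, n_items, dim=dim)
        self.regressor = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, user_ids, user_aspects, item_ids, item_aspects):
        u = self.user(user_ids, user_aspects)
        i = self.item(item_ids, item_aspects)
        return self.regressor(torch.cat([u, i], dim=-1)).squeeze(-1)  # (B,) predicted ratings


if __name__ == "__main__":
    model = ATRSSketch(vocab_size=50, n_users=10, n_items=20)
    pred = model(torch.randint(0, 10, (4,)), torch.randint(0, 50, (4, 6)),
                 torch.randint(0, 20, (4,)), torch.randint(0, 50, (4, 6)))
    print(pred.shape)  # torch.Size([4])
```

The regressor output is left unbounded, matching the README's description of rating prediction as plain regression; clamping to the rating scale at inference time would be a separate choice.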

How to Run

Configuration

All hyperparameters live in src/config.yaml — it is the single source of truth. Defaults reproduce the paper experiments.

A CUDA-capable GPU is recommended; main.py falls back to CPU with a warning if CUDA is unavailable. See requirements.txt for the GPU wheel and CPU-only setup.
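The CPU fallback amounts to a device check of roughly this shape. This is a sketch assuming the PyTorch stack implied by requirements.txt; select_device is a hypothetical name and not necessarily how main.py phrases the check or the warning.

```python
import warnings

import torch


def select_device():
    """Prefer CUDA; otherwise warn and fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    warnings.warn("CUDA is not available; falling back to CPU (training will be slower).",
                  RuntimeWarning)
    return torch.device("cpu")


device = select_device()
```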

End-to-end run:

conda create -n atrs python=3.11
conda activate atrs
pip install -r requirements.txt
python main.py

Data Preparation

Place the dataset as data/raw/{fname}.{raw_ext} where {fname} and {raw_ext} match data.fname / data.raw_ext in config.yaml. The file is read as JSON-lines (one review object per line) — each line must carry the columns below, or the run aborts at load with a KeyError.

Column	Role
`user_id`	Reviewer id — user-side aspect aggregation and ID embedding.
`parent_asin`	Product id — item-side aspect aggregation and ID embedding.
`text`	Review body — cleaned, then aspect terms are extracted from it (`review_text` is also accepted as an alias).
`rating`	Ground-truth rating; the regression target the model predicts.
`verified_purchase`	Boolean flag; only verified-purchase reviews are kept.
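The input contract above can be sketched with the standard library. raw_path and load_reviews are hypothetical helpers; the real pipeline in src/data_processing.py may load the file differently (for example as a DataFrame) and check columns at frame level rather than per row. The config dict mirrors the data.fname / data.raw_ext keys named in config.yaml.

```python
import json
from pathlib import Path

REQUIRED = ("user_id", "parent_asin", "text", "rating", "verified_purchase")


def raw_path(cfg, root="data/raw"):
    """Resolve data/raw/{fname}.{raw_ext} from a dict shaped like config.yaml's `data` section."""
    return Path(root) / f"{cfg['data']['fname']}.{cfg['data']['raw_ext']}"


def load_reviews(path):
    """Read JSON-lines, accept the review_text alias, check columns, keep verified purchases."""
    rows = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # tolerate blank lines
            row = json.loads(line)
            if "text" not in row and "review_text" in row:
                row["text"] = row.pop("review_text")  # alias described in the table above
            missing = [c for c in REQUIRED if c not in row]
            if missing:
                raise KeyError(f"missing required column(s): {missing}")
            if row["verified_purchase"]:
                rows.append(row)  # only verified-purchase reviews are kept
    return rows
```

Usage: with a config such as {"data": {"fname": "Musical_Instruments", "raw_ext": "jsonl"}} (illustrative values), raw_path returns data/raw/Musical_Instruments.jsonl, which load_reviews can then read.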

Optional: an aspect column of pre-extracted per-row aspect lists — if present, the PyABSA extraction stage is skipped. Any other columns are ignored. The pipeline writes two cache layers under data/processed/:

{fname}_preprocessed.parquet — written after text cleaning and the k-core filter.
- Columns: the required columns above + clean_text (HTML/URL-stripped, lowercased, contraction-expanded, stop-word-removed, lemmatized review body). Any extra raw columns pass through untouched.
{fname}_aspects.parquet — adds the extracted aspect terms and their per-user/item aggregation.
- Columns: the preprocessed columns + aspect (per-row aspect-term list), user_aspect_set / item_aspect_set (each id's aspect terms flattened across all its reviews).
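The cleaning and filtering that produce {fname}_preprocessed.parquet can be sketched with the standard library. This is not the code in src/preprocessing.py: the contraction map and stop-word set are tiny illustrative stand-ins, lemmatization is left out because it needs an NLP library, and the README does not state the value of k, so k_core takes it as a parameter. clean_text and k_core are hypothetical names.

```python
import html
import re
from collections import Counter

# Deliberately tiny tables for illustration; a real pipeline would use full
# contraction and stop-word lists (e.g. from NLTK) plus a lemmatizer.
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "n't": " not",
                "'re": " are", "'ve": " have", "'ll": " will", "'m": " am"}
STOPWORDS = {"a", "an", "the", "is", "are", "it", "this", "i", "and", "to", "of", "in", "will"}
TAG_RE = re.compile(r"<[^>]+>")
URL_RE = re.compile(r"https?://\S+|www\.\S+")


def clean_text(text):
    """Strip HTML/URLs, lowercase, expand contractions, drop stop words (no lemmatization)."""
    text = html.unescape(TAG_RE.sub(" ", text))
    text = URL_RE.sub(" ", text).lower()
    for short, full in CONTRACTIONS.items():  # dict order: specific forms before "n't"
        text = text.replace(short, full)
    tokens = re.findall(r"[a-z]+", text)
    return " ".join(t for t in tokens if len(t) > 1 and t not in STOPWORDS)


def k_core(rows, k, user_key="user_id", item_key="parent_asin"):
    """Drop reviews until every remaining user and item has at least k reviews."""
    while True:
        users = Counter(r[user_key] for r in rows)
        items = Counter(r[item_key] for r in rows)
        kept = [r for r in rows if users[r[user_key]] >= k and items[r[item_key]] >= k]
        if len(kept) == len(rows):  # fixed point: nothing more to remove
            return kept
        rows = kept
```

The k-core filter has to loop: removing a sparse item can push one of its users below k, which in turn can push another item below k, so a single pass is not enough.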

Re-runs and caching

On every python main.py run, the pipeline resumes from the most-complete cache on disk: it checks the newest stage first (aspects → preprocessed → raw) and falls back to the next earlier stage when a file is missing. The train/test split, Word2Vec, and sequence padding always run fresh in memory, so changes to test_size, seed, val_ratio, aspect_length_percentile, or w2v_* take effect on the next run. To re-run an upstream stage, delete its parquet together with any later-stage parquet; for example, deleting only {fname}_preprocessed.parquet has no effect while {fname}_aspects.parquet still exists.
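The newest-first resume rule can be sketched with pathlib. resume_stage is a hypothetical helper, not a function from main.py; it only reports which artifact would be loaded, assuming the file names given in Data Preparation.

```python
from pathlib import Path


def resume_stage(fname, raw_ext, processed_dir="data/processed", raw_dir="data/raw"):
    """Return (stage, path) for the most-complete artifact on disk, newest stage first."""
    processed = Path(processed_dir)
    candidates = [
        ("aspects", processed / f"{fname}_aspects.parquet"),
        ("preprocessed", processed / f"{fname}_preprocessed.parquet"),
        ("raw", Path(raw_dir) / f"{fname}.{raw_ext}"),
    ]
    for stage, path in candidates:
        if path.exists():
            return stage, path  # later stages are skipped once a cache is found
    raise FileNotFoundError(f"no cache or raw file found for {fname!r}")
```

Because the aspects cache is checked first, it shadows the preprocessed cache, which is why re-running preprocessing requires removing both parquets.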

Experimental Results

ATRS was evaluated on three real-world review datasets: Musical Instruments and Video Games (Amazon) and Yelp (Pennsylvania). ATRS achieves the best score on every metric except MAPE on Yelp, where MFNR is lower (33.923 vs. 34.917), with average improvements over the baselines of 19.54% in MAE and 11.89% in RMSE.

Dataset	Musical Instruments				Video Games				Yelp (Pennsylvania)
Model	MAE	MSE	RMSE	MAPE (%)	MAE	MSE	RMSE	MAPE (%)	MAE	MSE	RMSE	MAPE (%)
PMF	1.306	2.640	1.625	35.034	1.220	2.407	1.551	33.948	1.276	2.803	1.674	38.330
NCF	1.174	1.705	1.306	35.401	0.948	1.331	1.154	35.032	1.085	1.674	1.294	39.320
DeepCoNN	0.786	1.137	1.067	29.931	0.847	1.263	1.124	32.850	0.937	1.381	1.175	38.276
NARRE	0.767	0.993	0.997	29.459	0.776	1.173	1.083	30.518	0.886	1.212	1.101	36.724
AENAR	0.665	0.970	0.985	27.193	0.693	1.002	1.001	28.039	0.845	1.177	1.085	35.605
SAFMR	0.705	0.975	0.987	28.388	0.711	1.033	1.016	30.016	0.881	1.229	1.109	36.076
MFNR	0.708	0.965	0.982	26.922	0.730	0.980	0.990	27.863	0.855	1.174	1.084	33.923
ATRS (Proposed)	0.640	0.933	0.966	26.638	0.646	0.970	0.985	27.537	0.832	1.163	1.078	34.917
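The four metrics in the table can be computed as below. src/utils.py is described as holding the metric helpers, but its definitions are not shown here; this sketch assumes the standard definitions, with MAPE expressed in percent, which matches the magnitude of the MAPE columns. regression_metrics is a hypothetical name.

```python
import math


def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and MAPE (in percent) for rating predictions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    # MAPE divides by the true rating; star ratings start at 1, so there is no zero division.
    mape = 100 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "MAPE": mape}
```

The RMSE column in the table is the square root of the MSE column; for ATRS on Musical Instruments, sqrt(0.933) ≈ 0.966.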

Citation

If you use this repository in your research, please cite:

@article{LIM2026123078,
  title = {Reducing contextual noise in review-based recommendation via aspect term extraction and attention modeling},
  author = {Heena Lim and Xinzhe Li and Seonu Park and Qinglong Li and Jaekyeong Kim},
  journal = {Information Sciences},
  volume = {735},
  pages = {123078},
  year = {2026},
  doi = {10.1016/j.ins.2026.123078}
}

Contact

For research inquiries or collaborations, please contact:

Seonu Park, Ph.D. Student, Department of Big Data Analytics, Kyung Hee University. Email: sunu0087@khu.ac.kr

Qinglong Li, Assistant Professor, Division of Computer Engineering, Hansung University. Email: leecy@hansung.ac.kr

Last updated: June 2026
