ATRS

Official implementation of:

Lim, H., Li, X., Park, S., Li, Q., & Kim, J. (2026). Reducing contextual noise in review-based recommendation via aspect term extraction and attention modeling. Information Sciences, 735, 123078.

Overview

This repository is the official implementation of ATRS (Aspect Term-aware Recommender System), published in Information Sciences (2026).

Most review-based recommendation models process entire review bodies indiscriminately, allowing aspect-relevant signal to be diluted by surrounding context. ATRS addresses this by routing review text through a dedicated Aspect Term Extraction (ATE) stage that filters out non-aspect content before downstream encoding.

The retained aspect terms are encoded with a 1D-CNN over Word2Vec embeddings, fused with user/item ID embeddings, and passed through a self-attention block to form aspect-aware user and item representations. These are concatenated and forwarded to an MLP that predicts a continuous rating score as a regression target. Quantitative comparisons against representative recommendation baselines on Amazon and Yelp datasets are reported in Experimental Results.

Repository Structure

├── data/
│   ├── raw/                        # Source datasets — place {fname}.{raw_ext} here
│   ├── processed/                  # Pipeline parquet caches (preprocessed / aspects)
│   └── ate_output/                 # PyABSA workspace + extraction JSON
│       └── .pyabsa/                # Contained pyabsa CWD: checkpoints/, checkpoints.json, result JSON
│
├── model/
│   ├── atrs.py                     # ATRS architecture, trainer, predictor
│   ├── ATRS Architecture.png       # Architecture diagram
│   └── save/                       # Best checkpoint per dataset (best.pth)
│
├── src/
│   ├── config.yaml                 # Single source of truth for all hyperparameters
│   ├── data_processing.py          # DataProcessor pipeline + Dataset/DataLoader factory
│   ├── aspect_extraction.py        # ATExtractor — PyABSA wrapper for aspect term extraction
│   ├── preprocessing.py            # Review-text cleaning and row filters
│   ├── path.py                     # Project path constants (auto-creates runtime folders)
│   └── utils.py                    # Generic helpers — I/O, metrics, seeding
│
├── main.py                         # Entry point: data preparation → train → test
├── requirements.txt
└── README.md

Model Description

ATRS consists of two sequential modules. Aspect extraction runs in src/aspect_extraction.py (orchestrated by src/data_processing.py); the recommender network is in model/atrs.py. The full architecture is illustrated below.

[Figure: ATRS architecture diagram (model/ATRS Architecture.png)]

1. Aspect Term Extraction Module

A pretrained Transformer encoder (PyABSA's English ATE checkpoint, FAST-LCF-ATEPC over DeBERTa-v3-base) reads each cleaned review and emits BIO-tagged aspect terms. Per-row aspect lists are then aggregated into per-user and per-item aspect sets, which become the inputs to the RS module.
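The two steps in this module, decoding BIO tags into aspect terms and grouping them per user and per item, can be illustrated with a minimal standalone sketch. It does not call PyABSA and is not the code in src/aspect_extraction.py. The tag names B-ASP / I-ASP / O, the helper names, and the choice to keep duplicate terms when flattening are all assumptions made for illustration.

```python
from collections import defaultdict


def bio_to_terms(tokens, tags):
    """Collect aspect terms from parallel token / BIO-tag sequences."""
    terms, current = [], []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B"):                 # a new aspect term begins
            if current:
                terms.append(" ".join(current))
            current = [token]
        elif tag.startswith("I") and current:   # continuation of the open term
            current.append(token)
        else:                                   # "O" (or a stray "I") closes any open term
            if current:
                terms.append(" ".join(current))
            current = []
    if current:                                 # flush a term that ends the sentence
        terms.append(" ".join(current))
    return terms


def aggregate_aspects(rows):
    """Flatten per-row aspect lists by user_id and by parent_asin."""
    user_aspects, item_aspects = defaultdict(list), defaultdict(list)
    for row in rows:
        user_aspects[row["user_id"]].extend(row["aspect"])
        item_aspects[row["parent_asin"]].extend(row["aspect"])
    return dict(user_aspects), dict(item_aspects)


# Hypothetical tagged review (tag names are assumed, not taken from the repo).
tokens = ["the", "battery", "life", "is", "great", "but", "the", "screen", "scratches"]
tags = ["O", "B-ASP", "I-ASP", "O", "O", "O", "O", "B-ASP", "O"]
print(bio_to_terms(tokens, tags))  # ['battery life', 'screen']

rows = [
    {"user_id": "u1", "parent_asin": "p1", "aspect": ["battery life", "screen"]},
    {"user_id": "u1", "parent_asin": "p2", "aspect": ["price"]},
    {"user_id": "u2", "parent_asin": "p1", "aspect": ["screen"]},
]
user_sets, item_sets = aggregate_aspects(rows)
print(user_sets["u1"])  # ['battery life', 'screen', 'price']
```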

2. Recommender System Module

Each user and item aspect set is tokenized over a Word2Vec-trained vocabulary, encoded by a 1D-CNN (AspectEncoder), and concatenated with a learned ID embedding. The fused vector is projected and passed through a multi-head self-attention + FFN block (SelfAttentionBlock) to yield aspect-aware user and item representations. Their concatenation is fed to an MLP regressor (ATRS.regressor) that outputs the predicted rating.
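The data flow described above can be traced in a compact PyTorch sketch. This is not the code in model/atrs.py: the class names (TinyAspectEncoder, TinyAttentionBlock, TinyTower, TinyATRS), every layer size, the max-pooling step, and the treatment of the fused vector as a length-1 sequence for self-attention are assumptions chosen to keep the example short. The embedding table is randomly initialised here rather than loaded from Word2Vec.

```python
import torch
import torch.nn as nn


class TinyAspectEncoder(nn.Module):
    """1D-CNN over embedded aspect tokens, max-pooled to one vector (pooling is assumed)."""

    def __init__(self, vocab_size, emb_dim=100, n_filters=64, kernel_size=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size, padding=1)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)      # (batch, emb_dim, seq_len)
        return torch.relu(self.conv(x)).max(dim=2).values  # (batch, n_filters)


class TinyAttentionBlock(nn.Module):
    """Multi-head self-attention followed by a feed-forward network, with residuals."""

    def __init__(self, dim, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 2 * dim), nn.ReLU(), nn.Linear(2 * dim, dim))
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):                              # (batch, seq, dim)
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)
        return self.norm2(x + self.ffn(x))


class TinyTower(nn.Module):
    """Aspect encoding + ID embedding -> projection -> attention block."""

    def __init__(self, n_ids, vocab_size, id_dim=32, dim=64):
        super().__init__()
        self.encoder = TinyAspectEncoder(vocab_size, n_filters=dim)
        self.id_embed = nn.Embedding(n_ids, id_dim)
        self.project = nn.Linear(dim + id_dim, dim)
        self.block = TinyAttentionBlock(dim)

    def forward(self, ids, token_ids):
        fused = torch.cat([self.encoder(token_ids), self.id_embed(ids)], dim=1)
        x = self.project(fused).unsqueeze(1)           # assumed: length-1 sequence
        return self.block(x).squeeze(1)                # (batch, dim)


class TinyATRS(nn.Module):
    """User tower + item tower, concatenated and fed to an MLP regressor."""

    def __init__(self, n_users, n_items, vocab_size, dim=64):
        super().__init__()
        self.user_tower = TinyTower(n_users, vocab_size, dim=dim)
        self.item_tower = TinyTower(n_items, vocab_size, dim=dim)
        self.regressor = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, user_ids, user_tokens, item_ids, item_tokens):
        u = self.user_tower(user_ids, user_tokens)
        i = self.item_tower(item_ids, item_tokens)
        return self.regressor(torch.cat([u, i], dim=1)).squeeze(1)  # (batch,)


torch.manual_seed(0)
model = TinyATRS(n_users=10, n_items=20, vocab_size=50)
user_ids, item_ids = torch.tensor([1, 2]), torch.tensor([3, 4])
user_tokens = torch.randint(1, 50, (2, 12))   # 12 padded aspect-token ids per user
item_tokens = torch.randint(1, 50, (2, 12))
pred = model(user_ids, user_tokens, item_ids, item_tokens)
print(pred.shape)  # torch.Size([2])
```

The output is one unbounded scalar per (user, item) pair, which matches the regression framing of the rating target.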

How to Run

Configuration

All hyperparameters live in src/config.yaml — it is the single source of truth. Defaults reproduce the paper experiments.

A CUDA-capable GPU is recommended; main.py falls back to CPU with a warning if CUDA is unavailable. See requirements.txt for the GPU wheel and CPU-only setup.

End-to-end run:

conda create -n atrs python=3.11
conda activate atrs
pip install -r requirements.txt
python main.py

Data Preparation

Place the dataset as data/raw/{fname}.{raw_ext} where {fname} and {raw_ext} match data.fname / data.raw_ext in config.yaml. The file is read as JSON-lines (one review object per line) — each line must carry the columns below, or the run aborts at load with a KeyError.

| Column | Role |
| --- | --- |
| user_id | Reviewer id; used for user-side aspect aggregation and the ID embedding. |
| parent_asin | Product id; used for item-side aspect aggregation and the ID embedding. |
| text | Review body; cleaned, then aspect terms are extracted from it (review_text is also accepted as an alias). |
| rating | Ground-truth rating; the regression target the model predicts. |
| verified_purchase | Boolean flag; only verified-purchase reviews are kept. |
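Because a missing column only surfaces as a KeyError at load time, it can be convenient to check a raw file before starting a run. The following is a hypothetical pre-flight check, not part of the repository; the helper names and the max_lines cap are assumptions. It uses only the column names listed above.

```python
import json

REQUIRED = ("user_id", "parent_asin", "rating", "verified_purchase")
TEXT_ALIASES = ("text", "review_text")   # either name is accepted for the review body


def missing_columns(record):
    """Return the required column names absent from one review object."""
    missing = [c for c in REQUIRED if c not in record]
    if not any(alias in record for alias in TEXT_ALIASES):
        missing.append("text")
    return missing


def check_jsonl(path, max_lines=1000):
    """Return (line_number, missing_columns) for the first bad line, or None."""
    with open(path, encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            if line_no > max_lines:
                break
            if not line.strip():          # tolerate blank lines
                continue
            problems = missing_columns(json.loads(line))
            if problems:
                return line_no, problems
    return None


good = {"user_id": "u1", "parent_asin": "p1", "review_text": "great strings",
        "rating": 5.0, "verified_purchase": True}
bad = {"user_id": "u1", "text": "great strings", "rating": 5.0}
print(missing_columns(good))  # []
print(missing_columns(bad))   # ['parent_asin', 'verified_purchase']
```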

Optional: an aspect column of pre-extracted per-row aspect lists — if present, the PyABSA extraction stage is skipped. Any other columns are ignored. The pipeline writes two cache layers under data/processed/:

  • {fname}_preprocessed.parquet — written after text cleaning and the k-core filter.
    • Columns: the required columns above + clean_text (HTML/URL-stripped, lowercased, contraction-expanded, stop-word-removed, lemmatized review body). Any extra raw columns pass through untouched.
  • {fname}_aspects.parquet — adds the extracted aspect terms and their per-user/item aggregation.
    • Columns: the preprocessed columns + aspect (per-row aspect-term list), user_aspect_set / item_aspect_set (each id's aspect terms flattened across all its reviews).
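The k-core filter applied before the preprocessed cache is written can be sketched as follows, assuming the usual recommender-system meaning: rows are dropped repeatedly until every remaining user and every remaining item has at least k reviews. The function name and the value of k are hypothetical; the actual threshold and implementation live in the repository's config and src/preprocessing.py.

```python
from collections import Counter


def k_core(rows, k=5):
    """Iteratively drop rows whose user or item has fewer than k reviews."""
    rows = list(rows)
    while True:
        user_counts = Counter(r["user_id"] for r in rows)
        item_counts = Counter(r["parent_asin"] for r in rows)
        kept = [
            r for r in rows
            if user_counts[r["user_id"]] >= k and item_counts[r["parent_asin"]] >= k
        ]
        if len(kept) == len(rows):   # fixed point: nothing more to drop
            return kept
        rows = kept


def make_rows(pairs):
    return [{"user_id": u, "parent_asin": p} for u, p in pairs]


# u3 has a single review, so its row is removed; the rest already form a 2-core.
stable = k_core(make_rows([("u1", "p1"), ("u1", "p2"), ("u2", "p1"),
                           ("u2", "p2"), ("u3", "p1")]), k=2)
print(len(stable))   # 4

# Removing u2 and u3 leaves p1 and p2 with one review each, so everything goes.
cascade = k_core(make_rows([("u1", "p1"), ("u1", "p2"), ("u2", "p1"), ("u3", "p2")]), k=2)
print(len(cascade))  # 0
```

The second example shows why the filter has to loop: a single pass can leave ids that fall below k only after their neighbours are removed.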

Re-runs and caching

On every python main.py, the pipeline resumes from the most-complete cache on disk, checking newest-first (aspects → preprocessed → raw) and falling through to the next-earliest stage. The train/test split, Word2Vec, and sequence padding always run fresh in memory, so changes to test_size, seed, val_ratio, aspect_length_percentile, or w2v_* take effect on the next run. To re-trigger an upstream stage, delete its parquet.
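The newest-first cache lookup can be pictured with a short sketch. It relies only on the two parquet file names given above; the function name pick_resume_stage and its return shape are hypothetical and do not mirror the DataProcessor API.

```python
import tempfile
from pathlib import Path


def pick_resume_stage(processed_dir, fname):
    """Return (stage, path) for the most complete cache found, newest first."""
    processed_dir = Path(processed_dir)
    candidates = [
        ("aspects", processed_dir / f"{fname}_aspects.parquet"),
        ("preprocessed", processed_dir / f"{fname}_preprocessed.parquet"),
    ]
    for stage, path in candidates:
        if path.exists():
            return stage, path
    return "raw", None   # no cache on disk: start from the raw dataset


# Demo with empty placeholder files in a temporary directory.
stages = []
with tempfile.TemporaryDirectory() as tmp:
    stages.append(pick_resume_stage(tmp, "demo")[0])
    (Path(tmp) / "demo_preprocessed.parquet").touch()
    stages.append(pick_resume_stage(tmp, "demo")[0])
    (Path(tmp) / "demo_aspects.parquet").touch()
    stages.append(pick_resume_stage(tmp, "demo")[0])
print(stages)  # ['raw', 'preprocessed', 'aspects']
```

Deleting the aspects parquet in this picture makes the lookup fall back to the preprocessed cache, which is the behaviour described for re-triggering an upstream stage.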

Experimental Results

ATRS was evaluated on three real-world review datasets: Musical Instruments, Video Games, and Yelp (Pennsylvania). ATRS achieves the best score in every column of the results table except MAPE on Yelp, where MFNR is lower (33.923 vs. 34.917), with average improvements of 19.54% in MAE and 11.89% in RMSE.

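| Model | MI MAE | MI MSE | MI RMSE | MI MAPE | VG MAE | VG MSE | VG RMSE | VG MAPE | Yelp MAE | Yelp MSE | Yelp RMSE | Yelp MAPE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PMF | 1.306 | 2.640 | 1.625 | 35.034 | 1.220 | 2.407 | 1.551 | 33.948 | 1.276 | 2.803 | 1.674 | 38.330 |
| NCF | 1.174 | 1.705 | 1.306 | 35.401 | 0.948 | 1.331 | 1.154 | 35.032 | 1.085 | 1.674 | 1.294 | 39.320 |
| DeepCoNN | 0.786 | 1.137 | 1.067 | 29.931 | 0.847 | 1.263 | 1.124 | 32.850 | 0.937 | 1.381 | 1.175 | 38.276 |
| NARRE | 0.767 | 0.993 | 0.997 | 29.459 | 0.776 | 1.173 | 1.083 | 30.518 | 0.886 | 1.212 | 1.101 | 36.724 |
| AENAR | 0.665 | 0.970 | 0.985 | 27.193 | 0.693 | 1.002 | 1.001 | 28.039 | 0.845 | 1.177 | 1.085 | 35.605 |
| SAFMR | 0.705 | 0.975 | 0.987 | 28.388 | 0.711 | 1.033 | 1.016 | 30.016 | 0.881 | 1.229 | 1.109 | 36.076 |
| MFNR | 0.708 | 0.965 | 0.982 | 26.922 | 0.730 | 0.980 | 0.990 | 27.863 | 0.855 | 1.174 | 1.084 | 33.923 |
| ATRS (Proposed) | 0.640 | 0.933 | 0.966 | 26.638 | 0.646 | 0.970 | 0.985 | 27.537 | 0.832 | 1.163 | 1.078 | 34.917 |

MI = Musical Instruments, VG = Video Games. Lower is better for every metric.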
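For reference, the four reported metrics follow the standard definitions sketched below. This is a plain-Python illustration, not the helper in src/utils.py; the function name is hypothetical, and MAPE is expressed in percent on the assumption that this matches the scale of the reported values.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE and MAPE (in percent) for paired ratings."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)                       # RMSE is the square root of MSE
    # Ratings are strictly positive, so dividing by the true value is safe.
    mape = 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}


# Made-up ratings, for illustration only.
scores = regression_metrics([5.0, 4.0, 2.0], [4.5, 4.0, 3.0])
print({name: round(value, 3) for name, value in scores.items()})
```

The RMSE and MSE columns in the results are consistent with this relationship, for example sqrt(0.933) ≈ 0.966 for ATRS on Musical Instruments.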

Citation

If you use this repository in your research, please cite:

@article{LIM2026123078,
  title = {Reducing contextual noise in review-based recommendation via aspect term extraction and attention modeling},
  author = {Heena Lim and Xinzhe Li and Seonu Park and Qinglong Li and Jaekyeong Kim},
  journal = {Information Sciences},
  volume = {735},
  pages = {123078},
  year = {2026},
  doi = {10.1016/j.ins.2026.123078}
}

Contact

For research inquiries or collaborations, please contact:

Seonu Park
Ph.D. Student, Department of Big Data Analytics
Kyung Hee University
Email: sunu0087@khu.ac.kr

Qinglong Li
Assistant Professor, Division of Computer Engineering
Hansung University
Email: leecy@hansung.ac.kr

Last updated: June 2026
