News Network Pipeline

Standalone pipeline for processing large-scale Chinese financial news files and exporting monthly stock-level network tables for downstream empirical asset-pricing and covariance-estimation studies.

What This Repository Does

This project is responsible for:

  • ingesting large yearly Excel or CSV news files
  • normalizing article-level schemas
  • cleaning timestamps and deduplicating news items
  • resolving stock mentions from article metadata
  • constructing monthly stock-pair co-mention networks
  • exporting compact analysis-ready parquet or CSV outputs
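As a rough illustration of the cleaning step, the sketch below parses timestamps and drops duplicate items using only the standard library. The field names (`published_at`, `title`, `source`), the accepted timestamp formats, and the choice of deduplication key are all assumptions for illustration; the repository has not yet fixed any of them.

```python
from datetime import datetime

# Assumed timestamp formats; the real set depends on the raw files.
TIMESTAMP_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y/%m/%d %H:%M", "%Y-%m-%d")


def parse_timestamp(raw):
    """Return a datetime, or None if the value matches no assumed format."""
    text = str(raw or "").strip()
    for fmt in TIMESTAMP_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


def clean_articles(articles):
    """Drop rows with unparseable timestamps, then deduplicate.

    Two articles count as duplicates here when they share the same
    trimmed title, trimmed source, and calendar date (an assumed rule).
    The first occurrence is kept.
    """
    seen = set()
    cleaned = []
    for art in articles:
        ts = parse_timestamp(art.get("published_at"))
        if ts is None:
            continue
        key = (
            (art.get("title") or "").strip(),
            (art.get("source") or "").strip(),
            ts.date(),
        )
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({**art, "published_at": ts})
    return cleaned
```

Keying on the calendar date rather than the full timestamp treats same-day reposts of one headline as a single item. Whether that is appropriate depends on how the raw feeds handle syndication.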

This project does not implement covariance estimation or portfolio backtesting. Those steps should consume the exported monthly network tables from this repository.
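To make the monthly network tables concrete, here is a minimal sketch of how stock-pair co-mention counts could be computed. It assumes each cleaned article already carries a month label and a list of resolved stock codes. The keys `month` and `stock_codes` are hypothetical, and counting one co-mention per article per pair is only one possible edge weight.

```python
from collections import Counter
from itertools import combinations


def monthly_pair_counts(articles):
    """Count, per month, how many articles mention each unordered stock pair.

    Each article is a dict with the assumed keys "month" (e.g. "2020-01")
    and "stock_codes" (an iterable of code strings).
    Returns a Counter keyed by (month, code_a, code_b) with code_a < code_b.
    """
    counts = Counter()
    for art in articles:
        # Deduplicate and sort so each pair has one canonical ordering
        # and a stock is never paired with itself.
        codes = sorted(set(art["stock_codes"]))
        for code_a, code_b in combinations(codes, 2):
            counts[(art["month"], code_a, code_b)] += 1
    return counts
```

Articles that mention a single stock contribute no pairs, so they would only affect stock-level statistics, not the pair table.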

Planned Outputs

  • monthly_news_pair_stats.parquet
  • monthly_news_stock_stats.parquet
  • processing_log.csv

Expected Input Schema

The first version assumes that each raw news file contains some subset of the following fields:

  • article id
  • publication timestamp
  • title
  • source
  • stock code list or company name list
  • optional article body or summary

The exact field mapping should be configured in configs/default.yaml.
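The sketch below shows one way a field mapping could be applied to a raw row. The dictionary stands in for whatever `configs/default.yaml` ends up containing. Its structure, the canonical names, and the raw column names (`NewsID`, `DeclareDate`, and so on) are placeholders, not the repository's actual schema.

```python
# Hypothetical mapping: canonical field name -> column name in the raw file.
# In the real pipeline this would be read from configs/default.yaml.
FIELD_MAP = {
    "article_id": "NewsID",
    "published_at": "DeclareDate",
    "title": "Title",
    "source": "NewsSource",
    "stock_codes": "Symbol",
}


def normalize_row(raw_row, field_map=FIELD_MAP, list_sep=","):
    """Rename raw columns to canonical names.

    Columns absent from the raw row become None, since input files are
    only expected to contain a subset of the fields. The stock code
    field is split on `list_sep` into a list of trimmed, non-empty codes.
    """
    row = {canon: raw_row.get(raw_col) for canon, raw_col in field_map.items()}
    codes = row.get("stock_codes")
    if codes:
        row["stock_codes"] = [
            c.strip() for c in str(codes).split(list_sep) if c.strip()
        ]
    else:
        row["stock_codes"] = []
    return row
```

Keeping the mapping in configuration rather than code means a new data vendor or a renamed column only requires a config change.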

Repository Layout

src/news_pipeline/
  ingest/
  clean/
  entity/
  network/
  export/
scripts/
configs/
tests/

Quick Start

  1. Store raw data files outside the Git-tracked repository, or in a local directory that Git ignores.
  2. Update configs/default.yaml with your raw file path and column names.
  3. Run the pipeline step by step:
python scripts/ingest_news.py --config configs/default.yaml
python scripts/build_monthly_network.py --config configs/default.yaml
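Both commands take a single `--config` flag. A minimal sketch of an entry point with that interface might look like the following. The function names and the placeholder body are assumptions, since the actual scripts are still scaffolds.

```python
import argparse


def build_parser():
    """Build a parser that accepts only --config, as in the commands above."""
    parser = argparse.ArgumentParser(description="News pipeline step (sketch).")
    parser.add_argument(
        "--config", required=True, help="Path to the YAML config file."
    )
    return parser


def main(argv=None):
    """Parse arguments and return the config path.

    A real step would load the config here and run its stage; this
    placeholder only reports which config it was given.
    """
    args = build_parser().parse_args(argv)
    print(f"Would run with config: {args.config}")
    return args.config
```

Calling `main(["--config", "configs/default.yaml"])` exercises the parser without touching the command line, which keeps the entry point easy to test.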

The initial scaffold provides placeholders and schema conventions. Full processing logic will be added after the raw news field structure is confirmed.
