evaluate-ocr-page

A script that takes the PAGE XML of an evaluation page and the PAGE XML of the same page as automatically transcribed, and calculates the Character Error Rate (CER) and Word Error Rate (WER). Designed for a DH class on the subject of evaluation.

Instructions for running the script

Dependencies

The script uses numpy to calculate Character Error Rate and Word Error Rate. Before running the script, install numpy using pip (for example, pip install numpy).
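For readers new to these metrics, here is a minimal sketch of how CER and WER can be computed with numpy. It is not taken from evaluate_xml.py; the function names edit_distance, cer and wer are illustrative assumptions. Both rates are an edit distance (insertions, deletions and substitutions) divided by the length of the reference, counted in characters for CER and in whitespace-separated words for WER.

```python
import numpy as np


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    # dist[i, j] holds the number of edits needed to turn ref[:i] into hyp[:j].
    dist = np.zeros((len(ref) + 1, len(hyp) + 1), dtype=int)
    dist[:, 0] = np.arange(len(ref) + 1)  # delete every item of ref[:i]
    dist[0, :] = np.arange(len(hyp) + 1)  # insert every item of hyp[:j]
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i, j] = min(
                dist[i - 1, j] + 1,         # deletion
                dist[i, j - 1] + 1,         # insertion
                dist[i - 1, j - 1] + cost,  # match or substitution
            )
    return int(dist[-1, -1])


def cer(reference, hypothesis):
    """Character Error Rate: character edits divided by reference length."""
    # max(..., 1) avoids division by zero for an empty reference line.
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


def wer(reference, hypothesis):
    """Word Error Rate: word edits divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)
```

With these definitions, cer("kitten", "sitting") is 0.5 (three edits over six reference characters). Note that both rates can exceed 1.0 when the transcription is much longer than the reference.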

Requirements

This script only compares at the page level. It compares one PAGE XML zip file at a time; attempting to run it with more than one will result in an error. It is written for eScriptorium PAGE XML exports.
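To make the input format concrete, the following sketch reads the transcribed lines out of a zip of PAGE XML files. It is an illustration only, not the script's own code: the function names are assumptions, and it assumes the usual PAGE layout in which each TextLine element has a direct TextEquiv/Unicode child holding the line text. The {*} namespace wildcard requires Python 3.8 or later and is used because the PAGE namespace differs between schema versions.

```python
import zipfile
import xml.etree.ElementTree as ET


def lines_from_page_xml(xml_bytes):
    """Return the text of every TextLine in one PAGE XML document."""
    root = ET.fromstring(xml_bytes)
    lines = []
    # "{*}" matches the element in any namespace (or none).
    for text_line in root.findall(".//{*}TextLine"):
        # Only the line-level TextEquiv is read; word-level ones are nested deeper.
        unicode_el = text_line.find("{*}TextEquiv/{*}Unicode")
        if unicode_el is not None:
            lines.append(unicode_el.text or "")
    return lines


def lines_from_zip(zip_path):
    """Map each .xml file name in the zip to its list of line texts."""
    pages = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in sorted(archive.namelist()):
            if name.lower().endswith(".xml"):
                pages[name] = lines_from_page_xml(archive.read(name))
    return pages
```

Any XML file in the archive that contains no TextLine elements simply yields an empty list.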

Steps to run

1. Add the evaluation PAGE XML zip to the evaluation_xml folder.
2. Add the test data PAGE XML zip (i.e. the same page, but run with a new OCR model that you would like to test) to the test_xml folder.
3. In the terminal, run python evaluate_xml.py or python3 evaluate_xml.py.
4. The script will print the mean CER and WER for each page.
5. The script will print the mean CER and WER for all pages.
6. The script will output a line-by-line comparison to a CSV file in the 'out' folder.
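As a rough picture of the last steps above, the sketch below writes a line-by-line comparison to a CSV file and returns mean scores. It is a hypothetical illustration: the function name, the column names and the assumption that the means are taken over lines are not confirmed by the script itself.

```python
import csv
from pathlib import Path
from statistics import mean


def write_comparison(rows, out_path):
    """Write (reference, hypothesis, cer, wer) rows to CSV; return mean CER and WER."""
    rows = list(rows)
    out_path = Path(out_path)
    # Create the output folder (for example 'out') if it does not exist yet.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        # Column names are illustrative, not the script's actual header.
        writer.writerow(["line", "reference", "hypothesis", "cer", "wer"])
        for number, (reference, hypothesis, line_cer, line_wer) in enumerate(rows, start=1):
            writer.writerow([number, reference, hypothesis, line_cer, line_wer])
    if not rows:
        return 0.0, 0.0
    return mean(row[2] for row in rows), mean(row[3] for row in rows)
```

Averaging per-line rates weights short and long lines equally; dividing the total number of edits by the total reference length is a common alternative that gives a different page-level figure.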
