GitHub - OpenCovenant/gramatiket

gramatiket - Grammatical Error Correction Dataset For Albanian

This repository contains a dataset intended for training AI models on the task of Grammatical Error Correction (GEC).

Getting Started

The file gramatiket.json contains a list of entries. Each entry pairs a source sentence, into which a grammatical error has been instrumented as dictated by the entry's tag, with a corrected version of that sentence. The list of created tags can be found in token_level_transformations.json. Whenever changes are made to the dataset, the pipeline runs a set of checks on it.
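The repository ships two check scripts, check_duplicates.py and check_increment_of_ids.py, which are not reproduced here. Judging only by their names, the following is a minimal Python sketch of what such checks might look like. It assumes that each entry in gramatiket.json is a JSON object with hypothetical "id", "source", "tag" and "target" fields; the real schema and the real scripts may differ.

```python
import json
from typing import Any


def load_entries(path: str = "gramatiket.json") -> list[dict[str, Any]]:
    """Load the dataset as a list of entry objects (assumed top-level JSON array)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def find_duplicates(entries: list[dict[str, Any]]) -> list[tuple[str, str, str]]:
    """Return every (source, tag, target) triple seen more than once.

    The "source", "tag" and "target" keys are hypothetical field names.
    """
    seen: set[tuple[str, str, str]] = set()
    duplicates: list[tuple[str, str, str]] = []
    for entry in entries:
        key = (entry["source"], entry["tag"], entry["target"])
        if key in seen:
            duplicates.append(key)
        else:
            seen.add(key)
    return duplicates


def ids_increment_by_one(entries: list[dict[str, Any]]) -> bool:
    """Check that the hypothetical integer "id" field grows by exactly 1 per entry.

    No starting value is assumed; only consecutive differences are checked.
    """
    ids = [entry["id"] for entry in entries]
    return all(later == earlier + 1 for earlier, later in zip(ids, ids[1:]))
```

Typical use would be `entries = load_entries()` followed by asserting that `find_duplicates(entries)` is empty and `ids_increment_by_one(entries)` is true. Checking consecutive differences rather than a fixed start value keeps the sketch agnostic about whether ids begin at 0 or 1.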

Constraints of the Dataset

Apart from the instrumented error, the sentences should not contain any other issues.
The tag REPLACE_ELEMENT means that the source token should be replaced with ELEMENT.
Do NOT add tags for errors that can be corrected without context.
The value in the "tag" column MUST be unique; otherwise we can’t know exactly which transformation we are referring to.
A tag applies exactly the operation it names, including letter case. For example, the tag REPLACE_tre cannot be applied to the sentence "Tri nga ato u kthyen.", because the replacement token at the start of that sentence must be capitalized. In this scenario a new tag, REPLACE_Tre, has to be created.
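To make the REPLACE_ELEMENT convention and its case sensitivity concrete, here is a minimal Python sketch. It assumes whitespace tokenization and an explicitly given token index for the position being replaced; neither is specified by the dataset, and the function names apply_replace and duplicate_tags are hypothetical. It also includes a case-sensitive uniqueness check over tag values, mirroring the constraint on the "tag" column.

```python
from collections import Counter

REPLACE_PREFIX = "REPLACE_"


def apply_replace(tokens: list[str], index: int, tag: str) -> list[str]:
    """Return a copy of tokens with tokens[index] replaced by ELEMENT from REPLACE_ELEMENT.

    ELEMENT is used verbatim, so REPLACE_tre always yields "tre", never "Tre";
    a capitalized replacement needs its own tag (e.g. REPLACE_Tre).
    """
    if not tag.startswith(REPLACE_PREFIX):
        raise ValueError(f"not a REPLACE tag: {tag}")
    element = tag[len(REPLACE_PREFIX):]
    # Build a new list so the caller's tokens are left untouched.
    return tokens[:index] + [element] + tokens[index + 1:]


def duplicate_tags(tags: list[str]) -> list[str]:
    """Return each tag value occurring more than once; "tre" and "Tre" count as different."""
    return [tag for tag, count in Counter(tags).items() if count > 1]
```

Mechanically, `apply_replace("Tri nga ato u kthyen.".split(), 0, "REPLACE_Tre")` returns `["Tre", "nga", "ato", "u", "kthyen."]`, while using REPLACE_tre at the same position would produce the lowercase "tre", which is why the capitalized variant needs a separate tag.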

Contributors

The entries in this dataset have been added in various ways. We'd like to express our gratitude to the following contributors, listed alphabetically.

AndiBraimllari (Andi Braimllari)
KostaTB (Kostian Qirjazi)

Repository Files

CITATION.bib
LICENSE
README.md
check_duplicates.py
check_increment_of_ids.py
gramatiket.json
token_level_transformations.json
