Skip to content

Add Normalization Form and Language to ParatextSettingsParser#313

Open
TaperChipmunk32 wants to merge 2 commits into
mainfrom
normalization-lang-settings
Open

Add Normalization Form and Language to ParatextSettingsParser#313
TaperChipmunk32 wants to merge 2 commits into
mainfrom
normalization-lang-settings

Conversation

@TaperChipmunk32

@TaperChipmunk32 TaperChipmunk32 commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

This PR add normalization_form and language to ParatextProjectSettings, from the NormalizationForm and Language fields in a project's Settings.xml file. These are both wanted for some improved Onboarding information collection.


This change is Reviewable

@TaperChipmunk32 TaperChipmunk32 requested a review from Enkidu93 June 12, 2026 19:42

@Enkidu93 Enkidu93 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Enkidu93 reviewed 2 files and all commit messages, and made 2 comments.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on TaperChipmunk32).


machine/corpora/paratext_project_settings.py line 25 at r1 (raw file):

    language_code: Optional[str]
    translation_type: str
    normalization_form: str

It might be good to test these in the settings parser tests somehow - especially with how many properties there now are.


machine/corpora/paratext_project_settings_parser_base.py line 102 at r1 (raw file):

                parent_guid = translation_info_setting_parts[2] if translation_info_setting_parts[2] != "" else None

        normalization_form: str = settings_tree.getroot().findtext("NormalizationForm", "Off")

Are you confident this should be the default? Looking here https://paratext.org/2019/04/02/announcement-of-feature-to-control-unicode-normalization/, it looks like 'Off' is what it is for older projects migrated over to Paratext 8, but that NFC is the new default. I wonder what the true default behavior is if this setting doesn't exist 🤔. Alternatively, if it is truly missing, maybe None is better? What do you think?

@TaperChipmunk32

Copy link
Copy Markdown
Collaborator Author

machine/corpora/paratext_project_settings_parser_base.py line 102 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Are you confident this should be the default? Looking here https://paratext.org/2019/04/02/announcement-of-feature-to-control-unicode-normalization/, it looks like 'Off' is what it is for older projects migrated over to Paratext 8, but that NFC is the new default. I wonder what the true default behavior is if this setting doesn't exist 🤔. Alternatively, if it is truly missing, maybe None is better? What do you think?

When Normalization is Off, Paratext does not include the NormalizationForm in the Settings file, as far as I can tell, which is why the default I chose is Off. I think Off is never actually in the Settings file, so when a new Paratext project is made with the default, NFC, it is still included in the Settings file.

@Enkidu93 Enkidu93 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Enkidu93 made 1 comment.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on ddaspit and TaperChipmunk32).


machine/corpora/paratext_project_settings_parser_base.py line 102 at r1 (raw file):

Previously, TaperChipmunk32 (Matthew Beech) wrote…

When Normalization is Off, Paratext does not include the NormalizationForm in the Settings file, as far as I can tell, which is why the default I chose is Off. I think Off is never actually in the Settings file, so when a new Paratext project is made with the default, NFC, it is still included in the Settings file.

OK, hmm, so if Off is never a NormalizationForm from the Settings.xml perspective, maybe it shouldn't be the default here - i.e., maybe the thing to do here is to just set it to None (like we do for some of the other settings) and then on your end in the script you're making, interpret that None as 'Off' 🤷‍♂️. @ddaspit, any thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants