Add Normalization Form and Language to ParatextSettingsParser#313
Add Normalization Form and Language to ParatextSettingsParser#313TaperChipmunk32 wants to merge 2 commits into
Conversation
Enkidu93
left a comment
There was a problem hiding this comment.
@Enkidu93 reviewed 2 files and all commit messages, and made 2 comments.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on TaperChipmunk32).
machine/corpora/paratext_project_settings.py line 25 at r1 (raw file):
language_code: Optional[str] translation_type: str normalization_form: str
It might be good to test these in the settings parser tests somehow - especially with how many properties there now are.
machine/corpora/paratext_project_settings_parser_base.py line 102 at r1 (raw file):
parent_guid = translation_info_setting_parts[2] if translation_info_setting_parts[2] != "" else None normalization_form: str = settings_tree.getroot().findtext("NormalizationForm", "Off")
Are you confident this should be the default? Looking here https://paratext.org/2019/04/02/announcement-of-feature-to-control-unicode-normalization/, it looks like 'Off' is what it is for older projects migrated over to Paratext 8, but that NFC is the new default. I wonder what the true default behavior is if this setting doesn't exist 🤔. Alternatively, if it is truly missing, maybe None is better? What do you think?
|
Previously, Enkidu93 (Eli C. Lowry) wrote…
When Normalization is |
Enkidu93
left a comment
There was a problem hiding this comment.
@Enkidu93 made 1 comment.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on ddaspit and TaperChipmunk32).
machine/corpora/paratext_project_settings_parser_base.py line 102 at r1 (raw file):
Previously, TaperChipmunk32 (Matthew Beech) wrote…
When Normalization is
Off, Paratext does not include theNormalizationFormin the Settings file, as far as I can tell, which is why the default I chose isOff. I think Off is never actually in the Settings file, so when a new Paratext project is made with the default,NFC, it is still included in the Settings file.
OK, hmm, so if Off is never a NormalizationForm from the Settings.xml perspective, maybe it shouldn't be the default here - i.e., maybe the thing to do here is to just set it to None (like we do for some of the other settings) and then on your end in the script you're making, interpret that None as 'Off' 🤷♂️. @ddaspit, any thoughts?
This PR add
normalization_formandlanguagetoParatextProjectSettings, from theNormalizationFormandLanguagefields in a project'sSettings.xmlfile. These are both wanted for some improved Onboarding information collection.This change is