From 07a23bda90e56fa086c4bd5bd8207314ccaf30ae Mon Sep 17 00:00:00 2001
From: Fini Jastrow
Date: Fri, 26 May 2023 08:33:06 +0200
Subject: [PATCH] name-parser: Allow dashes between modifier and weight

[why]
Some fonts might have a non-standard (i.e. broken) weight naming scheme:
They put a blank or a dash between the modifier and the weight, for
example "Extra Bold" or "Demi-Condensed", when they mean "ExtraBold"
and "DemiCondensed" respectively.

The former happens with CartographCF, the latter with IBM3270.

[how]
Automatically allow a dash between modifier and weight. That position
shows up as a CamelCase boundary in the token, so insert an optional
dash (r'-?') at every such boundary.

For the further lookup we need to remove the dash from the found
keyword, if there is any, to get back to the standard naming.

This might break if the font name ends in a modifier, because we
cannot really distinguish
  Font Name Extra Bold Italic
    => Font Name - ExtraBold Italic
    => Font Name Extra - Bold Italic

The known modifiers are 'Demi', 'Ultra', 'Semi', 'Extra'. It is possible
but unlikely that a font name ends in one of these. For example
"Modern Ultra - Bold".

[note]
The question arises whether we should parse the PSname instead of the
Fullname, and stick to the dash there as the boundary. The problem
might be prepatched fonts with broken naming, which would then be
parsed completely wrong. So maybe the current approach is still the
best, with the caveat given above (fontnames ending in a modifier).

[note 2]
Funnily enough the variable allow_regex_token was not used at all :->
Some leftover? Anyhow, we use it now.

[note 3]
We still cannot remove the special handling for IBM3270, because the
font name initially looks like a PSname and is parsed as such, which
breaks the name in the wrong place:
  PSname template  = "Name-StylesWeights"
  Fullname of 3270 = "IBM 3270 Semi-Condensed"

Signed-off-by: Fini Jastrow
---
 bin/scripts/name_parser/FontnameTools.py | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/bin/scripts/name_parser/FontnameTools.py b/bin/scripts/name_parser/FontnameTools.py
index 0878d57a7..d5ce1bd40 100644
--- a/bin/scripts/name_parser/FontnameTools.py
+++ b/bin/scripts/name_parser/FontnameTools.py
@@ -64,7 +64,6 @@ class FontnameTools:
     known_names = {
         # Source of the table is the current sourcefonts
         # Left side needs to be lower case
-        '-': '',
         'book': '',
         'text': '',
         'ce': 'CE',
@@ -150,7 +149,12 @@ class FontnameTools:
         not_matched = ""
         all_tokens = []
         j = 1
-        regex = re.compile('(.*?)(' + '|'.join(tokens) + ')(.*)', re.IGNORECASE)
+        token_regex = '|'.join(tokens)
+        if not allow_regex_token:
+            # Allow a dash between CamelCase token word parts, i.e. Camel-Case
+            # This allows for styles like Extra-Bold
+            token_regex = re.sub(r'(?<=[a-z])(?=[A-Z])', '-?', token_regex)
+        regex = re.compile('(.*?)(' + token_regex + ')(.*)', re.IGNORECASE)
         while j:
             j = regex.match(name)
             if not j:
@@ -159,6 +163,9 @@ class FontnameTools:
                 sys.exit('Malformed regex in FontnameTools.get_name_token()')
             not_matched += ' ' + j.groups()[0] # Blanc prevents unwanted concatenation of unmatched substrings
             tok = j.groups()[1].lower()
+            if not allow_regex_token:
+                # Remove dashes between CamelCase token words
+                tok = tok.replace('-', '')
             if tok in lower_tokens:
                 tok = tokens[lower_tokens.index(tok)]
             tok = FontnameTools.unify_style_names(tok)
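
For reviewers who want to try the idea outside the patcher, the two changed
spots can be sketched in isolation as below. This is only an illustration
under assumptions: TOKENS, build_token_regex() and find_token() are
hypothetical names that do not exist in FontnameTools.py, and the sketch
covers just the default mode (allow_regex_token = False) for a single match
rather than the full loop in get_name_token().

```python
import re

# Hypothetical stand-in for the style tokens handed to get_name_token().
TOKENS = ["ExtraBold", "DemiCondensed", "SemiBold", "Italic"]


def build_token_regex(tokens):
    """Join tokens into an alternation that tolerates a dash at CamelCase boundaries."""
    token_regex = "|".join(tokens)
    # Zero-width match between a lower-case and an upper-case letter;
    # an optional dash is inserted at every such position.
    return re.sub(r"(?<=[a-z])(?=[A-Z])", "-?", token_regex)


def find_token(name, tokens):
    """Return (unmatched_prefix, canonical_token, rest), or None if nothing matches."""
    regex = re.compile("(.*?)(" + build_token_regex(tokens) + ")(.*)", re.IGNORECASE)
    match = regex.match(name)
    if not match:
        return None
    prefix, found, rest = match.groups()
    # Drop the optional dash so the lookup sees the standard spelling again.
    key = found.lower().replace("-", "")
    lower_tokens = [t.lower() for t in tokens]
    return prefix, tokens[lower_tokens.index(key)], rest
```

With these assumed helpers, find_token("Extra-Bold Italic", TOKENS) yields the
canonical token "ExtraBold". The caveat from [how] shows up as well:
find_token("Modern-Ultra-Bold", ["UltraBold", "Bold"]) reports "UltraBold",
so a family name ending in "Ultra" would lose that word to the weight.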