name-parser: Allow dashes between modifier and weight

[why]
Some fonts might have a non-standard (i.e. broken) weight naming scheme:
They put a blank or a dash between the modifier and the weight, for
example "Extra Bold" or "Demi-Condensed", when they mean "ExtraBold"
resp "DemiCondensed".

The former happens with CartographCF, the later with IBM3270.

[how]
Automatically allow a dash between modifier and weight, which comes up
as CamelCase boundary. Insert an optional dash (r'-?') into such
boundaries.
For the further lookup we need to remove the dash in the found keyword,
if there is any, to get back to standard naming.

This might break if the font name ends in a modifier. So we can not
really distinguish

       Font Name Extra Bold Italic
    => Font Name - ExtraBold Italic
    => Font Name Extra - Bold Italic

The known modifiers are 'Demi', 'Ultra', 'Semi', 'Extra'.

It is possible but unlikely that a font name ends in one of these.
For example "Modern Ultra - Bold".

[note]
The question arises if we should not parse the PSname instead of the
Fullname; and stick to the dash there as boundary.
The problem might be prepatched fonts with broken naming, that would be
parsed completely wrong then. So maybe the current approach is still the
best, with the caveat given above (fontnames ending in a modifier).

[note 2]
Funny enough the variable allow_regex_token was not used at all :->
Some leftover? Anyhow we use it now.

[note 3]
We can still not remove the special handling for IBM3270, because the
font initially looks like a PSname and this is parsed as such, which
breaks the name in the incorrect place:

        PSname template  = "Name-StylesWeights"
        Fullname of 3270 = "IBM 3270 Semi-Condensed"

Signed-off-by: Fini Jastrow <ulf.fini.jastrow@desy.de>
This commit is contained in:
Fini Jastrow 2023-05-26 08:33:06 +02:00 committed by Fini
parent f1c2eea937
commit 07a23bda90

View file

@ -64,7 +64,6 @@ class FontnameTools:
known_names = {
# Source of the table is the current sourcefonts
# Left side needs to be lower case
'-': '',
'book': '',
'text': '',
'ce': 'CE',
@ -150,7 +149,12 @@ class FontnameTools:
not_matched = ""
all_tokens = []
j = 1
regex = re.compile('(.*?)(' + '|'.join(tokens) + ')(.*)', re.IGNORECASE)
token_regex = '|'.join(tokens)
if not allow_regex_token:
# Allow a dash between CamelCase token word parts, i.e. Camel-Case
# This allows for styles like Extra-Bold
token_regex = re.sub(r'(?<=[a-z])(?=[A-Z])', '-?', token_regex)
regex = re.compile('(.*?)(' + token_regex + ')(.*)', re.IGNORECASE)
while j:
j = regex.match(name)
if not j:
@ -159,6 +163,9 @@ class FontnameTools:
sys.exit('Malformed regex in FontnameTools.get_name_token()')
not_matched += ' ' + j.groups()[0] # Blanc prevents unwanted concatenation of unmatched substrings
tok = j.groups()[1].lower()
if not allow_regex_token:
# Remove dashes between CamelCase token words
tok = tok.replace('-', '')
if tok in lower_tokens:
tok = tokens[lower_tokens.index(tok)]
tok = FontnameTools.unify_style_names(tok)