Bug 26165 - The unconfigured fonts-otf-source-han font causes fontconfig to select Japanese rather than the Unicode base characters.
Status: NEW
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: Cauldron
Hardware: All Linux
Priority: Normal; Severity: normal
Target Milestone: ---
Assignee: All Packagers
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-02-03 11:07 CET by Edward d'Auvergne
Modified: 2020-06-18 11:43 CEST
CC List: 4 users

See Also:
Source RPM: fonts-otf-source-han-2.000-1.mga7.noarch.rpm
CVE:
Status comment:


Attachments
Screenshot of https://en.wiktionary.org/wiki/%E9%97%A8 (236.84 KB, image/png)
2020-02-03 21:43 CET, Martin Whitaker
Screenshots of Chromium browser with and without fonts-otf-source-han installed. (206.37 KB, image/png)
2020-02-04 09:56 CET, Edward d'Auvergne
Screenshots of the 门 wiktionary page on Chromium browser with and without the font installed. (471.33 KB, image/jpeg)
2020-02-04 10:23 CET, Edward d'Auvergne
Screenshots of Firefox browser with and without fonts-otf-source-han installed. (491.03 KB, image/jpeg)
2020-02-04 10:54 CET, Edward d'Auvergne
A basic fontconfig configuration file for the Source Han Sans font. (2.12 KB, text/plain)
2020-02-07 11:43 CET, Edward d'Auvergne
Python script for testing the fontconfig matching. (3.64 KB, text/plain)
2020-02-10 15:26 CET, Edward d'Auvergne
A new fontconfig configuration file for the Source Han Sans font. (2.40 KB, text/plain)
2020-02-10 15:30 CET, Edward d'Auvergne
Patch for removing hinting from the Source Han Sans font. (640 bytes, patch)
2020-02-10 15:34 CET, Edward d'Auvergne

Description Edward d'Auvergne 2020-02-03 11:07:17 CET
Description of problem:

The Chinese characters in the font 'fonts-otf-source-han' are rubbish.  This is a problem because the font, when installed, becomes the default system font.  For example U+95E8 (https://en.wiktionary.org/wiki/%E9%97%A8), the Chinese character for gate, is actually shown as the Japanese character for gate.  That is, it matches this bug report:

https://askubuntu.com/questions/901486/%E9%97%A8-looks-weird-on-my-system-default-font/901488

This can be seen by installing the font and typing:

$ hb-view /usr/share/fonts/OTF/source-han/SourceHanSans.ttc  `echo -ne "\u95e8"`
                  ▍                 
                  ▍                 
     ▗ ▄▄▄▄▄▄▄▄▄▄  ▄▄▄▄▄▄▄▄▄▄ ▖     
     ▋            ▍           ▍     
     ▋            ▍           ▍     
     ▋            ▍           ▍     
     ▋           ▇            ▍     
     ▋                        ▍     
     ▋                        ▍     
     ▋                        ▍     
     ▋                        ▍     
     ▋                        ▍     
     ▋                        ▍     
     ▋                        ▍     
     ▙▟                  ▄▄▄▅▅      
$


Version-Release number of selected component (if applicable):

2.000-1.mga7


How reproducible:

100%

Steps to Reproduce:
1.  Install font.
2.  Log out and then back in again.
3.  Look at https://en.wiktionary.org/wiki/%E9%97%A8 and check the URL in the browser and the text on the page.
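
For anyone checking the URL in step 3, here is a minimal Python sketch (not part of the original report) that shows how U+95E8 maps onto the percent-encoded path. It uses only the standard library. Note that the code point alone does not decide which glyph is drawn; that depends on the font that fontconfig selects.

```python
import unicodedata
from urllib.parse import quote

# The simplified Chinese character for gate, U+95E8.
char = "\u95e8"

# Code point in the usual U+XXXX notation.
codepoint = f"U+{ord(char):04X}"

# UTF-8 bytes, percent-encoded as they appear in the Wiktionary URL.
encoded = quote(char)
url = "https://en.wiktionary.org/wiki/" + encoded

# CJK unified ideographs are named after their code point.
name = unicodedata.name(char)

print(codepoint, encoded, name)
print(url)
```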
Comment 1 Edward d'Auvergne 2020-02-03 15:45:29 CET
To make it more clear, simply installing the 'fonts-otf-source-han' package makes it impossible to input or read Chinese on a Mageia 7 installation (at least when using a non-Chinese locale).

Looking at the issue reports for this font (https://github.com/adobe-fonts/source-han-sans/issues), it is abundantly clear that the font was not designed or created by a native Chinese, Korean, or Japanese speaker.  I would suggest that this font is too embarrassing to ship with Mageia.  The Google Noto fonts already shipped with Mageia are orders of magnitude better and, importantly, are correct.
Comment 2 Lewis Smith 2020-02-03 20:05:11 CET
Thank you for this report, damning though it is. You speak with some authority, and one cannot but recognise the sense of your request. Since the package only provides:
 $ urpmq -l fonts-otf-source-han
 /etc/X11/fontpath.d/otf-source-han:pri=50
 /usr/share/fonts/OTF/source-han
 /usr/share/fonts/OTF/source-han/SourceHanSans.ttc
it suggests removing it completely. And if the SRPM provides no other package, removing that too.

This SRPM has no registered maintainer, so assigning the bug globally.
CC'ing You-Cheng as having committed it in the not-too-distant past.

CC: (none) => lewyssmith, yochenhsieh
Assignee: bugsquad => pkg-bugs

Comment 3 Martin Whitaker 2020-02-03 21:40:13 CET
Without being an expert on this subject, I question the assertions made by Edward.

1. The Google Noto CJK fonts are rebranded versions of the Source Han fonts. See https://github.com/adobe-fonts/source-han-mono/issues/1 and the "Noto Sans CJK Differences" section of https://github.com/adobe-fonts/source-han-sans/raw/release/SourceHanSansReadMe.pdf.

2. The aforementioned ReadMe file states that Source Han Sans was designed by Ryoko Nishizuka (西塚涼子).

3. /usr/share/fonts/OTF/source-han/SourceHanSans.ttc is a Super OpenType Collection. Again from the ReadMe:

"This deployment configuration packs all seven weights and all five languages, along with half-width variations of two of the seven weights, into a single font resource that includes a total of 45 font instances and 458,745 total glyphs."

In LibreOffice, selecting one of the Chinese variants yields the correct Chinese glyph for U+95e8.

CC: (none) => mageia

Comment 4 Martin Whitaker 2020-02-03 21:43:32 CET
Created attachment 11495
Screenshot of https://en.wiktionary.org/wiki/%E9%97%A8

And here is a screenshot of the wiki page Edward referenced. It is using the Chinese glyph in the body of the page. The only issue is that where the language is not specified, e.g. in the URL field, it defaults to Japanese.
Comment 5 Martin Whitaker 2020-02-03 22:02:03 CET
Also note that the Japanese glyph is used where appropriate on that page. Without the fonts-otf-source-han package installed (but with the fonts-ttf-unicode package installed), the Chinese glyph is used throughout.
Comment 6 Edward d'Auvergne 2020-02-04 09:56:56 CET
Created attachment 11496
Screenshots of Chromium browser with and without fonts-otf-source-han installed.

This screenshot comparison shows the differences I see with and without fonts-otf-source-han-2.000-1.mga7.noarch installed.  It shows three issues:

1. UI elements of Chromium browser:  These use the Japanese Ryakuji glyph for gate rather than the Chinese glyph for gate with the font installed.

2. Web content:  Again the Japanese glyph is used.

3. Ibus:  For simplified Chinese input, the displayed characters are Japanese, not Chinese.

For this test, I installed the font, logged out, killed all user processes, logged back in, took the screenshot, logged out, killed all user processes, uninstalled the font, logged back in, and took the final screenshot.
Comment 7 Edward d'Auvergne 2020-02-04 09:59:08 CET
(In reply to Edward d'Auvergne from comment #6)

Note I did not use firefox for this test, as it has a complex language configuration and is known to have its own internal Chinese vs. Japanese issues.
Comment 8 Edward d'Auvergne 2020-02-04 10:23:01 CET
Created attachment 11497
Screenshots of the 门 wiktionary page on Chromium browser with and without the font installed.

This comparison matches the previous one, but shows the 门 wiktionary page on Chromium browser instead.
Comment 9 Edward d'Auvergne 2020-02-04 10:54:59 CET
Created attachment 11498
Screenshots of Firefox browser with and without fonts-otf-source-han installed.

This matches the comparison in comment #6, except that Firefox is used instead of Chromium and the test was run as a new user.

The differences are more subtle and demonstrate a second bug in Firefox itself.  In this comparison, only the Plasma decorations show the Japanese rather than the Chinese character with this font installed.  The second bug in Firefox should be treated separately.
Comment 10 Edward d'Auvergne 2020-02-04 11:41:32 CET
More on the Firefox side-bug.  This has to do with the hardcoded "font fallback" mechanism within the Firefox engine.  It seems as if the hardcoded fallback order is Japanese->Korean->Chinese (e.g. see https://stackoverflow.com/questions/29241764/how-do-web-browsers-implement-font-fallback).  So, to 'trick' Firefox, I:

1. Click on the settings (the three lines in the top right hand corner).
2. Go to 'Preferences'.
3. In the general tab, click on the "Advanced..." button in the "Language and Appearance" section.
4. In the "Fonts for" section, select Japanese.
5. Change the "Serif", "Sans-serif", and "Monospace" to, for example, "Noto Serif CJK SC", "Noto Sans CJK SC", and "Noto Sans Mono CJK SC" respectively.  The "SC" in these fonts means "Simplified Chinese".
6. Click "OK".

This tricks Firefox into using the correct font for Chinese pages.  I assume it would not be good for Japanese webpages if the language is not explicitly set.
Edward d'Auvergne 2020-02-04 12:28:52 CET

CC: (none) => true.bugman

Comment 11 Lewis Smith 2020-02-04 21:16:13 CET
Impressive expertise.
Thank you Martin for your research; and Edward for the exemplary annotated graphic attachments. The latter (comments 6, 8, 9) show without doubt the benefit of *not* having the pkg in question; but conversely:
(In reply to Martin Whitaker from comment #5)
> Also note that the Japanese glyph is used where appropriate on that page.
> Without the fonts-otf-source-han package installed (but with the
> fonts-ttf-unicode package installed), the Chinese glyph is used throughout.
which indicates the benefit of having the thing installed. It would help to have 
You-Cheng appear on this.

@Martin : what do you say to:
> The Google Noto fonts already shipped with Mageia are orders of magnitude
> better and, importantly, are correct.
How does your experiment work with 'google-noto-cjk-fonts' instead of the contentious ones ?
Can we say that noto-cjk offers all the functionality of 'fonts-otf-source-han' (& better) ?
What would be lost by withdrawing fonts-otf-source-han ?
Comment 12 Edward d'Auvergne 2020-02-05 09:43:50 CET
For reference, I see the correct Ryakuji character.  According to the Firefox "Inspect element (Q)" right click menu, in the 'Fonts' tab on the right I see that it is using "Noto Sans CJK JP Regular".  This is in a <span> with the Jpan class, and the CSS includes:

.Hira, .Jpan, .Kana {
    font-family: 'Hiragino Kaku Gothic Pro',Osaka,'Yu Gothic',Meiryo,'Source Han Sans J','Source Han Sans JP','Noto Sans CJK JP','Droid Sans Japanese','MS PGothic','MS Gothic','MS PMincho','MS Mincho',HanaMinA,HanaMinB,sans-serif;
    font-size: 120%;
    line-height: 1;
}

In Chromium browser, the "Inspect" menu after double clicking on the character in the HTML says it is using "Noto Sans CJK JP Regular" as well.
Comment 13 Edward d'Auvergne 2020-02-05 11:08:28 CET
The following is my understanding of the situation.  I present it mainly for reference.  Note that I might be missing information or might have some bias.


Scripts and variants
====================

The CJK (Chinese, Japanese, Korean) system has many scripts and variants.  Relevant to this bug are:

* Traditional Chinese:  The script from which all the following derive (https://en.wikipedia.org/wiki/Traditional_Chinese_characters).
* Simplified Chinese:  A simplification of the traditional Chinese characters (https://en.wikipedia.org/wiki/Simplified_Chinese_characters).
* Kanji:  The complex Japanese characters derived from traditional Chinese (https://en.wikipedia.org/wiki/Kanji).
* Ryakuji:  Colloquial, non-official simplifications of Kanji (https://en.wikipedia.org/wiki/Ryakuji).

For a full list of the large number of Chinese script derivatives, see the Wikipedia Chinese characters sidebar (https://en.wikipedia.org/wiki/Template:Chinese_characters_sidebar).


Character for door/gate
=======================

See the linked images for what they should look like:

* Traditional Chinese: 門 (https://commons.wikimedia.org/wiki/File:%E9%96%80-order.gif)
* Simplified Chinese: 门 (https://commons.wikimedia.org/wiki/File:%E9%97%A8-order.gif)
* Japanese Kanji: 門 - this is the same as traditional Chinese.
* Japanese Ryakuji: 门󠄁 (https://commons.wikimedia.org/wiki/File:Japanese_abbreviation_kanji_mon.png)

Only the traditional Chinese and Kanji should appear as the same character.


Unicode
=======

* Traditional Chinese: U+9580, &#38272, "\u9580", https://unicode.org/cldr/utility/character.jsp?a=9580, https://www.fileformat.info/info/unicode/char/9580/index.htm
* Simplified Chinese: U+95E8, &#38376, "\u95e8", https://unicode.org/cldr/utility/character.jsp?a=95e8, https://www.fileformat.info/info/unicode/char/95e8/index.htm
* Japanese Kanji: U+9580 U+E0100 (it appears to be the same as traditional Chinese).
* Japanese Ryakuji: U+95E8 U+E0101 (this also appears to be the same as U+95E8 U+E0100).

The Kanji and Ryakuji Japanese characters are Unicode variants.  They belong to the Ideographic Variation Database (https://en.wikipedia.org/wiki/Variant_form_%28Unicode%29), which uses the base Unicode character (e.g. U+95E8) followed by a character defining which variant to use (e.g. U+E0101).

Here is a ~29 MB PDF of the Hanyo-Denshi collection for the Unicode Ideographic Variation Database (https://unicode.org/ivd/data/2017-12-12/IVD_Charts_Hanyo-Denshi.pdf).  On page 188 under 95E8, you can see that there are two variants defined, E0101 and E0102.  Also see the other references at https://unicode.org/ivd/data/2017-12-12/.  Specifically the text file listing all variants (https://unicode.org/ivd/data/2017-12-12/IVD_Sequences.txt) which lists the U+95E8 U+E0100 variant.
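
To make the "base character followed by a variant selector" idea concrete, here is a minimal Python sketch. It only builds the sequences discussed above and prints their code points; it does not render anything. The helper names (describe, is_ivs_selector) are made up for illustration.

```python
# Base code points quoted in this comment.
TRADITIONAL = 0x9580   # traditional Chinese / Kanji gate
SIMPLIFIED = 0x95E8    # simplified Chinese gate

# Variation selectors from the Variation Selectors Supplement
# block (U+E0100..U+E01EF), which the IVD sequences use.
VS17 = 0xE0100
VS18 = 0xE0101


def is_ivs_selector(cp):
    """Return True if cp lies in the Variation Selectors Supplement block."""
    return 0xE0100 <= cp <= 0xE01EF


def describe(text):
    """List the code points of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


base = chr(SIMPLIFIED)
ryakuji_sequence = chr(SIMPLIFIED) + chr(VS18)

print(describe(base))               # a single code point
print(describe(ryakuji_sequence))   # base character plus selector
print(SIMPLIFIED, TRADITIONAL)      # decimal values used in the HTML entities
```

A string holding the variant is therefore two code points long, while the plain simplified Chinese character is one. Software that ignores the selector falls back to whatever glyph the font stores at the base position.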


Fonts
=====

In the Japanese fonts shipped with Mageia, it appears that the U+95E8 position is always filled with the U+95E8 U+E0101 Ryakuji character.  The generic CJK fonts also use the Ryakuji U+95E8 U+E0101 character rather than the Unicode simplified Chinese U+95E8 character.  This is the case for the fonts-otf-source-han font, as the hb-view command in the description shows.  There seems to be a little Japanese bias there.

So rather than using the "U+95E8 U+E0101" Unicode sequence to define the displayed character, the base "U+95E8" character is used together with a font definition to select the variant.

In any case, I would recommend that the default behaviour should be:

1. If no font or language is specified:  The character should be displayed using the base Unicode definition of that character (e.g. https://www.fileformat.info/info/unicode/char/95e8/index.htm for U+95E8).
2. If a font or language is specified:  The author of the specific font has already decided which variant to use.
3. If the double Unicode character is present:  I guess the system should determine the appropriate font and select the position using the base Unicode character.

Somehow installing 'fonts-otf-source-han-2.000-1.mga7.noarch.rpm' breaks 1. and makes the system unusable for Chinese.
Comment 14 Lewis Smith 2020-02-05 19:31:46 CET
@Edward
Blinded by science (me).
Your last remark above seems important (qualified originally by "makes it impossible to input or read Chinese on a Mageia 7 installation (at least when using a non-Chinese locale)").
> Can we say that 'noto-cjk' offers all the functionality of
> 'fonts-otf-source-han' (& better) ?
> What would be lost by withdrawing fonts-otf-source-han ?
In your view.
Lewis Smith 2020-02-05 19:32:16 CET

Assignee: pkg-bugs => bugsquad

Comment 15 Edward d'Auvergne 2020-02-06 13:52:08 CET
I would say that the Noto fonts could fully replace and be better than fonts-otf-source-han.  The Noto fonts should be able to replace this - as Martin said and I'm now reading about, the Noto CJK fonts are derived from the Source Han fonts.  But I now wonder if any of the software shipped with Mageia has the Source Han font as a hard-coded dependency?

But I am trying to work out what the difference is between mga5, mga6, and mga7 (I had neither the Source Han nor Noto fonts installed from mga1-mga4).  It looks like SourceHanSans.ttc contains all the font variants, whereas in Noto the fonts are split out into different files.  The only difference I can see is that Source Han went from version 1.004 to 2.000 between mga6 and mga7.

The default value for U+95E8 in SourceHanSans.ttc is the Japanese Ryakuji character (a Unicode variant).  This should really not be the case - see https://www.fileformat.info/info/unicode/char/95e8/index.htm for the definition of this Unicode glyph as the simplified Chinese character.  The Noto fonts in /usr/share/fonts/google-noto-cjk/NotoSansCJK-*.ttc also have this Japanese character instead of the Unicode base character (from google-noto-sans-cjk-ttc-fonts).  I guess the /usr/share/fonts/google-noto-cjk/Noto*.otf fonts are somehow being correctly used though, when 'fonts-otf-source-han' is missing, by defaulting to the Unicode standard.  As to why this is a problem, I still do not understand though.  Maybe Mageia is simply not using SourceHanSans.ttc the way it should be?  Still, SourceHanSans.ttc should not default to a Unicode variant character rather than the Unicode base character (the obvious Japanese bias there).
Comment 16 Lewis Smith 2020-02-06 21:01:58 CET
> I would say that the Noto fonts [google-noto-cjk-fonts] could fully
> replace and be better than fonts-otf-source-han
Thanks. And for all your input.
It seems that we can still add stuff to M7 ERRATA; this problem could be noted there. I will look into it.

> I now wonder if any of the software shipped with Mageia has the
> Source Han font as a hard-coded dependency?
 $ urpmq --whatrequires-recursive fonts-otf-source-han
No.

I think this (drop 'fonts-otf-source-han') cannot be done for M7, but could be for M8 if agreed; so changing the version to Cauldron, and re-assigning globally. It might end closed wontfix because the remedy is simply not to install the iffy package.

Version: 7 => Cauldron
Assignee: bugsquad => pkg-bugs

Comment 17 Edward d'Auvergne 2020-02-07 11:11:55 CET
This is a lot more complicated than I thought.  The issues are not due to the fonts themselves.  The issue is that many CJK fonts appear to not be correctly configured or not configured at all on Mageia.

So I've been looking at Mageia's fontconfig fallback ordering.  I've used the following steps:

1) Run "$ DISPLAY=:0 FC_DEBUG=4 knotes".
2) Open a note from the panel.
3) Type or paste the character 门.
4) Look at the last selection from the fontconfig debugging output.
5) Delete the text in the note.
6) Find the rpm of the last font and uninstall it.
7) Log out of X.
8) Log back in.
9) Repeat the above.

From this, the fallback ordering I see is:

JP; fullname: "Source Han Sans"(s) "源ノ角ゴシック"(s); rpm: fonts-otf-source-han
SC; fullname: "WenQuanYi Micro Hei"(s) "文泉驛微米黑"(s) "文泉驿微米黑"(s); rpm: fonts-ttf-wqy-microhei
SC; family: "WenQuanYi Bitmap Song"(s); rpm: x11-font-wqy-bitmapfont
SC; fullname: "AR PL UMing TW MBE"(s); rpm: fonts-ttf-chinese
JP; fullname: "Noto Sans CJK JP Regular"(s); rpm: google-noto-sans-cjk-ttc-fonts
JP; fullname: "Un Dotum"(s) "은 돋움"(s); rpm: fonts-ttf-korean
JP; fullname: "Noto Sans CJK JP Regular"(s); rpm: google-noto-sans-cjk-jp-fonts

Here 'JP' means the Japanese Ryakuji character was selected, and 'SC' means the simplified Chinese character was.  This was a long procedure to see that the ordering is from /etc/fonts/conf.d/65-nonlatin.conf (from the 'fontconfig' package itself).  The relevant lines are:


"""
<description>Set preferable fonts for non-Latin</description>
  <alias>
    <family>serif</family>
    <prefer>
      [snip]
      <family>MS Mincho</family> <!-- han (ja) -->
      <family>SimSun</family> <!-- han (zh-cn,zh-tw) -->
      <family>PMingLiu</family> <!-- han (zh-tw) -->
      <family>Source Han Sans</family> <!-- han (ja, ko, zh) -->
      <family>Open Hei</family> <!--han (zh-tw) -->
      <family>WenQuanYi Micro Hei</family> <!-- han (zh-cn,zh-tw) -->
      <family>WenQuanYi Zen Hei</family> <!-- han (zh-cn,zh-tw) -->
      <family>WenQuanYi Bitmap Song</family> <!-- han (zh-cn,zh-tw) -->
      <family>AR PL ShanHeiSun Uni</family> <!-- han (ja,zh-cn,zh-tw) -->
      <family>AR PL New Sung</family> <!-- han (zh-cn,zh-tw) -->
      <family>AR PL UMing TW MBE</family> <!--han (zh-tw) -->
      <family>AR PL UMing CN</family> <!--han (ja,zh-cn) -->
      <family>ZYSong18030</family> <!-- han (zh-cn,zh-tw) -->
      <family>HanyiSong</family> <!-- han (zh-cn,zh-tw) -->
      <family>MgOpen Canonica</family>
"""

This list is problematic:

1) If "MS Mincho" is present (maybe aliased to fonts-ttf-japanese-extra or sazanami-mincho-fonts?), then a Japanese font will be the default for all CJK characters.  Only if the glyph is not present in the "MS Mincho" font, will a different font be used.

2) The "Source Han Sans" font will be the default for fontconfig on Mageia if it is installed (if MS Mincho, SimSun, or PMingLiu are not found).  However "Source Han Sans" is not configured at all in Mageia - there is nothing in /etc/fonts/conf.d.  Therefore it defaults to the first font in the *.ttc file, which is Japanese.

3) The Google Noto font is not in this list.  It does have fontconfig configuration files in Mageia.  However it is also not configured correctly to default to the base Unicode CJK characters.

I'll try working on some basic configuration files next.
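
As a rough illustration of point 2, here is a minimal Python sketch that reads an abbreviated copy of the <prefer> list quoted in this comment and picks the first family that is installed. This is a deliberate simplification: fontconfig's real matching scores many properties and is more involved than a first-match lookup. The set of installed families and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the <alias> block quoted in this comment
# (most families are omitted for brevity).
SNIPPET = """
<alias>
  <family>serif</family>
  <prefer>
    <family>MS Mincho</family>
    <family>SimSun</family>
    <family>PMingLiu</family>
    <family>Source Han Sans</family>
    <family>WenQuanYi Micro Hei</family>
    <family>AR PL UMing TW MBE</family>
  </prefer>
</alias>
"""


def preferred_families(alias_xml):
    """Return the families inside <prefer>, in document (priority) order."""
    alias = ET.fromstring(alias_xml.strip())
    return [fam.text for fam in alias.findall("./prefer/family")]


def first_installed(preferred, installed):
    """Return the first preferred family that is installed, or None."""
    for family in preferred:
        if family in installed:
            return family
    return None


order = preferred_families(SNIPPET)
# Hypothetical set of installed families, for illustration only.
installed = {"WenQuanYi Micro Hei", "Source Han Sans"}
print(first_installed(order, installed))
```

With both fonts installed, "Source Han Sans" comes first in the list and so wins over "WenQuanYi Micro Hei", which matches the fallback ordering observed above.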
Comment 18 Edward d'Auvergne 2020-02-07 11:32:48 CET
I've changed the title of the bug from "The fonts-otf-source-han font is terrible - please remove from the repositories." to "The unconfigured fonts-otf-source-han font causes fontconfig to select Japanese rather than the Unicode base characters."

Summary: The fonts-otf-source-han font is terrible - please remove from the repositories. => The unconfigured fonts-otf-source-han font causes fontconfig to select Japanese rather than the Unicode base characters.

Comment 19 Edward d'Auvergne 2020-02-07 11:43:00 CET
Created attachment 11500
A basic fontconfig configuration file for the Source Han Sans font.

This configuration file causes the Source Han Sans font to behave according to the Unicode base standard.  The font family base name "Source Han Sans" is the Japanese font in the *.ttc file.  So the logic is:

1) If fontconfig encounters "Source Han Sans", then traditional Chinese and simplified Chinese are prepended to the font family.  This means that the Unicode base characters will be the default - as they should be.

2) If the language is set to "ja", then 1) is essentially undone.

3) If any of the languages "zh-cn", "zh-tw", "zh-hk", or "kr" are encountered, then the appropriate font family base name will be selected from /usr/share/fonts/OTF/source-han/SourceHanSans.ttc.

I assume that this /etc/fonts/conf.d/65-source-han-sans.conf file could be improved.
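
For readers without access to the attachment, the following Python sketch generates rules in the spirit of point 3 only. It is not the attached file. The family names per language ("Source Han Sans SC" and so on) and the use of "ko" as the Korean language tag are assumptions that should be checked against the output of fc-list on a system with the font installed.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: language tag -> family name inside SourceHanSans.ttc.
# These names are illustrative; verify them with fc-list.
LANG_TO_FAMILY = {
    "zh-cn": "Source Han Sans SC",
    "zh-tw": "Source Han Sans TC",
    "zh-hk": "Source Han Sans HC",
    "ko": "Source Han Sans K",
}


def lang_rule(lang, family):
    """Build one <match> rule: when the pattern language contains lang and
    the family "Source Han Sans" is requested, prepend the specific family."""
    match = ET.Element("match", target="pattern")
    test_lang = ET.SubElement(match, "test", name="lang", compare="contains")
    ET.SubElement(test_lang, "string").text = lang
    test_family = ET.SubElement(match, "test", name="family")
    ET.SubElement(test_family, "string").text = "Source Han Sans"
    edit = ET.SubElement(match, "edit", name="family",
                         mode="prepend", binding="strong")
    ET.SubElement(edit, "string").text = family
    return match


def build_config():
    """Collect one rule per language under a <fontconfig> root element."""
    root = ET.Element("fontconfig")
    for lang, family in LANG_TO_FAMILY.items():
        root.append(lang_rule(lang, family))
    return root


config = build_config()
if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9 or later
    ET.indent(config)
print(ET.tostring(config, encoding="unicode"))
```

Points 1 and 2 (defaulting to Chinese and undoing that for "ja") would need further rules that are not sketched here.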
Comment 20 Edward d'Auvergne 2020-02-07 12:02:28 CET
I've also just noticed that the current fontconfig git repository file 65-nonlatin.conf (https://gitlab.freedesktop.org/fontconfig/fontconfig/blob/master/conf.d/65-nonlatin.conf) does not contain "Source Han Sans".  It took me a while, but I found it to be added Mageia side:  http://svnweb.mageia.org/packages/cauldron/fontconfig/current/SOURCES/fontconfig-mdvconfig.patch?view=markup.

If a /etc/fonts/conf.d/65-source-han-sans.conf file similar to that in comment #19 is added to Mageia, then this bug is essentially fixed.  The correct characters will be shown using the Source Han Sans font families.

Or if Google Noto is to replace Source Han Sans, then it looks like Google Noto needs to be added to 65-nonlatin.conf (where Source Han Sans currently is), and the Google Noto fontconfig files modified to also not default to Japanese (using similar logic to comment #19)!
Comment 21 Edward d'Auvergne 2020-02-07 12:40:36 CET
Digging more into the Mageia fontconfig files, I've also noticed that a number of the CJK fonts have hinting set to True.  This should not be the case, as the CJK glyphs look much better without hinting.  I see that this is because the font families are missing from /etc/fonts/conf.d/25-unhint-nonlatin.conf.  From the top of this file:

"""
  <description>Disable hinting for CJK fonts</description>
<!-- We can't hint CJK fonts well, so turn off hinting for CJK fonts. -->
"""

So Source Han Sans, Google Noto, and other CJK fonts should be added to this list.
Comment 22 Lewis Smith 2020-02-08 09:38:11 CET
Well, Edward, you have certainly researched several issues. Thank you for all that, and for adapting the bug title.
It seems that from comment 17 onward you have pinpointed various problems and identified where to correct them. That should help any packager looking at this bug - an obscure world.

Would you be willing yourself to take responsibility for maintaining this very specialised area of CJK fonts? Have a look at:
 https://wiki.mageia.org/en/Becoming_a_Mageia_Packager
Comment 23 Martin Whitaker 2020-02-08 13:13:24 CET
Sorry not to have found time to respond sooner. But Edward has discovered for himself what I was going to suggest - the problem lies in the fontconfig rules (or rather, the lack of them).

To answer the question of whether or not the fonts-otf-source-han package could be dropped (replaced by google-noto-sans-cjk-ttc-fonts) the differences I can see are:

1. The google-noto-sans-cjk-ttc-fonts package uses marginally more disk space - 121M vs 118M - not an issue.

2. The fonts-otf-source-han package provides an extra language, Traditional Chinese (Hong Kong). This might matter to some users.
Comment 24 Lewis Smith 2020-02-09 11:21:31 CET
(In reply to Martin Whitaker from comment #23)
Thank you for coming back.
> To answer the question of whether or not the fonts-otf-source-han package
> could be dropped (replaced by google-noto-sans-cjk-ttc-fonts)
> the problem lies in the fontconfig rules (or rather, the lack of them).
This request has been dropped; note the bug title change in comment 18; and comment 22. I feel optimistic that you both agree!

If these fonts do get tidied up, we shall probably ask you, Edward, to try them.
Comment 25 Edward d'Auvergne 2020-02-10 15:24:27 CET
I'm looking at fixing the configuration and at the requirements for maintaining this.  As mentioned by Martin in comment #23, fonts-otf-source-han does have a Hong Kong variant font that is not, for some reason, present in the Google Noto fonts.  This might already be present in Mageia, but I am writing a small script to test the fontconfig matching.  I'll attach that next, together with a much improved configuration for fonts-otf-source-han.
Comment 26 Edward d'Auvergne 2020-02-10 15:26:39 CET
Created attachment 11502
Python script for testing the fontconfig matching.

This preliminary script is for testing if the font matched by fontconfig is what you would expect it to be.
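
For anyone who cannot open the attachment, here is a minimal sketch of the general idea. It is not the attached script. It asks fc-match which family fontconfig selects for a given language and compares that with an expected name. The expected names and the choice of the "serif" generic family are hypothetical; the check is skipped if fc-match is not installed.

```python
import shutil
import subprocess


def build_pattern(family, lang):
    """Build a fontconfig pattern string such as 'serif:lang=zh-cn'."""
    return f"{family}:lang={lang}"


def fc_match_command(family, lang):
    """Return the fc-match argument list that prints the best-matching family."""
    return ["fc-match", "--format", "%{family}\n", build_pattern(family, lang)]


def matched_family(family, lang):
    """Run fc-match and return the first family name of the best match,
    or None if fc-match is unavailable or fails."""
    if shutil.which("fc-match") is None:
        return None
    result = subprocess.run(fc_match_command(family, lang),
                            capture_output=True, text=True)
    if result.returncode != 0:
        return None
    # A font can report several comma-separated family names; keep the first.
    return result.stdout.strip().split(",")[0]


# Hypothetical expectations, for illustration only.
EXPECTED = {
    "zh-cn": "Source Han Sans SC",
    "ja": "Source Han Sans",
}

if __name__ == "__main__":
    for lang, expected in EXPECTED.items():
        got = matched_family("serif", lang)
        if got is None:
            status = "skipped"
        else:
            status = "ok" if got == expected else "MISMATCH"
        print(f"{lang}: expected {expected!r}, got {got!r} -> {status}")
```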
Comment 27 Edward d'Auvergne 2020-02-10 15:30:05 CET
Created attachment 11503
A new fontconfig configuration file for the Source Han Sans font.

This configuration file appears to be fully functional.  It passes the previously attached 'fontconfig_testing.py' testing script (except for the North Korean locale ko_KP, for some unknown reason).

Attachment 11500 is obsolete: 0 => 1

Comment 28 Edward d'Auvergne 2020-02-10 15:34:31 CET
Created attachment 11504
Patch for removing hinting from the Source Han Sans font.

This patch is just a demonstration.  Many more CJK fonts - possibly all those shipped with Mageia - need to have hinting turned off by default.
Comment 29 Lewis Smith 2020-02-11 10:27:53 CET
Once again thank you for your investigations. You have done a great deal for anyone taking this on.
> Many more CJK fonts - possibly all those shipped with Mageia - need to have
> hinting turned off by default.
For reference, follow all the CJK related font *packages* I could find; I did not reduce them to their SRPMs:

bitmap-lucida-typewriter-fonts - Selected CJK bitmap fonts for Anaconda
fonts-otf-source-han - A set of Pan-CJK fonts designed to complement Source Sans Pro
fonts-ttf-chinese - Unified Chinese True Type font
fonts-ttf-chinese-opendesktop - OpenDesktop.Org.tw Font -- Simplified and Traditional Chinese and Japanese Ming and Kai Face
fonts-ttf-japanese - Japanese TrueType fonts
fonts-ttf-japanese-extra - Japanese TrueType fonts
fonts-ttf-kanjistrokeorders - Kanji Stroke Orders Fonts
fonts-ttf-korean - Un Fonts in Korean
fonts-ttf-wqy-microhei - WenQuanYi MicroHei TrueType fonts
google-noto-cjk-fonts - Google Noto Sans CJK Fonts
google-noto-cjk-fonts-common - Common files for Noto CJK fonts
google-noto-sans-cjk-jp-fonts - Japanese Multilingual Sans OTF font files for google-noto-cjk-fonts
google-noto-sans-cjk-kr-fonts - Korean Multilingual Sans OTF font files for google-noto-cjk-fonts
google-noto-sans-cjk-sc-fonts - Simplified Chinese Multilingual Sans OTF font files for google-noto-cjk-fonts
google-noto-sans-cjk-tc-fonts - Traditional Chinese Multilingual Sans OTF font files for google-noto-cjk-fonts
google-noto-sans-cjk-ttc-fonts - Sans OTC font files for google-noto-cjk-fonts
google-noto-sans-jp-fonts - Japanese Region-specific Sans OTF font files for google-noto-cjk-fonts
google-noto-sans-mono-cjk-jp-fonts - Japanese Multilingual Sans Mono OTF font files for google-noto-cjk-fonts
google-noto-sans-mono-cjk-kr-fonts - Korean Multilingual Sans Mono OTF font files for google-noto-cjk-fonts
google-noto-sans-mono-cjk-sc-fonts - Simplified Chinese Multilingual Sans Mono OTF font files for google-noto-cjk-fonts
google-noto-sans-mono-cjk-tc-fonts - Traditional Chinese Multilingual Sans Mono OTF font files for google-noto-cjk-fonts
google-noto-sans-sc-fonts - Simplified Chinese Region-specific Sans OTF font files for google-noto-cjk-fonts
google-noto-serif-cjk-jp-fonts - Japanese Multilingual Serif OTF font files for google-noto-cjk-fonts
google-noto-serif-cjk-kr-fonts - Korean Multilingual Serif OTF font files for google-noto-cjk-fonts
google-noto-serif-cjk-sc-fonts - Simplified Chinese Multilingual Serif OTF font files for google-noto-cjk-fonts
google-noto-serif-cjk-tc-fonts - Traditional Chinese Multilingual Serif OTF font files for google-noto-cjk-fonts
google-noto-serif-cjk-ttc-fonts - Serif OTC font files for google-noto-cjk-fonts
google-noto-serif-jp-fonts - Japanese Region-specific Serif OTF font files for google-noto-cjk-fonts
sazanami-fonts-common - Common files for Sazanami Japanese TrueType fonts
sazanami-gothic-fonts - Sazanami Gothic Japanese TrueType font
sazanami-mincho-fonts - Sazanami Mincho Japanese TrueType font

In case anyone wants to look at this.
Comment 30 Edward d'Auvergne 2020-02-11 14:53:02 CET
I might need to ask on the fontconfig mailing list if there are easier hooks into testing the font matching.  For example if I install absolutely all fonts shipped with Mageia and then have the testing script blacklist the fonts in the Mageia fallback order to see that what the user gets is what is expected.

Note this script doesn't cover software that uses its own font selection mechanism rather than fontconfig (Firefox, GTK, etc.).

Also note that some fonts in the above list might contain hinting, for example if they include non-CJK glyphs (e.g. the Korean Hangul and Japanese Kana scripts).

I think that what really needs to be done is that the 'ttx' fonttools program be used to check all fonts in Mageia in some automated way (loop over all fonts, convert to XML, search for any hinting in <instructions> elements, and make a list of hinted and unhinted fonts).  Then we completely rewrite the configuration for turning off hinting.  For example this has hinting:  /usr/share/fonts/TTF/dejavu/DejaVuSans.ttf.  E.g.:

    <TTGlyph name=".notdef" xMin="102" yMin="-362" xMax="1126" yMax="1444">
      <contour>
        <pt x="102" y="-362" on="1"/>
        <pt x="102" y="1444" on="1"/>
        <pt x="1126" y="1444" on="1"/>
        <pt x="1126" y="-362" on="1"/>
      </contour>
      <contour>
        <pt x="217" y="-248" on="1"/>
        <pt x="1012" y="-248" on="1"/>
        <pt x="1012" y="1329" on="1"/>
        <pt x="217" y="1329" on="1"/>
      </contour>
      <instructions>
        <assembly>
          NPUSHB[ ]	/* 12 values pushed */
          4 251 0 6 251 1 8 5 127 2 4 0
          MDAP[1]	/* MoveDirectAbsPt */
          MDRP[00100]	/* MoveDirectRelPt */
          MDRP[10100]	/* MoveDirectRelPt */
          MIRP[01100]	/* MoveIndirectRelPt */
          IUP[1]	/* InterpolateUntPts */
          SVTCA[0]	/* SetFPVectorToAxis */
          SRP0[ ]	/* SetRefPoint0 */
          MDRP[10100]	/* MoveDirectRelPt */
          MIRP[01100]	/* MoveIndirectRelPt */
          MDRP[10100]	/* MoveDirectRelPt */
          MIRP[01100]	/* MoveIndirectRelPt */
          IUP[0]	/* InterpolateUntPts */
        </assembly>
      </instructions>
    </TTGlyph>
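
The automated check described above could start from something like the following minimal Python sketch. It scans ttx XML for glyphs whose <instructions> element contains assembly text. The sample XML is a tiny hand-written stand-in for real ttx output, and the function name is made up. This only covers TrueType glyf hinting; CFF-based fonts store their hints differently and would need a separate check.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written stand-in for ttx output, for illustration only.
SAMPLE_TTX = """
<ttFont>
  <glyf>
    <TTGlyph name=".notdef">
      <contour/>
      <instructions>
        <assembly>
          MDAP[1]
          IUP[0]
        </assembly>
      </instructions>
    </TTGlyph>
    <TTGlyph name="space"/>
    <TTGlyph name="A">
      <contour/>
      <instructions/>
    </TTGlyph>
  </glyf>
</ttFont>
"""


def hinted_glyphs(ttx_xml):
    """Return names of glyphs whose <instructions> hold any assembly text."""
    root = ET.fromstring(ttx_xml.strip())
    names = []
    for glyph in root.iter("TTGlyph"):
        assembly = glyph.find("./instructions/assembly")
        if assembly is not None and (assembly.text or "").strip():
            names.append(glyph.get("name"))
    return names


print(hinted_glyphs(SAMPLE_TTX))
```

A font would count as hinted if this list is non-empty. Looping over every font file and converting each one with ttx first is left out of the sketch.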
Comment 31 Edward d'Auvergne 2020-02-13 13:43:22 CET
Hmmm, despite a lot of noise on the web about the Source Han Sans and Google Noto CJK fonts not containing hinting, this appears to not be the case:

https://github.com/googlefonts/noto-fonts/issues/159

For the record, the text is:

"""
Noto Sans CJK requires FreeType 2.5.x  to utilize Adobe's CFF rasterizer that 
was added to FreeType 2.5.x.   Moreover, fontconfig has to be configured to use 
the full native hint and gray-scale rendering (as opposed to autohint and 
subpixel rendering). 

We have to document this somewhere (FAQ, release note, web page, etc)  so that 
both individual users and distribution builders can configure fontconfig 
properly to get the best result. 

Making our own deb or rpm package is another way to achieve this. 
"""

I can see CFF hinting commands in both fonts (using ttx to convert the font to XML).  Google recommend 'hintfull' for Noto CJK, but we have 'hintslight' set as the default for all fonts.  And 'rgba' is not set at all - I wonder if that is the same as setting it to 'none'.
Comment 32 Martin Whitaker 2020-06-10 14:37:35 CEST
Being incapable of reading or writing the affected languages, I was hoping You-Cheng would comment on this bug. But absent that, it would be a shame to let Edward's work go to waste. Do I understand correctly that we just need to add the new fontconfig file attached in comment 27, and should ignore the patch to disable hinting?
Comment 33 You-Cheng Hsieh 2020-06-18 02:54:40 CEST
Hello,
Sorry that I forgot my password and didn't reset it until recently. I'm really grateful for what Edward has done on this issue.
Usually gray-scale rendering looks better for my eyes with CJK fonts, but that's only my personal preference.
As for hinting, I thought this is usually set in the desktop environment, and it probably also varies between people. There might be some CJK characters that don't suit hintfull, though.
Again, thank you very much for researching into this.
Comment 34 Martin Whitaker 2020-06-18 11:43:26 CEST
Thanks for commenting You-Cheng. I agree that hinting and rendering is often a matter of personal preference, so best left to the user. I've added Edward's fontconfig file to the fonts-otf-source-han package. It should be available for testing in cauldron shortly.
