After upgrading my local copy (WAMP 3.1.4, PHP 7.10, MySQL 5.7.23) of a site from ZP 1.4.14 to 1.5, I have discovered that pages containing Cyrillic characters are not displayed correctly. I've repeated the clean installation and upgrade as described below, and the loss of Cyrillic support is reproducible:
I don't dare to test it on the production site at the moment, so I can't rule out that this is related to the WAMP configuration. But as I have no problems with the older versions of ZP, I think it is a bug.
Actually nothing changed from 1.4.14 regarding the database. I just tried on my local MAMP install and "проверка" works fine here and also on our own site. Here is a test article: https://www.zenphoto.org/news/proverka/
You said you set up a new database. Please check that the collation is utf8_unicode_ci, which is what Zenphoto itself sets. Check not only the tables and columns but also the database itself, as well as the encoding options in Zenphoto. Also try with the tinyMCE text editor enabled.
Thanks for checking it. I confirm that I use utf8_unicode_ci for the database, tables and columns, and that tinyMCE is enabled. If it works in MAMP, it might be a WAMP-related issue. What is strange to me is that I see the correct characters in the Content field of the Edit page, but not in the PHP-rendered page.
this is what I see in the database in phpMyAdmin:
a:1:{s:5:"fr_FR";s:16:"проверка";} for title field
a:1:{s:5:"fr_FR";s:23:"проверка ";} for content.
fr_FR was en_US before I edited the record with TinyMCE.
A record made with TinyMCE switched off reads as:
a:1:{s:5:"fr_FR";s:17:"проверка2";},
a:1:{s:5:"fr_FR";s:24:"проверка tést 2";} and the Cyrillic characters and é are then displayed as �
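The `s:16` prefix in these records is a byte count, not a character count, so it gives a quick way to check whether the stored data is intact UTF-8. Below is a minimal sketch, assuming the script file is saved as UTF-8 and the mbstring extension is loaded; the variable names are illustrative and not taken from Zenphoto.

```php
<?php
// Serialized title as shown in phpMyAdmin (copied from the title record above).
$stored = 'a:1:{s:5:"fr_FR";s:16:"проверка";}';

// unserialize() only succeeds if the declared byte lengths match the data.
$titles = unserialize($stored);
$title = $titles['fr_FR'];

$byteLength = strlen($title);                       // counts bytes
$charLength = mb_strlen($title, 'UTF-8');           // counts characters (needs mbstring)
$isValidUtf8 = mb_check_encoding($title, 'UTF-8');  // true for well-formed UTF-8

echo "bytes: $byteLength, chars: $charLength, valid UTF-8: "
    . var_export($isValidUtf8, true) . "\n";
```

Eight Cyrillic letters at two bytes each give the 16 bytes declared in the record. That suggests the data is stored correctly and the damage happens somewhere on the rendering path.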
If "en_US" changed to "fr_FR", that means you switched the language in the backend.
tinyMCE does do some encoding of special chars when saving. Does exactly the same happen on the other install?
I just tried the same locally without tinyMCE and it still works for me, even with a freshly created page. I would assume some tiny detail is off somewhere…
Ok, I've found a fix:
In zp 1.5, line 358 of the file functions-common reads:
$str = tidyHTML($str);
After I changed it to what it (approximately) was in zp 1.4.14 (see below), everything works fine.
if ($str != $original) {
$str = tidyHTML($str);
}
That condition only prevented tidyHTML($str) from being called on an unchanged string. tidyHTML() itself checks for the presence of the tidy class and, if it is absent, does the following.
in zp1.5:
return trim(htmLawed($html, array('tidy' => '2s2n')));
in zp1.4:
return $html;
So the issue is most probably related to htmLawed().
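To make the effect of the comparison concrete, here is a rough sketch of the guard pattern. It is not Zenphoto's actual code: `shortenSketch()`, its parameters and the stub cleaner are hypothetical stand-ins, and it assumes the mbstring extension is available.

```php
<?php
// Hypothetical stand-in for the shortening step; NOT Zenphoto's actual code.
function shortenSketch(string $str, int $maxLength, callable $tidy): string {
    $original = $str;
    if (mb_strlen($str, 'UTF-8') > $maxLength) {
        // Truncate on character boundaries so multibyte characters stay whole.
        $str = mb_substr($str, 0, $maxLength, 'UTF-8') . '...';
    }
    // Only run the markup cleanup if truncation actually changed the string.
    if ($str != $original) {
        $str = $tidy($str);
    }
    return $str;
}

// Stub cleaner that counts how often it is called.
$calls = 0;
$tidyStub = function (string $html) use (&$calls): string {
    $calls++;
    return trim($html);
};

$short = shortenSketch('проверка', 100, $tidyStub);       // unchanged, cleaner skipped
$long = shortenSketch('проверка tést 2', 8, $tidyStub);   // truncated, cleaner runs once

echo $short . "\n" . $long . "\ncleaner calls: " . $calls . "\n";
```

Note that the guard only skips the cleanup for strings that were not shortened. Text that really is truncated would still go through the cleaner, so the underlying problem would remain for long content.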
This should probably be tested with a longer text using Cyrillic chars, to see how it all works when the text is actually truncated. The missing comparison should probably be re-added, since tidying is not really necessary if the string is unchanged anyway.
And yes, htmLawed is a kind of workaround for when tidy is not there, as tidy is superior.
Ok, I've tested it with the tidy extension switched off. xdebug shows that the string (independent of its length, by the way) changes to unreadable characters in line 677 of lib-htmLawed.php, which reads (it is not yet clear to me why):
$t = preg_replace(array('`(]*(?)\s+`', '`\s+`', '`(]*(?) `'), array(' $1', ' ', '$1'), preg_replace_callback(array('`()`sm', '`()`sm', '`(]*?>)(.+?)()`sm'), 'hl_aux2', $t));
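One way to narrow down which step damages the text is to check UTF-8 validity after each transformation. The sketch below assumes mbstring is loaded; `findCorruptingStep()` and the example steps are hypothetical, and the byte-level replace is a deliberate simulation of corruption, not a claim about what htmLawed does.

```php
<?php
// Hypothetical helper: run $input through the named steps in order and return
// the name of the first step whose output is no longer valid UTF-8.
function findCorruptingStep(string $input, array $steps): ?string {
    $current = $input;
    foreach ($steps as $name => $step) {
        $current = $step($current);
        if (!mb_check_encoding($current, 'UTF-8')) {
            return $name;
        }
    }
    return null; // every step kept the string valid
}

$steps = [
    'trim' => function (string $s): string {
        return trim($s);
    },
    // Deliberately byte-oriented: replaces the single byte 0x80, which is the
    // second byte of the Cyrillic letter "р" (0xD1 0x80) in UTF-8.
    'byte replace' => function (string $s): string {
        return str_replace("\x80", ' ', $s);
    },
];

$culprit = findCorruptingStep(' проверка ', $steps);
echo $culprit === null
    ? "all steps kept valid UTF-8\n"
    : "first corrupting step: $culprit\n";
```

Wrapping the individual replacements from the line above as separate steps in such a helper would show which of them first produces invalid bytes on the affected install.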