Cyrillic support is broken in the content of pages (on WAMP)

olejorik Member
edited November 4 in General support

After upgrading my local copy (WAMP 3.1.4, PHP 7.10, MySQL 5.7.23) of a site from ZP 1.4.14 to 1.5, I discovered that pages containing Cyrillic characters are not displayed correctly. I repeated a clean installation and upgrade as described below, and the loss of Cyrillic support is reproducible:

  1. Clean installation of ZP 1.4.14
  2. Make a clean database
  3. Choose the Zenpage theme
  4. Activate the zenpage plugin
  5. Create a new page with Unicode content (I used "проверка" both as title and content)
  6. Everything works OK
  7. Upgrade to ZP 1.5 (from GitHub) https://github.com/zenphoto/zenphoto
    with or without removing the unknown files
  8. The content of the page is not shown correctly, while the title is OK
  9. But the article content is not lost (it's visible in the Edit Page menu)

I don't dare to test it on the production site at the moment, so I cannot rule out that this is related to the WAMP configuration. But as I have no problems with the older versions of ZP, I think there is a bug.

Comments

  • acrylian Administrator

    Actually nothing changed from 1.4.14 regarding the database. I just tried on my local MAMP install and "проверка" works fine here and also on our own site. Here is a test article: https://www.zenphoto.org/news/proverka/

    You said you set up a new database. Please check that the encoding is utf8_unicode_ci, which is what Zenphoto will actually set. Also check that not only the tables and columns but the database itself is set correctly, as well as the encoding options in Zenphoto itself. Also try with the tinyMCE text editor enabled.

  • Thanks for checking it. I confirm that I use utf8_unicode_ci for the database, tables and columns, and that tinyMCE is enabled. If it works in MAMP, it might be a WAMP-related issue. What I find strange is that I see the correct characters in the Content field on the Edit Page screen, but not in the PHP-rendered page.

  • acrylian Administrator

    Can you take a look at the database columns for each field directly? Are both strings stored the same way? If all is right, they should be stored as-is (apart from some serialized array wrapping around them that belongs to the multilingual storage).

  • olejorik Member
    edited November 4

    this is what I see in the database in phpMyAdmin:
    a:1:{s:5:"fr_FR";s:16:"проверка";} for title field
    a:1:{s:5:"fr_FR";s:23:"<p>проверка</p>";} for content.

    fr_FR was en_US before editing the record with TinyMCE.

    A record made with TinyMCE switched off reads as:
    a:1:{s:5:"fr_FR";s:17:"проверка2";},
    a:1:{s:5:"fr_FR";s:24:"проверка tést 2";} and the Cyrillic characters and é are then displayed as �
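
The lengths in these records can be checked by hand, because PHP's serialize() counts bytes rather than characters. The sketch below is an illustration only (variable names are made up, and the file is assumed to be saved as UTF-8). It reproduces the two TinyMCE records quoted above, which suggests the stored data is intact UTF-8 and the damage happens later, at render time.

```php
<?php
// Assumption: this source file is saved as UTF-8.
$title   = 'проверка';          // 8 Cyrillic letters, 2 bytes each in UTF-8
$content = '<p>проверка</p>';   // 3 + 16 + 4 bytes

// serialize() stores the byte length of each string, not the character count.
$serializedTitle   = serialize(array('fr_FR' => $title));
$serializedContent = serialize(array('fr_FR' => $content));

// Count characters with a /u (UTF-8 aware) pattern, so mbstring is not required.
$charCount = preg_match_all('/./us', $title);

echo $serializedTitle, "\n";
echo $serializedContent, "\n";
echo 'bytes: ', strlen($title), ', characters: ', $charCount, "\n";
```

If the byte length in a stored record did not match the text (for example s:8 for "проверка"), that would point to a problem at save time instead.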

  • acrylian Administrator

    If "fr_FR" changes from "en_US" that means you switch the language on the backend.

    tinymce does do some encoding of special chars when saving. Does the exact same happen on the other install?

    I just tried the same locally without tinymce and it still works for me, even with a freshly created page. I would assume there is some tiny detail off somewhere…

  • olejorik Member
    edited November 5

    Ok, I've found a fix:
    file functions-common, line 358 reads in zp1.5 as
    $str = tidyHTML($str);.

    After I've changed to what it (approximately) was in zp 1.4.14 (see below), everything works fine.

    if ($str != $original) {
        $str = tidyHTML($str);
    }
    
  • acrylian Administrator
    edited November 5

    Thanks, will take a look at that. There had been some changes because of issues with truncated text and broken html. Btw, do you actually have the tidy extension on WAMP (I do in MAMP)?

  • My goodness, how easy it was! Yes, I have it, and it was disabled. After enabling it, I've got all my letters back.

    Thanks! I can only imagine how excellent your paid support should be :)

  • acrylian Administrator

    Thanks ;-) Well, the issue should not happen without tidy so I quickly tested with that line change and at least for me then all works even with tidy as before. So probably we can re-add that line with 1.5.1.

  • That line only prevented the evaluation of tidyHTML($str). tidyHTML() itself checks for the presence of the tidy class, and if it is absent, it does the following.
    in zp1.5:

    return trim(htmLawed($html, array('tidy' => '2s2n')));
    

    in zp1.4:

    return $html;
    

    So the issue is most probably related to htmLawed().
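
To make the interaction between the guard and the fallback concrete, here is a minimal sketch. All function names and bodies are simplified stand-ins, not the actual Zenphoto source, and the "lossy" cleaner is a deliberately broken placeholder: it does not reproduce what htmLawed really does. It only shows why the 1.4.14-style comparison would hide a misbehaving fallback whenever the string was not changed.

```php
<?php
// Sketch only: hypothetical stand-ins, not the Zenphoto implementation.

// zp 1.4-style fallback: without tidy, return the markup untouched.
function tidy_or_passthrough(string $html, bool $haveTidy, callable $tidy): string
{
    return $haveTidy ? $tidy($html) : $html;
}

// zp 1.5-style fallback: without tidy, hand the markup to a second cleaner.
function tidy_or_fallback(string $html, bool $haveTidy, callable $tidy, callable $fallback): string
{
    return $haveTidy ? $tidy($html) : trim($fallback($html));
}

// 1.4.14-style guard: only clean up when an earlier step changed the string.
function cleanup_if_changed(string $original, string $str, callable $cleaner): string
{
    return ($str !== $original) ? $cleaner($str) : $str;
}

// Placeholder for a real tidy pass; here it changes nothing.
$identity = function (string $html): string {
    return $html;
};

// Placeholder for a cleaner that mangles multibyte text.
// This is NOT htmLawed's real behavior, just a visible failure mode.
$lossy = function (string $html): string {
    return (string) preg_replace('/[\x80-\xFF]/', '?', $html);
};

$page     = '<p>проверка</p>';
$haveTidy = false; // simulates a setup where the tidy extension is disabled

$old = tidy_or_passthrough($page, $haveTidy, $identity);
$new = tidy_or_fallback($page, $haveTidy, $identity, $lossy);
$guarded = cleanup_if_changed($page, $page, function (string $s) use ($haveTidy, $identity, $lossy): string {
    return tidy_or_fallback($s, $haveTidy, $identity, $lossy);
});

echo $old, "\n", $new, "\n", $guarded, "\n";
```

In a real setup the flag would come from something like class_exists('tidy'). The guard only masks the problem: as soon as the string differs from the original, the fallback still runs.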

  • acrylian Administrator

    This should probably be tested with a longer text using Cyrillic characters, to see how it all works with actual truncation. The missing comparison should probably be re-added, since tidying is not really necessary if the string is unchanged anyway.

    And yes, htmLawed is a kind of workaround if tidy is not there as tidy is superior.

  • Ok, I've tested it with the tidy extension switched off. xdebug shows that the string (independent of its length, by the way) turns into unreadable characters at line 677 of lib-htmLawed.php, which reads as follows (it is not yet clear to me why):

    $t = preg_replace(array('`(<\w[^>]*(?<!/)>)\s+`', '`\s+`', '`(<\w[^>]*(?<!/)>) `'), array(' $1', ' ', '$1'), preg_replace_callback(array('`(<(!\[CDATA\[))(.+?)(\]\]>)`sm', '`(<(!--))(.+?)(-->)`sm', '`(<(pre|script|textarea)[^>]*?>)(.+?)(</\2>)`sm'), 'hl_aux2', $t));
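
Without a debugger, the same kind of bisection can be approximated by checking after each processing step whether the string is still well-formed UTF-8. The helpers below are hypothetical and the two steps are toy examples, not htmLawed code. Note that the patterns quoted above carry no u modifier, so PCRE treats the subject as bytes. Whether that is the actual cause here is not established in this thread.

```php
<?php
// Hypothetical helper: true when $s is well-formed UTF-8.
// With the /u modifier, preg_match() returns false on invalid UTF-8 input.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// Hypothetical helper: apply the named steps in order and return the name of
// the first step whose output is no longer valid UTF-8, or null if none is.
function first_breaking_step(string $input, array $steps): ?string
{
    $current = $input;
    foreach ($steps as $name => $step) {
        $current = (string) $step($current);
        if (!is_valid_utf8($current)) {
            return $name;
        }
    }
    return null;
}

// Toy steps for illustration: trim() is byte-safe, while cutting at a fixed
// byte offset can split a 2-byte Cyrillic character in half.
$steps = array(
    'trim' => 'trim',
    'byte-truncate' => function (string $s): string {
        return substr($s, 0, 4);
    },
);

$culprit = first_breaking_step('<p>проверка</p>', $steps);
echo $culprit === null ? "all steps kept valid UTF-8\n" : "broken by: $culprit\n";
```

Replacing the toy steps with the individual calls under suspicion would narrow down which one produces the � characters.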
    
  • acrylian Administrator
    edited November 6

    htmLawed is a third-party library which we generally don't touch and have no hand in. I will try to reproduce this later on.
