I use my Zenphoto setup (v. 1.6) to host photos with titles/tags in several languages. The main interface is in English and I don't use multilingual mode.
I've discovered that when Zenphoto imports metadata from images, it truncates the Title to a maximum of 36 characters and each single Tag to a maximum of 33 characters IF this metadata was written in Cyrillic script (Russian, Ukrainian, etc.). The Description is imported without shortening.
You can manually edit the Title of such images to restore the full length, but it's impossible to rename a Tag to a longer version, which is very frustrating.
These limits are undocumented and never mentioned. Is it possible to remove them?
Comments
The title field for images is set to "text" so there should be no limit like that.
Tags are varchar(255). How long these are exactly depends on the encoding of your database. Ideally that should be utf8mb4_unicode_ci. What do you have?
@acrylian
In Database info:
character_set_client: utf8mb4
character_set_connection: utf8mb4
character_set_database: utf8mb4
character_set_filesystem: binary
character_set_results: utf8mb4
character_set_server: utf8mb3
character_set_system: utf8mb3
character_sets_dir: /usr/share/mysql/charsets/
collation_connection: utf8mb4_0900_ai_ci
collation_database: utf8mb4_general_ci
collation_server: utf8_unicode_ci
In addition, the tables list utf8mb4_unicode_520_ci for specific fields.
Actually there should not be any limit besides the database columns themselves. But of course I never tested with actual Cyrillic data. If you can provide a test image this happens with (via a link, for example), we can try to reproduce it.
@acrylian
Sure!
A few examples of both:
Clipped title: "Петропавловская церковь в Петерго" instead of "Петропавловская церковь в Петергофе"
https://www.photo.private-universe.net/travel/russia/peterhof/petropavlovskaya-tserkov-v-petergofe.jpg.html
direct image link: https://www.photo.private-universe.net/albums/travel/russia/peterhof/petropavlovskaya-tserkov-v-petergofe.jpg
Clipped tags:
tag: "Государственный музей истории рел" instead of "Государственный музей истории религии"
https://www.photo.private-universe.net/365-projects/2016/12-december/11.12.2016-find-buddha-and-turn-right.jpg.html
direct image link: https://www.photo.private-universe.net/albums/365-projects/2016/12-december/11.12.2016-find-buddha-and-turn-right.jpg
Thanks. We can reproduce the general issue but don't know yet why. Perhaps it is down to the peculiarities of Cyrillic and some encoding issue in connection with our somewhat older internal EXIF reading tool. I'll post here if we find anything.
@acrylian
Yes, probably some encoding issue - when the title is fully in Cyrillic it is limited to 33 characters, but if it mixes Cyrillic & Latin you can get more characters (in one such instance I got the aforementioned 36 characters that way).
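One way to probe the encoding hunch is to count bytes instead of characters. The snippet below is only a sketch in Python (it is not part of Zenphoto) and assumes the metadata is stored as UTF-8, where a Cyrillic letter takes two bytes and a space or Latin letter takes one. The helper name `utf8_len` is made up for illustration.

```python
# Assumption: the strings are stored as UTF-8 inside the image metadata.
samples = {
    "title (full)": "Петропавловская церковь в Петергофе",
    "title (clipped)": "Петропавловская церковь в Петерго",
    "tag (full)": "Государственный музей истории религии",
    "tag (clipped)": "Государственный музей истории рел",
}


def utf8_len(text: str) -> int:
    """Return the number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


for label, text in samples.items():
    print(f"{label}: {len(text)} characters, {utf8_len(text)} bytes")

# title (full): 35 characters, 67 bytes
# title (clipped): 33 characters, 63 bytes
# tag (full): 37 characters, 71 bytes
# tag (clipped): 33 characters, 63 bytes
```

Both clipped strings come out at 63 bytes, which would fit a cap counted in bytes rather than characters, and would also explain why a title that mixes in one-byte Latin letters can hold more characters.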
So far it sadly seems this is already wrong after the initial data is read/parsed from the image via the native PHP functions getimagesize() and iptcparse().
@acrylian I'll check the encodings used by my software and will report back.
Meanwhile I tried a third-party library for reading metadata and got the same results. I also checked my local server: everything is utf-8 (the "mb4" extra is just a MySQL thing), so it "should" work. I also tried our live server, with the same result.
Perhaps one or both of the native PHP functions are not multibyte safe for some reason. I could not find any info about that, apart from the general encoding setting, which we already handle as intended.
Perhaps also check the encoding of the data written to the image itself. The IPTC keywords are stored as binary, so a wild guess - as I have no knowledge of how tools might write such data to images - is that for some reason they are not in the proper encoding before being converted, or something like that.
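To make "check the encoding written to the image" more concrete, here is a rough Python sketch of how the raw IPTC (IIM) datasets could be inspected. It relies on my understanding of the IIM layout: each dataset starts with the marker byte 0x1C, followed by a record number, a dataset number and a two-byte big-endian length, and dataset 1:90 holding ESC % G declares UTF-8. The sample block is synthetic and the helper names (`make_dataset`, `parse_iim`) are hypothetical. Extracting the block from a real JPEG is not shown.

```python
import struct

UTF8_MARKER = b"\x1b%G"  # assumed value of dataset 1:90 when UTF-8 is declared


def make_dataset(record: int, dataset: int, data: bytes) -> bytes:
    """Build one IIM dataset (only used to create synthetic test data)."""
    return b"\x1c" + struct.pack(">BBH", record, dataset, len(data)) + data


def parse_iim(block: bytes) -> list:
    """Split a raw IIM block into (record, dataset, raw_bytes) tuples."""
    datasets = []
    pos = 0
    while pos + 5 <= len(block):
        if block[pos] != 0x1C:
            break  # not a dataset marker, stop parsing
        record, dataset, length = struct.unpack(">BBH", block[pos + 1:pos + 5])
        if length & 0x8000:
            raise ValueError("extended-length datasets are not handled in this sketch")
        start = pos + 5
        datasets.append((record, dataset, block[start:start + length]))
        pos = start + length
    return datasets


# Synthetic block: charset declaration, object name (2:05), one keyword (2:25).
sample = (
    make_dataset(1, 90, UTF8_MARKER)
    + make_dataset(2, 5, "Петерго".encode("utf-8"))
    + make_dataset(2, 25, "музей".encode("utf-8"))
)

parsed = parse_iim(sample)
declares_utf8 = any(r == 1 and d == 90 and v == UTF8_MARKER for r, d, v in parsed)

for record, dataset, value in parsed:
    print(f"{record}:{dataset:02d} -> {len(value)} bytes")
print("declares UTF-8:", declares_utf8)
```

Printing the byte length of each value, rather than its character count, is the point here: it shows what the file actually stores before any PHP function or database gets involved.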
I did my checks and some googling.
First, here is a post by the ExifTool author on the limitations of the various metadata standards (EXIF, IPTC, XMP), where he talks specifically about encoding and its implications for various languages.
https://exiftool.org/gui/articles/where_what.html
IPTC imposes limits on field length, which is the source of the problem.
There is no setting for the encoding used in Lightroom 5.3, but I also use a geotagger app, and that is set to write everything in UTF-8, even if the original metadata is encoded differently.
I checked metadata for my photos using their GUI for exiftool.
EXIF has no fields for Title or Keywords, just Description.
XMP has fields for Title, Keywords (field named Subject) and Description and all my metadata is preserved in full.
IPTC has fields for Title (field Object name), Keywords and Description and here we can see shortening of longer entries in Object name and Keywords.
https://www.photo.private-universe.net/albums/-temp/iptc-1.jpg
https://www.photo.private-universe.net/albums/-temp/iptc-2.jpg
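The shortened entries can be reproduced with a byte-based cut. The sketch below is Python and assumes a limit of 64 bytes for both Object name and Keywords, which is my reading of the IPTC IIM field limits. It also assumes the writing tool drops a partial multi-byte character at the end. `truncate_utf8` is a hypothetical helper, not what Lightroom or PHP actually run.

```python
IPTC_MAX_BYTES = 64  # assumed limit for Object name and for each keyword


def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut text so its UTF-8 encoding fits into max_bytes, keeping whole characters."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" discards an incomplete multi-byte sequence left at the cut.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


full_title = "Петропавловская церковь в Петергофе"
full_tag = "Государственный музей истории религии"

print(truncate_utf8(full_title, IPTC_MAX_BYTES))
# Петропавловская церковь в Петерго
print(truncate_utf8(full_tag, IPTC_MAX_BYTES))
# Государственный музей истории рел
```

Under these assumptions the output matches the clipped Title and Tag from my examples, while the XMP copies, which have no such cap, keep the full text.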
So, maybe the solution is to use XMP field first (data is similar)?
Yes, it is an IPTC issue, it seems. But actually the image stores the values correctly, as I can see in image editors. So the PHP function is either limiting the value or doing something wrong. We first thought we had something but…
This is probably a general issue with other non-Western-European characters + encodings as well.
Yes, in your case of course try XMP using the plugin.
It worked!
Enabling the xmpMetadata plugin and refreshing the metadata for my site reloaded the full versions of Title and Keywords.
Some notes:
https://www.zenphoto.org/news/xmpmetadata/ - this page shows the wrong status and a dead link, as the plugin is included with Zenphoto.
Maybe it would help other users to mention, in the xmpMetadata plugin description, the Help files and the Admin, the possible benefits for tags in non-Latin-based languages, so people will enable it right away?
Great that it worked! With the other library I mentioned, I also discovered that the IPTC values are truncated while its XMP values - which I was somehow not aware of - are correct. So it is perhaps indeed the PHP iptcparse() function here just following some "official standard" limits.
Thanks for the note about the wrong link on the plugin page. Seems that applies to all official plugins.
Thanks, we'll also think about your suggestion of adding a note.