UTF8 IPTC metadata


I have some trouble with the charset of image captions: non-ASCII characters (anything outside 7-bit ASCII) are replaced with ‘?’, or the text even stops rendering. (That's what is in the database anyway, but not in the (UTF8) EXIF of my pictures.)

I chose UTF8 image URIs, the unicode charset, and UTF8 IPTC in ZP's options, and it connects to the MySQL database in unicode only (client, connection, database, results, server, system). Non-Latin file names are also printed properly, hence I suspect the IPTC reader.

If I edit the database manually and replace the wrong text with something that contains non-ASCII characters, everything goes well, but of course I lose my changes when I press "Refresh Metadata" :(

Am I missing something?
(I use ZP version 1.4.2-RC2)


  • Most likely the character set of your image metadata is being misidentified. The metadata is supposed to identify itself as UTF-8 if it is so, but often cameras and/or software do not do this. Zenphoto will presume image metadata is ASCII unless otherwise informed.

    If the embedded setting for the character set is not set properly you can override it with the `IPTC encoding` option.
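
As background for the reply above: in the IPTC IIM format, the self-identification is the Coded Character Set dataset (record 1, dataset 90), and UTF-8 is declared with the escape sequence ESC % G. The following is a minimal Python sketch that scans raw file bytes for that declaration. The function name `declared_iptc_charset` is hypothetical, the byte search is deliberately crude (it does not parse the JPEG structure and could in principle match unrelated bytes), and it is not how Zenphoto itself reads the tag.

```python
from typing import Optional

# IPTC IIM dataset header: tag marker 0x1C, record 1, dataset 90 (0x5A).
CODED_CHARSET_HEADER = b"\x1c\x01\x5a"
# Escape sequence ESC % G, which declares UTF-8.
UTF8_ESCAPE = b"\x1b%G"


def declared_iptc_charset(data: bytes) -> Optional[str]:
    """Return "UTF-8", "unrecognised", or None if no declaration is found."""
    pos = data.find(CODED_CHARSET_HEADER)
    if pos == -1:
        return None  # no Coded Character Set dataset at all
    start = pos + len(CODED_CHARSET_HEADER)
    if start + 2 > len(data):
        return None  # truncated data, no length field
    # Assumes the standard two-byte big-endian length field.
    length = int.from_bytes(data[start:start + 2], "big")
    value = data[start + 2:start + 2 + length]
    if value == UTF8_ESCAPE:
        return "UTF-8"
    return "unrecognised"  # empty or some other escape sequence
```

A result of `None` or `"unrecognised"` corresponds to the case where the reader has to fall back on a default or on the `IPTC encoding` option.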
  • Thanks for your reply, sbillard. I had in fact already set IPTC encoding to UTF8, but I didn't know that digikam only generates ASCII IPTC metadata. Hopefully it's more flexible with XMP, and the plugin xmpMetadata you wrote solves my problem ;)
  • hi,
    I have problems with Vietnamese UTF-8. My photos are encoded in unicode. I tried setting IPTC encoding to UTF-8 and also tried using xmpMetadata, but the characters are still wrong.
    I tested on both MySQL 5.0 and 5.1.
    Please help me!
  • MySQL has no bearing on the encoding of your image metadata. If the metadata does not explicitly set the encoding to UTF-8, it may well not be stored as such. But if the data is in fact UTF-8 and you have set the IPTC encoding option to UTF-8 and the characters are still wrong, then whatever is encoding them in the image is doing it wrong.
  • Thanks for your reply, sbillard.
    My column collation is utf8_unicode_ci. What should I change it to?
  • I think I told you that the MySQL encoding is not the issue. It is and should be UTF-8. The issue is the internal encoding of the image metadata. You will have to figure out what character set it is and set the Zenphoto option accordingly.
  • Could you test my photo? Can I email you one?
    many thanks
  • acrylian Administrator
    No, you can't mail us. Please open a ticket and attach an image there.
  • OK, I have created a ticket with one attachment: ticket #2226.
    Please help me.
  • Just for everyone to know:

    The image submitted with this ticket has metadata encoded in some form of ASCII, not UTF-8. In addition, the data itself has ASCII question marks embedded in it, presumably where saosangmo expects meaningful characters. So Zenphoto is merely presenting the data as it is stored.
  • So, could you tell me how to force the use of UTF-8 without auto-detecting the encoding of the image?
    I want to test some cases of encoding settings.
    Thanks, sbillard.
  • This is not a Zenphoto question. You cannot enforce something from Zenphoto when it is being created somewhere else. You need to deal with whatever is setting the image metadata.
  • hi sbillard,
    I am sorry if this information bumps the topic: when I open my photo in Photoshop, copy the photo's description, paste it into the description field in the editor, and save it, everything displays correctly.
    I will test some other cases and report back soon.
    Many thanks for your support.
  • What you should do is open your image in a hex editor and see what has been placed there. For the image you submitted there was ASCII text of some variant with a large number of embedded question mark characters.
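
For readers without a hex editor at hand, the check suggested above can be approximated with a short Python sketch. It assumes the caption is stored in the IPTC IIM Caption/Abstract dataset (record 2, dataset 120) with a standard two-byte length field. The helper names `caption_bytes` and `describe` are hypothetical, and the plain byte search does not parse the file structure.

```python
from typing import Optional

# IPTC IIM dataset header: tag marker 0x1C, record 2, dataset 120 (0x78).
CAPTION_HEADER = b"\x1c\x02\x78"


def caption_bytes(data: bytes) -> Optional[bytes]:
    """Return the raw caption bytes, or None if no caption dataset is found."""
    pos = data.find(CAPTION_HEADER)
    if pos == -1 or pos + 5 > len(data):
        return None
    # Assumes the standard two-byte big-endian length field.
    length = int.from_bytes(data[pos + 3:pos + 5], "big")
    return data[pos + 5:pos + 5 + length]


def describe(raw: bytes) -> dict:
    """Summarise what is physically stored in the caption."""
    try:
        raw.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return {
        "hex": raw.hex(" "),                # space-separated hex dump
        "valid_utf8": valid_utf8,           # pure ASCII also counts as valid
        "question_marks": raw.count(b"?"),  # literal 0x3F bytes in the data
    }
```

Literal `3f` bytes in the dump mean the question marks are stored in the file itself, so no encoding setting in the reader can turn them back into the intended characters.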
  • hi sbillard,
    First, I hope that losing my account on the forum was an accident.
    The encoding of my photo's description may be unicode. The "large number of embedded question mark characters" you mention is the problem I'm trying to solve, and I need you to review it.

    http://img594.imageshack.us/img594/2158/screenshot20120824at352.png --> no question marks.
  • acrylian Administrator
    I am sorry for the loss of the account. Probably a wrong click while sorting out the loads of spam we get currently.

    That is the display of Photoshop. But that is not the point. I quote my colleague:
    "What you should do is open your image in a hex editor"

    Encoding is a rather complicated matter. This is surely an encoding mismatch.

    Your original image with the desc read by Zenphoto is broken: http://zenphoto.maltem.de/Test/A1.jpg.php

    The metadata from Photoshop pasted manually is all right: http://zenphoto.maltem.de/Test/A1b.jpg.php

    My database has these settings:
    character_set_client: utf8
    character_set_connection: utf8
    character_set_database: latin1
    character_set_filesystem: binary
    character_set_results: utf8
    character_set_server: latin1
    character_set_system: utf8
    collation_connection: utf8_general_ci
    collation_database: latin1_swedish_ci
    collation_server: latin1_swedish_ci

    And Zenphoto is set to utf8.
  • hi acrylian,
    I see the wrong characters with a hex viewer.
    I tested my photo with http://exifdata.com/exif.php. I think the description we copied and pasted manually from Photoshop is the XMP DC field.

    Adobe Photoshop may be doing some trick to display the description in the IPTC Description field. :(

    The text below is the UTF-8 data as viewed in binary mode. It displays correctly when decoded as UTF-8.

    "Ảnh: A6a + A6b: NÆ°á»?c lấy từ mạch rá»? ra nÆ¡i khe Ä?á Khát cháy Nam Đông PSA: Xuân Trường - An SÆ¡n Đã Ä?ến giữa tháng 8-2012, chảo lá»­a Nam Đông (Thừa Thiên Huế) vẫn chÆ°a có mÆ°a, toà n huyá»?n miền núi nà y lâm và o Ä?ợt Ä?ại hạn tá»?i tá»? nhất trong hÆ¡n 10 nÄ?m qua. HÆ¡n 7 nghìn người dân á»? các xã bá»? thiếu nÆ°á»?c trầm trọng do các công trình nÆ°á»?c sinh hoạt, giếng bá»? mất nguá»?n. hÆ¡n 60 ha lúa chiếm 20% diá»?n tích lúa hè thu bá»? khô cháy, phần còn lại bá»? ảnh hÆ°á»?ng nÄ?ng suất 20-50%, cây cao su nguá»?n thu lá»?n nhất của nông dân Nam Đông do hạn hán cÅ©ng giảm nÄ?ng suất khoảng 20%, hầu hết diá»?n tích rau, mà u bá»? hÆ° hỏng trong Ä?ó có nhiều vùng khó phục há»?i Ä?ược."

    And can I use XMP DC for my photo to bypass the problem?

    thank you very much!
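
The pattern in the quoted text is consistent with UTF-8 bytes being read as ISO-8859-1, with the bytes that have no printable ISO-8859-1 character turned into ‘?’. That is an assumption for illustration only; the thread does not confirm which tool did this, and the sketch is only checked against the single word "Nước", not the whole caption. The helper names below are hypothetical.

```python
from typing import Optional


def misread_as_latin1(text: str) -> str:
    """Encode as UTF-8, decode as ISO-8859-1, and replace C1 controls with '?'."""
    decoded = text.encode("utf-8").decode("latin-1")
    # Code points 0x80-0x9F are unprintable control characters in ISO-8859-1.
    return "".join("?" if 0x80 <= ord(ch) <= 0x9F else ch for ch in decoded)


def try_recover(garbled: str) -> Optional[str]:
    """Undo the misreading; return None when '?' has destroyed a byte."""
    try:
        return garbled.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None
```

Under this assumption `misread_as_latin1("Nước")` gives `NÆ°á»?c`, which matches the first garbled word in the quoted text. `try_recover` then fails on it, because the ‘?’ has replaced one byte of a multi-byte sequence. A word such as "toàn", whose UTF-8 bytes all have printable ISO-8859-1 characters, survives the round trip.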
  • acrylian Administrator
    I have to leave further answers to my colleague as he is the expert on this stuff.
  • Yes, lost by accident. We get about 1000 new SPAM users a day so it is hard to manage. This last time I made a mistake and forgot the "no posts" selection when deleting users. So I am sure I accidentally deleted your user id.

    Again I apologize.

    But as to your problem. I am sorry, but I cannot spend more time on this issue. As stated, I examined the image you provided even though this is really above and beyond the support normally provided. That image was not encoded in UTF-8 and did have ASCII question marks embedded in it, which naturally would also show up in Zenphoto. Most likely these question marks are exactly where you show in the post above, but of course I have not verified this. In addition, the character set tag in your image is empty, which means Zenphoto will assume ISO-8859-1 unless you set the option differently.

    You can use XMP if you wish, but of course you will have to resolve the character set there as well.
  • hi,
    Thank you. You are working hard to support me.

    Could you give me a few pointers on how to use XMP information instead of IPTC when I upload my photos to Zenphoto?
