Strange encoding drop-out

Hi there, yet another encoding issue. This one is pretty strange.

What I get looks like a random drop of characters on their way into the database. It even differs for the same file with different text in the description. Stranger still, both descriptions contain the same characters, but in one the text gets cut off and in the other it is fine.

I will send 2 links to the same file but with different descriptions, both done in Lightroom just seconds apart with the same method. This is just one example out of many images that randomly work or don't work.

The same bug occurs on my local installation and on a server abroad.

This one works fine for me. It has a random selection of letters; note the letter á, which works fine in this file.
http://www.olihar.com/junk/zen/2007_07_22_140638R-works.jpg

In this file the sentence gets cut off at the letter á. I have more files that get cut off at all kinds of other Icelandic characters.
http://www.olihar.com/junk/zen/2007_07_22_140638R-notworks.jpg

Here is a screenshot from Mac OS of the files, pointing at the character causing the problem in this case. Like I said, which character causes it, if any does at all, seems to be random.
http://img.skitch.com/20090205-rpgmg1k7r96chwarikxf4uu831.png

Database Collation is utf8_unicode_ci

IPTC encoding is UTF-8 in ZenPhoto.

Charset is UTF-8 in ZenPhoto.

I have tried ticking on and off "UTF8 image URIs" box.

I did include these 2 images so you can try them and see this strange random thing for yourself. Here is the link again: http://www.olihar.com/junk/zen/

best
Olafur

Comments

  • I forgot to state version:

    Version: zenphoto version 1.2.3 [3429] | zenpage version 1.0.1 [3429]
  • If these characters are in the descriptions of the images and these descriptions are coming from the IPTC data, then you must be having a problem with two things. First, the IPTC data is not tagged for its character set. Second, you have different character sets being used.

    What specifically is happening is that there are non-UTF8 characters in the description that are causing the SQL store of it to be truncated.
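A minimal sketch of that truncation effect, assuming the description was written as MacRoman bytes (where á is the single byte 0x87). It only mimics the symptom; it is not ZenPhoto's or MySQL's actual code, and `valid_utf8_prefix` is a made-up helper name.

```python
# An illustrative caption, encoded the way a MacRoman-writing tool would store it.
caption = "Icelandic Indiana Jones in Stakkholtsgjá."
raw = caption.encode("mac_roman")  # "á" becomes the lone byte 0x87


def valid_utf8_prefix(data: bytes) -> str:
    """Return the longest leading part of data that decodes as UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that is not valid UTF-8.
        return data[:err.start].decode("utf-8")


prefix = valid_utf8_prefix(raw)
print(prefix)  # everything before the byte that stood for "á"
```

The byte 0x87 cannot start a UTF-8 sequence, so a store that expects UTF-8 has nothing valid to keep from that point on.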
  • The reason I included the 2 different files is to show that even the same photo, exported minutes apart with 2 different sentences that both contain the same problematic letter, works in one case and not in the other.

    I just find it hard to believe that the problem is with these files.

    Have you tried the files yourself in your own setup?

    I really would like to see if you get the same results.

    Thanks again for looking into this Encode problem.
  • I have been doing some testing and I found a strange thing.

    Icelandic letters never work in a sentence unless it also includes the letter þ or ð.

    http://en.wikipedia.org/wiki/Thorn_(letter)
    http://en.wikipedia.org/wiki/Eth

    It does not matter where they are placed in the sentence; as long as there is at least 1 of them, the reading of IPTC works 100%.

    If neither is used anywhere in the sentence, it is cut off at the first other Icelandic letter: áéíúóæö

    The alphabet is here for reference.
    http://en.wikipedia.org/wiki/Icelandic_language

    That is all I know so far, and I am starting to find this a rather strange bug.

    I hope you can take the time to try these findings out.

    Edit: I have been doing some reading and I always meant to ask why UTF-16 is not a possibility. This is just out of curiosity.
  • No testing has been done with UTF-16. You can try adding it to the Charsets array of admin-functions.php and see what happens.

    As far as your characters go, the bottom line is that somehow the string is not being recognized for the character set it is in. The recognition possibilities are 1. a flag in the IPTC data which indicates UTF-8. (There may be one for UTF-16, but I do not know what that flag is.) 2. The setting of your IPTC data character set option.

    If the character set stored in the IPTC data does not match this setting then you will get the problem you describe.
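The two recognition possibilities above can be sketched roughly as follows. In the IPTC IIM format the UTF-8 flag is dataset 1:90 (Coded Character Set) holding the escape sequence ESC % G. The function names and the `configured_charset` parameter are hypothetical, and the plain byte search is a simplification of real IPTC parsing, not how ZenPhoto does it.

```python
UTF8_ESCAPE = b"\x1b%G"  # ESC % G, the ISO 2022 sequence that selects UTF-8


def has_utf8_flag(iim: bytes) -> bool:
    """True if the raw IPTC block holds dataset 1:90 set to the UTF-8 escape."""
    # Dataset layout: tag marker 0x1C, record 1, dataset 90, 2-byte length, data.
    marker = b"\x1c\x01\x5a" + len(UTF8_ESCAPE).to_bytes(2, "big") + UTF8_ESCAPE
    return marker in iim


def decode_caption(raw: bytes, iim: bytes, configured_charset: str) -> str:
    """Prefer the flag inside the file, else fall back to the configured charset."""
    charset = "utf-8" if has_utf8_flag(iim) else configured_charset
    return raw.decode(charset, errors="replace")


# No flag in the file, so the configured charset decides how 0x87 is read.
print(decode_caption(b"\x87", b"", "mac_roman"))
```

If the file has no flag and the configured charset is not the one the bytes were written in, the decode goes wrong, which is the mismatch described above.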
  • Just a quick update: I tried your trick of adding UTF-16 and assigned UTF-16 to the IPTC reading in ZenPhoto.

    Now the things I had problems with do display, but they display like this 骇銎 . Chinese symbols.

    The other one, with an added ð or þ, displays the same as before, that is to say it displays fine.

    But normal sentence displays like this 䝥捫漠獵湢慴桩湧⁩渠䭩湧猠䍡湹潮Ⱐ坡瑡牲歡⁎慴楯湡氠偡牫 Chinese characters.

    Just for fun I changed the encoding of the page as well, and it all comes out like this.
    http://img.skitch.com/20090206-f6s4gqy8me33te126wnhitssr6.png
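The "Chinese" output is what you would expect when single-byte text is read as UTF-16: every pair of bytes is fused into one 16-bit character. A small sketch, assuming big-endian UTF-16 and a plain ASCII caption (the caption text is my reading of the sample above, so treat it as an assumption):

```python
caption = "Gecko sunbathing in Kings Canyon, Watarrka National Park"
raw = caption.encode("ascii")      # one byte per character, 56 bytes

# Reading the same bytes as UTF-16 (big-endian) pairs them up:
# "G" (0x47) + "e" (0x65) becomes the single code point U+4765, and so on.
misread = raw.decode("utf-16-be")
print(misread)

# Nothing is lost; the bytes are only misinterpreted, so this undoes it.
recovered = misread.encode("utf-16-be").decode("ascii")
print(recovered)
```

So UTF-16 is not the encoding of these captions; the data is still single-byte text that got paired up.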
  • I was talking to some guy about encoding and he suggested using Photo Mechanic to have a look at the files. So it is clear now (even though Adobe has stated that Lightroom is UTF-8) that it is indeed MacRoman.

    I have been looking around and came across its charset name, which is macintosh.

    So I thought, well, I will add "macintosh" => "MacRoman", to admin-functions.php.
    Well, it is there already, so no need to add it:
    "MACINTOSH" => "Western European (MAC)",

    So I have indeed tried many times to change it to that encoding, and this is what happens.

    For the letters áíéú

    I get ‡’Žœ so yes boxes.
    However, if I change my browser encoding to MACINTOSH I get
    ‡’Žœ with 1 extra little thing in front of each letter.

    Well, that's where I stand again, right back at the beginning, when I thought it was enough to change the IPTC encoding to macintosh.

    So now I must have the right setup on how ZenPhoto treats metadata.
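For what it is worth, ‡’Žœ is exactly what the MacRoman bytes for áíéú look like when they are read as Windows-1252. The sketch below only shows that mapping; where in the chain the bytes get read as Windows-1252 is an assumption it does not answer.

```python
letters = "áíéú"
raw = letters.encode("mac_roman")   # the bytes 0x87 0x92 0x8E 0x9C

seen = raw.decode("cp1252")         # same bytes read as Windows-1252
fixed = raw.decode("mac_roman")     # same bytes read as MacRoman

print(seen)   # the characters reported above
print(fixed)  # the intended letters
```

This suggests the bytes are fine and some step is still treating them as a Windows/Latin charset instead of macintosh.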
  • Well, you got me curious enough to download and look at your pictures. The description is not making it into the IPTC string.

    Not works:
    2#120 => ( 0 => Icelandic Indiana Jones in Stakkholtsgj‡. )

    Works:
    2#120 => ( 0 => hverig koma íslenskir stafir út núna áíöþæ )

    In addition, the Works picture has the flag that the character set is UTF-8 while the not works one does not have that flag.

    So, the problem is on the Lightroom end.
  • Strange that one of them gets flagged but the other one doesn't. They are the same photo from Lightroom, but one has the letter þ that makes it work. And yes, Photo Mechanic does show them both as "Macintosh", not "utf-8".

    Well, I guess I have to pay for Photo Mechanic then: $135 just to upgrade the IPTC to UTF-8.

    But like I posted a little earlier, I do find it strange that changing to macintosh does not work; as far as I can see, that is the right charset for the photos.
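One possible explanation for the þ/ð pattern, offered as a guess only: standard MacRoman has no þ or ð, so a program that prefers MacRoman would have to switch to UTF-8 whenever either letter appears. Nothing in this thread confirms that Lightroom works this way. The check itself is easy to sketch (`fits_mac_roman` is a made-up helper name):

```python
def fits_mac_roman(text: str) -> bool:
    """True if every character of text exists in the MacRoman charset."""
    try:
        text.encode("mac_roman")
        return True
    except UnicodeEncodeError:
        return False


# The caption that got cut off: all of its letters exist in MacRoman.
print(fits_mac_roman("Icelandic Indiana Jones in Stakkholtsgjá."))

# The caption that worked: it contains þ, which MacRoman cannot represent.
print(fits_mac_roman("hverig koma íslenskir stafir út núna áíöþæ"))
```

If that guess is right, the captions without þ or ð are the ones left in MacRoman without a UTF-8 flag, which would match what was found in the two files.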
  • The only "encoding" definition that I could find for IPTC data was the UTF-8 one. (That is not quite true: the only definition that seemed useful was the UTF-8 one.) So if there is a 'macintosh' encoding I think it is not stored either.

    In Photoshop Bridge there is an option to set the IPTC data to UTF-8. Maybe that option exists for Lightroom as well.
  • I have looked around for some time for a charset setting in Lightroom, and as far as people say it is not possible to change it. They have complained ever since Lightroom was in beta, and now it is at 2.x and nothing has changed, with no feedback whatsoever from Adobe about it.

    I have been told by a photographer not to use Bridge for any IPTC-related things, as it can really screw up the data. I might give it a try and see how it goes.

    Speaking of Lightroom, even on the PC you are not able to type any of the áéóíú letters in the program; they turn out as ´a´e´o´i´u. However, you can paste them in from TextEdit, for example. It really pisses the Scandinavians off. EDIT: Hold that thought, I have just been told Adobe finally fixed that problem in version 2.2, about freaking time. :)

    Why is Charset such a hassle, hehe.
  • acrylian Administrator, Developer
    Probably because all these programs come from English-speaking countries that do not have as many accented characters as we do...:-)
  • I was always going to ask: you are on a Mac and you use German characters. How does it work out for you? Well, I guess you don't use Lightroom, as I heard you say at some point that you don't take photos that much.
  • I use Bridge exclusively for my IPTC data. However, I am also on a PC rather than a Mac, if that makes any difference.
  • acrylian Administrator, Developer
    @olihar: I don't have Lightroom, only Bridge CS3, and I don't actually use it that much; I mostly use the Mac Finder and Preview directly to sort images. I also basically rarely use IPTC/EXIF since, as you supposed, I don't take photos at all. (I don't have a digital camera at all currently!)