Hi there, yet another encoding issue. This one is pretty strange.
What I get looks like characters being dropped at random when the description is put into the database. The result even differs for the same file with different text in the description. Stranger still, both descriptions contain the same characters, but in one the text is dropped and in the other it is fine.
Below are two links to the same file with different descriptions, both written in Lightroom just seconds apart using the same method. This is just one of many images that randomly work or don't work.
The same bug occurs on my local installation and on a server abroad.
This one works fine for me. It has a random selection of letters; note the letter á, which works fine in this file.
http://www.olihar.com/junk/zen/2007_07_22_140638R-works.jpg
In this file the sentence gets cut off at the letter á. I have more files that get cut off at all kinds of other Icelandic characters.
http://www.olihar.com/junk/zen/2007_07_22_140638R-notworks.jpg
Here is a screenshot from Mac OS showing the files, pointing at the character causing the problem in this case. As I said, which character causes the problem, if any does at all, seems to be random.
http://img.skitch.com/20090205-rpgmg1k7r96chwarikxf4uu831.png
Database collation is utf8_unicode_ci.
IPTC encoding is UTF-8 in ZenPhoto.
Charset is UTF-8 in ZenPhoto.
I have tried ticking and unticking the "UTF8 image URIs" box.
I included these two images so you can try them and see this strange random behavior for yourself. Here is the link again.
http://www.olihar.com/junk/zen/best
Olafur
Comments
Version: zenphoto version 1.2.3 [3429] | zenpage version 1.0.1 [3429]
What specifically is happening is that there are non-UTF8 characters in the description that are causing the SQL store of it to be truncated.
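To make the truncation concrete, here is a minimal Python sketch (not ZenPhoto code). It assumes the store stops at the first byte that is not valid UTF-8, and it uses a made-up sample in which á is stored as the single byte 0x87. The helper name `first_invalid_utf8_offset` is hypothetical.

```python
from typing import Optional


def first_invalid_utf8_offset(raw: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


# Assumed sample: "gjá." written in a single-byte Mac charset, so á is 0x87.
sample = "gjá.".encode("mac_roman")

offset = first_invalid_utf8_offset(sample)

# Everything before the offending byte is still valid UTF-8; this is the
# part that would survive if the store truncates at the first bad byte.
kept = sample[:offset].decode("utf-8") if offset is not None else sample.decode("utf-8")

# The same text written as real UTF-8 has no invalid offset at all.
clean = first_invalid_utf8_offset("gjá.".encode("utf-8"))
```

With this sample the text is cut right before the á, which matches the symptom described in the first post.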
I just find it hard to believe that the problem is with these files.
Have you tried the files yourself in your own setup?
I really would like to see if you get the same results.
Thanks again for looking into this Encode problem.
When Icelandic letters are used, they never work in a sentence unless it includes the letter þ or ð.
http://en.wikipedia.org/wiki/Thorn_(letter)
http://en.wikipedia.org/wiki/Eth
It does not matter where they are placed in the sentence; as long as there is at least one of them, reading the IPTC data works 100%.
If neither is used anywhere in the sentence, it is cut off at the first other Icelandic letter: áéíúóæö
The alphabet is here for reference.
http://en.wikipedia.org/wiki/Icelandic_language
This is as much as I know so far, and I am starting to find it a rather strange bug.
I hope you can take the time to try these findings out.
Edit: I have been doing some reading, and I have always meant to ask why UTF-16 is not an option. This is just out of curiosity.
As far as your characters go, the bottom line is that somehow the string is not being recognized as the character set it is actually in. The recognition possibilities are:
1. A flag in the IPTC data which indicates UTF-8. (There may be one for UTF-16, but I do not know what that flag is.)
2. The setting of your IPTC data character set option.
If the character set stored in the IPTC data does not match this setting, then you will get the problem you describe.
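The two recognition possibilities can be sketched roughly like this in Python. This is only an illustration, not how ZenPhoto actually does it. It assumes the records come as a dictionary keyed like "2#120", that the flag is IPTC record 1:90 (coded character set) holding the escape sequence ESC % G for UTF-8, and that the function name `decode_iptc_text` and the sample data are made up.

```python
from typing import Dict, List

# ESC % G: the escape sequence that IPTC record 1:90 uses to declare UTF-8.
UTF8_MARKER = b"\x1b%G"


def decode_iptc_text(records: Dict[str, List[bytes]], field: str,
                     fallback_charset: str) -> str:
    """Decode one IPTC text field.

    Possibility 1: the file itself flags UTF-8 in record 1:90.
    Possibility 2: otherwise trust the configured character set option.
    """
    declared = records.get("1#090", [b""])[0]
    charset = "utf-8" if declared == UTF8_MARKER else fallback_charset
    # errors="replace" keeps the rest of the text when a byte does not fit.
    return records[field][0].decode(charset, errors="replace")


# Hypothetical records: one file carries the flag, the other does not.
flagged = {"1#090": [UTF8_MARKER], "2#120": ["núna áíöþæ".encode("utf-8")]}
unflagged = {"2#120": ["gjá.".encode("mac_roman")]}

with_flag = decode_iptc_text(flagged, "2#120", "mac_roman")
matching_option = decode_iptc_text(unflagged, "2#120", "mac_roman")
wrong_option = decode_iptc_text(unflagged, "2#120", "utf-8")
```

In the last call the option says UTF-8 but the unflagged bytes are not UTF-8, so the á comes back as the replacement character U+FFFD. A stricter consumer would cut the text off there instead.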
Now the descriptions I had problems with do display, but they display like this: 骇銎. Chinese symbols.
The other one, with an added ð or þ, displays the same as before, that is to say it displays fine.
But normal sentence displays like this ä¥æ«æ¼ çµæ¹¢æ…´æ¡©æ¹§â©æ¸ ä©æ¹§çŒ ä¡æ¹¹æ½®â° å¡ç‘¡ç‰²æ¡âŽæ…´æ¥¯æ¹¡æ° å¡ç‰« Chinese characters.
Just for fun I changed the encoding for the page as well, and it all comes out like this.
http://img.skitch.com/20090206-f6s4gqy8me33te126wnhitssr6.png
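Chinese symbols like the ones described above are what you would expect if single-byte text is read as UTF-16, because every pair of Latin letters becomes one 16-bit code unit in the CJK range. A small Python sketch, assuming big-endian UTF-16 and a made-up letters-only sample (the byte order and charset actually involved here are not known):

```python
# Assumed sample: plain ASCII letters, even length so the bytes pair up cleanly.
ascii_bytes = "IcelandicIndiana".encode("ascii")

# Reading two bytes at a time as big-endian UTF-16 code units.
misread = ascii_bytes.decode("utf-16-be")

# Each pair of Latin letters lands in the CJK ideograph blocks
# (CJK Extension A through CJK Unified Ideographs, U+3400..U+9FFF).
all_cjk = all(0x3400 <= ord(ch) <= 0x9FFF for ch in misread)

print(misread, all_cjk)
```

So a normal English sentence turning into Chinese characters is consistent with the text not being UTF-16 in the first place.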
I have been looking around and found that the name for this encoding is macintosh.
So I thought, well, I will add "macintosh" => "MacRoman", to admin-functions.php.
Well it is there already so no need to add it.
"MACINTOSH" => "Western European (MAC)",
So I have indeed tried changing to that encoding many times, and this is what happens.
For the letters áíéú
I get ‡’Žœ, so yes, boxes.
However, if I change my browser encoding to MACINTOSH I get
¬á¬ÃŽœ, with one extra little thing in front of each letter.
Well, that's where I stand again, right back at the beginning, when I thought it was enough to change the IPTC encoding to macintosh.
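For what it is worth, the ‡’Žœ output is exactly what Python gives when the Mac Roman bytes for áíéú are read as Windows-1252. A short sketch, assuming the stored bytes really are Mac Roman and the reader uses Windows-1252 (neither is confirmed for this setup):

```python
letters = "áíéú"

# Mac Roman stores each of these letters as a single byte.
raw = letters.encode("mac_roman")      # b'\x87\x92\x8e\x9c'

# The same four bytes read as Windows-1252 (Western European).
as_windows = raw.decode("cp1252")      # '‡’Žœ'

# Read back with the charset they were written in, the letters are intact.
round_trip = raw.decode("mac_roman")   # 'áíéú'

print(raw, as_windows, round_trip)
```

This suggests the bytes themselves are fine and only the charset used to read them is wrong at some step.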
So now I must have the right setup on how ZenPhoto treats metadata.
Not works:
2#120 => ( 0 => Icelandic Indiana Jones in Stakkholtsgj‡. )
Works:
2#120 => ( 0 => hverig koma íslenskir stafir út núna áíöþæ )
In addition, the Works picture has the flag that the character set is UTF-8 while the not works one does not have that flag.
So, the problem is on the Lightroom end.
Well, I guess I have to pay for Photo Mechanic then. $135 just to upgrade the IPTC to UTF-8.
But as I posted a little earlier, I do find it strange that changing to macintosh does not work; as far as I can see, that is the right charset for the photos.
In Photoshop Bridge there is an option to set the IPTC data to UTF-8. Maybe that option exists for Lightroom as well.
I have been told by a photographer not to use Bridge for anything IPTC related, as it can really screw up the data. I might give it a try and see how it goes.
Speaking of Lightroom, even on the PC you cannot type any of the letters áéóíú in the program; they come out as ´a´e´o´i´u. However, you can paste them in from TextEdit, for example. It really pisses the Scandinavians off. EDIT: Hold that thought, I have just been told that Adobe finally fixed that problem in version 2.2, about freaking time.
Why is Charset such a hassle, hehe.