xmpMetadata and (X)HTML character references

Hello,

I'm wondering: since XMP is XML-friendly, shouldn't xmpMetadata decode HTML character references? `exiftool -tagsfromfile img.jpg img.xmp` produces an XMP file in which &, ', ", >, and < are escaped as character references. On the other hand, `exiv2 ex -e xX img.jpg` is fine with quotes, but escapes the linefeed (`&#xa;`), among others.

Or is there perhaps another way to work around HTML character references?
Thanks anyway!

Comments

  • The plugin probably should be decoding HTML entities. We will add that to the list.
  • All right, thanks ;) If you want me to open a ticket, just let me know.
  • Normally, yes, but I have made the change and it will be in the nightly tonight. I would appreciate some testing, though.
  • Wow, that was fast!
    I checked it against https://en.wikipedia.org/wiki/Character_entity_reference, and as far as I saw, the entity and numeric (decimal and hexadecimal) references are rendered properly, except for one thing: the ampersand (`&amp;`, `&#38;` and `&#x26;`) "eats" one character too many in some (!) cases: try, e.g., to render `&'&'&x&§&e`. On the other hand, `&a&a` looks fine.
  • It is hard to read/write HTML entities on a website. But it seems to me that what you are describing is that the translation fails when you have a naked ampersand preceding an entity. That is, of course, not legal: an ampersand is supposed to be represented by `&amp;`.
  • Oops, sorry for the mess. No, I mean that if you write an entity that represents the ampersand, then in some cases the character that immediately follows is ignored. Try, e.g., to render `https://pastebin.com/raw.php?i=y77sUUcB`: the first line is messed up, while the second is fine.
  • It looks like it is rendering correctly to me. However, remember that the output may cause you issues: `&§` is not valid HTML.
  • Ah? I know that `&§` is not valid HTML, but with the first line of my above paste (it's raw, there is no translation), I would expect `&'&'&x&§&e`, but I get `http://i.imgur.com/hYVsx.png`.
    Don't you get the same result?
  • No, I do not; I get what you expect. I am guessing that what you see is the result of the browser trying to interpret the `&§`.

    Anyway, I did my tests by saving the result to a disk file to keep the browser out of the picture.

    But I did notice that `&apos;` does not get converted. So maybe we really need a full XML character table and not just the PHP `html_entity_decode()`. I'll work on that.
  • All right, thanks for your work.
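
For anyone who wants to check the expected results outside the browser, here is a minimal sketch in Python (not the plugin's PHP code, and not necessarily how the fix will be implemented). It assumes the standard library's `html.unescape()`, which uses the HTML5 entity table and so also covers `&apos;`, and `xml.sax.saxutils.unescape()` for the five XML predefined entities only. The helper names `decode_all` and `decode_xml_only` are made up for illustration.

```python
import html
from xml.sax.saxutils import unescape

# saxutils.unescape() handles &amp;, &lt; and &gt; by default;
# the two quote entities have to be passed in explicitly.
XML_QUOTE_ENTITIES = {"&quot;": '"', "&apos;": "'"}


def decode_xml_only(text: str) -> str:
    """Decode only the five XML predefined entities (no numeric references)."""
    return unescape(text, XML_QUOTE_ENTITIES)


def decode_all(text: str) -> str:
    """Decode named (HTML5 table), decimal and hexadecimal references."""
    return html.unescape(text)


if __name__ == "__main__":
    # Ampersand written three ways, each followed by an ordinary character:
    # no following character should be swallowed.
    print(decode_all("&amp;x&#38;y&#x26;z"))
    # &apos; is in the HTML5 table, so it is decoded here.
    print(decode_all("&apos;quoted&apos;"))
    # The hexadecimal linefeed reference mentioned in the question.
    print(repr(decode_all("a&#xa;b")))
    # A double-escaped entity is decoded one level only.
    print(decode_xml_only("&amp;apos;"))
```

Writing the output to the terminal or a file, rather than rendering it in a page, keeps the browser's own entity handling out of the picture.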