search box and words with accents

Hello,

Using the search box, I cannot find a word that has accents, as in French, by typing the same word without accents. The reverse doesn't work either.
The only way I have found to make the photos findable when a user doesn't know the correct spelling is to tag them with both versions: one with accents and one without.
Is this normal? I would guess it is more usual (and practical) for accents not to make any difference.

Thank you!

ZPhoto 1.3.1.2. / zp-galleriffic theme

Comments

  • acrylian Administrator
    Accents are a complicated matter, as they involve a lot of encoding issues. Everything should be UTF-8, and you probably need the mbstring package on your server. Please do a forum search and consult the troubleshooting guide.
  • Depending on how you got these accented characters into Zenphoto, it is also possible that the character set of that data was not UTF-8. Since the default for Zenphoto web pages is UTF-8, any accent you type into the search box will be in UTF-8 and will not match any stored accent that is not.
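
    To illustrate the encoding point above, here is a minimal sketch in Python (not Zenphoto code). It assumes, purely for illustration, that the stored data was imported as Latin-1; the same visible word then has different bytes from what a UTF-8 search box sends, so a byte-wise comparison fails.

```python
# The same visible word has different bytes in different encodings.
word = "école"

# What a UTF-8 web page would send from the search box.
utf8_bytes = word.encode("utf-8")

# What might have been stored if the data was imported as Latin-1
# (an assumption made only for this illustration).
latin1_bytes = word.encode("latin-1")

print(utf8_bytes)                  # b'\xc3\xa9cole'
print(latin1_bytes)                # b'\xe9cole'
print(utf8_bytes == latin1_bytes)  # False: a byte-wise match fails
```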
  • ctdlg Member

    Hello,

    I have the same problem.

    Suppose I have a picture description "école de ..."

    Searching "école" gives a result containing this picture. OK.
    But searching "ecole" does not return this picture as a result.

    Is it possible to re-encode accented characters so that they are searched like
    é -> e
    è -> e
    ế -> e
    à -> a
    etc.

    Thanks in advance.

  • acrylian Administrator

    Well, technically those are not the same character and therefore not found.

    It will be really complicated. How should we know that a search for "ecole" means "école"? That would imply that the search engine understands the word in French and knows that this word is written with an accent.

    I would recommend adding tags both with and without the accent and turning on the tag suggest plugin.

  • I am not an expert in search engines, but I suppose there are different ways to do that:

    • Convert both the search word and (dynamically) all text fields in which the search is made.
      Example: the search word "école" is converted to "ecole", and all content like "école primaire" or "école secondaire" is converted to "ecole primaire" or "ecole secondaire".
      The search engine can then find results for the converted keyword "ecole".

    • Search on the part of the search term that contains no accented characters: for the term "école", search on "cole".

    • For a search word like "ecoliere", generate every variant with the accented characters that could appear in the word (écoliere, ècolière, ecolïere, ...) and search on the whole collection, so that the right word "écolière" can be found in the content fields.

    • I don't know if it exists, but use a web service that suggests corrections for misspelled words
      (a search on "ecoliére" gives something like: did you mean "écolière"?) and then search on the corrected word.

    There are probably other ways to achieve this.
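
    The first idea in the list (convert both the search word and the searched text) can be sketched in a few lines of Python. This is only an illustration, not how Zenphoto works: the helper names strip_accents, fold and search are hypothetical, and the approach relies on Unicode decomposition, so letters with no decomposition (such as "ø" or "œ") are left unchanged.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining marks after canonical decomposition (é -> e)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fold(text: str) -> str:
    """Make text comparable regardless of accents and case."""
    return strip_accents(text).casefold()


def search(query: str, descriptions: list[str]) -> list[str]:
    """Return the descriptions containing the query, ignoring accents and case."""
    needle = fold(query)
    return [d for d in descriptions if needle in fold(d)]


descriptions = ["école primaire", "École secondaire", "collège"]

print(search("ecole", descriptions))  # ['école primaire', 'École secondaire']
print(search("école", descriptions))  # ['école primaire', 'École secondaire']
```

    Because both sides are converted the same way, no catalog of words is needed for this idea: "ecole" and "école" simply end up as the same string before they are compared.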

  • vincent3569 Member
    edited October 2

    Of course, I suppose that is not a good idea in this open source project: the Google search engine is a web service.
    I don't know whether this web service can index all the content of a website and then allow searches on it despite errors in the search words (or in the content).

    Other open source search web services exist, as far as I can find (with Google, sorry ;-) : https://www.google.com/search?q=open+source+search+engine

  • acrylian Administrator
    edited October 2

    In any case you need a huge catalog to know what to replace and what to use as alternatives or additions. "école" > "ecole" is easy, but you also need the other way round, of course.

    In any case this would require a huge catalog covering almost all languages to work as expected. Don't forget Google has this huge catalog because it is a billion-dollar company ;-) If you really need this, Google provides means to include a search limited to your site via their APIs. There are other search engines (including more privacy-friendly ones) that might provide something similar.

  • vincent3569 Member
    edited October 3

    @ctdlg
    are you sure about your problem?
    on my site (https://www.vincentbourganel.fr), keywords like "Mélie", "mélie", "Melie" and "melie" give (almost, see below) the same results:

    • Albums (1) and images (22)
    • Articles (17)

    so the Zenphoto search engine seems to be case-insensitive and doesn't care about accented characters (in search keywords and in content)

    @acrylian
    I wrote "almost" because everything I wrote holds if I choose all search fields except "tags".
    But a search on tags only seems to be stricter:

    • "Mélie" and "mélie" give 10 pictures (with "Mélie" tag)
    • but "Melie" and "melie" give no result (but 10 pictures still exist with "Mélie" tag)

    A search on tags only seems to be case-insensitive but accent-sensitive.

    Maybe there is something to do here...

  • acrylian Administrator

    The case should really not matter when searching because it is really the same word regardless of case.

    but "Melie" and "melie" give no result (but 10 pictures still exist with "Mélie" tag)

    Probably because of that. Perhaps you have the word without accents in one other field so the general search covers it.

  • ctdlg Member

    Thank you for all your comments.
    Yes, Zenphoto is not case-sensitive. Nice feature.

    There is another way: using the word "ecole" in the file name and the word "école" in the description.

    I will also have a look at this kind of info: https://stackoverflow.com/questions/1017599/how-do-i-remove-accents-from-characters-in-a-php-string

    @ vincent3569 : yes, I'm sure, I triple checked with different french words. Zenphoto 1.5.

    And why do I want this feature?
    Because many French people do not know much about spelling. "École" is not a good example, as everyone is at (or went to) school; "fière" is a better one, since many people will search for "fiere"...
    And will get no result.

  • ctdlg Member

    From the above link, the WordPress method is supposed to be the best.
    I've added their "convert" function to the Zenphoto functions.php file
    (UTF-8 method for me)

    I have to understand how the Zenphoto search operates in order to adapt and use this function.

    Stay tuned.

  • acrylian Administrator

    As mentioned, we have a plugin called zenphoto_seo with a convert function similar to the WordPress one; if enabled, it cleans filenames, titlelinks, etc. It could be used if needed.

  • ctdlg Member

    I suppose I have to convert both the sql field entries and the search word entered by the visitor : école in the database will be converted to ecole, user can enter école or ecole ! it should work.
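
    A rough sketch of that idea, in Python with an in-memory SQLite database standing in for the real one (Zenphoto itself uses PHP and MySQL, so this is only an illustration). The table name images, its columns, and the fold function are hypothetical. The same conversion is applied to the stored field and to the visitor's search word before they are compared.

```python
import sqlite3
import unicodedata


def fold(text):
    """Strip accents and case so that 'École' and 'ecole' compare equal."""
    if text is None:
        return None
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL under the name fold().
conn.create_function("fold", 1, fold)

# Hypothetical table and rows, for illustration only.
conn.execute("CREATE TABLE images (title TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO images VALUES (?, ?)",
    [("img1", "école de musique"), ("img2", "fière allure"), ("img3", "plage")],
)


def search(term):
    """Return titles whose folded description contains the folded term."""
    # Note: '%' and '_' inside the term are not escaped in this sketch.
    pattern = "%" + fold(term) + "%"
    rows = conn.execute(
        "SELECT title FROM images WHERE fold(description) LIKE ? ORDER BY title",
        (pattern,),
    )
    return [title for (title,) in rows]


print(search("ecole"))  # ['img1']
print(search("École"))  # ['img1']
print(search("fiere"))  # ['img2']
```

    Converting the field inside the query means every row is converted on each search; storing an already converted copy of the field would be another option, at the cost of keeping it in sync.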

  • ctdlg Member

    @acrylian : seo_zenphoto does not help with search function : "ecole" does not return "école" items.

  • acrylian Administrator

    convert both the sql field entries

    Or you assign two tags covering all the required spellings. Of course, that is not usable for general free-text search in the description, but it works for everything that can use tags.

    seo_zenphoto does not help with search function : "ecole" does not return "école" items.

    No, of course it does not, because it is not set up to work like that. I meant that if any replacement had to be done, there is no need to add an additional function that does basically the same thing.

  • vincent3569 Member
    edited October 4

    @ctdlg: as a French person, of course I am concerned with accented characters ;-)

    @acrylian: I ran a test with a new tag "chèvre".
    Have a look at my test site with the keywords "chèvre", "chevre", "Chèvre" and "Chevre".
    https://test.vincentbourganel.fr/page/archive/

    there are 3 pictures with "chèvre" in the description
    there is one picture with the tag "chèvre"
    there is one news article with "chèvre" in its content.

    the picture with the tag "chèvre" is only found with the accented keywords "chèvre" and "Chèvre".
    the other items are found with all the keywords above (accented or not).

    so the search engine seems to behave differently with tags.

  • acrylian Administrator

    Yes, a direct tag search uses a specific url since you explicitly request that when clicking on the link. This is rather expected behaviour.

  • I don't click a tag in the tag cloud at the bottom of my page; I use the search input.

    To be clearer about my scenario:

    • I created a new tag "chèvre" used by only one picture.
    • the words "Chèvre", "Chèvres" and "Chèvrerie" are present in several picture descriptions / news content.

    If I use "chèvre" or "Chèvre" as the keyword on all fields, 5 items are found (including the item with the tag "chèvre").
    But if I use "chevre" or "Chevre" as the keyword, only 4 items are found: the picture with the tag "chèvre" is not found.

    so the search on tag content seems to be case-insensitive but accent-sensitive.

    objectively, I don't see a reason to favor case insensitivity but not accent insensitivity for tags, when both apply to the contents of all the other fields of the gallery.
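
    To make the difference between the two behaviours concrete, here is a small Python sketch. It only models what I observe and says nothing about how Zenphoto actually compares tags; the function names strip_accents and matches are hypothetical.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining marks after canonical decomposition (è -> e)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(query: str, value: str, ignore_accents: bool) -> bool:
    """Case-insensitive substring match, optionally accent-insensitive too."""
    q, v = query.casefold(), value.casefold()
    if ignore_accents:
        q, v = strip_accents(q), strip_accents(v)
    return q in v


tag = "chèvre"
for query in ["chèvre", "Chèvre", "chevre", "Chevre"]:
    # Columns: keyword, tag-like behaviour, behaviour seen on other fields.
    print(query, matches(query, tag, False), matches(query, tag, True))
# chèvre True True
# Chèvre True True
# chevre False True
# Chevre False True
```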

  • acrylian Administrator

    Did you clear or disable the search cache, too? Just to be sure nothing interferes. I will try to reproduce this as soon as I get the time.

    Btw, the next major release planned will see some changes to tags that might impact searching on them so I am not sure I will work on this in 1.5.x anymore to avoid double work. I seriously need to start to be overly picky…

  • Yes, the search cache is disabled and cleared.

    @ctdlg: on which content do you have the problem?
    as we could see, the search on picture descriptions and news/page content gives good results with accented characters, except on tags.

    can you provide a link?

  • ctdlg Member

    @vincent3569
    I just wrote a message on this forum to announce my website:
    https://clatique.fr
    And yes, I get the same search results as you.
    Search Neopolis or Néopolis.
    On pages, accents do not matter!
