Sitemap plugin generates URLs disallowed by robots.txt

Hi,

I'm using the sitemap plugin with the Google extensions enabled. I noticed this morning that Google Webmaster Tools lists warnings that the sitemap contains entries that are disallowed by the robots.txt file.

The URLs that Webmaster Tools is complaining about look like this one:

http://photosbytechxplorer.com/zp-core/i.php?a=brighton&i=IMG_0506.jpg&s=1200&q=85&wmk=!

By default, URLs with /zp-core/ in them are disallowed by the robots.txt file.

Is this issue affecting the indexing of my site? If so, what do I need to do to resolve it?

With thanks.

-Corey
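
For anyone who wants to check this locally before Webmaster Tools reports it, here is a minimal Python sketch that lists the sitemap entries a robots.txt would block. It uses only the standard library. The function name `disallowed_urls`, the `Disallow: /zp-core/` rule, and the sample sitemap are assumptions made for illustration; they are not taken from the plugin or from the actual default robots.txt.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace defined by the sitemaps.org protocol for <urlset>, <url> and <loc>.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def disallowed_urls(sitemap_xml, robots_txt, user_agent="Googlebot"):
    """Return the <loc> URLs in sitemap_xml that robots_txt blocks for user_agent."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]
    return [url for url in locs if not rules.can_fetch(user_agent, url)]


# Assumed rule: the exact wording in the default robots.txt may differ.
SAMPLE_ROBOTS = "User-agent: *\nDisallow: /zp-core/\n"

# Hand-written sample sitemap for illustration, not actual plugin output.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://photosbytechxplorer.com/</loc></url>
  <url><loc>http://photosbytechxplorer.com/zp-core/i.php?a=brighton&amp;i=IMG_0506.jpg&amp;s=1200&amp;q=85&amp;wmk=!</loc></url>
</urlset>
"""

if __name__ == "__main__":
    for url in disallowed_urls(SAMPLE_SITEMAP, SAMPLE_ROBOTS):
        print("blocked by robots.txt:", url)
```

Note that `urllib.robotparser` does simple prefix matching, so its verdict may differ from Google's for rules that use wildcards.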

Comments

  • You will need to get all your images cached before you generate your sitemap. That URL points to the image processor and really should not be indexed by Google.

    You can use the cacheManager plugin if there are a large number of images on the site that still need caching.
  • Hi,

    I've used the cacheManager plugin to update my cache and the error went away.

    I now have a new error that I didn't see before.

    The error is in this part of the sitemap:

    http://photosbytechxplorer.com/cache_html/sitemap/sitemap-zenphoto-images-1.xml

    The error is that the urlset tag doesn't contain any url child tags.

    Any thoughts on what I'm missing?

    With thanks.

    -Corey
  • acrylian Administrator, Developer
    Are you sure your images are published and not protected? Otherwise they will not appear. You might also need to clear the sitemaps and regenerate them.
  • Hi,

    As far as I can tell my images are published: they appear in the gallery and albums, and I've certainly not made them protected or anything. You can check by going here:

    http://photosbytechxplorer.com

    I've emptied and regenerated the cache, and cleared and regenerated the sitemap cache as well, but http://photosbytechxplorer.com/cache_html/sitemap/sitemap-zenphoto-images-1.xml still contains only an empty urlset tag.

    Any thoughts on what I should try next? I've had a look in the debug log but can't see any errors.
  • Interesting...

    I just turned off the "Enable Google image and video extension" option and the http://photosbytechxplorer.com/cache_html/sitemap/sitemap-zenphoto-images-1.xml file now contains URLs.

    With the option turned on, the file is empty; with it turned off, the file contains URLs.

    I only really turned the option on so it could put the Creative Commons license URL into the XML.

    Any thoughts on what is going on?
  • acrylian Administrator, Developer
    Please look at the server error log. The Google image extension might overload your server if you have a lot of images, since for each album a full list of them is generated (I think we put a note on the options). If so, there should be an error reported.
  • I edit the sitemaps and upload them to the root folder of my site instead of the cache_html folder. It only takes a minute to take out the images and news index (I use the Google images extension.) Then Google doesn't gripe about my empty sitemap files, and my sitemaps are accepted 100% of the time.

    From what I've seen of sitemaps (so far), it's very difficult for any sitemap generator to get it 100% right so Google won't complain. Seems as if it's always something, with Google. So I've found that it's just easier to do a quick cleanup.

    And it's good for me to understand how sitemaps are constructed too. They're easy to edit and not very complicated.

    That's my two cents, for what it's worth. :)
  • Hi,

    This afternoon I did some further experimentation.

    I can't see any errors reported in the Zenphoto debug log.

    I can't see any errors in the error log of my host.

    In an attempt to reduce the load, I changed the settings so that each sub-index file would contain only one album.

    The upshot is that I can't include the Google-specific extensions in the sitemap, which is unfortunate as I wanted to include the URL of the Creative Commons license in the sitemap.

    For the moment I'll leave the Google image and video extension disabled.

    Thanks for your help.

    -Corey
  • acrylian Administrator, Developer
    Sadly there is no other way to generate this Google extra stuff; by their definition it must be tied to the album. I am not sure, but won't Google discover the license as well if you just place the HTML snippet that Creative Commons provides on your pages?
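
As a footnote to the empty-urlset problem discussed above: a generated sitemap file can be checked for url entries before it is submitted, rather than waiting for Google to flag it. The following is a minimal Python sketch using only the standard library. The helper name `count_url_entries` and both sample documents are made up for illustration, and it assumes the file uses the standard sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def count_url_entries(sitemap_xml):
    """Return the number of <url> children directly under the <urlset> root."""
    root = ET.fromstring(sitemap_xml)
    if root.tag != SITEMAP_NS + "urlset":
        raise ValueError("expected a <urlset> root, got " + root.tag)
    return len(root.findall(SITEMAP_NS + "url"))


# Hand-written samples for illustration, not actual plugin output.
EMPTY_SITEMAP = '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"></urlset>'
FILLED_SITEMAP = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>http://photosbytechxplorer.com/</loc></url>"
    "</urlset>"
)

if __name__ == "__main__":
    for name, xml_text in (("empty", EMPTY_SITEMAP), ("filled", FILLED_SITEMAP)):
        print(name, count_url_entries(xml_text))
```

A file for which this returns 0 is the kind Webmaster Tools complained about in this thread; it could be left out of the sitemap index or regenerated.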