Some scalability suggestions

Hi,

I am currently building a large gallery, but I have yet to put it into production because I'm concerned about scalability.

As of now, the gallery has more than 2000 albums/folders. As the site grows, I think we may run into performance issues: thousands of directories and files in the same place is not a very efficient layout. We'll be screwed if there are a lot of concurrent visitors!

I am proposing to rework the core engine to accommodate a more scalable layout:

1. There should be a separate "uploads" folder. Users shouldn't upload directly into the "albums" folder; instead, they put new albums in "uploads". Then, through the admin panel or perhaps via cron, we import the newly added albums into the main "albums" folder. Nobody except the zen engine should touch the "albums" directory. The reasons are described below.

2. The "albums" directory structure should be made scalable, so that it can efficiently store a large number of albums and images. Instead of storing thousands of albums at the same directory level, we can spread them across nested directories. This is how I'd implement it (à la cache-lite):

Let's say I have uploaded a new album in the "uploads" dir. And now zp is importing it to the main "albums" dir. Let's assume the name of the dir is "New Album 1".

zp adds the directory name ("New Album 1") to a MySQL table as the album_title;

then we generate a hash value from the dirname/albumname:
`$md5 = md5($new_album_name);`

Let's assume the hash value is "abc123" (shortened here for readability; a real MD5 hash is 32 hex characters). This will be the key for the new album, and we'll compute the path of the album from this key.

Let's assume we want to store the albums 3 levels deep into the "albums" directory.

`$scalable_dir_depth = 3;`

So we deduce the path of the album from its hash key ("abc123") in this way:
`
/albums/a <-- level 1
/albums/a/b <-- level 2
/albums/a/b/c <-- level 3
/albums/a/b/c/abc123 <-- We store the actual album in this directory, 3 levels down
`
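To make the path derivation concrete, here is a minimal sketch. The function name `album_path_from_name()` and its parameters are my own placeholders, not anything that exists in zp-core, and it assumes one hex character of the hash per directory level, as in the example above.

```php
<?php
// Hypothetical helper (not part of zp-core): map an album name to a nested
// storage path, using one leading hash character per directory level.
function album_path_from_name(string $albumName, int $depth = 3, string $root = '/albums'): string
{
    $hash = md5($albumName);                 // 32 lowercase hex characters
    $depth = max(1, min($depth, 32));        // keep the depth within the hash length
    $levels = str_split(substr($hash, 0, $depth));
    return $root . '/' . implode('/', $levels) . '/' . $hash;
}

echo album_path_from_name('New Album 1'), "\n";
```

With 16 possible hex characters per level, a depth of 3 gives 16 × 16 × 16 = 4096 leaf directories to spread albums across.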

So zenphoto moves the album from the "uploads" folder to the new location, and adds the path to the appropriate MySQL tables.
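A rough sketch of what that import step might look like on the filesystem side. `import_album()` is a made-up name, the database insert is left to the caller, and I'm assuming "uploads" and "albums" live on the same filesystem so that `rename()` can move a whole directory. Duplicate names are not handled here.

```php
<?php
// Hypothetical import step (not part of zp-core): move one album directory
// out of "uploads" into its hash-derived location under "albums".
function import_album(string $uploadsDir, string $albumsDir, string $albumName, int $depth = 3): string
{
    $source = $uploadsDir . '/' . $albumName;
    if (!is_dir($source)) {
        throw new RuntimeException("No such upload: $source");
    }

    $hash = md5($albumName);
    $parent = $albumsDir . '/' . implode('/', str_split(substr($hash, 0, $depth)));

    // Create the nested level directories if they are missing.
    if (!is_dir($parent) && !mkdir($parent, 0755, true) && !is_dir($parent)) {
        throw new RuntimeException("Cannot create $parent");
    }

    $target = $parent . '/' . $hash;
    if (!rename($source, $target)) {
        throw new RuntimeException("Cannot move $source to $target");
    }

    // The caller would store this path next to album_title in the database.
    return $target;
}
```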

When zp is displaying the albums to end users, instead of scanning the "albums" directory, it will grab the album title and path from mysql tables. There is no need to hit the disks since nobody will alter the "albums" directory.

This should be much more efficient than the current approach of traversing the disk. It also makes duplicate album names possible:

Suppose we upload another album named "New Album 1", but an album with that name already exists at /albums/a/b/c/abc123

To avoid duplicate folders, we do this:

We can concatenate random strings to the album name, until a unique hash is found. Pseudocode:
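`
$hash = md5($new_album_name);
$new_album_folder = generate_folder_from_hash($hash);

// Re-hash with a random suffix until the folder name is unused
while (is_dir($new_album_folder)) {
    $hash = md5($new_album_name . generate_random_chars());
    $new_album_folder = generate_folder_from_hash($hash);
}
`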

In this way we can have a unique folder name for the album, even if it has a duplicate title.
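Here is a runnable sketch of that loop, under some assumptions of mine: `folder_from_hash()` and `unique_album_folder()` are placeholder names, the random suffix comes from PHP's `random_bytes()`, and I've added a retry limit so the loop can't spin forever.

```php
<?php
// Hypothetical helper: build the nested path for a hash, one leading
// hash character per directory level.
function folder_from_hash(string $albumsDir, string $hash, int $depth = 3): string
{
    return $albumsDir . '/' . implode('/', str_split(substr($hash, 0, $depth))) . '/' . $hash;
}

// Hypothetical helper: return a folder path for $albumName that does not
// exist yet, salting the name with random characters on collision.
function unique_album_folder(string $albumsDir, string $albumName, int $maxTries = 100): string
{
    $folder = folder_from_hash($albumsDir, md5($albumName));
    for ($i = 0; $i < $maxTries && is_dir($folder); $i++) {
        $salt = bin2hex(random_bytes(4));   // 8 random hex characters
        $folder = folder_from_hash($albumsDir, md5($albumName . $salt));
    }
    if (is_dir($folder)) {
        throw new RuntimeException("No free folder found for $albumName");
    }
    return $folder;
}
```

Note that the check and the later directory creation are two separate steps, so two imports running at the same time could still pick the same folder. The album_title stored in the database stays unsalted; only the folder key changes.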

A lot of the inner workings of zp-core will have to be rewritten to accommodate these concepts.

@developers: what do you think?

Comments

  • There are some caveats I forgot to mention about generating unique hashes for duplicate albums: How do we handle sub-albums?

    Here's how:
    #1 The new album has no sub-albums: we can safely use the algorithm, no problems.
    #2 The new album has sub-directories:
    /New-album-1/sub-1/sub-2/sub-3
    In this case, we compute the unique hash only for the deepest directory (sub-3).
    I.e., we calculate hash values for these directories as-is:
    * New-album-1
    * sub-1
    * sub-2
    It doesn't matter if the above folders are already present in the "albums" directory.
    But for sub-3, we'll generate the "duplicate-safe" hash value.

    How do we handle sub-albums at the presentation level? That's for the developers to figure out... ;-)

    Perhaps we could introduce a "parent_id" field into the mysql table and define a master-detail relationship for the sub-albums?
    `SELECT ... FROM ... WHERE parent_id="x";`

    Gonna be pretty complicated.
  • acrylian Administrator, Developer
    I am actually not an expert on these things, so I will let my fellow developers answer that. But Zenphoto should actually be quite scalable: http://www.zenphoto.org/2007/12/installation-and-upgrading/#6 (a statement from our project leader, Trisweb).
  • Also, your proposed change violates a fundamental premise of zenphoto: that the files in the folders are the defining element of the gallery and that the database only holds "meta data".
  • hmmm, I didn't know that. But then, I've been tinkering with zp only for a week!

    Anyways, I've rearranged my gallery. Instead of placing all galleries at the same level, I've put them under different sub-albums.

    Interestingly, I also experienced a MySQL crash that corrupted the zenphoto database, and I had to drop the database. Thanks to zenphoto's folder-based approach, I just had to run a simple setup.php!

    So I'll be sticking with the default zp distro, no crazy scalability hacks for me! ;-)