Distributed Caching with Memcached
One day, sick of how painful it is to cache efficiently in mod_perl applications, I started dreaming. I realized we had a lot of spare memory available around the network, and I wanted to use it somehow. If you're a Perl programmer strolling through CPAN, you find an abundance of Cache::* modules. The interface to almost all of them is a dictionary. If you're fortunate enough to have missed Computer Science 101, a dictionary is the name of the abstract data type that maps keys to values. Perl people call that an associative array or a hash, short for hash table. A hash table is a specific type of data structure that provides a dictionary interface.
I wanted a global hash table that all Web processes on all machines could access simultaneously, instantly seeing one another's changes. I'd use that for my cache. And because memory is cheap, networks are fast and I don't trust servers to stay alive, I wanted it spread out over all our machines. I did a quick search, found nothing and started building it.
Each Memcached server instance listens on a user-defined IP and port. The basic idea is that you run Memcached instances all over your network, wherever you have free memory, and your application uses them all. It's even useful to run multiple instances on the same machine, if that machine is 32-bit and has more total memory than the kernel makes available to a single process. For example, while we were learning our lesson on scaling out and not up, we picked up a ridiculously expensive machine that happens to have 12GB of memory. Nowadays, we use it for a number of miscellaneous tasks, one of which is running five 2GB Memcached instances. That gives us 10GB more memory in our global cache from a single machine, even though each process on 32-bit Linux usually can address only 3GB of memory.
The trick to Memcached is that for a given key, it needs to pick the same Memcached node consistently to handle that key, all while spreading out storage (keys) evenly across all nodes. It wouldn't work to store the key foo on machine 1 and then later have another process try to load foo from machine 2. Fortunately, this isn't a hard problem to solve. We simply can think of all the Memcached nodes on the network as buckets in a hash table.
Step 1: the application requests keys foo, bar and baz using the client library, which calculates key hash values, determining which Memcached server should receive requests.
Step 2: the Memcached client sends parallel requests to all relevant Memcached servers.
Step 3: the Memcached servers send responses to the client library.
Step 4: the Memcached client library aggregates responses for the application.
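The four steps above can be sketched roughly as follows. This is a minimal Python illustration, not the real client library: the three "servers" are plain dictionaries, CRC32 is an assumed hash function, and the names server_index, set_key and get_multi are made up for this example.

```python
import zlib
from collections import defaultdict

# Hypothetical stand-ins for three Memcached servers: each is a plain dict.
SERVERS = [{}, {}, {}]

def server_index(key, n_servers):
    """Map a key to a server number; the same key always gets the same number."""
    return zlib.crc32(key.encode("utf-8")) % n_servers

def set_key(key, value):
    """Store a value on whichever server the key hashes to."""
    SERVERS[server_index(key, len(SERVERS))][key] = value

def get_multi(keys):
    """Fetch several keys at once, asking each relevant server only once."""
    # Step 1: hash each key to decide which server should receive it.
    by_server = defaultdict(list)
    for key in keys:
        by_server[server_index(key, len(SERVERS))].append(key)

    results = {}
    # Steps 2 and 3: one request per relevant server. A real client would
    # send these over the network in parallel; here we just read the dicts.
    for idx, wanted in by_server.items():
        store = SERVERS[idx]
        response = {k: store[k] for k in wanted if k in store}
        # Step 4: merge each server's response into a single answer.
        results.update(response)
    return results
```

Keys that no server holds are simply absent from the returned dictionary, which the application treats as cache misses.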
If you know how a hash table works, skim along. If you're new to hashes, here's a quick overview. A hash table is implemented as an array of buckets. Each bucket (array element) contains a list of nodes, with each node containing [key, value]. This list later is searched to find the node containing the right key. Most hashes start small and dynamically resize over time as the lists of the buckets get too long.
A request to get/set a key with a value requires that the key be run through a hash function. A hash function is a one-way function mapping a key (be it numeric or string) to some number that is going to be the bucket number. Once the bucket number has been calculated, the list of nodes for that bucket is searched, looking for the node with the given key. If it's not found, a new one can be added to the list.
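To make the bucket-and-list description concrete, here is a small Python sketch of a chained hash table. The class name, the starting size of eight buckets and the rule of doubling once the table averages more than two nodes per bucket are all assumptions chosen for illustration, not details of any particular implementation.

```python
class ChainedHashTable:
    """A toy hash table: an array of buckets, each a list of [key, value] nodes."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]
        self.count = 0

    def _bucket(self, key):
        # The hash function turns the key into a bucket number.
        return self.buckets[hash(key) % len(self.buckets)]

    def set(self, key, value):
        bucket = self._bucket(key)
        for node in bucket:
            if node[0] == key:
                node[1] = value      # key already present: overwrite
                return
        bucket.append([key, value])  # not found: add a new node to the list
        self.count += 1
        # Assumed growth rule: double once lists average more than 2 nodes.
        if self.count > 2 * len(self.buckets):
            self._resize(2 * len(self.buckets))

    def get(self, key, default=None):
        # Search the bucket's list for the node with the given key.
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def _resize(self, n_buckets):
        # Re-hash every node into a larger array of buckets.
        old_nodes = [node for bucket in self.buckets for node in bucket]
        self.buckets = [[] for _ in range(n_buckets)]
        for node in old_nodes:
            self._bucket(node[0]).append(node)
```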
So how does this relate to Memcached? Memcached presents to the user a dictionary interface (key -> value), but it's implemented internally as a two-layer hash. The first layer is implemented in the client library; it decides which Memcached server to send the request to by hashing the key onto a list of virtual buckets, each one representing a Memcached server. Once there, the selected Memcached server uses a typical hash table.
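A rough Python sketch of the two layers might look like this. It is only an illustration under stated assumptions: FakeServer is an in-process stand-in for a real Memcached instance, CRC32 with a simple modulo is an assumed hash (real client libraries may hash differently), and the class and method names are hypothetical.

```python
import zlib

class FakeServer:
    """Stand-in for one Memcached instance; its dict is the second-layer hash."""

    def __init__(self, address):
        self.address = address  # illustrative only, nothing is connected
        self.table = {}

class Client:
    """Stand-in for the client library, which owns the first-layer hash."""

    def __init__(self, servers):
        self.servers = servers

    def _pick(self, key):
        # First layer: hash the key onto the list of servers (virtual buckets).
        h = zlib.crc32(key.encode("utf-8"))
        return self.servers[h % len(self.servers)]

    def set(self, key, value):
        # Second layer: the chosen server stores the item in its own table.
        self._pick(key).table[key] = value

    def get(self, key):
        # Returns None on a cache miss.
        return self._pick(key).table.get(key)
```

Because every client that has the same server list computes the same hash, two independent clients agree on where a key lives without the servers ever talking to each other.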
Each Memcached instance is totally independent, and does not communicate with the others. Each instance drops items used least recently by default to make room for new items. The server provides many statistics you can use to find query/hit/miss rates for your entire Memcached farm. If a server fails, the clients can be configured to route around the dead machine or machines and use the remaining active servers. This behavior is optional, because the application must be prepared to deal with receiving possibly stale information from a flapping node. When off, requests for keys on a dead server simply result in a cache miss to the application. With a sufficiently large Memcached farm on enough unique hosts, a dead machine shouldn't have much impact on global hit rates.
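The least-recently-used behavior can be illustrated with a short Python sketch. This is an assumption-laden toy: it limits the cache by item count, whereas a real Memcached instance is limited by memory, and the class name and max_items parameter are invented for the example.

```python
from collections import OrderedDict

class LRUCache:
    """Toy cache that drops the least recently used item when full."""

    def __init__(self, max_items):
        self.max_items = max_items
        self.items = OrderedDict()  # oldest use first, newest use last

    def get(self, key):
        if key not in self.items:
            return None               # cache miss
        self.items.move_to_end(key)   # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)   # a write also counts as a use
        while len(self.items) > self.max_items:
            self.items.popitem(last=False)  # evict the least recently used
```

With max_items set to 2, storing a and b, reading a and then storing c evicts b, because b is the item that has gone unused the longest.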
Comments
Very nice and informative
Very nice and informative article.
iehsan
http://iehsan.com
Can PHP and Java set and get data from the same Memcached?
My system uses both PHP and Java. Can PHP and Java set and get data from the same Memcached without any errors?
In my case, when PHP sets data in Memcached with compression, Java cannot decompress this data.
Could you help me?
Thanks a lot for your support.
How to configure memcache on CentOS 5.4
http://blogs.iehsan.com
http://pmedia4u.com
NCache-memcached
Memcached is an open-source distributed cache that speeds up applications by caching application data and reducing database load. Memcached is free, but it also has limitations. It has limitations in cache reliability and high availability, which can cause serious problems for mission-critical applications.
NCache-Memcached removes these limitations without requiring you to abandon the Memcached API or your investment in your existing code.
NCache-Memcached is 100% Memcached compatible for .NET and Java and gives you reliability through scalable cache replication.
Does NCache have a client for Memcached?
Hi, I was just curious whether NCache has a client for Memcached. It had something for NHibernate. If they do, this would be a great product to use for distributed caching.
Does NCache have a client for Memcached?
Yes, NCache does have a client for Memcached.
Josh.
Can you provide the distributed file system used at Facebook?
You mentioned the caching mechanism at Facebook. Is there any way I can find out which distributed file system or file system Facebook uses?
Gear6 Web Cache?
Can anyone with experience comment on the memcached distribution from Gear6? Thanks.
On the Gear6 memcached distribution
Hi! I'm the Director of Community Development for Gear6.
The Gear6 distribution of Memcached is a heavily modified fork of version 1.2 of the BSD licenced open source implementation of Memcached.
On a protocol level, it speaks the memcached text protocol. All existing memcached clients should Just Work. Implementation of the binary protocol is on our roadmap.
Unlike the community version of memcached, the Gear6 version is not intended to be run on whatever servers you have handy. Instead it is bundled into a rack mount appliance, and uses flash memory as a fast high density secondary cache as the available RAM fills up.
There are a couple of other features of the Gear6 implementation, including a web-based GUI, a REST based management API, and support for setting up High Availability pairs, so that a hardware failure does not cause a node failure in your memcached fleet.
If you want to play around with the Gear6 implementation, you can go to our website and download a VM image (this does not use flash, of course), or you can go to Amazon Web Services and start up specially bundled EC2 AMIs. Details on doing this are, of course, on our website.
Please do play with our product, and feel free to publicly post what you think of it.
Thanks!
.. Mark Atwood
if i have 3 server each
If I have 3 servers, each with 1 GB of memory allocated for memcache, and I use distributed memcache, does that mean I have 3 GB of memcache in total?
Distributed Caching using NCache
NCache has been the caching solution of choice for various mission critical applications throughout the world since August 2005. Its scalability, high availability and performance are the reasons it has earned the trust of developers, senior IT and management personnel within high profile companies. These companies are involved in a range of endeavors including e-commerce, financial services, health services and airline services.
The range of features that NCache currently showcases is highly competitive and revolutionary. It supports dynamic clustering, local and remote clients, advanced caching topologies like replicated and mirrored, partitioned and replicated partitioned. It also provides an overflow cache, eviction strategies, read-through and write-through capabilities, cache dependencies, event notifications and object query language facilities. For a complete list of features and details please visit http://www.alachisoft.com/ncache/index.html.
Download a 60 day trial enterprise/developer version or totally free NCache Express from www.alachisoft.com/download.html
Team NCache
How to do distributed memcache
How do I do distributed memcache in PHP? Using Memcache::addServer? I have tried it but can't get it to work :( Please advise, thanks.
It's a very nice article, thanks
It's a very nice article.
Thanks for sharing this very good and helpful topic and comments
Hi, whenever I am storing
Hi,
Whenever I store any data in the memcache server using the PHP Memcache client or MySQL UDFs, I am not able to retrieve the same value using a Java client.
I am using Danga's memcache client.
Can you tell me where the problem could be?
great article
Great article on memcached.
There is a small confusion in my mind.
"Once the bucket number has been calculated, the list of nodes for that bucket is searched, looking for the node with the given key."
When you say "node" here, you mean the actual element entries in the hash/bucket, right? "Node" has been used earlier to refer to a machine on the network, as in a web node. Hence the confusion. Thanks!
memcached is great !
hello ;]
It's a very nice article.
memcached is a very powerful module
for websites with high traffic.
A few days ago I installed it on my Counter-Strike forum.
Before memcached my server load was sometimes over 200 :( and the server was crashing.
Now the load is 1-20.
thx for memcached :)
thx for the great article :)
greetings, mosh
thanks a lot!
I've been looking for this kind of solution everywhere!
thanks a lot
Deployed a system using
Deployed a system using memcached through Hibernate. Pretty decent.
I'm still looking for a way to probe the memcached server. Is there a way you know of?
Looking into mixing MySQL for read-only tables and memcached
Thx for the article
Cheers
memcached on a single server
Hi folks,
Is there a good php class that you can suggest for memcached operations like
class memcacheoperations {
function insert($q) {
// insert into memcache and insert into mysql blah..
}
}
I know I can roll my own, but I'm too lazy these days.
Memcache class available in PECL
http://us2.php.net/manual/en/ref.memcache.php is what you're looking for.
Alternatives
Beyond the alternatives mentioned above, there are several others commonly used in the Java world, such as GigaSpaces, Terracotta and Tangosol, to name a few. Some of them provide .NET and C++ support as well.
I happen to represent GigaSpaces, so I can speak on its behalf.
GigaSpaces is used in many transactional systems today, in very large clusters, to store terabytes of data. It is used in mission-critical systems such as trading and pre-paid applications; that is, it is tuned to support extensive write operations, not just read-mostly workloads. It is also used for session replication and for scaling large websites.
A free version is available through our Community-Edition, for more information refer to this page.
Beyond that, we provide a complete platform that offers a solution for scaling the entire application using a scale-out server model.
You can read more on our site
http://www.gigaspaces.com
Nati S.
can Java client be used with 'C' memcached server
Can a Java client be used against the 'C' version of memcached server or do I have to run a Java version of the memcached server?
If you are using Java you
If you are using Java, you should probably look at one of the pure Java alternatives that I mentioned above.
Nati S.
I'm currently working for a
I'm currently working for a very big company with 200000000 accounts in the DB, and 6000000 hits per hour on special days.
The Java client with the C memcached works very well.
You can use Prevayler for jav
You can use Prevayler for Java.
It's persistent, auto-recoverable, etc.
Re: Distributed Caching with Memcached
Speaking to Tim's concern, the Linux Virtual Server contributor Li Wang has kindly implemented an open source TCP handoff implementation for the Linux kernel. If you take such source code further, you could conceivably ensure web requests transparently go to the right servers in the first place (to retrieve the content). There are papers detailing such tactics; one such paper by Eric Van Hensbergen is located here:
http://citeseer.ist.psu.edu/vanhensbergen02knits.html
question
I use memcached to store my count data, and start it with
memcached -d -m 2048 -p 9876
After this I use the PHP API to store my data. Here is the code:
<?php
include_once "MemCachedClient.inc.php";

$show = 0;
$options["servers"] = array("*.*.*.*:9876");
$options["debug"] = false;
$memc = new MemCachedClient($options);
$path = "dongdong";

// Read the counter, then either create it or replace it with value + 1.
for ($i = 0; $i < 10000; $i++)
{
    $get = $memc->get($path);
    if (!$get)
    {
        // A miss is only expected on the very first pass.
        if ($i != 0)
        {
            echo "error ".$i."\n";
            //continue;
            break;
        }
        $memc->set($path, "1");
        $show = 1;
    }
    else
    {
        echo $get."correct".$i."\n";
        $show = $get + 1;
        $memc->replace($path, $show);
    }
}
?>
All is OK, but when the loop reaches 1988, it breaks. I tried several more times and they all failed. Why?
Here is the error output:
MemCache: replace dongdong = 10952
sock_to_host(): Failed to connect to 192.168.241.109:9876
sock_to_host(): Host 192.168.241.109:9876 is not available.
sock_to_host(): Host 192.168.241.109:9876 is not available.
sock_to_host(): Host 192.168.241.109:9876 is not available.
Re: Distributed Caching with Memcached
Interesting, but could someone please fix the missing image?
http://www.linuxjournal.com/7451f1.png does not exist!
Thank You
distributed HASHing
You might be interested in the distributed hash stuff in Chord. I think it's related to what you're doing with memcached. You might be able to use those ideas to improve the two level hash.
Re: Distributed Caching with Memcached
Memcached's big claim is that it's faster than a database, which may well be true. But with no local caching, it certainly can't compete with a true distributed memory system like those commonly used in supercomputers. With memcached, if you have a data item which is needed on every web request, then that data item will be sent across the network from the same server on every request.
Memcached also has a lingering problem with slab reassignment. If your application uses one particular size class heavily, and doesn't use another size class, then writes to the unused size class (when they eventually occur) will fail. The daemon can't automatically recover memory from the other slabs for use in an empty slab. Similarly, it's not a true LRU cache: the item dropped will always be an item from the same slab. The lifetime of an item in the cache is skewed due to differing amounts of memory allocated to each slab after a restart.
At Wikipedia, we've also had perennial problems with writes failing, probably due to high load from other processes on the server leading to a connection timeout. This is unfortunate when the write is an important cache invalidation operation.
Tim Starling (Wikipedia developer)
At Wikipedia, we've also
"At Wikipedia, we've also had perennial problems with writes failing, probably due to high load from other processes on the server leading to a connection timeout. This is unfortunate when the write is an important cache invalidation operation."
Tim, you can use an In Memory Data Grid for exactly that purpose. In Memory Data Grids can be used as the system of record and therefore handle writes as well as reads, and do the synchronization with your database as a background process. I refer to this as Persistence as a Service (PaaS).
Nati S.
GigaSpaces
Write Once Scale Anywhere
in regards to the comments by Tim Starling
That's why you use something like Tangosol Coherence instead. It has various caching topologies like near-cache, replicated and partitioned, and has distributed locking when needed. Moreover, it implements a cache-loader mechanism and can be used as a write-back cache.
It's not free, and it's a Java-only product. It would be nice if the memcache developers looked at its feature set and used it as a roadmap.