Scaling dcache with RCU
Listing 2. Pathname Segment Lookup Rename-Race Resolution
 1 struct dentry *
 2 d_lookup(struct dentry * parent,
 3          struct qstr * name)
 4 {
 5     struct dentry * dentry = NULL;
 6     unsigned long seq;
 7
 8     do {
 9         seq = read_seqbegin(&rename_lock);
10         dentry = __d_lookup(parent, name);
11         if (dentry)
12             break;
13     } while (read_seqretry(&rename_lock, seq));
14     return dentry;
15 }
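The retry loop in Listing 2 can be approximated in user space. The following is a minimal sketch, not the kernel's seqlock: every toy_ name is hypothetical, the counter uses default C11 atomics, and writers are assumed to be serialized by some other lock. It shows that a successful lookup is trusted immediately, while a failed lookup is retried only if a writer ran in the meantime.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy sequence lock (hypothetical): the counter is odd while a writer runs. */
struct toy_seqlock {
    atomic_uint seq;
};

/* Wait for an even counter (no writer active) and return it. */
static unsigned toy_read_begin(struct toy_seqlock *sl)
{
    unsigned s;

    while ((s = atomic_load(&sl->seq)) & 1u)
        ;
    return s;
}

/* True if a writer started or finished since toy_read_begin(). */
static bool toy_read_retry(struct toy_seqlock *sl, unsigned start)
{
    return atomic_load(&sl->seq) != start;
}

/* Writers are assumed to be serialized externally. */
static void toy_write_begin(struct toy_seqlock *sl)
{
    atomic_fetch_add(&sl->seq, 1u);
}

static void toy_write_end(struct toy_seqlock *sl)
{
    atomic_fetch_add(&sl->seq, 1u);
}

/* Hypothetical lookup callback: returns NULL on a miss. */
typedef void *(*toy_lookup_fn)(void *ctx);

/* Same shape as Listing 2: retry only on a miss that raced with a writer. */
static void *toy_lookup(struct toy_seqlock *sl, toy_lookup_fn fn, void *ctx)
{
    void *found;
    unsigned seq;

    do {
        seq = toy_read_begin(sl);
        found = fn(ctx);
        if (found)
            break;
    } while (toy_read_retry(sl, seq));
    return found;
}

/* Demo state: the first call misses while a simulated rename completes. */
struct toy_race {
    struct toy_seqlock *sl;
    int calls;
    int value;
};

static void *toy_racy_lookup(void *ctx)
{
    struct toy_race *r = ctx;

    if (r->calls++ == 0) {
        toy_write_begin(r->sl);   /* simulated rename during the lookup */
        toy_write_end(r->sl);
        return NULL;
    }
    return &r->value;
}
```

Passing toy_racy_lookup() to toy_lookup() causes exactly one retry: the first miss coincides with a counter change, so the loop runs again and the second attempt succeeds.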
The d_free() function must defer freeing of a given dentry until a grace period has elapsed, because any number of ongoing path walks might be holding references to that dentry. Deferment is accomplished in the d_free() function shown in Listing 3, where line 5 uses the call_rcu() primitive to defer the destructive actions in the d_callback() function until after a grace period has elapsed. The d_callback() function is shown in Listing 4; it simply frees large names stored separately (lines 5–7), if appropriate, then frees the dentry itself on line 8.
Listing 3. Deferred-Free of dentry Structures
1 static void d_free(struct dentry *dentry)
2 {
3     if (dentry->d_op && dentry->d_op->d_release)
4         dentry->d_op->d_release(dentry);
5     call_rcu(&dentry->d_rcu, d_callback, dentry);
6 }
Listing 4. RCU Callback Function for dentries
1 static void d_callback(void *arg)
2 {
3     struct dentry * dentry = (struct dentry *)arg;
4
5     if (dname_external(dentry)) {
6         kfree(dentry->d_qstr);
7     }
8     kmem_cache_free(dentry_cache, dentry);
9 }
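The deferral pattern of Listings 3 and 4 can be illustrated with a single-threaded, user-space sketch. This is not RCU: the toy_ functions and structures below are hypothetical stand-ins, and the "grace period" is simply whenever the caller decides to run the queued callbacks. The sketch only shows the division of labor, with the free request queuing a callback and the callback doing the destructive work later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's RCU callback head. */
struct toy_rcu_head {
    struct toy_rcu_head *next;
    void (*func)(void *arg);
    void *arg;
};

/* Callbacks waiting for the (simulated) grace period to end. */
static struct toy_rcu_head *toy_pending;

/* Queue func(arg) for later, mirroring the call_rcu() call in Listing 3. */
static void toy_call_rcu(struct toy_rcu_head *head,
                         void (*func)(void *), void *arg)
{
    head->func = func;
    head->arg = arg;
    head->next = toy_pending;
    toy_pending = head;
}

/* Pretend a grace period has elapsed; returns the number of callbacks run. */
static int toy_grace_period_elapsed(void)
{
    int ran = 0;

    while (toy_pending) {
        struct toy_rcu_head *head = toy_pending;

        toy_pending = head->next;   /* unlink first: callback may free head */
        head->func(head->arg);
        ran++;
    }
    return ran;
}

struct toy_dentry {
    char *ext_name;                 /* separately allocated name, or NULL */
    struct toy_rcu_head rcu;
};

static int toy_freed;               /* demo counter of freed dentries */

/* Counterpart of d_callback(): free the external name, then the dentry. */
static void toy_d_callback(void *arg)
{
    struct toy_dentry *d = arg;

    free(d->ext_name);              /* free(NULL) is a no-op */
    free(d);
    toy_freed++;
}

/* Counterpart of d_free(): nothing is freed until the callbacks run. */
static void toy_d_free(struct toy_dentry *d)
{
    toy_call_rcu(&d->rcu, toy_d_callback, d);
}
```

After toy_d_free() returns, the dentry is still valid memory; it is released only when toy_grace_period_elapsed() runs the queued callback.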
The d_move() function implements the dentry-specific portion of the rename system call, as shown in Listing 5. Line 5 excludes any other tasks attempting to update dcache, and line 6 permits d_lookup() to determine that it has raced with a rename. Lines 7–13 acquire the per-dentry lock of the file being renamed and its destination, ordered by address so as to avoid deadlock. Lines 14–17 remove the entry from its old location in the dcache hash table, if it has not been so removed already.
Line 19 updates the dentry to point to its new hash bucket, lines 20–21 add the dentry to its destination hash bucket, and line 22 updates the flags to indicate that the dentry is present in the dcache hash table. Line 24 removes the target dentry (the one being rename()ed over) from the dcache hash table, and lines 25–26 divorce the moving and target dentries from their old parents.
Line 27 changes the dentry's name, and line 28 enforces write ordering. The name change is nontrivial because short names are stored in the dentry itself, whereas longer names are stored in separately allocated memory. Lines 29–32 update the name length and hash value. Lines 33–44 connect the dentry to its new parent. Finally, line 45 updates the d_move_count so __d_lookup() can detect races, and lines 46–49 release the locks.
In theory, a sustained succession of rename operations carefully designed to leave dentries in the same directory and in the same hash chain could indefinitely stall horribly unlucky lookups. One way this stall could happen is if the lookup is searching for the last element in the hash chain and the second-to-last element is renamed repeatedly (and thus moved to the head of the list) just as the lookup reaches it. In practice, dcache hash chains are short and renames are slow. If these stalls become a problem, though, it may be necessary to add code to stall renames upon path-walk failure. Another approach being considered is to eliminate the global hash table entirely in favor of modifying the d_subdirs list so as to handle large directories gracefully.
Listing 5. Renaming dentries
 1 void
 2 d_move(struct dentry *dentry,
 3        struct dentry *target)
 4 {
 5     spin_lock(&dcache_lock);
 6     write_seqlock(&rename_lock);
 7     if (target < dentry) {
 8         spin_lock(&target->d_lock);
 9         spin_lock(&dentry->d_lock);
10     } else {
11         spin_lock(&dentry->d_lock);
12         spin_lock(&target->d_lock);
13     }
14     if (dentry->d_vfs_flags & DCACHE_UNHASHED)
15         goto already_unhashed;
16     if (dentry->d_bucket != target->d_bucket) {
17         hlist_del_rcu(&dentry->d_hash);
18 already_unhashed:
19         dentry->d_bucket = target->d_bucket;
20         hlist_add_head_rcu(&dentry->d_hash,
21                            target->d_bucket);
22         dentry->d_vfs_flags &= ~DCACHE_UNHASHED;
23     }
24     __d_drop(target);
25     list_del(&dentry->d_child);
26     list_del(&target->d_child);
27     switch_names(dentry, target);
28     smp_wmb();
29     do_switch(dentry->d_name.len,
30               target->d_name.len);
31     do_switch(dentry->d_name.hash,
32               target->d_name.hash);
33     if (IS_ROOT(dentry)) {
34         dentry->d_parent = target->d_parent;
35         target->d_parent = target;
36         INIT_LIST_HEAD(&target->d_child);
37     } else {
38         do_switch(dentry->d_parent,
39                   target->d_parent);
40         list_add(&target->d_child,
41                  &target->d_parent->d_subdirs);
42     }
43     list_add(&dentry->d_child,
44              &dentry->d_parent->d_subdirs);
45     dentry->d_move_count++;
46     spin_unlock(&target->d_lock);
47     spin_unlock(&dentry->d_lock);
48     write_sequnlock(&rename_lock);
49     spin_unlock(&dcache_lock);
50 }
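The address-ordered locking on lines 7–13 of Listing 5 can be sketched in isolation. The code below is a user-space illustration under stated assumptions: the toy_ spinlock built on C11 atomics is hypothetical, not the kernel's spinlock, and the guard for both arguments naming the same lock is an addition that Listing 5 does not contain. The point is that every caller takes the lower-addressed lock first, so two tasks locking the same pair in opposite argument order cannot deadlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy spinlock; true means the lock is held. */
struct toy_lock {
    atomic_bool held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
    /* Spin until this caller flips held from false to true. */
    while (atomic_exchange(&l->held, true))
        ;
}

static void toy_lock_release(struct toy_lock *l)
{
    atomic_store(&l->held, false);
}

/*
 * Acquire both locks, lower address first, regardless of argument
 * order.  Returns the lock that was taken first.  Addresses are
 * compared as integers because relational comparison of pointers to
 * unrelated objects is not defined by the C standard.
 */
static struct toy_lock *toy_lock_pair(struct toy_lock *a,
                                      struct toy_lock *b)
{
    struct toy_lock *first = ((uintptr_t)a < (uintptr_t)b) ? a : b;
    struct toy_lock *second = (first == a) ? b : a;

    toy_lock_acquire(first);
    if (second != first)            /* added guard, not in Listing 5 */
        toy_lock_acquire(second);
    return first;
}

/* Release order does not matter for deadlock avoidance. */
static void toy_unlock_pair(struct toy_lock *a, struct toy_lock *b)
{
    toy_lock_release(a);
    if (b != a)
        toy_lock_release(b);
}
```

Calling toy_lock_pair(&x, &y) and toy_lock_pair(&y, &x) acquires the two locks in the same order, which is the property d_move() relies on when it compares target and dentry.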