Scaling dcache with RCU
Listing 2. Pathname Segment Lookup Rename-Race Resolution
 1 struct dentry *
 2 d_lookup(struct dentry *parent,
 3          struct qstr *name)
 4 {
 5     struct dentry *dentry = NULL;
 6     unsigned long seq;
 7
 8     do {
 9         seq = read_seqbegin(&rename_lock);
10         dentry = __d_lookup(parent, name);
11         if (dentry)
12             break;
13     } while (read_seqretry(&rename_lock, seq));
14     return dentry;
15 }
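The do/while loop in Listing 2 is the read side of a sequence lock. d_move() bumps rename_lock's count on entry and exit (lines 6 and 48 of Listing 5), so a lookup that misses can tell whether a rename overlapped it. The following is a rough userspace sketch of the counter behind read_seqbegin() and read_seqretry(), not the kernel's implementation. It assumes C11 atomics, uses the fence-based pattern commonly used for seqlocks in C11, and assumes writers are already serialized by another lock. The type toy_seqlock and all toy_* functions are hypothetical. In a genuinely concurrent program, the protected data would also need to be read and written through relaxed atomics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a seqlock.
 * Even count: no write in progress. Odd count: a writer is mid-update.
 * Writers are assumed to be serialized by some other lock, much as
 * dcache_lock serializes d_move(). */
struct toy_seqlock {
    atomic_uint seq;
};

static void toy_seqlock_init(struct toy_seqlock *sl)
{
    atomic_init(&sl->seq, 0u);
}

/* Reader: wait out any in-progress write, then snapshot the count. */
static unsigned toy_read_seqbegin(struct toy_seqlock *sl)
{
    unsigned s;

    while ((s = atomic_load_explicit(&sl->seq, memory_order_acquire)) & 1u)
        ;  /* odd: a writer is active, so spin */
    return s;
}

/* Reader: true if a writer ran since toy_read_seqbegin() returned start,
 * in which case whatever was read in between may be inconsistent. */
static bool toy_read_seqretry(struct toy_seqlock *sl, unsigned start)
{
    atomic_thread_fence(memory_order_acquire);  /* data reads before recheck */
    return atomic_load_explicit(&sl->seq, memory_order_relaxed) != start;
}

/* Writer: make the count odd before modifying the protected data... */
static void toy_write_seqlock(struct toy_seqlock *sl)
{
    atomic_fetch_add_explicit(&sl->seq, 1u, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);  /* count bump before data writes */
}

/* ...and even again once the update is complete. */
static void toy_write_sequnlock(struct toy_seqlock *sl)
{
    atomic_fetch_add_explicit(&sl->seq, 1u, memory_order_release);
}
```

A reader wraps its lookup in `do { s = toy_read_seqbegin(&sl); ... } while (toy_read_seqretry(&sl, s));`. Listing 2 adds one twist: a successful lookup breaks out at once, so only misses are retried.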
The d_free() function must defer freeing a given dentry until a grace period has elapsed, because any number of ongoing path walks might still hold pointers to that dentry. Listing 3 shows how this deferral works. After lines 3–4 invoke the filesystem's d_release() hook, if there is one, line 5 uses the call_rcu() primitive to postpone the destructive actions in the d_callback() function until a grace period has elapsed. The d_callback() function, shown in Listing 4, simply frees a long name stored separately from the dentry, if there is one (lines 5–7), and then frees the dentry itself on line 8.
Listing 3. Deferred-Free of dentry Structures
1 static void d_free(struct dentry *dentry)
2 {
3     if (dentry->d_op && dentry->d_op->d_release)
4         dentry->d_op->d_release(dentry);
5     call_rcu(&dentry->d_rcu, d_callback, dentry);
6 }
Listing 4. RCU Callback Function for dentries
1 static void d_callback(void *arg)
2 {
3     struct dentry *dentry = (struct dentry *)arg;
4
5     if (dname_external(dentry)) {
6         kfree(dentry->d_qstr);
7     }
8     kmem_cache_free(dentry_cache, dentry);
9 }
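d_free() and d_callback() rely on call_rcu() to postpone the actual free until no path walk can still be looking at the dentry. The following single-threaded model shows one classic way to decide when a grace period has ended. Each reader thread counts its quiescent states, meaning points at which it holds no RCU-protected pointers. A queued callback may run once every thread's count has moved past the snapshot taken when the callback was queued. The names toy_call_rcu(), toy_note_qs() and toy_process_callbacks(), the fixed thread count, and the fixed-size queue are all hypothetical. The kernel's grace-period machinery is concurrent and far more elaborate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_NTHREADS 3   /* hypothetical number of reader threads */
#define TOY_MAXCB    16  /* hypothetical callback-queue capacity */

/* A deferred callback plus each thread's quiescent-state count
 * at the moment the callback was queued. */
struct toy_cb {
    void (*func)(void *arg);
    void *arg;
    unsigned long snap[TOY_NTHREADS];
    bool pending;
};

static unsigned long toy_qs[TOY_NTHREADS];  /* per-thread quiescent-state counts */
static struct toy_cb toy_cbs[TOY_MAXCB];

/* Thread tid reports that it holds no RCU-protected pointers,
 * for example because it is between path walks. */
static void toy_note_qs(int tid)
{
    toy_qs[tid]++;
}

/* Queue func(arg) to run after a grace period; false if the queue is full. */
static bool toy_call_rcu(void (*func)(void *), void *arg)
{
    for (int i = 0; i < TOY_MAXCB; i++) {
        if (!toy_cbs[i].pending) {
            toy_cbs[i].func = func;
            toy_cbs[i].arg = arg;
            for (int t = 0; t < TOY_NTHREADS; t++)
                toy_cbs[i].snap[t] = toy_qs[t];
            toy_cbs[i].pending = true;
            return true;
        }
    }
    return false;
}

/* Run every callback whose grace period has ended: every thread must
 * have passed at least one quiescent state since it was queued.
 * Returns the number of callbacks invoked. */
static int toy_process_callbacks(void)
{
    int ran = 0;

    for (int i = 0; i < TOY_MAXCB; i++) {
        if (!toy_cbs[i].pending)
            continue;
        bool grace_period_over = true;
        for (int t = 0; t < TOY_NTHREADS; t++)
            if (toy_qs[t] == toy_cbs[i].snap[t])
                grace_period_over = false;  /* thread t may still hold a pointer */
        if (grace_period_over) {
            toy_cbs[i].pending = false;
            toy_cbs[i].func(toy_cbs[i].arg);
            ran++;
        }
    }
    return ran;
}
```

Passing the standard free() as the callback gives the same deferred-free shape as Listing 3: the free is queued at once, but the memory is released only after all three threads have reported a quiescent state.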
The d_move() function implements the dentry-specific portion of the rename system call, as shown in Listing 5. Line 5 acquires dcache_lock, excluding any other tasks attempting to update the dcache, and line 6 write-acquires rename_lock, which permits d_lookup() to determine that it has raced with a rename. Lines 7–13 acquire the per-dentry locks of both the dentry being renamed and its target, taking them in address order to avoid deadlock. Lines 14–17 remove the dentry from its old hash chain if it is still hashed and its bucket differs from the target's. A dentry that has already been unhashed skips the removal and jumps directly to line 19.
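Lines 7–13 of Listing 5 avoid deadlock by always taking the lower-addressed d_lock first. As a result, two tasks locking the same pair can never each hold one lock while waiting for the other. Below is a userspace sketch of the same rule using POSIX mutexes. lock_pair() and unlock_pair() are hypothetical helpers, and the equal-pointer check is an added safeguard that does not appear in Listing 5. ISO C does not define `<` between pointers to unrelated objects, so the sketch compares uintptr_t values instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Lock two mutexes in a global (address) order so that callers passing
 * the same pair in opposite argument order cannot deadlock. */
static void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    if (a == b) {                     /* same lock: take it once */
        pthread_mutex_lock(a);
        return;
    }
    if ((uintptr_t)a < (uintptr_t)b) {
        pthread_mutex_lock(a);
        pthread_mutex_lock(b);
    } else {
        pthread_mutex_lock(b);
        pthread_mutex_lock(a);
    }
}

/* Release order does not matter for deadlock avoidance. */
static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    if (a != b)
        pthread_mutex_unlock(b);
}
```

Whichever argument order a caller uses, the lower-addressed mutex is always locked first. That single global order is all the deadlock avoidance requires.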
Line 19 points the dentry at its new hash bucket, lines 20–21 add the dentry to that bucket, and line 22 updates the flags to indicate that the dentry is present in the dcache hash table. Line 24 removes the target dentry (the one being renamed over) from the dcache hash table, and lines 25–26 unlink both the moving and target dentries from their old parents' d_subdirs lists.
Line 27 changes the dentry's name, and line 28 enforces write ordering, so that the name change is ordered before the updates that follow it. The name change is nontrivial because short names are stored in the dentry itself, while longer names are stored in separately allocated memory. Lines 29–32 swap the name lengths and hash values of the two dentries. Lines 33–44 connect the dentry to its new parent and give the target the dentry's old parent, or make the target its own root if the dentry was a root. Finally, line 45 increments d_move_count so that __d_lookup() can detect races, and lines 46–49 release the locks.
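Storing names in two different ways affects both the name change in d_move() and the dname_external() check that d_callback() in Listing 4 performs before freeing. Listing 5 does not show switch_names(), so the case analysis in this sketch is an assumption. It gives dentry the target's name and ensures that every separately allocated buffer ends up owned by exactly one of the two structures, so nothing is leaked or freed twice. struct toy_dentry, TOY_INLINE_LEN and all toy_* functions are hypothetical. The sketch also ignores concurrent readers entirely, and with them the write ordering that line 28 enforces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_INLINE_LEN 16  /* hypothetical inline-name capacity, including NUL */

/* Short names live in iname; longer names point to a malloc()ed buffer. */
struct toy_dentry {
    char *name;
    size_t len;
    char iname[TOY_INLINE_LEN];
};

static bool toy_name_external(const struct toy_dentry *d)
{
    return d->name != d->iname;
}

static bool toy_set_name(struct toy_dentry *d, const char *s)
{
    size_t len = strlen(s);

    if (len < TOY_INLINE_LEN) {
        memcpy(d->iname, s, len + 1);     /* fits inline */
        d->name = d->iname;
    } else {
        char *buf = malloc(len + 1);      /* separately allocated */
        if (!buf)
            return false;
        memcpy(buf, s, len + 1);
        d->name = buf;
    }
    d->len = len;
    return true;
}

/* Give dentry the target's name. Afterward, only dentry's name is
 * meaningful; target is assumed to be discarded, as in d_move(). */
static void toy_switch_names(struct toy_dentry *dentry, struct toy_dentry *target)
{
    size_t tmp_len;

    if (toy_name_external(target)) {
        if (toy_name_external(dentry)) {
            char *tmp = dentry->name;     /* both external: swap buffers */
            dentry->name = target->name;
            target->name = tmp;
        } else {
            dentry->name = target->name;  /* dentry takes target's buffer */
            target->name = target->iname;
        }
    } else {
        /* Target's name is short: copy it inline. If dentry had an
         * external buffer, hand it to target so it is freed with target. */
        memcpy(dentry->iname, target->name, target->len + 1);
        if (toy_name_external(dentry))
            target->name = dentry->name;
        dentry->name = dentry->iname;
    }
    tmp_len = dentry->len;                /* like lines 29-30 of Listing 5 */
    dentry->len = target->len;
    target->len = tmp_len;
}

/* Mirrors d_callback(): free any external name, then the structure. */
static void toy_dentry_free(struct toy_dentry *d)
{
    if (toy_name_external(d))
        free(d->name);
    free(d);
}
```

In this toy version, whether the result lands inline or in a separate buffer depends only on the target's name. Each external buffer moves rather than being copied, which is why a single ownership check in toy_dentry_free() is enough.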
In theory, a sustained succession of rename operations, carefully designed to leave dentries in the same directory and in the same hash chain, could indefinitely stall a horribly unlucky lookup. For example, suppose a lookup is searching for the last element in a hash chain, and the second-to-last element is renamed (and thus moved to the head of the chain) each time the lookup reaches it. In practice, dcache hash chains are short and renames are slow. If such stalls do become a problem, though, it may be necessary to add code that stalls renames when a path walk fails. Another approach under consideration is to eliminate the global hash table entirely and instead modify the d_subdirs list so that it handles large directories gracefully.
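A deterministic, single-threaded simulation makes the stall scenario concrete. The hypothetical toy_chain below models one hash chain as an array of next indices. Moving a node to the head mimics hlist_del_rcu() followed by hlist_add_head_rcu(): the moved node ends up pointing at the chain's former first node, so a walker standing on it is sent back toward the start. toy_walk() plays both roles, interleaving an adversarial renamer with the lookup. All names here are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NIL      (-1)
#define TOY_MAXNODES 16   /* hypothetical limit on chain length */

/* One hash chain: next[i] is node i's successor; head is the first node. */
struct toy_chain {
    int head;
    int next[TOY_MAXNODES];
};

/* Build the chain 0 -> 1 -> ... -> n-1 (n must not exceed TOY_MAXNODES). */
static void toy_chain_init(struct toy_chain *c, int n)
{
    c->head = 0;
    for (int i = 0; i < n; i++)
        c->next[i] = (i + 1 < n) ? i + 1 : TOY_NIL;
}

/* Unlink x and re-insert it at the head. Unlinking leaves x's own next
 * pointer alone; re-insertion then points x at the old first node. */
static void toy_move_to_head(struct toy_chain *c, int x)
{
    if (c->head == x)
        return;
    int prev = c->head;
    while (c->next[prev] != x)
        prev = c->next[prev];
    c->next[prev] = c->next[x];   /* unlink x */
    c->next[x] = c->head;         /* x -> old first node */
    c->head = x;
}

static bool toy_is_second_to_last(const struct toy_chain *c, int x)
{
    return c->next[x] != TOY_NIL && c->next[c->next[x]] == TOY_NIL;
}

/* Search for node want. Whenever the walker stands on the second-to-last
 * node and renames remain, that node is moved to the head before the
 * walker follows its next pointer. Returns the number of nodes visited,
 * or -1 if want is never found. */
static int toy_walk(struct toy_chain *c, int want, int renames)
{
    int visited = 0;

    for (int cur = c->head; cur != TOY_NIL; cur = c->next[cur]) {
        visited++;
        if (cur == want)
            return visited;
        if (renames > 0 && toy_is_second_to_last(c, cur)) {
            toy_move_to_head(c, cur);
            renames--;
        }
    }
    return -1;
}
```

In a four-node chain, finding the last node takes 4 visits with no renames, 6 with one rename, and 14 with five. Each rename costs two extra visits here, so an unbounded supply of well-timed renames would keep the walk from ever finishing, while a short chain keeps the cost of each rename small.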
Listing 5. Renaming dentries
 1 void
 2 d_move(struct dentry *dentry,
 3        struct dentry *target)
 4 {
 5     spin_lock(&dcache_lock);
 6     write_seqlock(&rename_lock);
 7     if (target < dentry) {
 8         spin_lock(&target->d_lock);
 9         spin_lock(&dentry->d_lock);
10     } else {
11         spin_lock(&dentry->d_lock);
12         spin_lock(&target->d_lock);
13     }
14     if (dentry->d_vfs_flags & DCACHE_UNHASHED)
15         goto already_unhashed;
16     if (dentry->d_bucket != target->d_bucket) {
17         hlist_del_rcu(&dentry->d_hash);
18 already_unhashed:
19         dentry->d_bucket = target->d_bucket;
20         hlist_add_head_rcu(&dentry->d_hash,
21                            target->d_bucket);
22         dentry->d_vfs_flags &= ~DCACHE_UNHASHED;
23     }
24     __d_drop(target);
25     list_del(&dentry->d_child);
26     list_del(&target->d_child);
27     switch_names(dentry, target);
28     smp_wmb();
29     do_switch(dentry->d_name.len,
30               target->d_name.len);
31     do_switch(dentry->d_name.hash,
32               target->d_name.hash);
33     if (IS_ROOT(dentry)) {
34         dentry->d_parent = target->d_parent;
35         target->d_parent = target;
36         INIT_LIST_HEAD(&target->d_child);
37     } else {
38         do_switch(dentry->d_parent,
39                   target->d_parent);
40         list_add(&target->d_child,
41                  &target->d_parent->d_subdirs);
42     }
43     list_add(&dentry->d_child,
44              &dentry->d_parent->d_subdirs);
45     dentry->d_move_count++;
46     spin_unlock(&target->d_lock);
47     spin_unlock(&dentry->d_lock);
48     write_sequnlock(&rename_lock);
49     spin_unlock(&dcache_lock);
50 }