Linux Kernel News - June 2013
As always, the Linux kernel community has been busy moving the Linux mainline toward another finish line and bumping the stable and extended releases to their next revisions with security and bug fixes. It is a steady and methodical evolution process which is intriguing to follow. Here is my take on the happenings in the Linux kernel world during June 2013.
Mainline Release (Linus's tree) News
Linus Torvalds released Linux 3.10. You can read what Linus Torvalds had to say about this release in his release announcement at http://lkml.indiana.edu/hypermail/linux/kernel/1306.3/04336.html
Two notable features in this release are improved SSD caching and better power management in the Radeon graphics driver.
Stable releases News
As of this writing,
- Current stable release is 3.9.8 (3.9.9-rc is released for testing).
- Longterm stable releases are 3.0.84 (3.0.85-rc is released for testing), 3.2.48, and 3.4.51 (3.4.52-rc is released for testing).
- Canonical continues its extended stable maintenance. Canonical keeps the kernel community engaged in the extended release maintenance by following the Linux development processes. As a result, these releases benefit from kernel developers contributing patches and reviewing and testing the release candidates. This, in my opinion, sets Canonical apart from other distributions.
- Extended stable 3.8.13.y: Kamal Mostafa from Canonical maintains this release series for Ubuntu 13.04 bug fixes and security updates.
- Extended stable 3.5.7.y: Luis Henriques from Canonical maintains this release series for Ubuntu 12.10 bug fixes and security updates.
- Linux RT
- 22.214.171.124-rt36 maintained by Steven Rostedt
- 3.8.13-rt12 maintained by Sebastian Andrzej Siewior.
Please note that bug fixes to stable releases funnel through Linus's mainline. Patches should be committed to Linus's mainline before stable release maintainers accept them into the stable releases, and patch change logs should include the mainline git commit ID.
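As an illustration of the commit ID requirement, here is a minimal sketch of a change log check. It assumes two reference styles that are commonly seen in stable backports, "commit <sha> upstream." and "[ Upstream commit <sha> ]", and it is not a tool the stable maintainers use; the function name is hypothetical.

```python
import re

# Two reference styles commonly seen in stable backports (assumed here,
# not an exhaustive list): "commit <sha> upstream." and
# "[ Upstream commit <sha> ]". A full SHA-1 is 40 hex digits.
UPSTREAM_REF = re.compile(
    r"^\s*(?:commit\s+([0-9a-f]{40})\s+upstream\.?"
    r"|\[\s*upstream commit\s+([0-9a-f]{40})\s*\])\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def find_mainline_commit(changelog):
    """Return the referenced mainline commit ID, or None if absent."""
    match = UPSTREAM_REF.search(changelog)
    if match is None:
        return None
    # Exactly one of the two groups is set, depending on the style matched.
    return (match.group(1) or match.group(2)).lower()
```

A change log that lacks such a line, or that abbreviates the commit ID, makes the function return None, which a submitter could treat as a reminder to add the full mainline reference.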
Power efficient scheduling design
Ingo Molnar (Red Hat, x86 maintainer), Morten Rasmussen (ARM, power mgmt.), Preeti U Murthy (IBM, scheduler), Rafael Wysocki (Intel, Linux PM, and Linux ACPI maintainer) and Arjan van de Ven discussed the proposed power-aware or power-efficient scheduler design and the best way to integrate it into the kernel.
Power management, and the ability to balance performance against power efficiency, is important and complex. It is not just about the scheduler or CPUs. It spans I/O devices that transition into lower-power states and how costly it is to bring them back to a fully active state when needed. There is latency involved in these transitions. As always, Linux developers reach consensus on complex problems such as these and come up with a path to the goal, taking small steps towards it. Here is another example of that process at work.
Power-efficient scheduler work has been active for a few months now. Several RFC patches have been floated and discussed. This work is being pursued very actively in x86 space by IBM and in ARM space by ARM. The premise is that, when the scheduling goal is power savings over performance, the scheduler could pack tasks on a few cores, keep those cores fully utilized, and transition the other cores to low-power states. In other words, instead of keeping all the cores active, the scheduler could consolidate tasks on a few cores and let the rest idle for better power efficiency.
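The packing idea can be illustrated with a toy model. The sketch below is not the kernel scheduler's algorithm or any of the RFC patches; it assumes task loads expressed as a percentage of one core and uses a simple first-fit-decreasing placement, with all function names hypothetical.

```python
def pack_tasks(task_loads, num_cores, capacity=100):
    """Place tasks on as few cores as possible (first-fit decreasing).

    task_loads: per-task utilization in percent of one core (assumed unit).
    Returns a list of per-core loads; a load of 0 means the core can idle.
    """
    cores = [0] * num_cores
    for load in sorted(task_loads, reverse=True):
        for i in range(num_cores):
            if cores[i] + load <= capacity:
                cores[i] += load
                break
        else:
            # No core has room left: fall back to the least-loaded core.
            least = min(range(num_cores), key=lambda i: cores[i])
            cores[least] += load
    return cores


def spread_tasks(task_loads, num_cores):
    """Performance-oriented placement: always pick the least-loaded core."""
    cores = [0] * num_cores
    for load in sorted(task_loads, reverse=True):
        least = min(range(num_cores), key=lambda i: cores[i])
        cores[least] += load
    return cores


def idle_cores(cores):
    """Count cores with no load, i.e. candidates for a low-power state."""
    return sum(1 for load in cores if load == 0)
```

With task loads of 30, 20, 25 and 25 percent on four cores, the packing policy fills one core and leaves three idle, while the spreading policy keeps all four cores busy. The model ignores wake-up latency and frequency scaling, which are the hard parts in practice.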
It is easier said than done. The scheduler sits at a higher level and would not be the best judge of when to transition CPUs to idle states or of the ideal frequency they should be running at. These decisions are complex and very hardware specific, so they are better left to platform drivers that have specific knowledge of the platform and architecture. In other words, a power-aware scheduler tuned to run well on x86 platforms will not work as well, or could fail miserably, on ARM platforms.
The scheduler has to accomplish load balancing as well as power balancing in a way that meets performance and power goals, and do it well on all platforms. A generic scheduler doesn't have to control and drive low-power state decisions on a platform. However, the goal of the power-efficient scheduler is to set higher-level, abstracted policies that would work on all platforms. After a long and productive discussion, there is a consensus and here is the summary:
- A new kernel configuration option, CONFIG_SCHED_POWER, to enable or disable the power scheduler feature. The power scheduler is totally inactive when CONFIG_SCHED_POWER is disabled, and fully active when CONFIG_SCHED_POWER is enabled. The important goal is evolving the power scheduler feature without disrupting and destabilizing the current scheduler.
- Work on a generic power scheduler with hardware and platform abstractions that will work well on ARM big.LITTLE, x86, and other platforms. Avoid platform-specific power policies that could lead to duplication of functionality in platform-specific power drivers.
Please check the Linux Foundation site for presentations made at the Linux Collaboration Summit back in April 2013 on this topic. Here is the link to Jonathan Corbet's blog on this topic.
Recursive routines allowed in the kernel?
Recursion often makes code simple; however, there is a risk associated with buggy, non-terminating recursion logic. When a recursive routine goes out of control and overflows the limited kernel stack, it could overwrite random kernel memory, which could result in disk and data corruption depending on the contents of the overwritten memory. The moral of the story, and my takeaway from a recent lkml discussion on an IOMMU patch, is
"Use recursion when it is absolutely needed and make sure it is not buggy and stays that way."
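One common way to follow that advice is to replace recursion with an explicit stack and a hard depth bound. The sketch below is user-space Python rather than kernel code, and it is not taken from the IOMMU patch under discussion; the tree layout, function names, and the `max_depth` value are assumptions made for illustration.

```python
def depth_recursive(node):
    """Depth of a nested-dict tree; call depth grows with tree depth."""
    children = node.get("children", [])
    if not children:
        return 1
    return 1 + max(depth_recursive(child) for child in children)


def depth_iterative(root, max_depth=64):
    """Same result with an explicit stack and a hard depth bound.

    max_depth is a hypothetical limit; exceeding it raises an error
    instead of silently consuming more stack.
    """
    deepest = 0
    stack = [(root, 1)]
    while stack:
        node, depth = stack.pop()
        if depth > max_depth:
            raise ValueError("tree deeper than max_depth")
        deepest = max(deepest, depth)
        for child in node.get("children", []):
            stack.append((child, depth + 1))
    return deepest
```

The iterative version keeps its bookkeeping in a heap-allocated list, so a malformed or unexpectedly deep input produces a clean error rather than a stack overflow.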
Alternatives to dmesg error reporting mechanism
Several kernel modules use dmesg for reporting errors and other information. It is a simple, easy-to-use mechanism that is always there with no additional work. However, it doesn't scale well, and messages can be hard to filter and scan for. There are more scalable alternatives better suited for reporting. Tracepoints and events are one such infrastructure: modules are using this method for new events, and individual modules are also converting their dmesg-based reporting to tracepoints. For certain platform modules, such as ACPI and EFI, the EDAC framework might be a better choice than tracepoints alone. From its humble beginnings in kernel-first error handling and reporting, EDAC has evolved to include firmware-first handling on platforms that support error detection and correction in firmware via APEI. When error detection and correction could span firmware and kernel, EDAC is a better choice as a reporting framework.
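The filtering difference can be sketched in user space. The example below does not use the kernel tracepoint or EDAC APIs; the message layout, the `ErrorEvent` fields, and the function names are all hypothetical and only contrast parsing free-form text with filtering structured records.

```python
import re
from collections import namedtuple

# Hypothetical structured event, loosely analogous to a trace event with
# named fields. The field names are illustrative only.
ErrorEvent = namedtuple("ErrorEvent", ["source", "severity", "address"])

# Hypothetical free-form message layout; real dmesg text varies per driver.
LOG_PATTERN = re.compile(
    r"^(?P<source>\w+): (?P<severity>corrected|fatal) error "
    r"at (?P<address>0x[0-9a-f]+)$"
)


def parse_log_line(line):
    """Recover an ErrorEvent from free text, or None if the wording differs."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    return ErrorEvent(
        match.group("source"),
        match.group("severity"),
        int(match.group("address"), 16),
    )


def fatal_events(events):
    """Filter structured events by field; no text parsing is needed."""
    return [event for event in events if event.severity == "fatal"]
```

Any change in the wording of a free-form message silently breaks the pattern, whereas a consumer of structured events only depends on the field names.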
In conclusion, the Linux development process might appear chaotic and ad hoc to casual observers. However, it is methodical and organized. With the 3.10 release out the door, it is now time to get started on integrating 3.11 content, taking it from rc-1 through rc-?, and continuing development work for 3.12 in parallel. The 3.12 release might include the first version of the power-aware scheduler, a few more modules might switch over to using the tracepoint infrastructure for events and errors, and the evolution and innovation continue.