Application Defined Processors
DEL-based processors could run Linux, but do they need to? Code segments within the Linux kernel certainly might see performance benefits from running on a DEL processor, and applications shipped in Linux distributions also could achieve higher performance. However, the role of an operating system, and of the kernel in particular, is to manage the hardware so that applications achieve their required performance levels. In other words, the OS is supposed to stay out of the way and let applications consume the hardware.
Applications do a lot more than intense computation. They interact with users, read and write files, display results and communicate with the world through Internet connections. Thus, applications require both computational resources and the services of an operating system. Heavy computation with high parallelism benefits from DEL processors; serial code, although it could run as DEL, is better served by a traditional microprocessor.
The best combination of hardware for running most applications is a mix of microprocessors and DEL processors. This combination allows applications to achieve orders-of-magnitude performance gains while still running in a standard Linux environment with all of the OS services and familiar support tools. The portions of an application that are predominantly sequential or that require OS services run on the traditional microprocessor portion of the system, while applications, and even portions of the OS, that benefit from DEL parallelism run on a closely coupled DEL processor.
SRC has created systems that are composed of DEL processors and microprocessors. SRC systems run Linux as the OS, provide a programming environment called Carte for creating applications composed of both microprocessor instructions and DEL, and support microprocessor and DEL processor hardware in a single system.
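To make this division of labor concrete, the sketch below shows the shape such an application takes: serial setup and I/O stay on the microprocessor, while a data-parallel kernel is the part a DEL compiler would turn into logic. This is plain C with illustrative names only; it is not Carte's actual API, which this article does not detail.

/* Hypothetical sketch of the split an SRC-style system encourages.
 * Serial setup and I/O run on the microprocessor; the data-parallel
 * kernel is the candidate for DEL execution. Names are illustrative,
 * not Carte's actual interface. */
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define N (1 << 20)

/* Stand-in for a kernel a DEL compiler would turn into hardware. */
void saxpy_kernel(const float *x, const float *y, float *out,
                  float a, size_t n)
{
    for (size_t i = 0; i < n; i++)      /* fully parallel: each       */
        out[i] = a * x[i] + y[i];       /* iteration is independent   */
}

int main(void)
{
    float *x = malloc(N * sizeof *x);
    float *y = malloc(N * sizeof *y);
    float *out = malloc(N * sizeof *out);
    if (!x || !y || !out)
        return 1;

    for (size_t i = 0; i < N; i++) {    /* serial setup: microprocessor */
        x[i] = (float)i;
        y[i] = 1.0f;
    }

    /* On a MAP-equipped system, this call site is where the compiled
     * DEL version of the kernel would be invoked instead. */
    saxpy_kernel(x, y, out, 2.0f, N);

    printf("out[42] = %f\n", out[42]);  /* serial I/O: microprocessor */
    free(x); free(y); free(out);
    return 0;
}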
The patented MAP processor is SRC's high-performance DEL processor. MAP uses reconfigurable components to accomplish control and user-defined compute, data prefetch and data access functions. This compute capability is teamed with very high on- and off-board interconnect bandwidth. MAP's multiple banks of dual-ported On-Board Memory provide 11.2GB/sec of local memory bandwidth. MAP is equipped with separate input and output ports with each port sustaining a data payload bandwidth of 1.4GB/sec. Each MAP also has two general-purpose I/O (GPIO) ports, sustaining an additional data payload of 4.8GB/sec for direct MAP-to-MAP connections or data source input. Figure 3 presents the block diagram of the MAP processor.
The Dense Logic Devices (DLDs) used in these products are the dual-processor Intel IA-32 line of microprocessors. These third-party commodity boards are then equipped with the SRC-developed SNAP interface. SNAP allows commodity microprocessor boards to connect to, and share memory with, MAPs and Common Memory nodes that make up the rest of the SRC system.
The SNAP interface is designed to plug directly into the microprocessor's memory subsystem rather than its I/O subsystem, allowing SRC systems to sustain significantly higher interconnect bandwidths. SNAP uses separate input and output ports, with each port currently sustaining a data payload bandwidth of 1.4GB/sec.
The intelligent DMA controller on SNAP can perform complex DMA prefetch and data access functions, such as data packing, strided access and scatter/gather, to maximize efficient use of the system interconnect bandwidth. For these operations, interconnect efficiencies more than ten times greater than those of a cache-based microprocessor using the same interconnect are common.
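To make the scatter/gather idea concrete, here is a minimal sketch of the descriptor-list style in which intelligent DMA engines are commonly programmed: software builds a chain of small transfer records, and the engine walks the chain, moving only the useful words across the interconnect. The struct layout and field names are assumptions for illustration, not SNAP's actual programming model.

/* Hedged sketch of a scatter/gather descriptor chain of the general
 * kind an intelligent DMA engine consumes. Field names and layout
 * are illustrative, not SNAP's actual register-level interface. */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct dma_desc {
    uint64_t src;      /* 64-bit source address                */
    uint64_t dst;      /* 64-bit destination address           */
    uint32_t len;      /* bytes to move for this segment       */
    uint32_t stride;   /* bytes between elements (0 = packed)  */
    uint64_t next;     /* address of next descriptor, 0 = end  */
};

/* Build a gather: collect 'count' words spaced 'stride' bytes apart
 * at src into a packed buffer at dst, one descriptor per segment. */
size_t build_gather(struct dma_desc *ring, size_t count,
                    uint64_t src, uint64_t dst, uint32_t stride)
{
    for (size_t i = 0; i < count; i++) {
        ring[i].src    = src + (uint64_t)i * stride;
        ring[i].dst    = dst + i * sizeof(uint64_t);
        ring[i].len    = sizeof(uint64_t);
        ring[i].stride = 0;
        ring[i].next   = (i + 1 < count)
                         ? (uint64_t)(uintptr_t)&ring[i + 1] : 0;
    }
    return count;
}

int main(void)
{
    struct dma_desc ring[8];
    /* Gather one word out of every 64 bytes: only the useful words
     * cross the interconnect, which is where the efficiency gain
     * over cache-line-granularity accesses comes from. */
    size_t n = build_gather(ring, 8, 0x10000000ULL, 0x20000000ULL, 64);
    printf("built %zu descriptors, first src=0x%llx\n",
           n, (unsigned long long)ring[0].src);
    return 0;
}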
SNAP can connect either directly to a single MAP or to SRC's Hi-Bar switch for system-wide access to multiple MAPs, microprocessors or Common Memory.
System-level configurations implement either a cluster of MAPstations or a crossbar switch-based topology. Cluster-based systems, as shown in Figure 4, use the microprocessor and DEL processor previously discussed in a direct-connected configuration. Although this topology fixes the affinity between a microprocessor and its DEL processor, it has the benefit of using standards-based clustering technology to create very large systems.
When more flexibility is desired, Hi-Bar switch-based systems can be employed. Hi-Bar is SRC's proprietary scalable, high-bandwidth, low-latency switch. Each Hi-Bar supports 64-bit addressing and has 16 input and 16 output ports to connect up to 16 nodes. Microprocessors, MAPs and Common Memory nodes can all be connected to Hi-Bar in any configuration, as shown in Figure 4. Each input or output port sustains a yielded data payload of 1.4GB/sec, for an aggregate yielded bisection data bandwidth of 22.4GB/sec per 16 ports. Port-to-port latency is 180ns, with Single Error Correction and Double Error Detection (SECDED) implemented on each port.
Hi-Bar switches also can be interconnected in multitier configurations, allowing two tiers to support 256 nodes. Each Hi-Bar switch is housed in a 2U-high, 19-inch-wide rack-mountable chassis, along with its power supplies and cooling solution, for easy inclusion into rack-based servers.
SRC servers that use the Hi-Bar crossbar switch interconnect can incorporate Common Memory nodes in addition to microprocessors and MAPs. Each of these Common Memory nodes contains an intelligent DMA controller and up to 8GB of DDR SDRAM. The SRC-6 MAPs, SNAPs and Common Memory nodes (CMs) support 64-bit virtual addressing of all memory in the system, allowing a single flat address space to be used within applications. Each node sustains memory reads and writes with 1.4GB/sec of yielded data payload bandwidth.
Like SNAP, the CM's intelligent DMA controller performs complex DMA functions such as data packing, strided access and scatter/gather, with the same better-than-tenfold interconnect-efficiency advantage over a cache-based microprocessor.
In addition, SRC Common Memory nodes have dedicated semaphore circuitry that also is accessible by all MAP processors and microprocessors for synchronization.
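A hardware semaphore of this kind is typically exposed to software as a memory-mapped register. The sketch below shows the usual acquire/release idiom under the assumption that a read of the register atomically returns its old value and claims the lock; the names and semantics are illustrative, not SRC's documented CM interface.

/* Minimal sketch of using a memory-mapped hardware semaphore.
 * Assumption (illustrative, not SRC's documented behavior): reading
 * the register returns its old value and atomically sets it to
 * LOCKED, so the reader that observes UNLOCKED owns the lock. */
#include <stdint.h>

#define SEM_UNLOCKED 0u
#define SEM_LOCKED   1u

/* volatile: every access must really reach the device register */
static void sem_acquire(volatile uint32_t *sem)
{
    while (*sem == SEM_LOCKED)
        ;   /* spin; the read that sees UNLOCKED has claimed the lock */
}

static void sem_release(volatile uint32_t *sem)
{
    *sem = SEM_UNLOCKED;    /* plain write hands the lock back */
}

int main(void)
{
    /* In a real system this would point into the CM node's mapped
     * semaphore region; an ordinary variable stands in here so the
     * sketch compiles and runs (it only ever contends with itself). */
    volatile uint32_t sem = SEM_UNLOCKED;

    sem_acquire(&sem);
    /* ... critical section: update a shared Common Memory structure ... */
    sem_release(&sem);
    return 0;
}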