Evaluating RTOS for DSP (reposted from the old site)
Evaluating Real-time Operating Systems for DSP - A comparison of system requirements
Hugh Griffiths, Lineo
Applications for digital signal processors (DSPs) are becoming more complicated, with the addition of system control and communications functions previously only found on microcontrollers. DSP developers need to meet these changes with up-to-date tool chain choices and commercial Real-Time Operating Systems (RTOS) optimised for DSP architectures.
Furthermore, the monolithic assembly language programs of yesterday do not meet the demands of today's time-to-market driven engineering schedules. DSP-based application software in today's market needs to be designed to run under the control of a real-time operating system (RTOS) in order to meet market demands for system flexibility, code reuse, and rapid project development & deployment.
Most RTOSes are based on microprocessor or microcontroller designs, and while that's not a bad heritage, it is unreasonable to expect them to offer the performance needed for DSP applications. Moreover, their architecture itself may make them unsuitable for use in DSP applications. A wiser course is to choose an RTOS specifically designed, tuned, and adapted to the application demands of digital signal processing.
The Loop versus Multi-Tasking
The traditional DSP view of an operating system - whether a customised, proprietary design or a commercial one - is a loop in which CPU control passes from one point to another, testing the enabling criteria of each process in turn. When a process is found to be ready, the loop allows execution of the associated body of code.
In such a design, the operating loop can use a lot of machine cycles testing the enabling criteria in each pass through the loop. Therefore these designs are hand-crafted to ensure that the performance of the application is preserved while operating the loop. While they normally require a minimum amount of code space and are fast, they are often too well-tuned to the particular application, requiring code changes when the application requirements change. The result is a one-two punch that increases risk and cost, two items to avoid if at all possible.
Figure 1: Traditional DSP Design
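The loop design described above can be sketched as a small Python model. This is not DSP code, and the function names (`run_loop`, `samples_ready`, `filter_block` and so on) are hypothetical; the sketch only shows that every pass tests every enabling criterion, whether or not any work is pending.

```python
def run_loop(processes, passes):
    """Poll each process's enabling criterion on every pass.

    processes: list of (is_ready, body) pairs of callables.
    Returns (tests_made, bodies_run) so the polling cost is visible.
    """
    tests_made = 0
    bodies_run = 0
    for _ in range(passes):
        for is_ready, body in processes:
            tests_made += 1          # every criterion is tested on every pass
            if is_ready():
                body()
                bodies_run += 1
    return tests_made, bodies_run


# Hypothetical application state: a count of pending sample blocks.
pending = {"samples": 0}
log = []

def samples_ready():
    return pending["samples"] > 0

def filter_block():
    pending["samples"] -= 1
    log.append("filtered")

def never_ready():
    return False

def idle_body():
    log.append("unused")

def demo():
    pending["samples"] = 0
    log.clear()
    processes = [(samples_ready, filter_block), (never_ready, idle_body)]
    first = run_loop(processes, passes=2)    # nothing pending: tests only
    pending["samples"] = 1                   # a block of data arrives
    second = run_loop(processes, passes=1)
    return first, second, list(log)
```

In `demo()`, the first two passes make four tests and run nothing. That is the polling cost which hand-tuning tries to keep small.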
Contrast this loop structure with the typical microprocessor/controller RTOS, which uses a multitasking structure instead of a loop. In a multitasking design, the application is decomposed into several tasks. Each task runs according to a scheduling policy, and the operating system is invoked only when a task makes a kernel call for some service.
The kernel call may run as a service routine, returning directly to the calling task. Or, it may block the calling task due to the unavailability of some resource or mutex needed by the calling task. In this case, CPU control passes to another task to run. The design allows most of the processor's cycles to be used by the tasks (the application) rather than by the scheduler. The operating system only takes processor cycles when the tasks call on it.
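The two outcomes of a kernel call described above (return directly to the caller, or block the caller and run another task) can be modelled with Python generators. This is a cooperative toy model under stated assumptions, not a preemptive RTOS; `run_tasks`, the `"take"`/`"give"` requests and the semaphore names are all hypothetical.

```python
from collections import deque

def run_tasks(tasks, semaphores):
    """Tiny cooperative kernel model.

    tasks: dict name -> generator yielding ("take", sem) or ("give", sem).
    semaphores: dict sem -> initial count.
    Returns a trace of how each kernel call was handled.
    """
    ready = deque(tasks.items())
    blocked = {sem: deque() for sem in semaphores}    # waiters per semaphore
    trace = []
    while ready:
        name, task = ready.popleft()
        try:
            while True:
                op, sem = next(task)       # task runs until its next kernel call
                if op == "take" and semaphores[sem] == 0:
                    blocked[sem].append((name, task))  # resource unavailable
                    trace.append((name, "blocked", sem))
                    break                  # CPU control passes to another task
                if op == "take":
                    semaphores[sem] -= 1
                elif op == "give":
                    if blocked[sem]:
                        # Hand the resource straight to the first waiter.
                        ready.append(blocked[sem].popleft())
                    else:
                        semaphores[sem] += 1
                trace.append((name, op, sem))   # service returns to the caller
        except StopIteration:
            trace.append((name, "done", None))
    return trace


def consumer():
    yield ("take", "data")     # blocks at first: no data has been produced
    yield ("give", "space")

def producer():
    yield ("take", "space")
    yield ("give", "data")

def demo():
    tasks = {"consumer": consumer(), "producer": producer()}
    return run_tasks(tasks, {"data": 0, "space": 1})
```

The kernel code runs only when a task yields a request, which mirrors the point that the operating system takes cycles only when the tasks call on it.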
One can readily see how classic RTOS architectures probably cannot support the performance requirements of DSP designs due to the overhead imposed by a multi-tasking architecture. Yet, as DSP processors take on more system control functions in newer designs, the capability of a loop architecture diminishes as it becomes clumsy and potentially costly. Ideally, a DSP RTOS must be able to allow both loop and multi-tasking behaviours to meet today's DSP application needs.
Figure 2: RTXCDSP Architecture
Beyond the RTOS Architecture - Checking Code Efficiency
When determining whether an RTOS has been optimised for the DSP, one of the first things to consider is the efficiency of its code. An effective RTOS for DSP can be written entirely in assembler, or in a mixture of assembler and a high-level language (such as C). In the latter case, however, C compilers for DSPs must be as well tuned for control code (such as an RTOS) as they are for numeric operations (such as FFTs).
After all, the main job of a DSP is number crunching, but the job of the RTOS is control, and some DSP compilers produce less than optimal code for control operations. Compiler tools such as those from Green Hills and Metrowerks have made significant strides in optimising both characteristics in a single code generator.
Cosmetic Changes Do Not Meet DSP Needs
Beware of an RTOS that is merely a microprocessor/controller RTOS recompiled for a DSP! While this will produce something that works for control applications, it's highly probable that it won't meet the performance needs of a DSP application.
It takes some lengthy work by the RTOS developer to make a microprocessor RTOS operate effectively in a DSP-based system. Consider that DSP applications often produce a high interrupt frequency, requiring servicing and processing responses with low latency. Moreover, data is often block-oriented, requiring that it be moved from producer to consumer in an efficient manner with little or no copying. It can also require a good deal of work by the application developer to re-work, modify, or otherwise optimise control code which DSP compilers don't handle well, and, if necessary, redesign it.
For example, in porting Lineo's microcontroller RTOS, RTXC, to Motorola DSPs, it was initially necessary to add post-compile processes to locate inefficient code produced by the DSP compiler and replace it with alternative code sequences more efficient for the operations of the RTOS. The result was a code space reduction of approximately 20% compared to the code emitted by the compiler, with a corresponding reduction in kernel service execution times. More mature compilers have eliminated the need to take such extraordinary steps, but the example serves to illustrate the lengths to which it is often necessary to go, in order to make an RTOS work well in the world of DSP.
With today's applications taxing DSP processors running at clock speeds in the hundreds of megahertz, it is not inconceivable that there will be fresher approaches to the problem of RTOS designs that can accommodate the seemingly disparate requirements that stem from digital signal processing and control processing. That is the approach Lineo took when designing its new RTOS for DSP, RTXCDSP.
Table 1: Process Scheduling
Operation | Cycles |
---|---|
Schedule DSP process from control process | 225 |
Schedule DSP process from an ISR | 63 |
Schedule DSP process from another DSP process | 82 |
Schedule a control process from another control process | 354 |
RTXCDSP incorporates a dual process scheduling model to meet the needs of both DSP and control applications. Table 1 shows a comparison of operation times required to schedule a DSP process and a control process on a Motorola StarCore processor.
The results show that the RTOS operations to schedule a DSP process from an interrupt service routine or from another DSP process require far fewer cycles (63 and 82) than scheduling one control process from another (354), translating into faster response times for DSP processes than for their control process counterparts.
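To put the Table 1 figures side by side, the short sketch below expresses each scheduling path as a fraction of the control-to-control path and converts cycles to time. The cycle counts come from Table 1; the 300 MHz clock used in the usage note is an assumption, since the article does not state the clock rate behind the table.

```python
# Cycle counts copied from Table 1 (Motorola StarCore).
TABLE_1_CYCLES = {
    "dsp_from_control": 225,
    "dsp_from_isr": 63,
    "dsp_from_dsp": 82,
    "control_from_control": 354,
}

def relative_cost(cycles, baseline="control_from_control"):
    """Express each scheduling path as a fraction of the baseline path."""
    base = cycles[baseline]
    return {name: count / base for name, count in cycles.items()}

def cycles_to_microseconds(cycle_count, clock_hz):
    """Convert a cycle count to microseconds at a given clock rate."""
    return cycle_count / clock_hz * 1e6
```

For example, `relative_cost(TABLE_1_CYCLES)["dsp_from_isr"]` is about 0.18, so scheduling a DSP process from an ISR costs under a fifth of a control-to-control schedule. At an assumed 300 MHz, `cycles_to_microseconds(63, 300e6)` is about 0.21 microseconds.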
Don't Ask the Wrong Question
While we have broached the subject, the next question to pursue concerns performance. Unfortunately, DSP developers typically ask the wrong performance questions, such as "How much overhead (in MIPS or percentage of CPU loading) does the RTOS take?"
This question stems partly from a mistaken notion of how today's operating systems function. As we saw earlier, the traditional DSP view of an operating system is a loop that runs continuously while looking for some process to perform. When the need arises it calls a routine associated with the need and executes the associated code. When the routine is complete, the loop continues looking for another processing need.
A typical microprocessor/controller RTOS uses a multitasking structure in which tasks get control of the CPU according to some scheduling policy implemented by the RTOS Scheduler. The normal mode, in contrast to the loop method, is for the CPU to be occupied with application processes. The RTOS kernel call may run as a service routine, returning directly to the calling application process (task). Or, the kernel may block the calling task due to the unavailability of a resource needed by the calling task, in which case the scheduler will pass CPU control to another task to run.
Most of the processor's cycles are used by the tasks. The operating system only takes processor cycles when the tasks call on it. So the RTOS' overhead isn't a fixed value; it depends on how much the application uses RTOS services. Therefore a better measure of RTOS overhead is the operational times (or cycles) of the kernel services used by the application.
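The usage-dependent measure suggested above amounts to summing, over the kernel services an application actually calls, the cycles per call times the calls per second. A minimal sketch follows; the function name `rtos_load` and the call rates in the usage note are hypothetical, while the per-call cycle counts are taken from Table 1.

```python
def rtos_load(service_cycles, calls_per_second, clock_hz):
    """Fraction of CPU cycles spent inside kernel services.

    service_cycles: dict service -> cycles per call.
    calls_per_second: dict service -> how often the application calls it.
    """
    used = sum(service_cycles[name] * rate
               for name, rate in calls_per_second.items())
    return used / clock_hz


# Per-call costs from Table 1.
SERVICE_CYCLES = {"dsp_from_isr": 63, "control_from_control": 354}

# Hypothetical application profile: call rates are made up for illustration.
CALL_RATES = {"dsp_from_isr": 25_000, "control_from_control": 1_000}
```

With these assumed rates on an assumed 300 MHz clock, `rtos_load(SERVICE_CYCLES, CALL_RATES, 300e6)` comes to roughly 0.0064, or about 0.64% of the processor. A different application profile on the same RTOS would give a different figure, which is why a single overhead number says little.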
Measure the Time that is Important
Another important consideration is the operating system's interrupt-response time and the time required to switch from one process to another. Many DSP-based systems depend on interrupts to signal when data is to be gathered from or distributed to the outside world, and perform the number-crunching as fast as possible the rest of the time. In contrast, a microprocessor/controller system typically handles far fewer interrupts. Calculating the response and process switch times lets you determine how many interrupts per second the RTOS can handle at a reasonable processor load.
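The calculation mentioned above can be written as a one-line formula: the sustainable interrupt rate is the clock rate times the share of the CPU you are willing to spend, divided by the cycles each interrupt costs. The function name and the figures in the usage note are assumptions for illustration only.

```python
def max_interrupt_rate(clock_hz, cycles_per_interrupt, load_budget):
    """Interrupts per second that fit within load_budget of the CPU.

    cycles_per_interrupt: response plus process-switch cost, in cycles.
    load_budget: fraction of the CPU (0.0 to 1.0) reserved for this work.
    """
    if cycles_per_interrupt <= 0:
        raise ValueError("cycles_per_interrupt must be positive")
    return clock_hz * load_budget / cycles_per_interrupt
```

For instance, with an assumed 300 MHz clock, a hypothetical cost of 200 cycles per interrupt and a 10% budget, `max_interrupt_rate(300e6, 200, 0.10)` gives about 150,000 interrupts per second.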
Of particular importance is the time required to save and/or restore the processor's context. During a context switch in multitasking, the RTOS stores the state of the current task, including the contents of its registers, so that it can resume the task later, and restores the registers of the task that is gaining control of the CPU.
A typical DSP has many registers that must be saved and restored during a full context switch. Motorola's MSC8101, for example, has about 68 different registers (about 272 bytes), while Motorola's DSP56300 family has almost 40 different registers but also an on-chip stack that requires some manipulation. As a worst case, saving all these registers during an interrupt or a context switch can quickly eat up a measurable percentage of processor cycles if steps aren't taken to mitigate the effects.
On a DSP56307 running at 100 MHz with an interrupt load of 25,000 interrupts per second, saving and restoring all 40 registers consumes about 2 percent of the available processor cycles (80 register moves × 25,000 interrupts per second ÷ 100,000,000 cycles per second × 100, taking one cycle per move). On an MSC8101 running at 300 MHz, the figure is about half that of the DSP56307, or 1.13%.
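The arithmetic behind these two percentages can be reproduced in a few lines. The sketch assumes one cycle per register save or restore, which is what the article's own figures imply rather than a datasheet value; the function name is hypothetical.

```python
def context_switch_load(registers, interrupts_per_second, clock_hz,
                        cycles_per_register_move=1):
    """Percentage of CPU cycles spent saving and restoring registers.

    Each interrupt saves and later restores every register, so it costs
    2 * registers moves. One cycle per move is an assumption implied by
    the article's arithmetic, not a datasheet figure.
    """
    cycles_per_interrupt = 2 * registers * cycles_per_register_move
    return cycles_per_interrupt * interrupts_per_second / clock_hz * 100
```

`context_switch_load(40, 25_000, 100e6)` gives 2.0 for the DSP56307 case, and `context_switch_load(68, 25_000, 300e6)` gives about 1.13 for the MSC8101. Saving fewer registers, or raising `cycles_per_register_move` for slower memory, shows how sensitive the figure is to the save/restore strategy.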
Since it's often necessary to do a full context switch while executing control code, the efficiency of the save/restore logic has a great effect on overall system throughput. The optimal configuration of registers to save and restore, and the method to do so, will vary with processor architecture and even applications.
Clearly, the DSP application that can't afford time spent in the save/restore cycle required in a multitasking RTOS will benefit greatly from an RTOS design that does not require the save/restore cycle, or at least minimises its effect.
Figure 3: RTXCDSP Architecture
Architecture Impact
Processor architectures can have other impacts on the operating system besides the number of registers that need saving. The use of separate I/O and code memory spaces, segmentation of memory into multiple data spaces, and handling of program stacks are all DSP architectural features that the RTOS must accommodate. It isn't uncommon for a DSP processor to have specialised hardware stack support using very fast internal memory. While such stack architectures can be efficient for traditional loop-based software designs, they're challenging to the developer of a multitasking RTOS, who must write the efficient stack-swapping code necessary for a modern RTOS using preemptive scheduling.
DSPs are indeed different from the typical microprocessor or microcontroller, and the DSP RTOS must adapt and accommodate the architectural quirks of the DSP processor. These quirks are part of the DSP's architecture and usually exist to make certain operations very efficient. But without the RTOS developer's careful attention to that level of detail, those efficiencies may be lost.
Let it Flow
Another efficiency feature to consider is how the RTOS handles block data transfers - an essential task in many DSP applications. Typically, a system works by gathering a block of data and then moving it through transform operations (for example, converting integer data to floating-point data, filtering, or compression) before performing the main number crunching.
In handling these transform operations, the RTOS should provide facilities to allow the application to touch the data blocks as little as possible, and with very low overhead. An RTOS that creates unnecessary copies of the data blocks to pass from process to process is a real performance-killer. One way of handling blocks much more efficiently is to offer pipes to flow data from one process to another, while being careful to note the processor's memory-handling schemes that can impact this method.
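The idea of a pipe that moves blocks without copying them can be illustrated with a small model. This is a sketch under stated assumptions, not the RTXCDSP pipe API: `BlockPipe` and `scale_in_place` are hypothetical names, and a real DSP implementation would pass pointers to buffers placed in a suitable memory space.

```python
from collections import deque

class BlockPipe:
    """FIFO that hands over references to data blocks instead of copies."""

    def __init__(self):
        self._blocks = deque()

    def put(self, block):
        self._blocks.append(block)     # stores a reference, no copy is made

    def get(self):
        """Return the oldest block, or None if the pipe is empty."""
        return self._blocks.popleft() if self._blocks else None


def scale_in_place(block, gain):
    """Example transform stage that touches the block once, in place."""
    for i in range(len(block)):
        block[i] *= gain
    return block


def demo():
    raw_to_filter = BlockPipe()
    block = [1, 2, 3, 4]               # producer fills a block
    raw_to_filter.put(block)
    received = raw_to_filter.get()     # consumer receives the same object
    scale_in_place(received, 2)
    return received is block, block
```

`demo()` confirms that the consumer works on the very block the producer filled, so the only per-block cost is the transform itself.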
Handling Legacy Code Effectively
Beyond raw performance, one of the most important features to consider is how the RTOS will handle legacy code that's present in many DSP designs. Legacy code includes routines previously developed as well as canned routines available for purchase from a vendor. There will be some 'clean sheet' projects that don't need to worry about legacy code, but many will.
The trouble with a lot of legacy code is that it was developed with the loop RTOS model in which the code simply runs until it's done, without provision for other operations that may be going on. With a modern, multitasking operating system the legacy code's operation is subject to suspension or pre-emption based on interrupts or kernel resource utilisation. Consequently, loop-based legacy code will not port easily to a pure multitasking RTOS system even if the RTOS is optimised for DSP designs.
For example, an operating system for DSP might need to provide a mechanism for protecting the operation of legacy code that has no built-in RTOS functions. Or, it could offer alternative ways of handling legacy modules, such as providing an additional, low-overhead scheduling mechanism for high-priority tasks. It may even be necessary to revise the legacy code to add services that permit its operation under the RTOS.
It Works
With so many concerns about using an RTOS in a DSP-based project, it may seem that it's not worth the effort. That feeling is especially common among experienced DSP developers, many of whom are charter members of 'The Society to Save a MIP.' It's true that an RTOS requires some amount of the available processor cycles, but that doesn't mean that a modern RTOS has no future in DSP applications.
Consider the situation faced by microprocessor and microcontroller application developers ten years ago. They, too, had similar concerns about the utility and need for an RTOS. But as projects became too complicated to finish under tight time constraints without looking at the development process in a new light, they gradually shifted to modern custom and commercial RTOS designs to simplify the process, improve maintainability and evolution, and achieve a high degree of code reuse.
The DSP world today faces the same issues. The real essence of the decision to use an RTOS for digital signal processing is, 'Does it do the job?' If it provides the needed capabilities, fit the processor to the application by adding the MIPS needed to tackle the extra overhead. The software shouldn't care because it is the RTOS' job to mask the hardware from the application code. It may mean giving up cherished beliefs about DSP application design, but it will make you richer as your projects become more successful.