Multi-core revolution
In 2001 IBM introduced the dual-core POWER4 processor; Sun and HP followed with dual-core designs based on the UltraSPARC IV and PA-RISC 8800. But these were high-end RISC processors for niche applications and drew little attention from the mass market. It was not until the second quarter of 2005, when Intel released x86-based desktop dual-core processors, that multi-core finally entered ordinary households.
Today multi-core processors account for an ever larger share of the market, and as front-line programmers we have to face the impact of the multi-core revolution. Multi-core programming brings both opportunities and challenges, and how to grasp the direction of this industry-wide change is a pressing question before us. Unlike an increase in clock frequency, the move from a single core to multiple cores is not transparent to the programmer: if our programs are not designed with multi-core in mind, they cannot gain any performance from the extra cores. Facing this new era of contention, do we have any past experience to draw on?
Yes. Humanity's greatest skill is the ability to draw on prior experience. We should learn from past experience, actively pick up parallel programming skills, and at the same time verify them carefully in real work and act prudently. A multi-core system, especially a dual-core one, is very similar to a dual-processor SMP (symmetric multi-processor) architecture:
Figure 1: Block diagrams of Intel and AMD dual-core CPUs
As Figure 1 shows, although Intel's and AMD's dual-core technologies differ, in both cases the so-called dual-core processor integrates two computing cores on a single chip. This is quite similar to an SMP system with two processors on one motherboard; the difference is that the two cores of a dual-core chip do not need to go through the front-side bus (FSB) to exchange data, whereas the two processors of a two-way SMP system exchange data over the FSB. This is a small detail worth keeping in mind when we write programs.
As with SMP programming, a program written for a multi-core processor must take the form of multiple threads or multiple processes before it can benefit from the extra cores. So our accumulated experience with SMP parallel programming can, for the most part, be applied directly to multi-core programming.
Changes
The multi-core era has had a great impact on how we think about programming. To take full advantage of multiple cores we must learn to decompose a design into blocks and write the program as multiple processes or multiple threads. Whether to use multiple processes or multiple threads is one of the questions that most puzzles programmers; I think it depends on the specific application, but in general multithreading has several advantages over multiprocessing for multi-core programming:
A) Threads are cheaper to create and switch between than processes.
B) Communication between threads is simpler and more efficient than between processes.
C) Multithreading is supported by a large number of libraries.
D) Multithreaded programs are easier to understand and modify than multiprocess programs.
Beyond the form the program takes, our motivation for using multiple threads has also changed. In the past, one of the main reasons a Windows programmer used threads was to improve the user experience: keeping the UI responsive during a long computation, or improving responsiveness during I/O or network operations. In the multi-core era we write multithreaded applications to exploit the multiple computing cores, either to reduce computation time or to finish more work in the same amount of time. In game programming, for instance, spreading collision detection across several CPU cores with multiple threads can greatly reduce the computation time; alternatively, the extra capacity can be spent on finer-grained detection to simulate more realistic collisions.
In the multi-core era we also need to be more careful when choosing a programming language. What follows is a personal opinion, but it is worth discussing for system, game, and even web developers. Whatever the project, compared with compiled languages such as C/C++ and Fortran, languages such as C#, Java, and Python may be the better choice. The reason is that these higher-level languages generally provide native support for multithreading: C# has System.Threading.Thread, Java has java.lang.Thread, and Python has threading.Thread. Compiled languages, by contrast, usually provide multithreading through platform-specific libraries such as the Win32 SDK or POSIX threads. The lack of a uniform standard means that writing multithreaded C/C++ programs requires attention to many more details and raises the cost of a project. Looking forward, although C/C++ has many users, the higher-level languages may well become more popular in the multi-core era: a small boat turns around quickly, and since these languages generally have no ISO standard they can change whenever they need to, so interpreters and compilers tuned for multi-core should appear soon. Scripting languages such as PHP, Ruby, and Lua, however, will find it harder to win over multi-core programmers, because they provide no kernel-level thread support, only user-level threads or no threads at all, so even a multithreaded program written in them cannot exploit the advantage of multiple cores.
Table 1: Comparison of language support for multithreading

|                                        | C/C++ and other compiled languages | C#/Java/Python and other scripting languages | PHP/Ruby/Lua and other scripting languages |
| Language-level multithreading support  | No            | Yes           | Yes |
| Library support for multithreading     | Yes           | Yes           | Yes |
| Kernel-level thread support            | Yes           | Yes           | No  |
| User-level thread support              | Can simulate  | Can simulate  | Yes |
| Complexity of threaded programming     | High          | Moderate      | N/A |
| Recommendation                         | Moderate      | High          | Low |
Although C/C++ loses some ground because it offers no language-level support for multithreading, the mainstream operating systems all expose C APIs for creating threads, and C/C++ has a very rich collection of libraries, which to some extent makes up for the deficiency of the language. A multithreaded program in C/C++ can be written not only with the Win32 SDK but also with POSIX threads, MFC, boost.thread, and so on. These libraries provide a degree of encapsulation and lighten the burden of multithreaded programming, but for a programmer whose goal is simply to speed up a compute-intensive program on a multi-core machine they are still too complicated: using them can nearly double the amount of key code, and the cost of debugging and testing rises accordingly. A better choice is OpenMP, which supports multithreading by strengthening the compiler rather than the library. With OpenMP, #pragma directives mark the code segments to be run in parallel, so relatively few changes to the program are needed; a serial version can also be built to make debugging easier, and the code coexists happily with compilers that do not support OpenMP, which simply ignore the directives.
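To make this concrete, here is a small sketch of my own (not from any particular project; the array, its size, and the per-element work are invented purely for illustration). A single #pragma line asks the compiler to share the loop iterations among threads, and a compiler that does not know OpenMP just ignores it and builds an ordinary serial loop:

#include <cmath>
#include <cstdio>

int main()
{
    const int N = 1000000;
    static double a[N];

    // One directive asks the compiler to split the iterations across threads.
    // Built without OpenMP support (no -fopenmp for GCC, no /openmp for the
    // Microsoft compiler) the pragma is ignored and the loop runs serially,
    // which is handy for debugging.
    #pragma omp parallel for
    for (int i = 0; i < N; ++i)
        a[i] = std::sin(i * 0.001);

    std::printf("a[N-1] = %f\n", a[N - 1]);
    return 0;
}

The same source file therefore builds both the serial and the parallel version, depending only on a compiler switch.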
Note, however, that even though these higher-level languages provide native multithreading at the language level, they are not far ahead of C/C++. The fundamental reason is that their base libraries of data structures and algorithms, just like the CRT, the STL, and other C/C++ base libraries, were designed and developed in serial form. Reworking the base libraries for multi-core is an urgent task almost every language has to face, a decisive battle both camps must fight to stay ahead. The C#/Java/Python camp, where each language is controlled by a single company or organization, is a small boat that turns around quickly and can be expected to win this battle first. This is one more reason why I recommend, as above, choosing one of these languages when writing new programs.
Multi-core programming
As time goes on we will all eventually have to design programs for multi-core systems. In my view multi-core programming is essentially the same as shared-memory parallel programming, so it can borrow from past experience in parallel programming: thinking in blocks, concurrent design methodology, and the various ways of expressing parallelism.
First, thinking in blocks. Because the thread is the smallest unit to which the operating system allocates CPU time, designing a parallel program for multiple cores means learning to decompose the design into blocks. Recall Mr. Hua Luogeng's story from the junior-high textbook: you want to make a pot of tea, but there is no boiled water; the kettle needs washing, and so do the teapot and cups; the fire is already lit and the tea leaves are at hand. What should you do?
Approach A: wash the kettle, fill it with cold water, and put it on the fire; while waiting for the water to boil, wash the teapot and cups and fetch the tea leaves; once the water boils, make the tea and drink it.
Approach B: do all the preparation first: wash the kettle, wash the teapot and cups, fetch the tea leaves; when everything is ready, fill the kettle and boil the water; sit and wait for it to boil, then make the tea and drink it.
Approach C: wash the kettle, fill it with cold water, put it on the fire, and sit waiting for the water to boil; once it has boiled, hurriedly fetch the tea leaves and wash the teapot and cups, then make the tea and drink it.
Which approach saves the most time? We can all tell at a glance that the first is best; the latter two waste time through idle waiting.
Now suppose Mr. Hua has two robots to make the tea for him; the best plan is obviously still the first one. Without noticing it, we have applied block thinking: hand unrelated tasks to different processors to execute separately. Here is another example we meet frequently at work: given a sequence A of elements of type T, count how many elements are equal to the value K. As a C++ function this can be written as:
Code 1: counting the elements of a sequence equal to K
template <typename T>
size_t Count(const T &K, const T *pA, int num)
{
    size_t cnt = 0;
    for (int i = 0; i < num; ++i)
        if (pA[i] == K) ++cnt;
    return cnt;
}
From Code 1 it is clear that Count(k, p, n) = Count(k, p, n/2) + Count(k, p + n/2, n - n/2): the number of elements equal to K in the whole sequence is the number in the first half plus the number in the second half. If we start two threads, one counting the first half (executing Count(k, p, n/2)) and the other counting the second half (executing Count(k, p + n/2, n - n/2)), then on a dual-core system we can cut the running time roughly in half (ignoring the cost of creating the extra thread).
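Here is a minimal sketch of that two-way split. It is my own illustration rather than code from the article: the article would more likely use the Win32 SDK or POSIX threads, while this sketch uses the standard C++ std::thread, and ParallelCount is simply a name invented here:

#include <cstddef>
#include <thread>

// Count from Code 1: number of elements in pA[0..num) equal to K.
template <typename T>
size_t Count(const T &K, const T *pA, int num)
{
    size_t cnt = 0;
    for (int i = 0; i < num; ++i)
        if (pA[i] == K) ++cnt;
    return cnt;
}

// Count the first half on a second thread and the second half on the
// current thread, then add the two partial results.
template <typename T>
size_t ParallelCount(const T &K, const T *pA, int num)
{
    size_t cnt1 = 0;
    std::thread t([&] { cnt1 = Count(K, pA, num / 2); });   // first half
    size_t cnt2 = Count(K, pA + num / 2, num - num / 2);    // second half
    t.join();
    return cnt1 + cnt2;
}

On a dual-core machine the two halves are counted at the same time, which is exactly the saving described above.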
Block thinking itself is fairly straightforward, but for a complex task whose blocks are not easy to identify we need a concurrent design methodology to guide us. Decades of parallel-programming research have produced a number of effective design methods; a classic one is the data-dependency graph (Figure 2). If there is no path in the graph from one task to another, the two tasks are independent and can be processed in parallel. If Mr. Hua makes the tea himself, the tasks inside the red dashed box of Figure 2 can run in parallel; if the two robots make the tea for him, and there are at least two taps for them to use, the tasks inside the green dashed box can run in parallel, which is more efficient still. The more reasonably the resources are used, the higher the parallel speed-up.
In a data-dependency graph, if the independent tasks perform the same operation on different elements of a data set, this form of independence is called data parallelism. A common case in scientific computing is multiplying an N-dimensional vector by a real value:
for (int i = 0; i < N; ++i) v[i] *= r;
If you have N processors, these N independent iterations can all execute at the same time. Besides data parallelism, if the independent tasks perform different operations on different elements of the data set, we speak of functional parallelism. And if the data-dependency graph has the shape of a simple path or chain, a single instance of the problem offers no parallelism, but when many instances must be processed and each can be divided into stages, instances in different stages can be worked on at the same time; this is called pipelining. For reasons of space, functional and pipeline parallelism cannot be described further here; interested readers can consult books on parallel programming.
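As one small illustration of functional parallelism (my own example; the array and the two operations are invented), two threads apply different operations, a minimum and a sum, to the same data at the same time, and neither depends on the other's result:

#include <cstdio>
#include <thread>

int main()
{
    const int N = 8;
    double v[N] = {3, 1, 4, 1, 5, 9, 2, 6};

    double minVal = v[0], sum = 0.0;

    // Functional parallelism: both tasks read the same data but perform
    // different operations and write to different results, so they can run
    // concurrently without synchronization.
    std::thread t1([&] { for (int i = 1; i < N; ++i) if (v[i] < minVal) minVal = v[i]; });
    std::thread t2([&] { for (int i = 0; i < N; ++i) sum += v[i]; });
    t1.join();
    t2.join();

    std::printf("min = %f, sum = %f\n", minVal, sum);
    return 0;
}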
With block thinking in mind and a concurrent design methodology to guide us, what is still missing is a way to actually write the parallel program. Three approaches are currently popular:
First, extend the compiler. Develop parallelizing compilers that can discover and express the parallelism hidden in existing serial programs; the Intel C++ Compiler, for example, can automatically parallelize and vectorize loops. This approach leaves the work to the compiler and lowers the cost of writing parallel programs, but because loops and branches combine in complex ways, the compiler fails to recognize much code that could run in parallel and compiles it into a serial version.
Second, extend a serial programming language. This is the most popular approach: function calls or directives are added that let the programmer express low-level parallelism, create and end parallel processes or threads, and synchronize and communicate between them. Prominent examples are the MPI and OpenMP libraries; in the interpreted camp, ParallelPython has also attracted many fans.
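For a flavor of this library-based style, here is a minimal MPI sketch (my own illustration, not from the article): the program stays a plain serial C++ program as far as the compiler is concerned, and all the parallelism is expressed through calls into the MPI library:

#include <cstdio>
#include <mpi.h>

int main(int argc, char *argv[])
{
    // The language itself knows nothing about parallelism; the library calls
    // start the processes, tell each one its rank, and shut everything down.
    MPI_Init(&argc, &argv);

    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   // which process am I?
    MPI_Comm_size(MPI_COMM_WORLD, &size);   // how many processes in total?

    std::printf("process %d of %d reporting\n", rank, size);

    MPI_Finalize();
    return 0;
}

Built with an MPI compiler wrapper such as mpicxx and launched with something like mpirun -np 4, this prints one line per process.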
Third, create a new parallel language. This may sound like a crazy idea, but people have in fact been doing it for decades; HPF (High Performance Fortran), for example, extends Fortran 90 with a variety of ways to support data-parallel programming.
Later on our journey into multi-core programming we will find that what has been described above is only a drop in the ocean: parallel computing holds much more knowledge worth learning, and much more room for us to realize our own ideas.
Old wine in new bottles

Although multi-core CPUs are becoming mainstream, they have not been around for long, and most existing applications were developed in the single-core era. How can those old programs shine again in the new environment? Here are a few of my ideas:
1) Assess carefully whether an old program needs changing at all. Desktop software like Foxmail or Windows Optimizer Master takes only a tiny share of CPU time, so there is no need to rewrite it for multi-core. CGI programs running on a web server already respond to requests in a multi-process or multi-threaded fashion, so they can smoothly take full advantage of a multi-core system and in general need no rewriting either.
2) Prefer light-weight changes to heavy surgery. An application's performance bottleneck is usually confined to a few features, or even just one or two; for such a program a drastic transformation, rewriting the whole application around a multithreaded architecture, is not appropriate. If the application is written in C/C++ or Fortran, using OpenMP to parallelize just the bottleneck code is a very good choice. If the code is written in C#, Java, Python or the like, the same job may take considerably more effort.
3) Do not chase trends. In short, if an old application has no performance bottleneck, do not change it at all, otherwise the change will backfire. Multimedia players such as Baofeng Storm audio/video or TTPlayer fall into this category: optimizing them for multi-core is unnecessary work, and if you do it anyway, users may simply feel the program has become more resource-hungry. On a dual-core system people still expect to do other things while playing audio or video, so such programs should keep the status quo rather than go dual-core for dual-core's sake.
In closing
The multi-core era will bring huge changes to programming. I have a few suggestions:
First, parallel computing rests on decades of groundwork by a considerable number of researchers, and there is a great deal of knowledge to learn and use. We should learn the complex but apply the simple: even something like MPI is worth understanding, but when building applications we should keep things as simple as possible. For the element-counting function of Code 1, for example, a better way to parallelize is to use OpenMP:
Code 2: parallelizing with OpenMP
// ... same as Code 1 up to here
size_t cnt = 0;
#pragma omp parallel for reduction(+: cnt)
for (int i = 0; i < num; ++i)
    if (pA[i] == K) ++cnt;
// ... etc.
A single added line of source code achieves the parallelism; compared with creating threads through the Win32 SDK or Pthreads this is not only far less code but also much easier to maintain.
Second, do not parallelize unless you must. We were all brought up on serial programming, and most programmers are used to writing serial programs; even after studying parallel programming, putting it into practice will inevitably cause trouble at first. So in real projects at this stage, do not parallelize unless it is necessary; a more appropriate way is to become familiar with parallel programming on non-critical parts of the system first, and only then apply it where there is a real need.
Third, treat parallelization as a last resort in optimization. Knowing when to parallelize is as important as knowing how to write parallel code. If, after every other optimization effort, the program still cannot satisfy your boss or your customers, you can try parallelizing the performance bottlenecks as the final choice.