From Wikipedia, the free encyclopedia
Operating system: Cross-platform
License: GNU General Public License

MEncoder is a free command line video decoding, encoding and filtering tool released under the GNU General Public License. It is a sibling of MPlayer, and can convert all the formats that MPlayer understands into a variety of compressed and uncompressed formats using different codecs.[1]

MEncoder is included in the MPlayer distribution.




As it is built from the same code as MPlayer, it can read from every source which MPlayer can read, decode all media which MPlayer can decode and it supports all filters which MPlayer can use. MPlayer can also be used to view the output of most of the filters (or of a whole pipeline of filters) before running MEncoder. If the system is not able to process this in realtime, audio can be disabled using -nosound to allow a smooth review of the video filtering results.

It is possible to copy audio and/or video unmodified into the output file to avoid the quality loss caused by re-encoding. This is useful, for example, to modify only the audio or only the video, or to put the audio/video data unmodified into a different container format.
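As a rough illustration of the stream-copy case, the sketch below assembles a MEncoder argument list in Java. It relies on MEncoder's `-ovc copy` and `-oac copy` options, which select pass-through for video and audio; the class name, method name and file names are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds a MEncoder command line that copies both streams unmodified. */
final class MencoderCopyCommand {
    private MencoderCopyCommand() {}

    static List<String> build(final String input, final String output) {
        final List<String> args = new ArrayList<>();
        args.add("mencoder");
        args.add(input);
        args.add("-ovc"); // video "codec": copy the stream as-is
        args.add("copy");
        args.add("-oac"); // audio "codec": copy the stream as-is
        args.add("copy");
        args.add("-o");   // output file
        args.add(output);
        return args;
    }

    public static void main(final String[] unused) {
        // File names are placeholders.
        System.out.println(String.join(" ", build("input.avi", "output.avi")));
    }
}
```

The resulting list could be handed to `java.lang.ProcessBuilder`, assuming a `mencoder` binary is available on the search path.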

Since it uses the same code as MPlayer, it features the same large number of highly configurable video and audio filters to transform the video and audio streams. Filters include cropping, scaling, vertical flipping, horizontal mirroring, expanding to create letterboxes, rotating, brightness/contrast, changing the aspect ratio, colorspace conversion, hue/saturation, color-specific gamma correction, filters for reducing the visibility of compression artifacts caused by MPEG compression (deblocking, deringing), automatic brightness/contrast enhancement (autolevel), sharpness/blur, denoising filters, several ways of deinterlacing, and reversing telecine.

Frame rate conversions and slow-motion

Changing the frame rate is possible using the -ofps or -speed options and by using the framestep filter for skipping frames. Reducing the frame rate can be used to create fast-motion "speed" effects which are sometimes seen in films.

Doubling the frame rate of interlaced footage without duplicating or morphing frames is possible using the tfields filter to create two different frames from each of the two fields in one frame of interlaced video. This allows playback on progressive displays, while preserving the full resolution and framerate of interlaced video, unlike other deinterlacing methods. It also makes the video stream usable for framerate conversion, and creating slow-motion scenes from streams taken at standard video/TV frame rates, e.g. using cheap consumer camcorders. If the filter gets incorrect information about the top/bottom field order, the resulting output will have juddering motion, because the two frames created would be displayed in the wrong order.
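The arithmetic behind the slow-motion use case can be sketched as follows. Splitting every interlaced frame into two frames doubles the frame count per second of source material; if the result is then played back at the original rate, motion appears at half speed. The class and method names are hypothetical, and the 25 fps figure is only an example of a standard TV frame rate.

```java
/** Hypothetical sketch of the frame-rate arithmetic for field splitting. */
final class FieldRateMath {
    private FieldRateMath() {}

    /** Each interlaced frame holds two fields, so splitting them doubles the rate. */
    static double rateAfterFieldSplit(final double interlacedFps) {
        return interlacedFps * 2.0;
    }

    /** How many times slower the motion appears at the given playback rate. */
    static double slowMotionFactor(final double interlacedFps, final double playbackFps) {
        return rateAfterFieldSplit(interlacedFps) / playbackFps;
    }

    public static void main(final String[] unused) {
        System.out.println(rateAfterFieldSplit(25.0));    // 50.0
        System.out.println(slowMotionFactor(25.0, 25.0)); // 2.0
    }
}
```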

See also

  • MPlayer, the media player built from the same source code as MEncoder
  • FFmpeg, similar to MEncoder
  • MediaCoder, a media transcoding application for Windows OSs, uses MEncoder as one of its backends
  • Transcode, a command-line transcoding application for Unix-like OSs
  • MPlayer Wikibook: almost all decoding-related and filtering arguments are shared with MEncoder
  • RetroCode, a universal mobile content encoder/decoder


  1. ^ MPlayer and MEncoder: Status of codecs support. Retrieved on 2009-07-19.


Microsoft DirectX 9.0 DirectShow: Korean-language description



DirectShow

Microsoft DirectX 9.0


The Microsoft® DirectShow® application programming interface is a media streaming architecture for the Microsoft® Windows® platform. With DirectShow, applications can perform high-quality playback and capture of video and audio.

The DirectShow documentation is divided into the following topics.

Windows Development




This documentation provides info about developing applications and drivers for the Windows operating system. The Win32 and COM application programming interfaces (APIs) were designed primarily for development in C and C++, and support development for both 32- and 64-bit Windows. For more info, see these Dev Centers: Windows Desktop Development and Windows Hardware Development.

The .NET Framework also provides programming interfaces for developing Windows applications, components, and controls. These programming interfaces can be used with a variety of programming languages, including Visual Basic, C#, and C++. For more info, see the .NET Development documentation.

You can also develop Metro style apps. For more info, see Metro style app development.

In this section

Hilo: Developing C++ Applications for Windows 7

"Hilo" is a series of articles and sample applications that show how you can leverage the power of Windows 7, using the Visual Studio 2010 and Visual C++ development systems to build high performance, responsive rich client applications. Hilo provides both source code and written guidance to help you design and develop Windows applications of your own.

The series covers many topics, including the key capabilities and features of Windows 7, the design process for the user experience, and application design and architecture. Source code is provided so that you can see firsthand how the accompanying sample applications were designed and implemented. You can also use the source code in your own projects to produce your own rich, compelling applications for Windows 7. The Hilo sample applications are designed for high performance and responsiveness and are written entirely in C++ using Visual C++.

These articles describe the design and implementation of a set of touch-enabled applications that allow you to browse, select, and work with images. They will illustrate how to write applications that leverage some of the powerful capabilities that Windows 7 provides. You will see how the various technologies for Windows 7 can be used together to create a compelling user experience.

In This Section

Topic Description

Chapter 1: Introducing Hilo

The first Hilo sample application—the Hilo Browser—implements a touch-enabled user interface for browsing and selecting photos and images.

Chapter 2: Setting up the Hilo Development Environment

This article outlines how to set up a workstation for the development environment so that you can compile and run the Hilo Browser sample application.

Chapter 3: Choosing Windows Development Technologies

This article describes the rationale for the choice of the development technologies used to implement the Hilo applications.

Chapter 4: Designing the Hilo User Experience

This article describes the process and thoughts when developing the Hilo User Experience.

Chapter 5: The Hilo Common Library

This article introduces the Hilo Common Library, a lightweight object-oriented library that helps create and manage Hilo-based application windows and handle messages sent to them.

Chapter 6: Using Windows Direct2D

This article describes how hardware accelerated Direct2D and DirectWrite are used in the Hilo sample application.

Chapter 7: Using Windows Animation Manager

This article explores the Windows 7 Animation Manager, which handles the complexities of image changes over time.

Chapter 8: Using Windows 7 Libraries and the Shell

Files from many different locations can be accessed through a single logical location according to their type even though they are stored in many different locations.

Libraries are user defined collections of content that are indexed to enable faster search and sorting. Hilo uses the Windows 7 Libraries feature to access the user’s images.

Chapter 9: Introducing Hilo Annotator

This article describes the Hilo Annotator application, which allows you to crop, rotate, and draw on the photographs you have selected. Hilo Annotator uses the Windows Ribbon Control to provide easy access to the various annotation functions, and the Windows Imaging Component to load and manipulate the images and their metadata.

Chapter 10: Using the Windows Ribbon

This article examines the use of the Windows Ribbon control, which is designed to help users find, use, and understand available commands for a particular application in a way that’s more natural and intuitive than menu bars or toolbars.

Chapter 11: Using the Windows Imaging Component

In this article you will learn how the Windows 7 Imaging Component is used in the Hilo Browser and Annotator applications. The Windows 7 Imaging Component (WIC) allows you to load and manipulate images and their metadata. The WIC Application Programming Interface (API) has built-in component support for all standard formats. In addition, the images created by the WIC can be used to create Direct2D bitmaps so you can use Direct2D to change images.

Chapter 12: Sharing Photos with Hilo

In this article, we’ll describe how the Hilo applications have been extended to allow you to share photos via an online photo sharing site. To do this, Hilo uses the Windows 7 Web Services Application Programming Interface (WWSAPI). The Hilo Browser application has also been updated to provide additional user interface (UI) and touch screen features, and the Hilo Annotator application has been extended to support Windows 7 Taskbar Jump Lists. This chapter provides an overview of these new features.

Chapter 13: Enhancing the Hilo Browser User Interface

In the final version of Hilo, the Annotator and Browser applications provide a number of enhanced user interface (UI) features. For example, the Hilo Browser now provides buttons to launch the Annotator application, to share photos via Flickr, and touch screen gestures to pan and zoom images. In this chapter we will see how these features were implemented.

Chapter 14: Adding Support for Windows 7 Jump Lists & Taskbar Tabs

The Hilo Browser and Annotator support Windows 7 Jump Lists and taskbar tabs. Jump Lists provide the user with easy access to recent files and provide a mechanism to launch key tasks. Taskbar tabs provide a preview image and access to additional actions within the Windows taskbar. In this chapter we will see how the Hilo Browser and Annotator applications implement support for Windows 7 Jump Lists and taskbar tabs.

Chapter 15: Using Windows HTTP Services

The Hilo Browser application allows you to upload photos to the Flickr online photo sharing application. To do this, Hilo uses Windows HTTP Services. This chapter will explore how this library is used in the Hilo Browser to implement its photo sharing feature.

Chapter 16: Using the Windows 7 Web Services API

The Hilo Browser application allows you to share your photos via Flickr by using the Share dialog. The previous chapter showed how the Share dialog uses the Windows HTTP Services API to upload the selected photos to Flickr using a multi-part HTTP POST request. Before the photo can be uploaded the Hilo Browser must first be authenticated with Flickr by obtaining a session token (called a frob), and then authorized to upload photos by obtaining an access token. To accomplish these two steps, Hilo Browser uses the Windows 7 Web Services Application Programming Interface (WWSAPI) to access Flickr using web services. In this chapter we will explore how the Hilo Browser uses this library.


Additional Resources

This series is targeted at C++ developers. If you are a C++ developer but are not familiar with Windows development, you may want to check out the Learn to Program for Windows in C++ series of articles.

These articles are available as a single, downloadable PDF.

The code of the sample Hilo applications is available in the Hilo project on MSDN Code Gallery.

How to Write a Windows Application with Hilo



“Hilo” is a series of articles and sample applications that show how you can program for Windows 7 using Visual Studio 2010 and Visual C++. The end goal of this series is to enable developers to create high performance client applications that are responsive and attractive. Hilo provides both source code and the written guidance that will help you design and develop compelling, touch-enabled Windows applications of your own. If you want to read the documentation offline, download the whitepaper for Developing Hilo from the Hilo Code Gallery page.

A series of articles are now on MSDN that describe the design and implementation of a set of touch-enabled Windows applications that allow you to browse, select, and work with photos and images. The articles cover key Windows 7 technologies, describe how they are used together to create a compelling user experience, and detail the design and implementation of the applications themselves. You can find the Hilo articles on the MSDN page Hilo: Developing C++ Applications for Windows 7 and you can find the Hilo sample code on the Hilo Code Gallery page. The first article provides an overview of Hilo and describes the goals of the articles and sample applications in the series and the rest go into details of the design and coding practices used throughout the release.

The Hilo article series, along with the sample application source code, is intended to jumpstart your development and show you how to take advantage of key Windows capabilities in your own applications.

The following sections will describe the Hilo applications as they become available.

The Hilo Browser

This first release of Hilo contains the source code for the Hilo Browser application. This application implements an innovative carousel-style navigation user interface. It’s touch-enabled so you can quickly browse and select images using touch gestures. Download the source code for the Hilo browser.

Read the Hilo articles on MSDN to learn more about how to program for Windows with Hilo.

The Hilo Annotator

The Hilo sample applications allow you to browse, annotate, and share photographs and images. The latest Hilo source code release includes the Hilo Annotator application, which allows you to crop, rotate, and draw on the photographs you have selected. Download the source code for the Hilo Annotator.

The Hilo Annotator uses the Windows Ribbon Control to provide easy access to the various annotation functions, and the Windows Imaging Component to load and manipulate the images and their metadata.

Share Your Photos With Hilo

The Hilo sample applications allow you to browse, annotate, and share photographs and images. Previous articles in this series have described the design and implementation of the Hilo Browser, which allows you to browse and select images using a touch-enabled user interface, and the Hilo Annotator, which allows you to crop, rotate, and draw on the photographs you have selected. Now, in the third release of Hilo, we've added support for sharing your photo via Flickr! You can download the source code for the Hilo applications from here.

We've also added support for Windows 7 taskbar and jump lists!

Windows SDK Archive



Windows SDK for Windows 7 and .NET Framework 3.5 SP1

Released in August 2009, this SDK provides Windows 7 headers, libraries, documentation, samples, and tools (including VS 2008 SP1 C++ compilers) to develop applications for Windows 7, Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, and .NET Framework versions 2.0, 3.0, and 3.5 SP1. It is available in ISO and Web setup form.

Windows SDK for Windows Server 2008 and .NET Framework 3.5

Released in February 2008, this SDK provides the documentation, samples, header files, libraries, and tools (including C++ compilers) that you need to develop applications to run on Windows Server 2008 and the .NET Framework 3.5. To build and run .NET Framework applications, you must have the corresponding version of the .NET Framework installed.

Microsoft Windows SDK Update for Windows Vista and .NET Framework 3.0

Released in February 2007, this SDK provides documentation, samples, header files, libraries, and tools you need to develop applications that run on Windows. This release of the SDK supplies updated compilers and documentation. The updated compilers are the same ones that recently shipped in Visual Studio 2005 Service Pack 1. This SDK also includes the samples, tools, headers, and libraries that shipped in the Windows SDK for Vista in November, 2006.

Learn Windows: Audio and Video Development


Get Started with Windows Audio and Video Development

Get Started with Media Foundation
Media Foundation enables you to produce high-quality multimedia experiences. This link will lead you to MSDN documentation on what Media Foundation is and has a high-level overview of the developer content. Click the following link to learn more about supported media formats in Media Foundation.

Get Started with Core Audio
Core Audio is the preferred way to add support for low-latency, well-featured, and glitch-resistant audio in the latest Windows operating systems. This link will lead you to an overview of Core Audio highlighting its features.

Getting Started with DirectShow
DirectShow is an older end-to-end media pipeline, which supports playback, audio/video capture, encoding, DVD navigation and playback, analog television, and MPEG-2. This link will lead you to the DirectShow introduction, which walks you through the process of developing DirectShow applications. Click the following link to learn more about supported media formats in DirectShow.

Windows SDK
The Microsoft Windows SDK is a set of tools, code samples, documentation, compilers, headers, and libraries that developers can use to create applications that run on Microsoft Windows operating systems using native (Win32) or managed (.NET Framework) programming models. Download the Windows SDK to add support for Windows media formats and develop Windows applications.

Integrate Windows Audio and Video into your Application

Media Foundation Programming Guide
Read the developer documentation covering the best practices and key developer scenarios for developing applications that feature the latest Windows multimedia APIs.

Core Audio Programming Guide
Read the developer documentation covering developer scenarios for creating applications that feature the latest Windows audio APIs.

DirectShow Programming Guide
Read the developer documentation covering DirectShow.

Media Foundation SDK Samples
Gives descriptions of the Media Foundation samples that ship with the Windows SDK.

SDK Samples that use the Core Audio APIs
Lists the samples that ship with the Windows SDK and demonstrate how to use Core Audio.

DirectShow Samples
Lists the various samples that ship with the Windows SDK and demonstrate how to use DirectShow.

Learn More about Windows Audio and Video Development

Media Foundation Development Forum
Discuss development of multimedia applications.

Windows Pro-Audio Application Development Forum
Discuss topics and ask questions related to DAWs, Plug-Ins, Soft-Synths, and more.

DirectShow Development Forum
Discuss the use of the DirectShow API.

Other Resources

The Green Button

This site is the official Windows Media Center community portal. It contains developer forums for Windows Media Center and Microsoft TV Technologies.

Windows Media Licensing Administration (WMLA) License Request Form

Use this form to request licenses for Windows Media technologies, including PlayReady and Windows Media Rights Manager SDK.

Windows Media Foundation Team Blog

This blog provides in-depth information about Media Foundation programming.

Windows Media

Using MFTrace to Trace Your Own Code
Now that you know how to trace Media Foundation and analyze those traces to figure out what Media Foundation is doing, the next step is to figure out what your own code is doing. That means adding traces for MFTrace to your code. The simplest way to...

Automating Trace Analysis
In our last post, we discussed how to examine Media Foundation traces manually, using TextAnalysisTool.NET. This time, we present a few scripts to automate the process and identify problems more quickly. Getting Started First, install Perl, Graph...

Analyzing Media Foundation traces
As we mentioned in our previous blog post, log files generated by MFTrace quickly become huge, and it can be difficult to sort out interesting traces from background noise. This blog post will share a few tips to help analyze those log files. TextAn...
Develop great Metro style apps for Windows 8

Download the Windows 8 Release Preview

Experience the newest version of Windows and see for yourself how apps are at the center of the Windows 8 experience.


Download the tools and SDK

Get the tools to build Metro style apps for Windows 8. Our free download includes Microsoft Visual Studio Express 2012 RC for Windows 8 and Blend for Visual Studio to help jumpstart your project.

Explore the documentation

Our docs are optimized to make you more productive. Discover everything you need to plan, build, and sell great apps.

Read the developer guide

Windows 8 Release Preview has many powerful features for developers. Discover the new features for Desktop, Web, and Metro style app developers.

Build apps with the experts at Windows Dev Camps

Dev Camps are free events that bring together developers like you to learn more about building apps. Learn interactively and get advice from expert app developers in hundreds of locations around the world.

DirectX Developer Center



Windows Developer Center – Developing Games
The DirectX Developer Center will soon become part of the Windows Development Center. Over the next few months, expect most of the...

GDC 2012 Presentations Now Available
Presentations from the Microsoft Developer Day at the Game Developers Conference 2012 have been posted to the Microsoft Download ...

Where is the DirectX SDK?
As of the Windows 8 Developer Preview, the DirectX SDK is now part of the Windows SDK. For more details on the transition, visit t...
3D Space Flight Demo for Android OS, with Accelerometer-Based Controls



By | 8 Jun 2012 | Article
Simulation of flight in 3D space using OpenGL. The accelerometer-based UI is of particular interest.
Prize winner in Competition "Best Mobile article of May 2012"


Figure 1: Image of the program running on a Motorola Droid Bionic XT875


This article describes a modest 3D game ("Demofox") written for devices running the Android OS. While the scope of this game is limited (it has two levels and no enemies to fight), the code provided does address some difficult issues effectively, particularly in the area of user input. 

The objective of game play is to direct a spaceship forward through a continuous series of obstacles (rectangular prisms) without contacting them. Forward flight (positive motion in the "Z" dimension) takes place at all times. The user is able to effect position changes in the X and Y dimensions, within limits. The perspective shown to the user is first person, with the spaceship itself not visible.

Figure 2 below shows a model of the game world. For clarity, this figure is rendered using parallel projection, whereas the game itself uses perspective projection with a vanishing point. As shown, movement is constrained to the interior of a long, tunnel-like rectangular solid of indefinite length in the "Z" dimension.
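The movement rule described so far (constant forward motion in Z, with X and Y confined to the tunnel) can be sketched in Java roughly as follows. This is not the Demofox source: the class name, the tunnel half-extents and the forward speed are all assumed values chosen for illustration.

```java
/** Hypothetical sketch: constant forward motion with X/Y confined to a tunnel. */
final class TunnelMotion {
    // Assumed values, not taken from the Demofox source.
    private static final float HALF_WIDTH = 7.0f;
    private static final float HALF_HEIGHT = 7.0f;
    private static final float FORWARD_SPEED = 1.0f; // units per second along +Z

    private static float x;
    private static float y;
    private static float z;

    private TunnelMotion() {}

    static void reset() {
        x = 0f;
        y = 0f;
        z = 0f;
    }

    /** Applies a user-requested X/Y change, then advances along Z. */
    static void step(final float dx, final float dy, final float seconds) {
        x = clamp(x + dx, -HALF_WIDTH, HALF_WIDTH);
        y = clamp(y + dy, -HALF_HEIGHT, HALF_HEIGHT);
        z += FORWARD_SPEED * seconds; // forward flight never stops
    }

    static float getX() { return x; }
    static float getY() { return y; }
    static float getZ() { return z; }

    private static float clamp(final float value, final float min, final float max) {
        return Math.max(min, Math.min(max, value));
    }
}
```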

The obstacles faced by the user are rectangular prisms occupying discrete channels. Prisms reside completely within these channels, of which there are 16 in Figure 2. Though this is not shown in the model, prisms can touch one or more other prisms; more formally, the space between such prisms is zero.

The universe within which movement takes place in the model figure below is four prisms wide by four prisms high. In the real program as supplied, the universe is defined by a 14 x 14 matrix of channels, each of which contains random prisms of constant size, interspersed with the open areas through which the player must direct the spaceship as it moves forward.

Figure 2: Model of game universe, rendered using parallel projection
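One way to picture the 14 x 14 matrix of channels is as a three-dimensional occupancy grid, with each channel divided into equally sized slots along Z that either hold a prism or are open. The sketch below takes that view; the slot count, the density parameter and the seeded generator are assumptions for illustration, and nothing here guarantees that a passable route exists, which a real game would need to consider.

```java
import java.util.Random;

/** Hypothetical sketch: random prisms of constant size in a 14 x 14 grid of channels. */
final class ObstacleField {
    static final int CHANNELS_PER_SIDE = 14;

    private ObstacleField() {}

    /**
     * Returns occupied[column][row][slot]; true means a prism fills that slot.
     * density is the probability (0.0 to 1.0) that any given slot is filled.
     */
    static boolean[][][] generate(final int slotsPerChannel, final double density, final long seed) {
        final Random random = new Random(seed);
        final boolean[][][] occupied =
                new boolean[CHANNELS_PER_SIDE][CHANNELS_PER_SIDE][slotsPerChannel];
        for (int column = 0; column < CHANNELS_PER_SIDE; column++) {
            for (int row = 0; row < CHANNELS_PER_SIDE; row++) {
                for (int slot = 0; slot < slotsPerChannel; slot++) {
                    occupied[column][row][slot] = random.nextDouble() < density;
                }
            }
        }
        return occupied;
    }

    /** Counts filled slots; handy for checking how crowded a level is. */
    static int countPrisms(final boolean[][][] occupied) {
        int count = 0;
        for (final boolean[][] column : occupied) {
            for (final boolean[] channel : column) {
                for (final boolean filled : channel) {
                    if (filled) {
                        count++;
                    }
                }
            }
        }
        return count;
    }
}
```

A second level with more obstacles would then simply use a higher density value.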

As the user proceeds successfully forward, the score shown in the text control immediately above the main game display increments. When the user contacts an obstacle, the ship is pushed backwards slightly, and the score resets to zero. At this point, if the user has exceeded the previous high score, three short vibrations are emitted by the device as a notification mechanism. The high score resets to zero only when the device is powered off, or if the "Demofox" task is forcibly terminated. It is displayed at the very top of the game UI.
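The scoring rules in this paragraph amount to a small state machine, sketched below in plain Java so that it runs without the Android SDK. The class and method names are hypothetical; the collision handler only reports whether the high score was beaten and leaves the vibration itself to the caller.

```java
/** Hypothetical sketch of the score and high-score rules described above. */
final class ScoreKeeper {
    private static int score;
    private static int highScore; // lives only as long as the process does

    private ScoreKeeper() {}

    static void reset() {
        score = 0;
        highScore = 0;
    }

    /** Called as the ship moves forward successfully. */
    static void onProgress(final int points) {
        score += points;
    }

    /**
     * Called when the ship touches an obstacle. The score resets to zero.
     * Returns true when the previous high score was exceeded, which is the
     * moment the caller would emit the three short vibrations.
     */
    static boolean onCollision() {
        final boolean beatHighScore = score > highScore;
        if (beatHighScore) {
            highScore = score;
        }
        score = 0;
        return beatHighScore;
    }

    static int getScore() { return score; }
    static int getHighScore() { return highScore; }
}
```

On a device, a true return value could be used to trigger the vibration pattern, for example through `android.os.Vibrator`; that wiring is omitted here.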

At program startup, the user is made to select between touchscreen and accelerometer-based play, and also to select between two levels. The button-based GUI that prompts the user for these pieces of information is landscape-oriented, as this is the preferred mode of play[1]. An image of this button-based GUI is shown below:

Figure 3: Button-based startup GUI, running on a Motorola Droid 4

The second level has a different background from the first level, and also has more obstacles. Accordingly, the user's score accrues more quickly in the second level. Figure 1 showed first level game play; a picture of second level game play is shown below:

Figure 4: Image of the program running on a Motorola Droid Bionic XT875 (Level 2)


In general, libraries have been selected with maximum portability in mind. Only the "ES1.0" version of OpenGL is required, for example, and the Android version targeted is 2.1 or later. Similarly, touchscreen user input is supported, since some devices do not support accelerometer input. This is true of the emulator provided with the Android SDK, for example.

Despite its rudimentary nature, this emulator is fully supported by the code and binary files included with this article. A picture of the program running in the emulator is shown below:

Figure 5: Image of the program running on the Android SDK emulator


Two ZIP archives are provided at the top of the article: a "demo" archive and a "source" archive. The demo archive contains an ".apk" file. This is similar to an ".exe" file in the Windows OS, in that it contains executable code with some embedded resources (images). To run the Demofox game, it is necessary to copy the ".apk" file to some folder on the Android device file system. This can generally be done by connecting the device to a PC via USB, at which point Windows will give the user the option of browsing the Android file system using Explorer.

After the file has copied and the Android device has been disconnected, browse to the Android folder where the ".apk" file is located and tap it. The OS will give you the option of installing the application. It is necessary to enable a setting called "Unknown sources" / "Allow installation of non-Market applications" (or similar) in order to install an application in this way, but most versions of Android will allow you to change this setting interactively as needed when you attempt to install a non-market ".apk" file.

User Input

The simple game described in the last section is mildly amusing, but the area in which the author believes it really excels is that of user input. While the overall application described here is an arcade-style game, the user input techniques described should prove applicable to a wide variety of flight simulators of varying complexity and purpose.

The supplied program exposes two user interfaces: an accelerometer-based (or motion-sensing) user interface and a touchscreen user interface. Accelerometer-based input devices have grown explosively in number since the introduction of the Nintendo Wii in late 2006. In particular, devices running the iOS and Android operating systems almost universally have multiple accelerometers. All of these devices offer the ability to construct a user experience that is closer to the real experience being modeled than is possible with a mouse, keyboard, or joystick.

The accelerometer-based user interface employed here is an example. It operates in two modes. One mode of operation allows the user to hold the device (e.g. a smart phone) like the control yoke of an airplane. Two of these are featured prominently in the next figure shown below. The user must pull back to point the nose up, push forward to point it down, rotate the device clockwise to effect a right turn, and rotate it counterclockwise to turn left. If an Android device running Demofox were attached to a dummy airplane yoke, its video output could be fed to a display in front of this yoke to create a sort of virtual reality.

Figure 6: Two yokes in a fixed-wing aircraft.

Strictly speaking, in accelerometer control mode, the Demofox user rotates the Android device about its longest axis to effect up-and-down movement. In a yoke like the one shown in the figure above, pulling back / pushing forward naturally causes the necessary rotation. This is part of the mechanical design of the yoke. The Android user operating without a dummy yoke, on the other hand, must take care to rotate the device, not just push it backwards and forwards. This is a familiar motion for many Android users, though, since many accelerometer-based Android user interfaces are based on axial rotation. At the Google Play store, many games that use such an interface have "tilt" in their names, like "aTilt Labyrinth 3D" or "Tilt Racing."
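The rotation about the device's longest axis can be estimated from two components of the gravity vector reported by the accelerometer. The sketch below is one possible approach rather than the Demofox implementation. It assumes the standard Android sensor axes for a phone whose natural orientation is portrait (Y along the long side, Z pointing out of the screen), and the neutral angle, full-deflection angle and sign convention are all hypothetical tuning choices.

```java
/** Hypothetical sketch: vertical steering from rotation about the device's long axis. */
final class TiltInput {
    private TiltInput() {}

    /**
     * Angle of rotation about the long (Y) axis in degrees, from the X and Z
     * accelerometer components. 0 means the screen faces straight up; 90 means
     * gravity lies entirely along the X axis (device held upright in landscape).
     */
    static double rotationAboutLongAxisDegrees(final double ax, final double az) {
        return Math.toDegrees(Math.atan2(ax, az));
    }

    /**
     * Maps the angle to a steering value in [-1, 1]. neutralDegrees is the angle
     * that means "fly level"; fullDeflectionDegrees is how far the device must
     * rotate from neutral to reach full input. Both are assumed tuning values.
     */
    static double verticalSteering(final double angleDegrees,
                                   final double neutralDegrees,
                                   final double fullDeflectionDegrees) {
        final double normalized = (angleDegrees - neutralDegrees) / fullDeflectionDegrees;
        return Math.max(-1.0, Math.min(1.0, normalized));
    }
}
```

Whether a positive value means "nose up" or "nose down" depends on how the device is held, so a real implementation would fix that sign during testing on hardware.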

The little snippet of text below the buttons in Figure 3 relates to the other mode in which the accelerometer-based UI can operate: "Divecam" mode. The name of this mode draws an analogy with a diver holding an underwater camera as shown in the next figure below. Such a diver generally has the camera pointed in the direction in which he or she is moving, and this is similarly true of the Android device operating in "Divecam" mode. If the player wishes to navigate the Demofox ship toward the left, for example, he or she can do so by rotating the Android device as if he or she were taking a picture of an object to his or her left.

Figure 7: A diving camera

The selection between "Divecam" mode and "Yoke" mode is made by the user, but this is done indirectly. While the button-based GUI is showing, if the user holds the device like a yoke, i.e. basically perpendicular to the ground or floor plane beneath it, then the snippet of text below the buttons will indicate "Yoke" mode. If the user does not do this, then the text indicates "Divecam" mode. The mode in effect when the user presses a level selection button (i.e. when the user makes a selection on the second and final screen of the button-based GUI) is the mode used for game play. If the user chooses touchscreen mode, then this distinction is irrelevant.
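A test for "held like a yoke" can be approximated by checking how much of gravity falls along the axis that points out of the screen: when the screen is perpendicular to the floor, that component is close to zero. The following sketch illustrates the idea; the class name, the enum and the 30-degree tolerance are assumptions, not values from the Demofox source.

```java
/** Hypothetical sketch: choose "Yoke" or "Divecam" from one accelerometer reading. */
final class ControlModeSelector {
    enum Mode { YOKE, DIVECAM }

    // Assumed tolerance: how far the screen may lean from vertical and still count as a yoke grip.
    private static final double MAX_LEAN_DEGREES = 30.0;

    private ControlModeSelector() {}

    static Mode select(final double ax, final double ay, final double az) {
        final double magnitude = Math.sqrt(ax * ax + ay * ay + az * az);
        if (magnitude == 0.0) {
            return Mode.DIVECAM; // no usable reading; fall back to the default mode
        }
        // Angle between the screen plane and the gravity vector:
        // 0 when the screen is vertical, 90 when the device lies flat.
        final double ratio = Math.min(1.0, Math.abs(az) / magnitude);
        final double leanDegrees = Math.toDegrees(Math.asin(ratio));
        return leanDegrees <= MAX_LEAN_DEGREES ? Mode.YOKE : Mode.DIVECAM;
    }
}
```

Sampling this while the button-based GUI is showing, and latching the result when a level button is pressed, would match the selection behavior described above.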

The touchscreen input mode operates in a manner that should be familiar to Android users. To effect a maneuver to the left, the player must touch his or her finger down somewhere on the screen and drag it to the right. This is the same technique used to move toward the left edge of a zoomed-in webpage in the Android browser, to move toward the left in the Android version of Google Earth, and so on. Maneuvers in other directions are performed according to the same pattern: the user must touch down a finger somewhere on the screen and drag it in the direction 180 degrees opposite the desired direction. This results in a very intuitive mode of game play.
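The "drag opposite to the desired direction" rule reduces to negating the drag vector, as the short sketch below shows. The names are hypothetical, and the values are left in screen coordinates (on Android, Y grows downward), so any scaling to game units is up to the caller.

```java
/** Hypothetical sketch: steering is the drag vector turned through 180 degrees. */
final class TouchSteering {
    private TouchSteering() {}

    /**
     * Returns {steerX, steerY} in screen coordinates for a finger that went down
     * at (downX, downY) and is now at (currentX, currentY).
     */
    static float[] steeringFromDrag(final float downX, final float downY,
                                    final float currentX, final float currentY) {
        final float dragX = currentX - downX;
        final float dragY = currentY - downY;
        return new float[] { -dragX, -dragY }; // dragging right steers left, and so on
    }
}
```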


Most basically, the code base supplied here is built around the principles of interface-based programming, and of modular programming, which relies on the use of interfaces. In keeping with these principles, the Demofox source consists of a series of classes that communicate with each other exclusively using defined sets of methods.

Despite its use of the class keyword to subdivide code, the implementation provided is not really object-oriented, in several key senses. The author saw no need to use polymorphism beyond what is strictly required by the Java / Android infrastructure, for example. In fact, the same can be said about object instantiation; all members introduced by the author (as opposed to required overridden methods) are static.

This design offers the advantage of deterministic management of resources. These are allocated at application start, and reclaimed upon process termination, without any dependency on object instantiation or destruction.

Also, the author did not want to obfuscate the code presented in the article with architectural concerns. The design selected allows for a clear presentation of techniques that can be integrated into a wide variety of overall architectures. 

Despite these simple structured programming underpinnings, every effort has been made to respect good programming principles. Most basically, the structured, procedural nature of the code is fully reflected in its declarations. Except where the infrastructure requires a true object instance, classes contain static members only, and any initialization that is necessary takes place in the static constructor. This article thus provides a good example of the proper syntax for procedural programming in Java. 

Another design principle which is respected in the Demofox design is the principle of least privilege. Access modifiers (public, protected, etc.) are no more permissive than they need to be, and parameters are marked final wherever possible.

The principle of information hiding is also respected, in that the program consists of a series of classes that interact only by way of a functional interface. Only private fields are used, and the protected interface exposed by each class consists of a series of methods in a designated portion of each class.

Well-defined standards for the naming of members, and for the ordering of their declarations, were followed for all Demofox code. The member order used in the supplied source code files is shown below, from top to bottom:

  • Constants  
  • Constructor (if strictly required)  
  • public method overrides (if strictly required) 
  • protected methods  
  • private methods  
  • private fields  
  • static constructor   

The listing shown above reflects the basic design already laid out. In keeping with least privilege, the only allowance made for any public member in the listing above is for overrides that are absolutely required by the Android / Java infrastructure.

Similarly, an instance constructor is only allowed for in the listing above if it is absolutely necessary. All of this reflects the overall design goal stated above, which was to implement a basic structured programming design, but to do so in a well-specified manner, and to adhere to general good practices.

The design used for this application is interface-based. Accordingly, each class's interface has a designated position in its file. This is the fourth item in the list shown above, "protected methods."
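
The member ordering and the static-only style described above can be illustrated with a small skeleton. The class below is hypothetical (widget, WIDGET_WIDTH, samples and the method names do not appear in the Demofox source); it only shows the shape such a class might take.

```java
// Hypothetical skeleton following the member order listed above.
final class widget {
    // Constants
    private static final float WIDGET_WIDTH = 1f;
    private static final int SAMPLE_COUNT = 3;

    // protected methods: the interface seen by other classes in the package
    protected static float getWidgetWidth() {
        return WIDGET_WIDTH;
    }

    protected static int getSampleCount() {
        return samples.length;
    }

    // private methods
    private static int[] buildsamples(final int count) {
        final int[] result = new int[count];
        for (int index = 0; index < count; ++index) {
            result[index] = index * index;
        }
        return result;
    }

    // private fields
    private static final int[] samples;

    // static constructor: all resource allocation happens here, once,
    // when the class is first used
    static {
        samples = buildsamples(SAMPLE_COUNT);
    }
}
```

Because every member is static, no instance of widget is ever created; callers in the same package simply write widget.getWidgetWidth().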

Naming standards are as shown in the listing below:

  • "Lazy" case is used unless otherwise specified, e.g. xbackplane 
  • Constant names are uppercase and are underbar-delimited e.g. YOKE_UI_X_LIMIT 
  • Type suffixes are used for GUI elements, e.g. highscoretv (a TextView)
  • "Camel" case is used for protected methods, if 2+ words are necessary, e.g. setScore() 
  • Abbreviations are generally avoided 
  • As shown above, type name suffixes for GUI controls are abbreviated. 
  • Local variable names can be abbreviated  
  • Units-of-measure can be abbreviated 
  • The word "alternate" is abbreviated "alt"    
  • The word "minimum" is abbreviated "min"    
  • The word "maximum" is abbreviated "max"  
The author does not pretend that these naming and ordering conventions will be completely satisfactory for any other application. They represent his attempt to impose an appropriate but not stifling level of discipline on this relatively limited project. The same is true of the overall design selected. It is adequate for the author's purpose here.


    At the bottom level of the application architecture lies a class named cube. This class is capable of rendering cubes into the model world associated with an instance of type javax.microedition.khronos.opengles.GL10. In keeping with the interface-based architecture described above, this GL10 instance is passed into a static method, draw().

    The draw() method draws a cube[2] whose sides are 1.0 units in width, centered about the origin at (0,0,0). It is up to the caller to apply transformations to position each cube properly before the method is called. The cubes are drawn using a top-to-bottom gray gradient coloration; no texture is applied. This is a good, neutral building block for many 3D applications. In the Demofox application, groups of three cubes are used to create obstacles for the player.

    Like all of the Demofox classes, the cube class begins with its local constant declarations. These are shown and discussed below. The first segment appears thus:

    final class cube
     private static final int PRISM_FULL_COLOR = 0x08000;
     private static final int PRISM_HALF_COLOR = 0x04000;

    These first two declarations define the brightness of the beginning and ending shades of gray used to render the cube. These are of type int, and will be used as OpenGL "fixed point" values. The values actually used above represent full brightness and 50% brightness. The code continues as shown below:

     private static final float PRISM_CUBE_WIDTH = 1f;
     private static final float VERTICES[] = 
       // Back
      // Front

    The constant PRISM_CUBE_WIDTH specifies the length of each side of the cube. Then, the lengthy array declaration shown in the snippet above defines the eight vertices of the cube in 3D space. Two possible coordinates, 0.5 and -0.5, are permuted across the three dimensions to yield 8 array elements.

    It is not sufficient, though, to simply supply the eight vertices of the cube to OpenGL. Instead, 12 triangles must be drawn (two per face, times six cube faces), resulting in a 36-vertex sequence. OpenGL rendering is triangle-driven and does not work with rectangular faces. The next set of constant declarations take care of these details:

     // 6 sides * 2 triangles per side * 3 vertices per triangle = 36
     private static final int CUBE_VERTEX_COUNT = 36;
     // This is the specific, clockwise zig-zagging order
     // necessary for optimal rendering of the solid.
     private static final byte CUBE_VERTEX_INDICES[] = 
     {
      0, 4, 5,  //First triangle drawn
      0, 5, 1,  //Second... etc.
      1, 5, 6, 
      1, 6, 2, 
      2, 6, 7, 
      2, 7, 3, 
      3, 7, 4, 
      3, 4, 0, 
      4, 7, 6, 
      4, 6, 5, 
      3, 0, 1, 
      3, 1, 2 
     };

    Each of the elements in CUBE_VERTEX_INDICES is a number from 0 to 7, and uniquely identifies one of the 8 cube vertices. Specifically, these 0 to 7 elements are indices into array cube.VERTICES.

    The coordinates for these faces are rendered, and therefore defined above, in a specific order that allows for a performance benefit. This ordering is such that the front faces (i.e. those that are visible in a given frame) end up getting drawn using a clockwise motion from each vertex to the next, whereas the equivalent motion for hidden faces ends up being counterclockwise. This allows OpenGL to cull (i.e. not render) the back faces of the solid.

    The cube is a very easy example of this optimization. When defining the cube, we do so such that the vertices of the front face (the face lying entirely in the plane where Z = -0.5) are drawn in clockwise order, and those of the back face are drawn counterclockwise, assuming the camera is looking at the front face.

    If we rotate the cube 180 degrees about the yaw ("Y") axis, and leave the camera in the same spot, then the vertices of that same face, which now points away from the camera, are rendered in a counterclockwise fashion. OpenGL, when configured to cull surfaces, deals with both scenarios by only drawing the clockwise renderings.

    This brief discussion of surface culling necessarily leaves out some detail. Some more information about this technique is available in another of the author's articles on this site. Though that article uses Direct3D instead of OpenGL, the design techniques required are the same, and some of the illustrative figures used in that article may help make the text discussion above clearer.
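
The winding property discussed above can be checked numerically. The sketch below is not part of the Demofox source. It uses the index list from the article together with an assumed vertex order: the article's VERTICES initializer is not reproduced in full, so the order here was chosen to be consistent with the index list and with the rows of the color array, and may differ from the author's. For each triangle it computes the right-handed normal and tests whether it points into the cube, which is the case exactly when the triangle appears clockwise to a viewer outside the cube.

```java
// Sketch only: WindingSketch is hypothetical, and VERTICES uses an assumed order.
final class WindingSketch {
    // Assumed order: indices 0-3 on the Z = -0.5 face, 4-7 on the Z = +0.5 face.
    static final float[][] VERTICES = {
        { -0.5f, -0.5f, -0.5f }, { 0.5f, -0.5f, -0.5f },
        { 0.5f, 0.5f, -0.5f }, { -0.5f, 0.5f, -0.5f },
        { -0.5f, -0.5f, 0.5f }, { 0.5f, -0.5f, 0.5f },
        { 0.5f, 0.5f, 0.5f }, { -0.5f, 0.5f, 0.5f }
    };

    // Index list as given in the article.
    static final byte[] INDICES = {
        0, 4, 5, 0, 5, 1, 1, 5, 6, 1, 6, 2, 2, 6, 7, 2, 7, 3,
        3, 7, 4, 3, 4, 0, 4, 7, 6, 4, 6, 5, 3, 0, 1, 3, 1, 2
    };

    // True when the given triangle appears clockwise from outside the cube.
    static boolean clockwiseFromOutside(final int triangle) {
        final float[] a = VERTICES[INDICES[3 * triangle]];
        final float[] b = VERTICES[INDICES[3 * triangle + 1]];
        final float[] c = VERTICES[INDICES[3 * triangle + 2]];
        // Edge vectors a->b and a->c
        final float ux = b[0] - a[0], uy = b[1] - a[1], uz = b[2] - a[2];
        final float vx = c[0] - a[0], vy = c[1] - a[1], vz = c[2] - a[2];
        // Right-handed normal: u x v
        final float nx = uy * vz - uz * vy;
        final float ny = uz * vx - ux * vz;
        final float nz = ux * vy - uy * vx;
        // The cube is centered at the origin, so the sum of the three
        // vertices points outward from the cube.
        final float ox = a[0] + b[0] + c[0];
        final float oy = a[1] + b[1] + c[1];
        final float oz = a[2] + b[2] + c[2];
        // A counterclockwise triangle has its normal facing the viewer, so a
        // normal pointing inward means clockwise as seen from outside.
        return nx * ox + ny * oy + nz * oz < 0f;
    }

    static int countClockwise() {
        int count = 0;
        for (int triangle = 0; triangle < INDICES.length / 3; ++triangle) {
            if (clockwiseFromOutside(triangle)) {
                ++count;
            }
        }
        return count;
    }

    public static void main(final String[] args) {
        System.out.println(countClockwise() + " of "
            + INDICES.length / 3 + " triangles are clockwise from outside");
    }
}
```

Under the assumed vertex order, every triangle in the list has the same winding when seen from outside the cube, which is the property that lets a single culling rule discard all hidden faces.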

    After the declaration of these constants, the protected interface exposed by class cube is defined. This consists of two static methods, the first of which is draw():  

     protected static void draw(final GL10 gl)
      // 3 means dimensions: x, y, and z
      gl.glVertexPointer(3, GL10.GL_FLOAT, 0, vertexbuffer);
      // 4 means R, G, B, A here
      gl.glColorPointer(4, GL10.GL_FIXED, 0, colorbuffer);
      gl.glDrawElements(GL10.GL_TRIANGLES, CUBE_VERTEX_COUNT,
       GL10.GL_UNSIGNED_BYTE, indexbuffer);

    The method implementation begins by disabling texturing, since the cube to be drawn is not textured. Then, two calls are made, which prepare the OpenGL rendering engine to receive vertices, and enable clockwise surface culling, as described above. The floating point coordinate buffer, and the fixed point color buffer, are supplied to OpenGL by the calls to glVertexPointer() and glColorPointer(). Then, glDrawElements() is used to effect the actual drawing of the cube, and a call is made to return the OpenGL rendering engine to its pre-call state.

    The other protected method that is exposed by cube returns the width of a cube side. This is shown below: 

     protected static float getPrismCubeWidth()
     {
      return PRISM_CUBE_WIDTH;
     }

    This is used by cuber, for example, to translate channel numbers into OpenGL coordinates.

    Many readers will likely be accustomed to a flow of information which is the reverse of what's shown here. Specifically, such readers might expect a cube class to expose a property allowing its size to be set by the consumer of the class, accompanied, perhaps, by similar properties relating to color and texture.

    Here, these values are instead contained within the cube class, and are selectively exposed via a defined interface. This is done only in cases where some specific consumer is known to require access. 

    While not an example of OOP, this design does respect several principles the author holds very important. The principle of least privilege is an example; Demofox does not need cuber, demofoxactivity, touchableglview, etc., to manipulate the color or width of the solid rendered by cube, so these other classes are not allowed to do so.

    Instead, the constants used to determine such things as cube color and width are colocated with the code that uses them most, and are hidden unless they specifically need to be exposed. The cuber class needs to read the width value, but not write it, and it is therefore granted this privilege only. Consistent with the interface-based  approach promised earlier, access to this value is given by way of a method, not a field.

    The code shown above also exemplifies modular programming. There is a strict separation-of-concerns between two key tasks: the generation of cubes, which is a generic and potentially reusable operation existing at a relatively low level, and their construction into groups of prisms and integration into the overall application, which is a higher-level task.

    The cube class is self-contained. It does one thing well (renders a neutral cube using a gray gradient) and eschews all other roles (e.g. positioning the cube).

    While the cube class is admittedly inflexible, it also is most definitely a component, or (literal) building block. To force some notion of adaptability onto it would be an example of speculative generality, and this is something the author considers it wise to avoid, at least for the expository purposes of this article.   

    All other executable code for the cube class resides in the static constructor. In each of the Demofox classes, the static constructor takes care of the allocation of application resources. For the cube class, this constructor begins as shown below:   

      final int fullcolor = PRISM_FULL_COLOR;
      final int halfcolor = PRISM_HALF_COLOR;
      final int colors[] = 
      {
       halfcolor, halfcolor, halfcolor, fullcolor,
       halfcolor, halfcolor, halfcolor, fullcolor, 
       fullcolor, fullcolor, fullcolor, fullcolor, 
       fullcolor, fullcolor, fullcolor, fullcolor, 
       halfcolor, halfcolor, halfcolor, fullcolor, 
       halfcolor, halfcolor, halfcolor, fullcolor,
       fullcolor, fullcolor, fullcolor, fullcolor, 
       fullcolor, fullcolor, fullcolor, fullcolor 
      };

    This initial portion of the static constructor is focused on coloration. First, the aliases fullcolor and halfcolor are created. Then, these are used to declare and initialize array colors. This array parallels array VERTICES. Each of the eight rows in its initializer corresponds to the similarly-numbered row in the initializer of VERTICES, and to the same cube vertex as that row. Where each row in VERTICES has three individual numeric coordinates, each row of colors has four values: red, green, blue, and alpha (opacity).

    In examining the declaration of colors, consider that all of the vertices with lower "Y" coordinate values are gray, while vertices with higher "Y" values are white. This is what creates the gradient effect evident in the rendered cubes. 
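
The relationship between a vertex's "Y" coordinate and its color row can be expressed programmatically. The sketch below is an illustration, not the author's code: the per-vertex "Y" values are inferred from the pattern of the colors initializer (rows 0, 1, 4 and 5 gray, the rest white), and the class and method names are hypothetical.

```java
// Sketch only: GradientSketch and VERTEX_Y are hypothetical.
final class GradientSketch {
    static final int FULL = 0x08000; // value of PRISM_FULL_COLOR in the article
    static final int HALF = 0x04000; // value of PRISM_HALF_COLOR in the article

    // Assumed "Y" coordinate of each of the eight vertices, in row order.
    static final float[] VERTEX_Y = {
        -0.5f, -0.5f, 0.5f, 0.5f, -0.5f, -0.5f, 0.5f, 0.5f
    };

    // Builds one R, G, B, A row per vertex: gray for low vertices, white for
    // high ones, with alpha always at the full value.
    static int[] buildColors() {
        final int[] colors = new int[VERTEX_Y.length * 4];
        for (int row = 0; row < VERTEX_Y.length; ++row) {
            final int shade = VERTEX_Y[row] < 0f ? HALF : FULL;
            colors[4 * row] = shade;     // red
            colors[4 * row + 1] = shade; // green
            colors[4 * row + 2] = shade; // blue
            colors[4 * row + 3] = FULL;  // alpha
        }
        return colors;
    }
}
```

Since OpenGL interpolates colors between vertices, each vertical face fades from the lower shade to the upper one, while the top and bottom faces are uniform.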

    The static constructor for the cube class concludes with the code shown below:

      final ByteBuffer vbb = ByteBuffer.allocateDirect(VERTICES.length * Float.SIZE
       / Byte.SIZE);
      vertexbuffer = vbb.asFloatBuffer();
      final ByteBuffer cbb = ByteBuffer
       .allocateDirect(colors.length * Integer.SIZE / Byte.SIZE);
      colorbuffer = cbb.asIntBuffer();
      indexbuffer = ByteBuffer.allocateDirect(CUBE_VERTEX_INDICES.length);

    This code sets up three data structures required for OpenGL 3D rendering. Member vertexbuffer holds the coordinates of the 8 cube vertices. This is basically a copy of VERTICES, in a very specific format. In particular, vertexbuffer is a direct buffer. In the nomenclature of Java, this means that it does not reside on the main heap, and will not get relocated by the Java garbage collector. These requirements are imposed by OpenGL.

    Another direct buffer, indexbuffer, holds the 36 indices into VERTICES that define the way in which the cube is actually rendered. Again, this is basically just a copy of a more familiar-looking data structure, in this case CUBE_VERTEX_INDICES.   

    Finally, colorbuffer is a direct buffer copy of colors. Like the other two data structures for which a direct buffer gets created, colorbuffer contains data to which OpenGL requires quick access during rendering. 
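
For readers unfamiliar with java.nio, a self-contained sketch of filling a direct buffer from a float array is given here. It is not the Demofox code: the snippet above shows allocation only, and the remaining steps (setting native byte order, copying the data, rewinding) are the usual pattern for buffers handed to native code and are assumed here. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Sketch only: DirectBufferSketch is a hypothetical helper.
final class DirectBufferSketch {
    static FloatBuffer toDirectFloatBuffer(final float[] values) {
        // Float.SIZE and Byte.SIZE are bit counts, so their ratio is the
        // number of bytes per float (4).
        final ByteBuffer bytes =
            ByteBuffer.allocateDirect(values.length * Float.SIZE / Byte.SIZE);
        // Native code reads the memory in the platform's own byte order.
        bytes.order(ByteOrder.nativeOrder());
        final FloatBuffer floats = bytes.asFloatBuffer();
        floats.put(values);
        // Rewind so that a consumer starts reading at the first element.
        floats.position(0);
        return floats;
    }

    public static void main(final String[] args) {
        final FloatBuffer buffer = toDirectFloatBuffer(new float[] { 1f, 2f, 3f });
        System.out.println("direct: " + buffer.isDirect()
            + ", capacity: " + buffer.capacity());
    }
}
```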

    Other Classes

    The other classes used for Demofox are fundamentally similar to cube in their construction. Each of them exhibits the same member ordering, naming conventions, and overall interface-based, structured philosophy. A listing of all Demofox classes is given below: 

    • cube : See above. 
    • plane : This is similar to cube, but renders a flat plane defined by the equation Z=K , where K is some constant. It is used to render a background image for the game.  
    • cuber : This is an implementation of the GLSurfaceView.Renderer interface, and it performs high-level management of the 3D rendering. It is the class that calls into plane and cube, for example. 
    • touchableglview :  This is a subclass of GLSurfaceView, with methods overridden to facilitate the touchscreen UI. 
    • demofoxactivity : This is a subclass of Activity, and is thus the top-level unit of the application from a procedural standpoint. The main point-of-entry is defined here, for example. 
    The remainder of this article deals with those portions of these other classes that the author thought most noteworthy.  

    Prism Lifetime   

    One very significant role played by cuber is the generation and management of the prismatic obstacles. These are presented to the user in a continuous stream, as if they comprised a sort of permanent, asteroid-belt-like obstruction in space.

    It would be sub-optimal to render such a huge conglomeration of figures with each frame, though. Only a small subset of these really needs to be rendered to provide a convincing game experience. 

    A simplistic way to optimize the generation of prisms might be to generate a small group of them, in near proximity to the user, at application start. These would be positioned in front of the user; in this case, that would place them at slightly higher "Z" coordinates than the user's ship. Under such a design, another, similar group of prisms would be generated after the user had passed by all of the first group. Also, at that time, the first group of prisms would cease to be rendered, and any associated resources would be deallocated.

    This simplistic approach does create a stream of obstacles that is basically continuous,  but it also suffers from gaps in the presentation of these obstacles. Typically, as a prism group passes out of view, there is a point at which the last prism of the group becomes invisible, and then the next group of prisms appears suddenly, all at once. This results in an unconvincing simulation.

    The strategy actually used by Demofox, though, does rely on the basic narrative given above, despite its simplistic nature. The actual code given here compensates for the natural abruptness of the prism group transitions by rendering two such groups at once, and staggering them such that the transitions between groups are much less obvious. 

    In essence, the approach actually used is identical to the simplistic approach described, but involving two sets of prism groups. The second of these groups does not start getting rendered until the user has reached a certain "Z" coordinate, cuber.START_ALT_PRISMS. After that, each group is regenerated once it has completely passed the user in the "Z" dimension.

    To be specific, this regeneration process is handled by calling the reset() method of class cuber. This method begins as shown below:

    private static void reset(final boolean alternate)
      if (!alternate)
       paging = getPositionstate() - demofoxactivity.getStartDistance();
       xstate = new ArrayList<Integer>();
       ystate = new ArrayList<Integer>();
       zstate = new ArrayList<Integer>();
       for (int index = 0; index < simultaneaty; ++index)
         - PRISMS_PER_DIMENSION / 2);
         - PRISMS_PER_DIMENSION / 2);
         - PRISMS_PER_DIMENSION / 2);

    First, note that the portion of reset() shown above pertains to the first set of obstacle prisms, i.e. the one that was present even in the simplistic algorithm given above, before the second set of obstacles was added to improve continuity. The outer if shown above has an else clause which is very similar to the if clause, but uses its own set of variables (altpaging instead of paging, altxstate instead of xstate, and so on[3]).

    Most of the code shown above takes care of resetting the first obstacle group into new positions after it has passed out of view. To begin, member paging is set to the current "Z" position of the user's ship (yielded by accessor getPositionstate()) minus an initial offset value.

    All of the prisms' "Z" positions will be expressed as offsets from paging, since at the start of their lifetimes they must all be visible to the end user. This mechanism is used to deal with the ever-increasing "Z" position of the user, ship, and camera. It is evident in much of the code relating to position in the "Z" dimension. 

    Member lists xstate, ystate, and zstate serve to locate each of the new obstacle prisms somewhere in 3D space in front of the user. These are lists, with one element per prism. These lists exist in parallel. The first elements from all three lists together form a single coordinate, as do the three second elements, and so on. 

    These lists hold integer elements. Rather than holding coordinates in our OpenGL universe (which would be floating point numbers), xstate and ystate hold channel numbers, as described in the introduction and as shown in Figure 2.

    Because each channel is exactly large enough in the "X" and "Y" dimensions to accommodate one full prism, the translation of the integer data in xstate and ystate into floating point OpenGL coordinates involves a multiplication by cube.getPrismCubeWidth(). The same numbering convention is continued into the "Z" dimension, although there are no channels per se in this dimension. Because the integer value in zstate is offset from paging, not 0, the value of paging must be added in for calculations involving the "Z" dimension.
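
The translation from integer channel numbers to OpenGL coordinates can be sketched as below. The class and method names are hypothetical. The "Z" formula mirrors the expression -cube.getPrismCubeWidth() * z + tpaging that appears in the collision detection code later in this article; whether the rendering code uses exactly the same form is an assumption.

```java
// Sketch only: ChannelSketch and its methods are hypothetical.
final class ChannelSketch {
    // Stands in for cube.getPrismCubeWidth(), which returns 1f in the article.
    static final float CUBE_WIDTH = 1f;

    // "X" and "Y": a channel number times the width of one cube.
    static float channelToWorld(final int channel) {
        return channel * CUBE_WIDTH;
    }

    // "Z": the stored integer is an offset from the group's paging value.
    static float zToWorld(final int z, final float paging) {
        return -CUBE_WIDTH * z + paging;
    }

    public static void main(final String[] args) {
        System.out.println(channelToWorld(-2));
        System.out.println(zToWorld(3, 10f));
    }
}
```

Storing small integers and converting only at the point of use keeps the obstacle lists independent of the cube size chosen in class cube.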

    Background Effect  

    The prismatic obstacles shown during game play are rendered in front of an unchanging background. This is implemented using a flat OpenGL plane at a fixed distance from the camera. This plane is of sufficient size that it occupies the entire area of the display, other than those parts covered by obstacles. This is an inexpensive way to add interest to the game's 3D scene without unduly taxing the device's processing resources. 

    Unlike the OpenGL surfaces used to construct the obstacles, the single face used to create this background is textured, since a solid color or even a gradient would make for an uninteresting effect. Though the background surface is unchanging in its appearance and position, this is not unrealistic. Stellar features like the ones depicted in the background images reside at huge distances from the viewer. In the real world, the apparent position of such stellar features does not move much if at all in response to viewer movements. This is particularly true of movement like the sort modeled here, i.e. movement forward with constant orientation. 

    Class plane takes responsibility for drawing the background. This is similar to cube, with some important additions related to texturing, as well as some simplifications related to the simpler solid being rendered. Much of the additional work associated with texturing is handled by method  loadtexture():   

     private static void loadtexture(GL10 gl, Context context)
      Bitmap bitmap;
      // Loading texture
      if (demofoxactivity.getLevel() == 1)
       bitmap = BitmapFactory.decodeResource(context.getResources(),
       bitmap = BitmapFactory.decodeResource(context.getResources(),
      // Generate one texture pointer
      gl.glGenTextures(1, textures, 0);
      // ...and bind it to our array
      gl.glBindTexture(GL10.GL_TEXTURE_2D, textures[0]);
      // Use "nearest pixel" filter for sizing
      gl.glTexParameterf(GL10.GL_TEXTURE_2D, GL10.GL_TEXTURE_MIN_FILTER,
      gl.glTexParameterf(GL10.GL_TEXTURE_2D, GL10.GL_TEXTURE_MAG_FILTER,
      // Use GLUtils to make a two-dimensional texture image from
      // the bitmap
      GLUtils.texImage2D(GL10.GL_TEXTURE_2D, 0, bitmap, 0);
      // Allow for resource reclamation

    Above, an object of type Bitmap is first created, from an embedded resource identified by a member of class R, an Android facility provided for this purpose. Then, the next two calls set up a single texture level for the GL10 rendering engine object passed into the method. For generality, these calls require an int[] array parameter, which holds an identifier for the texture created in its single element. The call to GLUtils.texImage2D() serves to load the appearance data from the Bitmap as this level 0 texture's appearance. After that, the  Bitmap is expendable, and is deallocated. 

    The remainder of the new code in plane associated with surface texturing is in its static constructor. This constructor begins with code that is very similar to the code at the start of the static constructor for class cube. The code in plane diverges from that in cube, though, at the point shown below:   

      float texture[] =
      {
       0.0f, 0.0f, // bottom left 
       1.0f, 0.0f, // bottom right
       1.0f, 1.0f, // top right
       0.0f, 1.0f  // top left
      };
      ByteBuffer cbb3 = ByteBuffer.allocateDirect(texture.length * Float.SIZE
       / Byte.SIZE);
      texturebuffer = cbb3.asFloatBuffer();

    Here, another sort of direct buffer is created for OpenGL. This serves to map the two-dimensional texture appearance onto the single face that gets rendered by OpenGL. These texture coordinates are floating point numbers ranging from 0.0 to 1.0 for all textures, and are associated with the four vertices held in vertexbuffer. Here, we are simply spreading a flat square texture over a flat square face, so the determination of the correct texture coordinates is very simple. The portions of this code devoted to allocating the direct buffer are fundamentally similar to code already shown in the discussion of the cube class above.  

    Drawing a Frame 

    Much key game logic is present in method cuber.onDrawFrame(). This is an overridden method which runs for each OpenGL frame. It is the longest method in the Demofox code base, at 60 lines. In general, long functions are avoided in this code base, but the length of onDrawFrame() does allow for a linear presentation below. This method begins thus: 

     public void onDrawFrame(final GL10 gl)

    This code first performs some OpenGL preliminaries common to many 3D applications. The depth buffer, which tracks which pixels are in front and visible, is cleared, and certain basic rendering facilities are enabled. Then, the code shown above calls plane() to draw the background image. The method implementation continues as shown below:  

      boolean allbehind;
      allbehind = true;
      for (int index = 0; index < xstate.size(); ++index)
       allbehind &= (prism(gl, xstate.get(index), ystate.get(index),

    Boolean variable allbehind plays a key role in the management of prism lifetime. As the prisms confronting the user are drawn by the prism() method, allbehind is updated to reflect whether or not each prism is behind the current camera position. If any prism is not, allbehind becomes false. After all of the prisms in the first group are drawn, if allbehind is still true, then special action is taken:  

      if (allbehind)
       if (wrapitup && altdone)
        positionstate = demofoxactivity.getStartDistance();
        startedalt = false;
        wrapitup = false;
        altdone = false;

    Variables altdone and wrapitup are part of a system that restarts the ship at position 0.0 after a very high maximum "Z" position is reached. Once this happens, wrapitup is set to true. This begins a process by which both prism groups are allowed to pass behind the camera, without regeneration, before the game is essentially restarted. (Position is reset to 0.0 and the initialization code for the prism group runs, but the score is not reset and accrues normally.)  

    After wrapitup is set to true, neither of the two groups of prismatic obstacles will be reallocated until the overall reset to ship position 0.0 has been completed. This system exists to prevent ship position from climbing indefinitely high and causing number system anomalies. After an extreme position is reached, and both prism groups have subsequently passed behind the current position, (wrapitup && altdone) will evaluate to true and the final reset of ship position to 0.0 will be effected by the innermost sequence of statements above (followed by the subsequent call to reset()). Otherwise, the main prism group is simply recycled using a call to reset().    

    The cuber.onDrawFrame() method continues as shown below:  

      if (positionstate >= START_ALT_PRISMS)
       if (!startedalt)
        startedalt = true;
       allbehind = true;
       for (int index = 0; index < altxstate.size(); ++index)
        allbehind &= (prism(gl, altxstate.get(index), altystate.get(index),
         altzstate.get(index), true));
       if (allbehind)
        if (!wrapitup)
         altdone = true;

    Above, we see a section of code that is similar to the previous snippet, but for the second or alternate prism group. One immediately obvious difference relates to variable startedalt. In order to stagger the two prism groups properly, the alternate prism group does not get drawn until a specific positional milestone is reached by the player / ship / camera. At this point, reset(true) is invoked to create the alternate prism group for the first time, and startedalt is set to true. Next, a series of calls to prism() is used to render the actual obstacle group. Throughout this process, variable allbehind is used to track whether the second group is due to be regenerated.

    At the end of the code shown above, the situation where allbehind is true is dealt with. Normally, this results in regeneration of the alternate prism group. However, when it is time to effect the major reset of ship position to 0.0, and wrapitup is true, regeneration of the prism group is deferred, and altdone is set to true instead. As shown in an earlier code snippet, this serves to trigger what amounts to a game restart at ship position 0.0. 

    Finally, this method concludes with the code to increase the current "Z" position, and, if it is sufficiently high, begin the position reset process already described: 

       positionstate = positionstate + EMULATOR_FORWARD_SPEED;
       positionstate = positionstate + PHONE_FORWARD_SPEED;
      if (positionstate > MAX_POSITION)
       wrapitup = true;
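
The flag transitions described in this section can be modeled in isolation. The sketch below is a simplified model under stated assumptions, not the author's method: the constants are hypothetical, drawing is replaced by two boolean inputs, the staggered start of the alternate group is omitted, and the main group is assumed not to be recycled while wrapitup is set (the text leaves this detail open). Resets are only counted, not performed.

```java
// Sketch only: WrapSketch, its constants and its counters are hypothetical.
final class WrapSketch {
    static final float START_DISTANCE = 0f;  // assumed start position
    static final float MAX_POSITION = 1000f; // assumed wrap threshold
    static final float FORWARD_SPEED = 1f;   // assumed movement per frame

    float positionstate = START_DISTANCE;
    boolean wrapitup;
    boolean altdone;
    int mainresets; // number of times the main group would be regenerated
    int altresets;  // number of times the alternate group would be regenerated

    // mainbehind / altbehind: whether every prism of that group is behind
    // the camera in this frame.
    void frame(final boolean mainbehind, final boolean altbehind) {
        if (mainbehind) {
            if (wrapitup && altdone) {
                // Both groups have passed: restart at the beginning.
                positionstate = START_DISTANCE;
                wrapitup = false;
                altdone = false;
                ++mainresets;
            } else if (!wrapitup) {
                ++mainresets; // ordinary recycling of the main group
            }
        }
        if (altbehind) {
            if (!wrapitup) {
                ++altresets; // ordinary recycling of the alternate group
            } else {
                altdone = true; // defer regeneration until the restart
            }
        }
        positionstate += FORWARD_SPEED;
        if (positionstate > MAX_POSITION) {
            wrapitup = true; // begin the wrap-up process
        }
    }
}
```

The model makes the ordering explicit: the position may only be reset once the alternate group has reported itself done and the main group has also passed behind the camera.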

    Collision Detection  

    The detection of collisions between the user's ship and the prismatic obstacles is handled within private method cuber.cube(). The relevant portion of this method is explored below. It begins with this conditional structure:  

      // 0.5 is for rounding
      if (positionstate + cube.getPrismCubeWidth() + 0.5 + SHIP_RADIUS >= -cube
       .getPrismCubeWidth() * z + tpaging
       && positionstate + cube.getPrismCubeWidth() - 0.5 - SHIP_RADIUS <= -cube
        .getPrismCubeWidth() * z + tpaging)

    This initial check returns true if the user's ship and the front plane of a cube intersect in the "Z" dimension. This check runs once per frame for the front face of each cube, i.e. three times per prism. While not perfect, this logic ensures that the vast majority of user collisions with any portion of the overall prism are detected. 

    Note that the ship is assumed to have volume; it is treated as a cube of width 2.0 * SHIP_RADIUS. The literal value 0.5 is included for rounding purposes. In conjunction with an implied cast to an integer type, which results in truncation, the addition of 0.5 results in proper rounding-up of numbers having a fractional portion greater than or equal to 0.5. Variable tpaging is equal to member paging when drawing the main prism group and is equal to altpaging otherwise. Either way, this value is a component of the overall "Z" position of the cube. If the logic shown above evaluates to true, the conditional shown below then executes: 

       if (getXpos() >= x - 0.5 - SHIP_RADIUS && getXpos() <= x + 0.5 + SHIP_RADIUS)
        if (getYpos() >= y - 0.5 - SHIP_RADIUS
         && getYpos() <= y + 0.5 + SHIP_RADIUS)

    The outer conditional is true when the ship and the front plane intersect in the "X" dimension, and the inner conditional plays a similar role for the "Y" dimension. These calculations are reminiscent of the "Z" dimension's conditional presented earlier, except that there is no need to deal with any paging value for the prism group. The "X" and "Y" values associated with the cubes are absolute positions in the OpenGL 3D universe. 

    If the two conditional expressions shown above both evaluate to true, the collision code runs. This code is simple: the score is set to zero, the ship bounces back slightly in the "Z" dimension (by value BOUNCE_BACK), and if a new high score has been set, a distinctive three-vibration indication is given to the user, using method pulse():

         if (demofoxactivity.getHighScore() <= demofoxactivity.getScore())
         positionstate -= BOUNCE_BACK;
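    The collision response described above can be modelled without any Android dependencies. The sketch below is an illustration under stated assumptions rather than the game's code: class CollisionResponseSketch, the value of BOUNCE_BACK, the pulsesRequested counter (which stands in for the vibration triggered by pulse()), and the update of highScore inside the handler are all hypothetical.

```java
// Hypothetical model of the collision response; names and values are assumptions.
final class CollisionResponseSketch {
    static final double BOUNCE_BACK = 2.0; // assumed distance; the real constant is not given

    double positionState;  // ship position along the "Z" axis
    int score;
    int highScore;
    int pulsesRequested;   // stands in for the vibration calls made through pulse()

    void onCollision() {
        // Same comparison as the article: the current score has reached the high score.
        if (highScore <= score) {
            highScore = score;     // assumption: the record is stored here
            pulsesRequested += 3;  // the distinctive three-vibration indication
        }
        score = 0;                     // a collision always resets the score
        positionState -= BOUNCE_BACK;  // the ship always bounces back along "Z"
    }
}
```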

    During development, the author also experimented with vibration feedback during prism collisions. This was ultimately excluded from the end product provided here, since it is battery-draining, and because it was judged to be redundant in light of the other feedback provided for collisions. 

    Touchscreen Interface       

    The touchscreen user interface is built around whole drag events. When the user touches his or her finger down, the ship immediately begins moving correspondingly in the "X" and "Y" dimensions, as described in the "User Input" section above, and continues to do so until the user removes the finger. At this point, the ship remains in the last position attained. If the user at any point attempts to navigate past a world boundary, i.e. to pass outside of the long rectangular play area shown in Figure 2, a long vibration is given, and the ship bounces back to a corrected position. 

    This intuitive and interactive system of control does not require much logic. The handler for all touchscreen events is in class touchableglview, in method onTouchEvent(). The ACTION_DOWN event must be handled specially, but all other events associated with the touchscreen share common handler logic. The snippet of code shown below executes when the user first touches a finger down on the screen.

      if (event.getAction() == MotionEvent.ACTION_DOWN) {
       // Record the finger and ship positions at the start of the drag
       oldy = event.getY();
       originaly = cuber.getYpos();
       oldx = event.getX();
       originalx = cuber.getXpos();
      }

    Values oldx and oldy establish a frame-of-reference for the whole drag event. Throughout the event, displacement of the ship in the "X" and "Y" dimensions will be a linear function of the corresponding "X" or "Y" difference of the finger at any point in time from oldx or oldy. The ship moves forward throughout such events, resulting in a user-controlled flight path in 3D space.

    Constant FINENESS defines how steep the linear relationship between finger position and ship position is. The next segment of code shows the associated calculations. This is the start of the handler logic for all of the touchscreen events that execute after the initial ACTION_DOWN event. Actual ship movement is performed using calls to cuber.setXpos() and cuber.setYpos():

       dy = event.getY() - oldy;
       cuber.setYpos(originaly + (dy / FINENESS));
       // Dragger UI
       dx = event.getX() - oldx;
       cuber.setXpos(originalx + (dx / FINENESS));  
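    The drag mapping above can be exercised outside Android with a small model. This is a sketch under assumptions, not the article's touchableglview class: class DragMappingSketch, methods onDown() and onMove(), and the value of FINENESS are hypothetical names and values chosen for illustration.

```java
// Hypothetical, Android-free model of the drag mapping.
final class DragMappingSketch {
    static final float FINENESS = 100.0f; // assumed value; larger means less sensitive

    private float oldX, oldY;            // finger position at touch-down
    private float originalX, originalY;  // ship position at touch-down
    float shipX, shipY;                  // current ship position

    // Plays the role of the ACTION_DOWN case: record the frame of reference.
    void onDown(float fingerX, float fingerY) {
        oldX = fingerX;
        oldY = fingerY;
        originalX = shipX;
        originalY = shipY;
    }

    // Plays the role of every later event in the drag: apply the linear mapping.
    void onMove(float fingerX, float fingerY) {
        shipX = originalX + (fingerX - oldX) / FINENESS;
        shipY = originalY + (fingerY - oldY) / FINENESS;
    }
}
```

    Because each move is computed from the values captured at touch-down rather than from the previous move, returning the finger to its starting point returns the ship to its starting point, with no accumulated drift.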

    After these brief calculations, all that is necessary is the check for world extremes. Here, method vibe() is used to give the designated pulse when the user exceeds a boundary. Movement based on constant BOUNCE_TO is applied, in a direction opposite to the user's disallowed maneuver: 

       if (cuber.getXpos() > demofoxactivity.getWorldLimit())
        cuber.setXpos(demofoxactivity.getWorldLimit() * BOUNCE_TO);
       if (cuber.getYpos() > demofoxactivity.getWorldLimit())
        cuber.setYpos(demofoxactivity.getWorldLimit() * BOUNCE_TO);
       if (cuber.getXpos() < -demofoxactivity.getWorldLimit())
        cuber.setXpos(-demofoxactivity.getWorldLimit() * BOUNCE_TO);
       if (cuber.getYpos() < -demofoxactivity.getWorldLimit())
        cuber.setYpos(-demofoxactivity.getWorldLimit() * BOUNCE_TO);
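    The four boundary checks above apply the same rule to each axis and each direction. The following sketch factors that rule into one helper; it is an illustration only. Class WorldLimitSketch, its methods, and the value of BOUNCE_TO are assumptions, and the caller is expected to trigger the vibration (vibe() in the article) whenever exceeds() returns true.

```java
// Hypothetical helper expressing the boundary rule for a single coordinate.
final class WorldLimitSketch {
    static final float BOUNCE_TO = 0.9f; // assumed fraction of the limit to bounce back to

    // True when the coordinate lies outside [-worldLimit, +worldLimit].
    static boolean exceeds(float pos, float worldLimit) {
        return pos > worldLimit || pos < -worldLimit;
    }

    // Returns the coordinate unchanged when inside the limits; otherwise returns
    // BOUNCE_TO times the limit that was exceeded, keeping its sign.
    static float correct(float pos, float worldLimit) {
        if (pos > worldLimit) {
            return worldLimit * BOUNCE_TO;
        }
        if (pos < -worldLimit) {
            return -worldLimit * BOUNCE_TO;
        }
        return pos;
    }
}
```

    With a BOUNCE_TO value below 1.0, the corrected position lands inside the play area, so the next event does not immediately trigger the boundary check again.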

    Accelerometer Interface     

    The accelerometer-based control system takes a form that is broadly similar to that of the touchscreen system just discussed. Again, it is displacement from a base point that effects ship position changes in the "X" and "Y" dimensions, while movement forward in the "Z" dimension continues unabated.

    In the touchscreen system described above, the "base point" for each position change was the point on the touchscreen where the user initially touched down. For the accelerometer-based system, the "base point" is the position of the Android device according to the device accelerometer at the beginning of the flight itself. This is stored in variables xbackplane and ybackplane. The accelerometer sensor values in all cases originate from a member of class SensorEvent, named values. The key subtraction operation for each sensor event therefore involves xbackplane or ybackplane and an element of values.

    Like the touchscreen system, the accelerometer-based system relies on a linear translation from input movement to ship movement. Here, the constant factors at play are TILT_SPEED_X and TILT_SPEED_Y, which scale the result of the central subtraction operation.

    The dimensions in the names of these constants refer to the movement of the ship, not to the associated accelerometer dimensions. In default mode (i.e. when yokemode is false), these dimensions are swapped. It is SensorEvent.values[Y_ACCELEROMETER] that controls movement in the "X" dimension, and SensorEvent.values[X_ACCELEROMETER] that controls movement in the "Y" dimension.

    In "yoke" mode, SensorEvent.values[Y_ACCELEROMETER] is used to similar purpose, but is negated. The value of SensorEvent.values[X_ACCELEROMETER] is not used. Instead, it is SensorEvent.values[Z_ACCELEROMETER] that is used to control movement in the "Y" dimension of the ship. 

    These modes of operation were set up empirically by the author. Another program was developed to output the raw value of all three sensors to the device screen, and it was used to track desired control movements.  

    Finally, variables ytiltfactor and xtiltfactor are set to zero to disable the accelerometer system for touchscreen operation. Like TILT_SPEED_X and TILT_SPEED_Y, they are applied to the result of each linear translation function; because they act as multipliers, a value of zero cancels accelerometer input. This is a slightly inefficient, but reliable, method.

    All of the logic described above takes place in method demofoxactivity.readaccelerometers(). The body of this method is shown below, in its entirety. As was true of the touchscreen system, all of the calculations just described result in a set of calls to cuber.setXpos() and cuber.setYpos():

      if (yokemode)
       cuber.setYpos(cuber.getYpos() - ytiltfactor
        * ((event.values[Z_ACCELEROMETER] - xbackplane) / TILT_SPEED_Y));
      else
       cuber.setYpos(cuber.getYpos() + ytiltfactor
        * ((event.values[X_ACCELEROMETER] - xbackplane) / TILT_SPEED_Y));
      if (yokemode)
       cuber.setXpos(cuber.getXpos() - xtiltfactor
        * ((event.values[Y_ACCELEROMETER] - ybackplane) / TILT_SPEED_X));
      else
       cuber.setXpos(cuber.getXpos() + xtiltfactor
        * ((event.values[Y_ACCELEROMETER] - ybackplane) / TILT_SPEED_X));
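    To experiment with the two tilt modes away from a device, the same arithmetic can be placed in a plain Java class that accepts an array of sensor readings. The sketch below is an approximation under assumptions: class TiltSketch, method onSensorValues(), and the values of TILT_SPEED_X and TILT_SPEED_Y are hypothetical. The indices 0, 1 and 2 follow the order in which Android reports accelerometer axes in SensorEvent.values.

```java
// Hypothetical, Android-free model of the tilt calculations.
final class TiltSketch {
    // Accelerometer axes as ordered in SensorEvent.values.
    static final int X_ACCELEROMETER = 0;
    static final int Y_ACCELEROMETER = 1;
    static final int Z_ACCELEROMETER = 2;
    // Assumed scale factors; the article does not list the real values.
    static final float TILT_SPEED_X = 10.0f;
    static final float TILT_SPEED_Y = 10.0f;

    boolean yokeMode;
    float xBackplane, yBackplane;  // baseline readings captured at the start of the flight
    float xTiltFactor = 1.0f;      // set to 0 to disable tilt input
    float yTiltFactor = 1.0f;
    float shipX, shipY;

    void onSensorValues(float[] values) {
        if (yokeMode) {
            // Yoke mode: "Z" drives ship "Y"; "Y" drives ship "X", negated.
            shipY -= yTiltFactor * ((values[Z_ACCELEROMETER] - xBackplane) / TILT_SPEED_Y);
            shipX -= xTiltFactor * ((values[Y_ACCELEROMETER] - yBackplane) / TILT_SPEED_X);
        } else {
            // Default mode: "X" drives ship "Y"; "Y" drives ship "X".
            shipY += yTiltFactor * ((values[X_ACCELEROMETER] - xBackplane) / TILT_SPEED_Y);
            shipX += xTiltFactor * ((values[Y_ACCELEROMETER] - yBackplane) / TILT_SPEED_X);
        }
    }
}
```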

    Process Management    

    The Android OS employs a process model in which applications do not usually exit once started. Rather, they are suspended and then they resume at some indeterminate point in the future. The Demofox application responds to being suspended and resumed by restarting the application. That is, the user returns to the start of the button-based GUI used to select level and input mode. Full credit is given for high scores set before process suspension. 


    The author's experience with Android and its OpenGL implementation has been positive. This implementation is a powerful and accessible one. The Eclipse IDE and Java language are economical choices, whose operations and semantics should be familiar to many developers.

    The accelerometer and touchscreen input devices are somewhat more novel technologies. The author hopes that the code discussed here offers some useful techniques for dealing with these forms of input.    


    Many of the images used in the article, and used as embedded resources in the game, were photographs taken by, or in conjunction with, the United States federal government. The first game level uses a photograph taken by the Hubble Space Telescope, of the Boomerang Nebula. The European Space Agency shares credit for this image, which is in the public domain.

    Similarly, the image of the diver and camera shown above in the article was released by the U.S. Navy. Credit goes to Petty Officer Shane Tuck, and the individual shown is Petty Officer Jayme Pastoric. 

    Credit for the picture of aircraft yokes shown in the article body above goes to Christian Kath. This image was made available under the GNU Free Documentation License.  

    The background image for the second game level is an original work by my daughter, Holly. 


    1. Portrait mode play is possible, but play control seemed inferior to the author. Also, "yoke" mode, in which the accelerometer-based UI directly models the yoke of an airplane, is not available when the device is held in portrait orientation. This reflects the fact that real airplane yokes are generally wider than they are tall.

    2. Issues of aspect ratio will cause these cubes to appear elongated in practice. Because the prisms shown to the user in this application are elongated by design, this is not a problem.

    3. Consideration was given to using an array here. Ultimately, the required syntax was judged to be less intuitive than having two sets of variables. 


    This is the third major version of this article. Both revisions contained improvements to the article text only. The code and binary files have not changed. 


    This article, along with any associated source code and files, is licensed under The GNU General Public License (GPLv3)


    Windows SDK for Windows 7 and .NET Framework 4 Release Notes



    1. Welcome

    2. License Agreement

    3. Installing and Uninstalling the Windows SDK

    4. Build Environment

    5. Documentation

    6. Known Issues    

    7. Windows SDK Product Support and Feedback




    1. Welcome

    Welcome to the Microsoft Windows Software Development Kit (SDK) for Windows 7 and .NET Framework 4.

    The Windows SDK contains a set of tools, code samples, documentation, compilers, headers, and libraries that developers can use to create applications that run on Microsoft Windows. You can use the Windows SDK to write applications using the native (Win32/COM) or managed (.NET Framework) programming model.

    For access to additional resources and information, such as downloads, forum posts, and the Windows SDK team blog, go to the Windows SDK Developer Center.


    1.1 What’s New in the Windows SDK for Windows 7 and .NET Framework 4 Release?

    ·         Smaller/Faster: at less than 600MB, this SDK is one third the size of the Windows 7 RTM SDK; it installs faster and has a smaller footprint.

    ·         Cleaner setup: features on setup screens have been grouped into native, managed, and common buckets to help you choose the components you need faster.

    ·         New Microsoft Help System v1.0: this brand new system was first introduced with Visual Studio 2010.  You can import just the content you need from the MSDN cloud, and update it according to your schedule.

    ·         Visual C++ 2010 compilers/CRT with improved compilation performance and speed.  These are the same compilers and toolset that ship with Visual Studio 2010.

    ·         New command line build environment that uses MSBuild 4.0, now the common Microsoft build system for all languages, and supporting the new Visual C++ project type .vcxproj.


    1.2 What Does the Windows SDK for Windows 7 and .NET Framework 4 Support?

    ·         Operating systems: you can install this SDK on and/or create applications for Windows 7, Windows Server 2008 R2, Windows Server 2008, Windows XP SP3, Windows Vista, and Windows Server 2003 R2.

    ·         Platform architecture: you can install this SDK on and/or create applications for platform chipsets X86, X64, and IA64 (Itanium).

    ·         .NET Framework: you can use the SDK resources to create applications that target .NET Framework versions 2.0, 3.0, 3.5, 4.

    ·         Visual Studio: you can use the resources in this SDK with Visual Studio versions 2005, 2008, and 2010, including Express editions. (Not all features work with all versions of Visual Studio. For example, you can’t use the .NET 4 tools with Visual Studio 2008.)

    ·         Setup/Install options: Win SDK v7.1 will be available through an ISO or a Web setup download and install experience.  Web setup allows you to install selected components of the SDK without having to download the entire SDK.  The DVD ISO setup allows you to download the entire SDK to install later, or share among different computers. 



    1.3 What’s Been Removed?

    1.3.1 Documentation

    The DExplore document viewer that shipped with previous SDKs is no longer delivered via the SDK, and documentation is no longer delivered in-box with the SDK.  You’ll be prompted at the end of SDK setup to download documentation to your computer using the Microsoft Help System if you wish to do so.


    1.3.2 Managed Samples

    Managed samples have been removed from this release of the Windows SDK.  Managed samples can be found on Code Gallery.


    1.3.3 Tools

    The following tools were included in the Windows SDK for Server 2008 and .NET Framework 3.5 release, but are not included in this release:


    Tools Removed in the Windows SDK for Windows 7 and .NET Framework 4

    ·         UISpy.exe

    ·         Wpt_arch.msi



    1.4 Required Resources

    This release of the Windows SDK does not include a .NET Framework Redistributable Package.


    .NET Framework 4 (Required)

    ·         The Microsoft Windows SDK for Windows 7 and .NET Framework 4 requires the RTM version of the full, extended .NET Framework 4 Redistributable Components  

    ·         The .NET Framework 4 Redistributable Components must be installed prior to installing the Windows SDK for Windows 7 and .NET Framework 4

    NOTE:  The Client version of the .NET Framework 4 is not sufficient and the SDK will not install on a computer that has a pre-release (Beta or RC) version of the .NET Framework 4.


    2. License Agreement

    The contents included in the Windows SDK are licensed to you, the end user. Your use of the SDK is subject to the terms of an End User License Agreement (EULA) accompanying the SDK and located in the \License subdirectory. You must read and accept the terms of the EULA before you access or use the SDK. If you do not agree to the terms of the EULA, you are not authorized to use the SDK.

    3. Installing and Uninstalling the Windows SDK

    To optimize your Windows SDK setup experience, we strongly recommend that you install the latest updates and patches from Microsoft Update before you begin installing the Windows SDK.


    3.1 Windows SDK Disk Space Requirements

    The complete installation of the Windows SDK requires less than 600MB; this SDK is one third the size of the Windows 7 RTM SDK, so it installs faster and has a smaller footprint. Please verify that the computer you are installing to has the minimum required disk space before beginning setup. If the minimum required disk space is not available, setup will return a fatal error.


    3.2 How to Uninstall SDK Components

    When you uninstall the SDK through Programs and Features (Add/Remove Programs on pre-Vista OS's) most of the SDK components will be uninstalled automatically. However, a few shared components installed by the SDK may need to be uninstalled separately. This guide provides instructions for uninstalling those shared components.


    To uninstall shared SDK components:

    1.    From the Start Menu, go to Control Panel, Programs and Features (Add/Remove Programs on pre-Vista OS's)

    2.    Select and remove the following entry:

    ·    Microsoft Windows SDK for Windows 7 (7.1) (the Windows SDK core-component files)

    3.    Remove the shared components. This list provides some of the components you may see. If you are running a 64 bit version of Windows you may see both 32 bit (x86) and 64 bit (x64, IA-64) versions of the components listed:

    ·         Application Verifier

    ·         Debugging Tools for Windows

    ·         Windows Performance Toolkit

    ·         Microsoft Help Viewer 1.0

    ·         Microsoft Visual C++ 2010 Redistributable

    ·         Microsoft Visual C++ 2010 Standard Edition


    4. Build Environment

    4.1 Moving From the VCBuild to the MSBuild 4.0 Environment

    Visual C++ 2010 moved to a newer build system (MSBuild v4.0) to provide better performance, scalability and extensibility; earlier versions of Visual C++ used VCBuild as their build system. As part of this migration, the Visual C++ 2010 project file format changed to be compatible with the MSBuild v4.0 file format, and the project file extension changed from “.vcproj” to “.vcxproj”.  Project files with the “.vcproj” extension can be upgraded using the vcupgrade tool (included in this release of the SDK).  For more information on how to upgrade a project file, see the section titled “Upgrading Projects to Visual C++ 2010.”


    4.2 Setting Build Environment Switches

    To set specific targets in the build environment:

    ·         Launch the Windows SDK build environment - From the Start menu, click on  All Programs > Microsoft Windows SDK v7.1 > Windows SDK 7.1 Command Prompt

    ·         Set the build environment -  At the prompt, type:
    setenv  [/Debug | /Release][/x86 | /x64 | /ia64 ][/vista | /xp | /2003 | /win7][-h | /?]


    The setenv.cmd help option [-h | /?] displays the usage:

                    /Debug   - Create a Debug configuration build environment

                    /Release - Create a Release configuration build environment

                    /x86     - Create 32-bit x86 applications

                    /x64     - Create 64-bit x64 applications

                    /ia64    - Create 64-bit ia64 applications

                    /vista   - Create Windows Vista SP1 or Windows Server 2008 applications

                    /xp      - Create Windows XP applications

                    /2003    - Create Windows Server 2003 applications

                    /win7    - Create Windows 7 or Windows Server 2008 R2 applications


    4.3 Upgrading Projects to Visual C++ 2010

    The Windows SDK for Windows 7 and .NET Framework 4 installs the vcupgrade.exe tool which will convert the previous project file format to the MSBuild compatible project file format. In this process the extension of the project file (.vcproj) will change to .vcxproj.  To upgrade a Visual C++ 2005 or a Visual C++ 2008 (ex: sample.vcproj) file to VC 2010 (sample.vcxproj) file format in the SDK build environment, at the command prompt type: “vcupgrade sample.vcproj”.



    VCUpgrade [options] <project file>



         -nologo                     Suppresses the copyright message

         -nocolor                    Do not output error and warning messages in color

         -overwrite                  Overwrite existing files

         -PersistFramework           Keep the Target Framework Version while upgrading. (-p)


    The vcupgrade tool can upgrade any project version from VC6 through VC 2008 to the VC 2010 format, and returns a success message if the conversion succeeded.  If unsuccessful, a message listing conversion errors is returned.


    5. Documentation

    The new Microsoft Help System allows you to view documents on the MSDN Library using a standard browser, and select documents to download from the MSDN Online content publication web site (MSDN cloud) to your computer for viewing when a connection to the Internet is unavailable or undesired.  You can download, update or delete content on your own schedule.  

    The Microsoft Help System is also delivered via Visual Studio 2010. 


    If you select the Microsoft Help System during SDK setup, you will be prompted at the end of SDK setup to import documentation to your computer using the Help Library Manager.  You will be prompted to select a location for content to be stored on your computer’s hard drive.  You may select the default location to store content on your computer at this time only.  The content store location cannot be changed later. 


    Microsoft Help System is the replacement for the Document Explorer (DExplore) help viewer that was delivered in earlier versions of the Windows SDK.  The Document Explorer has been deprecated.

    5.1 Start Menu shortcuts for Microsoft Help System

    Three Start Menu shortcuts will be installed with the MHS under All Programs, Windows SDK v7.1, Documentation:

    ·         Manage Help Settings: use the Help Library Manager to manage content in online and offline modes

    ·         Microsoft Help System Documentation: open the MHS Help-on-Help documentation stored locally on your computer when in offline mode, or online on MSDN when in online mode.

    ·         Windows SDK Documentation: open the documentation for the Windows SDK product stored locally on your computer when in offline mode, or online on MSDN when in online mode.

    5.2 Confirm that you wish to go online

    The MHS is set to Online Mode by default.  The first time you click the MHS shortcuts on the Start Menu you will be asked to confirm that you wish to connect to the Internet to view documentation in the MSDN cloud. 

    5.3 Offline documentation

    If you wish to view documentation when a connection to the Internet is unavailable, you can import documentation sets (books) from the MSDN cloud and install these books to your computer.  You can switch to Offline Mode to view content on your computer by default. 

    5.4 Configuring MHS to use Offline mode

    The Help Viewer is set to online mode by default upon installation.  This setting affects all instances of the Help System installed on a computer.  If you have downloaded content from the MSDN cloud using the Help Library Manager (Start, All Programs, Windows SDK v7.1, Documentation, Help Library Manager) and wish to view these documents when not connected to the Internet, you must change the configuration to offline mode:

    1.  Open the Help Library Manager (Start, All Programs, Windows SDK v7.1, Documentation, Help Library Manager)

    2.  Select Choose online or local help

    3.  Select I want to use local help

    4.  Click OK

    5.5 Importing documentation to your local computer

    If you wish to import (download documentation) to your local computer for viewing in offline mode (no Internet connection):

    1.  From the Start Menu, select Start, All Programs, Windows SDK v7.1, Documentation, Help Library Manager. 

    2.  On the options screen, select ‘Install Content from online’.

    3.  Select the Add link next to any content you wish to add locally, example:  .NET Framework 3.5

    4.  When the Add link is selected, the Actions column will change to say Cancel and Status to Update Pending. 

    5.  Click the Update button.

    6.  While the content is updating you will see a status screen.

    7.  When the update is finished you will see a Finished Updating notification.

    8.  On the options screen, select ‘Choose online or local help’.

    9.  Select ‘I want to use local help’ and click OK

    5.6 Checking for updates on the MSDN cloud

    Documentation is frequently updated on the MSDN Cloud.  If you wish to update the documentation you have previously downloaded to your local computer for viewing in offline mode (no Internet connection):

    1.  From the Start Menu, select Start, All Programs, Windows SDK v7.1, Documentation, Help Library Manager. 

    2.  On the options screen, select ‘Check for updates online’.

    3.  A Checking for Updates screen will show the documentation you have imported to your computer.  The Status column next to each feature will show the status of the documentation for that feature:

          - Up to date: no updates are available

          - Update Available: click to import updates for this feature documentation

    4.  Click the Update button.

    5.  While the content is updating you will see a status screen.

    6.  When the update is finished you will see a Finished Updating notification.

    5.7 Removing documentation content from your computer

    If you wish to remove the documentation you have previously downloaded to your local computer for viewing in offline mode:

    1.  From the Start Menu, select Start, All Programs, Windows SDK v7.1, Documentation, Help Library Manager. 

    2.  On the options screen, select ‘Remove content’.

    3.  A Remove Content screen will show the documentation stored locally on your computer.  The Actions column next to each feature will show the available actions for that documentation set.

    4.  Click the Remove link to remove a documentation set.

    5.  The Status column will show Remove Pending as content is being deleted.

    6.  Click the Cancel link during removal if you wish to cancel this action.

    7.  Click the Remove button.

    8.  While the content is being removed you will see an Updating Local Library screen.

    9.  When the removal is finished you will see a Finished Updating notification.

    5.8 Installing Offline Content from media

    Offline content on media is not provided for this release of the Windows SDK.  This content may be made available at a future date.



    6. Known Issues

    This release of the Windows SDK has the following known issues, categorized by type.

    6.1 Build Environment

    6.1.1 Limitations of the vcupgrade tool

    ·         The vcupgrade tool cannot upgrade a solution file

    Proposed workaround:

    o   Hand edit/update the solution file to conform with the Visual Studio 2010 format. This involves:

    §  Changing the header in the file to say Visual Studio 2010.

    §  Changing the extension of the projects in the solution file to .vcxproj


    ·         The vcupgrade tool cannot upgrade a project which has a reference to another project.

    Proposed workaround:

    §  Run the tool from the solution directory.
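    For the first workaround above, the extension change in a solution file can be scripted rather than typed by hand. The following Java sketch covers only that step and leaves the header edit to be done manually. Class SolutionExtensionSketch and method rewriteProjectExtensions() are hypothetical names, and the sketch assumes that project paths in the solution file are enclosed in double quotes and end in .vcproj.

```java
import java.util.regex.Pattern;

// Hypothetical helper: rewrites quoted project paths ending in .vcproj to .vcxproj.
final class SolutionExtensionSketch {
    // Matches ".vcproj" only when it is immediately followed by a closing double quote,
    // so unquoted mentions elsewhere in the text are left alone.
    private static final Pattern VCPROJ =
        Pattern.compile("\\.vcproj(?=\")", Pattern.CASE_INSENSITIVE);

    static String rewriteProjectExtensions(String solutionText) {
        return VCPROJ.matcher(solutionText).replaceAll(".vcxproj");
    }
}
```

    The helper works on the text of the solution file, so the caller is responsible for reading the file, keeping a backup, and writing the result back.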


    6.1.2 SDK Build Environment may Fail on X86 XP with VS2005

    If your usage scenario matches the one listed below, you will be unable to build in the Windows SDK command line build environment.  If you type cl.exe in the SDK command window and press Enter, you will see this error:

    This application has failed to start because mspdb80.dll was not found.  Re-installing the application may fix this problem.

    Computer setup required to repro issue:

    1.           Windows XP on x86 machine (which has version 5.1.2600.2180 of REG.exe)

    2.           Visual Studio 2005 installed, but Visual Studio 2008 or 2010 is NOT installed

    Cause: When the SDK build environment window is launched, the SDK file SetEnv.cmd launches Reg.exe.  Reg.exe generates standard output when a valid KeyPath is specified and also generates error output when invalid Value is specified.  In this scenario, the KeyPath is valid but the value doesn’t exist.  For more information, see the Windows SDK blog post on this issue.

    Workaround: follow these instructions to manually edit SetEnv.cmd to remove the second call to REG:

    1.    Open C:\Program Files\Microsoft SDKs\Windows\v7.0\Bin\SetEnv.cmd in Notepad or another editor

    2.    For this line:
    FOR /F "tokens=2* delims=  " %%A IN ('REG QUERY "%VSRegKeyPath%" /v 9.0') DO SET VSRoot=%%B

    Either comment out using the REM command (like this):
    REM FOR /F "tokens=2* delims=     " %%A IN ('REG QUERY "%VSRegKeyPath%" /v 9.0') DO SET VSRoot=%%B

    OR delete the line completely.

    3.    Save SetEnv.cmd

    4.    Restart the Windows SDK command prompt


    6.1.3 Windows 7 SDK with Visual C++ 2005: Failure to compile in Debug mode.

    If your usage scenario matches the one listed below, you will be unable to debug in the Windows SDK command line build environment or Visual Studio 2005 SP1.

    Symptom: You have an .lib file or an .obj file that exposes C interfaces that was built by using Microsoft Visual C++ 2008 or for Windows 7. You add this file to a project as a link dependency. When you build the project in Microsoft Visual Studio 2005 Service Pack 1 (SP1) to generate an .exe file or a .dll file, you may receive the following link error:

    Fatal error LNK1103: debugging information corrupt

    Cause: This problem occurs because of a compatibility issue between Visual Studio 2005 and Visual Studio 2008 versions. For more information, see the Microsoft Support page for the patch:

    Fix: Install the patch for Visual Studio 2005 SP1 available from:


    6.1.4 Platform Used in /p:platform=[target platform] and setenv [target platform] Must Match

    If you encounter one of the following error messages, make sure that platform in /p:platform=[target platform] and setenv [target platform] match.


    Error messages:

    You are attempting to build an AMD64 application from an x86 environment.

    You are attempting to build an Itanium (IA64) application from an x86 environment.

    You are attempting to build a Win32 application from an x64 environment.

    You are attempting to build an Itanium (IA64) application from an x64 environment.

    You are attempting to build a Win32 application from an Itanium (IA64) environment.

    You are attempting to build an AMD64 application from an Itanium (IA64) environment.


    6.1.5 Builds Requiring hhc.exe May Fail When Using MSBuild

    Builds requiring hhc.exe (HTML Help Workshop compiler) may fail and throw a "Windows cannot find 'hhc'..." error message.  There are a couple of workarounds.


    Workaround 1:

    Modify setenv.cmd to include %programfiles(x86)%\HTML Help Workshop

    ·         Open an elevated command-prompt.

    ·         Change directories to the Windows SDK 7.1 bin directory (%programfiles%\Microsoft SDKs\Windows\v7.1\bin)

    ·         Type: Notepad setenv.cmd

    ·         After line 369 (which looks like this): SET Path=%FxTools%;%VSTools%;%VCTools%;%SdkTools%;%Path%
    Add the following:
    IF "%CURRENT_CPU%"=="x86" (
    SET "Path=%path%;%ProgramFiles%\HTML Help Workshop"
    ) ELSE (
    SET "Path=%path%;%ProgramFiles(x86)%\HTML Help Workshop"
    )


    Workaround 2:

    Add %programfiles(x86)%\HTML Help Workshop to %path% in the environment variables.

    On Vista/Server 2008/Windows 7:

    ·         Click Start

    ·         Right-click “Computer”

    ·         Click “Properties”

    ·         Click “Advanced system settings”

    ·         Click the “Environment Variables…” button

    ·         Use the scroll bar to find the “Path” variable

    ·         Click “Path” to highlight it, then click the “Edit…” button

    ·         At the end of the Variable Value text box, add a semicolon and the path to the HTML Help Workshop (i.e. “;%programfiles%\HTML Help Workshop” on x86)


     6.2 Microsoft Help System

    6.2.1 Microsoft Help Viewer is unavailable during SDK setup

    If the Microsoft Help System (MHS) has already been installed on your computer by a Visual Studio 2010 product, the Microsoft Help Viewer node on the SDK setup screen will be unavailable, or grayed out.  This is by design.  The MHS will only be installed once on a computer.

    6.2.2 Microsoft Help System remains after uninstall of the Windows SDK

    Help Viewer is a separate entry in Programs and Features (Add/Remove Programs in pre-Vista OSes) and will not be uninstalled when the Windows SDK is uninstalled.  The Help Viewer must be uninstalled separately. 

    6.2.3 Help Library Manager will not update/synch content if Internet connection is not available

    The Help Library Manager requires an Internet connection in order to connect to the MSDN cloud to update or synch content.   If you attempt to synch offline content with updated documentation available in the MSDN cloud, you will receive an error message that a network connectivity problem has occurred.

    6.2.4 Downloading documentation requires an internet connection

    Documentation is not included in the SDK setup package and must be downloaded from the MSDN cloud using the Help Library Manager (Start, All Programs, Windows SDK v7.1, Documentation, Help Library Manager).

    6.2.5 The Help Viewer is not supported on computers with an Itanium (IA64) chip

    The Help Viewer will not install on a computer that is using an Itanium (IA64) processor.  A Help Viewer option will not be available when installing the Windows SDK on a computer with an Itanium (IA64) processor.

    6.2.6 Help Library Manager requires Administrator permission to update content

    For security purposes, the Help Library Manager installs content to a location that is locked down to users who are not members of the HelpLibraryUpdaters security group.  Non-members cannot update, add or delete content to the store.  This security group is created during setup, and by default, it includes the administrators group and the user account that created the local store.  To enable support for multiple users (non-administrators) to update content on the machine, an administrator must add the additional accounts to the HelpLibraryUpdaters security group. 

    6.2.7 Help Library Manager validates digital signatures on content

    As a security check, the Help Library Manager checks the digital signature used to sign cabinet files that contain documentation content to be installed locally. When installing content interactively, users can view the certificate of signed files and approve or reject the content.


    6.3 SDK Tools and Compilers

    6.3.1 Some Tools Require .NET Framework 3.5 SP1

    In order to function correctly some tools included in the Windows SDK also require you to install .NET Framework 3.5 SP1.

    6.3.2 The Command-Line Environment for the Windows SDK Configuration Tool Supports Only Visual Studio 2008

    The command-line environment for the Windows SDK Version Selection tool supports only Visual Studio 2008. It does not support earlier versions of Visual Studio.

    6.3.3 GuidGen.exe may fail to run without Visual Studio Installed.

    The .NET Framework 3.5 version of the GuidGen.exe tool has a dependency on version 9.0 of the MFC Runtime. If you do not have the correct version installed you may see an error similar to this:


    C:\Program Files\Microsoft SDKs\Windows\v7.1\Bin>guidgen.exe

    The application has failed to start because its side-by-side configuration is incorrect. Please see the application event log or use the command-line sxstrace.exe tool for more detail.


    In order for the tool to function correctly we recommend installing any Visual Studio 2008 Express edition to obtain the required version of the MFC Runtime.


    NOTE:  This does not occur with the .NET Framework 4 version of GuidGen.exe.

    6.3.4 MSIL Disassembler (ILDASM.EXE) Help does not display in Windows 7.

    Help for the ILDASM tool does not display because it uses a help file format that is not natively supported in Windows 7.  Online documentation is available:

    6.3.5 OLE-COM Object Viewer (OLEView.exe) – error message appears and some tool functions are unavailable if tool is not run for the first time with administrator permissions.

    If the first run of the OLE-COM Object Viewer tool is not done with administrator permissions the following error message will appear:

    DllRegisterServer in IVIEWERS.DLL failed. OLEViewer will operate correctly without this DLL, however you will not be able to use the interface viewers.

    To avoid this issue use “Run as administrator” when you run the tool for the first time.

    6.3.6 FXCop Setup is Now Located Under the Windows SDK “\Bin” Directory.

    The installer for FXCop, fxcopsetup.exe, is now located in [Program Files]\Microsoft SDKs\Windows\v7.1\Bin\FXCop.


    6.3.7 Windows SDK Configuration Tool Does Not Update Visual Studio 2008 Paths for Itanium Headers and Libraries

     Symptom: When the Windows SDK Configuration Tool is used to integrate the Windows 7 SDK with Visual Studio 2008, the Visual C++ Directories for headers and libraries are set to point to the Windows 7 SDK content for x86 and x64 platforms only. Visual Studio will still point to the IA64 headers, libraries and tools that ship with Visual Studio 2008.

    Fix: Open Visual Studio 2008, Select Tools->Options->Projects and Solutions->C++ Directories. From the ‘Platform’ drop-down, Select ‘Itanium’ and from the “Show directories for” drop-down select ‘Include files’. Replace all instances of ‘WindowsSDKDirIA64’ with ‘WindowsSDKDir’. Select ‘Library files’ from the “Show directories for” drop down menu and repeat the previous step.


    6.3.8 Windows UI Automation May Fail When Using AccEvent or Inspect on Windows XP, Windows Vista, Windows Server 2003 R2 or Windows Server 2008


    Symptom in UIA Events Mode: Start AccEvent, Select Mode, Select UIA Event, Select Settings.  An error occurs stating “Accessible Event Watcher (32-bit UNICODE Release) has encountered a problem and needs to close.  We are sorry for the inconvenience.”

    Symptom in WinEvents Mode: Start AccEvent, Select Mode menu, Select WinEvents (In Context) or WinEvents (Out of Context), Select Events menu, Select Start Listening.  At the bottom corner of the screen a message is displayed stating “Registration failed, Stopped”.

    Cause: The latest Automation API is not present on the machine.  For more information, see the Microsoft Support page for the Windows Automation API:

    Fix:   Install the latest Windows Automation API available from:



    Symptom: When starting Inspect an error occurs stating “The program was unable to query the UI Automation client interfaces. Please confirm that the latest framework is installed.”

    Cause: The latest Automation API is not present on the machine.  For more information, see the Microsoft Support page for the Windows Automation API:

    Fix:   Install the latest Windows Automation API available from:


    6.3.9 A Failure May Occur When Using aximp.exe to Add a COM Component to VS2010 Sample Form Application


    The following error message is encountered when adding a COM component to a VS2010 Sample Form Application.

    Error Message:

    “Failed to import ActiveX control. Please ensure it is properly registered.”



    Workaround: Install comlibrary.interop.dll and axcontrol.inteorp.dll into the GAC.  Open a Windows SDK 7.1 Command Prompt.  At the prompt, type cd bin.  Then type Gacutil /if comlibrary.interop.dll and Gacutil /if axcontrol.inteorp.dll.


    6.3.10 Windows Troubleshooting Pack Designer May Fail on Startup

    Issue: Windows Troubleshooting Pack Designer fails to start.

    Cause: TSPDesigner.exe and DesignerFunction.dll are delay signed only.

    Workaround: Run the SN tool to bypass strong name validation for Windows Troubleshooting Pack Designer:

    -          Open a Windows SDK Command Prompt as Administrator

    -          Change directories to the folder where TSPDesigner.exe is located.  The default location is C:\Program Files\Microsoft SDKs\Windows\v7.1\Bin\TSPDesigner

    -          Enter ‘sn.exe -Vr TSPDesigner.exe’

    -          Enter ‘sn.exe -Vr DesignerFunction.dll’

    Note that ‘sn.exe -Vr’ is case sensitive.


    6.4 Samples

    6.4.1 Some C++ Samples Require Upgrade Prior to Building

  Building Samples Using Visual Studio 2005/2008

    Many of the sample project files in the SDK are written for VC 8.0 compilers, the compilers that shipped in Microsoft Visual Studio 2005. Samples with version 8.0 project files can be built in Microsoft Visual Studio 2005 but must be upgraded before being built in the Visual Studio 2008 build environment. 

  Building Samples Using Visual Studio 2010 or Windows SDK v7.1 (MSBuild 4.0)

    Visual Studio 2010 and the Windows SDK v7.1 use the MSBuild 4.0 build environment.  The Visual C++ compilers shipped in the Windows SDK v7.1 are the same compilers that ship in Visual Studio 2010 RTM.  All native sample project and solution files in the Windows SDK v7.1 must be upgraded prior to building using the MSBuild 4.0 environment.

    NOTE: For more information about upgrading samples to build in the MSBuild 4.0 build environment, see the section titled “Moving From the VCBuild to the MSBuild 4.0 Environment”.

    6.4.2 Some C++ Samples will not Build in Visual Studio 2005

    Some of the samples in the SDK contain v9.0 project files, which will not build in Visual Studio 2005. You cannot downgrade the project files to a lower version.  To work around this issue, install Visual Studio 2008 or 2010 (Express or Retail SKU) or build the sample in the Windows SDK command line build environment.

    6.4.3 Some Samples have External Dependencies

    Some samples included with the Windows SDK have dependencies on components outside the Windows SDK.

    ATL/MFC Dependency

    Some samples require the ATL and/or MFC headers, libraries, or runtime, which are included with Visual C++ (non-Express editions). When building a sample that depends on ATL/MFC without Visual Studio installed on your computer, you might see an error similar to this:

    fatal error C1083: Cannot open include file: 'afxwin.h': No such file or directory

    To work around this issue, install a non-Express version of Microsoft Visual Studio with the compatible MFC/ATL.

    Windows Media Player Dependency

    The Multimedia\WMP_11\dotNet\SchemaReader sample requires Windows Media Player 11 or later to be installed.

    Microsoft Management Console 3.0 Dependency

    The samples in the \Samples\SysMgmt\MMC3.0 directory require Microsoft Management Console 3.0 or later to be installed.

    DirectX SDK Dependency

    Some samples require the DirectX SDK (refer to the sample's readme for additional information).

    msime.h Dependency

    Some samples fail to build because the file msime.h is not found. Msime.h is not shipped with the Windows SDK. Msime.h is for use by developers when customizing applications for the 2007 Microsoft Office System.

    The affected samples are:

    ·         winui\Input\tsf\TSFApps\ImmPad-Interim

    ·         winui\Input\tsf\TSFApps\ImmPad-Level3-Step3

    ·         winui\Input\tsf\TSFApps\TsfPad-Hybrid

    To work around this issue, download msime.h from the Microsoft Download Center and copy it to the Windows SDK \Include directory.

    6.4.4 Some Samples Require Microsoft Visual Studio 2005 and will not Build with Microsoft Visual Studio 2008

    A few unmanaged samples rely on mspbase.h, mtype.h, or mfc80ud.lib. These files are included with Microsoft Visual Studio 2005 and do not ship with Microsoft Visual Studio 2008.

    1.    mspbase.h

    a.    netds\Tapi\Tapi3\Cpp\Msp\MSPBase

    b.    netds\Tapi\Tapi3\Cpp\Msp\Sample

    c.    netds\Tapi\Tapi3\Cpp\pluggable

    2.    mtype.h

    a.    netds\Tapi\Tapi3\Cpp\tapirecv

    b.    netds\Tapi\Tapi3\Cpp\tapisend

    3.    mfc80ud.lib

    a.    Sysmgmt\Wmi\VC\AdvClient

    6.4.5 Some C++ Samples do not have Configurations for x64

    When building a C++ sample that does not have an x64 configuration on an x64 computer, you might see the following error message:

    fatal error LNK1112: module machine type 'x64' conflicts with target machine type 'x86'

    To work around this issue, perform one of the following actions:

    1.       Build the sample targeting x86 by using this command:

    msbuild *.vcxproj /p:platform=win32

    2.       Add x64 support by doing the following:

    a.       Load the sample in Microsoft Visual Studio (C++).

    b.      Update the Configuration Manager under Project | Properties.


    For detailed instructions, see the Windows SDK Blog post "How to add 64-bit support to vcproj files."


    Note: if you do not install libraries for all CPU architectures during SDK setup, some samples with Visual C++ project files might fail to build with this error for all configurations in the project file:


    Fatal error LNK1181: cannot open input file


    For example, if a sample has an x86 configuration and x86 libraries were not installed (these libraries are installed by default when installing the SDK on all platforms), the sample will fail to compile.

    6.4.6 Visual J# Samples Require VJ++ (Visual Studio)

    J# samples will not build using the Windows SDK because there is no appropriate build environment. This edition of the Windows SDK does not support building J# applications. To work around this issue, install Visual Studio (VJ++) 2005.

    6.4.7 Setupvroot.bat Setup Script for WCF Samples Fails on Windows Vista if the NetMsmqActivator Service is Enabled and Message Queuing (MSMQ) is not Installed

    The Windows Communication Foundation samples setup script Setupvroot.bat does not work on Windows Vista if the NetMsmqActivator service is enabled and Message Queuing (MSMQ) is not installed. The iisreset utility does not work unless MSMQ is installed or the NetMsmqActivator service is disabled. The WCF samples setup script Setupvroot.bat will not run unless MSMQ is installed or the NetMsmqActivator service is disabled.

    Make sure MSMQ is installed or disable the NetMsmqActivator service on Windows Vista before you run the WCF samples setup script Setupvroot.bat.

    6.4.8 Some Samples Fail to Compile: Debug:Itanium Error

    Some samples might fail to compile on all platforms with the following error:

    vcbuild.exe: error VCBLD0004: Project 'C:\Samples\Technologies\DirectoryServices\BEREncoding\CP\BerEncoding\BerEncoding.vcproj' does not contain a configuration called 'Debug|Itanium'

    This error occurs because platform configurations are listed alphabetically by default in a project or solution file created by Visual Studio. If Debug|Itanium is a supported configuration, it will be listed first in the samples' solution and/or project files. This configuration will be built first by default.

    To work around this issue, use a configuration switch to specify which platform you want to build for:

    Msbuild.exe *.sln /p:platform=Win32
    Msbuild.exe *.sln /p:platform=x64

    6.4.9 Windows Media Services SDK Plug-ins Fail to Build on Windows 2008 Server

    The “Playlistparser” and “Authorization” plug-in samples should be available in Windows Media Services. The user should be able to enable and disable the newly built plug-ins. However, the “Playlistparser” and “Authorization” plug-ins fail to build and produce the following error:

    Syntax error for the code given below in Unknwn.idl

       HRESULT QueryInterface(

            [in] REFIID riid,

            [out, iid_is(riid), annotation("__RPC__deref_out")] void **ppvObject);

    To work around this issue, build the “Playlistparser” and “Authorization” plug-ins on Windows Vista or Windows 2003 Server and copy the plug-ins to Windows 2008 Server.

    6.4.10 WSDAPI StockQuote and FileService samples may fail to build on the command line (using msbuild) if Visual Studio 2008 is installed

    The WSDAPI StockQuote and FileService samples may fail to build on the command line (using msbuild) if Visual Studio 2008 is installed. These samples include project files which reference WSDL and XSD files, and msbuild attempts to invoke sproxy.exe to process these files. Visual Studio 2008 does not include sproxy.exe, and compilation fails if the tool is not present. Compiling from inside the Visual Studio IDE is unaffected.

    It is not necessary to use sproxy to process these files. WsdCodeGen, a WSDL/XSD compiler for WSDAPI, can be used to generate C++ code from these files. This generated code is already included in the sample.

    If you encounter this error, you can remove the XSD and WSDL files from the affected project files and recompile with msbuild:

    ·         StockQuote\StockQuoteContract\StockQuoteContract.vcproj: remove StockQuote.xsd, StockQuote.wsdl, and StockQuoteService.wsdl

    ·         FileService\FileServiceContract\FileServiceContract.vcproj: remove FileService.wsdl

    6.4.11 WinBase\RDC sample generates assertion failure when built in Visual Studio

    You may receive an “Assertion failed” message when attempting to build this sample in Visual Studio. This is caused by a post-build instruction that attempts to register the COM component, but fails. To work around this issue, build the sample in Administrator mode: from the Start menu, select Microsoft Visual Studio 2008, then Visual Studio Tools, then right-click a Visual Studio command prompt and select “Run as Administrator”.

    6.5 Headers and Libraries

    6.5.1 The wincodec_proxy.h Header File is Missing From the Windows SDK for Windows 7 and .NET Framework 4

    Symptom: The wincodec_proxy.h header file is missing from the Windows SDK for Windows 7 and .NET Framework 4.

    Fix: Download the wincodec_proxy.h header file from the Microsoft Code Gallery:

    6.5.2 Warnings May Occur When Compiling Programs Containing the ws2tcpip.h Header File

    Symptom: Warnings occur when compiling programs containing the ws2tcpip.h header file.

    Fix: Precede #include <ws2tcpip.h> with #pragma warning(push), #pragma warning(disable : 6386) and follow it with #pragma warning(pop).
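
    As a rough illustration of the fix above, the include can be wrapped as follows. This is only a sketch: the non-Windows branch (netinet/in.h) and the Ipv6TextBufferSize helper are assumptions added so the fragment compiles and can be exercised outside Visual C++; they are not part of the SDK guidance.

```cpp
// Sketch: disable warning 6386 only while the header is being read,
// then restore the previous warning state.
#if defined(_MSC_VER)
#pragma warning(push)
#pragma warning(disable : 6386)
#endif

#if defined(_WIN32)
#include <ws2tcpip.h>
#else
#include <netinet/in.h>  // assumption: stand-in so the sketch compiles off Windows
#endif

#if defined(_MSC_VER)
#pragma warning(pop)
#endif

// Hypothetical helper: uses a constant made available by the header above
// to show that its declarations remain usable after the wrapped include.
constexpr int Ipv6TextBufferSize() { return INET6_ADDRSTRLEN; }
```

    Because the push/pop pair surrounds only the include, warning 6386 stays enabled for the rest of the translation unit.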

    6.5.3 Conflict in definition of INTSAFE_E_ARITHMETIC_OVERFLOW in intsafe.h/comutil.h

    Symptom: You are using the Windows 7 SDK development environment or Visual Studio 2008 or earlier compilers with the Windows 7 version of intsafe.h and at compile time receive the below warning:

    warning C4005: 'INTSAFE_E_ARITHMETIC_OVERFLOW' : macro redefinition c:\program files\microsoft sdks\windows\v7.1\include\intsafe.h

    Cause: This warning is reported because intsafe.h and comutil.h have conflicting definitions of INTSAFE_E_ARITHMETIC_OVERFLOW.

    Workaround: Include intsafe.h in the code before comutil.h; the warning will then not occur.

    Fix: This conflict is resolved in Visual Studio 2010 and later versions of the compiler package.


    6.6 Windows Native Development

    6.6.1 Windows SDK Configuration Tool (Command Line) Fails With Visual Studio 2005

    This is not a supported scenario and will not work with Visual Studio 2005.


    7.   Windows SDK Product Support and Feedback

    The Windows SDK is provided as-is and is not supported by Microsoft. For technical support, there are a number of options:

    7.1 Professional Support for Developers

    Microsoft Professional Support for Developers provides incident-based access to Microsoft support professionals and rich information services to help developers to create and enhance their software solutions with Microsoft products and technologies.

    For more information about Professional Support for Developers, or to purchase Professional Support incidents, please contact a Customer Representative at 1-800-936-3500. To access Professional Support for Developers, visit the MSDN Web site. If you have already purchased support incidents and would like to speak directly with a Microsoft support professional, call 1-800-936-5800.

    7.2 MSDN Online

    MSDN Online provides Developer Support search, support incident submission, support highlights, service packs, downloads, technical articles, API information, blogs, newsgroups, forums, webcasts, and other resources to help optimize development.

    7.3 Ways to Find Support and Send Feedback

    Your feedback is important to us. Your participation and feedback through the locations listed below is appreciated.

    ·         The MSDN Forums are available for peer-to-peer support.

    ·         Windows SDK Developer Center is the official site about development using the Windows SDK, and provides information about the SDKs, links to the Windows SDK Blog, Forum, online release notes and other resources.

    ·         The Windows SDK Forum deals with topics related specifically to the Windows SDK.

    ·         The Software Development for Windows Client forum contains an updated list of related forums.

    ·         You can also send mail to the Windows SDK Feedback alias at

    ·         The Windows SDK Blog contains workarounds,  late-breaking and forward-looking news.

    Copyright © 2010 Microsoft Corporation. All rights reserved. Legal Notices:



    Using Direct2D with WPF



    By | 3 Nov 2010 | Article
    Hosting Direct2D content in WPF controls.


    With Windows 7, Microsoft introduced a new technology called Direct2D (which is also supported on Windows Vista SP2 with the Platform Update installed). Looking through all its documentation, you'll notice it's aimed at Win32 developers; however, the Windows API Code Pack allows .NET developers to use the features of Windows 7 easily, with Direct2D being one of the features supported. Unfortunately, all the WPF examples included with the Code Pack require hosting the control in a HwndHost, which is a problem as it has airspace issues. This basically means that the Direct2D control needs to be separated from the rest of the WPF controls, which means no overlapping controls with transparency.

    The attached code allows Direct2D to be treated as a normal WPF control and, thanks to some COM interfaces, doesn't require you to download the DirectX SDK or even play around with any C++ - the only dependency is the aforementioned Code Pack (the binaries of which are included in the attached file). This article is more about the problems found along the way and the challenges involved in creating the control, so feel free to skip to the Using the code section if you want to jump right in.


    WPF architecture

    WPF is built on top of DirectX 9, and uses a retained rendering system. What this means is that you don't draw anything to the screen, but instead create a tree of visual objects; their drawing instructions are cached and later rendered automatically by the framework. This, coupled with using DirectX to do the graphics processing, enables WPF applications not only to remain responsive when they have to be redrawn, but also allows WPF to use a "painter's algorithm" painting model. In this model, each component (starting at the back of the display, going towards the front) is asked to draw itself, allowing them to paint over the previous component's display. This is the reason it's so easy to have complex and/or partially transparent shapes with WPF - because it was designed taking this scenario into account. For more information, check out the MSDN article.
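
    To make the painter's algorithm concrete, here is a minimal C++ sketch. It is not WPF code: the Layer type and the Composite function are hypothetical names, and it blends a single grayscale pixel rather than a whole visual tree.

```cpp
#include <vector>

// Hypothetical layer: a grayscale value in [0, 1] and an opacity in [0, 1].
struct Layer {
    float value;
    float alpha;
};

// Painter's algorithm for one pixel: visit the layers from back to front and
// let each one paint over what is already there ("source over" blending).
float Composite(const std::vector<Layer>& backToFront, float background) {
    float pixel = background;
    for (const Layer& layer : backToFront) {
        pixel = layer.value * layer.alpha + pixel * (1.0f - layer.alpha);
    }
    return pixel;
}
```

    An opaque white layer covered by a half-transparent black layer ends up mid-gray, which is why overlapping, partially transparent shapes fall out of this model naturally.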

    Direct2D architecture

    In contrast to the managed WPF model, Direct2D is an immediate-mode API in which the developer is responsible for everything. This means you are responsible for creating your resources, refreshing the screen, and cleaning up after yourself. It's built on top of Direct3D 10.1, which gives it high-performance rendering, yet it still provides several of the advantages of WPF (such as device-independent units, ClearType text rendering, per-primitive anti-aliasing, and solid/linear/radial/bitmap brushes). MSDN has a more in-depth introduction; however, it's aimed more at native developers.


    Direct2D has been designed to be easily integrated into existing projects that use GDI, GDI+, or Direct3D, with multiple options available for incorporating Direct2D content with Direct3D 10.1 or above. The Direct2D SDK even includes a nice sample called DXGI Interop to show how to do this.

    To host Direct3D content inside WPF, the D3DImage class was introduced in .NET 3.5 SP1. This allows you to host Direct3D 9 content as an ImageSource, enabling it to be used inside an Image control, or as an ImageBrush etc. There's a great article here on CodeProject with more information and examples.

    The astute would have noticed that whilst both technologies can work with Direct3D, Direct2D requires version 10.1 or later, whilst the D3DImage in WPF only supports version 9. A quick internet search resulted in this blog post by Jeremiah Morrill. He explains that an IDirect3DDevice9Ex (which is supported by D3DImage) supports sharing resources between devices. A shared render target created in Direct3D 10.1 can therefore be pulled into a D3DImage via an intermediate IDirect3DDevice9Ex device. He also includes example source code which does exactly this, and the attached code is derived from his work.

    So, we now have a way of getting Direct2D working with Direct3D 10.1, and we can get WPF working with Direct3D 10.1; the only problem is the dependency of both of the examples on unmanaged C++ code and the DirectX SDK. To get around this problem, we'll access DirectX through its COM interface.

    Component Object Model

    I'll admit I know nothing about COM, apart from to avoid it! However, there's an article here on CodeProject that helped to make it a bit less scary. To use COM, we have to use low level techniques, and I was surprised (and relieved!) to find that the Marshal class has methods which could mimic anything that would normally have to be done in unmanaged code.

    Since there are only a few objects we need from Direct3D 9, and there are only one or two functions in each object that are of interest to us, instead of trying to convert all the interfaces and their functions to their C# equivalent, we'll manually map the V-table as discussed in the linked article. To do this, we'll create a helper function that will extract a method from the specified slot in the V-table:

    public static bool GetComMethod<T, U>(T comObj, int slot, out U method) where U : class
    {
        IntPtr objectAddress = Marshal.GetComInterfaceForObject(comObj, typeof(T));
        if (objectAddress == IntPtr.Zero)
        {
            method = null;
            return false;
        }
        try
        {
            IntPtr vTable = Marshal.ReadIntPtr(objectAddress, 0);
            IntPtr methodAddress = Marshal.ReadIntPtr(vTable, slot * IntPtr.Size);
            // We can't have a Delegate constraint, so we have to cast to
            // object then to our desired delegate
            method = (U)((object)Marshal.GetDelegateForFunctionPointer(
                                 methodAddress, typeof(U)));
            return true;
        }
        finally
        {
            Marshal.Release(objectAddress); // Prevent memory leak
        }
    }

    This code first gets the address of the COM object (using Marshal.GetComInterfaceForObject), then gets the location of the V-table stored at the start of the COM object (using Marshal.ReadIntPtr), then gets the address of the method at the specified slot from the V-table (multiplying by the system size of a pointer, as Marshal.ReadIntPtr specifies the offset in bytes), then finally creates a callable delegate to the returned function pointer (Marshal.GetDelegateForFunctionPointer). Simple!

    An important thing to note is that the IntPtr returned by the call to Marshal.GetComInterfaceForObject must be released; I wasn't aware of this, and found my program leaking memory when the resources were being re-created. Also, the function uses an out parameter for the delegate so we get all the nice benefits of type inference and, therefore, reduces the amount of typing required for the caller. Finally, you'll notice there's some nasty casting to object and then to the delegate type. This is unfortunate but necessary, as there's no way to specify a delegate generic constraint in C# (the CLI does actually allow this constraint, as mentioned by Jon Skeet in his blog). Since this is an internal class, we'll assume that the caller of the function knows this constraint.
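
    For readers who have not seen a V-table laid out by hand, the lookup performed by the helper can be mimicked in portable C++. This is only a sketch under stated assumptions: FakeComObject, GetLevel, AddLevel and GetMethod are hypothetical names, and the object is a hand-built imitation of the COM layout (a pointer to a table of function pointers at offset 0), not a real COM interface.

```cpp
#include <cstddef>

// Generic slot type; each entry is cast back to its real signature before use.
using Slot = void (*)();

// Hypothetical COM-like object: its first pointer-sized field points to a
// table of function pointers, and every method takes the object as 'self'.
struct FakeComObject {
    const Slot* vtable;
    int level;
};

using GetLevelFn = int (*)(FakeComObject* self);
using AddLevelFn = int (*)(FakeComObject* self, int delta);

int GetLevel(FakeComObject* self) { return self->level; }
int AddLevel(FakeComObject* self, int delta) { return self->level += delta; }

// Slot 0 holds GetLevel, slot 1 holds AddLevel.
const Slot kVTable[] = {
    reinterpret_cast<Slot>(&GetLevel),
    reinterpret_cast<Slot>(&AddLevel),
};

// Same steps as the helper above: read the table pointer stored at the start
// of the object, index it by slot, and cast the entry to the caller's
// signature. Pointer indexing scales by the pointer size automatically, which
// is what the explicit "slot * IntPtr.Size" does in the C# version.
template <typename Fn>
Fn GetMethod(FakeComObject* obj, std::size_t slot) {
    const Slot* table = obj->vtable;
    return reinterpret_cast<Fn>(table[slot]);
}
```

    As with the C# helper, nothing checks that the requested signature matches the slot; the caller is trusted to pair each slot with the right function type.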

    With this helper function, it becomes a lot easier to create a wrapper around the COM interfaces, so let's take a look at how to provide a wrapper around the IDirect3DTexture9 interface. First, we'll create an internal interface with the ComImport, Guid, and InterfaceType attributes attached so that the Marshal class knows how to use the object. For the GUID, we'll need to look inside the DirectX SDK header files, in particular d3d9.h:

    interface DECLSPEC_UUID("85C31227-3DE5-4f00-9B3A-F11AC38C18B5") IDirect3DTexture9;

    With the same header open, we can also look for the interface's declaration, which looks like this after running it through the pre-processor and removing the __declspec and __stdcall attributes:

    struct IDirect3DTexture9 : public IDirect3DBaseTexture9
    {
        virtual HRESULT QueryInterface( const IID & riid, void** ppvObj) = 0;
        virtual ULONG AddRef(void) = 0;
        virtual ULONG Release(void) = 0;
        virtual HRESULT GetDevice( IDirect3DDevice9** ppDevice) = 0;
        virtual HRESULT SetPrivateData( const GUID & refguid,
                const void* pData,DWORD SizeOfData,DWORD Flags) = 0;
        virtual HRESULT GetPrivateData( const GUID & refguid,
                void* pData,DWORD* pSizeOfData) = 0;
        virtual HRESULT FreePrivateData( const GUID & refguid) = 0;
        virtual DWORD SetPriority( DWORD PriorityNew) = 0;
        virtual DWORD GetPriority(void) = 0;
        virtual void PreLoad(void) = 0;
        virtual D3DRESOURCETYPE GetType(void) = 0;
        virtual DWORD SetLOD( DWORD LODNew) = 0;
        virtual DWORD GetLOD(void) = 0;
        virtual DWORD GetLevelCount(void) = 0;
        virtual HRESULT SetAutoGenFilterType( D3DTEXTUREFILTERTYPE FilterType) = 0;
        virtual D3DTEXTUREFILTERTYPE GetAutoGenFilterType(void) = 0;
        virtual void GenerateMipSubLevels(void) = 0;
        virtual HRESULT GetLevelDesc( UINT Level,D3DSURFACE_DESC *pDesc) = 0;
        virtual HRESULT GetSurfaceLevel( UINT Level,IDirect3DSurface9** ppSurfaceLevel) = 0;
        virtual HRESULT LockRect( UINT Level,D3DLOCKED_RECT* pLockedRect,
                const RECT* pRect,DWORD Flags) = 0;
        virtual HRESULT UnlockRect( UINT Level) = 0;
        virtual HRESULT AddDirtyRect( const RECT* pDirtyRect) = 0;
    };

    We only need one of these methods for our code, which is the GetSurfaceLevel method. Starting from the top and counting down, we can see that this is the 19th method, so will therefore be at slot 18 in the V-table. We can now create a wrapper class around this interface.

    internal sealed class Direct3DTexture9 : IDisposable
    {
        private delegate int GetSurfaceLevelSignature(IDirect3DTexture9 texture,
                             uint Level, out IntPtr ppSurfaceLevel);
        [ComImport, Guid("85C31227-3DE5-4f00-9B3A-F11AC38C18B5"),
         InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
        internal interface IDirect3DTexture9
        {
        }
        private IDirect3DTexture9 comObject;
        private GetSurfaceLevelSignature getSurfaceLevel;
        internal Direct3DTexture9(IDirect3DTexture9 obj)
        {
            this.comObject = obj;
            // GetSurfaceLevel is the 19th method, i.e. slot 18 in the V-table
            HelperMethods.GetComMethod(this.comObject, 18,
                                       out this.getSurfaceLevel);
        }
        public void Dispose()
        {
            this.Release();
        }
        public IntPtr GetSurfaceLevel(uint Level)
        {
            IntPtr surface;
            Marshal.ThrowExceptionForHR(this.getSurfaceLevel(
                                  this.comObject, Level, out surface));
            return surface;
        }
        private void Release()
        {
            if (this.comObject != null)
            {
                Marshal.ReleaseComObject(this.comObject);
                this.comObject = null;
                this.getSurfaceLevel = null;
            }
        }
    }

    In the code, I've used Marshal.ThrowExceptionForHR to make sure that the call succeeds - if there's an error, then it will throw the relevant .NET type (e.g., a result of E_NOTIMPL will result in a NotImplementedException being thrown).
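
    Returning to the slot index passed to GetComMethod: counting declarations by hand is easy to get wrong, so it can be worth double-checking mechanically. The following C++17 sketch is an illustration only; kTexture9Methods and SlotOf are hypothetical names, and the list is simply the method names copied in declaration order from the interface listing above.

```cpp
#include <string_view>

// Method names in declaration order, copied from the interface listing.
constexpr std::string_view kTexture9Methods[] = {
    "QueryInterface", "AddRef", "Release", "GetDevice", "SetPrivateData",
    "GetPrivateData", "FreePrivateData", "SetPriority", "GetPriority",
    "PreLoad", "GetType", "SetLOD", "GetLOD", "GetLevelCount",
    "SetAutoGenFilterType", "GetAutoGenFilterType", "GenerateMipSubLevels",
    "GetLevelDesc", "GetSurfaceLevel", "LockRect", "UnlockRect", "AddDirtyRect",
};

// Returns the zero-based V-table slot of a method, or -1 if it is not listed.
constexpr int SlotOf(std::string_view name) {
    int slot = 0;
    for (std::string_view method : kTexture9Methods) {
        if (method == name) return slot;
        ++slot;
    }
    return -1;
}

// Compile-time check of the index used by the wrapper class.
static_assert(SlotOf("GetSurfaceLevel") == 18, "GetSurfaceLevel should be slot 18");
```

    The check relies on the V-table following declaration order, including the inherited methods listed first, which is the assumption the article's manual count makes as well.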

    Using the code

    To use the attached code, you can either include the compiled binary into your project, or include the code as there's not a lot of it (despite the time spent on creating it!). Either way, you'll need to make sure you reference the Windows API Code Pack DirectX library in your project.

    In the code, there are three classes of interest: D3D10Image, Direct2DControl, and Scene.

    The D3D10Image class inherits from D3DImage, and adds an override of the SetBackBuffer method that accepts a Direct3D 10 texture (in the form of a Microsoft.WindowsAPICodePack.DirectX.Direct3D10.Texture2D object). As the code is written, the texture must be in the DXGI_FORMAT_B8G8R8A8_UNORM format; however, feel free to edit the code inside the GetSharedSurface function to whatever format you want (in fact, the original code by Jeremiah Morrill did allow for different formats, so take a look at that for inspiration).

    Direct2DControl is a wrapper around the D3D10Image control, and provides an easy way to display a Scene. The control takes care of redrawing the Scene and D3D10Image when it's invalidated, and also resizes their contents. To help improve performance, the control uses a timer to resize the contents 100ms after the resize event has been received. If another request to be resized occurs during this time, the timer is reset to 100ms again. This might sound like it could cause problems when resizing, but internally, the control uses an Image control, which will stretch its contents when it's resized so the contents will always be visible; they just might get temporarily blurry. Once resizing has finished, the control will redraw its contents at the correct resolution. Sometimes, for reasons unknown to me, there will be a flicker when this happens, but by using the timer, this will occur infrequently.
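
    The reset-on-every-request behaviour described above is a debounce. The following is a minimal sketch of that logic only, written in C++ with caller-supplied timestamps instead of a real timer so that it stands alone. ResizeDebouncer, request, and poll are hypothetical names and are not part of the attached code.

```cpp
#include <cassert>

// Hypothetical helper: collapses a burst of resize requests into a single
// "do the expensive resize now" signal once a quiet period has passed.
class ResizeDebouncer {
public:
    explicit ResizeDebouncer(long delay_ms)
        : delay_ms_(delay_ms), pending_(false), deadline_ms_(0) {
        assert(delay_ms >= 0);
    }

    // Call for every resize event; this (re)starts the countdown.
    void request(long now_ms) {
        pending_ = true;
        deadline_ms_ = now_ms + delay_ms_;
    }

    // Call periodically, e.g. from a timer tick. Returns true exactly once
    // per burst, when the quiet period after the last request has elapsed.
    bool poll(long now_ms) {
        if (pending_ && now_ms >= deadline_ms_) {
            pending_ = false;
            return true;
        }
        return false;
    }

private:
    long delay_ms_;
    bool pending_;
    long deadline_ms_;
};
```

    With a 100 ms delay, requests at 0 ms and 60 ms lead to a single resize, signalled by the first poll at or after 160 ms.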

    The Scene class is an abstract class containing three main functions for you to override: OnCreateResources, OnFreeResources, and OnRender. The reason for the first two functions is that a DirectX device can get destroyed (for example, if you switch users), and afterwards, you will need to create a new device. These methods allow you to create/free device dependent resources, such as brushes for example. The OnRender method, as the name implies, is where you do the actual drawing.

    Putting this together gives us this code to create a simple rectangle on a semi-transparent blue background:

    <!-- Inside your main window XAML code -->
    <!-- Make sure you put an xmlns:d2d reference to the control's
         namespace at the top of the file -->
    <d2d:Direct2DControl x:Name="d2DControl" />

    using D2D = Microsoft.WindowsAPICodePack.DirectX.Direct2D1;

    internal sealed class MyScene : Direct2D.Scene
    {
        private D2D.SolidColorBrush redBrush;

        protected override void OnCreateResources()
        {
            // We'll fill our rectangle with this brush
            this.redBrush = this.RenderTarget.CreateSolidColorBrush(
                                 new D2D.ColorF(1, 0, 0));
        }

        protected override void OnFreeResources()
        {
            if (this.redBrush != null)
            {
                this.redBrush.Dispose();
                this.redBrush = null;
            }
        }

        protected override void OnRender()
        {
            // This is what we're going to draw
            var size = this.RenderTarget.Size;
            var rect = new D2D.Rect(
                    10, // left (assumed; mirrors the 10 pixel inset below)
                    10, // top (assumed)
                    (int)size.Width - 10,
                    (int)size.Height - 10);

            // This actually draws the rectangle
            this.RenderTarget.Clear(new D2D.ColorF(0, 0, 1, 0.5f));
            this.RenderTarget.FillRectangle(rect, this.redBrush);
        }
    }

    // This is the code behind class for the XAML
    public partial class MainWindow : Window
    {
        public MainWindow()
        {
            InitializeComponent();

            // Add this after the call to InitializeComponent. Really you should
            // store this object as a member so you can dispose of it, but in our
            // example it will get disposed when the window is closed.
            this.d2DControl.Scene = new MyScene();
        }
    }

    Updating the Scene

    In the original code to update the Scene, you needed to call Direct2DControl.InvalidateVisual. This has now been changed so that calling the Render method on Scene will cause the new Updated event to be fired, which the Direct2DControl subscribes to and invalidates its area accordingly.

    It also turned out that the Scene would sometimes flicker when redrawn. This seems to be an issue with the D3DImage control, and the solution (whilst not 100% effective) is to synchronize the AddDirtyRect call with WPF's own rendering (by subscribing to the CompositionTarget.Rendering event). This is all handled by the Direct2DControl for you.

    To make things easier still, there's a new class deriving from Scene called AnimatableScene. After releasing the first version, there was some confusion with how to do continuous scene updates, so hopefully this class should make it easier - you use it the same as the Scene class, but your OnRender code will be called, when required, by setting the desired frames per second in the constructor (though see the Limitations section). Also note that if you override the OnCreateResources method, you need to make sure to call the base's version at the end of your code to start the animation, and when you override the OnFreeResources method, you need to call the base's version first to stop the animation (see the example in the attached code).

    Mixed mode assembly is built against version 'v2.0.50727'

    The attached code is compiled against .NET 4.0 (though it could probably be retargeted to work under .NET 2.0), but the Code Pack is compiled against .NET 2.0. When I first referenced the Code Pack and tried running the application, the above exception kept getting raised. The solution is to include an app.config file in the project with the following startup information:

    <?xml version="1.0"?>
    <configuration>
      <startup useLegacyV2RuntimeActivationPolicy="true">
        <supportedRuntime version="v4.0"/>
      </startup>
    </configuration>


    Limitations

    Direct2D will work over remote desktop; however (as far as I can tell), the D3DImage control is not rendered. Unfortunately, I only have a Home Premium version of Windows 7, so cannot test any workarounds, but would welcome feedback in the comments.

    The code written will work with targeting either x86 or x64 platforms (or even using the Any CPU setting); however, you'll need to use the correct version of Microsoft.WindowsAPICodePack.DirectX.dll; I couldn't find a way of making this automatic, and I don't think the Code Pack can be compiled to use Any CPU as it uses unmanaged code.

    The timer used in the AnimatableScene is a DispatcherTimer. MSDN states:

    [The DispatcherTimer is] not guaranteed to execute exactly when the time interval occurs [...]. This is because DispatcherTimer operations are placed on the Dispatcher queue like other operations. When the DispatcherTimer operation executes is dependent on the other jobs in the queue and their priorities.


    • 02/11/10 - Direct2DControl has been changed to use a DispatcherTimer so that it doesn't contain any controls needing to be disposed of (makes FxCop a little happier), and the control is now synchronized with WPF's CompositionTarget.Rendering event to reduce flickering. Scene has been changed to include an Updated event and to allow derived classes access to its D2DFactory. Also, the AnimatableScene class has been added.
    • 21/09/10 - Initial version.


    This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

    Hex Grids and Hex Coordinate Systems in Windows: Drawing and Printing

    By | 29 Apr 2012 | Article
    A library (DLL) for the generation of hexagon grids (or "tessellations"), and for the management of the resultant coordinate systems.


    Figure 1: "Hexplane.exe", a demo in which the user flies through 3D space over a hex-based terrain


    Figure 2: Spinning cube demo with four dynamic hex tessellation surfaces ("Hex3d.exe")


    Whether called a hex grid, a honeycomb, or a tessellation (which is the mathematical term for a space-filling pattern), groupings of hexagons like the ones shown above are a useful way to fill up two-dimensional space. They provide for a more interesting visual effect than tessellations of 3- or 4-sided polygons; and unlike tessellations composed of several different polygons, hex tessellations create a regular grid with a consistent coordinate system, a fact which has been exploited by all sorts of computer games, and also by many board games.

    The work at hand describes how to use a library created by the author to draw a wide variety of hex tessellations. This is accomplished using selected calls into the GDI and GDI+ APIs made by the library. The library is named "hexdll.dll". This somewhat repetitive name makes more sense in the context of its codebase, where it is found alongside "hexdll.cpp", "hexdll.h", "hex3d.cpp", and so on.

    The development of the programs provided with this article presented some performance challenges. In one demonstration, a hexagon grid is drawn onto a Direct3D surface. The surface is dynamic; with each frame, a slightly different hex tessellation is drawn, creating an interesting flicker effect. Care was taken to ensure that the many calls into "hexdll.dll" necessitated by this application did not result in decreased frame rate. This requires "hexdll.dll" itself to be capable of operating quickly, and also presents potential interface issues between GDI and Direct3D, which are discussed more extensively further down in the article.

    In another of the demo programs, a large hex grid is redrawn in its entirety with each Resize event. Again, if done incorrectly, this action will introduce a very noticeable lag.

    These high-performance applications are both enabled by a single common design decision: GDI+ is avoided, in favor of GDI, unless the caller specifically requests anti-aliasing. GDI does not offer this capability, so GDI+ must be used for anti-aliased drawing. However, for drawing operations not involving anti-aliasing, GDI is significantly faster than GDI+. This is an interesting result which is discussed at some length below. Here, suffice it to say that the OOP-style interface exposed by GDI+ comes at a cost, and at a cost which is in some cases dramatic. This article is thus a reminder of the high performance potential of non-OO procedural and structured programming techniques.


    The "device context" (DC) is ubiquitous in Windows programming. The Direct3D surface interface used here (IDirect3DSurface9 [1]), for example, exposes a GetDC() method. Most Windows controls (be they MFC, Win32, or .NET-based) expose an HWND, which can be converted into a DC using a single Windows API call. Each of these entities is ultimately just a different kind of 2D Windows surface, and "hexdll.dll" can draw hex tessellations on all of them, using the DC as a common medium. Many of these operations are demonstrated in the code base provided, and discussed in this article.

    The author's DLL is designed for hassle-free use from native or .NET programs [2]; the source code provided contains complete examples of both types of clients. The main ".cs" file for the .NET demo is only 87 lines long. None of the C++ demos use more than 450 lines of code, despite their high feature content. The "hex2d.cpp" printer-friendly demo uses only 236 lines of code.

    The next section of the article deals with the creation of client apps that use the DLL. For simplicity, a Visual Studio 2010 / C# client app ("hexdotnet.sln" / "hexdotnet.exe") is shown first. The folder tree for this C# application is present in the "hexdotnet" subfolder of the provided source code archive.

    After the presentation of the .NET client, the text below continues with the presentation of a variety of C++ client programs, and then concludes with a discussion of the internal implementation of "hexdll.dll". The code for the library is written in C++, and built using the MinGW compiler.

    The client programs provided were developed using Visual Studio for the .NET app and MinGW for the C++ apps and for the DLL itself. In constructing the C++ development tool chain, this article relies heavily on techniques described in two previous articles by the same author, GDI Programming with MinGW and Direct3D Programming with MinGW. The text below attempts to be self-contained, but it does make reference to these predecessor articles, as necessary, and they do provide more detailed explanations of some of the background topics for this article. There are some minor differences between the instructions given in this article and those given in its predecessors, due to changes in MinGW. These are discussed in the section titled "Building", further below.

    API Selection

    The author faced a choice between several 2D drawing APIs in developing the programs described in this article. The APIs considered were GDI, GDI+, DirectDraw, and Direct2D. Of these, Direct2D is the newest and likely the fastest-running alternative. Unfortunately, MinGW does not support it, at least not as downloaded. DirectDraw is, like Direct2D, a component of DirectX, but it is a deprecated one.

    Of course, it would be difficult to integrate either of these DirectX-based technologies into typical (i.e., raster-based) Windows applications as seamlessly as was done for the GDI/GDI+ implementation present in "hexdll.dll". Two main advantages of the approach selected are therefore its generality and its lack of bothersome architectural requirements.

    Using the Code

    One simple way to develop a client application that uses "hexdll.dll" is to use the .NET System.Windows.Forms namespace to create a control onto which the DLL can draw. Any C# application can access the functions exposed by "hexdll.dll". The first step is to insert the declarations shown below into an application class:

    [DllImport("hexdll.dll", CallingConvention = CallingConvention.Cdecl)]
    static extern void hexdllstart();
    [DllImport("hexdll.dll", CallingConvention = CallingConvention.Cdecl)]
    static extern void hexdllend();
    [DllImport("hexdll.dll", CallingConvention = CallingConvention.Cdecl)]
    static extern void systemhex
    (
      IntPtr hdc,  //DC we are drawing upon
      Int32 origx, //Top left corner of (0,0) hex in system - X
      Int32 origy, //Top left corner of (0,0) hex in system - Y
      Int32 magn,  //One-half hex width; also, length of each hex side
      Int32 r,     //Color of hex - R
      Int32 g,     //Color of hex - G
      Int32 b,     //Color of hex - B
      Int32 coordx,//Which hex in the system is being drawn? - X
      Int32 coordy,//Which hex in the system is being drawn? - Y
      Int32 penr,  //Outline (pen) color - R
      Int32 peng,  //Outline (pen) color - G
      Int32 penb,  //Outline (pen) color - B
      Int32 anti   //Anti-alias? (0 means "no")
    );

    In the C# application provided, these declarations are inserted directly into the Form1 class, very near the top of "Form1.cs". The systemhex(), hexdllstart(), and hexdllend() functions are therefore accessible as static methods of the Form1 class.

    At runtime, "hexdll.dll" must be present in the same folder as the .NET executable ("hexdotnet.exe" here), or at least present in the Windows search path, for this technique to work.

    In the declarations shown above, note that the Cdecl calling convention is used, as opposed to the default Stdcall option. Programmers uninterested in this distinction can simply copy the declarations as shown above without giving this any thought. For those interested in detail, the author found that using Stdcall in the DLL implementation code caused MinGW to engage in some undesirable name mangling. The DLL function names ended up looking like hexdllstart@0.

    The Stdcall convention appends these suffixes to record the number of bytes of arguments each function takes ("@0" for a function with no parameters); Cdecl exports carry no such suffix. It is worth noting, too, that a more elaborate form of name mangling is an inherent requirement for the linkage of C++ class methods and overloaded functions; the library presented here thus makes no attempt to expose an OO interface.

    Calls to functions hexdllstart() and hexdllend() must bracket any use of "hexdll.dll" for drawing. These functions exist basically to call Microsoft's GdiplusStartup and GdiplusShutdown API functions, at app startup / shutdown. This design is in keeping with Microsoft's guidelines for the construction of DLLs that use GDI+.

    The actual hex-drawing code in each client app consists of call(s) to systemhex(). In this identifier, the word "system" refers not to some sort of low-level privilege, but to the system of coordinates created by a hex tessellation. Any such tessellation has a hexagon designated (0,0), at its top / left corner. Unless the tessellation is exceedingly narrow, there is a (1,0) hexagon to its right. Unless the tessellation is very short, there is a (0,1) hexagon beneath the (0,0) hexagon.

    The figure below shows an example hex tessellation with several of its constituent hexagons labeled with their (X,Y) coordinates, as defined in the system used by "hexdll.dll". In this figure, many arbitrary but necessary decisions made by the author are evident. The placement of the origin at the top left is an obvious example. More subtly, note that the entire grid is oriented such that vertical columns of hexagons can be identified (e.g., the column of hexagons with "X" coordinate 0). The grid could be rotated 90 degrees such that these formed rows instead, but this is not the orientation used here. Finally, note that hexagons at odd "X" coordinates, by convention, are located slightly lower in the "Y" dimension than those at even "X" coordinates. This is another one of these arbitrary decisions made by the author, each of which will potentially impact the implementation of any application that uses "hexdll.dll".


    Figure 3: Coordinate system used by "hexdll.dll"

    Returning to the declaration of systemhex(), the coordx and coordy parameters to this function define the coordinate of the single hexagon drawn by each call to systemhex(). This (X,Y) point defines the entire hexagon in terms of a coordinate system like the one shown in the figure above. The specifics of this coordinate system are passed in parameters origx, origy, and magn. The origx and origy parameters, taken together, define where the leftmost vertex of the hexagon (0,0) is located. These coordinates are expressed in pixels, relative to coordinate (0,0) of the surface onto which the hexagon is being drawn.

    The magn parameter defines the size of each hexagon. Each hexagon is 2.0 * magn pixels wide. Each hexagon's height is slightly less than that, at approximately 1.7321 times magn. (This is 2.0 * sin(60°) * magn.)
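
    The dimensions above, together with the column layout of Figure 3, are enough to work out where any hexagon lands on the surface. The sketch below is an illustration and not the library's code. It assumes that (origx, origy) is the leftmost vertex of hex (0,0), that hexagons share edges (so columns are 1.5 * magn apart), and that odd columns are shifted down by half a hex height. Point, hex_height, and hex_vertices are hypothetical names, and "hexdll.dll" may round to whole pixels differently.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Point { double x; double y; };

// Height of a flat-topped regular hexagon with side length magn:
// 2 * sin(60 degrees) * magn, which equals sqrt(3) * magn.
double hex_height(double magn) {
    assert(magn > 0);
    return std::sqrt(3.0) * magn;
}

// Vertices of hex (coordx, coordy) in pixels, starting at the leftmost
// vertex and going clockwise on screen (Y grows downward).
std::array<Point, 6> hex_vertices(double origx, double origy, double magn,
                                  int coordx, int coordy) {
    assert(magn > 0 && coordx >= 0 && coordy >= 0);
    const double h = hex_height(magn);
    // Assumed spacing: columns 1.5 * magn apart, odd columns half a hex lower.
    const double left = origx + 1.5 * magn * coordx;
    const double mid = origy + h * coordy + (coordx % 2 != 0 ? h / 2.0 : 0.0);
    return {{
        {left,              mid},            // leftmost
        {left + 0.5 * magn, mid - h / 2.0},  // upper left
        {left + 1.5 * magn, mid - h / 2.0},  // upper right
        {left + 2.0 * magn, mid},            // rightmost
        {left + 1.5 * magn, mid + h / 2.0},  // lower right
        {left + 0.5 * magn, mid + h / 2.0},  // lower left
    }};
}
```

    Under these assumptions, the leftmost vertex of hex (1,0) coincides with the lower-right vertex of hex (0,0), which is what keeps the grid free of gaps.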

    Two RGB color triads are passed to systemhex(): parameters r, g, and b define the interior color of the hexagon, while penr, peng, and penb define the color of its single-pixel outline. Each of these parameters can range from 0 to 255.

    Finally, the IntPtr parameter to systemhex() is a HANDLE to the DC to be drawn upon. In the .NET example client provided, this is obtained by taking the Handle property of a Panel control created for this purpose, and passing it to the Win32 GetDC() function. This function is brought into the .NET program using a DllImport declaration very similar to the three already shown, along with the corresponding cleanup function ReleaseDC():

    [DllImport("user32.dll")]
    static extern IntPtr GetDC(IntPtr hWnd);
    [DllImport("user32.dll")]
    static extern bool ReleaseDC(IntPtr hWnd, IntPtr hDC);

    In the .NET example program, the MakeHex() method of Form1 does the actual drawing. It is deliberately ill-behaved, redrawing an 80 x 80 hex coordinate system in its entirety. Because MakeHex() gets called one time for every Resize event, this presents severe performance issues unless each call to systemhex() executes with sufficient speed. The code for MakeHex() is shown in its entirety below:

    private void MakeHex()
    {
      IntPtr h = GetDC(this.panel1.Handle);
      //Not efficient. Good for testing.
      for (int row = 0; row < 80; ++row)
       for (int col = 0; col < 80; ++col)
        systemhex(h, 30, 30, 10, 255, 255, 255, row, col, 255, 0, 0, 0);
      ReleaseDC(this.panel1.Handle, h);
    }

    Above, note that each hex drawn is part of a system having its (0,0) hex at raster coordinate (30,30). This is measured in pixels from the top left corner of Panel1, which is configured to fill the entire client area of Form1. Each hex is 20 pixels wide (twice the magn parameter of 10). The hexagons are white (red=255, green=255, blue=255), with a bright red outline. A full 6400 hexagons are drawn with each call to MakeHex(): an 80 x 80 grid, at system coordinates (0,0) through (79,79). The result of this process is shown below; note that the window is not sufficiently large at this point in time to show the entire 80 x 80 grid:


    Figure 4: "Hexdotnet.exe" at startup (not anti-aliased)
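
    A rough size estimate shows why the grid does not fit in the window at startup. The sketch below assumes edge-sharing hexagons laid out in columns as in Figure 3, with odd columns shifted down by half a hex height. Extent and grid_extent are hypothetical names, and the result ignores the one-pixel outline.

```cpp
#include <cassert>
#include <cmath>

struct Extent { double width; double height; };

// Pixel extent of a cols x rows hex grid with side length magn.
Extent grid_extent(int cols, int rows, double magn) {
    assert(cols > 0 && rows > 0 && magn > 0);
    const double h = std::sqrt(3.0) * magn;  // height of one hexagon
    // Each extra column adds 1.5 * magn; the last column is a full 2 * magn.
    const double width = (1.5 * (cols - 1) + 2.0) * magn;
    // Odd columns hang half a hex below the even ones.
    const double height = h * rows + (cols > 1 ? h / 2.0 : 0.0);
    return {width, height};
}
```

    For the 80 x 80 grid with magn set to 10, this gives about 1205 by 1394 pixels, before the (30,30) origin offset is added.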

    As the code exists in the download provided, the final parameter to systemhex(), named anti, is set to 0. This disables anti-aliasing and allows for GDI (as opposed to GDI+) to be used, which is key to obtaining good Resize performance. The tradeoff is a somewhat jagged rendering, as evident in the picture above.

    If anti is set to a non-zero value, and the .NET example client is recompiled, then a significant performance lag will be perceptible when resizing Form1. In the author's test, a simple maximize operation performed immediately after app startup took about 2 seconds with anti-aliasing enabled.

    Significantly, GDI's performance advantage was present even when compared to GDI+ running without anti-aliasing enabled (i.e., with SmoothingModeHighSpeed in effect). If OVERRIDE_C_GDI is defined when "hexdll.cpp" is built, GDI+ will be used for all calls. The resultant performance lag is, again, quite perceptible, and the author provides this option only for performance testing.


    Building

    The Build Script

    The C# demonstration described in the last section can be built and executed by simply opening "hexdotnet.sln" and pressing F5. A pre-built copy of the DLL is included along with its source code.

    The DLL can be rebuilt, though, along with all of the C++ demonstration programs, using the build script "make.bat". This batch file also copies "hexdll.dll" to the requisite locations under the "hexdotnet" folder tree.

    The script "clean.bat" is also provided; it removes all traces of the build process, except for the pre-built version of the DLL included with the .NET solution. These are intended for execution from a Command Prompt window, not directly from Explorer or Task Manager. Before attempting to run "make.bat", it is necessary to include the MinGW binary path in the environment PATH, e.g.:


    Figure 5: C++ Build Steps

    A batch file that sets PATH properly is also provided in the source code archive. It is named "envvars.bat". This can be run instead of the set command shown above.

    The build script itself consists mostly of calls to g++. The commands that compile "hex3d.cpp" and "hexplane.cpp" rely on build commands that are very similar to those shown in Direct3D Programming with MinGW. The commands that build "hexdll.dll" itself rely heavily on the MinGW / GDI+ build instructions given in GDI+ Programming With MinGW, and also on some detailed instructions for DLL construction given by the developers of MinGW.

    In all of the "g++" commands in "make.bat", the -w option is used to disable warnings. In the versions of MinGW used by the author, this either had no effect, i.e., there were no warnings even without -w, or, if there were warnings, they came from Microsoft's DirectX source files.

    MinGW Version Differences

    The author used the November, 2011 release of MinGW during the final development of the code supplied, with the MinGW Developer Toolkit selected for inclusion during installation. Slightly different techniques were necessary with earlier versions of MinGW.

    GDI+ headers are now included in the distribution, and do not have to be obtained from elsewhere, for example. These headers are in a "gdiplus" subfolder, though, which must be considered in constructing one's #include directives.

    Also, it used to be possible to run the MinGW compiler without including "c:\mingw\bin" (or equivalent) in the search path. In the latest versions of MinGW, this will result in missing dependency errors when attempting to use "g++.exe".

    Some of these earlier techniques were used in GDI+ Programming With MinGW and Direct3D Programming with MinGW, and the instructions given in these articles remain valid when the compiler actually recommended in those specific articles is used.

    C++ Demonstrations

    At a high level, the steps necessary to create a client application for "hexdll.dll" are the same in both C# and C++. In both cases, the DLL itself must be present alongside the client EXE at runtime (or, at least, in its search path). Similarly, in both C# and C++ clients, a sort of function prototype or stub declaration is inserted into the client code base, to represent the DLL functions. Once these preconditions are met, the DLL functions can be called exactly like the functions (in the case of C++) or methods (in C#) implemented in the client code.

    In the C++ client code written here, these declarations are brought into the code base from a header file, "hexdll.h", using #include. This is a very typical way for C++ programs to share declarations, and to, in this way, expose interfaces to each other. The C++ declarations comprising the external interface of "hexdll.dll" are shown below. This is the core of "hexdll.h":

    void HEXDLL systemhex(HDC hdc,int origx,int origy,int magn,int r,
          int g,int b,int coordx,int coordy,int penr,int peng,int penb,BOOL anti);
    void HEXDLL hexdllstart();
    void HEXDLL hexdllend();

    These declarations are analogous to the C# declarations of systemhex(), hexdllstart(), and hexdllend(), shown earlier. The HEXDLL macro evaluates, when included in a client application, to __declspec(dllimport), a Windows-specific modifier for functions imported from a DLL. During the DLL build, HEXDLL evaluates to __declspec(dllexport); this is all managed using the preprocessor macro BUILDING_HEXDLL.

    When included by a C++ compilation unit, the declarations shown above get wrapped in an extern "C" block. This action is based on the fact that __cplusplus is defined. The extern "C" block gives the functions C linkage, so their names are not mangled even in C++ programs; the calling convention stays at the compiler's default, which is Cdecl. Finally, all of this code is bracketed by an #ifdef directive designed to keep these declarations from getting repeated due to preprocessor actions. Of course, the author of the client application needs only to #include the header file and call its functions.
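
    The pieces described above fit together in a fairly standard pattern. The following is a sketch of such a header and not a copy of "hexdll.h": the guard name HEXDLL_H and the non-Windows fallback are assumptions added so that the fragment compiles on its own, and only the two parameterless functions are declared.

```cpp
// Sketch of an import/export header in the style described above.
#ifndef HEXDLL_H            // guard name is an assumption
#define HEXDLL_H

#if defined(_WIN32)
  #ifdef BUILDING_HEXDLL    // defined only while the DLL itself is compiled
    #define HEXDLL __declspec(dllexport)
  #else
    #define HEXDLL __declspec(dllimport)
  #endif
#else
  #define HEXDLL            // assumption: no decoration needed off Windows
#endif

#ifdef __cplusplus
extern "C" {                // C linkage: exported names are not mangled
#endif

void HEXDLL hexdllstart();
void HEXDLL hexdllend();
// systemhex() is declared the same way; it is left out here only because
// its HDC and BOOL parameter types require <windows.h>.

#ifdef __cplusplus
}
#endif

#endif // HEXDLL_H
```

    A client includes the header without defining BUILDING_HEXDLL and so sees the import form; the DLL's own build defines the macro and sees the export form.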

    In neither (C++ / .NET) case does the client application code need to make any direct reference to the GDI+ libraries. Rather, they are included indirectly, as a part of "hexdll.dll".

    Spinning Cube Demo

    Three C++ example programs are provided with the article. First, "hex3d.exe" is a variation on the spinning cube demo shown in Direct3D Programming with MinGW. This is the application shown earlier in Figure 2. It is built from a single source code file, "hex3d.cpp". In this program, a static texture is not used for the cube surfaces. Instead, with each iteration of the rendering loop, a DC is obtained for the texture's main appearance surface, and is drawn on using systemhex(). Random shades of red are used for each hexagon, resulting in an appealing frame-by-frame flicker effect. The application exits after a certain number of frames have been rendered (by default, a thousand frames). This allows for easy determination of frame rate, by timing the demo's execution time.

    The code to get a DC from a texture surface is a new addition to "hex3d.cpp", compared to its spinning cube predecessor. This task is performed by the function do2dwork, shown below this paragraph. This function is called with each iteration of the main loop, prior to the call to render().

    void do2dwork()
    {
     IDirect3DSurface9* surface=NULL;
     hexgridtexture->GetSurfaceLevel(0, &surface);
     HDC hdc;
     surface->GetDC(&hdc);
     for(int hexcx=0;hexcx<TESSEL_ROWS;++hexcx)
      for(int hexcy=0;hexcy<TESSEL_ROWS;++hexcy) 
      {
       //Slight flicker to red in hexagons
       //Red values range from MIN_HEX_RED to 255
       int red=(rand()%(256-MIN_HEX_RED))+MIN_HEX_RED;     
       //systemhex() is called here to draw hex (hexcx,hexcy) on hdc in this shade
      }
     surface->ReleaseDC(hdc);
     surface->Release();
    }

    The first four lines of code in the function body above serve to get the necessary DC handle for systemhex(). The loop immediately after that is very similar in its implementation to the C# loop from MakeHex(). The color randomization code in the loop body is new, but straightforward. As is typical of C++ compared to C#, the final two statements above clean up resources.
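
    The range claimed in the comment inside the loop can be checked in isolation. The sketch below uses the same arithmetic, with the random draw passed in as a parameter so that the result is repeatable. The value 128 for MIN_HEX_RED is a placeholder, since the real value is not given in this article, and flicker_red is a hypothetical name.

```cpp
#include <cassert>

// Placeholder value; the demo's real MIN_HEX_RED is not stated in the text.
const int MIN_HEX_RED = 128;

// Same arithmetic as in do2dwork(): maps any non-negative draw (such as the
// result of rand()) onto the range MIN_HEX_RED..255 inclusive.
int flicker_red(int raw) {
    assert(raw >= 0);
    return (raw % (256 - MIN_HEX_RED)) + MIN_HEX_RED;
}
```

    The modulus yields 256 - MIN_HEX_RED distinct values starting at zero, so adding MIN_HEX_RED places the largest of them at exactly 255.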

    Like "hexdotnet.exe", "hex3d.exe" expects "hexdll.dll" to be present in the current search path at runtime. In addition, it requires the file "hex3d.png" to be present. This contains appearance information for the static texture applied to the top and bottom of the demo solid.


    The second demonstration program, "hexplane.exe", creates an illusion of flight in 3D space, above a flat terrain covered by a hex tessellation. The program is built from a single source code file, "hexplane.cpp". It is shown in action in Figure 1, near the top of the article. In this demo, flight takes place in the positive "Z" direction (forward), with rotation about the "Z" axis occurring throughout the flight. Movement continues at an accelerating (but limited) rate until shortly after the terrain below passes out of view. At that point, the demo restarts. The sky is simulated by applying a horizon image to a rectangular solid off in the distance. Like the spinning cube demo, "hexplane.exe" exits after a set number of frames have been rendered.

    In many ways, this demo is a simplification of the spinning cube demo. Only two rectangular faces must be drawn, versus six in the spinning cube demo. The declaration of the eight vertices required to draw these two rectangular faces is shown below:

    // These are our vertex declarations, for both of the rectangular faces
    //  being drawn.
    MYVERTEXTYPE demo_vertices[] =
    {
     { -SKY_SIZE,  SKY_SIZE, SKY_DISTANCE, 0, 0, -1, 0, 0 },         // Sky face  
     {  SKY_SIZE,  SKY_SIZE, SKY_DISTANCE, 0, 0, -1, 1, 0 },
     { -SKY_SIZE, -SKY_SIZE, SKY_DISTANCE, 0, 0, -1, 0, 1 },
     {  SKY_SIZE, -SKY_SIZE, SKY_DISTANCE, 0, 0, -1, 1, 1 },
     { -GROUND_SIZE, -GROUND_DEPTH,  GROUND_SIZE, 0, 1, 0, 0, 0 },    // Ground face
     {  GROUND_SIZE, -GROUND_DEPTH,  GROUND_SIZE, 0, 1, 0, 1, 0 },
     { -GROUND_SIZE, -GROUND_DEPTH, -GROUND_SIZE, 0, 1, 0, 0, 1 },
     {  GROUND_SIZE, -GROUND_DEPTH, -GROUND_SIZE, 0, 1, 0, 1, 1 },
    };

    This declaration consists of eight distinct vertex structures, each occupying its own line in the code. Each of these begins with "X", "Y", and "Z" coordinates. These coordinates are defined using preprocessor constants that hint at their purposes. More details about the actual design of 3D solids are available in Direct3D Programming with MinGW; the ground face is roughly analogous to the top face of the original spinning cube, and the sky face is analogous to its front.

    The remainder of the initializers are explained by the layout of MYVERTEXTYPE, the custom vertex struct used by both of the Direct3D demo programs presented here.


    Note that immediately after the coordinates comes the normal vector, followed by 2D point (U,V). The normal vector extends outward into space from the solid, and is perpendicular to the face; this is necessary for lighting purposes. For the ground face, the normal vectors are <0,1,0>, i.e., a vector sticking straight up in the "Y" dimension. For the sky face, the normal vectors point at the user, i.e., in the negative "Z" direction. They thus have a value of <0,0,-1>.
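
    A struct consistent with the eight-value initializers and with the description above can be sketched as follows, along with a check that the stated normals really are perpendicular to each face. SketchVertex, Vec3, and face_normal are hypothetical names, the member names are assumptions, and the sizes used when trying it out are placeholders for the demo's preprocessor constants.

```cpp
#include <cassert>
#include <cmath>

// Assumed layout: position, then normal, then texture coordinate (8 floats).
struct SketchVertex {
    float x, y, z;     // position
    float nx, ny, nz;  // normal vector
    float u, v;        // texture coordinate
};

struct Vec3 { float x, y, z; };

// Unit normal of the triangle (a, b, c): normalize((b - a) x (c - a)).
Vec3 face_normal(const SketchVertex& a, const SketchVertex& b,
                 const SketchVertex& c) {
    const Vec3 e1 = {b.x - a.x, b.y - a.y, b.z - a.z};
    const Vec3 e2 = {c.x - a.x, c.y - a.y, c.z - a.z};
    const Vec3 n = {e1.y * e2.z - e1.z * e2.y,
                    e1.z * e2.x - e1.x * e2.z,
                    e1.x * e2.y - e1.y * e2.x};
    const float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    assert(len > 0.0f);  // the three points must not be collinear
    return {n.x / len, n.y / len, n.z / len};
}
```

    Feeding the first three vertices of the ground face into face_normal yields <0,1,0>, and the first three vertices of the sky face yield <0,0,-1>, matching the normals written into the initializers.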

    Point (U,V) maps the vertex to a point on the 2D surface of whatever texture is applied to it. The 2D texture coordinate system has "U" increasing from left to right, and "V" increasing from top to bottom. Because both rectangular faces are defined as triangle strips, a criss-cross pattern is evident in (U,V), as well as in the "X", "Y", and "Z" coordinates themselves; the vertices do not go around the rectangle from vertex 0, to 1, to 2, to 3; rather, they cross over the rectangle in diagonal fashion between vertex 1 and vertex 2. This is consistent with Direct3D's general expectation that solids be composed of triangular facets.
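    To make the criss-cross ordering concrete, here is a small sketch of how a strip of vertices can be read as a sequence of triangles. The function name strip_to_triangles is made up for illustration; it only lists index triples and does not call Direct3D.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Expands a triangle strip of n vertices into its n-2 triangles.
// Triangle i uses strip vertices i, i+1 and i+2; for odd i the first two
// indices are swapped so that every triangle keeps the same winding.
inline std::vector<std::array<std::size_t, 3>> strip_to_triangles(std::size_t n) {
    assert(n >= 3);
    std::vector<std::array<std::size_t, 3>> tris;
    for (std::size_t i = 0; i + 2 < n; ++i) {
        if (i % 2 == 0) {
            tris.push_back({i, i + 1, i + 2});
        } else {
            tris.push_back({i + 1, i, i + 2});
        }
    }
    return tris;
}
```

    For a four-vertex face this yields the triangles (0,1,2) and (2,1,3). Both share vertices 1 and 2, which is the diagonal that the strip crosses.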

    Both textures used have a static appearance. As a result, anti is set to 1; because the hex tessellation is drawn just once, there is no real performance penalty associated with this improvement. There is still a function do2dwork(), as was seen in "hex3d.cpp", but it is called only once, before the first frame is rendered, to set up the static texture appearance. The code for this function is shown below:

    void do2dwork()
     IDirect3DSurface9* surface=NULL;
     HDC hdc;
     planetexture->GetSurfaceLevel(0, &surface);
     for(int hexcx=0;hexcx<TESSEL_ROWS;++hexcx)
      for(int hexcy=0;hexcy<TESSEL_ROWS;++hexcy)
        case 0:
         //255 means full color for R, G or B
        case 1:
        case 2:
        case 3:  

    As in "hex3d.cpp", the function begins by obtaining a handle to a DC for the surface's appearance. Again, a tessellation of fixed size is drawn. Here, the randomization component is different; either a red, green, or blue hex can be drawn, or no hex at all can be drawn for a given system coordinate. This allows a default appearance, dictated by file "hexplane.png", to show through. This default appearance is loaded from "hexplane.png" earlier in the startup sequence using a call to D3DXLoadSurfaceFromFile. Preprocessor constants TESSEL_ORIG_X, TESSEL_ORIG_Y, and TESSEL_MAGNITUDE define the coordinate system used for the hex terrain; these were tuned to yield hexagons of an acceptable size, and to achieve full coverage of the ground surface. In particular, slightly negative values are used for TESSEL_ORIG_X and TESSEL_ORIG_Y, to avoid leaving unfilled space around the top and left edges of the tessellation.


    This demo creates a high-resolution, 8.5" x 11.0" bitmap file. The executable shows a modal message box with the message "DONE!" after it has finished creating the output bitmap file. The bitmap is completely covered by a black and white hex tessellation, drawn with anti-aliasing enabled. If printed, the result could be useful for a board or pen-and-paper game built around a hexagonal coordinate system. This program is built from source code file "hex2d.cpp".

    Unlike the other two C++ demos, DirectX is not used here. Rather, only GDI and Win32 API calls are used, in conjunction with calls into "hexdll.dll", to achieve the desired result. Specifically, a starting BITMAP is created. A DC is then obtained, for drawing onto this BITMAP. This DC is passed to systemhex() repeatedly, in a nested for loop, to draw the hex tessellation. Finally, the resultant appearance data after drawing must be written from memory out to a properly formatted bitmap file. This last step in particular requires a significant amount of new low-level code compared to the two 3D C++ demos.

    The series of steps outlined in the last paragraph is mostly executed directly from main(). After declaring some local variables, main() begins as shown below:

    //Delete "out.bmp"
    synchexec("cmd.exe","/c del out.bmp");
    //Make blank "temp.bmp"
    synchexec("cmd.exe","/c copy blank.bmp temp.bmp");
    //Modify TEMP.BMP...
    hbm = (HBITMAP) LoadImage(NULL, "temp.bmp", IMAGE_BITMAP, 0, 0,
    if(hbm==NULL) //Error
        MessageBox(0,"BITMAP ERROR","Hex2D.exe",
        return 1;

    The two calls to synchexec() (a wrapper for ShellExecuteEx()) serve to delete "out.bmp", which is the program output file, and then to create working bitmap file "temp.bmp". Note that this working file is a copy of "blank.bmp", which is a plain white bitmap having 16-bit color depth (like the output bitmap). In the application code as provided, this is just a starting point, which is completely overwritten using systemhex calls.

    The main() function continues as shown below:

    static BITMAP bm;
      (HGDIOBJ)hbm,     // handle to graphics object of interest
      sizeof(BITMAP),   // size of buffer for object information
      (LPVOID)&bm   // pointer to buffer for object information

    This code snippet takes the HBITMAP value hbm, which is a pointer-like identifier for a structure held by Windows, and converts it into a BITMAP object proper, present in the static storage of "hex2d.cpp". Getting the actual BITMAP structure (vs. an HBITMAP) is useful as a way to access some properties like width and height using the "dot" operator. Variable bigbuff, which is declared with a static size equal to the known memory requirements of the high-resolution bitmap, holds the local copy of the BITMAP appearance information.

    Next, main() continues with the code shown below:


    The series of calls shown above first creates a new and independent DC, as opposed to one obtained for a control or window. The DC created is compatible with the desktop (HWND zero), since there is no app main window DC to pass instead. Then, the code associates this DC, and the drawing about to happen, with hbm. Now, with this relationship established, the actual hexagon drawing can take place, with the newly created DC passed as the first parameter to systemhex():

    for(int ccx=0;ccx<HEX_GRID_COLS;++ccx)
      for(int ccy=0;ccy<HEX_GRID_ROWS ;++ccy)
       systemhex( hdc,
        1 );

    This code fragment is very reminiscent of the earlier demos. Note that the last parameter is 1, indicating that anti-aliasing is enabled. All of the other parameters are constants which, as before, were tweaked by the author, based on observation, to yield complete coverage of the target surface.

    The remainder of main() writes out the image identified by hbm to a ".bmp" file. This is a somewhat tedious process, which is already well-summarized elsewhere online. One noteworthy addition made for this application is that DPI is explicitly set to 192, using the bit of code presented below. Note that the actual setting involves the somewhat more obscure terminology "pels per meter". Application constant IMG_PELS_PER_METER contains the correct value of 7,560 pels per meter:

    lpbi->bmiHeader.biYPelsPerMeter = IMG_PELS_PER_METER;
    lpbi->bmiHeader.biXPelsPerMeter = IMG_PELS_PER_METER;

    Several online sources simply set these values to 0. The author wished for a high-resolution, printable image of the correct 8.5" x 11.0" size, though, so setting DPI (or "pels per meter") correctly was deemed necessary.
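    The relationship between DPI and "pels per meter" is a plain unit conversion, sketched below. The helper names are hypothetical and are not taken from "hex2d.cpp". At 192 DPI the conversion gives roughly 7,559 pels per meter, within one unit of the 7,560 used by the application; the article does not say how its constant was rounded.

```cpp
#include <cassert>
#include <cmath>

// One inch is exactly 0.0254 meters.
constexpr double kMetersPerInch = 0.0254;

// Converts dots (pixels) per inch to pixels per meter, as a real number.
inline double dpi_to_pels_per_meter(double dpi) {
    assert(dpi > 0.0);
    return dpi / kMetersPerInch;
}

// Number of pixels needed to cover a physical length at a given DPI.
inline long pixels_for_inches(double inches, double dpi) {
    assert(inches >= 0.0 && dpi > 0.0);
    return std::lround(inches * dpi);
}
```

    Under these assumptions, an 8.5" x 11.0" page at 192 DPI corresponds to pixels_for_inches(8.5, 192) by pixels_for_inches(11.0, 192), i.e. 1632 by 2112 pixels.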

    Library Implementation

    Many of the calculations required to draw hexagons will involve real numbers. In order to maximize the accuracy of these computations, and to minimize the number of typecast operations necessary, systemhex begins by converting all of its pixel parameters into doubles, and passing them to an inner implementation function:

    void systemhex(HDC hdc,int origx,int origy,int magn,int r,
         int g,int b,int coordx,int coordy,int pr,int pg,int pb,BOOL anti)
          g, b,coordx,coordy,pr,pg, pb,anti);

    This inner function translates coordx and coordy (hex system coordinates) into actual screen coordinates. In doing so, it largely just enforces the arbitrary decisions made by the author in designing the coordinate system. Its if, for example, ensures that hexagons at odd "X" coordinates are located slightly lower in the "Y" dimension than those at even "X" coordinates, as is the stated convention of "hexdll.dll":

    void innerhex(HDC hdc,double origx,double origy,double magn,int r,
         int g,int b,int x,int y,int pr,int pg,int pb,BOOL anti)
     //Odd X translates drawing up and left a bit 
      abstracthex( hdc, 
       magn, r, g, b,pr,pg,pb,anti); 
      abstracthex( hdc, 
       magn, r, g, b,pr,pg,pb,anti); 

    As shown above, the bottom-level function responsible for drawing hexagons in the internal implementation of "hexdll.dll" is another function called abstracthex(). This lower level function operates in terms of system coordinates (as opposed to the hexagon coordinates shown in Figure 3). The prototype of abstracthex() is shown below:

    void abstracthex(HDC hdc,double origx,double origy,double magn,
         int r,int g,int b,int pr,int pg,int pb,BOOL anti)

    Note that in performing this final translation into raster coordinates, the geometry of the hexagon must be considered in depth. Figure 6, below, is a useful aid to understanding this geometry:


    Figure 6: Hexagon Geometry

    The diagram above gives all of the dimensions necessary to implement abstracthex(). The leftmost vertex of the hexagon is, by definition, located at (x,y). This vertex is the first vertex drawn by the abstracthex() function. From there, drawing moves down and right to the next vertex. As shown in Figure 6, the 60° angle is key to these calculations. We can view any side of the hexagon as the hypotenuse of a 30-60-90 triangle. The triangle constructed using dotted lines in Figure 6 is an example of one of these 30-60-90 triangles. The other sides of such a triangle measure cos(60°) times the hypotenuse length (for the shorter side) and sin(60°) times the hypotenuse length (for the longer side). Here, the hypotenuse has length magn, and the two sides other than the hypotenuse therefore have lengths of cos(60°)*magn and sin(60°)*magn. The actual measurement shown in Figure 6 is negative, since positive "Y" movement in the Direct3D texture coordinate system is down.

    As shown in the picture above, the shorter of these two triangle sides approximates the movement from the first vertex drawn to the second in the "X" dimension. Similarly, the longer of these two sides approximates the movement from the first vertex drawn to the second in the "Y" dimension. As we move from the first vertex drawn at (x,y) to the next vertex, we therefore move cos(60°)*magn pixels in the "X" dimension and sin(60°)*magn in the "Y" dimension. The coordinate of this second vertex is thus (x+cos(60°)*magn, y+sin(60°)*magn).

    The next vertex drawn is the one directly to the right of the vertex just drawn. Because the length of the side between these two is magn, the third coordinate is located at (x+cos(60°)*magn+magn, y+sin(60°)*magn).

    Instead of passing these coordinate expressions to GDI/GDI+ as shown above, though, the code provided uses a system of running totals, in an effort to minimize repeated calculations. Near the top of abstracthex(), the following initializations are present:

    double cham=COS_HEX_ANGLE*magn;
    double sham=SIN_HEX_ANGLE*magn;
    double opx=(x+cham);   //Second vertex's "X" location
    double opy=(y+sham);   //Hex bottom "Y" location
    double opm=(opx+magn); //Third vertex's "X" location
    double oms=(y-sham);   //Hex top "Y" location

    After the execution of this code, the first vertex drawn will have coordinates (x,y). The second will be located at (opx,opy), and the third at (opm,opy). The fourth vertex drawn, at the extreme right side of the hexagon, is just a bit further to the right, at (opm+cham,y). The drawing of the fifth vertex moves back toward the left and up, to (opm,oms). Finally, we move back magn pixels to the left, and draw the sixth vertex at (opx,oms).
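    The running-total scheme can be tried out in isolation with the sketch below. It reuses the variable names from the listing above, but the Vertex struct and the hex_vertices function are hypothetical, and std::cos/std::sin stand in for the library's COS_HEX_ANGLE and SIN_HEX_ANGLE constants.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical helper type: one 2D point in raster coordinates.
struct Vertex { double x, y; };

// Returns the six hexagon vertices in drawing order, starting at the
// leftmost vertex (x, y), for a side length of magn. "Y" grows downward.
inline std::array<Vertex, 6> hex_vertices(double x, double y, double magn) {
    assert(magn > 0.0);
    const double pi = std::acos(-1.0);
    const double cham = std::cos(pi / 3.0) * magn;  // cos(60 deg) * magn
    const double sham = std::sin(pi / 3.0) * magn;  // sin(60 deg) * magn
    const double opx = x + cham;    // second vertex's "X" location
    const double opy = y + sham;    // hex bottom "Y" location
    const double opm = opx + magn;  // third vertex's "X" location
    const double oms = y - sham;    // hex top "Y" location
    return {{ {x, y}, {opx, opy}, {opm, opy},
              {opm + cham, y}, {opm, oms}, {opx, oms} }};
}
```

    With magn equal to 2 and the leftmost vertex at the origin, the rightmost vertex lands at "X" = 4, i.e. the hexagon is twice as wide as one of its sides.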

    Depending on whether or not anti is true, either GDI or GDI+ will be used for the actual drawing operations. In either case, a data structure holding all of the vertex coordinates, in drawing order, is first constructed. For GDI, this is an array of POINT structures, whose construction is shown below:

    POINT hex1[6];
    //Start hex at origin... leftmost point of hex
    //Move [ cos(theta) , sin(theta) ] units in positive (down/right) direction
    //Move ((0.5) * hexwidth) more units right, to make "bottom" of hex
    //Move to vertex opposite origin... Y is same as origin
    //Move to right corner of hex "top"
    //Complete the "top" side of the hex

    Note that the addition of 0.5 to each term serves to achieve proper rounding; otherwise, the decimal portion of each floating point value (x, y, opx, etc.) would simply be abandoned.
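    The effect of adding 0.5 before the conversion to an integer can be seen in the short sketch below. The function name to_pixel is hypothetical; the library performs the equivalent step inline when it fills the POINT array.

```cpp
#include <cassert>
#include <climits>

// Rounds a coordinate to the nearest integer by adding 0.5 and letting the
// cast truncate toward zero. This is exact for non-negative inputs; values
// below -0.5 end up one unit too high, because truncation moves toward zero.
inline long to_pixel(double value) {
    // Guard against values that would not fit in a long after the cast.
    assert(value > static_cast<double>(LONG_MIN) &&
           value < static_cast<double>(LONG_MAX));
    return static_cast<long>(value + 0.5);
}
```

    For example, a computed coordinate of 2.6 becomes 3 with this helper, whereas a bare cast would yield 2.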

    If GDI+ is used, an array of PointF structures is built instead. These structures use floating point coordinates, and no rounding or typecasting is necessary. Their declaration is shown below:

    PointF myPointFArray[] =
    {
      //Start hex at origin... leftmost point of hex
      PointF(x, y),
      //Move [ cos(theta) , sin(theta) ] units in positive (down/right) direction
      PointF((opx), (opy)),
      //Move ((0.5) * hexwidth) more units right, to make "bottom" of hex
      PointF((opm), (opy)),
      //Move to vertex opposite origin... Y is same as origin
      PointF(opm+cham, y),
      //Move to right corner of hex "top"
      PointF((opm), (oms)),
      //Complete the "top" side of the hex
      PointF((opx), (oms))
    };

    If GDI is in use, the vertex data structure gets passed to a function named Polygon. The SelectObject API is first used to select a pen with the requested outline color, and then to select a brush with the requested interior color. This series of actions results in a polygon with the requested outline and interior colors.

    Under GDI+, two calls are necessary to achieve the same result, one to DrawPolygon() and one to FillPolygon(). It is once again necessary to create both a pen and a brush, with the first of these getting passed to DrawPolygon() and the second to FillPolygon(). It should be noted that the necessity of two distinct function calls here plays some role in the relatively slow performance obtained using GDI+. However, the author made a point of running tests with a single call to FillPolygon() only, and GDI+ was still much slower than GDI.


    The work presented here led the author to several conclusions about recent versions of Windows, its APIs, and its rendering architecture. GDI, of course, is much more efficient than GDI+. This should be no surprise, given that much of GDI was originally written to work well on the relatively primitive hardware that ran the earliest versions of Windows.

    GDI+ is useful primarily because of its anti-aliasing capability. It also offers a cleaner interface than GDI, e.g. in the area of memory management. This OO interface comes at a significant cost, though. Comparable operations are much slower in GDI+ than in GDI, even with anti-aliasing disabled.

    While both imperfect, GDI and GDI+ do seem to complement each other well. In the demonstration programs provided, GDI+ works well for generating a high-quality printable image, and this, fortuitously, is not a task that needs to happen with incredible quickness anyway. GDI, on the other hand, provides the high level of speed and efficiency necessary for the dynamic texturing demo ("hex3d.exe"), and in this arena its lack of anti-aliasing will usually go unnoticed. The texture will be moving quickly at runtime, and will also get passed through the Direct3D interpolation filters necessary to scale the texture for each frame. Whatever jagged edges GDI might generate compared to GDI+ are quite likely lost in the translation and animation process.

    Finally, some conclusions about combining Direct3D and GDI in the latest versions of Windows were reached by the author in preparing this work. While the changes in GUI rendering that came with Windows Vista were significant, nothing in them seems to rule out the possibility of using GDI to draw on Direct3D surfaces with a level of efficiency that is at least reasonably good. The process of obtaining the necessary DC remains quick and intuitive, and the GDI operations themselves seem mostly to be fast enough to keep up with Direct3D.


    1. MinGW did not support later versions of DirectX when the article was written. At least, DirectX 9 was the newest version for which headers were present in "c:\mingw\include\", and web searches yielded no obvious way to incorporate later versions. Microsoft's "August 2007" version of the DirectX SDK should therefore be installed in order to build the 3D demonstration programs. Detailed instructions for obtaining and installing the SDK are given in Direct3D Programming with MinGW.
    2. At present, only 32-bit client applications are supported. To support 64-bit clients would require the code for "hexdll.dll" to be rebuilt in a 64-bit development environment. While there is no reason to suspect that this would not work, 64-bit compilation has not been tested.


    This is the second major version of this article. Compared to the first version, some improvements in formatting and clarity have been made. The code and binary files have not changed. 


    This article, along with any associated source code and files, is licensed under The GNU General Public License (GPLv3)



    Endogine sprite engine





    By | 17 Jul 2006 | Article
    Sprite engine for D3D and GDI+ (with several game examples).


    Some of the examples are included in the source.


    Endogine is a sprite and game engine, originally written in Macromedia Director to overcome the many limitations of its default sprite engine. I started working in my spare time on a C# version of the project about a year ago, and now it has far more features than the original - in fact, I've redesigned the architecture many times so it has little in common with the Director++ paradigm I tried to achieve. Moving away from those patterns has made the project much better. However, there are still a lot of things I want to implement before I can even call it a beta.

    Some of the features are:

    • Easy media management.
    • Sprite hierarchy (parent/child relations, where children's Rotation, LOC etc. are inherited from the parent).
    • Behaviors.
    • Collision detection.
    • Plugin-based rendering (Direct3D, GDI+, Irrlicht is next).
    • Custom raster operations.
    • Procedural textures (Perlin/Wood/Marble/Plasma/others).
    • Particle systems.
    • Flash, Photoshop, and Director import (not scripts). NB: Only prototype functionality.
    • Mouse events by sprite (enter/leave/up/down etc. events).
    • Widgets (button, frame, window, scrollbar etc.). All are sprite-based, so blending/scaling/rotating works on widget elements as well.
    • Animator object (can animate almost any property of a sprite).
    • Interpolators (for generating smooth animations, color gradients etc.).
    • Sprite texts (each character is a sprite which can be animated, supports custom multicolor bitmap fonts and kerning).
    • Example game prototypes (Puzzle Bobble, Parallax Asteroids, Snooker/Minigolf, Cave Hunter).
    • IDE with scene graph, sprite/behavior editing, resource management, and debugging tools.
    • Simple scripting language, FlowScript, for animation and sound control.
    • Plug-in sound system (currently BASS or DirectSound).
    • New - Color editor toolset: Gradient, Painter-style HSB picker, Color swatches (supports .aco, .act, Painter.txt).
    • New - Classes for RGB, HSB, HSL, HWB, Lab, XYZ color spaces (with plug-in functionality). Picker that handles any 3-dimensional color space.


    Some of the current GUI tools (editors, managers etc.).


    I had been developing games professionally in Macromedia Director for 10 years, and was very disappointed with the development of the product over the last 5 years. To make up for this, I wrote several graphical sub-systems, some very project-specific, but finally I designed one that fulfilled the more generic criteria I had for a 2D game creation graphics API. It was developed in Director's scripting language Lingo from autumn 2004 to spring 2005, and since then it has been a C# project.

    It's a prototype

    The current engine design is not carved in stone, and I have already made several major changes during its development, and even more are planned.

    Optimizations will have to wait until all functionality is implemented. The GDI+ mode is extremely slow, because I haven't ported my dirty rect system yet. The D3D full-screen mode has the best performance.

    The code is poorly commented at this stage, as it is still possible I'll rewrite many parts. If there is a demand for documentation, I will create it as questions appear. For now, you can get a feel for how to use it, by investigating the Tests project.

    Example projects

    There are two solutions in the download. One is the actual engine, including a project called Tests which contains most of the examples and code. I chose to include it in the solution since it's a little bit easier to develop/debug the engine if the test is part of it, but that's not the way your own projects should be set up. The MusicGame project is closer to how it should be done.

    There's also a simple tutorial text on how to set up your own project.

    I wanted to have a simple, but real-life testbed, so I'm creating a few game prototypes. Currently, they are Puzzle Bobble, a scrolling asteroid game, a golf/snooker game, CaveHunter, and Space Invaders. Other non-game tests are also available in the project. Turn them on and off by bringing the main window into focus and selecting items from the "Engine tests" menu.

    Example walkthrough

    Note: For using dialogs / editors, the Endogine.Editors.dll has to be present in the .exe folder. For sound, include Endogine.Audio.Bass.dll and the files in (shareware license).

    To try out some of the examples in the Tests project, run the Tests solution and follow these steps:

    • Start in 3D/MDI mode (default).
    • Set focus on the Main window, so the Engine Tests menu appears in the MDI parent.
    • Select Particle System from the menu. The green "dialog" controls a few aspects of the particle system. The top slider is numParticles, the bottom the size. The buttons switch between color and size schemes. After playing around with it, turn it off by selecting Particle System again from the menu.
    • Select GDI+ random procedural, and also Font from the menu. This demonstrates a few ways to create procedural textures and manipulate bitmaps, as well as the support for bitmap fonts. Each letter sprite also has a behavior that makes it swing. Note that both are extremely slow - they're using my old systems. I'll upgrade them soon which will make them a hundred times faster.
    • Go to the SceneGraphViewer, and expand the nodes until you get to the Label sprite. Right-click it and select the Loc/Scale/Rot control. Try the different modes of changing the properties. Notice the mouse wrap-around feature.
    • Close the Loc/Scale/Rot control, go back to the SceneGraphViewer, expand the Label node. Right-click one of the letter sprites and select Properties (the LocY and Rotation properties are under the program control so they are hard to change here).
    • Click the Behaviors button to see which behaviors the sprite has. Mark ThisMovie.BhSwing, and click Remove to make it stop swinging. (Add... and Properties... aren't implemented yet).
    • Stop the program, and start it again, but now deselect MDI mode (because of .NET's keyboard problems in MDI mode).
    • Set focus to the Main window and select Puzzle Bobble from the menu. Player 1 uses the arrow keys to steer (Down to shoot, Up to center), player 2 uses AWSD. Select it again to turn it off. Note: due to changes in the ERectangle(F) classes, there's currently a bug which makes the ball stick in the wrong places. Will look into that later. (May be fixed.)
    • Select Snooker from the menu and click-drag on the white ball to shoot it. Open the Property Inspector for the topology sprite object, change Depth and Function to see how a new image is generated, use the Loc/Scale/Rot control to drag it around, and see how the balls react (buggy, sometimes!).
    • Select Parallax Scroll from the menu and try the Asteroids-like game. Arrow keys to steer, Space to shoot. Note: the Camera control's LOC won't work now, because the camera is managed by the player, centering the camera over the ship for each frame update. That's how the parallax works - just move the camera, and the parallax layers will automatically create the depth effect.
    • (Broken in the Feb '06 version) Go to the main menu, Edit, check Onscreen sprite edit, and right-click on a sprite on the screen, and you'll get a menu with the sprites under the mouse LOC. Select one of the sprites, and it should appear as selected in the SceneGraph. It doesn't update properly now, so you'll have to click the SceneGraph's toolbar to see the selection.

    Using the code

    Prerequisites: .NET 2.0 and DirectX 9.0c with Managed Extensions (Feb 06) for the demo executable, and DirectX SDK Feb 06 for compiling the source. You can download SharpDevelop 2.0 or Microsoft's Visual Studio Express C# for free, if you need a C# developer IDE. Read the README.txt included in the source for further instructions.

    Note that Managed DirectX versions are neither backwards nor forwards compatible. I think later versions will work if you recompile the project, but the demo executable needs MDX Feb 06.

    I'm currently redesigning the workflow, and you would need a fairly long list of instructions in order to set up a new solution which uses Endogine, which I haven't written. The easiest way to get started is by looking at the Tests project. You can also have a look at the MusicGame solution, which is more like how a real project would be organized.

    Most of the terminology is borrowed from Director. Some examples of sprite creation:

    Sprite sp1 = new Sprite();
    //Loads the first bitmap file named 
    //Ball.* from the default directory
    sp1.MemberName = "Ball";
    sp1.Loc = new Point(30,30); //moves the sprite
    Sprite sp2 = new Sprite();
    //If it's an animated gif, 
    //the sprite will automatically animate
    sp2.MemberName = "GifAnim";
    sp2.Animator.StepSize = 0.1; //set the animation speed
    sp2.Rect = new RectangleF(50,50,200,200); //stretches and moves the sprite
    Sprite sp2Child = new Sprite();
    //same texture/bitmap as sp1's will be used 
    //- no duplicate memory areas
    sp2Child.MemberName = "Ball";
    //now, sp2Child will follow sp2's location, 
    //rotation, and stretching
    sp2Child.Parent = sp2;

    Road map

    I'll be using the engine for some commercial projects this year. This means I'll concentrate on the features that are necessary for those games, and that I probably won't work much on polishing the "common" feature set.

    There will be a number of updates during the year, but I've revised my 1.0 ETA to late autumn '06. I expect the projects to put an evolutionary pressure on the engine, forcing refactoring and new usage patterns, but resulting in a much better architecture. Another side effect is documentation; I'll have to write at least a few tutorials for the other team members.

    Currently, I put most of my spare time into PaintLab, an image editor, which is based on OpenBlackBox, an open source modular signal processing framework. They both use Endogine as their graphics core, and many GUI elements are Endogine editors, so it's common that I need to improve/add stuff in Endogine while working on them.

    Goals for next update (unchanged)

    I've started using Subversion for source control, which will make it easier to assemble new versions for posting, so updates with new tutorials and bug-fixes should appear more often. Some probable tasks for the next few months:

    • Switch between IDE/MDI mode and "clean" mode in runtime.
    • Clean up terminology, duplicate systems, continue transition to .NET 2.0.
    • Look into supporting XAML, harmonize with its terminology and patterns.
    • Fix IDE GUI, work on some more editors.


    Update 2006-07-06

    Again, other projects have taken most of my time - currently it's the OpenBlackBox/Endogine-based PaintLab paint program. Side effects for Endogine:

    • Color management tools: editors, color space converters, color palette file filters etc.
    • Refactoring: WinForms elements are being moved out of the main project into a separate DLL. Simplifies porting to mono or WPF, and makes the core DLL 50% smaller.
    • Improved Canvas, Vector3, Vector4, and Matrix4 classes.
    • Improved PSD parser.
    • More collision/intersection detection algorithms.
    • Several new or updated user controls.

    Update 2006-03-15

    I've focused on releasing the first version of OpenBlackBox, so most of the modifications have been made in order to provide interop functionality (OBB is highly dependent on Endogine).

    • Optimizations (can be many times faster when lots of vertices are involved).
    • Continued transition to .NET 2.0.
    • Requires DirectX SDK Feb '06.
    • Simple support for HLSL shaders and RenderToTexture.
    • Better proxies for pixel manipulation - Canvas and PixelDataProvider classes instead of the old PixelManipulator.
    • Extended interfaces to the rendering parts of Endogine. Projects (such as OpenBlackBox) can take advantage of the abstracted rendering API without necessarily firing up the whole Endogine sprite system.
    • Forgot to mention it last time, but there's a small tutorial included since the last update.

    That's pretty much it. But don't miss OpenBlackBox, in time it will become a very useful component for developing applications with Endogine!

    Update 2006-02-01

    Some major architectural changes in this version, especially in the textures/bitmap/animation system. The transition isn't complete yet, so several usage patterns can co-exist and cause some confusion. Some utilities will be needed to make the animation system easier to use.

    • Moved to .NET 2.0. Note: refactoring is still in progress (e.g., generics isn't used everywhere yet).
    • The Isometric example has been removed. I've continued to work on it as a separate project, which I'll make available later if possible.
    • Renderers are now plug-ins, allowing for easier development/deployment. (Also started on a Tao-based OpenGL renderer, but lost my temper with its API.)
    • PixelManipulator - easy access to pixels regardless of if the source is a bitmap or texture surface.
    • Examples of how to use the PixelManipulator (adapted Smoke and Cellular Automata3 from - thanks Mike Davis and Glen Murphy for letting me use them).
    • C# version of Carlos J. Quintero's VB.NET TriStateTreeView - thanks Carlos. Added a TriStateTreeNode class.
    • Started on a plugin-based sound system. Currently, I've implemented two sound sub-systems: BASS and a simple DirectX player. OpenAL can be done, but BASS is cross-platform enough, and it has a better feature set. Later, I'll add DirectShow support.
    • Included a modified version of Leslie Sanford's MidiToolKit (supports multiple playbacks and has a slightly different messaging system). Thanks!
    • Flash parser/renderer has been restructured and improved. Can render basic shapes and animations.
    • System for managing file system and settings for different configurations (a bit like app.config but easier to use and better for my purposes).
    • RegEx-based file finder (extended search mechanism so you can use non-regex counters like [39-80] and [1-130pad:3] - the latter will find the strings 001 to 130).
    • Helper for creating packed textures (tree map packing).
    • Abstraction layer for texture usage - the user doesn't have to care if a sprite's image comes from a packed texture or not.
    • The concept of Members is on its way out, replaced by PicRefs. Macromedia Director terminology in general will disappear over time from now on.
    • New, more abstracted animation system. Not fully implemented.
    • .NET scripting system with code injection. Currently only implemented for Boo, but will support other languages. Will be restructured later, with a strategy pattern for the different languages.
    • New vastly improved bitmap font system, both for rendering and creating fonts. Real-time kerning instead of precalculated kerning tables.
    • Localization/translation helper (for multi-language products).
    • A number of helper classes such as the IntervalString class (translates "-1-3,5-7" to and from the array [-1,0,1,2,3,5,6,7]).
    • Unknown number of bug fixes and optimizations.
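
    The interval notation mentioned for the IntervalString helper can be sketched roughly as follows. This is not Endogine's actual class: the name IntervalStringSketch and both method names are made up for illustration, and the only behavior taken from the changelog is that "-1-3,5-7" corresponds to [-1,0,1,2,3,5,6,7]. The sketch assumes ascending ranges and treats a '-' at the first position of an item as a minus sign rather than a range separator.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Globalization;

    // Hypothetical stand-in for an IntervalString-style helper (not the Endogine API).
    public static class IntervalStringSketch
    {
        // "-1-3,5-7" -> [-1, 0, 1, 2, 3, 5, 6, 7]
        public static int[] Parse(string text)
        {
            var values = new List<int>();
            foreach (string part in text.Split(','))
            {
                string item = part.Trim();
                if (item.Length == 0)
                    continue;

                // Search from index 1 so a leading '-' is read as a sign.
                int separator = item.IndexOf('-', 1);
                if (separator < 0)
                {
                    values.Add(int.Parse(item, CultureInfo.InvariantCulture));
                    continue;
                }

                int first = int.Parse(item.Substring(0, separator), CultureInfo.InvariantCulture);
                int last = int.Parse(item.Substring(separator + 1), CultureInfo.InvariantCulture);
                for (int v = first; v <= last; v++)
                    values.Add(v);
            }
            return values.ToArray();
        }

        // [-1, 0, 1, 2, 3, 5, 6, 7] -> "-1-3,5-7" (runs of consecutive values are collapsed)
        public static string Format(int[] values)
        {
            var parts = new List<string>();
            int i = 0;
            while (i < values.Length)
            {
                int j = i;
                while (j + 1 < values.Length && values[j + 1] == values[j] + 1)
                    j++;

                string start = values[i].ToString(CultureInfo.InvariantCulture);
                string end = values[j].ToString(CultureInfo.InvariantCulture);
                parts.Add(i == j ? start : start + "-" + end);
                i = j + 1;
            }
            return string.Join(",", parts.ToArray());
        }
    }
    ```

    Searching for the separator from index 1 is what keeps negative start values unambiguous; a range such as "-3--1" is also handled because the second number is parsed with its own sign.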

    Update 2005-10-10

    Since the last update wasn't that exciting from a gaming POV, I decided to throw in a new version with a prototype isometric game. Totally R.A.D.

    • Isometric rendering, based on David Skoglund's game CrunchTime, including his graphics (see _readme.txt in the Isometric folder). Thanks, pal!
    • A* (A-star) pathfinding algorithm adapted from an implementation by John Kenedy. Thank you!
    • Removed references to LiveInterface.
    • Added "resource fork" for bitmap files - an XML file with additional info such as number of animation frames and offset point.

    Update 2005-10-04

    OK, it's over a month late, and it doesn't include stuff you might have been waiting for, and the things I've been working on - mainly creating two script languages - aren't that useful in their current state (especially with no good examples of their use). I think it was worth putting some time into, as I'm certain FlowScript will become a great tool later on. Here's what I've got for this version:

    • Compiled using the August 2005 SDK.
    • Added Space Invaders prototype.
    • Added curves rendering to .swf import (it still renders strange images).
    • Reorganized the engine into a .dll.
    • Fixed mouse LOC <-> screen LOC error.
    • Fixed transparency / alpha / color-key issues.
    • Map (probably a bad name) class - like a SortedList, but accepts multiple identical "keys".
    • Node class, like XmlNode, but for non-text data.
    • Started preparing the use of 3D matrices for scale/LOC/rotation.
    • Removed DirectDraw support.
    • Basic sound support.
    • A badly sync'ed drum machine example.
    • FlowScript, a time-based scripting language, aimed at non-programmers for animation and sound control, based on:
    • EScript, simple scripting language based on reflection (no bytecode).
    • Simple CheckBox widget.

    Update 2005-08-01

    • Compiled using the June 2005 SDK.
    • Sprite.Cursor property (works like Control.Cursor).
    • Simple XML editor.
    • .NET standardized serializing.
    • PropertyGrid instead of custom property editors.
    • VersatileDataGrid User Control (a new DataGrid control which implements functionality missing in .NET's standard DataGrid).
    • TreeGrid User Control - a bit like Explorer, but the right-hand pane is a VersatileDataGrid locked to the treeview.
    • Two new game prototypes: CaveHunter and Snooker/MiniGolf.
    • ResourceManager editor w/ drag 'n' drop to scene (and to SceneGraph viewer).
    • Better structure.
    • Transformer behavior - Photoshop-style sprite overlay for moving/scaling/rotating.
    • Scene XML import/export.
    • Director Xtra for exporting movies to Endogine XML scene format.
    • Import Photoshop documents - each layer becomes a sprite, layer effects become behaviors. Decoding of layer bitmaps incomplete.
    • Import Flash swf files (rendering code incomplete).
    • BinaryReverseReader which reads bytes in reverse order (for .psd), and BinaryFlashReader which reads data with sub-byte precision (for .swf).
    • Extended EPoint(F) and ERectangle(F) classes. Note that Puzzle Bobble doesn't work properly after the latest changes; I'll take care of that later.

    Update 2005-07-07

    • Camera node.
    • Parallax layers.
    • LOC/Scale control toolbox.
    • User Controls: ValueEdit (arrow keys to change a Point), JogShuttle (mouse drag to change a Point value by jog or shuttle method).
    • MDI mode crash fixed (thanks to Adrian Cirstei) - but keys still don't work in MDI mode.
    • Multiple Scene Graph windows.
    • Select sprites directly in scene.
    • Asteroids-type game with parallax layers.
    • Extended EPoint(F) and ERectangle(F) classes.

    Update 2005-06-27

    • Optional MDI interface: editors and game window as MDI windows. (Problem: I get an error when trying to create the 3D device in an MDI window.)
    • Scene Graph: treeview of all sprites.
    • Sprite marker: creates a marker around the sprite which is selected in the scene graph.
    • Property Inspector: interface for viewing and editing sprite properties.
    • Sprite Behaviors: easy way to add functionality to sprites. The swinging letters animation is done with a behavior.
    • Behavior Inspector: add/remove/edit behaviors at runtime.
    • Inks use an enumeration instead of an int (ROPs.Multiply instead of 103).
    • Switched from MS' too simplistic Point(F)/Rectangle(F) classes to my own EPoint(F)/ERectangle(F), which have operator overloading and many more methods. Note that I've chosen to write them as classes, not structs - i.e., they're passed as references, not values.
    • Easier keyboard handling (assigns names to keys - makes it easier to let the user define keys, or to have multiple players on one keyboard).
    • Puzzle Bobble allows multiple players in each area.

    You can read about my early thoughts about the Endogine concept, future plans, and see some Shockwave/Lingo demos here.

    I have added an Endogine C#-specific page here, but it probably lags behind this page.


    This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


    About the Author

    Jonas Beckeman

    Web Developer

    Sweden





    Paint.NET is free image and photo editing software for computers that run Windows. It features an intuitive and innovative user interface with support for layers, unlimited undo, special effects, and a wide variety of useful and powerful tools. An active and growing online community provides friendly help, tutorials, and plugins.
    It started development as an undergraduate college senior design project mentored by Microsoft, and is currently being maintained by some of the alumni that originally worked on it. Originally intended as a free replacement for the Microsoft Paint software that comes with Windows, it has grown into a powerful yet simple image and photo editor tool. It has been compared to other digital photo editing software packages such as Adobe® Photoshop®, Corel® Paint Shop Pro®, Microsoft Photo Editor, and The GIMP.





    Writing effect plug-ins for Paint.NET 2.1 in C#

    By Dennis C. Dietrich | 10 May 2005 | Article
    This article is an introduction on how to create your own effect plug-ins for Paint.NET 2.1 in C#.


    Paint.NET 2.1 was released last week. Created to be a free replacement for the good old Paint that ships with every copy of Windows, it is very interesting for end users at large. But it is even more interesting for developers, for two reasons. First, it is open source. So if you would like to study a few megabytes of C# code or see how some architectural problems can be solved, go and get it. Second, the application provides a simple but appealing interface for creating your own effect plug-ins. And that's what this article is all about (if you're searching for some fancy effect algorithm, go somewhere else, as the effect used in the article is quite simple).

    Getting started

    The first thing you need to do is to get the Paint.NET source code. Besides being the source code, it also serves as its own documentation and as the Paint.NET SDK. The solution consists of several projects. However, the only interesting ones when developing Paint.NET effect plug-ins are the PdnLib library which contains the classes we will use for rendering our effect and the Effects library which contains the base classes for deriving your own effect implementations.

    The project basics

    To create a new effect plug-in, we start with creating a new C# Class Library and add references to the official release versions of the PdnLib (PdnLib.dll) and the PaintDotNet.Effects library (PaintDotNet.Effects.dll). The root namespace for our project should be PaintDotNet.Effects as we're creating a plug-in that is supposed to fit in seamlessly. This is, of course, not limited to the namespace but more of a general rule: when writing software for Paint.NET, do as the Paint.NET developers do. The actual implementation requires deriving three classes:

    1. Effect is the base class for all Paint.NET effect implementations and it's also the interface Paint.NET will use for un-parameterized effects. It contains the method public virtual void Render(RenderArgs, RenderArgs, Rectangle) which derived un-parameterized effects override.
    2. Most of the effects are parameterized. The EffectConfigToken class is the base class for all specific effect parameter classes.
    3. And finally, as parameterized effects most likely will need a UI, there is a base class for effect dialogs: EffectConfigDialog.

    Implementing the infrastructure

    Now, we will take a look at the implementation details on the basis of the Noise Effect (as the name implies, it simply adds noise to an image). By the way, when using the sources provided with this article, you will most likely need to update the references to the Paint.NET libraries.

    The effect parameters

    As I said before, we need to derive a class from EffectConfigToken to be able to pass around our effect parameters. Given that our effect is called Noise Effect and that we want to achieve consistency with the existing sources, our parameter class has to be named NoiseEffectConfigToken.

    public class NoiseEffectConfigToken : EffectConfigToken

    There is no rule for what your constructor has to look like. You can use a simple default constructor or one with parameters. From Paint.NET's point of view, it simply does not matter because (as you will see later) the class derived from EffectConfigDialog is responsible for creating an instance of the EffectConfigToken. So you do not necessarily need to do anything other than provide a non-private constructor.

    public NoiseEffectConfigToken() : base()

    However, our base class implements the ICloneable interface and also defines a pattern how cloning should be handled. Therefore, we need to create a protected constructor that expects an object of the class' own type and uses it to duplicate all values. We then have to override Clone() and use the protected constructor for the actual cloning. This also means that the constructor should invoke the base constructor but Clone() must not call its base implementation.

    protected NoiseEffectConfigToken(NoiseEffectConfigToken copyMe) : base(copyMe)
    {
      this.frequency      = copyMe.frequency;
      this.amplitude      = copyMe.amplitude;
      this.brightnessOnly = copyMe.brightnessOnly;
    }

    public override object Clone()
    {
      return new NoiseEffectConfigToken(this);
    }

    The rest of the implementation details are again really up to you. Most likely, you will define some private fields and corresponding public properties (as the case may be with some plausibility checks).
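
    To make the "private fields plus checked properties" idea concrete, here is a minimal sketch of what such a token could look like. It is written so that it compiles on its own: TokenBaseSketch is a made-up stand-in for EffectConfigToken, NoiseTokenSketch is not the article's actual class, and the default values and the permitted range of 0.0 to 1.0 are assumptions (chosen because the dialog shown later displays the values as percentages).

    ```csharp
    using System;

    // Stand-in for PaintDotNet.Effects.EffectConfigToken (assumption, for a self-contained sketch).
    public abstract class TokenBaseSketch : ICloneable
    {
        protected TokenBaseSketch() { }
        protected TokenBaseSketch(TokenBaseSketch copyMe) { }
        public abstract object Clone();
    }

    public class NoiseTokenSketch : TokenBaseSketch
    {
        private double frequency = 0.5;      // assumed default
        private double amplitude = 0.5;      // assumed default
        private bool brightnessOnly = false; // assumed default

        public NoiseTokenSketch() : base() { }

        // Copy constructor used by Clone(), following the pattern described above.
        protected NoiseTokenSketch(NoiseTokenSketch copyMe) : base(copyMe)
        {
            frequency = copyMe.frequency;
            amplitude = copyMe.amplitude;
            brightnessOnly = copyMe.brightnessOnly;
        }

        public override object Clone()
        {
            return new NoiseTokenSketch(this);
        }

        public double Frequency
        {
            get { return frequency; }
            set { frequency = CheckFraction(value, "Frequency"); }
        }

        public double Amplitude
        {
            get { return amplitude; }
            set { amplitude = CheckFraction(value, "Amplitude"); }
        }

        public bool BrightnessOnly
        {
            get { return brightnessOnly; }
            set { brightnessOnly = value; }
        }

        // Plausibility check: reject values outside the assumed range [0, 1].
        private static double CheckFraction(double value, string name)
        {
            if (double.IsNaN(value) || value < 0.0 || value > 1.0)
                throw new ArgumentOutOfRangeException(name, "Expected a value between 0.0 and 1.0.");
            return value;
        }
    }
    ```

    Because Clone() goes through the copy constructor, a cloned token can be changed without affecting the token it was copied from.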

    The UI to set the effect parameters

    Now that we've got a container for our parameters, we need a UI to set them. As mentioned before, we will derive the UI dialog from EffectConfigDialog. This is important as it helps to ensure consistency of the whole UI. For example, in Paint.NET 2.0, an effect dialog is by default shown with opacity of 0.9 (except for sessions over terminal services). If I don't use the base class of Paint.NET and the developers decide that opacity of 0.6 is whole lot cooler, my dialog would all of a sudden look "wrong". Because we still try to be consistent with the original code, our UI class is called NoiseEffectConfigDialog.

    Again, you have a lot of freedom when it comes to designing your dialog, so I will again focus on the mandatory implementation details. The effect dialog is entirely responsible for creating and maintaining effect parameter objects. Therefore, there are three virtual base methods you must override. And, which might be unexpected, don't call their base implementations (it seems that earlier versions of the base implementations would even generally throw exceptions when called). The first is InitialInitToken() which is responsible for creating a new concrete EffectConfigToken and stores a reference in the protected field theEffectToken (which will implicitly cast the reference to an EffectConfigToken reference).

    protected override void InitialInitToken()
    {
      theEffectToken = new NoiseEffectConfigToken();
    }

    Second, we need a method to update the effect token according to the state of the dialog. Therefore, we need to override the method InitTokenFromDialog().

    protected override void InitTokenFromDialog()
    {
      NoiseEffectConfigToken token = (NoiseEffectConfigToken)theEffectToken;
      token.Frequency      = (double)FrequencyTrackBar.Value / 100.0;
      token.Amplitude      = (double)AmplitudeTrackBar.Value / 100.0;
      token.BrightnessOnly = BrightnessOnlyCheckBox.Checked;
    }

    And finally, we need to be able to do what we did before the other way round. That is, updating the UI according to the values of a token. That's what InitDialogFromToken() is for. Unlike the other two methods, this one expects a reference to the token to process.

    protected override void InitDialogFromToken(EffectConfigToken effectToken)
    {
      NoiseEffectConfigToken token = (NoiseEffectConfigToken)effectToken;

      /* Clamp the token values to the range of the track bars */
      if ((int)(token.Frequency * 100.0) > FrequencyTrackBar.Maximum)
        FrequencyTrackBar.Value = FrequencyTrackBar.Maximum;
      else if ((int)(token.Frequency * 100.0) < FrequencyTrackBar.Minimum)
        FrequencyTrackBar.Value = FrequencyTrackBar.Minimum;
      else
        FrequencyTrackBar.Value = (int)(token.Frequency * 100.0);

      if ((int)(token.Amplitude * 100.0) > AmplitudeTrackBar.Maximum)
        AmplitudeTrackBar.Value = AmplitudeTrackBar.Maximum;
      else if ((int)(token.Amplitude * 100.0) < AmplitudeTrackBar.Minimum)
        AmplitudeTrackBar.Value = AmplitudeTrackBar.Minimum;
      else
        AmplitudeTrackBar.Value = (int)(token.Amplitude * 100.0);

      FrequencyValueLabel.Text = FrequencyTrackBar.Value.ToString("D") + "%";
      AmplitudeValueLabel.Text = AmplitudeTrackBar.Value.ToString("D") + "%";
      BrightnessOnlyCheckBox.Checked = token.BrightnessOnly;
    }
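
    One detail worth a second look is the conversion between the token's fractions and the track bar's integer steps. A plain (int) cast truncates, so a value such as 0.29 * 100.0, which in double arithmetic comes out slightly below 29, would land on 28. The following sketch shows one way to map in both directions with rounding and clamping; TrackBarMapping and its method names are hypothetical helpers, not part of Paint.NET or of the article's sources.

    ```csharp
    using System;

    // Hypothetical helper for converting between fractions (0.0 - 1.0) and track bar steps.
    public static class TrackBarMapping
    {
        // Rounds fraction * 100 to the nearest integer and clamps it to [minimum, maximum].
        public static int ToTrackBarValue(double fraction, int minimum, int maximum)
        {
            double percent = Math.Round(fraction * 100.0);
            if (percent > maximum)
                return maximum;
            if (percent < minimum)
                return minimum;
            return (int)percent;
        }

        // Inverse direction, as used when the token is filled from the dialog.
        public static double ToFraction(int trackBarValue)
        {
            return trackBarValue / 100.0;
        }
    }
    ```

    With such a helper, each track bar assignment in InitDialogFromToken() would shrink to a single call, e.g. passing token.Frequency together with the track bar's Minimum and Maximum.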

    We're almost done. What's still missing is that we need to signal the application when values have been changed and about the user's final decision to either apply the changes to the image or cancel the operation. Therefore, whenever a value has been changed by the user, call UpdateToken() to let the application know that it needs to update the preview. Also, call Close() when leaving the dialog and set the appropriate DialogResult. For example:

    private void AmplitudeTrackBar_Scroll(object sender, System.EventArgs e)
    {
      AmplitudeValueLabel.Text = AmplitudeTrackBar.Value.ToString("D") + "%";
      UpdateToken();
    }
    private void OkButton_Click(object sender, System.EventArgs e)
    {
      DialogResult = DialogResult.OK;
      Close();
    }
    private void EscButton_Click(object sender, System.EventArgs e)
    {
      DialogResult = DialogResult.Cancel;
      Close();
    }

    Implementing the effect

    Now everything is in place to start the implementation of the effect. As I mentioned before, there is a base class for un-parameterized effects. The Noise Effect is parameterized but that will not keep us from deriving from Effect. However, in order to let Paint.NET know that this is a parameterized effect, we need to also implement the IConfigurableEffect interface which adds another overload of the Render() method. It also introduces the method CreateConfigDialog() which allows the application to create an effect dialog.

    public class NoiseEffect : Effect, IConfigurableEffect

    But how do we construct an Effect object, or in this case, a NoiseEffect object? This time, we have to follow the patterns of the application which means that we use a public default constructor which invokes one of the two base constructors. The first one expects the effect's name, its description, and an icon to be shown in the Effects menu. The second constructor, in addition, requires a shortcut key for the effect. The shortcut key, however, will only be applied to effects which are categorized as an adjustment. In case of a normal effect, it will be ignored (see chapter Effect Attributes for details on effects and adjustments). In conjunction with some resource management, this might look like this:

    public NoiseEffect() : base(NoiseEffect.resources.GetString("Text.EffectName"),

    The only mandatory implementations we need are those that come with the implementation of the interface IConfigurableEffect. Implementing CreateConfigDialog() is quite simple as it does not involve anything but creating a dialog object and returning a reference to it.

    public EffectConfigDialog CreateConfigDialog()
    {
      return new NoiseEffectConfigDialog();
    }

    Applying the effect is more interesting but we're going to deal with some strange classes we may never have heard of. So let's first take a look at the signature of the Render() method:

    public void Render(EffectConfigToken properties,
                       PaintDotNet.RenderArgs dstArgs,
                       PaintDotNet.RenderArgs srcArgs,
                       PaintDotNet.PdnRegion roi)

    The class RenderArgs contains all we need to manipulate images; most importantly, it provides us with Surface objects which actually allow reading and writing pixels. However, be careful not to confuse dstArgs and srcArgs. The object srcArgs (including its Surface) represents the original image, so you should never perform any write operations on it. You will, however, constantly read from the source Surface, because once you have made changes to the target Surface, nobody is going to reset them. The target (or destination) Surface is accessible via the dstArgs object. A pixel at a certain point can easily be addressed by using an indexer which expects x and y coordinates. The following code snippet, for example, takes a pixel from the original image, performs an operation, and then assigns the changed pixel to the same position in the destination Surface.

    point = srcArgs.Surface[x, y];
    VaryBrightness(ref point, token.Amplitude);
    dstArgs.Surface[x, y] = point;

    But that's not all. The region, represented by the fourth object roi, which the application orders us to manipulate, can have any shape. Therefore, we need to call a method like GetRegionScansReadOnlyInt() to obtain a collection of rectangles that approximate the drawing region. Furthermore, we should process the image line by line beginning at the top. These rules lead to a pattern like this:

    public void Render(EffectConfigToken properties, RenderArgs dstArgs,
                       RenderArgs srcArgs, PdnRegion roi)
    {
      /* Loop through all the rectangles that approximate the region */
      foreach (Rectangle rect in roi.GetRegionScansReadOnlyInt())
      {
        for (int y = rect.Top; y < rect.Bottom; y++)
        {
          /* Do something to process every line in the current rectangle */
          for (int x = rect.Left; x < rect.Right; x++)
          {
            /* Do something to process every point in the current line */
          }
        }
      }
    }
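
    The same pattern can be tried out without Paint.NET by substituting simple stand-ins. In the sketch below, RectSketch replaces System.Drawing.Rectangle, two byte[,] arrays replace the source and destination Surface objects, and the list of rectangles plays the role of the scans returned for the region; all of these names are assumptions made for the example. The "effect" simply inverts a grey value.

    ```csharp
    using System;
    using System.Collections.Generic;

    // Minimal stand-in for a rectangle with exclusive Right/Bottom edges.
    public struct RectSketch
    {
        public readonly int Left, Top, Right, Bottom;

        public RectSketch(int left, int top, int right, int bottom)
        {
            Left = left;
            Top = top;
            Right = right;
            Bottom = bottom;
        }
    }

    public static class RegionRenderSketch
    {
        // Reads only from src and writes only to dst, and only inside the given rectangles.
        public static void Render(byte[,] dst, byte[,] src, IEnumerable<RectSketch> regionScans)
        {
            foreach (RectSketch rect in regionScans)
            {
                for (int y = rect.Top; y < rect.Bottom; y++)
                {
                    for (int x = rect.Left; x < rect.Right; x++)
                    {
                        dst[x, y] = (byte)(255 - src[x, y]);
                    }
                }
            }
        }
    }
    ```

    Pixels outside the rectangles keep whatever value the destination already had, which mirrors the rule that an effect should only touch the region it was asked to render.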

    The last interesting fact that should be mentioned is that the Surface class generally uses a 32-bit format with four channels (red, green, blue and alpha) and 8 bits per channel, where each pixel is represented by a ColorBgra object. Keep in mind that ColorBgra is actually a struct, so in order to pass an object of that type by reference, you have to use the ref keyword. Furthermore, the struct allows accessing each channel through a public field:

    private void VaryBrightness(ref ColorBgra c, double amplitude)
    {
      /* Random offset with a magnitude of up to 127 * amplitude and a random sign */
      short newOffset = (short)(random.NextDouble() * 127.0 * amplitude);
      if (random.NextDouble() > 0.5)
        newOffset *= -1;

      /* Apply the same offset to every color channel, clamped to the byte range */
      if (c.R + newOffset < byte.MinValue)
        c.R = byte.MinValue;
      else if (c.R + newOffset > byte.MaxValue)
        c.R = byte.MaxValue;
      else
        c.R = (byte)(c.R + newOffset);

      if (c.G + newOffset < byte.MinValue)
        c.G = byte.MinValue;
      else if (c.G + newOffset > byte.MaxValue)
        c.G = byte.MaxValue;
      else
        c.G = (byte)(c.G + newOffset);

      if (c.B + newOffset < byte.MinValue)
        c.B = byte.MinValue;
      else if (c.B + newOffset > byte.MaxValue)
        c.B = byte.MaxValue;
      else
        c.B = (byte)(c.B + newOffset);
    }
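
    Since the three channel blocks differ only in the field they touch, the clamping can also be factored into a small helper. The following is a sketch rather than the article's code: BrightnessSketch and AddClamped are hypothetical names, the channels are passed as plain bytes instead of a ColorBgra so that the example compiles without Paint.NET, and the Random instance is handed in by the caller.

    ```csharp
    using System;

    public static class BrightnessSketch
    {
        // Adds a signed offset to a channel value and clamps the result to 0..255.
        public static byte AddClamped(byte channel, int offset)
        {
            int value = channel + offset;
            return (byte)Math.Max(0, Math.Min(255, value));
        }

        // Picks one random signed offset and applies it to all three channels,
        // so the color's brightness changes while its hue stays roughly the same.
        public static void VaryBrightness(ref byte r, ref byte g, ref byte b,
                                          double amplitude, Random random)
        {
            int offset = (int)(random.NextDouble() * 127.0 * amplitude);
            if (random.NextDouble() > 0.5)
                offset = -offset;

            r = AddClamped(r, offset);
            g = AddClamped(g, offset);
            b = AddClamped(b, offset);
        }
    }
    ```

    Doing the addition in int before clamping is the important part; adding directly in byte arithmetic would wrap around instead of saturating.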

    Effect Attributes

    Now we've got our effect up and running. Is there something else we have to do? Well, in this case everything is fine. However, as every effect is different you might want to apply one of the three attributes that are available in the PaintDotNet.Effects namespace. First, there is the attribute EffectCategoryAttribute which is used to let Paint.NET know if the effect is an effect or an adjustment. The difference between those two is that effects are meant to perform substantial changes on an image and are listed in the Effects menu while adjustments only perform small corrections on the image and are listed in the submenu Adjustments in the menu Layers. Just take a look at the effects and adjustments that are integrated in Paint.NET to get a feeling for how to categorize a certain plug-in. The EffectCategoryAttribute explicitly sets the category of an effect by using the EffectCategory value which is passed to the attribute's constructor. By default, every effect plug-in which does not have an EffectCategoryAttribute is considered to be an effect (and therefore appears in the Effects menu) which is equivalent to applying the attribute as follows:

    [EffectCategoryAttribute(EffectCategory.Effect)]

    Of course, the enumeration EffectCategory contains two values and the second one, EffectCategory.Adjustment, is used to categorize an effect as an adjustment so that it will appear in the Adjustments submenu in Paint.NET.

    [EffectCategoryAttribute(EffectCategory.Adjustment)]

    Besides being able to categorize effects, you can also define your own submenu by applying the EffectSubMenu attribute. Imagine you created ten ultra-cool effects and now want to group them within the Effects menu of Paint.NET to show that they form a toolbox. All you would have to do in order to put all those plug-ins in the submenu 'My Ultra-Cool Toolbox' within the Effects menu would be to apply the EffectSubMenu attribute to every plug-in of your toolbox. This of course can also be done with adjustment plug-ins in order to create submenus within the Adjustments submenu. However, there is one important restriction: because of the way effects are managed in Paint.NET, the effect name must be unique. This means that you can't have an effect called Foo directly in the Effects menu and a second effect which is also called Foo in the submenu 'My Ultra-Cool Toolbox'. If you try something like this, Paint.NET will call only one of the two effects no matter if you use the command in the Effects menu or the one in the submenu.

    [EffectSubMenu("My Ultra-Cool Toolbox")]

    Last but not least there is the SingleThreadedEffect attribute. Now, let's talk about multithreading first. In general, Paint.NET is a multithreaded application. That means, for example, that when it needs to render an effect, it will use worker threads to do the actual rendering. This ensures that the UI stays responsive, and if the rendering is done by at least two threads and Paint.NET is running on a multi-core CPU or a multiprocessor system, it also reduces the rendering time significantly. By default, Paint.NET will use as many threads to render an effect as there are logical processors in the system, with a minimum of two threads.

                                               physical CPUs   logical CPUs   threads
    Intel Pentium III                          1               1              2
    Intel Pentium 4 with hyper-threading       1               2              2
    Dual Intel Xeon without hyper-threading    2               2              2
    Dual Intel Xeon with hyper-threading       2               4              4

    However, Paint.NET will use only one thread if the SingleThreadedEffect attribute has been applied, regardless of the number of logical processors. If the rendering is done by multiple threads, you have to ensure that the usage of any object in the method Render() is thread-safe. The effect configuration token is usually no problem (as long as you don't change its values, which is not recommended anyway) as the rendering threads get a copy of the token instance used by the UI. Paint.NET's own PdnRegion class is also thread-safe, so you don't have to worry about those objects. However, GDI+ objects like RenderArgs.Graphics or RenderArgs.Bitmap are not thread-safe, so whenever you want to use these objects to render your effect, you have to apply the SingleThreadedEffect attribute. You may also apply the attribute whenever you are not sure if your implementation is actually thread-safe or you simply don't want to think about multithreading. Although doing so will lead to decreased performance on multiprocessor systems and multi-core CPUs, you'll at least be literally on the safe side.
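
    The threading rule described above boils down to a one-line calculation. The sketch below only restates that rule as stated in the text; it is not taken from the Paint.NET sources, and RenderThreadSketch and ThreadCount are made-up names.

    ```csharp
    using System;

    public static class RenderThreadSketch
    {
        // One thread for effects marked as single-threaded; otherwise one thread
        // per logical processor, but never fewer than two.
        public static int ThreadCount(int logicalProcessors, bool singleThreadedEffect)
        {
            if (singleThreadedEffect)
                return 1;
            return Math.Max(2, logicalProcessors);
        }
    }
    ```

    On a real machine the first argument could be taken from Environment.ProcessorCount, which reports the number of logical processors.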



    Creating effect plug-ins for Paint.NET is not too difficult after all. The parts of the object model you need in order to do this are not very complex (trying this is an excellent idea for the weekend) and it even seems quite robust. Of course, this article does not cover everything there is to know about Paint.NET effect plug-ins but it should be enough to create your first own plug-in.


    I'd like to thank Rick Brewster and Craig Taylor for their feedback and for proof-reading this article.

    Change history

    • 2005-05-08: Added a note that shortcut keys are only applied to adjustments and a chapter about attributes.
    • 2005-01-06: Corrected a major bug in NoiseEffectConfigDialog.InitDialogFromToken(EffectConfigToken). The old implementation used the property EffectConfigDialog.EffectToken instead of the parameter effectToken.
    • 2005-01-03: Initial release.


    This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


    About the Author

    Dennis C. Dietrich

    Web Developer

    Ireland






    GPGPU on Accelerating Wave PDE

    Sharing source code that I found while searching.






    GPGPU on Accelerating Wave PDE

    By | 20 Apr 2012 | Article
    A Wave PDE simulation using GPGPU capabilities


    This article aims at exploiting GPGPU (GP2U) computation capabilities to improve physical and scientific simulations. In order for the reader to understand all the passages, we will gradually proceed in the explanation of a simple physical simulation based on the well-known Wave equation. A classic CPU implementation will be developed and eventually another version using GPU workloads is going to be presented.

    Wave PDE

    PDE stands for “partial differential equation” and indicates an equation that relates an unknown function of several independent variables to one or more of its partial derivatives. The order of the PDE is defined as the order of the highest partial derivative in it. The following is a second-order PDE:
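
    As a minimal illustration, assuming a function $u(x, y)$ of two variables (the equation is picked for this example and is not necessarily the one the article originally displayed), Laplace's equation contains second partial derivatives as its highest derivatives and is therefore of second order:

    ```latex
    \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0
    ```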

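    Usually a PDE of order n having m variables $x_i$ for $i = 1, 2, \dots, m$ is expressed as

    $$F\left(x_1, \dots, x_m,\; u,\; \frac{\partial u}{\partial x_1}, \dots, \frac{\partial u}{\partial x_m},\; \frac{\partial^2 u}{\partial x_1^2}, \dots, \frac{\partial^n u}{\partial x_m^n}\right) = 0$$

    A compact form to express $\frac{\partial u}{\partial x}$ is $u_x$, and the same applies to $\frac{\partial^2 u}{\partial x^2}$ ($u_{xx}$) and $\frac{\partial^2 u}{\partial x\,\partial y}$ ($u_{xy}$).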

    The wave equation is a second order partial differential equation used to describe water waves, light waves, sound waves, etc. It is a fundamental equation in fields such as electro-magnetics, fluid dynamics and acoustics. Its applications involve modeling the vibration of a string or the air flow dynamics as a result from an aircraft movement.

    We can derive the one-dimensional wave equation (i.e. the vibration of a string) by considering a flexible elastic string that is tightly bound between two fixed end-points lying on the x axis.

    A guitar string has a restoring force that is proportional to how much it’s stretched. Suppose that, neglecting gravity, we apply a y-direction displacement to the string (i.e. we’re slightly pulling it). Let’s consider only a short segment of it between $x$ and $x+\Delta x$:

    Let’s write down $F = ma$ for the small red segment in the diagram above. We assume that the string has a linear density $\rho$ (linear density is a measure of mass per unit of length and is often used with one-dimensional objects). Recalling that if a one-dimensional object is made of a homogeneous substance of length $L$ and total mass $m$ the linear density is

    $$\rho = \frac{m}{L}$$

    we have $m = \rho\,\Delta x$ for the segment.

    The forces (as already stated we neglect gravity as well as air resistance and other tiny forces) are the tensions $T$ at the ends. Since the string is actually waving, we can’t assume that the two $T$ vectors cancel each other: their resultant is the force we are looking for. Now let’s make some assumptions to simplify things: first we consider the net force we are searching for as vertical (actually it’s not exactly vertical but very close).

    Furthermore we consider the wave amplitude small. That is, the vertical component of the tension at the $x+\Delta x$ end of the small segment of string is

    $$T \sin\theta$$

    where $\theta$ is the angle between the string and the x axis at that end.

    The slope is $\tan\theta = \frac{dy}{dx}$ if we consider $dx$ and $dy$ as the horizontal and vertical components of the above image. Since we have considered the wave amplitude to be very tiny, we can therefore assume $\sin\theta \approx \tan\theta$. This greatly helps us: $T \sin\theta \approx T\,\frac{dy}{dx}$, and the total vertical force from the tensions at the two ends becomes

    $$T\left(\left.\frac{dy}{dx}\right|_{x+\Delta x} - \left.\frac{dy}{dx}\right|_{x}\right) \approx T\,\frac{d^2 y}{dx^2}\,\Delta x$$

    The equality becomes exact in the limit $\Delta x \to 0$.

    We see that y is a function of x, but it is a function of t as well: $y = y(x,t)$. The standard convention for denoting differentiation with respect to one variable while the other is held constant (we’re looking at the sum of forces at one instant of time) lets us write

    $$F = T\,\frac{\partial^2 y}{\partial x^2}\,\Delta x$$

    The final part is to use Newton’s second law and put everything together: the sum of all forces, the mass (substituted with the linear density multiplied by the segment length) and the acceleration (i.e. $\frac{\partial^2 y}{\partial t^2}$, because we’re only moving in the vertical direction; remember the small amplitude approximation).

    $$T\,\frac{\partial^2 y}{\partial x^2}\,\Delta x = \rho\,\Delta x\,\frac{\partial^2 y}{\partial t^2}$$

    And we’re finally there: the wave equation. Using the spatial Laplacian operator, indicating the y function (depending on x and t) as u and substituting $c^2 = T/\rho$ (a fixed constant) we have the common compact form

    $$\frac{\partial^2 u}{\partial t^2} = c^2\,\nabla^2 u$$

    The two-dimensional wave equation is obtained by adding a second spatial term to the equation as follows

    $$\frac{\partial^2 u}{\partial t^2} = c^2\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right)$$

    The constant c has dimensions of distance per unit time and thus represents a velocity. We won’t prove here that c is actually the velocity at which waves propagate along a string or through a surface (although it’s surely worth noting). This makes sense since the wave speed increases with tension experienced by the medium and decreases with the density of the medium.
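
    As a worked illustration, take the standard relation for a string under tension $T$ with linear density $\rho$ (mass per unit length). The numbers below are made up for the example; they show both the unit check and the dependence on tension:

    ```latex
    c = \sqrt{\frac{T}{\rho}}, \qquad
    [c] = \sqrt{\frac{\mathrm{kg\,m\,s^{-2}}}{\mathrm{kg\,m^{-1}}}} = \mathrm{m\,s^{-1}}

    T = 100\ \mathrm{N},\quad \rho = 0.01\ \mathrm{kg/m}
    \;\Rightarrow\; c = \sqrt{10^{4}\ \mathrm{m^2/s^2}} = 100\ \mathrm{m/s}

    T = 400\ \mathrm{N},\quad \rho = 0.01\ \mathrm{kg/m}
    \;\Rightarrow\; c = \sqrt{4 \cdot 10^{4}\ \mathrm{m^2/s^2}} = 200\ \mathrm{m/s}
    ```

    Quadrupling the tension doubles the wave speed, while a denser string lowers it.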

    In a physical simulation we need to take into account forces other than just the surface tension: the average amplitude of the waves on the surface diminishes in real-world fluids. We may therefore add a viscous damping force to the equation by introducing a force that acts in the opposite direction of the velocity of a point on the surface:

    $$\frac{\partial^2 u}{\partial t^2} = c^2\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right) - \mu\,\frac{\partial u}{\partial t}$$

    where the nonnegative constant $\mu$ represents the viscosity of the fluid (it controls how long it takes for the wave on the surface to calm down: a small $\mu$ allows waves to exist for a long time, as with water, while a large $\mu$ causes them to diminish rapidly, as for thick oil).

    Solving the Wave PDE with finite difference method

    To actually implement a physical simulation which uses the wave PDE we need to find a method to solve it. Let’s solve the last equation we wrote, the one with the damping force

    $$\frac{\partial^2 u}{\partial t^2} = c^2\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right) - k\,\frac{\partial u}{\partial t}$$

    here t is the time, x and y are the spatial coordinates of the 2D wave, c is the wave speed and k is the damping factor. u=u(t,x,y) is the function we need to calculate: generally speaking it could represent various quantities, like the change of height of a pool’s water, the electric potential in an electromagnetic wave, etc.

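    A common method to solve this problem is the finite difference method. The idea is to replace derivatives with finite differences, which can be easily calculated in a discrete algorithm. If there is a function f=f(x) and we want to calculate its derivative with respect to the x variable we can write

    $$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$

    if h is a small discrete value (which is not zero), then we can approximate the derivative with

    $$f'(x) \approx \frac{f(x+h) - f(x)}{h}$$

    the error of such an approach could be derived from Taylor’s theorem but that isn’t the purpose of this paper.

    A finite difference approach uses one of the following three approximations to express the derivative

    • Forward difference: $f'(x) \approx \frac{f(x+h) - f(x)}{h}$

    • Backward difference: $f'(x) \approx \frac{f(x) - f(x-h)}{h}$

    • Central difference: $f'(x) \approx \frac{f(x+h/2) - f(x-h/2)}{h}$

    Let’s stick with the latter (i.e. central difference); this kind of approach can be generalized, so if we want to calculate the discrete version of f''(x) we could first write

    $$f''(x) \approx \frac{f'(x+h/2) - f'(x-h/2)}{h}$$

    and then calculate f''(x) as follows

    $$f''(x) \approx \frac{f(x+h) - 2f(x) + f(x-h)}{h^2}$$

    The same idea is used to estimate the partial derivatives below (for the first derivative in time a backward difference is used, which is what produces the coefficients appearing later in the code)

    $$\frac{\partial^2 u}{\partial t^2} \approx \frac{u_{t+1,x,y} - 2u_{t,x,y} + u_{t-1,x,y}}{\Delta t^2} \qquad \frac{\partial u}{\partial t} \approx \frac{u_{t,x,y} - u_{t-1,x,y}}{\Delta t}$$

    $$\frac{\partial^2 u}{\partial x^2} \approx \frac{u_{t,x+1,y} - 2u_{t,x,y} + u_{t,x-1,y}}{\Delta x^2} \qquad \frac{\partial^2 u}{\partial y^2} \approx \frac{u_{t,x,y+1} - 2u_{t,x,y} + u_{t,x,y-1}}{\Delta y^2}$$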
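
    The one-variable approximations can be tried out numerically. The following is a minimal C++ sketch; the function names are illustrative and not part of the project’s code, and the central difference is written in its half-step form.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

using Fn = std::function<double(double)>;

// Forward difference: (f(x+h) - f(x)) / h
double forwardDiff(const Fn& f, double x, double h) {
    assert(h != 0.0);
    return (f(x + h) - f(x)) / h;
}

// Backward difference: (f(x) - f(x-h)) / h
double backwardDiff(const Fn& f, double x, double h) {
    assert(h != 0.0);
    return (f(x) - f(x - h)) / h;
}

// Central difference (half-step form): (f(x+h/2) - f(x-h/2)) / h
double centralDiff(const Fn& f, double x, double h) {
    assert(h != 0.0);
    return (f(x + h / 2) - f(x - h / 2)) / h;
}

// Second derivative: (f(x+h) - 2 f(x) + f(x-h)) / h^2
double secondDiff(const Fn& f, double x, double h) {
    assert(h != 0.0);
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h);
}
```

    For f(x) = x^3 at x = 1 with h = 0.001, the forward and backward estimates of f'(1) = 3 are off by roughly 3e-3, while the central estimate is off by roughly 2.5e-7, which is why the central form is the usual choice for the second derivatives.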

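    Let’s get back to the wave equation. We had the following

    $$\frac{\partial^2 u}{\partial t^2} = c^2\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right) - k\,\frac{\partial u}{\partial t}$$

    let’s apply the finite difference formulas for the partial derivatives to it (remember that u=u(t,x,y), that is, u depends on t, x and y)

    $$\frac{u_{t+1,x,y} - 2u_{t,x,y} + u_{t-1,x,y}}{\Delta t^2} = c^2\left(\frac{u_{t,x+1,y} - 2u_{t,x,y} + u_{t,x-1,y}}{\Delta x^2} + \frac{u_{t,x,y+1} - 2u_{t,x,y} + u_{t,x,y-1}}{\Delta y^2}\right) - k\,\frac{u_{t,x,y} - u_{t-1,x,y}}{\Delta t}$$

    This is quite a long expression. We just substituted the second derivatives and the first derivative with their finite difference formulas.

    If we now consider $\Delta x = \Delta y = h$, we are basically forcing the intervals where the derivatives are calculated to be the same for both the x and the y directions (and we can greatly simplify the expression):

    $$u_{t+1,x,y} = (2 - k\Delta t)\,u_{t,x,y} + (k\Delta t - 1)\,u_{t-1,x,y} + \frac{c^2\Delta t^2}{h^2}\left(u_{t,x+1,y} + u_{t,x-1,y} + u_{t,x,y+1} + u_{t,x,y-1} - 4u_{t,x,y}\right)$$

    To improve the readability, let us substitute some of the terms with letters

    $$q = 2 - k\Delta t \qquad r = k\Delta t - 1 \qquad b = \frac{c^2\Delta t^2}{h^2}$$

    and we have

    $$u_{t+1,x,y} = q\,u_{t,x,y} + r\,u_{t-1,x,y} + b\left(u_{t,x+1,y} + u_{t,x-1,y} + u_{t,x,y+1} + u_{t,x,y-1} - 4u_{t,x,y}\right)$$

    If you look at the t variables (and leave the constant coefficients aside) we have something like

    $$u_{t+1} = u_t + u_{t-1} + something\_at\_t(x,y)$$

    This tells us that the new wave height (at t+1) is built from the current wave height (t), the previous wave height (t-1) and a term evaluated in the present (at t) which depends only on the values around the wave point we are considering.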

    This can be visualized as a sequence of time frames, one after another, in which the point on the surface we are considering evolves

    The object

    has a central dot which represents a point (t,x,y) on the surface at time t. Since the term we previously called something_at_t(x,y) is actually

    $$u_{t,x+1,y} + u_{t,x-1,y} + u_{t,x,y+1} + u_{t,x,y-1} - 4u_{t,x,y}$$

    the value of the central point is influenced by five terms, the last of which is its own current value multiplied by -4 (that is, $-4u_{t,x,y}$).

    Creating the algorithm

    As we stated before, the wave PDE can be effectively solved with finite difference methods. However we still need to write some solver code before a real physical simulation can be set up. In the last paragraph we eventually ended up obtaining

    $$u_{t+1,x,y} = q\,u_{t,x,y} + r\,u_{t-1,x,y} + b\left(u_{t,x+1,y} + u_{t,x-1,y} + u_{t,x,y+1} + u_{t,x,y-1} - 4u_{t,x,y}\right)$$

    we then simplified this expression to

    $$u_{t+1} = u_t + u_{t-1} + something\_at\_t(x,y)$$

    This is indeed a recursive form which may be modeled with the following pseudo-code

    for each t in time
    {
        u_next_t <- u_current_t + u_previous_t + something_at_t(x,y)

        // Do something with the new u_next_t, e.g. you can draw the wave here

        u_previous_t <- u_current_t

        u_current_t <- u_next_t
    }

    The above pseudo-code is supposed to run in retained-mode, so the application can draw the wave at each point of the surface we’re considering by simply calling draw_wave(u_next_t).

    Let us assume that we’re modeling how a water wave evolves, so let the u function represent the wave height (0 being the horizontal resting level): we can now write down the beginning of our pseudo-code

    // Surface data 

    height = 500;

    width = 500; 

    // Wave parameters

    c = 1; // Speed of wave

    dx = 1; // Space step

    dt = 0.05; // Time step

    k = 0.002; // Decay factor

    kdt = k*dt; // Decay factor per timestep, recall that q = 2 - kdt, r = -1 + kdt

    c1 = dt^2*c^2/dx^2; // b factor

    // Initial conditions

    u_current_t = zeros(height,width); // Create a height x width zero matrix

    u_previous_t = u_current_t;

    We basically defined a surface where the wave evolution will be drawn (a 500x500 area) and initialized the wave parameters we saw a few paragraphs ago (make sure to recall the q,r and b substitutions we did before). The initial condition is a zero matrix (u_current_t) so the entire surface is quiet and there’s no wave evolving.

    Given that we are considering a matrix surface (every point located at (x;y) coordinates is described by a u value indicating the height of the wave there) we need to write code to implement the

    u_next_t <- u_current_t + u_previous_t + something_at_t(x,y)

    line in the for cycle. Actually the above expression is a simplified form of

    $$u_{t+1,x,y} = q\,u_{t,x,y} + r\,u_{t-1,x,y} + b\left(u_{t,x+1,y} + u_{t,x-1,y} + u_{t,x,y+1} + u_{t,x,y-1} - 4u_{t,x,y}\right)$$

    and we need to implement this last one. We may write down something like the following

    for(t=0; t<A_LOT_OF_TIME; t += dt)
    {
        u_next_t = (2-kdt)*u_current_t+(kdt-1)*u_previous_t+c1*something_at_t(x,y);

        u_previous_t = u_current_t; // Current becomes old

        u_current_t = u_next_t; // New becomes current

        // Draw the wave
        draw_wave(u_current_t);
    }

    that is, a for cycle with index variable t increasing by dt at every step. Everything should be familiar by now because the (2-kdt), (kdt-1) and c1 terms are the usual q, r and b substitutions. The last thing we need to implement is the something_at_t(x,y) term, also known as:

    $$u_{t,x+1,y} + u_{t,x-1,y} + u_{t,x,y+1} + u_{t,x,y-1} - 4u_{t,x,y}$$

    The wave PDE we started from was

    $$\frac{\partial^2 u}{\partial t^2} = c^2\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right) - k\,\frac{\partial u}{\partial t}$$

    and the term we are interested in now is the Laplacian

    $$\nabla^2 u$$

    that, in our case, is

    $$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}$$

    Since we have a matrix of points representing our surface, we are entirely in the discrete domain, and since we need to take a second derivative on a matrix of discrete points our problem is the same as having an image with pixel intensity values I(x,y) and needing to calculate its Laplacian

    $$\nabla^2 I = \frac{\partial^2 I}{\partial x^2} + \frac{\partial^2 I}{\partial y^2}$$

    This is a common image processing task and it’s usually solved by applying a convolution filter to the image we are interested in (in our case: the matrix representing the surface). A common small kernel used to approximate the second derivatives in the definition of the Laplacian is the following

    $$D = \begin{pmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{pmatrix}$$

    So in order to implement the

    something_at_t(x,y)

    term, we need to apply the D Laplacian kernel as a filter to our image (i.e. the u_current_t):

    u_next_t=(2-kdt)*u_current_t+(kdt-1)*u_previous_t+c1* convolution(u_current_t,D);

    In fact in the element we saw earlier

    the red dot elements are weighted and calculated with a 2D convolution of the D kernel.

    An important element to keep in mind while performing the D kernel convolution with our u_current_t matrix is that every value in outside halos (every value involved in the convolution but outside the u_current_t matrix boundaries) should be zero as in the image below

    In the picture above the red border is the u_current_t matrix perimeter while the blue 3x3 matrix is the D Laplacian kernel; everything outside the red border is zero. This is important because we want our surface to act like water held in a container, with the waves “bouncing back” when they hit the container’s border. By zeroing all the values outside the surface matrix we receive no wave contributions from outside our perimeter, nor do we influence in any way what’s beyond it. In addition the “energy” of the wave doesn’t spread out and is partially “bounced back” by the equation.
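
    The zero-padded convolution with the five-point Laplacian kernel can also be written directly, without a general convolution routine. The sketch below assumes a row-major surface stored in a flat vector; the Grid type and the function name are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major surface; reads outside the boundary return zero (the "halo").
struct Grid {
    int rows, cols;
    std::vector<float> v;

    Grid(int r, int c) : rows(r), cols(c), v(static_cast<std::size_t>(r) * c, 0.0f) {
        assert(r > 0 && c > 0);
    }
    float at(int i, int j) const {
        if (i < 0 || j < 0 || i >= rows || j >= cols) return 0.0f;
        return v[static_cast<std::size_t>(i) * cols + j];
    }
    float& ref(int i, int j) {
        assert(i >= 0 && j >= 0 && i < rows && j < cols);
        return v[static_cast<std::size_t>(i) * cols + j];
    }
};

// Applies the kernel [0 1 0; 1 -4 1; 0 1 0] to every point of u.
Grid laplacianZeroPadded(const Grid& u) {
    Grid out(u.rows, u.cols);
    for (int i = 0; i < u.rows; ++i) {
        for (int j = 0; j < u.cols; ++j) {
            out.ref(i, j) = u.at(i - 1, j) + u.at(i + 1, j)
                          + u.at(i, j - 1) + u.at(i, j + 1)
                          - 4.0f * u.at(i, j);
        }
    }
    return out;
}
```

    On a surface that is 1 everywhere, an interior point gets 0 from this filter, while a corner point gets -2 because two of its four neighbours fall in the zero halo: this is how the border makes itself felt in the update.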

    Now the algorithm is almost complete: our PDE assures us that every wave crest in our surface will be properly “transmitted” as a real wave. The only problem is: starting with a zero-everywhere matrix and letting it evolve would produce just nothing. Forever.

    We will now add a small droplet to our surface to perturb it and set the wheels in motion.

    To simulate as realistically as possible the effect of a droplet falling onto our surface we will introduce a “packet” in our surface matrix. That means we are going to add a matrix that represents a discrete Gaussian function (similar to a Gaussian kernel) to our surface matrix. Notice that we’re going to “add”, not to “convolve”.

    Given the 2D Gaussian formula

    $$f(x,y) = A \exp\left(-\left(\frac{(x-x_0)^2}{2\sigma_x^2} + \frac{(y-y_0)^2}{2\sigma_y^2}\right)\right)$$

    we have that A is the Gaussian amplitude (so it will be our droplet amplitude), x0 and y0 are the center’s coordinates and $\sigma_x$ and $\sigma_y$ are the spreads of the blob. With a negative amplitude, as in the following image, we can simulate a droplet that has just fallen onto our surface.

    To generate the droplet matrix we can use the following pseudo-code

    // Generate a droplet matrix

    dsz = 3; // Droplet size

    da=0.07; // Droplet amplitude

    [X,Y] = generate_XY_planes(dsz,da);

    DropletMatrix = -da * exp( -(X/dsz)^2 -(Y/dsz)^2);

    da is the droplet amplitude (it enters the formula with a negative sign, as we just said), while dsz is the droplet size, that is, the Gaussian “bell” radius. X and Y are two matrices representing the X and Y plane discrete values

    so the X and Y matrices for the image above are

    And the final DropletMatrix is similar to the following

    where the central value is -0.0700. If you drew the above matrix you would obtain a 3D Gaussian function which can now model our droplet.
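
    As a rough C++ counterpart of the droplet pseudo-code, the sketch below builds the matrix directly instead of going through X and Y planes. The grid extent (from -2*dsz to +2*dsz in unit steps, giving a side of 4*dsz+1) is an assumption of this sketch, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Side length of the droplet matrix (assumed extent: -2*dsz .. +2*dsz).
int dropletDim(int dsz) {
    assert(dsz > 0);
    return 4 * dsz + 1;
}

// Row-major matrix holding -da * exp(-(X/dsz)^2 - (Y/dsz)^2).
std::vector<float> makeDroplet(int dsz, float da) {
    const int dim = dropletDim(dsz);
    std::vector<float> z(static_cast<std::size_t>(dim) * dim);
    for (int i = 0; i < dim; ++i) {
        for (int j = 0; j < dim; ++j) {
            const float y = static_cast<float>(i - 2 * dsz) / dsz;
            const float x = static_cast<float>(j - 2 * dsz) / dsz;
            z[static_cast<std::size_t>(i) * dim + j] = -da * std::exp(-(x * x) - (y * y));
        }
    }
    return z;
}
```

    With dsz = 3 and da = 0.07 this gives a 13x13 matrix whose central value is -0.07 and whose values rise towards zero moving away from the center.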

    The final pseudo-code for our wave algorithm is the following

    // Surface data

    height = 500;

    width = 500;

    // Wave parameters

    c = 1; // Speed of wave

    dx = 1; // Space step

    dt = 0.05; // Time step

    k = 0.002; // Decay factor

    kdt = k*dt; // Decay factor per timestep, recall that q = 2 - kdt, r = -1 + kdt

    c1 = dt^2*c^2/dx^2; // b factor

    // Initial conditions

    u_current_t=zeros(height,width); // Create a height x width zero matrix

    u_previous_t = u_current_t;

    // Generate a droplet matrix

    dsz = 3; // Droplet size

    da=0.07; // Droplet amplitude

    [X,Y] = generate_XY_planes(dsz,da);

    DropletMatrix = -da * exp( -(X/dsz)^2 -(Y/dsz)^2);

    // This variable is used to add just one droplet

    One_single_droplet_added = 0;

    for(t=0; t<A_LOT_OF_TIME; t = t + dt)
    {
        u_next_t = (2-kdt)*u_current_t+(kdt-1)*u_previous_t+c1*convolution(u_current_t,D);

        u_previous_t = u_current_t; // Current becomes old

        u_current_t = u_next_t; // New becomes current

        // Draw the wave
        draw_wave(u_current_t);

        if(One_single_droplet_added == 0)
        {
            One_single_droplet_added = 1; // no more droplets

            addMatrix(u_current_t, DropletMatrix, [+100;+100]);
        }
    }

    The variable One_single_droplet_added is used to check whether the droplet has already been inserted (we want just one droplet). The addMatrix function adds the DropletMatrix values to the u_current_t surface matrix values, centering the DropletMatrix at the point (100;100). Remember that the DropletMatrix is no larger than the surface matrix, so we just add the DropletMatrix’s values to those u_current_t values that fall within the dimensions of the DropletMatrix.
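
    A possible shape for such an addMatrix helper is sketched below in C++. The signature, the flat row-major storage and the choice to silently skip cells that fall outside the surface are assumptions of this sketch, not the project’s actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Adds the square `patch` (side patchDim) to `surface`, with the patch
// centre placed at column cx, row cy. Cells outside the surface are skipped.
void addCentered(std::vector<float>& surface, int rows, int cols,
                 const std::vector<float>& patch, int patchDim, int cx, int cy) {
    assert(rows > 0 && cols > 0 && patchDim > 0);
    assert(surface.size() == static_cast<std::size_t>(rows) * cols);
    assert(patch.size() == static_cast<std::size_t>(patchDim) * patchDim);
    const int half = patchDim / 2;
    for (int i = 0; i < patchDim; ++i) {
        for (int j = 0; j < patchDim; ++j) {
            const int r = cy - half + i;
            const int c = cx - half + j;
            if (r < 0 || c < 0 || r >= rows || c >= cols) continue;
            surface[static_cast<std::size_t>(r) * cols + c] +=
                patch[static_cast<std::size_t>(i) * patchDim + j];
        }
    }
}
```

    Because the values are added rather than assigned, two droplets placed on overlapping spots simply accumulate.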

    Now the algorithm is complete, although it is still a theoretical simulation. We will soon implement it with real code.
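
    Before moving on to the actual project, one time step of the scheme can be condensed into a few lines of C++. This is only a sketch under the same assumptions as the pseudo-code (zero halo, equal space steps in x and y); the WaveState type and the step function are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct WaveState {
    int rows, cols;
    float q, r, b;                        // 2 - k*dt, k*dt - 1, (c*dt/dx)^2
    std::vector<float> current, previous; // row-major surfaces at t and t-1

    WaveState(int rows_, int cols_, float c, float dx, float dt, float k)
        : rows(rows_), cols(cols_),
          q(2.0f - k * dt), r(k * dt - 1.0f), b((c * dt / dx) * (c * dt / dx)),
          current(static_cast<std::size_t>(rows_) * cols_, 0.0f),
          previous(current) {
        assert(rows_ > 0 && cols_ > 0 && dx > 0.0f);
    }
};

// Advances the surface by one time step.
void step(WaveState& s) {
    std::vector<float> next(s.current.size());
    auto at = [&s](int i, int j) -> float {   // zero outside the surface
        if (i < 0 || j < 0 || i >= s.rows || j >= s.cols) return 0.0f;
        return s.current[static_cast<std::size_t>(i) * s.cols + j];
    };
    for (int i = 0; i < s.rows; ++i) {
        for (int j = 0; j < s.cols; ++j) {
            const std::size_t idx = static_cast<std::size_t>(i) * s.cols + j;
            const float lap = at(i - 1, j) + at(i + 1, j) + at(i, j - 1)
                            + at(i, j + 1) - 4.0f * at(i, j);
            next[idx] = s.q * s.current[idx] + s.r * s.previous[idx] + s.b * lap;
        }
    }
    s.previous = std::move(s.current);  // current becomes old
    s.current = std::move(next);        // new becomes current
}
```

    A surface that starts at zero stays at zero forever, as noted above; once a value is perturbed, the next step already moves its four neighbours.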


    Implementing the Wave simulation

    We will now discuss how the above algorithm has been implemented in a real C++ project to create a fully functional openGL physical simulation.

    The sequence diagram below shows the skeleton of the program, which basically consists of three main parts: the main module, where the startup function resides together with the kernel function which creates the matrix image for the entire window; an openGL renderer wrapper, which encapsulates the GLUT library functions and callback handlers; and a hand-written matrix class, which simplifies matrix data access and manipulation. Strictly speaking, a sequence diagram belongs to a standard software engineering methodology with its own predetermined rules; here we simply use it as an abstraction to model the program’s control flow

    The program starts at the main() and creates an openGLrenderer object which will handle all the graphic communication with the GLUT library and the callback events (like mouse movements, keyboard press events, etc.). OpenGL doesn’t provide routines for interfacing with a windowing system, so in this project we will rely on GLUT libraries which provide a platform-independent interface to manage windows and input events. To create an animation that runs as fast as possible we will set an idle callback with the glutIdleFunc() function. We will explain more about this later.

    Initially the algorithm sets its initialization variables (time step, space step, droplet amplitude, Laplacian 2D kernel, etc.; practically everything we saw in the theory section) and every matrix corresponding to the image to be rendered is zeroed out. The Gaussian matrix corresponding to the droplets is also precomputed. A structure defined in openGLRenderer’s header file contains all the data which should be retained across image renderings

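    typedef struct kernelData
    {
        float a1; // 2-kdt
        float a2; // kdt-1
        float c1; // dt^2*c^2/dx^2 (the b factor)
        sMatrix* u;  // Surface at the current time step
        sMatrix* u0; // Surface at the previous time step
        sMatrix* D;  // Laplacian 2D kernel
        int Ddim; // Droplet matrix width/height
        int dsz; // Gaussian radius
        sMatrix* Zd; // Droplet (Gaussian) matrix
    } kernelData;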

    The structure is updated each time a time step is performed, since it contains both the current and the previous matrices that describe the wave’s evolution across time. Since this structure is used both by the openGL renderer and by the main module (which initializes it), the variable is declared as external and defined afterwards in the openGLrenderer cpp file (so its scope goes beyond the single translation unit). After everything has been set up, the openGLRenderer class’ startRendering() method is called and the openGL main loop starts fetching events. The core of the algorithm we saw is in the main module’s kernel() function, which is called every time an openGL idle event is dispatched. The screen is updated, and the changes shown, only when the idle callback has completed, so the amount of rendering work performed there should be minimized to avoid performance loss.

    The kernel’s function code is the following

    // This kernel is called at each iteration
    // It implements the main loop algorithm and "rasterizes" the matrix data
    // to pass to the openGL renderer. It also adds droplets in the waiting queue
    void kernel(unsigned char *ptr, kernelData& data)
    {
        // Wave evolution
        sMatrix un(DIM,DIM);
        // The iterative discrete update (see documentation)
        un = (*data.u)*data.a1 + (*data.u0)*data.a2 + convolutionCPU((*data.u),(*data.D))*data.c1;
        // Step forward in time
        (*data.u0) = (*data.u);
        (*data.u) = un;
        // Prepare matrix data for rendering
        matrix2Bitmap( (*data.u), ptr );
        if(first_droplet == 1) // By default there's just one initial droplet
        {
            first_droplet = 0;
            int x0d= DIM / 2; // Default droplet center
            int y0d= DIM / 2;
            // Place the (x0d;y0d) centered Zd droplet on the wave data (it will be added at the next iteration)
            for(int Zdi=0; Zdi < data.Ddim; Zdi++)
            {
                for(int Zdj=0; Zdj < data.Ddim; Zdj++)
                {
                    (*data.u)(y0d-2*data.dsz+Zdi,x0d-2*data.dsz+Zdj) += (*data.Zd)(Zdi, Zdj);
                }
            }
        }
        // Render the droplet queue
    }

    The pattern we followed in the wave PDE evolution can be easily recognized in the computationally intensive code line

    un = (*data.u)*data.a1 + (*data.u0)*data.a2 + convolutionCPU((*data.u),(*data.D))*data.c1;

    which basically performs the well-known iterative step

    $$u_{t+1} = q\,u_t + r\,u_{t-1} + b\,(D * u_t)$$

    with a1, a2 and c1 playing the roles of q, r and b. All constants are precomputed to improve performance.

    Note that the line adds up large matrices, which are referred to in the code as sMatrix objects. sMatrix is a simple hand-written class that exposes a few operator overloads to ease working with matrices. One should bear in mind that large matrix operations should avoid passing arguments by value, and that creating a new matrix and copying it to the destination requires a copy constructor (to avoid obtaining a shallow copy without the actual data); apart from that, the code is pretty straightforward, so no more words will be spent on it

    #include <cstring> // memcpy

    // This class handles matrix objects
    class sMatrix
    {
    public:
        int rows, columns;
        float *values;
        sMatrix(int height, int width)
        {
            if (height == 0 || width == 0)
                throw "Matrix constructor has 0 size";
            rows = height;
            columns = width;
            values = new float[rows*columns](); // zero-initialized
        }
        // Copy constructor, this is needed to perform a deep copy (not a shallow one)
        sMatrix(const sMatrix& mt)
        {
            this->rows = mt.rows;
            this->columns = mt.columns;
            this->values = new float[rows*columns];
            // Copy the values
            memcpy(this->values, mt.values, this->rows*this->columns*sizeof(float));
        }
        ~sMatrix()
        {
            delete [] values;
        }
        // Allows matrix1 = matrix2
        sMatrix& operator= (sMatrix const& m)
        {
            if(m.rows != this->rows || m.columns != this->columns)
                throw "Size mismatch";
            // Same size: copy the values into the existing buffer
            memcpy(this->values, m.values, this->rows*this->columns*sizeof(float));
            return *this; // Since "this" continues to exist after the function call, it is perfectly legit to return a reference
        }
        // Allows both matrix(3,3) = value and value = matrix(3,3)
        float& operator() (const int row, const int column)
        {
            // May be suppressed to slightly increase performance
            if (row < 0 || column < 0 || row >= this->rows || column >= this->columns)
                throw "Size mismatch";
            return values[row*columns+column]; // Since the float value continues to exist after the function call, it is perfectly legit to return a reference
        }
        // Allows matrix*scalar (e.g. matrix*3) for each element
        sMatrix operator* (const float scalar)
        {
            sMatrix result(this->rows, this->columns);
            // Multiply each value by the scalar
            for(int i=0; i<rows*columns; i++)
                result.values[i] = this->values[i]*scalar;
            return result; // Copy constructor
        }
        // Allows matrix+matrix (if same size)
        sMatrix operator+ (const sMatrix& mt)
        {
            if (this->rows != mt.rows || this->columns != mt.columns)
                throw "Size mismatch";
            sMatrix result(this->rows, this->columns);
            // Sum each pair of values
            for(int i=0; i<rows; i++)
                for(int j=0; j<columns; j++)
                    result.values[i*columns+j] = this->values[i*columns+j] + mt.values[i*columns+j];
            return result; // Copy constructor
        }
    };
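
    As a side note on the deep-copy requirement: if the buffer is held in a std::vector instead of a raw pointer, the compiler-generated copy constructor, assignment and destructor already do the right thing. The class below is a hypothetical alternative sketched for comparison, not the sMatrix used in the project.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Matrix {
public:
    Matrix(int rows, int cols)
        : rows_(rows), cols_(cols), values_(checkedSize(rows, cols), 0.0f) {}

    float& operator()(int r, int c) { return values_[index(r, c)]; }
    float operator()(int r, int c) const { return values_[index(r, c)]; }

    // matrix * scalar, element by element
    Matrix operator*(float scalar) const {
        Matrix out(*this);
        for (float& v : out.values_) v *= scalar;
        return out;
    }
    // matrix + matrix (same size only)
    Matrix operator+(const Matrix& other) const {
        assert(rows_ == other.rows_ && cols_ == other.cols_);
        Matrix out(*this);
        for (std::size_t i = 0; i < values_.size(); ++i) out.values_[i] += other.values_[i];
        return out;
    }

private:
    static std::size_t checkedSize(int rows, int cols) {
        assert(rows > 0 && cols > 0);
        return static_cast<std::size_t>(rows) * static_cast<std::size_t>(cols);
    }
    std::size_t index(int r, int c) const {
        assert(r >= 0 && c >= 0 && r < rows_ && c < cols_);
        return static_cast<std::size_t>(r) * cols_ + c;
    }
    int rows_, cols_;
    std::vector<float> values_;  // copied element by element on copy/assignment
};
```

    Copying such a matrix and then modifying the copy leaves the original untouched, which is exactly the behaviour the hand-written copy constructor is there to guarantee.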

    The convolution is performed with the following code (a classic approach):

    // Returns the convolution between the matrix A and the kernel matrix B,
    // A's size is preserved
    sMatrix convolutionCPU(sMatrix& A, sMatrix& B)
    {
      sMatrix result(A.rows, A.columns);
      int kernelHradius = (B.rows - 1) / 2;
      int kernelWradius = (B.columns - 1) / 2;
      float convSum, convProd, Avalue;
      for(int i=0; i<A.rows; i++)
      {
        for(int j=0; j<A.columns; j++)
        {
            // --------j--------->
            // _ _ _ _ _ _ _     
            // |            |    |
            // |            |    |
            // |       A    |    i
            // |            |    |
            // |            |    |
            // _ _ _ _ _ _ _|    v
            convSum = 0;
            for(int bi=0; bi<B.rows; bi++)
            {
                for(int bj=0; bj<B.columns; bj++)
                {
                    // A's value with respect to the kernel center
                    int relpointI = i-kernelHradius+bi;
                    int relpointJ = j-kernelWradius+bj;
                    if(relpointI < 0 || relpointJ < 0 || relpointI >= A.rows || relpointJ >= A.columns)
                        Avalue = 0; // Outside the matrix: zero halo
                    else
                        Avalue = A(relpointI,relpointJ);
                    convProd = Avalue*B(bi,bj);
                    convSum += convProd;
                }
            }
            // Store the convolution result
            result(i,j) = convSum;
        }
      }
      return result;
    }

    After calculating the system’s evolution, the time elapsing is simulated by swapping the new matrix with the old one and discarding the previous state as we described before.

    Then a matrix2Bitmap() call performs a matrix-to-bitmap conversion, as its name suggests. More precisely, the entire simulation area is described by a large sMatrix object which contains float values. To actually render these values as pixels we need to convert each value to its corresponding RGBA value and pass it to the openGLRenderer class (which in turn will pass the entire bitmap buffer to the GLUT library). In brief: we need to perform a float-to-RGB color mapping.

    Since in the physical simulation we assumed that the resting water height is 0 and that every perturbation raises or lowers this value (in particular the droplet Gaussian matrix lowers it by at most 0.07), we are looking for a mapping from [-1;1] to colors. An HSV color model would better reproduce the gradual color transition we experience with our own eyes, but it would require converting back to RGB values to set up the bitmap passed to the GLUT wrapper. For performance reasons we chose to assign each value a color directly (a colormap). A first solution would have been implementing a full [-1;1] -> [0x0;0xFFFFFF] mapping in order to cover all the possible colors in the RGB format

    // Returns a RGB color from a value in the interval between [-1;+1]
    RGB getColorFromValue(float value)
    {
        RGB result;
        if(value <= -1.0f)
        {
                result.r = 0x00;
                result.g = 0x00;
                result.b = 0x00;
        }
        else if(value >= 1.0f)
        {
                result.r = 0xFF;
                result.g = 0xFF;
                result.b = 0xFF;
        }
        else
        {
                float step = 2.0f/0xFFFFFF;
                unsigned int cvalue = (unsigned int)((value + 1.0f)/step);
                if(cvalue < 0)
                        cvalue = 0;
                else if(cvalue > 0xFFFFFF)
                        cvalue = 0xFFFFFF;
                result.r = cvalue & 0xFF;
                result.g = (cvalue & 0xFF00) >> 8;
                result.b = (cvalue & 0xFF0000) >> 16;
        }
        return result;
    }

    However the above method is both performance-intensive and poor at rendering a smooth color transition: let’s take a look at a droplet mapped like that

    It looks more like a fractal than a droplet, so the above solution won’t work. A better way to improve performance (and the look of the image) is to hard-code a colormap in an array and to use it when needed:

    float pp_step = 2.0f / (float)COLOR_NUM;
    // The custom colormap, you can customize it to whatever you like
    unsigned char m_colorMap[COLOR_NUM][3] =
    {
        /* COLOR_NUM RGB triples */
    };
    // Returns a RGB color from a value in the interval between [-1;+1] using the above colormap
    RGB getColorFromValue(float value)
    {
      RGB result;
      // Clamp below before converting: a negative float cannot be converted to unsigned
      if(value < -1.0f)
        value = -1.0f;
      unsigned int cvalue = (unsigned int)((value + 1.0f)/pp_step);
      if(cvalue >= COLOR_NUM)
        cvalue = COLOR_NUM-1;
      result.r = m_colorMap[cvalue][0];
      result.g = m_colorMap[cvalue][1];
      result.b = m_colorMap[cvalue][2];
      return result;
    }

    Creating a colormap isn’t hard and different colormaps are freely available on the web which produce different transition effects. This time the result was much nicer (see the screenshot later on) and the performances (although an every-value conversion is always an intensive task) increased substantially.
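
    The index computation is the delicate part of the lookup: converting a negative floating point value to an unsigned integer is undefined behaviour in C++, and a check such as cvalue < 0 can never be true for an unsigned variable. A minimal sketch that clamps in floating point first is shown below; the function name and the colorCount parameter are illustrative.

```cpp
#include <algorithm>
#include <cassert>

// Maps a value in [-1, +1] to a colormap index in [0, colorCount-1].
// Out-of-range values are clamped to the nearest end of the colormap.
int colorIndex(float value, int colorCount) {
    assert(colorCount > 0);
    const float clamped = std::min(1.0f, std::max(-1.0f, value));
    const int idx = static_cast<int>((clamped + 1.0f) * 0.5f * colorCount);
    return std::min(idx, colorCount - 1);  // value == +1 maps to the last entry
}
```

    With a 64-entry colormap, -1 maps to entry 0, 0 maps to entry 32 and +1 maps to entry 63; anything outside the interval lands on the first or last entry.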

    The last part involved in the on-screen rendering is adding a droplet wherever the user clicks on the window with the cursor. One droplet is automatically added at the center of the surface (you can find the code in the kernel() function, it is controlled by the first_droplet variable) but the user can click everywhere (almost everywhere) on the surface to add another droplet in that spot. To achieve this a queue has been implemented to contain at the most 60 droplet centers where the Gaussian matrix will be placed (notice that the matrix will be added to the surface values that were already present in the spots, not just replace them).

    #define MAX_DROPLET        60
    typedef struct Droplet
    {
        int mx; // Droplet center, x window coordinate
        int my; // Droplet center, y window coordinate
    } Droplet;
    Droplet dropletQueue[MAX_DROPLET];
    int dropletQueueCount = 0;

    The queue system has been implemented for a reason: unlike the pseudo-code algorithm we wrote before, rendering a scene with openGL requires the program to control the objects to be displayed in immediate-mode. That means the program needs to take care of what should be drawn before the actual rendering is performed: it cannot simply put a droplet inside the data to be drawn, because that data could be in use (you can do this in retained-mode). Besides, we don’t know when a droplet will be added because that is entirely up to the user. Because of that, every time kernel() finishes, the droplet queue is emptied and the droplet Gaussian matrices are added to the data to be rendered (the surface data). The code which performs this is the following

    void openGLRenderer::renderWaitingDroplets()
    {
       // If there are droplets waiting to be rendered, schedule for the next rendering
       while(dropletQueueCount > 0)
       {
           dropletQueueCount--;
           addDroplet(dropletQueue[dropletQueueCount].mx, dropletQueue[dropletQueueCount].my);
       }
    }
    void addDroplet( int x0d, int y0d )
    {
       y0d = DIM - y0d;
       // Place the (x0d;y0d) centered Zd droplet on the wave data (it will be added at the next iteration)
       for(int Zdi=0; Zdi< m_simulationData.Ddim; Zdi++)
           for(int Zdj=0; Zdj< m_simulationData.Ddim; Zdj++)
               (*m_simulationData.u)(y0d-2*m_simulationData.dsz+Zdi, x0d-2*m_simulationData.dsz+Zdj) += (*m_simulationData.Zd)(Zdi, Zdj);
    }

    The code should be familiar by now: the addDroplet function simply adds the Zd 2D Gaussian matrix to the simulation data (the kernel data) at the “current” time, a.k.a. the u matrix which represents the surface.
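
    The enqueue-on-click, drain-between-steps pattern can be isolated from the rendering code. The sketch below is a hypothetical standalone version (the class name, the callback-based drain and the behaviour when the queue is full are assumptions), meant only to show the control flow.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct DropletCenter { int mx; int my; };

class DropletQueue {
public:
    explicit DropletQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }
    // Called from the mouse handler; returns false when the queue is full.
    bool push(int mx, int my) {
        if (pending_.size() >= capacity_) return false;
        pending_.push_back({mx, my});
        return true;
    }
    // Called between two time steps: applies `apply` to every waiting
    // droplet center, then empties the queue.
    template <typename ApplyFn>
    void drain(ApplyFn apply) {
        for (const DropletCenter& d : pending_) apply(d.mx, d.my);
        pending_.clear();
    }
    std::size_t size() const { return pending_.size(); }

private:
    std::size_t capacity_;
    std::vector<DropletCenter> pending_;
};
```

    The surface data is therefore only ever modified at one well-defined point of the loop, no matter when the user clicks.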

    The code loops until the keyboard callback handler (defined by the openGLrenderer) detects an Esc keypress; after that the application is issued a termination signal and the loop ends. The resources allocated by the program are freed before the exit signal is issued, although this might not be strictly necessary since on program termination the OS reclaims any resources the program had allocated.

    With the droplet-adding feature and all the handlers set the code is ready to run. This time the result is much nicer than the previous, that’s because we used a smoother colormap (take a look at the images below). Notice how the convolution term creates the “wave spreading” and the “bouncing” effect when computing values against the padded zero data outside the surface matrix (i.e. when the wave hits the window’s borders and is reflected back). The first image is the simulation in its early stage, that is when some droplets have just been added, the second image represents a later stage when the surface is going to calm down (in our colormap blue values are higher than red ones).

    Since we introduced a damping factor (recall it from the theory section), the waves are eventually going to cease and the surface will sooner or later be quiet again. The entire simulation is (except for the colors) quite realistic but quite slow too. That’s because the entire surface matrix is being thoroughly updated and calculated by the system. The kernel() function runs continuously updating the rendering buffer. For a 512x512 image the CPU has to process a large quantity of data and it has also to perform 512x512 floating point convolutions. Using a profiler (like VS’s integrated one) shows that the program spends most of its time in the kernel() call (as we expected) and that the convolution operation is the most cpu-intensive.
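
    As a rough, back-of-the-envelope count (assuming the 512x512 surface and the 3x3 kernel mentioned above, and ignoring the boundary checks), the convolution alone costs, for every time step:

```latex
512 \times 512 \times 9 = 2\,359\,296 \ \text{multiply-add operations}
```

    On top of that, the update line makes further passes over whole 512x512 matrices (three scalings and two additions, 262,144 elements each), followed by the matrix copies and the matrix-to-bitmap conversion.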

    It is also interesting to notice that the simulation speed decreases substantially when adding lots of droplets.

    In a real scientific simulation environment gigantic quantities of data need to be processed in relatively small time intervals. That’s where GPGPU computing comes into the scene. We will briefly summarize what this acronym means and then we will present a GPU-accelerated version of the wave PDE simulation. 

    GPGPU Architectures

    GPGPU stands for General-Purpose computation on Graphics Processing Units and refers to using graphics processors to perform highly parallelizable computations that would normally be handled by the CPU. The idea of using graphics devices to help CPUs with their workloads isn’t new, but until recent architectures and frameworks like CUDA (NVIDIA’s vendor-specific platform) or openCL showed up, programmers had to rely on a series of workarounds and tricks and to work with inconvenient and unintuitive methods and data structures. The reason why a developer should think about porting CPU-native code to a GPU version lies in the architectural differences between CPUs and GPUs. While CPUs evolved (multicore) to gain performance in sequential execution (pipelines, caches, control flow, etc.), GPUs evolved in a many-core direction: they operate at higher data bandwidths and heavily increase the number of execution threads. In recent years GPGPU has been seen as a new opportunity to use graphics processing units as algebraic coprocessors capable of handling massive parallelization and floating point calculations. The idea behind GPGPU architectures is to let the CPU handle the sequential parts of a program and the GPU take care of the parallelizable parts. In fact many scientific applications and systems have seen their performance increase with such an approach, and GPGPU is now a fundamental technology in many fields like medical imaging, physics simulations, signal processing, cryptography, intrusion detection, environmental sciences, etc.

    Massive parallelization with CUDA

    We chose to use the CUDA architecture to parallelize some parts of our code. Parallelizing with GPGPU techniques means moving from sequentially designed code to parallel-designed code, which often also means having to rewrite large parts of your code. The most obvious part of the entire simulation algorithm that could benefit from a parallel approach is the surface matrix convolution with the 2D Laplacian kernel.

    Notice: for brevity’s sake, we will assume that the reader is already familiar with CUDA C/C++.

    The CUDA SDK comes with a large variety of examples regarding how to implement an efficient convolution between matrices and how to apply a kernel as an image filter. However we decided to implement our own version rather than rely on standard techniques. The reasons are many:

    • we wanted to show how a massively parallel algorithm is designed and implemented
    • since our kernel is very small (3x3 2D Laplacian kernel, basically 9 float values) using a FFT approach like the one described by Victor Podlozhnyuk would be inefficient
    • the two-dimensional Gaussian kernel is the only radially symmetric function that is also separable, our kernel is not, so we cannot use separable convolution methods
    • such a small kernel seems perfect to be “aggressively cached” in the convolution operation. We’ll expand on that as soon as we describe the CUDA kernel designed
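
    The separability remark can be checked numerically: a 2D kernel is separable exactly when it is the outer product of a column and a row vector, i.e. when it has rank at most 1, which means every 2x2 minor vanishes. The following C++ sketch tests that condition for 3x3 kernels; the type alias and function name are illustrative.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Kernel3 = std::array<std::array<float, 3>, 3>;

// True when every 2x2 minor is (numerically) zero, i.e. the rank is <= 1.
bool isSeparable(const Kernel3& k, float tol = 1e-6f) {
    assert(tol >= 0.0f);
    for (int i = 0; i < 3; ++i)
        for (int p = i + 1; p < 3; ++p)
            for (int j = 0; j < 3; ++j)
                for (int q = j + 1; q < 3; ++q)
                    if (std::fabs(k[i][j] * k[p][q] - k[i][q] * k[p][j]) > tol)
                        return false;
    return true;
}
```

    For the Laplacian kernel the top-left minor is 0*(-4) - 1*1 = -1, so the test fails, whereas a kernel such as [1 2 1; 2 4 2; 1 2 1] (the outer product of [1 2 1] with itself) passes.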

    The most obvious way to perform a convolution (although extremely inefficient) consists in having each GPU thread perform multiple convolutions, one for each of the elements it is assigned across the entire image.

    Take a look at the following image: we will use a 9x9 thread grid (just one block to simplify things) to perform the convolution. The purple square is our 9x9 grid while the red grids correspond to the 3x3 kernel. Each thread performs the convolution for its own element, then the X coordinates are shifted and the entire block is “virtually” moved to the right. When the X coordinate is complete (that is: the entire horizontal extent of the image has been covered), the Y coordinate is incremented and the process starts again until completion. In the border area, every value outside the image is treated as zero.

    The code for this simple approach is the following where A is the image matrix and B is the kernel.

    __global__ void convolutionGPU(float *A, int A_cols, int A_rows, float *B, int B_Wradius, int B_Hradius,
    int B_cols, int B_rows, float *result)
    {
        // Initial position
        int threadXabs = blockIdx.x*blockDim.x + threadIdx.x;
        int threadYabs = blockIdx.y*blockDim.y + threadIdx.y;
        int threadXabsInitialPos = threadXabs;
        float convSum;
        while(threadYabs < A_rows)
        {
            while(threadXabs < A_cols)
            {
                // If we are still in the image, start the convolution
                convSum = 0.0f;
                // relative x coord to the absolute thread
                #pragma unroll
                for(int xrel=-B_Wradius;xrel<(B_Wradius+1); xrel++)
                {
                    #pragma unroll
                    for(int yrel=-B_Hradius;yrel<(B_Hradius+1); yrel++)
                    {
                        // Check the borders, 0 if outside
                        float Avalue;
                        if(threadXabs + xrel < 0 || threadYabs + yrel <0 || threadXabs + xrel >= A_cols || threadYabs + yrel >= A_rows)
                            Avalue = 0;
                        else
                            Avalue = A[ (threadYabs+yrel)*A_cols + (threadXabs + xrel) ];
                        // yrel+B_Hradius makes the interval positive
                        float Bvalue = B[ (yrel+B_Hradius)*B_cols + (xrel+B_Wradius) ];
                        convSum += Avalue * Bvalue;
                    }
                }
                // Store the result and proceed ahead in the grid
                result[threadYabs*A_cols + threadXabs ] = convSum;
                threadXabs += blockDim.x * gridDim.x;
            }
            // reset X pos and forward the Y pos
            threadXabs = threadXabsInitialPos;
            threadYabs += blockDim.y * gridDim.y;
        }
        // convolution finished
    }
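
    A plain CPU reference is handy for checking the output of the kernel above. The following host-side C++ sketch (convolveReference is a hypothetical helper, not part of the original code) applies the same rule, including the zero padding at the borders. Like the CUDA kernel, it does not flip the kernel, which makes no difference for a symmetric kernel such as the Laplacian.

```cpp
#include <cassert>
#include <vector>

// Reference convolution on the CPU with zero padding outside the image.
// A is row-major (rows x cols); B is the row-major kernel with the given radii.
std::vector<float> convolveReference(const std::vector<float>& A, int rows, int cols,
                                     const std::vector<float>& B, int radiusW, int radiusH)
{
    const int bCols = 2 * radiusW + 1;
    assert(static_cast<int>(A.size()) == rows * cols);
    assert(static_cast<int>(B.size()) == bCols * (2 * radiusH + 1));

    std::vector<float> result(A.size(), 0.0f);
    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            float sum = 0.0f;
            for (int yrel = -radiusH; yrel <= radiusH; ++yrel) {
                for (int xrel = -radiusW; xrel <= radiusW; ++xrel) {
                    const int ax = x + xrel;
                    const int ay = y + yrel;
                    // Values outside the image count as zero
                    const bool inside = ax >= 0 && ay >= 0 && ax < cols && ay < rows;
                    const float a = inside ? A[ay * cols + ax] : 0.0f;
                    sum += a * B[(yrel + radiusH) * bCols + (xrel + radiusW)];
                }
            }
            result[y * cols + x] = sum;
        }
    }
    return result;
}
```

    Feeding it a single unit impulse returns the kernel itself around the impulse position, which gives a quick sanity check before comparing against the GPU result.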

    As already stated, this simple approach has several disadvantages:

    • The kernel is very small; keeping it in global memory and reading it for every convolution performed is extremely inefficient
    • Although the matrix reads are partially coalesced, thread divergence can be significant, since threads working in the border area take a different path from the others and some threads remain inactive
    • There’s no collaborative behavior among threads, although they basically use the same kernel and share a large part of the apron region

    Hence a much better method to perform the convolution on the GPU has been designed, keeping in mind the points above.

    The idea is simple: let each thread load part of the apron and data regions into shared memory, thus maximizing read coalescing and reducing divergence.

    The code that performs the convolution in the GPU version of the simulation is the following:

    // For a 516x516 image the grid is 172x172 blocks of 3x3 threads each
    __global__ void convolutionGPU(float *A, float *result)
    {
       __shared__ float data[laplacianD*2][laplacianD*2];
       // Absolute position into the image
       const int gLoc = threadIdx.x + IMUL(blockIdx.x,blockDim.x) + IMUL(threadIdx.y,DIM) + IMUL(blockIdx.y,blockDim.y)*DIM;
       // Image-relative position
       const int x0 = threadIdx.x + IMUL(blockIdx.x,blockDim.x);
       const int y0 = threadIdx.y + IMUL(blockIdx.y,blockDim.y);
       // Load the apron and data regions
       int x,y;
       // Upper left square
       x = x0 - kernelRadius;
       y = y0 - kernelRadius;
       if(x < 0 || y < 0)
             data[threadIdx.x][threadIdx.y] = 0.0f;
       else
             data[threadIdx.x][threadIdx.y] = A[ gLoc - kernelRadius - IMUL(DIM,kernelRadius)];
       // Upper right square
       x = x0 + kernelRadius + 1;
       y = y0 - kernelRadius;
       if(x >= DIM || y < 0)
             data[threadIdx.x + blockDim.x][threadIdx.y] = 0.0f;
       else
             data[threadIdx.x + blockDim.x][threadIdx.y] = A[ gLoc + kernelRadius+1 - IMUL(DIM,kernelRadius)];
       // Lower left square
       x = x0 - kernelRadius;
       y = y0 + kernelRadius+1;
       if(x < 0 || y >= DIM)
             data[threadIdx.x][threadIdx.y + blockDim.y] = 0.0f;
       else
             data[threadIdx.x][threadIdx.y + blockDim.y] = A[ gLoc - kernelRadius + IMUL(DIM,(kernelRadius+1))];
       // Lower right square
       x = x0 + kernelRadius+1;
       y = y0 + kernelRadius+1;
       if(x >= DIM || y >= DIM)
             data[threadIdx.x + blockDim.x][threadIdx.y + blockDim.y] = 0.0f;
       else
             data[threadIdx.x + blockDim.x][threadIdx.y + blockDim.y] = A[ gLoc + kernelRadius+1 + IMUL(DIM,(kernelRadius+1))];
       // Wait until the whole block has filled the shared array
       __syncthreads();
       float sum = 0;
       x = kernelRadius + threadIdx.x;
       y = kernelRadius + threadIdx.y;
       // Execute the convolution in the shared memory (kernel is in constant memory)
       #pragma unroll
       for(int i = -kernelRadius; i<=kernelRadius; i++)
              for(int j=-kernelRadius; j<=kernelRadius; j++)
                      sum += data[x+i][y+j]  * gpu_D[i+kernelRadius][j+kernelRadius];
       // Transfer the result to global memory
       result[gLoc] = sum;
    }

    The kernel only receives the surface matrix and the buffer where the convolved image is stored. The Laplacian kernel itself is not passed as an argument because it has been placed in a special memory called “constant memory”, which is read-only for kernels, cached, and optimized for the case where all threads read the same location. The downside is that this kind of memory is extremely limited (on the order of 64 KB), so it should be used wisely. Declaring our 3x3 kernel in constant memory grants us a significant speed advantage:

    __device__ __constant__ float gpu_D[laplacianD][laplacianD]; // Laplacian 2D kernel

    The image below shows how threads load data from the surface matrix in global memory and store it into the faster on-chip shared memory before actually using it in the convolution. The purple 3x3 square is the kernel window and the central element is the value we are pivoting on. The grid is made of 172x172 blocks of 3x3 threads each; each block of 3x3 threads has four stages to complete before entering the convolution loop: load the upper left apron and image data into shared memory (the upper left red square around the purple kernel window), load the upper right area (red square), load the lower left area (red square) and load the lower right area (likewise). Since shared memory is only visible to the threads of a block, each block loads its own shared area. Notice that we chose to let every thread read something from global memory to maximize coalescing, but we are not actually going to use every single element. The image shows a yellow area and a gray area: the yellow data is actually going to be used in the convolution for the elements in the purple kernel square (it comprises aprons and data), while the gray area isn’t going to be used by any convolution performed by the block we are considering.

    After filling each block’s shared memory array, the threads of the block are synchronized (with __syncthreads()) so that every load has completed before any thread reads the shared array. Then the convolution itself is executed: shared data is multiplied by the kernel data held in constant memory, so the inner loop only touches fast on-chip memory.
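
    The load-then-compute structure can be emulated on the CPU to make the index arithmetic explicit. This is a minimal sketch under our own assumptions: the names convolveBlock and convolveTiled are hypothetical, the image side is assumed to be a multiple of the block side, and sequential loops stand in for the threads of a block.

```cpp
#include <cassert>
#include <vector>

constexpr int kRadius = 1;           // radius of the 3x3 Laplacian
constexpr int kBlock  = 3;           // 3x3 threads per block, as in the text
constexpr int kTile   = 2 * kBlock;  // 6x6 staging array, as in the kernel

// Emulates one block: stage a tile (data plus apron) with zero padding, then
// convolve reading from the tile only. img is row-major, dim x dim.
void convolveBlock(const std::vector<float>& img, int dim, int bx, int by,
                   const float (&lap)[3][3], std::vector<float>& out)
{
    float tile[kTile][kTile];
    // Load phase. tile[a][b] mirrors image pixel (bx*3 - 1 + a, by*3 - 1 + b);
    // on the GPU each thread fills four of these entries (a in {tx, tx+3},
    // b in {ty, ty+3}), here one loop fills them all.
    for (int a = 0; a < kTile; ++a) {
        for (int b = 0; b < kTile; ++b) {
            const int x = bx * kBlock - kRadius + a;
            const int y = by * kBlock - kRadius + b;
            const bool inside = x >= 0 && y >= 0 && x < dim && y < dim;
            tile[a][b] = inside ? img[y * dim + x] : 0.0f;
        }
    }
    // On the GPU a barrier separates the two phases.
    // Compute phase: each "thread" (tx, ty) reads only from the tile.
    for (int ty = 0; ty < kBlock; ++ty) {
        for (int tx = 0; tx < kBlock; ++tx) {
            float sum = 0.0f;
            for (int i = -kRadius; i <= kRadius; ++i)
                for (int j = -kRadius; j <= kRadius; ++j)
                    sum += tile[kRadius + tx + i][kRadius + ty + j] * lap[i + kRadius][j + kRadius];
            out[(by * kBlock + ty) * dim + bx * kBlock + tx] = sum;
        }
    }
}

// Runs every block of the grid in turn.
std::vector<float> convolveTiled(const std::vector<float>& img, int dim, const float (&lap)[3][3])
{
    assert(dim % kBlock == 0);
    assert(static_cast<int>(img.size()) == dim * dim);
    std::vector<float> out(img.size(), 0.0f);
    for (int by = 0; by < dim / kBlock; ++by)
        for (int bx = 0; bx < dim / kBlock; ++bx)
            convolveBlock(img, dim, bx, by, lap, out);
    return out;
}
```

    An impulse placed next to a block boundary is a useful test input, because its neighbours are then produced by different blocks and each of them must find the impulse in its own apron.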

    The #pragma unroll directive instructs the compiler to unroll the loop (where possible) to reduce loop control overhead and improve performance. A small example of loop unrolling: the following loop

    for(int i=0;i<1000;i++)
        a[i] = b[i] + c[i];

    might be optimized by unrolling it

    for(int i=0;i<1000;i+=2)
    {
        a[i] = b[i] + c[i];
        a[i+1] = b[i+1] + c[i+1];
    }

    so that the control instructions are executed less often and the loop as a whole runs faster. Note that almost every optimization in CUDA code needs to be carefully and thoroughly tested, because a different architecture or a different control flow might produce different results (as might compiler optimizations, which unfortunately cannot always be trusted).
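
    The hand-unrolled loop above works because 1000 is a multiple of the unroll factor. A hedged sketch of the general case in plain C++ (addUnrolled2 is a hypothetical name, and std::vector replaces the raw arrays) adds a tail statement for odd lengths:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Manual 2x unrolling of a[i] = b[i] + c[i]. The main loop handles pairs of
// elements; the trailing statement picks up the last element when n is odd.
void addUnrolled2(std::vector<float>& a, const std::vector<float>& b, const std::vector<float>& c)
{
    assert(a.size() == b.size() && b.size() == c.size());
    const std::size_t n = a.size();
    std::size_t i = 0;
    for (; i + 1 < n; i += 2) {
        a[i]     = b[i]     + c[i];
        a[i + 1] = b[i + 1] + c[i + 1];
    }
    if (i < n)
        a[i] = b[i] + c[i];
}
```

    With five elements the loop body runs twice and the tail statement handles the fifth element.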

    Also notice that the code uses the IMUL macro, which is defined as

    #define IMUL(a,b) __mul24(a,b)

    On devices of CUDA compute capability 1.x, 32-bit integer multiplication is not natively supported and is implemented using multiple instructions, while 24-bit integer multiplication is natively supported via the __[u]mul24 intrinsic. On devices of compute capability 2.0, however, the situation is reversed: 32-bit integer multiplication is natively supported but 24-bit integer multiplication is not, so __[u]mul24 is implemented using multiple instructions and should not be used. So if you are planning to run the code on 2.x devices, make sure to redefine the macro.

    Typical host code calling the kernel we just wrote could be
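
    One possible way to redefine the macro, sketched here as an assumption rather than as the original project’s code, is to key it on the __CUDA_ARCH__ macro that nvcc defines while compiling device code. linearIndex is a hypothetical host-side helper used only to exercise the macro.

```cpp
#include <cassert>

// Use __mul24 only when compiling device code for compute capability 1.x;
// everywhere else (2.x and later devices, and host code) use a plain multiply.
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 200)
#define IMUL(a, b) __mul24((a), (b))
#else
#define IMUL(a, b) ((a) * (b))
#endif

// Row-major linear index, written with the macro in the same way the kernels do.
inline int linearIndex(int x, int y, int dim)
{
    assert(x >= 0 && y >= 0 && dim > 0);
    return x + IMUL(y, dim);
}
```

    The parentheses around the macro arguments keep expressions such as IMUL(1+1, 2) from being mis-grouped by the preprocessor.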

    sMatrix convolutionGPU_i(sMatrix& A, sMatrix& B)
    {
        unsigned int A_bytes = A.rows*A.columns*sizeof(float);
        sMatrix result(A.rows, A.columns);
        float *cpu_result = (float*)malloc(A_bytes);
        // Copy A data to the GPU global memory (B aka the kernel is already there)
        cudaError_t chk;
        chk = cudaMemcpy(m_gpuData->gpu_matrixA, A.values, A_bytes, cudaMemcpyHostToDevice);
        if(chk != cudaSuccess)
        {
            free(cpu_result); // do not leak the host buffer on failure
            return result;
        }
        // Call the convolution kernel
        dim3 blocks(172,172);
        dim3 threads(3,3);
        convolutionGPU<<<blocks,threads>>>(m_gpuData->gpu_matrixA, m_gpuData->gpu_matrixResult);
        // Copy back the result
        chk = cudaMemcpy(cpu_result, m_gpuData->gpu_matrixResult, A_bytes, cudaMemcpyDeviceToHost);
        if(chk != cudaSuccess)
        {
            free(cpu_result); // do not leak the host buffer on failure
            return result;
        }
        // Allocate a sMatrix and return it with the GPU data
        result.values = cpu_result;
        return result;
    }

    Obviously, CUDA memory should be allocated with cudaMalloc at the beginning of the program and freed only when the GPU work is complete.
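
    The hard-coded 172x172 grid matches a 516x516 surface (172 * 3 = 516). For other sizes the block count is usually obtained with a ceiling division. The helpers below are hypothetical names in a host-side C++ sketch; the second one tells when the grid overshoots the image, in which case a kernel that writes result[gLoc] unconditionally would need an explicit bounds check.

```cpp
#include <cassert>

// Number of blocks needed to cover `extent` elements with `blockSize` threads
// per block (ceiling division).
int blocksFor(int extent, int blockSize)
{
    assert(extent > 0 && blockSize > 0);
    return (extent + blockSize - 1) / blockSize;
}

// True when the grid overshoots the image, i.e. when some threads fall
// outside it and must be masked off before they read or write.
bool needsBoundsGuard(int extent, int blockSize)
{
    return blocksFor(extent, blockSize) * blockSize != extent;
}
```

    For a side of 516 and 3 threads per block this gives exactly 172 blocks with no overshoot, whereas a side of 512 would need 171 blocks and a guard for the last, partially filled one.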

    However, as we stated before, converting a sequentially designed program into a parallel one isn’t an easy task and often requires more than a plain function-to-function conversion (it depends on the application). In our case, simply substituting the CPU convolution function with a GPU convolution function won’t work. In fact, even though we distributed our workload better than in the CPU version (see the images below for a CPU-GPU exclusive time percentage), we actually slowed down the whole simulation.

    The reason is simple: our kernel() function is called whenever a draw event is dispatched, so it is called very often. Although the CUDA kernel is faster than the CPU convolution function, and although GPU memory bandwidth is higher than the CPU’s, transferring data back and forth between (possibly paged-out) host memory and global device memory just kills the performance of our simulation. Applications which benefit most from a CUDA approach usually run a single, computationally heavy kernel workload and then transfer back the results. Real-time applications might benefit from a concurrent kernels approach, but a 2.x capability device would be required.
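
    The traffic is easy to quantify. The sketch below (hypothetical helper names, host-side C++, assuming 4-byte floats and a 4-byte-per-pixel bitmap) counts the bytes that cross the host-device bus on every frame, first for the convolution-only offload described above and then for a variant that keeps the surface on the device and copies back only the bitmap.

```cpp
#include <cassert>
#include <cstddef>

static_assert(sizeof(float) == 4, "the byte counts below assume 4-byte floats");

// Convolution-only offload: the surface is uploaded and the convolved
// surface is downloaded on every frame.
std::size_t roundTripBytesPerFrame(std::size_t dim)
{
    assert(dim > 0);
    const std::size_t surface = dim * dim * sizeof(float);
    return 2 * surface;
}

// Device-resident update: only the 4-byte-per-pixel bitmap is copied back
// for rendering.
std::size_t deviceResidentBytesPerFrame(std::size_t dim)
{
    assert(dim > 0);
    return dim * dim * 4;
}
```

    For a 516x516 surface the first function returns 2,130,048 bytes per frame and the second 1,065,024. These are byte counts only; the time each transfer takes depends on the bus and on whether the host memory is pageable.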

    In order to actually accelerate our simulation, a greater code revision is required.

    Another, more subtle thing to take into account when working with GPU code is CPU optimization: take a look at the following assembly for the CPU version of the line

    un = (*data.u)*data.a1 + (*data.u0)*data.a2 + convolutionCPU((*data.u),(*data.D))*data.c1;

    000000013F2321EF  mov        r8,qword ptr [m_simulationData+20h (13F234920h)]  
    000000013F2321F6  mov        rdx,qword ptr [m_simulationData+10h (13F234910h)]  
    000000013F2321FD  lea        rcx,[rbp+2Fh]  
    000000013F232201  call       convolutionCPU (13F231EC0h)  
    000000013F232206  nop  
    000000013F232207  movss      xmm2,dword ptr [m_simulationData+8 (13F234908h)]  
    000000013F23220F  lea        rdx,[rbp+1Fh]  
    000000013F232213  mov        rcx,rax  
    000000013F232216  call       sMatrix::operator* (13F2314E0h)  
    000000013F23221B  mov        rdi,rax  
    000000013F23221E  movss      xmm2,dword ptr [m_simulationData+4 (13F234904h)]  
    000000013F232226  lea        rdx,[rbp+0Fh]  
    000000013F23222A  mov        rcx,qword ptr [m_simulationData+18h (13F234918h)]  
    000000013F232231  call       sMatrix::operator* (13F2314E0h)  
    000000013F232236  mov        rbx,rax  
    000000013F232239  movss      xmm2,dword ptr [m_simulationData (13F234900h)]  
    000000013F232241  lea        rdx,[rbp-1]  
    000000013F232245  mov        rcx,qword ptr [m_simulationData+10h (13F234910h)]  
    000000013F23224C  call       sMatrix::operator* (13F2314E0h)  
    000000013F232251  nop  
    000000013F232252  mov        r8,rbx  
    000000013F232255  lea        rdx,[rbp-11h]  
    000000013F232259  mov        rcx,rax  
    000000013F23225C  call       sMatrix::operator+ (13F2315B0h)  
    000000013F232261  nop  
    000000013F232262  mov        r8,rdi  
    000000013F232265  lea        rdx,[rbp-21h]  
    000000013F232269  mov        rcx,rax  
    000000013F23226C  call       sMatrix::operator+ (13F2315B0h)  
    000000013F232271  nop  
    000000013F232272  cmp        dword ptr [rax],1F4h  
    000000013F232278  jne        kernel+33Fh (13F2324CFh)  
    000000013F23227E  cmp        dword ptr [rax+4],1F4h  
    000000013F232285  jne        kernel+33Fh (13F2324CFh)  
    000000013F23228B  mov        r8d,0F4240h  
    000000013F232291  mov        rdx,qword ptr [rax+8]  
    000000013F232295  mov        rcx,r12  
    000000013F232298  call       memcpy (13F232DDEh)  
    000000013F23229D  nop  
    000000013F23229E  mov        rcx,qword ptr [rbp-19h]  
    000000013F2322A2  call       qword ptr [__imp_operator delete (13F233090h)]  
    000000013F2322A8  nop  
    000000013F2322A9  mov        rcx,qword ptr [rbp-9]  
    000000013F2322AD  call       qword ptr [__imp_operator delete (13F233090h)]  
    000000013F2322B3  nop  
    000000013F2322B4  mov        rcx,qword ptr [rbp+7]  
    000000013F2322B8  call       qword ptr [__imp_operator delete (13F233090h)]  
    000000013F2322BE  nop  
    000000013F2322BF  mov        rcx,qword ptr [rbp+17h]  
    000000013F2322C3  call       qword ptr [__imp_operator delete (13F233090h)]  
    000000013F2322C9  nop  
    000000013F2322CA  mov        rcx,qword ptr [rbp+27h]  
    000000013F2322CE  call       qword ptr [__imp_operator delete (13F233090h)]  
    000000013F2322D4  nop  
    000000013F2322D5  mov        rcx,qword ptr [rbp+37h]  
    000000013F2322D9  call       qword ptr [__imp_operator delete (13F233090h)]  

    and now take a look at the GPU version of the line

    un = (*data.u)*data.a1 + (*data.u0)*data.a2 + convolutionGPU_i ((*data.u),(*data.D))*data.c1;

    000000013F7E23A3  mov        rax,qword ptr [data]  
    000000013F7E23AB  movss      xmm0,dword ptr [rax+8]  
    000000013F7E23B0  movss      dword ptr [rsp+0A8h],xmm0  
    000000013F7E23B9  mov        rax,qword ptr [data]  
    000000013F7E23C1  mov        r8,qword ptr [rax+20h]  
    000000013F7E23C5  mov        rax,qword ptr [data]  
    000000013F7E23CD  mov        rdx,qword ptr [rax+10h]  
    000000013F7E23D1  lea        rcx,[rsp+70h]  
    000000013F7E23D6  call       convolutionGPU_i (13F7E1F20h)  
    000000013F7E23DB  mov        qword ptr [rsp+0B0h],rax  
    000000013F7E23E3  mov        rax,qword ptr [rsp+0B0h]  
    000000013F7E23EB  mov        qword ptr [rsp+0B8h],rax  
    000000013F7E23F3  movss      xmm0,dword ptr [rsp+0A8h]  
    000000013F7E23FC  movaps     xmm2,xmm0  
    000000013F7E23FF  lea        rdx,[rsp+80h]  
    000000013F7E2407  mov        rcx,qword ptr [rsp+0B8h]  
    000000013F7E240F  call       sMatrix::operator* (13F7E2B20h)  
    000000013F7E2414  mov        qword ptr [rsp+0C0h],rax  
    000000013F7E241C  mov        rax,qword ptr [rsp+0C0h]  
    000000013F7E2424  mov        qword ptr [rsp+0C8h],rax  
    000000013F7E242C  mov        rax,qword ptr [data]  
    000000013F7E2434  movss      xmm0,dword ptr [rax+4]  
    000000013F7E2439  movaps     xmm2,xmm0  
    000000013F7E243C  lea        rdx,[rsp+50h]  
    000000013F7E2441  mov        rax,qword ptr [data]  
    000000013F7E2449  mov        rcx,qword ptr [rax+18h]  
    000000013F7E244D  call       sMatrix::operator* (13F7E2B20h)  
    000000013F7E2452  mov        qword ptr [rsp+0D0h],rax  
    000000013F7E245A  mov        rax,qword ptr [rsp+0D0h]  
    000000013F7E2462  mov        qword ptr [rsp+0D8h],rax  
    000000013F7E246A  mov        rax,qword ptr [data]  
    000000013F7E2472  movss      xmm2,dword ptr [rax]  
    000000013F7E2476  lea        rdx,[rsp+40h]  
    000000013F7E247B  mov        rax,qword ptr [data]  
    000000013F7E2483  mov        rcx,qword ptr [rax+10h]  
    000000013F7E2487  call       sMatrix::operator* (13F7E2B20h)  
    000000013F7E248C  mov        qword ptr [rsp+0E0h],rax  
    000000013F7E2494  mov        rax,qword ptr [rsp+0E0h]  
    000000013F7E249C  mov        qword ptr [rsp+0E8h],rax  
    000000013F7E24A4  mov        r8,qword ptr [rsp+0D8h]  
    000000013F7E24AC  lea        rdx,[rsp+60h]  
    000000013F7E24B1  mov        rcx,qword ptr [rsp+0E8h]  
    000000013F7E24B9  call       sMatrix::operator+ (13F7E2BF0h)  
    000000013F7E24BE  mov        qword ptr [rsp+0F0h],rax  
    000000013F7E24C6  mov        rax,qword ptr [rsp+0F0h]  
    000000013F7E24CE  mov        qword ptr [rsp+0F8h],rax  
    000000013F7E24D6  mov        r8,qword ptr [rsp+0C8h]  
    000000013F7E24DE  lea        rdx,[rsp+90h]  
    000000013F7E24E6  mov        rcx,qword ptr [rsp+0F8h]  
    000000013F7E24EE  call       sMatrix::operator+ (13F7E2BF0h)  
    000000013F7E24F3  mov        qword ptr [rsp+100h],rax  
    000000013F7E24FB  mov        rax,qword ptr [rsp+100h]  
    000000013F7E2503  mov        qword ptr [rsp+108h],rax  
    000000013F7E250B  mov        rdx,qword ptr [rsp+108h]  
    000000013F7E2513  lea        rcx,[un]  
    000000013F7E2518  call       sMatrix::operator= (13F7E2A90h)  
    000000013F7E251D  nop  
    000000013F7E251E  lea        rcx,[rsp+90h]  
    000000013F7E2526  call        sMatrix::~sMatrix (13F7E2970h)  
    000000013F7E252B  nop  
    000000013F7E252C  lea        rcx,[rsp+60h]  
    000000013F7E2531  call       sMatrix::~sMatrix (13F7E2970h)  
    000000013F7E2536  nop  
    000000013F7E2537  lea        rcx,[rsp+40h]  
    000000013F7E253C  call       sMatrix::~sMatrix (13F7E2970h)  
    000000013F7E2541  nop  
    000000013F7E2542  lea        rcx,[rsp+50h]  
    000000013F7E2547  call       sMatrix::~sMatrix (13F7E2970h)  
    000000013F7E254C  nop  
    000000013F7E254D  lea        rcx,[rsp+80h]  
    000000013F7E2555  call       sMatrix::~sMatrix (13F7E2970h)  
    000000013F7E255A  nop  
    000000013F7E255B  lea        rcx,[rsp+70h]  
    000000013F7E2560  call       sMatrix::~sMatrix (13F7E2970h)  

    Although the data involved are practically the same, this code looks much more bloated (there are even redundant operations: at address 000000013F7E23DB the value in rax is stored to the stack and immediately loaded back). Letting the CPU finish the calculation after the GPU has done its work is probably not a good idea.

    Since there are other functions which can be parallelized (like the matrix2bitmap() function), we need to move as much workload as possible to the device.

    First we need to allocate memory on the device at the beginning of the program and deallocate it when finished. Small data like our algorithm parameters can be stored in constant memory to boost performance, while large matrix data is better suited to global memory (constant memory is very limited in size).

    // Initialize all data used by the device
    // and the rendering simulation data as well
    void initializeGPUData()
    {
        /* Algorithm parameters */
        // Time step
        float dt = (float)0.05;
        // Speed of the wave
        float c = 1;
        // Space step
        float dx = 1;
        // Decay factor
        float k = (float)0.002;
        // Droplet amplitude (Gaussian amplitude)
        float da = (float)0.07;
        // Initialize u0
        sMatrix u0(DIM,DIM);
        for(int i=0; i<DIM; i++)
              for(int j=0; j<DIM; j++)
                    u0(i,j) = 0.0f; // The corresponding color in the colormap for 0 is green
        // Initialize the rendering img to the u0 matrix
        CPUsMatrix2Bitmap(u0, renderImg);
        // Decay per timestep
        float kdt=k*dt;
        // c1 constant
        float c1=pow(dt,2)*pow(c,2)/pow(dx,2);
        // Droplet as gaussian
        // This code creates a gaussian discrete droplet, see the documentation for more information
        const int dim = 4*dropletRadius+1;
        sMatrix xd(dim, dim);
        sMatrix yd(dim, dim);
        for(int i=0; i<dim; i++)
        {
               for(int j=-2*dropletRadius; j<=2*dropletRadius; j++)
               {
                      xd(i,j+2*dropletRadius) = j;
                      yd(j+2*dropletRadius,i) = j;
               }
        }
        float m_Zd[dim][dim];
        for(int i=0; i<dim; i++)
               for(int j=0; j<dim; j++)
                      // Calculate Gaussian centered on zero
                      m_Zd[i][j] = -da*exp(-pow(xd(i,j)/dropletRadius,2)-pow(yd(i,j)/dropletRadius,2));
        /* GPU data initialization */
        // Allocate memory on the GPU for u and u0 matrices
        unsigned int UU0_bytes = DIM*DIM*sizeof(float);
        cudaError_t chk;
        chk = cudaMalloc((void**)&m_gpuData.gpu_u, UU0_bytes);
        if(chk != cudaSuccess)
             printf("\nCRITICAL: CANNOT ALLOCATE GPU MEMORY");
        chk = cudaMalloc((void**)&m_gpuData.gpu_u0, UU0_bytes);
        if(chk != cudaSuccess)
             printf("\nCRITICAL: CANNOT ALLOCATE GPU MEMORY");
        // Allocate memory for ris0, ris1, ris2 and ptr matrices
        chk = cudaMalloc((void**)&m_gpuData.ris0, UU0_bytes);
        if(chk != cudaSuccess)
             printf("\nCRITICAL: CANNOT ALLOCATE GPU MEMORY");
        chk = cudaMalloc((void**)&m_gpuData.ris1, UU0_bytes);
        if(chk != cudaSuccess)
             printf("\nCRITICAL: CANNOT ALLOCATE GPU MEMORY");
        chk = cudaMalloc((void**)&m_gpuData.ris2, UU0_bytes);
        if(chk != cudaSuccess)
             printf("\nCRITICAL: CANNOT ALLOCATE GPU MEMORY");
        chk = cudaMalloc((void**)&m_gpuData.gpu_ptr, DIM*DIM*4);
        if(chk != cudaSuccess)
             printf("\nCRITICAL: CANNOT ALLOCATE GPU MEMORY");
        // Initialize to zero both u and u0
        chk = cudaMemcpy(m_gpuData.gpu_u0, u0.values, UU0_bytes, cudaMemcpyHostToDevice);
        if(chk != cudaSuccess)
             printf("\nCRITICAL: CANNOT COPY DATA TO GPU MEMORY");
        chk = cudaMemcpy(m_gpuData.gpu_u, u0.values, UU0_bytes, cudaMemcpyHostToDevice);
        if(chk != cudaSuccess)
             printf("\nCRITICAL: CANNOT COPY DATA TO GPU MEMORY");
        // Preload Laplacian kernel
        float m_D[3][3];
        m_D[0][0] = 0.0f; m_D[1][0] = 1.0f;  m_D[2][0]=0.0f;
        m_D[0][1] = 1.0f; m_D[1][1] = -4.0f; m_D[2][1]=1.0f;
        m_D[0][2] = 0.0f; m_D[1][2] = 1.0f;  m_D[2][2]=0.0f;
        // Copy Laplacian to constant memory
        chk = cudaMemcpyToSymbol((const char*)gpu_D, m_D, 9*sizeof(float), 0, cudaMemcpyHostToDevice);
        if(chk != cudaSuccess)
              printf("\nCONSTANT MEMORY TRANSFER FAILED");
        // Store all static algorithm parameters in constant memory
        const float a1 = (2-kdt);
        chk = cudaMemcpyToSymbol((const char*)&gpu_a1, &a1, sizeof(float), 0, cudaMemcpyHostToDevice);
        if(chk != cudaSuccess)
              printf("\nCONSTANT MEMORY TRANSFER FAILED");
        const float a2 = (kdt-1);
        chk = cudaMemcpyToSymbol((const char*)&gpu_a2, &a2, sizeof(float), 0, cudaMemcpyHostToDevice);
        if(chk != cudaSuccess)
             printf("\nCONSTANT MEMORY TRANSFER FAILED");
        chk = cudaMemcpyToSymbol((const char*)&gpu_c1, &c1, sizeof(float), 0, cudaMemcpyHostToDevice);
        if(chk != cudaSuccess)
             printf("\nCONSTANT MEMORY TRANSFER FAILED");
        const int ddim = dim;
        chk = cudaMemcpyToSymbol((const char*)&gpu_Ddim, &ddim, sizeof(int), 0, cudaMemcpyHostToDevice);
        if(chk != cudaSuccess)
             printf("\nCONSTANT MEMORY TRANSFER FAILED");
        const int droplet_dsz = dropletRadius;
        chk = cudaMemcpyToSymbol((const char*)&gpu_dsz, &droplet_dsz, sizeof(int), 0, cudaMemcpyHostToDevice);
        if(chk != cudaSuccess)
             printf("\nCONSTANT MEMORY TRANSFER FAILED");
        chk = cudaMemcpyToSymbol((const char*)&gpu_Zd, &m_Zd, sizeof(float)*dim*dim, 0, cudaMemcpyHostToDevice);
        if(chk != cudaSuccess)
             printf("\nCONSTANT MEMORY TRANSFER FAILED");
        // Initialize colormap and ppstep in constant memory
        chk = cudaMemcpyToSymbol((const char*)&gpu_pp_step, &pp_step, sizeof(float), 0, cudaMemcpyHostToDevice);
        if(chk != cudaSuccess)
             printf("\nCONSTANT MEMORY TRANSFER FAILED");
        chk = cudaMemcpyToSymbol((const char*)&gpu_m_colorMap, &m_colorMap, sizeof(unsigned char)*COLOR_NUM*3, 0, cudaMemcpyHostToDevice);
        if(chk != cudaSuccess)
             printf("\nCONSTANT MEMORY TRANSFER FAILED");
    }
    void deinitializeGPUData()
    {
        // Free everything from device memory
        cudaFree(m_gpuData.gpu_u);
        cudaFree(m_gpuData.gpu_u0);
        cudaFree(m_gpuData.ris0);
        cudaFree(m_gpuData.ris1);
        cudaFree(m_gpuData.ris2);
        cudaFree(m_gpuData.gpu_ptr);
    }
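
    The host-side arithmetic of initializeGPUData() can be isolated and checked without a GPU. The following C++ sketch uses hypothetical names (WaveConstants, makeConstants, makeDroplet) and a std::vector in place of sMatrix; the formulas are the ones in the listing above.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct WaveConstants { float a1, a2, c1; };

// Derives the per-step constants from the physical parameters:
// a1 = 2 - k*dt, a2 = k*dt - 1, c1 = dt^2 * c^2 / dx^2.
WaveConstants makeConstants(float dt, float c, float dx, float k)
{
    assert(dx != 0.0f);
    const float kdt = k * dt;
    return { 2.0f - kdt, kdt - 1.0f, (dt * dt) * (c * c) / (dx * dx) };
}

// Builds the (4r+1) x (4r+1) Gaussian droplet, row-major, centred on zero:
// z = -amplitude * exp(-(x/r)^2 - (y/r)^2).
std::vector<float> makeDroplet(int radius, float amplitude)
{
    assert(radius > 0);
    const int dim = 4 * radius + 1;
    std::vector<float> z(dim * dim);
    for (int i = 0; i < dim; ++i) {
        for (int j = 0; j < dim; ++j) {
            // As in the listing, xd(i,j) is the column offset and yd(i,j) the row offset
            const float x = static_cast<float>(j - 2 * radius) / radius;
            const float y = static_cast<float>(i - 2 * radius) / radius;
            z[i * dim + j] = -amplitude * std::exp(-x * x - y * y);
        }
    }
    return z;
}
```

    With the parameters of the listing (dt = 0.05, c = 1, dx = 1, k = 0.002) the constants come out as approximately a1 = 1.9999, a2 = -0.9999 and c1 = 0.0025, and the droplet reaches its extreme value -da at the central element.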

    After initializing the GPU memory, the openGLRenderer can be started as usual and will call the kernel() function to obtain a valid, renderable surface image matrix. But there’s a difference now, right in the openGLRenderer constructor:

         //. . .
         // Sets up the CUBLAS
         cublasStatus_t status = cublasInit();
         if (status != CUBLAS_STATUS_SUCCESS)
               // CUBLAS initialization error
               printf("\nCRITICAL: CUBLAS LIBRARY FAILED TO LOAD");
         // Set up the bitmap data with page-locked memory for fast access
         //no more : renderImg = new unsigned char[DIM*DIM*4];
         cudaError_t chk = cudaHostAlloc((void**)&renderImg, DIM*DIM*4*sizeof(char), cudaHostAllocDefault);
         if(chk != cudaSuccess)
               return ;

    First, we decided to use the CUBLAS library to perform matrix addition, for two reasons:

    • our row-major data on the device can already be passed to the CUBLAS functions we need (cublasAlloc is just a wrapper around cudaMalloc); CUBLAS assumes column-major storage, but an element-wise vector operation such as SAXPY is indifferent to the layout
    • the CUBLAS library is highly optimized for operations on large matrices; our matrices aren’t that big, but this could help when extending the architecture in a future version

    Using our sMatrix wrapper is no longer an efficient choice and we need to get rid of it while working on the device, although we can still use it for the initialization stage.

    The second fundamental thing to notice in the openGLRenderer constructor is that we allocated host-side memory (the memory that will contain the data to be rendered) with cudaHostAlloc instead of the classic malloc. As the documentation states, the CUDA driver tracks the virtual memory ranges allocated with this function and accelerates calls to functions like cudaMemcpy. Host memory allocated with cudaHostAlloc is often referred to as “pinned memory” and cannot be paged out (because of that, allocating excessive amounts of it may degrade system performance, since it reduces the amount of memory available to the system for paging). This expedient grants additional speed in memory transfers between device and host.

    We are now ready to take a peek at the revised kernel() function

    // This kernel is called at each iteration
    // It implements the main loop algorithm and somehow "rasterizes" the matrix data
    // to be passed to the openGL renderer. It also adds droplets in the waiting queue
    void kernel(unsigned char *ptr)
    {
        // Set up the grid
        dim3 blocks(172,172);
        dim3 threads(3,3); // 516x516 img is 172x172 (3x3 thread) blocks
        // Implements the un = (*data.u)*data.a1 + (*data.u0)*data.a2 + convolution((*data.u),(*data.D))*data.c1;
        // line by means of several kernel calls
        convolutionGPU<<<blocks,threads>>>(m_gpuData.gpu_u, m_gpuData.ris0);
        // Now multiply everything by c1 constant
        multiplyEachElementby_c1<<<blocks,threads>>>(m_gpuData.ris0, m_gpuData.ris1);
        // First term is ready, now u*a1
        multiplyEachElementby_a1<<<blocks,threads>>>(m_gpuData.gpu_u, m_gpuData.ris0);
        // u0*a2
        multiplyEachElementby_a2<<<blocks,threads>>>(m_gpuData.gpu_u0, m_gpuData.ris2);
        // Perform the matrix addition with the CUBLAS library
        // un = ris0 + ris2 + ris1
        // Since everything is already stored as row-major device vectors, we don't need to do anything to pass it to the CUBLAS
        cublasSaxpy(DIM*DIM, 1.0f, m_gpuData.ris0, 1, m_gpuData.ris2, 1);
        cublasSaxpy(DIM*DIM, 1.0f, m_gpuData.ris2, 1, m_gpuData.ris1, 1);
        // Result is now in m_gpuData.ris1
        // Step forward in time
        cudaMemcpy(m_gpuData.gpu_u0, m_gpuData.gpu_u, DIM*DIM*sizeof(float), cudaMemcpyDeviceToDevice);
        cudaMemcpy(m_gpuData.gpu_u, m_gpuData.ris1, DIM*DIM*sizeof(float), cudaMemcpyDeviceToDevice);
        // Draw the u surface matrix and "rasterize" it into gpu_ptr
        gpuMatrix2Bitmap<<<blocks,threads>>>(m_gpuData.gpu_u, m_gpuData.gpu_ptr);
        // Back on the pagelocked host memory
        cudaMemcpy(ptr, m_gpuData.gpu_ptr, DIM*DIM*4, cudaMemcpyDeviceToHost);
        if(first_droplet == 1) // By default there's just one initial droplet
        {
             first_droplet = 0;
             int x0d= DIM / 2; // Default droplet center
             int y0d= DIM / 2;
             cudaMemcpy(m_gpuData.ris0, m_gpuData.gpu_u, DIM*DIM*sizeof(float), cudaMemcpyDeviceToDevice);
             addDropletToU<<<blocks,threads>>>(m_gpuData.ris0, x0d,y0d, m_gpuData.gpu_u);
        }
        // Add all the remaining droplets in the queue
        while(dropletQueueCount >0)
        {
             dropletQueueCount--;
             int y0d = DIM - dropletQueue[dropletQueueCount].my;
             // Copy from u to one of our buffers
             cudaMemcpy(m_gpuData.ris0, m_gpuData.gpu_u, DIM*DIM*sizeof(float), cudaMemcpyDeviceToDevice);
             addDropletToU<<<blocks,threads>>>(m_gpuData.ris0, dropletQueue[dropletQueueCount].mx,y0d, m_gpuData.gpu_u);
        }
        // Synchronize to make sure all kernels executions are done
        cudaDeviceSynchronize();
    }

    The line

    un = (*data.u)*data.a1 + (*data.u0)*data.a2 + convolution((*data.u),(*data.D))*data.c1;

    has been completely superseded by multiple kernel calls which, respectively, perform the convolution, multiply matrix data by the algorithm constants and perform the matrix additions via CUBLAS. Everything is performed on the device, including the point-to-RGB-value mapping (a highly parallelizable operation, since it must be performed for every value in the surface image matrix). Stepping forward in time is also accomplished with device-to-device copies. Finally, the data is copied back to the page-locked (pinned) host memory, and the droplets waiting in the queue are added to the u surface matrix for the next iteration.
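
    The buffer choreography is easier to follow in a CPU emulation. In this hedged C++ sketch, saxpy is a host-side stand-in for cublasSaxpy (y = alpha*x + y), stepStaged repeats the sequence of kernel() on ordinary vectors, and stepFused is a hypothetical single-pass alternative that evaluates the same expression without intermediate buffers. None of these functions belong to the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side stand-in for SAXPY: y = alpha * x + y.
void saxpy(float alpha, const std::vector<float>& x, std::vector<float>& y)
{
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        y[i] = alpha * x[i] + y[i];
}

// Mirrors the buffer usage of kernel(). lap is the already convolved surface,
// i.e. what convolutionGPU writes into ris0.
std::vector<float> stepStaged(const std::vector<float>& u, const std::vector<float>& u0,
                              const std::vector<float>& lap, float a1, float a2, float c1)
{
    assert(u.size() == u0.size() && u.size() == lap.size());
    const std::size_t n = u.size();
    std::vector<float> ris0(lap), ris1(n), ris2(n);
    for (std::size_t i = 0; i < n; ++i) ris1[i] = ris0[i] * c1;  // convolution * c1
    for (std::size_t i = 0; i < n; ++i) ris0[i] = u[i] * a1;     // u * a1 (ris0 is reused)
    for (std::size_t i = 0; i < n; ++i) ris2[i] = u0[i] * a2;    // u0 * a2
    saxpy(1.0f, ris0, ris2);  // ris2 = u*a1 + u0*a2
    saxpy(1.0f, ris2, ris1);  // ris1 = u*a1 + u0*a2 + convolution*c1
    return ris1;
}

// The same update in a single pass over the data.
std::vector<float> stepFused(const std::vector<float>& u, const std::vector<float>& u0,
                             const std::vector<float>& lap, float a1, float a2, float c1)
{
    assert(u.size() == u0.size() && u.size() == lap.size());
    std::vector<float> un(u.size());
    for (std::size_t i = 0; i < u.size(); ++i)
        un[i] = u[i] * a1 + u0[i] * a2 + lap[i] * c1;
    return un;
}
```

    The staged version makes five passes over the data plus two SAXPY calls, which on the GPU translates into as many kernel launches and global-memory round trips; whether fusing them into one kernel pays off in practice would have to be measured.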

    The CUDA kernels called by the kernel() function are the following

                                    CUDA KERNELS
    // For a 512x512 image the grid is 170x170 blocks 3x3 threads each one
    __global__ void convolutionGPU(float *A, float *result)
    {
       __shared__ float data[laplacianD*2][laplacianD*2];
       // Absolute position into the image
       const int gLoc = threadIdx.x + IMUL(blockIdx.x,blockDim.x) + IMUL(threadIdx.y,DIM) + IMUL(blockIdx.y,blockDim.y)*DIM;
       // Image-relative position
       const int x0 = threadIdx.x + IMUL(blockIdx.x,blockDim.x);
       const int y0 = threadIdx.y + IMUL(blockIdx.y,blockDim.y);
       // Load the apron and data regions
       int x,y;
       // Upper left square
       x = x0 - kernelRadius;
       y = y0 - kernelRadius;
       if(x < 0 || y < 0)
            data[threadIdx.x][threadIdx.y] = 0.0f;
       else
            data[threadIdx.x][threadIdx.y] = A[ gLoc - kernelRadius - IMUL(DIM,kernelRadius)];
       // Upper right square
       x = x0 + kernelRadius + 1;
       y = y0 - kernelRadius;
       if(x >= DIM || y < 0)
            data[threadIdx.x + blockDim.x][threadIdx.y] = 0.0f;
       else
            data[threadIdx.x + blockDim.x][threadIdx.y] = A[ gLoc + kernelRadius+1 - IMUL(DIM,kernelRadius)];
       // Lower left square
       x = x0 - kernelRadius;
       y = y0 + kernelRadius+1;
       if(x < 0 || y >= DIM)
            data[threadIdx.x][threadIdx.y + blockDim.y] = 0.0f;
       else
            data[threadIdx.x][threadIdx.y + blockDim.y] = A[ gLoc - kernelRadius + IMUL(DIM,(kernelRadius+1))];
       // Lower right square
       x = x0 + kernelRadius+1;
       y = y0 + kernelRadius+1;
       if(x >= DIM || y >= DIM)
             data[threadIdx.x + blockDim.x][threadIdx.y + blockDim.y] = 0.0f;
       else
             data[threadIdx.x + blockDim.x][threadIdx.y + blockDim.y] = A[ gLoc + kernelRadius+1 + IMUL(DIM,(kernelRadius+1))];
       // Wait until every thread of the block has loaded its part of the tile
       __syncthreads();
       float sum = 0;
       x = kernelRadius + threadIdx.x;
       y = kernelRadius + threadIdx.y;
       // Execute the convolution in the shared memory (kernel is in constant memory)
    #pragma unroll
       for(int i = -kernelRadius; i<=kernelRadius; i++)
             for(int j=-kernelRadius; j<=kernelRadius; j++)
                      sum += data[x+i][y+j]  * gpu_D[i+kernelRadius][j+kernelRadius];
       // Transfer the result to global memory
       result[gLoc] = sum;
    }
    __global__ void multiplyEachElementby_c1(float *matrix, float *result)
    {
        // Absolute position into the image
        const int gLoc = threadIdx.x + IMUL(blockIdx.x,blockDim.x) + IMUL(threadIdx.y,DIM) + IMUL(blockIdx.y,blockDim.y)*DIM;
        // Multiply each matrix element by c1
        result[gLoc] = matrix[gLoc]*gpu_c1;
    }
    __global__ void multiplyEachElementby_a1(float *matrix, float *result)
    {
        // Absolute position into the image
        const int gLoc = threadIdx.x + IMUL(blockIdx.x,blockDim.x) + IMUL(threadIdx.y,DIM) + IMUL(blockIdx.y,blockDim.y)*DIM;
        // Multiply each matrix element by a1
        result[gLoc] = matrix[gLoc]*gpu_a1;
    }
    __global__ void multiplyEachElementby_a2(float *matrix, float *result)
    {
        // Absolute position into the image
        const int gLoc = threadIdx.x + IMUL(blockIdx.x,blockDim.x) + IMUL(threadIdx.y,DIM) + IMUL(blockIdx.y,blockDim.y)*DIM;
        // Multiply each matrix element by a2
        result[gLoc] = matrix[gLoc]*gpu_a2;
    }
    // Associate a colormap RGB value to each point
    __global__ void gpuMatrix2Bitmap(float *matrix, BYTE *bitmap)
    {
        // Absolute position into the image
        const int gLoc = threadIdx.x + IMUL(blockIdx.x,blockDim.x) + IMUL(threadIdx.y,DIM) + IMUL(blockIdx.y,blockDim.y)*DIM;
        // Map the surface value to a colormap index and clamp it to the valid range
        int cvalue = (int)((matrix[gLoc] + 1.0f)/gpu_pp_step);
        if(cvalue < 0)
              cvalue = 0;
        else if(cvalue >= COLOR_NUM)
              cvalue = COLOR_NUM-1;
        bitmap[gLoc*4] = gpu_m_colorMap[cvalue][0];
        bitmap[gLoc*4 + 1] = gpu_m_colorMap[cvalue][1];
        bitmap[gLoc*4 + 2] = gpu_m_colorMap[cvalue][2];
        bitmap[gLoc*4 + 3] = 0xFF; // Alpha
    }
    // Add a Gaussian 2D droplet matrix to the surface data
    // Warning: this kernel has a high divergence factor, it is meant to be seldom called
    __global__ void addDropletToU(float *matrix, int x0d, int y0d, float *result)
    {
        // Absolute position into the image
        const int gLoc = threadIdx.x + IMUL(blockIdx.x,blockDim.x) + IMUL(threadIdx.y,DIM) + IMUL(blockIdx.y,blockDim.y)*DIM;
        // Image relative position
        const int x0 = threadIdx.x + IMUL(blockIdx.x,blockDim.x);
        const int y0 = threadIdx.y + IMUL(blockIdx.y,blockDim.y);
        // Place the (x0d;y0d) centered Zd droplet on the wave data (it will be added at the next iteration)
        if (x0 >= x0d-gpu_dsz*2 && y0 >= y0d-gpu_dsz*2 && x0 <= x0d+gpu_dsz*2 && y0 <= y0d+gpu_dsz*2)
             // Add to result the matrix value plus the Zd corresponding value
             result[gLoc] = matrix[gLoc] + gpu_Zd[x0 -(x0d-gpu_dsz*2)][y0 - (y0d-gpu_dsz*2)];
        else
             result[gLoc] = matrix[gLoc]; // This value shouldn't be changed
    }

    Notice that we preferred to “hardcode” the use of each constant in a separate kernel rather than introduce divergence with a conditional branch. The only kernel that increases thread divergence is addDropletToU, since only a few threads actually perform the Gaussian packet-starting routine (see the theoretical algorithm described a few paragraphs ago), but this isn’t a problem because it is called infrequently.

    Performance comparison

    The timing measurements and performance comparisons have been performed on the following system

    Intel Core 2 Quad CPU Q9650 @ 3.00 GHz

    6 GB RAM

    64-bit OS

    NVIDIA GeForce GTX 285 (1 GB GDDR3 @ 1476 MHz, 240 CUDA cores)

    The CUDA version we used to compile the projects is 4.2. If you have problems, make sure to install the right version or change it as described in the readme file.

    To benchmark the CUDA kernel execution we used the cudaEventCreate / cudaEventRecord / cudaEventSynchronize / cudaEventElapsedTime functions shipped with every CUDA version, while for the CPU version we used two Windows platform-dependent APIs: QueryPerformanceFrequency and QueryPerformanceCounter.
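
    The CPU-side measurement can be sketched portably as shown below. This is an illustrative sketch under stated assumptions, not the code used for the benchmark: it swaps the Windows-only QueryPerformanceCounter for std::chrono::steady_clock, and time_ms is a hypothetical helper name.

```cpp
#include <chrono>

// Hypothetical helper: run `work` once and return the elapsed
// wall-clock time in milliseconds, measured with a monotonic clock.
template <typename F>
double time_ms(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

    A call such as time_ms([&]{ /* one CPU simulation step */ }) returns the duration of a single iteration. Note that a host timer like this one only measures GPU work correctly if the device has been synchronized first, which is why the CUDA event functions were used on the GPU side.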

    We split the benchmark into four stages: a startup stage where the only droplet in the image is the default one, a second stage once both the CPU and the GPU versions have stabilized, a third one where we add 60-70 droplets to the rendering queue, and a final one where the application is left running for 15-20 minutes. In every test the CPU version performed worse than the GPU version, which could rely on a large grid of threads ready to split up a significant incoming workload and provide a fixed rendering time. In the long run, although the GPU still did better, the CPU version showed a small performance improvement, perhaps thanks to caching mechanisms.

    Notice that an application operating on larger data would have benefited more from a massively parallel approach. Our wave PDE simulation is quite simple and does not involve a significant workload, which limits the performance gain that can be achieved.

    Once and for all: there is no general rule for converting a sequentially designed algorithm into a parallel one; each case must be evaluated in its own context and architecture. Using CUDA can provide a great advantage in scientific simulations, but one should not consider the GPU a substitute for the CPU but rather an algebraic coprocessor that can rely on massive data parallelization. Combining sequential CPU code with parallel GPU code is the key to success.

    CUDA Kernels best practices

    The last, but not least, section of this paper provides a checklist of best practices and errors to avoid when writing CUDA kernels, in order to get the most out of your GPU-accelerated application.

    1. Minimize host <-> device transfers, especially device -> host transfers, which are slower, even if that means running on the device kernels that would not be slower on the CPU.
    2. Use pinned (page-locked) memory on the host side to exploit bandwidth advantages. Be careful not to abuse it or you’ll slow your entire system down.
    3. cudaMemcpy is a blocking function. If possible, use cudaMemcpyAsync with pinned memory and a CUDA stream in order to overlap transfers with kernel executions.
    4. If your graphics card is integrated, zero-copy memory (that is, pinned memory allocated with the cudaHostAllocMapped flag) always grants an advantage. If not, there is no certainty, since the memory is not cached by the GPU.
    5. Always check your device's compute capability (use cudaGetDeviceProperties). If it is < 2.0, you cannot use more than 512 threads per block (and at most 65535 blocks per grid dimension).
    6. Check your graphics card specifications: when launching a kernel on an AxB grid of blocks, each SM (streaming multiprocessor) can serve a part of them. If your card has a maximum of 1024 threads per SM, you should size your blocks in order to fill as many of them as possible, but not too many (otherwise you would get scheduling latencies). A warp is usually 32 threads (although this is not a fixed value and is architecture dependent) and is the minimum scheduling unit on each SM (only one warp is executed at any time and all threads in a warp execute the same instruction - SIMD), so consider the following example: on a GT200 card you need to perform a matrix multiplication. Should you use 8x8, 16x16 or 32x32 threads per block?

    For 8x8 blocks, we have 64 threads per block. Since each SM can take up to 1024 threads, that would allow 16 blocks (1024/64). However, each SM can only take up to 8 blocks, so only 512 (64*8) threads will go into each SM -> SM execution resources are under-utilized; fewer warps to schedule around long-latency operations.

    For 16x16 blocks, we have 256 threads per block. Since each SM can take up to 1024 threads, it can take up to 4 blocks (1024/256) and the 8-block limit isn’t hit -> full thread capacity in each SM and the maximal number of warps for scheduling around long-latency operations (1024/32 = 32 warps).

    For 32x32 blocks, we have 1024 threads per block -> not even one block can fit into an SM! (There’s a limit of 512 threads per block.)
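
    The arithmetic of the three cases above can be written down as a small host-side helper. This is only a sketch: the SmLimits and Occupancy structs and the occupancyFor function are hypothetical names, the limits are the figures quoted in the example (1024 threads per SM, 8 blocks per SM, 512 threads per block, 32 threads per warp), and register and shared memory limits are deliberately ignored.

```cpp
#include <algorithm>

// Per-SM limits, filled in by the caller (values from the example in the text).
struct SmLimits {
    int maxThreadsPerSm;
    int maxBlocksPerSm;
    int maxThreadsPerBlock;
    int warpSize;
};

// Number of blocks, threads and warps resident on one SM.
struct Occupancy {
    int blocksPerSm;
    int threadsPerSm;
    int warpsPerSm;
};

// Resident blocks/threads/warps per SM for a square blockSide x blockSide block,
// considering only the thread and block limits.
Occupancy occupancyFor(int blockSide, const SmLimits& lim) {
    const int threadsPerBlock = blockSide * blockSide;
    if (threadsPerBlock > lim.maxThreadsPerBlock)
        return {0, 0, 0};  // the block exceeds the per-block limit and cannot run
    const int byThreads = lim.maxThreadsPerSm / threadsPerBlock;
    const int blocks = std::min(byThreads, lim.maxBlocksPerSm);
    const int threads = blocks * threadsPerBlock;
    return {blocks, threads, threads / lim.warpSize};
}
```

    With SmLimits{1024, 8, 512, 32}, a block side of 8 gives 8 blocks, 512 threads and 16 warps per SM, a side of 16 gives 4 blocks, 1024 threads and 32 warps, and a side of 32 gives nothing at all, which matches the three cases discussed above.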

    7. Check the SM register limit per block and divide it by the number of threads: you’ll get the maximum number of registers you can use in a thread. Exceeding it by just one register per thread will cause fewer warps to be scheduled and decrease performance.
    8. Check the shared memory per block and make sure you don’t exceed that value. Exceeding it will cause fewer warps to be scheduled and decrease performance.
    9. Always check that the number of threads does not exceed the maximum supported by your device.
    10. Use reduction and locality techniques wherever applicable.
    11. If possible, never split your kernel code into conditional branches (if-then-else): different paths cause the same warp to be executed several times, with the resulting overhead.
    12. Use reduction techniques with divergence minimization when possible (that is, try to perform a reduction with warps performing a coalesced read per cycle, as described in chapter 6 of the Kirk and Hwu book).
    13. Coalescing is achieved by making the hardware read consecutive data. If the threads in a warp access consecutive memory locations, coalescing is significantly increased (half of the threads in a warp should access global memory at the same time). That’s why, with large row-major matrices, it is better for each thread to read down a column, so that neighboring threads read neighboring elements at each step, than to read along a row. Coalescing can be increased with locality (i.e. threads cooperating in loading data needed by other threads); kernels should perform coalesced reads with locality in mind in order to maximize memory throughput. Storing values in shared memory (when possible) is a good practice too.
    14. Make sure you don’t have unused threads/blocks, by design or because of your code. As said before, graphics cards have limits such as the maximum number of threads per SM and the maximum number of blocks per SM. Designing your grid without consulting your card specifications is highly discouraged.
    15. As stated before, using more registers than the card’s register limit is a recipe for performance loss. However, adding a register may also cause instructions to be added, that is, more time to parallelize transfers or to schedule warps, and better performance. Again: there’s no general rule. You should abide by the best practices when designing your code and then experiment by yourself.
    16. Data prefetching means preloading data you don’t actually need at the moment to gain performance in a future operation (closely related to locality). Combining data prefetching with matrix tiles can solve many long-latency memory access problems.
    17. Unrolling loops is preferable when applicable (for example, when the loop is small). Ideally loop unrolling should be done automatically by the compiler, but it is always better to check.
    18. Reduce thread granularity with rectangular tiles (chapter 6 of the Kirk and Hwu book) when working with matrices, to avoid multiple reads of the same rows/columns from global memory by different blocks.
    19. Textures are cached memory; if used properly they are significantly faster than global memory. Textures are better suited for accesses with 2D spatial locality (e.g. multidimensional arrays) and might perform better in some specific cases (again, this is a case-by-case rule).
    20. Try to maintain at least 25% of the overall registers occupied; otherwise access latency cannot be hidden by other warps’ computations.
    21. The number of threads per block should always be a multiple of the warp size to favor coalesced reads.
    22. Generally, if an SM supports more than one block, more than 64 threads per block should be used.
    23. Good starting values for experimenting with kernel grids are between 128 and 256 threads per block.
    24. High-latency problems might be solved by using more blocks with fewer threads instead of just one block with a lot of threads per SM. This is especially important for kernels which often call __syncthreads().
    25. If the kernel fails, use cudaGetLastError() to check the error value: you might have used too many registers, too much shared or constant memory, or too many threads.
    26. CUDA functions and hardware perform best with the float data type. Its use is highly encouraged.
    27. Integer division and modulus (%) are expensive operators; replace them with bitwise operations whenever possible. If n is a power of 2, both can be replaced on non-negative values by a shift and a bitwise AND, respectively.
    28. Avoid automatic data conversion from double to float if possible.
    29. Mathematical functions preceded by __ are hardware implemented: they’re faster but less accurate. Use them if precision is not a critical goal (e.g. __sinf(x) rather than sinf(x)).
    30. Use signed integers rather than unsigned integers in loops, because some compilers optimize signed integers better (overflow of signed integers is undefined behavior, so compilers may optimize them aggressively).
    31. Remember that floating point math is not associative because of round-off errors. Massively parallel results might differ because of this.
    32. If your device supports concurrent kernel execution (see the concurrentKernels device property), you might be able to gain additional performance by running kernels in parallel. Check your device specifications and the CUDA programming guides.
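
    Two of the points above, replacing division and modulus by a power of 2 with bitwise operations and the non-associativity of floating point addition, can be illustrated with plain host code. This is a sketch with hypothetical helper names (isPowerOfTwo, log2Pow2, divPow2, modPow2, sumLeftToRight, sumRightToLeft); the identities only hold for unsigned or non-negative values and for n being a power of 2.

```cpp
#include <cstdint>

// True when n is a non-zero power of two.
constexpr bool isPowerOfTwo(std::uint32_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// log2 of a power of two, i.e. the position of its single set bit.
constexpr std::uint32_t log2Pow2(std::uint32_t n) {
    return n <= 1 ? 0 : 1 + log2Pow2(n >> 1);
}

// i / n for a power-of-two n, using a shift instead of a division.
constexpr std::uint32_t divPow2(std::uint32_t i, std::uint32_t n) {
    return i >> log2Pow2(n);
}

// i % n for a power-of-two n, using a mask instead of a modulus.
constexpr std::uint32_t modPow2(std::uint32_t i, std::uint32_t n) {
    return i & (n - 1);
}

// Floating point addition is not associative: the grouping changes the rounding.
inline float sumLeftToRight(float a, float b, float c) { return (a + b) + c; }
inline float sumRightToLeft(float a, float b, float c) { return a + (b + c); }
```

    For instance, divPow2(1003, 8) and modPow2(1003, 8) give the same results as 1003 / 8 and 1003 % 8. For the inputs 1e20f, -1e20f and 1.0f the two sums differ, because 1.0f is lost when it is first added to -1e20f; a parallel reduction that groups its additions differently from a sequential loop can therefore produce a slightly different result.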

    This concludes our article. The goal was to explore how GPGPU applications can improve scientific simulations and programs that manipulate large amounts of data.




    This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

    About the Author

    Alesiani Marco

    I'm a Computer Science Engineer and I've been programming with a large variety of technologies for years. I love writing software with C/C++, CUDA, .NET and playing around with reverse engineering

    Microsoft® Surface® 2 Design and Interaction Guide




    Microsoft Surface 2 0 Design and Interaction Guide.pdf


    Quick details

    Version: 2.0
    Date published: 7/11/2011
    Language: English
    File name: Microsoft Surface 2 0 Design and Interaction Guide.pdf
    Size: 2.8 MB


    The Microsoft Surface 2.0 Design and Interaction Guide helps designers and developers create Surface applications for Microsoft Surface and Windows 7 touch PCs. Developing compelling Surface experiences requires a different approach to interface design. This document presents design principles and guidelines to address key aspects of application interface design including: interaction, visual, sound, text, and more. These principles and practices are a starting point to get the most out of the Surface software and hardware platform’s unique capabilities.


    System requirements

    Supported operating systems: Windows 7

    The file is in .PDF format, so a .PDF reader is required.



    Download the document and open it with a compatible reader.


    From Soup to Nuts with the Surface SDK 2.0



    By | 31 Jul 2011 | Article
    A look at the new Surface SDK 2.0 that was recently released by Microsoft


    With the Microsoft® Surface® 2.0 SDK, you can easily create applications to take advantage of the next generation Surface computing device or any Windows touch-enabled devices (defined by Microsoft).

    Links worth checking out (thanks to Luis Cabrera):

    Getting the SDK Installed

    After downloading the Surface 2 SDK, double click the installer on the SDK to get the ball rolling.

    Getting the Runtime Installed

    Once that is complete, then double click the installer for the Surface 2.0 Runtime.

    A Few Things to Note After Installing It

    Hit your Start button and go to your programs and navigate to the Microsoft Surface 2.0 SDK. You will notice the normal “Getting Help” and “Release Notes”, but it also contains Surface Samples and Tools.

    Surface Samples

    After clicking on that folder, you will see a Surface Code file.

    Go ahead and extract the zip file and you will notice the following sample project exists.

    Once loaded into Visual Studio 2010, you will see 14 projects exist inside of the solution.

    Go ahead and set one of them as your “Startup Project”.

    You can now use your mouse or a touch-enabled monitor to interact with the application. You also have the full source code, so you can manipulate the application all you want.

    They have several other great examples of what the Surface 2.0 SDK is capable of.

    The Tools Folder

    Inside the tools folder, you will find the following applications:

    Input Simulator - Simulate touch input and supported hardware parameters.

    According to the docs (updated now for version 2.0):

    Surface Simulator replicates the user interface and behavior of a Microsoft Surface unit that is in user mode. Surface Simulator has access points, Launcher, and the loading screen. When you start an application in Surface Simulator, the application displays like it is on a Microsoft Surface unit.

    You can use Surface Simulator to evaluate how an application and its user interface respond to basic input. For example, if you simulate a painting application and if you touch multiple colors, one at a time, and then add the colors to a mixing bucket, you can test the logic of the application and how well it mixes the colors by using the touch-based interface.

    Surface Simulator runs with the appearance and functionality of a Microsoft Surface unit in user mode (the way that it appears to users). You can switch applications by using Launcher and the access points that display on the Launcher screen and the applications.

    Input Visualizer - Display input data on top of a Microsoft Surface application.

    According to the docs:

    The Input Visualizer tool enables you to see the contact data that the Microsoft Surface Vision System returns in the context of your application. This tool runs on top of your application and displays information about the contacts that the input system detects.

    Input Visualizer can help you test and debug the following scenarios:

    • Accidental input: Track the accidental activation of Microsoft Surface controls from palms, forearms, and other objects by seeing when these controls detect contacts.
    • Contact tracking: Determine what gestures are lost as contacts when users are dragging content in Microsoft Surface applications. You can use the fade away feature of Input Visualizer for this type of tracking.
    • Input hit-testing: Investigate where hit-testing occurs by freezing the user interface of Input Visualizer, lifting contacts, and seeing where their centers are reported.

    Input Visualizer is installed with the Microsoft Surface SDK and runs only on Microsoft Surface units. If you are developing on a separate workstation, Surface Simulator provides contact visuals, reducing how much you need a visual representation of input.

    Surface Stress - Open a command prompt window to run stress tests against a Microsoft Surface application.

    According to the docs:

    The Surface Stress tool enables you to test the stability and robustness of your Microsoft Surface application by delivering multiple, simultaneous contacts to your application in a random way. Surface Stress generates all four types of contacts: fingers, blobs, byte tags, and identity tags.

    Surface Stress is included with Microsoft Surface SDK 1.0 SP1. By default, the Surface Stress executable file (SurfaceStress.exe) is located in the C:\Program Files\Microsoft SDKs\Surface\v1.0\Tools\SurfaceStress folder, and a shortcut to Surface Stress appears in the Start menu under the Microsoft Surface SDK entry.

    Let’s create a new project.

    Now that you have learned how to download and get started with it, it is time to actually create an application. Go ahead and fire up Visual Studio 2010 and begin a new project. Look for Surface then v2.0.

    You will notice that you have 2 templates to start with:

    • Surface Application (WPF)
    • Surface Application (XNA Game Studio 4.0)

    We are only going to focus on the Surface Application (WPF).

    Go ahead and give your application a name and hit OK.

    At first glance, you will realize this is just a WPF application. The folder structure looks just like what we would expect for a WPF application, except that you have a “Resources” folder, a .xml document, and MainWindow.xaml is now called SurfaceWindow1.xaml.

    Let’s go ahead and take a look at the Toolbox. What we are most interested in is the “Surface Controls”. As you can see from this long list, there are a lot of Surface-specific controls at our disposal right off the bat.

    Let’s go ahead and use the “SurfaceInkCanvas”. So drag and drop it onto the SurfaceWindow1.xaml file.

    Make sure your XAML looks very similar to the following:

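    <s:SurfaceWindow x:Class="MichaelSurfaceApplication.SurfaceWindow1"
            xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
            xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
            xmlns:s="http://schemas.microsoft.com/surface/2008">
        <s:SurfaceInkCanvas Name="SampleInkCanvas" HorizontalAlignment="Stretch" VerticalAlignment="Stretch">
            <s:SurfaceInkCanvas.DefaultDrawingAttributes><DrawingAttributes Color="#FF808080"/></s:SurfaceInkCanvas.DefaultDrawingAttributes>
        </s:SurfaceInkCanvas>
    </s:SurfaceWindow>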

    Now go ahead and run your application and you should get the following screen. Go ahead and draw something on the screen and then close the window.

    Congratulations! You just created your first Surface 2.0 application while actually writing no code. While you are probably testing it on your laptop or desktop, this application would actually run on a Surface 2 Unit! Very cool stuff indeed.


    The Surface is very cool technology and I am planning on investing a lot of time in it and in other things such as Kinect. Microsoft really got it right with the Surface 2.0 SDK. I think that this is possibly the best SDK release Microsoft has ever been a part of. The documentation is excellent, the samples are plentiful and it’s just plain easy to build your first application. Now if only I had an actual Surface 2 table in my house to play with, then I would be really happy.


    This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

    About the Author


    Software Developer (Senior)
    United States

    Michael Crump is a Silverlight MVP and MCPD that has been involved with computers in one way or another for as long as he can remember, but started professionally in 2002. After spending years working as a systems administrator/tech support analyst, Michael branched out and started developing internal utilities that automated repetitive tasks and freed up full-time employees. From there, he was offered a job working at McKesson corporation and has been working with some form of .NET and VB/C# since 2003.
    He has worked at Fortune 500 companies, where he gained experience ranging from embedded systems design and software development to systems administration and database programming, and everything in between.
    His primary focus right now is developing healthcare software solutions using Microsoft .NET technologies. He prefers building infrastructure components, reusable shared libraries and helping companies define, develop and automate process standards and guidelines.
    You can read his blog at: or follow him on Twitter at @mbcrump.

    Microsoft® Surface® 2 Development Whitepaper


    Developing Surface Applications.pdf



    There is not much content in it, but I am saving it here for reference.




    Quick details

    Version: 2.0
    Date published: 7/11/2011
    Language: English
    File name: Developing Surface Applications.pdf
    Size: 619 KB


    This paper provides an overview of the Microsoft Surface application development process. It provides detailed information about the Surface platform and unique capabilities of the hardware. Topics include the Surface 2.0 SDK, vision based touch input, and system architecture. This development whitepaper covers the basic end-to-end process for creating great Surface applications.


    System requirements

    Supported operating systems: Windows 7

    The document is in .PDF format, so a .PDF compatible reader is required.



    Download the file and open it with a compatible reader.


    Microsoft Surface SDK 1.0 SP1 Workstation Edition


    The Microsoft Surface SDK 1.0 Service Pack 1 (SP1) Workstation Edition enables you to create and test Microsoft Surface touch-enabled applications on a workstation instead of on a Microsoft Surface unit.

    Quick details

    Version: 1.0
    Date published: 11/5/2009
    Language: English

    Files in this download

    The links in this section correspond to files available for this download. Download the files appropriate for you.

    File name                                                                  Size
    Release Notes for Microsoft Surface SDK 1.0 SP1 Workstation Edition.xps    598 KB
    Start Here for Microsoft Surface SDK 1.0 SP1 Workstation Edition.xps       588 KB
    SurfaceSDKWE.msi                                                           144.2 MB


    The Microsoft Surface SDK, Workstation Edition, includes a simulator (called Surface Simulator) that replicates the Microsoft Surface user interface on a workstation. Surface Simulator, along with the Microsoft Visual Studio project templates that are included in the Microsoft Surface SDK, enables you to create and test Microsoft Surface touch-enabled applications on a workstation instead of on a Microsoft Surface unit.

    Important: If you develop a Microsoft Surface application on a workstation, the final testing step is to run and test your application on a Microsoft Surface unit.


    System requirements

    Supported operating systems: Windows Vista Business, Windows Vista Enterprise, Windows Vista Home Premium, Windows Vista Ultimate

    • A 32-bit edition of one of the following Windows Vista operating systems:
      • Windows Vista Business
      • Windows Vista Enterprise
      • Windows Vista Ultimate
      • Windows Vista Home Premium
    • Additional requirements:
      • Microsoft Visual C# 2008 Express Edition or Microsoft Visual Studio 2008
      • Microsoft XNA Framework Redistributable 2.0

    Note: For additional important software and hardware requirements, download the Start Here guide.

    Important: You can install the Surface SDK 1.0 SP1, Workstation Edition on additional versions of Windows Vista and on the Windows 7 operating system. However, these additional operating systems are unsupported for the Surface SDK.



    1. Install one of the Windows Vista operating systems listed earlier.
    2. Install Visual C# 2008 Express Edition or Visual Studio 2008.
    3. Install XNA Framework Redistributable 2.0.
    4. Install the Surface SDK Workstation Edition.

    For additional information about how to develop Surface applications, see Microsoft Surface in the MSDN Library.


    Microsoft Surface 2.0 SDK


    The Microsoft Surface 2.0 SDK provides the managed APIs and the tools you need to develop Surface applications. Applications that are built using the Surface SDK can run on devices made for Surface 2.0, and on Windows 7 computers. Developing applications for Surface is essentially the same as developing WPF or XNA applications, except that the Surface SDK provides extended support for the special features of the Surface environment (50 simultaneous touch points, finger and blob recognition, tagged objects, detection of the orientation of touches, tilted display, rotated display, specialized controls, and so on). Surface applications that are installed and registered on a device made for Surface are automatically integrated with the Surface Shell and can make use of those special features. For a video that shows a device made for Surface in use, see What Is Surface.


    © Microsoft Corporation. All rights reserved.

    Surface Samples




    The Microsoft Surface SDK contains several types of samples, including quick start tutorials, how-to topics, and extractable Microsoft Visual Studio 2010 projects for sample applications.

    Quick Starts

    The following topics are basic tutorials to help you create your first Surface application for the Presentation layer or the Core layer:

    "How Do I…?" Examples

    Sample Application Projects

    The sample applications that come with the Surface SDK show several different programming techniques in a complete application. You can use these applications as a starting point for more complete applications or just as examples of best practices in Surface programming. For information about obtaining the sample files, see Extracting and Installing the Surface Samples.

    Samples That Use the Core Layer and the XNA Framework

    • Finger Fountain draws small images for every contact at every frame. This sample emphasizes multiple touches and shows how to use the Microsoft XNA APIs.

    • Framework provides an extensive sample framework that helps you create controls by using the Core layer. The code in this sample eliminates inconsistent behavior among Core-based applications by using the Model-View-Controller (MVC) design pattern.

    • Cloth is an XNA-based application that demonstrates how to use the Core Interaction Framework.

    • RawImage Visualizer shows how to use the RawImage APIs for XNA applications. This sample displays captured normalized (8 bit per pixel) images that are flipped vertically.

    • XNA Scatter demonstrates how to use the manipulations and inertia APIs to move graphical user interface (GUI) components around in a Surface application in a natural and intuitive way.

    Samples That Use the Presentation Layer (WPF)

    • Controls Box shows how to build simple application behaviors from touch-enabled controls that the Presentation layer provides, such as updating a text box when a user touches a button.

    • Data Visualizer shows contact properties that are exposed in the Presentation layer (such as x, y, height, width, major axis, minor axis, and orientation) and how you can read and use these properties in a Surface application.

    • Grand Piano demonstrates how to integrate sound into Surface applications based on the Presentation layer.

    • Item Compare represents a simple tool that lets a user compare and contrast the properties of two "items" (tagged objects).

    • Photo Paint uses the SurfaceInkCanvas control to implement drawing and painting over pictures and video.

    • ScatterPuzzle shows an implementation of the ScatterView and SurfaceListBox controls to create a simple puzzle game. The ScatterView and SurfaceListBox controls automatically provide some powerful features related to Surface.

    • Shopping Cart shows how to implement drag-and-drop functionality in a retail application.

    • Tag Visualizer Events shows how to incorporate hit-testing in the TagVisualizer control to let user interface (UI) elements react when tagged objects move over them.


    Kinect for Windows SDK v1.5




    The Kinect for Windows SDK enables developers to create applications that support gesture and voice recognition, using Kinect sensor technology on computers running Windows 7, Windows 8 consumer preview, and Windows Embedded Standard 7.

    Quick details

    Version: Date published: 5/18/2012
    Language: English
    File name: KinectSDK-v1.5-Setup.exe
    Size: 222.0 MB


    What's new?

    The Kinect for Windows SDK v1.5, driver, and runtime are 100% compatible with Kinect for Windows v1.0 applications and include new features such as: skeletal tracking in near range, seated skeletal tracking, joint orientation, and other improvements.


    The Kinect for Windows SDK includes the following:

    • Drivers for using Kinect for Windows sensors on a computer running Windows 7, Windows 8 consumer preview, and Windows Embedded Standard 7
    • Application programming interfaces (APIs) and device interfaces
    • Note: Samples have been removed from the SDK install. Samples, tools, and other valuable development resources are now available in the Kinect for Windows Developer Toolkit.


    System requirements

    Supported operating systems: Windows 7, Windows Embedded Standard 7



    To install the SDK:

    1. Make sure the Kinect sensor is not plugged into any of the USB ports on the computer.
    2. If you have the Kinect for Windows v1.0 SDK installed, close any open samples, the Sample Browser, etc. You do not need to uninstall the v1.0 SDK. Skip to step 5.
    3. Remove any other drivers for the Kinect sensor.
    4. If you have Microsoft Server Speech Platform 10.2 installed, uninstall the Microsoft Server Speech Platform Runtime and SDK components including both the x86 and x64 bit versions, plus the Microsoft Server Speech Recognition Language - Kinect Language Pack.
    5. Close Visual Studio. You must close Visual Studio before installing the SDK and then restart it after installation to pick up environment variables that the SDK requires.
    6. From the download location, double-click on KinectSDK-v1.5-Setup.exe. This single installer works for both 32-bit and 64-bit Windows.
    7. Once the SDK has completed installing successfully, ensure the Kinect sensor is plugged into an external power source and then plug the Kinect sensor into the PC's USB port. The drivers will load automatically.
    8. The Kinect sensor should now be working correctly.
    9. Download the Kinect for Windows Developer Toolkit, which contains source code samples, tools, and other valuable development resources that simplify developing Kinect for Windows applications.



    Project Description

    FFmpeg is a complete, cross-platform solution to record, convert and stream audio and video. It includes libavcodec - the leading audio/video codec library. See the documentation for a complete feature list and the Changelog for recent changes.

    FFmpeg is free software licensed under the LGPL or GPL depending on your choice of configuration options. If you use FFmpeg or its constituent libraries, you must adhere to the terms of the license in question. You can find basic compliance information and get licensing help on our license and legal considerations page.

    Looking for help? Contact us, but before you report any bugs, read the guidelines that we created for this purpose.

    Want to participate in the active development of FFmpeg? Keep up with the latest developments by subscribing to both the ffmpeg-devel and ffmpeg-cvslog lists.

    News

    July 5, 2012, Donations

    We're glad to announce that FFmpeg has been accepted as an SPI associated project.

    Donations to FFmpeg can be made through SPI, following the instructions here, or following this direct Click&Pledge link.

    Donations will be used to fund expenses related to development (e.g. to cover equipment and server maintenance costs), to sponsor bug fixing, feature development, the participation or organization of meetings and events in the project interest area, and to support internal development or educational projects or any other activity promoting FFmpeg.

    June 7, 2012, FFmpeg 0.11.1

    We have made a new point release (0.11.1). It contains about 70 bugfixes, some possibly security relevant.

    We recommend that users, distributors and system integrators upgrade to 0.11.1 or git master.

    May 25, 2012, FFmpeg 0.11

    We have made a new major release (0.11). It contains all features and bugfixes of the git master branch. A partial list of new stuff is below:

    Fixes: CVE-2012-2772, CVE-2012-2774, CVE-2012-2775, CVE-2012-2776, CVE-2012-2777,
           CVE-2012-2779, CVE-2012-2782, CVE-2012-2783, CVE-2012-2784, CVE-2012-2785,
           CVE-2012-2786, CVE-2012-2787, CVE-2012-2788, CVE-2012-2789, CVE-2012-2790,
           CVE-2012-2791, CVE-2012-2792, CVE-2012-2793, CVE-2012-2794, CVE-2012-2795,
           CVE-2012-2796, CVE-2012-2797, CVE-2012-2798, CVE-2012-2799, CVE-2012-2800,
           CVE-2012-2801, CVE-2012-2802, CVE-2012-2803, CVE-2012-2804
    - v408 Quicktime and Microsoft AYUV Uncompressed 4:4:4:4 encoder and decoder
    - setfield filter
    - CDXL demuxer and decoder
    - Apple ProRes encoder
    - ffprobe -count_packets and -count_frames options
    - Sun Rasterfile Encoder
    - ID3v2 attached pictures reading and writing
    - WMA Lossless decoder
    - bluray protocol
    - blackdetect filter
    - libutvideo encoder wrapper (--enable-libutvideo)
    - swapuv filter
    - bbox filter
    - XBM encoder and decoder
    - RealAudio Lossless decoder
    - ZeroCodec decoder
    - tile video filter
    - Metal Gear Solid: The Twin Snakes demuxer
    - OpenEXR image decoder
    - removelogo filter
    - drop support for ffmpeg without libavfilter
    - drawtext video filter: fontconfig support
    - ffmpeg -benchmark_all option
    - super2xsai filter ported from libmpcodecs
    - add libavresample audio conversion library for compatibility
    - MicroDVD decoder
    - Avid Meridien (AVUI) encoder and decoder
    - accept + prefix to -pix_fmt option to disable automatic conversions.
    - complete audio filtering in libavfilter and ffmpeg
    - add fps filter
    - audio split filter
    - vorbis parser
    - png parser
    - audio mix filter

    We recommend that users, distributors and system integrators upgrade unless they use current git master.
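The ffprobe -count_frames option listed above can be driven from a script. The following is a minimal Python sketch, not part of the FFmpeg announcement. It assumes that ffprobe, when run with -count_frames and -show_streams, prints key=value lines that include a field named nb_read_frames, as recent builds do. The helper names and the sample text are hypothetical and only illustrate the parsing step.

```python
def build_count_frames_command(path):
    """Return an ffprobe argument list that asks for per-stream frame counts."""
    # -count_frames is named in the release notes; -show_streams prints the
    # per-stream fields in which the counted value is reported.
    return ["ffprobe", "-count_frames", "-show_streams", path]


def parse_read_frames(output):
    """Collect integer nb_read_frames values from key=value output lines.

    Assumption: the field is called nb_read_frames; non-numeric values
    such as N/A are skipped.
    """
    counts = []
    for line in output.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key == "nb_read_frames" and value.isdigit():
            counts.append(int(value))
    return counts


# Illustrative text only, not captured from a real ffprobe run.
sample_output = """[STREAM]
index=0
codec_type=video
nb_read_frames=250
[/STREAM]
[STREAM]
index=1
codec_type=audio
nb_read_frames=N/A
[/STREAM]
"""

command = build_count_frames_command("input.mkv")
frame_counts = parse_read_frames(sample_output)
print(command)
print(frame_counts)
```

In real use the argument list would be passed to subprocess.run with capture_output=True and text=True, and the captured standard output handed to parse_read_frames.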

    April 12, 2012, FFmpeg 0.7.12 / 0.8.11

    We have made two new point releases (0.7.12 and 0.8.11). An abbreviated list of changes is below:

    Fixes: CVE-2012-0853, CVE-2012-0858, CVE-2011-3929, CVE-2011-3936,
           CVE-2011-3937, CVE-2011-3940, CVE-2011-3945, CVE-2011-3947
    Several security issues that don't have CVE numbers,
    and about 150 bugfixes.
    See the changelog for details.

    We recommend that distributors and system integrators upgrade to 0.10.2 or git master when possible, though.

    April 4, 2012, Server Upgrade

    Today our main server has been upgraded due to performance issues with our bug tracker. While investigating the speed issues, we also took the opportunity to add voting support to bug reports and wiki pages, so you can now "tell" us which issues you want us to work on first.

    March 17, 2012, FFmpeg 0.10.1

    We have made a new point release (0.10.1). It contains some security fixes, over 100 bugfixes and some new features like the swapuv filter. See the changelog for details. We recommend that users, distributors and system integrators upgrade unless they use current git master.

    January 27, 2012, FFmpeg 0.10

    We have made a new major release (0.10). It contains all features and bugfixes of the git master branch. A partial list of new stuff is below:

    Fixes: CVE-2011-3929, CVE-2011-3934, CVE-2011-3935, CVE-2011-3936,
           CVE-2011-3937, CVE-2011-3940, CVE-2011-3941, CVE-2011-3944,
           CVE-2011-3945, CVE-2011-3946, CVE-2011-3947, CVE-2011-3949,
           CVE-2011-3950, CVE-2011-3951, CVE-2011-3952
    v410 Quicktime Uncompressed 4:4:4 10-bit encoder and decoder
    SBaGen (SBG) binaural beats script demuxer
    OpenMG Audio muxer
    Timecode extraction in DV and MOV
    thumbnail video filter
    XML output in ffprobe
    asplit audio filter
    tinterlace video filter
    astreamsync audio filter
    amerge audio filter
    ISMV (Smooth Streaming) muxer
    GSM audio parser
    SMJPEG muxer
    XWD encoder and decoder
    Automatic thread count based on detection of the number of (available) CPU cores
    y41p Brooktree Uncompressed 4:1:1 12-bit encoder and decoder
    ffprobe -show_error option
    Avid 1:1 10-bit RGB Packer codec
    v308 Quicktime Uncompressed 4:4:4 encoder and decoder
    yuv4 libquicktime packed 4:2:0 encoder and decoder
    ffprobe -show_frames option
    silencedetect audio filter
    ffprobe -show_program_version, -show_library_versions, -show_versions options
    rv34: frame-level multi-threading
    optimized iMDCT transform on x86 using SSE for mpegaudiodec
    Improved PGS subtitle decoder
    dumpgraph option to lavfi device
    r210 and r10k encoders
    ffwavesynth decoder
    aviocat tool
    ffeval tool
    all features from avconv merged into ffmpeg

    We recommend that users, distributors and system integrators upgrade unless they use current git master.

    January 24, 2012, Forgotten Patches

    FFmpeg development has gone into OVERDRIVE. Over the years we have missed patches, so we need your help to locate old unapplied patches to review again.

    If you find a patch that was never applied, please let us know, either by resubmitting it to ffmpeg-devel or by attaching it to a bug on our bug tracker.

    For example, did you know there was a patch to read DVDs with FFmpeg? It's now being reviewed and fixed up for inclusion. Want to add BluRay support? We're interested!

    January 16, 2012, Chemnitzer Linux-Tage

    We are happy to announce that FFmpeg will be represented at Chemnitzer Linux-Tage in Chemnitz, Germany. The event will take place on the 17th and 18th of March.

    More information can be found here

    We hereby invite you to visit us at our booth located in the Linux-Live area! There we will demonstrate usage of FFmpeg, answer your questions and listen to your problems and wishes.

    January 12, 2012, FFmpeg 0.8.10, 0.7.11, 0.6.5, 0.5.8

    We have made 4 new point releases (0.5.8, 0.6.5, 0.7.11 and 0.8.10). All of them contain fixes for CVE-2011-3892 (already in previous 0.8 and 0.7 releases), CVE-2011-3893, and CVE-2011-3895. In addition, 0.8.10 and 0.7.11 contain all critical security fixes from 0.9.1. We recommend that users, distributors and system integrators upgrade unless they use current git master. We recommend that everyone upgrade to at least 0.7.11, 0.8.10 or 0.9.1.

    January 5, 2012, FFmpeg 0.9.1

    We have made a new point release (0.9.1). It contains many bug and security fixes, among them CVE-2011-3893 and CVE-2011-3895. It also significantly improves seeking support in H.264. We recommend that users, distributors and system integrators upgrade unless they use current git master.

    December 25, 2011, FFmpeg 0.5.7, 0.6.4, 0.7.9, 0.8.8

    We have made 4 new point releases (0.5.7, 0.6.4, 0.7.9 and 0.8.8). They contain some bug fixes, minor changes and security fixes. Note that CVE-2011-4352, CVE-2011-4579, CVE-2011-4353, CVE-2011-4351, CVE-2011-4364 and the addition of avcodec_open2() for libx264 were already fixed/done in previous 0.7 and 0.8 point releases. We recommend that users, distributors and system integrators upgrade unless they use current git master. We recommend that everyone upgrade to at least 0.7.8, 0.8.7 or 0.9.

    December 23, 2011, Call For Maintainers

    FFmpeg is moving faster than ever before, and with your help we could move even faster. If you know C and git and want to maintain some part of FFmpeg, you can help us. Clone git://, pick an area of the codebase you want to maintain, subscribe to ffmpeg-devel and start hacking on the code you are interested in, review patches on the mailing list, and fix bugs from our bug tracker that are related to the area you want to maintain. Once you are happy with your work, just send us a link to your public git clone (for example from Github). Non-programmers are welcome to contribute too. We are also searching for someone to make new official Debian and Ubuntu packages that would be part of the official distributions. If you have questions, just ask on the ffmpeg-devel mailing list or our IRC channel #ffmpeg-devel.

    December 20, 2011, Winter logo

    Our winter logo has been drawn by Daniel Perez from Google Code-In. FFmpeg has teamed up with VideoLAN to help pre-university students contribute to open-source projects. See the Google Code-In VideoLAN project page if you would like to contribute.

    We would also like to thank our students who have already participated.

    December 11, 2011, FFmpeg 0.9

    We have made a new major release (0.9). It contains all features and bugfixes of the git master branch. A partial list of new stuff is below:

    native dirac decoder
    mmsh seeking
    more accurate rgb->rgb in swscale
    MPO file format reading support
    mandelbrot fractal video source
    libass filter
    export quarter_sample & divx_packed from decoders
    VBLE decoder
    libopenjpeg encoder
    alpha opaqueness fixes in many codecs
    8bit palette dynamic range fixes in many codecs
    OS/2 threads support
    cbr mp3 muxing fix
    sample rate change support in flv (nellymoser decoder)
    mov/mp4 chunking support (equivalent to mp4box's -inter)
    mov/mp4 fragment support (equivalent to mp4box's -frag)
    rgba tiffs
    x264rgb bugfix
    cljr encoder with dither
    escape130 decoder
    many new ARM optimizations
    Dxtory capture format decoder
    life video source
    wtv, sox, utvideo and many other new regression tests
    gcc coverage support
    cellauto video source
    planar rgb input support in sws
    libmodplug & bintext output
    g723.1 encoder
    g723.1 muxer
    random() function for the expression evaluator
    persistent variables for the expression evaluator
    pulseaudio input support
    h264 422 inter decoding support
    prores encoder
    native utvideo decoder
    libutvideo support
    deshake filter
    aevalsrc filter
    segment muxer
    mkv timecode v2 muxer
    cache urlprotocol
    libaacplus support
    ACT/BIT demuxers
    AMV video encoder
    g729 decoder
    stdin control of drawtext
    2bpp, 4bpp png support
    interlaced 1bpp and PAETH png fixes
    libspeex encoding support
    hardened h264 decoder that won't overread the bitstream
    wtv muxer
    H/W Accelerated H.264 Decoding on Android
    stereo3d filter from libmpcodecs works now
    an experimental jpeg2000 encoder
    many bugfixes

    We recommend that users, distributors and system integrators upgrade unless they use current git master.

    December 10, 2011, Donations

    Want to donate to FFmpeg? Well, there's no way to do that currently. Luckily we don't need any money. But there are many not-for-profit organizations with noble goals that do. Select one that you trust and whose goals you agree with and, instead of donating to FFmpeg, send your donation to them.

    November 29, 2011, Google Code-in

    The FFmpeg project is participating in Google Code-in for the first time. Thanks go to the VideoLAN project for making this possible! We welcome all eligible students to pick up a task and win a T-shirt or some money from Google and, at the same time, have some fun and contribute to a Free software project.

    November 21, 2011

    We have made 2 new point releases (0.7.8 and 0.8.7) that fix many bugs, several of which are security relevant, among them NGS00144, NGS00145 and NGS00148. We recommend that users, distributors and system integrators upgrade unless they use current git master.

    November 20, 2011

    FFmpeg supports the fight against American Internet censorship.

    November 6, 2011

    We have made a new point release (0.5.5) from the old 0.5 branch. It fixes many serious security issues; a partial list is below.

    d39cc3c0 resample2: fix potential overflow
    e124c3c2 resample: Fix overflow
    8acc0546 matroskadec: fix out of bounds write
    c603cf51 qtrle: check for out of bound writes.
    e1a46eff qtrle: check for invalid line offset
    23aaa82b vqa: fix double free on corrupted streams
    58087a4e mpc7: return error if packet is too small.
    8d1fa1c9 mpc7: check output buffer size before decoding
    2eb5f77b h264: do not let invalid values in h->ref_count after a decoder reset.
    ddbbe500 h264: fix the check for invalid SPS:num_ref_frames.
    d1a5b53e h264: do not let invalid values in h->ref_count on ff_h264_decode_ref_pic_list_reordering() errors.
    3699a46e Check for out of bound writes in the QDM2 decoder.
    62da9203 Check for out of bound writes in the avs demuxer.
    2e1e3c1e Check for corrupted data in avs demuxer.
    635256a3 Fix out of bound writes in fix_bitshift() of the shorten decoder.
    240546a1 Check for out of bounds writes in the Delphine Software International CIN decoder.
    07df40db Check for invalid update parameters in vmd video decoder.
    b24c2e59 Release old pictures after a resolution change in vp5/6 decoder
    25bc1108 Check output buffer size in nellymoser decoder.
    8ef917c0 check all svq3_get_ue_golomb() returns.
    648dc680 Reject audio tracks with invalid interleaver parameters in RM demuxer.
    d6f8b654 segafilm: Check for memory allocation failures in segafilm demuxer.
    d8439f04 rv34: check that subsequent slices have the same type as first one.
    6108f04d Fixed segfault on corrupted smacker streams in the demuxer.
    b261ebfd Fixed segfaults on corruped smacker streams in the decoder.
    03db051b Fixed segfault with wavpack decoder on corrupted decorrelation terms sub-blocks.
    9cda3d79 rv10: Reject slices that does not have the same type as the first one
    52b8edc9 oggdec: fix out of bound write in the ogg demuxer
    2e17744a Fixed off by one packet size allocation in the smacker demuxer.
    19431d4d ape demuxer: fix segfault on memory allocation failure.
    ecd6fa11 Check for invalid packet size in the smacker demuxer.
    80fb9f2c cavsdec: avoid possible crash with crafted input
    46f9a620 Fix possible double free when encoding using xvid.
    4f07a3aa Fix memory (re)allocation in matroskadec.c, related to MSVR-11-0080. Fixes: MSVR11-011, CVE-2011-3504
    04888ede cavs: fix some crashes with invalid bitstreams Fixes CVE-2011-3362, CVE-2011-3973, CVE-2011-3974
    24cd7c5d Fix apparently exploitable race condition.
    8210ee22 AMV: Fix possibly exploitable crash. Fixes

    We recommend distributors and system integrators whenever possible to upgrade to 0.7.7, 0.8.6 or git master. But when this is not possible 0.5.5 is more secure than previous releases from the 0.5 branch. If you are looking for an updated 0.6 release, please consider 0.7.7 which is ABI compatible and contains a huge number of security fixes that are missing in 0.6.*.

    November 4, 2011

    We have made 2 new point releases (0.7.7 and 0.8.6) that fix around 90 bugs, several of which are security relevant. We recommend that users, distributors and system integrators upgrade unless they use current git master.

    October 29, 2011

    New stuff in git master:

    planar rgb input support in sws
    libmodplug & bintext output
    g723.1 encoder
    g723.1 muxer
    random() function for the expression evaluator
    persistent variables for the expression evaluator
    pulseaudio input support
    h264 422 inter decoding support
    prores encoder
    native utvideo decoder
    libutvideo support
    deshake filter
    aevalsrc filter
    segment muxer
    mkv timecode v2 muxer
    cache urlprotocol
    many bugfixes and many other things

    October 2, 2011

    We have made 2 new point releases (0.7.6 and 0.8.5) that fix security issues in

    4X Technologies demuxer
    4xm decoder
    ADPCM IMA Electronic Arts EACS decoder
    ANM decoder
    Delphine Software International CIN decoder
    Deluxe Paint Animation demuxer
    Electronic Arts CMV decoder
    PTX decoder
    QDM2 decoder
    QuickDraw decoder
    TIFF decoder
    Tiertex Limited SEQ decoder
    aac decoder
    avi demuxer
    avs demuxer
    bink decoder
    flic decoder
    h264 decoder
    indeo2 decoder
    jpeg 2000 decoder
    libx264 interface to x264 encoder
    mov muxer
    mpc v8 decoder
    rasterfile decoder
    shorten decoder
    sun raster decoder
    unsharp filter
    vmd audio decoder
    vmd video decoder
    wmapro decoder
    wmavoice decoder
    xan decoder

    These releases also add libaacplus support and include all changes from 0.7.2.
    We recommend that users, distributors and system integrators upgrade unless they use current git master.

    September 28, 2011

    New stuff in git master:

    libaacplus support
    ACT/BIT demuxers
    AMV video encoder
    g729 decoder
    stdin control of drawtext
    2bpp, 4bpp png support
    interlaced 1bpp and PAETH png fixes
    libspeex encoding support
    hardened h264 decoder that won't overread the bitstream
    wtv muxer
    H/W Accelerated H.264 Decoding on Android
    stereo3d filter from libmpcodecs works now
    an experimental jpeg2000 encoder
    many bugfixes

    September 22, 2011

    We have made 2 new point releases that fix more security issues. They also include many bugfixes and a few backported features; for example, speex encoding support through libspeex has been backported. All changes from the latest libav release (0.7.1) are included as well. Grab them from our download page or, even better, use the latest git master.

    September 15, 2011

    FFmpeg now has a ProRes decoder in master git.

    We want to support more raw or 10bit or broadcast codecs. We need samples of the following codecs. If you have some, please upload them to our trac.

    Codec name                                     isom or fourcc

    Pinnacle TARGA2000                             dvr1
    Pinnacle TARGA Cine YUV                        Y216
    BlackMagic Design                              Vr21
    Digital Voodoo                                 DV10 HD10
    Media-100 844/X Uncompressed v.2.02            MYUV
    Media-100 iFinish Transcoder                   dtmt
    Accom SphereOUS v.3.0.1                        ImJG
    Abekas ClipStore MXc J2K Compressed v.3.0.2    HDJ1 HDJK
    BOXX v.1.0                                     bxrg bxbg bxyv bxy2
    LiveType Codec Decompressor                    pRiz
    Cineon DPX 10-bit Y'CbCr 4:2:2                 D210 C310 DPX cini
    Radius DV YUV PAL/NTSC                         R420 R411

    September 7, 2011

    We have made 2 new point releases that fix several security issues, among them MSVR-11-0088. They also include many bugfixes and a few backported features. All changes from the latest libav release (0.7.1) are included as well. Grab them from our download page or, even better, use the latest git master.

    August 29, 2011

    We have added support for H.264 4:2:2 intra, there are some new 8->10bit fixes in swscale, ffplay has more accurate AV-sync, ogg duration is more accurate now, we can decode WMVP and WVP2 streams and many many other new things and bugfixes. All in ffmpeg git master.

    July 28, 2011

    We have made 2 new point releases that fix several security issues, among them MSVR-11-0080. They also include many bugfixes and a few backported features. All changes from libav 0.7.1 are included as well. Grab them from our download page or, even better, use the latest git master.

    June 24, 2011

    Instead of having fun outside in the warm summer months, we have made a new release: FFmpeg 0.8! All bugfixes and merges from ffmpeg-mt and libav are included in this release, although we still recommend that you use the latest git version of our code.

    We have also made an OLDABI release: FFmpeg 0.7.1. It contains almost all of the features, bugfixes and merges of ffmpeg-mt and libav of 0.8, while being compatible with the 0.6 ABI and API. It has a few missing features, read the Changelog for more information.

    May 3, 2011

    FFmpeg now accesses x264 presets via libx264. This extends functionality by introducing several new libx264 options, including -preset, -tune, and -profile. You can read more detailed information about these options with "x264 --fullhelp".

    The syntax has changed so be sure to update your commands. Example:

    ffmpeg -i input -vcodec libx264 -preset fast -tune film -profile main -crf 22 -threads 0 output
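For scripts that generate such commands, the example above can be assembled programmatically. This is a minimal Python sketch, not something taken from the FFmpeg announcement. The helper name build_x264_command and the file names are hypothetical, and the option spellings are copied from the example command as written, so later FFmpeg versions may expect different forms.

```python
import shlex


def build_x264_command(src, dst, preset="fast", tune="film", profile="main", crf=22):
    """Return the ffmpeg argument list for a libx264 encode using presets.

    The defaults mirror the example command in the announcement; the
    function itself is a hypothetical helper.
    """
    return [
        "ffmpeg",
        "-i", src,
        "-vcodec", "libx264",
        "-preset", preset,    # speed/compression trade-off
        "-tune", tune,        # content-specific tuning
        "-profile", profile,  # H.264 profile
        "-crf", str(crf),     # constant rate factor
        "-threads", "0",
        dst,
    ]


cmd = build_x264_command("input.mkv", "output.mp4")
# shlex.join quotes arguments so the printed line can be pasted into a shell.
print(shlex.join(cmd))
```

Keeping the command as a list rather than one string means it can be passed directly to subprocess.run without shell quoting problems when file names contain spaces.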

    April 27, 2011

    FFmpeg now has an oldabi branch. It is updated to master but with the old ABI. Only fixes that break the old ABI are missing from this branch.

    To access the oldabi branch, clone FFmpeg, then do

    git checkout oldabi

    To get back to latest FFmpeg, just run:

    git checkout master

    April 14, 2011

    FFmpeg can now decode 9-bit and 10-bit H.264 streams, used in particular by AVCIntra 50.

    April 4, 2011

    In order to supply our release users with the newest features and bug fixes we are in the process of making a new release. The release will be based on the latest development tree while staying API/ABI compatible to the previous release.

    Please download the release candidate and report problems to our bug tracker.

    March 30, 2011

    Win32 and Win64 builds of FFmpeg are now available at

    Please report any bugs to our bug tracker.

    March 21, 2011

    Today FFmpeg-mt, the multithreaded decoding branch, has been merged into FFmpeg. This has been a long awaited merge, and we would like to thank Alexander Strange for his patience and hard work.

    Testing is appreciated and if you find any bugs please report them to our bug tracker.

    March 21, 2011

    The mailing lists have been fully migrated to!

    The FFmpeg mailing lists were moved from to in April 2005, and moved from to in 2011.

    Unfortunately the lists were down for a few hours because of the abrupt shut down on the previous server[1]. We apologize for this interruption. Also we could not move the subscribers of the libav-user mailing list (libav-user is for application developers using libav* libraries from the FFmpeg project). Even though libav-user was not listed in the shut down announcement[1], it was also shut down.

    If you are not yet subscribed we encourage you to do so now if you are interested in FFmpeg or multimedia or both. Visit our contacts page to find out more about the various mailing lists surrounding the FFmpeg project. You can also find the archives there if you like to browse the old posts.

    As stated in the previous news entry we are in the process of recovering our project infrastructure. We will keep you posted.

    March 17, 2011

    Reinhard Tartler backported several security fixes to the 0.5 release branch and made another point release, that is 0.5.4. Note, 0.5 is quite old and this release is mostly for those stuck with the 0.5 branch, and not so interesting for end users.

    Changelog between 0.5.3 and 0.5.4:
    - Fix memory corruption in WMV parsing (addresses CVE-2010-3908)
    - Fix heap corruption crashes (addresses CVE-2011-0722)
    - Fix crashes in Vorbis decoding found by zzuf (addresses CVE-2010-4704)
    - Fix another crash in Vorbis decoding (addresses CVE-2011-0480, Chrome issue 68115)
    - Fix invalid reads in VC-1 decoding (related to CVE-2011-0723)
    - Do not attempt to decode APE file with no frames

    March 15, 2011

    FFmpeg has been forked by some developers after their attempted takeover[1] two months ago did not fully succeed. During these two months their repository was listed here as main FFmpeg repository. We corrected this now and list the actual main repository and theirs directly below. All improvements of their fork have been merged into the main repository already.

    Sadly we lost a not so minor part of our infrastructure to the forking side. We are still in the process of recovering, but web, git and issue tracker are already replaced.

    Readers who want to find out more about the recent happenings are encouraged to read through the archives of the FFmpeg development mailing list[2]. There was also a bit of coverage on some news sites like here [3].

    February 24, 2011

    FFmpeg development has moved to Git, and the SVN repository is no longer updated. The SVN repository may be removed in the near future, so we recommend using a Git repository instead.

    The last revision committed to SVN was r26402 on 2011-01-19 and replaced the svn:external libswscale with a standalone copy.

    Oct 18, 2010

    We have just pushed the first point release from our 0.6 release branch: FFmpeg 0.6.1. This is a maintenance-only release that addresses a small number of bugs and security issues. It also adds a newer version of the AAC decoder, which enables the playback of HE-AAC v2 media.

    We have also taken the time to make another point release from our 0.5 branch: FFmpeg 0.5.3. It is a maintenance-only release that addresses a security issue and a minor set of bugs.

    Distributors and system integrators are encouraged to update and share their patches against our release branches.

    June 15, 2010

    A bit longer than actually expected, but finally, we are proud to announce a new release: FFmpeg 0.6. Check out the release notes and changelog.

    It is codenamed "Works with HTML5" as the special focus of this release was improvements for the new multimedia elements in HTML5. The H.264 and Theora decoders are now significantly faster and the Vorbis decoder has seen important updates. This release supports Google's newly released libvpx library for the VP8 codec and the Matroska demuxer was extended to support the WebM container.

    This release includes again an extensive number of changes; some of its highlights are:

    • Significant work to support at least decoding of all widespread mainstream proprietary codecs, and as usual broad coverage of widespread non-proprietary codecs, such as:
      • decoders and encoders
        • VP8 (via Google's libvpx library)
      • decoders
        • AMR-NB
        • Atrac1
        • HE-AAC v1
        • Bink
        • Bluray (PGS) subtitle
        • MPEG-4 Audio Lossless Coding (ALS)
        • WMA Pro
        • WMA Voice
    • Highlights among the newly supported container formats:
      • demuxers and muxers
        • Adobe Filmstrip
        • SoX native format
        • WebM support in Matroska de/muxer
      • demuxers
        • Bink
        • Core Audio Format
        • Dirac in Ogg
        • IV8
        • QCP
        • VQF
        • Wave64
      • muxers
        • IEC-61937
        • RTSP
    • faster AAC decoding
    • faster H.264 decoding
    • numerous ARM optimizations
    • important updates to the Vorbis decoder
    • RTP packetization support for H.263 and AMR
    • RTP depacketization support for AMR, ASF, H.263, Theora and Vorbis
    • RTMP/RTMPT/RTMPS/RTMPE/RTMPTE protocol support via librtmp
    • the new ffprobe tool
    • VorbisComment writing for FLAC, Ogg FLAC and Ogg Speex files
    • and so much more!

    June 2, 2010

    We are pleased to announce that FFmpeg will be present at LinuxTag in Berlin June 9-12 where we will be showing some spectacular demos. There will also be some trolls.

    May 25, 2010

    We have just pushed out another point release from our 0.5 release branch: FFmpeg 0.5.2. This is a maintenance-only release that addresses a small number of security and portability issues. Distributors and system integrators are encouraged to update and share their patches against this branch.

    March 19, 2010

    Once again, FFmpeg has been accepted to take part in the Google Summer of Code. Here is the Google SoC FFmpeg page.

    We have a list of proposed project ideas available so, if you think you might be interested, head over there to see if there is any project on which you wish to work and for which you may wish to make an application. The list is still in flux, and you're free to come up with your own ideas, but note that proposals should be closely tied to the progression of FFmpeg's code base.

    We would like prospective students to show us that they've got what it takes to be a contributor to FFmpeg. If you think you're suited, then please complete a small task before submitting your Summer-of-Code proposal. Note that many of the proposed Summer-of-Code projects have specific tasks that you would want to work on, since they would show us that you're comfortable in that particular piece of our codebase that relates to your specific project. Send patches to the mailing list for review, so that you will learn about our patch review process, inline replying (because we don't like top-posting on our mailing lists) and general interactions with our developer base.

    The sooner you start communicating with us and working within our code base, the sooner both you and we will ascertain your suitability and you will get used to our development methodology. You have until the application deadline to complete your small task. Good luck!

    March 2, 2010

    We have just pushed out a point release from our 0.5 release branch: FFmpeg 0.5.1. This release fixes security, packaging and licensing issues for FFmpeg 0.5, but it is a maintenance-only release; no new codecs, formats or other features are being introduced. The full details are spelled out in the release notes and changelog.

    There have been security fixes for the ASF, Ogg and MOV/MP4 demuxers as well as the FFv1, H.264, HuffYUV, MLP, MPEG audio and Snow decoders. libswscale can now be compiled in LGPL mode, albeit with x86 optimizations disabled. Some non-free bits in a test program were replaced. The AC-3 decoder is now completely LGPL. AMR-NB/WB support is now possible in free software through the OpenCORE libraries.

    To help packagers, the x264 glue code was updated to work with newer versions and symbol versioning was backported, as was the lock management API. The symbol versioning change is enabled on platforms that support it. This allows users to upgrade from 0.5.1 to the upcoming 0.6 release without having to recompile their applications. While this release is both API and ABI compatible with 0.5, please note that distributors have to recompile applications against 0.5.1 in order to make seamless upgrades to 0.6 possible.

    March 1, 2010

    We have been busy over the past few months. Among other things, the results are an Indeo 5 video decoder as well as audio decoders for AMR-NB, Sipro, MPEG-4 ALS and WMA Voice, complete support for Bink, CDG and IFF PBM/ILBM bitmaps, an RTSP muxer, Bluray (PGS) subtitle support, a protocol for file concatenation and the ffprobe tool for extracting information from multimedia files.

    September 23, 2009

    In 1992 Sony introduced the first Minidisc player. 17 years later it is now possible to transfer and play back the raw ATRAC data from the actual digital disc with the help of FFmpeg, tools developed by the Linux Minidisc project and official hardware (MZ-RH1). So if you have lots of digital recordings stored on Minidisc, now is the time to archive it all.

    One of the last entrenchments of proprietary multimedia has fallen: Windows Media Audio Pro support is finally available in FFmpeg. It decodes all known samples flawlessly and is considerably faster than the binary decoder from Microsoft. A big thank you goes out to all the reverse engineers and programmers who made this possible. It really was a herculean effort.

    August 24, 2009

    Just a very short time after its launch (~10 years), FFmpeg now supports decoding of TwinVQ (remember .vqf files?). Now FOSS enthusiasts can finally contribute to the late-'90s discussion of whether it sounds better than MP3 or not.

    July 24, 2009

    FFmpeg has removed support for libamr as of svn revision 19365. It has been replaced with support for libopencore-amr. Naturally the configure options have changed. The libamr options have been removed and there are two new options to take their place:

    • --enable-libopencore-amrnb
    • --enable-libopencore-amrwb

    The reason for this change is that the libamr license was non-free, while libopencore-amr is licensed under an Apache 2 license. The change was discussed at length on the developer mailing list during May, June, and July. This has several effects:

    • You may now distribute FFmpeg builds with support for dynamically loading libopencore-amr
    • Support for AMR-WB encoding has been removed since libopencore-amr does not support it
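
    As a minimal sketch of how the two new options might be passed to configure, assuming the script is run from the top of an FFmpeg source tree of that era with the libopencore-amr libraries already installed: the variable name AMR_OPTS is purely illustrative, and the note about --enable-version3 is an assumption based on later configure scripts, not something stated in this announcement.

```shell
#!/bin/sh
# The two configure options named in the announcement above.
# AMR_OPTS is a hypothetical variable name used only for this sketch.
AMR_OPTS="--enable-libopencore-amrnb --enable-libopencore-amrwb"

# Assumption: some versions of configure also insist on --enable-version3
# for these libraries because of the Apache 2 license; add it if configure
# asks for it.

# Run configure only when invoked from the top of an FFmpeg source tree;
# otherwise just print the command that would be run.
if [ -x ./configure ]; then
    ./configure $AMR_OPTS
else
    echo "./configure $AMR_OPTS"
fi
```

    Outside a source tree the script only prints the command line, which makes it safe to try before committing to a full build.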

    May 7, 2009

    FFmpeg was granted 9 slots to fill with applicants. After the gruelling application and qualification process, we will be running the following tasks this year:

    • RTMP Support
      • Student: Kostya Shishkov
      • Mentor: Ronald Bultje
    • Libswscale Cleanup
      • Student: Ramiro Polla
      • Mentor: Reimar Döffinger
    • S/PDIF Multiplexer
      • Student: Bartlomiej Wolowiec
      • Mentor: Benjamin Larsson
    • Playlist/Concatenation Support
      • Student: Geza Kovacs
      • Mentor: Baptiste Coudurier
    • JPEG2000 Codec
      • Student: Jai Menon
      • Mentor: Justin Ruggles
    • Implement the New Seeking API in Libavformat
      • Student: Zhentan Feng
      • Mentor: Baptiste Coudurier
    • MPEG-4 ALS Decoder
      • Student: Thilo Borgmann
      • Mentor: Justin Ruggles
    • Implementation of AVFilter infrastructure and various audio filters
      • Student: Kevin Dubois
      • Mentor: Vitor Sessak
    • Finish AMR-NB decoder and write an encoder
      • Student: Colin McQuillan
      • Mentor: Robert Swain

    Congratulations to all the successful applicants. Work hard, communicate well and prosper! Good luck!

    March 26, 2009

    Once again, FFmpeg has been accepted to take part in the Google Summer of Code. Here is the Google SoC FFmpeg page.

    We have a list of proposed project ideas available so, if you think you might be interested, head over there to see if there is any project on which you wish to work and for which you may wish to make an application. The list is still in flux, and you're free to come up with your own ideas, but note that proposals should be closely tied to the progression of FFmpeg's code base.

    If you're a student who thinks you have what it takes, note that we require prospective students to complete a small task before they will be considered for the program with FFmpeg. Take a look at the list, pick something to do, learn about inline replying (because we don't like top-posting on our mailing lists) and then tell us on the FFmpeg-devel mailing list which small task you have chosen.

    The sooner you start communicating with us and working within our code base, the sooner both you and we will ascertain your suitability and you will get used to our development methodology. You have until the application deadline to complete your small task. Good luck!

    March 23, 2009

    A new mailing list has been created for ffserver users. The list is intended to create an environment for discussion amongst ffserver users so that they can better receive support and support each other. Interested parties can subscribe and view the archives via the contact page.

    March 10, 2009

    It has been a very long time since we last made a release and many did not think we would make one again but, back by popular demand, we are proud to announce a new release: FFmpeg 0.5. Check out the release notes and changelog.

    It is codenamed "half-way to world domination A.K.A. the belligerent blue bike shed" to give an idea where we stand in the grand scheme of things and to commemorate the many fruitful discussions we had during its development.

    This release includes a very extensive number of changes, but some of the highlights are:

    • Significant work to support at least decoding of all widespread mainstream proprietary codecs, such as:
      • decoders and encoders
        • ALAC
        • Flash Screen Video
        • WMAv2 decoder fixed, WMAv1/v2 encoder
      • decoders
        • Atrac3
        • MLP/TrueHD
        • On2 VP3 improvements and VP5/VP6 support
        • RealAudio Cooker and fixes for 14.4 and 28.8
        • RealVideo RV30/40
        • WMV3/WMV9/VC-1 and IntraX8 frame support for WMV2/VC-1
    • Broad coverage of widespread non-proprietary codecs, including:
      • decoders and encoders
        • DNxHD
        • DVCPRO50 (a.k.a. DV50)
        • Floating point PCM
        • GSM-MS
        • Theora (and encoding via libtheora)
        • Vorbis
      • decoders
        • AAC with ADTS support and >2x the speed of FAAD! (no HE AAC support yet)
        • AC-3 that is faster than liba52 in 5.1, up to 2x faster in stereo and also supports E-AC-3! Hence liba52 is now obsolete.
        • DCA
        • DVCPRO HD (a.k.a. DV100)
        • H.264 PAFF and CQM support, plus slice-based multithreaded decoding
        • Monkey's Audio
        • MPEG-2 video support for intra VLC and 4:2:2
        • Musepack
        • QCELP
        • Shorten
        • True Audio (TTA)
        • Wavpack including hybrid mode support
    • Highlights among the newly supported container formats:
      • demuxers and muxers
        • GXF
        • MXF
      • demuxers
        • NullSoft Video (NSV)
      • muxers
        • iPhone/iPod compatibility for MP4/MOV
        • Matroska
        • NUT
        • Ogg (FLAC, Theora and Vorbis only)
        • ShockWave Flash (SWF)
    • libavdevice
    • ffserver is working again.
    • a shiny, new, completely revamped, non-recursive build system
    • cleaner, more consistent code
    • an all new metadata API
    • and so much more!

    March 4, 2009

    Google are again running their Summer of Code program and, as usual, we will be applying for a project position. As such we will need strong project proposals and qualification tasks for the students to complete.

    To all the students out there who want to work on FFmpeg over the summer, the sooner you begin to contribute to the project the better. Working on digital multimedia software is not the easiest task and getting code into FFmpeg's trunk repository demands significant rigor and commitment.

    Until we are officially accepted into the program, you could take a look at the list of small tasks we have and try to complete one of those. Support for development of FFmpeg is available via the FFmpeg-devel mailing list or IRC.

    December 20, 2008

    RealVideo 3.0 decoder added. Still working the bugs out, please test and report any problems.

    December 20, 2008

    The FFmpeg project would like to recognize and thank the people at Picsearch for their help improving FFmpeg recently. The Picsearch team makes extensive use of FFmpeg and provided feedback to FFmpeg in the form of thousands of files that either crash FFmpeg or use unsupported/unknown codecs. The FFmpeg development team is putting this information to work in order to improve FFmpeg for everyone.

    We know that there are other organizations using FFmpeg on a large scale to process diverse input types. The FFmpeg team invites those organizations to provide similar feedback about problems encountered in the wild.

    December 3, 2008

    A bunch of new formats have recently been added to FFmpeg, namely a QCELP/PureVoice speech decoder, a floating point PCM decoder and encoder, a Nellymoser ASAO encoder, an Electronic Arts TGQ decoder, Speex decoding via libspeex, an MXF muxer, an ASS/SSA subtitle demuxer and muxer and our AC-3 decoder has been extended with E-AC-3 support. Last but not least we now have a decoder for RealVideo 4.0.

    September 8, 2008

    FFmpeg is undergoing major changes in its API/ABI. The last valid revision for libavcodec version 51 is r15261.

    August 21, 2008

    The AAC decoder from FFmpeg Summer of Code 2006 has finally been cleaned up and is now in FFmpeg trunk. It supports Main and Low Complexity profile AAC but does not yet support HE AAC v1 (LC + SBR) or v2 (LC + SBR + PS), though implementation of this support is underway. It is considerably faster than FAAD and you should expect further performance improvements and bug fixes in the coming weeks.

    Also, FFmpeg now has floating point PCM support and supports MLP/TrueHD decoding (FFmpeg SoC 2008 should bring us an encoder), Apple Lossless Audio encoding (FFmpeg SoC 2008), MVI demuxing and Motion Pixels Video decoding, D-Cinema audio muxing, Electronic Arts CMV and TGV decoding and MAXIS EA XA demuxing/decoding.

    June 16, 2008

    UAB "DKD" have released a Nellymoser ASAO compatible decoder and encoder under the LGPL. This will aid the development of a native encoder in FFmpeg, and right now a GSoC student is working hard on just that task. A great thanks to UAB "DKD" for this contribution to the FFmpeg community.

    June 11, 2008

    We have added an Oma demuxer, the QuickTime variant of an IMA ADPCM encoder, a VFW grabber, an iPod/iPhone-compatible MP4 muxer, a Mimic decoder, an MSN TCP Webcam stream demuxer as well as demuxers and decoders for the following fringe formats: RL2, IFF, 8SVX, BFI.

    February 7, 2008

    We have added Ogg and AVM2 (Flash 9) SWF muxers, TechnoTrend PVA and Linux Media Labs MPEG-4 (LMLM4) demuxers, PC Paintbrush PCX and Sun Rasterfile decoders.

    November 11, 2007

    FFmpeg now supports XIntra8 frames, meaning that finally all WMV2 samples and some WMV3 samples that showed blocky color artifacts can be decoded correctly.

    October 22, 2007

    Beam Software SIFF demuxer and video decoder support added.

    October 15, 2007

    FFmpeg gets support for the Nellymoser speech codec used in Flash.

    October 9, 2007

    Apart from a DNxHD encoder, PAFF decoding support for H.264 was committed to SVN.

    September 29, 2007

    AMV audio and video decoding has arrived.

    September 13, 2007

    In about half a year of work since the l