Using Direct2D with WPF

http://www.codeproject.com/Articles/113991/Using-Direct2D-with-WPF

 

 


3 Nov 2010 | Article
Hosting Direct2D content in WPF controls.

Introduction

With Windows 7, Microsoft introduced a new technology called Direct2D (which is also supported on Windows Vista SP2 with the Platform Update installed). Looking through all its documentation, you'll notice it's aimed at Win32 developers; however, the Windows API Code Pack allows .NET developers to use the features of Windows 7 easily, with Direct2D being one of the features supported. Unfortunately, all the WPF examples included with the Code Pack require hosting the control in a HwndHost, which is a problem as it has airspace issues. This basically means that the Direct2D control needs to be separated from the rest of the WPF controls, which means no overlapping controls with transparency.

The attached code allows Direct2D to be treated as a normal WPF control and, thanks to some COM interfaces, doesn't require you to download the DirectX SDK or even play around with any C++ - the only dependency is the aforementioned Code Pack (the binaries of which are included in the attached file). This article is more about the problems found along the way and the challenges involved in creating the control, so feel free to skip to the Using the code section if you want to jump right in.

Background

WPF architecture

WPF is built on top of DirectX 9 and uses a retained rendering system. What this means is that you don't draw anything to the screen directly; instead, you create a tree of visual objects whose drawing instructions are cached and later rendered automatically by the framework. This, coupled with using DirectX to do the graphics processing, allows WPF applications to remain responsive when they have to be redrawn, and also lets WPF use a "painter's algorithm" painting model. In this model, each component (starting at the back of the display, going towards the front) is asked to draw itself, allowing it to paint over the previous component's display. This is the reason it's so easy to have complex and/or partially transparent shapes with WPF - it was designed with this scenario in mind. For more information, check out the MSDN article.
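To make the retained-mode idea concrete, here is a minimal sketch (not part of the attached code) of recording drawing instructions into a DrawingVisual. The instructions are recorded once; WPF caches them and re-renders them itself, and the semi-transparent ellipse happily overlaps the rectangle thanks to the painter's algorithm:

using System.Windows;
using System.Windows.Media;

// A host element that exposes one DrawingVisual to WPF's retained-mode visual tree.
public class CachedDrawing : FrameworkElement
{
    private readonly DrawingVisual visual = new DrawingVisual();

    public CachedDrawing()
    {
        // Record the drawing instructions once; WPF caches and redraws them as needed.
        using (DrawingContext dc = visual.RenderOpen())
        {
            dc.DrawRectangle(Brushes.LightBlue, null, new Rect(0, 0, 100, 100));

            // Painter's algorithm: this semi-transparent ellipse is simply painted
            // over whatever was drawn before it.
            var translucentRed = new SolidColorBrush(Color.FromArgb(128, 255, 0, 0));
            dc.DrawEllipse(translucentRed, null, new Point(60, 60), 40, 40);
        }
        AddVisualChild(visual);
    }

    protected override int VisualChildrenCount { get { return 1; } }
    protected override Visual GetVisualChild(int index) { return this.visual; }
}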

Direct2D architecture

In contrast to the managed WPF model, Direct2D is immediate-mode, and the developer is responsible for everything. This means you are responsible for creating your resources, refreshing the screen, and cleaning up after yourself. It's built on top of Direct3D 10.1, which gives it high-performance rendering, while still providing several of the advantages of WPF (such as device-independent units, ClearType text rendering, per-primitive anti-aliasing, and solid/linear/radial/bitmap brushes). MSDN has a more in-depth introduction; however, it's aimed more at native developers.

Interoperability

Direct2D has been designed to be easily integrated into existing projects that use GDI, GDI+, or Direct3D, with multiple options available for incorporating Direct2D content with Direct3D 10.1 or above. The Direct2D SDK even includes a nice sample called DXGI Interop to show how to do this.

To host Direct3D content inside WPF, the D3DImage class was introduced in .NET 3.5 SP1. This allows you to host Direct3D 9 content as an ImageSource, enabling it to be used inside an Image control, or as an ImageBrush etc. There's a great article here on CodeProject with more information and examples.

The astute would have noticed that whilst both technologies can work with Direct3D, Direct2D requires version 10.1 or later, whilst the D3DImage in WPF only supports version 9. A quick internet search resulted in this blog post by Jeremiah Morrill. He explains that an IDirect3DDevice9Ex (which is supported by D3DImage) supports sharing resources between devices. A shared render target created in Direct3D 10.1 can therefore be pulled into a D3DImage via an intermediate IDirect3DDevice9Ex device. He also includes example source code which does exactly this, and the attached code is derived from his work.

So, we now have a way of getting Direct2D working with Direct3D 10.1, and we can get WPF working with Direct3D 10.1; the only problem is the dependency of both of the examples on unmanaged C++ code and the DirectX SDK. To get around this problem, we'll access DirectX through its COM interface.

Component Object Model

I'll admit I knew nothing about COM, apart from knowing to avoid it! However, there's an article here on CodeProject that helped to make it a bit less scary. To use COM, we have to use low-level techniques, and I was surprised (and relieved!) to find that the Marshal class has methods which can mimic anything that would normally have to be done in unmanaged code.

Since there are only a few objects we need from Direct3D 9, and there are only one or two functions in each object that are of interest to us, instead of trying to convert all the interfaces and their functions to their C# equivalent, we'll manually map the V-table as discussed in the linked article. To do this, we'll create a helper function that will extract a method from the specified slot in the V-table:

public static bool GetComMethod<T, U>(T comObj, int slot, out U method) where U : class
{
    IntPtr objectAddress = Marshal.GetComInterfaceForObject(comObj, typeof(T));
    if (objectAddress == IntPtr.Zero)
    {
        method = null;
        return false;
    }

    try
    {
        IntPtr vTable = Marshal.ReadIntPtr(objectAddress, 0);
        IntPtr methodAddress = Marshal.ReadIntPtr(vTable, slot * IntPtr.Size);

        // We can't have a Delegate constraint, so we have to cast to
        // object then to our desired delegate
        method = (U)((object)Marshal.GetDelegateForFunctionPointer(
                             methodAddress, typeof(U)));
        return true;
    }
    finally
    {
        Marshal.Release(objectAddress); // Prevent memory leak
    }
}

This code first gets the address of the COM object (using Marshal.GetComInterfaceForObject). It then reads the location of the V-table stored at the start of the COM object (using Marshal.ReadIntPtr), and from that reads the address of the method at the specified slot (multiplying the slot by the size of a pointer, as Marshal.ReadIntPtr takes its offset in bytes). Finally, it creates a callable delegate from the returned function pointer (using Marshal.GetDelegateForFunctionPointer). Simple!

An important thing to note is that the IntPtr returned by the call to Marshal.GetComInterfaceForObject must be released; I wasn't aware of this, and found my program leaking memory when the resources were being re-created. Also, the function uses an out parameter for the delegate so we get all the nice benefits of type inference and, therefore, reduces the amount of typing required for the caller. Finally, you'll notice there's some nasty casting to object and then to the delegate type. This is unfortunate but necessary, as there's no way to specify a delegate generic constraint in C# (the CLI does actually allow this constraint, as mentioned by Jon Skeet in his blog). Since this is an internal class, we'll assume that the caller of the function knows this constraint.

With this helper function, it becomes a lot easier to create a wrapper around the COM interfaces, so let's take a look at how to provide a wrapper around the IDirect3DTexture9 interface. First, we'll create an internal interface with the ComImport, Guid, and InterfaceType attributes attached so that the Marshal class knows how to use the object. For the GUID, we'll need to look inside the DirectX SDK header files, in particular d3d9.h:

interface DECLSPEC_UUID("85C31227-3DE5-4f00-9B3A-F11AC38C18B5") IDirect3DTexture9;

With the same header open, we can also look for the interface's declaration, which looks like this after running it through the pre-processor and removing the __declspec and __stdcall attributes:

struct IDirect3DTexture9 : public IDirect3DBaseTexture9
{
    virtual HRESULT QueryInterface( const IID & riid, void** ppvObj) = 0;
    virtual ULONG AddRef(void) = 0;
    virtual ULONG Release(void) = 0;
    
    virtual HRESULT GetDevice( IDirect3DDevice9** ppDevice) = 0;
    virtual HRESULT SetPrivateData( const GUID & refguid, 
            const void* pData,DWORD SizeOfData,DWORD Flags) = 0;
    virtual HRESULT GetPrivateData( const GUID & refguid, 
            void* pData,DWORD* pSizeOfData) = 0;
    virtual HRESULT FreePrivateData( const GUID & refguid) = 0;
    virtual DWORD SetPriority( DWORD PriorityNew) = 0;
    virtual DWORD GetPriority(void) = 0;
    virtual void PreLoad(void) = 0;
    virtual D3DRESOURCETYPE GetType(void) = 0;
    virtual DWORD SetLOD( DWORD LODNew) = 0;
    virtual DWORD GetLOD(void) = 0;
    virtual DWORD GetLevelCount(void) = 0;
    virtual HRESULT SetAutoGenFilterType( D3DTEXTUREFILTERTYPE FilterType) = 0;
    virtual D3DTEXTUREFILTERTYPE GetAutoGenFilterType(void) = 0;
    virtual void GenerateMipSubLevels(void) = 0;
    virtual HRESULT GetLevelDesc( UINT Level,D3DSURFACE_DESC *pDesc) = 0;
    virtual HRESULT GetSurfaceLevel( UINT Level,IDirect3DSurface9** ppSurfaceLevel) = 0;
    virtual HRESULT LockRect( UINT Level,D3DLOCKED_RECT* pLockedRect, 
            const RECT* pRect,DWORD Flags) = 0;
    virtual HRESULT UnlockRect( UINT Level) = 0;
    virtual HRESULT AddDirtyRect( const RECT* pDirtyRect) = 0;
};

We only need one of these methods for our code: the GetSurfaceLevel method. Starting from the top and counting down, we can see that this is the 19th method, and it will therefore be at slot 18 in the V-table. We can now create a wrapper class around this interface.

internal sealed class Direct3DTexture9 : IDisposable
{
    [UnmanagedFunctionPointer(CallingConvention.StdCall)]
    private delegate int GetSurfaceLevelSignature(IDirect3DTexture9 texture, 
                         uint Level, out IntPtr ppSurfaceLevel);

    [ComImport, Guid("85C31227-3DE5-4f00-9B3A-F11AC38C18B5"), 
                InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    internal interface IDirect3DTexture9
    {
    }

    private IDirect3DTexture9 comObject;
    private GetSurfaceLevelSignature getSurfaceLevel;

    internal Direct3DTexture9(IDirect3DTexture9 obj)
    {
        this.comObject = obj;
        HelperMethods.GetComMethod(this.comObject, 18, 
                                   out this.getSurfaceLevel);
    }

    ~Direct3DTexture9()
    {
        this.Release();
    }

    public void Dispose()
    {
        this.Release();
        GC.SuppressFinalize(this);
    }

    public IntPtr GetSurfaceLevel(uint Level)
    {
        IntPtr surface;
        Marshal.ThrowExceptionForHR(this.getSurfaceLevel(
                              this.comObject, Level, out surface));
        return surface;
    }

    private void Release()
    {
        if (this.comObject != null)
        {
            Marshal.ReleaseComObject(this.comObject);
            this.comObject = null;
            this.getSurfaceLevel = null;
        }
    }
}

In the code, I've used Marshal.ThrowExceptionForHR to make sure that the call succeeds - if there's an error, then it will throw the relevant .NET type (e.g., a result of E_NOTIMPL will result in a NotImplementedException being thrown).

Using the code

To use the attached code, you can either include the compiled binary into your project, or include the code as there's not a lot of it (despite the time spent on creating it!). Either way, you'll need to make sure you reference the Windows API Code Pack DirectX library in your project.

In the code, there are three classes of interest: D3D10Image, Direct2DControl, and Scene.

The D3D10Image class inherits from D3DImage, and adds an override of the SetBackBuffer method that accepts a Direct3D 10 texture (in the form of a Microsoft.WindowsAPICodePack.DirectX.Direct3D10.Texture2D object). As the code is written, the texture must be in the DXGI_FORMAT_B8G8R8A8_UNORM format; however, feel free to edit the code inside the GetSharedSurface function to whatever format you want (in fact, the original code by Jeremiah Morrill did allow for different formats, so take a look at that for inspiration).

Direct2DControl is a wrapper around the D3D10Image control, and provides an easy way to display a Scene. The control takes care of redrawing the Scene and D3D10Image when it's invalidated, and also resizes their contents. To help improve performance, the control uses a timer to resize the contents 100ms after the resize event has been received. If another request to be resized occurs during this time, the timer is reset to 100ms again. This might sound like it could cause problems when resizing, but internally, the control uses an Image control, which will stretch its contents when it's resized so the contents will always be visible; they just might get temporarily blurry. Once resizing has finished, the control will redraw its contents at the correct resolution. Sometimes, for reasons unknown to me, there will be a flicker when this happens, but by using the timer, this will occur infrequently.

The Scene class is an abstract class containing three main functions for you to override: OnCreateResources, OnFreeResources, and OnRender. The reason for the first two functions is that a DirectX device can get destroyed (for example, if you switch users), and afterwards, you will need to create a new device. These methods allow you to create/free device dependent resources, such as brushes for example. The OnRender method, as the name implies, is where you do the actual drawing.

Putting this together gives us this code to create a simple rectangle on a semi-transparent blue background:

<!-- Inside your main window XAML code -->
<!-- Make sure you put a reference to this at the top of the file:
        xmlns:d2d="clr-namespace:Direct2D;assembly=Direct2D"
 -->

<d2d:Direct2DControl x:Name="d2DControl" />

using D2D = Microsoft.WindowsAPICodePack.DirectX.Direct2D1;

internal sealed class MyScene : Direct2D.Scene
{
    private D2D.SolidColorBrush redBrush;

    protected override void OnCreateResources()
    {
        // We'll fill our rectangle with this brush
        this.redBrush = this.RenderTarget.CreateSolidColorBrush(
                             new D2D.ColorF(1, 0, 0));
    }

    protected override void OnFreeResources()
    {
        if (this.redBrush != null)
        {
            this.redBrush.Dispose();
            this.redBrush = null;
        }
    }

    protected override void OnRender()
    {
        // This is what we're going to draw
        var size = this.RenderTarget.Size;
        var rect = new D2D.Rect
            (
                5,
                5,
                (int)size.Width - 10,
                (int)size.Height - 10
            );

        // This actually draws the rectangle
        this.RenderTarget.BeginDraw();
        this.RenderTarget.Clear(new D2D.ColorF(0, 0, 1, 0.5f));
        this.RenderTarget.FillRectangle(rect, this.redBrush);
        this.RenderTarget.EndDraw();
    }
}

// This is the code behind class for the XAML
public partial class MainWindow : Window
{
    public MainWindow()
    {
        InitializeComponent();

        // Add this after the call to InitializeComponent. Really you should
        // store this object as a member so you can dispose of it, but in our
        // example it will get disposed when the window is closed.
        this.d2DControl.Scene = new MyScene();
    }
}

Updating the Scene

In the original code, to update the Scene you needed to call Direct2DControl.InvalidateVisual. This has now been changed so that calling the Render method on Scene causes the new Updated event to be fired, which the Direct2DControl subscribes to in order to invalidate its area accordingly.

It was also discovered that the Scene would sometimes flicker when redrawn. This seems to be an issue with the D3DImage control, and the solution (whilst not 100% effective) is to synchronize the AddDirtyRect call with WPF's rendering (by subscribing to the CompositionTarget.Rendering event). This is all handled by the Direct2DControl for you.
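Purely for illustration, the general shape of that synchronization looks something like the sketch below (a hypothetical helper, not the actual Direct2DControl source):

using System.Windows;
using System.Windows.Interop;
using System.Windows.Media;

// Illustrative only - the real Direct2DControl performs this synchronization itself.
internal static class FlickerSyncSketch
{
    public static void Hook(D3DImage d3dImage)
    {
        CompositionTarget.Rendering += (sender, e) =>
        {
            if (d3dImage.IsFrontBufferAvailable && d3dImage.PixelWidth > 0)
            {
                // Only mark the image dirty while WPF is composing a frame.
                d3dImage.Lock();
                d3dImage.AddDirtyRect(
                    new Int32Rect(0, 0, d3dImage.PixelWidth, d3dImage.PixelHeight));
                d3dImage.Unlock();
            }
        };
    }
}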

To make things easier still, there's a new class deriving from Scene called AnimatableScene. After releasing the first version, there was some confusion about how to do continuous scene updates, so hopefully this class makes that easier. You use it in the same way as the Scene class, but your OnRender code will be called as required, at the desired frames per second set in the constructor (though see the Limitations section). Also note that if you override the OnCreateResources method, you need to call the base version at the end of your code to start the animation, and when you override the OnFreeResources method, you need to call the base version first to stop the animation (see the example in the attached code; a minimal sketch is also shown below).
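The sketch below shows the general shape of an AnimatableScene subclass. The constructor parameter (the desired frames per second) and the scene's namespace are assumptions based on the description above; the attached example is the definitive version:

using D2D = Microsoft.WindowsAPICodePack.DirectX.Direct2D1;

internal sealed class PulsingScene : Direct2D.AnimatableScene
{
    private D2D.SolidColorBrush brush;

    public PulsingScene()
        : base(30) // desired frames per second (assumed constructor parameter)
    {
    }

    protected override void OnCreateResources()
    {
        this.brush = this.RenderTarget.CreateSolidColorBrush(new D2D.ColorF(0, 1, 0));

        // Call the base version last, so the animation only starts
        // once our resources exist.
        base.OnCreateResources();
    }

    protected override void OnFreeResources()
    {
        // Call the base version first, to stop the animation before
        // the resources it may be using are released.
        base.OnFreeResources();

        if (this.brush != null)
        {
            this.brush.Dispose();
            this.brush = null;
        }
    }

    protected override void OnRender()
    {
        var size = this.RenderTarget.Size;
        var rect = new D2D.Rect(5, 5, (int)size.Width - 10, (int)size.Height - 10);

        this.RenderTarget.BeginDraw();
        this.RenderTarget.Clear(new D2D.ColorF(0, 0, 0));
        this.RenderTarget.FillRectangle(rect, this.brush);
        this.RenderTarget.EndDraw();
    }
}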

Mixed mode assembly is built against version 'v2.0.50727'

The attached code is compiled against .NET 4.0 (though it could probably be retargeted to work under .NET 2.0), but the Code Pack is compiled against .NET 2.0. When I first referenced the Code Pack and tried running the application, the above exception kept getting raised. The solution, found here, is to include an app.config file in the project with the following startup information:

<?xml version="1.0"?>
<configuration>
  <startup useLegacyV2RuntimeActivationPolicy="true">
    <supportedRuntime version="v4.0"/>
  </startup>
</configuration>

Limitations

Direct2D will work over remote desktop; however (as far as I can tell), the D3DImage control is not rendered. Unfortunately, I only have a Home Premium version of Windows 7, so cannot test any workarounds, but would welcome feedback in the comments.

The code will work when targeting either x86 or x64 platforms (or even the Any CPU setting); however, you'll need to use the correct version of Microsoft.WindowsAPICodePack.DirectX.dll. I couldn't find a way of making this automatic, and I don't think the Code Pack can be compiled for Any CPU, as it uses unmanaged code.

The timer used in the AnimatableScene is a DispatcherTimer. MSDN states:

[The DispatcherTimer is] not guaranteed to execute exactly when the time interval occurs [...]. This is because DispatcherTimer operations are placed on the Dispatcher queue like other operations. When the DispatcherTimer operation executes is dependent on the other jobs in the queue and their priorities.

History

  • 02/11/10 - Direct2DControl has been changed to use a DispatcherTimer so that it doesn't contain any controls needing to be disposed of (makes FxCop a little happier), and the control is now synchronized with WPF's CompositionTarget.Rendering event to reduce flickering. Scene has been changed to include an Updated event and to allow derived classes access to its D2DFactory. Also, the AnimatableScene class has been added.
  • 21/09/10 - Initial version.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Hex Grids and Hex Coordinate Systems in Windows: Drawing and Printing

http://www.codeproject.com/Articles/312144/Hex-Grids-and-Hex-Coordinate-Systems-in-Windows-Dr


29 Apr 2012 | Article
A library (DLL) for the generation of hexagon grids (or "tessellations"), and for the management of the resultant coordinate systems.

HEXPLANE.EXE DEMO

Figure 1: "Hexplane.exe", a demo in which the user flies through 3D space over a hex-based terrain

HEX3D.EXE DEMO

Figure 2: Spinning cube demo with four dynamic hex tessellation surfaces ("Hex3d.exe")

Introduction

Whether called a hex grid, a honeycomb, or a tessellation (which is the mathematical term for a space-filling pattern), groupings of hexagons like the ones shown above are a useful way to fill up two-dimensional space. They provide for a more interesting visual effect than tessellations of 3- or 4-sided polygons; and unlike tessellations composed of several different polygons, hex tessellations create a regular grid with a consistent coordinate system, a fact which has been exploited by all sorts of computer games, and also by many board games.

The work at hand describes how to use a library created by the author to draw a wide variety of hex tessellations. This is accomplished using selected calls into the GDI and GDI+ APIs made by the library. The library is named "hexdll.dll". This somewhat repetitive name makes more sense in the context of its codebase, where it is found alongside "hexdll.cpp", "hexdll.h", "hex3d.cpp", and so on.

The development of the programs provided with this article presented some performance challenges. In one demonstration, a hexagon grid is drawn onto a Direct3D surface. The surface is dynamic; with each frame, a slightly different hex tessellation is drawn, creating an interesting flicker effect. Care was taken to ensure that the many calls into "hexdll.dll" necessitated by this application did not result in decreased frame rate. This requires "hexdll.dll" itself to be capable of operating quickly, and also presents potential interface issues between GDI and Direct3D, which are discussed more extensively further down in the article.

In another of the demo programs, a large hex grid is redrawn in its entirety with each Resize event. Again, if done incorrectly, this action will introduce a very noticeable lag.

These high-performance applications are both enabled by a single common design decision: GDI+ is avoided, in favor of GDI, unless the caller specifically requests anti-aliasing. GDI does not offer this capability, so GDI+ must be used for anti-aliased drawing. However, for drawing operations not involving anti-aliasing, GDI is significantly faster than GDI+. This is an interesting result which is discussed at some length below. Here, suffice it to say that the OOP-style interface exposed by GDI+ comes at a cost, one which is in some cases dramatic. This article is thus a reminder of the high performance potential of non-OO procedural and structured programming techniques.

Background

The "device context" (DC) is ubiquitous in Windows programming. The Direct3D surface interface used here (IDirect3DSurface91), for example, exposes a GetDC() method. Most Windows controls (be they MFC, Win32, or .NET-based) expose an HWND, which can be converted into a DC using a single Windows API call. Each of these entities is ultimately just a different kind of 2D Windows surface, and "hexdll.dll" can draw hex tessellations on all of them, using the DC as a common medium. Many of these operations are demonstrated in the code base provided, and discussed in this article.

The author's DLL is designed for hassle-free use from native or .NET programs; the source code provided contains complete examples of both types of clients. The main ".cs" file for the .NET demo is only 87 lines long. None of the C++ demos use more than 450 lines of code, despite their high feature content. The "hex2d.cpp" printer-friendly demo uses only 236 lines of code.

The next section of the article deals with the creation of client apps that use the DLL. For simplicity, a Visual Studio 2010 / C# client app ("hexdotnet.sln" / "hexdotnet.exe") is shown first. The folder tree for this C# application is present in the "hexdotnet" subfolder of the provided source code archive.

After the presentation of the .NET client, the text below continues with a variety of C++ client programs, and then concludes with a discussion of the internal implementation of "hexdll.dll". The code for the library is written in C++, and built using the MinGW compiler.

The client programs provided were developed using Visual Studio for the .NET app and MinGW for the C++ apps and for the DLL itself. In constructing the C++ development tool chain, this article relies heavily on techniques described in two previous articles by the same author, GDI Programming with MinGW and Direct3D Programming with MinGW. The text below attempts to be self-contained, but it does make reference to these predecessor articles, as necessary, and they do provide more detailed explanations of some of the background topics for this article. There are some minor differences between the instructions given in this article and those given in its predecessors, due to changes in MinGW. These are discussed in the section titled "Building", further below.

API Selection

The author faced a choice between several 2D drawing APIs in developing the programs described in this article. The APIs considered were GDI, GDI+, DirectDraw, and Direct2D. Of these, Direct2D is the newest and likely the fastest-running alternative. Unfortunately, MinGW does not support it, at least not as downloaded. DirectDraw is, like Direct2D, a component of DirectX, but it is a deprecated one.

Of course, it would be difficult to integrate either of these DirectX-based technologies into typical (i.e., raster-based) Windows applications as seamlessly as was done for the GDI/GDI+ implementation present in "hexdll.dll". Two main advantages of the approach selected are therefore its generality and its lack of bothersome architectural requirements.

Using the Code

One simple way to develop a client application that uses "hexdll.dll" is to use the .NET System.Windows.Forms namespace to create a control onto which the DLL can draw. Any C# application can access the functions exposed by "hexdll.dll". The first step is to insert the declarations shown below into an application class:

[DllImport("hexdll.dll", CallingConvention = CallingConvention.Cdecl)]
static extern void hexdllstart();

[DllImport("hexdll.dll", CallingConvention = CallingConvention.Cdecl)]
static extern void hexdllend();

[DllImport("hexdll.dll", CallingConvention = CallingConvention.Cdecl)]
static extern void systemhex
(
  IntPtr hdc,  //DC we are drawing upon
  Int32 origx, //Top left corner of (0,0) hex in system - X
  Int32 origy, //Top left corner of (0,0) hex in system - Y
  Int32 magn,  //One-half hex width; also, length of each hex side
  Int32 r,     //Color of hex - R
  Int32 g,     //Color of hex - G
  Int32 b,     //Color of hex - B
  Int32 coordx,//Which hex in the system is being drawn? - X
  Int32 coordy,//Which hex in the system is being drawn? - Y
  Int32 penr,  //Outline (pen) color - R
  Int32 peng,  //Outline (pen) color - G
  Int32 penb,  //Outline (pen) color - B
  Int32 anti   //Anti-alias? (0 means "no")
);

In the C# application provided, these declarations are inserted directly into the Form1 class, very near the top of "Form1.cs". The systemhex(), hexdllstart(), and hexdllend() functions are therefore accessible as static methods of the Form1 class.

At runtime, "hexdll.dll" must be present in the same folder as the .NET executable ("hexdotnet.exe" here), or at least present in the Windows search path, for this technique to work.

In the declarations shown above, note that the Cdecl calling convention is used, as opposed to the default Stdcall option. Programmers uninterested in this distinction can simply copy the declarations as shown above without giving this any thought. For those interested in detail, the author found that using Stdcall in the DLL implementation code caused MinGW to engage in some undesirable name mangling. The DLL function names ended up looking like hexdllstart@0.

The Stdcall decoration appends an "@" followed by the number of bytes in the function's argument list (hence the "@0" on the parameterless hexdllstart); Cdecl does not apply this decoration, so the exported names stay clean. It is worth noting, too, that a richer form of name mangling is an inherent requirement for the linkage of C++ class methods; the library presented here thus makes no attempt to expose an OO interface.

Calls to functions hexdllstart() and hexdllend() must bracket any use of "hexdll.dll" for drawing. These functions exist basically to call Microsoft's GdiplusStartup and GdiplusShutdown API functions, at app startup / shutdown. This design is in keeping with Microsoft's guidelines for the construction of DLLs that use GDI+.
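A minimal way to honour that contract in the C# client (a sketch only, assuming the Form1 declarations shown above) is to call the pair from the form's constructor and its FormClosed event:

public Form1()
{
    InitializeComponent();

    hexdllstart();   // initializes GDI+ inside hexdll.dll

    // Shut the library down once the window (and all drawing) is finished.
    this.FormClosed += delegate { hexdllend(); };
}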

The actual hex-drawing code in each client app consists of call(s) to systemhex(). In this identifier, the word "system" refers not to some sort of low-level privilege, but to the system of coordinates created by a hex tessellation. Any such tessellation has a hexagon designated (0,0), at its top / left corner. Unless the tessellation is exceedingly narrow, there is a (1,0) hexagon to its right. Unless the tessellation is very short, there is a (0,1) hexagon beneath the (0,0) hexagon.

The figure below shows an example hex tessellation with several of its constituent hexagons labeled with their (X,Y) coordinates, as defined in the system used by "hexdll.dll". In this figure, many arbitrary but necessary decisions made by the author are evident. The placement of the origin at the top left is an obvious example. More subtly, note that the entire grid is oriented such that vertical columns of hexagons can be identified (e.g., the column of hexagons with "X" coordinate 0). The grid could be rotated 90 degrees such that these formed rows instead, but this is not the orientation used here. Finally, note that hexagons at odd "X" coordinates, by convention, are located slightly lower in the "Y" dimension than those at even "X" coordinates. This is another one of these arbitrary decisions made by the author, each of which will potentially impact the implementation of any application that uses "hexdll.dll".

HEX COORDINATE SYSTEM

Figure 3: Coordinate system used by "hexdll.dll"

Returning to the declaration of systemhex(), the coordx and coordy parameters to this function define the coordinate of the single hexagon drawn by each call to systemhex(). This (X,Y) point defines the entire hexagon in terms of a coordinate system like the one shown in the figure above. The specifics of this coordinate system are passed in parameters origx, origy, and magn. The origx and origy parameters, taken together, define where the leftmost vertex of the hexagon (0,0) is located. These coordinates are expressed in pixels, relative to coordinate (0,0) of the surface onto which the hexagon is being drawn.

The magn parameter defines the size of each hexagon. Each hexagon is 2.0 * magn pixels wide. Each hexagon's height is slightly less than that, at approximately 1.7321 times magn (this is 2.0 * sin(60°) * magn).
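For example, with magn set to 10 (as in the .NET demo below), each hexagon is 20 pixels wide and roughly 17 pixels tall.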

Two RGB color triads are passed to systemhex(): parameters r, g, and b define the interior color of the hexagon, while penr, peng, and penb define the color of its single-pixel outline. Each of these parameters can range from 0 to 255.

Finally, the IntPtr parameter to systemhex() is a HANDLE to the DC to be drawn upon. In the .NET example client provided, this is obtained by taking the Handle property of a Panel control created for this purpose, and passing it to the Win32 GetDC() function. This function is brought into the .NET program using a DllImport declaration very similar to the three already shown, along with the corresponding cleanup function ReleaseDC():

[DllImport("user32.dll")]
static extern IntPtr GetDC(IntPtr hWnd);
 
[DllImport("user32.dll")]
static extern bool ReleaseDC(IntPtr hWnd, IntPtr hDC);

In the .NET example program, the MakeHex() method of Form1 does the actual drawing. It is deliberately ill-behaved, redrawing an 80 x 80 hex coordinate system in its entirety. Because MakeHex() gets called one time for every Resize event, this presents severe performance issues unless each call to systemhex() executes with sufficient speed. The code for MakeHex() is shown in its entirety below:

private void MakeHex()
{
  IntPtr h = GetDC(this.panel1.Handle);

  //Not efficient. Good for testing.
  for (int row = 0; row < 80; ++row)
   for (int col = 0; col < 80; ++col)
    systemhex(h, 30, 30, 10, 255, 255, 255, row, col, 255, 0, 0, 0);

  ReleaseDC(this.panel1.Handle, h);
}

Above, note that each hex drawn is part of a system having its (0,0) hex at raster coordinate (30,30). This is measured in pixels from the top left corner of Panel1, which is configured to fill the entire client area of Form1. Each hex is 20 pixels wide (twice the magn parameter of 10). The hexagons are white (red=255, green=255, blue=255), with a bright red outline. A full 6,400 hexagons are drawn with each call to MakeHex(): an 80 x 80 grid, at system coordinates (0,0) through (79,79). The result of this process is shown below; note that the window is not sufficiently large at this point in time to show the entire 80 x 80 grid:

HEXDOTNET.EXE DEMO

Figure 4: "Hexdotnet.exe" at startup (not anti-aliased)

As the code exists in the download provided, the final parameter to systemhex(), named anti, is set to 0. This disables anti-aliasing and allows for GDI (as opposed to GDI+) to be used, which is key to obtaining good Resize performance. The tradeoff is a somewhat jagged rendering, as evident in the picture above.

If anti is set to a non-zero value, and the .NET example client is recompiled, then a significant performance lag will be perceptible when resizing Form1. In the author's test, a simple maximize operation performed immediately after app startup took about 2 seconds with anti-aliasing enabled.

Significantly, GDI's performance advantage was present even when compared to GDI+ running without anti-aliasing enabled (i.e., with SmoothingModeHighSpeed in effect). If OVERRIDE_C_GDI is defined when "hexdll.cpp" is built, GDI+ will be used for all calls. The resultant performance lag is, again, quite perceptible, and the author provides this option only for performance testing.

Building

The Build Script

The C# demonstration described in the last section can be built and executed by simply opening "hexdotnet.sln" and pressing F5. A pre-built copy of the DLL is included along with its source code.

The DLL can be rebuilt, though, along with all of the C++ demonstration programs, using the build script "make.bat". This batch file also copies "hexdll.dll" to the requisite locations under the "hexdotnet" folder tree.

The script "clean.bat" is also provided; it removes all traces of the build process, except for the pre-built version of the DLL included with the .NET solution. These are intended for execution from a Command Prompt window, not directly from Explorer or Task Manager. Before attempting to run "make.bat", it is necessary to include the MinGW binary path in the environment PATH, e.g.:

COMMAND PROMPT BUILD

Figure 5: C++ Build Steps

A batch file that sets PATH properly is also provided in the source code archive. It is named "envvars.bat". This can be run instead of the set command shown above.
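For reference, the set command in question is typically something along the lines of set PATH=c:\mingw\bin;%PATH% (assuming MinGW is installed in its default location).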

The build script itself consists mostly of calls to g++. The commands that compile "hex3d.cpp" and "hexplane.cpp" rely on build commands that are very similar to those shown in Direct3D Programming with MinGW. The commands that build "hexdll.dll" itself rely heavily on the MinGW / GDI+ build instructions given in GDI+ Programming With MinGW, and also on some detailed instructions for DLL construction given by the developers of MinGW.

In all of the "g++" commands in "make.bat", the -w option is used to disable warnings. In the versions of MinGW used by the author, this either had no effect, i.e., there were no warnings even without -w, or, if there were warnings, they came from Microsoft's DirectX source files.

MinGW Version Differences

The author used the November, 2011 release of MinGW during the final development of the code supplied, with the MinGW Developer Toolkit selected for inclusion during installation. Slightly different techniques were necessary with earlier versions of MinGW.

GDI+ headers are now included in the distribution, and do not have to be obtained from elsewhere, for example. These headers are in a "gdiplus" subfolder, though, which must be considered in constructing one's #include directives.

Also, it used to be possible to run the MinGW compiler without including "c:\mingw\bin" (or equivalent) in the search path. In the latest versions of MinGW, this will result in missing dependency errors when attempting to use "g++.exe".

Some of these earlier techniques were used in GDI+ Programming with MinGW and Direct3D Programming with MinGW, and the instructions given in these articles remain valid when the compiler actually recommended in those specific articles is used.

C++ Demonstrations

At a high level, the steps necessary to create a client application for "hexdll.dll" are the same in both C# and C++. In both cases, the DLL itself must be present alongside the client EXE at runtime (or, at least, in its search path). Similarly, in both C# and C++ clients, a sort of function prototype or stub declaration is inserted into the client code base, to represent the DLL functions. Once these preconditions are met, the DLL functions can be called exactly like the functions (in the case of C++) or methods (in C#) implemented in the client code.

In the C++ client code written here, these declarations are brought into the code base from a header file, "hexdll.h", using #include. This is a very typical way for C++ programs to share declarations, and to, in this way, expose interfaces to each other. The C++ declarations comprising the external interface of "hexdll.dll" are shown below. This is the core of "hexdll.h":

void HEXDLL systemhex(HDC hdc,int origx,int origy,int magn,int r,
      int g,int b,int coordx,int coordy,int penr,int peng,int penb,BOOL anti);
 
void HEXDLL hexdllstart();

void HEXDLL hexdllend();

These declarations are analogous to the C# declarations of systemhex(), hexdllstart(), and hexdllend(), shown earlier. The HEXDLL macro evaluates, when included in a client application, to __declspec(dllimport), a Windows-specific modifier for functions imported from a DLL. During the DLL build, HEXDLL evaluates to __declspec(dllexport); this is all managed using the preprocessor macro BUILDING_HEXDLL.

When included by a C++ compilation unit, the declarations shown above get wrapped in an extern "C" block (this happens whenever __cplusplus is defined). The extern "C" block ensures that the Cdecl calling convention is used, even in C++ programs, and that names are not mangled. Finally, all of this code is bracketed by an include guard, which keeps the declarations from being processed more than once by the preprocessor. Of course, the author of the client application needs only to #include the header file and call its functions.
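Put together, the arrangement just described typically looks something like the sketch below (a reconstruction for illustration, not the verbatim contents of "hexdll.h"):

#ifndef HEXDLL_H
#define HEXDLL_H

#include <windows.h>

// Export when building the DLL itself; import when building a client.
#ifdef BUILDING_HEXDLL
 #define HEXDLL __declspec(dllexport)
#else
 #define HEXDLL __declspec(dllimport)
#endif

// Give the functions plain C linkage (Cdecl, no name mangling) for C++ clients.
#ifdef __cplusplus
extern "C" {
#endif

void HEXDLL systemhex(HDC hdc,int origx,int origy,int magn,int r,
      int g,int b,int coordx,int coordy,int penr,int peng,int penb,BOOL anti);
void HEXDLL hexdllstart();
void HEXDLL hexdllend();

#ifdef __cplusplus
}
#endif

#endif // HEXDLL_H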

In neither (C++ / .NET) case does the client application code need to make any direct reference to the GDI+ libraries. Rather, they are included indirectly, as a part of "hexdll.dll".

Spinning Cube Demo

Three C++ example programs are provided with the article. First, "hex3d.exe" is a variation on the spinning cube demo shown in Direct3D Programming with MinGW. This is the application shown earlier in Figure 2. It is built from a single source code file, "hex3d.cpp". In this program, a static texture is not used for the cube surfaces. Instead, with each iteration of the rendering loop, a DC is obtained for the texture's main appearance surface, and is drawn on using systemhex(). Random shades of red are used for each hexagon, resulting in an appealing frame-by-frame flicker effect. The application exits after a certain number of frames have been rendered (by default, a thousand frames). This allows for easy determination of frame rate, by timing the demo's execution time.

The code to get a DC from a texture surface is a new addition to "hex3d.cpp", compared to its spinning cube predecessor. This task is performed by the function do2dwork, shown below this paragraph. This function is called with each iteration of the main loop, prior to the call to render().

void do2dwork()
{ 
 IDirect3DSurface9* surface=NULL;
 hexgridtexture->GetSurfaceLevel(0, &surface);

 HDC hdc;
 
 surface->GetDC(&hdc); 

 for(int hexcx=0;hexcx<TESSEL_ROWS;++hexcx)
 {
  for(int hexcy=0;hexcy<TESSEL_ROWS;++hexcy) 
  {  
   //Slight flicker to red in hexagons
   //Red values range from MIN_HEX_RED to 255
   int red=(rand()%(256-MIN_HEX_RED))+MIN_HEX_RED;     
 
   systemhex
   (
    hdc,
    TESSEL_ORIG_X,
    TESSEL_ORIG_Y,
    TESSEL_MAGNITUDE,
    red,0,0,
    hexcx,hexcy,
    red,0,0,0
   );       
  }
 }
 surface->ReleaseDC(hdc);
 surface->Release();
}

The first four lines of code in the function body above serve to get the necessary DC handle for systemhex(). The loop immediately after that is very similar in its implementation to the C# loop from MakeHex(). The color randomization code in the loop body is new, but straightforward. As is typical of C++ compared to C#, the final two statements above clean up resources.

Like "hexdotnet.exe", "hex3d.exe" expects "hexdll.dll" to be present in the current search path at runtime. In addition, it requires the file "hex3d.png" to be present. This contains appearance information for the static texture applied to the top and bottom of the demo solid.

"Hexplane.exe"

This demonstration program creates an illusion of flight in 3D space, above a flat terrain covered by a hex tessellation. The program is built from a single source code file, "hexplane.cpp". It is shown in action in Figure 1, near the top of the article. In this demo, flight takes place in the positive "Z" direction (forward), with rotation about the "Z" axis occurring throughout the flight. Movement continues at an accelerating (but limited) rate until shortly after the terrain below passes out of view. At that point, the demo restarts. The sky is simulated by applying a horizon image to a rectangular solid off in the distance. Like the spinning cube demo, "hexplane.exe" exits after a set number of frames have been rendered.

In many ways, this demo is a simplification of the spinning cube demo. Only two rectangular faces must be drawn, versus six in the spinning cube demo. The declaration of the eight vertices required to draw these two rectangular faces is shown below:

//
// These are our vertex declarations, for both of the rectangular faces
//  being drawn.
//
MYVERTEXTYPE demo_vertices[] =
{
 { -SKY_SIZE,  SKY_SIZE, SKY_DISTANCE, 0, 0, -1, 0, 0 },         // Sky face  
 {  SKY_SIZE,  SKY_SIZE, SKY_DISTANCE, 0, 0, -1, 1, 0 },
 { -SKY_SIZE, -SKY_SIZE, SKY_DISTANCE, 0, 0, -1, 0, 1 },
 {  SKY_SIZE, -SKY_SIZE, SKY_DISTANCE, 0, 0, -1, 1, 1 },

 { -GROUND_SIZE, -GROUND_DEPTH,  GROUND_SIZE, 0, 1, 0, 0, 0 },    // Ground face
 {  GROUND_SIZE, -GROUND_DEPTH,  GROUND_SIZE, 0, 1, 0, 1, 0 },
 { -GROUND_SIZE, -GROUND_DEPTH, -GROUND_SIZE, 0, 1, 0, 0, 1 },
 {  GROUND_SIZE, -GROUND_DEPTH, -GROUND_SIZE, 0, 1, 0, 1, 1 },
};

This declaration consists of eight distinct vertex structures, each occupying its own line in the code. Each of these begins with "X", "Y", and "Z" coordinates. These coordinates are defined using preprocessor constants that hint at their purposes. More details about the actual design of 3D solids is available in Direct3D Programming with MinGW; the ground face is roughly analogous to the top face of the original spinning cube, and the sky face is analogous to its front.

The remainder of the initializers are explained by the declaration of MYVERTEXTYPE, the custom vertex struct used by both of the Direct3D demo programs presented here. This declaration is shown below:

struct MYVERTEXTYPE {FLOAT X, Y, Z; D3DVECTOR NORMAL; FLOAT U, V;};

Note that immediately after the coordinates comes the normal vector, followed by 2D point (U,V). The normal vector extends outward into space from the solid, and is perpendicular to the face; this is necessary for lighting purposes. For the ground face, the normal vectors are <0,1,0>, i.e., a vector sticking straight up in the "Y" dimension. For the sky face, the normal vectors point at the user, i.e., in the negative "Z" direction. They thus have a value of <0,0,-1>.

Point (U,V) maps the vertex to a point on the 2D surface of whatever texture is applied to it. The texture's 2D coordinate system has "U" increasing from left to right, and "V" increasing from top to bottom. Because both rectangular faces are defined as triangle strips, a criss-cross pattern is evident in (U,V), as well as in the "X", "Y", and "Z" coordinates themselves; the vertices do not go around the rectangle from vertex 0, to 1, to 2, to 3; rather, they cross over the rectangle in diagonal fashion between vertex 1 and vertex 2. This is consistent with Direct3D's general expectation that solids be comprised of triangular facets.

Both textures used have a static appearance. As a result, anti is set to 1; because the hex tessellation is drawn just once, there is no real performance penalty associated with this improvement. There is still a function do2dwork(), as was seen in "hex3d.cpp", but it is called only once, before the first frame is rendered, to set up the static texture appearance. The code for this function is shown below:

void do2dwork()
{
 IDirect3DSurface9* surface=NULL;
 HDC hdc;

 planetexture->GetSurfaceLevel(0, &surface);
 surface->GetDC(&hdc);

 for(int hexcx=0;hexcx<TESSEL_ROWS;++hexcx)
  for(int hexcy=0;hexcy<TESSEL_ROWS;++hexcy)
  {
   switch(rand()%4)
   {
    case 0:
     //255 means full color for R, G or B
     systemhex(hdc,TESSEL_ORIG_X,TESSEL_ORIG_Y,
               TESSEL_MAGNITUDE,255,0,0,hexcx,hexcy,255,0,0,1); 
     break;
    case 1:
     systemhex(hdc,TESSEL_ORIG_X,TESSEL_ORIG_Y,
               TESSEL_MAGNITUDE,0,255,0,hexcx,hexcy,0,255,0,1);
     break;
    case 2:
     systemhex(hdc,TESSEL_ORIG_X,TESSEL_ORIG_Y,
               TESSEL_MAGNITUDE,0,0,255,hexcx,hexcy,0,0,255,1);
     break;
    case 3:
     break;
   }
 }
 surface->ReleaseDC(hdc);
 surface->Release();
}

As in "hex3d.cpp", the function begins by obtaining a handle to a DC for the surface's appearance. Again, a tessellation of fixed size is drawn. Here, the randomization component is different; either a red, green, or blue hex can be drawn, or no hex at all can be drawn for a given system coordinate. This allows a default appearance, dictated by file "hexplane.png", to show through. This default appearance is loaded from "hexplane.png" earlier in the startup sequence using a call to D3DXLoadSurfaceFromFile. Preprocessor constants TESSEL_ORIG_X, TESSEL_ORIG_Y, and TESSEL_MAGNITUDE define the coordinate system used for the hex terrain; these were tuned to yield hexagons of an acceptable size, and to achieve full coverage of the ground surface. In particular, slightly negative values are used for TESSEL_ORIG_X and TESSEL_ORIG_Y, to avoid leaving unfilled space around the top and left edges of the tessellation.

"Hex2d.exe"

This demo creates a high-resolution, 8.5" x 11.0" bitmap file. The executable shows a modal message box with the message "DONE!" after it has finished creating the output bitmap file. The bitmap is completely covered by a black and white hex tessellation, drawn with anti-aliasing enabled. If printed, the result could be useful for a board or pen-and-paper game built around a hexagonal coordinate system. This program is built from source code file "hex2d.cpp".

Unlike the other two C++ demos, DirectX is not used here. Rather, GDI and Win32 API calls only are used, in conjunction with calls into "hexdll.dll", to achieve the desired result. Specifically, a starting BITMAP is created. A DC is then obtained, for drawing onto this BITMAP. This DC is passed to systemhex() repeatedly, in a nested for loop, to draw the hex tessellation. Finally, the resultant appearance data after drawing must be written from memory out to a properly formatted bitmap file. This last step in particular requires a significant amount of new low-level code compared to the two 3D C++ demos.

The series of steps outlined in the last paragraph are mostly executed directly from main(). After declaring some local variables, main() begins as shown below:

hexdllstart();

//Delete "out.bmp"
synchexec("cmd.exe","/c del out.bmp");
//Make blank "temp.bmp"
synchexec("cmd.exe","/c copy blank.bmp temp.bmp");

//Modify TEMP.BMP...

hbm = (HBITMAP) LoadImage(NULL, "temp.bmp", IMAGE_BITMAP, 0, 0,
 LR_LOADFROMFILE | LR_CREATEDIBSECTION);
 
if(hbm==NULL) //Error
{
    MessageBox(0,"BITMAP ERROR","Hex2D.exe",
               MB_APPLMODAL|MB_SETFOREGROUND);
    return 1;
}

The two calls to synchexec() (a wrapper for ShellExecuteEx()) serve to delete "out.bmp", which is the program output file, and then to create working bitmap file "temp.bmp". Note that this working file is a copy of "blank.bmp", which is a plain white bitmap having 16-bit color depth (like the output bitmap). In the application code as provided, this is just a starting point, which is completely overwritten using systemhex calls.

The main() function continues as shown below:

static BITMAP bm;
bm.bmBits=bigbuff;

GetObject
(
  (HGDIOBJ)hbm,     // handle to graphics object of interest
  sizeof(BITMAP),   // size of buffer for object information
  (LPVOID)&bm   // pointer to buffer for object information
);

This code snippet takes the HBITMAP value hbm, which is a pointer-like identifier for a structure held by Windows, and converts it into a BITMAP object proper, present in the static storage of "hex2d.cpp". Getting the actual BITMAP structure (vs. an HBITMAP) is useful as a way to access properties like width and height using the "dot" operator. Variable bigbuff, which is declared with a static size equal to the known memory requirements of the high-resolution bitmap, holds the local copy of the BITMAP appearance information.

Next, main() continues with the code shown below:

hdc=CreateCompatibleDC(hdctemp=GetDC(0));
ReleaseDC(0,hdctemp);
SelectObject(hdc,hbm);

The series of calls shown above first creates a new and independent DC, as opposed to one obtained for a control or window. The DC created is compatible with the desktop (HWND zero), since there is no app main window DC to pass instead. The code then associates this DC, and the drawing about to happen, with hbm. Now, with this relationship established, the actual hexagon drawing can take place, with the newly created DC passed as the first parameter to systemhex():

for(int ccx=0;ccx<HEX_GRID_COLS;++ccx)
{
  for(int ccy=0;ccy<HEX_GRID_ROWS ;++ccy)
  {
   systemhex( hdc,
    X_ORIG,Y_ORIG, 
    HEX_RADIUS, 
    BRUSH_RED,BRUSH_GREEN,BRUSH_BLUE, 
    ccx,ccy, 
    PEN_RED,PEN_GREEN,PEN_BLUE,
    1 );
  }
}

This code fragment is very reminiscent of the earlier demos. Note that the last parameter is 1, indicating that anti-aliasing is enabled. All of the other parameters are constants which, as before, were tweaked by the author, based on observation, to yield complete coverage of the target surface.

The remainder of main() writes out the image identified by hbm to a ".bmp" file. This is a somewhat tedious process, which is already well-summarized elsewhere online. One noteworthy addition made for this application is that DPI is explicitly set to 192, using the bit of code presented below. Note that the actual setting involves the somewhat more obscure terminology "pels per meter". Application constant IMG_PELS_PER_METER contains the correct value of 7,560 pels per meter:

lpbi->bmiHeader.biYPelsPerMeter = IMG_PELS_PER_METER;
lpbi->bmiHeader.biXPelsPerMeter = IMG_PELS_PER_METER;

Several online sources simply set these values to 0. The author wished for a high-resolution, printable image of the correct 8.5" x 11.0" size, though, so setting DPI (or "pels per meter") correctly was deemed necessary.
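For those checking the arithmetic: 192 dots per inch multiplied by roughly 39.37 inches per meter comes to about 7,559 pels per meter, which the code rounds to the 7,560 used here.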

Library Implementation

Many of the calculations required to draw hexagons will involve real numbers. In order to maximize the accuracy of these computations, and to minimize the number of typecast operations necessary, systemhex begins by converting all of its pixel parameters into doubles, and passing them to an inner implementation function:

void systemhex(HDC hdc,int origx,int origy,int magn,int r,
     int g,int b,int coordx,int coordy,int pr,int pg,int pb,BOOL anti)
{
 innerhex(hdc,(double)origx,(double)origy,(double)magn,r,
      g, b,coordx,coordy,pr,pg, pb,anti);
}

This inner function translates coordx and coordy (hex system coordinates) into actual screen coordinates. In doing so, it largely just enforces the arbitrary decisions made by the author in designing the coordinate system. Its if statement, for example, ensures that hexagons at odd "X" coordinates are located slightly lower in the "Y" dimension than those at even "X" coordinates, as is the stated convention of "hexdll.dll":

void innerhex(HDC hdc,double origx,double origy,double magn,int r,
     int g,int b,int coordx,int coordy,int pr,int pg,int pb,BOOL anti)
{ 
 //Odd X translates drawing up and left a bit 
 if(coordx%2)
  abstracthex( hdc, 
   origx+((double)coordx)*(magn+magn*COS_HEX_ANGLE), 
   origy+((double)coordy+0.5)*(2.0*magn*SIN_HEX_ANGLE), 
   magn, r, g, b,pr,pg,pb,anti); 
 else 
  abstracthex( hdc, 
   origx+((double)coordx)*(magn+magn*COS_HEX_ANGLE), 
   origy+((double)coordy)*(2.0*magn*SIN_HEX_ANGLE), 
   magn, r, g, b,pr,pg,pb,anti); 
}

As shown above, the bottom-level function responsible for drawing hexagons in the internal implementation of "hexdll.dll" is another function called abstracthex(). This lower level function operates in terms of system coordinates (as opposed to the hexagon coordinates shown in Figure 3). The prototype of abstracthex() is shown below:

void abstracthex(HDC hdc,double origx,double origy,double magn,
     int r,int g,int b,int pr,int pg,int pb,BOOL anti)

Note that in performing this final translation into raster coordinates, the geometry of the hexagon must be considered in depth. Figure 6, below, is a useful aid to understanding this geometry:

HEXAGON GEOMETRY

Figure 6: Hexagon Geometry

The diagram above gives all of the dimensions necessary to implement abstracthex(). The leftmost vertex of the hexagon is, by definition, located at (x,y). This vertex is the first vertex drawn by the abstracthex() function. From there, drawing moves down and right to the next vertex. As shown in Figure 6, the 60° angle is key to these calculations. We can view any side of the hexagon as the hypotenuse of a 30-60-90 triangle. The triangle constructed using dotted lines in Figure 6 is an example of one of these 30-60-90 triangles. The other sides of such a triangle measure cos(60°) times the hypotenuse length (for the shorter side) and sin(60°) times the hypotenuse length (for the longer side). Here, the hypotenuse has length magn, and the two sides other than the hypotenuse therefore have lengths of cos(60°)*magn and sin(60°)*magn. The actual measurement shown in Figure 6 is negative, since positive "Y" movement in the Direct3D texture coordinate system is down.

As shown in the picture above, the shorter of these two triangle sides approximates the movement from the first vertex drawn to the second in the "X" dimension. Similarly, the longer of these two sides approximates the movement from the first vertex drawn to the second in the "Y" dimension. As we move from the first vertex drawn at (x,y) to the next vertex, we therefore move cos(60°)*magn pixels in the "X" dimension and sin(60°)*magn in the "Y" dimension. The coordinate of this second vertex is thus (x+cos(60°)*magn, y+sin(60°)*magn).

The next vertex drawn is the one directly to the right of the vertex just drawn. Because the length of the side between these two is magn, the third coordinate is located at (x+cos(60°)*magn+magn, y+sin(60°)*magn).
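As a concrete example, with magn = 10 the horizontal step to the second vertex is cos(60°)*10 = 5 pixels and the vertical step is sin(60°)*10 ≈ 8.66 pixels; a hexagon whose leftmost vertex sits at (100,100) therefore has its second vertex at (105, 108.66) and its third at (115, 108.66).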

Instead of passing these coordinate expressions to GDI/GDI+ as shown above, though, the code provided uses a system of running totals, in an effort to minimize repeated calculations. Near the top of abstracthex(), the following initializations are present:

double cham=COS_HEX_ANGLE*magn;
double sham=SIN_HEX_ANGLE*magn;
double opx=(x+cham);   //Second vertex's "X" location
double opy=(y+sham);   //Hex bottom "Y" location
double opm=(opx+magn); //Third vertex's "X" location
double oms=(y-sham);   //Hex top "Y" location

After the execution of this code, the first vertex drawn will have coordinates (x,y). The second will be located at (opx,opy), and the third at (opm,opy). The fourth vertex drawn, at the extreme right side of the hexagon, is just a bit further to the right, at (opm+cham,y). The drawing of the fifth vertex moves back toward the left and up, to (opm,oms). Finally, we move back magn pixels to the left, and draw the sixth vertex at (opx,oms).

Depending on whether or not anti is true, either GDI or GDI+ will be used for the actual drawing operations. In either case, a data structure holding all of the vertex coordinates, in drawing order, is first constructed. For GDI, this is an array of POINT structures, whose construction is shown below:

POINT hex1[6];

//Start hex at origin... leftmost point of hex
hex1[0].x=(int)(x+0.5);         
hex1[0].y=(int)(y+0.5);

//Move [ cos(theta) , sin(theta) ] units in positive (down/right) direction
hex1[1].x=(int)(opx+0.5);  
hex1[1].y=(int)(opy+0.5);

//Move ((0.5) * hexwidth) more units right, to make "bottom" of hex
hex1[2].x=(int)(opm+0.5);
hex1[2].y=(int)(opy+0.5);

//Move to vertex opposite origin... Y is same as origin
hex1[3].x=(int)(opm+cham+0.5);
hex1[3].y=(int)(y+0.5);

//Move to right corner of hex "top"
hex1[4].x=(int)(opm+0.5);
hex1[4].y=(int)(oms+0.5);

//Complete the "top" side of the hex
hex1[5].x=(int)(opx+0.5);
hex1[5].y=(int)(oms+0.5);

Note that the addition of 0.5 to each term serves to achieve proper rounding; otherwise, the decimal portion of each floating point value (x, y, opx, etc.) would simply be truncated.
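For instance, a computed coordinate of 3.7 becomes (int)(3.7 + 0.5) = (int)4.2 = 4, whereas the bare cast (int)3.7 would yield 3.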

If GDI+ is used, an array of PointF structures is built instead. These structures use floating point coordinates, and no rounding or typecasting is necessary. Their declaration is shown below:

PointF myPointFArray[] = 
{ 
  //Start hex at origin... leftmost point of hex
  PointF(x, y),

  //Move [ cos(theta) , sin(theta) ] units in positive (down/right) direction
  PointF((opx), (opy)),

  //Move ((0.5) * hexwidth) more units right, to make "bottom" of hex
  PointF((opm), (opy)),

  //Move to vertex opposite origin... Y is same as origin
  PointF(opm+cham, y),

  //Move to right corner of hex "top"
  PointF((opm), (oms)),

  //Complete the "top" side of the hex
  PointF((opx), (oms))
};

If GDI is in use, the vertex data structure gets passed to a function named Polygon. The SelectObject API is first used to select a pen with the requested outline color, and then to select a brush with the requested interior color. This series of actions results in a polygon with the requested outline and interior colors.

Under GDI+, two calls are necessary to achieve the same result, one to DrawPolygon() and one to FillPolygon(). It is once again necessary to create both a pen and a brush, with the first of these getting passed to DrawPolygon() and the second to FillPolygon(). It should be noted that the necessity of two distinct function calls here plays some role in the relatively slow performance obtained using GDI+. However, the author made a point of running tests with a single call to FillPolygon() only, and GDI+ was still much slower than GDI.
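As a rough sketch of the two code paths just described (this is not the actual hexdll listing; the hex1 and myPointFArray arrays follow the snippets above, and the colour variables plus the windows.h/gdiplus.h setup are assumed):

// GDI path: select a pen (outline) and a brush (interior), then one Polygon call.
HPEN    pen      = CreatePen(PS_SOLID, 1, RGB(r, g, b));
HBRUSH  brush    = CreateSolidBrush(RGB(pr, pg, pb));
HGDIOBJ oldPen   = SelectObject(hdc, pen);
HGDIOBJ oldBrush = SelectObject(hdc, brush);
Polygon(hdc, hex1, 6);           // outlines and fills in a single call
SelectObject(hdc, oldBrush);
SelectObject(hdc, oldPen);
DeleteObject(brush);
DeleteObject(pen);

// GDI+ path: two calls, one to fill the interior and one to draw the outline.
Gdiplus::Graphics graphics(hdc);
graphics.SetSmoothingMode(Gdiplus::SmoothingModeAntiAlias);
Gdiplus::SolidBrush gdipBrush(Gdiplus::Color(pr, pg, pb));
Gdiplus::Pen        gdipPen(Gdiplus::Color(r, g, b));
graphics.FillPolygon(&gdipBrush, myPointFArray, 6);
graphics.DrawPolygon(&gdipPen, myPointFArray, 6);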

Conclusion

The work presented here led the author to several conclusions about recent versions of Windows, its APIs, and its rendering architecture. GDI, of course, is much more efficient than GDI+. This should be no surprise, given that much of GDI was originally written to work well on the relatively primitive hardware that ran the earliest versions of Windows.

GDI+ is useful primarily because of its anti-aliasing capability. It also offers a cleaner interface than GDI, e.g. in the area of memory management. This OO interface comes at a significant cost, though. Comparable operations are much slower in GDI+ than in GDI, even with anti-aliasing disabled.

While both are imperfect, GDI and GDI+ do seem to complement each other well. In the demonstration programs provided, GDI+ works well for generating a high-quality printable image, and this, fortunately, is not a task that needs to happen with incredible quickness anyway. GDI, on the other hand, provides the high level of speed and efficiency necessary for the dynamic texturing demo ("hex3d.exe"), and in this arena its lack of anti-aliasing will usually go unnoticed. The texture will be moving quickly at runtime, and will also get passed through the Direct3D interpolation filters necessary to scale the texture for each frame. Whatever jagged edges GDI might generate compared to GDI+ are quite likely lost in the translation and animation process.

Finally, some conclusions about combining Direct3D and GDI in the latest versions of Windows were reached by the author in preparing this work. While the changes in GUI rendering that came with Windows Vista were significant, nothing in them seems to rule out the possibility of using GDI to draw on Direct3D surfaces with a level of efficiency that is at least reasonably good. The process of obtaining the necessary DC remains quick and intuitive, and the GDI operations themselves seem mostly to be fast enough to keep up with Direct3D.

Footnotes

  1. MinGW did not support later versions of DirectX when the article was written. At least, DirectX 9 was the newest version for which headers were present in "c:\mingw\include\", and web searches yielded no obvious way to incorporate later versions. Microsoft's "August 2007" version of the DirectX SDK should therefore be installed in order to build the 3D demonstration programs. Detailed instructions for obtaining and installing the SDK are given in Direct3D Programming with MinGW.
  2. At present, only 32-bit client applications are supported. To support 64-bit clients would require the code for "hexdll.dll" to be rebuilt in a 64-bit development environment. While there is no reason to suspect that this would not work, 64-bit compilation has not been tested.

History

This is the second major version of this article. Compared to the first version, some improvements in formatting and clarity have been made. The code and binary files have not changed. 

License

This article, along with any associated source code and files, is licensed under The GNU General Public License (GPLv3)

 


Endogine sprite engine

 

 

http://www.codeproject.com/Articles/10768/Endogine-sprite-engine

 

Endogine sprite engine

By Jonas Beckeman | 17 Jul 2006 | Article
Sprite engine for D3D and GDI+ (with several game examples).


Some of the examples are included in the source.

Introduction

Endogine is a sprite and game engine, originally written in Macromedia Director to overcome the many limitations of its default sprite engine. I started working in my spare time on a C# version of the project about a year ago, and now it has far more features than the original - in fact, I've redesigned the architecture many times so it has little in common with the Director++ paradigm I tried to achieve. Moving away from those patterns has made the project much better. However, there are still a lot of things I want to implement before I can even call it a beta.

Some of the features are:

  • Easy media management.
  • Sprite hierarchy (parent/child relations, where children's Rotation, LOC etc. are inherited from the parent).
  • Behaviors.
  • Collision detection.
  • Plugin-based rendering (Direct3D, GDI+, Irrlicht is next).
  • Custom raster operations.
  • Procedural textures (Perlin/Wood/Marble/Plasma/others).
  • Particle systems.
  • Flash, Photoshop, and Director import (not scripts). NB: Only prototype functionality.
  • Mouse events by sprite (enter/leave/up/down etc. events).
  • Widgets (button, frame, window, scrollbar etc.). All are sprite-based, so blending/scaling/rotating works on widget elements as well.
  • Animator object (can animate almost any property of a sprite).
  • Interpolators (for generating smooth animations, color gradients etc.).
  • Sprite texts (each character is a sprite which can be animated, supports custom multicolor bitmap fonts and kerning).
  • Example game prototypes (Puzzle Bobble, Parallax Asteroids, Snooker/Minigolf, Cave Hunter).
  • IDE with scene graph, sprite/behavior editing, resource management, and debugging tools.
  • Simple scripting language, FlowScript, for animation and sound control.
  • Plug-in sound system (currently BASS or DirectSound).
  • New - Color editor toolset: Gradient, Painter-style HSB picker, Color swatches (supports .aco, .act, Painter.txt).
  • New - Classes for RGB, HSB, HSL, HWB, Lab, XYZ color spaces (with plug-in functionality). Picker that handles any 3-dimensional color space.


Some of the current GUI tools (editors, managers etc.).

Background

I had been developing games professionally in Macromedia Director for 10 years, and was very disappointed with the development of the product over the last 5 years. To make up for this, I wrote several graphical sub-systems, some very project-specific, but finally I designed one that fulfilled the more generic criteria I had for a 2D game creation graphics API. It was developed in Director's scripting language Lingo from autumn 2004 to spring 2005, and since then it has been a C# project.

It's a prototype

The current engine design is not carved in stone; I have already made several major changes during its development, and even more are planned.

Optimizations will have to wait until all functionality is implemented. The GDI+ mode is extremely slow, because I haven't ported my dirty rect system yet. The D3D full-screen mode has the best performance.

The code is poorly commented at this stage, as it is still possible I'll rewrite many parts. If there is a demand for documentation, I will create it as questions appear. For now, you can get a feel for how to use it, by investigating the Tests project.

Example projects

There are two solutions in the download. One is the actual engine, including a project called Tests which contains most of the examples and code. I chose to include it in the solution since it's a little bit easier to develop/debug the engine if the test is part of it, but that's not the way your own projects should be set up. The MusicGame project is closer to how it should be done.

There's also a simple tutorial text on how to set up your own project.

I wanted to have a simple but real-life testbed, so I'm creating a few game prototypes. Currently, they are Puzzle Bobble, a scrolling asteroid game, a golf/snooker game, CaveHunter, and Space Invaders. Other non-game tests are also available in the project. Turn them on and off by bringing the main window into focus and selecting items from the "Engine tests" menu.

Example walkthrough

Note: For using dialogs / editors, the Endogine.Editors.dll has to be present in the .exe folder. For sound, include Endogine.Audio.Bass.dll and the files in BASS.zip (shareware license).

To try out some of the examples in the Tests project, run the Tests solution and follow these steps:

  • Start in 3D/MDI mode (default).
  • Set focus on the Main window, so the Engine Tests menu appears in the MDI parent.
  • Select Particle System from the menu. The green "dialog" controls a few aspects of the particle system. The top slider is numParticles, the bottom the size. The buttons switch between color and size schemes. After playing around with it, turn it off by selecting Particle System again from the menu.
  • Select GDI+ random procedural, and also Font from the menu. This demonstrates a few ways to create procedural textures and manipulate bitmaps, as well as the support for bitmap fonts. Each letter sprite also has a behavior that makes it swing. Note that both are extremely slow - they're using my old systems. I'll upgrade them soon which will make them a hundred times faster.
  • Go to the SceneGraphViewer, and expand the nodes until you get to the Label sprite. Right-click it and select the Loc/Scale/Rot control. Try the different modes of changing the properties. Notice the mouse wrap-around feature.
  • Close the Loc/Scale/Rot control, go back to the SceneGraphViewer, expand the Label node. Right-click one of the letter sprites and select Properties (the LocY and Rotation properties are under the program control so they are hard to change here).
  • Click the Behaviors button to see which behaviors the sprite has. Mark ThisMovie.BhSwing, and click Remove to make it stop swinging. (Add... and Properties... aren't implemented yet).
  • Stop the program, and start it again, but now deselect MDI mode (because of .NET's keyboard problems in MDI mode).
  • Set focus to the Main window and select Puzzle Bobble from the menu. Player 1 uses the arrow keys to steer (Down to shoot, Up to center), player 2 uses AWSD. Select it again to turn it off. Note: due to changes in the ERectangle(F) classes, there's currently a bug which makes the ball stick in the wrong places. Will look into that later. (May be fixed.)
  • Select Snooker from the menu and click-drag on the white ball to shoot it. Open the Property Inspector for the topology sprite object, change Depth and Function to see how a new image is generated, use the Loc/Scale/Rot control to drag it around, and see how the balls react (buggy, sometimes!).
  • Select Parallax Scroll from the menu and try the Asteroids-like game. Arrow keys to steer, Space to shoot. Note: the Camera control's LOC won't work now, because the camera is managed by the player, centering the camera over the ship for each frame update. That's how the parallax works - just move the camera, and the parallax layers will automatically create the depth effect.
  • (Broken in the Feb '06 version) Go to the main menu, Edit, check Onscreen sprite edit, and right-click on a sprite on the screen, and you'll get a menu with the sprites under the mouse LOC. Select one of the sprites, and it should appear as selected in the SceneGraph. It doesn't update properly now, so you'll have to click the SceneGraph's toolbar to see the selection.

Using the code

Prerequisites: .NET 2.0 and DirectX 9.0c with Managed Extensions (Feb 06) for the demo executable, and DirectX SDK Feb 06 for compiling the source. You can download SharpDevelop 2.0 or Microsoft's Visual Studio Express C# for free, if you need a C# developer IDE. Read the README.txt included in the source for further instructions.

Note that Managed DirectX versions aren't backward or forward compatible. I think later versions will work if you recompile the project, but the demo executable needs MDX Feb 06.

I'm currently redesigning the workflow, and you would need a fairly long list of instructions in order to set up a new solution which uses Endogine, which I haven't written. The easiest way to get started is by looking at the Tests project. You can also have a look at the MusicGame solution, which is more like how a real project would be organized.

Most of the terminology is borrowed from Director. Some examples of sprite creation:

Sprite sp1 = new Sprite();
//Loads the first bitmap file named 
//Ball.* from the default directory
sp1.MemberName = "Ball";
sp1.Loc = new Point(30,30); //moves the sprite

Sprite sp2 = new Sprite();
//If it's an animated gif, 
//the sprite will automatically animate
sp2.MemberName = "GifAnim";
sp2.Animator.StepSize = 0.1; //set the animation speed
sp2.Rect = new RectangleF(50,50,200,200); //stretches and moves the sprite

Sprite sp2Child = new Sprite();
//same texture/bitmap as sp1's will be used 
//- no duplicate memory areas
sp2Child.MemberName = "Ball";
//now, sp2Child will follow sp2's location, 
//rotation, and stretching
sp2Child.Parent = sp2;

Road map

I'll be using the engine for some commercial projects this year. This means I'll concentrate on the features that are necessary for those games, and that I probably won't put much work into polishing the "common" feature set.

There will be a number of updates during the year, but I've revised my 1.0 ETA to late autumn '06. I expect the projects to put an evolutionary pressure on the engine, forcing refactoring and new usage patterns, but resulting in a much better architecture. Another side effect is documentation; I'll have to write at least a few tutorials for the other team members.

Currently, I put most of my spare time into PaintLab, an image editor, which is based on OpenBlackBox, an open source modular signal processing framework. They both use Endogine as their graphics core, and many GUI elements are Endogine editors, so it's common that I need to improve/add stuff in Endogine while working on them.

Goals for next update (unchanged)

I've started using Subversion for source control, which will make it easier to assemble new versions for posting, so updates with new tutorials and bug-fixes should appear more often. Some probable tasks for the next few months:

  • Switch between IDE/MDI mode and "clean" mode in runtime.
  • Clean up terminology, duplicate systems, continue transition to .NET 2.0.
  • Look into supporting XAML, harmonize with its terminology and patterns.
  • Fix IDE GUI, work on some more editors.

History

Update 2006-07-06

Again, other projects have taken most of my time - currently it's the OpenBlackBox/Endogine-based PaintLab paint program. Side effects for Endogine:

  • Color management tools: editors, color space converters, color palette file filters etc.
  • Refactoring: WinForms elements are being moved out of the main project into a separate DLL. Simplifies porting to mono or WPF, and makes the core DLL 50% smaller.
  • Improved Canvas, Vector3, Vector4, and Matrix4 classes.
  • Improved PSD parser.
  • More collision/intersection detection algorithms.
  • Several new or updated user controls.

Update 2006-03-15

I've focused on releasing the first version of OpenBlackBox, so most of the modifications have been made in order to provide interop functionality (OBB is highly dependent on Endogine).

  • Optimizations (can be many times faster when lots of vertices are involved).
  • Continued transition to .NET 2.0.
  • Requires DirectX SDK Feb '06.
  • Simple support for HLSL shaders and RenderToTexture.
  • Better proxies for pixel manipulation - Canvas and PixelDataProvider classes instead of the old PixelManipulator.
  • Extended interfaces to the rendering parts of Endogine. Projects (such as OpenBlackBox) can take advantage of the abstracted rendering API without necessarily firing up the whole Endogine sprite system.
  • Forgot to mention it last time, but there's a small tutorial included since the last update.

That's pretty much it. But don't miss OpenBlackBox, in time it will become a very useful component for developing applications with Endogine!

Update 2006-02-01

Some major architectural changes in this version, especially in the textures/bitmap/animation system. The transition isn't complete yet, so several usage patterns can co-exist and cause some confusion. Some utilities will be needed to make the animation system easier to use.

  • Moved to .NET 2.0. Note: refactoring is still in progress (e.g., generics isn't used everywhere yet).
  • The Isometric example has been removed. I've continued to work on it as a separate project, which I'll make available later if possible.
  • Renderers are now plug-ins, allowing for easier development/deployment. (Also started on a Tao-based OpenGL renderer, but lost my temper with its API.)
  • PixelManipulator - easy access to pixels regardless of whether the source is a bitmap or a texture surface.
  • Examples of how to use the PixelManipulator (adapted Smoke and Cellular Automata3 from processing.org - thanks Mike Davis and Glen Murphy for letting me use them).
  • C# version of Carlos J. Quintero's VB.NET TriStateTreeView - thanks Carlos. Added a TriStateTreeNode class.
  • Started on a plugin-based sound system. Currently, I've implemented two sound sub-systems: BASS and a simple DirectX player. OpenAL can be done, but BASS is cross-platform enough, and it has a better feature set. Later, I'll add DirectShow support.
  • Included a modified version of Leslie Sanford's MidiToolKit (supports multiple playbacks and has a slightly different messaging system). Thanks!
  • Flash parser/renderer has been restructured and improved. Can render basic shapes and animations.
  • System for managing file system and settings for different configurations (a bit like app.config but easier to use and better for my purposes).
  • RegEx-based file finder (extended search mechanism so you can use non-regex counters like [39-80] and [1-130pad:3] - the latter will find the string 001 to 130).
  • Helper for creating packed textures (tree map packing).
  • Abstraction layer for texture usage - the user doesn't have to care if a sprite's image comes from a packed texture or not.
  • The concept of Members is on its way out, replaced by PicRefs. Macromedia Director terminology in general will disappear over time from now on.
  • New, more abstracted animation system. Not fully implemented.
  • .NET scripting system with code injection. Currently only implemented for Boo, but will support other languages. Will be restructured later, with a strategy pattern for the different languages.
  • New vastly improved bitmap font system, both for rendering and creating fonts. Real-time kerning instead of precalculated kerning tables.
  • Localization/translation helper (for multi-language products).
  • A number of helper classes such as the IntervalString class (translates "-1-3,5-7" to and from the array [-1,0,1,2,3,5,6,7]).
  • Unknown number of bug fixes and optimizations.

Update 2005-10-10

Since the last update wasn't that exciting from a gaming POV, I decided to throw in a new version with a prototype isometric game. Totally R.A.D.

  • Isometric rendering, based on David Skoglund's game CrunchTime, including his graphics (see _readme.txt in the Isometric folder). Thanks, pal!
  • A* pathfinding algorithm adapted from an implementation by John Kenedy. Thank you!
  • Removed references to LiveInterface.
  • Added "resource fork" for bitmap files - an XML file with additional info such as number of animation frames and offset point.

Update 2005-10-04

OK, it's over a month late, and it doesn't include stuff you might have been waiting for, and the things I've been working on - mainly creating two script languages - aren't that useful in their current state (especially with no good examples of their use). I think it was worth putting some time into, as I'm certain FlowScript will become a great tool later on. Here's what I've got for this version:

  • Compiled using the August 2005 SDK.
  • Added Space Invaders prototype.
  • Added curves rendering to .swf import (it still renders strange images).
  • Reorganized the engine into a .dll.
  • Fixed mouse LOC <-> screen LOC error.
  • Fixed transparency / alpha / color-key issues.
  • Map (probably a bad name) class - like a SortedList, but accepts multiple identical "keys".
  • Node class, like XmlNode, but for non-text data.
  • Started preparing the use of 3D matrices for scale/LOC/rotation.
  • Removed DirectDraw support.
  • Basic sound support.
  • A badly sync'ed drum machine example.
  • FlowScript, a time-based scripting language, aimed at non-programmers for animation and sound control, based on:
  • EScript, simple scripting language based on reflection (no bytecode).
  • Simple CheckBox widget.

Update 2005-08-01

  • Compiled using the June 2005 SDK.
  • Sprite.Cursor property (works like Control.Cursor).
  • Simple XML editor.
  • .NET standardized serializing.
  • PropertyGrid instead of custom property editors.
  • VersatileDataGrid User Control (a new DataGrid control which implements functionality missing in .NET's standard DataGrid).
  • TreeGrid User Control - a bit like Explorer, but the right-hand pane is a VersatileDataGrid locked to the treeview.
  • Two new game prototypes: CaveHunter and Snooker/MiniGolf.
  • ResourceManager editor w/ drag 'n' drop to scene (and to SceneGraph viewer).
  • Better structure.
  • Transformer behavior - Photoshop-style sprite overlay for moving/scaling/rotating.
  • Scene XML import/export.
  • Director Xtra for exporting movies to Endogine XML scene format.
  • Import Photoshop documents - each layer becomes a sprite, layer effects become behaviors. Decoding of layer bitmaps incomplete.
  • Import Flash swf files (rendering code incomplete).
  • BinaryReverseReader which reads bytes in reverse order (for .psd), and BinaryFlashReader which reads data with sub-byte precision (for .swf).
  • Extended EPoint(F) and ERectangle(F) classes. Note that Puzzle Bobble doesn't work properly after the latest changes, I'll take care of that later.

Update 2005-07-07

  • Camera node.
  • Parallax layers.
  • LOC/Scale control toolbox.
  • User Controls: ValueEdit (arrow keys to change a Point), JogShuttle (mouse drag to change a Point value by jog or shuttle method).
  • MDI mode crash fixed (thanks to adrian cirstei) - but keys still don't work in MDI mode.
  • Multiple Scene Graph windows.
  • Select sprites directly in scene.
  • Asteroids-type game with parallax layers.
  • Extended EPoint(F) and ERectangle(F) classes.

Update 2005-06-27

  • Optional MDI interface: editors and game window as MDI windows. (Problem: I get an error when trying to create the 3D device in a MDI window.)
  • Scene Graph: treeview of all sprites.
  • Sprite marker: creates a marker around the sprite which is selected in the scene graph.
  • Property Inspector: interface for viewing and editing sprite properties.
  • Sprite Behaviors: easy way to add functionality to sprites. The swinging letters animation is done with a behavior.
  • Behavior Inspector: add/remove/edit behaviors in runtime.
  • Inks use an enumeration instead of an int (ROPs.Multiply instead of 103).
  • Switched from MS' too simplistic Point(F)/Rectangle(F) classes to my own EPoint(F)/ERectangle(F), which have operator overloading and many more methods. Note that I've chosen to write them as classes, not structs - i.e., they're passed as references, not values.
  • Easier keyboard handling (assigns names to keys - makes it easier to let the user define keys, or to have multiple players on one keyboard).
  • Puzzle Bobble allows multiple players in each area.

You can read about my early thoughts about the Endogine concept, future plans, and see some Shockwave/Lingo demos here.

I have added an Endogine C#-specific page here, but it probably lags behind this page.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

About the Author

Jonas Beckeman
Web Developer, Sweden


Paint.NET

http://www.getpaint.net/index.html

 

About
Paint.NET is free image and photo editing software for computers that run Windows. It features an intuitive and innovative user interface with support for layers, unlimited undo, special effects, and a wide variety of useful and powerful tools. An active and growing online community provides friendly help, tutorials, and plugins.
It started development as an undergraduate college senior design project mentored by Microsoft, and is currently being maintained by some of the alumni that originally worked on it. Originally intended as a free replacement for the Microsoft Paint software that comes with Windows, it has grown into a powerful yet simple image and photo editor tool. It has been compared to other digital photo editing software packages such as Adobe® Photoshop®, Corel® Paint Shop Pro®, Microsoft Photo Editor, and The GIMP.

 

 

 

http://www.codeproject.com/Articles/9200/Writing-effect-plug-ins-for-Paint-NET-2-1-in-C

 

Writing effect plug-ins for Paint.NET 2.1 in C#

By Dennis C. Dietrich | 10 May 2005 | Article
This article is an introduction on how to create your own effect plug-ins for Paint.NET 2.1 in C#.

Introduction

Paint.NET 2.1 was released last week. Created to be a free replacement for the good old Paint that ships with every copy of Windows, it is very interesting for end users at large. But it is even more interesting for developers because of two reasons. First, it is open source. So if you like to study a few megabytes of C# code or how some architectural problems can be solved, go and get it. Second, the application provides a simple but appealing interface for creating your own effect plug-ins. And that's what this article is all about (if you're searching for some fancy effect algorithm, go somewhere else as the effect used in the article is quite simple).

Getting started

The first thing you need to do is to get the Paint.NET source code. Besides being the source code, it also serves as its own documentation and as the Paint.NET SDK. The solution consists of several projects. However, the only interesting ones when developing Paint.NET effect plug-ins are the PdnLib library which contains the classes we will use for rendering our effect and the Effects library which contains the base classes for deriving your own effect implementations.

The project basics

To create a new effect plug-in, we start with creating a new C# Class Library and add references to the official release versions of the PdnLib (PdnLib.dll) and the PaintDotNet.Effects library (PaintDotNet.Effects.dll). The root namespace for our project should be PaintDotNet.Effects as we're creating a plug-in that is supposed to fit in seamlessly. This is, of course, not limited to the namespace but more of a general rule: when writing software for Paint.NET, do as the Paint.NET developers do. The actual implementation requires deriving three classes:

  1. Effect is the base class for all Paint.NET effect implementations and it's also the interface Paint.NET will use for un-parameterized effects. It contains the method public virtual void Render(RenderArgs, RenderArgs, Rectangle) which derived un-parameterized effects override.
  2. Most of the effects are parameterized. The EffectConfigToken class is the base class for all specific effect parameter classes.
  3. And finally, as parameterized effects most likely will need a UI, there is a base class for effect dialogs: EffectConfigDialog.

Implementing the infrastructure

Now, we will take a look at the implementation details on the basis of the Noise Effect (as the name implies, it simply adds noise to an image). By the way, when using the sources provided with this article, you will most likely need to update the references to the Paint.NET libraries.

The effect parameters

As I said before, we need to derive a class from EffectConfigToken to be able to pass around our effect parameters. Given that our effect is called Noise Effect and that we want to achieve consistency with the existing sources, our parameter class has to be named NoiseEffectConfigToken.

public class NoiseEffectConfigToken : EffectConfigToken

There is no rule about what your constructor has to look like. You can use a simple default constructor or one with parameters. From Paint.NET's point of view, it simply does not matter because (as you will see later) the class derived from EffectConfigDialog is responsible for creating an instance of the EffectConfigToken. So you do not necessarily need to do anything other than provide a non-private constructor.

public NoiseEffectConfigToken() : base()
{

}

However, our base class implements the ICloneable interface and also defines a pattern how cloning should be handled. Therefore, we need to create a protected constructor that expects an object of the class' own type and uses it to duplicate all values. We then have to override Clone() and use the protected constructor for the actual cloning. This also means that the constructor should invoke the base constructor but Clone() must not call its base implementation.

protected NoiseEffectConfigToken(NoiseEffectConfigToken copyMe) : base(copyMe)
{

  this.frequency      = copyMe.frequency;
  this.amplitude      = copyMe.amplitude;
  this.brightnessOnly = copyMe.brightnessOnly;

}

public override object Clone()
{
  return new NoiseEffectConfigToken(this);
}

The rest of the implementation details are again really up to you. Most likely, you will define some private fields and corresponding public properties (possibly with some plausibility checks).

The UI to set the effect parameters

Now that we've got a container for our parameters, we need a UI to set them. As mentioned before, we will derive the UI dialog from EffectConfigDialog. This is important as it helps to ensure consistency of the whole UI. For example, in Paint.NET 2.0, an effect dialog is by default shown with an opacity of 0.9 (except for sessions over terminal services). If I don't use the base class of Paint.NET and the developers decide that an opacity of 0.6 is a whole lot cooler, my dialog would all of a sudden look "wrong". Because we still try to be consistent with the original code, our UI class is called NoiseEffectConfigDialog.

Again, you have a lot of freedom when it comes to designing your dialog, so I will again focus on the mandatory implementation details. The effect dialog is entirely responsible for creating and maintaining effect parameter objects. Therefore, there are three virtual base methods you must override. And, which might be unexpected, don't call their base implementations (it seems that earlier versions of the base implementations would even generally throw exceptions when called). The first is InitialInitToken(), which is responsible for creating a new concrete EffectConfigToken and storing a reference to it in the protected field theEffectToken (which will implicitly cast the reference to an EffectConfigToken reference).

protected override void InitialInitToken()
{

  theEffectToken = new NoiseEffectConfigToken();

}

Second, we need a method to update the effect token according to the state of the dialog. Therefore, we need to override the method InitTokenFromDialog().

protected override void InitTokenFromDialog()
{

  NoiseEffectConfigToken token = (NoiseEffectConfigToken)theEffectToken;
  token.Frequency      = (double)FrequencyTrackBar.Value / 100.0;
  token.Amplitude      = (double)AmplitudeTrackBar.Value / 100.0;
  token.BrightnessOnly = BrightnessOnlyCheckBox.Checked;

}

And finally, we need to be able to do what we did before the other way round. That is, updating the UI according to the values of a token. That's what InitDialogFromToken() is for. Unlike the other two methods, this one expects a reference to the token to process.

protected override void InitDialogFromToken(EffectConfigToken effectToken)
{

  NoiseEffectConfigToken token = (NoiseEffectConfigToken)effectToken;

  if ((int)(token.Frequency * 100.0) > FrequencyTrackBar.Maximum)
    FrequencyTrackBar.Value = FrequencyTrackBar.Maximum;
  else if ((int)(token.Frequency * 100.0) < FrequencyTrackBar.Minimum)
    FrequencyTrackBar.Value = FrequencyTrackBar.Minimum;
  else
    FrequencyTrackBar.Value = (int)(token.Frequency * 100.0);

  if ((int)(token.Amplitude * 100.0) > AmplitudeTrackBar.Maximum)
    AmplitudeTrackBar.Value = AmplitudeTrackBar.Maximum;
  else if ((int)(token.Amplitude * 100.0) < AmplitudeTrackBar.Minimum)
    AmplitudeTrackBar.Value = AmplitudeTrackBar.Minimum;
  else
    AmplitudeTrackBar.Value = (int)(token.Amplitude * 100.0);

  FrequencyValueLabel.Text = FrequencyTrackBar.Value.ToString("D") + "%";
  AmplitudeValueLabel.Text = AmplitudeTrackBar.Value.ToString("D") + "%";

  BrightnessOnlyCheckBox.Checked = token.BrightnessOnly;

}

We're almost done. What's still missing is that we need to signal the application when values have been changed and about the user's final decision to either apply the changes to the image or cancel the operation. Therefore, whenever a value has been changed by the user, call UpdateToken() to let the application know that it needs to update the preview. Also, call Close() when leaving the dialog and set the appropriate DialogResult. For example:

private void AmplitudeTrackBar_Scroll(object sender, System.EventArgs e)
{

  AmplitudeValueLabel.Text = AmplitudeTrackBar.Value.ToString("D") + "%";
  UpdateToken();

}

private void OkButton_Click(object sender, System.EventArgs e)
{

  DialogResult = DialogResult.OK;
  Close();

}

private void EscButton_Click(object sender, System.EventArgs e)
{

  DialogResult = DialogResult.Cancel;
  Close();

}

Implementing the effect

Now everything is in place to start the implementation of the effect. As I mentioned before, there is a base class for un-parameterized effects. The Noise Effect is parameterized but that will not keep us from deriving from Effect. However, in order to let Paint.NET know that this is a parameterized effect, we need to also implement the IConfigurableEffect interface which adds another overload of the Render() method. It also introduces the method CreateConfigDialog() which allows the application to create an effect dialog.

public class NoiseEffect : Effect, IConfigurableEffect

But how do we construct an Effect object, or in this case, a NoiseEffect object? This time, we have to follow the patterns of the application which means that we use a public default constructor which invokes one of the two base constructors. The first one expects the effect's name, its description, and an icon to be shown in the Effects menu. The second constructor, in addition, requires a shortcut key for the effect. The shortcut key, however, will only be applied to effects which are categorized as an adjustment. In case of a normal effect, it will be ignored (see chapter Effect Attributes for details on effects and adjustments). In conjunction with some resource management, this might look like this:

public NoiseEffect() : base(NoiseEffect.resources.GetString("Text.EffectName"),
  NoiseEffect.resources.GetString("Text.EffectDescription"),
  (Image)NoiseEffect.resources.GetObject("Icons.NoiseEffect.bmp"))
{

}

The only mandatory implementations we need are those that come with the implementation of the interface IConfigurableEffect. Implementing CreateConfigDialog() is quite simple as it does not involve anything but creating a dialog object and returning a reference to it.

public EffectConfigDialog CreateConfigDialog()
{

  return new NoiseEffectConfigDialog();

}

Applying the effect is more interesting but we're going to deal with some strange classes we may never have heard of. So let's first take a look at the signature of the Render() method:

public void Render(EffectConfigToken properties,
                   PaintDotNet.RenderArgs dstArgs,
                   PaintDotNet.RenderArgs srcArgs,
                   PaintDotNet.PdnRegion roi)

The class RenderArgs contains all we need to manipulate images; most importantly, it provides us with Surface objects which actually allow reading and writing pixels. However, beware not to confuse dstArgs and srcArgs. The object srcArgs (of course, including its Surface) deals with the original image. Therefore, you should never ever perform any write operations on those objects. But you will constantly read from the source Surface since, once you have made changes to the target Surface, nobody is going to reset them. The target (or destination) Surface is accessible via the dstArgs object. A pixel at a certain point can be easily addressed by using an indexer which expects x and y coordinates. The following code snippet, for example, takes a pixel from the original image, performs an operation, and then assigns the changed pixel to the same position in the destination Surface.

point = srcArgs.Surface[x, y];
VaryBrightness(ref point, token.Amplitude);
dstArgs.Surface[x, y] = point;

But that's not all. The region, represented by the fourth object roi, which the application orders us to manipulate, can have any shape. Therefore, we need to call a method like GetRegionScansReadOnlyInt() to obtain a collection of rectangles that approximate the drawing region. Furthermore, we should process the image line by line beginning at the top. These rules lead to a pattern like this:

public void Render(EffectConfigToken properties, RenderArgs dstArgs,
                   RenderArgs srcArgs, PdnRegion roi)
{

  /* Loop through all the rectangles that approximate the region */
  foreach (Rectangle rect in roi.GetRegionScansReadOnlyInt())
  {
    for (int y = rect.Top; y < rect.Bottom; y++)
    {
      /* Do something to process every line in the current rectangle */
      for (int x = rect.Left; x < rect.Right; x++)
      {
        /* Do something to process every point in the current line */
      }
    }
  }

}

The last interesting fact that should be mentioned is that the Surface class generally uses a 32-bit format with four channels (red, green, blue and alpha) and 8 bits per channel, where each pixel is represented by a ColorBgra object. Keep in mind that ColorBgra is actually a struct, so in order to pass an object of that type by reference, you have to use the ref keyword. Furthermore, the struct allows accessing each channel through a public field:

private void VaryBrightness(ref ColorBgra c, double amplitude)
{

  short newOffset = (short)(random.NextDouble() * 127.0 * amplitude);
  if (random.NextDouble() > 0.5)
    newOffset *= -1;

  if (c.R + newOffset < byte.MinValue)
    c.R = byte.MinValue;
  else if (c.R + newOffset > byte.MaxValue)
    c.R = byte.MaxValue;
  else
    c.R = (byte)(c.R + newOffset);

  if (c.G + newOffset < byte.MinValue)
    c.G = byte.MinValue;
  else if (c.G + newOffset > byte.MaxValue)
    c.G = byte.MaxValue;
  else
    c.G = (byte)(c.G + newOffset);

  if (c.B + newOffset < byte.MinValue)
    c.B = byte.MinValue;
  else if (c.B + newOffset > byte.MaxValue)
    c.B = byte.MaxValue;
  else
    c.B = (byte)(c.B + newOffset);

}

Effect Attributes

Now we've got our effect up and running. Is there something else we have to do? Well, in this case everything is fine. However, as every effect is different you might want to apply one of the three attributes that are available in the PaintDotNet.Effects namespace. First, there is the attribute EffectCategoryAttribute which is used to let Paint.NET know if the effect is an effect or an adjustment. The difference between those two is that effects are meant to perform substantial changes on an image and are listed in the Effects menu while adjustments only perform small corrections on the image and are listed in the submenu Adjustments in the menu Layers. Just take a look at the effects and adjustments that are integrated in Paint.NET to get a feeling for how to categorize a certain plug-in. The EffectCategoryAttribute explicitly sets the category of an effect by using the EffectCategory value which is passed to the attribute's constructor. By default, every effect plug-in which does not have an EffectCategoryAttribute is considered to be an effect (and therefore appears in the Effects menu) which is equivalent to applying the attribute as follows:

[EffectCategoryAttribute(EffectCategory.Effect)]

Of course, the enumeration EffectCategory contains two values and the second one, EffectCategory.Adjustment, is used to categorize an effect as an adjustment so that it will appear in the Adjustments submenu in Paint.NET.

[EffectCategoryAttribute(EffectCategory.Adjustment)]

Besides being able to categorize effects, you can also define your own submenu by applying the EffectSubMenu attribute. Imagine you created ten ultra-cool effects and now want to group them within the Effects menu of Paint.NET to show that they form a toolbox. Now, all you would have to do in order to put all those plug-ins in the submenu 'My Ultra-Cool Toolbox' within the Effects menu would be to apply the EffectSubMenu attribute to every plug-in of your toolbox. This of course can also be done with adjustment plug-ins in order to create submenus within the Adjustments submenu. However, there is one important restriction: because of the way effects are managed in Paint.NET, the effect name must be unique. This means that you can't have an effect called Foo directly in the Effects menu and a second effect which is also called Foo in the submenu 'My Ultra-Cool Toolbox'. If you try something like this, Paint.NET will call only one of the two effects no matter if you use the command in the Effects menu or the one in the submenu.

[EffectSubMenu("My Ultra-Cool Toolbox")]

Last but not least there is the SingleThreadedEffect attribute. Now, let's talk about multithreading first. In general, Paint.NET is a multithreaded application. That means, for example, that when it needs to render an effect, it will incorporate worker threads to do the actual rendering. This ensures that the UI stays responsive, and in case the rendering is done by at least two threads and Paint.NET is running on a multi-core CPU or a multiprocessor system, it also reduces the rendering time significantly. By default, Paint.NET will use as many threads to render an effect as there are logical processors in the system, with a minimum of two threads.

Processor(s)                              | physical CPUs | logical CPUs | Threads
------------------------------------------|---------------|--------------|--------
Intel Pentium III                         | 1             | 1            | 2
Intel Pentium 4 with hyper-threading      | 1             | 2            | 2
Dual Intel Xeon without hyper-threading   | 2             | 2            | 2
Dual Intel Xeon with hyper-threading      | 2             | 4            | 4

However, Paint.NET will use only one thread if the SingleThreadedEffect attribute has been applied, regardless of the number of logical processors. If the rendering is done by multiple threads, you have to ensure that the usage of any object in the method Render() is thread-safe. The effect configuration token is usually no problem (as long as you don't change its values, which is not recommended anyway) as the rendering threads get a copy of the token instance used by the UI. Paint.NET's own PdnRegion class is also thread-safe, so you don't have to worry about those objects. However, GDI+ objects like RenderArgs.Graphics or RenderArgs.Bitmap are not thread-safe, so whenever you want to use these objects to render your effect, you have to apply the SingleThreadedEffect attribute. You may also apply the attribute whenever you are not sure if your implementation is actually thread-safe or you simply don't want to ponder multi-threading. Although doing so will lead to decreased performance on multiprocessor systems and multi-core CPUs, you'll at least be literally on the safe side.

[SingleThreadedEffect]

Conclusion

Creating effect plug-ins for Paint.NET is not too difficult after all. The parts of the object model you need in order to do this are not very complex (trying this is an excellent idea for the weekend) and it even seems quite robust. Of course, this article does not cover everything there is to know about Paint.NET effect plug-ins but it should be enough to create your first own plug-in.

Acknowledgment

I'd like to thank Rick Brewster and Craig Taylor for their feedback and for proof-reading this article.

Change history

  • 2005-05-08: Added a note that shortcut keys are only applied to adjustments and a chapter about attributes.
  • 2005-01-06: Corrected a major bug in NoiseEffectConfigDialog.InitDialogFromToken(EffectConfigToken). The old implementation used the property EffectConfigDialog.EffectToken instead of the parameter effectToken.
  • 2005-01-03: Initial release.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

About the Author

Dennis C. Dietrich
Web Developer, Ireland


GPGPU on Accelerating Wave PDE

Sharing a source article I found while searching.

 

 

http://www.codeproject.com/Articles/369416/GPGPU-on-Accelerating-Wave-PDE

 

 

 

GPGPU on Accelerating Wave PDE

By | 20 Apr 2012 | Article
A Wave PDE simulation using GPGPU capabilities


Abstract

This article aims at exploiting GPGPU (GP2U) computation capabilities to improve physical and scientific simulations. In order for the reader to understand all the passages, we will gradually proceed in the explanation of a simple physical simulation based on the well-known Wave equation. A classic CPU implementation will be developed and eventually another version using GPU workloads is going to be presented.

Wave PDE

PDE stands for "Partial Differential Equation" and indicates an equation which contains one or more partial derivatives of an unknown function of several independent variables. The order of the PDE is usually defined as the order of the highest partial derivative appearing in it; the wave equation discussed below, for example, is a second-order PDE.

Usually a PDE of order n having m independent variables x_i, for i = 1, 2, …, m, is written in terms of an unknown function u(x_1, …, x_m) and its partial derivatives.

A compact form to express ∂u/∂x is u_x, and the same applies to ∂²u/∂x² (u_xx) and ∂²u/∂x∂y (u_xy).

The wave equation is a second order partial differential equation used to describe water waves, light waves, sound waves, etc. It is a fundamental equation in fields such as electromagnetics, fluid dynamics and acoustics. Its applications involve modeling the vibration of a string or the airflow dynamics resulting from an aircraft's movement.

We can derive the one-dimensional wave equation (i.e. the vibration of a string) by considering a flexible elastic string that is tightly bound between two fixed end-points lying on the x axis.

A guitar string has a restoring force that is proportional to how much it's stretched. Suppose that, neglecting gravity, we apply a y-direction displacement to the string (i.e. we're slightly pulling it). Let's consider only a short segment of it between x and x+Δx:

Let’s write down for the small red segment in the diagram above. We assume that the string has a linear density (linear density is a measure of mass per unit of length and is often used with one-dimensional objects). Recalling that if a one-dimensional object is made of a homogeneous substance of length L and total mass m the linear density is

we have m= .

The forces (as already stated we neglect gravity as well as air resistance and other tiny forces) are the tensions T at the ends. Since the string is actually waving, we can’t assume that the two T vectors cancel themselves: this is the force we are looking for. Now let’s make some assumptions to simplify things: first we consider the net force we are searching for as vertical (actually it’s not exactly vertical but very close)

Furthermore we consider the wave amplitude small. That is, the vertical component of the tension at the x+Δx end of the small segment of string is T multiplied by the slope of the string at that point.

The slope is dy/dx if we consider dx and dy as the horizontal and vertical components in the image above. Since we have considered the wave amplitude to be very tiny, we can assume that the sine of the string's angle is approximately equal to its slope. This greatly helps us: the total vertical force from the tensions at the two ends becomes the difference between T times the slope at x+Δx and T times the slope at x.

The equality becomes exact in the limit Δx → 0.

We see that y is a function of x, but it is a function of t as well: y=y(x,t). The standard convention for denoting differentiation with respect to one variable while the other is held constant (we're looking at the sum of forces at one instant of time) lets us write this net vertical force as T·(∂²y/∂x²)·Δx.

The final part is to use Newton's second law and put everything together: the sum of all forces, the mass (substituted with the linear density multiplied by the segment length, μ·Δx) and the acceleration (∂²y/∂t², because we're only moving in the vertical direction; remember the small amplitude approximation).

And we’re finally there: the wave equation. Using the spatial Laplacian operator, indicating the y function (depending on x and t) as u and substituting (a fixed constant) we have the common compact form

The two-dimensional wave equation is obtained by adding a second spatial term to the equation as follows
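Written out in standard notation, these one- and two-dimensional forms read:

$$\frac{\partial^2 u}{\partial t^2} = c^2\,\frac{\partial^2 u}{\partial x^2}
\qquad\text{and}\qquad
\frac{\partial^2 u}{\partial t^2} = c^2\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right) = c^2\,\nabla^2 u$$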

The constant c has dimensions of distance per unit time and thus represents a velocity. We won’t prove here that c is actually the velocity at which waves propagate along a string or through a surface (although it’s surely worth noting). This makes sense since the wave speed increases with tension experienced by the medium and decreases with the density of the medium.

In a physical simulation we need to take into account forces other than just the surface tension: the average amplitude of the waves on the surface diminishes in real-world fluids. We may therefore add a viscous damping force to the equation by introducing a force that acts in the opposite direction of the velocity of a point on the surface:
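Written out, this damped form (the equation the rest of the article discretizes, with k the same damping constant that appears in the code later on) is:

$$\frac{\partial^2 u}{\partial t^2} = c^2\,\nabla^2 u \;-\; k\,\frac{\partial u}{\partial t}$$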

where the nonnegative constant k represents the viscosity of the fluid (it controls how long it takes for the wave on the surface to calm down: a small k allows waves to exist for a long time, as with water, while a large k causes them to diminish rapidly, as for thick oil).

Solving the Wave PDE with finite difference method

To actually implement a physical simulation which uses the wave PDE we need to find a method to solve it. Let’s solve the last equation we wrote with the damping force

here t is the time, x and y are the spatial coordinates of the 2D wave, c² is the square of the wave speed and k is the damping factor. u=u(t,x,y) is the function we need to calculate: generally speaking it could represent various effects like a change of height of a pool's water, electric potential in an electromagnetic wave, etc.

A common method to solve this problem is to use the finite difference method. The idea behind it is basically to replace derivatives with finite differences which can be easily calculated in a discrete algorithm. If there is a function f=f(x) and we want to calculate its derivative with respect to the x variable, we can write f'(x) = lim(h→0) [ f(x+h) - f(x) ] / h;

if h is a small discrete value (which is not zero), then we can approximate the derivative with f'(x) ≈ [ f(x+h) - f(x) ] / h;

the error of such an approach could be derived from Taylor’s theorem but that isn’t the purpose of this paper.

A finite difference approach uses one of the following three approximations to express the derivative

  • Forward difference: f'(x) ≈ [ f(x+h) - f(x) ] / h

  • Backward difference: f'(x) ≈ [ f(x) - f(x-h) ] / h

  • Central difference: f'(x) ≈ [ f(x+h) - f(x-h) ] / (2h)

Let’s stick with the latter (i.e. central difference); this kind of approach can be generalized, so if we want to calculate the discrete version of f''(x) we could first write

and then calculate f''(x) as follows

The same idea is used to estimate the partial derivatives below
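In standard notation (a reconstruction using the steps Δt, Δx and Δy that appear in the code below), the second-difference formula and its application to u are:

$$f''(x) \approx \frac{f(x+h) - 2f(x) + f(x-h)}{h^2}$$

$$\frac{\partial^2 u}{\partial t^2} \approx \frac{u_{t+\Delta t} - 2u_t + u_{t-\Delta t}}{\Delta t^2},\qquad
\frac{\partial^2 u}{\partial x^2} \approx \frac{u_{x+\Delta x} - 2u_x + u_{x-\Delta x}}{\Delta x^2},\qquad
\frac{\partial^2 u}{\partial y^2} \approx \frac{u_{y+\Delta y} - 2u_y + u_{y-\Delta y}}{\Delta y^2}$$

The first time derivative in the damping term can be discretized with a backward difference, $\partial u/\partial t \approx (u_t - u_{t-\Delta t})/\Delta t$, which is consistent with the q = 2 - kdt and r = -1 + kdt substitutions noted in the pseudo-code below.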

Let’s get back to the wave equation. We had the following

let’s apply the partial derivative formula just obtained to it (remember that u=u(t,x,y), that is, u depends by t,x and y)

This is quite a long expression. We just substituted the second derivatives and the first derivatives with the formulas we got before.

If we now consider , we are basically forcing the intervals where the derivatives are calculated to be the same for both the x and the y directions (and we can greatly simplify the expression):

To improve the readability, let us substitute some of the terms with letters

and we have
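Written out with these substitutions, the discretized update and its simplified form are:

$$u_{t+1,x,y} = (2 - k\Delta t)\,u_{t,x,y} + (k\Delta t - 1)\,u_{t-1,x,y} + \frac{c^2\Delta t^2}{\Delta x^2}\big(u_{t,x+1,y} + u_{t,x-1,y} + u_{t,x,y+1} + u_{t,x,y-1} - 4u_{t,x,y}\big)$$

$$u_{t+1,x,y} = q\,u_{t,x,y} + r\,u_{t-1,x,y} + b\,\big(u_{t,x+1,y} + u_{t,x-1,y} + u_{t,x,y+1} + u_{t,x,y-1} - 4u_{t,x,y}\big)$$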

If you look at the t variables, we have something like u(t+1) = u(t) + u(t-1) + something_at_t(x,y).

This tells us that the new wave's height (at t+1) will be the current wave height (at t) plus the previous wave height (at t-1) plus something in the present (at t) that depends only on the values around the wave point we are considering.

This can be visualized as a sequence of time frames, one after another, in which a point on the surface we are considering evolves.

Each frame can be pictured as a small cross-shaped stencil whose central dot represents a point (t,x,y) on the surface at time t. Since the term we previously called something_at_t(x,y) is actually u(t,x+1,y) + u(t,x-1,y) + u(t,x,y+1) + u(t,x,y-1) - 4·u(t,x,y), the value of the central point is influenced by five terms, the last of which is its own value u(t,x,y) multiplied by -4.

Creating the algorithm

As we stated before, the wave PDE can be effectively solved with finite difference methods. However, we still need to write some resolution code before a real physical simulation can be set up. In the last paragraph we ended up with the discretized update formula, which we then simplified using the q, r and b substitutions.

This is indeed a recursive form which may be modeled with the following pseudo-code

for each t in time
{
    u(t+1) <- u(t) + u(t-1) + something_at_t(x,y)

    // Do something with the new u(t+1), e.g. you can draw the wave here

    u(t-1) <- u(t)
    u(t)   <- u(t+1)
}

The above pseudo-code is supposed to run in retained mode, so the application is able to draw the wave at each point of the surface we're considering by simply calling draw_wave(u(t+1)).

Let's assume that we're modeling how a water wave evolves, and so let the u function represent the wave height (0 being the horizontal rest level): we can now write down the beginning of our pseudo-code

// Surface data
height = 500;
width = 500;

// Wave parameters
c  = 1;     // Speed of wave
dx = 1;     // Space step
dt = 0.05;  // Time step
k  = 0.002; // Decay factor
kdt = k*dt; // Decay factor per timestep, recall that q = 2 - kdt, r = -1 + kdt
c1 = dt^2*c^2/dx^2; // b factor

// Initial conditions
u_current_t  = zeros(height,width); // Create a height x width zero matrix
u_previous_t = u_current_t;
We basically defined a surface where the wave evolution will be drawn (a 500x500 area) and initialized the wave parameters we saw a few paragraphs ago (make sure to recall the q,r and b substitutions we did before). The initial condition is a zero matrix (u_current_t) so the entire surface is quiet and there’s no wave evolving.

Given that we are considering a matrix surface (every point located at (x;y) coordinates is described by a u value indicating the height of the wave there) we need to write code to implement the

line in the for loop. Actually, the above expression is a simplified form of

and it is this last one that we need to implement. We may write down something like the following

for(t=0; t<A_LOT_OF_TIME; t += dt)

{

    u_next_t = (2-kdt)*u_current_t+(kdt-1)*u_previous_t+c1*something_at_t(x,y)

    u_previous_t = u_current_t; // Current becomes old

    u_current_t = u_next_t; // New becomes current

    // Draw the wave

    draw_wave(u_current_t)

}

that is, a for loop whose index variable t increases by dt at every step. Everything should be familiar by now because the (2-kdt), (kdt-1) and c1 terms are the usual q, r and b substitutions. The last thing we need to implement is the something_at_t(x,y) term, also known as:

The wave PDE we started from was

and the term we are interested now is this:

that, in our case, is

Since we have a matrix of points representing our surface we are entirely in the discrete domain, and since we need to take a second derivative of a matrix of discrete points, our problem is the same as having an image with pixel intensity values I(x,y) and needing to calculate its Laplacian

This is a common image processing task and it's usually solved by applying a convolution filter to the image we are interested in (in our case: the matrix representing the surface). A common small kernel used to approximate the second derivatives in the definition of the Laplacian is the following

    D = |  0   1   0 |
        |  1  -4   1 |
        |  0   1   0 |

So in order to implement the

term, we need to apply the D Laplacian kernel as a filter to our image (i.e. the u_current_t):

u_next_t=(2-kdt)*u_current_t+(kdt-1)*u_previous_t+c1* convolution(u_current_t,D);

In fact in the element we saw earlier

the red-dot elements are weighted and summed exactly as in a 2D convolution with the D kernel.

An important thing to keep in mind while performing the D kernel convolution with our u_current_t matrix is that every value in the outside halo (every value involved in the convolution but lying outside the u_current_t matrix boundaries) should be zero, as in the image below

In the picture above the red border is the u_current_t matrix perimeter, while the blue 3x3 matrix is the D Laplacian kernel; everything outside the red border is zero. This is important because we want our surface to act like water contained in a tank, with the waves "bouncing back" when they hit the container's border. By zeroing all the values outside the surface matrix we neither receive wave contributions from outside our perimeter nor influence in any way what's beyond it. In addition, the "energy" of the wave doesn't spread out and is partially "bounced back" by the equation.

Now the algorithm is almost complete: our PDE assures us that every wave crest in our surface will be properly “transmitted” as a real wave. The only problem is: starting with a zero-everywhere matrix and letting it evolve would produce just nothing. Forever.

We will now add a small droplet to our surface to perturb it and set the wheels in motion.

To simulate, as realistically as possible, the effect of a droplet falling onto our surface, we will introduce a "packet" into our surface matrix. That means we are going to add a matrix representing a discrete Gaussian function (similar to a Gaussian kernel) to our surface matrix. Notice that we're going to "add", not "convolve".

Given the 2D Gaussian formula

f(x,y) = A * exp( - (x - x0)^2 / (2*sigma_x^2) - (y - y0)^2 / (2*sigma_y^2) )

A is the Gaussian amplitude (and so will be our droplet amplitude), x0 and y0 are the center's coordinates, and sigma_x and sigma_y are the spreads of the blob. Using a negative amplitude, as in the following image, we can simulate a droplet that has just fallen onto our surface.

To generate the droplet matrix we can use the following pseudo-code

// Generate a droplet matrix

dsz = 3; // Droplet size

da=0.07; // Droplet amplitude

[X,Y] = generate_XY_planes(dsz,da);

DropletMatrix = -da * exp( -(X/dsz)^2 -(Y/dsz)^2);

da is the droplet amplitude (applied with a negative sign, as we just said), while dsz is the droplet size, that is, the Gaussian "bell" radius. X and Y are two matrices representing the discrete values of the X and Y planes

so the X and Y matrices for the image above are

And the final DropletMatrix is similar to the following

where the central value is -0.0700. If you plotted the above matrix you would obtain a (negative) 3D Gaussian bell, which can now model our droplet.
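For reference, a minimal C++ sketch of how the droplet matrix could be generated is shown below; it mirrors the pseudo-code above and the GPU initialization code presented later in the article (the helper name generateDropletMatrix is made up for illustration purposes):

#include <cmath>
#include <vector>

// Builds the (4*dsz+1)x(4*dsz+1) droplet matrix: a negative 2D Gaussian of amplitude da
// whose X/Y coordinates run from -2*dsz to +2*dsz around the center
std::vector<std::vector<float>> generateDropletMatrix(int dsz, float da)
{
    const int dim = 4 * dsz + 1;
    std::vector<std::vector<float>> droplet(dim, std::vector<float>(dim));

    for (int i = 0; i < dim; ++i)
    {
        for (int j = 0; j < dim; ++j)
        {
            float x = (float)(j - 2 * dsz);
            float y = (float)(i - 2 * dsz);

            // DropletMatrix = -da * exp( -(X/dsz)^2 - (Y/dsz)^2 )
            droplet[i][j] = -da * std::exp(-std::pow(x / dsz, 2.0f) - std::pow(y / dsz, 2.0f));
        }
    }
    return droplet; // the central value is -da (e.g. -0.0700 for da = 0.07)
}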

The final pseudo-code for our wave algorithm is the following

// Surface data

height = 500;

width = 500;

// Wave parameters

c = 1; // Speed of wave

dx = 1; // Space step

dt = 0.05; // Time step

k = 0.002; // Decay factor

kdt=k*dt; // Decayment factor per timestep, recall that q = 2 - kdt, r = -1 +kdt

c1 = dt^2*c^2/dx^2; // b factor

// Initial conditions

u_current_t=zeros(height,width); // Create a height x width zero matrix

u_previous_t = u_current_t;

// Generate a droplet matrix

dsz = 3; // Droplet size

da=0.07; // Droplet amplitude

[X,Y] = generate_XY_planes(dsz,da);

DropletMatrix = -da * exp( -(X/dsz)^2 -(Y/dsz)^2);

// This variable is used to add just one droplet

One_single_droplet_added = 0;

for(t=0; t<A_LOT_OF_TIME; t = t + dt)

{

    u_next_t = (2-kdt)*u_current_t+(kdt-1)*u_previous_t+c1*convolution(u_current_t,D);

    u_previous_t = u_current_t; // Current becomes old

    u_current_t = u_next_t; // New becomes current

    // Draw the wave

    draw_wave(u_current_t)

    if(One_single_droplet_added == 0)

    {

        One_single_droplet_added = 1; // no more droplets

        addMatrix(u_current_t, DropletMatrix, [+100;+100]);

    }

}

The variable One_single_droplet_added is used to check whether the droplet has already been inserted (we want just one droplet). The addMatrix function adds the DropletMatrix values to the u_current_t surface matrix values, centering the DropletMatrix at the point (100;100). Remember that the DropletMatrix is smaller than (or equal to) the surface matrix, so we just add the DropletMatrix's values to those u_current_t values that fall within the DropletMatrix's extent.
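The addMatrix function itself isn't shown in the pseudo-code; a minimal C++ sketch of what it could look like follows (row-major float buffers are assumed, and the skipping of values falling outside the surface is an assumption as well):

// Adds the droplet matrix to the surface matrix, centering it at (cx;cy).
// Droplet values that would fall outside the surface are simply skipped.
void addMatrix(float *surface, int surfW, int surfH,
               const float *droplet, int dropW, int dropH,
               int cx, int cy)
{
    for (int i = 0; i < dropH; ++i)
    {
        for (int j = 0; j < dropW; ++j)
        {
            int y = cy - dropH / 2 + i;
            int x = cx - dropW / 2 + j;

            // Skip anything outside the surface matrix
            if (x < 0 || y < 0 || x >= surfW || y >= surfH)
                continue;

            surface[y * surfW + x] += droplet[i * dropW + j];
        }
    }
}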

Now the algorithm is complete, although it is still a theoretical simulation. We will soon implement it with real code.

 

Implementing the Wave simulation

We will now discuss how the above algorithm has been implemented in a real C++ project to create a fully functional openGL physical simulation.

The sequence diagram below shows the skeleton of the program, which basically consists of three main parts: the main module, where the startup function resides as well as the kernel function which creates the matrix image for the entire window; an openGL renderer wrapper to encapsulate GLUT library functions and callback handlers; and a hand-written matrix class to simplify matrix data access and manipulation. Although a sequence diagram should strictly follow a standard software-engineering methodology and its use is governed by predetermined rules, we will nonetheless use it loosely as an abstraction to model the program's control flow

The program starts at main() and creates an openGLrenderer object which handles all the graphics communication with the GLUT library and the callback events (mouse movements, key presses, etc.). OpenGL doesn't provide routines for interfacing with a windowing system, so in this project we rely on the GLUT library, which provides a platform-independent interface to manage windows and input events. To create an animation that runs as fast as possible we set an idle callback with the glutIdleFunc() function. We will explain more about this later.

Initially the algorithm sets its initialization variables (time step, space step, droplet amplitude, 2D Laplacian kernel, etc.; practically everything we saw in the theory section) and every matrix corresponding to the image to be rendered is zeroed out. The Gaussian matrix corresponding to the droplets is also precomputed. A structure defined in openGLRenderer's header file contains all the data which should be retained across image renderings

typedef struct kernelData
{
    float a1; // 2-kdt
    float a2; // kdt-1
    float c1; // dt^2*c^2/dx^2 (the "b" factor)
    sMatrix* u;  // Current surface matrix
    sMatrix* u0; // Previous surface matrix
    sMatrix* D;  // 2D Laplacian kernel
    int Ddim;    // Droplet matrix width/height
    int dsz;     // Gaussian radius
    sMatrix* Zd; // Droplet Gaussian matrix
} kernelData;

The structure is updated each time a time step is performed since it contains both the current and previous matrices describing the wave's evolution across time. Since this structure is used both by the openGL renderer and by the main module (which initializes it), the variable is declared as external and defined in the openGLrenderer cpp file (so its scope goes beyond a single translation unit). After everything has been set up, the openGLRenderer class' startRendering() method is called and the openGL main loop starts fetching events. The core of the algorithm we saw is in the main module's kernel() function, which is called every time an openGL idle event is dispatched (that is, the screen is updated and the changes are shown only when the idle callback has completed, so the amount of rendering work performed here should be minimized to avoid performance loss).
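In other words, the sharing between the two modules boils down to something like this (a sketch; the exact header/source layout is an assumption):

// openGLRenderer.h - declaration visible to every module including the header
extern kernelData m_simulationData;

// openGLRenderer.cpp - the one and only definition of the variable
kernelData m_simulationData;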

The kernel’s function code is the following

// This kernel is called at each iteration
// It implements the main loop algorithm and in some way "rasterizes" the matrix data
// to pass to the openGL renderer. It also adds droplets in the waiting queue
//
void kernel(unsigned char *ptr, kernelData& data)
{
    // Wave evolution
    sMatrix un(DIM,DIM);

    // The iterative discrete update (see documentation)
    un = (*data.u)*data.a1 + (*data.u0)*data.a2 + convolutionCPU((*data.u),(*data.D))*data.c1;

    // Step forward in time
    (*data.u0) = (*data.u);
    (*data.u) = un;

    // Prepare matrix data for rendering
    matrix2Bitmap( (*data.u), ptr );

    if(first_droplet == 1) // By default there's just one initial droplet
    {
        first_droplet = 0;
        int x0d= DIM / 2; // Default droplet center
        int y0d= DIM / 2;

        // Place the (x0d;y0d) centered Zd droplet on the wave data (it will be added at the next iteration)
        for(int Zdi=0; Zdi < data.Ddim; Zdi++)
        {
            for(int Zdj=0; Zdj < data.Ddim; Zdj++)
            {
                (*data.u)(y0d-2*data.dsz+Zdi,x0d-2*data.dsz+Zdj) += (*data.Zd)(Zdi, Zdj);
            }
        }
    }

    // Render the droplet queue
    m_renderer->renderWaitingDroplets();
 } 

The pattern we followed in the wave PDE evolution can be easily recognized in the computationally intensive code line

un = (*data.u)*data.a1 + (*data.u0)*data.a2 + convolutionCPU((*data.u),(*data.D))*data.c1;

which basically performs the well-known iterative step

All constants are precomputed to improve performance.

Notice that this line adds up large matrices, which are referred to in the code as sMatrix objects. The sMatrix class is a simple hand-written class that overloads a few operators to make working with matrices easier. Apart from bearing in mind that large matrix operations should avoid passing arguments by value, and that a copy constructor is required to create a new matrix and copy it to the destination (otherwise we would end up with a shallow copy without the actual data), the code is pretty straightforward, so no more words will be spent on it

// This class handles matrix objects
class sMatrix
{
public:
   int rows, columns;
   float *values;

   sMatrix(int height, int width)
   {
       if (height == 0 || width == 0)
      throw "Matrix constructor has 0 size";

       rows = height;
       columns = width;

       values = new float[rows*columns];
   }

   // Copy constructor, this is needed to perform a deep copy (not a shallow one)
   sMatrix(const sMatrix& mt)
   {
       this->rows = mt.rows;
       this->columns = mt.columns;

       this->values = new float[rows*columns];

       // Copy the values
       memcpy(this->values, mt.values, this->rows*this->columns*sizeof(float));
   }

   ~sMatrix()
   {
       delete [] values;
   }

   // Allows matrix1 = matrix2
   sMatrix& operator= (sMatrix const& m)
   {
       if(m.rows != this->rows || m.columns != this->columns)
       {
        throw "Size mismatch";
       }

       memcpy(this->values,m.values,this->rows*this->columns*sizeof(float));

      return *this; // Since "this" continues to exist after the function call, it is perfectly legit to return a reference
   }
    
   // Allows both matrix(3,3) = value and value = matrix(3,3)
   float& operator() (const int row, const int column)
   {
    // May be suppressed to slightly increase performance
    if (row < 0 || column < 0 || row >= this->rows || column >= this->columns)
        throw "Size mismatch";

    return values[row*columns+column]; // Since the float value continues to exist after the function call, it is perfectly legit to return a reference
   }

  // Allows scalar*matrix (e.g. 3*matrix) for each element
  sMatrix operator* (const float scalar)
  {
    sMatrix result(this->rows, this->columns);
    // Multiply each value by the scalar
    for(int i=0; i<rows*columns; i++)
    {
        result.values[i] = this->values[i]*scalar;
    }
    return result; // Copy constructor
  }

  // Allows matrix+matrix (if same size)
  sMatrix operator+ (const sMatrix& mt)
  {
    if (this->rows != mt.rows || this->columns != mt.columns)
        throw "Size mismatch";

    sMatrix result(this->rows, this->columns);
    // Sum each couple of values
    for(int i=0; i<rows; i++)
    {
        for(int j=0; j<columns; j++)
            result.values[i*columns+j] = this->values[i*columns+j] + mt.values[i*columns+j];
    }
    return result; // Copy constructor
  }
}; 

The convolution is performed with the following code (a classic approach):

// Returns the convolution between the matrix A and the kernel matrix B,
// A's size is preserved
//
sMatrix convolutionCPU(sMatrix& A, sMatrix& B)
{
  sMatrix result(A.rows, A.columns);
  int kernelHradius = (B.rows - 1) / 2;
  int kernelWradius = (B.columns - 1) / 2;
  float convSum, convProd, Avalue;
  for(int i=0; i<A.rows; i++)
  {
    for(int j=0; j<A.columns; j++)
    {
        // 
        // --------j--------->
        // _ _ _ _ _ _ _     
        // |            |    |
        // |            |    |
        // |       A    |    i
        // |            |    |
        // |            |    |
        // _ _ _ _ _ _ _|    v
        //
        convSum = 0;
        for(int bi=0; bi<B.rows; bi++)
        {
            for(int bj=0; bj<B.columns; bj++)
            {
                // A's value respect to the kernel center
                int relpointI = i-kernelHradius+bi;
                int relpointJ = j-kernelWradius+bj;
                if(relpointI < 0 || relpointJ < 0 || relpointI >= A.rows || relpointJ >= A.columns)
                    Avalue = 0;
                else
                    Avalue = A(i-kernelHradius+bi,j-kernelWradius+bj);
                convProd = Avalue*B(bi,bj);
                convSum += convProd;
            }
        }
            
        // Store the convolution result
        result(i,j) = convSum;
    }
  }
  return result;
} 

After calculating the system's evolution, the passing of time is simulated by swapping the new matrix with the old one and discarding the previous state, as we described before.

Then a matrix2Bitmap() call performs a matrix-to-bitmap conversion, as its name suggests. More precisely, the entire simulation area is described by a large sMatrix object which, obviously, contains float values. To actually render these values as pixels we need to convert each value into its corresponding RGBA value and pass it to the openGLRenderer class (which in turn passes the entire bitmap buffer to the GLUT library). In brief: we need to perform a float-to-RGB-color mapping.

Since in the physical simulation we assumed that the resting water height is 0 and that every perturbation heightens or lowers this value (in particular, the droplet Gaussian matrix lowers it by at most 0.07), we are looking for a [-1;1]-to-color mapping. An HSV color model would better simulate the gradual color transition we actually experience with our own eyes, but it would require converting back to RGB values to set up the bitmap to pass to the GLUT wrapper. For performance reasons we chose to assign each value a color directly (a colormap). A first solution would have been implementing a full [-1;1] -> [0x0;0xFFFFFF] mapping in order to cover all the possible colors in the RGB format

// Returns a RGB color from a value in the interval between [-1;+1]
RGB getColorFromValue(float value)
{
    RGB result;
 
    if(value <= -1.0f)
    {
            result.r = 0x00;
            result.g = 0x00;
            result.b = 0x00;
    }
    else if(value >= 1.0f)
    {
            result.r = 0xFF;
            result.g = 0xFF;
            result.b = 0xFF;
    }
    else
    {
            float step = 2.0f/0xFFFFFF;
 
            int cvalue = (int)((value + 1.0f)/step);
 
            if(cvalue < 0)
                    cvalue = 0;
            else if(cvalue > 0xFFFFFF)
                    cvalue = 0xFFFFFF;
 
            result.r = cvalue & 0xFF;
            result.g = (cvalue & 0xFF00) >> 8;
            result.b = (cvalue & 0xFF0000) >> 16;
    }
 
    return result;
} 

However the above method is both performance-intensive and doesn't render a smooth color transition very well: let's take a look at a droplet mapped that way

It looks more like a fractal than a droplet, so the above solution won't work. A better way to improve performance (and the look of the image) is to hard-code a colormap in an array and use it when needed:

float pp_step = 2.0f / (float)COLOR_NUM;
// The custom colormap, you can customize it to whatever you like
unsigned char m_colorMap[COLOR_NUM][3] = 
{
{0,0,143},{0,0,159},{0,0,175},{0,0,191},{0,0,207},{0,0,223},{0,0,239},{0,0,255},
{0,16,255},{0,32,255},{0,48,255},{0,64,255},{0,80,255},{0,96,255},{0,111,255},{0,128,255},
{0,143,255},{0,159,255},{0,175,255},{0,191,255},{0,207,255},{0,223,255},{0,239,255},{0,255,255},
{16,255,239},{32,255,223},{48,255,207},{64,255,191},{80,255,175},{96,255,159},{111,255,143},{128,255,128},
{143,255,111},{159,255,96},{175,255,80},{191,255,64},{207,255,48},{223,255,32},{239,255,16},{255,255,0},
{255,239,0},{255,223,0},{255,207,0},{255,191,0},{255,175,0},{255,159,0},{255,143,0},{255,128,0},
{255,111,0},{255,96,0},{255,80,0},{255,64,0},{255,48,0},{255,32,0},{255,16,0},{255,0,0},
{239,0,0},{223,0,0},{207,0,0},{191,0,0},{175,0,0},{159,0,0},{143,0,0},{128,0,0},
{100,0,20},{80,0,40},{60,0,60},{40,0,80},{20,0,100},{0,0,120}
};
// Returns a RGB color from a value in the interval between [-1;+1] using the above colormap
RGB getColorFromValue(float value)
{
  RGB result;
  int cvalue = (int)((value + 1.0f)/pp_step);
  if(cvalue < 0)
    cvalue = 0;
  else if(cvalue >= COLOR_NUM)
    cvalue = COLOR_NUM-1;
  result.r = m_colorMap[cvalue][0];
  result.g = m_colorMap[cvalue][1];
  result.b = m_colorMap[cvalue][2];
  return result;
} 

Creating a colormap isn't hard, and different colormaps producing different transition effects are freely available on the web. This time the result was much nicer (see the screenshot later on) and performance (although a per-value conversion is always an intensive task) increased substantially.
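For completeness, here is a minimal sketch of how matrix2Bitmap() could tie the pieces together; its exact signature isn't listed in the article, so this follows the way it is called in the kernel() function, and the 4-byte RGBA pixel layout is an assumption based on the DIM*DIM*4 buffers allocated later:

// Converts every float of the surface matrix into a 4-byte RGBA pixel
// using the colormap lookup above
void matrix2Bitmap(sMatrix& m, unsigned char *bitmap)
{
    for (int i = 0; i < m.rows; i++)
    {
        for (int j = 0; j < m.columns; j++)
        {
            RGB color = getColorFromValue(m(i, j));

            int offset = (i * m.columns + j) * 4;
            bitmap[offset]     = color.r;
            bitmap[offset + 1] = color.g;
            bitmap[offset + 2] = color.b;
            bitmap[offset + 3] = 0xFF; // opaque alpha
        }
    }
}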

The last part involved in the on-screen rendering is adding a droplet wherever the user clicks on the window with the cursor. One droplet is automatically added at the center of the surface (you can find the code in the kernel() function; it is controlled by the first_droplet variable), but the user can click (almost) anywhere on the surface to add another droplet in that spot. To achieve this, a queue has been implemented to hold at most 60 droplet centers where the Gaussian matrix will be placed (notice that the matrix is added to the surface values already present at those spots, it doesn't simply replace them).

#define MAX_DROPLET        60
typedef struct Droplet
{
    int mx;
    int my;
} Droplet; 
Droplet dropletQueue[MAX_DROPLET];
int dropletQueueCount = 0; 

The queue system has been implemented for a reason: unlike the pseudo-code algorithm we wrote before, rendering a scene with openGL requires the program to control the objects to be displayed in immediate mode. That means the program needs to take care of what should be drawn before the actual rendering is performed; it cannot simply put a droplet inside the data being drawn because that data could be in use (you can do this in retained mode). Besides, we don't know when a droplet will be added because it's totally user-dependent. Because of that, every time kernel() finishes, the droplet queue is emptied and the droplet Gaussian matrices are added to the data to be rendered (the surface data). The code which performs this is the following

void openGLRenderer::renderWaitingDroplets()
{
   // If there are droplets waiting to be rendered, schedule for the next rendering
   while(dropletQueueCount > 0)
   {
         dropletQueueCount--;
         addDroplet(dropletQueue[dropletQueueCount].mx,dropletQueue[dropletQueueCount].my);
   }       
}
 
void addDroplet( int x0d, int y0d )
{
   y0d = DIM - y0d;
   // Place the (x0d;y0d) centered Zd droplet on the wave data (it will be added at the next iteration)
   for(int Zdi=0; Zdi< m_simulationData.Ddim; Zdi++)
   {
       for(int Zdj=0; Zdj< m_simulationData.Ddim; Zdj++)
       {
          (*m_simulationData.u)(y0d-2*m_simulationData.dsz+Zdi,x0d-2*m_simulationData.dsz+Zdj)
+= (*m_simulationData.Zd)(Zdi, Zdj);
       }
   }
} 

The code should be familiar by now: the addDroplet function simply adds the Zd 2D Gaussian matrix to the simulation data (the kernel data) at the “current” time, a.k.a. the u matrix which represents the surface.
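The droplet centers get into the queue from the GLUT mouse callback registered by the openGLRenderer; a minimal sketch of such a handler is shown below (the handler name and the button filtering are assumptions, only the queue variables come from the code above):

// Registered with glutMouseFunc(); enqueues a droplet at the click position
void mouseCallback(int button, int state, int x, int y)
{
    if (button == GLUT_LEFT_BUTTON && state == GLUT_DOWN)
    {
        if (dropletQueueCount < MAX_DROPLET)
        {
            dropletQueue[dropletQueueCount].mx = x;
            dropletQueue[dropletQueueCount].my = y;
            dropletQueueCount++;
        }
    }
}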

The code loops until the keyboard callback handler (defined by the openGLrenderer) detects the Esc keypress; after that, the application is issued a termination signal and the loop ends. The resources allocated by the program are freed before the exit signal is issued, although this might not be strictly necessary since program termination lets the OS free any previously allocated resources.

With the droplet-adding feature and all the handlers set the code is ready to run. This time the result is much nicer than the previous, that’s because we used a smoother colormap (take a look at the images below). Notice how the convolution term creates the “wave spreading” and the “bouncing” effect when computing values against the padded zero data outside the surface matrix (i.e. when the wave hits the window’s borders and is reflected back). The first image is the simulation in its early stage, that is when some droplets have just been added, the second image represents a later stage when the surface is going to calm down (in our colormap blue values are higher than red ones).

Since we introduced a damping factor (recall it from the theory section), the waves will eventually cease and the surface will sooner or later be quiet again. The entire simulation is (except for the colors) quite realistic, but quite slow too. That's because the entire surface matrix is thoroughly updated and recalculated by the system: the kernel() function runs continuously, updating the rendering buffer. For a 512x512 image the CPU has to process a large quantity of data and it also has to perform 512x512 floating-point convolutions. Using a profiler (like VS's integrated one) shows that the program spends most of its time in the kernel() call (as we expected) and that the convolution operation is the most CPU-intensive.

It is also interesting to notice that the simulation speed decreases substantially when adding lots of droplets.

In a real scientific simulation environment, gigantic quantities of data need to be processed in relatively small time intervals. That's where GPGPU computing enters the scene. We will briefly summarize what this acronym means and then present a GPU-accelerated version of the wave PDE simulation.

GPGPU Architectures

GPGPU stands for General Purpose computation on Graphics Processing Units and refers to using graphics processors and devices to perform highly parallelizable computations that would normally be handled by CPUs. The idea of using graphics devices to help CPUs with their workloads isn't new, but until recent architectures and frameworks like CUDA (NVIDIA vendor-specific) or openCL showed up, programmers had to rely on a series of workarounds and tricks, working with inconvenient and unintuitive methods and data structures. The reason a developer should think about porting CPU-native code to a GPU version lies in the architectural design differences between CPUs and GPUs. While CPUs evolved (multicore) to gain performance advantages in sequential execution (pipelines, caches, control flow, etc.), GPUs evolved in a many-core direction: they tend to operate at higher data bandwidths and heavily increase the number of their execution threads. In recent years GPGPU has been seen as a new opportunity to use graphics processing units as algebraic coprocessors capable of handling massive parallelization and precision floating-point calculations. The idea behind GPGPU architectures is to let the CPU handle the sequential parts of a program and the GPU the parallelizable parts. Many scientific applications and systems have found their performance increased by such an approach, and GPGPU is now a fundamental technology in many fields like medical imaging, physics simulations, signal processing, cryptography, intrusion detection, environmental sciences, etc.

Massive parallelization with CUDA

We chose to use the CUDA architecture to parallelize some parts of our code. Parallelizing with GPGPU techniques means going from sequentially-designed code to parallel-designed code, and this often means having to rewrite large parts of your code. The most obvious part of the entire simulation algorithm that could benefit from a parallel approach is the surface matrix convolution with the 2D Laplacian kernel.

Notice: for brevity’s sake, we will assume that the reader is already familiar with CUDA C/C++.

The CUDA SDK comes with a large variety of examples regarding how to implement an efficient convolution between matrices and how to apply a kernel as an image filter. However we decided to implement our own version rather than rely on standard techniques. The reasons are many:

  • we wanted to show how a massively parallel algorithm is designed and implemented
  • since our kernel is very small (3x3 2D Laplacian kernel, basically 9 float values) using a FFT approach like the one described by Victor Podlozhnyuk would be inefficient
  • the two-dimensional Gaussian kernel is the only radially symmetric function that is also separable, our kernel is not, so we cannot use separable convolution methods
  • such a small kernel seems perfect to be “aggressively cached” in the convolution operation. We’ll expand on that as soon as we describe the CUDA kernel designed

The most obvious way to perform the convolution (although extremely inefficient) consists in delegating to each GPU thread multiple convolutions, one per element, across the entire image.

Take a look at the following image: we will use a 9x9 thread grid (just one block, to simplify things) to perform the convolution. The purple square is our 9x9 grid, while the red grids correspond to the 9x9 kernel. Each thread performs the convolution for its own elements, then the X coordinates are shifted and the entire block is "virtually" moved to the right. When the X direction is complete (that is, the entire horizontal extent of the image has been covered), the Y coordinate is incremented and the process starts again until completion. In the border area, every value outside the image is set to zero.

The code for this simple approach is the following where A is the image matrix and B is the kernel.

__global__ void convolutionGPU(float *A, int A_cols, int A_rows, float *B, int B_Wradius, int B_Hradius,
int B_cols, int B_rows, float *result)
{
 
    // Initial position
    int threadXabs = blockIdx.x*blockDim.x + threadIdx.x;
    int threadYabs = blockIdx.y*blockDim.y + threadIdx.y;
    int threadXabsInitialPos = threadXabs;
 
    float convSum;
 
    while(threadYabs < A_rows)
    {
        while(threadXabs < A_cols)
        {
             // If we are still in the image, start the convolution
             convSum = 0.0f;
             // relative x coord to the absolute thread
             #pragma unroll
             for(int xrel=-B_Wradius;xrel<(B_Wradius+1); xrel++)
             {
                 #pragma unroll
                 for(int yrel=-B_Hradius;yrel<(B_Hradius+1); yrel++)
                 {
                       // Check the borders, 0 if outside
                       float Avalue;
                       if(threadXabs + xrel < 0 || threadYabs + yrel <0 || threadXabs + xrel >= A_cols || threadYabs + yrel >= A_rows)
                            Avalue = 0;
                       else
                            Avalue = A[ (threadYabs+yrel)*A_cols + (threadXabs + xrel) ];
 
                       // yrel+b_hradius makes the interval positive 
                       float Bvalue = B[ (yrel+B_Hradius)*B_cols + (xrel+B_Wradius) ];
 
                       convSum += Avalue * Bvalue;
                   }
              }
 
              // Store the result and proceed ahead in the grid
              result[threadYabs*A_cols + threadXabs ] = convSum;
 
              threadXabs += blockDim.x * gridDim.x;
         }
 
         // reset X pos and forward the Y pos
         threadXabs = threadXabsInitialPos;
         threadYabs += blockDim.y * gridDim.y;
    }
        
    // convolution finished
} 

As already stated, this simple approach has several disadvantages

  • The kernel is very small; keeping it in global memory and accessing it for every convolution performed is extremely inefficient
  • Although the matrix reads are partially coalesced, thread divergence can be significant between threads active in the border area and threads that are inactive
  • There’s no collaborative behavior among threads, although they basically use the same kernel and share a large part of the apron region

Hence a much better method to perform the GPU convolution has been designed, keeping the points above in mind.

The idea is simple: let each thread load part of the apron and data regions into shared memory, thus maximizing read coalescence and reducing divergence.

The code that performs the convolution on the GPU version of the simulation is the following

 // For a 512x512 image the grid is 170x170 blocks 3x3 threads each one
__global__ void convolutionGPU(float *A, float *result)
{
   __shared__ float data[laplacianD*2][laplacianD*2];
 
   // Absolute position into the image
   const int gLoc = threadIdx.x + IMUL(blockIdx.x,blockDim.x) + IMUL(threadIdx.y,DIM) + IMUL(blockIdx.y,blockDim.y)*DIM;
 
   // Image-relative position
   const int x0 = threadIdx.x + IMUL(blockIdx.x,blockDim.x);
   const int y0 = threadIdx.y + IMUL(blockIdx.y,blockDim.y);
 
   // Load the apron and data regions
 
   int x,y;
   // Upper left square
   x = x0 - kernelRadius;
   y = y0 - kernelRadius;
   if(x < 0 || y < 0)
         data[threadIdx.x][threadIdx.y] = 0.0f;
   else
         data[threadIdx.x][threadIdx.y] = A[ gLoc - kernelRadius - IMUL(DIM,kernelRadius)];
 
   // Upper right square
   x = x0 + kernelRadius + 1;
   y = y0 - kernelRadius;
   if(x >= DIM || y < 0)
         data[threadIdx.x + blockDim.x][threadIdx.y] = 0.0f;
   else
         data[threadIdx.x + blockDim.x][threadIdx.y] = A[ gLoc + kernelRadius+1 - IMUL(DIM,kernelRadius)];
 
   // Lower left square
   x = x0 - kernelRadius;
   y = y0 + kernelRadius+1;
   if(x < 0 || y >= DIM)
         data[threadIdx.x][threadIdx.y + blockDim.y] = 0.0f;
   else
         data[threadIdx.x][threadIdx.y + blockDim.y] = A[ gLoc - kernelRadius + IMUL(DIM,(kernelRadius+1))];
 
   // Lower right square
   x = x0 + kernelRadius+1;
   y = y0 + kernelRadius+1;
   if(x >= DIM || y >= DIM)
         data[threadIdx.x + blockDim.x][threadIdx.y + blockDim.y] = 0.0f;
   else
         data[threadIdx.x + blockDim.x][threadIdx.y + blockDim.y] = A[ gLoc + kernelRadius+1 + IMUL(DIM,(kernelRadius+1))];
 
   __syncthreads();
 
   float sum = 0;
   x = kernelRadius + threadIdx.x;
   y = kernelRadius + threadIdx.y;
 
   // Execute the convolution in the shared memory (kernel is in constant memory)
#pragma unroll
   for(int i = -kernelRadius; i<=kernelRadius; i++)
          for(int j=-kernelRadius; j<=kernelRadius; j++)
                  sum += data[x+i][y+j]  * gpu_D[i+kernelRadius][j+kernelRadius];
 
   // Transfer the result to global memory
   result[gLoc] = sum;
 
} 

The function only receives the surface matrix and the result buffer where the convolved image will be stored. The convolution kernel matrix isn't passed as an argument because it has been put into a special memory called "constant memory", which is read-only for CUDA kernels, pre-fetched and highly optimized to let all threads read from the same location with minimum latency. The downside is that this kind of memory is extremely limited (on the order of 64 KB), so it should be used wisely. Declaring our 3x3 kernel in constant memory grants us a significant speed advantage

__device__ __constant__ float gpu_D[laplacianD][laplacianD]; // Laplacian 2D kernel

The image below helps to show how threads load the data from the surface matrix in global memory and store it into the faster on-chip shared memory before actually using it in the convolution operation. The purple 3x3 square is the kernel window and the central element is the value we are actually pivoting on. The grid is 172x172 blocks of 3x3 threads each; each block of 3x3 threads has four stages to complete before entering the convolution loop: load the upper-left apron and image data into shared memory (the upper-left red square from the purple kernel window), load the upper-right area (red square), load the lower-left area (red square) and load the lower-right area (likewise). Since shared memory is only visible to the threads in a block, each block loads its own shared area. Notice that we chose to let every thread read something from global memory to maximize coalescence, but we are not actually going to use every single element. The image shows a yellow area and a gray area: the yellow data is actually going to be used in the convolution operations for the elements in the purple kernel square (it comprises aprons and data), while the gray area isn't going to be used by any convolution performed by the block we are considering.

After filling each block's shared memory array, the CUDA threads are synchronized to make sure the shared data is fully populated before any thread starts the convolution. Then the convolution itself is executed: shared data is multiplied against constant-memory kernel data, resulting in a highly optimized operation.

The #pragma unroll directive instructs the compiler to unroll (where possible) the loop to reduce cycle control overhead and improve performance. A small example of loop unrolling: the following loop

for(int i=0; i<1000; i++)
    a[i] = b[i] + c[i];

might be optimized by unrolling it

for(int i=0; i<1000; i+=2)
{
    a[i] = b[i] + c[i];
    a[i+1] = b[i+1] + c[i+1];
}

so that the control instructions are executed fewer times and the overall loop performance improves. Notice that almost every optimization in CUDA code needs to be carefully and thoroughly tested, because different architectures and different program control flows might produce different results (as can different compiler optimizations which, unfortunately, cannot always be trusted).

Also notice that the IMUL macro used in the code is defined as

#define IMUL(a,b) __mul24(a,b)

On devices of CUDA compute capability 1.x, 32-bit integer multiplication is implemented with multiple instructions since it is not natively supported, while 24-bit integer multiplication is natively supported via the __[u]mul24 intrinsic. On devices of compute capability 2.0, however, 32-bit integer multiplication is natively supported and 24-bit multiplication is not: __[u]mul24 is then implemented with multiple instructions and should not be used. So if you are planning to use the code on 2.x devices, make sure to redefine the macro.
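For example, on a 2.x device the macro could simply fall back to the native 32-bit multiplication:

// Plain 32-bit multiplication is the right choice on compute capability 2.x and later
#define IMUL(a,b) ((a)*(b))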

Typical code calling the kernel we just wrote could look like this

sMatrix convolutionGPU_i(sMatrix& A, sMatrix& B)
{
    unsigned int A_bytes = A.rows*A.columns*sizeof(float);
    sMatrix result(A.rows, A.columns);
    float *cpu_result = (float*)malloc(A_bytes);
       
    // Copy A data to the GPU global memory (B aka the kernel is already there)
    cudaError_t chk;
    chk = cudaMemcpy(m_gpuData->gpu_matrixA, A.values, A_bytes, cudaMemcpyHostToDevice);
    if(chk != cudaSuccess)
    {
        printf("\nCRITICAL: CANNOT TRANSFER MEMORY TO GPU");
        return result;
    }
       
    // Call the convolution kernel
    dim3 blocks(172,172);
    dim3 threads(3,3);
 
    convolutionGPU<<<blocks,threads>>>(m_gpuData->gpu_matrixA, m_gpuData->gpu_matrixResult);
 
    // Copy back the result
    chk = cudaMemcpy(cpu_result, m_gpuData->gpu_matrixResult, A_bytes, cudaMemcpyDeviceToHost);
    if(chk != cudaSuccess)
    {
         printf("\nCRITICAL: CANNOT TRANSFER MEMORY FROM GPU");
         return result;
    }
 
    // Allocate a sMatrix and return it with the GPU data
    free(result.values);
    result.values = cpu_result;
 
    return result;
} 

Obviously, the CUDA buffers should be allocated with cudaMalloc at the beginning of our program and freed only when the GPU work is complete.

However, as we stated before, converting a sequentially-designed program into a parallel one isn't an easy task and often requires more than just a plain function-to-function conversion (it depends on the application). In our case, substituting just the CPU-convolution function with a GPU-convolution function won't work. In fact, even though we distributed our workload better than in the CPU version (see the images below for the CPU/GPU exclusive time percentages), we actually slowed down the whole simulation.

The reason is simple: our kernel() function is called whenever a draw event is dispatched, so it needs to be called very often. Although the CUDA kernel is faster than the CPU convolution function and although GPU memory bandwidth is higher than the CPU's, transferring data back and forth between (possibly paged-out) host memory and global device memory just kills our simulation's performance. Applications that benefit most from a CUDA approach usually perform a single-shot, heavily computational kernel workload and then transfer the results back. Real-time applications might benefit from a concurrent-kernels approach, but a 2.x capability device would be required.

In order to actually accelerate our simulation, a greater code revision is required.

Another, more subtle, thing to take into account when working with GPU code is CPU optimization: take a look at the following assembly code for the CPU version of the line

un = (*data.u)*data.a1 + (*data.u0)*data.a2 + convolutionCPU((*data.u),(*data.D))*data.c1;

000000013F2321EF  mov        r8,qword ptr [m_simulationData+20h (13F234920h)]  
000000013F2321F6  mov        rdx,qword ptr [m_simulationData+10h (13F234910h)]  
000000013F2321FD  lea        rcx,[rbp+2Fh]  
000000013F232201  call       convolutionCPU (13F231EC0h)  
000000013F232206  nop  
000000013F232207  movss      xmm2,dword ptr [m_simulationData+8 (13F234908h)]  
000000013F23220F  lea        rdx,[rbp+1Fh]  
000000013F232213  mov        rcx,rax  
000000013F232216  call       sMatrix::operator* (13F2314E0h)  
000000013F23221B  mov        rdi,rax  
000000013F23221E  movss      xmm2,dword ptr [m_simulationData+4 (13F234904h)]  
000000013F232226  lea        rdx,[rbp+0Fh]  
000000013F23222A  mov        rcx,qword ptr [m_simulationData+18h (13F234918h)]  
000000013F232231  call       sMatrix::operator* (13F2314E0h)  
000000013F232236  mov        rbx,rax  
000000013F232239  movss      xmm2,dword ptr [m_simulationData (13F234900h)]  
000000013F232241  lea        rdx,[rbp-1]  
000000013F232245  mov        rcx,qword ptr [m_simulationData+10h (13F234910h)]  
000000013F23224C  call       sMatrix::operator* (13F2314E0h)  
000000013F232251  nop  
000000013F232252  mov        r8,rbx  
000000013F232255  lea        rdx,[rbp-11h]  
000000013F232259  mov        rcx,rax  
000000013F23225C  call       sMatrix::operator+ (13F2315B0h)  
000000013F232261  nop  
000000013F232262  mov        r8,rdi  
000000013F232265  lea        rdx,[rbp-21h]  
000000013F232269  mov        rcx,rax  
000000013F23226C  call       sMatrix::operator+ (13F2315B0h)  
000000013F232271  nop  
000000013F232272  cmp        dword ptr [rax],1F4h  
000000013F232278  jne        kernel+33Fh (13F2324CFh)  
000000013F23227E  cmp        dword ptr [rax+4],1F4h  
000000013F232285  jne        kernel+33Fh (13F2324CFh)  
000000013F23228B  mov        r8d,0F4240h  
000000013F232291  mov        rdx,qword ptr [rax+8]  
000000013F232295  mov        rcx,r12  
000000013F232298  call       memcpy (13F232DDEh)  
000000013F23229D  nop  
000000013F23229E  mov        rcx,qword ptr [rbp-19h]  
000000013F2322A2  call       qword ptr [__imp_operator delete (13F233090h)]  
000000013F2322A8  nop  
000000013F2322A9  mov        rcx,qword ptr [rbp-9]  
000000013F2322AD  call       qword ptr [__imp_operator delete (13F233090h)]  
000000013F2322B3  nop  
000000013F2322B4  mov        rcx,qword ptr [rbp+7]  
000000013F2322B8  call       qword ptr [__imp_operator delete (13F233090h)]  
000000013F2322BE  nop  
000000013F2322BF  mov        rcx,qword ptr [rbp+17h]  
000000013F2322C3  call       qword ptr [__imp_operator delete (13F233090h)]  
000000013F2322C9  nop  
000000013F2322CA  mov        rcx,qword ptr [rbp+27h]  
000000013F2322CE  call       qword ptr [__imp_operator delete (13F233090h)]  
000000013F2322D4  nop  
000000013F2322D5  mov        rcx,qword ptr [rbp+37h]  
000000013F2322D9  call       qword ptr [__imp_operator delete (13F233090h)]  

and now take a look at the GPU version of the line

un = (*data.u)*data.a1 + (*data.u0)*data.a2 + convolutionGPU_i ((*data.u),(*data.D))*data.c1;

000000013F7E23A3  mov        rax,qword ptr [data]  
000000013F7E23AB  movss      xmm0,dword ptr [rax+8]  
000000013F7E23B0  movss      dword ptr [rsp+0A8h],xmm0  
000000013F7E23B9  mov        rax,qword ptr [data]  
000000013F7E23C1  mov        r8,qword ptr [rax+20h]  
000000013F7E23C5  mov        rax,qword ptr [data]  
000000013F7E23CD  mov        rdx,qword ptr [rax+10h]  
000000013F7E23D1  lea        rcx,[rsp+70h]  
000000013F7E23D6  call       convolutionGPU_i (13F7E1F20h)  
000000013F7E23DB  mov        qword ptr [rsp+0B0h],rax  
000000013F7E23E3  mov        rax,qword ptr [rsp+0B0h]  
000000013F7E23EB  mov        qword ptr [rsp+0B8h],rax  
000000013F7E23F3  movss      xmm0,dword ptr [rsp+0A8h]  
000000013F7E23FC  movaps     xmm2,xmm0  
000000013F7E23FF  lea        rdx,[rsp+80h]  
000000013F7E2407  mov        rcx,qword ptr [rsp+0B8h]  
000000013F7E240F  call       sMatrix::operator* (13F7E2B20h)  
000000013F7E2414  mov        qword ptr [rsp+0C0h],rax  
000000013F7E241C  mov        rax,qword ptr [rsp+0C0h]  
000000013F7E2424  mov        qword ptr [rsp+0C8h],rax  
000000013F7E242C  mov        rax,qword ptr [data]  
000000013F7E2434  movss      xmm0,dword ptr [rax+4]  
000000013F7E2439  movaps     xmm2,xmm0  
000000013F7E243C  lea        rdx,[rsp+50h]  
000000013F7E2441  mov        rax,qword ptr [data]  
000000013F7E2449  mov        rcx,qword ptr [rax+18h]  
000000013F7E244D  call       sMatrix::operator* (13F7E2B20h)  
000000013F7E2452  mov        qword ptr [rsp+0D0h],rax  
000000013F7E245A  mov        rax,qword ptr [rsp+0D0h]  
000000013F7E2462  mov        qword ptr [rsp+0D8h],rax  
000000013F7E246A  mov        rax,qword ptr [data]  
000000013F7E2472  movss      xmm2,dword ptr [rax]  
000000013F7E2476  lea        rdx,[rsp+40h]  
000000013F7E247B  mov        rax,qword ptr [data]  
000000013F7E2483  mov        rcx,qword ptr [rax+10h]  
000000013F7E2487  call       sMatrix::operator* (13F7E2B20h)  
000000013F7E248C  mov        qword ptr [rsp+0E0h],rax  
000000013F7E2494  mov        rax,qword ptr [rsp+0E0h]  
000000013F7E249C  mov        qword ptr [rsp+0E8h],rax  
000000013F7E24A4  mov        r8,qword ptr [rsp+0D8h]  
000000013F7E24AC  lea        rdx,[rsp+60h]  
000000013F7E24B1  mov        rcx,qword ptr [rsp+0E8h]  
000000013F7E24B9  call       sMatrix::operator+ (13F7E2BF0h)  
000000013F7E24BE  mov        qword ptr [rsp+0F0h],rax  
000000013F7E24C6  mov        rax,qword ptr [rsp+0F0h]  
000000013F7E24CE  mov        qword ptr [rsp+0F8h],rax  
000000013F7E24D6  mov        r8,qword ptr [rsp+0C8h]  
000000013F7E24DE  lea        rdx,[rsp+90h]  
000000013F7E24E6  mov        rcx,qword ptr [rsp+0F8h]  
000000013F7E24EE  call       sMatrix::operator+ (13F7E2BF0h)  
000000013F7E24F3  mov        qword ptr [rsp+100h],rax  
000000013F7E24FB  mov        rax,qword ptr [rsp+100h]  
000000013F7E2503  mov        qword ptr [rsp+108h],rax  
000000013F7E250B  mov        rdx,qword ptr [rsp+108h]  
000000013F7E2513  lea        rcx,[un]  
000000013F7E2518  call       sMatrix::operator= (13F7E2A90h)  
000000013F7E251D  nop  
000000013F7E251E  lea        rcx,[rsp+90h]  
000000013F7E2526  call        sMatrix::~sMatrix (13F7E2970h)  
000000013F7E252B  nop  
000000013F7E252C  lea        rcx,[rsp+60h]  
000000013F7E2531  call       sMatrix::~sMatrix (13F7E2970h)  
000000013F7E2536  nop  
000000013F7E2537  lea        rcx,[rsp+40h]  
000000013F7E253C  call       sMatrix::~sMatrix (13F7E2970h)  
000000013F7E2541  nop  
000000013F7E2542  lea        rcx,[rsp+50h]  
000000013F7E2547  call       sMatrix::~sMatrix (13F7E2970h)  
000000013F7E254C  nop  
000000013F7E254D  lea        rcx,[rsp+80h]  
000000013F7E2555  call       sMatrix::~sMatrix (13F7E2970h)  
000000013F7E255A  nop  
000000013F7E255B  lea        rcx,[rsp+70h]  
000000013F7E2560 call        sMatrix::~sMatrix(13F7E2970h)  

Although the data involved is practically the same, the code looks much more bloated (there are even pointless operations; look at address 000000013F7E23DB). Letting the CPU finish the calculation after the GPU has done its work is probably not a good idea.

Since there are other functions which can be parallelized as well (like the matrix2Bitmap() function), we need to move as much of the workload as possible onto the device.

First we need to allocate memory on the device at the beginning of the program and deallocate it when finished. Small data like our algorithm parameters can be stored in constant memory to boost performance, while large matrix data is better suited to global memory (constant memory size is very limited).
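The constant-memory symbols referenced by the cudaMemcpyToSymbol calls below could be declared roughly as follows; the names come from the initialization code, while the exact sizes are assumptions based on how the symbols are filled:

// Algorithm constants kept in fast constant memory
__device__ __constant__ float gpu_a1;   // 2 - kdt
__device__ __constant__ float gpu_a2;   // kdt - 1
__device__ __constant__ float gpu_c1;   // dt^2*c^2/dx^2

// Droplet data
__device__ __constant__ int   gpu_Ddim; // droplet matrix width/height
__device__ __constant__ int   gpu_dsz;  // droplet (Gaussian) radius
__device__ __constant__ float gpu_Zd[(4*dropletRadius+1)*(4*dropletRadius+1)];

// Colormap data used by the GPU float-to-RGB mapping
__device__ __constant__ float gpu_pp_step;
__device__ __constant__ unsigned char gpu_m_colorMap[COLOR_NUM][3];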

// Initialize all data used by the device
// and the rendering simulation data as well
void initializeGPUData()
{
    /* Algorithm parameters */
 
    // Time step
    float dt = (float)0.05;
    // Speed of the wave
    float c = 1;
    // Space step
    float dx = 1;
    // Decay factor
    float k = (float)0.002;
    // Droplet amplitude (Gaussian amplitude)
    float da = (float)0.07;
        
    // Initialize u0
    sMatrix u0(DIM,DIM);
 
    for(int i=0; i<DIM; i++)
    {
          for(int j=0; j<DIM; j++)
          {
                u0(i,j) = 0.0f; // The corresponding color in the colormap for 0 is green
          }
    }
 
    // Initialize the rendering img to the u0 matrix
    CPUsMatrix2Bitmap(u0, renderImg);
        
    // Decayment per timestep
    float kdt=k*dt;
    // c1 constant
    float c1=pow(dt,2)*pow(c,2)/pow(dx,2);
 
    // Droplet as gaussian
    // This code creates a gaussian discrete droplet, see the documentation for more information
    const int dim = 4*dropletRadius+1;
    sMatrix xd(dim, dim);
    sMatrix yd(dim, dim);
    for(int i=0; i<dim; i++)
    {
           for(int j=-2*dropletRadius; j<=2*dropletRadius; j++)
           {
                  xd(i,j+2*dropletRadius) = j;
                  yd(j+2*dropletRadius,i) = j;
           }
    }
    float m_Zd[dim][dim];
    for(int i=0; i<dim; i++)
    {
           for(int j=0; j<dim; j++)
           {
                  // Calculate Gaussian centered on zero
                  m_Zd[i][j] = -da*exp(-pow(xd(i,j)/dropletRadius,2)-pow(yd(i,j)/dropletRadius,2));
           }
    }
 
    /* GPU data initialization */
 
    // Allocate memory on the GPU for u and u0 matrices
    unsigned int UU0_bytes = DIM*DIM*sizeof(float);
    cudaError_t chk;
    chk = cudaMalloc((void**)&m_gpuData.gpu_u, UU0_bytes);
    if(chk != cudaSuccess)
    {
         printf("\nCRITICAL: CANNOT ALLOCATE GPU MEMORY");
         return;
    }
    chk = cudaMalloc((void**)&m_gpuData.gpu_u0, UU0_bytes);
    if(chk != cudaSuccess)
    {
         printf("\nCRITICAL: CANNOT ALLOCATE GPU MEMORY");
         return;
    }
    // Allocate memory for ris0, ris1, ris2 and ptr matrices
    chk = cudaMalloc((void**)&m_gpuData.ris0, UU0_bytes);
    if(chk != cudaSuccess)
    {
         printf("\nCRITICAL: CANNOT ALLOCATE GPU MEMORY");
         return;
    }
    chk = cudaMalloc((void**)&m_gpuData.ris1, UU0_bytes);
    if(chk != cudaSuccess)
    {
         printf("\nCRITICAL: CANNOT ALLOCATE GPU MEMORY");
         return;
    }
    chk = cudaMalloc((void**)&m_gpuData.ris2, UU0_bytes);
    if(chk != cudaSuccess)
    {
         printf("\nCRITICAL: CANNOT ALLOCATE GPU MEMORY");
         return;
    }
    chk = cudaMalloc((void**)&m_gpuData.gpu_ptr, DIM*DIM*4);
    if(chk != cudaSuccess)
    {
         printf("\nCRITICAL: CANNOT ALLOCATE GPU MEMORY");
         return;
    }
 
    // Initialize to zero both u and u0
    chk = cudaMemcpy(m_gpuData.gpu_u0, u0.values, UU0_bytes, cudaMemcpyHostToDevice);
    if(chk != cudaSuccess)
    {
         printf("\nCRITICAL: CANNOT TRANSFER MEMORY TO GPU");
         return;
    }
    chk = cudaMemcpy(m_gpuData.gpu_u, u0.values, UU0_bytes, cudaMemcpyHostToDevice);
    if(chk != cudaSuccess)
    {
         printf("\nCRITICAL: CANNOT TRANSFER MEMORY TO GPU");
         return;
    }
        
    // Preload Laplacian kernel
    float m_D[3][3];
 
    m_D[0][0] = 0.0f; m_D[1][0] = 1.0f;  m_D[2][0]=0.0f;
    m_D[0][1] = 1.0f; m_D[1][1] = -4.0f; m_D[2][1]=1.0f;
    m_D[0][2] = 0.0f; m_D[1][2] = 1.0f;  m_D[2][2]=0.0f;
        
    // Copy Laplacian to constant memory
    chk = cudaMemcpyToSymbol((const char*)gpu_D, m_D, 9*sizeof(float), 0, cudaMemcpyHostToDevice);
    if(chk != cudaSuccess)
    {
          printf("\nCONSTANT MEMORY TRANSFER FAILED");
          return;
    }
 
    // Store all static algorithm parameters in constant memory
    const float a1 = (2-kdt);
    chk = cudaMemcpyToSymbol((const char*)&gpu_a1, &a1, sizeof(float), 0, cudaMemcpyHostToDevice);
    if(chk != cudaSuccess)
    {
          printf("\nCONSTANT MEMORY TRANSFER FAILED");
          return;
    }
    const float a2 = (kdt-1);
    chk = cudaMemcpyToSymbol((const char*)&gpu_a2, &a2, sizeof(float), 0, cudaMemcpyHostToDevice);
    if(chk != cudaSuccess)
    {
         printf("\nCONSTANT MEMORY TRANSFER FAILED");
         return;
    }
    chk = cudaMemcpyToSymbol((const char*)&gpu_c1, &c1, sizeof(float), 0, cudaMemcpyHostToDevice);
    if(chk != cudaSuccess)
    {
         printf("\nCONSTANT MEMORY TRANSFER FAILED");
         return;
    }
    const int ddim = dim;
    chk = cudaMemcpyToSymbol((const char*)&gpu_Ddim, &ddim, sizeof(int), 0, cudaMemcpyHostToDevice);
    if(chk != cudaSuccess)
    {
         printf("\nCONSTANT MEMORY TRANSFER FAILED");
         return;
    }
    const int droplet_dsz = dropletRadius;
    chk = cudaMemcpyToSymbol((const char*)&gpu_dsz, &droplet_dsz, sizeof(int), 0, cudaMemcpyHostToDevice);
    if(chk != cudaSuccess)
    {
         printf("\nCONSTANT MEMORY TRANSFER FAILED");
         return;
    }
    chk = cudaMemcpyToSymbol((const char*)&gpu_Zd, &m_Zd, sizeof(float)*dim*dim, 0, cudaMemcpyHostToDevice);
    if(chk != cudaSuccess)
    {
         printf("\nCONSTANT MEMORY TRANSFER FAILED");
         return;
    }
 
    //
    // Initialize colormap and ppstep in constant memory
    chk = cudaMemcpyToSymbol((const char*)&gpu_pp_step, &pp_step, sizeof(float), 0, cudaMemcpyHostToDevice);
    if(chk != cudaSuccess)
    {
         printf("\nCONSTANT MEMORY TRANSFER FAILED");
         return;
    }
    chk = cudaMemcpyToSymbol((const char*)&gpu_m_colorMap, &m_colorMap, sizeof(unsigned char)*COLOR_NUM*3, 0, cudaMemcpyHostToDevice);
    if(chk != cudaSuccess)
    {
         printf("\nCONSTANT MEMORY TRANSFER FAILED");
         return;
    }
}
 
void deinitializeGPUData()
{
    // Free everything from device memory
    cudaFree(m_gpuData.gpu_u);
    cudaFree(m_gpuData.gpu_u0);
    cudaFree(m_gpuData.gpu_ptr);
    cudaFree(m_gpuData.ris0);
    cudaFree(m_gpuData.ris1);
    cudaFree(m_gpuData.ris2);
} 

After initializing the GPU memory, the openGLRenderer can be started as usual to call the kernel() function in order to obtain a valid renderable surface image matrix. But there's a difference now, right in the openGLRenderer constructor

openGLRenderer::openGLRenderer(void)
{
     //. . .
        
     // Sets up the CUBLAS
     cublasStatus_t status = cublasInit();
     if (status != CUBLAS_STATUS_SUCCESS)
     {
           // CUBLAS initialization error
           printf("\nCRITICAL: CUBLAS LIBRARY FAILED TO LOAD");
           return;
     }
        
     // Set up the bitmap data with page-locked memory for fast access
     //no more : renderImg = new unsigned char[DIM*DIM*4];
     cudaError_t chk = cudaHostAlloc((void**)&renderImg, DIM*DIM*4*sizeof(char), cudaHostAllocDefault);
     if(chk != cudaSuccess)
     {
           printf("\nCRITICAL: CANNOT ALLOCATE PAGELOCKED MEMORY");
           return ;
     }
} 

First we decided to use CUBLAS library to perform matrix addition for two reasons:

  • our row-major data on the device is already laid out the way the CUBLAS functions expect it (cublasMalloc is just a wrapper around cudaMalloc)
  • the CUBLAS library is extremely optimized for large matrix operations; our matrices aren't that big, but this could help extend the architecture in a future version

Using our sMatrix wrapper is no longer an efficient choice and we need to get rid of it while working on the device, although we can still use it for the initialization stage.

The second fundamental thing to notice in the openGLRenderer constructor is that we allocated the host-side memory (the memory that will contain the data to be rendered) with cudaHostAlloc instead of the classic malloc. As the documentation states, allocating memory with this function guarantees that the CUDA driver will track the virtual memory ranges allocated with it and accelerate calls to functions like cudaMemcpy. Host memory allocated with cudaHostAlloc is often referred to as "pinned memory", and cannot be paged out (because of that, allocating excessive amounts of it may degrade system performance since it reduces the amount of memory available to the system for paging). This expedient grants additional speed in memory transfers between device and host.
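Symmetrically, the pinned buffer has to be released with cudaFreeHost (and the CUBLAS library shut down) when the renderer goes away; a sketch of what the destructor could look like:

openGLRenderer::~openGLRenderer(void)
{
     // Page-locked memory allocated with cudaHostAlloc must be freed with cudaFreeHost
     cudaFreeHost(renderImg);

     // Release the CUBLAS resources acquired by cublasInit()
     cublasShutdown();
}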

We are now ready to take a peek at the revised kernel() function

// This kernel is called at each iteration
// It implements the main loop algorithm and in some way "rasterizes" the matrix data
// to be passed to the openGL renderer. It also adds droplets in the waiting queue
//
void kernel(unsigned char *ptr)
{
    // Set up the grid
    dim3 blocks(172,172);
    dim3 threads(3,3); // 516x516 img is 172x172 (3x3 thread) blocks
        
    // Implements the un = (*data.u)*data.a1 + (*data.u0)*data.a2 + convolution((*data.u),(*data.D))*data.c1;
    // line by means of several kernel calls
    convolutionGPU<<<blocks,threads>>>(m_gpuData.gpu_u, m_gpuData.ris0);
    // Now multiply everything by c1 constant
    multiplyEachElementby_c1<<<blocks,threads>>>(m_gpuData.ris0, m_gpuData.ris1);
    // First term is ready, now u*a1
    multiplyEachElementby_a1<<<blocks,threads>>>(m_gpuData.gpu_u, m_gpuData.ris0);
    // u0*a2
    multiplyEachElementby_a2<<<blocks,threads>>>(m_gpuData.gpu_u0, m_gpuData.ris2);
    // Perform the matrix addition with the CUBLAS library
    // un = ris0 + ris2 + ris1
    // Since everything is already stored as row-major device vectors, we don't need to do anything to pass it to the CUBLAS
    cublasSaxpy(DIM*DIM, 1.0f, m_gpuData.ris0, 1, m_gpuData.ris2, 1);
    cublasSaxpy(DIM*DIM, 1.0f, m_gpuData.ris2, 1, m_gpuData.ris1, 1);
    // Result is now in m_gpuData.ris1
 
    // Step forward in time
    cudaMemcpy(m_gpuData.gpu_u0, m_gpuData.gpu_u, DIM*DIM*sizeof(float), cudaMemcpyDeviceToDevice);
    cudaMemcpy(m_gpuData.gpu_u, m_gpuData.ris1, DIM*DIM*sizeof(float), cudaMemcpyDeviceToDevice);
 
    // Draw the u surface matrix and "rasterize" it into gpu_ptr
    gpuMatrix2Bitmap<<<blocks,threads>>>(m_gpuData.gpu_u, m_gpuData.gpu_ptr);
 
    // Back on the pagelocked host memory
    cudaMemcpy(ptr, m_gpuData.gpu_ptr, DIM*DIM*4, cudaMemcpyDeviceToHost);
        
    if(first_droplet == 1) // By default there's just one initial droplet
    {
         first_droplet = 0;
         int x0d= DIM / 2; // Default droplet center
         int y0d= DIM / 2;
 
         cudaMemcpy(m_gpuData.ris0, m_gpuData.gpu_u, DIM*DIM*sizeof(float), cudaMemcpyDeviceToDevice);
         addDropletToU<<<blocks,threads>>>(m_gpuData.ris0, x0d,y0d, m_gpuData.gpu_u);
    }
 
    // Add all the remaining droplets in the queue
    while(dropletQueueCount >0)
    {
         dropletQueueCount--;
         int y0d = DIM - dropletQueue[dropletQueueCount].my;
         // Copy from u to one of our buffers
         cudaMemcpy(m_gpuData.ris0, m_gpuData.gpu_u, DIM*DIM*sizeof(float), cudaMemcpyDeviceToDevice);
         addDropletToU<<<blocks,threads>>>(m_gpuData.ris0, dropletQueue[dropletQueueCount].mx,y0d, m_gpuData.gpu_u);
    }
 
    // Synchronize to make sure all kernels executions are done
    cudaThreadSynchronize();
 } 

The line

un = (*data.u)*data.a1 + (*data.u0)*data.a2 + convolution((*data.u),(*data.D))*data.c1;

has been completely superseded by multiple kernel calls which, respectively, perform a convolution, multiply the matrix data by the algorithm constants, and perform the matrix-matrix additions via CUBLAS. Everything is performed on the device, including the point-to-RGB-value mapping (a highly parallelizable operation, since it must be performed for every value in the surface image matrix). Stepping forward in time is also accomplished with device methods. Finally, the data is copied back to the page-locked (pinned) host memory, and the droplets waiting in the queue are added to the u surface simulation data matrix for the next iteration.
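For clarity, this is how the two cublasSaxpy calls (single-precision y = alpha*x + y) accumulate the three terms produced by the element-wise kernels; it simply restates what the kernel() listing above already does:

// After the element-wise kernels:
//   m_gpuData.ris1 = c1 * convolution(u, D)
//   m_gpuData.ris0 = a1 * u
//   m_gpuData.ris2 = a2 * u0
cublasSaxpy(DIM*DIM, 1.0f, m_gpuData.ris0, 1, m_gpuData.ris2, 1); // ris2 = ris0 + ris2 = a1*u + a2*u0
cublasSaxpy(DIM*DIM, 1.0f, m_gpuData.ris2, 1, m_gpuData.ris1, 1); // ris1 = ris2 + ris1 = a1*u + a2*u0 + c1*conv(u,D)
// m_gpuData.ris1 now holds un, the surface matrix at the next time step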

The CUDA kernels called by the kernel() function are the following:

/******************************************************************************
                                CUDA KERNELS
******************************************************************************/
 
// For a 512x512 image the grid is 170x170 blocks 3x3 threads each one
__global__ void convolutionGPU(float *A, float *result)
{
   __shared__ float data[laplacianD*2][laplacianD*2];
 
   // Absolute position into the image
   const int gLoc = threadIdx.x + IMUL(blockIdx.x,blockDim.x) + IMUL(threadIdx.y,DIM) + IMUL(blockIdx.y,blockDim.y)*DIM;
 
   // Image-relative position
   const int x0 = threadIdx.x + IMUL(blockIdx.x,blockDim.x);
   const int y0 = threadIdx.y + IMUL(blockIdx.y,blockDim.y);
 
   // Load the apron and data regions
 
   int x,y;
   // Upper left square
   x = x0 - kernelRadius;
   y = y0 - kernelRadius;
   if(x < 0 || y < 0)
        data[threadIdx.x][threadIdx.y] = 0.0f;
   else
        data[threadIdx.x][threadIdx.y] = A[ gLoc - kernelRadius - IMUL(DIM,kernelRadius)];
 
   // Upper right square
   x = x0 + kernelRadius + 1;
   y = y0 - kernelRadius;
   if(x >= DIM || y < 0)
        data[threadIdx.x + blockDim.x][threadIdx.y] = 0.0f;
   else
        data[threadIdx.x + blockDim.x][threadIdx.y] = A[ gLoc + kernelRadius+1 - IMUL(DIM,kernelRadius)];
 
   // Lower left square
   x = x0 - kernelRadius;
   y = y0 + kernelRadius+1;
   if(x < 0 || y >= DIM)
        data[threadIdx.x][threadIdx.y + blockDim.y] = 0.0f;
   else
        data[threadIdx.x][threadIdx.y + blockDim.y] = A[ gLoc - kernelRadius + IMUL(DIM,(kernelRadius+1))];
 
   // Lower right square
   x = x0 + kernelRadius+1;
   y = y0 + kernelRadius+1;
   if(x >= DIM || y >= DIM)
         data[threadIdx.x + blockDim.x][threadIdx.y + blockDim.y] = 0.0f;
   else
         data[threadIdx.x + blockDim.x][threadIdx.y + blockDim.y] = A[ gLoc + kernelRadius+1 + IMUL(DIM,(kernelRadius+1))];
 
   __syncthreads();
 
   float sum = 0;
   x = kernelRadius + threadIdx.x;
   y = kernelRadius + threadIdx.y;
 
   // Execute the convolution in the shared memory (kernel is in constant memory)
#pragma unroll
   for(int i = -kernelRadius; i<=kernelRadius; i++)
         for(int j=-kernelRadius; j<=kernelRadius; j++)
                  sum += data[x+i][y+j]  * gpu_D[i+kernelRadius][j+kernelRadius];
 
   // Transfer the result to global memory
   result[gLoc] = sum;
 
}
 
__global__ void multiplyEachElementby_c1(float *matrix, float *result)
{
    // Absolute position into the image
    const int gLoc = threadIdx.x + IMUL(blockIdx.x,blockDim.x) + IMUL(threadIdx.y,DIM) + IMUL(blockIdx.y,blockDim.y)*DIM;
        
    // Multiply each matrix element by c1
    result[gLoc] = matrix[gLoc]*gpu_c1;
}
__global__ void multiplyEachElementby_a1(float *matrix, float *result)
{
    // Absolute position into the image
    const int gLoc = threadIdx.x + IMUL(blockIdx.x,blockDim.x) + IMUL(threadIdx.y,DIM) + IMUL(blockIdx.y,blockDim.y)*DIM;
        
    // Multiply each matrix element by a1
    result[gLoc] = matrix[gLoc]*gpu_a1;
}
__global__ void multiplyEachElementby_a2(float *matrix, float *result)
{
    // Absolute position into the image
    const int gLoc = threadIdx.x + IMUL(blockIdx.x,blockDim.x) + IMUL(threadIdx.y,DIM) + IMUL(blockIdx.y,blockDim.y)*DIM;
        
    // Multiply each matrix element by a2
    result[gLoc] = matrix[gLoc]*gpu_a2;
}
 
// Associate a colormap RGB value to each point
__global__ void gpuMatrix2Bitmap(float *matrix, BYTE *bitmap)
{
    // Absolute position into the image
    const int gLoc = threadIdx.x + IMUL(blockIdx.x,blockDim.x) + IMUL(threadIdx.y,DIM) + IMUL(blockIdx.y,blockDim.y)*DIM;
        
    int cvalue = (int)((matrix[gLoc] + 1.0f)/gpu_pp_step);
 
    if(cvalue < 0)
          cvalue = 0;
    else if(cvalue >= COLOR_NUM)
          cvalue = COLOR_NUM-1;
 
    bitmap[gLoc*4] = gpu_m_colorMap[cvalue][0];
    bitmap[gLoc*4 + 1] = gpu_m_colorMap[cvalue][1];
    bitmap[gLoc*4 + 2] = gpu_m_colorMap[cvalue][2];
    bitmap[gLoc*4 + 3] = 0xFF; // Alpha
}
 
// Add a gaussian 2D droplet matrix to the surface data
// Warning: this kernel has a high divergence factor, it is meant to be seldom called
__global__ void addDropletToU(float *matrix, int x0d, int y0d, float *result)
{
    // Absolute position into the image
    const int gLoc = threadIdx.x + IMUL(blockIdx.x,blockDim.x) + IMUL(threadIdx.y,DIM) + IMUL(blockIdx.y,blockDim.y)*DIM;
    // Image relative position
    const int x0 = threadIdx.x + IMUL(blockIdx.x,blockDim.x);
    const int y0 = threadIdx.y + IMUL(blockIdx.y,blockDim.y);
 
    // Place the (x0d;y0d) centered Zd droplet on the wave data (it will be added at the next iteration)
    if (x0 >= x0d-gpu_dsz*2 && y0 >= y0d-gpu_dsz*2 && x0 <= x0d+gpu_dsz*2 && y0 <= y0d+gpu_dsz*2)
    {
         // Add to result the matrix value plus the Zd corresponding value
         result[gLoc] = matrix[gLoc] + gpu_Zd[x0 -(x0d-gpu_dsz*2)][y0 - (y0d-gpu_dsz*2)];
    }
    else
        result[gLoc] = matrix[gLoc]; // This value shouldn't be changed
} 

Notice that we preferred to "hardcode" the constant values into different kernels rather than introduce divergence with a conditional branch. The only kernel that increases thread divergence is addDropletToU, since only a few threads actually perform the Gaussian packet-starting routine (see the theoretical algorithm described a few paragraphs ago), but this isn't a problem given how seldom it is called.
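To make the trade-off concrete, here is a hypothetical "generic" version of the multiply kernels (not used in the article's code) that selects the constant at run time; the specialized multiplyEachElementby_* kernels above avoid evaluating this branch on every thread. It reuses DIM, IMUL and the gpu_c1/gpu_a1/gpu_a2 constants defined earlier in the article:

// Hypothetical alternative, shown only for comparison: one kernel with a runtime selector
__global__ void multiplyEachElementBy(float *matrix, float *result, int which)
{
    // Absolute position into the image
    const int gLoc = threadIdx.x + IMUL(blockIdx.x,blockDim.x) + IMUL(threadIdx.y,DIM) + IMUL(blockIdx.y,blockDim.y)*DIM;

    float k;
    if (which == 0)      k = gpu_c1;  // every thread evaluates this selection
    else if (which == 1) k = gpu_a1;
    else                 k = gpu_a2;

    result[gLoc] = matrix[gLoc] * k;
}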

Performance comparison

The timing measurements and performance comparisons have been performed on the following system:

Intel Core 2 Quad CPU Q9650 @ 3.00 GHz

6 GB RAM

64-bit OS

NVIDIA GeForce GTX 285 (1 GB DDR3 @ 1476 MHz, 240 CUDA cores)

The CUDA version we used to compile the projects is 4.2; if you have problems, make sure to install the right version or change it as described in the readme file.

To benchmark the CUDA kernel execution, we used the cudaEventCreate / cudaEventRecord / cudaEventSynchronize / cudaEventElapsedTime functions shipped with every CUDA version, while for the CPU version we used two Windows platform-dependent APIs: QueryPerformanceFrequency and QueryPerformanceCounter.
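For reference, the two timing methods look roughly like this (a minimal sketch assuming <windows.h> and the CUDA runtime headers are included; cpuStep() stands for one iteration of the hypothetical CPU implementation):

// GPU side: time one call to kernel() with CUDA events (result in milliseconds)
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start, 0);
kernel(renderImg);                  // the routine being measured
cudaEventRecord(stop, 0);
cudaEventSynchronize(stop);         // wait until the 'stop' event has actually been reached

float gpuMs = 0.0f;
cudaEventElapsedTime(&gpuMs, start, stop);
cudaEventDestroy(start);
cudaEventDestroy(stop);

// CPU side: Windows high-resolution performance counter
LARGE_INTEGER freq, t0, t1;
QueryPerformanceFrequency(&freq);
QueryPerformanceCounter(&t0);
cpuStep();                          // hypothetical CPU implementation of one iteration
QueryPerformanceCounter(&t1);
double cpuMs = 1000.0 * (double)(t1.QuadPart - t0.QuadPart) / (double)freq.QuadPart;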

We split the benchmark into four stages: a startup stage, where the only droplet in the image is the default one; a second stage, where both the CPU and the GPU version have stabilized; a third one, where we add 60-70 droplets to the rendering queue; and a final one, where the application is left running for 15-20 minutes. In every test the CPU performed worse than the GPU version, which could rely on a large grid of threads ready to split up an incoming significant workload and provide a roughly constant rendering time. In the long-term run, although the GPU still did better, the CPU version showed a small performance increase, perhaps thanks to caching mechanisms.

Notice that an application operating on larger data would surely have taken greater advantage of a massively parallel approach. Our wave PDE simulation is quite simple and did not require a significant workload, which reduces the performance gain that could have been achieved.

Once and for all: there is no general rule for converting a sequentially-designed algorithm into a parallel one, and each case must be evaluated in its own context and architecture. Using CUDA can provide a great advantage in scientific simulations, but one should not consider the GPU a substitute for the CPU, rather an algebraic coprocessor that can rely on massive data parallelism. Combining CPU sequential code parts with GPU parallel code parts is the key to success.

CUDA Kernels best practices

The last, but not least, section of this paper provides a checklist of best practices and errors to avoid when writing CUDA kernels, in order to get the maximum from your GPU-accelerated application:

  1. Minimize host <-> device transfers, especially device -> host transfers, which are slower, even if that means running kernels on the device that would not have been slower on the CPU.
  2. Use pinned (page-locked) memory on the host side to exploit bandwidth advantages. Be careful not to abuse it or you'll slow your entire system down.
  3. cudaMemcpy is a blocking function; if possible, use it asynchronously with pinned memory + a CUDA stream in order to overlap transfers with kernel executions (see the sketch after this list).
  4. If your graphics card is integrated, zero-copy memory (that is, pinned memory allocated with the cudaHostAllocMapped flag) always grants an advantage. If not, there is no certainty, since the memory is not cached by the GPU.
  5. Always check your device compute capability (use cudaGetDeviceProperties): if it is < 2.0 you cannot use more than 512 threads per block (65535 blocks maximum). A minimal device-query sketch follows the block-size example below.
  6. Check your graphics card specifications: when launching an AxB-block grid kernel, each SM (streaming multiprocessor) can serve a part of it. If your card allows a maximum of 1024 threads per SM, you should size your blocks so as to fill as many of those threads as possible, but not too many (otherwise you would get scheduling latencies). Every warp is usually 32 threads (although this is not a fixed value and is architecture dependent) and is the minimum scheduling unit on each SM (only one warp is executed at any time and all threads in a warp execute the same instruction - SIMD), so consider the following example: on a GT200 card you need to perform a matrix multiplication. Should you use 8x8, 16x16 or 32x32 threads per block?

For 8x8 blocks, we have 64 threads per block. Since each SM can take up to 1024 threads, that would mean 16 blocks (1024/64). However, each SM can only take up to 8 blocks; hence only 512 (64*8) threads will go into each SM -> the SM execution resources are under-utilized, and there are fewer warps to schedule around long-latency operations.

For 16x16 blocks, we have 256 threads per block. Since each SM can take up to 1024 threads, it can take up to 4 blocks (1024/256) and the 8-block limit isn't hit -> full thread capacity in each SM and a maximal number of warps to schedule around long-latency operations (1024/32 = 32 warps).

For 32x32 blocks, we have 1024 threads per block -> not even one block fits into an SM (there is a 512 threads-per-block limitation on this hardware).
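As a minimal sketch of the checks suggested in points 5 and 6 (a standalone query program, not part of the article's code), the relevant limits can be read at run time with cudaGetDeviceProperties:

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);    // query device 0

    printf("Compute capability : %d.%d\n", prop.major, prop.minor);
    printf("Warp size          : %d\n", prop.warpSize);
    printf("Max threads/block  : %d\n", prop.maxThreadsPerBlock);
    printf("Max threads/SM     : %d\n", prop.maxThreadsPerMultiProcessor);
    printf("Registers/block    : %d\n", prop.regsPerBlock);
    printf("Shared mem/block   : %lu bytes\n", (unsigned long)prop.sharedMemPerBlock);
    return 0;
}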

  7. Check the SM register limit per block and divide it by the number of threads: you'll get the maximum number of registers you can use per thread. Exceeding it by just one register per thread will cause fewer warps to be scheduled and decreased performance.
  8. Check the shared memory per block and make sure you don't exceed that value. Exceeding it will cause fewer warps to be scheduled and decreased performance.
  9. Always check that the thread count is lower than the maximum number of threads supported by your device.
  10. Use reduction and locality techniques wherever applicable.
  11. If possible, avoid splitting your kernel code into conditional branches (if-then-else): different paths cause additional executions for the same warp and the corresponding overhead.
  12. Use reduction techniques with divergence minimization when possible (that is, try to perform a reduction with warps performing coalesced readings per cycle, as described in chapter 6 of the Kirk and Hwu book).
  13. Coalescence is achieved by having the hardware read consecutive data. If each thread in a warp accesses consecutive memory, coalescing is significantly increased (half of the threads in a warp should access global memory at the same time); that's why, with large row-major matrices, reading by columns is better than reading by rows. Coalescence can be increased with locality (i.e., threads cooperating in loading data needed by other threads); kernels should perform coalesced readings with locality purposes in order to maximize memory throughput. Storing values in shared memory (when possible) is a good practice too.
  14. Make sure you don't have unused threads/blocks, by design or because of your code. As said before, graphics cards have limits such as maximum threads per SM and maximum blocks per SM. Designing your grid without consulting your card's specifications is highly discouraged.
  15. As stated before, using more registers than the card's maximum register limit is a recipe for performance loss. However, adding a register may also cause instructions to be added (that is: more time to parallelize transfers or to schedule warps, and better performance). Again: there is no general rule; you should abide by the best practices when designing your code and then experiment by yourself.
  16. Data prefetching means preloading data you don't actually need at the moment, to gain performance in a future operation (closely related to locality). Combining data prefetching with matrix tiles can solve many long-latency memory access problems.
  17. Unrolling loops is preferable when applicable (e.g., when the loop is small). Ideally loop unrolling should be done automatically by the compiler, but checking to be sure is always the better choice.
  18. Reduce thread granularity with rectangular tiles (chapter 6 of the Kirk and Hwu book) when working with matrices, to avoid multiple row/column readings from global memory by different blocks.
  19. Textures are cached memory; if used properly they are significantly faster than global memory. Textures are better suited for 2D spatial locality accesses (e.g., multidimensional arrays) and might perform better in some specific cases (again, this is a case-by-case rule).
  20. Try to keep at least 25% of the overall registers occupied, otherwise access latency cannot be hidden by other warps' computations.
  21. The number of threads per block should always be a multiple of the warp size to favor coalesced readings.
  22. Generally, if an SM supports more than just one block, more than 64 threads per block should be used.
  23. Good starting values to begin experimenting with kernel grids are between 128 and 256 threads per block.
  24. High-latency problems might be solved by using more blocks with fewer threads instead of just one block with a lot of threads per SM. This is especially important for kernels which often call __syncthreads().
  25. If the kernel fails, use cudaGetLastError() to check the error value; you might have used too many registers / too much shared or constant memory / too many threads.
  26. CUDA functions and hardware perform best with the float data type. Its use is highly encouraged.
  27. Integer division and modulus (%) are expensive operators; replace them with bitwise operations whenever possible. If n is a power of 2, i/n is equivalent to i >> log2(n) and i % n is equivalent to i & (n-1).

  28. Avoid automatic data conversion from double to float if possible.
  29. Mathematical functions with a __ preceding them are hardware implemented; they're faster but less accurate. Use them if precision is not a critical goal (e.g., __sinf(x) rather than sinf(x)).
  30. Use signed integers rather than unsigned integers in loops, because some compilers optimize signed integers better (overflow with signed integers causes undefined behavior, so compilers might optimize them more aggressively).
  31. Remember that floating point math is not associative because of round-off errors. Massively parallel results might differ because of this.
  32. If your device supports concurrent kernel execution (see the concurrentKernels device property), you might be able to gain additional performance by running kernels in parallel. Check your device specifications and the CUDA programming guides.
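To illustrate point 3, here is a minimal sketch (hypothetical buffer and kernel names, error checks trimmed) that combines pinned memory, cudaMemcpyAsync and two CUDA streams so that one stream's copy can overlap the other stream's kernel:

#include <cuda_runtime.h>

__global__ void process(float *data, int n)    // placeholder kernel
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

void asyncOverlapExample(int n)                 // n assumed even
{
    float *h_buf = NULL, *d_buf = NULL;
    cudaStream_t s[2];
    int half = n / 2;

    cudaHostAlloc((void**)&h_buf, n * sizeof(float), cudaHostAllocDefault); // pinned: required for real async copies
    cudaMalloc((void**)&d_buf, n * sizeof(float));
    cudaStreamCreate(&s[0]);
    cudaStreamCreate(&s[1]);

    // Each half is copied and processed on its own stream: while one stream's
    // kernel runs, the other stream's transfer can proceed, overlapping the two.
    for (int i = 0; i < 2; ++i)
    {
        float *h = h_buf + i * half;
        float *d = d_buf + i * half;
        cudaMemcpyAsync(d, h, half * sizeof(float), cudaMemcpyHostToDevice, s[i]);
        process<<<(half + 255) / 256, 256, 0, s[i]>>>(d, half);
    }

    cudaStreamSynchronize(s[0]);   // wait for both streams to finish
    cudaStreamSynchronize(s[1]);

    cudaStreamDestroy(s[0]);
    cudaStreamDestroy(s[1]);
    cudaFree(d_buf);
    cudaFreeHost(h_buf);
}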

This concludes our article. The goal was to explore how GPGPU techniques can improve scientific simulations and programs that manipulate large amounts of data.

 

References

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Alesiani Marco

I'm a Computer Science Engineer and I've been programming with a large variety of technologies for years. I love writing software with C/C++, CUDA, .NET and playing around with reverse engineering

Microsoft® Surface® 2 Design and Interaction Guide

http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=26713

 

 

 

Microsoft Surface 2 0 Design and Interaction Guide.pdf

Microsoft® Surface® 2 Design and Interaction Guide


Quick details

Version: 2.0 Date published: 7/11/2011
Language: English
File: Microsoft Surface 2 0 Design and Interaction Guide.pdf (2.8 MB)

Overview

The Microsoft Surface 2.0 Design and Interaction Guide helps designers and developers create Surface applications for Microsoft Surface and Windows 7 touch PCs. Developing compelling Surface experiences requires a different approach to interface design. This document presents design principles and guidelines to address key aspects of application interface design including: interaction, visual, sound, text, and more. These principles and practices are a starting point to get the most out of the Surface software and hardware platform’s unique capabilities.


System requirements

Supported operating systems: Windows 7

The file is in .PDF format, so a .PDF reader is required.


Instructions

Download the document and open it with a compatible reader.


From Soup to Nuts with the Surface SDK 2.0

http://www.codeproject.com/Articles/234149/From-Soup-to-Nuts-with-the-Surface-SDK-2-0

 

From Soup to Nuts with the Surface SDK 2.0

By | 31 Jul 2011 | Article
A look at the new Surface SDK 2.0 that was recently released by Microsoft

Introduction

With the Microsoft® Surface® 2.0 SDK, you can easily create applications to take advantage of the next generation Surface computing device or any Windows touch-enabled device (as defined by Microsoft).

Links worth checking out (thanks to Luis Cabrera):

Getting the SDK Installed

After downloading the Surface 2 SDK, double click the installer on the SDK to get the ball rolling.

Getting the Runtime Installed

Once that is complete, then double click the installer for the Surface 2.0 Runtime.

A Few Things to Note After Installing It

Hit your Start button and go to your programs and navigate to the Microsoft Surface 2.0 SDK. You will notice the normal “Getting Help” and “Release Notes”, but it also contains Surface Samples and Tools.

Surface Samples

After clicking on that folder, you will see a Surface Code Samples.zip file.

Go ahead and extract the zip file and you will notice the following sample project exists.

Once loaded into Visual Studio 2010, you will see 14 projects exist inside of the solution.

Go ahead and set one of them as your “Startup Project”.

You can now use your mouse or a touch-enabled monitor to interact with the application. You also have the full source code, so you can manipulate the application all you want.

They have several other great examples of what the Surface 2.0 SDK is capable of.

The Tools Folder

Inside the tools folder, you will find the following applications:

Input Simulator - Simulate touch input and supported hardware parameters.

According to the docs (updated now for version 2.0):

Surface Simulator replicates the user interface and behavior of a Microsoft Surface unit that is in user mode. Surface Simulator has access points, Launcher, and the loading screen. When you start an application in Surface Simulator, the application displays like it is on a Microsoft Surface unit.

You can use Surface Simulator to evaluate how an application and its user interface respond to basic input. For example, if you simulate a painting application and if you touch multiple colors, one at a time, and then add the colors to a mixing bucket, you can test the logic of the application and how well it mixes the colors by using the touch-based interface.

Surface Simulator runs with the appearance and functionality of a Microsoft Surface unit in user mode (the way that it appears to users). You can switch applications by using Launcher and the access points that display on the Launcher screen and the applications.

Input Visualizer - Display input data on top of a Microsoft Surface application.

According to the docs:

The Input Visualizer tool enables you to see the contact data that the Microsoft Surface Vision System returns in the context of your application. This tool runs on top of your application and displays information about the contacts that the input system detects.

Input Visualizer can help you test and debug the following scenarios:

  • Accidental input: Track the accidental activation of Microsoft Surface controls from palms, forearms, and other objects by seeing when these controls detect contacts.
  • Contact tracking: Determine what gestures are lost as contacts when users are dragging content in Microsoft Surface applications. You can use the fade away feature of Input Visualizer for this type of tracking.
  • Input hit-testing: Investigate where hit-testing occurs by freezing the user interface of Input Visualizer, lifting contacts, and seeing where their centers are reported.

Input Visualizer is installed with the Microsoft Surface SDK and runs only on Microsoft Surface units. If you are developing on a separate workstation, Surface Simulator provides contact visuals, reducing how much you need a visual representation of input.

Surface Stress - Open a command prompt window to run stress tests against a Microsoft Surface application.

According to the docs:

The Surface Stress tool enables you to test the stability and robustness of your Microsoft Surface application by delivering multiple, simultaneous contacts to your application in a random way. Surface Stress generates all four types of contacts: fingers, blobs, byte tags, and identity tags.

Surface Stress is included with Microsoft Surface SDK 1.0 SP1. By default, the Surface Stress executable file (SurfaceStress.exe) is located in the C:\Program Files\Microsoft SDKs\Surface\v1.0\Tools\SurfaceStress folder, and a shortcut to Surface Stress appears in the Start menu under the Microsoft Surface SDK entry.

Let’s create a new project.

Now that you have learned how to download and get started with it, it is time to actually create an application. Go ahead and fire up Visual Studio 2010 and begin a new project. Look for Surface then v2.0.

You will notice that you have 2 templates to start with:

  • Surface Application (WPF)
  • Surface Application (XNA Game Studio 4.0)

We are only going to focus on the Surface Application (WPF).

Go ahead and give your application a name and hit OK.

At first glance, you will realize this is just a WPF application. The folder structure looks just like what we would expect for a WPF application except that you have a “Resources” folder, a .xml document and MainPage.xaml is now called SurfaceWindow1.xaml.

Let’s go ahead and take a look at the Toolbox. What we are most interested in is the “Surface Controls”. As you can see from this long list, there are a lot of Surface-specific controls at our disposal right off the bat.

Let’s go ahead and use the “SurfaceInkCanvas”. So drag and drop it onto the SurfaceWindow1.xaml file.

Make sure your XAML looks very similar to the following:

<s:SurfaceWindow x:Class="MichaelSurfaceApplication.SurfaceWindow1"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:s="http://schemas.microsoft.com/surface/2008"
    Title="MichaelSurfaceApplication">
  <Grid>
        <s:SurfaceInkCanvas Name="SampleInkCanvas" 
	HorizontalAlignment="Stretch" VerticalAlignment="Stretch" >
            <s:SurfaceInkCanvas.DefaultDrawingAttributes>
                <DrawingAttributes Color="#FF808080"/>
            </s:SurfaceInkCanvas.DefaultDrawingAttributes>
        </s:SurfaceInkCanvas>
    </Grid>
</s:SurfaceWindow>

Now go ahead and run your application and you should get the following screen. Go ahead and draw something on the screen and then close the window.

Congratulations! You just created your first Surface 2.0 application while actually writing no code. While you are probably testing it on your laptop or desktop, this application would actually run on a Surface 2 Unit! Very cool stuff indeed.

Conclusion

The Surface is very cool technology and I am planning on investing a lot of time into it and other things such as Kinect. Microsoft really got it right with the Surface 2.0 SDK. I think that this is possibly the best SDK release Microsoft has ever been a part of. The documentation is excellent, the samples are a plenty and it’s just plain easy to build your first application. Now if only I had an actual Surface 2 table in my house to play with, then I would be really happy.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

mbcrump

Software Developer (Senior)
Telerik
United States United States

Member

Follow on Twitter Follow on Twitter
Michael Crump is a Silverlight MVP and MCPD that has been involved with computers in one way or another for as long as he can remember, but started professionally in 2002. After spending years working as a systems administrator/tech support analyst, Michael branched out and started developing internal utilities that automated repetitive tasks and freed up full-time employees. From there, he was offered a job working at McKesson corporation and has been working with some form of .NET and VB/C# since 2003.
 
He has worked at Fortune 500 companies where he gained experience in embedded systems design and software development to systems administration and database programming, and everything in between.
 
His primary focus right now is developing healthcare software solutions using Microsoft .NET technologies. He prefers building infrastructure components, reusable shared libraries and helping companies define, develop and automate process standards and guidelines.
 
You can read his blog at: MichaelCrump.net or follow him on Twitter at @mbcrump.

Microsoft® Surface® 2 Development Whitepaper

http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=26715

 

Developing Surface Applications.pdf

 

 

Nothing much here, but saving it for reference.

 

 

Microsoft® Surface® 2 Development Whitepaper


Quick details

Version: 2.0 Date published: 7/11/2011
Language: English
File: Developing Surface Applications.pdf (619 KB)

Overview

This paper provides an overview of the Microsoft Surface application development process. It provides detailed information about the Surface platform and unique capabilities of the hardware. Topics include the Surface 2.0 SDK, vision based touch input, and system architecture. This development whitepaper covers the basic end-to-end process for creating great Surface applications.


System requirements

Supported operating systems: Windows 7

The document is in .PDF format, so a .PDF compatible reader is required.


Instructions

Download the file and open it with a compatible reader.


Microsoft Surface SDK 1.0 SP1 Workstation Edition

http://www.microsoft.com/en-us/download/details.aspx?id=15532

Microsoft Surface SDK 1.0 SP1 Workstation Edition

The Microsoft Surface SDK 1.0 Service Pack 1 (SP1) Workstation Edition enables you to create and test Microsoft Surface touch-enabled applications on a workstation instead of on a Microsoft Surface unit.

Quick details

Version: 1.0 Date published: 11/5/2009
Language: English

Files in this download

The links in this section correspond to files available for this download. Download the files appropriate for you.

Files:
  • Release Notes for Microsoft Surface SDK 1.0 SP1 Workstation Edition.xps (598 KB)
  • Start Here for Microsoft Surface SDK 1.0 SP1 Workstation Edition.xps (588 KB)
  • SurfaceSDKWE.msi (144.2 MB)

Overview

The Microsoft Surface SDK, Workstation Edition, includes a simulator (called Surface Simulator) that replicates the Microsoft Surface user interface on a workstation. Surface Simulator, along with the Microsoft Visual Studio project templates that are included in the Microsoft Surface SDK, enables you to create and test Microsoft Surface touch-enabled applications on a workstation instead of on a Microsoft Surface unit.

Important: If you develop a Microsoft Surface application on a workstation, the final testing step is to run and test your application on a Microsoft Surface unit.


System requirements

Supported operating systems: Windows Vista Business, Windows Vista Enterprise, Windows Vista Home Premium, Windows Vista Ultimate

    • A 32-bit edition of one of the following Windows Vista operating systems:
      • Windows Vista Business
      • Windows Vista Enterprise
      • Windows Vista Ultimate
      • Windows Vista Home Premium
  • Additional Requirements:
    • Microsoft Visual C# 2008 Express Edition or Microsoft Visual Studio 2008
    • Microsoft XNA Framework Redistributable 2.0
Note: For additional important software and hardware requirements, download the Start Here guide.

Important: You can install the Surface SDK 1.0 SP1, Workstation Edition on additional versions of Windows Vista and on the Windows 7 operating system. However, these additional operating systems are unsupported for the Surface SDK.


Instructions

  1. Install one of the Windows Vista operating systems listed earlier.
  2. Install Visual C# 2008 Express Edition or Visual Studio 2008.
  3. Install XNA Framework Redistributable 2.0
  4. Install the Surface SDK Workstation Edition.
For additional information about how to develop Surface applications, see Microsoft Surface in the MSDN Library.


Microsoft Surface 2.0 SDK

http://msdn.microsoft.com/en-us/library/ff727815

Microsoft Surface 2.0 SDK



The Microsoft Surface 2.0 SDK provides the managed APIs and the tools you need to develop Surface applications. Applications that are built using the Surface SDK can run on devices made for Surface 2.0, and on Windows 7 computers. Developing applications for Surface is essentially the same as developing WPF or XNA applications, except that the Surface SDK provides extended support for the special features of the Surface environment (50 simultaneous touch points, finger and blob recognition, tagged objects, detection of the orientation of touches, tilted display, rotated display, specialized controls, and so on). Surface applications that are installed and registered on a device made for Surface are automatically integrated with the Surface Shell and can make use of those special features. For a video that shows a device made for Surface in use, see What Is Surface.

In this Section

Additional Information


Surface Samples

http://msdn.microsoft.com/en-us/library/ff727811

 

 

Surface Samples



The Microsoft Surface SDK contains several types of samples, including quick start tutorials, how-to topics, and extractable Microsoft Visual Studio 2010 projects for sample applications.

Quick Starts

The following topics are basic tutorials to help you create your first Surface application for the Presentation layer or the Core layer:

"How Do I…?" Examples

Sample Application Projects

The sample applications that come with the Surface SDK show several different programming techniques in a complete application. You can use these applications as a starting point for more complete applications or just as examples of best practices in Surface programming. For information about obtaining the sample files, see Extracting and Installing the Surface Samples.

Samples That Use the Core Layer and the XNA Framework

  • Finger Fountain draws small images for every contact at every frame. This sample emphasizes multiple touches and shows how to use the Microsoft XNA APIs.

  • Framework provides an extensive sample framework that helps you create controls by using the Core layer. The code in this sample eliminates inconsistent behavior among Core-based applications by using the Model-View-Controller (MVC) design pattern.

  • Cloth is an XNA-based application that demonstrates how to use the Core Interaction Framework.

  • RawImage Visualizer shows how to use the RawImage APIs for XNA applications. This sample displays captured normalized (8 bit per pixel) images that are flipped vertically.

  • XNA Scatter demonstrates how to use the manipulations and inertia APIs to move graphical user interface (GUI) components around in a Surface application in a natural and intuitive way.

Samples That Use the Presentation Layer (WPF)

  • Controls Box shows how to build simple application behaviors from touch-enabled controls that the Presentation layer provides, such as updating a text box when a user touches a button.

  • Data Visualizer shows contact properties that are exposed in the Presentation layer (such as x, y, height, width, major axis, minor axis, and orientation) and how you can read and use these properties in a Surface application.

  • Grand Piano demonstrates how to integrate sound into Surface applications based on the Presentation layer.

  • Item Compare represents a simple tool that lets a user compare and contrast the properties of two "items" (tagged objects).

  • Photo Paint uses the SurfaceInkCanvas control to implement drawing and painting over pictures and video.

  • ScatterPuzzle shows an implementation of the ScatterView and SurfaceListBox controls to create a simple puzzle game. The ScatterView and SurfaceListBox controls automatically provide some powerful features related to Surface.

  • Shopping Cart shows how to implement drag-and-drop functionality in a retail application.

  • Tag Visualizer Events shows how to incorporate hit-testing in the TagVisualizer control to let user interface (UI) elements react when tagged objects move over them.


Kinect for Windows SDK v1.5

http://www.microsoft.com/en-us/download/details.aspx?id=29866

 

 

Kinect for Windows SDK v1.5


The Kinect for Windows SDK enables developers to create applications that support gesture and voice recognition, using Kinect sensor technology on computers running Windows 7, Windows 8 consumer preview, and Windows Embedded Standard 7.

Quick details

Version: 1.5.2.331 Date published: 5/18/2012
Language: English
File: KinectSDK-v1.5-Setup.exe (222.0 MB)

Overview

What's new?

The Kinect for Windows SDK v1.5, driver, and runtime are 100% compatible with Kinect for Windows v1.0 applications and include new features such as: skeletal tracking in near range, seated skeletal tracking, joint orientation, and other improvements.


Learn more about the Kinect for Windows commercial SDK

View Release Notes >

Explore the features >

The Kinect for Windows SDK includes the following:

  • Drivers for using Kinect for Windows sensors on a computer running Windows 7, Windows 8 consumer preview, and Windows Embedded Standard 7
  • Application programming interfaces (APIs) and device interfaces
  • Note: Samples have been removed from the SDK install. Samples, tools, and other valuable development resources are now available in the Kinect for Windows Developer Toolkit.


System requirements

Supported operating systems: Windows 7, Windows Embedded Standard 7



Instructions

To install the SDK:

  1. Make sure the Kinect sensor is not plugged into any of the USB ports on the computer.
  2. If you have the Kinect for Windows v1.0 SDK installed, close any open samples, the Sample Browser, etc. You do not need to uninstall the v1.0 SDK. Skip to step 5.
  3. Remove any other drivers for the Kinect sensor.
  4. If you have Microsoft Server Speech Platform 10.2 installed, uninstall the Microsoft Server Speech Platform Runtime and SDK components including both the x86 and x64 bit versions, plus the Microsoft Server Speech Recognition Language - Kinect Language Pack.
  5. Close Visual Studio. You must close Visual Studio before installing the SDK and then restart it after installation to pick up environment variables that the SDK requires.
  6. From the download location, double-click on KinectSDK-v1.5-Setup.exe. This single installer works for both 32-bit and 64-bit Windows.
  7. Once the SDK has completed installing successfully, ensure the Kinect sensor is plugged into an external power source and then plug the Kinect sensor into the PC's USB port. The drivers will load automatically.
  8. The Kinect sensor should now be working correctly.
  9. Download the Kinect for Windows Developer Toolkit, which contains source code samples, tools, and other valuable development resources that simplify developing Kinect for Windows applications.


FFMPEG

http://ffmpeg.mplayerhq.hu/index.html


Project Description

FFmpeg is a complete, cross-platform solution to record, convert and stream audio and video. It includes libavcodec - the leading audio/video codec library. See the documentation for a complete feature list and the Changelog for recent changes.

FFmpeg is free software licensed under the LGPL or GPL depending on your choice of configuration options. If you use FFmpeg or its constituent libraries, you must adhere to the terms of the license in question. You can find basic compliance information and get licensing help on our license and legal considerations page.

Looking for help? Contact us, but before you report any bugs, read the guidelines that we created for this purpose.

Want to participate in the active development of FFmpeg? Keep up with the latest developments by subscribing to both the ffmpeg-devel and ffmpeg-cvslog lists.

News [RSS]

July, 5, 2012, Donations

We're glad to announce that FFmpeg has been accepted as SPI associated project.

Donations to FFmpeg can be done through SPI, following the instructions here, or following this direct Click&Pledge link.

Donations will be used to fund expenses related to development (e.g. to cover equipment and server maintenance costs), to sponsor bug fixing, feature development, the participation or organization of meetings and events in the project interest area, and to support internal development or educational projects or any other activity promoting FFmpeg.

June, 7, 2012, FFmpeg 0.11.1

We have made a new point release (0.11.1). It contains about 70 bugfixes, some possibly security relevant.

We recommend users, distributors and system integrators to upgrade to 0.11.1 or git master.

May, 25, 2012, FFmpeg 0.11

We have made a new major release (0.11). It contains all features and bugfixes of the git master branch. A partial list of new stuff is below:

Fixes:CVE-2012-2772, CVE-2012-2774, CVE-2012-2775, CVE-2012-2776, CVE-2012-2777,
      CVE-2012-2779, CVE-2012-2782, CVE-2012-2783, CVE-2012-2784, CVE-2012-2785,
      CVE-2012-2786, CVE-2012-2787, CVE-2012-2788, CVE-2012-2789, CVE-2012-2790,
      CVE-2012-2791, CVE-2012-2792, CVE-2012-2793, CVE-2012-2794, CVE-2012-2795,
      CVE-2012-2796, CVE-2012-2797, CVE-2012-2798, CVE-2012-2799, CVE-2012-2800,
      CVE-2012-2801, CVE-2012-2802, CVE-2012-2803, CVE-2012-2804,
- v408 Quicktime and Microsoft AYUV Uncompressed 4:4:4:4 encoder and decoder
- setfield filter
- CDXL demuxer and decoder
- Apple ProRes encoder
- ffprobe -count_packets and -count_frames options
- Sun Rasterfile Encoder
- ID3v2 attached pictures reading and writing
- WMA Lossless decoder
- bluray protocol
- blackdetect filter
- libutvideo encoder wrapper (--enable-libutvideo)
- swapuv filter
- bbox filter
- XBM encoder and decoder
- RealAudio Lossless decoder
- ZeroCodec decoder
- tile video filter
- Metal Gear Solid: The Twin Snakes demuxer
- OpenEXR image decoder
- removelogo filter
- drop support for ffmpeg without libavfilter
- drawtext video filter: fontconfig support
- ffmpeg -benchmark_all option
- super2xsai filter ported from libmpcodecs
- add libavresample audio conversion library for compatibility
- MicroDVD decoder
- Avid Meridien (AVUI) encoder and decoder
- accept + prefix to -pix_fmt option to disable automatic conversions.
- complete audio filtering in libavfilter and ffmpeg
- add fps filter
- audio split filter
- vorbis parser
- png parser
- audio mix filter

We recommend users, distributors and system integrators to upgrade unless they use current git master.

April 12, 2012, FFmpeg 0.7.12 / 0.8.11

We have made two new point releases (0.7.12 and 0.8.11). An abbreviated list of changes is below:

Fixes: CVE-2012-0853, CVE-2012-0858, CVE-2011-3929, CVE-2011-3936,
       CVE-2011-3937, CVE-2011-3940, CVE-2011-3945, CVE-2011-3947
Several security issues that dont have CVE numbers.
and about 150 bugfixes
See the changelog for details.

We recommend distributors and system integrators to upgrade to 0.10.2 or git master when possible though.

April, 4, 2012, Server Upgrade

Today our main server has been upgraded due to performance issues with our bug tracker. While investigating the speed issues, we also took the opportunity to add voting support to bug reports and wiki pages, so you can now "tell" us which issues you want us to work on first.

March, 17, 2012, FFmpeg 0.10.1

We have made a new point release (0.10.1). It contains some security fixes, over 100 bugfixes and some new features like the swapuv filter. See the changelog for details. We recommend users, distributors and system integrators to upgrade unless they use current git master.

January, 27, 2012, FFmpeg 0.10

We have made a new major release (0.10). It contains all features and bugfixes of the git master branch. A partial list of new stuff is below:

Fixes: CVE-2011-3929, CVE-2011-3934, CVE-2011-3935, CVE-2011-3936,
       CVE-2011-3937, CVE-2011-3940, CVE-2011-3941, CVE-2011-3944,
       CVE-2011-3945, CVE-2011-3946, CVE-2011-3947, CVE-2011-3949,
       CVE-2011-3950, CVE-2011-3951, CVE-2011-3952
v410 Quicktime Uncompressed 4:4:4 10-bit encoder and decoder
SBaGen (SBG) binaural beats script demuxer
OpenMG Audio muxer
Timecode extraction in DV and MOV
thumbnail video filter
XML output in ffprobe
asplit audio filter
tinterlace video filter
astreamsync audio filter
amerge audio filter
ISMV (Smooth Streaming) muxer
GSM audio parser
SMJPEG muxer
XWD encoder and decoder
Automatic thread count based on detection number of (available) CPU cores
y41p Brooktree Uncompressed 4:1:1 12-bit encoder and decoder
ffprobe -show_error option
Avid 1:1 10-bit RGB Packer codec
v308 Quicktime Uncompressed 4:4:4 encoder and decoder
yuv4 libquicktime packed 4:2:0 encoder and decoder
ffprobe -show_frames option
silencedetect audio filter
ffprobe -show_program_version, -show_library_versions, -show_versions options
rv34: frame-level multi-threading
optimized iMDCT transform on x86 using SSE for for mpegaudiodec
Improved PGS subtitle decoder
dumpgraph option to lavfi device
r210 and r10k encoders
ffwavesynth decoder
aviocat tool
ffeval tool
all features from avconv merged into ffmpeg

We recommend users, distributors and system integrators to upgrade unless they use current git master.

January 24, 2012, Forgotten Patches

FFmpeg development has gone into OVERDRIVE. Over the years we have missed patches, so we need your help to locate old unapplied patches to review again.

If you find a patch that was never applied, please let us know, either by resubmitting it to ffmpeg-devel or by attaching it to a bug on our bug tracker.

For example, did you know there was a patch to read DVDs with FFmpeg? It's now being reviewed and fixed up for inclusion. Want to add BluRay support? We're interested!

January 16, 2012, Chemnitzer Linux-Tage

We happily announce that FFmpeg will be represented at `Chemnitzer Linux-Tage' in Chemnitz, Germany. The event will take place on 17th and 18th of March.

More information can be found here

We hereby invite you to visit us at our booth located in the Linux-Live area! There we will demonstrate usage of FFmpeg, answer your questions and listen to your problems and wishes.

January 12, 2012, FFmpeg 0.8.10, 0.7.11, 0.6.5, 0.5.8

We have made 4 new point releases (0.5.8, 0.6.5, 0.7.11 and 0.8.10). All of them contain fixes for CVE-2011-3892 (already in previous 0.8 and 0.7 releases), CVE-2011-3893, and CVE-2011-3895. In addition 0.8.10 and 0.7.11 contain all critical security fixes from 0.9.1. We recommend users, distributors and system integrators to upgrade unless they use current git master. We recommend everyone to upgrade to at least 0.7.11, 0.8.10 or 0.9.1.

January 5, 2012, FFmpeg 0.9.1

We have made a new point release (0.9.1). It contains many bug and security fixes, among them CVE-2011-3893 and CVE-2011-3895. It also significantly improves seeking support in H.264. We recommend users, distributors and system integrators to upgrade unless they use current git master.

December 25, 2011, FFmpeg 0.5.7, 0.6.4, 0.7.9, 0.8.8

We have made 4 new point releases (0.5.7, 0.6.4, 0.7.9 and 0.8.8). They contain some bug fixes, minor changes and security fixes. Note, CVE-2011-4352, CVE-2011-4579, CVE-2011-4353, CVE-2011-4351, CVE-2011-4364 and the addition of avcodec_open2() for libx264 have been fixed/done in previous 0.7 and 0.8 point releases already. We recommend users, distributors and system integrators to upgrade unless they use current git master. We recommend everyone to upgrade to at least 0.7.8, 0.8.7 or 0.9.

December 23, 2011, Call For Maintainers

FFmpeg is moving faster than ever before, and with your help we could move even faster. If you know C and git and want to maintain some part of FFmpeg you can help us. Clone git://source.ffmpeg.org/ffmpeg.git, pick an area of the codebase you want to maintain, subscribe to ffmpeg-devel and start hacking on the code you are interested in, review patches on the mailing list, and fix bugs from our bug tracker that are related to the area you want to maintain. Once you are happy with your work just send us a link to your public git clone (for example from Github). Non-programmers are welcome to contribute too. We are also searching for someone to make new official Debian and Ubuntu packages, that would be part of the official distributions. If you have questions, just ask on ffmpeg-devel mailing list or our IRC channel #ffmpeg-devel.

December 20, 2011, Winter logo

Our winter logo has been drawn by Daniel Perez from Google Code-In. FFmpeg has teamed up with VideoLAN to help pre-university students contribute to open-source projects. See the Google Code-In VideoLAN project page if you would like to contribute.

We would also like to thank our students who have already participated.

December 11, 2011, FFmpeg 0.9

We have made a new major release (0.9). It contains all features and bugfixes of the git master branch. A partial list of new stuff is below:

native dirac decoder
mmsh seeking
more accurate rgb->rgb in swscale
MPO file format reading support
mandelbrot fraktal video source
libass filter
export quarter_sample & divx_packed from decoders
VBLE decoder
libopenjpeg encoder
alpha opaqueness fixes in many codecs
8bit palette dynamic range fixes in many codecs
AVIOInterruptCB
OS/2 threads support
cbr mp3 muxing fix
sample rate change support in flv (nellymoser decoder)
mov/mp4 chunking support (equivalent to mp4boxs -inter)
mov/mp4 fragment support (equivalent to mp4boxs -frag)
rgba tiffs
x264rgb bugfix
cljrencoder with dither
escape130 decoder
many new ARM optimizations
-report
Dxtory capture format decoder
life video source
wtv, sox, utvideo and many other new regression tests
gcc coverage support
cellauto video source
planar rgb input support in sws
libmodplug & bintext output
g723.1 encoder
g723.1 muxer
random() function for the expression evaluator
persistent variables for the expression evaluator
pulseaudio input support
h264 422 inter decoding support
prores encoder
native utvideo decoder
libutvideo support
deshake filter
aevalsrc filter
segment muxer
mkv timecode v2 muxer
cache urlprotocol
libaacplus support
ACT/BIT demuxers
AMV video encoder
g729 decoder
stdin control of drawtext
2bpp, 4bpp png support
interlaced 1bpp and PAETH png fixes
libspeex encoding support
hardened h264 decoder that wont overread the bitstream
wtv muxer
H/W Accelerated H.264 Decoding on Android
stereo3d filter from libmpcodecs works now
an experimental jpeg2000 encoder
many bugfixes
libswresample

We recommend users, distributors and system integrators to upgrade unless they use current git master.

December 10, 2011, Donations

Want to donate to FFmpeg? Well, there's no way to do that currently. Luckily we don't need any money. But there are many not-for-profit organizations with noble goals that do. Select one that you trust and whose goals you agree with and, instead of donating to FFmpeg, send your donation to them.

November 29, 2011, Google Code-in

The FFmpeg project participates for the first time in Google Code-in. Thanks go to the VideoLAN project for making this possible! We welcome all eligible students to pick up some task and win a T-Shirt or some money from google and at the same time have some fun and contribute to a Free software project.

November 21, 2011

We have made 2 new point releases (0.7.8 and 0.8.7) that fix many bugs, several of which are security relevant. Among them NGS00144, NGS00145 and NGS00148. We recommend users, distributors and system integrators to upgrade unless they use current git master.

stop censorship logoNovember 20, 2011

FFmpeg supports the fight against American Internet censorship.

November 6, 2011

We have made a new point release (0.5.5) from the old 0.5 branch. It fixes many serious security issues, a partial list is below.

d39cc3c0 resample2: fix potential overflow
e124c3c2 resample: Fix overflow
8acc0546 matroskadec: fix out of bounds write
c603cf51 qtrle: check for out of bound writes.
e1a46eff qtrle: check for invalid line offset
23aaa82b vqa: fix double free on corrupted streams
58087a4e mpc7: return error if packet is too small.
8d1fa1c9 mpc7: check output buffer size before decoding
2eb5f77b h264: do not let invalid values in h->ref_count after a decoder reset.
ddbbe500 h264: fix the check for invalid SPS:num_ref_frames.
d1a5b53e h264: do not let invalid values in h->ref_count on ff_h264_decode_ref_pic_list_reordering() errors.
3699a46e Check for out of bound writes in the QDM2 decoder.
62da9203 Check for out of bound writes in the avs demuxer.
2e1e3c1e Check for corrupted data in avs demuxer.
635256a3 Fix out of bound writes in fix_bitshift() of the shorten decoder.
240546a1 Check for out of bounds writes in the Delphine Software International CIN decoder.
07df40db Check for invalid update parameters in vmd video decoder.
b24c2e59 Release old pictures after a resolution change in vp5/6 decoder
25bc1108 Check output buffer size in nellymoser decoder.
8ef917c0 check all svq3_get_ue_golomb() returns.
648dc680 Reject audio tracks with invalid interleaver parameters in RM demuxer.
d6f8b654 segafilm: Check for memory allocation failures in segafilm demuxer.
d8439f04 rv34: check that subsequent slices have the same type as first one.
6108f04d Fixed segfault on corrupted smacker streams in the demuxer.
b261ebfd Fixed segfaults on corruped smacker streams in the decoder.
03db051b Fixed segfault with wavpack decoder on corrupted decorrelation terms sub-blocks.
9cda3d79 rv10: Reject slices that does not have the same type as the first one
52b8edc9 oggdec: fix out of bound write in the ogg demuxer
2e17744a Fixed off by one packet size allocation in the smacker demuxer.
19431d4d ape demuxer: fix segfault on memory allocation failure.
ecd6fa11 Check for invalid packet size in the smacker demuxer.
80fb9f2c cavsdec: avoid possible crash with crafted input
46f9a620 Fix possible double free when encoding using xvid.
4f07a3aa Fix memory (re)allocation in matroskadec.c, related to MSVR-11-0080. Fixes: MSVR11-011, CVE-2011-3504
04888ede cavs: fix some crashes with invalid bitstreams Fixes CVE-2011-3362, CVE-2011-3973, CVE-2011-3974
24cd7c5d Fix apparently exploitable race condition.
8210ee22 AMV: Fix possibly exploitable crash. Fixes http://seclists.org/bugtraq/2011/Apr/257

We recommend distributors and system integrators whenever possible to upgrade to 0.7.7, 0.8.6 or git master. But when this is not possible 0.5.5 is more secure than previous releases from the 0.5 branch. If you are looking for an updated 0.6 release, please consider 0.7.7 which is ABI compatible and contains a huge number of security fixes that are missing in 0.6.*.

November 4, 2011

We have made 2 new point releases (0.7.7 and 0.8.6) that fix around 90 bugs, several of which are security relevant. We recommend users, distributors and system integrators to upgrade unless they use current git master.

October 29, 2011

New stuff in git master:

planar rgb input support in sws
libmodplug & bintext output
g723.1 encoder
g723.1 muxer
random() function for the expression evaluator
persistent variables for the expression evaluator
pulseaudio input support
h264 422 inter decoding support
prores encoder
native utvideo decoder
libutvideo support
deshake filter
aevalsrc filter
segment muxer
mkv timecode v2 muxer
cache urlprotocol
many bugfixes and many other things

October 2, 2011

We have made 2 new point releases (0.7.6 and 0.8.5) that fix security issues in

4X Technologies demuxer
4xm decoder
ADPCM IMA Electronic Arts EACS decoder
ANM decoder
Delphine Software International CIN decoder
Deluxe Paint Animation demuxer
Electronic Arts CMV decoder
PTX decoder
QDM2 decoder
QuickDraw decoder
TIFF decoder
Tiertex Limited SEQ decoder
aac decoder
avi demuxer
avs demuxer
bink decoder
flic decoder
h264 decoder
indeo2 decoder
jpeg 2000 decoder,
libx264 interface to x264 encoder
mov muxer
mpc v8 decoder
rasterfile decoder
shorten decoder
sun raster decoder
unsharp filter
vmd audio decoder
vmd video decoder
wmapro decoder
wmavoice decoder
xan decoder

These releases also add libaacplus support and include all changes from libav.org 0.7.2.
We recommend users, distributors and system integrators to upgrade unless they use current git master.

September 28, 2011

New stuff in git master:

    libaacplus support
    ACT/BIT demuxers
    AMV video encoder
    g729 decoder
    stdin control of drawtext
    2bpp, 4bpp png support
    interlaced 1bpp and PAETH png fixes
    libspeex encoding support
    hardened h264 decoder that won't overread the bitstream
    wtv muxer
    H/W Accelerated H.264 Decoding on Android
    stereo3d filter from libmpcodecs works now
    an experimental jpeg2000 encoder
    many bugfixes
    libswresample
    ...

September 22, 2011

We have made 2 new point releases that fix more security issues. They also include many bugfixes and a few backported features; for example, speex encoding support through libspeex has been backported. All changes from the latest libav release (0.7.1) are included as well. Grab them from our download page, or even better, use the latest git master.

September 15, 2011

FFmpeg now has a ProRes decoder in master git.

We want to support more raw or 10bit or broadcast codecs. We need samples of the following codecs. If you have some, please upload them to our trac.

Codec name / isom or fourcc

Pinnacle TARGA2000	dvr1
Pinnacle TARGA Cine YUV	Y216
BlackMagic Design 	Vr21
Digital Voodoo DV10 HD10
Media-100 844/X Uncompressed v.2.02	MYUV
Media-100 iFinish Transcoder 	dtmt
Accom SphereOUS v.3.0.1 	ImJG
Abekas ClipStore MXc J2K Compressed v.3.0.2	HDJ1 HDJK
BOXX v.1.0	bxrg bxbg bxyv bxy2
LiveType Codec Decompressor	pRiz
Cineon DPX 10-bit Y'CbCr 4:2:2	D210 C310 DPX cini
Radius DV YUV PAL/NTSC	R420 R411

September 7, 2011

We have made 2 new point releases that fix several security issues, among them MSVR-11-0088. They also include many bugfixes and a few backported features. All changes from the latest libav release (0.7.1) are included as well. Grab them from our download page, or even better, use the latest git master.

August 29, 2011

We have added support for H.264 4:2:2 intra, there are some new 8->10bit fixes in swscale, ffplay has more accurate AV-sync, ogg duration is more accurate now, we can decode WMVP and WVP2 streams and many many other new things and bugfixes. All in ffmpeg git master.

July 28, 2011

We have made 2 new point releases that fix several security issues, among them MSVR-11-0080. They also include many bugfixes and a few backported features. All changes from libav 0.7.1 are included as well. Grab them from our download page, or even better, use the latest git master.

June 24, 2011

Instead of having fun outside in the warm summer months, we have made a new release: FFmpeg 0.8! All bugfixes and merges from ffmpeg-mt and libav are included in this release. Although we still recommend you use the latest git version of our code.

We have also made an OLDABI release: FFmpeg 0.7.1. It contains almost all of the features, bugfixes and merges of ffmpeg-mt and libav of 0.8, while being compatible with the 0.6 ABI and API. It has a few missing features, read the Changelog for more information.

May 3, 2011

FFmpeg now accesses x264 presets via libx264. This extends functionality by introducing several new libx264 options, including -preset, -tune, and -profile. You can read more detailed information about these options with "x264 --fullhelp".

The syntax has changed so be sure to update your commands. Example:

ffmpeg -i input -vcodec libx264 -preset fast -tune film -profile main -crf 22 -threads 0 output

April 27, 2011

FFmpeg now has an oldabi branch. It is updated to master but with the old ABI. Only fixes that break the old ABI are missing from this branch.

To access the oldabi branch, clone FFmpeg, then do

git checkout oldabi

To get back to latest FFmpeg, just run:

git checkout master

April 14, 2011

FFmpeg can now decode 9-bit and 10-bit H.264 streams, used in particular by AVCIntra 50.

April 4, 2011

In order to supply our release users with the newest features and bug fixes we are in the process of making a new release. The release will be based on the latest development tree while staying API/ABI compatible to the previous release.

Please download the release candidate and report problems to our bug tracker.

March 30, 2011

Win32 and Win64 builds of FFmpeg are now available at http://ffmpeg.zeranoe.com/builds/

Please report any bugs to our bug tracker.

March 21, 2011

Today FFmpeg-mt, the multithreaded decoding branch, has been merged into FFmpeg. This has been a long awaited merge, and we would like to thank Alexander Strange for his patience and hard work.

Testing is appreciated and if you find any bugs please report them to our bug tracker.

March 21, 2011

The mailing lists have been fully migrated to ffmpeg.org!

The FFmpeg mailing lists were moved from sourceforge.net to mplayerhq.hu in April 2005, and moved from mplayerhq.hu to ffmpeg.org in 2011.

Unfortunately the lists were down for a few hours because of the abrupt shut down on the previous server[1]. We apologize for this interruption. Also we could not move the subscribers of the libav-user mailing list (libav-user is for application developers using libav* libraries from the FFmpeg project). Even though libav-user was not listed in the shut down announcement[1], it was also shut down.

If you are not yet subscribed we encourage you to do so now if you are interested in FFmpeg or multimedia or both. Visit our contacts page to find out more about the various mailing lists surrounding the FFmpeg project. You can also find the archives there if you like to browse the old posts.

As stated in the previous news entry we are in the process of recovering our project infrastructure. We will keep you posted.

March 17, 2011

Reinhard Tartler backported several security fixes to the 0.5 release branch and made another point release, that is 0.5.4. Note, 0.5 is quite old and this release is mostly for those stuck with the 0.5 branch, and not so interesting for end users.

    Changelog between 0.5.3 and 0.5.4

- Fix memory corruption in WMV parsing (addresses CVE-2010-3908)
- Fix heap corruption crashes (addresses CVE-2011-0722)
- Fix crashes in Vorbis decoding found by zzuf (addresses CVE-2010-4704)
- Fix another crash in Vorbis decoding (addresses CVE-2011-0480, Chrome issue 68115)
- Fix invalid reads in VC-1 decoding (related to CVE-2011-0723)
- Do not attempt to decode APE file with no frames
  (addresses http://packetstorm.linuxsecurity.com/1103-exploits/vlc105-dos.txt)

March 15, 2011

FFmpeg has been forked by some developers after their attempted takeover[1] two months ago did not fully succeed. During these two months their repository was listed here as main FFmpeg repository. We corrected this now and list the actual main repository and theirs directly below. All improvements of their fork have been merged into the main repository already.

Sadly we lost a not so minor part of our infrastructure to the forking side. We are still in the process of recovering, but web, git and issue tracker are already replaced.

Readers who want to find out more about the recent happenings are encouraged to read through the archives of the FFmpeg development mailing list[2]. There was also a bit of coverage on some news sites like here [3].

February 24, 2011

FFmpeg development has moved to Git, and the SVN repository is no longer updated. The SVN repository may be removed in the near future, so you are encouraged to use a Git repository instead.

The last revision committed to SVN was r26402 on 2011-01-19 and replaced the svn:external libswscale with a standalone copy.

Oct 18, 2010

We have just pushed the first point release from our 0.6 release branch: FFmpeg 0.6.1. This is a maintenance-only release that addresses a small number of bugs and security issues. It also adds a newer version of the AAC decoder, which enables the playback of HE-AAC v2 media.

We have also taken the time to make another point release from our 0.5 branch: FFmpeg 0.5.3. It is a maintenance-only release that addresses a security issue and a minor set of bugs.

Distributors and system integrators are encouraged to update and share their patches against our release branches.

June 15, 2010

A bit longer than actually expected, but finally, we are proud to announce a new release: FFmpeg 0.6. Check out the release notes and changelog.

It is codenamed "Works with HTML5" as the special focus of this release were improvements for the new multimedia elements in HTML5. The H.264 and Theora decoders are now significantly faster and the Vorbis decoder has seen important updates. This release supports Google's newly released libvpx library for the VP8 codec and the Matroska demuxer was extended to support to WebM container.

This release includes again an extensive number of changes; some of its highlights are:

  • Significant work to support at least decoding of all widespread mainstream proprietary codecs, and as usual broad coverage of widespread non-proprietary codecs, such as:
    • decoders and encoders
      • VP8 (via Google's libvpx library)
    • decoders
      • AMR-NB
      • Atrac1
      • HE-AAC v1
      • Bink
      • Bluray (PGS) subtitle
      • MPEG-4 Audio Lossless Coding (ALS)
      • WMA Pro
      • WMA Voice
  • Highlights among the newly supported container formats:
    • demuxers and muxers
      • Adobe Filmstrip
      • SoX native format
      • WebM support in Matroska de/muxer
    • demuxers
      • Bink
      • Core Audio Format
      • Dirac in Ogg
      • IV8
      • QCP
      • VQF
      • Wave64
    • muxers
      • IEC-61937
      • RTSP
  • faster AAC decoding
  • faster H.264 decoding
  • numerous ARM optimizations
  • important updates to the Vorbis decoder
  • RTP packetization support for H.263, and AMR
  • RTP depacketization support for AMR, ASF, H.263, Theora and Vorbis
  • RTMP/RTMPT/RTMPS/RTMPE/RTMPTE protocol support via librtmp
  • the new ffprobe tool
  • VorbisComment writing for FLAC, Ogg FLAC and Ogg Speex files
  • and so much more!

June 2, 2010

We are pleased to announce that FFmpeg will be present at LinuxTag in Berlin June 9-12 where we will be showing some spectacular demos. There will also be some trolls.

May 25, 2010

We have just pushed out another point release from our 0.5 release branch: FFmpeg 0.5.2. This is a maintenance-only release that addresses a small number of security and portability issues. Distributors and system integrators are encouraged to update and share their patches against this branch.

March 19, 2010

Once again, FFmpeg has been accepted to take part in the Google Summer of Code. Here is the Google SoC FFmpeg page.

We have a list of proposed project ideas available so, if you think you might be interested, head over there to see if there is any project on which you wish to work and for which you may wish to make an application. The list is still in flux, and you're free to come up with your own ideas, but note that proposals should be closely tied to the progression of FFmpeg's code base.

We would like prospective students to show us that they've got what it takes to be a contributor to FFmpeg. If you think you're suited, then please complete a small task before submitting your Summer-of-Code proposal. Note that many of the proposed Summer-of-Code projects have specific tasks that you would want to work on, since they would show us that you're comfortable in that particular piece of our codebase that relates to your specific project. Send patches to the mailing list for review, so that you will learn about our patch review process, inline replying (because we don't like top-posting on our mailing lists) and general interactions with our developer base.

The sooner you start communicating with us and working within our code base, the sooner both you and we will ascertain your suitability and you will get used to our development methodology. You have until the application deadline to complete your small task. Good luck!

March 2, 2010

We have just pushed out a point release from our 0.5 release branch: FFmpeg 0.5.1. This release fixes security, packaging and licensing issues for FFmpeg 0.5, but it is a maintenance-only release; no new codecs, formats or other features are being introduced. The full details are spelled out in the release notes and changelog.

There have been security fixes for the ASF, Ogg and MOV/MP4 demuxers as well as the FFv1, H.264, HuffYUV, MLP, MPEG audio and Snow decoders. libswscale can now be compiled in LGPL mode, albeit with x86 optimizations disabled. Some non-free bits in a test program were replaced. The AC-3 decoder is now completely LGPL. AMR-NB/WB support is now possible in free software through the OpenCORE libraries.

To help packagers, the x264 glue code was updated to work with newer versions and symbol versioning was backported, as was the lock management API. The symbol versioning change is enabled on platforms that support it. This allows users to upgrade from 0.5.1 to the upcoming 0.6 release without having to recompile their applications. While this release is both API and ABI compatible with 0.5, please note that distributors have to recompile applications against 0.5.1 in order to make seamless upgrades to 0.6 possible.

March 1, 2010

We have been busy over the past few months. Among other things, the results are an Indeo 5 video decoder as well as audio decoders for AMR-NB, Sipro, MPEG-4 ALS and WMA Voice, complete support for Bink, CDG and IFF PBM/ILBM bitmaps, an RTSP muxer, Bluray (PGS) subtitle support, a protocol for file concatenation and the ffprobe tool for extracting information from multimedia files.

September 23, 2009

In 1992 Sony introduced the first Minidisc player. 17 years later it is now possible to transfer and play back the raw ATRAC data from the actual digital disc with the help of FFmpeg, tools developed by the Linux Minidisc project and official hardware (MZ-RH1). So if you have lots of digital recordings stored on Minidisc now is the time to archive it all.

One of the last entrenchments of proprietary multimedia has fallen: Windows Media Audio Pro support is finally available in FFmpeg. It decodes all known samples flawlessly and is considerably faster than the binary decoder from Microsoft. A big thank you goes out to all the reverse engineers and programmers who made this possible. It really was a herculean effort.

August 24, 2009

Just a very short time after its launch (~10 years), FFmpeg now supports decoding of TwinVQ (remember .vqf files?). Now FOSS enthusiasts can finally contribute to the late-90s discussion of whether it sounds better than MP3 or not.

July 24, 2009

FFmpeg has removed support for libamr as of svn revision 19365. It has been replaced with support for libopencore-amr. Naturally the configure options have changed. The libamr options have been removed and there are two new options to take their place:

  • --enable-libopencore-amrnb
  • --enable-libopencore-amrwb

The reason for this change is that the libamr license was non-free, while libopencore-amr is licensed under an Apache 2 license. The change was discussed at length on the developer mailing list during May, June, and July. This has several effects:

  • You may now distribute FFmpeg builds with support for dynamically loading libopencore-amr
  • Support for AMR-WB encoding has been removed since libopencore-amr does not support it

May 7, 2009

FFmpeg was granted 9 slots to fill with applicants. After the gruelling application and qualification process, we will be running the following tasks this year:

  • RTMP Support
    • Student: Kostya Shiskov
    • Mentor: Ronald Bultje
  • Libswscale Cleanup
    • Student: Ramiro Polla
    • Mentor: Reimar Döffinger
  • S/PDIF Multiplexer
    • Student: Bartlomiej Wolowiec
    • Mentor: Benjamin Larsson
  • Playlist/Concatenation Support
    • Student: Geza Kovacs
    • Mentor: Baptiste Coudurier
  • JPEG2000 Codec
    • Student: Jai Menon
    • Mentor: Justin Ruggles
  • Implement the New Seeking API in Libavformat
    • Student: Zhentan Feng
    • Mentor: Baptiste Coudurier
  • MPEG-4 ALS Decoder
    • Student: Thilo Borgmann
    • Mentor: Justin Ruggles
  • Implementation of AVFilter infrastructure and various audio filters
    • Student: Kevin Dubois
    • Mentor: Vitor Sessak
  • Finish AMR-NB decoder and write an encoder
    • Student: Colin McQuillan
    • Mentor: Robert Swain

Congratulations to all the successful applicants. Work hard, communicate well and prosper! Good luck!

March 26, 2009

Once again, FFmpeg has been accepted to take part in the Google Summer of Code. Here is the Google SoC FFmpeg page.

We have a list of proposed project ideas available so, if you think you might be interested, head over there to see if there is any project on which you wish to work and for which you may wish to make an application. The list is still in flux, and you're free to come up with your own ideas, but note that proposals should be closely tied to the progression of FFmpeg's code base.

If you're a student who thinks you have what it takes, we require that prospective students complete a small task before they will be considered to take part in the program for FFmpeg. Take a look at the list, pick something to do, learn about inline replying (because we don't like top-posting on our mailing lists) and then tell us on the FFmpeg-devel mailing list which small task you have chosen.

The sooner you start communicating with us and working within our code base, the sooner both you and we will ascertain your suitability and you will get used to our development methodology. You have until the application deadline to complete your small task. Good luck!

March 23, 2009

A new mailing list has been created for ffserver users. The list is intended to create an environment for discussion amongst ffserver users so that they can better receive support and support each other. Interested parties can subscribe and view the archives via the contact page.

March 10, 2009

It has been a very long time since we last made a release and many did not think we would make one again but, back by popular demand, we are proud to announce a new release: FFmpeg 0.5. Check out the release notes and changelog.

It is codenamed "half-way to world domination A.K.A. the belligerent blue bike shed" to give an idea where we stand in the grand scheme of things and to commemorate the many fruitful discussions we had during its development.

This release includes a very extensive number of changes, but some of the highlights are:

  • Significant work to support at least decoding of all widespread mainstream proprietary codecs, such as:
    • decoders and encoders
      • ALAC
      • Flash Screen Video
      • WMAv2 decoder fixed, WMAv1/v2 encoder
    • decoders
      • Atrac3
      • MLP/TrueHD
      • On2 VP3 improvements and VP5/VP6 support
      • RealAudio Cooker and fixes for 14.4 and 28.8
      • RealVideo RV30/40
      • WMV3/WMV9/VC-1 and IntraX8 frame support for WMV2/VC-1
  • Broad coverage of widespread non-proprietary codecs, including:
    • decoders and encoders
      • DNxHD
      • DVCPRO50 (a.k.a. DV50)
      • Floating point PCM
      • GSM-MS
      • Theora (and encoding via libtheora)
      • Vorbis
    • decoders
      • AAC with ADTS support and >2x the speed of FAAD! (no HE AAC support yet)
      • AC-3 that is faster than liba52 in 5.1, up to 2x faster in stereo and also supports E-AC-3! Hence liba52 is now obsolete.
      • DCA
      • DVCPRO HD (a.k.a. DV100)
      • H.264 PAFF and CQM support, plus slice-based multithreaded decoding
      • Monkey's Audio
      • MPEG-2 video support for intra VLC and 4:2:2
      • Musepack
      • QCELP
      • Shorten
      • True Audio (TTA)
      • Wavpack including hybrid mode support
  • Highlights among the newly supported container formats:
    • demuxers and muxers
      • GXF
      • MXF
    • demuxers
      • NullSoft Video (NSV)
    • muxers
      • iPhone/iPod compatibility for MP4/MOV
      • Matroska
      • NUT
      • Ogg (FLAC, Theora and Vorbis only)
      • ShockWave Flash (SWF)
  • libavdevice
  • ffserver is working again.
  • a shiny, new, completely revamped, non-recursive build system
  • cleaner, more consistent code
  • an all new metadata API
  • and so much more!

March 4, 2009

Google are again running their Summer of Code program and, as usual, we will be applying for a project position. As such we will need strong project proposals and qualification tasks for the students to complete.

To all the students out there who want to work on FFmpeg over the summer, the sooner you begin to contribute to the project the better. Working on digital multimedia software is not the easiest task and getting code into FFmpeg's trunk repository demands significant rigor and commitment.

Until we are officially accepted into the program, you could take a look at the list of small tasks we have and try to complete one of those. Support for development of FFmpeg is available via the FFmpeg-devel mailing list or IRC.

December 20, 2008

RealVideo 3.0 decoder added. Still working the bugs out, please test and report any problems.

December 20, 2008

The FFmpeg project would like to recognize and thank the people at Picsearch for their help improving FFmpeg recently. The Picsearch team makes extensive use of FFmpeg and provided feedback to FFmpeg in the form of thousands of files that either crash FFmpeg or use unsupported/unknown codecs. The FFmpeg development team is putting this information to work in order to improve FFmpeg for everyone.

We know that there are other organizations using FFmpeg on a large scale to process diverse input types. The FFmpeg team invites those organizations to provide similar feedback about problems encountered in the wild.

December 3, 2008

A bunch of new formats have recently been added to FFmpeg, namely a QCELP/PureVoice speech decoder, a floating point PCM decoder and encoder, a Nellymoser ASAO encoder, an Electronic Arts TGQ decoder, Speex decoding via libspeex, an MXF muxer, an ASS/SSA subtitle demuxer and muxer and our AC-3 decoder has been extended with E-AC-3 support. Last but not least we now have a decoder for RealVideo 4.0.

September 8, 2008

FFmpeg is undergoing major changes in its API/ABI. The last valid revision for libavcodec version 51 is r15261.

August 21, 2008

The AAC decoder from FFmpeg Summer of Code 2006 has finally been cleaned up and is now in FFmpeg trunk. It supports Main and Low Complexity profile AAC but does not yet support HE AAC v1 (LC + SBR) or v2 (LC + SBR + PS), though implementation of this support is underway. It is considerably faster than FAAD and you should expect further performance improvements and bug fixes in the coming weeks.

Also, FFmpeg now has floating point PCM support and supports MLP/TrueHD decoding (FFmpeg SoC 2008 should bring us an encoder), Apple Lossless Audio encoding (FFmpeg SoC 2008), MVI demuxing and Motion Pixels Video decoding, D-Cinema audio muxing, Electronic Arts CMV and TGV decoding, and MAXIS EA XA demuxing/decoding.

June 16, 2008

UAB "DKD" (dkd.lt) have released a Nellymoser ASAO compatible decoder and encoder under the LGPL. This will aid the development of a native encoder in FFmpeg, and right now a GSoC student is working hard on just that task. A great thanks to UAB "DKD" for this contribution to the FFmpeg community.

June 11, 2008

We have added an Oma demuxer, the QuickTime variant of an IMA ADPCM encoder, a VFW grabber, an iPod/iPhone-compatible MP4 muxer, a Mimic decoder, an MSN TCP Webcam stream demuxer as well as demuxers and decoders for the following fringe formats: RL2, IFF, 8SVX, BFI.

February 7, 2008

We have added Ogg and AVM2 (Flash 9) SWF muxers, TechnoTrend PVA and Linux Media Labs MPEG-4 (LMLM4) demuxers, PC Paintbrush PCX and Sun Rasterfile decoders.

November 11, 2007

FFmpeg now supports XIntra8 frames, meaning that finally all WMV2 samples and some WMV3 samples that showed blocky color artifacts can be decoded correctly.

October 22, 2007

Beam Software SIFF demuxer and video decoder support added.

October 15, 2007

FFmpeg gets support for the Nellymoser speech codec used in flash.

October 9, 2007

Apart from a DNxHD encoder, PAFF decoding support for H.264 was committed to SVN.

September 29, 2007

AMV audio and video decoding has arrived.

September 13, 2007

In about half a year of work since the last update we have added among other things: DXA and Monkey's Audio demuxer and decoder, DNxHD, Atrac3 and AC-3 decoders, QTRLE encoder, NUT and Matroska muxers.

July 14, 2007

FFmpeg got 8 projects this year in the Google Summer of Code program. Check out the FFmpeg SoC about page for more information.

March 09, 2007

Nine months without news but with heavy development. A few select highlights are decoders for VC-1/WMV3/WMV9, VMware, VP5, VP6 video and WavPack, IMC, DCA audio and a WMA encoder.


Setting Qt styles

http://nokia.svn.wordpress.org/trunk/src/qss/common.qss


item:selected 


QLineEdit:focus, QTextEdit:focus {
	border: 1px solid black;

	color: black;
}

These are examples of specifying styles for widget states and events.
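For context, here is a minimal sketch (not taken from the linked stylesheet; the widget and strings are illustrative) of applying such state-based styles to a Qt application in C++:

#include <QApplication>
#include <QLineEdit>

int main(int argc, char *argv[])
{
    QApplication app(argc, argv);

    // Apply the focus styles shown above to every QLineEdit/QTextEdit in the app.
    app.setStyleSheet(
        "QLineEdit:focus, QTextEdit:focus {"
        "    border: 1px solid black;"
        "    color: black;"
        "}");

    QLineEdit edit;     // illustrative widget; it gets the border/color when focused
    edit.show();

    return app.exec();
}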


MOTODEV app-validator

http://developer.motorola.com/testing/app-validator/

 


When an out.xml file is generated during Android development

When you build an Android project, the XML file sometimes gets compiled into an out.xml file, and this can prevent the application from running.

The fix is as follows:

http://stackoverflow.com/questions/2393103/android-sdk-main-out-xml-parsing-error

A better fix: Eclipse -> Window -> Preferences -> Run/Debug -> Launching -> Launch Configuration

Check "Filter checked launch configuration types" and also check "XSL" (at the end of the list).

This makes the problem go away for good.



Microsoft Windows SDK for Windows 7 and .NET Framework 4

http://www.microsoft.com/en-us/download/dlx/listdetailsview.aspx?id=8279





Quick details

File name: winsdk_web.exe
Version: 7.1
Download size: 498 KB
Language: English
Date published: 5/19/2010




Let's collect Flash games.

I plan to collect both the executables, which are easy to find, and the source files, which are harder to come by.


Executables

  1. Tetris http://monodreamer.cafe24.com/games/tetris/
  2. Running game http://www.foddy.net/Athletics.html




Source files





Changing images used in Android, "Invalid file name: must contain only [a-z0-9_.]"

Sometimes after adding an image for use in Android, the following error shows up.

Don't just try to use the files your designer handed over as-is; it is better to agree on the file naming rules before you start.

The error looks like this:

Invalid file name: must contain only [a-z0-9_.]

(The file name must not be null, either.)

Ask for file names that contain only lowercase letters, digits and underscores.

Uppercase letters are usually what triggers the error.


---------------------------------

Renaming procedure (using UltraEdit)

When developing for Android on Windows:

1. Open 'cmd' and change to the directory containing the images.

2. Run 'dir >> 1.bat'.

3. Copy only the file names.

   -> Column (block) selection mode works best.

4. Convert the copied names to lowercase.

5. Paste the original names in front of the lowercase ones.

6. Prepend 'ren '.

   -> Format:

        'ren [original file name]  [lowercase file name]'

   -> Use the macro feature to apply this to every line.

7. Refresh (F5) in Eclipse so the renamed files are picked up.


Three minutes of work and you're done.




Android development environment

Setting up yet another PC. :(

It would be nice if there were an easy way to set up an Android development environment.



  1. Download (latest versions as of June 2)
    1. Java JDK
      1. http://java.sun.com
        java_ee_sdk-6u4-jdk7-windows
    2. Eclipse
      1. http://www.eclipse.org
        eclipse-java-indigo-SR2-win32
    3. Android SDK
      1. http://developer.android.com/sdk
        android-sdk_r18-windows
  2. Install
  3. Configure
  4. Test






ZedGraph

Nice to work with from C#.


ZedGraph

- http://zedgraph.sourceforge.net/index.html


Documentation file


ZedGraph Help.chm



[Source hunting] 2D Fast Wavelet Transform Library for Image Processing

These days my hobby is collecting good source code...

http://www.codeproject.com/Articles/20869/2D-Fast-Wavelet-Transform-Library-for-Image-Proces



Screenshot - fwt2d.jpg

Introduction

I've been involved with wavelet analysis since my Ph.D. studies and over the years have developed various wavelet-transform C++ libraries. This one concerns a 2D implementation of the Fast Wavelet Transform (FWT). The 2D FWT is used in image processing tasks like image compression, denoising and fast scaling. I designed this 2D FWT implementation to support custom filters and to be simple to use. I've equipped it with the known filter families: Daubechies, Coiflets and biorthogonal wavelets. The wavelets used in compression tasks are Bior53 and Bior97. With the Daub1 filter you can achieve a fast dyadic image scaling algorithm. You may also experiment with other filters for compression and denoising and compare the results, and you may add your own custom wavelets. A nice feature of this library is that the reconstruction FWT filters are rearranged so that you perform just convolution operations on the spectrum, which is very important if you'd like MMX optimizations.

Background

You need a basic understanding of wavelet transforms to be familiar with the topic discussed. You can learn it from numerous sources available on the Internet, from the Matlab Wavelet Toolbox documentation, or by having a look at this website. Some C++ experience with abstract base classes is also desirable if you plan to extend the code by deriving your own implementations, numerically optimized for a particular filter.

Using the Code

I designed the GUI for the library using Windows Forms with the .NET 2.0 Framework, so anyone can start using it right now without much ado. It provides image visualization, 2D FWT transformation in RGB color space, and synthesis. Just unzip the binary file with the \filters directory and run it. Load the image using the File menu, and transform it with the Transforms menu. After clicking FWT2D RGB transform, you'll be presented with a translucent dialog box, where you can select the filter (present in the \filters dir), select the number of scales (usually 3 for an image transform) and a denoising threshold (in the range of 0 to 128) below which all the pixels in the high frequency bands will be zeroed. In the FWT library this threshold is decreased by a factor of 4 at every scale so you do not end up with too much distortion in the reconstructed image. The caption of the image frame shows the percentage of non-zero coefficients in the FWT spectrum. After reconstruction you are also shown the error in dB compared to the original, untransformed image.

The C++ library itself is composed of several classes:

  • BaseFWT2D - The abstract base class for the transform
  • FWT2D - The implementation of the 2D FWT, it is a derived class from BaseFWT2D
  • vec1D - The 1D vector wrapper for the filters

The 2D FWT transform and synthesis functions are defined in BaseFWT2D (trans() and synth()) and they use private virtual functions for transforming and synthesizing rows and columns of the image, which are implemented in the FWT2D class. I designed the implementation this way so that anyone can derive their own optimized version of the FWT transform for a particular filter, as I mentioned in the Background section. I have written MMX-optimized implementations of Bior53 and Bior97 for image compression and Daub1 for fast image scaling, and plan to post them in the future in addition to this article.

Quick Start with the FWT Library

To start with the 2D FWT library, you just need to construct an FWT2D object with the filter name (providing the full path to its location), check the status after the constructor has been called, initialize the desired width and height of a single channel of the image, and continue with transformation and synthesis. You need to arrange separate channels: from an [RGB] stream, copy just the R channel into a width-by-height buffer, and do the same for the G and B channels. You may also convert them to YUV space or the like, but then you need separate buffers for the Y, U and V channels as well. This way, every class object is targeted at a specific wavelet filter and image channel (so you may use different filters for different channels).

/* 
unsigned char* pRchannel = new unsigned char[width * height];
for (unsigned int i = 0; i < width * height; i++) {
        //Taking R channels from RGB buffer
        pRchannel[i] = pRGB[3*i + 0];
}
*/
        
unsigned char* pRchannel;  //R channel buffer unsigned char 0...255
unsigned int width;        //R channel width
unsigned int height;       //R channel height
    
const wchar_t* perr;       //error text message returned after status() function
unsigned int status;       //numerical error value 
    
unsigned int scales = 3;    
unsigned int TH = 20;
    
FWT2D r_fwt(L"path\\bior97.flt");
r_fwt.init(width, height);
perr = r_fwt.status(status);
if (perr != 0) {
        //Output text representation of the error pointed by perr
        return;
}
    
//After initialization you can do multiple trans() and synth() calls
//with upcoming image data for analysis or FWT spectrum for synthesis
r_fwt.trans(pRchannel, scales, TH);
r_fwt.synth(pRchannel);
//and so on ...  
    
//clean up before destroying the object
r_fwt.close(); 

If you want to access the FWT spectrum after the trans() call, for visualization or entropy coding, you can do it with the BaseFWT2D functions getspec() or getspec2d(). The first one provides a 1D char pointer to the spectrum and the latter a 2D char pointer, so you can access the spectrum as a two-dimensional array. Note that the original color channels are converted to char by subtracting 128, and the FWT spectrum is in the char range -128 ... 127 too.

char** pspec = r_fwt.getspec2d();
for (unsigned int j = 0; j < height; j++ ) {
        for (unsigned int i = 0; i < width; i++ ) {
                char val = pspec[j][i];
        }            
}

Detailed Reference to the FWT Library

A nice 1D vector wrapper is implemented in the vec1D class. It allows defining the starting index of the first element in the array. By default, it is 0, as in a usual C array, but you can also define a positive one, for example 1, to end up with a Matlab-like array, or a negative one, as is the case with wavelet filters. If you define -3 as the starting index and 6 as the length of the array (total number of items), you will access the elements of the array with the indices -3, -2, -1, 0, 1, 2. This way your array is centered around 0, which coincides with a particular wavelet filter center, for example. The array data itself is 16-byte aligned for performing SSE-optimized floating point operations.

//plain C array of 6 elements with indices from 0 to 5 initialized to 0.0f
unsigned int size = 6;
vec1D vec1(size);
vec1(0) = 1.0f; //Set the first element to 1.0
    
//Initialize the array data with external buffer
unsigned int offset = -3;
float data[] = {-3.0f, -2.0f, -1.0f, 0.0f, 1.0f, 2.0f};
vec1D vec2(size, offset, data);
    
//Print out the array contents
for (int i = vec2.first(); i <= vec2.last(); i++ ) {
        wprintf(L"%i %f\n", i, vec2(i));  
}

The abstract base class BaseFWT2D provides all the necessary functions for transforming and reconstructing the image, getting the spectrum, dumping loaded wavelet filters, retrieving the number of scales in the spectrum. Start by constructing the object of FWT2D class which publicly derives from BaseFWT2D.

  • FWT2D::FWT2D(const wchar_t* filter_name);
  • FWT2D::FWT2D(const wchar_t* fname, const float* tH, unsigned int thL, 
        int thZ, const float* tG, unsigned int tgL, int tgZ, const float* H, 
        unsigned int hL, int hZ, const float* G, unsigned int gL, int gZ);

With the first constructor you provide the filters from an external file, specifying the full path to it. With the second one, you can provide the filters from memory buffers. tH and tG are the low-pass and high-pass filters for image transformation, thL and tgL are their lengths, and thZ and tgZ are the first index values (-3 for example). The same applies to the H and G filters, but they are used for image synthesis from the spectrum.

  • const wchar_t* BaseFWT2D::status(int& status);

Check the status after constructors. status = 0 indicates success and the function returns a NULL pointer, otherwise inspect the returned error message.

Then you need to initialize the class object with either of the functions:

  • void BaseFWT2D::init(unsigned int width, unsigned int height);
  • void BaseFWT2D::init(char* data, char* tdata, unsigned int width, 
        unsigned int height);

The last one provides an opportunity to supply external buffers to the class object, data and tdata, both of the width by height size. This way you can fill the data buffer from an external source and proceed with:

  • int BaseFWT2D::trans(unsigned int scales, unsigned int th = 0);

The image in the data buffer will be replaced with the FWT spectrum. The function returns 0 on success or -1 if you have not initialized the library.
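Putting these pieces together, a minimal sketch of the external-buffer mode (the buffer names and example dimensions are illustrative and only the functions listed in this section are used; status checking is omitted for brevity):

// Illustrative only: supply our own image/working buffers to the object.
unsigned int width  = 512;                // channel width  (example value)
unsigned int height = 512;                // channel height (example value)
char* data  = new char[width * height];   // image channel, DC shifted to -128 ... 127
char* tdata = new char[width * height];   // temporary/working buffer

FWT2D g_fwt(L"path\\bior97.flt");
g_fwt.init(data, tdata, width, height);   // attach the external buffers

// fill 'data' with a DC-shifted channel from an external source, then:
g_fwt.trans(3, 20);    // the FWT spectrum replaces the image in 'data'
g_fwt.synth();         // reconstruct the image back into the attached buffers

g_fwt.close();
delete [] data;
delete [] tdata;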

Or use the init(unsigned int width, unsigned int height) function for initialization and perform the image transformation supplied from external data buffer with two functions:

  • int BaseFWT2D::trans(const char* data, unsigned int scales, unsigned int th = 0);
  • int BaseFWT2D::trans(const unsigned char* data, unsigned int scales, 
        unsigned int th = 0);

The first one transforms the image supplied in a data buffer of type char, that is, your data is already DC shifted and in the range -128 ... 127. The last one accepts an unsigned char buffer and subtracts 128 from it using an MMX-optimized private function. The functions return 0 if they succeed, otherwise -1 if you have not initialized the library. Note that the image data is kept intact after the transformation. scales defines the number of levels of the FWT transform and th is the denoising threshold (0 ... 128).

You can access the FWT spectrum using the following functions:

  • char* BaseFWT2D::getspec();
  • char** BaseFWT2D::getspec2d();

With the char* pointer you get a straight width-by-height array, and with the char** pointer you can access an individual coefficient at a (column, row) location as with 2D array pointers.

To reconstruct the image, you use the following functions:

  • int BaseFWT2D::synth();
  • int BaseFWT2D::synth(char* data);
  • int BaseFWT2D::synth(unsigned char* data);

The first one just reconstructs the image from the spectrum into the class's own internal buffers (which can be supplied externally with the corresponding init() function). The second and third reconstruct the image and copy it to the data buffer in char -128 ... 127 or unsigned char 0 ... 255 format.

You can keep transforming and synthesizing images any number of times once you have created the class object and initialized it with the image width and height. After you have finished with the class object, call the close() method and destroy the object.

  • void BaseFWT2D::close();

You can get and set the number of FWT scales with the functions:

  • unsigned int BaseFWT2D::getJ();
  • void BaseFWT2D::setJ(unsigned int j);

The last one provides the opportunity to change the number of scales before synthesis, so if you've got an FWT spectrum with 3 scales you can change it to, say, 1 and perform only one level of FWT synthesis.
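For example, a small sketch (reusing the r_fwt object and pRchannel buffer from the quick-start code above) of reconstructing only one level from a 3-scale spectrum:

r_fwt.trans(pRchannel, 3, 0);    // build a 3-scale FWT spectrum
unsigned int j = r_fwt.getJ();   // j == 3, the number of scales in the spectrum
r_fwt.setJ(1);                   // perform only one level of synthesis
r_fwt.synth(pRchannel);          // partial reconstruction back into the channel buffer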

If you are planning to extend the code yourself, you need to derive your own class from the BaseFWT2D class and provide overrides for transrows(), transcols(), synthrows() and synthcols(). You also need to be well versed in 2D FWT analysis. This is useful if you'd like to implement MMX or SSE optimized versions for a filter of a particular length. As I mentioned before, I'm planning to post SSE optimizations for the Bior53 and Bior97 wavelet filters and an MMX one for the Daub1 filter in the future. A rough sketch of such a derived class is shown below.
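As a sketch only, and with the caveat that the override signatures and the forwarding constructor below are placeholders that must be matched against the real BaseFWT2D declaration, such a derived class could be shaped like this:

// Hypothetical sketch: the override signatures and constructor forwarding are
// assumed here and must be taken from the actual BaseFWT2D header.
class FWT2D_Bior53 : public BaseFWT2D
{
public:
    FWT2D_Bior53(const wchar_t* filter_name) : BaseFWT2D(filter_name) {}

private:
    // Filter-specific (e.g. SSE/MMX optimized) row and column passes go here.
    virtual void transrows() { /* optimized row transform for Bior53 */ }
    virtual void transcols() { /* optimized column transform */ }
    virtual void synthrows() { /* optimized row synthesis */ }
    virtual void synthcols() { /* optimized column synthesis */ }
};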

Points of Interest

The tedious part to program was the BaseFWT2D::makeHGsynth() function, which rearranges the synthesis filter coefficients into odd and even ones. Having these, you can substitute the 2*m and 2*m+1 operations of selecting even and odd coefficients from the synthesis filters during the reconstruction process and proceed with just a straight convolution.

License

This article, along with any associated source code and files, is licensed under The GNU General Public License (GPLv3)

About the Author

Chesnokov Yuriy

Engineer

Russian Federation

Former Cambridge University postdoc (http://www-ucc-old.ch.cam.ac.uk/research/yc274-research.html), Department of Chemistry, Unilever Centre for Molecular Informatics, where I worked on the problem of complexity analysis of cardiac data.
 
As a subsidiary result we achieved 1st place in the annual PhysioNet/Computers in Cardiology Challenge 2006: QT Interval Measurement (http://physionet.org/challenge/2006/)
 
My research interests are: digital signal processing in medicine, image and video processing, pattern recognition, AI, and computer vision.
 
My recent publications are:
 
Complexity and spectral analysis of the heart rate variability dynamics for distant prediction of paroxysmal atrial fibrillation with artificial intelligence methods. Artificial Intelligence in Medicine. 2008. V43/2. PP. 151-165 (http://dx.doi.org/10.1016/j.artmed.2008.03.009)
 
Face Detection C++ Library with Skin and Motion Analysis. Biometrics AIA 2007 TTS. 22 November 2007, Moscow, Russia. (http://www.dancom.ru/rus/AIA/2007TTS/ProgramAIA2007TTS.html)
 
Screening Patients with Paroxysmal Atrial Fibrillation (PAF) from Non-PAF Heart Rhythm Using HRV Data Analysis. Computers in Cardiology 2007. V. 34. PP. 459–463 (http://www.cinc.org/archives/2007/pdf/0459.pdf)
 
Distant Prediction of Paroxysmal Atrial Fibrillation Using HRV Data Analysis. Computers in Cardiology 2007. V. 34. PP. 455-459 (http://www.cinc.org/archives/2007/pdf/0455.pdf)
 
Individually Adaptable Automatic QT Detector. Computers in Cardiology 2006. V. 33. PP. 337-341 http://www.cinc.org/archives/2006/pdf/0337.pdf)


Making pretty text

http://www.codeproject.com/Articles/42529/Outline-Text


Nicely done.



Sample Screenshot

Introduction

I am an avid fan of anime (Japanese animation). As I do not understand the Japanese language, the anime I watch has English subtitles. These fan-subbed anime have the most beautiful fonts and text. Below is a screenshot of "Sora no Manimani", which is an anime about an astronomy club in high school.

Anime Screenshot

I was fascinated by the outline text. I searched the web for an outline text library that would let me do the same. Sadly, I found none. Those that I did find were too difficult for me to retrofit to my general purpose, and I did not fully understand their sparsely commented code. So I decided to roll up my sleeves and write my own outline text library. In my previous article, How to Use a Font Without Installing it, a reader, knoami, asked about using C# to do the same thing. So this time, taking C# users into account, every C++ code example in this article is accompanied by a C# code example. Without further ado, let us begin!

Initializing and Uninitializing GDI+

Before we use GDI+ in a C++ application, we need to initialize it. Below is an example of how to initialize GDI+ in the constructor and uninitialize it in the destructor.

// class declaration
class CTestOutlineTextApp
{
// ...
private:
    // data members
    Gdiplus::GdiplusStartupInput m_gdiplusStartupInput;
    ULONG_PTR m_gdiplusToken;
};

// default constructor
CTestOutlineTextApp::CTestOutlineTextApp()
{
    Gdiplus::GdiplusStartup(&m_gdiplusToken, &m_gdiplusStartupInput, NULL);
}

// destructor
CTestOutlineTextApp::~CTestOutlineTextApp()
{
    Gdiplus::GdiplusShutdown(m_gdiplusToken);
}

For .NET Windows Forms applications, GDI+ initialization and uninitialization is handled automatically for the developer.

Drawing Single Outline Text with Generic GDI+

Single Outline Text using GDI+

For this tutorial, I mostly stick to the Arial font, because Arial comes with every Windows operating system, so the sample code will work out of the box for you. To draw outline text, we have to add the string to the GraphicsPath object, using its AddString method, so that we have its path to draw the outline from. We must draw the text's outline first. To do that, we use the Graphics class's DrawPath method. Lastly, we draw the text body with the Graphics class's FillPath method. Simple, right?

#include <Gdiplus.h>

void CScratchPadDlg::OnPaint()
{
    //CDialog::OnPaint();
    CPaintDC dc(this);
    using namespace Gdiplus;
    Graphics graphics(dc.GetSafeHdc());
    graphics.SetSmoothingMode(SmoothingModeAntiAlias);
    graphics.SetInterpolationMode(InterpolationModeHighQualityBicubic);

    FontFamily fontFamily(L"Arial");
    StringFormat strformat;
    wchar_t pszbuf[] = L"Text Designer";

    GraphicsPath path;
    path.AddString(pszbuf, wcslen(pszbuf), &fontFamily, 
	FontStyleRegular, 48, Gdiplus::Point(10,10), &strformat );
    Pen pen(Color(234,137,6), 6);
    graphics.DrawPath(&pen, &path);
    SolidBrush brush(Color(128,0,255));
    graphics.FillPath(&brush, &path);
}

This is the equivalent C# code, using GDI+'s System.Drawing classes to draw outline text. You will notice that it is quite similar to the C++ code above.

private void OnPaint(object sender, PaintEventArgs e)
{
    e.Graphics.SmoothingMode = SmoothingMode.AntiAlias;
    e.Graphics.InterpolationMode = InterpolationMode.HighQualityBicubic;

    SolidBrush brushWhite = new SolidBrush(Color.White);
    e.Graphics.FillRectangle(brushWhite, 0, 0, 
	this.ClientSize.Width, this.ClientSize.Height);

    FontFamily fontFamily = new FontFamily("Arial");
    StringFormat strformat = new StringFormat();
    string szbuf = "Text Designer";

    GraphicsPath path = new GraphicsPath();
    path.AddString(szbuf, fontFamily, 
	(int)FontStyle.Regular, 48.0f, new Point(10, 10), strformat);
    Pen pen = new Pen(Color.FromArgb(234, 137, 6), 6);
    e.Graphics.DrawPath(pen, path);
    SolidBrush brush = new SolidBrush(Color.FromArgb(128, 0, 255));
    e.Graphics.FillPath(brush, path);
	
    brushWhite.Dispose();
    fontFamily.Dispose();
    path.Dispose();
    pen.Dispose();
    brush.Dispose();
    e.Graphics.Dispose();
}

Single Outline Text Sharp Corner Problem

We have a problem with the above text, specifically the "A"; there is a sharp spike at the top of the "A". This problem occurs when there are sharp edges or corners in the font glyphs and the outline is quite thick, or thicker than the text body. The sharp spike above came from the outline of the inner triangle of the "A". Below is the C++ code, followed by the C# code, to reproduce the problem.

#include <Gdiplus.h>

void CScratchPadDlg::OnPaint()
{
    //CDialog::OnPaint();
    CPaintDC dc(this);
    using namespace Gdiplus;
    Graphics graphics(dc.GetSafeHdc());
    graphics.SetSmoothingMode(SmoothingModeAntiAlias);
    graphics.SetInterpolationMode(InterpolationModeHighQualityBicubic);

    FontFamily fontFamily(L"Arial");
    StringFormat strformat;
    wchar_t pszbuf[] = L"ABC";

    GraphicsPath path;
    path.AddString(pszbuf, wcslen(pszbuf), &fontFamily, 
	FontStyleRegular, 48, Gdiplus::Point(10,10), &strformat );
    Pen pen(Color(234,137,6), 6);
    graphics.DrawPath(&pen, &path);
    SolidBrush brush(Color(128,0,255));
    graphics.FillPath(&brush, &path);
}
private void OnPaint(object sender, PaintEventArgs e)
{
    e.Graphics.SmoothingMode = SmoothingMode.AntiAlias;
    e.Graphics.InterpolationMode = InterpolationMode.HighQualityBicubic;

    SolidBrush brushWhite = new SolidBrush(Color.White);
    e.Graphics.FillRectangle(brushWhite, 0, 0, 
	this.ClientSize.Width, this.ClientSize.Height);

    FontFamily fontFamily = new FontFamily("Arial");
    StringFormat strformat = new StringFormat();
    string szbuf = "ABC";

    GraphicsPath path = new GraphicsPath();
    path.AddString(szbuf, fontFamily, 
	(int)FontStyle.Regular, 48.0f, new Point(10, 10), strformat);
    Pen pen = new Pen(Color.FromArgb(234, 137, 6), 6);
    e.Graphics.DrawPath(pen, path);
    SolidBrush brush = new SolidBrush(Color.FromArgb(128, 0, 255));
    e.Graphics.FillPath(brush, path);
	
    brushWhite.Dispose();
    fontFamily.Dispose();
    path.Dispose();
    pen.Dispose();
    brush.Dispose();
    e.Graphics.Dispose();
}

Fortunately, I have a workaround for this problem. We can set the LineJoin property of the GDI+ pen to LineJoinRound to avoid sharp edges and corners. The downside is that every edge will be rounded, instead of as crisp and sharp as the font. Below is the C++ code that calls the SetLineJoin method.

Single Outline Text Problem Solved

#include <Gdiplus.h>

void CScratchPadDlg::OnPaint()
{
    //CDialog::OnPaint();
    CPaintDC dc(this);
    using namespace Gdiplus;
    Graphics graphics(dc.GetSafeHdc());
    graphics.SetSmoothingMode(SmoothingModeAntiAlias);
    graphics.SetInterpolationMode(InterpolationModeHighQualityBicubic);

    FontFamily fontFamily(L"Arial");
    StringFormat strformat;
    wchar_t pszbuf[] = L"ABC";

    GraphicsPath path;
    path.AddString(pszbuf, wcslen(pszbuf), &fontFamily, 
	FontStyleRegular, 48, Gdiplus::Point(10,10), &strformat );
    Pen pen(Color(234,137,6), 6);
    pen.SetLineJoin(LineJoinRound);
    graphics.DrawPath(&pen, &path);
    SolidBrush brush(Color(128,0,255));
    graphics.FillPath(&brush, &path);
}

This is the equivalent C# code to solve the problem by setting the LineJoin property to LineJoin.Round.

private void OnPaint(object sender, PaintEventArgs e)
{
    e.Graphics.SmoothingMode = SmoothingMode.AntiAlias;
    e.Graphics.InterpolationMode = InterpolationMode.HighQualityBicubic;

    SolidBrush brushWhite = new SolidBrush(Color.White);
    e.Graphics.FillRectangle(brushWhite, 0, 0, 
	this.ClientSize.Width, this.ClientSize.Height);

    FontFamily fontFamily = new FontFamily("Arial");
    StringFormat strformat = new StringFormat();
    string szbuf = "ABC";

    GraphicsPath path = new GraphicsPath();
    path.AddString(szbuf, fontFamily, 
	(int)FontStyle.Regular, 48.0f, new Point(10, 10), strformat);
    Pen pen = new Pen(Color.FromArgb(234, 137, 6), 6);
    pen.LineJoin = LineJoin.Round;
    e.Graphics.DrawPath(pen, path);
    SolidBrush brush = new SolidBrush(Color.FromArgb(128, 0, 255));
    e.Graphics.FillPath(brush, path);

    brushWhite.Dispose();
    fontFamily.Dispose();
    path.Dispose();
    pen.Dispose();
    brush.Dispose();
    e.Graphics.Dispose();
}

Drawing Single Outline Text with Gradient Color

Single Outline Text with Gradient Color

We can select a gradient or texture brush, instead of a solid brush, for the text color. Below is a C++ example that shows how to do it.

#include <Gdiplus.h>

void CScratchPadDlg::OnPaint()
{
    //CDialog::OnPaint();
    CPaintDC dc(this);
    using namespace Gdiplus;
    Graphics graphics(dc.GetSafeHdc());
    graphics.SetSmoothingMode(SmoothingModeAntiAlias);
    graphics.SetInterpolationMode(InterpolationModeHighQualityBicubic);

    FontFamily fontFamily(L"Arial");
    StringFormat strformat;
    wchar_t pszbuf[] = L"Text Designer";

    GraphicsPath path;
    path.AddString(pszbuf, wcslen(pszbuf), &fontFamily, 
    FontStyleBold, 48, Gdiplus::Point(10,10), &strformat );
    Pen pen(Color(0,0,160), 5);
    pen.SetLineJoin(LineJoinRound);
    graphics.DrawPath(&pen, &path);
    LinearGradientBrush brush(Gdiplus::Rect(10, 10, 30, 60), 
        Color(132,200,251), Color(0,0,160), LinearGradientModeVertical);
    graphics.FillPath(&brush, &path);
}

Below is C# example that shows how to select a gradient brush.

private void OnPaint(object sender, PaintEventArgs e)
{
    e.Graphics.SmoothingMode = SmoothingMode.AntiAlias;
    e.Graphics.InterpolationMode = InterpolationMode.HighQualityBicubic;

    SolidBrush brushWhite = new SolidBrush(Color.White);
    e.Graphics.FillRectangle(brushWhite, 0, 0,
    this.ClientSize.Width, this.ClientSize.Height);

    FontFamily fontFamily = new FontFamily("Arial");
    StringFormat strformat = new StringFormat();
    string szbuf = "Text Designer";

    GraphicsPath path = new GraphicsPath();
    path.AddString(szbuf, fontFamily,
        (int)FontStyle.Bold, 48.0f, new Point(10, 10), strformat);
    Pen pen = new Pen(Color.FromArgb( 0, 0, 160), 5);
    pen.LineJoin = LineJoin.Round;
    e.Graphics.DrawPath(pen, path);
    LinearGradientBrush brush = new LinearGradientBrush(new Rectangle(10,10,30,70), 
        Color.FromArgb(132,200,251), 
        Color.FromArgb(0,0,160), LinearGradientMode.Vertical);
    e.Graphics.FillPath(brush, path);

    brushWhite.Dispose();
    fontFamily.Dispose();
    path.Dispose();
    pen.Dispose();
    brush.Dispose();
    e.Graphics.Dispose();
}

Drawing Double Outline Text with Generic GDI+

Double Outline Text with GDI+

To achieve double outline text, you have to render the outer outline first, then the inner outline, using DrawPath, followed by the FillPath call to draw the text body. Below is the C++ code to achieve that:

#include <Gdiplus.h>

void CScratchPadDlg::OnPaint()
{
    //CDialog::OnPaint();
    CPaintDC dc(this);
    using namespace Gdiplus;
    Graphics graphics(dc.GetSafeHdc());
    graphics.SetSmoothingMode(SmoothingModeAntiAlias);
    graphics.SetInterpolationMode(InterpolationModeHighQualityBicubic);

    FontFamily fontFamily(L"Arial");
    StringFormat strformat;
    wchar_t pszbuf[] = L"Text Designer";

    GraphicsPath path;
    path.AddString(pszbuf, wcslen(pszbuf), 
	&fontFamily, FontStyleRegular, 48, Gdiplus::Point(10,10), &strformat );
	
    Pen penOut(Color(32, 117, 81), 12);
    penOut.SetLineJoin(LineJoinRound);
    graphics.DrawPath(&penOut, &path);

    Pen pen(Color(234,137,6), 6);
    pen.SetLineJoin(LineJoinRound);
    graphics.DrawPath(&pen, &path);
    SolidBrush brush(Color(128,0,255));
    graphics.FillPath(&brush, &path);
}

This is the equivalent C# code to draw double outline text:

private void OnPaint(object sender, PaintEventArgs e)
{
    e.Graphics.SmoothingMode = SmoothingMode.AntiAlias;
    e.Graphics.InterpolationMode = InterpolationMode.HighQualityBicubic;

    SolidBrush brushWhite = new SolidBrush(Color.White);
    e.Graphics.FillRectangle(brushWhite, 0, 0, 
	this.ClientSize.Width, this.ClientSize.Height);

    FontFamily fontFamily = new FontFamily("Arial");
    StringFormat strformat = new StringFormat();
    string szbuf = "Text Designer";

    GraphicsPath path = new GraphicsPath();
    path.AddString(szbuf, fontFamily, 
	(int)FontStyle.Regular, 48.0f, new Point(10, 10), strformat);
    
    Pen penOut = new Pen(Color.FromArgb(32, 117, 81), 12);
    penOut.LineJoin = LineJoin.Round;
    e.Graphics.DrawPath(penOut, path);
    
    Pen pen = new Pen(Color.FromArgb(234, 137, 6), 6);
    pen.LineJoin = LineJoin.Round;
    e.Graphics.DrawPath(pen, path);
    SolidBrush brush = new SolidBrush(Color.FromArgb(128, 0, 255));
    e.Graphics.FillPath(brush, path);
	
    brushWhite.Dispose();
    fontFamily.Dispose();
    path.Dispose();
    penOut.Dispose();
    pen.Dispose();
    brush.Dispose();
    e.Graphics.Dispose();
}

Drawing Text Glow with Generic GDI+

Text Glow

To draw text glow, you start with a thin pen with a low alpha value (between 24 and 64) and draw the outline, then repeatedly draw the outline with progressively thicker pens. Finally, draw the text body with the FillPath method.

#include <Gdiplus.h>

void CScratchPadDlg::OnPaint()
{
    //CDialog::OnPaint();
    CPaintDC dc(this);
    using namespace Gdiplus;
    Graphics graphics(dc.GetSafeHdc());
    graphics.SetSmoothingMode(SmoothingModeAntiAlias);
    graphics.SetInterpolationMode(InterpolationModeHighQualityBicubic);

    FontFamily fontFamily(L"Arial");
    StringFormat strformat;
    wchar_t pszbuf[] = L"Text Designer";

    GraphicsPath path;
    path.AddString(pszbuf, wcslen(pszbuf), &fontFamily, 
	FontStyleRegular, 48, Gdiplus::Point(10,10), &strformat );
	
    for(int i=1; i<8; ++i)
    {
        Pen pen(Color(32, 0, 128, 192), i);
        pen.SetLineJoin(LineJoinRound);
        graphics.DrawPath(&pen, &path);
    }
	
    SolidBrush brush(Color(255,255,255));
    graphics.FillPath(&brush, &path);
}

This is the equivalent C# code to draw text glow:

private void OnPaint(object sender, PaintEventArgs e)
{
    e.Graphics.SmoothingMode = SmoothingMode.AntiAlias;
    e.Graphics.InterpolationMode = InterpolationMode.HighQualityBicubic;

    SolidBrush brushWhite = new SolidBrush(Color.White);
    e.Graphics.FillRectangle(brushWhite, 0, 0, 
	this.ClientSize.Width, this.ClientSize.Height);

    FontFamily fontFamily = new FontFamily("Arial");
    StringFormat strformat = new StringFormat();
    string szbuf = "Text Designer";

    GraphicsPath path = new GraphicsPath();
    path.AddString(szbuf, fontFamily, 
	(int)FontStyle.Regular, 48.0f, new Point(10, 10), strformat);
    
    for(int i=1; i<8; ++i)
    {
        Pen pen = new Pen(Color.FromArgb(32, 0, 128, 192), i);
        pen.LineJoin = LineJoin.Round;
        e.Graphics.DrawPath(pen, path);
        pen.Dispose();
    }

    SolidBrush brush = new SolidBrush(Color.FromArgb(255, 255, 255));
    e.Graphics.FillPath(brush, path);
	
    brushWhite.Dispose();
    fontFamily.Dispose();
    path.Dispose();
    brush.Dispose();
    e.Graphics.Dispose();
}

Postscript OpenType Fonts

Before you rush off to make your own outline library, I have to tell you about one pitfall of GDI+: GDI+ cannot handle PostScript OpenType fonts; it can only handle TrueType fonts. I searched for a solution and found Sjaak Priester's Make GDI+ Less Finicky About Fonts. His approach is to parse the font file for its glyphs and draw their outlines. Sadly, I cannot use his code, because his library is released under the restrictive GNU license and I want to make my code free for all to use. Note: this is the reason WinForms developers use the TextRenderer class, not the GDI+ classes, to display text. I racked my brains for a solution. Since GDI (not GDI+) can display PostScript OpenType fonts, and GDI supports path extraction through BeginPath/EndPath/GetPath, I decided to use just that to get my path into GDI+ (a rough sketch of this approach appears at the end of this section). Below is a comparison of the GDI+ path and the GDI path. Note: both are rendered by GDI+; only the path extraction differs, with one using GDI+ and the other using GDI to get the text path.

Gdi and Gdiplus rendering

The top one uses the GDI+ path and the bottom one uses the GDI path. The GDI path text looks bigger and a bit less accurate (not obvious here, because it depends on the font). (Note: I realised that if you use Graphics::DrawString to draw the text, it is roughly the same size as the GDI path text; it is the GDI+ path text which is smaller!) However, GDI paths can do the rotated italic text trick shown below, which GDI+ cannot do, because GDI+'s GraphicsPath::AddString takes in a FontFamily object, not a Font object. My OutlineText class provides the GdiDrawString method if you need to use PostScript OpenType fonts. The effect below is Franklin Gothic Demi, size 36, italic, rotated 10 degrees anti-clockwise.

Rotated Text
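For readers curious about the GDI route, below is a minimal C++ sketch of extracting a text outline with GDI's BeginPath/EndPath/GetPath and replaying it as a Gdiplus::GraphicsPath. It only illustrates the idea, not the library's actual implementation; the helper name GdiPathToGraphicsPath and its error handling are my own.

#include <windows.h>
#include <gdiplus.h>
#include <wchar.h>
#include <vector>

// Hypothetical helper: record the glyph outlines of 'text' into the DC's path
// with GDI, then replay the path records as a Gdiplus::GraphicsPath.
bool GdiPathToGraphicsPath(HDC hdc, HFONT hFont, const wchar_t* text,
                           int x, int y, Gdiplus::GraphicsPath& path)
{
    HFONT hOldFont = (HFONT)SelectObject(hdc, hFont);
    // With a transparent background, only the glyph outlines go into the path.
    SetBkMode(hdc, TRANSPARENT);

    BeginPath(hdc);
    TextOutW(hdc, x, y, text, (int)wcslen(text));
    EndPath(hdc);

    // Calling GetPath with null buffers returns the number of points in the path.
    int nPoints = GetPath(hdc, NULL, NULL, 0);
    if (nPoints <= 0) { SelectObject(hdc, hOldFont); return false; }

    std::vector<POINT> pts(nPoints);
    std::vector<BYTE>  types(nPoints);
    GetPath(hdc, &pts[0], &types[0], nPoints);
    SelectObject(hdc, hOldFont);

    Gdiplus::PointF start;
    for (int i = 0; i < nPoints; ++i)
    {
        BYTE type = types[i] & ~PT_CLOSEFIGURE;
        if (type == PT_MOVETO)
        {
            path.StartFigure();
            start = Gdiplus::PointF((float)pts[i].x, (float)pts[i].y);
        }
        else if (type == PT_LINETO)
        {
            Gdiplus::PointF end((float)pts[i].x, (float)pts[i].y);
            path.AddLine(start, end);
            start = end;
        }
        else if (type == PT_BEZIERTO && i + 2 < nPoints)
        {
            // GDI emits cubic Beziers as three consecutive PT_BEZIERTO points.
            Gdiplus::PointF c1((float)pts[i].x, (float)pts[i].y);
            Gdiplus::PointF c2((float)pts[i + 1].x, (float)pts[i + 1].y);
            Gdiplus::PointF end((float)pts[i + 2].x, (float)pts[i + 2].y);
            path.AddBezier(start, c1, c2, end);
            start = end;
            i += 2;
        }
        if (types[i] & PT_CLOSEFIGURE)
            path.CloseFigure();
    }
    return true;
}

The GraphicsPath obtained this way can then be drawn and filled with DrawPath and FillPath, exactly like the paths produced by AddString in the earlier examples.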

Drawing Outline Text using DirectWrite

As you may know, Direct2D and DirectWrite are the new graphics and text APIs for Windows Vista and Windows 7. I emailed Tom Mulcahy (the Microsoft developer of Direct2D for Windows 7); below is his email reply to me.

(Courtesy of Tom Mulcahy) The way to do this is to get the text contents as an ID2D1GeometrySink (see IDWriteFontFace::GetGlyphRunOutline). You can then call ID2D1RenderTarget::DrawGeometry to draw the outline of the text (specifying any color and width you want). Next, call ID2D1RenderTarget::FillGeometry to fill the text (again, you can specify any color you want).

Note: The Text Designer Outline Text Library, which is mentioned in the latter part of the article, will be updated with DirectWrite support when Windows 7 is out.
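For reference, here is a minimal C++ sketch of the approach Tom describes. It assumes a Direct2D factory, render target, and DirectWrite factory have already been created elsewhere, maps characters to glyphs naively (no shaping), and omits error handling; it is only an outline of the idea, not code from the library.

#include <d2d1.h>
#include <dwrite.h>
#include <vector>

// Hypothetical helper: draw outlined text by converting a glyph run to geometry.
void DrawOutlinedText(ID2D1Factory* pD2DFactory,
                      ID2D1RenderTarget* pRT,
                      IDWriteFactory* pDWriteFactory,
                      const wchar_t* text, UINT32 textLen)
{
    // Resolve a font face for Arial from the system font collection.
    IDWriteFontCollection* pFonts = NULL;
    pDWriteFactory->GetSystemFontCollection(&pFonts);
    UINT32 index = 0; BOOL exists = FALSE;
    pFonts->FindFamilyName(L"Arial", &index, &exists);
    IDWriteFontFamily* pFamily = NULL;
    pFonts->GetFontFamily(index, &pFamily);
    IDWriteFont* pFont = NULL;
    pFamily->GetFirstMatchingFont(DWRITE_FONT_WEIGHT_NORMAL,
        DWRITE_FONT_STRETCH_NORMAL, DWRITE_FONT_STYLE_NORMAL, &pFont);
    IDWriteFontFace* pFontFace = NULL;
    pFont->CreateFontFace(&pFontFace);

    // Map the characters to glyph indices (assumes simple BMP text, no shaping).
    std::vector<UINT32> codePoints(text, text + textLen);
    std::vector<UINT16> glyphs(textLen);
    pFontFace->GetGlyphIndices(&codePoints[0], textLen, &glyphs[0]);

    // Write the glyph run outline into a Direct2D path geometry.
    ID2D1PathGeometry* pGeometry = NULL;
    pD2DFactory->CreatePathGeometry(&pGeometry);
    ID2D1GeometrySink* pSink = NULL;
    pGeometry->Open(&pSink);
    pFontFace->GetGlyphRunOutline(48.0f, &glyphs[0], NULL, NULL,
        textLen, FALSE, FALSE, pSink);
    pSink->Close();
    pSink->Release();

    // Draw the outline, then fill the body, exactly as the email suggests.
    ID2D1SolidColorBrush* pOutlineBrush = NULL;
    ID2D1SolidColorBrush* pFillBrush = NULL;
    pRT->CreateSolidColorBrush(D2D1::ColorF(D2D1::ColorF::DarkBlue), &pOutlineBrush);
    pRT->CreateSolidColorBrush(D2D1::ColorF(D2D1::ColorF::White), &pFillBrush);
    // The geometry is relative to the baseline origin, so translate it into view.
    pRT->SetTransform(D2D1::Matrix3x2F::Translation(10.0f, 60.0f));
    pRT->DrawGeometry(pGeometry, pOutlineBrush, 5.0f);
    pRT->FillGeometry(pGeometry, pFillBrush);
    pRT->SetTransform(D2D1::Matrix3x2F::Identity());

    // Release everything (a real implementation would use smart pointers).
    pFillBrush->Release(); pOutlineBrush->Release(); pGeometry->Release();
    pFontFace->Release(); pFont->Release(); pFamily->Release(); pFonts->Release();
}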

What about Drawing Shadows?

To tell you the truth, the text shadow is drawn using the single outline text code. There is one problem: a shadow is translucent. If we used the first code example to render shadows, it would turn out like the image below, because some areas of the font body and font outline overlap; those areas are rendered twice and therefore appear darker.

Shadow Text body and outline combined

My solution is to render the shadow text body and the shadow text outline separately, as shown below, and combine them, with the shadow text body pixels taking precedence; the shadow text outline is rendered only where the shadow text body is not. Shadow rendering is more involved and complicated: the 0.1.0 version of OutlineText.cpp without the shadow implementation is only 3 KB and 164 lines of code, while the 0.2.0 version of OutlineText.cpp with the shadow implementation is 23 KB and 865 lines of code! Therefore, I will not show its code here; you can download and read the source code if you are interested. (A rough sketch of the compositing idea appears after the images below.)

Shadow Text Body

Shadow Text Outline

Proper Shadow
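As a rough illustration of that compositing idea (not the library's actual implementation), the sketch below combines two 32bpp ARGB layers of equal size so that shadow body pixels always win over shadow outline pixels; the helper name is my own.

#include <gdiplus.h>

// Hypothetical helper: combine the shadow body and shadow outline layers so that
// overlapping areas are not rendered twice (and therefore do not become darker).
void CombineShadowLayers(Gdiplus::Bitmap& body,     // shadow text body layer
                         Gdiplus::Bitmap& outline,  // shadow text outline layer
                         Gdiplus::Bitmap& result)   // combined shadow, same size
{
    using namespace Gdiplus;
    Rect rc(0, 0, result.GetWidth(), result.GetHeight());
    BitmapData bdBody, bdOutline, bdResult;
    body.LockBits(&rc, ImageLockModeRead, PixelFormat32bppARGB, &bdBody);
    outline.LockBits(&rc, ImageLockModeRead, PixelFormat32bppARGB, &bdOutline);
    result.LockBits(&rc, ImageLockModeWrite, PixelFormat32bppARGB, &bdResult);

    for (UINT y = 0; y < result.GetHeight(); ++y)
    {
        const UINT* pBody    = (const UINT*)((const BYTE*)bdBody.Scan0 + y * bdBody.Stride);
        const UINT* pOutline = (const UINT*)((const BYTE*)bdOutline.Scan0 + y * bdOutline.Stride);
        UINT* pDest = (UINT*)((BYTE*)bdResult.Scan0 + y * bdResult.Stride);
        for (UINT x = 0; x < result.GetWidth(); ++x)
        {
            // Body pixels take precedence; the outline only shows where no
            // body pixel was rendered.
            UINT bodyPixel = pBody[x];
            pDest[x] = (bodyPixel >> 24) ? bodyPixel : pOutline[x];
        }
    }

    result.UnlockBits(&bdResult);
    outline.UnlockBits(&bdOutline);
    body.UnlockBits(&bdBody);
}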

Drawing Single Outline using OutlineText

Single Outline Text

This is the C++ code to use the OutlineText class to display the single outline text, using the TextOutline and DrawString methods:

#include "TextDesigner/OutlineText.h"

void CScratchPadDlg::OnPaint()
{
    //CDialog::OnPaint();
    CPaintDC dc(this);
    using namespace Gdiplus;
    Graphics graphics(dc.GetSafeHdc());
    graphics.SetSmoothingMode(SmoothingModeAntiAlias);
    graphics.SetInterpolationMode(InterpolationModeHighQualityBicubic);

    FontFamily fontFamily(L"Arial Black");
    StringFormat strformat;
    wchar_t pszbuf[] = L"Text Designer";

    OutlineText text;
    text.TextOutline(Color(255,128,64),Color(200,0,0),8);
    text.EnableShadow(true);
    CRect rect;
    GetClientRect(&rect);
    text.SetShadowBkgd(Color(255,255,0),rect.Width(),rect.Height());
    text.Shadow(Color(128,0,0,0), 4, Point(4,8));
    text.DrawString(&graphics,&fontFamily,FontStyleItalic, 
        48, pszbuf, Gdiplus::Point(10,10), &strformat);
}

This is the equivalent C# code to use the OutlineText class to display the single outline text, using the TextOutline and DrawString methods:

private void OnPaint(object sender, PaintEventArgs e)
{
    e.Graphics.SmoothingMode = SmoothingMode.AntiAlias;
    e.Graphics.InterpolationMode = InterpolationMode.HighQualityBicubic;

    FontFamily fontFamily = new FontFamily("Arial Black");
    StringFormat strformat = new StringFormat();
    string szbuf = "Text Designer";

    OutlineText text = new OutlineText();
    text.TextOutline(Color.FromArgb(255, 128, 64), Color.FromArgb(200, 0, 0), 8);
    text.EnableShadow(true);
    text.SetShadowBkgd(Color.FromArgb(255, 255, 0), this.Size.Width, this.Size.Height);
    text.Shadow(Color.FromArgb(128, 0, 0, 0), 4, new Point(4, 8));
    text.DrawString(e.Graphics, fontFamily,
        FontStyle.Italic, 48, szbuf, new Point(10, 10), strformat);

    fontFamily.Dispose();
    e.Graphics.Dispose();
}

Drawing Single Outline Text with Gradient Color using OutlineText

Single Outline Text with Gradient Color and Shadow

We can select a gradient or texture brush, instead of a solid brush, for the text color. We can use the MeasureString method to calculate the width and height the text will occupy, and this returned width and height will be used as the width and height of the gradient brush. However, we must then call the TextOutline method again to set the brush. The reason TextOutline needs to be called twice is that MeasureString needs the TextOutline information before the string can be measured correctly. Do not worry: TextOutline is pretty lightweight; it just stores some settings. Below is a C++ example that shows how to do it.

#include <Gdiplus.h>
#include "TextDesigner/OutlineText.h"

void CScratchPadDlg::OnPaint()
{
    //CDialog::OnPaint();
    CPaintDC dc(this);
    using namespace Gdiplus;
    using namespace TextDesigner;
    Graphics graphics(dc.GetSafeHdc());
    graphics.SetSmoothingMode(SmoothingModeAntiAlias);
    graphics.SetInterpolationMode(InterpolationModeHighQualityBicubic);

    FontFamily fontFamily(L"Arial Black");
    StringFormat strformat;
    wchar_t pszbuf[] = L"Text Designer";

    OutlineText text;

    text.EnableShadow(true);
    CRect rect;
    GetClientRect(&rect);
    text.SetShadowBkgd(Color(255,255,0),rect.Width(),rect.Height());
    text.Shadow(Color(128,0,0,0), 4, Point(4,8));
    text.TextOutline(Color(0,0,0), Color(0,0,160),5);

    // Declare the output variables before measuring the string.
    float fDestWidth = 0.0f;
    float fDestHeight = 0.0f;
    text.MeasureString(
        &graphics,
        &fontFamily,
        FontStyleItalic,
        48,
        pszbuf,
        Gdiplus::Point(10,10),
        &strformat,
        &fDestWidth,
        &fDestHeight);

    LinearGradientBrush brush(Gdiplus::Rect(10, 10, (INT)fDestWidth, (INT)fDestHeight), 
        Color(132,200,251), Color(0,0,160), LinearGradientModeVertical);
    text.TextOutline(&brush, Color(0,0,160),5);
    text.DrawString(&graphics,&fontFamily,FontStyleItalic, 
        48, pszbuf, Gdiplus::Point(10,10), &strformat);
}

Below is a C# example that shows how to select a gradient brush with OutlineText.

private void OnPaint(object sender, PaintEventArgs e)
{
    e.Graphics.SmoothingMode = SmoothingMode.AntiAlias;
    e.Graphics.InterpolationMode = InterpolationMode.HighQualityBicubic;
    OutlineText outlineText = new OutlineText();
    outlineText.TextOutline(
        Color.FromArgb(255, 128, 192),
        Color.FromArgb(255, 0, 0, 160),
        4);

    outlineText.EnableShadow(true);
    //Remember to call SetNullShadow() to release memory if a previous shadow has been set.
    outlineText.SetNullShadow();
    outlineText.Shadow(Color.FromArgb(128, 0, 0, 0), 4, new Point(4, 8));

    Color m_clrBkgd = Color.FromArgb(255, 255, 255);
    outlineText.SetShadowBkgd(m_clrBkgd, this.ClientSize.Width, this.ClientSize.Height);
    FontFamily fontFamily = new FontFamily("Arial Black");

    StringFormat strFormat = new StringFormat();

    float fDestWidth = 0.0f;
    float fDestHeight = 0.0f;

    outlineText.MeasureString(
        e.Graphics,
        fontFamily,
        FontStyle.Italic,
        48,
        "Text Designer",
        new Point(10, 10),
        strFormat,
        ref fDestWidth,
        ref fDestHeight);

    LinearGradientBrush brush = new LinearGradientBrush(new Rectangle(10, 10, 
				(int)fDestWidth, (int)fDestHeight), 
        Color.FromArgb(132,200,251), Color.FromArgb(0,0,160), 
		System.Drawing.Drawing2D.LinearGradientMode.Vertical);

    outlineText.TextOutline(
        brush,
        Color.FromArgb(255, 0, 0, 160),
        4);

    outlineText.DrawString(e.Graphics, fontFamily,
        FontStyle.Italic, 48, "Text Designer",
        new Point(10, 10), strFormat);

    e.Graphics.Dispose();
}

These are the settings in the TestOutlineText application to display the above. I list the settings here because sometimes even I get a bit lost on how to use TestOutlineText to display certain outline text effects; by listing them, I hope readers will get familiar with the application so that they can try out the outline effects they want. Please note that if you enable shadow, scrolling and resizing the TestOutlineText application will be jerky, because shadow rendering is a computation-intensive operation. I have written a PngOutlineText class to work around this problem, which I talk about later, towards the end of the article.

Single Outline Text Settings

Drawing Double Outline using OutlineText

Double Outline Text

To achieve double outline text, you have to specify the outer outline and the inner outline. This is the C++ code to display the double outline text, using the TextDblOutline and DrawString methods:

#include "TextDesigner/OutlineText.h"

void CScratchPadDlg::OnPaint()
{
    //CDialog::OnPaint();
    CPaintDC dc(this);
    using namespace Gdiplus;
    Graphics graphics(dc.GetSafeHdc());
    graphics.SetSmoothingMode(SmoothingModeAntiAlias);
    graphics.SetInterpolationMode(InterpolationModeHighQualityBicubic);

    FontFamily fontFamily(L"Arial Black");
    StringFormat strformat;
    wchar_t pszbuf[] = L"Text Designer";

    OutlineText text;
    text.TextDblOutline(Color(255,255,255),Color(0,128,128),Color(0,255,0),4,4);
    text.EnableShadow(true);
    CRect rect;
    GetClientRect(&rect);
    text.SetShadowBkgd(Color(255,128,192),rect.Width(),rect.Height());
    text.Shadow(Color(128,0,0,0), 4, Point(4,8));
    text.DrawString(&graphics,&fontFamily,FontStyleRegular, 
        48, pszbuf, Gdiplus::Point(10,10), &strformat);
}

This is the C# code to display the double outline text, using the TextDblOutline and DrawString methods:

private void OnPaint(object sender, PaintEventArgs e)
{
    e.Graphics.SmoothingMode = SmoothingMode.AntiAlias;
    e.Graphics.InterpolationMode = InterpolationMode.HighQualityBicubic;

    FontFamily fontFamily = new FontFamily("Arial Black");
    StringFormat strformat = new StringFormat();
    string szbuf = "Text Designer";

    OutlineText text = new OutlineText();
    text.TextDblOutline(Color.FromArgb(255, 255, 255),
        Color.FromArgb(0, 128, 128), Color.FromArgb(0, 255, 0), 4, 4);
    text.EnableShadow(true);
    text.SetShadowBkgd(Color.FromArgb(255, 128, 192), this.Size.Width, this.Size.Height);
    text.Shadow(Color.FromArgb(128, 0, 0, 0), 4, new Point(4, 8));
    text.DrawString(e.Graphics, fontFamily,
        FontStyle.Bold, 48, szbuf, new Point(10, 10), strformat);

    fontFamily.Dispose();
    e.Graphics.Dispose();
}

These are the settings to display the double outline text:

Double Outline Text Settings

Drawing Text Glow using OutlineText

Text Glow Text

This is the C++ code to display the text glow using the TextGlow and DrawString methods. Text glow is usually not displayed with a shadow because shadow interferes with the glow effect, so I disabled the shadow and did not set any shadow settings.

#include "TextDesigner/OutlineText.h"

void CScratchPadDlg::OnPaint()
{
    //CDialog::OnPaint();
    CPaintDC dc(this);
    using namespace Gdiplus;
    Graphics graphics(dc.GetSafeHdc());
    graphics.SetSmoothingMode(SmoothingModeAntiAlias);
    graphics.SetInterpolationMode(InterpolationModeHighQualityBicubic);

    FontFamily fontFamily(L"Arial Black");
    StringFormat strformat;
    wchar_t pszbuf[] = L"Text Designer";

    OutlineText text;
    text.TextGlow(Color(191,255,255),Color(24,0,128,128),14);
    text.EnableShadow(false);
    text.DrawString(&graphics,&fontFamily,FontStyleRegular, 
        48, pszbuf, Gdiplus::Point(10,10), &strformat);
}

This is the equivalent C# code to display the text glow using the TextGlow and DrawString methods:

private void OnPaint(object sender, PaintEventArgs e)
{
    e.Graphics.SmoothingMode = SmoothingMode.AntiAlias;
    e.Graphics.InterpolationMode = InterpolationMode.HighQualityBicubic;

    FontFamily fontFamily = new FontFamily("Arial Black");
    StringFormat strformat = new StringFormat();
    string szbuf = "Text Designer";

    OutlineText text = new OutlineText();
    text.TextGlow(Color.FromArgb(191, 255, 255), Color.FromArgb(24, 0, 128, 128), 14);
    text.EnableShadow(false);
    text.DrawString(e.Graphics, fontFamily, FontStyle.Bold,
        48, szbuf, new Point(10, 10), strformat);

    fontFamily.Dispose();
    e.Graphics.Dispose();
}

These are the settings to display the text glow:

Text Glow Text

This is text glow with shadow if you are curious:

Text Glow Text With Shadow

Fake 3D Text

You can achieve simulated 3D text by using a bigger, opaque shadow which has the same colour as the outline colour. Of course, if you look closely enough, you can see it does not really look like 3D at all.

Fake 3D Text Settings

Real 3D Text (Orthogonal)

Real 3D Text

It's easy to do real 3D text with the PngOutlineText class. The extruded part is achieved by rendering the same coloured text repeatedly and diagonally. By rendering diagonally, I mean rendering the text with the starting draw point offset by 1 pixel in the x and y direction each time. Finally, we render the real text at its original point. The sample code below achieves this by using the same PngOutlineText object; it sets new TextOutline parameters for the final text in DrawActualText. You will notice that in the DrawDiagonal and DrawActualText methods, I blit the PNG image into a Graphics object created from an ARGB Bitmap object, so that the resultant 3D text is 'saved' in the ARGB Bitmap object. Then, in the OnPaint method, I just blit that ARGB Bitmap object without using PngOutlineText any more. To draw outline text, PngOutlineText is the way to go; OutlineText is just too slow, as it has to recalculate and redraw the text each time the client area is invalidated for repainting. By the way, the sample code below is modified from code pasted from the clipboard, which in turn was copied to the clipboard with the WYSIWYG "Copy C++ Code" button. Talk about eating your own dog food!

#include "../TextDesigner/PngOutlineText.h"

Gdiplus::Bitmap m_bitmap(420,100,PixelFormat32bppARGB);

BOOL CScratchPadDlg::OnInitDialog()
{
    CDialog::OnInitDialog();

    SetIcon(m_hIcon, TRUE);			// Set big icon
    SetIcon(m_hIcon, FALSE);		// Set small icon

    using namespace Gdiplus;
    Graphics graphics(&m_bitmap);
    PngOutlineText pngOutlineText;
    DrawDiagonal(graphics, pngOutlineText, 6);
    DrawActualText(graphics, pngOutlineText);

    return TRUE;
}

void CScratchPadDlg::OnPaint()
{
    //CDialog::OnPaint();
    using namespace Gdiplus;
    CPaintDC dc(this);
    Graphics graphics(dc.GetSafeHdc());
    graphics.SetSmoothingMode(SmoothingModeAntiAlias);
    graphics.SetInterpolationMode(InterpolationModeHighQualityBicubic);

    // Fill background with white colour.
    CRect rect;
    GetClientRect(&rect);
    SolidBrush brushWhite(Color(255,255,255));
    graphics.FillRectangle(&brushWhite, 0, 0, rect.Width(), rect.Height());

    graphics.DrawImage(&m_bitmap, 10, 10, m_bitmap.GetWidth(), m_bitmap.GetHeight());
}

void CScratchPadDlg::DrawDiagonal(Gdiplus::Graphics& graphics, 
	PngOutlineText& pngOutlineText, int nDiagonal)
{
    using namespace Gdiplus;
    graphics.SetSmoothingMode(SmoothingModeAntiAlias);
    graphics.SetInterpolationMode(InterpolationModeHighQualityBicubic);

    pngOutlineText.TextOutline(
        Color(0,0,0), 
        Color(255,0,0,0), 
        4);

    pngOutlineText.EnableShadow(false);
    FontFamily fontFamily(L"Arial Black");

    StringFormat strFormat;

    Bitmap* pPngImage = new Gdiplus::Bitmap(m_bitmap.GetWidth(),
			m_bitmap.GetHeight(),PixelFormat32bppARGB);
    pngOutlineText.SetPngImage(pPngImage);
    pngOutlineText.DrawString(&graphics,&fontFamily, 
        FontStyleRegular, 48, L"Text Designer", 
        Gdiplus::Point(10,10), &strFormat);

    for(int i=0; i<nDiagonal; ++i)
        graphics.DrawImage(pPngImage, i, i, pPngImage->GetWidth(), 
					pPngImage->GetHeight());

    if(pPngImage)
        delete pPngImage;

    pPngImage = NULL;
}

void CScratchPadDlg::DrawActualText(Gdiplus::Graphics& graphics, 
				PngOutlineText& pngOutlineText)
{
    using namespace Gdiplus;
    graphics.SetSmoothingMode(SmoothingModeAntiAlias);
    graphics.SetInterpolationMode(InterpolationModeHighQualityBicubic);

    pngOutlineText.TextOutline(
        Color(178,0,255), 
        Color(255,0,0,0), 
        4);

    pngOutlineText.EnableShadow(false);
    FontFamily fontFamily(L"Arial Black");

    StringFormat strFormat;

    Bitmap* pPngImage = new Gdiplus::Bitmap(m_bitmap.GetWidth(),
			m_bitmap.GetHeight(),PixelFormat32bppARGB);
    pngOutlineText.SetPngImage(pPngImage);
    pngOutlineText.DrawString(&graphics,&fontFamily, 
        FontStyleRegular, 48, L"Text Designer", 
        Gdiplus::Point(10,10), &strFormat);

    graphics.DrawImage(pPngImage, 0, 0, pPngImage->GetWidth(), pPngImage->GetHeight());
	
    if(pPngImage)
        delete pPngImage;

    pPngImage = NULL;
}

For the above C++ code sample, I did not supply an equivalent C# sample, because it was impossible to do the blitting using the previous version of the .NET API: the .NET wrapper makes a native copy of every managed object passed in. For example, if you supply a Graphics object with an internal Bitmap object, PngOutlineText does not use that Graphics object to draw; instead, it uses a native copy of that Graphics object, so the internal Bitmap never gets rendered. Since then, I have added the GetCopyOfInternalPng method to the PngOutlineText class to enable blitting with a copy of the rendered PNG, using GDI+.

I have also added a method called Extrude to do 3D text easily, but this method is still not as fast as the previous "PNG blitting multiple times" approach, because the ExtrudeStrategy class is generic and does not know whether the OutlineText or PngOutlineText class is using it, so it cannot do any optimizations. Note: to use Extrude, you have to call EnableShadow, because Extrude is treated as a type of shadow. Please note that to achieve the 3D text effect, the shadow color has to be fully opaque (alpha of 255), and the 3D extrude effect looks best when the absolute values of the x and y offsets are equal (for example, x=4, y=4 or x=-4, y=4). Below is the C++ and C# sample code from the WYSIWYG clipboard code copy feature.

Extruded Text

#include "TextDesigner/OutlineText.h"

void CScratchPadDlg::OnPaint()
{
    //CDialog::OnPaint();

    using namespace Gdiplus;
    using namespace TextDesign;
    CPaintDC dc(this);
    Graphics graphics(dc.GetSafeHdc());
    graphics.SetSmoothingMode(SmoothingModeAntiAlias);
    graphics.SetInterpolationMode(InterpolationModeHighQualityBicubic);
    OutlineText m_OutlineText;
    m_OutlineText.TextOutline(
        Color(255,128,192), 
        Color(255,128,0,0), 
        4);

    m_OutlineText.EnableShadow(true);
    //Remember to call SetNullShadow() to release memory if a previous shadow has been set.
    m_OutlineText.SetNullShadow();
    m_OutlineText.Extrude(
        Gdiplus::Color(255,128,0,0), 
        4, 
        Gdiplus::Point(8,8));

    CRect rect;
    this->GetClientRect(&rect);
    Color m_clrBkgd(255, 255, 255);
    m_OutlineText.SetShadowBkgd(m_clrBkgd,rect.Width(),rect.Height());
    FontFamily fontFamily(L"Arial Black");

    StringFormat strFormat;
    m_OutlineText.DrawString(&graphics,&fontFamily, 
        FontStyleRegular, 48, L"Text Designer", 
        Gdiplus::Point(10, 10), &strFormat);
}

Here is the equivalent C# code that calls Extrude to achieve 3D extruded text.

private void OnPaint(object sender, PaintEventArgs e)
{
    e.Graphics.SmoothingMode = SmoothingMode.AntiAlias;
    e.Graphics.InterpolationMode = InterpolationMode.HighQualityBicubic;
    OutlineText outlineText = new OutlineText();
    outlineText.TextOutline(
        Color.FromArgb(255, 128, 192),
        Color.FromArgb(255, 128, 0, 0),
        4);

    outlineText.EnableShadow(true);
    //Remember to call SetNullShadow() to release memory if a previous shadow has been set.
    outlineText.SetNullShadow();
    outlineText.Extrude(
        Color.FromArgb(255, 128, 0, 0),
        4,
        new Point(8, 8));

    Color m_clrBkgd = Color.FromArgb(255, 255, 255);
    outlineText.SetShadowBkgd(m_clrBkgd, this.ClientSize.Width, this.ClientSize.Height);
    FontFamily fontFamily = new FontFamily("Arial Black");

    StringFormat strFormat = new StringFormat();
    outlineText.DrawString(e.Graphics, fontFamily,
        FontStyle.Regular, 48, "Text Designer",
        new Point(10, 10), strFormat);

    e.Graphics.Dispose();
}

Here are the settings to achieve 3D extruded text. You must enable the Extrude Text checkbox (see the red rectangle):

Extruded Text Settings

Rotated Italic Text

Rotated Text

We have to use the GdiDrawString method to display rotated italic text, because GdiDrawString takes a LOGFONT structure which allows us to specify the rotation angle through the lfEscapement and lfOrientation members. The DrawString method, on the other hand, uses GraphicsPath's AddString method, which takes a font family object instead of a font object, so we have no way to specify the rotation angle.

#include "TextDesigner/OutlineText.h"

void CScratchPadDlg::OnPaint()
{
    //CDialog::OnPaint();
    CPaintDC dc(this);
    using namespace Gdiplus;
    
    Graphics graphics(dc.GetSafeHdc());
    graphics.SetSmoothingMode(SmoothingModeAntiAlias);
    graphics.SetInterpolationMode(InterpolationModeHighQualityBicubic);

    wchar_t pszbuf[] = L"Text Designer";

    LOGFONTW logfont;
    memset(&logfont, 0, sizeof(logfont));
    wcscpy_s(logfont.lfFaceName, L"Arial Black");
    logfont.lfHeight = -MulDiv(48, dc.GetDeviceCaps(LOGPIXELSY), 72);
    logfont.lfEscapement = 100;
    logfont.lfOrientation = 100;
    logfont.lfItalic = 1;

    OutlineText text;
    text.TextOutline(Color(64,193,255),Color(0,0,0),8);
    text.EnableShadow(false);
    CRect rect;
    GetClientRect(&rect);
    text.EnableShadow(true);
    text.SetShadowBkgd(Color(255,255,255),rect.Width(),rect.Height());
    text.Shadow(Color(128,0,0,0), 8, Point(4,4));
    text.GdiDrawString(&graphics, &logfont, pszbuf, Gdiplus::Point(10,100));
}

The C# code to display the rotated text has been removed, as the new C# library has not implemented this yet; it would involve P/Invoke data types which, I am afraid, may have portability issues.

These are the settings to display the rotated text:

Rotated Text Settings

Diffused Shadow and Sample Code

I have added diffused shadow to the outline text library. Click the checkbox indicated by the green rectangle to enable the diffused shadow. Note: you have to tweak the shadow alpha value (ranging from 12 to 32) and the shadow thickness (ranging from 8 to 12) to achieve the diffused shadow effect. Diffused shadow is implemented using the text glow effect, so the shadow thickness indicates how many times the shadow color will be rendered; as a rule of thumb, the higher the shadow thickness, the lower the shadow alpha value should be. I have also implemented WYSIWYG sample code generation. Click the "Copy C++ Code" and "Copy C# Code" buttons, indicated by the red rectangle, to copy the code to the clipboard and paste it into your code editor! You may still have to edit the code in your editor to suit your requirements, for example, changing a local object to a member object of your class. In the event the sample code crashes, try changing the bitmap sizes to be the same, and please also report the crash to me with the steps to reproduce it. Note: the crash, if there is any, is due to the generated sample code being wrong, not because something is wrong with the Text Designer Outline Text Library.

Diffused Shadow Settings

PngOutlineText Class

I have written a PngOutlineText class which renders the text and shadow to a Bitmap object with an alpha channel (PixelFormat32bppARGB format), so that you need not regenerate the text whenever you need to render it again; outline text generation typically takes a long time, especially for text with a shadow. With PngOutlineText, you must call the SetPngImage method to set the PixelFormat32bppARGB image for PngOutlineText to render to. After the first DrawString or GdiDrawString, you just need to blit this image to your graphics object, instead of generating the same outline text through DrawString or GdiDrawString again. I created the PngOutlineText class for use in video rendering, which typically runs at 30 fps or 60 fps. If you use a big image background in the TestOutlineText application, you will find that resizing the application and scrolling the image is not smooth. If you check the "Enable PNG Rendering" checkbox (see the highlighted red rectangle below), resizing and scrolling become smooth, because the TestOutlineText application detects when the settings have not been modified and just blits the transparent text image instead. You can use the SavePngImage method to save the image to a PNG file. If you open the image in any image editor, like Paint.NET or Adobe Photoshop, you will see checkered boxes in the transparent part of the PNG image.

Enable Png Settings

Png in Paint.Net

MeasureString and GdiMeasureString

I have implemented the MeasureString and GdiMeasureString methods for PngOutlineText. Please do not use Graphics::MeasureString, as Graphics::MeasureString is meant for the Graphics::DrawString method. After you use MeasureString or GdiMeasureString to get the minimum width and height required, you should add some space to the width and height, say 5 pixels. The MeasureString and GdiMeasureString parameters are similar to those of DrawString and GdiDrawString, respectively, except for the two additional parameters that return the width and height. This is how the MeasureString family of methods is used: first call MeasureString to get the width and height, then allocate a PixelFormat32bppARGB Bitmap which is slightly larger. Make your shadow background the same size as this PixelFormat32bppARGB Bitmap. Any bitmap will do as a shadow background, as it does not affect rendering for PngOutlineText; however, for OutlineText, you need to crop out the part of the background behind the text for the shadow background. After setting up the outline text attributes, call DrawString or GdiDrawString at position (0,0), then Graphics::DrawImage the PixelFormat32bppARGB Bitmap at the position where you want the text to appear.

The above method is fine for outline text without a shadow, or when the shadow is at the bottom right (positive x and y offsets). If one or both of the offsets is negative, it does not work out so nicely: imagine you draw the text at point (0,0) with a shadow offset of (-4,-4); part of the shadow may not be visible. So if the shadow offset is (-4,-4), you DrawString the text at point (4,4) and Graphics::DrawImage the final PixelFormat32bppARGB Bitmap at the original position minus (4,4). In this way, whether your shadow is offset to the top, bottom, left, or right, the text always appears at the same position. Below is the code to accomplish this. Please note that the code below does not appear in the sample code, because I do not want to confuse beginners on how to use the PngOutlineText class; the PngOutlineText sample code is already the longest.

float fWidth=0.0f;
float fHeight=0.0f;
m_PngOutlineText.MeasureString(&graphics,&fontFamily,FontStyleBold, 
    72, m_szText, Gdiplus::Point(0,0), &strFormat,
    &fWidth, &fHeight);
	
m_pPngImage = new Bitmap(fWidth+5.0f, fHeight+5.0f, PixelFormat32bppARGB);

if(!m_pPngImage)
    return;

m_PngOutlineText.SetPngImage(m_pPngImage);
m_PngOutlineText.SetNullShadow();
m_PngOutlineText.SetShadowBkgd(
Gdiplus::Color(GetRValue(m_clrBkgd),GetGValue(m_clrBkgd),GetBValue(m_clrBkgd)),
    m_pPngImage->GetWidth(), m_pPngImage->GetHeight());

if(!m_bEnableShadow)
{
    m_PngOutlineText.DrawString(
        &graphics,&fontFamily,fontStyle,m_nFontSize,
	m_szText,Gdiplus::Point(0,0), &strFormat);
    graphics.DrawImage(m_pPngImage, (float)m_nTextPosX, (float)m_nTextPosY, 
        (float)m_pPngImage->GetWidth(), (float)m_pPngImage->GetHeight());
}
else
{
    int nShadowOffsetX = 0;
    if(m_nShadowOffsetX<0)
        nShadowOffsetX = -m_nShadowOffsetX;
    int nShadowOffsetY = 0;
    if(m_nShadowOffsetY<0)
        nShadowOffsetY = -m_nShadowOffsetY;
    m_PngOutlineText.DrawString(&graphics,&fontFamily,
        fontStyle,m_nFontSize,m_szText,Gdiplus::Point
	(nShadowOffsetX,nShadowOffsetY), &strFormat);
    graphics.DrawImage(m_pPngImage, (float)
	(m_nTextPosX-nShadowOffsetX), (float)(m_nTextPosY-nShadowOffsetY), 
        (float)m_pPngImage->GetWidth(), (float)m_pPngImage->GetHeight());
}

Please note that the above method doesn't work for GdiDrawString's rotated italic text effect. For information on how to make the rotated italic text effect work, search for the GdiMeasureStringRealHeight method call in MyScrollView.cpp of the TestOutlineText project, which is a quick hack to make it work. I do not include it here because it would complicate the code example.

OpenGL Demo

I have made an OpenGL demo which shows how to use PNG images with an alpha channel as textures. Please note that the text images are generated on the fly. I used to work at muvee, a company which specializes in automatic video editing software and uses OpenGL for its video special effects with photos. I find the text captions used in their automatically edited videos quite bland; I hope they will adopt my Text Designer Outline Text library in their next version of the muvee software.
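The demo's actual code is in the download; as a rough sketch of the idea (assuming an OpenGL 1.1 context on Windows with the EXT_bgra pixel format available, and using a helper name of my own), a 32bpp ARGB GDI+ bitmap such as one produced by PngOutlineText can be uploaded as a texture with an alpha channel like this:

#include <windows.h>
#include <gdiplus.h>
#include <gl/gl.h>

#ifndef GL_BGRA_EXT
#define GL_BGRA_EXT 0x80E1
#endif

// Hypothetical helper: upload a PixelFormat32bppARGB bitmap as an OpenGL texture.
GLuint CreateTextureFromBitmap(Gdiplus::Bitmap& bitmap)
{
    using namespace Gdiplus;
    Rect rc(0, 0, bitmap.GetWidth(), bitmap.GetHeight());
    BitmapData data;
    bitmap.LockBits(&rc, ImageLockModeRead, PixelFormat32bppARGB, &data);

    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

    // GDI+ stores 32bppARGB as BGRA in memory, which matches GL_BGRA_EXT;
    // 32-bit rows are always 4-byte aligned, so no unpack tweaks are needed.
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, bitmap.GetWidth(), bitmap.GetHeight(),
                 0, GL_BGRA_EXT, GL_UNSIGNED_BYTE, data.Scan0);

    bitmap.UnlockBits(&data);
    return tex;
}

Enabling blending (glEnable(GL_BLEND) with GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA) then lets the transparent parts of the text image show the scene behind it.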

OpenGL Demo Screenshot 1

OpenGL Demo Screenshot 2

CodePlex

The Text Designer Outline Text open source library is hosted on CodePlex at this URL: http://outlinetext.codeplex.com/. The library on CodePlex will be updated more often than the one in this article, so be sure to check CodePlex for updated source from time to time. The version of Text Designer is currently 0.3.0. There is a lot more work (70% more) to be done before this library reaches version 1.0.0. The eventual goal of this library is to become as advanced as WPF text effects.

Changelog of Source Code

  • Version 0.3.6 (15th minor release)
    • Version 2 preview. Version 2 only has 2 classes, namely Canvas and MaskColor. Added WPF support for version 2.
    • Added 3 demos (Aquarion, Be Happy and Dirty) for C++ MFC, C# WinForms and C# WPF
  • Version 0.3.0 (9th minor release)
    • Added the ability to select a brush, such as a gradient brush or texture brush, for the text body
    • Added the C# library, TextDesignerCSLibrary. The managed C++ dotNetOutline library is obsolete and will be removed soon.
  • Version 0.2.9 (8th minor release)
    • Added 3D text to OutlineText, PngOutlineText, dotNetOutlineText and dotNetPngOutlineText
    • Added C++ classes to the TextDesign namespace
    • Added WYSIWYG code sample for Extrude (3D text feature)
  • Version 0.2.8 (7th minor release)
    • Fixed the bug of incorrect DrawImage position in the C++ and C# sample code for PngOutlineText
  • Version 0.2.7 (6th minor release)
    • Changed PngOutlineText so that the user must call the Graphics::DrawImage method now; the DrawString and GdiDrawString methods no longer render by themselves
    • Changed TestOutlineText to use MeasureString and GdiMeasureString in the PngOutlineText code, so that a small rectangle is used for rendering rather than the whole client rectangle
    • Added a GdiMeasureStringRealHeight method as a hack for the rotated italic text for PngOutlineText
  • Version 0.2.6 (5th minor release)
    • Added C# sample code generation by which the C# code is copied to the clipboard (Beta)
  • Version 0.2.5
    • Added C++ sample code generation by which the C++ code is copied to the clipboard (Beta)
  • Version 0.2.4 (4th minor release)
    • OpenGL demo to generate the text PNG images dynamically, instead of reading from pre-rendered PNG image files. Noticeable pause at the start of the demo (grey window) because the text PNG images are being generated in memory
  • Version 0.2.3
    • Added GdiMeasureString and MeasureString methods to the .NET classes
    • Added diffused shadow methods to the .NET classes
    • Made the dotNetOutlineText and dotNetPngOutlineText classes inherit from the interface class
  • Version 0.2.2 (3rd minor release)
    • Added diffused shadow
  • Version 0.2.1 (2nd minor release)
    • Fixed path memory leaks in the GdiDrawString methods
    • Added GdiMeasureString and MeasureString
    • Made the OutlineText and PngOutlineText classes inherit from the abstract class
  • Version 0.2.0
    • First public release

History

  • 17th February, 2010
    • Added a section on how to initialize GDI+
    • Replaced all the obsolete C++/CLI dotNetOutlineText examples with C# TextDesignerCSLibrary code
    • Added a generic section on how to select a gradient brush
    • Added a section on how to select a gradient brush
  • 19th October, 2009
    • Expanded the section on how to do real 3D extruded text
  • 13th October, 2009
    • Added a section on how to do real 3D text
  • 7th October, 2009
    • Explained how to use the MeasureString and GdiMeasureString methods for PngOutlineText objects
    • Updated some C# code to dispose the objects at the end of OnPaint method
    • Updated some explanation
    • Added Diffused Shadow and C++ and C# sample code to copy to clipboard to achieve WYSIWYG
  • 22nd September, 2009
    • First release on CodeProject
    • Text Designer Outline Text Library version 0.2.0

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Wong Shao Voon

Software Developer

Singapore Singapore

I guess I'll write here about what I do in my free time, rather than list the skills I currently possess. I believe the things I do in my free time say more about me.
 
When I am not working, I like to watch Japanese anime. I am also writing a movie script, hoping to see my own movie on the big screen one day.
 
I like to jog because it makes me feel good, having done something meaningful in the morning before the day starts.
 
I also write articles for CodeProject; I have a few ideas to write about but never get around to writing them because of my hectic schedule.
