Concurrency with an STA?

I was recently experimenting with the Windows Running Object Table (ROT) when I ran into a peculiar problem. Here's the scenario: I had a simple in-process COM component configured to run in a single-threaded apartment (STA). The STA configuration, if you didn't know, tells the COM runtime that the component does not know anything about thread synchronization and will miserably stomp over itself and do other unpleasant things if multiple threads are given unbridled concurrent access to an instance of it. Wanting to test whether the COM runtime's call serialization guarantee would continue to hold even when an in-process component is exposed to other processes via the ROT, I put together a quick sample containing the following:

A simple in-process STA COM component named Dong with a single method, also called Dong, that simply pops up a message box to announce to the world that it has indeed been found worthy of being invoked.
A console application called AddToROT.exe that creates an instance of Dong and adds it to the ROT. After that it hangs around running a Windows message loop that terminates upon a key press. The message loop is needed because the COM runtime's call serialization implementation depends on it. We'll learn exactly why in a moment. Here's what the loop looks like:
```
while( !_kbhit() )
{
    MSG msg;
    if( PeekMessage( &msg, NULL, 0, 0, PM_REMOVE ) )
    {
        TranslateMessage( &msg );
        DispatchMessage( &msg );
    }
    else
        Sleep( 500 );
}
```
As you can see, a fairly straightforward message loop that keeps spinning till you hit a key on the keyboard.
The last piece in the sample was a console application called GetFromROT.exe that fetched a reference to Dong from the ROT and invoked its sole method.

My plan was to first run AddToROT and then launch multiple instances of GetFromROT. I expected the message boxes from Dong.Dong() to be displayed one after the other even though the client processes were running more or less concurrently. I expected this because that's what the COM runtime guarantees for components marked as STA. How exactly does it provide this guarantee? It's quite simple actually.

Whenever a method call is made on a component from the thread on which it was created, it behaves like a regular method call, i.e. your call results in a simple transfer of control to the called method in the component. When you wish to call the method from another thread, however, the COM powers that be have mandated that you first marshal the interface pointer across the thread boundary before invoking methods on it. You do this via CoMarshalInterThreadInterfaceInStream and CoGetInterfaceAndReleaseStream: you call the former from the thread on which the component was created and are handed an IStream pointer, which you somehow pass to the second thread; there you call the latter and are handed an interface pointer that you can use to call the component's methods. If you do all this, the COM runtime guarantees that the calls will get serialized and peace shall reign everywhere.

Now if all that sounds a bit confusing here's a small code snippet that'll hopefully clear the air for you (the code below is meant to just illustrate the concept which means that there are no error checks; and what's more, it won't even compile!).

Thread 1

//
// let's assume this is a global variable
//
IStream *g_pStream;

//
// create an instance of dong
//
IDong *pDong;
CoCreateInstance( ..., &pDong );

//
// marshal the interface pointer into a stream
//
CoMarshalInterThreadInterfaceInStream(
   __uuidof( IDong ),
   pDong,
   &g_pStream );

//
// simple straightforward method call
//
pDong->Dong();

Thread 2

IDong *pDong;

//
// un-marshal the interface pointer
//
CoGetInterfaceAndReleaseStream(
   g_pStream,
   __uuidof( IDong ),
   reinterpret_cast<void **>( &pDong ) );

//
// "pDong" is in truth a proxy object that marshals the call
// across thread/process boundaries; the COM runtime ensures that
// the component gets only one call at a time
//
pDong->Dong();

Now that you know how marshalling interface pointers across threads is accomplished, let's go back to our question of how the COM runtime provides the call serialization guarantee and what it has to do with running message loops. As it turns out, the COM runtime secretly goes and creates a hidden window on the thread that hosts an STA component. When you marshal the interface pointer across to another thread (or another process for that matter), what you are actually handed is a proxy object. When you invoke a method on the proxy, all it does is serialize the method parameters, have a regular window message delivered to the hidden window and wait for the reply. The window procedure that handles the message unpacks the parameters and calls the method on the actual component. Simple! No matter how many concurrent clients exist for the component, as long as all the method calls are routed through the hidden window, call serialization is automatically guaranteed!
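To make the routing concrete, here is a minimal, portable model of it in standard C++. It is not COM code and makes no claim about how ole32 is implemented: `DongImpl`, `Message`, `HiddenWindow` and `DongProxy` are hypothetical names invented for this sketch, and a plain `std::queue` stands in for the thread's message queue. Unlike a real proxy, this one does not wait for a reply. The point is only that the proxy never touches the component; the component is reached solely by whoever drains the queue.

```cpp
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the real component.
struct DongImpl {
    std::vector<std::string> log;
    void Dong(const std::string& caller) { log.push_back("Dong called by " + caller); }
};

// Hypothetical message: the packed parameters of one method call.
struct Message {
    std::string caller;
};

// Hypothetical stand-in for the hidden window and its message queue.
class HiddenWindow {
public:
    explicit HiddenWindow(DongImpl& target) : target_(target) {}

    // Used by proxies: queue the packed call instead of calling the component.
    void Post(Message msg) { queue_.push(std::move(msg)); }

    // Used by the message loop: dispatch one queued message, if any.
    bool DispatchOne() {
        if (queue_.empty()) return false;
        Message msg = std::move(queue_.front());
        queue_.pop();
        WindowProc(msg);
        return true;
    }

private:
    // Unpacks the parameters and makes the real call.
    void WindowProc(const Message& msg) { target_.Dong(msg.caller); }

    DongImpl& target_;
    std::queue<Message> queue_;
};

// Hypothetical proxy: same method name as the component, but it only posts.
class DongProxy {
public:
    DongProxy(HiddenWindow& window, std::string caller)
        : window_(window), caller_(std::move(caller)) {}
    void Dong() { window_.Post(Message{caller_}); }

private:
    HiddenWindow& window_;
    std::string caller_;
};

// Two clients call through proxies; the component sees the calls one at a
// time, and only once the loop dispatches them.
std::vector<std::string> RunTwoClients() {
    DongImpl dong;
    HiddenWindow window(dong);
    DongProxy first(window, "client 1");
    DongProxy second(window, "client 2");
    first.Dong();
    second.Dong();
    while (window.DispatchOne()) {}
    return dong.log;
}
```

Calling `RunTwoClients()` returns the two log entries in the order in which the calls were queued, which is the serialization the hidden window is there to provide.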

As must be evident, in order for windows (hidden or otherwise) to receive messages there must be a message loop that's retrieving and dispatching the messages. This is the reason why COM's call serialization guarantee works only so long as the thread on which the component was created has a message loop going. So far so good!

In our sample setup, therefore, you couldn't have blamed me too much for what I expected. If I ran two instances of GetFromROT one after the other without dismissing the message box shown by the first, the second instance would block on the method call until I dismissed that first message box. Only then would the message box appear a second time, courtesy of the second instance of GetFromROT. Here's a screenshot of what I actually saw!

As you can see, the second instance of GetFromROT was also somehow given access to Dong while the first invocation still hadn't returned! What's even stranger is that both calls seem to have occurred on the same thread!! This is evident from the fact that both message boxes show the same thread ID, as returned by the GetCurrentThreadId API.

For a couple of days there I walked about with tousled hair, unshaved chin and rumpled shirt with a murderous look in my eye. What has the world come to if one can't trust the COM runtime to do what it had promised to do?! This sorry state of affairs finally ended one day as I was performing my morning ablutions (and I could hear mother nature letting out a sigh of relief) when it dawned upon me with startling clarity that the Windows message box spawns a little message loop of its own!

That was indeed the problem here! As it turns out, whenever you pop up a message box (or any modal dialog box for that matter) a local message loop is executed from that dialog. This is done because of the modal nature of the dialog. The thread that has the message pump running is now blocked on the call to the modal dialog, which means that the original message pump isn't doing a whole lot while the dialog is active. Left at that, the dialog itself would remain unresponsive, since there would be nobody picking messages off the queue and getting them processed. To counter all this, modal dialog boxes always run their own message loop till the dialog is dismissed.
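The effect of such a nested loop can be reproduced without any Windows API at all. The following is a small single-threaded model in standard C++, offered as a sketch rather than a description of USER32 internals: `MessageQueue`, `ModalBox` and `Dong` are hypothetical names, and the "modal box" here simply pumps until the queue is empty, whereas a real message box pumps until it is dismissed.

```cpp
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the STA thread's message queue.
class MessageQueue {
public:
    void Post(std::function<void()> handler) { pending_.push(std::move(handler)); }

    // Removes one message and runs its handler on the calling thread.
    bool DispatchOne() {
        if (pending_.empty()) return false;
        std::function<void()> handler = std::move(pending_.front());
        pending_.pop();
        handler();
        return true;
    }

private:
    std::queue<std::function<void()>> pending_;
};

// Hypothetical stand-in for MessageBox: a "modal" call that runs its own
// loop over the very same queue instead of returning to the outer loop.
void ModalBox(MessageQueue& queue) {
    while (queue.DispatchOne()) {}
}

// Hypothetical component whose method puts up the modal box.
struct Dong {
    explicit Dong(MessageQueue& q) : queue(q) {}

    void Call(const std::string& caller) {
        trace.push_back("enter " + caller);
        ModalBox(queue);                     // other queued calls run in here
        trace.push_back("leave " + caller);
    }

    MessageQueue& queue;
    std::vector<std::string> trace;
};

// Two client calls are queued; the outer loop plays the role of AddToROT's loop.
std::vector<std::string> RunTwoOverlappingCalls() {
    MessageQueue queue;
    Dong dong(queue);
    queue.Post([&dong] { dong.Call("client 1"); });
    queue.Post([&dong] { dong.Call("client 2"); });
    while (queue.DispatchOne()) {}
    return dong.trace;
}
```

The trace returned by `RunTwoOverlappingCalls()` is "enter client 1", "enter client 2", "leave client 2", "leave client 1": the second call starts and finishes inside the first, on the same thread, with no second thread anywhere in sight.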

So, in our case the second call to Dong.Dong was dispatched not by the message loop running in AddToROT but by the one running inside the message box that the previous call to Dong.Dong had put up. We can easily verify this by taking a look at the call-stack of the primary thread in AddToROT while Dong.Dong is running.

Here's the call-stack while the message box is being shown as a result of running the first instance of GetFromROT. The stack was captured using the excellent Process Explorer tool written by Mark Russinovich of Sysinternals fame. I have snipped some of the function calls from the stack so that we can focus on the relevant stuff.

snip.. snip..

USER32.dll!NtUserWaitMessage+0xc
USER32.dll!InternalDialogBox+0xd0
USER32.dll!SoftModalMessageBox+0x938
USER32.dll!MessageBoxWorker+0x2ba
USER32.dll!MessageBoxTimeoutW+0x7a
USER32.dll!MessageBoxExW+0x1b
USER32.dll!MessageBoxW+0x45
SampleCOM.dll!CDong::Dong+0x7e          <-- this is our function
RPCRT4.dll!Invoke+0x30
RPCRT4.dll!NdrStubCall2+0x297

snip.. snip..

ole32.dll!StubInvoke+0xa7
ole32.dll!CCtxComChnl::ContextInvoke+0xe3
ole32.dll!MTAInvoke+0x1a
ole32.dll!STAInvoke+0x4a

snip.. snip..

USER32.dll!DispatchMessageWorker+0x306
USER32.dll!DispatchMessageW+0xf         <-- and this is the
                                            DispatchMessage call
AddToROT.exe!wmain+0x135
AddToROT.exe!__tmainCRTStartup+0x1a6
AddToROT.exe!wmainCRTStartup+0xd
kernel32.dll!BaseProcessStart+0x23

Now take a look at what the stack looks like after the second instance of GetFromROT is launched without dismissing the first message box.

snip.. snip..

USER32.dll!NtUserWaitMessage+0xc
USER32.dll!InternalDialogBox+0xd0
USER32.dll!SoftModalMessageBox+0x938
USER32.dll!MessageBoxWorker+0x2ba
USER32.dll!MessageBoxTimeoutW+0x7a
USER32.dll!MessageBoxExW+0x1b
USER32.dll!MessageBoxW+0x45
SampleCOM.dll!CDong::Dong+0x7e          <-- second invocation
                                            of Dong.Dong
RPCRT4.dll!Invoke+0x30
RPCRT4.dll!NdrStubCall2+0x297

snip.. snip..

ole32.dll!StubInvoke+0xa7
ole32.dll!CCtxComChnl::ContextInvoke+0xe3
ole32.dll!MTAInvoke+0x1a
ole32.dll!STAInvoke+0x4a

snip.. snip..

USER32.dll!DispatchMessageWorker+0x306
USER32.dll!DispatchMessageW+0xf         <-- dispatch message from
                                            the loop in MessageBox
USER32.dll!DialogBox2+0x15a
USER32.dll!InternalDialogBox+0xd0
USER32.dll!SoftModalMessageBox+0x938
USER32.dll!MessageBoxWorker+0x2ba
USER32.dll!MessageBoxTimeoutW+0x7a
USER32.dll!MessageBoxExW+0x1b
USER32.dll!MessageBoxW+0x45
SampleCOM.dll!CDong::Dong+0x7e          <-- first invocation of
                                            Dong.Dong
RPCRT4.dll!Invoke+0x30
RPCRT4.dll!NdrStubCall2+0x297

snip.. snip..

ole32.dll!StubInvoke+0xa7
ole32.dll!CCtxComChnl::ContextInvoke+0xe3
ole32.dll!MTAInvoke+0x1a
ole32.dll!STAInvoke+0x4a

snip.. snip..

USER32.dll!DispatchMessageWorker+0x306
USER32.dll!DispatchMessageW+0xf         <-- original dispatch msg
                                            for first invocation
AddToROT.exe!wmain+0x135
AddToROT.exe!__tmainCRTStartup+0x1a6
AddToROT.exe!wmainCRTStartup+0xd
kernel32.dll!BaseProcessStart+0x23

As is evident, this side effect arises because the message loop in the MessageBox API does not restrict itself to messages meant for the message box window and its descendants. Inadvertently, our STA component has actually become re-entrant! The behaviour we were expecting to see shows up the moment you change the MessageBox call to a _tprintf and make the method wait for user input via a call to _getch. The following implementation of Dong.Dong causes the second launch of GetFromROT to wait till the first launch has been responded to by pressing a key in the AddToROT console window.

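STDMETHODIMP CDong::Dong(LONG* plRetVal)
{
    if( plRetVal == NULL )
        return E_POINTER;
    *plRetVal = 50;

    //
    // format the thread and object IDs; adjacent _T() literals are
    // concatenated by the compiler
    //
    TCHAR szBuf[1024];
    _stprintf_s( szBuf, _countof( szBuf ),
        _T( "Dong - Thread ID = 0x%X, " )
        _T( "Object ID = %d\n" ),
        GetCurrentThreadId(), m_iObjectID );
    _tprintf( _T( "%s\nPress any key to return from " )
        _T( "CDong::Dong\n" ), szBuf );

    //
    // block without pumping messages until a key is pressed
    //
    _getch();
    return S_OK;
}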
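The difference between the two kinds of wait can be captured in a few lines of portable C++. This is only a model under stated assumptions: `MessageQueue` and `MaxNestedDongCalls` are hypothetical names, the flag `waitPumpsMessages` stands for "the method waits inside a modal loop, as MessageBox does" when true and for "the method waits in a plain blocking call, as _getch does" when false, and the blocking wait itself is not simulated.

```cpp
#include <algorithm>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical stand-in for the STA thread's message queue.
class MessageQueue {
public:
    void Post(std::function<void()> handler) { pending_.push(std::move(handler)); }

    // Removes one message and runs its handler on the calling thread.
    bool DispatchOne() {
        if (pending_.empty()) return false;
        std::function<void()> handler = std::move(pending_.front());
        pending_.pop();
        handler();
        return true;
    }

private:
    std::queue<std::function<void()>> pending_;
};

// Returns the deepest nesting of Dong calls observed when two client calls
// are queued back to back.
int MaxNestedDongCalls(bool waitPumpsMessages) {
    MessageQueue queue;
    int depth = 0;
    int maxDepth = 0;

    std::function<void()> dong = [&queue, &depth, &maxDepth, waitPumpsMessages] {
        ++depth;
        maxDepth = std::max(maxDepth, depth);
        if (waitPumpsMessages) {
            // Modal-style wait: keeps dispatching whatever is queued.
            while (queue.DispatchOne()) {}
        }
        // Otherwise the wait leaves the queue alone, so the next call stays
        // queued until this one has returned to the outer loop.
        --depth;
    };

    queue.Post(dong);
    queue.Post(dong);
    while (queue.DispatchOne()) {}  // the outer message loop
    return maxDepth;
}
```

`MaxNestedDongCalls(true)` returns 2, the re-entrant case seen with the message box, while `MaxNestedDongCalls(false)` returns 1, the one-at-a-time behaviour seen with the _getch version.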

A nasty sort of issue to run into, wouldn't you think?!