Fundamental Shell Concepts

Topics: Shell object identification | Managing shell DLL versions | Special folders | Launching applications

The shell namespace is a fancy name for the various objects managed by the shell, including the normal filesystem plus a number of "virtual" folders like Printers, Control Panel, the Recycle Bin, et cetera. It is organised as a single tree-structured hierarchy whose root node is the Desktop, which is itself semi-virtual.

Besides folders, the namespace contains items like control panel applets and, naturally, regular files. Files are organised in file types according to their filename extension. For example, peeking in the registry for files with the ".txt" extension reveals that they belong to a class called "txtfile". Further down in the listing for the HKEY_CLASSES_ROOT key you can find the definition of the txtfile class, which contains information like the default icon, the verbs that the objects support (e.g. open, print, etc.) and much more. All this information can be edited using the standard Folder Options dialog.

We programmers have one up on mere users, since the shell lets us change and extend it programmatically in a number of ways. For instance, instead of having one icon for all text files it is possible to develop an icon handler that provides a slightly modified icon depending e.g. on the size of the file, or whatever else you'd fancy. Not all shell extensions are as vain as this; there are some really useful things you can do, like adding custom items to the context menu, extracting thumbnail images, and so on. I won't be covering the development of shell extensions; however, I'm going to demonstrate how 2xExplorer uses many extension objects as a client.

ADVANCED: special registry keys

Besides regular file types, the shell allows other kinds of objects to have verbs too, something that is not immediately obvious. For instance, any verbs added under HKEY_CLASSES_ROOT\* will apply to all files. That can be useful for adding e.g. an "edit" verb that would appear in the context menu of all documents, whether text, source code or whatever. A side effect is that this verb would also appear for unsuitable objects, like executable files.

Another interesting key is HKEY_CLASSES_ROOT\Folder. Verbs added here will appear in the context menu of all folders (both filesystem and virtual). A serving suggestion here would be a "2xOpen" verb that could invoke 2xExplorer to open some folder. The following table has a list of some more interesting "file type" keys, where you can add or edit verbs for custom functionality:

HKCR Key              Affected object types
*                     All files
AllFileSystemObjects  All regular files and file folders
Folder                All folders, virtual and filesystem
Directory             File folders
Drive                 Root folders of all system drives
Network               Entire network
NetShare              All network shares
Printers              All printers

Shell object identification

Let us return to the shell namespace. In the old days, when people only dealt with regular filesystem objects, a pathname with folder names separated by backslash '\' characters was enough for identification. Nowadays such paths are still supported, but internally the shell uses so-called item identifier lists, which are passed around by pointer and hence nicknamed PIDLs. These are binary identifiers that consist of one or more SHITEMID structures:
struct SHITEMID { 
   USHORT cb;      // total size in bytes
   BYTE   abID[1]; // beginning of item identification data
};

Apart from its derogatory name <g> this is quite a simple data structure. It serves as a template for storing custom data, so you'd seldom use the abID member directly. The length of a SHITEMID is variable (held in the cb member) and its content is completely up to the folder that manages the item. For example, some folder might use the following struct for identifying its items:
struct MyPidl {
   USHORT cb;
   DWORD  dwType;
   TCHAR  szDisplayName[40];
};

Notice that this struct also begins with the cb member, which holds the total length in bytes (cb == sizeof(MyPidl)), followed by the real data held for the item. Here the folder implementor has opted to keep the item type as well as the name. Other folders may opt to store more details, like file size, modification date etc. The only requirement is for each item to have a unique PIDL within the folder that contains it.

Just as regular paths can be either relative or absolute, PIDLs can be either simple or absolute. A single SHITEMID followed by a USHORT zero (the terminator) is the simplest PIDL, describing an item locally within its folder. An absolute PIDL contains in addition a SHITEMID for the container folder, one for its parent, and so on up to the namespace root, i.e. the desktop. Here's a schematic with a series of SHITEMIDs for locating "c:\autoexec.bat":
---------------- ---------------- ---------------- ------------------
| cb | data... | | cb | data... | | cb | data... | | '\0''\0' (==cb)|
---------------- ---------------- ---------------- ------------------
  "My Computer" \        "C:"    \    "autoexec.bat"   (terminator)

In all respects a PIDL, being a list of SHITEMIDs, can be treated like a regular file path. For example, removing the last SHITEMID from the list gives the PIDL of the parent folder. You can append a simple file SHITEMID to the full PIDL of its container folder to obtain the full PIDL of the file, and so on. The only critical detail is that all memory allocation should be done through the shell's allocator, the IMalloc returned by SHGetMalloc.

A complex PIDL can be traversed using the individual cb lengths of each SHITEMID. Adding cb bytes from the start address will take you to the second SHITEMID in the list, and so on until you hit an entry where cb==(USHORT)0. Here's an example that counts the number of "tokens" in any PIDL, excluding the terminator:
int PIDLGetSegments(LPCITEMIDLIST pidl)
{
   int cnt = 0;
   int cb = pidl->mkid.cb; // size of the first item in list

   while(cb != 0) {
      cnt++;
      pidl = (LPITEMIDLIST)(((LPBYTE) pidl) + cb); // point to the next one
      cb = pidl->mkid.cb;
   }

   return cnt; // zero only for the desktop, 1 for local (enum) pidls
}

It is a peculiar omission that, despite the dominant role of PIDLs in shell operations, there are no documented APIs for regular PIDL management tasks like merging, counting their length in bytes, etc. However, there are tons of sample functions in the online documentation, and they all basically work like the PIDLGetSegments example above.
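In the same spirit as PIDLGetSegments, here is a minimal sketch of three such management helpers: measuring a list in bytes, chopping off the last item to get the parent, and appending a child to its container. To keep it self-contained it works on plain byte buffers rather than real LPITEMIDLISTs. PidlBuf, ItemSize, PidlGetSize, PidlGetParent, PidlAppend and MakeSimplePidl are all made-up names, and std::vector stands in for the shell allocator that real code should use.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in for a PIDL: consecutive items, each starting with a 16-bit byte
// count (like SHITEMID::cb), followed by a 2-byte zero terminator.
// Assumption: real shell code would allocate through IMalloc instead.
typedef std::vector<unsigned char> PidlBuf;

// Read the cb field of the item that starts at p
static uint16_t ItemSize(const unsigned char* p)
{
   uint16_t cb;
   std::memcpy(&cb, p, sizeof cb); // memcpy sidesteps unaligned reads
   return cb;
}

// Total size of the list in bytes, terminator included
size_t PidlGetSize(const unsigned char* pidl)
{
   size_t total = 0;
   for(uint16_t cb = ItemSize(pidl); cb != 0; cb = ItemSize(pidl + total))
      total += cb;
   return total + sizeof(uint16_t);
}

// Copy of the list without its last item, i.e. the parent's PIDL.
// The empty list (desktop) yields the empty list again.
PidlBuf PidlGetParent(const unsigned char* pidl)
{
   size_t offset = 0, last = 0;
   for(uint16_t cb = ItemSize(pidl); cb != 0; cb = ItemSize(pidl + offset)) {
      last = offset; // remember where the current item starts
      offset += cb;
   }
   PidlBuf parent(pidl, pidl + last);
   parent.push_back(0); // new terminator
   parent.push_back(0);
   return parent;
}

// Parent items, then child items, then a single terminator
PidlBuf PidlAppend(const unsigned char* parent, const unsigned char* child)
{
   size_t cbParent = PidlGetSize(parent) - sizeof(uint16_t); // drop terminator
   size_t cbChild = PidlGetSize(child);                      // keep terminator
   PidlBuf full(parent, parent + cbParent);
   full.insert(full.end(), child, child + cbChild);
   return full;
}

// Hypothetical helper for experiments: a one-item PIDL whose data is `text`
PidlBuf MakeSimplePidl(const char* text)
{
   size_t len = std::strlen(text);
   uint16_t cb = (uint16_t)(sizeof(uint16_t) + len);
   PidlBuf buf(cb + sizeof(uint16_t), 0); // trailing zeros form the terminator
   std::memcpy(&buf[0], &cb, sizeof cb);
   std::memcpy(&buf[sizeof cb], text, len);
   return buf;
}
```

Appending MakeSimplePidl("C:") to MakeSimplePidl("My Computer") and then calling PidlGetParent on the result gives back the original one-item list, which is the PIDL equivalent of stripping the last component off a path.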

Useful APIs for PIDLs

Shell API            Operation
SHBindToParent       Takes the fully-qualified PIDL of a namespace object, and returns a specified interface pointer on the parent object; requires win2000/Me
SHGetDataFromIDList  Retrieves extended property data from a relative identifier list
SHGetPathFromIDList  Converts a (non-virtual) item identifier list to a file system path

ADVANCED: Managing shell DLL versions

The previous table illustrates a constant nag when it comes to developing shell applications. Some APIs and COM interfaces are not supported on all win32 platforms, and usually the problem is windows 95. Microsoft has been working on the shell ever since 1995 (and earlier if betas are considered). Each new version added new features accessible through new COM objects and new APIs. Here's a list of the major milestones:

So what can you do to ensure that your program runs on all 32-bit platforms and shell configurations? One option is what I've done with 2xExplorer: only use system function calls that are supported on the weakest OS, i.e. win95. It is usually straightforward to develop your own replacements for functions that have appeared in shell version 4.71. For example, the SHBindToParent API mentioned above can be substituted by the following steps:

  1. Obtain the parent PIDL of the item, by direct manipulation of the SHITEMID list.
  2. Use desktop's BindToObject to obtain the required interface.
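The two steps above could be sketched roughly as follows. MyBindToParent is a made-up name, not a shell API, and the sketch always returns an IShellFolder rather than an arbitrary interface. It also assumes that an item sitting directly on the desktop should simply get the desktop folder itself as its parent.

```cpp
#include <windows.h>
#include <shlobj.h>

// Hypothetical stand-in for SHBindToParent that sticks to win95-era calls.
// On success *ppParent holds the parent folder (caller must Release it) and
// *ppidlLast points at the last SHITEMID inside pidlFull (not a copy).
HRESULT MyBindToParent(LPCITEMIDLIST pidlFull, IShellFolder** ppParent,
                       LPCITEMIDLIST* ppidlLast)
{
   *ppParent = NULL;
   *ppidlLast = NULL;
   if(!pidlFull || pidlFull->mkid.cb == 0)
      return E_INVALIDARG; // the desktop has no parent

   // step 1: walk the list to find where the last item starts
   const BYTE* p = (const BYTE*)pidlFull;
   const BYTE* last = p;
   while(((LPCITEMIDLIST)p)->mkid.cb != 0) {
      last = p;
      p += ((LPCITEMIDLIST)p)->mkid.cb;
   }
   size_t cbParent = last - (const BYTE*)pidlFull;

   IShellFolder* pDesktop = NULL;
   HRESULT hr = ::SHGetDesktopFolder(&pDesktop);
   if(FAILED(hr))
      return hr;

   if(cbParent == 0) {
      *ppParent = pDesktop; // parent is the desktop; hand over our reference
   }
   else {
      // copy the parent part and terminate it, using the shell allocator
      LPMALLOC pMalloc = NULL;
      hr = ::SHGetMalloc(&pMalloc);
      if(SUCCEEDED(hr)) {
         LPITEMIDLIST pidlParent =
            (LPITEMIDLIST)pMalloc->Alloc(cbParent + sizeof(USHORT));
         if(pidlParent) {
            ::CopyMemory(pidlParent, pidlFull, cbParent);
            ::ZeroMemory((BYTE*)pidlParent + cbParent, sizeof(USHORT));
            // step 2: let the desktop bind to the parent folder
            hr = pDesktop->BindToObject(pidlParent, NULL, IID_IShellFolder,
                                        (void**)ppParent);
            pMalloc->Free(pidlParent);
         }
         else
            hr = E_OUTOFMEMORY;
         pMalloc->Release();
      }
      pDesktop->Release();
   }

   if(SUCCEEDED(hr))
      *ppidlLast = (LPCITEMIDLIST)last;
   return hr;
}
```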

Not all compatibility problems are as easily sorted out, though. If the thumbnail object (exposing IExtractImage) is missing then you'd have to write the image extraction code yourself, which is obviously out of the question given the number of different image and document formats. Fortunately, when it comes to COM objects, your program can simply query for the required interface and, if it is not available, gracefully drop that part of the functionality, leaving the main program intact.

With shell APIs it's different though. If you hard code e.g. SHBindToParent in your program, then unless the appropriate DLL version is available, the program will not run at all. If you take this path then you would need to have a setup program to distribute your application and ensure that all the required shell DLLs are installed.

Special folders

Virtual folders are the reason behind the "complicated" shell object identification mechanism offered by PIDLs. If it were just regular files and folders, the good old path naming approach would work a treat. But alas, the shell namespace allows anything to be explored, including contents of ZIP files, the registry itself and all sorts of oddities.

There are several ways to access virtual folders. The easiest is via SHGetFolderLocation, which returns the fully qualified PIDL of any system-supplied virtual folder, selected by a predefined symbolic constant. For example, the constant CSIDL_BITBUCKET will provide the location of the "Recycle Bin". SHGetFolderLocation is also useful for getting the location of many important filesystem folders, like CSIDL_PERSONAL which corresponds to the "My Documents" folder.
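As a rough illustration, the sketch below asks for a special folder's PIDL and converts it to a filesystem path. MyGetFolderPath is a made-up helper name. The conversion step only makes sense for folders that live on disk, so for something like CSIDL_BITBUCKET the function is expected to report failure.

```cpp
#include <windows.h>
#include <shlobj.h>

// Hypothetical helper: fills szPath (at least MAX_PATH characters) with the
// filesystem path of a special folder such as CSIDL_PERSONAL.
// Returns FALSE if the folder is unknown or has no filesystem path.
BOOL MyGetFolderPath(int csidl, LPTSTR szPath)
{
   LPITEMIDLIST pidl = NULL;
   if(FAILED(::SHGetFolderLocation(NULL, csidl, NULL, 0, &pidl)))
      return FALSE;

   BOOL ok = ::SHGetPathFromIDList(pidl, szPath);

   // the PIDL was allocated by the shell, so free it via the shell allocator
   LPMALLOC pMalloc = NULL;
   if(SUCCEEDED(::SHGetMalloc(&pMalloc))) {
      pMalloc->Free(pidl);
      pMalloc->Release();
   }
   return ok;
}
```

A call like MyGetFolderPath(CSIDL_PERSONAL, szPath) with a TCHAR szPath[MAX_PATH] buffer would then yield the "My Documents" location.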

Another useful API is SHBrowseForFolder, which pops up a dialog window with a tree control containing the entire shell namespace, as seen in the left pane of (2x)Explorer. The API returns the PIDL of whatever folder the user selected, and that can be a virtual folder, too.

ADVANCED
Each virtual folder is managed by a COM object (known as a namespace extension), registered in your system under the HKCR\CLSID key as usual. The 128-bit identifier (also known as GUID) of the object can also be used for accessing the folder in a textual fashion. For example, the string "::{20D04FE0-3AEA-1069-A2D8-08002B30309D}", which contains the GUID of "My Computer", can be used as a pathname. This is a lesser-known addressing mechanism. Such strings are recognised by ParseDisplayName, offering another access route to virtual PIDLs.

Launching applications

In the old DOS days, only executable programs (*.exe, *.com and *.bat) could be launched. The win32 shell has significantly broadened the picture, allowing any registered file type to be launched. For documents, this translates into launching the associated application, passing the document as the argument. You can use FindExecutable to find out the application associated with some file type, but you don't need to know this just to open a document.

Two APIs can be used for launching any shell object, ShellExecute and ShellExecuteEx. Besides mere launching, it is possible to specify a verb to be executed, as defined for each file type. So instead of the default "open", the "print" verb may be specified (assuming that the target document supports such a verb), which would unsurprisingly result in the document being printed <g>. There are a number of generic verbs that can be used, on top of those supported by each file type:

Verb        Action on item
properties  Displays the file or folder's properties
find        Initiates a search starting from the specified directory
printto     Similar to "print" verb; here a printer name can be specified

These verbs are locale independent, meaning that you can safely use them in China too, without speaking a word of that language <g>.
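For illustration, here is a minimal sketch of invoking a non-default verb with plain ShellExecute. PrintWithShell is a made-up name, and the call can only work if the document's file type actually registers a "print" verb.

```cpp
#include <windows.h>
#include <shellapi.h>

// Hypothetical helper: asks the shell to run the "print" verb on a document.
// Returns TRUE if the shell accepted the request.
BOOL PrintWithShell(HWND hwndOwner, LPCTSTR file)
{
   HINSTANCE h = ::ShellExecute(hwndOwner, TEXT("print"), file,
                                NULL, NULL, SW_SHOWNORMAL);
   // ShellExecute reports failure with a value of 32 or less
   return (INT_PTR)h > 32;
}
```

Passing NULL instead of TEXT("print") would fall back to the file type's default verb, which is normally "open".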

Besides regular filenames, you can "open" other interesting things. For example, passing a URL like "http://www.ps.ic.ac.uk/~umeca74" will open the website in its associated application, which is the default internet browser on the computer. Similarly, opening a "mailto:n.bozinis@ic.ac.uk?subject=test" pseudo-file will open the default email client.

ShellExecute is simpler, but I always use ShellExecuteEx, which is a real powerhouse. When you try to launch some file type that is not associated with any application, the "Open with" dialog is automatically called. Here's but a short list of the additional things you can do with it, by setting the appropriate fMask in SHELLEXECUTEINFO:

Flag                    Description
SEE_MASK_CONNECTNETDRV  Validate the share and connect to a drive letter
SEE_MASK_HMONITOR       Control which monitor the application will output to (on multimonitor systems)
SEE_MASK_IDLIST         Open a document given its PIDL instead of a filename

The SEE_MASK_IDLIST is especially useful for launching items within virtual folders, where regular filenames are not available. It is thus possible to use ShellExecuteEx to launch control panel applets, among other things.
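A minimal sketch of such a PIDL-based launch might look like the following. LaunchPidl is a made-up name, and the explicit "open" verb is an assumption on my part; some items may expose a different default verb.

```cpp
#include <windows.h>
#include <shlobj.h>
#include <shellapi.h>

// Hypothetical helper: launches a shell item identified by an absolute PIDL,
// e.g. one obtained from SHGetFolderLocation or SHBrowseForFolder.
BOOL LaunchPidl(HWND hwndOwner, LPCITEMIDLIST pidl)
{
   SHELLEXECUTEINFO sei;
   ::ZeroMemory(&sei, sizeof(SHELLEXECUTEINFO));
   sei.cbSize = sizeof(SHELLEXECUTEINFO);
   sei.fMask = SEE_MASK_IDLIST;  // use lpIDList instead of lpFile
   sei.hwnd = hwndOwner;
   sei.lpVerb = TEXT("open");    // assumption: the item supports "open"
   sei.lpIDList = (LPVOID)pidl;  // the struct wants a non-const pointer
   sei.nShow = SW_SHOWNORMAL;

   return ::ShellExecuteEx(&sei);
}
```

The caller keeps ownership of the PIDL and should free it with the shell allocator once the call returns.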

Another kool feature is that you can obtain a handle to the new process, which you can wait upon so as to find out exactly when the child process terminates, if necessary. Here's some code demonstrating how to achieve that:

void LaunchAndWait(LPCTSTR file)
{
   SHELLEXECUTEINFO sei;
   ::ZeroMemory(&sei, sizeof(SHELLEXECUTEINFO));
   sei.cbSize = sizeof(SHELLEXECUTEINFO);
   sei.fMask = SEE_MASK_NOCLOSEPROCESS; // ask for the process handle
   // sei.lpVerb is left NULL, so that the default action will be taken
   sei.nShow = SW_SHOWNORMAL;
   sei.lpFile = file;

   // hProcess can be NULL even on success, e.g. if no new process was started
   if(!::ShellExecuteEx(&sei) || !sei.hProcess)
      return;

   // now sit and wait till the child terminates
   ::WaitForSingleObject(sei.hProcess, INFINITE);
   ::CloseHandle(sei.hProcess); // been there, done that
}

Note that the above example uses a file name, not a PIDL. It is not possible to obtain a process handle using a PIDL, apparently, as I've learned by my various misfortunes <g>. That was confirmed by Microsoft.

ADVANCED: Launching from a thread
There are occasions where launching a file takes ages (e.g. any office document), and you don't want to bog down your main program till ShellExecuteEx finishes its work. The natural solution here is to launch the document in a separate thread of execution. ShellExecuteEx will even provide the busy-in-the-background mouse cursor during the launch. An important detail is that you must specify SEE_MASK_FLAG_DDEWAIT in fMask. If you omit this, the thread may terminate too early and there may be hell to pay and BSODs to stare at <g>.
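Here is one way the threaded launch could be sketched. LaunchInBackground and LaunchThreadProc are made-up names. The filename is copied to the heap so that the worker thread owns its own string, and COM is initialised in the thread because the ShellExecuteEx documentation recommends it; neither detail comes from 2xExplorer itself.

```cpp
#include <windows.h>
#include <objbase.h>
#include <shellapi.h>
#include <tchar.h>

// Worker thread: launches the file whose name was passed as a heap copy
static DWORD WINAPI LaunchThreadProc(LPVOID param)
{
   LPTSTR file = (LPTSTR)param; // this thread owns the copy
   HRESULT hrCom = ::CoInitialize(NULL);

   SHELLEXECUTEINFO sei;
   ::ZeroMemory(&sei, sizeof(SHELLEXECUTEINFO));
   sei.cbSize = sizeof(SHELLEXECUTEINFO);
   sei.fMask = SEE_MASK_FLAG_DDEWAIT; // wait for any DDE conversation to end
   sei.lpFile = file;
   sei.nShow = SW_SHOWNORMAL;
   ::ShellExecuteEx(&sei);

   if(SUCCEEDED(hrCom))
      ::CoUninitialize();
   ::HeapFree(::GetProcessHeap(), 0, file);
   return 0;
}

// Hypothetical helper: returns as soon as the worker thread has been started
BOOL LaunchInBackground(LPCTSTR file)
{
   size_t cb = (_tcslen(file) + 1) * sizeof(TCHAR);
   LPTSTR copy = (LPTSTR)::HeapAlloc(::GetProcessHeap(), 0, cb);
   if(!copy)
      return FALSE;
   ::CopyMemory(copy, file, cb);

   DWORD tid;
   HANDLE hThread = ::CreateThread(NULL, 0, LaunchThreadProc, copy, 0, &tid);
   if(!hThread) {
      ::HeapFree(::GetProcessHeap(), 0, copy);
      return FALSE;
   }
   ::CloseHandle(hThread); // the thread keeps running without our handle
   return TRUE;
}
```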

Additional information

Coming straight from the horse's mouth, here are a couple of MSDN knowledge base articles that you could find useful.

Q174156 - HOWTO: Programmatically Launch the Default Internet Browser
Q67673 - How to Determine When Another Application Has Finished
Q263909 - PRB: ShellExecuteEx Limits URL to MAX_PATH
ARTICLE - Cutting Edge: The Windows 98 Shell; MIND August 1998 (in VB but written by the Man himself)
ARTICLE - NTFS 5: A File System for the 21st Century



