IsItMySource

IsItMySource is a program that can:

  • List source files stored in the debug information of an EXE or PDB file.
  • Check whether a set of source files matches a given EXE or PDB file.

Source code: https://github.com/ikriv/IsItMySource
Binaries: https://github.com/ikriv/IsItMySource/releases

Why it is useful

I wrote this tool to verify whether a given executable was compiled from given source code. Let’s say we need to modify a library or a tool X that is distributed in binary form, but was allegedly compiled from source revision Y. If it was really compiled from Y, we can safely make the necessary modifications and recompile; if it was not, we may lose functionality or introduce bugs.

In an ideal world, each executable would contain an unambiguous reference to the location and revision of its source code, but in practice this is not always the case.

Usage

IsItMySource is a console (command-line) application. It requires .NET Framework 4 or later.

IsItMySource [options] exe_or_pdb_file [folder]

If folder is specified, IsItMySource checks whether the source files in the folder match those listed in the EXE or PDB file. If folder is not specified, it shows the list of source files of the EXE or PDB file.

Loading PDB files directly is supported only for unmanaged debug info. The managed debug info reader starts with an EXE file and searches for the matching PDB.

OPTIONS

Refer to the "OPTIONS" section of readme.md for the list of options.

Managed vs Native Debug Information

Unfortunately, the format of debug information is not well documented and tends to change over time. IsItMySource is known to work with programs compiled by Visual Studio 2013, 2015, and 2017, but it may fail with older programs, e.g. those compiled with Visual Studio 6.

There are at least two different types of debug information: managed (.NET) and unmanaged (native). .NET executables contain both, but only the managed portion has the checksums. Native executables contain only native debug information.

IsItMySource does not parse executable files by hand. Managed debug information is accessed via the Diagnostics Symbol Store interfaces that ship with .NET Framework. Unmanaged debug information is read via the Debug Interface Access (DIA) SDK that ships with Visual Studio. You must have Visual Studio and the DIA SDK on the machine to read native debug information.

IsItMySource works with managed debug information by default. An attempt to access a native executable will fail with BadImageFormatException. Use the --native option to read native debug information.

The Problem of the Source Root

Debug information includes absolute paths of the source files on the machine where the code was compiled. When comparing them to source files on the local machine, debug information paths must somehow be mapped to local paths.

Suppose we run this command:

IsItMySource acme.exe c:\projects\acme

Let's assume acme.exe was compiled from the following files:

D:\dev\acme\src\Program.cs
D:\dev\acme\src\Interfaces\IFoo.cs
D:\dev\acme\src\Install\Installer.cs

We must decide how to map debug information paths to local paths. By default, IsItMySource calculates the longest common path of all files, in this case D:\dev\acme\src, and assumes that it corresponds to the local source folder. Thus, it will use the following mapping:

IsItMySource acme.exe c:\projects\acme

(Debug information path)                (Expected local path)
D:\dev\acme\src\Program.cs           => c:\projects\acme\Program.cs
D:\dev\acme\src\Interfaces\IFoo.cs   => c:\projects\acme\Interfaces\IFoo.cs
D:\dev\acme\src\Install\Installer.cs => c:\projects\acme\Install\Installer.cs
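The default root selection can be sketched as a longest-common-prefix computation over path segments. This is a minimal illustration, not the tool's actual code: SourceRootSketch is a hypothetical name, paths are assumed to be Windows-style with '\' separators, and segments are compared case-insensitively. It returns null when the files share no common directory, the situation discussed a few paragraphs below.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SourceRootSketch
{
    // Returns the longest directory shared by all paths, or null if there is none
    // (for example, when the files live on different drives).
    public static string LongestCommonDirectory(IEnumerable<string> filePaths)
    {
        // Split each path into segments and drop the file name, keeping directories only.
        List<string[]> dirs = filePaths
            .Select(p => p.Split('\\'))
            .Select(parts => parts.Take(parts.Length - 1).ToArray())
            .ToList();
        if (dirs.Count == 0) return null;

        // Count leading segments that are equal (case-insensitively) in every path.
        int common = dirs.Min(d => d.Length);
        for (int i = 0; i < common; ++i)
        {
            string segment = dirs[0][i];
            if (dirs.Any(d => !string.Equals(d[i], segment, StringComparison.OrdinalIgnoreCase)))
            {
                common = i;
                break;
            }
        }

        return common == 0 ? null : string.Join("\\", dirs[0].Take(common));
    }
}
```

For the three acme files above, this yields D:\dev\acme\src.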

This default behavior can be overridden with the --root option:

IsItMySource acme.exe c:\projects\acme --root D:\dev\acme

(Debug information path)                (Expected local path)
D:\dev\acme\src\Program.cs           => c:\projects\acme\src\Program.cs
D:\dev\acme\src\Interfaces\IFoo.cs   => c:\projects\acme\src\Interfaces\IFoo.cs
D:\dev\acme\src\Install\Installer.cs => c:\projects\acme\src\Install\Installer.cs

If some files lie outside the specified root, they will be skipped:

IsItMySource acme.exe c:\projects\acme --root D:\dev\acme\src\Install

(Debug information path)                (Expected local path)
D:\dev\acme\src\Install\Installer.cs => c:\projects\acme\Installer.cs
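Once a root is chosen, whether computed or given with --root, mapping a path amounts to replacing the root prefix with the local folder, and a path that does not start with the root is skipped. A sketch of that rule follows; RootMappingSketch and TryMapToLocal are hypothetical names, and the case-insensitive prefix test is an assumption that fits Windows paths.

```csharp
using System;

public static class RootMappingSketch
{
    // Maps a debug-info path to the expected local path by replacing the root prefix.
    // Returns false (the file would be skipped) when the path lies outside the root.
    public static bool TryMapToLocal(string debugPath, string root, string localFolder, out string localPath)
    {
        string prefix = root.TrimEnd('\\') + "\\";
        if (!debugPath.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
        {
            localPath = null;
            return false;
        }

        string relative = debugPath.Substring(prefix.Length);
        localPath = localFolder.TrimEnd('\\') + "\\" + relative;
        return true;
    }
}
```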

It is theoretically possible that the source paths in the debug information do not have a common directory. This could happen, for instance, if both c:\somefile.cs and d:\otherfile.cs are present. In this case, by default, IsItMySource will ignore the specified local source folder and will look for files at their absolute locations. This is rarely useful, and in practice it means that the --root option must be specified to achieve meaningful results.

Ignoring Files

The list of source files for managed programs is relatively straightforward, but native C++ programs refer to a lot of system files. A simple test program with three original source files contains over a hundred paths in its debug information, including the likes of c:\program files (x86)\microsoft visual studio 14.0\vc\include\xstring and f:\dd\vctools\crt\vcstartup\src\rtc\stack.cpp. The former at least exists on the compiling machine, but the latter does not; apparently this path comes from Microsoft's compile servers.

To get rid of unnecessary source paths, IsItMySource automatically excludes any path that matches one of the wildcards in the IgnoreFiles configuration setting. At the time of writing, the list of automatically ignored files is:

**\*.tmp;
**\*.pch;
**\Debug\**;
**\Release\**;
**\microsoft visual studio *\vc\include\**;
**\windows kits\**\include\**;
f:\dd\externalapis\**;
f:\dd\vctools\**;
f:\binaries.x86ret\inc\**;
**\reference assemblies\microsoft\framework\**\*.dll;
**\microsoft.net\framework\**\*.dll;
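These patterns read like recursive wildcards. Below is a sketch of one plausible matcher, assuming "**" matches any run of characters including '\', "*" matches within a single path segment, and matching is case-insensitive. IgnoreListSketch is a hypothetical name, and the real implementation may interpret the patterns differently.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public class IgnoreListSketch
{
    private readonly List<Regex> _patterns;

    // Accepts a semicolon-separated list, like the IgnoreFiles value shown above.
    public IgnoreListSketch(string semicolonSeparatedPatterns)
    {
        _patterns = semicolonSeparatedPatterns
            .Split(';')
            .Select(p => p.Trim())
            .Where(p => p.Length > 0)
            .Select(p => new Regex(WildcardToRegex(p), RegexOptions.IgnoreCase))
            .ToList();
    }

    public bool IsIgnored(string path)
    {
        return _patterns.Any(r => r.IsMatch(path));
    }

    private static string WildcardToRegex(string pattern)
    {
        var sb = new StringBuilder("^");
        for (int i = 0; i < pattern.Length; ++i)
        {
            if (pattern[i] == '*' && i + 1 < pattern.Length && pattern[i + 1] == '*')
            {
                sb.Append(".*");        // "**": any characters, including '\'
                ++i;
            }
            else if (pattern[i] == '*')
            {
                sb.Append(@"[^\\]*");   // "*": stays within one path segment
            }
            else
            {
                sb.Append(Regex.Escape(pattern[i].ToString())); // literal character
            }
        }
        return sb.Append('$').ToString();
    }
}
```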

Additional ignore paths can be specified via the --ignore option. Use the --allfiles option to stop ignoring system files.

IsItMySource.exe --native Win32ConsoleApp.exe

c:\ivan\dev\github\isitmysource\testprojects\vs2015test\win32consoleapp\myfunc.cpp MD5 0BEF7000E0C253A66928FBF2664696CC
c:\ivan\dev\github\isitmysource\testprojects\vs2015test\win32consoleapp\program.cpp MD5 0FA5B7D722C9D7A2ECCD274E14104757

IsItMySource.exe --native --allfiles Win32ConsoleApp.exe

c:\ivan\dev\github\isitmysource\testprojects\vs2015test\win32consoleapp\myfunc.cpp MD5 0BEF7000E0C253A66928FBF2664696CC
c:\ivan\dev\github\isitmysource\testprojects\vs2015test\win32consoleapp\program.cpp MD5 0FA5B7D722C9D7A2ECCD274E14104757
c:\program files (x86)\microsoft visual studio 14.0\vc\include\cmath MD5 6196FAFAA60DB251F444A5710CACEEB9
c:\program files (x86)\microsoft visual studio 14.0\vc\include\exception MD5 5B9FAA891BF1D49FD54D78674B6AD335
c:\program files (x86)\microsoft visual studio 14.0\vc\include\ios MD5 A1FECDC7E7A90A86479D61B84C9EF656
...
... several dozen more files ...
...
f:\dd\vctools\crt\vcstartup\src\utility\utility_desktop.cpp MD5 CC9AAE4BAA114C08FFC7F1515EC09E4C

File Statuses Explained

After verification, each file referenced in the debug information is put into one of the following buckets:

Ignored     The file is in the ignore list. It will not be mentioned in any way.
Skipped     The --root option was specified and the file is outside the given root.
Verified    The local file was found and the checksum matches.
Different   The local file was found, but the checksum does not match.
Missing     The local file was not found.
Present     The local file was found, but the checksum is not specified, or the checksum algorithm is not supported.
Error       The local file was found, but the checksum could not be calculated, e.g. the file could not be opened.
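The status rules can be written as a single decision function. The order of checks below (ignore list, then root, then file existence, then checksum) is inferred from the definitions above, and StatusSketch.Classify with its parameters is hypothetical. The checksum comparison is passed in as a delegate so the function itself does no file I/O.

```csharp
using System;

public enum FileStatus { Ignored, Skipped, Verified, Different, Missing, Present, Error }

public static class StatusSketch
{
    // compareChecksums returns true (match), false (mismatch),
    // or null when the checksum could not be calculated (e.g. the file could not be opened).
    public static FileStatus Classify(
        bool inIgnoreList,
        bool outsideRoot,
        bool localFileExists,
        bool checksumUsable,
        Func<bool?> compareChecksums)
    {
        if (inIgnoreList) return FileStatus.Ignored;
        if (outsideRoot) return FileStatus.Skipped;
        if (!localFileExists) return FileStatus.Missing;
        if (!checksumUsable) return FileStatus.Present; // checksum absent or algorithm unsupported

        bool? matches = compareChecksums();
        if (matches == null) return FileStatus.Error;
        return matches.Value ? FileStatus.Verified : FileStatus.Different;
    }
}
```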

DiaSymReader vs DIA SDK

DiaSymReader is a set of unmanaged COM interfaces that ships with .NET Framework. It reads managed debug information. It can open only EXE files; the PDB file must be alongside the EXE file or in the search path specified by the --pdbdir option.

Newer managed executables (most likely starting with VS 2015 update 3) contain SHA1 checksums of the sources, while older ones contain MD5 checksums.

DIA SDK is a set of unmanaged COM interfaces shipped with Visual Studio 2015 and 2017. It reads native debug information. It can open EXE files or PDB files. Unmanaged executables up to and including those compiled with VS 2017 contain MD5 checksums of the source files. Managed executables do contain unmanaged debug information, but it has no checksums at all.

IsItMySource does not attempt to detect whether the executable is managed or native, or whether the appropriate symbol reader is installed. An attempt to use an incorrect or missing reader will lead to an exception.

Extensibility

IsItMySource is extensible: it abstracts the debug info reader as the IDebugInfoReader interface. If the --use {name} option is specified, IsItMySource will load IsItMySource.{name}.dll and look there for the assembly-level attribute [assembly:DebugInfoEngine], which should contain the type of debug info reader to instantiate.

DiaSymReader wrapper is implemented in IsItMySource.DiaSymReader.dll, while DIA SDK wrapper is implemented in IsItMySource.DiaSdk.dll. Thus, --use DiaSymReader is assumed by default, and --native is equivalent to --use DiaSdk. Other debug info readers can be placed in the same directory as IsItMySource.exe and loaded via the --use option without recompilation as long as they implement IDebugInfoReader.
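The plugin mechanism boils down to two reflection steps: load the assembly, then read its assembly-level attribute and instantiate the type it names. The following is a sketch, not the tool's code. The DebugInfoEngineAttribute definition, the empty IDebugInfo placeholder, and PluginLoaderSketch are reconstructions for illustration, and Activator.CreateInstance assumes the reader type has a public parameterless constructor. A plugin assembly would declare its reader with a line such as [assembly: DebugInfoEngine(typeof(MyReader))], where MyReader is a hypothetical type.

```csharp
using System;
using System.IO;
using System.Reflection;

// Reconstructed for illustration; the real attribute may be defined differently.
[AttributeUsage(AttributeTargets.Assembly)]
public sealed class DebugInfoEngineAttribute : Attribute
{
    public DebugInfoEngineAttribute(Type readerType) { ReaderType = readerType; }
    public Type ReaderType { get; private set; }
}

public interface IDebugInfo { } // placeholder: members omitted in this sketch

public interface IDebugInfoReader
{
    IDebugInfo GetDebugInfo(string exeOrPdbfilePath, string pdbSearchPath);
}

public static class PluginLoaderSketch
{
    // Loads IsItMySource.{name}.dll from the directory that contains the running program.
    public static IDebugInfoReader Load(string name)
    {
        string dir = AppDomain.CurrentDomain.BaseDirectory;
        Assembly plugin = Assembly.LoadFrom(Path.Combine(dir, "IsItMySource." + name + ".dll"));
        return CreateReader(plugin);
    }

    // Reads [assembly: DebugInfoEngine(typeof(...))] and instantiates the named type.
    public static IDebugInfoReader CreateReader(Assembly plugin)
    {
        DebugInfoEngineAttribute attribute = plugin.GetCustomAttribute<DebugInfoEngineAttribute>();
        if (attribute == null)
        {
            throw new InvalidOperationException(
                plugin.GetName().Name + " has no [assembly: DebugInfoEngine] attribute");
        }
        return (IDebugInfoReader)Activator.CreateInstance(attribute.ReaderType);
    }
}
```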

Structure of the Source Code

The general idea behind the code is quite simple:

  1. Read the debug information.
  2. Get the list of files and their checksums.
  3. If the local folder is not specified ("list" operation), stop.
  4. Otherwise, for each file, locate the corresponding file on the local disk and calculate its checksum.
  5. See if the checksums match.

This is complicated by several issues:

  • There are at least two types of debug information: managed and unmanaged.
  • The list of files returned by DIA SDK contains lots of extra files and may have duplicates.
  • Finding the local file corresponding to a debug info file may be tricky: see "The Problem of the Source Root" above.
  • The file checksum may be MD5, SHA1, calculated by some other algorithm*, or not present at all.

* - in practice, "other algorithm" does not happen: it is either MD5, SHA1, or no value. However, the program must still handle that case gracefully.

Assembly Dependencies Diagram

Assembly dependencies diagram

The Program Class

The entry point of the program is the Program class defined in IsItMySource.exe. Its Main() method

  1. Parses command line options.
  2. Creates debug information reader object.
  3. Gets the list of source files and filters out "junk" files.
  4. Creates the operation object: either "list" or "verify", depending on the arguments.
  5. Executes the operation.

Reading Debug Information

The debug information reader is represented by the IDebugInfoReader interface:

public interface IDebugInfoReader
{
    IDebugInfo GetDebugInfo(string exeOrPdbfilePath, string pdbSearchPath);
}

The actual implementation of this interface is instantiated via reflection from a plugin DLL, depending on the command-line arguments. IsItMySource.DiaSymReader.dll is loaded by default or if the --managed option is specified. IsItMySource.DiaSdk.dll is loaded if the --native or --unmanaged option is specified. A custom debug info reader can be loaded via the --use {name} option.

Once the plugin assembly is loaded, IsItMySource looks at the assembly-level attribute [assembly:DebugInfoEngine], which contains the type of the debug info reader object to instantiate. It is DsrDebugInfoReader for managed debug info and DiaSdkDebugInfoReader for native debug info.
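The mapping from command-line flags to plugin DLL fits in a few lines. This sketch is hypothetical: ReaderSelectionSketch is an invented name, it scans the arguments left to right and lets the last reader-selecting flag win (the real precedence is not documented here), and it returns the file name following the IsItMySource.{name}.dll convention.

```csharp
using System;

public static class ReaderSelectionSketch
{
    public static string PluginFileName(string[] args)
    {
        string name = "DiaSymReader"; // default, same as --managed
        for (int i = 0; i < args.Length; ++i)
        {
            switch (args[i])
            {
                case "--managed":
                    name = "DiaSymReader";
                    break;
                case "--native":
                case "--unmanaged":
                    name = "DiaSdk";
                    break;
                case "--use":
                    if (i + 1 >= args.Length) throw new ArgumentException("--use requires a name");
                    name = args[++i]; // consume the plugin name
                    break;
            }
        }
        return "IsItMySource." + name + ".dll";
    }
}
```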

DiaSymReader

DiaSymReader is a set of unmanaged COM components (32-bit and 64-bit) that ships with .NET Framework. Microsoft provides a wrapper assembly, Microsoft.DiaSymReader.dll, as a NuGet package.

IsItMySource.DiaSymReader.dll is another small wrapper that implements the IDebugInfoReader interface in terms of DiaSymReader-specific calls.

DIA SDK

DIA SDK ships with Visual Studio and can read native debug information. It is also a set of unmanaged COM components (32-bit and 64-bit), but there is no pre-built wrapper for calling it from C#.

The problem and the solution are described in more detail in my blog post "Calling non-automation compatible COM objects from .NET".

DiaSdk.cs contains the class definitions necessary to call DIA SDK from C#. DiaSdkDebugInfoReader.cs contains implementation of IDebugInfoReader interface in terms of DIA SDK calls.

Filtering Unwanted Source Files

The debug info reader returns a collection of SourceFileInfo objects. For managed debug info this collection can be used "as is", but for native debug info it contains a lot of "junk" files.

The SourceFilesFilter class removes duplicates and filters out files that match any of the patterns in the ignore list. The default ignore list is stored in the application configuration file. Additional ignore patterns can be added via the --ignore option. The effect of the default ignore list can be cancelled via the --allfiles option.
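The two jobs of SourceFilesFilter, dropping duplicates and dropping ignored files, can be done in a single pass. FilterSketch and this shape of SourceFileInfo are hypothetical; the ignore test is passed in as a predicate (for instance a wildcard matcher built from the IgnoreFiles setting), and treating paths that differ only in case as duplicates is an assumption that fits Windows file names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape; the real SourceFileInfo may carry different members.
public class SourceFileInfo
{
    public SourceFileInfo(string path, string checksumType, byte[] checksum)
    {
        Path = path;
        ChecksumType = checksumType;
        Checksum = checksum;
    }

    public string Path { get; private set; }
    public string ChecksumType { get; private set; }
    public byte[] Checksum { get; private set; }
}

public static class FilterSketch
{
    // Keeps the first occurrence of each path (compared case-insensitively)
    // and drops every file for which isIgnored returns true.
    public static List<SourceFileInfo> Filter(IEnumerable<SourceFileInfo> files, Func<string, bool> isIgnored)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var result = new List<SourceFileInfo>();
        foreach (SourceFileInfo file in files)
        {
            if (isIgnored(file.Path)) continue;
            if (seen.Add(file.Path)) result.Add(file); // Add returns false for duplicates
        }
        return result;
    }
}
```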

The List Operation

Once we have received the (filtered) list of source files, we can do something useful with them. This is abstracted via the IOperation interface:

internal interface IOperation
{
    void Run(IEnumerable<SourceFileInfo> sources, Options options);
}

There are two implementations of this interface: ListSourcesOperation and VerifySourcesOperation.

ListSourcesOperation prints the list of source files and their checksums to the standard output.
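A list operation in the spirit of ListSourcesOperation might look like this. The fields of SourceFileInfo and the empty Options class are placeholders invented for the sketch. The line shape (path, algorithm name, uppercase hex digest) follows the sample output shown earlier in this article, and printing only the path when there is no checksum is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Placeholder types for the sketch; the tool's real definitions differ.
public class SourceFileInfo
{
    public string Path;          // path as stored in the debug information
    public string ChecksumType;  // e.g. "MD5" or "SHA1"; null when absent
    public byte[] Checksum;      // null when absent
}

public class Options { }

internal interface IOperation
{
    void Run(IEnumerable<SourceFileInfo> sources, Options options);
}

internal class ListSourcesOperation : IOperation
{
    private readonly TextWriter _output;

    public ListSourcesOperation(TextWriter output)
    {
        _output = output;
    }

    public void Run(IEnumerable<SourceFileInfo> sources, Options options)
    {
        foreach (SourceFileInfo file in sources)
        {
            if (file.Checksum == null)
            {
                _output.WriteLine(file.Path);
                continue;
            }
            // BitConverter gives "0B-EF-..."; strip the dashes to get plain uppercase hex.
            string hex = BitConverter.ToString(file.Checksum).Replace("-", "");
            _output.WriteLine("{0} {1} {2}", file.Path, file.ChecksumType, hex);
        }
    }
}
```

Taking a TextWriter instead of writing to Console directly keeps the output easy to capture in tests.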

The Verify Operation

VerifySourcesOperation is a more complex implementation of IOperation: for each file specified in the debug information, it attempts to find the actual file on disk and verify its checksum.

FileVerifier is a helper class that verifies a single file. It locates the file on the local disk, opens it, calculates the checksum, and reports whether the checksum matches the one in the debug information. Depending on the checksum type, it uses either the System.Security.Cryptography.SHA1 or the System.Security.Cryptography.MD5 class.
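The checksum step can be sketched as follows. This is an illustration under assumptions, not the real FileVerifier: ChecksumSketch, CreateAlgorithm and Matches are hypothetical names, the checksum type is assumed to arrive as a string such as "MD5" or "SHA1", and the expected checksum as raw bytes. A null result stands for "no usable checksum", which corresponds to the Present status; exceptions from opening or reading the file would correspond to Error.

```csharp
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class ChecksumSketch
{
    // Returns a hash algorithm for the checksum type, or null when the type
    // is missing or unsupported.
    public static HashAlgorithm CreateAlgorithm(string checksumType)
    {
        switch ((checksumType ?? "").ToUpperInvariant())
        {
            case "MD5": return MD5.Create();
            case "SHA1": return SHA1.Create();
            default: return null;
        }
    }

    // true/false: the local file's hash matches/differs from the expected one;
    // null: no usable checksum. Exceptions from opening or reading the file
    // propagate to the caller.
    public static bool? Matches(string localPath, string checksumType, byte[] expected)
    {
        using (HashAlgorithm algorithm = CreateAlgorithm(checksumType)) // using(null) is allowed
        {
            if (algorithm == null || expected == null) return null;
            using (FileStream stream = File.OpenRead(localPath))
            {
                return algorithm.ComputeHash(stream).SequenceEqual(expected);
            }
        }
    }
}
```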

Dependency Injection

IsItMySource is simple enough that it does not really require a dependency injection container like Unity or StructureMap. However, it does use general dependency injection principles: the important classes have constructors that accept dependencies as interfaces. These constructors are used mostly for unit testing. In production, another "shortcut" constructor is used, which supplies concrete dependencies to the "dependency injection" constructor. For example:

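public class VerifySourcesOperation : IOperation
{
    private readonly TextWriter _output;
    private readonly IFileVerifier _fileVerifier;

    // production constructor: supplies the concrete FileVerifier
    public VerifySourcesOperation(TextWriter output)
        : this(output, new FileVerifier())
    {
    }

    // test constructor: accepts the dependency as an interface
    public VerifySourcesOperation(TextWriter output, IFileVerifier fileVerifier)
    {
        _output = output;
        _fileVerifier = fileVerifier;
    }

    // ... Run() and the rest of the class omitted
}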

Unit Tests

IsItMySource has some unit tests in IsItMySource.Tests.dll, although the test coverage can definitely be improved. It does not use any mocking libraries like Moq; instead, it implements some mocks by hand.
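A hand-written mock in this spirit might look like the following. The article does not show IFileVerifier's members, so the interface below, the VerificationResult enum, FileVerifierMock, and the VerifySummarySketch class under test are all hypothetical. The point is only that a small class that records calls and returns canned answers can stand in for a mocking library.

```csharp
using System.Collections.Generic;

// Hypothetical types: the real IFileVerifier's members are not shown in this article.
public enum VerificationResult { Verified, Different, Missing }

public interface IFileVerifier
{
    VerificationResult Verify(string debugInfoPath);
}

// Hand-written mock: returns canned answers and records every call it receives.
public class FileVerifierMock : IFileVerifier
{
    private readonly Dictionary<string, VerificationResult> _answers;
    public readonly List<string> Calls = new List<string>();

    public FileVerifierMock(Dictionary<string, VerificationResult> answers)
    {
        _answers = answers;
    }

    public VerificationResult Verify(string debugInfoPath)
    {
        Calls.Add(debugInfoPath);
        VerificationResult result;
        return _answers.TryGetValue(debugInfoPath, out result) ? result : VerificationResult.Missing;
    }
}

// A tiny class under test that receives its dependency through the constructor.
public class VerifySummarySketch
{
    private readonly IFileVerifier _verifier;

    public VerifySummarySketch(IFileVerifier verifier)
    {
        _verifier = verifier;
    }

    public int CountVerified(IEnumerable<string> paths)
    {
        int count = 0;
        foreach (string path in paths)
        {
            if (_verifier.Verify(path) == VerificationResult.Verified) ++count;
        }
        return count;
    }
}
```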

Comments and Suggestions

If you have questions or comments, feel free to leave feedback.


Copyright (c) Ivan Krivyakov. Last updated: Jul 9, 2017