IsItMySource
IsItMySource is a program that can
- List source files stored in the debug information of EXE or PDB.
- Check whether a set of source files matches given EXE or PDB.
Source code: https://github.com/ikriv/IsItMySource
Binaries: https://github.com/ikriv/IsItMySource/releases
Why it is useful
I wrote this tool to verify whether a given executable was compiled from given source code. Let's say we need to modify a library or a tool X that is distributed in binary form, but was allegedly compiled from source revision Y. If it really was compiled from Y, we can safely make the necessary modifications and recompile; if it was not, we may lose functionality or introduce bugs.
In an ideal world each executable would contain an unambiguous reference to the location and revision of its source code, but in practice this is not always the case.
Usage
IsItMySource is a Console (command line) application. It requires .NET Framework 4 or later.
IsItMySource [options] exe_or_pdb_file [folder]
If folder is specified, IsItMySource checks whether the source files in the folder match those recorded in the PDB or EXE file. If folder is not specified, it lists the source files recorded in the EXE or PDB file.
Loading PDB files directly is only supported for unmanaged debug info. The managed debug info reader starts with an EXE file and searches for the PDB.
OPTIONS
Refer to readme.md for the list of options. Scroll down to the "OPTIONS" section.
Managed vs Native Debug Information
Unfortunately, the format of debug information is not well documented and tends to change over time. IsItMySource is known to work with programs compiled by Visual Studio 2013, 2015, and 2017, but it may fail with older programs, e.g. those compiled with Visual Studio 6.
There are at least two different types of debug information: managed (.NET) and unmanaged (native). .NET executables contain both, but only the managed portion has the checksums. Native executables contain only native debug information.
IsItMySource does not parse executable files by hand. Managed debug information is accessed via the Diagnostics Symbol Store interface that is shipped with .NET Framework. Unmanaged debug information is read via the Debug Interface Access (DIA) SDK that ships with Visual Studio. You must have Visual Studio and the DIA SDK on the machine to read native debug information.
IsItMySource works with managed debug information by default. An attempt to access a native executable will fail with BadImageFormatException. Use the --native option to read native debug information.
The Problem of Source Root
Debug information includes absolute paths of the source files on the machine where the code was compiled. When comparing this to source files on the local machine, debug information paths must somehow be mapped to local paths.
Suppose we run this command:
IsItMySource acme.exe c:\projects\acme
Let's assume acme.exe was compiled from the following files:
D:\dev\acme\src\Program.cs
D:\dev\acme\src\Interfaces\IFoo.cs
D:\dev\acme\src\Install\Installer.cs
We must decide how to map debug information paths to local paths. By default IsItMySource calculates the longest common path of all files, in this case D:\dev\acme\src, and assumes that it corresponds to the local source folder. Thus, it will use the following mapping:
IsItMySource acme.exe c:\projects\acme
(Debug information path) (Expected local path)
D:\dev\acme\src\Program.cs => c:\projects\acme\Program.cs
D:\dev\acme\src\Interfaces\IFoo.cs => c:\projects\acme\Interfaces\IFoo.cs
D:\dev\acme\src\Install\Installer.cs => c:\projects\acme\Install\Installer.cs
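The default mapping can be illustrated with a short C# sketch. This is not the tool's actual code: the class and method names (SourceRootSketch, CommonRoot, MapToLocal) are made up for illustration, and the sketch assumes Windows-style paths compared case-insensitively.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SourceRootSketch
{
    // Returns the longest directory prefix shared by all paths, or "" if there is none.
    public static string CommonRoot(IEnumerable<string> paths)
    {
        string[] common = null;
        foreach (string path in paths)
        {
            // Compare directory components only; the last component is the file name.
            string[] parts = path.Split('\\');
            string[] dirs = parts.Take(parts.Length - 1).ToArray();
            if (common == null) { common = dirs; continue; }

            int n = 0;
            while (n < common.Length && n < dirs.Length &&
                   string.Equals(common[n], dirs[n], StringComparison.OrdinalIgnoreCase))
            {
                n++;
            }
            common = common.Take(n).ToArray();
        }
        return common == null ? "" : string.Join("\\", common);
    }

    // Maps a debug-info path to a local path, or returns null if it lies outside the root.
    public static string MapToLocal(string debugPath, string root, string localFolder)
    {
        string prefix = root.TrimEnd('\\') + "\\";
        if (!debugPath.StartsWith(prefix, StringComparison.OrdinalIgnoreCase)) return null;
        return localFolder.TrimEnd('\\') + "\\" + debugPath.Substring(prefix.Length);
    }

    static void Main()
    {
        var files = new[]
        {
            @"D:\dev\acme\src\Program.cs",
            @"D:\dev\acme\src\Interfaces\IFoo.cs",
            @"D:\dev\acme\src\Install\Installer.cs"
        };
        string root = CommonRoot(files); // D:\dev\acme\src
        foreach (string file in files)
        {
            Console.WriteLine("{0} => {1}", file, MapToLocal(file, root, @"c:\projects\acme"));
        }
    }
}
```

Passing an explicit root to MapToLocal instead of the computed one gives the effect of an overridden root: files outside that root map to null and would be skipped.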
One can override this default behavior by using the --root option.
IsItMySource acme.exe c:\projects\acme --root D:\dev\acme
(Debug information path) (Expected local path)
D:\dev\acme\src\Program.cs => c:\projects\acme\src\Program.cs
D:\dev\acme\src\Interfaces\IFoo.cs => c:\projects\acme\src\Interfaces\IFoo.cs
D:\dev\acme\src\Install\Installer.cs => c:\projects\acme\src\Install\Installer.cs
If some files lie outside the specified root, they will be ignored:
IsItMySource acme.exe c:\projects\acme --root D:\dev\acme\src\Install
(Debug information path) (Expected local path)
D:\dev\acme\src\Install\Installer.cs => c:\projects\acme\Installer.cs
It is theoretically possible that source paths in the debug information do not have a common directory. This could happen, for instance, if both c:\somefile.cs and d:\otherfile.cs are present. In this case IsItMySource by default will ignore the specified local source folder and will look for files at their absolute locations. This is rarely useful, and in practice this means that the --root option must be specified to achieve meaningful results.
Ignoring Files
The list of source files for managed programs is relatively straightforward, but native C++ programs refer to a lot of system files. A simple test program with three original source files contains over a hundred paths in its debug information, including the likes of c:\program files (x86)\microsoft visual studio 14.0\vc\include\xstring and f:\dd\vctools\crt\vcstartup\src\rtc\stack.cpp. The former is at least present on the compiling machine, but the latter is not, and apparently this path comes from Microsoft's compile servers.
To do away with unnecessary source paths, IsItMySource automatically excludes any path that matches one of the wildcards in the IgnoreFiles configuration setting. At the time of writing the list of the automatically ignored files is:
**\*.tmp;
**\*.pch;
**\Debug\**;
**\Release\**;
**\microsoft visual studio *\vc\include\**;
**\windows kits\**\include\**;
f:\dd\externalapis\**;
f:\dd\vctools\**;
f:\binaries.x86ret\inc\**;
**\reference assemblies\microsoft\framework\**\*.dll;
**\microsoft.net\framework\**\*.dll;
Additional ignore paths can be specified via the --ignore option. Use the --allfiles option to stop ignoring system files.
IsItMySource.exe --native Win32ConsoleApp.exe
c:\ivan\dev\github\isitmysource\testprojects\vs2015test\win32consoleapp\myfunc.cpp MD5 0BEF7000E0C253A66928FBF2664696CC
c:\ivan\dev\github\isitmysource\testprojects\vs2015test\win32consoleapp\program.cpp MD5 0FA5B7D722C9D7A2ECCD274E14104757
IsItMySource.exe --native --allfiles Win32ConsoleApp.exe
c:\ivan\dev\github\isitmysource\testprojects\vs2015test\win32consoleapp\myfunc.cpp MD5 0BEF7000E0C253A66928FBF2664696CC
c:\ivan\dev\github\isitmysource\testprojects\vs2015test\win32consoleapp\program.cpp MD5 0FA5B7D722C9D7A2ECCD274E14104757
c:\program files (x86)\microsoft visual studio 14.0\vc\include\cmath MD5 6196FAFAA60DB251F444A5710CACEEB9
c:\program files (x86)\microsoft visual studio 14.0\vc\include\exception MD5 5B9FAA891BF1D49FD54D78674B6AD335
c:\program files (x86)\microsoft visual studio 14.0\vc\include\ios MD5 A1FECDC7E7A90A86479D61B84C9EF656
...
... several dozen more files
...
f:\dd\vctools\crt\vcstartup\src\utility\utility_desktop.cpp MD5 CC9AAE4BAA114C08FFC7F1515EC09E4C
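To show how wildcard-based ignoring might work, here is a small C# sketch that turns each pattern into a regular expression. The exact wildcard semantics used by IsItMySource are not spelled out here, so the sketch assumes that "**" matches any text including backslashes, that "*" matches any text within a single path segment, and that matching is case-insensitive. The names IgnoreSketch, ToRegex, and IsIgnored are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class IgnoreSketch
{
    // Assumed semantics: "**" matches anything, including backslashes;
    // "*" matches anything except a backslash.
    public static Regex ToRegex(string wildcard)
    {
        string pattern = Regex.Escape(wildcard)   // "*" becomes "\*", "\" becomes "\\"
            .Replace(@"\*\*", ".*")
            .Replace(@"\*", @"[^\\]*");
        return new Regex("^" + pattern + "$", RegexOptions.IgnoreCase);
    }

    // patternList is a semicolon-separated list, as in the default ignore list above.
    public static bool IsIgnored(string path, string patternList)
    {
        return patternList
            .Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(p => p.Trim())
            .Any(p => ToRegex(p).IsMatch(path));
    }

    static void Main()
    {
        const string patterns =
            @"**\*.tmp;**\Debug\**;**\microsoft visual studio *\vc\include\**;f:\dd\vctools\**";

        // A compiler header and a CRT source path are ignored; a project file is not.
        Console.WriteLine(IsIgnored(@"c:\program files (x86)\microsoft visual studio 14.0\vc\include\xstring", patterns)); // True
        Console.WriteLine(IsIgnored(@"f:\dd\vctools\crt\vcstartup\src\rtc\stack.cpp", patterns)); // True
        Console.WriteLine(IsIgnored(@"c:\ivan\dev\github\isitmysource\testprojects\vs2015test\win32consoleapp\myfunc.cpp", patterns)); // False
    }
}
```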
File Statuses Explained
After verification, each file referenced in the debug information is put into one of the following buckets:
Ignored | The file is in the ignore list. It will not be mentioned in any way. |
Skipped | --root option was specified and the file is outside of the given root. |
Verified | Local file was found and the checksum matches. |
Different | Local file was found, but the checksum does not match. |
Missing | Local file not found. |
Present | Local file found, but checksum is not specified, or checksum algorithm is not supported. |
Error | Local file found, but we could not calculate the checksum, e.g. could not open the file. |
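The buckets above can be expressed as a small decision procedure. The following C# sketch is an illustration only, not the tool's actual FileVerifier: the FileStatus enum, the Verify method, and its parameters are assumptions. Ignored and Skipped are listed for completeness but are decided before a file ever reaches this step.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

enum FileStatus { Ignored, Skipped, Verified, Different, Missing, Present, Error }

static class VerifySketch
{
    // checksumType: "MD5", "SHA1", or null when the debug information has no checksum.
    public static FileStatus Verify(string localPath, string checksumType, byte[] expected)
    {
        if (!File.Exists(localPath)) return FileStatus.Missing;
        if (expected == null) return FileStatus.Present;

        HashAlgorithm algorithm;
        switch (checksumType)
        {
            case "MD5": algorithm = MD5.Create(); break;
            case "SHA1": algorithm = SHA1.Create(); break;
            default: return FileStatus.Present; // unsupported or unspecified algorithm
        }

        try
        {
            using (algorithm)
            using (Stream stream = File.OpenRead(localPath))
            {
                byte[] actual = algorithm.ComputeHash(stream);
                return actual.SequenceEqual(expected) ? FileStatus.Verified : FileStatus.Different;
            }
        }
        catch (IOException) { return FileStatus.Error; }
        catch (UnauthorizedAccessException) { return FileStatus.Error; }
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "int main() { return 0; }");

        byte[] md5;
        using (var hash = MD5.Create()) md5 = hash.ComputeHash(File.ReadAllBytes(path));

        Console.WriteLine(Verify(path, "MD5", md5));          // Verified
        Console.WriteLine(Verify(path, "MD5", new byte[16])); // Different
        Console.WriteLine(Verify(path, null, null));          // Present
        File.Delete(path);
        Console.WriteLine(Verify(path, "MD5", md5));          // Missing
    }
}
```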
DiaSymReader vs DIA SDK
DiaSymReader is a set of unmanaged COM interfaces that ships with .NET Framework. It reads managed debug information and can only open EXE files. The PDB file must be alongside the EXE file or in the search path specified by the --pdbdir option.
Newer managed executables (most likely starting with VS 2015 update 3) contain SHA1 checksums of the sources, while older ones contain MD5 checksums.
DIA SDK is a set of unmanaged COM interfaces shipped with Visual Studio 2015 and 2017. It reads native debug information. It can open EXE files or PDB files. Unmanaged executables up to and including those compiled with VS 2017 contain MD5 checksums of the source files. Managed executables do contain unmanaged debug information, but it has no checksums at all.
IsItMySource does not attempt to detect whether the executable is managed or native, or whether the appropriate symbol reader is installed. An attempt to use an incorrect or missing reader will result in an exception.
Extensibility
IsItMySource is extensible: it abstracts the debug info reader as the IDebugInfoReader interface.
If the --use {name} option is specified, IsItMySource will load IsItMySource.{name}.dll and look there for the assembly-level attribute [assembly:DebugInfoEngine], which should contain the type of debug info reader to instantiate.
The DiaSymReader wrapper is implemented in IsItMySource.DiaSymReader.dll, while the DIA SDK wrapper is implemented in IsItMySource.DiaSdk.dll. Thus, --use DiaSymReader is assumed by default, and --native is equivalent to --use DiaSdk. Other debug info readers can be placed in the same directory as IsItMySource.exe and loaded via the --use option without recompilation, as long as they implement IDebugInfoReader.
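The plugin lookup described above could be sketched roughly as follows. The IDebugInfoReader signature is taken from this article, but everything else is an assumption made for illustration: the shape of DebugInfoEngineAttribute (here it simply carries a Type), the empty IDebugInfo interface, the FakeReader class, and the CreateReader helper. To keep the example self-contained, the "plugin" is the executing assembly itself rather than a separately loaded DLL.

```csharp
using System;
using System.Linq;
using System.Reflection;

[assembly: DebugInfoEngine(typeof(FakeReader))]

public interface IDebugInfo { } // members omitted; the real interface is not shown here

public interface IDebugInfoReader
{
    IDebugInfo GetDebugInfo(string exeOrPdbFilePath, string pdbSearchPath);
}

// Hypothetical shape of the attribute: it only carries the reader type.
[AttributeUsage(AttributeTargets.Assembly)]
public sealed class DebugInfoEngineAttribute : Attribute
{
    public DebugInfoEngineAttribute(Type readerType) { ReaderType = readerType; }
    public Type ReaderType { get; private set; }
}

// Stand-in reader used only by this sketch.
public sealed class FakeReader : IDebugInfoReader
{
    public IDebugInfo GetDebugInfo(string exeOrPdbFilePath, string pdbSearchPath) { return null; }
}

static class PluginSketch
{
    // Finds the assembly-level attribute and instantiates the reader type it names.
    public static IDebugInfoReader CreateReader(Assembly plugin)
    {
        DebugInfoEngineAttribute attribute = plugin
            .GetCustomAttributes(typeof(DebugInfoEngineAttribute), false)
            .Cast<DebugInfoEngineAttribute>()
            .FirstOrDefault();
        if (attribute == null)
        {
            throw new InvalidOperationException(
                "No [assembly:DebugInfoEngine] attribute in " + plugin.FullName);
        }
        return (IDebugInfoReader)Activator.CreateInstance(attribute.ReaderType);
    }

    static void Main()
    {
        // A real host would call Assembly.LoadFrom("IsItMySource." + name + ".dll") here.
        IDebugInfoReader reader = CreateReader(Assembly.GetExecutingAssembly());
        Console.WriteLine(reader.GetType().Name); // FakeReader
    }
}
```

Keeping the reader type in an assembly-level attribute means the host does not have to scan every type in the plugin to find the implementation.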
Structure of the Source Code
The general idea behind the code is quite simple:
- Read debug information.
- Get list of files and their checksums.
- If local folder is not specified ("list" operation), stop.
- Otherwise, for each file locate corresponding file on local disk and calculate its checksum.
- See if the checksums match.
This is complicated by several issues:
- There are at least two types of debug information: managed and unmanaged.
- List of files returned from DIA SDK contains lots of extra files and may have duplicates.
- Finding the local file corresponding to the debug info file may be tricky: see "The Problem of Source Root" above.
- File checksum may be MD5, SHA1, calculated by some other algorithm*, or not present at all.
* - in practice, "other algorithm" does not happen: it is either MD5, SHA1, or no value. However, the program must still handle that case gracefully.
Assembly Dependencies Diagram
The Program Class
The entry point of the IsItMySource program is the Program class defined in IsItMySource.exe. Its Main() method:
- Parses command line options.
- Creates debug information reader object.
- Gets the list of source files and filters out "junk" files.
- Creates the operation object: either "list" or "verify", depending on the arguments.
- Executes the operation.
Reading Debug Information
The debug information reader is represented by the IDebugInfoReader interface:
public interface IDebugInfoReader
{
    IDebugInfo GetDebugInfo(string exeOrPdbfilePath, string pdbSearchPath);
}
The actual implementation of this interface is instantiated via reflection from a plugin DLL, depending on the command line arguments. IsItMySource.DiaSymReader.dll is loaded by default or if the --managed option is specified. IsItMySource.DiaSdk.dll is loaded if the --native or --unmanaged option is specified. A custom debug info reader can be loaded via the --use {name} option.
Once the plugin assembly is loaded, IsItMySource looks at the assembly-level attribute [assembly:DebugInfoEngine], which contains the type of the debug info reader object to instantiate. It is DsrDebugInfoReader for managed debug info, and DiaSdkDebugInfoReader for native debug info.
DiaSymReader
DiaSymReader is a set of unmanaged COM components (32-bit and 64-bit) that is shipped with .NET Framework. Microsoft provides a wrapper assembly, Microsoft.DiaSymReader.dll, as a NuGet package. IsItMySource.DiaSymReader.dll is another small wrapper that implements the IDebugInfoReader interface in terms of DiaSymReader-specific calls.
DIA SDK
DIA SDK is shipped with Visual Studio and can read native debug information. This is also a set of unmanaged COM components (32-bit and 64-bit), but there is no pre-built wrapper for calling it from C#.
The problem and the solution are described in more detail in my blog post "Calling non-automation compatible COM objects from .NET".
DiaSdk.cs contains the class definitions necessary to call DIA SDK from C#. DiaSdkDebugInfoReader.cs contains the implementation of the IDebugInfoReader interface in terms of DIA SDK calls.
Filtering Unwanted Source Files
The debug info reader returns a collection of SourceFileInfo objects. For managed debug info this collection can be used "as is", but for native debug info it contains a lot of "junk" files.
The SourceFilesFilter class removes duplicates and filters out files that match any of the patterns specified in the ignore file list. The default ignore file list is stored in the application configuration file. Additional ignore patterns can be added via the --ignore option. The effect of the default ignore list can be cancelled via the --allfiles option.
The List Operation
Once we have received the (filtered) list of source files, we can do something useful with them. This is abstracted via the IOperation interface:
internal interface IOperation
{
    void Run(IEnumerable<SourceFileInfo> sources, Options options);
}
There are two implementations of this interface: ListSourcesOperation and VerifySourcesOperation.
ListSourcesOperation prints the list of source files and their checksums to the standard output.
The Verify Operation
VerifySourcesOperation is a more complex implementation of IOperation that, for each file specified in the debug information, attempts to find an actual file on disk and verify its checksum.
FileVerifier is a helper class that verifies a single file. It locates the file on the local disk, opens it, calculates the checksum, and reports whether the checksum matches the one in the debug information. Depending on which checksum type is used, it will call either the System.Security.Cryptography.SHA1 or the System.Security.Cryptography.MD5 class.
Dependency Injection
IsItMySource is simple enough and does not really require a dependency injection container like Unity or StructureMap. However, it does use general dependency injection principles: the important classes have constructors that accept dependencies as interfaces. These constructors are used mostly for unit testing. In the production scenario, another "shortcut" constructor is used that supplies concrete dependencies to the "dependency injection" constructor. For example:
// production constructor
public VerifySourcesOperation(TextWriter output)
    : this(output, new FileVerifier())
{
}

// test constructor
public VerifySourcesOperation(TextWriter output, IFileVerifier fileVerifier)
{
    _output = output;
    _fileVerifier = fileVerifier;
}
Unit Tests
IsItMySource has some unit tests in IsItMySource.Tests.dll, although the test coverage can definitely be improved. It does not use any mocking libraries like Moq, but implements some mocks by hand.
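A hand-written mock in this style might look like the sketch below. It is a simplified illustration rather than code from the actual test project: the Matches method on IFileVerifier, the Run(IEnumerable<string>) signature, and the FakeFileVerifier class are all hypothetical, and only the two-argument "test" constructor mirrors the one shown earlier.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical one-method interface; the real IFileVerifier is not shown in this article.
public interface IFileVerifier
{
    bool Matches(string path);
}

// Simplified stand-in for the operation, keeping only the "test" constructor.
public class VerifySourcesOperation
{
    private readonly TextWriter _output;
    private readonly IFileVerifier _fileVerifier;

    public VerifySourcesOperation(TextWriter output, IFileVerifier fileVerifier)
    {
        _output = output;
        _fileVerifier = fileVerifier;
    }

    public void Run(IEnumerable<string> paths)
    {
        foreach (string path in paths)
        {
            string status = _fileVerifier.Matches(path) ? "Verified" : "Different";
            _output.WriteLine("{0} {1}", status, path);
        }
    }
}

// Hand-written mock: returns a canned answer and records the calls it receives.
public class FakeFileVerifier : IFileVerifier
{
    public readonly List<string> Calls = new List<string>();
    public bool Result = true;

    public bool Matches(string path)
    {
        Calls.Add(path);
        return Result;
    }
}

static class MockSketch
{
    static void Main()
    {
        var output = new StringWriter();
        var verifier = new FakeFileVerifier { Result = false };
        new VerifySourcesOperation(output, verifier).Run(new[] { @"c:\projects\acme\Program.cs" });

        Console.WriteLine(verifier.Calls.Count); // 1
        Console.Write(output.ToString());        // Different c:\projects\acme\Program.cs
    }
}
```

Because the operation writes to a TextWriter and asks an interface for the verdict, the test needs neither a real file system nor the console.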
Comments and Suggestions
If you have questions or comments, feel free to leave feedback.