Does that source match the binary? Part 1

A couple of weeks ago I ran into a problem: I had a DLL file, but I was not certain which version of the source code it was compiled from (yes, our release management could use some improvement).

The Visual Studio debugger immediately notices when I give it a version of a source file that does not match the compiled binary, and shows a warning. But how does it know?

The information about source files is stored in the PDB file (which must also exactly match the binary). The PDB file format is not fully documented, but Microsoft provides at least two libraries for reading it: the DIA SDK, which ships as part of Visual Studio, and the DiaSymReader interfaces, which ship with the .NET Framework. Despite being part of the .NET Framework, the DiaSymReader interfaces are native COM interfaces, but Microsoft provides a managed wrapper as a NuGet package.

Astonishingly, for the same PDB file these two libraries produce similar but slightly different results. It turns out that a PDB file may contain two kinds of debug information, which I will call “managed” and “unmanaged”. The DIA SDK accesses the “unmanaged” part; DiaSymReader accesses the “managed” part.

Here’s the summary of my findings so far:

Native C++ programs:
DIA SDK: returns a list of source files with MD5 checksums.
DiaSymReader: does not work with unmanaged programs and returns an error.

Managed programs (C#, VB.NET):
DIA SDK: returns a list of source files, but without checksums.
DiaSymReader: returns a list of source files with SHA1 checksums (MD5 for Visual Studio 2013 and earlier).

Managed C++ programs:
DIA SDK: returns a list of source files with MD5 checksums.
DiaSymReader: returns a list of source files with SHA1 checksums (MD5 for Visual Studio 2013 and earlier).

Thus, managed C++ programs compiled with VS2015 contain both SHA1 and MD5 checksums of their source files in the PDB. On top of that, Microsoft also has something called the “Portable PDB format”, but I am not sure about the details yet.
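Because the PDB stores only a hash of each source file, checking a candidate file comes down to hashing it with the same algorithm and comparing the results. The following is a minimal Python sketch. It assumes the checksum has already been extracted from the PDB (with the DIA SDK or DiaSymReader) as a plain hex string. The function names are hypothetical. The algorithm is guessed from the digest length (16 bytes for MD5, 20 bytes for SHA1) rather than read from the PDB, which is a simplifying assumption.

```python
import hashlib

# Assumption: infer the algorithm from the digest size instead of reading
# the algorithm identifier stored in the PDB.
_ALGORITHMS_BY_SIZE = {16: "md5", 20: "sha1"}


def algorithm_for_checksum(checksum_hex: str) -> str:
    """Guess the hash algorithm from the length of a hex-encoded checksum."""
    size = len(checksum_hex) // 2
    try:
        return _ALGORITHMS_BY_SIZE[size]
    except KeyError:
        raise ValueError(f"unexpected checksum length: {size} bytes") from None


def file_checksum(path: str, algorithm: str) -> str:
    """Hash the raw bytes of a file in chunks and return a lowercase hex digest."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_pdb_checksum(path: str, checksum_hex: str) -> bool:
    """Return True if the file's bytes hash to the checksum taken from the PDB."""
    expected = checksum_hex.strip().lower()
    algorithm = algorithm_for_checksum(expected)
    return file_checksum(path, algorithm) == expected
```

Running `matches_pdb_checksum` over each candidate revision of a source file is one way to find the version a DLL was actually built from. The hash presumably covers the raw file bytes. If so, a file whose line endings or encoding changed after the build (for example, through a source-control checkout setting) will not match, even when its text looks identical.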

Some other differences: the DIA SDK can open a PDB file directly, or start with a binary and look for a matching PDB in a given search path. DiaSymReader supports only the second mode: you cannot open a stand-alone PDB file without the binary.

 

2 Comments


  1. We insert the Subversion revision number into the source code (see “BuildOrigin” below) before building the solution in TeamCity.
    This BuildOrigin value then tells us which revision is currently deployed to production (we deploy only from TeamCity builds, so BuildOrigin always contains the SVN revision number).

    ———– VersionInfo.cs ———–
    namespace PostJobFree.Utilities
    {
        public static class VersionInfo
        {
            private const string BuildOrigin = "local"; // will be replaced on build machine by TeamCityBuildPjf.ps1 to "SVN Revision NNNNN, automated build time … UTC"
            …..
    ——————————————–

    ======= TeamCityBuildPjf.ps1 =======
    # access to teamcity configuration

    if ($env:TEAMCITY_VERSION)
    {
        $file = (Resolve-Path ($env:TEAMCITY_BUILD_PROPERTIES_FILE + ".xml")).Path

        $buildPropertiesXml = New-Object System.Xml.XmlDocument
        $buildPropertiesXml.XmlResolver = $null
        $buildPropertiesXml.Load($file)
        $teamcity = @{}

        foreach ($entry in $buildPropertiesXml.SelectNodes("//entry"))
        {
            $key = $entry.key
            $value = $entry.'#text'
            $teamcity[$key] = $value
        }
    }

    if ($teamcity)
    {
        $build_folder = $teamcity["teamcity.build.checkoutDir"]
        $build_rev = $teamcity["build.vcs.number"]
        $build_number = $teamcity["build.number"]
        log "Starting TeamCity build, Rev. $build_rev"
    }
    …..
    if ($teamcity)
    {
        # copy SVN information

        $versionInfoFile = Join-Path $build_folder "PostJobFreeLibrary\Utilities\VersionInfo.cs"
        $text = Get-Content $versionInfoFile
        $text = $text -replace '(?<=BuildOrigin\s*=\s*")(local)(?=")', "SVN Revision $build_rev, automated build time $startTimeString UTC"
        Set-Content $versionInfoFile $text
    }
    …..
    =====================

    Why wouldn't you do something similar in your production builds?


  2. Sorry for the delayed answer; I was on vacation.

    > Why wouldn’t you do something similar in your production builds?

    We are talking about a huge company with multiple teams, multiple source control repositories, etc., combined with a somewhat relaxed attitude towards rigorous software development practices (“we are not a software company”).

    This is not a company-wide mandatory policy, so most teams don’t do anything of the sort.


Leave a Reply to Dennis Gorelik Cancel reply
