SemanticMerge intro guide


SemanticMerge tool

The SemanticMerge tool is a language-dependent source code merge tool. It makes many merge scenarios easy, particularly ones that current text-based, language-agnostic merge tools can't handle.

We started with C# and Visual Basic; Java came next, with C++ to follow. Further languages will be chosen based on user feedback: JavaScript, Objective-C, Ruby? It's your call.

SemanticMerge leverages the merge technology of the merge tool included in Plastic SCM - which can already deal with refactors through Xmerge - as well as the merge system of the Plastic SCM server itself, and combines both with language-dependent parsing to create the ultimate source code merging machine.

SemanticMerge is not limited to Plastic SCM. It can be configured to work with Git, Subversion, Perforce, ClearCase, Team Foundation Server, Mercurial, and many others.

Please keep in mind that our bundled JVM-based parsers require JVM version 8 or higher.

The merge problem

These days, software development is based on the "modify-merge" working pattern: developers work in parallel on the same codebase performing concurrent changes - potentially inside the same files - that will need to be reconciled back ("merged" in version control terms). In order to perform the merge, developers rely on "merge tools".

All merge systems use text-based algorithms. The tools don't consider the programming language the code is written in, only the modifications made to the text. As a result, merge tools are language-unaware, which lets them work on any kind of text file.

Not being able to act based on the specific programming language structures means that the merge tools are heavily dependent on the position of the texts being modified, which severely restricts developers' ability to perform changes concurrently and improve code quality and readability by refactoring.
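
The position dependence is easy to reproduce with the standard diff utility. The following is a minimal sketch, assuming a POSIX shell with diff, sort, and cmp available; the file names and one-line method stubs are made up for illustration. Sorting the lines is only a crude stand-in for comparing declarations by name, not how a language-aware tool works internally.

```shell
#!/bin/sh
# Two versions of a file with the same two one-line method stubs.
# Only the order differs (hypothetical content, for illustration).
workdir=$(mktemp -d)
printf 'void Listen() { }\nvoid Send() { }\n' > "$workdir/base.cs"
printf 'void Send() { }\nvoid Listen() { }\n' > "$workdir/moved.cs"

# A line-based comparison reports the move as removed and added lines.
changed_lines=$(diff "$workdir/base.cs" "$workdir/moved.cs" | grep -c '^[<>]')

# Ignoring position (here: comparing the sorted lines) shows that
# no declaration was actually added, removed, or modified.
sort "$workdir/base.cs" > "$workdir/base.sorted"
sort "$workdir/moved.cs" > "$workdir/moved.sorted"
if cmp -s "$workdir/base.sorted" "$workdir/moved.sorted"; then
    same_declarations=yes
else
    same_declarations=no
fi

echo "lines reported as changed by diff: $changed_lines"
echo "same set of declarations: $same_declarations"
rm -rf "$workdir"
```

The text comparison flags a change although nothing but the order of the two methods is different, which is the situation that turns a parallel refactor into a textual conflict.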


Language aware merge tool

But how would a merge tool with programming language support behave? Consider the following scenario:

Language aware merge tool - Scenario

This merge would be a nightmare for any merge tool on the market, but at the end of the day there's not a single conflict, if you look at it from a programmer's point of view. You just made some refactors in parallel, that's all.

So you'd expect the following result:

Language aware merge tool - Result

And this is exactly what we get with SemanticMerge!

Since it looks into the code structure and doesn't use a textual comparison method to compare the three contributors (it is a 3-way merge after all), the algorithm is not restricted by the relative positions of the texts it compares. Learn more about 2-way vs 3-way merge.

Our tool parses the code, checks the obtained structures, and merges based on the "code trees" of the three contributors (base, source, and destination), automatically providing the expected result.
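
To make the idea concrete, the following sketch performs a 3-way merge keyed by declaration name instead of by line position. It is a deliberately simplified illustration, not SemanticMerge's implementation: the flat "name<TAB>body" input format and the merge3 function are hypothetical, and a real tool works on parsed, nested code trees and also tracks renames and moves between containers.

```shell
#!/bin/sh
# Hypothetical input format: one declaration per line, "name<TAB>body".
# merge3 BASE SRC DST prints the merged declarations sorted by name, or
# "CONFLICT<TAB>name" when both contributors changed the same declaration
# in different ways.
merge3() {
    awk -F '\t' '
        FNR == 1 { file++ }              # a new input file starts
        { seen[$1] = 1 }
        file == 1 { base[$1] = $2 }
        file == 2 { src[$1]  = $2 }
        file == 3 { dst[$1]  = $2 }
        END {
            for (name in seen) {
                b = (name in base) ? ("=" base[name]) : "absent"
                s = (name in src)  ? ("=" src[name])  : "absent"
                d = (name in dst)  ? ("=" dst[name])  : "absent"
                if (s == b)      result = d   # source left it untouched
                else if (d == b) result = s   # destination left it untouched
                else if (s == d) result = s   # both made the same change
                else { print "CONFLICT\t" name; continue }
                if (result != "absent") print name "\t" substr(result, 2)
            }
        }
    ' "$1" "$2" "$3" | sort
}

workdir=$(mktemp -d)
printf 'Listen\tv1\nSend\tv1\nRecv\tv1\n' > "$workdir/base"
# Source contributor: modifies Send and moves it to the top.
printf 'Send\tv2\nListen\tv1\nRecv\tv1\n' > "$workdir/src"
# Destination contributor: modifies Recv.
printf 'Listen\tv1\nSend\tv1\nRecv\tv3\n' > "$workdir/dst"
merged=$(merge3 "$workdir/base" "$workdir/src" "$workdir/dst")
printf '%s\n' "$merged"
rm -rf "$workdir"
```

Because each declaration is looked up by name, the reordering made by the source contributor does not interfere with the change made by the destination contributor, and only a declaration changed differently on both sides is reported as a conflict.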


Enter SemanticMerge - class splitting

Let's check the Socket refactor example again, this time with actual code and the SemanticMerge tool. These are the three files to merge:


Base file

using System.Net;

namespace Network
{
    internal class Socket
    {
        internal string GetHostByName(string addr)
        {
            // this method returns the host
            // when you give an IP
            return CalculateHostByName(addr);
        }

        internal void Listen()
        {
            // do the listen on a port
            // and whatever it is needed
            // to listen
        }

        internal void ConnectTo(string addr)
        {
            // connect to a client
            Net.ConnectTo(addr);
        }

        internal int Send(byte[] buffer)
        {
            System.IO.Write(buffer);
        }

        internal int Recv(byte[] buffer)
        {
            System.IO.Read(buffer);
        }
    }
}

Source contributor (changed by developer 2)

using System.Net;

namespace Network
{
    internal class Socket
    {
        internal string GetHostByName(string addr)
        {
            // this method returns the host
            // when you give an addr
            return CalculateHostByName(addr);
        }

        internal void Listen()
        {
            // do the listen on a port
            // and whatever it is needed
            // to listen
        }

        internal int Recv(byte[] buffer)
        {
            System.IO.Read(buffer);
        }
    }
    internal class ClientSocket
    {
        internal int Send(byte[] buffer)
        {
            System.IO.Write(buffer);
        }

        internal void ConnectTo(string addr)
        {
            // connect to a client
            Net.ConnectTo(addr);
        }
    }
}

Destination contributor

using System.Net;

namespace Network
{
    internal class ServerSocket
    {
        internal int Recv(byte[] buffer)
        {
            System.IO.Read(buffer);
        }

        internal void Listen()
        {
            // do the listen on a port
            // and whatever it is needed
            // to listen
        }
    }

    internal class DNS
    {
        internal string GetHostByName(string addr)
        {
            // this method returns the host
            // when you give an IP
            return CalculateHostByName(addr);
        }

        internal void ConnectTo(string addr)
        {
            // connect to a client
            Net.ConnectTo(addr);
        }

        internal int Send(byte[] buffer)
        {
            System.IO.Write(buffer);
        }
    }
}

SemanticMerge output

By working with the three files above, here is the result that SemanticMerge provides:

SemanticMerge output

As you can see, there are no pending conflicts to be resolved and the result file looks exactly as you would expect.


Slightly more complex merge - manual conflict

Let's consider again the scenario of the Socket class that is split into three classes. What if the two developers decide to modify the Send() method?

The "Source contributor" modifies Send() in its final location inside ClientSocket, while the "Destination contributor" modifies Send() inside the renamed (and moved) DNS class:

Manual conflict - Scenario

Let's see how SemanticMerge detects the conflict:

Manual conflict - Detecting the conflict

As you can see, the tool only detects 1 conflict (try to do the same with a conventional text-based merge tool and you'll enter into nightmare mode).

You can run your favorite 3-way merge tool to resolve the conflict (by default, SemanticMerge comes with its own 3-way merge tool, the one included in Plastic SCM that is able to do Xmerge):

Manual conflict - Xmerge

In my example, the merge has been fully automatic for the method too, since I didn't modify the same part of the method.

The advantage, as you can see, is that you use the divide and conquer method: you merge method by method (if needed) instead of the entire file, and SemanticMerge is able to detect the classes, methods, properties, etc. independently of their locations. SemanticMerge can track them when they've been moved, renamed, and more.


Going deeper - dealing with a divergent move

After the previous merge cycle, now suppose we take the resulting file as the base for the next iteration.

The two developers now decide to move the Listen() method, but each of them to a different location, as the following image shows:

Divergent move - Scenario

This is what the SemanticMerge tool detects as a divergent move and it will be handled as follows:

Divergent move - Detecting the move

The developer running the merge can choose whether to keep the method on the source location (keep src button), the destination (keep dst button), or even duplicate it, keeping both contributors.

Click the explain move button to see how the method ended up in two different positions:

Divergent move - Explaining the move

What if the two developers not only moved the Listen() method but also modified it? Let's see how the tool handles the case:

Divergent move - Move and modify

As you can see there is a "double conflict" on the method: you first have to resolve the changed/changed conflict and then the moved/moved.

In either case, the tool simplifies a situation that would be close to impossible to deal with using traditional text-based merge tools.


More merging capabilities

You guessed it right! SemanticMerge is able to understand the code structure and hence there are many situations where it can be a great aid:

  • Suppose you always want to review conflicts if one method is modified in parallel - Text-based merge tools can detect when a block of text has been modified in parallel, but if you modified the first line of a method and someone else modified the last part, the merge will be automatic, even if there are potential logic issues. Because SemanticMerge knows that both changes fall inside the same method, it can flag the case for review.
  • Usings (or imports in Java jargon) are also handled by the system - If you add using System.Text on the first line and I add it on the fifth, SemanticMerge knows it is the same using so it will only add it once.
  • Changed/deleted - Suppose you modified a method inside a subclass and I go and delete the class. SemanticMerge will deal with this specific case.
  • The same holds true for many other scenarios like moved/moved, added/moved, and so on.
  • What if I modify two methods and you go and decide to rearrange the class based on visibility rules? Public goes first, then internal, protected, and finally private. It will be an automatic, easy merge for SemanticMerge.

If we listed all the merge cases specifically handled by SemanticMerge, we'd be here all day. This short list, however, gives a pretty good idea of what the tool can do.
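
The usings case from the list above can be approximated in a few lines of shell. This is only a sketch under stated assumptions: merge_usings is a hypothetical helper, the two input files are made up, and matching lines that start with "using " is a textual shortcut, whereas SemanticMerge identifies the declarations by parsing the code.

```shell
#!/bin/sh
# Hypothetical helper: print the union of the "using" lines found in two
# contributor files, keeping first-seen order and dropping duplicates.
merge_usings() {
    cat "$1" "$2" | grep '^using ' | awk '!seen[$0]++'
}

workdir=$(mktemp -d)
# One contributor adds System.Text on the first line ...
printf 'using System.Text;\nusing System.Net;\n' > "$workdir/src.cs"
# ... the other adds it further down.
printf 'using System.Net;\nusing System.IO;\nusing System.Text;\n' > "$workdir/dst.cs"
usings=$(merge_usings "$workdir/src.cs" "$workdir/dst.cs")
printf '%s\n' "$usings"
rm -rf "$workdir"
```

Each using appears once in the result regardless of the line on which each contributor added it.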


SemanticMerge configuration

SemanticMerge offers some options to consider when running the tool.

  • The General tab shows the merge options you can select:

    SemanticMerge configuration dialog - General

    • -a - Tells the tool to merge automatically until a conflict appears. At that point, user interaction is needed.
    • --merge-decl - Merge automatically as many declarations* as possible.
    • --include-format-change - Include changes where only indentation and EOLs have been modified. This includes differences in spaces and tabs at the beginning of lines, differences in EOLs, and blank lines at the beginning of declarations. By default, all these differences are ignored to simplify the merge/diff.
    • --process-all-merges - Merge automatically as many declarations* as possible and run the external text-based tool for each declaration that can't be merged automatically.
    • --nolangwarn - Skip the "no supported language" dialog, and directly launch the text-based tool.
    • --nostructurewarn - Skip the structure errors dialog on startup and directly launch the associated text-based tool.
    • Encoding - The way in which the content of the files is interpreted as text characters.
    • Java Virtual Machine path - Specify where the virtual machine for Java is.
    • External parser - Enter the command to run an external parser (if wanted). Learn more about External parsers.
    * A declaration is the statement that defines any of the supported syntax elements: classes, methods, attributes, etc. Depending on the element type (for example, classes, methods), they include a body where the element is implemented.
  • From the External tools tab you can select or customize the tools you want to use for diffing and merging:

    SemanticMerge configuration dialog - External tools

    Read some examples about how to configure some external tools.

  • The Version controls tab lets you configure Plastic SCM or Git as the linked version control:

    SemanticMerge configuration dialog - Version controls


SemanticMerge license server


Server Linux installation (Red Hat based)

Run the following commands with root permissions to install the Semantic license server and the admin tool:

wget https://www.plasticscm.com/plasticrepo/plasticscm-common/RHEL_6.3/plasticscm-common.repo -O /etc/yum.repos.d/plasticscm-common.repo
wget https://www.semanticmerge.com/semanticrepo/RHEL_6.3/semanticmerge.repo -O /etc/yum.repos.d/semanticmerge.repo
yum install semanticmerge-license-server

(The bin files will be located at: /opt/semanticlicserver)

A semanticlicserverd init.d service will be automatically started afterwards.


Machine ID generation

licadmin is a program to administer the SemanticMerge licenses. Type the following command to generate the machine id:

licadmin generateid

We will need this machine id to generate your license.


License installation

Once you have the license ready, copy your license.lic file to the directory /var/lib/semanticlicserver (you may need to create the directory with root privileges).

The license info is reloaded every 5 minutes. To reload it immediately, restart the license server service:

sudo /etc/init.d/semanticlicserverd restart
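
The installation steps above can be sketched as a small shell function. This is a hedged example: install_license is a hypothetical helper name, while the directory, the license.lic file name, and the restart command are the ones given in this guide. The optional second argument exists only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# install_license LICENSE_FILE [LICENSE_DIR]
# Copies the license file into the license server directory, creating the
# directory first if it does not exist. LICENSE_DIR defaults to the path
# named in this guide; pass another path to try the function out safely.
install_license() {
    license_file=$1
    license_dir=${2:-/var/lib/semanticlicserver}
    [ -f "$license_file" ] || { echo "not found: $license_file" >&2; return 1; }
    mkdir -p "$license_dir" &&
        cp "$license_file" "$license_dir/license.lic"
}

# Typical use (as root), followed by a restart to reload immediately:
#   install_license ./license.lic
#   /etc/init.d/semanticlicserverd restart
```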

Client configuration

Every client working with the license server must be configured with the server address in the licenseserver.conf file. The file must contain a single line with the license server name or IP address.

The file must be located under the SemanticMerge installation directory.
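
A minimal sketch of creating that file from a shell follows. The write_license_conf function is a hypothetical helper, and the host name and installation path in the usage comment are placeholders; use the actual SemanticMerge installation directory on each client.

```shell
#!/bin/sh
# write_license_conf SERVER INSTALL_DIR
# Writes licenseserver.conf with a single line: the server name or IP.
write_license_conf() {
    server=$1
    install_dir=$2
    [ -n "$server" ] && [ -d "$install_dir" ] || {
        echo "usage: write_license_conf SERVER INSTALL_DIR" >&2
        return 1
    }
    printf '%s\n' "$server" > "$install_dir/licenseserver.conf"
}

# Example with a placeholder host name and installation path:
#   write_license_conf licserver.example.com /path/to/semanticmerge
```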


licadmin

The licadmin application lets you administer the SemanticMerge license.

The command supports the following options:

  • generateid - Generates the id of this machine (required to build a license afterwards).
  • list - Lists the current licensed users.
  • licinfo - Shows info about the installed license.
  • deactivateuser <user_id> - Removes a user identified by user_id from the licensed users list.
  • help - Prints this help.

Last updated

July 31, 2017
  • Please check the minimum required JVM version.
June 16, 2017
  • We included documentation about how to configure SemanticMerge.
July 11, 2016
  • The documentation has been updated to SemanticMerge 2.0.