Sunday, November 18, 2012

Content Type Hub in Globally Distributed SharePoint Environment

Cross-farm publishing and consuming of service applications is quite an interesting capability, which adds another dimension to architecting complex SharePoint implementations. On the other hand, it looks like the SharePoint product team had only basic scenarios in mind, at the level of associating service applications with a web application on a different farm, and did not take into consideration the dependencies of other SharePoint functionality. Earlier this year I wrote a post, Alternate Access Mappings in Cross-Farm Search Scenario, focused on how to make a cross-farm consumed Search service application aware of the Alternate Access Mapping settings on the consuming farm. This post focuses on how to chain Content Type Hubs in order to distribute content types in globally distributed SharePoint deployments.

Recently I was involved in a project where the customer required the ability to modify content types in a single location and propagate the changes to site collections hosted on several farms distributed around the globe, mostly connected over low-bandwidth, high-latency WAN connections. There are two solution options that fulfill this requirement.

The first option is to configure a single Managed Metadata service application and a Content Type Hub in a central location, and to establish cross-farm connections from the regional farms to the central farm so that they consume content types from this single location. If content type distribution is configured this way, keep in mind that each subscribing site collection receives its own copy of the published or republished content types. The content types are transferred over relatively slow WAN connections, so transferring a large number of them can take a long time. Even if you can leverage WAN acceleration and optimization technology, which might considerably decrease the amount of transferred data (the same content types are transferred many times and can therefore be cached on the WAN acceleration device at the destination), the Content Type Subscriber timer jobs on the regional farms will probably still run for a very long time. Where no WAN acceleration and optimization infrastructure is in place, the network bandwidth consumption will be significant.

Content Type Hub "chaining" is the second solution option. The rest of this post describes how Content Type Hub chaining works and what its advantages are. Content Type Hub chaining is based on the principles of:
  • leveraging out-of-the-box SharePoint functionality
  • minimizing network bandwidth consumption
  • enabling users to make content type changes only once
  • minimizing administrative effort for content type distribution

Each regional farm hosting site collections that consume content types from the Content Type Hub should have a local Managed Metadata service application configured, plus a connection to the Managed Metadata service application running on the central farm. The central farm hosts the central Content Type Hub, which is the single location where users create, modify and publish content types. Each regional farm also hosts a Content Type Hub site collection, which consumes the published content types from the central Content Type Hub and at the same time works as the subscription point for the other site collections hosted on the regional farm. The concept looks simple, and in fact it is simple, but a few small settings are needed to make it work and keep it manageable. Let’s look at the following diagram, which shows Content Type Hub chaining in detail.

Figure 1: Content Type Hub Chaining

The diagram shows two farms: the Service Applications Publishing farm (the central farm) and the Service Applications Consuming farm (the regional farm). A real deployment can consist of multiple consuming farms. The central farm runs a Managed Metadata service application, which we can call the "Parent" Managed Metadata service application. This service application is associated with the "Parent" Content Type Hub site collection hosted on web application "A", which is associated with the service applications in the "default" proxy group on the central farm. It is not mandatory to use the "default" proxy group, or to host the Parent Content Type Hub on a web application on the central farm, but I did it this way to keep the example scenario simple. In any case, there must be a proxy for the Parent Managed Metadata service application, and the web application hosting the Parent Content Type Hub must be associated with it via some proxy group, because this connection is what makes content type publishing happen. If you skip the association between the web application hosting the Parent Content Type Hub and the Parent Managed Metadata service application proxy, thinking that the Content Type Hub URL is already configured in the service application properties (so it should know about it), you get the "No valid proxy can be found to do this operation." error when you try to publish any modified or newly created content type.
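
A quick way to check this association before publishing anything is to list the Managed Metadata connections in the proxy group used by the web application. This is only a minimal sketch: "<WebAppA URL>" is a placeholder, and filtering on the connection type name is my assumption about how the connections are labeled on your farm.

```powershell
# Assumes the SharePoint snap-in is loaded (e.g. SharePoint Management Shell).
# "<WebAppA URL>" is a placeholder for the web application hosting the Parent Content Type Hub.
$webApp = Get-SPWebApplication "<WebAppA URL>"

# List the Managed Metadata connections in the proxy group this web application uses.
# The "*Managed Metadata*" filter is an assumption about the connection type name.
$webApp.ServiceApplicationProxyGroup.Proxies |
    Where-Object { $_.TypeName -like "*Managed Metadata*" } |
    Select-Object DisplayName, TypeName, Status
```

If the command returns nothing, the web application has no Managed Metadata connection in its proxy group, which is the situation that leads to the error quoted above.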

The content type syndication options of the Parent Managed Metadata service application proxy only need to be selected if you plan to use the Parent Content Type Hub as a distribution point for site collections hosted locally on the central farm as well; otherwise you can leave them cleared.

The configuration of the regional farm is a little more complex. First of all, you need to configure a dedicated web application (web application "B") to host the "Child" Content Type Hub. Normally there is no need to spend server resources on a dedicated web application just for the single site collection hosting the content type hub, but in this case it is required, because this web application has to be associated with a specific set of service application proxies through a dedicated proxy group (we can name it "CTH Chain"). The CTH Chain proxy group includes the connection to the Managed Metadata service application configured locally on the regional farm (the "Child" Managed Metadata service application) and the connection to the Parent Managed Metadata service application consumed from the central farm. The Parent Managed Metadata service application proxy should be configured with both content type syndication options selected, and also as the default Managed Metadata service application connection in this proxy group, because it is through this connection that the Child Content Type Hub consumes the content types published in the Parent Content Type Hub.
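
The proxy group setup described above could be scripted roughly as follows. Treat it as a sketch under assumptions: the proxy display names and the web application URL are placeholders, and both Managed Metadata connections are assumed to exist on the regional farm already.

```powershell
# Placeholders (assumptions): replace with the names and URL used on your regional farm.
$childProxyName  = "<Child Managed Metadata proxy name>"
$parentProxyName = "<Parent Managed Metadata proxy name>"
$webAppBUrl      = "<WebAppB URL>"

# Create the dedicated proxy group.
$group = New-SPServiceApplicationProxyGroup -Name "CTH Chain"

# Look up the local (Child) and the cross-farm (Parent) Managed Metadata connections.
$childProxy  = Get-SPServiceApplicationProxy | Where-Object { $_.DisplayName -eq $childProxyName }
$parentProxy = Get-SPServiceApplicationProxy | Where-Object { $_.DisplayName -eq $parentProxyName }

# Add both connections to the dedicated proxy group.
Add-SPServiceApplicationProxyGroupMember -Identity $group -Member $childProxy
Add-SPServiceApplicationProxyGroupMember -Identity $group -Member $parentProxy

# Select both content type syndication options on the Parent connection.
Set-SPMetadataServiceApplicationProxy -Identity $parentProxy `
    -ContentTypeSyndicationEnabled -ContentTypePushdownEnabled

# Associate web application "B" with the dedicated proxy group.
Set-SPWebApplication -Identity $webAppBUrl -ServiceApplicationProxyGroup $group
```

The default-connection options are deliberately left out of this sketch. I would review them in Central Administration afterwards, because I have not verified how the cmdlet treats switches that are not specified.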

The cross-farm connection to the Parent Managed Metadata service application should be configured in the standard way, as described in many blog posts and in the TechNet documentation sections "Share service applications across farms (SharePoint Server 2010)" and "Share service applications across farms in SharePoint 2013".
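
For orientation only, the last step of that procedure (creating the connection on the regional farm) might look like the sketch below. It assumes that the trust between the farms is already established and that the Parent service application is already published. The connection name and the published URL are placeholders, not values from this deployment.

```powershell
# Assumptions: farm trust certificates are exchanged and the Parent Managed Metadata
# service application is published on the central farm.
# "<Published service application URL>" is the published URL taken from the central farm.
$publishedUrl = "<Published service application URL>"

# Create the connection to the Parent Managed Metadata service application
# on the regional farm. The name is a hypothetical example.
New-SPMetadataServiceApplicationProxy -Name "Parent Managed Metadata Service" -Uri $publishedUrl
```

The referenced documentation remains the authoritative description of the full procedure, including the trust and permission steps this sketch skips.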

Now you can configure, on the regional farm, one or more web applications (web application "C" in the diagram) hosting the site collections and sites with the end users' content. Web application "C" should be associated with the Child Managed Metadata service application proxy, which is included in the "default" proxy group on the regional farm. The "default" proxy group on this farm should not include the Parent Managed Metadata service application proxy, because the site collections hosted in this web application receive their content types from the locally configured Child Content Type Hub through the Child Managed Metadata service application. This last step is the standard way of configuring a content type hub in any SharePoint farm.
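
Part of this standard configuration is pointing the Child Managed Metadata service application at the Child Content Type Hub. A minimal sketch, with the service application name and the hub URL as placeholders:

```powershell
# Placeholders (assumptions): the Child service application name and the Child hub URL.
$childService = "<Child Managed Metadata service application name>"
$childHubUrl  = "<Child Content Type Hub URL>"

# Point the Child Managed Metadata service application at the Child Content Type Hub.
Set-SPMetadataServiceApplication -Identity $childService -HubUri $childHubUrl
```

The syndication options on the Child connection in the "default" proxy group still have to be selected, as in any single-farm content type hub setup.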

The second and probably most important part of the Content Type Hub chaining concept is the process of pushing new content types and modifications of existing content types downstream to the content type galleries of the site collections on the regional farm(s). The process is quite simple and starts working as soon as you finish the chaining configuration. The process steps are included in the diagram above, but I'm going to describe them here in more detail:
  1. The end user creates or modifies a content type in the Parent Content Type Hub on the central farm. The user publishes or republishes the content type.
  2. Once the "Content Type Hub" timer job (MetadataHubTimerJob) on the central farm is executed, the published and republished content types are recognized by the Parent Managed Metadata service application and prepared to be pushed to the Child Content Type Hub site collection. This timer job runs by default every 15 minutes.
  3. The identified content types are pushed to the Child Content Type Hub site collection by the "Content Type Subscriber" timer job (MetadataSubscriberTimerJob) running for web application "B" on the regional farm(s). This timer job runs by default every hour.
  4. The new or updated content types received in the content type gallery of the Child Content Type Hub site collection have to be published or republished there, so that they are identified for distribution to the content type galleries of the site collections on the regional farm(s).
  5. The "Content Type Hub" timer job (MetadataHubTimerJob) on the regional farm(s) is executed, and the published and republished content types are recognized by the Child Managed Metadata service application and prepared to be pushed to the site collections hosted in web application "C". This timer job runs by default every 15 minutes.
  6. The identified content types are pushed to the site collections hosted in web application "C" by the "Content Type Subscriber" timer job (MetadataSubscriberTimerJob) running for this web application on the regional farm(s). This timer job runs by default every hour.
All updates should now be available in the destination content type galleries.
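
Because the whole flow depends on these timer jobs, it is worth checking how they are actually scheduled on each farm and when they last ran. The following sketch only reads the job definitions. It assumes the internal job names quoted in the steps above.

```powershell
# List the Content Type Hub and Content Type Subscriber jobs on the current farm,
# together with their schedule and last run time.
Get-SPTimerJob |
    Where-Object { $_.Name -match "MetadataHubTimerJob|MetadataSubscriberTimerJob" } |
    Select-Object Name, WebApplication, Schedule, LastRunTime |
    Format-Table -AutoSize
```

Run it on the central farm and on each regional farm. The schedules you see there determine the waiting time between the steps.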

From the process description it is clear that it is a mixture of automated and manual tasks, and that the overall time needed to run the whole process is quite unpredictable, because it mostly depends on several timer job schedules and on the ability to recognize whether a particular content type update has already been propagated. In fact, there are three options for running the overall process, and each option demands a different effort and gives a different level of control over the process state.
  1. The process execution is left to be driven by the timer job schedules, which only requires the end users to publish and republish the content types in each Content Type Hub according to the process flow. This approach requires some waiting time between the steps, to make sure that the timer jobs have done their work and the content types can be published or republished at the next level of the Content Type Hub chain.
  2. The process execution is controlled manually, leveraging the "Monitoring" and "Job Definitions" functionality of the Central Administration web site to execute the timer jobs on demand. This option gives the end users and administrators more control over the process flow, but still requires significant effort to push the necessary content types to the destination content type galleries.
  3. Simple or more robust PowerShell scripts are used to automate parts of the process or the whole process end to end. This approach gives users and administrators full control over the process flow and significantly decreases the required effort.
Personally I am a very strong supporter of the last option, and because I want as many readers as possible to join the "Community of the 3rd option supporters", I’m also going to publish the key blocks of the PowerShell scripts, which can be used by anyone who wants to automate the Content Type Hub functionality or try the Content Type Hub chaining concept.

Here is a simple command to run the "Content Type Hub" timer job. It does not need any web application to be specified, as it runs at the farm level.
Start-SPTimerJob -Identity "MetadataHubTimerJob"

The "Content Type Subscriber" timer job is bound to a particular web application, and because you might have more than one web application in your farm, you need to identify and execute only the specific timer job. Here is a PowerShell script block that does this work for you.
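# URL of the web application whose subscriber job should run
$webAppURL = "<YourWebApp URL>"
# The job is scoped per web application, so pick it by its internal name
$timerJob = Get-SPTimerJob -WebApplication $webAppURL |
    ?{$_.Name -match "MetadataSubscriberTimerJob"}
# Start the job only if it was found
if ($timerJob) { Start-SPTimerJob $timerJob }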
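
When the steps are chained in one script, the next step should not start before the job has finished. One possible way, sketched here, is to poll the job's LastRunTime. This assumes the property is updated once the job has run, and the 10-minute timeout is an arbitrary value of mine.

```powershell
$webAppURL = "<YourWebApp URL>"
$timeoutSeconds = 600   # arbitrary upper bound (assumption), adjust to your environment

$timerJob = Get-SPTimerJob -WebApplication $webAppURL |
    Where-Object { $_.Name -match "MetadataSubscriberTimerJob" }

if ($timerJob) {
    # Remember the last run time before starting the job
    $lastRun = $timerJob.LastRunTime
    Start-SPTimerJob $timerJob

    # Poll until LastRunTime moves forward or the timeout expires
    $elapsed = 0
    do {
        Start-Sleep -Seconds 10
        $elapsed += 10
        $current = Get-SPTimerJob -Identity $timerJob.Id
    } while ($current.LastRunTime -le $lastRun -and $elapsed -lt $timeoutSeconds)

    if ($current.LastRunTime -gt $lastRun) {
        Write-Host "Subscriber job finished at" $current.LastRunTime
    } else {
        Write-Host "Timed out waiting for the subscriber job" -ForegroundColor Yellow
    }
}
```

The same pattern can be applied to the "Content Type Hub" timer job by looking it up by name instead of by web application.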

The most time-consuming task for the users and administrators is publishing or republishing a number of content types that can grow to several hundred. To automate this task I’m using the "ContentTypePublisher" class of the SharePoint object model with its "Publish" method. There is also the "IsPublished" method, which returns a Boolean value indicating whether the content type is published or not, but unfortunately there is no way to find out whether an already published content type has been modified and needs to be republished. Here is a simple script block which was originally posted by Terry Cornwell and modified by me. This PowerShell script block publishes all content types in a particular content type gallery group:
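$CTGroup = "<GroupNameInContentTypeGallery>"
$CTHUrl = "<YourContentTypeHub URL>"
$CTHSite = Get-SPSite $CTHUrl -ErrorAction SilentlyContinue
if ($CTHSite -ne $null) {
   try {
      # The publisher is bound to the content type hub site collection
      $contentTypePublisher = New-Object `
      Microsoft.SharePoint.Taxonomy.ContentTypeSync.ContentTypePublisher($CTHSite)
      # Publish (or republish) every content type whose group name matches exactly
      $CTHSite.RootWeb.ContentTypes | ? {$_.Group -eq $CTGroup} | % {
         $contentTypePublisher.Publish($_)
         Write-Host "Content type $($_.Name) has been published" -ForegroundColor Green
      }
   }
   finally {
      # Dispose only when the site object was actually retrieved
      $CTHSite.Dispose()
   }
}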
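
A variation of the block above uses the "IsPublished" method to publish only the content types that have not been published yet, which can be handy for the first rollout of a large group. This is a sketch with the same placeholders as above. As mentioned, it cannot detect published content types that were modified afterwards, so those still need a republish.

```powershell
$CTGroup = "<GroupNameInContentTypeGallery>"
$CTHUrl = "<YourContentTypeHub URL>"
$CTHSite = Get-SPSite $CTHUrl -ErrorAction SilentlyContinue
if ($CTHSite -ne $null) {
    try {
        $publisher = New-Object `
            Microsoft.SharePoint.Taxonomy.ContentTypeSync.ContentTypePublisher($CTHSite)
        $CTHSite.RootWeb.ContentTypes |
            Where-Object { $_.Group -eq $CTGroup } |
            ForEach-Object {
                if (-not $publisher.IsPublished($_)) {
                    # Not published yet: publish it now
                    $publisher.Publish($_)
                    Write-Host "Published: $($_.Name)" -ForegroundColor Green
                } else {
                    # Already published: left untouched, changes are NOT detected
                    Write-Host "Skipped (already published): $($_.Name)"
                }
            }
    }
    finally {
        $CTHSite.Dispose()
    }
}
```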

Well, and this is it.

Mmmm. Maybe not. Just one little recommendation to conclude this blog post. It is always good to create custom content types in custom content type groups, as this provides better flexibility in managing the content types and also makes it easier for users to find and use a specific content type.
