Please note: This blog is no longer active. My new blog is located at http://blog.timwheeler.io

Wednesday, November 17, 2010

Content Migration and Taxonomy Fields

 

I have been using Import-SPWeb and Export-SPWeb to migrate content between environments and found an issue if the sites use Taxonomy Fields.

The problem in a nutshell

The content is exported from a site using the PowerShell command Export-SPWeb or with code. The site contains Taxonomy fields included in content types. The taxonomy fields are wired to a Managed Metadata Service Term Store. The taxonomy fields hold IDs pointing both to a TaxonomyHiddenList at the root web level and to the Term Store itself. These IDs are migrated along with the list. However, in the environment you import the content into, the Managed Metadata Service has a different ID for its Term Store, and as such the taxonomy fields fail to work properly.

Note: I'm not sure if this is an issue if you use the default Managed Metadata Service. I am using a new one created from a PowerShell script.
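For context, the migration step that triggers the problem looks roughly like the following. This is only a sketch: both URLs and the package path are hypothetical placeholders, and it uses nothing beyond the -Identity and -Path parameters of the two cmdlets.

```powershell
# Sketch only: both URLs and the package path are hypothetical placeholders.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# On the source farm: export the web to a content migration package.
Export-SPWeb -Identity "http://source-farm/sites/publishing" -Path "C:\Migration\publishing.cmp"

# On the target farm: import the package. The taxonomy fields arrive
# still carrying the source farm's Term Store ID.
Import-SPWeb -Identity "http://target-farm/sites/publishing" -Path "C:\Migration\publishing.cmp"
```

The two commands would normally run on different farms, which is exactly why the Term Store IDs no longer line up afterwards.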

Visible Issues

If the site was accessed as an anonymous user, some pages would error with:

[image: screenshot of the error page]

The error message might as well just say “Something bad’s happened, but you have to figure out what. Good luck!”  (Well, I could turn off custom errors but that would be too easy and harder to make fun of!)

Slightly more helpful are the trace logs, which showed:

"Hidden list not found, creating new one"

"Creating taxonomy hidden list"

"Unknown SPRequest error occurred. More information: 0x80070005"

"System.ArgumentException: Value does not fall within the expected range."

Strangely, the first page would access correctly under anonymous and subsequent requests to other pages would fail.  I have a theory about why but wild speculation is probably best left unsaid.

Other Symptoms

The publishing root web contains a list called "TaxonomyHiddenList". This list is used by the taxonomy fields to store terms in use for the site. After using Import-SPWeb many extra entries were added to this list with an empty title.

If you use SharePoint Manager (very useful BTW), you will see a bunch of entries with “(no title)”.  As far as I can see this shouldn’t happen. Normally, when the taxonomy fields create these entries, they include the label as the title.

[image: SharePoint Manager showing TaxonomyHiddenList entries titled “(no title)”]

The above screen shot of SharePoint Manager shows the bad entries.  Notice as well that there are lots, 64 in this example.  If this was working properly you should see the following:

[image: SharePoint Manager showing TaxonomyHiddenList entries with valid titles]

In this screen shot the entries have a valid title.  These values originally come from the terms in the term store.  These entries were created by a PowerShell script and added to the term store; they are some of the ones required for a Dublin Core and AGLS metadata implementation.  (Which is probably about as interesting as XML namespaces or writing documentation but I felt I had to explain them anyway)
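If you don't have SharePoint Manager to hand, the same check can be sketched in PowerShell. This is a minimal sketch, assuming the SharePoint snap-in is available; the site collection URL is a hypothetical placeholder, and the list URL is built as a server-relative path on the assumption that the site collection may sit below a managed path.

```powershell
# Sketch only: the site collection URL is a hypothetical placeholder.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$site = Get-SPSite -Identity "http://target-farm/sites/publishing"
try
{
    $web = $site.RootWeb
    # Build a server-relative URL so this also works below a managed path.
    $listUrl = $web.ServerRelativeUrl.TrimEnd('/') + "/Lists/TaxonomyHiddenList"
    $taxList = $web.GetList($listUrl)

    # Entries without a title are the bad ones left behind by the import.
    $badItems = @($taxList.Items | Where-Object { [string]::IsNullOrEmpty($_.Title) })
    Write-Host ("{0} of {1} entries have an empty title" -f $badItems.Count, $taxList.ItemCount)
}
finally
{
    $site.Dispose()
}
```

It only reads the list, so it is safe to run before and after a clean-up to compare the counts.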

The cause

If you look at the definition of the Microsoft.SharePoint.Taxonomy.TaxonomyField you will see a property called SspId. This property is the Guid of the Term Store that it's connected to. After importing, this value points to the old term store. It also has a side effect of creating bad entries in the /Lists/TaxonomyHiddenList list and is likely the cause of the entries in the trace logs. The data in the field also points to the bad entry within that list rather than the entry it should point to. There is a good write-up on how the entries are created here so I won't go into that part too much.
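To see which fields are affected, you can compare each field's SspId with the Term Stores the target farm actually exposes. The following is a rough sketch rather than a tested tool: the site collection URL is a hypothetical placeholder, and it assumes the SharePoint snap-in also makes the Microsoft.SharePoint.Taxonomy types available.

```powershell
# Sketch only: the site collection URL is a hypothetical placeholder.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$site = Get-SPSite -Identity "http://target-farm/sites/publishing"
try
{
    $session = New-Object Microsoft.SharePoint.Taxonomy.TaxonomySession -ArgumentList $site
    # IDs of every Term Store visible to this site collection.
    $knownIds = @($session.TermStores | ForEach-Object { $_.Id })

    foreach ($web in $site.AllWebs)
    {
        foreach ($list in $web.Lists)
        {
            foreach ($field in $list.Fields)
            {
                if ($field -is [Microsoft.SharePoint.Taxonomy.TaxonomyField])
                {
                    # A field whose SspId matches no known Term Store still
                    # points at the source environment.
                    if ($knownIds -notcontains $field.SspId)
                    {
                        Write-Host ("{0} | {1} | {2} -> unknown SspId {3}" -f $web.Url, $list.Title, $field.InternalName, $field.SspId)
                    }
                }
            }
        }
        $web.Dispose()
    }
}
finally
{
    $site.Dispose()
}
```

This only reports; it does not change any field.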

The fix

The fix for this is much simpler than tracking it down, I assure you. After many frustrating hours and some stern words aimed directly at SharePoint I managed to cobble together a script that suitably cleans my site collections after content migration.

The following is a PowerShell script that will clean up your publishing site.  Use at your own risk. Back up your data, etc, etc! Wish you the best of luck! Like a Microsoft error message :)

Note: This script will utilise the field definitions in the root web.  If you are migrating an entire web then you should fix your content types prior to running this script.

#Script: CleanTaxonomyFields.ps1
#Author: Tim Wheeler (http://tjwheeler.blogspot.com/)
#Created: 17/11/2010
#Notes:
#When content is imported into a new environment the Term Store ID is kept on the
#taxonomy fields.  This causes a problem as the new term store has a different ID.
#These functions clean up the taxonomy hidden list, which has bad entries following
#an import, then walk every publishing page and reset the correct values.
param
    (    $siteCollectionUrl = (Read-Host "Please enter a site collection url") )

#region SharePoint Snapin Setup
$snapin = "Microsoft.SharePoint.PowerShell"
if (Get-PSSnapin $snapin -ea "silentlycontinue") {
    Write-Host -f Green "PSsnapin $snapin is loaded"
}
else {
    if (Get-PSSnapin $snapin -Registered -ea "silentlycontinue") {
        Write-Host -f Green "PSsnapin $snapin is registered"
        Add-PSSnapin $snapin
        Write-Host -f Green "PSsnapin $snapin is loaded"
    }
    else {
        Write-Host -f Red "PSSnapin $snapin not found"
    }
}
#endregion

#region Data Clean Up

#Deletes the entries with an empty title that the import leaves behind.
function CleanTaxonomyList($site)
{
    Write-Host "Cleaning /Lists/TaxonomyHiddenList"
    $web = $site.RootWeb
    #Use a server-relative URL so this also works below a managed path.
    $taxlist = $web.GetList($web.ServerRelativeUrl.TrimEnd('/') + "/Lists/TaxonomyHiddenList")
    #Fetch the items once and walk backwards so deletes do not shift the index.
    $items = $taxlist.Items
    for($count = $items.Count - 1; $count -ge 0; $count--)
    {
        $item = $items[$count]
        if([string]::IsNullOrEmpty($item.Title))
        {
            $item.Delete()
            if(-not $?)
            {
                Write-Host -ForegroundColor Red "Failed to delete item with bad title"
            }
        }
    }
}

#Returns the single value, or the first value of a multi-value field, or $null.
function GetFirstTaxonomyValue($value)
{
    if($null -eq $value) { return $null }
    if($value -is [Microsoft.SharePoint.Taxonomy.TaxonomyFieldValue]) { return $value }
    if($value.Count -gt 0) { return $value[0] }
    return $null
}

#This function will reset the term store id if it is wrong, and will fix the WssId.
#If the correct term cannot be located, the default one will be used.
#Note: for multi-value fields only the first value is carried over.
function ResetTaxonomyDefaults($site, [Microsoft.SharePoint.Publishing.PublishingWeb] $web)
{
    $txs = New-Object "Microsoft.SharePoint.Taxonomy.TaxonomySession" -ArgumentList $site
    $pages = $web.GetPublishingPages()
    foreach($page in $pages)
    {
        Write-Host -ForegroundColor Cyan "Checking publishing page " $page.Title
        if($page.ListItem.File.CheckOutStatus -ne "None")
        {
            $page.CheckIn("Checked in by data clean process")
        }
        $page.CheckOut()
        foreach ($field in $page.ListItem.Fields)
        {
            if($field.GetType().Name -eq "TaxonomyField")
            {
                $taxField = [Microsoft.SharePoint.Taxonomy.TaxonomyField] $field

                Write-Host "Found field to update:" $taxField.Title
                $currentValue = $page.ListItem.Properties[$taxField.InternalName]
                Write-Host "Current Value is" $currentValue
                #The root web site column is treated as the correct definition.
                $templateField = $page.ListItem.ParentList.ParentWeb.Site.RootWeb.Fields[$field.Id]
                $defaultValue = $templateField.DefaultValue
                $termStore = $txs.TermStores[$templateField.SspId]
                $termSet = $termStore.GetTermSet($templateField.TermSetId)
                if($taxField.SspId -ne $templateField.SspId)
                {
                    Write-Host "TaxField SspId is not correct, updating"
                    $taxField.SspId = $templateField.SspId
                    $taxField.Update()
                }
                if($taxField.TermSetId -ne $templateField.TermSetId)
                {
                    Write-Host "TaxField TermSetId is not correct, updating"
                    $taxField.TermSetId = $templateField.TermSetId
                    $taxField.Update()
                }
                #Fall back to the field default when the stored value has no term GUID.
                $fieldValue = GetFirstTaxonomyValue $templateField.GetFieldValue($currentValue)
                if($null -eq $fieldValue -or [string]::IsNullOrEmpty($fieldValue.TermGuid))
                {
                    $fieldValue = GetFirstTaxonomyValue $templateField.GetFieldValue($defaultValue)
                }
                #Reset $term every time so a failed lookup cannot reuse the previous field's term.
                $term = $null
                if($null -ne $fieldValue -and -not [string]::IsNullOrEmpty($fieldValue.TermGuid))
                {
                    try
                    {
                        $term = $termSet.GetTerm($fieldValue.TermGuid)
                    }
                    catch
                    {
                        Write-Host -ForegroundColor Red ("Failed to look up term for field {0} for page {1} in web {2}" -f $taxField.InternalName, $page.Title, $web.Url)
                    }
                }
                if($null -ne $term)
                {
                    #Explicit casts pick the (SPListItem, Term) overload of SetFieldValue.
                    $taxField.SetFieldValue([Microsoft.SharePoint.SPListItem] $page.ListItem, [Microsoft.SharePoint.Taxonomy.Term] $term)
                }
                else
                {
                    Write-Host -ForegroundColor Yellow ("No term found for field {0} on page {1}, leaving value unchanged" -f $taxField.InternalName, $page.Title)
                }
            }
        }
        $page.ListItem.Update()
        $page.CheckIn("Data clean process")
    }
}

function CleanSiteCollection ($siteColUrl)
{
    [Microsoft.SharePoint.SPSite] $site = Get-SPSite -Limit ALL | Where-Object {$_.Url -ieq $siteColUrl}
    if($site -eq $null)
    {
        Write-Host -ForegroundColor Red "Unable to find site collection"
        throw "Unable to find site collection"
    }
    CleanTaxonomyList $site
    $site | Get-SPWeb -Limit ALL | ForEach-Object {
        #Check to see if site is a publishing site
        if ([Microsoft.SharePoint.Publishing.PublishingWeb]::IsPublishingWeb($_))
        {
            Write-Host "Cleaning pages in `"$($_.Title)`" site."
            #Get the Publishing Web and pages within it
            $publishingWeb = [Microsoft.SharePoint.Publishing.PublishingWeb]::GetPublishingWeb($_)
            ResetTaxonomyDefaults $site $publishingWeb
        }
        $_.Dispose()
    }
    $site.Dispose()
}
#endregion
CleanSiteCollection $siteCollectionUrl

3 comments:

  1. On SP2010 SP 1 it errors with:

    Multiple ambiguous overloads found for "SetFieldValue" and the argument count: "2".
    At C:\Users\Administrator\Desktop\CleanSiteCollection.ps1:124 char:40
    + $taxField.SetFieldValue <<<< ($page.ListItem, $term)
    + CategoryInfo : NotSpecified: (:) [], MethodException
    + FullyQualifiedErrorId : MethodCountCouldNotFindBest

  2. Hi austrheim,
    Because PowerShell isn't strongly typed we need to be explicit about which overload we are calling. I'll have to check the SP1 API because I wrote this in RTM. A simple fix is to cast to the correct datatype for the overload, e.g. $taxField.SetFieldValue([Microsoft.SharePoint.SPListItem] $page.ListItem, [Microsoft.SharePoint.Taxonomy.Term] $term)

    Note: I haven't tested the above line, I will update the post when I get a chance. Thanks for letting me know.
