I have been using Import-SPWeb and Export-SPWeb to migrate content between environments and found an issue when the sites use taxonomy fields.
The problem in a nutshell
The content is exported from a site using the PowerShell command Export-SPWeb or with code. The site contains taxonomy fields included in content types, and those fields are wired to a Managed Metadata Service term store. Each taxonomy field holds IDs that point both to a TaxonomyHiddenList at the root web level and to the term store itself. These IDs are migrated along with the list. However, in the environment you import the content into, the Managed Metadata Service has a different ID for the term store, so the taxonomy fields fail to work properly.
Note: I'm not sure whether this is an issue if you use the default Managed Metadata Service; I am using a new one created from a PowerShell script.
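For reference, the migration that triggers the problem looks roughly like the sketch below. The URLs and the package path are placeholders I made up for illustration; only the two cmdlets and their -Identity and -Path parameters come from SharePoint itself.

```powershell
# Load the SharePoint cmdlets if this is a plain PowerShell console.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Hypothetical values; substitute your own webs and package location.
$sourceWeb  = "http://source-farm/sites/publishing"
$targetWeb  = "http://target-farm/sites/publishing"
$exportFile = "C:\Temp\publishing-export.cmp"

# On the source farm: write the web and its content to a package file.
Export-SPWeb -Identity $sourceWeb -Path $exportFile

# On the target farm, after copying the package across: import it.
# The taxonomy fields arrive still carrying the source farm's term store ID.
Import-SPWeb -Identity $targetWeb -Path $exportFile
```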
Visible Issues
If the site was accessed as an anonymous user, some pages would error with:
The error message might as well just say “Something bad’s happened, but you have to figure out what. Good luck!” (Well, I could turn off custom errors but that would be too easy and harder to make fun of!)
Slightly more helpful are the trace logs, which showed:
"Hidden list not found, creating new one"
"Creating taxonomy hidden list"
"Unknown SPRequest error occurred. More information: 0x80070005"
"System.ArgumentException: Value does not fall within the expected range."
Strangely, the first page would access correctly under anonymous and subsequent requests to other pages would fail. I have a theory about why but wild speculation is probably best left unsaid.
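If you want to check your own farm for the same trace entries, something like the following should pull them out of the ULS logs. It is only a sketch: the 30 minute window and the wildcard patterns are my assumptions, so widen or narrow them to suit.

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Assumed time window: reproduce the error first, then look at the last half hour.
$since = (Get-Date).AddMinutes(-30)

# Filter the trace log for the messages quoted above.
Get-SPLogEvent -StartTime $since |
    Where-Object {
        $_.Message -like "*hidden list*" -or
        $_.Message -like "*0x80070005*"
    } |
    Select-Object Timestamp, Level, Category, Message |
    Format-List
```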
Other Symptoms
The publishing root web contains a list called "TaxonomyHiddenList". This list is used by the taxonomy fields to store the terms in use for the site. After using Import-SPWeb, many extra entries were added to this list with an empty title.
If you use SharePoint Manager (very useful BTW), you will see a bunch of entries with “(no title)”. As far as I can see this shouldn’t happen. Normally, when the taxonomy fields create these entries, they include the label as the title.
The above screen shot of SharePoint Manager shows the bad entries. Notice as well that there are lots, 64 in this example. If this was working properly you should see the following:
In this screen shot the entries have a valid title. These values originally come from the terms in the term store. The terms were created by a PowerShell script and added to the term store; they are some of the ones required for a Dublin Core and AGLS metadata implementation. (Which is probably about as interesting as XML namespaces or writing documentation, but I felt I had to explain them anyway.)
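If you don't have SharePoint Manager to hand, a read-only check along these lines should tell you whether your hidden list has picked up untitled entries. The site collection URL is a placeholder, and I build a server-relative list URL here on the assumption that it is the safest form to pass to GetList.

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Hypothetical URL; use the site collection you imported into.
$site = Get-SPSite "http://target-farm/sites/publishing"
try {
    $web     = $site.RootWeb
    $listUrl = $web.ServerRelativeUrl.TrimEnd('/') + "/Lists/TaxonomyHiddenList"
    $list    = $web.GetList($listUrl)

    # Read the items once, then count the ones with no title.
    $items = $list.Items
    $bad   = @($items | Where-Object { [string]::IsNullOrEmpty($_.Title) })

    Write-Host ("{0} of {1} entries have no title" -f $bad.Count, $items.Count)
}
finally {
    $site.Dispose()
}
```

Nothing is deleted here; it only reports the count.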
The cause
If you look at the definition of Microsoft.SharePoint.Taxonomy.TaxonomyField you will see a property called SspId. This property is the Guid of the term store that the field is connected to. After importing, this value still points to the old term store. That has the side effect of creating bad entries in the /Lists/TaxonomyHiddenList list and is likely the cause of the entries in the trace logs. The data in the field also points to the bad entry within that list rather than the entry it should point to. There is a good write up on how the entries are created here so I won't go into that part too much.
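To see the mismatch for yourself, you can compare each field's SspId against the term stores the farm actually knows about. The sketch below is my own illustration rather than part of the fix: both function names are made up, and the wrapper only looks at lists in the root web, so sub-webs would need the same treatment.

```powershell
# Pure helper: returns the fields whose SspId is not among the known term store IDs.
# It works on any objects exposing an SspId property, so it can be tried without a farm.
function Get-StaleTaxonomyField {
    param(
        [object[]] $Fields,
        [Guid[]]   $TermStoreIds
    )
    $Fields | Where-Object { $TermStoreIds -notcontains $_.SspId }
}

# Hypothetical wrapper: gathers taxonomy fields from every list in the root web
# and reports the ones bound to a term store this farm does not know about.
function Find-StaleTaxonomyFieldInSite {
    param([string] $SiteUrl)

    $site = Get-SPSite $SiteUrl
    try {
        $session  = New-Object Microsoft.SharePoint.Taxonomy.TaxonomySession($site)
        $storeIds = @($session.TermStores | ForEach-Object { $_.Id })

        $fields = foreach ($list in $site.RootWeb.Lists) {
            $list.Fields | Where-Object { $_ -is [Microsoft.SharePoint.Taxonomy.TaxonomyField] }
        }

        Get-StaleTaxonomyField -Fields @($fields) -TermStoreIds $storeIds |
            Select-Object Title, InternalName, SspId, TermSetId
    }
    finally {
        $site.Dispose()
    }
}
```

On a farm you would call `Find-StaleTaxonomyFieldInSite -SiteUrl "http://target-farm/sites/publishing"` (placeholder URL) after loading the SharePoint snap-in. Any rows it returns are fields still pointing at the source environment's term store.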
The fix
The fix for this is much simpler than tracking it down, I assure you. After many frustrating hours and some stern words aimed directly at SharePoint, I managed to cobble together a script that suitably cleans my site collections after content migration.
The following is a PowerShell script that will clean up your publishing site. Use at your own risk. Back up your data, etc, etc! I wish you the best of luck, just like a Microsoft error message would.
Note: This script will utilise the field definitions in the root web. If you are migrating an entire web then you should fix your content types prior to running this script.
Code Snippet
#Script: CleanTaxonomyFields.ps1
#Author: Tim Wheeler (http://tjwheeler.blogspot.com/)
#Created: 17/11/2010
#Notes:
#When content is imported into a new environment the Term Store ID is kept on the
#taxonomy fields. This causes a problem as the new term store has a different ID.
#These functions clean up the taxonomy list, which has bad entries following an import,
#then modify all the items and reset the correct values.
param
( $siteCollectionUrl = (Read-Host "Please enter a site collection url") )

#region SharePoint Snapin Setup
$snapin="Microsoft.SharePoint.PowerShell"
if (get-pssnapin $snapin -ea "silentlycontinue") {
    write-host -f Green "PSsnapin $snapin is loaded"
}
else {
    if (get-pssnapin $snapin -registered -ea "silentlycontinue") {
        write-host -f Green "PSsnapin $snapin is registered"
        Add-PSSnapin $snapin
        write-host -f Green "PSsnapin $snapin is loaded"
    }
    else {
        write-host -f Red "PSSnapin $snapin not found"
    }
}
#endregion

#region Data Clean Up

function CleanTaxonomyList($site)
{
    Write-Host "Cleaning /Lists/TaxonomyHiddenList"
    $web = $site.RootWeb
    $taxlist = $web.GetList("Lists/TaxonomyHiddenList")
    #Fetch the items once; $taxlist.Items queries the list again on every access
    $items = $taxlist.Items
    #Walk backwards so a deletion never shifts an index that is still to be visited
    for($count = $items.Count - 1; $count -ge 0; $count--)
    {
        $item = $items[$count]
        if([string]::IsNullOrEmpty($item.Title))
        {
            $item.Delete()
            if($?)
            {
                #Write-Host "Deleted Item with bad title"
            }
            else
            {
                Write-Host -ForegroundColor Red "Failed to delete item with bad title"
            }
        }
    }
}

#Returns the term GUID held by a taxonomy field value.
#A multi-value field returns a collection, in which case the first entry is used.
function GetFirstTermGuid($fieldValue)
{
    if($null -eq $fieldValue) { return $null }
    if($fieldValue.GetType().Name -eq "TaxonomyFieldValue") { return $fieldValue.TermGuid }
    if($fieldValue.Count -gt 0) { return $fieldValue[0].TermGuid }
    return $null
}

#This function will reset the term store id if it is wrong, and will fix the WssId.
#If the page has no usable term, the field's default value will be used.
function ResetTaxonomyDefaults($site, [Microsoft.SharePoint.Publishing.PublishingWeb] $web)
{
    $txs = New-Object "Microsoft.SharePoint.Taxonomy.TaxonomySession" -ArgumentList $site
    $pages = $web.GetPublishingPages()
    foreach($page in $pages)
    {
        Write-Host -ForegroundColor Cyan "Checking publishing page " $page.Title
        if($page.ListItem.File.CheckOutStatus -ne "None")
        {
            $page.CheckIn("Checked in by data clean process")
        }
        $page.CheckOut()
        #Iterate over a snapshot, as fields may be updated inside the loop
        foreach ($field in @($page.ListItem.Fields))
        {
            if($field.GetType().Name -eq "TaxonomyField")
            {
                $taxField = [Microsoft.SharePoint.Taxonomy.TaxonomyField] $field

                Write-Host "Found field to update:" $taxField.Title
                $currentValue = $page.ListItem.Properties[$taxField.InternalName]
                Write-Host "Current Value is" $currentValue
                #The field definition in the root web is treated as the correct one
                $templateField = $page.ListItem.ParentList.ParentWeb.Site.RootWeb.Fields[$field.Id]
                $defaultValue = $templateField.DefaultValue
                $termStore = $txs.TermStores[$templateField.SspId]
                $termSet = $termStore.GetTermSet($templateField.TermSetId)
                if($taxField.SspId -ne $templateField.SspId)
                {
                    Write-Host "TaxField SspId is not correct, updating"
                    $taxField.SspId = $templateField.SspId
                    $taxField.Update()
                }
                if($taxField.TermSetId -ne $templateField.TermSetId)
                {
                    Write-Host "TaxField TermSetId is not correct, updating"
                    $taxField.TermSetId = $templateField.TermSetId
                    $taxField.Update()
                }
                $fieldValue = $templateField.GetFieldValue($currentValue)
                $termGuid = GetFirstTermGuid $fieldValue
                if([string]::IsNullOrEmpty($termGuid))
                {
                    #No usable term on the page, so fall back to the field's default value
                    $fieldValue = $templateField.GetFieldValue($defaultValue)
                    $termGuid = GetFirstTermGuid $fieldValue
                }
                #Reset $term so a failed lookup cannot reuse the previous field's term
                $term = $null
                if(-not [string]::IsNullOrEmpty($termGuid))
                {
                    try
                    {
                        $term = $termSet.GetTerm($termGuid)
                    }
                    catch
                    {
                        Write-Host -ForegroundColor Red ("Failed to update field {0} for page {1} in web {2}" -f $taxField.InternalName, $page.Title, $web.Url)
                    }
                }
                if($null -ne $term)
                {
                    $taxField.SetFieldValue($page.ListItem, $term)
                }
                else
                {
                    Write-Host -ForegroundColor Yellow ("No term found for field {0} on page {1}, leaving value unchanged" -f $taxField.InternalName, $page.Title)
                }
            }
        }
        $page.ListItem.Update()
        $page.CheckIn("Data clean process")
    }
}

function CleanSiteCollection ($siteColUrl)
{
    [Microsoft.SharePoint.SPSite] $site = get-spsite -Limit ALL | where-object {$_.Url -ieq $siteColUrl}
    if($site -eq $null)
    {
        Write-Host -ForegroundColor Red "Unable to find site collection"
        throw "Unable to find site collection"
    }
    CleanTaxonomyList $site
    $site | Get-SPWeb -limit all | ForEach-Object {
        #Check to see if site is a publishing site
        if ([Microsoft.SharePoint.Publishing.PublishingWeb]::IsPublishingWeb($_))
        {
            Write-Host "Cleaning pages in `"$($_.Title)`" site."
            #Get the Publishing Web and pages within it
            $publishingWeb = [Microsoft.SharePoint.Publishing.PublishingWeb]::GetPublishingWeb($_)
            ResetTaxonomyDefaults $site $publishingWeb
        }
        $_.Dispose()
    }
    $site.Dispose()
}
#endregion
CleanSiteCollection $siteCollectionUrl
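
Assuming you save the script under the file name given in its header comment, running it would look something like this. The URL is a placeholder for whichever site collection you imported into.

```powershell
# Run from a SharePoint Management Shell on the target farm, in the folder holding the script.
# The URL below is a placeholder.
.\CleanTaxonomyFields.ps1 -siteCollectionUrl "http://target-farm/sites/publishing"
```

If you leave the parameter off, the script prompts for the URL instead.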