Wednesday, April 11, 2012

An Alternate Way to Prepare the Term Set Import CSV File

Recently I was in a need for importing of largish amount of terms to the SharePoint 2010 Managed Metadata Term Store in my lab, as I wanted to test performance of a back-up and restore scripts I’ve developed. I knew that the out of the box (OOTB) import functionality is not the perfect one and has significant limitations, so I tried to find a way how to use what we got and do not spend too much time by editing the source CSV file or developing a custom import solution.

Before I explain the simple way of creating the compatible CSV file for a term set import, I’ll briefly describe the limits of the OOTB import functionality. The term set import functionality is accessible through the SharePoint Server 2010 Central Administration web site in the “Term Store Management Tool” called also “SharePoint Metadata Manager”. You need to have at least one term group, in your term store, to be able to access the “Import Term Set” command, because it is available only through the group context menu.
Figure 1: Import Term Set command
The OOTB import functionality is limited to single term set with a description, and up to seven term levels, where for each term can be specified only a description, availability for tagging flag and the Locale ID (LCID) determining the language in which will be the term created. Any additional term set names and descriptions will be ignored and related terms will be created in the term set specified in second line of the import CSV file. The import functionality does not distinguish among different LCID values, which mean that you cannot have in a single import file a mixture of terms in different languages. It also does not allow you to specify additional labels for default working language, and description and labels in other working languages.

The import file is in one of the most common CSV format implementations, which is really using comma as delimiter, but with each value enclosed in double-quoted characters. You can read in the “Sample Import” message in the properties pane of the Term Store Management Tool, that the encoding of the file should be UTF-8. The file format is well described in the Managed metadata input file format (SharePoint Server 2010) TechNet article and a sample file can be downloaded from the term store properties pane. The most suitable tool for creating and editing term set import file is Microsoft Excel. Contrary to the great editing functionalities, the Microsoft Excel does not provide any CSV export file format compatible with the format of term set import file. Microsoft Excel can save data in three different CSV file formats, which are different in character set encoding, but all of them use semicolon as the delimiter and none of them is using double-quoted characters to enclose the values.

If you have installed the Microsoft Excel 2010, then the best approach for creating correct term set import file is to use the “Term Store Creator” tools developed by Wictor Wilén. You can read more about the tool and download it here: Create SharePoint 2010 Managed Metadata with Excel 2010. You can find on the internet many other macro code samples, which would help you to export data to the double-quoted and comma delimited file format, but you will always face the risk that the code will not work correctly with your Microsoft Excel version. In such case you can take the opposite approach like I did.

You can save your term set import file using the standard Microsoft Excel ”CSV (Comma delimited) (*.csv)” file format, and modify it later with some other tool. It really does not make sense to edit the import file with a text editor like Notepad or Notepad++, but there is another tool I found on the Internet and it is quite useful. The tool is a free CSV file editor tool CSVed and you can download it from http://csved.sjfrancke.nl/.

Figure 2: The CSV (Comma delimited) (*.csv) file from Microsoft Excel
Let me show you in few simple steps, how to produce correct term set import file from any TXT or CSV source file generated by Microsoft Excel or other spreadsheet editor you have used to edit the terms.

Figure 3: Import Text file command
In first step you import your source file into the CSVed tool. In the “Import” dialog window you specify that the source file contains delimiters and you also have to specify the input and output file names. Click “Next” to get to the second view of the “Import” dialog window.

Figure 4: Import Text file filter settings - Part 1
In my case I’ve got a semi-colon delimited file format, and therefore I had to select the semi-colon option for the delimiter. It is important to click on the first line to indicate from which row CSVed should start with importing data. Do not forget to click on the first line before you click “Finish”, otherwise you realize that no data were imported.

Figure 4: Import Text file filter settings - Part 2
You do not need to edit or modify the data as only thing we need from CSVed is to produce correct file format. First thing we do, after the data import, is changing the delimiter character to comma. This is done by single click in the “Set Separator” area of the “Start and Item Edit” tab.

Figure 5: Setting comma as the delimiter character
The second thing we need is to enclose the values to double-quoted characters. This is done by setting the options for saving CSV files. Select “Tools → Options” and in the list of options select “Double Quotes”. Set the option “Save CSV File with Double Quotes” if it is not already set, and click “OK”.

Figure 6: Setting double-quote characters to enclose values
The last step is to save the file to desired location, and the preparation work is done. Now you can just import the terms to selected term group using the Term Store Management Tool import functionality.

Figure 7: The final CSV file format after modification in CSVed
The above described approach is independent from Microsoft Excel 2010 and can be useful in case you have to, or you like to edit the term set import file using different spreadsheet editor.

Each term set in term store can have up to 30,000 term items. If you plan to import such large amount of terms, plan this activity for the time when your SharePoint Server 2010 farm is not utilized. The term set import procedure is going to continuously write data to the database associated with the Managed Metadata service application, which could have impact on the SQL Server back-end availability for the other SharePoint related database connections.

No comments:

Post a Comment