Introduction

Many of our K2 blackpearl processes involve working with MS Word documents and with document libraries in MOSS 2007. Until now, extracting data from MS Word documents and using it in our K2 blackpearl processes has been a rather fiddly task.

Fortunately, MS Word 2007 uses a file format called Office Open XML. A .docx file is a ZIP package of XML parts, which makes extracting data relatively easy compared with the binary format used by previous versions of MS Word.
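To see what this means in practice, here is a minimal sketch (not part of the sample project) that opens a .docx as a plain ZIP archive and lists the parts inside it. It relies only on System.IO.Compression. ZipArchive arrived in .NET Framework 4.5, where you also need a reference to System.IO.Compression. The class name DocxInspector is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class DocxInspector
{
    // Returns the names of the parts stored inside a .docx package,
    // sorted ordinally, e.g. "docProps/custom.xml", "word/document.xml".
    public static List<string> ListParts(Stream docx)
    {
        // leaveOpen: true so the caller still owns (and disposes) the stream
        using (var archive = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            return archive.Entries
                          .Select(entry => entry.FullName)
                          .OrderBy(name => name, StringComparer.Ordinal)
                          .ToList();
        }
    }
}
```

Pointing it at a Word 2007 document (for example via File.OpenRead) should show parts such as word/document.xml, plus docProps/custom.xml once the document has custom properties.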

This post looks at how you can use the features of MS Word 2007 and a little code (see the sample below) to extract data from an MS Word document and assign it to process data fields in a K2 blackpearl process.

Preparing the MS Word 2007 Document

The first step is to prepare the document. We do this by adding a new table cell, creating a bookmark for that cell, and adding a document property that is linked to the bookmark.

The screenshots below walk us through this stage.

Step 1 - Add a New Table Cell

The table cell is where we'll type the data that the process will use later. To insert a table, use the "Table" option on the "Insert" tab of the MS Word 2007 ribbon. A bit basic, I know, but just in case no-one's done this before :)

Step 2 - Select the Cell

Once you've added a new cell to your table, put the cursor in the cell, then choose "Select Cell" from the "Select" menu on the "Layout" tab of the ribbon (the "Layout" tab appears under "Table Tools" while the cursor is in a table).

The screenshot below shows you how to select a table cell in MS Word 2007.

[Screenshot: Select Cell]

Step 3 - Add a Bookmark

Once we've selected our table cell, we need to create a new bookmark that points to it. To do this, go to the "Insert" tab on the ribbon and select the "Bookmark" option in the "Links" group, near the middle of the ribbon.

Selecting the "Bookmark" option will cause the bookmark dialog box to open.

Type in a name for the bookmark; in this case I'm using "ApplicantName". We'll use the bookmark later on when we create our document property. Once you've typed the name, click the "Add" button and you're done with this step.

[Screenshot: Bookmark]
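Behind the scenes, the bookmark is saved in the document's main XML part (word/document.xml) as a w:bookmarkStart element carrying the bookmark name, paired with a w:bookmarkEnd element. If you want to check that your bookmark really was saved, a minimal sketch like the one below lists the bookmark names. BookmarkInspector is a hypothetical helper, and the part path assumes the usual layout of a Word-generated .docx.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class BookmarkInspector
{
    // WordprocessingML namespace used by word/document.xml
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the names of all bookmarks declared in the main document part.
    public static List<string> ListBookmarks(Stream docx)
    {
        using (var archive = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
                return new List<string>(); // not a WordprocessingML package

            using (Stream part = entry.Open())
            {
                XDocument doc = XDocument.Load(part);
                return doc.Descendants(W + "bookmarkStart")
                          .Select(b => (string)b.Attribute(W + "name"))
                          .Where(name => name != null)
                          .ToList();
            }
        }
    }
}
```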

Step 4 - Add a New Property - Part One

Now that we've added a bookmark to the table cell, we need to create a new property in the document. MS Word 2007 documents use the Office Open XML format, so any properties we set on the document are stored as XML inside the document file itself.
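In a Word-generated .docx, custom properties normally live in a part called docProps/custom.xml. Each property is a property element with a name attribute, and its value sits in a typed child element such as vt:lpwstr. The sketch below reads them into a dictionary using only ZipArchive and LINQ to XML. It is an alternative to the Open XML SDK code used later in this post, and it assumes the conventional part path rather than resolving the part through the package relationships. CustomPropertyReader is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

public static class CustomPropertyReader
{
    // Namespace of the custom properties part
    static readonly XNamespace Cp = "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties";

    // Reads every custom document property into a name -> value dictionary.
    // Assumption: the part is stored at "docProps/custom.xml".
    public static Dictionary<string, string> Read(Stream docx)
    {
        var result = new Dictionary<string, string>();
        using (var archive = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = archive.GetEntry("docProps/custom.xml");
            if (entry == null)
                return result; // the document has no custom properties

            using (Stream part = entry.Open())
            {
                XDocument doc = XDocument.Load(part);
                foreach (XElement property in doc.Root.Elements(Cp + "property"))
                {
                    string name = (string)property.Attribute("name");
                    // Value returns the text of the typed child, e.g. <vt:lpwstr>
                    if (name != null)
                        result[name] = property.Value;
                }
            }
        }
        return result;
    }
}
```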

First, select the "Properties" menu item (in MS Word 2007 it's under the Office Button's "Prepare" menu). This causes a new "Document Properties" menu to appear just beneath the ribbon.

The screenshot below shows you how to jump to the "Properties" menu item.

[Screenshot: Document Property]

When the "Document Properties" menu appears, select the "Advanced Properties" option. This opens the document "Properties" dialog box.

[Screenshot: Advanced Properties]

Step 5 - Add a New Property - Part Two

Now that we've opened the "Properties" dialog box, switch to its "Custom" tab. From here we'll add a new property to the document itself.

Type in a name for your new property; in this case I'm using "ApplicantName". The screenshot below shows how to do this.

[Screenshot: New Property]

Step 6 - Add a New Property - Part Three 

Once we've created a new property we need to link it to the bookmark that we created in a previous step.

To do this, tick the "Link to content" checkbox and then select a value from the "Source" drop-down list, which lists all of the bookmarks in the document.

You should see the bookmark you created earlier. Select the bookmark you want to link your property to, then click the "Add" button.

The screenshot below shows how to do this.

[Screenshot: Link to Bookmark]

Next click "OK" and that's you done - you've successfully configured your document with a new property linked to a bookmark within a table cell. Later on, you can type data into this cell and whatever you type in will be extracted and used within the process. We'll talk about how that happens in the next section.
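When "Link to content" is ticked, the link is recorded in the custom properties XML (docProps/custom.xml). The property element gets a linkTarget attribute naming the bookmark, alongside the property's stored value. My understanding is that Word refreshes that stored value from the bookmarked text when the document is saved, which is what lets the process read the property instead of the table cell. Treat that as an assumption and check it against your own documents. The minimal sketch below takes the XML text of that part and reports which properties are linked to which bookmarks. LinkedPropertyInspector is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

public static class LinkedPropertyInspector
{
    static readonly XNamespace Cp = "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties";

    // Maps each linked custom property name to the bookmark it is linked to,
    // given the XML text of the custom properties part.
    public static Dictionary<string, string> GetLinks(string customXml)
    {
        var links = new Dictionary<string, string>();
        XDocument doc = XDocument.Parse(customXml);
        foreach (XElement property in doc.Root.Elements(Cp + "property"))
        {
            string name = (string)property.Attribute("name");
            string target = (string)property.Attribute("linkTarget");
            // Only properties created with "Link to content" carry a linkTarget
            if (name != null && target != null)
                links[name] = target;
        }
        return links;
    }
}
```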

If you need to extract additional data from the document simply repeat the steps described above. 

Extracting the Data

In this section we'll look at how to extract the data from a document that has been prepared using the steps described above. Before we get to the code we'll first take a look at the process included in the sample project attached to this article.

Process Description

We start the process by uploading a document to a MOSS 2007 document library. The document is then downloaded to a location on the file system (we use the standard K2 blackpearl wizards to do the download).

I chose to download the document to the file system because it made the C# extraction code easier to write. Feel free to experiment with other ways of manipulating the document and see what you come up with; I'm sure there are plenty!

Once the document is downloaded we can use some C# code to get at the data in the document XML and assign it to process data fields.
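If you repeat the preparation steps for several fields, the assignment step generalises to a simple mapping from document property names to data field names. This is a sketch under assumptions. K2's data field API isn't available outside a server event, so a plain dictionary stands in for the process data fields. DataFieldMapper and its parameters are hypothetical. It applies the same trailing full stop trim as the code listing in Step Two.

```csharp
using System.Collections.Generic;

public static class DataFieldMapper
{
    // Copies selected document properties into data fields and returns how many were copied.
    // dataFields is a stand-in (hypothetical) for the process data fields.
    public static int Apply(
        IReadOnlyDictionary<string, string> documentProperties,
        IDictionary<string, string> propertyToField,
        IDictionary<string, object> dataFields)
    {
        int copied = 0;
        foreach (KeyValuePair<string, string> map in propertyToField)
        {
            string value;
            // Properties missing from the document are skipped
            if (documentProperties.TryGetValue(map.Key, out value))
            {
                dataFields[map.Value] = value.TrimEnd('.');
                copied++;
            }
        }
        return copied;
    }
}
```

Inside a real server event you would replace the dictionary write with an assignment to K2.ProcessInstance.DataFields[fieldName].Value, as in the listing.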

Steps One and Two below show this in more detail.

Step One - The Process

The screenshot below shows the process and you can see where we first download the document and then use a server (code) event to extract the data.

[Screenshot: Process Design]

Step Two - The Code Listing

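A section of the code is shown below. It runs inside the server (code) event of the process.

    // At the top of the server event code file:
    using System.IO;
    using System.Xml;
    using DocumentFormat.OpenXml.Packaging; // Open XML SDK

    // Inside the server event method (K2 is the event context used by the code event):
    {
        XmlDocument xmlProperties = new XmlDocument();

        // Open the downloaded document read-only and load its custom properties part
        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(@"C:\XXX\Application.docx", false))
        {
            CustomFilePropertiesPart customPart = wordDoc.CustomFilePropertiesPart;
            if (customPart != null) // null when the document has no custom properties
            {
                using (Stream stream = customPart.GetStream(FileMode.Open, FileAccess.Read))
                {
                    xmlProperties.Load(stream);
                }
            }
        }

        // Each custom property is a <property name="..."> element holding a typed value
        XmlNodeList properties = xmlProperties.GetElementsByTagName("property");
        foreach (XmlNode property in properties)
        {
            XmlAttribute nameAttribute = property.Attributes["name"];
            if (nameAttribute != null && nameAttribute.Value == "ApplicantName")
            {
                string applicantName = property.InnerText;
                char[] trimChars = { '.' };

                // Strip any trailing full stops, then write the value to the process data field
                K2.ProcessInstance.DataFields["ApplicantName"].Value = applicantName.TrimEnd(trimChars);
                break;
            }
        }
    }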

You'll see that I've hard-coded both the path on the file system (where the document was downloaded to) and the document name. You could use dynamic values here instead, for example from a SmartObject or process data fields; it's your project, so you choose!
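As one way of replacing the hard-coded path, here is a minimal sketch that builds it from two values you might hold in process data fields or a SmartObject. The folder and document-name inputs, and the DownloadPath class, are hypothetical. The validation is a suggestion rather than anything K2 requires.

```csharp
using System;
using System.IO;

public static class DownloadPath
{
    // Builds the full path of the downloaded document from two dynamic values
    // (for example, hypothetical "DownloadFolder" and "DocumentName" data fields).
    public static string Build(string downloadFolder, string documentName)
    {
        if (string.IsNullOrWhiteSpace(downloadFolder))
            throw new ArgumentException("A download folder is required.", nameof(downloadFolder));

        // Keep only the file-name portion (as recognised by the current platform's separators)
        string fileName = Path.GetFileName(documentName ?? string.Empty);
        if (!fileName.EndsWith(".docx", StringComparison.OrdinalIgnoreCase))
            throw new ArgumentException("Expected a .docx file name.", nameof(documentName));

        return Path.Combine(downloadFolder, fileName);
    }
}
```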

If you do decide to download the file to the file system before parsing don't forget to write a bit of "clean up" code to delete the file once it's been parsed. 
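A simple way to make sure the clean-up always happens is a try/finally around the parsing step. The sketch below is one way of doing it. ParseAndCleanUp is a hypothetical helper, and the parse delegate stands for whatever extraction code you use.

```csharp
using System;
using System.IO;

public static class ParseAndCleanUp
{
    // Runs the parsing step and always deletes the downloaded file afterwards,
    // even if parsing throws.
    public static T Run<T>(string path, Func<string, T> parse)
    {
        try
        {
            return parse(path);
        }
        finally
        {
            if (File.Exists(path))
                File.Delete(path);
        }
    }
}
```

Because the using block around WordprocessingDocument.Open disposes the document as soon as parsing finishes, the file is no longer open by the time the finally block deletes it.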

Testing the Solution

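Once you've got this far you're almost there. Deploy your process as normal, set your process rights, and then start the process by uploading a prepared document to the document library. The "ApplicantName" data field should then contain whatever you typed into the bookmarked table cell.
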
Further Reading

If you want to read more about Office Open XML then Microsoft have plenty of content on MSDN for you to explore.

Hope you've enjoyed this article - if you've any questions drop me a mail and I'll be happy to answer them.

Cheers and happy blackpearling!

Andy