Working with Word document content objects
Microsoft Word is about the authoring of documents. Documents contain pages, paragraphs, sentences and more. Today, I want to wade into the waters of manipulating Word document content. The plan is to get your feet wet by providing an overview of the key objects along with code samples.
Continuing the fine tradition of all Office applications (OneNote being the exception), Word has an extensive and mature object model. My goal is to show “a way” to accomplish some tasks in Microsoft Word. These samples are not necessarily “the way” to do it. There is more than one way to achieve the same results.
Today, we are concerned with the structure of a Word document… not its styling and presentation (I’ll cover that topic next).
- Word document content objects
- Accessing document content objects with code
- Working with familiar Word content objects
Word document content objects
Let’s look at the objects that combine to construct a Word document.
- Document :: The document represents a single, solitary Word document. This is the parent object for all content within a Word document file. It lives in an appropriately named collection called Documents. I covered the document object in a previous article – Word application and base objects.
- Section :: Documents can contain multiple sections. This is how you apply different formatting, layout, and header/footer options to a Word document… by creating sections and formatting them differently.
- Page :: Represents a single page within a document. A page can reside within a Range.
- Paragraph :: A single paragraph of content. A paragraph can reside within a Selection, Document, or Range and is accessible via the Paragraphs collection object. A paragraph also stores the formatting for the content that precedes its paragraph mark. Don’t ponder over that last sentence too much, I will explain it further in my next article.
- Sentence :: This object does not exist… strange as that might seem. Instead, there is the Sentences collection object. This collection contains Range objects, and each of those Range objects represents one sentence. Confused? The code samples will help bring focus to the picture.
- Selection :: This object contains the content selected either by the user or via code. It also represents the insertion point if no selection exists. The insertion point is where the cursor blinks in the Word UI to inform the user of their location within the document.
- Range :: Think of this object like real estate. It contains a contiguous section of a document. Perhaps its two most popular properties are Start and End. They contain the range’s starting and ending character positions within the Word document.
There are more “content” objects than these, but the ones listed above are the major objects that provide access to everything else.
Accessing document content objects with code
I want to give you a lot of VB.NET code samples today. I’ll start with the Range object and keep going as long as I have paper. By the end of the samples, my hope is you will have a general idea of how to access and edit content within a Word document.
The Range object
The Word object model is extensive and more than a little complex. The best way to figure out how to make things happen via code is to learn to think like Word thinks. The quickest way to achieve this goal is to master the Range and the Selection objects. I covered the selection object in this article so I will focus on the Range object. Just know that together, these two objects allow you to select and find Word content.
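To make the Start and End idea concrete before the longer samples, here is a minimal sketch. It is not part of the sample add-in: the module and the DescribeLeadingRange name are hypothetical, and the helper takes the Document as a parameter instead of relying on the WordApp property used elsewhere in this article. It builds a Range over the first few characters of a document and reports its positions.

```vbnet
Imports System
Imports System.Runtime.InteropServices
Imports Word = Microsoft.Office.Interop.Word

Module RangeSketch
    ' Hypothetical helper (an assumption for illustration): describes a
    ' Range that covers the first charCount characters of the document.
    Public Function DescribeLeadingRange(doc As Word.Document,
                                         charCount As Integer) As String
        ' Start with a Range over the whole document body.
        Dim rng As Word.Range = doc.Content

        ' Clamp the request so the Range stays inside the document.
        Dim lastPos As Integer = Math.Min(Math.Max(charCount, 0), rng.End)

        ' SetRange redefines the Range by character position.
        rng.SetRange(0, lastPos)

        Dim summary As String = "start = " & rng.Start &
                                ", end = " & rng.End &
                                ", text = " & rng.Text
        Marshal.ReleaseComObject(rng)
        Return summary
    End Function
End Module
```

Calling DescribeLeadingRange(WordApp.ActiveDocument, 10) from the add-in should return the positions 0 and 10 (or less for a very short document) along with the text between them.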
Enumerate paragraphs in a range
This method loops through all the paragraphs in the active document and reverses their order in a new document.
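' Requires: Imports System.Runtime.InteropServices (for Marshal)
'           Imports Word = Microsoft.Office.Interop.Word
Private Sub EnumerateParagraphs()
    Dim curDoc As Word.Document = WordApp.ActiveDocument
    Dim newDoc As Word.Document
    Dim rng As Word.Range
    Dim str As String = ""
    Dim i As Integer

    ' Grab the entire document body as a single Range.
    rng = curDoc.Content
    i = rng.Paragraphs.Count()

    ' Walk the paragraphs from last to first.
    Do Until i = 0
        str = str & rng.Paragraphs.Item(i).Range.Text & vbCrLf
        i = i - 1
    Loop

    ' Write the reversed text into a new document.
    newDoc = WordApp.Documents.Add
    newDoc.Range().Text = str

    Marshal.ReleaseComObject(rng)
    Marshal.ReleaseComObject(curDoc)
    Marshal.ReleaseComObject(newDoc)
End Sub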
The key is using a Range object that spans all of the document content. The code first grabs all the document content and assigns it to a Range variable (rng). Then the code finds out how many paragraphs exist and loops backwards to create a string. This string is then inserted into a new document. It’s a good bar trick that only works in nerd bars.
Select the current page range
In this sample, we use a Range object that’s buried deeper in the Word object model. The reason for this sample is that I want to point out that the Range object is darn near everywhere within Word.
Private Sub SelectedPageRange()
    Dim curDoc As Word.Document

    curDoc = WordApp.ActiveDocument
    ' "\page" is a predefined bookmark covering the current page.
    curDoc.Bookmarks("\page").Range.Select()

    Marshal.ReleaseComObject(curDoc)
End Sub
In this procedure, I employ the predefined “\page” bookmark. This is a standard Word bookmark that always exists and allows easy selection of the page containing the current focus (or selection).
You can learn more about predefined bookmarks at MSDN.
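The same pattern works for Word’s other predefined bookmarks. As a small sketch, the helper below reads the paragraph that contains the insertion point through the predefined “\Para” bookmark. The module and the CurrentParagraphText name are hypothetical, and I am assuming the document passed in is the active one, because predefined bookmarks follow the current selection.

```vbnet
Imports System.Runtime.InteropServices
Imports Word = Microsoft.Office.Interop.Word

Module PredefinedBookmarkSketch
    ' Hypothetical helper (an assumption for illustration): returns the
    ' text of the paragraph that contains the insertion point.
    Public Function CurrentParagraphText(doc As Word.Document) As String
        ' "\Para" is a predefined bookmark, so it never has to be created.
        Dim rng As Word.Range = doc.Bookmarks("\Para").Range
        Dim paragraphText As String = rng.Text

        Marshal.ReleaseComObject(rng)
        Return paragraphText
    End Function
End Module
```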
Enumerate sentences in a range
This one is similar to paragraphs but we switch sentences for paragraphs. The idea is the same… create a new document that contains the sentences in reverse order.
Private Sub EnumerateSentences()
    Dim curDoc As Word.Document = WordApp.ActiveDocument
    Dim newDoc As Word.Document
    Dim rng As Word.Range
    Dim str As String = ""
    Dim i As Integer

    rng = curDoc.Content
    i = rng.Sentences.Count()

    ' Each item in the Sentences collection is a Range.
    Do Until i = 0
        str = str & rng.Sentences.Item(i).Text
        i = i - 1
    Loop

    newDoc = WordApp.Documents.Add
    newDoc.Range().Text = str.ToUpper

    Marshal.ReleaseComObject(rng)
    Marshal.ReleaseComObject(curDoc)
    Marshal.ReleaseComObject(newDoc)
End Sub
To mix it up, the final document’s text is upper case. I know, this is dazzling trickery.
The takeaway is that the Range object contains a section of the Word document. Within it are major items like paragraphs and sentences.
The Section object
Documents can contain multiple sections. Sections allow you to define different page layouts and header/footer schemes.
You can loop through document sections by accessing the Sections collection. This collection resides directly beneath the Document object.
Private Sub EnumerateSections()
    Dim curDoc As Word.Document
    Dim newDoc As Word.Document
    Dim str As String = ""
    Dim i As Integer

    curDoc = WordApp.ActiveDocument

    ' Collections in the Word object model are 1-based.
    For i = 1 To curDoc.Sections.Count
        str = str & "SECTION " & i & vbCr
        str = str & vbTab & "start = " & curDoc.Sections(i).Range.Start
        str = str & vbTab & "end = " & curDoc.Sections(i).Range.End & vbCrLf
    Next

    newDoc = WordApp.Documents.Add
    newDoc.Range().Text = str

    Marshal.ReleaseComObject(curDoc)
    Marshal.ReleaseComObject(newDoc)
End Sub
This procedure builds a string that 1) lists each document section and 2) contains the section’s Start and End character position. Notice the use of the Range object to read this information.
Create a new section
When building Word documents via code, you will likely need to create a new section. This procedure will do the trick.
Private Sub CreateSection()
    Dim curDoc As Word.Document

    curDoc = WordApp.ActiveDocument
    curDoc.Sections.Add(WordApp.Selection.Range)

    Marshal.ReleaseComObject(curDoc)
End Sub
This sample inserts a section break at the current selection (or cursor) location. Again, notice the use of the Range property. You can easily grab a different range within the document and pass it as the location of the section break.
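Here is one way passing a different range might look. This is only a sketch: the module and the AddSectionBeforeParagraph name are hypothetical, and the helper takes the Document as a parameter rather than using WordApp. Sections.Add inserts the break in front of the range it receives, so the given paragraph becomes the start of the new section.

```vbnet
Imports System
Imports Word = Microsoft.Office.Interop.Word

Module SectionSketch
    ' Hypothetical helper (an assumption for illustration): starts a new
    ' section in front of the paragraph at paragraphIndex (1-based).
    Public Sub AddSectionBeforeParagraph(doc As Word.Document,
                                         paragraphIndex As Integer)
        If paragraphIndex < 1 OrElse paragraphIndex > doc.Paragraphs.Count Then
            Throw New ArgumentOutOfRangeException("paragraphIndex")
        End If

        ' The second argument picks the break type; a new-page section
        ' break is used here.
        doc.Sections.Add(doc.Paragraphs(paragraphIndex).Range,
                         Word.WdSectionStart.wdSectionNewPage)
    End Sub
End Module
```

Passing Word.WdSectionStart.wdSectionContinuous instead would start the new section without forcing a new page.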
The Page object
The page object resides in a “funny” location. Not “ha ha” funny… more like “why the hell is it here?” funny. You access document pages via the Document.ActiveWindow.Panes collection; each Pane exposes a Pages collection. Why? Because these are the rules.
Someday you might want to loop through all document pages and perform some very specific business logic on them. This code does exactly that.
Private Sub EnumeratePages()
    Dim curDoc As Word.Document
    Dim newDoc As Word.Document
    Dim pgs As Word.Pages
    Dim str As String = ""
    Dim i As Integer

    curDoc = WordApp.ActiveDocument
    pgs = curDoc.ActiveWindow.Panes(1).Pages

    For i = 1 To pgs.Count
        ' Append to str; a plain assignment here would discard earlier pages.
        str = str & "PAGE " & i & vbCr
        str = str & vbTab & "height = " & pgs.Item(i).Height
        str = str & vbTab & "width = " & pgs.Item(i).Width & vbCrLf
    Next

    newDoc = WordApp.Documents.Add
    newDoc.Range().Text = str

    Marshal.ReleaseComObject(pgs)
    Marshal.ReleaseComObject(curDoc)
    Marshal.ReleaseComObject(newDoc)
End Sub
Here, the business logic is to create a string that contains the height and width of each page and then display it in a new document. Your business logic will probably be more complex than this. Consider this procedure a starter kit for processing document pages.
Insert a page break
To create a page, you create a page break.
Private Sub InsertPageBreak()
    Dim sel As Word.Selection

    sel = WordApp.Selection
    sel.InsertBreak(Type:=Word.WdBreakType.wdPageBreak)

    Marshal.ReleaseComObject(sel)
End Sub
I start by referencing the current insertion point via the Selection object. I then invoke the InsertBreak method and specify a page break. Voilà! We have a new page.
Working with familiar Word content objects
Now that you know the basics of working with the Range, Section, and Page objects, let’s look at working with typical Word content like tables, comments, and text.
Insert a table
I use tables all the time. I can see how it would be useful to have a procedure that inserts a table exactly how I like it.
Private Sub InsertTable(rowCount As Integer, columnCount As Integer)
    Dim curDoc As Word.Document = WordApp.ActiveDocument
    Dim table As Word.Table

    ' Insert the table at the current selection.
    table = curDoc.Tables.Add(WordApp.Selection.Range, rowCount, columnCount)
    table.Cell(1, 1).Range.Text = "Hello Table"

    Marshal.ReleaseComObject(table)
    Marshal.ReleaseComObject(curDoc)
End Sub
In this case, I like a table with 1 column, 7 rows, and no formatting, so I would call InsertTable(7, 1). The Tables collection resides under the Document object. To add a new table, you call the collection’s Add method and specify its location (via a Range object), number of rows, and number of columns.
Enumerate comments
Authoring a quality document of any type (blog, article, report, proposal, etc.) is a collaborative effort. Comments are key to the collaborative process. If you receive your document after this process and it is littered with helpful comments for improving it… the following code will come in handy.
Private Sub EnumerateComments()
    Dim curDoc As Word.Document = WordApp.ActiveDocument

    For i As Integer = 1 To curDoc.Comments.Count
        ' Append a reply line to the existing comment text.
        curDoc.Comments.Item(i).Range.Text = _
            curDoc.Comments.Item(i).Range.Text & vbCrLf & _
            "Corrected. It's all good now!"
    Next

    Marshal.ReleaseComObject(curDoc)
End Sub
Here, I loop through the Comments collection. This collection also resides directly under the Document object. For each comment, I append a line of text below the existing comment text.
Create a comment
Creating a comment is straightforward. The approach below utilizes the current selection’s range as the insertion point.
Private Sub CreateComment(commentText As String)
    WordApp.Selection.Comments.Add(WordApp.Selection.Range, commentText)
End Sub
The text for the comment needs to be passed as the procedure’s parameter.
You can also create comments by calling Document.Comments.Add. If you do that, you need to pass a Range to specify where to insert the comment.
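As a sketch of that second approach, the helper below attaches a comment to the first paragraph of a document. The module and the CommentOnFirstParagraph name are hypothetical, and the Document is passed in as a parameter rather than read from WordApp. Choosing the first paragraph is only for illustration; any Range will do.

```vbnet
Imports System.Runtime.InteropServices
Imports Word = Microsoft.Office.Interop.Word

Module CommentSketch
    ' Hypothetical helper (an assumption for illustration): adds a comment
    ' anchored to the first paragraph of the document.
    Public Sub CommentOnFirstParagraph(doc As Word.Document,
                                       commentText As String)
        ' Every Word document has at least one paragraph.
        Dim rng As Word.Range = doc.Paragraphs(1).Range

        ' The Text parameter is a ByRef Object in the interop assembly;
        ' CObj passes a copy, which also compiles under Option Strict On.
        doc.Comments.Add(rng, CObj(commentText))

        Marshal.ReleaseComObject(rng)
    End Sub
End Module
```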
Delete all comments
Deleting all comments is delightfully easy. There is a method that takes care of them.
Private Sub DeleteComments()
    Dim curDoc As Word.Document = WordApp.ActiveDocument

    ' Skip the call when there is nothing to delete.
    If curDoc.Comments.Count > 0 Then
        curDoc.DeleteAllComments()
    End If

    Marshal.ReleaseComObject(curDoc)
End Sub
There is no need to loop through the Comments collection.
Insert text
To insert text, you can utilize the Selection and Range objects.
Private Sub InsertText(textToInsert As String)
    WordApp.Selection.InsertAfter(textToInsert)
    'WordApp.Selection.InsertBefore(textToInsert)
    'WordApp.ActiveDocument.Range.InsertAfter(textToInsert)
End Sub
In this sample, I utilize the current selection to insert the passed string after the current selection. I’ve included commented code to show how you can choose to InsertBefore instead. Also, I’ve shown how to do the same with the Range object.
Find text
Finding text is a core competency in Word solution development. This sample performs search and replace.
Private Sub FindText()
    Dim rng As Word.Range

    rng = WordApp.ActiveDocument.Content

    With rng.Find
        .ClearFormatting()
        .Execute(FindText:="Hello Table", _
                 ReplaceWith:="Found Table", _
                 Replace:=Word.WdReplace.wdReplaceAll)
    End With

    Marshal.ReleaseComObject(rng)
End Sub
The procedure sets a Range object that contains all document Content. It then executes a Find & Replace action to replace the text inserted in the InsertTable method from earlier.
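If you only want to locate text without replacing it, a find-only variation might look like the sketch below. The module and the FindFirst name are hypothetical, and the Document is passed in as a parameter. The sketch relies on two behaviors of Find.Execute: it returns True when a match is found, and it redefines the Range to cover the matched text.

```vbnet
Imports System.Runtime.InteropServices
Imports Word = Microsoft.Office.Interop.Word

Module FindSketch
    ' Hypothetical helper (an assumption for illustration): returns the
    ' start position of the first match, or -1 when nothing is found.
    Public Function FindFirst(doc As Word.Document,
                              searchText As String) As Integer
        Dim rng As Word.Range = doc.Content
        Dim position As Integer = -1

        With rng.Find
            .ClearFormatting()
            ' On success, rng now spans the matched text.
            If .Execute(FindText:=CObj(searchText)) Then
                position = rng.Start
            End If
        End With

        Marshal.ReleaseComObject(rng)
        Return position
    End Function
End Module
```

Returning the character position keeps the helper simple. A caller that needs the match itself could build a Range from that position and the length of the search text.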
Copy and paste text
If you have text in a Word document, you will need to move it around.
Private Sub CopyAndPasteText()
    Dim curDoc As Word.Document = WordApp.ActiveDocument
    Dim rng As Word.Range

    ' Span from the start of paragraph 1 to the end of paragraph 3.
    rng = curDoc.Range(curDoc.Paragraphs(1).Range.Start, _
                       curDoc.Paragraphs(3).Range.End)
    rng.Copy()

    ' Alternative using the Selection object: select the range first,
    ' then copy the selection.
    'rng.Select()
    'WordApp.Selection.Copy()

    ' Jump to the end of the document and paste.
    WordApp.Selection.GoTo(What:=Word.WdGoToItem.wdGoToBookmark, _
                           Name:="\EndOfDoc")
    WordApp.Selection.Paste()

    Marshal.ReleaseComObject(rng)
    Marshal.ReleaseComObject(curDoc)
End Sub
This method stores the first 3 paragraphs in a Range object, copies them, moves to the end of the document, and pastes the paragraphs. I’ve included commented code that shows how you could perform the copy using a Selection object.
We have now delved into the waters of Word document content manipulation. We’ll continue looking at scenarios in future articles!
This sample Word add-in was developed using Add-in Express for Office and .net.
Word add-in development in Visual Studio for beginners:
- Part 1: Word add-in development – Application and base objects
- Part 2: Customizing Word UI – What is and isn’t customizable
- Part 3: Customizing Word main menu, context menus and Backstage view
- Part 4: Creating custom Word ribbons and toolbars: VB.NET, C#
- Part 5: Building custom task panes for Word 2013 – 2003
- Part 6: Working with Word document content objects
- Part 8: Working with multiple Microsoft Word documents
- Part 9: Using custom XML parts in Word add-ins
- Part 10: Working with Word document properties, bookmarks, content controls and quick parts
- Part 11: Populating Word documents with data from external sources
- Part 12: Working with Microsoft Word templates