Access Pages collection in large documents


Daniel Lutz




Posts: 32
Joined: 2022-02-09
Hello community,

this is more of an off-topic question and not really related to this product, but maybe someone has an idea. I know that Word is not a layout-based program, but I am trying to work with the
WordApp.ActiveWindow.ActivePane.Pages
collection to evaluate all shape objects on a page.

This works fine in small documents, but in large documents (> 1500 pages) access via index
Pages[index]
throws a "Target object is not in the collection" exception. This seems to be a general problem.

Example

- access via C# Interop
- document (lorem ipsum) with > 3000 pages
- the first ~1600 pages are accessible via index
- the rest throw an exception
- Pages.Count returns the right value

- access via VBA seems to work for all indexes

After a lot of research I haven't found any solution for the exception, so I want to ask here whether anyone has had the same problem:



var window = WordApp.ActiveWindow;
var pane = window.ActivePane;
var pages = pane.Pages;
var pagesCount = pages.Count;
Page page = null;
try
{
    for (int pageIndex = 1; pageIndex <= pagesCount; pageIndex++)
    {
        try
        {
            page = pages[pageIndex];
        }
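        catch (Exception exp)
        {
            // Access failed for this index; 'exp' holds the COM error.
        }
        finally
        {
            if (page != null)
            {
                Marshal.ReleaseComObject(page);
                // Reset so a failed access on the next iteration
                // does not release the same reference twice.
                page = null;
            }
        }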
    }
}
finally
{
    Marshal.ReleaseComObject(pages);
    Marshal.ReleaseComObject(pane);
    Marshal.ReleaseComObject(window);
}
Posted 14 Jun, 2022 01:17:47
Daniel Lutz




Posts: 32
Joined: 2022-02-09
I know that COM objects are very limited, but this is very strange because VBA (I believe) also works with COM, and the implementation seems to be correct.

- The Pages collection uses a CustomMarshalers implementation to map IEnumVARIANT to IEnumerable
- The get_Item implementation looks okay

So the question is: is the problem in the mapping from unmanaged to managed code, or does Microsoft's unmanaged implementation block the access?

Maybe someone has an idea.
Posted 14 Jun, 2022 01:23:05
Daniel Lutz




Posts: 32
Joined: 2022-02-09
The idea behind using the Pages collection is to evaluate all shapes on a specific page. So I enumerate all pages, look into the Rectangles collection for shape rectangles, and store some shape objects.

If the full code ran into some memory limitation, I would understand it, because storing COM objects can prevent the objects higher up in the call hierarchy (such as the Pages, Pane and Window objects) from being released. But it is very strange that the similar code above fails as well.
Posted 14 Jun, 2022 01:25:47
Andrei Smolin


Add-in Express team


Posts: 18338
Joined: 2006-05-11
Hello Daniel,

Could you provide the test document and both VBA and C# source code? I'll test these.

Regards from Poland (CEST),

Andrei Smolin
Add-in Express Team Leader
Posted 14 Jun, 2022 01:30:39
Daniel Lutz




Posts: 32
Joined: 2022-02-09
Hello Andrei,

Test file

Download link

I've uploaded the test file to the European file hosting service WeTransfer.com

C#


// Repaginate document to avoid page conflicts
Document document = null;
try
{
    document = WordApp.ActiveDocument;
    document.Repaginate();
}
finally
{
    if (document != null) Marshal.ReleaseComObject(document);
}


// access every page

var stringBuilder = new StringBuilder();

var window = WordApp.ActiveWindow;
var pane = window.ActivePane;
var pages = pane.Pages;
var pagesCount = pages.Count;
Page page = null;

try
{
    for (int pageIndex = 1; pageIndex <= pagesCount; pageIndex++)
    {
        try
        {
            page = pages[pageIndex];
            stringBuilder.AppendLine($"Page {pageIndex} access successed");
        }
        catch (Exception e)
        {
            stringBuilder.AppendLine($"Page {pageIndex} access failed: {e.Message}");
        }
        finally
        {
            if (page != null) Marshal.ReleaseComObject(page);
            page = null;
        }
    }
}
finally
{
    Marshal.ReleaseComObject(pages);
    Marshal.ReleaseComObject(pane);
    Marshal.ReleaseComObject(window);
}

var log = stringBuilder.ToString();



StringBuilder Log



Page 1 access successed
Page 2 access successed
[...]
Page 1616 access successed
Page 1617 access successed
Page 1618 access failed: The requested item does not exist in the collection.
Page 1619 access failed: The requested item does not exist in the collection.
[...]
Page 3100 access failed: The requested item does not exist in the collection.
Page 3101 access failed: The requested item does not exist in the collection.


I translated the exception message from German to English, so it could be slightly different from your localized message.

VBA code


Sub Macro()

Dim pageIndex As Integer
Dim page As page

For pageIndex = 1 To Application.ActiveWindow.ActivePane.Pages.Count
 Set page = Application.ActiveWindow.ActivePane.Pages.Item(pageIndex)
 Debug.Print CStr(pageIndex) & " " & CStr(page.Rectangles.Count)
Next

End Sub


Best regards
Daniel
Posted 14 Jun, 2022 03:45:36
Andrei Smolin


Add-in Express team


Posts: 18338
Joined: 2006-05-11
Hello Daniel,

The code looks correct to me. I suppose the issue relates to the pagination run in the background; see https://docs.microsoft.com/en-us/office/vba/api/word.options.pagination. Could you set this option to false and check if the issue is reproducible?

Regards from Poland (CEST),

Andrei Smolin
Add-in Express Team Leader
Posted 14 Jun, 2022 04:56:34
Daniel Lutz




Posts: 32
Joined: 2022-02-09
Hello Andrei,

thanks for answering. This option has no real impact on the problem. If I activate the option, the first ~1600 pages are accessible via enumeration; if I deactivate background pagination, it is maybe only the first ~1380 pages.

It seems that the pagination engine works correctly (because the thumbnails are generated completely; see the navigation pane), but the bridge between Word (unmanaged) and the interop (managed) part does not work correctly with more than ~1550 pages.

Best regards
Daniel
Posted 14 Jun, 2022 05:59:07
Andrei Smolin


Add-in Express team


Posts: 18338
Joined: 2006-05-11
Hello Daniel,

It breaks at page 1576 for me. The strangest thing is that it succeeds again at page 1629.

As to your VBA macro, note that it doesn't call document.Repaginate().

Regards from Poland (CEST),

Andrei Smolin
Add-in Express Team Leader
Posted 14 Jun, 2022 09:01:34
Daniel Lutz




Posts: 32
Joined: 2022-02-09
Hello Andrei,

that is right, I forgot the call in the VBA code. But without the call (regardless of whether I wait after opening the document until Word shows all pages), the Pages collection includes fewer pages (~1280 pages).

Best regards
Daniel
Posted 15 Jun, 2022 01:18:49
Andrei Smolin


Add-in Express team


Posts: 18338
Joined: 2006-05-11
Hello Daniel,

I still think the issue somehow relates to some process running behind the scenes. I would try to introduce a delay once that error occurs; say, you can try to call System.Windows.Forms.Application.DoEvents().
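
A delay-and-retry around the failing access might look like the following minimal sketch. RetryHelper and its parameters are hypothetical names introduced here for illustration; they are not part of Add-in Express or the Word object model. The wait callback stands in for whatever delay is chosen, for example System.Windows.Forms.Application.DoEvents() or a short Thread.Sleep.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class RetryHelper
{
    // Calls 'action' up to 'maxAttempts' times. After each failed attempt
    // except the last one, 'wait' is invoked (if not null) before retrying.
    // The exception thrown by the final attempt propagates to the caller.
    public static T Retry<T>(Func<T> action, int maxAttempts, Action wait)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return action();
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // Give background work a chance to finish, then try again.
                wait?.Invoke();
            }
        }
    }
}
```

In the loop from the earlier post, the access could then be written as page = RetryHelper.Retry(() => pages[pageIndex], 3, () => System.Threading.Thread.Sleep(100)); where the attempt count and the delay are arbitrary values to experiment with. Whether retrying actually resolves the error is not confirmed in this thread.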

And it is seemingly working for me with or without document.Repaginate(). Yesterday I reproduced the issue; today I ran two tests without the issue.

Regards from Poland (CEST),

Andrei Smolin
Add-in Express Team Leader
Posted 15 Jun, 2022 03:05:52