Find & Replace text in body

Add-in Express™ Support Service

Gavin Rolfe




Posts: 4
Joined: 2011-01-04
Could anyone point me in the right direction with my project?

Essentially, I need to perform a find & replace on the body of the HTML.

Looking through the Examples & HowTos, I found "How to run and replace a script (JavaScript, JScript or VBScript) on an HTML page in IE 6 or 7".

I only know basic VB, but the project is in C#.
I tried replacing "script" with "body",
and also changed IHTMLScriptElement to IHTMLBodyElement (in both places).

I expected to be able to use msgbox(scriptElement.text) to test the output, but it displays an empty message box every time.



Code extracted from the ADX example:

       // Requires: using System.Runtime.InteropServices; (for Marshal)
       private void button2_Click(object sender, EventArgs e)
       {
          mshtml.HTMLDocument doc = this.HTMLDocument;
          if (doc != null)
          {
             // Collect every <script> element on the page
             mshtml.IHTMLElementCollection elementCollection1 = doc.getElementsByTagName("script");
             if (elementCollection1 != null)
             {
                 for (int i = 0; i < elementCollection1.length; i++)
                 {
                     mshtml.IHTMLScriptElement scriptElement = elementCollection1.item(i, i) as mshtml.IHTMLScriptElement;
                     if (scriptElement != null)
                     {
                         if (scriptElement.text != null)
                         {
                             // Overwrite the script source; single quotes inside the
                             // JavaScript so they do not terminate the C# string
                             scriptElement.text = "function done(){alert('hello'); }";
                         }
                         Marshal.ReleaseComObject(scriptElement);
                     }
                 }
                 Marshal.ReleaseComObject(elementCollection1);
             }
          }
       }
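A likely explanation for the empty message box: on IHTMLBodyElement, the `text` property is not the body's content but the BODY element's `text` attribute (the default text color), which most pages leave unset. The markup itself is available from IHTMLElement.innerHTML, for example `((mshtml.IHTMLDocument2)doc).body.innerHTML`. Once you have it as a string, a plain Replace would also rewrite tag names and attribute values (such as an href containing the search word), so the hypothetical helper below applies the replacement only to the text between tags. It is a sketch that assumes simple markup (no `>` inside attribute values, no comments).

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class MarkupReplace
{
    // A single tag such as <b>, </p> or <a href="...">.
    private static readonly Regex Tag = new Regex("<[^>]*>");

    // Replaces 'find' with 'replacement' only in the text between tags,
    // so tag names and attribute values are left untouched.
    public static string ReplaceInText(string html, string find, string replacement)
    {
        var result = new StringBuilder(html.Length);
        int last = 0;
        foreach (Match tag in Tag.Matches(html))
        {
            // Text before this tag: safe to edit.
            result.Append(html.Substring(last, tag.Index - last).Replace(find, replacement));
            // The tag itself is copied verbatim.
            result.Append(tag.Value);
            last = tag.Index + tag.Length;
        }
        // Text after the last tag.
        result.Append(html.Substring(last).Replace(find, replacement));
        return result.ToString();
    }
}
```

In the add-on, the result would be assigned back with something like `body.innerHTML = MarkupReplace.ReplaceInText(body.innerHTML, "Login", "Enter here");` (untested against IE).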
Posted 04 Feb, 2011 06:53:37
Dominik Burth




Posts: 6
Joined: 2011-01-25
Hi Gavin

Hope this helps you get started:

Public Class IEModule
    Private Sub IEModule_DocumentComplete(ByVal pDisp As Object, ByVal url As String) Handles Me.DocumentComplete
        ' MyWindow (see below) wraps IE's window handle so that IE owns the message box
        Dim window As MyWindow = New MyWindow(Me.ParentHandle)

        Dim tmpStr As String = ""
        Dim Elem As mshtml.IHTMLElement
        Dim Elems As mshtml.IHTMLElementCollection = HTMLDocument.getElementsByTagName("B")

        For Each Elem In Elems
            tmpStr = tmpStr & Elem.innerText & vbCrLf
            ' The following is only an example of how to find and edit something:
            If Elem.innerText = "Look at this" Then
                Elem.innerText = "Close eyes"
                Exit For
            End If
        Next
        MessageBox.Show(window, tmpStr)

        '' Here is how to find pictures:
        'Dim locImg As mshtml.IHTMLImgElement
        'Elems = HTMLDocument.images
        'For Each locImg In Elems
        '    tmpStr = tmpStr & "pic from: " & locImg.href & vbCrLf
        'Next
        'MessageBox.Show(window, tmpStr)
    End Sub
End Class

Public Class MyWindow
    Implements System.Windows.Forms.IWin32Window
    Dim theHandle As IntPtr
    Public Sub New(ByVal aHandle As System.IntPtr)
        theHandle = aHandle
    End Sub
    Public ReadOnly Property Handle() As System.IntPtr _
        Implements System.Windows.Forms.IWin32Window.Handle
        Get
            Return theHandle
        End Get
    End Property
End Class

How do you find out the tag names? Download and install DebugBar for IE; it lets you look inside every page you visit, which gives you plenty of samples: http://www.debugbar.com/?langage=en

Greets,
Dominik
Posted 08 Feb, 2011 15:52:24
Gavin Rolfe




Posts: 4
Joined: 2011-01-04
Hi Dominik,

Thank you for your time in responding.

My project requires me to pass all of the HTML source through a find/replace function (which will search for words and place <a> tags around them to make them hyperlinks). The returned string will then be inserted into the document, replacing the original page source.
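One possible shape for that find/replace function, assuming the words to link and their URLs come from a dictionary: split the markup into tags and text, wrap whole-word matches only in the text, and leave anything already inside an `<a>` element alone so existing links are not nested. `LinkWrapper.WrapWords` is a hypothetical name; the regex-based split assumes simple markup (no `>` inside attribute values, no comments), words that start and end with a letter or digit, and it does not skip `<script>`/`<style>` content or character entities.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class LinkWrapper
{
    // A single tag such as <p>, </a> or <a href="...">.
    private static readonly Regex Tag = new Regex("<[^>]*>");
    private static readonly Regex AnchorOpen = new Regex(@"^<a[\s>]", RegexOptions.IgnoreCase);
    private static readonly Regex AnchorClose = new Regex(@"^</a\s*>", RegexOptions.IgnoreCase);

    // Wraps every whole-word occurrence of a dictionary key in <a href="url">,
    // touching only text between tags and skipping text already inside a link.
    public static string WrapWords(string html, IDictionary<string, string> links)
    {
        if (links.Count == 0) return html;

        // One alternation for all words, longest first so a longer phrase wins
        // over a shorter word it starts with. \b keeps "IE" from matching inside "TIES".
        var words = new Regex(@"\b(" + string.Join("|",
            links.Keys.OrderByDescending(k => k.Length).Select(k => Regex.Escape(k))) + @")\b");

        var sb = new StringBuilder(html.Length);
        int last = 0;
        int anchorDepth = 0; // > 0 while inside an existing <a>...</a>
        foreach (Match tag in Tag.Matches(html))
        {
            AppendText(sb, html.Substring(last, tag.Index - last), words, links, anchorDepth > 0);
            sb.Append(tag.Value);
            if (AnchorOpen.IsMatch(tag.Value)) anchorDepth++;
            else if (AnchorClose.IsMatch(tag.Value) && anchorDepth > 0) anchorDepth--;
            last = tag.Index + tag.Length;
        }
        AppendText(sb, html.Substring(last), words, links, anchorDepth > 0);
        return sb.ToString();
    }

    private static void AppendText(StringBuilder sb, string text, Regex words,
                                   IDictionary<string, string> links, bool insideAnchor)
    {
        if (insideAnchor) { sb.Append(text); return; }
        // The matched text is already markup (it came from the HTML), so only
        // the URL is encoded for use inside the href attribute.
        sb.Append(words.Replace(text, m =>
            "<a href=\"" + WebUtility.HtmlEncode(links[m.Value]) + "\">" + m.Value + "</a>"));
    }
}
```

For writing the result back: according to Microsoft's IE DOM reference, both innerHTML and outerHTML are read-only on the html element, so assigning the returned string to the BODY element's innerHTML is the more likely route.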

I created a fresh project to test the code you posted.
Your code sample searches for bold tags and outputs their inner text.
I amended getElementsByTagName("B") to getElementsByTagName("HTML")
and changed Elem.innerText to Elem.outerHTML, which returned all of the HTML on the page.

I have two remaining problems:
** Once I have passed the HTML through my function, how do I re-insert it into the document?
For example, Elem.outerHTML = replacefunction(Elem.outerHTML) is not working.

** How can I search through framesets & frames on websites?
I am testing with a website that has an HTML layout like this:

<html>
  <head>..</head>
  <frameset>
    <frame><html></html></frame>
    <frameset>
      <frame><html></html></frame>
      <frame><html></html></frame>
    </frameset>
  </frameset>
</html>
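With a frameset layout like this, each frame holds its own document, so the body search has to be repeated once per document. In mshtml the frames of a document are reachable through IHTMLDocument2.frames; each item is an IHTMLWindow2 whose `document` property returns that frame's document, and frames served from another domain typically throw an UnauthorizedAccessException on that access. The sketch below only shows the recursion: `FrameWalker.VisitAll` is a hypothetical helper, and the `childFrames` delegate stands in for the mshtml enumeration so the walk can be exercised without IE.

```csharp
using System;
using System.Collections.Generic;

public static class FrameWalker
{
    // Visits 'doc', then recursively every document nested in its frames,
    // in document order. 'childFrames' abstracts how the child documents are
    // obtained (with mshtml, presumably by enumerating IHTMLDocument2.frames).
    public static void VisitAll<TDoc>(TDoc doc,
                                      Func<TDoc, IEnumerable<TDoc>> childFrames,
                                      Action<TDoc> visit)
    {
        visit(doc);
        foreach (TDoc child in childFrames(doc))
            VisitAll(child, childFrames, visit);
    }
}
```

The `visit` action is where the per-document find & replace on the body would go; recursion depth equals the frame nesting depth, which is small in practice.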


I welcome comments from anyone who may be able to point me in the right direction.
Posted 10 Feb, 2011 17:00:52
Dominik Burth




Posts: 6
Joined: 2011-01-25
Hi Gavin,

After some tests, I found that getElementsByTagName("HTML") doesn't work for me either. I can't explain why, but I tried the following:

Dim Elems As mshtml.IHTMLElementCollection = HTMLDocument.getElementsByTagName("BODY")
Dim Elem As mshtml.IHTMLElement
For Each Elem In Elems
    tmpStr = tmpStr & Elem.innerText & vbCrLf
    If Elem.innerHTML.Contains("Login") Then
        Elem.innerHTML = Replace(Elem.innerHTML, "Login", "Enter here")
    End If
Next
MessageBox.Show(window, tmpStr)

tmpStr and the message box (showing .innerText) are only there to check what is passing through "Elem". You may want to comment them out, or add a counter to each line so you can identify the contents of each individual Elem.

The real work is iterating (via .innerHTML) through every IHTMLElement in the "BODY" collection, looking for a specific word and replacing it if found.
This sample worked well on my web page and did what I expected.

My web page doesn't have any frames, so if this doesn't work for you, try:
a) another web page without frames, to see whether the frames are "blocking" your work
b) send me a URL that uses such framesets, and I'll try to find something

By the way: walking through a large element (HTML, BODY, ...) is very time-consuming, so keep that in mind. Where possible, it's a good idea to shrink the haystack before searching for the needle, which means preferring smaller collections such as getElementsByTagName("P"), ("TR"), and so on.
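Along the same lines, a cheap pre-check can run on each element's innerText (plain text, no markup) so that innerHTML is only read, rewritten and assigned back for elements that can actually change. A minimal sketch; `Haystack.ContainsAny` is a hypothetical helper, and it ignores case so it does not skip elements that a case-sensitive replacement would change (it can only let through a few extra ones).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Haystack
{
    // Cheap pre-check on an element's innerText: true if any of 'words'
    // occurs, ignoring case. Only elements that pass need the more
    // expensive innerHTML read/replace/write.
    public static bool ContainsAny(string text, IEnumerable<string> words)
    {
        if (string.IsNullOrEmpty(text)) return false;
        return words.Any(w => text.IndexOf(w, StringComparison.OrdinalIgnoreCase) >= 0);
    }
}
```

Inside the element loop this would guard the replacement, e.g. `if (Haystack.ContainsAny(elem.innerText, words)) { /* rewrite elem.innerHTML */ }`.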

Greets, Dominik.
Posted 11 Feb, 2011 03:36:05
Sergey Grischenko


Add-in Express team


Posts: 7202
Joined: 2004-07-05
Hi Gavin,

Please try using the createTextRange method of the IHTMLBodyElement interface to obtain the whole body text.
For example:
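// Get the BODY element of the document loaded in 'browser' (the IE WebBrowser object)
mshtml.IHTMLBodyElement elem =
    (mshtml.IHTMLBodyElement)((mshtml.IHTMLElementCollection)(browser.Document as mshtml.IHTMLDocument2).all.tags("body")).item(null, 0);
// A text range spanning the whole body; .text returns its plain text (no markup)
mshtml.IHTMLTxtRange textRange = elem.createTextRange();
MessageBox.Show(textRange.text);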
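Building on the text range suggestion: besides `text`, IHTMLTxtRange also exposes `findText` and `pasteHTML`, so one possible approach (not tested here) is to call `findText` for each word on a range covering the body and replace each match with `pasteHTML`, rather than reassigning the whole body. Because `range.text` is plain text, it has to be HTML-encoded before being pasted back as markup. The hypothetical helper below only builds that markup; the IE-side loop is left out.

```csharp
using System;
using System.Net;

public static class AnchorBuilder
{
    // Builds the HTML to paste over a matched text range: the matched text
    // wrapped in an <a> element. Both values are HTML-encoded because the
    // range's text is plain text, not markup.
    public static string Build(string text, string url)
    {
        if (url == null) throw new ArgumentNullException(nameof(url));
        return "<a href=\"" + WebUtility.HtmlEncode(url) + "\">" + WebUtility.HtmlEncode(text) + "</a>";
    }
}
```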
Posted 15 Feb, 2011 06:02:18