F André
Guest
|
Hello,
I'm trying to extract the full HTML source code of the current page. I want the same result as when I do "View --> Source".
But these properties do not contain the same HTML; the DOCTYPE declaration is missing, for example:
HTMLDocument.documentElement.outerHTML
or
HTMLDocument.documentElement.innerHTML
Is there another property that holds the full HTML source code?
Thank you,
André |
|
Sergey Grischenko
Add-in Express team
Posts: 7233
Joined: 2004-07-05
|
Hi André,
Did you try using the doctype property of the IHTMLDocument5 interface? |
|
F André
Guest
|
Sorry, I'm unable to reply: I get the error message "Message text." when I try to post.
Do you filter content? I have some links in the reply. |
|
F André
Guest
|
Hello Sergey,
In the end I used this framework for the task: csEXWB (http://tinyurl.com/yw3c25)
Concatenating the doctype with outerHTML will not give me the full HTML source as you get it from "View --> Source".
It seems that P/Invoke calls are necessary to get the HTML source.
I use the cEXWB class like this:
cEXWB myWB = new cEXWB();
string htmlSource = myWB.GetSource(IEApp as IfacesEnumsStructsClasses.IWebBrowser2);
Now I only have string encoding problems left, which I'll fix with another framework: http://tinyurl.com/qlvx5l
Code extract from csEXWB:
public string GetSource(IWebBrowser2 thisBrowser)
{
    if ((thisBrowser == null) || (thisBrowser.Document == null))
        return string.Empty;
    //Declare vars
    int hr = Hresults.S_OK;
    IStream pStream = null;
    IPersistStreamInit pPersistStreamInit = null;
    // Query for IPersistStreamInit.
    pPersistStreamInit = thisBrowser.Document as IPersistStreamInit;
    if (pPersistStreamInit == null)
        return string.Empty;
    //Create stream, delete on release
    hr = WinApis.CreateStreamOnHGlobal(m_NullPointer, true, out pStream);
    if ((pStream == null) || (hr != Hresults.S_OK))
        return string.Empty;
    //Save
    hr = pPersistStreamInit.Save(pStream, false);
    if (hr != Hresults.S_OK)
        return string.Empty;
    //Now read from stream....
    //First get the size
    long ulSizeRequired = (long)0;
    //LARGE_INTEGER
    long liBeggining = (long)0;
    System.Runtime.InteropServices.ComTypes.STATSTG statstg = new System.Runtime.InteropServices.ComTypes.STATSTG();
    pStream.Seek(liBeggining, (int)tagSTREAM_SEEK.STREAM_SEEK_SET, m_NullPointer);
    pStream.Stat(out statstg, (int)tagSTATFLAG.STATFLAG_NONAME);
    //Size
    ulSizeRequired = statstg.cbSize;
    if (ulSizeRequired == (long)0)
        return string.Empty;
    //Allocate buffer + read
    byte[] pSource = new byte[ulSizeRequired];
    pStream.Read(pSource, (int)ulSizeRequired, m_NullPointer);
    //Added by schlaup to handle UTF8 and UNICODE pages
    //Convert
    //ASCIIEncoding asce = new ASCIIEncoding();
    //return asce.GetString(pSource);
    Encoding enc = null;
    bool hasBom = false;
    if (pSource.Length > 8)
    {
        // Check byte order mark
        if ((pSource[0] == 0xFF) && (pSource[1] == 0xFE)) // UTF16LE
            enc = Encoding.Unicode;
        if ((pSource[0] == 0xFE) && (pSource[1] == 0xFF)) // UTF16BE
            enc = Encoding.BigEndianUnicode;
        if ((pSource[0] == 0xEF) && (pSource[1] == 0xBB) && (pSource[2] == 0xBF)) //UTF8
            enc = Encoding.UTF8;
        // Remember whether a BOM was actually present in the buffer
        hasBom = (enc != null);
        if (enc == null)
        {
            // Check for alternating zero bytes which might indicate Unicode
            if ((pSource[1] == 0) && (pSource[3] == 0) && (pSource[5] == 0) && (pSource[7] == 0))
                enc = Encoding.Unicode;
        }
    }
    if (enc == null)
        enc = Encoding.Default;
    // Skip the preamble only when a BOM was found; otherwise the
    // zero-byte heuristic would drop the first character of the page
    int bomLength = hasBom ? enc.GetPreamble().Length : 0;
    return enc.GetString(pSource, bomLength, pSource.Length - bomLength);
}
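The encoding step at the end of GetSource can be tried on its own, without any COM interop. The following is only a minimal sketch of the byte-order-mark check, not code from csEXWB: the names SourceDecoder, DetectBom and Decode are hypothetical, the fallback encoding is supplied by the caller, and it assumes UTF-32 pages do not occur (a UTF-32 LE BOM also starts with FF FE and would be misread as UTF-16 LE).

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only (not part of csEXWB).
public static class SourceDecoder
{
    // Returns the encoding implied by a leading byte order mark,
    // or null when the buffer does not start with a known BOM.
    public static Encoding DetectBom(byte[] data)
    {
        if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
            return Encoding.UTF8;
        if (data.Length >= 2 && data[0] == 0xFF && data[1] == 0xFE)
            return Encoding.Unicode;            // UTF-16 little endian
        if (data.Length >= 2 && data[0] == 0xFE && data[1] == 0xFF)
            return Encoding.BigEndianUnicode;   // UTF-16 big endian
        return null;
    }

    // Decodes the buffer, skipping the BOM bytes only when one was found.
    // The fallback encoding is used when no BOM is present.
    public static string Decode(byte[] data, Encoding fallback)
    {
        if (data == null || data.Length == 0)
            return string.Empty;

        Encoding enc = DetectBom(data);
        int skip = 0;
        if (enc != null)
            skip = enc.GetPreamble().Length;
        else
            enc = fallback;

        return enc.GetString(data, skip, data.Length - skip);
    }
}
```

For example, SourceDecoder.Decode(new byte[] { 0xEF, 0xBB, 0xBF, 0x68, 0x69 }, Encoding.Default) returns "hi". Pages without a BOM still depend on the fallback, so the charset declared in the HTTP headers or in a meta tag would probably be a more reliable source for it than Encoding.Default.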
|
|
Sergey Grischenko
Add-in Express team
Posts: 7233
Joined: 2004-07-05
|
Hi André,
Thank you very much for sharing your solution. |
|