Extract plain text from HTML email
The Mail.dll MIME and email component can be used to get both the plain-text body and the HTML body of any email message.
If a message contains a plain-text body, no conversion is necessary: simply read the Text property of the IMail interface.
If, however, the email contains only HTML content and no plain text, the GetTextFromHtml method can be used to convert the HTML to plain text.
Converting HTML to plain text involves much more than removing the tags from an HTML document, so the internal conversion process is considerably more sophisticated than anything that can be accomplished with simple regular-expression code.
Mail.dll contains a full-blown HTML parser that handles script tags, comments, CDATA sections and even incorrectly formatted HTML.
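To illustrate why tag stripping alone falls short, here is a minimal sketch of the naive regular-expression approach. It does not use Mail.dll at all. The class name NaiveStripDemo, the helper StripTags and the sample HTML are made up for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class NaiveStripDemo
{
    // Hypothetical helper: deletes everything between '<' and the next '>'.
    // This is the naive approach, not what Mail.dll does internally.
    static string StripTags(string html)
    {
        return Regex.Replace(html, "<[^>]+>", "");
    }

    static void Main()
    {
        // Sample input with a script block, a comment containing '>',
        // and an HTML entity.
        string html =
            "<html><head><script>var a = 1;</script></head>" +
            "<body><!-- note: x > y --><p>Hello &amp; welcome</p></body></html>";

        Console.WriteLine(StripTags(html));
    }
}
```

The printed result still contains the script source (var a = 1;), the tail of the comment that follows the first '>' character, and the undecoded &amp; entity. These are exactly the cases that call for a real parser rather than a pattern match.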
The following C# and VB.NET code reads the plain-text body of an email message, falling back to converting the HTML body when no plain text is present:
// C#
IMail email = ...
string text = "";
if (email.IsText)
    text = email.Text;
else if (email.IsHtml)
    text = email.GetTextFromHtml();
Console.WriteLine(text);

' VB.NET
Dim email As IMail = ...
Dim text As String = ""
If email.IsText Then
    text = email.Text
ElseIf email.IsHtml Then
    text = email.GetTextFromHtml()
End If
Console.WriteLine(text)
You can also use the GetBodyAsText method, which returns the body in plain-text format (it uses either the IMail.Text property or the GetTextFromHtml method).
// C#
IMail email = ...
string text = email.GetBodyAsText();
Console.WriteLine(text);

' VB.NET
Dim email As IMail = ...
Dim text As String = email.GetBodyAsText()
Console.WriteLine(text)