Extract plain text from HTML email

The Mail.dll MIME and email component can be used to get both the plain-text body and the HTML body of any email message.

If a message contains a plain-text body, no conversion is necessary. Simply read the Text property of the IMail interface.

If, however, the email does not contain plain text and only HTML content is available, the GetTextFromHtml method can be used to convert the HTML to plain text.

The internal conversion process is much more sophisticated than what can be accomplished with simple regular-expression code. Converting HTML to plain text involves much more than removing the tags from an HTML document.

Mail.dll contains a full-blown HTML parser that handles script tags, comments, CDATA sections, and even incorrectly formatted HTML.
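To see why tag stripping alone falls short, here is a minimal sketch of the naive regular-expression approach. It is not Mail.dll code: the NaiveHtmlStripper class and its StripTags method are hypothetical names used only for this illustration, and the sample HTML is made up.

```csharp
// C#

using System;
using System.Text.RegularExpressions;

public static class NaiveHtmlStripper
{
    // Hypothetical helper: deletes everything that looks like a tag.
    // It knows nothing about scripts, styles, or HTML entities.
    public static string StripTags(string html)
    {
        return Regex.Replace(html, "<[^>]*>", "");
    }
}

public static class Program
{
    public static void Main()
    {
        string html =
            "<html><head><style>p { color: red; }</style>" +
            "<script>var x = 1;</script></head>" +
            "<body><p>Tom &amp; Jerry</p><!-- note --></body></html>";

        // Prints: p { color: red; }var x = 1;Tom &amp; Jerry
        Console.WriteLine(NaiveHtmlStripper.StripTags(html));
    }
}
```

The style rules and the script source leak into the result, and the &amp; entity is left undecoded. These are the kinds of cases that call for a real HTML parser rather than a pattern match.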

The following C# and VB.NET code extracts plain text from the email message, converting the HTML body only when no plain-text body is available:

// C#

IMail email = ...

string text = "";
if (email.IsText)
    text = email.Text;
else if (email.IsHtml)
    text = email.GetTextFromHtml();
Console.WriteLine(text);

' VB.NET

Dim email As IMail = ...

Dim text As String = ""
If email.IsText Then
    text = email.Text
ElseIf email.IsHtml Then
    text = email.GetTextFromHtml()
End If
Console.WriteLine(text)

You can also use the GetBodyAsText method, which returns the body in plain-text format (internally it uses the IMail.Text property or the GetTextFromHtml method).

// C#

IMail email = ...

string text = email.GetBodyAsText();
Console.WriteLine(text);

' VB.NET

Dim email As IMail = ...

Dim text As String = email.GetBodyAsText()
Console.WriteLine(text)
