So you have some email content that looks something like this…
MIME-Version: 1.0
References: <CABxEEohuqZBoVpsyY4pOFMYixhU2bzfxgs9tRLbUoV2NJMqCJw@mail.gmail.com>
<CAL5Lp9Xyo0mEQ6-c1yAQ+SuKXrT4Xu5y-7BnvnGS4RMjZOBJ=g@mail.gmail.com>
In-Reply-To: <CAL5Lp9Xyo0mEQ6-c1yAQ+SuKXrT4Xu5y-7BnvnGS4RMjZOBJ=g@mail.gmail.com>
From: Chris <c@sigparser.com>
Date: Wed, 9 Jan 2019 08:36:15 -0800
Message-ID: <CABxEEoizOPyCLkq4+FBGNaw7KC2TJDfTZF5dp8xD9aFjDQoL+Q@mail.gmail.com>
Subject: Re: food for thought
To: Paul <p@sigparser.com>
Content-Type: multipart/related; boundary="000000000000382db9057f0910d6"
--000000000000382db9057f0910d6
Content-Type: multipart/alternative; boundary="000000000000382db0057f0910d5"
--000000000000382db0057f0910d5
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Ok. Just a thought. Got it.
--000000000000382db0057f0910d5
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
<div><div dir=3D"auto">Ok.=C2=A0 Just a thought.=C2=A0 Got it. =C2=A0</div>=
</div><div><br><div class=3D"gmail_quote"><div dir=3D"ltr">On Wed, Jan 9, 2=
You want to decode it into something useful. But how? We’ll go through how to do that in this guide.
You can see the MIME data in Gmail by opening any email, clicking the three dots on the right and clicking Show Original.
Sections
- MIME explained
- Manually Decoding the Content
- Parsing Frameworks - Python, C#, JavaScript, Java, C/C++, PHP
- Code Examples
MIME Explained
MIME stands for Multipurpose Internet Mail Extensions and defines the standard format email clients use when sending and receiving emails behind the scenes. It was created before JSON and XML were popular, which is why the format looks so unusual. You can read the specification, but that can be a bit challenging. Instead, we’ll go over the basics.
MIME supports features like embedded attachments, multiple email body types (plain text and HTML) in the same email, a content transfer encoding for each part, and additional properties that newer email clients can use. The Wikipedia page on MIME has more details.
In the example above you can see how the message is divided into sections by boundary lines, and how each section has its own Content-Type and Content-Transfer-Encoding fields.
You can also see near the top how In-Reply-To is used, BUT you won’t see In-Reply-To defined anywhere in the MIME spec. Instead, it is defined in the Registration of Mail and MIME Header Fields spec, which lists the various fields an email message can have.
Other fields, like DKIM-Signature, help validate that the sender of the message is really the sender.
In the end, you should avoid writing your own parser.
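To make the structure concrete, here is a minimal sketch that uses Python's standard library email package to list each section of a message with its Content-Type and Content-Transfer-Encoding. The message is a shortened stand-in for the example at the top of this guide. The boundary string, the In-Reply-To value and the describe_parts helper are made up for illustration.

```python
import email
from email import policy

# Shortened stand-in for the example message. The boundary and the
# In-Reply-To value are hypothetical, not taken from a real email.
raw_message = "\n".join([
    "MIME-Version: 1.0",
    "From: Chris <c@sigparser.com>",
    "To: Paul <p@sigparser.com>",
    "Subject: Re: food for thought",
    "In-Reply-To: <parent-id@mail.example.com>",
    'Content-Type: multipart/alternative; boundary="example-boundary"',
    "",
    "--example-boundary",
    'Content-Type: text/plain; charset="UTF-8"',
    "Content-Transfer-Encoding: quoted-printable",
    "",
    "Ok.=C2=A0 Just a thought.=C2=A0 Got it.",
    "--example-boundary",
    'Content-Type: text/html; charset="UTF-8"',
    "Content-Transfer-Encoding: quoted-printable",
    "",
    '<div dir=3D"auto">Ok.=C2=A0 Just a thought.=C2=A0 Got it.</div>',
    "--example-boundary--",
    "",
])


def describe_parts(raw):
    """Return the parsed message and a (content type, transfer encoding) pair per part."""
    msg = email.message_from_string(raw, policy=policy.default)
    described = []
    for part in msg.walk():  # walk() yields the container first, then each section
        cte = part.get("Content-Transfer-Encoding")
        described.append((part.get_content_type(), str(cte) if cte is not None else None))
    return msg, described


msg, parts = describe_parts(raw_message)
for content_type, encoding in parts:
    print(content_type, encoding)
print("In-Reply-To:", msg["In-Reply-To"])
```

The multipart container itself has no transfer encoding, so it shows up with None. Only the text/plain and text/html sections carry one.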
Manually Decoding The Content
We’ll show you how to capture the HTML and plain text bodies from MIME format and convert them to a usable form without any code.
If you’re in Gmail for example and click the “…” for an email and click “Show Original” you can see the MIME data.
HTML Section
Find the section header where the Content-Type is text/html.
--9f823aebd27c8d7e34c2ad1ba241f25b1140e7d3745bb216e425308b6182
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html; charset=UTF-8
Mime-Version: 1.0
Everything below that is the HTML. In this case it is encoded as quoted-printable, which means we need to decode it or it won’t render correctly. To decode it, use a web tool like SigParser’s Quoted Printable Decoder, which lets you copy and paste the quoted-printable text into the Encoded box and then click Decode. There are other tools out there, but many of them send your email to their server and store the contents, so be careful with those.
The result will be a usable set of HTML.
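If you would rather not paste email content into any web tool, the same decoding step can be done locally. This is a small sketch using Python's standard quopri module on the HTML fragment from the example at the top of this guide. The decode_quoted_printable helper name is ours, and the charset is assumed to be UTF-8 because that is what the section header declares.

```python
import quopri

# First two encoded HTML lines from the example email. The trailing "="
# on the first line is a soft line break that the decoder removes.
encoded_html = (
    '<div><div dir=3D"auto">Ok.=C2=A0 Just a thought.=C2=A0 Got it. =C2=A0</div>=\n'
    "</div>"
)


def decode_quoted_printable(text, charset="utf-8"):
    """Decode quoted-printable text, then decode the bytes with the section's charset."""
    raw_bytes = quopri.decodestring(text.encode("ascii"))
    return raw_bytes.decode(charset)


html = decode_quoted_printable(encoded_html)
print(html)
```

Here =3D becomes a plain = sign and each =C2=A0 pair becomes a single non-breaking space.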
Plain Text Section
Not all emails have an HTML body, and some include a plain text version as well, which is an approximation of the HTML content.
--9f823aebd27c8d7e34c2ad1ba241f25b1140e7d3745bb216e425308b6182
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=UTF-8
Mime-Version: 1.0
Again, this is encoded as quoted-printable, which means we need to decode it or it won’t render correctly. To decode it, use a tool like SigParser’s Quoted Printable Decoder, which lets you copy and paste the quoted-printable text into the Encoded box and then click Decode.
Content-Transfer-Encoding: base64
What if the Content-Transfer-Encoding is base64 for either text/plain or text/html? In that case you have an extra step to do.
It will look like this.
--_000_BYAPR13MB2294DD720555473A1F0CF57BD23E0BYAPR13MB2294namp_
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
VGhhbmsgeW91LCBQYXVsLg0KDQpGcm9tOiBQYXVsIE1lbmRvemEgPHBtZW5kb3phQHNpZ3BhcnNl
ci5jb20+DQpTZW50OiBGcmlkYXksIEFwcmlsIDI2LCAyMDE5IDY6NDIgQU0NClRvOiBEYW1vbiBT
- Copy all the base64 encoded text. It is the nonsensical text below “Content-Transfer-Encoding” in this case.
- Paste it into the Base64 Decoder textbox and click Decode.
- The result textbox should contain the email.
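The same steps can be done locally with Python's standard base64 module. This is only a sketch. It uses the two base64 lines shown above, which are just the start of the original message, so the decoded text is cut off mid-sentence. The decode_base64_body helper name is ours.

```python
import base64

# The two base64 lines from the example above (only the start of the message).
encoded_body = "\n".join([
    "VGhhbmsgeW91LCBQYXVsLg0KDQpGcm9tOiBQYXVsIE1lbmRvemEgPHBtZW5kb3phQHNpZ3BhcnNl",
    "ci5jb20+DQpTZW50OiBGcmlkYXksIEFwcmlsIDI2LCAyMDE5IDY6NDIgQU0NClRvOiBEYW1vbiBT",
])


def decode_base64_body(text, charset="utf-8"):
    """Decode a base64 body, then decode the bytes with the section's charset."""
    raw_bytes = base64.b64decode(text)  # line breaks in the input are ignored
    return raw_bytes.decode(charset, errors="replace")


decoded = decode_base64_body(encoded_body)
print(decoded)
```

The charset comes from the Content-Type line of the section, which is utf-8 in this example.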
MIME Parsing Tools
Online Services
Code Frameworks
- PHP
- Python
- JavaScript
- .NET/C#
- MimeKit
- Using the Gmail API? Check out our Gmail API Response Parsing guide
- Java
- C/C++
Code Examples
SigParser MIME to JSON Reply Chain Splitter
SigParser has developed a set of tools for parsing details out of emails. We let developers use our email splitting and email cleansing tools in their own products.
- Clean email bodies of signatures and reply chains
- Get email bodies for forwarded emails
- Capture nested email chains in a single MIME message or .eml file
- Windows, Linux and Lambda deployment options
- Cloud API - POST https://ipaas.sigparser.com/api/Mime/ParseString
- Frequent updates as email clients and patterns change
- Usage based and unlimited plans available
Here is an example of a response.
{
"CleanedBodyPlain": "Another response in the chain.\r\n\r\n",
"CleanedBodyHtml": "<div dir=\"ltr\"><div dir=\"ltr\"><div>Another response in the chain. </div><div><br clear=\"all\"></div></div></div>",
"IsSpammyLookingEmailMessage": false,
"IsSpammyLookingSender": false,
"EmailTypes": [
"NormalEmail"
],
"Emails": [
{
"CleanedBodyPlain": "Another response in the chain.\r\n\r\n",
"CleanedBodyHtml": "<div dir=\"ltr\"><div dir=\"ltr\"><div>Another response in the chain. </div><div><br clear=\"all\"></div></div></div>",
"Subject": null,
"Date": "2020-05-11T16:41:16+00:00",
"FromEmailAddress": "paul@example.com",
"FromName": "Paul Mendoza",
"To": [
{
"Name": "Outlook Tester",
"EmailAddress": "outlook.tester@salesforceemail.com"
}
],
"Cc": []
},
{
"CleanedBodyPlain": "This is a reply from the test account.\r\n\r\n",
"CleanedBodyHtml": null,
"Subject": null,
"Date": "2020-05-11T09:40:00",
"FromEmailAddress": "outlook.tester@salesforceemail.com",
"FromName": "Outlook Tester",
"To": [],
"Cc": []
},
{
"CleanedBodyPlain": null,
"CleanedBodyHtml": null,
"Subject": "One more test email at 3:25 PM",
"Date": "2020-04-12T15:25:00",
"FromEmailAddress": "paul@example.com",
"FromName": "Paul Mendoza",
"To": [
{
"Name": "Outlook Tester",
"EmailAddress": "outlook.tester@salesforceemail.com"
}
],
"Cc": []
}
],
"Subject": "Re: One more test email at 3:25 PM",
"Date": "2020-05-11T16:41:16+00:00",
"Headers": {
"mime-version": "1.0",
"date": "Mon, 11 May 2020 09:41:16 -0700",
"references": "<CAL5Lp9VcCVNqeiw0Rry7BHQaTct46qv3BnUvR5-HNqWZO-Xxiw@mail.gmail.com>\r\n\t<BY5PR04MB6819EFA89CDABDFCB9D67D2F8AA10@BY5PR04MB6819.namprd04.prod.outlook.com>",
"in-reply-to": "<BY5PR04MB6819EFA89CDABDFCB9D67D2F8AA10@BY5PR04MB6819.namprd04.prod.outlook.com>",
"message-id": "<CAL5Lp9X0RjYNOo68Y_boL8OOw32gU-SWxLW3WjgYj93eTfUsyQ@mail.gmail.com>",
"subject": "Re: One more test email at 3:25 PM",
"from": "Paul Mendoza <paul@example.com>",
"to": "Outlook Tester <outlook.tester@salesforceemail.com>",
"content-type": "multipart/alternative; boundary=\"00000000000001bd4705a5620460\""
},
"FullPlainTextBody": "Another response in the chain.\n\n*Paul Mendoza*, Founder\nMobile 760-917-3753\nSigParser\npaul@example.com\nSchedule a meeting with me here <https://www.meetingbird.com/m/xxxxxx>\n\nListen to podcasts? I was recently on the *FutureTech Podcast*\n<https://www.futuretechpodcast.com/podcasts/digging-up-the-data-your-company-has-needs-and-cant-access-paul-mendoza-sigparser/>\ntalking about SigParser and use cases other customers are using it for.\n\n\nOn Mon, May 11, 2020 at 9:40 AM Outlook Tester <\noutlook.tester@salesforceemail.com> wrote:\n\n> This is a reply from the test account.\n>\n>\n>\n> *From:* Paul Mendoza <paul@example.com>\n> *Sent:* Sunday, April 12, 2020 3:25 PM\n> *To:* Outlook Tester <outlook.tester@salesforceemail.com>\n> *Subject:* One more test email at 3:25 PM\n>\n>\n>\n>\n> *Paul Mendoza, *Founder\n>\n> Mobile 760-917-3753\n>\n> SigParser\n>\n> paul@example.com\n>\n> Schedule a meeting with me here <https://www.meetingbird.com/m/xxxxxx>\n>\n> Listen to podcasts? I was recently on the *FutureTech Podcast*\n> <https://www.futuretechpodcast.com/podcasts/digging-up-the-data-your-company-has-needs-and-cant-access-paul-mendoza-sigparser/>\n> talking about SigParser and use cases other customers are using it for.\n>\n",
"FullHtmlBody": "<div dir=\"ltr\"><div dir=\"ltr\"><div>Another response in the chain. </div><div><br clear=\"all\"><div><div dir=\"ltr\" class=\"gmail_signature\" data-smartmail=\"gmail_signature\"><div dir=\"ltr\"><div><div dir=\"ltr\"><div><div dir=\"ltr\"><div><div dir=\"ltr\"><div dir=\"ltr\"><div dir=\"ltr\"><div dir=\"ltr\"><font color=\"#3d85c6\" face=\"tahoma, sans-serif\" style=\"font-size:12.8px\"><b>Paul Mendoza</b></font><font color=\"#3d85c6\" face=\"tahoma, sans-serif\" style=\"font-size:12.8px;font-weight:bold\">, </font><span style=\"font-size:12.8px;color:rgb(61,133,198);font-family:tahoma,sans-serif\">Founder</span><div style=\"font-size:12.8px\"><div><font color=\"#666666\" size=\"2\" face=\"arial narrow, sans-serif\">Mobile 760-917-3753</font></div><div><font color=\"#666666\" size=\"2\" face=\"arial narrow, sans-serif\">SigParser</font></div><div><a href=\"mailto:paul@example.com\" style=\"font-family:tahoma,sans-serif;font-size:12.8px;color:rgb(17,85,204)\" target=\"_blank\">paul@example.com</a><br></div><div><a href=\"https://www.meetingbird.com/m/xxxxxx\" target=\"_blank\">Schedule a meeting with me here</a></div><div><img src=\"https://drive.google.com/a/sigparser.com/uc?id=1GUhMvrGnJMCfkge1HMqyKFQCLSJNXcw-&export=download\" width=\"200\" height=\"90\"><br></div></div>Listen to podcasts? I was recently on the <a href=\"https://www.futuretechpodcast.com/podcasts/digging-up-the-data-your-company-has-needs-and-cant-access-paul-mendoza-sigparser/\" target=\"_blank\"><b>FutureTech Podcast</b></a> talking about SigParser and use cases other customers are using it for. 
</div></div></div></div></div></div></div></div></div></div></div></div><br></div></div><br><div class=\"gmail_quote\"><div dir=\"ltr\" class=\"gmail_attr\">On Mon, May 11, 2020 at 9:40 AM Outlook Tester <<a href=\"mailto:outlook.tester@salesforceemail.com\">outlook.tester@salesforceemail.com</a>> wrote:<br></div><blockquote class=\"gmail_quote\" style=\"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex\">\n\n\n\n\n\n<div lang=\"EN-US\">\n<div class=\"gmail-m_-2662285044572695259WordSection1\">\n<p class=\"MsoNormal\">This is a reply from the test account.<u></u><u></u></p>\n<p class=\"MsoNormal\"><u></u> <u></u></p>\n<div style=\"border-right:none;border-bottom:none;border-left:none;border-top:1pt solid rgb(225,225,225);padding:3pt 0in 0in\">\n<p class=\"MsoNormal\"><b>From:</b> Paul Mendoza <<a href=\"mailto:paul@example.com\" target=\"_blank\">paul@example.com</a>> <br>\n<b>Sent:</b> Sunday, April 12, 2020 3:25 PM<br>\n<b>To:</b> Outlook Tester <<a href=\"mailto:outlook.tester@salesforceemail.com\" target=\"_blank\">outlook.tester@salesforceemail.com</a>><br>\n<b>Subject:</b> One more test email at 3:25 PM<u></u><u></u></p>\n</div>\n<p class=\"MsoNormal\"><u></u> <u></u></p>\n<div>\n<p class=\"MsoNormal\"><br clear=\"all\">\n<u></u><u></u></p>\n<div>\n<div>\n<div>\n<div>\n<div>\n<div>\n<div>\n<div>\n<div>\n<div>\n<div>\n<div>\n<p class=\"MsoNormal\"><b><span style=\"font-size:9.5pt;font-family:Tahoma,sans-serif;color:rgb(61,133,198)\">Paul Mendoza, </span></b><span style=\"font-size:9.5pt;font-family:Tahoma,sans-serif;color:rgb(61,133,198)\">Founder</span><u></u><u></u></p>\n<div>\n<div>\n<p class=\"MsoNormal\"><span style=\"font-size:10pt;font-family:"Arial Narrow",sans-serif;color:rgb(102,102,102)\">Mobile 760-917-3753</span><span style=\"font-size:9.5pt\"><u></u><u></u></span></p>\n</div>\n<div>\n<p class=\"MsoNormal\"><span style=\"font-size:10pt;font-family:"Arial 
Narrow",sans-serif;color:rgb(102,102,102)\">SigParser</span><span style=\"font-size:9.5pt\"><u></u><u></u></span></p>\n</div>\n<div>\n<p class=\"MsoNormal\"><span style=\"font-size:9.5pt\"><a href=\"mailto:paul@example.com\" target=\"_blank\"><span style=\"font-family:Tahoma,sans-serif;color:rgb(17,85,204)\">paul@example.com</span></a><u></u><u></u></span></p>\n</div>\n<div>\n<p class=\"MsoNormal\"><span style=\"font-size:9.5pt\"><a href=\"https://www.meetingbird.com/m/xxxxxx\" target=\"_blank\">Schedule a meeting with me here</a><u></u><u></u></span></p>\n</div>\n<div>\n<p class=\"MsoNormal\"><span style=\"font-size:9.5pt\"><img border=\"0\" width=\"200\" height=\"90\" style=\"width: 2.0833in; height: 0.9375in;\" id=\"gmail-m_-2662285044572695259_x0000_i1025\" src=\"https://ci6.googleusercontent.com/proxy/TTpjUlFcjmphqTPKcbTFGb7TsHUk5MzP3P1Wt2uZYLjMzlO0UPeF7MAgaUwFk4hqlFafCMhmzlmkc3FUbGH4ijNXkqx9DAsv-_3CFnCTmZaZhMlONJqrrR-oGfWMfwqGpDgk301HHsijRMhsymfOCkhNKg=s0-d-e1-ft#https://drive.google.com/a/sigparser.com/uc?id=1GUhMvrGnJMCfkge1HMqyKFQCLSJNXcw-&export=download\"></span><span style=\"font-size:9.5pt\"><u></u><u></u></span></p>\n</div>\n</div>\n<p class=\"MsoNormal\">Listen to podcasts? I was recently on the <a href=\"https://www.futuretechpodcast.com/podcasts/digging-up-the-data-your-company-has-needs-and-cant-access-paul-mendoza-sigparser/\" target=\"_blank\">\n<b>FutureTech Podcast</b></a> talking about SigParser and use cases other customers are using it for.\n<u></u><u></u></p>\n</div>\n</div>\n</div>\n</div>\n</div>\n</div>\n</div>\n</div>\n</div>\n</div>\n</div>\n</div>\n</div>\n</div>\n</div>\n\n</blockquote></div></div>\n"
}
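Once you have a response like this, walking the reply chain is plain JSON handling. The sketch below assumes the response shape shown above and uses a trimmed copy with only a few of its fields. The summarize_chain helper is a hypothetical name and is not part of any SigParser library.

```python
import json

# Trimmed copy of the response above, keeping only the fields used below.
response_text = r"""
{
  "Subject": "Re: One more test email at 3:25 PM",
  "Emails": [
    {"FromName": "Paul Mendoza", "Date": "2020-05-11T16:41:16+00:00",
     "CleanedBodyPlain": "Another response in the chain.\r\n\r\n"},
    {"FromName": "Outlook Tester", "Date": "2020-05-11T09:40:00",
     "CleanedBodyPlain": "This is a reply from the test account.\r\n\r\n"},
    {"FromName": "Paul Mendoza", "Date": "2020-04-12T15:25:00",
     "CleanedBodyPlain": null}
  ]
}
"""


def summarize_chain(text):
    """Return (sender, date, cleaned body) for every email in the chain."""
    data = json.loads(text)
    rows = []
    for item in data["Emails"]:
        # CleanedBodyPlain can be null, so fall back to an empty string.
        body = (item["CleanedBodyPlain"] or "").strip()
        rows.append((item["FromName"], item["Date"], body))
    return rows


chain = summarize_chain(response_text)
for sender, date, body in chain:
    print(f"{date}  {sender}: {body or '(no cleaned body)'}")
```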
You can get a MIME email from Gmail by opening an email, clicking the three dots, clicking Show Original and then clicking Copy to Clipboard.
PHP Parse Email
Example from the GitHub page for php-mime-mail-parser
<?php
// Include the library first
require_once __DIR__.'/vendor/autoload.php';
$path = 'path/to/mail.txt';
$Parser = new PhpMimeMailParser\Parser();
// There are four methods available to indicate which mime mail to parse.
// You only need to use one of the following four:
// 1. Specify a file path to the mime mail.
$Parser->setPath($path);
// 2. Specify a php file resource (stream) to the mime mail.
$Parser->setStream(fopen($path, "r"));
// 3. Specify the raw mime mail text.
$Parser->setText(file_get_contents($path));
// 4. Specify a stream to work with mail server
$Parser->setStream(fopen("php://stdin", "r"));
// Once we've indicated where to find the mail, we can parse out the data
$to = $Parser->getHeader('to'); // "test" <test@example.com>, "test2" <test2@example.com>
$addressesTo = $Parser->getAddresses('to');
// Returns an array: [["display"=>"test", "address"=>"test@example.com", "is_group"=>false], ["display"=>"test2", "address"=>"test2@example.com", "is_group"=>false]]
$from = $Parser->getHeader('from'); // "test" <test@example.com>
$addressesFrom = $Parser->getAddresses('from');
//Return an array : [["display"=>"test", "address"=>"test@example.com", "is_group"=>false]]
$subject = $Parser->getHeader('subject');
$text = $Parser->getMessageBody('text');
$html = $Parser->getMessageBody('html');
....
Python Parse Email
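import email
from email import policy
# emailtext holds the raw MIME message as a string (a tiny stand-in here)
emailtext = "From: example@example.com\nTo: example2@something.com\nSubject: Hi\n\nHello"
msg = email.message_from_string(emailtext, policy=policy.default)
msg['from'] # 'example@example.com'
msg['to'] # 'example2@something.com'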
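Headers are only half of what you usually want. Here is a hedged sketch of pulling the decoded plain text and HTML bodies with the standard library, roughly what getMessageBody('text') and getMessageBody('html') do in the PHP example. The sample message, its boundary string and the extract_bodies helper are made up for illustration.

```python
import email
from email import policy

# Small stand-in message; the boundary string is hypothetical.
raw_message = "\n".join([
    "MIME-Version: 1.0",
    "From: Chris <c@sigparser.com>",
    "To: Paul <p@sigparser.com>",
    "Subject: Re: food for thought",
    'Content-Type: multipart/alternative; boundary="example-boundary"',
    "",
    "--example-boundary",
    'Content-Type: text/plain; charset="UTF-8"',
    "Content-Transfer-Encoding: quoted-printable",
    "",
    "Ok.=C2=A0 Just a thought.=C2=A0 Got it.",
    "--example-boundary",
    'Content-Type: text/html; charset="UTF-8"',
    "Content-Transfer-Encoding: quoted-printable",
    "",
    '<div dir=3D"auto">Ok.=C2=A0 Just a thought.=C2=A0 Got it.</div>',
    "--example-boundary--",
    "",
])


def extract_bodies(raw):
    """Return sender addresses, subject, and the decoded text and HTML bodies."""
    msg = email.message_from_string(raw, policy=policy.default)
    plain_part = msg.get_body(preferencelist=("plain",))
    html_part = msg.get_body(preferencelist=("html",))
    return {
        "from": [(a.display_name, a.addr_spec) for a in msg["from"].addresses],
        "subject": str(msg["subject"]),
        # get_content() undoes the transfer encoding and applies the charset.
        "text": plain_part.get_content() if plain_part is not None else None,
        "html": html_part.get_content() if html_part is not None else None,
    }


result = extract_bodies(raw_message)
print(result["from"])
print(result["text"])
print(result["html"])
```

Passing policy=policy.default is what makes get_body and get_content available. Without it, the parser returns the older Message class, which does not have those methods.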
JavaScript Parse Email
Example from emailjs-mime-parser
npm install --save emailjs-mime-parser
import parse from 'emailjs-mime-parser'
parse(String) -> MimeNode
C#/.NET MIME Parser
Example from MimeKit
var parser = new MimeParser (stream, MimeFormat.Entity);
var message = parser.ParseMessage ();