jadon
fowler

compilers, VMs, & operating systems


Rant on Evernote's ENML

Evernote represents the content of your notes in the Evernote Markup Language (ENML), and I’m here to tell you why it’s the worst subset of XML I’ve ever used, and probably the worst markup language I’ve ever had to deal with.

When working with the Evernote API, I needed to extract values out of a specific note’s content. This content is stored by Evernote in a format called ENML. It is by far the worst format I have ever used, and I’m scared for anyone else who has to use it in production code.

What exactly is ENML?

ENML is a subset of XHTML, itself an XML format, that adds some new elements specific to Evernote. This doesn’t sound so bad, but let’s take a look at its list of “Prohibited Elements”, and specifically at its list of prohibited attributes.

Two very important attributes are prohibited in ENML: id and class. Without them, there is no standard way to identify a piece of data. Well, obviously Evernote must be identifying data some way. But how? Let’s look at a sample Business Card note. I’m going to break this up so we can take a look at everything wrong with it.

Dumpster Diving

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">
<en-note style="padding-left: 0px 10px 0px 10px;
  font-size: 14px;
  line-height: 20px;
  color: #808080;
  font-family: Helvetica;">

  <!-- begin span 1 -->
  <div style="x-evernote:contact">

    <!-- begin div 1 -->
    <div style="height: 100%;">

      <div style="x-evernote:contact-info-section">
        <!-- begin div 2 -->
        <div>

The note starts with XML headers, and then we get into the meat. en-note is an element added to designate a note. Seems simple enough. It has some inline CSS. If you recall, ENML doesn’t allow class attributes. This means that all styling has to be inlined, and it is everywhere. Almost every element in the document has styling, so you’ll see giant blocks like

<div style="float: left; clear: left; min-width: 400px;overflow:hidden; _overflow:visible; zoom:1;">

all throughout the file. More specifically, this is the only way Evernote stores data.

Yes, you read that right. Evernote stores the data in style attributes. How do I select the div containing contact info? I just have to select div[style^='x-evernote:contact-info-section']. No big deal, except that’s the worst design I’ve ever seen. Disallowing attributes like id and then identifying elements through style is absurd and defeats the purpose. Wait, what is the purpose? There’s no reason to disallow these attributes.
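To make the lookup concrete, here is a minimal Python sketch that finds elements by their x-evernote pseudo-property. It is not Evernote’s API: `parse_style`, `EvernoteTagFinder`, and the trimmed `SAMPLE` markup are names made up for illustration, and it uses the standard library’s `html.parser` on the assumption that a lenient parser is good enough for this kind of scraping.

```python
from html.parser import HTMLParser


def parse_style(style):
    """Split an inline style string into a {property: value} dict."""
    declarations = {}
    for chunk in style.split(";"):
        # Partition on the first colon only, so "x-evernote:contact" splits cleanly.
        name, sep, value = chunk.partition(":")
        if sep:
            declarations[name.strip().lower()] = value.strip()
    return declarations


class EvernoteTagFinder(HTMLParser):
    """Record the tag name of every element whose x-evernote value matches."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.matches = []

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style") or ""
        if parse_style(style).get("x-evernote") == self.wanted:
            self.matches.append(tag)


# Trimmed-down version of the note header shown earlier (hypothetical sample).
SAMPLE = """
<div style="x-evernote:contact">
  <div style="height: 100%;">
    <div style="x-evernote:contact-info-section"><div></div></div>
  </div>
</div>
"""

finder = EvernoteTagFinder("contact-info-section")
finder.feed(SAMPLE)
print(finder.matches)  # ['div']
```

Comparing the parsed value exactly, instead of substring matching on the whole style string, avoids one more trap: a selector like div[style*='x-evernote:contact'] would also hit contact-info-section.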

But I’m ranting too much, let’s get back to the shit storm.

          <!-- begin div 4 -->
          <div style="margin: 20px 35px 20px 10px;
            width: 330px;
            float: left;">

            <!-- begin div 8 - PHOTO -->
            <div style="float:left;x-evernote:profile-image;-evernote-editable:profile-image;">
              <div style="width:130px; height; 130px; display:block;">&nbsp;</div>
            </div>
            <!-- end div 8 -->

            <!-- begin div 9 - JOB TITLE/COMPANY -->
            <div style="float: left; width: 150px; padding-left: 25px;">

              <p style="display:none;margin:0;padding:0; ">

              <span style="x-evernote:family-name;margin:0;padding:0;"></span>
              <span style="x-evernote:given-name;margin:0;padding:0;"></span>

              </p>

I’m not using any of this. You can see the default exporting gives us some weird formatting, but I’m not surprised. Let’s see something interesting.

<p style="color:black;
  font-family:Helvetica;
  font-size:14px;
  line-height:20px;
  margin:0;
  padding:0;">

  <span>
    <span style="x-evernote:display-as;
      -evernote-editable:field;
      font-size: 18px;
      display: block;
      margin-bottom: 2px;
      font-family: Helvetica;
      color: #5f5f5f;
      line-height: 24px;">Billy Bob</span>
  </span>

  <span>
    <span style="x-evernote:contact-title;
      -evernote-editable:field;
      font-size: 18px;
      display: block;
      margin-bottom: 2px;
      font-family: Helvetica;
      color: #5f5f5f;
      line-height: 24px;">Chief Executive Officer</span>
  </span>

  <span>
    <span style="x-evernote:contact-org;
      -evernote-editable:field;
      font-size: 16px;
      font-family: Helvetica;
      color: #6f6f6f;
      line-height: 22px;">Awesome Inc.</span>
  </span>

I would strip out the styling, but I want you to see how horrid this is. Here we get to some juicy info. Name is stored under x-evernote:display-as, job title under x-evernote:contact-title, and company under x-evernote:contact-org.
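Pulling those three values out could look roughly like the sketch below. The `FIELDS` mapping, `evernote_key`, `ContactFieldParser`, and the shortened `SAMPLE` are all assumptions for illustration; the only things taken from the note are the x-evernote keys and the text they wrap.

```python
from html.parser import HTMLParser

# x-evernote key -> friendlier field name (the names on the right are made up).
FIELDS = {"display-as": "name", "contact-title": "title", "contact-org": "company"}


def evernote_key(style):
    """Return the value of the x-evernote pseudo-property, or None."""
    for chunk in (style or "").split(";"):
        name, sep, value = chunk.partition(":")
        if sep and name.strip() == "x-evernote":
            return value.strip()
    return None


class ContactFieldParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []   # x-evernote key (or None) for each open element
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        self.stack.append(evernote_key(dict(attrs).get("style")))

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Only keep text whose immediate parent carries one of the known keys.
        key = self.stack[-1] if self.stack else None
        if key in FIELDS and data.strip():
            self.fields[FIELDS[key]] = data.strip()


# Same spans as above with most of the styling trimmed (hypothetical sample).
SAMPLE = """
<span><span style="x-evernote:display-as; -evernote-editable:field; font-size: 18px;">Billy Bob</span></span>
<span><span style="x-evernote:contact-title; -evernote-editable:field;">Chief Executive Officer</span></span>
<span><span style="x-evernote:contact-org; -evernote-editable:field;">Awesome Inc.</span></span>
"""

parser = ContactFieldParser()
parser.feed(SAMPLE)
print(parser.fields)
# {'name': 'Billy Bob', 'title': 'Chief Executive Officer', 'company': 'Awesome Inc.'}
```

The stack assumes every opened element is closed, which holds for well-formed ENML but not for arbitrary HTML.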

If we were using ids, it’d be as easy as

<p id="name">Billy Bob</p>
<p id="title">Chief Executive Officer</p>
<p id="company">Awesome Inc.</p>

but we’re dealing with ENML. Alright, what could possibly be next?

<!-- begin div 5 -->
<div style="float: left;
  margin: 0;
  margin-top: 20px;
  margin-left: 15px;
  padding-bottom: 18px;
  font-size: 15px;
  line-height: 26px;
  min-width: 400px;">

      <!-- NOTE: I formatted the code below because no one wants to read a one-liner.
           Yes, this was a one-liner. Please end my suffering.
      -->
      <div style="float: left; clear: left; min-width: 400px;overflow:hidden; _overflow:visible; zoom:1;">
          <div style="margin: 0;padding: 0;text-align: left;overflow:hidden; _overflow:visible; zoom:1;margin-bottom:6px;">
              <div style="x-evernote:email; -evernote-editable:email; word-wrap: break-word;">
                  <div style="float: left;margin: 0;padding: 0;width: 77px;text-align: left;">
                      <p style="margin:0;padding:0;">
                          <span style="-webkit-appearance: none;border: none;color: #aaaaaa;font-size: 14px;font-family: Helvetica;">
                              <span style="x-evernote:context;-evernote-context-name:email">email</span>
                          </span>
                      </p>
                  </div>
                  <div style="x-evernote:value; color: #6f6f6f;">
                      <a href="mailto:[email protected]">[email protected]</a>
                  </div>
              </div>
          </div>
      </div>

Emails and phone numbers use this same format: an unnamed div wrapping more divs that hold the important data. Emails are quite easy to extract, as all you have to do is search for a[href*='mailto'] inside div[style*='x-evernote:email']. Let’s look at phone numbers.
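Before that, the email lookup can be sketched in Python without a CSS selector engine. This is only an illustration: `EmailParser` is a made-up name, the addresses in `SAMPLE` are placeholders on example.com, and the second link is there to show that a mailto outside the email section is ignored.

```python
from html.parser import HTMLParser


class EmailParser(HTMLParser):
    """Collect mailto: addresses found inside an x-evernote:email element."""

    def __init__(self):
        super().__init__()
        self.stack = []    # True for each open element that is an email section
        self.emails = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Drop all whitespace, then compare whole declarations, not substrings.
        declarations = "".join((attrs.get("style") or "").split()).split(";")
        self.stack.append("x-evernote:email" in declarations)
        href = attrs.get("href") or ""
        if tag == "a" and any(self.stack) and href.startswith("mailto:"):
            self.emails.append(href[len("mailto:"):])

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()


# Placeholder markup modelled on the email block above (addresses are invented).
SAMPLE = """
<div style="x-evernote:email; -evernote-editable:email; word-wrap: break-word;">
  <div style="x-evernote:value; color: #6f6f6f;">
    <a href="mailto:billy@example.com">billy@example.com</a>
  </div>
</div>
<a href="mailto:other@example.com">not part of the contact</a>
"""

parser = EmailParser()
parser.feed(SAMPLE)
print(parser.emails)  # ['billy@example.com']
```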

<!-- NOTE: Again, I formatted the below code. Evernote stores them in one-liners because they want to watch the world burn. -->
<div style="float: left; clear: left; min-width: 400px;overflow:hidden; _overflow:visible; zoom:1;">
    <div style="margin: 0;padding: 0;text-align: left;overflow:hidden; _overflow:visible; zoom:1;margin-bottom:6px;">
        <div style="x-evernote:phone; -evernote-editable:phone; word-wrap: break-word;">
            <div style="float: left;margin: 0;padding: 0;width: 77px;text-align: left;">
                <p style="margin:0;padding:0;">
                    <span style="-webkit-appearance: none;border: none;color: #aaaaaa;font-size: 14px;font-family: Helvetica;">
                        <span style="x-evernote:context;-evernote-context-name:phone">phone</span>
                    </span>
                </p>
            </div>
            <span style="x-evernote:value; color: #6f6f6f;">(111) 222-3333</span>
        </div>
    </div>
</div>

<div style="float: left; clear: left; min-width: 400px;overflow:hidden; _overflow:visible; zoom:1;">
    <div style="margin: 0;padding: 0;text-align: left;overflow:hidden; _overflow:visible; zoom:1;margin-bottom:6px;">
        <div style="x-evernote:phone; -evernote-editable:phone; word-wrap: break-word;">
            <div style="float: left;margin: 0;padding: 0;width: 77px;text-align: left;">
                <p style="margin:0;padding:0;">
                    <span style="-webkit-appearance: none;border: none;color: #aaaaaa;font-size: 14px;font-family: Helvetica;">
                        <span style="x-evernote:context;-evernote-context-name:mobile">mobile</span>
                    </span>
                </p>
            </div>
            <span style="x-evernote:value; color: #6f6f6f;">(111) 222-3334</span>
        </div>
    </div>
</div>

These are a lot worse, as the values are stored in arbitrary spans. x-evernote:value seems to be quite popular. Selecting div[style*='x-evernote:phone'] will get you a list of divs, each containing the type of phone number and the number itself, in span[style*='x-evernote:context'] and span[style*='x-evernote:value'], respectively.
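Pairing each label with its number might be sketched like this. Again, `PhoneParser`, `evernote_key`, and the trimmed `SAMPLE` are assumptions for illustration, and the sketch relies on every x-evernote:phone element containing one context span and one value span, as in the note above.

```python
from html.parser import HTMLParser


def evernote_key(style):
    """Return the value of the x-evernote pseudo-property, or None."""
    for chunk in (style or "").split(";"):
        name, sep, value = chunk.partition(":")
        if sep and name.strip() == "x-evernote":
            return value.strip()
    return None


class PhoneParser(HTMLParser):
    """Build one {'context': ..., 'value': ...} record per x-evernote:phone element."""

    def __init__(self):
        super().__init__()
        self.stack = []      # x-evernote key (or None) for each open element
        self.current = None  # record being filled while inside a phone element
        self.phones = []

    def handle_starttag(self, tag, attrs):
        key = evernote_key(dict(attrs).get("style"))
        self.stack.append(key)
        if key == "phone":
            self.current = {}

    def handle_endtag(self, tag):
        if not self.stack:
            return
        # Closing the phone element finishes the current record.
        if self.stack.pop() == "phone" and self.current is not None:
            self.phones.append(self.current)
            self.current = None

    def handle_data(self, data):
        key = self.stack[-1] if self.stack else None
        if self.current is not None and key in ("context", "value") and data.strip():
            self.current[key] = data.strip()


# The two phone blocks from above with the layout-only wrappers removed.
SAMPLE = """
<div style="x-evernote:phone; -evernote-editable:phone; word-wrap: break-word;">
  <span style="x-evernote:context;-evernote-context-name:phone">phone</span>
  <span style="x-evernote:value; color: #6f6f6f;">(111) 222-3333</span>
</div>
<div style="x-evernote:phone; -evernote-editable:phone; word-wrap: break-word;">
  <span style="x-evernote:context;-evernote-context-name:mobile">mobile</span>
  <span style="x-evernote:value; color: #6f6f6f;">(111) 222-3334</span>
</div>
"""

parser = PhoneParser()
parser.feed(SAMPLE)
print(parser.phones)
# [{'context': 'phone', 'value': '(111) 222-3333'}, {'context': 'mobile', 'value': '(111) 222-3334'}]
```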


A simple JSON file would have been a whole lot better than all these random elements with random attributes.

{
  "contact": {
    "name": "Billy Bob",
    "title": "Chief Executive Officer",
    "company": "Awesome Inc.",
    "email": "[email protected]",
    "phoneNumbers": {
      "work": "(111) 222-3333",
      "mobile": "(111) 222-3334"
    }
  }
}

This is a lot cleaner than the awful attempt at an “XML subset” that Evernote provides. This language is supposed to help developers access the data of a note easily, and that’s the last thing it’s doing.