🤓 danott.co

Today I Learned

A byte order mark is an optional character that may appear at the beginning of a text stream. The unicode character U+FEFF can be used to signal several things to a program reading the text.

My program reading the text is Ruby’s JSON.parse. The text being read a JSON payload from the Pinboard API.

A helpful email exchange with Pinboard support is what pointed me in the right direction. A recent server upgrade began inserting the byte order mark where it wasn’t present before. It sounds like this insertion is going to be removed, because it makes the raw JSON payloads un-parsable by JSON.parse in both Ruby and JavaScript.

In the meantime, I wanted to get my parsing fixed, so I could keep my Pinboard powered links up-to-date. With the tip from Pinboard support and some StackOverflow I got my JSON parsing fixed.

  def load_remote_json
    uri = URI("https://api.pinboard.in/v1/posts/all?tag=danott.co&format=json&auth_token=#{auth_token}")
    body = Net::HTTP.get(uri)
+   body = strip_byte_order_mark(body)
    JSON.parse(body)
  end

+ def strip_byte_order_mark(potentially_unparsable)
+   potentially_unparsable.dup.sub("\xEF\xBB\xBF".force_encoding("UTF-8"), "")
+ end

This experience highlighted the tradeoffs of using vcr for tests. On the positive side, I had a cassette recording that captured the old API response. This allowed me to diff the old response and the new response to dig down and find what changed. On the negative side, I didn’t encounter the failure in parsing until it broke my link importing in production.

Tradeoffs.