Notes from a Linguistic Mystic

My particular form of procrastination is optimization. You can tell I don’t want to cut two bags of potatoes when I’m sharpening the kitchen knives. You can tell I’m uninterested in laundry when I’m cleaning the dryer barrel. And when I didn’t quite know where to go with my dissertation prospectus, well, I decided that I needed to develop a more graceful way to write it.

For the last few years, I’ve written all my large papers in XeLaTeX (using XeLaTeX for Unicode support, making IPA much easier). I love LaTeX, love BibTeX, and love not worrying about formatting. But writing long sections of text in LaTeX kind of sucks, because it’s rather clunky and there are no good editors for LaTeX on mobile devices.

In LaTeX, making text bold requires you to wrap the word or phrase in eight characters worth of tags. Section headings are ugly, and also have accompanying tags. Every %, & or _ must be escaped. LaTeX is powerful for doing complex things, but while writing prose, it just gets in the way.

Why Markdown?

I decided that I’d rather write in Markdown. Markdown is an easy syntax for writing, where you can define section headings as easily as:

# This is a section heading 
## Subsection Heading
### Subsubsection heading

Bold, italic, and bold-italic are as easy as:

**bold**, *italic*, ***bold italic***

Most importantly, it’s designed to be quick to use and type using available symbols. So, in short, writing Markdown doesn’t suck, but I wanted to still use the best of LaTeX, for things like dynamic numbering, BibTeX automatic bibliographies, and easy creation of nice tables.

So, I hacked together a solution using Pandoc, the same software I use to generate this site from Markdown.

Turning Markdown into LaTeX

First, I created two documents which had the preamble code for LaTeX in one (everything up until the first section heading), and the footer info in the other (the bibliography).

Then, I created a markdown file for the meat of the paper, which I’ll later convert into LaTeX and stick between the header and footer. I stuck this markdown file in my Dropbox folder and I edit that markdown file to write the paper, whether on a Mac (using TextMate), or on an iPad or iPhone (using Editorial). You can make individual chapter files and concatenate them, if you’d prefer, but I stuck to one mega-file.
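
A minimal sketch of the concatenate-and-sandwich idea, assuming placeholder file names that you pass in yourself. It is an illustration of the approach, not the script described later in this post:

```shell
# Sketch only: every file name passed to these functions is a placeholder.

# Join chapter files (in the order given) into one Markdown file.
# A blank line is added after each chapter so headings never run together.
join_chapters() (
    out=$1; shift
    : > "$out"
    for chapter in "$@"; do
        cat "$chapter" >> "$out"
        printf '\n' >> "$out"
    done
)

# Sandwich an already-converted LaTeX body between the preamble and footer.
assemble_tex() (
    header=$1 body=$2 footer=$3 out=$4
    cat "$header" "$body" "$footer" > "$out"
)
```

With hypothetical names, `join_chapters all.md ch1.md ch2.md` produces the mega-file, and `assemble_tex header.tex body.tex footer.tex paper.tex` produces the file that LaTeX actually compiles.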

The beautiful thing about this approach is that I can write Markdown, which is readable and pleasant, 95% of the time, and then switch into LaTeX in the same file to add something fancy, such as a \cite{Paper Citation}, a \ref{reference} to a \label{labeled section} or a \footnote{}.

I can also include LaTeX tables, throw in \input{} commands to read other tables in, and use \vspace{} where needed. There’s no penalty to going back and forth, and I have the power of LaTeX when needed, and the easy-pretty of markdown when I’m just writing.

This also allows me to use Stargazer, a package for the R Statistics Suite which allows you to directly output data as pretty LaTeX tables. I just have Stargazer output to a .tex file, then \input{} that .tex file. It’s both wonderful and reproducible, because all of my figures, tables, and models are generated directly by R, so no “copy-paste” errors are possible.

How?!

Well, the joy is in the script that builds the document. When I’d like to see a final version, I run that script in the terminal (or hit Cmd+Option+Control+Shift+PageDown, triggering it through Keyboard Maestro).

Although you’ll want to look at the script itself, which is extensively commented, basically, it does the following:

  1. It copies all of the text from Markdown files, and all of the analysis scripts, into a single place.
  2. It turns the Markdown into a LaTeX file using Pandoc.
  3. It cleans up the output a bit.
  4. It tacks a custom header and footer onto the output, which contains all my style information.
  5. It builds the document and bibliography in LaTeX.
  6. It opens the PDF copy in a PDF reader, and copies the latest PDF version to my dissertation folder.
  7. It builds a .tar.gz archive containing the complete text and analysis scripts, and saves it to a “backups” folder by date.
    • This way, if I mess something up, I can always go back to the last version(s), and I’ve got a way to compare changes if I need to.
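
Steps 2, 5, and 7 above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the commented script itself: the names `paper.tex` and `dissertation-…tar.gz` are placeholders, and the copying, cleanup, and PDF-opening steps are left out.

```shell
# Sketch only: paths and names are placeholders; a real build script does more
# (copying files into place, cleaning up Pandoc's output, opening the PDF).

# Step 2: turn the Markdown body ($1) into a LaTeX fragment ($2).
# Without --standalone, Pandoc writes no preamble of its own.
convert_body() (
    pandoc -f markdown -t latex -o "$2" "$1"
)

# Step 5: the usual LaTeX/BibTeX cycle, run on paper.tex in directory $1.
build_pdf() (
    cd "$1" || exit 1
    xelatex paper.tex && bibtex paper && xelatex paper.tex && xelatex paper.tex
)

# Step 7: archive source directory $1 into backup folder $2,
# with the date and time in the archive name.
backup_sources() (
    src=$1 backups=$2
    mkdir -p "$backups" || exit 1
    stamp=$(date +%Y-%m-%d_%H%M%S)
    tar -czf "$backups/dissertation-$stamp.tar.gz" \
        -C "$(dirname "$src")" "$(basename "$src")"
)
```

Keeping the backup folder outside the directory being archived avoids each new archive swallowing all the older ones.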

It combines the best parts of simple plaintext writing with the best parts of LaTeX, and allows me to be as productive on my phone or iPad as I can be at home (with the exception of rendering a new PDF, and using PocketBib for reading and finding citekeys). In short, it allowed me to write 72,000+ words of dissertation, and not hate my life. I’ve since moved my guide to using Praat to a similar workflow, so I can write it using Markdown too!

Most importantly, though, I’ve found a way to make writing a dissertation geekier than it already was. And that, my friends, is my real accomplishment.

~ ə ~

Hi everybody! Three pieces of news, two language related.

Dr. Vowels?

First, I’ve just completed and submitted to my committee my doctoral dissertation. For those of you not familiar with the American Ph.D system, a Doctoral Dissertation is a large paper (or small book) in which you aim to make a teeny, tiny increase in the world’s knowledge on something or another.

My dissertation, which I described here for a non-linguistic audience, worked to get a better idea of how we humans are able to hear vowel nasality (the difference in the vowel between “pat” and “pant”). Here’s a word-cloud of what I wrote, just for grins:

So, after running four experiments, recording around 4000 words, and writing 170 single-spaced pages, I now have some conclusions, which I’ll present to a committee of six professors from my university on March 18th, and, after they grill me about it for around 2 hours, if they agree I’ve done good work, I will have a Ph.D!

After that, I’m moving to the University of Michigan to act as a Post-Doctoral Research Fellow on a major grant (the “Post-Doc” option discussed here), investigating how people’s perception of different nuances of speech is reflected in their production of these things.

So, between that, publishing my dissertation research, and continuing to work on my guide to phonetic analysis using the Praat software package, I’ll be busy, but I’m also hoping to post a bit more often here.

Site Updates

As part of that process, I’m going to be making a few updates to the site itself. I’ll be going through some old posts to make sure they’re linguistically sound (and maybe add sources), fixing some long-standing formatting issues, and tweaking the code a bit. I’ll keep all the URLs the same, but don’t be too shocked if the odd post disappears, or if some things are updated or changed.

Unrelated…

Finally, as many people who know me will mournfully attest, I’m a lover of puns. I collect and often deploy wordplay, puns, father goose stories, or other humor that makes you hurt and laugh in equal parts.

Over the years, I’ve developed quite a large list of terrible puns, and I’ve decided to put them online, because some people just want to watch the world groan. If you’d like to be in pun-pain, well, go to my #crappypuns website at:

http://savethevowels.org/crappypuns/

I welcome contributions, either by comment, Facebook, or email. And if you ever hear a good fish pun, be sure to let minnow.

(Sorry.)

Post Date: March 3, 2015
Categories: humor - personal -
~ ə ~

I’m kind of a nerd about websites. I’m not content to use Dreamweaver, or just write some code. I always want my websites to be as lightweight and optimal as they can be.

When it comes to web publishing, I’ve always been a bit of a minimalist. Over time, I moved this blog from a hosted solution, to a WordPress install, and then eventually, to Jekyll (that migration process is explained in detail here).

I started off creating my personal site using Jekyll. This was rather a waste, considering that Jekyll’s made for blogs, and that site is really just styled text at its core, with nothing temporal, and no need of fancy tags or pagination. But I still wanted to be able to write in Markdown and style it with CSS afterwards.

So, I got the idea to write everything in markdown, style with CSS, use Pandoc to convert it to HTML, then just drop it onto the server.

Oh, simple!

Actually implementing this was one of those things that’s easy to do once working, but takes forever to get exactly right. The core of it is a single shell script (viewable here, slightly de-identified) which converts the markdown to HTML, then uploads it.

The hardest part of setting all this up was figuring out the syntax of the below command, applied to each folder:

find . -name \*.md -type f -exec pandoc -B includes/spcvhead -A includes/spcvfoot -o {}.html {} \;

In English, it finds any file which ends with “.md” (a markdown file), then executes the pandoc command, including spcvhead (containing the header info, overarching style info, etc.) (B)efore the file, then spcvfoot (A)fter. Then it outputs the result as .html files. If you want different headers/footers in different parts of the site, just run the command on each folder with a different set of includes.
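
Running the same command per folder with different includes can be wrapped in a small function. This is a minimal sketch; the folder and include names in the example comments are hypothetical (only spcvhead and spcvfoot come from the command above).

```shell
# Sketch only: directory and include names are placeholders.

# Convert every .md file under directory $1, wrapping each page in the
# header ($2) and footer ($3). Output lands next to the source as NAME.md.html.
convert_folder() (
    dir=$1 head=$2 foot=$3
    find "$dir" -name '*.md' -type f \
        -exec pandoc -B "$head" -A "$foot" -o {}.html {} \;
)

# Example (hypothetical layout): different chrome for two parts of a site.
# convert_folder blog  includes/bloghead includes/blogfoot
# convert_folder pages includes/spcvhead includes/spcvfoot
```

Quoting the '*.md' pattern does the same job as the backslash in \*.md: it stops the shell from expanding the wildcard before find sees it.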

This gives you folders full of .md.html files, because find substitutes the whole file name, .md extension included, for the {} in “-o {}.html”. The script then goes through and renames those to plain .html files with the below command:

find . -depth -name '*.md.html' -execdir bash -c 'mv -i "$1" "${1//md.html/html}"' bash {} \;
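
The same rename can be sketched without bash-specific syntax. This variant is an assumption-laden alternative, not the command from the script: it uses the POSIX `%` suffix-removal expansion, which only ever touches the end of the path, whereas `//` replaces every occurrence of the pattern.

```shell
# Sketch only. Rename every NAME.md.html under directory $1 to NAME.html.
# "${1%.md.html}" strips the suffix from the end of the path and nowhere else.
rename_md_html() (
    find "$1" -depth -name '*.md.html' -type f \
        -exec sh -c 'mv -i "$1" "${1%.md.html}.html"' sh {} \;
)
```

As in the original command, `mv -i` asks before overwriting an existing .html file rather than silently replacing it.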

Then, it uploads the contents of the site folder (html, css, etc.) to the server using rsync, and goes through and removes the newly generated .html files (to keep the local folder tidy).
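
A minimal sketch of the upload-then-tidy step. The remote target in the example comment is a made-up placeholder, and the cleanup rule (delete an .html file only when a matching .md source sits beside it) is one possible way to single out generated pages, not necessarily what the script does.

```shell
# Sketch only: the remote target is a placeholder, not a real server.

# Mirror the contents of local folder $1 to remote target $2.
# The trailing slash tells rsync to copy the folder's contents, not the folder.
upload_site() (
    rsync -avz "$1/" "$2"
)

# Remove only the generated pages under $1: an .html file is deleted when a
# Markdown source with the same base name sits next to it.
# Hand-written .html files are left alone.
clean_generated() (
    find "$1" -name '*.md' -type f \
        -exec sh -c 'rm -f "${1%.md}.html"' sh {} \;
)

# Example:
# upload_site site user@example.org:public_html/ && clean_generated site
```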

This allows me to write pages, posts, and essays using mostly markdown, with occasional dashes of HTML/CSS to style particular elements (page titles, lists, images).

It works great, and is the closest a CMS has ever come to simply getting out of my way. Hopefully this description (and the shell script that makes it work) will prove useful to others.

~ ə ~

The headline read with difficulty failed

Categories: Language Usage - Language and the Media - Words, Phrases and Idioms

I’d like to take a writing break to point out an absolutely killer Garden Path sentence that Ars Technica just dropped:

"Fake browser warning your uncle might fall for delivers malicious trojan"

The headline, reading “Fake browser warning your uncle might fall for delivers malicious trojan”, is a wonderful example of a Garden Path sentence, a fully grammatical sentence whose structure is misleading, and seems like it should be read one way until later in the sentence the real structure is revealed, when it’s already too late.

These have a tendency to crash human brains. Try these on for size:

  • The horse raced past the barn fell.
  • The old man the boat.
  • The headline read with difficulty failed.

Those should be read as “The horse who was raced past the barn fell down”, and “The oldest people on the boat run (“man”) it”, and that last one just isn’t as rough, as we all understand that headlines cannot read.

Ars’ little Garden Path bomb is particularly spectacular. We start off thinking that there is a fake browser, which is warning us that our uncles may fall, but then the “delivers” hits. At this point, we think we’ve already hit the verb (“warning”), so we’re puzzled by “delivers”.

Only once we go back and read “Fake browser warning your uncle might fall for” as a single noun-phrase can we understand that this fake warning is delivering a malicious trojan.

These sentences are spectacular, and it’s fairly rare to see such a blatant example in the wild (as a Syntactician friend put it, “this one is a doozy”), so, when I saw it, I just had to comment.

Finally, I must point out that my goal isn’t to mock the author. More likely than not, the reporter worked hard missed it.

(Sorry)

Post Date: December 6, 2014
Categories:
~ ə ~

I’ve been too busy teaching, dissertating, and preparing for my Post-Doc to post regularly, but I just figured I’d post a quick update and respond to current events.

I’ve been running the Yosemite beta for a month or so now, and have had no problems at all. The OS is good, my previous tutorial on using IPA fonts with Mac OS X still works, my P2FA install guide still works, and Praat is good-to-go.

Putting aside the strangeness of 10.10 not being the same thing as “10.1” or just “11”, Mac OS X 10.10 “Yosemite” gets my seal of approval.

Seal of Approval

~ ə ~