Saturday, February 26, 2011

1998: Best. Year. Ever.


The following story is true, if somewhat apocryphal.

In 1998, with a whirlwind of buzz and activity swirling around outside, my life was buried in programming. Day and night, all hours, building exciting new things that never existed before. To see the newfound power of the web used in real businesses, watching the web grow exponentially, making new connections, new discoveries, new inventions, it seemed to come by the hour. Hacking, hacking, hacking. My whole life I had a love of programming, and it felt like this was my moment. It was pure magic.

Which is not to say it was all work. Once in a while I might find a glass of wine next to my computer. Somebody must have put it there. I take a sip of wine and go back to work. Hack, hack, hack, it's all coming together, all the connections, the logical structure. Another sip of wine. Hack, hack, the structure became a little less logical, recursion became loopy, and I was getting tipsy. I stop. I look at the wine, I look at the computer, then I look up. It's 5pm on a Friday and the weekend has begun. I turn off the computer, pick up my glass of wine and step outside.

Our office was situated in one of those quaint downtown main streets that exist up and down the peninsula. We had a store front converted to hipster office space, and on a typical Friday after work, we could just move some chairs and tables outside for an impromptu cafe, with wine and cheese, talking about the future of the web, or maybe hearing an old war story from the ARPAnet days.

My neighbors, Bill and Christine, had a starship bridge in their home. We would gather to watch Star Trek on the view screen and maybe play around with Bill’s battlebot. Some evenings we would attend a meeting of the recently-formed Web Guild, and some nights, we would find out about a big dot-com launch party that everybody was crashing.

I don’t remember the company, and I’m not sure if I knew at the time, but it was a free party with a live band in a hip San Francisco nightclub and that’s all we needed to know. It wasn’t an open bar - none of that irresponsibly excessive burn rate wasting investors' money here! No, we each got two drink tickets at the door and the rest was cash bar. The band was playing, the place was thumping. Jello Biafra - of Dead Kennedys fame - jumped up on stage to join the band for a song. In his hand he had a large roll of those drink tickets, which he unspooled out into the crowd. I must have had a strip of tickets ten feet long, which I hung on my shoulders like a bandoleer. I walked up to the prettiest girl I saw and said, “Wow, the market hasn’t been this good since 1928! Can I buy you a drink?”

And that’s what it was really like in 1998. Or was it '99? Hard to tell, sometimes. Hard to tell.

Saturday, February 19, 2011

Visualizing Open Health Data with Fusion Tables


This post will describe a simple way to take health data, as curated in my last blog post, and visualize it using Fusion Tables (a Google Labs product).

A more sophisticated visualization may be done with Fusion Tables and the Google Maps API, as detailed in the API Developer's Guide, Geo Section, but for this simple example we will create some maps by hand.

We start with the spreadsheet of CHSI data, loading it into Fusion Tables.


We then select the Visualize->Intensity Map option from the menu.

First, we are going to create heat maps of the various health status indicators. For example, average life expectancy (ALE), averaged by state, produces a state-by-state map where states with longer average life expectancy appear darker in color.

The way this works is fairly simple. Fusion Tables simply averages the county data by state and translates the result to a number. It scales the numbers by color, as we see below. In this example there is no data for Washington State so it appears completely white.
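
As a rough sketch of that averaging and scaling step, the Python below groups county values by state, averages them, and maps each average onto a 0-to-1 intensity. The sample rows are invented for illustration and are not CHSI figures, and the linear scaling is my assumption; I don't know exactly how Fusion Tables assigns its colors.

```python
from collections import defaultdict

# Hypothetical county rows: (state, county, average life expectancy).
# These values are invented for illustration; they are not CHSI data.
rows = [
    ("California", "Santa Clara", 80.0),
    ("California", "Kern", 76.0),
    ("Texas", "Travis", 78.0),
    ("Texas", "Harris", 77.0),
    ("Washington", "King", None),  # no data reported
]


def average_by_state(rows):
    """Average the county values per state, skipping missing values."""
    totals = defaultdict(lambda: [0.0, 0])  # state -> [sum, count]
    states = set()
    for state, _county, value in rows:
        states.add(state)
        if value is not None:
            totals[state][0] += value
            totals[state][1] += 1
    return {
        s: (totals[s][0] / totals[s][1] if totals[s][1] else None)
        for s in states
    }


def intensity(averages):
    """Scale averages linearly to 0.0 (lightest) through 1.0 (darkest).

    States with no data keep None, meaning "no shade" (drawn white).
    Linear scaling is an assumption, not documented Fusion Tables behavior.
    """
    known = [v for v in averages.values() if v is not None]
    if not known:
        return {s: None for s in averages}
    lo, hi = min(known), max(known)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values match
    return {
        s: (None if v is None else (v - lo) / span)
        for s, v in averages.items()
    }


averages = average_by_state(rows)
shades = intensity(averages)
```

With these made-up rows, the state with the highest average gets the darkest shade and a state with no data gets no shade at all, which is why it would be drawn white.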
Next, we can create a scatter chart comparing two variables.

In this chart, we compare the average life expectancy on the Y-axis to the annual number of unhealthy days (by air quality) on the X-axis. As one might expect, areas of higher pollution have lower life expectancy.


This is just a quick and simple visualization of open data. Later we will go more in depth and refine our visualizations to extract useful and actionable information.

Thursday, February 10, 2011

Curating Open Health Data with Google Refine

In a previous post, I briefly discussed the meaning and implications of open, linked data. Today I will discuss some work I did at a recent Health 2.0 Hackathon with a particular data set.

The Tools

CHSI
I decided to start with the Community Health Status Indicators from HHS. I was familiar with this data set, having written a brief developer's guide for the first Health 2.0 Hackathon last fall. This is from HealthData.gov, part of the government's ongoing "open government" initiative under President Obama and national CTO Aneesh Chopra.

Freebase
Freebase is an open semantic web database. This is the "linked data" part of our exercise. An explanation of what linked data is can be found at LinkedData.org and we won't deal with it in depth except to make connections between the open data released by HHS and real world data in the semantic web.

Google Refine
Google Refine (formerly GridWorks) is a tool for curating, reducing, and linking data using Freebase. Using Google Refine we can take an ordinary spreadsheet, correlate it with semantic data sets in Freebase, and create sets of triples for import into Freebase itself. For this exercise, I created a "base" or domain of data in Freebase called CHSI. However, translating tabular data into triples was a challenge that could not be addressed in the time allotted for the first session.

The Process
The first step is to take a set of data in CSV format and import it into Google Refine as a new project.

This is easy enough and produces a spreadsheet in the familiar fashion.

Now, creating a spreadsheet is just the first step. The real magic happens when we link data in this spreadsheet to semantic data in Freebase. The act of linking data to the real world is called reification, and in Freebase this is done through the "reconcile" function. By clicking on the menu (arrow) icon on a column header, we see a number of menu options, one of which is "Start reconciling..."


The first thing to reconcile is the state. This is easy for Freebase to reason through, as state names are unique and easily recognized. After reconciling, we see each state name is now hyperlinked. We can follow the hyperlink to the Freebase entry for that state.

Next, we want to reconcile counties. The CHSI data is arranged by county, so we can get a fine-grained view of the nation's health data geographically. To reconcile county, we go through the same process.


In the next illustration, you see Freebase has recognized county name, and gives you the default of US County as the semantic data type for that column. If you just reconcile on the name, you'll get a hit-or-miss on the reification, so we want to give Freebase a little more information about this data element. In this case, we can include another column as an extra hint. For our additional column we select state name and start typing in the relationship "contained by." As you start typing, Freebase auto-completes the relationship.
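
To see why the extra column helps, here is a toy sketch of the matching problem. It is not how Google Refine or Freebase implement reconciliation; the entity table, the `example:` identifiers, and the `reconcile` function are all made up for illustration. The point is only that a county name alone can match several entities, while name plus containing state usually matches one.

```python
# Hypothetical miniature of the Freebase side: a few county entities with
# invented identifiers. Orange County really does exist in both states.
ENTITIES = [
    {"id": "example:orange-ca", "name": "Orange", "contained_by": "California"},
    {"id": "example:orange-fl", "name": "Orange", "contained_by": "Florida"},
    {"id": "example:santa-clara-ca", "name": "Santa Clara", "contained_by": "California"},
]


def reconcile(name, state=None):
    """Return the ids of candidate entities for a county name.

    With only a name, several candidates may come back. Passing the state
    acts as the "contained by" hint and narrows the candidates.
    """
    matches = [e for e in ENTITIES if e["name"].lower() == name.lower()]
    if state is not None:
        matches = [
            e for e in matches if e["contained_by"].lower() == state.lower()
        ]
    return [e["id"] for e in matches]


ambiguous = reconcile("Orange")            # two candidates: hit-or-miss
resolved = reconcile("Orange", "Florida")  # one candidate
```

Matching on the name alone returns both Orange counties, while adding the state leaves exactly one.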



After going through this process, we have hyperlinks in the state and county name columns. These link directly to Freebase and are now semantically linked to their respective entities. Now we can add more columns based on data in Freebase. If you go to the Freebase entry for a county, you will see a number of data elements listed such as GDP, population, pollution levels, household income, adjoining counties, geographical features (the "contained in" relationship) and many others. All of these can be added as additional columns in your spreadsheet.

In my next post, I will discuss visualizing this data.

For more information on using Google Refine, see Jeni's blog post Using Freebase Gridworks to Create Linked Data.








Open and Linked Data

I confess, I love buzzwords. I find them fascinating: their implications, their history, and what makes them buzzy in the first place. Two of my current favorites are what's known as "Open Data" and "Linked Data." Two fundamentally different concepts that work together.


Open Data

Open data means governments and other organizations are releasing data sets to the public domain, and making them accessible in various formats. The hope is that if we have enough open data, clever people will find new and useful applications for it. The old saw “Information wants to be free” applies here. Moreover, it is to everyone’s benefit that information be free. The more information we have, the better and more informed decisions we can make.

Linked Data

Linked data is in a literal sense the semantic web. Each data point is assigned a URI, and relationships between URIs are defined using semantic triples. For example, the County of Santa Clara in California may be represented with a URI:

http://www.freebase.com/view/en/santa_clara_county

The state of California:

http://www.freebase.com/view/en/california

And the country of USA:

http://www.freebase.com/view/en/united_states

A simple relationship “contained in” is then assigned: Santa Clara is contained in California. California is contained in USA. Therefore, Santa Clara is contained in USA. With this very simple set of relationships, we can list all the counties in a given state, or all the counties in the country. We can add other relationships, which we shall detail later.
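
The reasoning above can be sketched in a few lines of Python. This is a toy in-memory model, not Freebase's storage or query engine: the short names stand in for the full URIs, the extra counties are added only to make the example less trivial, and the function names are my own.

```python
# Each fact is a (subject, predicate, object) triple. The short names are
# stand-ins for full URIs such as the ones listed above.
TRIPLES = [
    ("santa_clara_county", "contained_in", "california"),
    ("alameda_county", "contained_in", "california"),
    ("california", "contained_in", "united_states"),
    ("travis_county", "contained_in", "texas"),
    ("texas", "contained_in", "united_states"),
]


def containers(entity):
    """Everything that contains `entity`, following contained_in transitively."""
    found = []
    frontier = [entity]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in TRIPLES:
            if subject == current and predicate == "contained_in" and obj not in found:
                found.append(obj)
                frontier.append(obj)
    return found


def contained(entity):
    """Everything directly or indirectly contained in `entity`, sorted by name."""
    return sorted(
        subject
        for subject, predicate, _obj in TRIPLES
        if predicate == "contained_in" and entity in containers(subject)
    )


chain = containers("santa_clara_county")
in_california = contained("california")
```

Nothing in the data says Santa Clara is in the USA; that fact falls out of following the two "contained in" links. Note that `contained("united_states")` returns states as well as counties, so listing only counties would need one more relationship, such as a type.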

Linked Data is an open platform. Relationships can be defined and queried without restriction.

Open Data and Government 2.0

When it comes to government data sets, the underlying principle is that this data belongs to the people, the citizens of each country. The broad hope is that if all the world’s governments make their public data available we can create semantic relationships and make new discoveries about how government and nations function, and develop better ideas of how they can be improved, removing inefficiencies, lowering costs, and improving effectiveness of public programs. It is possible, indeed likely, that we will find other unrelated uses for open data, for example in the area of making healthy decisions.

The UK is leading in these efforts, its program headed by Sir Tim Berners-Lee. More information on the UK Open Data Project can be found here:

http://data.gov.uk/

In [date], the US Department of Health and Human Services (HHS) announced [summary], making a number of data sets public with plans to release more as they become available. In particular, Medicare and Medicaid cost and outcome data is put forward, as well as a number of metrics to measure the health status of communities.

HHS has partnered with Health 2.0 and other organizations to create the Health 2.0 Developer Challenge.

http://health2challenge.org

The implications of open and linked data are clear. If you are considering moving to another city, wouldn’t you want to know the quality of the air, water, education system, and health care? If you could compare these factors to other locations would you possibly make a better decision on where to live, work and raise a family? And shouldn’t we all have access to this information? The data is there. It is only left to us to turn that data into information, information into knowledge, knowledge into wisdom, and wisdom into a better way of life.


Open Data: The Role of Government in Fostering Smartphone Applications


Tuesday, November 16, 2010

Quick and Dirty Home Multimedia


I was going to call this, "How to build your own GoogleTV/AppleTV (but not as good)... for free (except for the stuff you have to buy)."

I've been thinking about this a while. A smart phone could make a great universal remote. Add a little computer to store, retrieve, find, and play your music and movies, pull in content from the outside, plug it into a nice stereo and you're set. Forget physical media. Digital copy is where it's at.

Thinking about the "entertainment center" from basic principles, I decided to start with music, figuring it's a simpler problem to solve and a really good solution can be expanded to video. I've used Apple's own Remote app for a while, and it's good but not quite enough. I want more than just iTunes, I want Internet radio, podcasts, and anything I can fit on the hard drive and play. I figure a Mac mini is flexible and unobtrusive enough, it can all be done with software, and it has digital audio which I can feed into my home stereo (a low-power but great sounding surround sound job). That's another thing, I bought this stereo with a gazillion inputs, and now I only use one. I call that progress.

Now on to the software.

Just about every radio station is streaming on the web somewhere. All I needed was an easy way to point a web browser to the URLs of their streaming sources. I used Sqworl.com because it's simple, it's cloudy (I can make changes to my Sqworl pages from anywhere and use them on a little computer on the shelf which is now music central), and because it generates thumbnails of each site automatically, creating an easy push-button interface.

Once I signed up for a Sqworl account, I added my favorite music links: KFOG San Francisco, WWOZ New Orleans, Pandora, etc. This is going to be the face of my remote. Not bad for a quick and dirty UI.

I also needed a remote desktop, so I can configure and maintain the little music mac from my desktop computer, and besides, a lot of this is trial-and-error. On a Mac this is done through VNC (Virtual Network Computing). There are a number of VNC apps, some free, some not. I use Vine Viewer from TestPlant. It's $30 but has some advanced remote administration features. I like that stuff, not everyone does.

Once I can operate the music mac remotely, it doesn't need a monitor, keyboard, or mouse - but I keep them handy in a nearby closet in case something goes wrong. VNC requires a server on the music mac and a client on the desktop computer, where I'll be working from unless the server stops running, in which case I'll need that keyboard, mouse, and monitor.

Now for the remote. I decided to try Mobile Mouse from RPA Tech. This allows you to use your mobile phone as a wireless mouse or trackpad and click those big Sqworl buttons on the screen. That screen could be a remote desktop, say across the room or even in another room, or it could be a digital TV right there, but you do need a display. If you decide to attach a TV, you could make another Sqworl page linking to Hulu, Netflix, or whatever. If you use a remote computer, you can have it run the mouse server too. The Mobile Mouse Pro includes audio and video controls, much like your standard DVR remote, and uses the accelerometer so you can wave it in the air while chanting Hogwarts incantations if that's your thing.

There's the quick and dirty - and extensible - home multimedia kit. Two client-server programs, a neat little bookmarking site, and a smart phone. Just add content.

Wednesday, September 1, 2010

Freebase at Google

Kirrily Robert of Freebase (and now Google) gave a presentation at this month's GTUG meeting, describing the technology stack, queries, and data management.

This is a collection of information gathered at the talk, though it is mostly just an amalgamated Twitter feed. Still, there are links to all the resources one would need to get started programming against the Freebase platform.

Freebase has been around for years and was recently acquired by Google. It uses a semantic web model of linked data with dynamic ontologies, more or less. It features a public REST API whose parameter is a structured query (which can be treated as a subset of SPARQL) and returns results in JSON format. It can also return results in RDF.
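
To give a feel for what "a structured query as a parameter" means, here is a minimal sketch that builds such a request without sending it. The query is a JSON template where known values constrain the match and null marks a slot to fill in. Treat the details as assumptions rather than notes from the talk: the endpoint URL, the `{"query": ...}` envelope, and the type and property ids are written from memory.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed read endpoint; not taken from the talk notes.
MQLREAD_URL = "http://api.freebase.com/api/service/mqlread"


def build_mql_request(query):
    """Wrap a query template in an envelope and encode it as a GET URL."""
    envelope = {"query": query}
    return MQLREAD_URL + "?" + urlencode({"query": json.dumps(envelope)})


# A query template: filled-in values constrain the match, and None
# (JSON null) marks a value the service is asked to return.
counties_query = [{
    "type": "/location/us_county",                   # assumed type id
    "/location/location/containedby": "California",  # assumed property id
    "name": None,
}]

url = build_mql_request(counties_query)

# Round trip: decode the URL parameter back into the original template.
decoded = json.loads(parse_qs(urlparse(url).query)["query"][0])
```

The response would come back as JSON in the same shape as the template, with the null slots filled in, which is what makes the query language pleasant to work with from a scripting language.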

http://blog.freebase.com
Twitter: @fbase

one example of a site using Freebase data is http://www.tippify.com/

Metaweb Query Language MQL (pronounced like pickle) used to access Freebase data http://ow.ly/2ygTN

Freebase MQL Query Editor available online http://www.freebase.com/app/queryeditor

best supported library is freebase-python http://ow.ly/2ygZf and there are others too http://ow.ly/2yh0n

Online Freebase App Editor http://acre.freebase.com/

Acre is also a server-side javascript application framework for Freebase http://acre.freebase.com/

Freebase 102 demo application done using Acre http://freebase102demo.freebaseapps.com/

Full data dumps of every fact and assertion in Freebase are available weekly http://ow.ly/2yhd8

the Freebase RDF Service http://rdf.freebase.com/

RABJ (pronounced like cabbage) Redundant Array of Brains in a Jar http://ow.ly/2yhhE

Freebase Gridworks for dealing with messy tabular data and cleaning it up http://ow.ly/2yhpO

Kirrily Robert @skud runs a regular Freebase meetup in San Francisco too http://www.meetup.com/sf-freebase/

OpenCalais can give you freebase identifiers as part of its analysis. Some news organizations (NYTimes, UK Guardian) have built linked data APIs which can be integrated in a mash-up with Freebase or other linked data applications.

Monday, May 10, 2010

No, I Did Not Write This on my iPad


First the Obvious

Flash doesn't work, who cares. At Stanford Med we had a "No Flash" rule (much to the relief of the many underpowered older workstations). In general the web browser is no Firefox. Then again a lot of web sites aren't great either. You can turn on the browser debug and watch the Javascript errors fly by, especially the newsy sites with lots of ads. Some sites still think it's a phone and direct you to a mobile version that looks ridiculous on a 10" screen. I suspect a lot of that will improve over time. I know I can live without Flash, but the other day I heard about this roundtable discussion on the subject of the iPad starring one of my favorite technology journalists. I went to the site to watch and it said "Flash required." D'oh!

The second most obvious, this is a device for content consumption, not creation. I can type upwards of 100wpm on a good day, but on this touch screen I hunt and peck. Or I hold it with both hands and try to type with my thumbs. Gestures are excellent for navigating content. To write a blog post I need a keyboard.

As far as battery life, I haven't a clue. It lasts so long I forget to plug it in. I haven't put it down long enough to fully charge it, or used it long enough to fully drain it, so that should tell you something.

The Less Obvious
I don't believe in the "game changer" but I do believe in devices that help you make the plays. For me the real value of the iPad is technical documentation, and the first apps I looked for are document reading and annotation. Study is not a new thing, it's just easier if I can get up from my computer. The more time I spend reading these things the better. Since buying the iPad I have a much better understanding of the JBoss application server and emerging HIT standards (my employer will be happy to know).

Browsing the apps, it's pretty clear the development shops haven't had a lot of time with the iPad. Its UI capabilities are far from fully realized, and many apps are rushed and somewhat buggy. I can hardly blame them, trying to build on a simulator in a few weeks, the actual device sight unseen. Until you hold the thing and use a good touch gesture interface, it's hard to know what it can do, or even what it should do.

Another thing I noticed is that apps which crash tend to crash when the orientation changes. If you QA iPad apps for a living, do us all a favor and spin the thing around. A lot. See what happens to your video pointers in the middle of complex operations.

The apps that are well done are really, really neat.

Apps I Like
There are a few I like so far:

Zillow. If you want to see a demonstration of good use of the iPad UI elements, install this app. If you are building your own iPad app, study this one first. It has a full gesture map, listings you can change by touching the map, and a photo gallery in the corner. The real trick is, you can navigate in each of the screen elements independently, or together.

NPR is another good example of independent but related components. Each news category has its own scrollable library. I like it.

Kindle, no kidding. They've had more practice with this form factor than anyone. It's free, and my already-purchased kindle books moved right over.

iAnnotate. I have a lot of issues with the UI, especially the navigation. Pan and zoom tend to blank the screen so often it might actually stop being annoying, except it never stops being annoying. The table of contents widget is downright unusable. There is no forward and back button, no bread crumbs. Loading documents into it is a pain. In general it desperately needs a Version 2.0. That said, it's the app I'm using most, simply because it lets me read and annotate PDFs.

iTeleport is so far the most usable remote desktop app I've tried on the iPad. It works well enough to do simple things which is all I really want to do without a keyboard anyway.

WolframAlpha on the iPhone is possibly the best pocket calculator ever, and I have high hopes for the iPad version. Right now, though, it's just a big iPhone app. Get it anyway.

GoodReader is a decent PDF reader. Smoother pan and zoom but the iPad needs a tree element for the table of contents like Adobe Acrobat, and not that iPhone multi-menu navigator. Ick.

I'm tempted to buy Omnigraffle, and I'm sure one of these days I'll need to draw a diagram, maybe a system diagram, a workflow, high level software design, or... okay I just bought it.

Pandora and Radio.com are great, but useless without multitasking. Next fall I'll be able to use them with OS 4 if the rumors are correct. Until then it's just silly to turn on the radio and not be able to do anything else.

What I Really Want
The iPhone remote could do so much more on an iPad. I'm sure Apple is porting it but there's more. I want to run Internet radio, pandora, last.fm, and iTunes on my home computer and be able to change the channel without getting up from the morning paper (also on the iPad). So far there's not an app for that.

Guest mode. Has anybody else asked for this? I would like to be able to lock certain applications without locking the whole thing. People come over, want to use it, whatever, especially if it's the sound system remote. Is there some reason why I can't put a passcode on email, Facebook, Twitter, and other personal apps? Then everyone can browse the web, check the stars, or calculate differential equations, and I don't have to worry about people reading my email.

That's it. The apps are obviously 1.0, the multi-touch interface has yet to be fully exploited, and I'll probably be filing a lot of bug reports with some of these software makers.