Google Labs has just released Google Squared. Unlike a Google web search, which returns an unstructured list of web pages, Google Squared is designed to return structured data. A search for US States returns a “square”, much like an Excel spreadsheet or a data frame in R: the rows are states, and the columns are “facts” about those states (Name, Image, Population, etc.). You can customize the columns to add new variables.
My first thought was that this would be a great source of data for examples in R. Just the other day, I was looking for a list of the populations of the largest US cities to illustrate Zipf’s law. Could Google Squared have helped me? Sadly, no, at least not yet.
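In its simplest form, Zipf’s law says a city’s population is roughly inversely proportional to its rank, so log(population) plotted against log(rank) falls close to a straight line with slope -1. The sketch below shows that check as a least-squares fit in Python. The function name `zipf_slope` is hypothetical, and the input is synthetic, idealized data, not real city populations (which is exactly the data I couldn’t get).

```python
import math


def zipf_slope(populations):
    """Least-squares slope of log(population) against log(rank).

    Under Zipf's law the slope is close to -1. Values are sorted in
    descending order so that rank 1 is the largest city.
    """
    sizes = sorted(populations, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(sizes) + 1)]
    ys = [math.log(size) for size in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic, idealized data (NOT real census figures):
# the r-th largest "city" has 8,000,000 / r people.
synthetic = [8_000_000 / r for r in range(1, 51)]
print(round(zipf_slope(synthetic), 6))  # -1.0
```

With real city data the slope would only be approximately -1; the synthetic input just confirms the fit recovers the exponent when the law holds exactly.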
The first problem is data quality. That search for US States included Georgia in the top 10, but if you add “Capital” to the list of variables, the capital is listed as T’bilisi, not Atlanta. To be fair, Google Squared lets you click on a data value and select from other possibilities, so I can change it to Atlanta if I want. But I was hoping that Google Squared would draw on the consensus of the Web, in the context of my search, to produce a table of good data values. Instead, the intent seems to be to use Google Squared as an alternative to Excel for collecting data you’ve found and verified yourself on the Web.
Even if you can find the right variables, getting the right records is tricky, too. Let’s say I want to generate data for the 50 US States. First of all, I have to keep clicking “Add next 10 items” until the Square is full of all 53 rows Google generates. (Why can’t I get all the rows in one fell swoop?) Then I have to delete DC, Virgin Islands, Afghanistan and Harvard University: that leaves me with 49 rows. One state is missing, but which one? You can’t sort the rows by state name, which might have helped.
My next thought was to export the Square to R and match the names against R’s built-in state.name vector to find the missing one. But, alas, you can’t export the data. C’mon Google, why not a simple CSV export? I have to spend all this time creating and verifying the data, and now you’re not going to let me use it? Grr.
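The cleanup described above (drop the non-state rows, then find the missing state) boils down to a set difference. Here is a minimal Python sketch that assumes a hypothetical CSV export with a `Name` column, which Google Squared did not actually offer. `STATE_NAMES` mirrors R’s `state.name`, and `read_square_csv` and `reconcile` are illustrative names. In R itself, `setdiff(state.name, names)` would do the same job.

```python
import csv
import io

# The 50 US state names, equivalent to R's built-in state.name vector.
STATE_NAMES = [
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
    "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho",
    "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana",
    "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota",
    "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada",
    "New Hampshire", "New Jersey", "New Mexico", "New York",
    "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon",
    "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota",
    "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington",
    "West Virginia", "Wisconsin", "Wyoming",
]


def read_square_csv(text):
    """Parse a hypothetical CSV export of a Square into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def reconcile(rows, name_column="Name"):
    """Split rows into real states, stray rows, and states that are missing."""
    valid = set(STATE_NAMES)
    names = [row[name_column].strip() for row in rows]
    kept = [n for n in names if n in valid]
    extras = [n for n in names if n not in valid]
    missing = sorted(valid - set(kept))
    return kept, extras, missing


if __name__ == "__main__":
    # Made-up export: every state except Ohio (an arbitrary pick for the
    # demo), plus the four stray rows mentioned above.
    names = [s for s in STATE_NAMES if s != "Ohio"]
    names += ["District of Columbia", "Virgin Islands",
              "Afghanistan", "Harvard University"]
    text = "Name,Population\n" + "".join(f"{n},\n" for n in names)
    kept, extras, missing = reconcile(read_square_csv(text))
    print(len(kept), missing)  # 49 ['Ohio']
```

Comparing against a fixed reference list means row order doesn’t matter, which sidesteps the missing sort-by-name feature entirely.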
I know this is only a Labs feature, and it does show promise. But with the data quality issues and the inability to export, sadly it doesn’t seem like it’s going to be a useful source of datasets anytime soon.