What the Tweetgeister?

This is a visualization of the tweets associated with Ignite Austin #2, held on October 20, 2010. A "live" visualization exists, but one will need a modern-and-hip browser (Chrome, Firefox, Safari) to see it. The technical term for this layout is a "circle packing layout", which provides a reasonably space-efficient presentation of hierarchical data. Wasted space exists (the empty part of the circles), but the viz very quickly reveals the clusters with the largest number of tweets. These populous clusters represent memes that formed during the span of time that people were tweeting about something: the Twitter zeitgeist, or the Tweetgeist.
Each red dot is an individual tweet hashtagged with "#ia2" (the link may not return anything in a week or two, since Twitter search doesn't reach very far into the past). Each gray circle is a cluster of tweets. I used semantic techniques to perform the clustering, so that similar content is grouped together. For example:

shows a cluster related to the words "Fonts are the clothes that words wear". That is the text of the tweet that is most representative of this cluster and the nine red dots are tweets that fit into this cluster. The yellow box is a tool tip that shows the actual cluster "concept".
When the cursor is placed over a tweet, the actual text of the tweet shows up. In this case, that text is "Typography: think of the clothes words wear". Similar to the cluster concept, but not identical. The clusterer sub-clustered the cluster (did that make sense? I think it did). There are two main sub-clusters: one with one tweet and one with eight tweets. The eight-tweet cluster is further decomposed.
Now the hover-text shows the content of a tweet inside the innermost cluster. Using Twitter-speak, this tweet is a re-tweet ("RT") of something "schnee" tweeted. Note that the text "Fonts are the clothes..." is the text of the outermost cluster. The clusterer associated "Fonts are the clothes that words wear", "Typography: think of the clothes words wear" and various re-tweets. That this works is pretty nifty.
Also note that "schnee" is me.
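To make the grouping idea concrete, here is a minimal sketch in Python. It is not the code behind Tweetgeister: it assumes plain bag-of-words vectors, cosine similarity, and a greedy single pass with a hypothetical threshold of 0.3, where the real clusterer uses semantic techniques and builds nested sub-clusters. The function names and the stopword list are made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, just enough for the example tweets.
STOPWORDS = {"rt", "the", "a", "of", "that", "are"}


def vectorize(text):
    """Turn a tweet into a bag-of-words count vector."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def centroid(vectors, members):
    """Sum of the member vectors (direction is all cosine cares about)."""
    return sum((vectors[i] for i in members), Counter())


def cluster(tweets, threshold=0.3):
    """Greedy one-pass clustering: join the most similar cluster, else start one."""
    vectors = [vectorize(t) for t in tweets]
    clusters = []  # each cluster is a list of tweet indices
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for members in clusters:
            sim = cosine(vec, centroid(vectors, members))
            if sim >= best_sim:
                best, best_sim = members, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters


def representative(tweets, members):
    """Index of the tweet closest to the cluster centroid (the cluster 'concept')."""
    vectors = [vectorize(t) for t in tweets]
    center = centroid(vectors, members)
    return max(members, key=lambda i: cosine(vectors[i], center))
```

Fed the two "clothes" tweets from the screenshots, a re-tweet of the first, and one unrelated tweet, this sketch puts the three related tweets in one cluster and picks "Fonts are the clothes that words wear" as the most representative one.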
Using the live version of the tweetgeist, one can click on a tweet and open the Twitter page of the tweet, which is a little sugar.
"Tweetgeister" is the service that creates tweetgeists. It is a very private service, because doing a tweetgeist is horrendously expensive in terms of time and compute resources. While I could fix both of those, the fix would require spending money and hey, this is a hobby.
Posted at 11:30AM Oct 22, 2010 by schnee in General
Sitegeister, huh?
I recently mentioned something called "Sitegeister". This is a little concept that I came up with back in the spring and wanted to do something with.
The name is a play on both "web site" and "zeitgeist", and a "sitegeist" is a summary of the topic of a site, in much the same way a "zeitgeist" is the topic of an era. "Sitegeister", then, is the actor that determines the sitegeist.
Sitegeister subscribes to RSS/Atom feeds from various websites (news aggregators, blogs, whatever) and analyzes the content of the feeds to determine the topics (topics are reset weekly). These topics are located at the bottom of the main Sitegeister page. For example, as I write this, the top topics (or concepts) on the Yahoo! most popular news feeds are "candidate palin", "candidate sarah", "gulf hurricane", "hurricane orleans", "palin running", "running sarah", and others. Obviously, some overlap between these concepts exists.
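The concepts listed above read like alphabetically ordered word pairs. One guess at how such pairs could be counted, and only a guess, is sketched below in Python. It assumes the feed items have already been fetched and reduced to plain titles; the function name `top_concepts`, the stopword list, and the sample titles in the usage note are all hypothetical, and the real Sitegeister analysis is more involved than simple pair counting.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stopword list for illustration.
STOPWORDS = {"a", "an", "the", "is", "as", "for", "of", "in", "to", "on", "and"}


def top_concepts(titles, n=3):
    """Count alphabetically ordered word pairs that co-occur within a title."""
    counts = Counter()
    for title in titles:
        # Unique, non-stopword words of the title, sorted alphabetically.
        words = sorted(
            {w for w in re.findall(r"[a-z]+", title.lower()) if w not in STOPWORDS}
        )
        # combinations() on a sorted list yields pairs in alphabetical order.
        counts.update(" ".join(pair) for pair in combinations(words, 2))
    return counts.most_common(n)
```

Calling `top_concepts(["Sarah Palin running as candidate", "Candidate Sarah Palin", "Hurricane in the Gulf"])` ranks pairs such as "candidate palin" and "candidate sarah" highest, because they appear in two of the three titles.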
After determining the top concepts, Sitegeister then draws the blob at the top of the main page. This is a link-map that tries to connect sites to other sites. Each blob is a site and is surrounded by its top concepts. Concepts from one site that are related to another site's concepts are linked via an edge. This is all good stuff and relies heavily on Latent Semantic Indexing, Vector Space Models, and angles between vectors. It really helped me marry my math background with my love of language.
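The "angles between vectors" part can be illustrated with a small Python sketch. It is a simplification under stated assumptions: each concept is treated as a raw term-count vector, the Latent Semantic Indexing step (which would first project the vectors into a lower-dimensional space) is left out entirely, and the 65-degree cutoff, the function names, and the site names in the usage note are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def angle_degrees(concept_a, concept_b):
    """Angle between two concepts, each treated as a term-count vector."""
    a, b = Counter(concept_a.split()), Counter(concept_b.split())
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing terms
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    if norm == 0:
        return 90.0  # treat an empty concept as unrelated to everything
    cos = max(-1.0, min(1.0, dot / norm))  # guard against rounding error
    return math.degrees(math.acos(cos))


def link_sites(site_concepts, max_angle=65.0):
    """Return an edge for every cross-site concept pair within max_angle."""
    edges = []
    for (site_a, concepts_a), (site_b, concepts_b) in combinations(
        site_concepts.items(), 2
    ):
        for ca in concepts_a:
            for cb in concepts_b:
                if angle_degrees(ca, cb) <= max_angle:
                    edges.append((site_a, ca, site_b, cb))
    return edges
```

With `{"yahoo_news": ["candidate palin", "gulf hurricane"], "example_blog": ["palin running", "font design"]}` as input, the only edge connects "candidate palin" to "palin running": the two concepts share one of their two words, which puts them roughly 60 degrees apart, while every other pair is orthogonal.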
Unfortunately, I geisted the wrong sites. Most of my sites are news aggregators and they all tend to converge on the same concepts. That means that the blob-map tends towards full connectedness (which means it is a mess) and really doesn't reveal anything. I need to choose other sites that may or may not be related, like various blogs. It may be interesting to determine links between blogs that advertise different topics.
Or maybe not. Either way, it was a fun exercise, and it demonstrates that Google AdSense works really, really well. The FAQ at Sitegeister.com has more information.
Posted at 12:52PM Sep 05, 2008 by schnee in General