Jon Aquino's Mental Garden

Engineering beautiful software jon aquino labs | personal blog

Thursday, March 31, 2005

BrainMaps: Using TreeMaps To Browse Your Blog

Yesterday I wrote about automatically extracting categories for your blog posts using Y!Q and Google Directory, and I ended with two ideas for browsing this information: (1) as a list of categories starting with the most frequent, and (2) as a tree (think Windows Explorer). The tree idea was interesting, but it does not give a good visual indication of which categories contain more posts than others.

One solution is to use a treemap, as shown below.



Above we see a treemap for the various categories of information extracted from my blog. The University of Maryland Human-Computer Interaction Lab has this great free Java app called TreeMap that will read in a simple tab-delimited file and display a treemap for it. The information is a bit hard to read at this level (some clever colour support would definitely enhance it), but even at the 50,000' level we can see some subject areas that fire me up:
  • Computers: programming languages (yellow), programming methodologies, PDAs, Human-Computer Interaction
  • Arts: online writing
  • Religion: fun and entertainment (heh)
  • Science: language and linguistics
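To feed a treemap viewer, the category counts first have to become rows in a tab-delimited file. Below is a minimal Java sketch of that export, with one hierarchy level per column. The class name `TreeMapExport` is hypothetical. The two header lines (attribute name, then attribute type) reflect my understanding of TreeMap's TM3 input layout and are an assumption, so check TreeMap's own documentation before relying on them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical exporter: turns "category path -> post count" pairs into
// tab-delimited lines, one hierarchy level per column.
// ASSUMPTION: the "Count"/"INTEGER" header lines follow the TM3 layout as I
// understand it; verify against the TreeMap documentation.
class TreeMapExport {
    static List<String> toTabDelimited(Map<String, Integer> countsByPath) {
        List<String> lines = new ArrayList<>();
        lines.add("Count");   // attribute name
        lines.add("INTEGER"); // attribute type
        // Sorting by path keeps sibling categories next to each other in the file.
        for (Map.Entry<String, Integer> e : new TreeMap<>(countsByPath).entrySet()) {
            // "Computers/Programming/Languages" becomes three hierarchy columns.
            String levels = String.join("\t", e.getKey().split("/"));
            lines.add(e.getValue() + "\t" + levels);
        }
        return lines;
    }
}
```

Feed it leaf-level counts rather than rolled-up totals, since a treemap normally sizes each parent box from the sum of its children.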
Let's zoom into the main area, Computers:



Here we see a number of areas that I love reading about:
  • software engineering
  • text editors and development tools
  • search engines
Let's back up a bit and see if I can spot anything in the Arts that really grabs my interest:



Here I am reminded of my fascination with:
  • typography
  • literature: cyberpunk
  • poetry
I won't go further, but suffice to say that treemaps are an interesting way to browse your brain. In a recent conference presentation, John Smart ended his talk with the idea of uploading your brain onto the web, and the intriguing idea of your great-great-great-grandchildren conversing with your uploaded brain, long after you're gone. The idea of mining your blog for your unique hierarchy of interests, and browsing them with treemaps, is a step in that direction.

And for the curious, here are some interests siphoned from other parts of my brainmap:
  • Society/Religion_and_Spirituality/Christianity/Denominations/Catholicism/Prayer_and_Spirituality
  • Society/Subcultures/Geeks_and_Nerds
  • Society/Ethnicity/Filipino
  • Society/Future/Essays
  • Society/Philosophy/Ethics/Education/Empathy_and_Compassion
  • Business/Industries/Information_Technology
  • Business/Management/Communication_Skills
  • Reference/Libraries/Library_and_Information_Science/Technical_Services/Cataloguing/Metadata/Resource_Description_Framework_-_RDF/Applications/RSS
  • Reference/Knowledge_Management
  • Science/Social_Sciences/Psychology/Self-Help
  • Science/Math/Statistics
  • Home/Consumer_Information/Electronics
  • Home/Personal_Organization

Wednesday, March 30, 2005

I received an invite to Yahoo 360

Well I just received an invite to Yahoo 360 from Erik Thauvin (thanks Erik!). It's Yahoo's blog/social-networking app. My threadbare Yahoo 360 blog lives here: http://360.yahoo.com/jonathan_aquino.

I don't know about you, but I want my blog to be preserved for eternity. That's why I signed up with Blogger -- Google's stewardship of the Usenet archives has shown me that they are committed to preserving user-created content for the long haul. Another candidate is OurMedia.org, backed by the Internet Archive, which has altruistically maintained snapshots of a good portion of the web since 1996 under the leadership of the eccentric Brewster Kahle. So they too have proven their commitment.

I don't know about Yahoo though... Their culture screams commercialism to me. I don't know if I want to hand my data over to them, for fear of it being deleted. Case in point: Yahoo Mail deletes all your mail if you don't log in for a certain period of time. This is an example of stinginess toward the customer. Counter-example: GMail, which offers 1 GB of space and email forwarding -- probably the only free webmail service to offer these features.

I'll take generosity over stinginess any day.

(If you want a Yahoo 360 invite, send me an email. I've got 100 to give out.)

Tuesday, March 29, 2005

Rosetta Stone For Jon Aquino's Skills And Interests

While working on my previous post on automated introspection, I discovered a remarkable triangle that elegantly illustrates my favourite skills and interests:



So here we see my three favourite skills: programming, statistics, and graphic design. And also we see three of my very favourite interests: data mining, human-computer interaction, and information visualization. And the neat thing is that each favourite interest relates two favourite skills:
  • data mining relates programming and statistics
  • human-computer interaction relates programming and graphic design
  • information visualization relates graphic design and statistics
It's a self-referential triangle, so it has to be true!

Mashing up Y!Q with Google Directory

What if we could use existing web services to perform automated categorization of blog entries? It turns out that we can achieve this using Y!Q and Google Directory -- the former to extract the most important words from a paragraph, and the latter to provide categories for these words.

There are two steps:

1. Extract keywords from each post using Yahoo's beta Y!Q service, which lets you search on one or more paragraphs of text (yes, as input).

2. Plug those keywords into Google Directory to find out what categories they belong to.
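The two steps fit together as a small pipeline. Here is a minimal Java sketch of it. Both interfaces are hypothetical stand-ins: nothing here calls Y!Q or Google Directory, so you would plug in whatever client code talks to the real services. Each post is counted at most once per category.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Step 1: hypothetical stand-in for Y!Q's keyword extraction.
interface KeywordExtractor {
    List<String> keywords(String postText);
}

// Step 2: hypothetical stand-in for a Google Directory category lookup.
interface CategoryLookup {
    List<String> categories(String keyword);
}

class BlogCategorizer {
    private final KeywordExtractor extractor;
    private final CategoryLookup lookup;

    BlogCategorizer(KeywordExtractor extractor, CategoryLookup lookup) {
        this.extractor = extractor;
        this.lookup = lookup;
    }

    // Returns category -> number of posts that mention it.
    Map<String, Integer> categorize(List<String> posts) {
        Map<String, Integer> postsPerCategory = new TreeMap<>();
        for (String post : posts) {
            // A set, so several keywords in one post hitting the same category count once.
            Set<String> categoriesForPost = new HashSet<>();
            for (String keyword : extractor.keywords(post)) {
                categoriesForPost.addAll(lookup.categories(keyword));
            }
            for (String category : categoriesForPost) {
                postsPerCategory.merge(category, 1, Integer::sum);
            }
        }
        return postsPerCategory;
    }
}
```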

For example, take the following post which I wrote a few weeks ago:
My hope is that this Meetup.com group will be a means to find other Victoria residents who have a passion for cutting-edge technology. I found this great list of the world A-list digerati. Brewster Kahle is The Searcher, Kevin Kelly is the Saint, Bill Gates is The Software Developer, Esther Dyson is The Pattern-Recognizer, etc. Who are the digerati in beautiful, unassuming Victoria BC?
Y!Q gives me the following keywords. I'm guessing the numbers represent relevance scores, out of 100:
esther dyson:100 | digerati:99.553 | victoria bc:51.223 | brewster kahle:50.773 | kevin kelly:50.756 | cutting edge technology:50.252 | software developer:50.207 | bill gates:50.18 | cutting edge:50.053 | edge technology:49.751 | passion:49.059 | meetup:23.262 | pattern recognizer:1.487 | list:0.798 | saint bill:0.498 | searcher:0.498 | unassuming:0.498 | dyson:0.497 | residents:0.491 | victoria:0.01
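If the Y!Q results come back as text shaped like the output above, turning them into keyword/score pairs takes only a few lines of Java. This sketch assumes the `keyword:score | keyword:score` layout exactly as displayed, and that keywords never contain `:` or `|`. The class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Parses "keyword:score | keyword:score | ..." into an ordered keyword -> score map.
// ASSUMPTION: keywords contain neither ':' nor '|'.
class YqKeywordParser {
    static Map<String, Double> parse(String raw) {
        Map<String, Double> scores = new LinkedHashMap<>(); // keeps Y!Q's ordering
        for (String pair : raw.split("\\|")) {
            String trimmed = pair.trim();
            int colon = trimmed.lastIndexOf(':');
            if (colon <= 0) {
                continue; // skip empty or malformed entries
            }
            String keyword = trimmed.substring(0, colon).trim();
            double score = Double.parseDouble(trimmed.substring(colon + 1).trim());
            scores.put(keyword, score);
        }
        return scores;
    }
}
```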
Now let's look at the category that Google Directory gives me for "esther dyson":
Science/Social_Sciences/Political_Science/Public_Policy/Ecommerce_Policy
I would love to extract the Google Directory categories for all 24,300 keywords that Y!Q extracted from my blog, but unfortunately Google limits me to 1000 queries per day. So for now I'm taking only the highest-ranking ones.
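With a cap of 1000 lookups per day, the question becomes which keywords to spend them on. One way to choose, sketched below with a hypothetical class: merge the keyword scores from every post, keep each keyword's best score, and take the top N.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Picks the keywords worth spending a limited daily lookup budget on.
class QueryBudget {
    static List<String> topKeywords(List<Map<String, Double>> perPostScores, int budget) {
        // A keyword that appears in several posts keeps its highest score.
        Map<String, Double> best = new HashMap<>();
        for (Map<String, Double> post : perPostScores) {
            post.forEach((keyword, score) -> best.merge(keyword, score, Double::max));
        }
        List<String> keywords = new ArrayList<>(best.keySet());
        // Highest score first; ties broken alphabetically so the order is stable.
        keywords.sort((a, b) -> {
            int byScore = Double.compare(best.get(b), best.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return new ArrayList<>(keywords.subList(0, Math.min(budget, keywords.size())));
    }
}
```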

That's the beauty of web services -- they can be combined in interesting ways. Web-service mashups are a meme that is currently gaining momentum.

Update: Here are some preliminary results of the automated analysis of my blog, for my favourite interests, using the 1000 highest-rated keywords from Y!Q:

226 Computers
108 Arts
83 Society
73 Computers/Software
72 Business
51 Regional
48 Computers/Programming
46 Games
45 Society/Religion_and_Spirituality
41 Science
39 Society/Religion_and_Spirituality/Christianity
38 Shopping
38 Arts/Music
37 Computers/Internet
34 Reference
33 Business/Industries
32 Computers/Programming/Languages
30 World
30 Recreation
26 Computers/Data_Formats
25 Regional/North_America
22 Computers/Software/Internet
22 Computers/Data_Formats/Markup_Languages
20 Games/Card_Games
20 Computers/Data_Formats/Markup_Languages/HTML
19 Games/Card_Games/Special_Decks/Guillotine
19 Games/Card_Games/Special_Decks
18 Computers/Data_Formats/Markup_Languages/HTML/References
17 Reference/Education
17 Arts/Music/Styles
15 Society/Religion_and_Spirituality/Christianity/Denominations
15 Science/Social_Sciences
15 Regional/Europe
15 Computers/Programming/Languages/Java
14 Computers/Software/Internet/Clients
14 Arts/Movies
13 Regional/North_America/United_States
13 Arts/Literature
12 Society/Religion_and_Spirituality/Christianity/Denominations/Catholicism
12 Regional/North_America/Canada
12 Home
12 Business/Management
11 Regional/Europe/United_Kingdom
11 Computers/Internet/On_the_Web
10 Kids_and_Teens
10 Computers/Software/Operating_Systems
10 Computers/Open_Source
10 Business/Major_Companies/Publicly_Traded
10 Business/Major_Companies
10 Arts/Animation
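The counts above appear to credit a post to every ancestor of its categories (note that "Computers" outnumbers any of its subcategories). Here is a minimal sketch of that roll-up, assuming each post has already been mapped to a set of full category paths. The class name is hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Credits each post to every ancestor of its category paths, so a post filed
// under "Computers/Programming/Languages" also counts toward
// "Computers/Programming" and "Computers". A post counts at most once per node.
class CategoryRollup {
    static Map<String, Integer> rollUp(Iterable<? extends Set<String>> categoriesPerPost) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Set<String> paths : categoriesPerPost) {
            Set<String> nodes = new HashSet<>();
            for (String path : paths) {
                // Add each prefix that ends just before a '/', then the full path.
                int slash = -1;
                while ((slash = path.indexOf('/', slash + 1)) != -1) {
                    nodes.add(path.substring(0, slash));
                }
                nodes.add(path);
            }
            for (String node : nodes) {
                counts.merge(node, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```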

Another way we can browse this information is using a tree. Here I'm using Microsoft's free XML Notepad. The number of blog posts is given for each node:



The implication of this process is that you can analyze anyone's blog and "browse their brain-tree" to see whether your interests are similar to theirs. Taking it a step further, you could probably find some statistical measure of how well your brain-tree intersects their brain-tree. Call it automated introspection (or automated extrospection?).
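As for that statistical measure, one simple candidate (my choice for illustration, not an established brain-tree metric) is the cosine similarity between two blogs' category-to-post-count maps. It gives 1.0 for identical interest profiles and 0.0 when no categories are shared. The class name is hypothetical.

```java
import java.util.Map;

// Cosine similarity between two category -> post-count maps.
// 1.0 means the same proportions of interests; 0.0 means no shared categories.
class BrainTreeSimilarity {
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            double v = e.getValue();
            normA += v * v;
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += v * other; // only shared categories contribute
            }
        }
        for (int v : b.values()) {
            normB += (double) v * v;
        }
        if (normA == 0 || normB == 0) {
            return 0.0; // an empty brain-tree overlaps with nothing
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Counting each post under every ancestor category first lets two blogs score some overlap even when their leaf categories differ.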

From the above information I have hand-picked a few areas that especially interest me:
  • Arts/Graphic_Design/Typography
  • Computers/Human_Computer_Interaction
  • Computers/Internet/On_the_Web/Weblogs
  • Computers/Internet/Searching
  • Computers/Open_Source
  • Computers/Programming/Languages
  • Computers/Programming/Languages/Ruby
  • Computers/Programming/Languages/Smalltalk
  • Computers/Programming/Methodologies
  • Computers/Software/Databases/Data_Mining
  • Computers/Software/Editors
  • Computers/Software/Fonts
  • Computers/Software/Freeware
  • Computers/Software/Internet/Servers/Collaboration
  • Computers/Systems/Handhelds
  • Consumer_Electronics
  • Reference/Knowledge_Management/Knowledge_Discovery/Data_Mining
  • Reference/Knowledge_Management/Knowledge_Discovery/Text_Mining
  • Reference/Knowledge_Management/Knowledge_Discovery/Information_Visualization
  • Regional/North_America/Canada/British_Columbia/Localities/V/Victoria
  • Science/Math/Statistics
  • Science/Social_Sciences/Psychology/Industrial_and_Organizational/Human_Factors_and_Ergonomics
  • Science/Social_Sciences/Psychology/Self-Help
  • Society/Religion_and_Spirituality/Christianity/Denominations/Catholicism/Prayer_and_Spirituality
  • Society/Subcultures/Geeks_and_Nerds

Update, April 6: No need to screen-scrape Y!Q anymore -- they have released an official API for accessing their "term-extraction" service.

Sunday, March 27, 2005

YEAAAAAHHHHH!!! GTD Weekly Review Complete!

After a lapse of 37 days, I have finally completed a GTD weekly review. Celebrate with me! [dance]

The GTD Weekly Review takes a painful two hours. After one hour, my mind is frazzled. So I'm going to split my review in two: one hour of review, then a break to do something completely different, then back for the final hour of torture.

For those who have no idea what I am talking about, the Getting Things Done (GTD) methodology is a system for organizing the hundreds of actions you need to take in your life. In a nutshell: get them off your mind and onto written lists, and review these lists weekly. There's a lot more to it than that, but that's the gist of it.

I was wondering what I would give up for Lent, and now that the end of it has arrived, I see I gave up my Weekly Review! ;-)

Saturday, March 26, 2005

Cogent Explanation Of Web 2.0

Someone called their blog "Read/Write Web", and that is a succinct description of what is being referred to as "Web 2.0".

Originally the Web was "read-only" -- the average person did not have the tools to put things on the web -- they would simply read webpages that companies and web enthusiasts would put on the web.

But now the Web is read/write -- there are all sorts of tools that make it trivial to write information to the web (often without cost): wikis, blogs, comments, podcasts (well, that needs to be made simpler), Flickr, del.icio.us.

A Better Bloglines Notifier

>> Download yabn-1.0-beta-1-windows.zip (1.5 MB) <<

Requirements: Java 1.4 or newer, Windows XP (Let me know if it works in other versions of Windows.) (Want to help me test the Mac/Unix/Linux versions? Send me an email!)

Installation: Unzip it to a folder on your computer. Edit yabn.ini to specify your email and password. Add a shortcut to yabn.bat to your Startup folder.

Caveat: As this is a Java application, don't be surprised if this puppy consumes 30 MB of memory. But it's worth it if you're a Bloglines junkie like me!


Below is the standard Bloglines Notifier. It's OK, but it doesn't provide very much information:



Basically when you have unread items, a red square appears in the upper-right corner.

Being a data-visualization junkie as well as a computer programmer, I decided to write my own alternative Bloglines notifier:



It improves on the original by displaying the number of unread items directly on the icon itself. I don't know of any other system-tray icon that displays numbers, although many have some sort of visual cue to indicate two or three different states. The 16x16 area of a system-tray icon presents interesting possibilities for a richer display of status information (cf. Edward Tufte's beautiful work on sparklines).

Another feature of this notifier is that when new posts arrive, it pops up a balloon containing an excerpt of each:



This is great if you are monitoring your conversations via PubSub or Technorati -- when someone talks about you or a topic you are interested in, the balloon will pop up showing their post. (Note that the balloon will not appear in real time per se, as Bloglines grabs new posts on an hourly basis). This feature was inspired by the GMail Notifier (screenshot), which also pops up an excerpt when new mail comes in. (The similarity points to the possible future convergence of RSS and email methinks).

The tray icon's menu provides a couple of additional features:



If the balloons went by too quickly and you want to see them again, click Show Unread Items Again. If nothing looks interesting, you can click Mark All Items As Read to save yourself a trip to Bloglines. This workflow presents the interesting possibility of never having to open the browser to check your feeds -- when new posts come in, they appear as balloons, and you can browse your feeds from the system tray. Of course, to complete the loop it would be nice to click on a balloon to open the link in your web browser, but I haven't implemented this yet.
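Here is a rough sketch of how the click-to-open idea might look on a newer Java (6 or later), whose `java.awt.TrayIcon` and `java.awt.Desktop` classes grew out of JDIC. It assumes each item carries a link, which the `Item` class in the script below does not have. `BalloonLinkOpener` is a hypothetical name.

```java
import java.awt.Desktop;
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import java.io.IOException;
import java.net.URI;

// Remembers the link behind the most recently shown balloon so that a click
// can open it in the default browser.
// ASSUMPTIONS: items carry a link; java.awt.Desktop (Java 6+) is used instead of JDIC.
class BalloonLinkOpener implements ActionListener {
    private volatile URI lastShownLink; // written by the polling loop, read on the AWT thread

    // Call this just before displaying each balloon.
    void rememberLink(URI link) {
        lastShownLink = link;
    }

    URI lastShownLink() {
        return lastShownLink;
    }

    // Register with trayIcon.addActionListener(...); java.awt.TrayIcon may fire an
    // ActionEvent when the user clicks a displayed message.
    @Override
    public void actionPerformed(ActionEvent e) {
        URI link = lastShownLink;
        if (link == null || !Desktop.isDesktopSupported()) {
            return;
        }
        Desktop desktop = Desktop.getDesktop();
        if (!desktop.isSupported(Desktop.Action.BROWSE)) {
            return;
        }
        try {
            desktop.browse(link);
        } catch (IOException ex) {
            System.err.println("Could not open " + link + ": " + ex.getMessage());
        }
    }
}
```

One caveat: clicking the icon itself raises the same kind of action event, so this sketch cannot tell a balloon click from an icon click.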

If you want to enhance or customize the notifier, simply open yabn.groovy in your favourite text editor. It's written in a wonderfully expressive Java scripting language called Groovy, and as you can see below, it's not that complicated if you've done some programming before. The neato JDesktop Integration Components library (JDIC) was used to create the system-tray icon and to launch the browser.


# Thanks to the Groovy developers for creating the wonderfully
# expressive language in which this script is written [Jon Aquino 2005-03-25]

import java.awt.*
import java.awt.event.*
import java.awt.image.*
import java.io.*
import java.net.*
import javax.swing.*
import org.apache.commons.httpclient.*
import org.apache.commons.httpclient.methods.*
import groovy.swing.*
import org.jdesktop.jdic.desktop.*
import org.jdesktop.jdic.tray.*
import org.jdom.input.*

UIManager.setLookAndFeel(UIManager.systemLookAndFeelClassName)
properties = new Properties()
properties.load(new FileInputStream("yabn.ini"))

# Thanks to the Sun JDIC team for the ability to create a tray icon
# and launch a browser in 1 line. [Jon Aquino 2005-03-25]

image = Toolkit.defaultToolkit.getImage("Bloglines.png")
# No type declaration, so trayIcon goes into the script binding and
# markAllItemsAsRead() can reach it
trayIcon = new TrayIcon(new ImageIcon(image), "Yet Another Bloglines Notifier", createMenu())
trayIcon.addActionListener(new ActionListenerClosure({
    goToBloglines()
    updateTrayIcon(trayIcon, image, 0)
}))
SystemTray.defaultSystemTray.addTrayIcon(trayIcon)

seenItems = new HashSet()
firstDisplay = true
putAt("showUnreadItemsAgainRequested", false)
while (true) {
    items = items(false)
    println("Found ${items.size()} unread items")
    updateTrayIcon(trayIcon, image, items.size())
    if (!getAt("showUnreadItemsAgainRequested")) {
        items.removeAll(seenItems)
    }
    displayItems(trayIcon, firstDisplay && items.size() > 5 ? items[-5..-1] : items)
    seenItems.addAll(items)
    System.gc()
    firstDisplay = false
    println("Waiting for " + properties.getProperty("seconds-between-checking-for-new-items") + " seconds")
    putAt("showUnreadItemsAgainRequested", false)
    i = 0
    while (!getAt("showUnreadItemsAgainRequested") && i < properties.getProperty("seconds-between-checking-for-new-items").toInteger()) {
        Thread.sleep(1000)
        i++
    }
}

# Thanks to phk and blackdrag on the Groovy IRC channel for
# ActionListenerClosure [Jon Aquino 2005-03-25]

class ActionListenerClosure implements ActionListener {
    closure
    ActionListenerClosure(closure) { this.closure = closure }
    void actionPerformed(ActionEvent e) { closure.doCall(e) }
}

def callBloglines(url) {
    client = new HttpClient()
    credentials = new UsernamePasswordCredentials(properties.getProperty("email"), properties.getProperty("password"))
    client.state.setCredentials("Bloglines RPC", "rpc.bloglines.com", credentials)
    get = new GetMethod(url)
    get.doAuthentication = true
    client.executeMethod(get)
    get
}

def updateTrayIcon(trayIcon, image, numberOfUnreadItems) {
    trayIcon.toolTip = "${numberOfUnreadItems} Unread Item${numberOfUnreadItems==1?'':'s'}"
    bufferedImage = new BufferedImage(16, 16, BufferedImage.TYPE_INT_ARGB)
    graphics = bufferedImage.createGraphics()
    graphics.drawImage(image, 0, 0, null)
    if (numberOfUnreadItems > 0) {
        # Yellow band over the bottom half of the 16x16 icon, with the count in red
        graphics.color = Color.yellow
        graphics.fillRect(0, 8, 16, 8)
        graphics.color = Color.red
        graphics.font = graphics.font.deriveFont(new Float(9))
        graphics.drawString(numberOfUnreadItems.toString(), 0, 16)
    }
    graphics.dispose()
    trayIcon.setIcon(new ImageIcon(bufferedImage))
}

def displayItems(trayIcon, items) {
    i = 0
    items.each { item |
        i += 1
        trayIcon.displayMessage("${i}/${items.size()}: ${item.title}", "(${item.feed}) ${item.text}", TrayIcon.NONE_MESSAGE_TYPE)
        Thread.sleep(properties.getProperty("seconds-between-displaying-each-item").toInteger() * 1000)
    }
    trayIcon.displayMessage("", "", TrayIcon.NONE_MESSAGE_TYPE)
}

def items(markAsRead) {
    # Use responseBodyAsString rather than responseBodyAsStream, which
    # seems susceptible to UTF-8 errors ("Invalid byte 2 of 3-byte UTF-8
    # sequence.") [Jon Aquino 2005-03-20]
    response = callBloglines("http://rpc.bloglines.com/getitems?s=0&n=${markAsRead?1:0}").responseBodyAsString
    if (response == null) { return [] }
    items = []
    new SAXBuilder().build(new StringReader(response)).rootElement.getChildren("channel").each { channelTag |
        channelTag.getChildren("item").each { itemTag |
            item = new Item(feed:channelTag.getChildTextTrim("title"), title:itemTag.getChildTextTrim("title"), text:itemTag.getChildTextTrim("description"))
            items.add(item)
        }
    }
    items
}

class Item {
    feed
    title
    text
    void setFeed(feed) { this.feed = clean(feed) }
    void setTitle(title) { this.title = clean(title) }
    void setText(text) { this.text = clean(text) }
    boolean equals(Object other) { feed+text == other.feed+other.text }
    int hashCode() { (feed+text).hashCode() }
    # Strip HTML tags and replace each character entity with '#';
    # a missing element (null) becomes an empty string
    clean(s) { s == null ? "" : s.replaceAll("<[^>]+>", "").replaceAll("&[^ ;]+;", "#") }
}

def createMenu() {
    x = this
    aboutDialog = new SwingBuilder().optionPane(message:"Yet Another Bloglines Notifier 1.0 Beta 1\nby Jonathan Aquino").createDialog(null, "About YABN")
    new SwingBuilder().popupMenu() {
        menuItem() { action(name:"Show Unread Items Again", closure:{ x.putAt("showUnreadItemsAgainRequested", true) }) }
        menuItem() { action(name:"Mark All Items As Read", closure:{ x.markAllItemsAsRead() }) }
        menuItem() { action(name:"Go To Bloglines (click icon)", closure:{ x.goToBloglines() }) }
        menuItem() { action(name:"About...", closure:{ aboutDialog.visible = true }) }
        menuItem() { action(name:"Exit", closure:{ System.exit(0) }) }
    }
}

def goToBloglines() { Desktop.browse(new URL("http://bloglines.com/myblogs")) }

def markAllItemsAsRead() {
    items(true)
    updateTrayIcon(trayIcon, image, 0)
}



MemeWatch: GreaseMonkey

The latest meme that is appearing on my radar screen is GreaseMonkey. This is a Firefox extension that "fixes" various websites, and new fixes for new websites can easily be added. For example, there is a GreaseMonkey script to add auto-complete to the del.icio.us tag input-box.

That's the cool thing about del.icio.us popular -- you typically get a 48-hour lead on interesting memes before they hit larger news sites like Slashdot (and a week or two before they hit big media sites like CNet).

Friday, March 25, 2005

11-day GTD lapse

Dang, I was doing well at managing my time using the Getting Things Done (GTD) methodology, even getting a mention on tech ronin's blog for my empty email inbox.

Well I have recently stumbled, and stumbled hard. Haven't done a daily or weekly review for 11 days. Anyway, enough is enough, and I am going to at least do a daily review tonight, and I'm definitely going to do a full weekly review tomorrow, painful as they are.

The Zen Of Having 1 Firefox Address Bar Instead Of 2

By default, Firefox has two "address bars": one for entering URLs into, and one for entering Google searches into:



The problem is that this setup forces you to make a conscious decision every time you want to type something in: do I type it into the left one, or the right one? This is the classic evil of "modes": having to switch between different modes (contexts) breaks the flow.

Fortunately there is an easy way to fix this problem. First of all, remove the Google box (hang on, I'm not finished yet!). You can easily do this in Firefox using View > Toolbars > Customize.



Now you've got the one address bar. You can either type a URL into it, or type some keywords into it to go to the top-ranked Google search result (i.e. an "I'm Feeling Lucky" query).

"But sometimes I want a list of Google search results", you say. OK, here is the icing on the cake. Instead of using Google's "I'm Feeling Lucky" for keywords, let's use Google's "Browse By Name" function. This is a smart search that takes you to the top-ranked Google search result only if most people who do that search go there; if not, it will return a standard Google search-result list.

In effect, you have a single "smart" address bar that either (1) goes to the URL you entered, (2) goes to the top-ranked Google search result for the keywords you entered, or (3) provides a list of Google search results. And it is smart enough to choose the correct option almost all of the time.

So if you type "porsche", you'll be taken to the Porsche website:



...whereas if you type "cars", you'll be shown a list of Google search results for "cars":



Neat!

So how do I enable "Browse By Name" in the address bar, you ask? The short answer is that on Firefox's about:config page, you set the keyword.URL preference to http://www.google.com/search?ie=UTF-8&sourceid=navclient&gfns=1&q=
Jesse Ruderman has written up more detailed instructions.
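To make the behaviour concrete, here is a rough Java model of what the single address bar does with your input: something that looks like an address is loaded directly, and anything else is appended to keyword.URL. The "looks like an address" test is my own simplification, not Firefox's actual rule, and Firefox may escape the query differently. The class name is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Rough model of the single "smart" address bar.
// ASSUMPTION: the address-detection heuristic below is a simplification,
// not the rule Firefox actually uses.
class SmartAddressBar {
    static final String KEYWORD_URL =
        "http://www.google.com/search?ie=UTF-8&sourceid=navclient&gfns=1&q=";

    static String resolve(String input) {
        String text = input.trim();
        boolean looksLikeAddress = text.contains("://")
            || (!text.contains(" ") && text.contains("."));
        if (looksLikeAddress) {
            return text; // option (1): go straight to the URL
        }
        // Options (2) and (3): Google's "Browse By Name" picks the top hit or a result list.
        try {
            return KEYWORD_URL + URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

So `resolve("porsche")` yields the keyword.URL with `q=porsche` appended, while `resolve("slashdot.org")` is passed through unchanged.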

Thursday, March 24, 2005

del.icio.us down?

Man, what is up with del.icio.us right now? Has the dreaded day when del.icio.us goes down and takes all its user information down with it finally arrived? But of course we've all been backing up our del.icio.us links to our local drives, right?

C'mon del.icio.us. Work again. Please.

A board game about board gaming

Board gaming experienced a renaissance in North America a few years ago, and its devotees congregate at BoardGameGeek.com.

I was thinking ... how about making a board game about board gaming? You could call it "Meta", short for meta-boardgaming. In this game you are a boardgame collector, and you agonize about how to spend your $200 on a few of the many excellent games out there (a good German board game will typically run you $30-$60). And there are so many factors to consider: theme, mechanics, price, reputation of the designer, reputation of the manufacturer, how many people will be at your place tonight. And there are only so many hours in the day, so how to spend it: reading boardgame reviews, writing articles for boardgamegeek.com, analyzing boardgame statistics like average rating, or maybe even playing a boardgame.

Anyway, it could become a big hit in the boardgaming community. Lazyweb, I invoke thee. And make sure it has good bits!

Getting Upset Over Nothing, or Be Sure To Read The *Whole* Email

A couple of days ago, my email notifier popped up a new window saying, "New Mail from Todd Brill: 'Just thought I'd drop you a note to let you know that your blog comments...'". Now I've recently been leaving some comments here and there on Todd Brill's blog -- Todd is a fellow Victoria BC blogger that I met at the recent Meetup.com meeting. So I didn't know Todd that well, and here I am leaving all these comments on his blog about this technology or that technology -- you know, the stuff I'm passionate about. And I wasn't too sure if he was getting annoyed with me overstaying my welcome on his blog -- that is, until I received that email notification:

"New Mail from Todd Brill: 'Just thought I'd drop you a note to let you know that your blog comments...'"

When I read that, my heart sank. Finally Todd had had enough and was writing to tell me that my blog comments were getting a bit off topic, and to please confine my comments to the subject of the post -- or so I thought. Actually, I had received an email very like this from John Zeratsky, so I thought, here comes another one.

I prepared my rebuttal -- would I say, "OK then, see ya" and never come back? I mean, c'mon, it's a blog for heaven's sake -- can't we have a little leeway for freedom of expression? I mean, it's not like I'm saying anything offensive; I'm just veering a little off topic, and the subject is still related -- I'm just using the post as a springboard, which I think is valid. "Fine," I steamed, vowing not to attend the next Meetup meeting. How could I? The tension would be unbearable.

Anyway, I was curious to read the entire email, and it really went like this:

"Just thought I'd drop you a note to let you know that your blog comments don't seem to be functioning properly. When I clicked on the 'Tradeoff between blogging now and blogging later' comment button, I got a 404 error."

Mea culpa, mea culpa, mea maxima culpa. Todd was not "letting me know that my blog comments were getting a little off topic" -- he was "letting me know that my blog comments don't seem to be functioning properly". The man was just trying to help me out by informing me about a problem with my blogging software. When I read that, my emotions did a 180 and instead of feeling angry and frustrated, suddenly I felt extremely grateful (and extremely guilty!).

It just goes to show you, Jon's Productivity Principle #1: Be Sure To Read The *Whole* Email.

Wednesday, March 23, 2005

PocketPC Tip: Put FastCleanup in your Startup folder

Here's a tip for PocketPC users. Put FastCleanup (freeware) in your Startup folder. Then every time your PocketPC is restarted, FastCleanup will clean out your temporary files, freeing up valuable space.

Google Desktop is slowly changing the way I work...

... Instead of opening Windows Explorer to find a file, I am increasingly typing search queries into the Google Desktop Search deskbar.

Power at my fingertips!

Googlebaiting

Googlebaiting is the practice of mentioning a long-lost acquaintance's name in a blog entry in the hope that they will Google their own name and find you. Today I received confirmation that this works in practice -- I blogged an old acquaintance's name (put out the bait, if you will), and the bait got bitten -- not by that friend, but by a mutual acquaintance, Dan Kreuger. Dan, it was good to touch base with you.

ourmedia.org: dream come true for me

ourmedia.org is now live. It provides unlimited hosting for your audio, video, images, and text, for free, forever, provided that it is licensed under the Creative Commons.

This. Is. Amazing.

It's backed by the Internet Archive, under the leadership of Brewster Kahle, so you know it's for real.

This is a major step forward in the realization of my Mission, which is to influence humanity to use technologies that give individuals superhuman control over their information space. Welcome to the information space.

My toolbar has two bookmarklets

1. The del.icio.us bookmarklet, because I use this service frequently to record all web pages that look interesting to me.

2. My Firefox sidebar, which contains all the other bookmarklets! I love how Firefox lets you make a sidebar using simple HTML.

Destined Collective

I'm making this my start page, mostly for the audio experience - I love the grand, ambient soundtrack! And it has pretty pictures - it showcases cutting-edge artists.

But the music is an awesome way to begin browsing! It seems to have a bunch of different tracks, and you can skip to the next track if you don't like the current one.

http://www.dstnd.com/

There are two kinds of people using del.icio.us...

... taggers, and describers. I'm a describer. When I add a bookmark to del.icio.us, I add a description to it, mainly because adding a description is ridiculously easy -- it is simply a matter of selecting some text then pressing the del.icio.us bookmarklet.

I don't know what motivates taggers to tag -- you have to expend brainpower to come up with tags that make sense. I guess there are two kinds of del.icio.us users: those who tag, and those who describe.

ToDo: secure your wireless network : Lifehacker

My opinion on unsecured wireless networks as an opportunity for people to do bad things: it is also an opportunity for people to do good things. It is amoral.

I leave my wireless network unsecured because I am out of the house from 9-5 -- might as well let people get some use out of it.

Tuesday, March 22, 2005

GMail's plain-HTML mode is great

Well it's not as luxurious as GMail in JavaScript mode, but what's cool about GMail's HTML mode is that it brings GMail to older browsers and unsupported browsers (like Pocket IE on my PDA!). And it's not too shabby! I'd say it retains 75% of the power of true GMail.

This is a great example of "graceful degradation" -- under sub-optimal conditions, the app still works, to a limited extent. It is still useful. That is a beautiful thing to see.

Labeller Machine - Worth It?

Today at the stationery store I was tempting myself with possibly buying a labelling machine. David Allen extols the virtues of automatic labellers, but I have also heard people say that the novelty wears off after a while.

Anyway I was all set to buy the Brother $50 labeller, but it turns out that this sucker requires 6 AA's. To get a labeller with a plug, you need to shell out $120 for the top-of-the-line model. This is so evil.

I wonder if there is a safe, easy way to hack the $50 model so I can have a wire from the battery compartment into the wall. (Safe, now!)

Tradeoff between blogging now and blogging later

I have a growing list of interesting things to blog about, most of them about technology. The problem is whether to blog an idea as it occurs to me (tapping it into an email on my PDA) or later at my PC (when I have more time to flesh it out, and add links and screenshots). One great thing about blogging on the spot (or on the walk, as I'm doing right now, believe it or not) is that the idea is fresh in your mind; if you wait, you might lose the spirit that impassioned you to write in the first place.

I've been building up a "to-blog" list for a couple of weeks now, and it's time for me to realize that I am probably not going to find the time to sit down and write each of them perfectly. So rather than let my blog go unposted-to for weeks, I'm going to consciously try to write on the spot, rather than save my ideas up for later.

Unless an idea *really* needs accompanying links or screenshots.

This is the nature of blogging and what separates it from polished magazine writing.

I'd love to spend my life ...

... writing free software and web services and giving them away for free, delighting people, and receiving recognition for my talents.

In fact I am doing that now to a degree, and I love spending my free time in this way. Except for the frustration when things don't work! Grrr.

JRuby problems :-(

I'm writing a neat-o super-duper-enhanced Bloglines notifier in a scripting language called JRuby (which combines Ruby and Java). But I'm getting these annoying, cryptic errors.

Sigh. I guess I'll try translating my script into Groovy to see if that fares better.

Here are a couple of examples of strange exceptions that JRuby is giving me:


Exception: LocalJumpError: yield called out of block
yabn.rb:248:in `openURL'
yabn.rb:248:in `balloonChanged'
/home/enebo/release/jruby/src/builtin/javasupport.rb:227:in `__send__'
/home/enebo/release/jruby/src/builtin/javasupport.rb:227:in `new'
yabn.rb:245:in `sleep'
yabn.rb:214:in `show_popups_for_new_items'
yabn.rb:258:in `each'
yabn.rb:216:in `show_popups_for_new_items'
yabn.rb:258


Exception in thread "Thread-5" org.jruby.exceptions.RaiseException: Native Exception: 'class java.util.EmptyStackException'; Message: null; StackTrace: java.util.EmptyStackException
at org.jruby.util.collections.AbstractStack.pop(AbstractStack.java:57)
at org.jruby.runtime.ScopeStack.pop(ScopeStack.java:74)
at org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:115)
at org.jruby.RubyModule.call0(RubyModule.java:641)
at org.jruby.RubyModule.call(RubyModule.java:602)
at org.jruby.RubyObject.send(RubyObject.java:1007)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)


Update: I just remembered that the error reporting in the current version of Groovy is even more low-level and cryptic than JRuby's. So instead of rewriting my Bloglines notifier in Groovy, I'm going to go with good ol' BeanShell -- a bit old-fashioned and verbose, but at least it is reliable.

Or maybe it's time to start learning yet another Java scripting language. I hear that Nice is interesting . . .

Friday, March 18, 2005

RSS Reduces Pressure to Post Daily

I've come across a couple of articles now about the need to post on your blog frequently, daily at least, to maintain your readership. This may have been true in the recent past (what is referred to as "Web 1.0"), when readers would visit your blog daily to check for new material and would be disappointed to find nothing new.

But now that the web is moving into an age of RSS feeds and feedreaders (the so-called "Web 2.0"), the pressure to post frequently is becoming less of an issue. People are not trotting down to your website to see what's new; instead, you are pushing content into their feedreaders. If you don't post for a few days, people using feedreaders will not mind as much, as they have a lot of other feeds to catch up on anyway. Most of them will not even notice.

This is good news for writers. The advent of RSS alleviates some of the pressure to post frequently. We have more freedom to wait until good ideas occur to us before writing.

Thursday, March 17, 2005

Starting a Second Blog for Personal Matters

It often happens that a blog that begins as a personal journal develops a more specialized focus, and starts to build a readership. I started Jon Aquino's Mental Garden in March of 2004, and like many online diaries, the content was an eclectic mix of personal and professional snippets. Over the course of a year, JAMG has assumed a definite technological focus -- my emerging interest has been the "personal information cloud" -- and most people who visit this site are more interested in my new ideas than in my personal details.

Thus, I have decided that it is finally time to create a second blog to hold my personal ramblings, geared more toward my family, friends, and myself most of all. It has the quirky name of JonAquino2, and it will be a repository for random thoughts and ideas-in-transit -- stuff not quite ready for prime time that I want to stream into the cloud nonetheless.

And who can resist creating a new blog?

Wednesday, March 16, 2005

Meetup.com Report: The Victoria Bloggerati


Meetup.com is a free online service that is used to organize meetings of like-minded people in the same city. This week I attended my first two meetups -- the Victoria Graphic Design meetup on Monday, and the Victoria Blogger meetup today.

The Victoria Bloggerati convened after several months of absence, at two tables at Starbucks on Government. I had the great pleasure of meeting up with Todd (http://www.toddbrill.net/), Jodie (http://tryingreallyhard.com/), Leon (http://firemind2.blogspot.com/), and Wil (http://www.mentalwanderings.com/). I spent most of the evening chatting with Leon, who counts among his many interests his Flickr photoblog and his many blogs ("I've got too many of them!").

Wil's a veteran blogger who's been at it for a couple of years. He recently moved to beautiful Victoria BC from Portland. We talked a bit about TypePad, WordPress, and some of the other blog systems out there.

It was great to meet Jodie, who is the humorous voice behind www.tryingreallyhard.com. She brought her son Aidan (hope you weren't too bored during the meetup!). Jodie has a bunch of creative ideas, and she will soon be launching a new website, which she's quite excited about.

And I was intrigued to see Todd whip out a Handspring Visor PDA (I had a couple of years of good service from a Visor Neo), and it was great to meet a kindred spirit. We're both big fans of Web 2.0, and of course we had to evangelize RSS and Bloglines (the most convenient RSS reader out there BTW). Todd - we've got to chat more about this stuff!

Personal Information Cloud: Better Name Needed?

I love the idea of a personal information cloud, and technologies to manage it. But it seriously needs a better name. "Personal information cloud" conjures up an image of a person groping about in broad daylight because of this "cloud" in which they are enveloped.

On the other hand, maybe that is not too far from reality...

Tuesday, March 15, 2005

Oh yeah, another 256 MB of RAM

Instead of paying $40 for Opera, I paid $80 for more memory for my computer so that Firefox would be more nimble. Go figure!

Anyway, I doubled the RAM in my budget Gateway computer, and she is a-flyin' now, let me tell you! I've always had a ton of things running in the background, and Firefox would just bring the system to its knees, causing me no small amount of stress and frustration.

And back in the day, 1 MB of RAM cost $50! Remember that?

GMail Coolness: Downloading Multiple Attachments

I recently discovered another cool GMail feature: it lets you save all your attachments at once (as a single zip file). Other webmail systems (and even desktop mail clients like Outlook) force you to save each attachment, one at a time.

That's the thing about GMail - it is constantly being enhanced with new features, requiring no user intervention, with no updates to install. Features are being added to it more quickly than Yahoo Mail and Hotmail, and far more quickly than desktop mail clients like Outlook, which get new features (and bugs) on a yearly cycle.

Oh, and GMail is free.

Monday, March 14, 2005

A Conversation With Michael Tension



Today I had the pleasure of having a conversation with Michael Tension, who is well known in Victoria for his ten years as the singer/songwriter for the band Jeffrey Sez. More recently he has been focusing on his talents in the visual arts and has been producing graphic design work for a number of organizations, groups, and businesses in Victoria and other parts of the world.

Michael and I both have a passion for graphic design, and our conversation ranged from posters and billboards that each of us had recently created, to typography and grunge fonts, to the serene state of inspiration that Michael likes to get into for the first day of any major creative effort. Michael likes to radically combine digital and traditional tools in his work; a piece might undergo numerous iterations of computer scanning, cutting, and physical pulverizing to achieve the effect that he wants.

I greatly enjoyed our conversation, and I look forward to meeting up with other creative minds in Victoria.

Sunday, March 13, 2005

More to Life than Ruby on Rails

I was sitting by the steps in front of the Bay Centre in Victoria BC, doing some people-watching, and feeling a little discouraged at the prospects of my Mission, which is to influence humanity to use technologies that give them superhuman control over their information space. Anyway, an old Chinese woman was making her way down the steps, dragging her cart behind her. Thud-thud, thud-thud, thud-thud went the cart down the steps. She turned to me, held out her hand, and said, "Ma-nee, ma-nee".

I told her I had to go to the bank machine to get some, but I don't think she understood. So I went to the ABM, withdrew a 20, put it in an envelope, and ran off to look for the old woman, whom I found poking through a garbage can. She was startled when I approached her and handed her the envelope. Slowly she opened the envelope and looked inside, then a big grin spread across her face and she said thank you.

I'm now back sitting where I was originally, and I feel fantastic -- about helping out another human being. And she probably also feels fantastic -- about getting 20 bucks today. Through our interaction we each lifted the spirits of the other; it's what Stephen Covey calls "synergy" and what Marshall Rosenberg calls "through giving, feeling given to." It's a delightful and energizing feeling!

Barber shop closed on Sunday

Today I set off for Jimmy's Barber Shop, which is a basic barber shop that's been around for 30 years in Victoria. It being Sunday today, I was half hoping that it would be open, half that it would be closed. Well, it turns out that it was closed, and I am glad in a way - it is a gesture that says to the world that, no, you don't have to be constantly operating 7 days a week, and that it is important to make time for leisure and to recharge.

I will try again on Monday.

Matthew Kelly's Suggestions For Your "Me Day"

Here are some suggestions from Matthew Kelly's The Rhythm of Life:
  • Read one of those books you have been meaning to read for years.
  • Spend time with your family.
  • Take an afternoon nap.
  • Paint a picture.
  • Read poetry.
  • Write a poem.
  • Make some memories.
  • Speak your love.
  • Play catch with your son.
  • Get a little exercise.
I really like this book!

Demo of the power of Cascading Style Sheets


This is an amazing, dynamic demo of how different stylesheets can dramatically alter a website's appearance:

http://csszengarden.com/?cssfile=http://www.resume-3.com/zen/sample.css

Click a link under "Select a design" to change the stylesheet.

CSS Garden

Kevin - This is an amazing, dynamic demo of how different stylesheets can totally alter a website's appearance, like you were saying. You may already have seen this:

http://csszengarden.com/?cssfile=http://www.resume-3.com/zen/sample.css

Click a link under "Select a design" on the right side to change the stylesheet.

An Example of Excellent Usability: Wikipedia's "Create Account / Log In" Page


Clearly the programmers at Wikipedia have a clue about user-interface design. Check out their "Create account / log in" page, which solves the problem of what to do if the user does not yet have an account. Traditionally web sites have had two different pages: Log In and Create Account, and if you went to the wrong page, you got an error. Wikipedia saves you some steps by combining the two activities into a single page, the way we should have tackled this problem in the first place.

Beginnings of a personal mission statement

This is the fruit of today's "hyperbaric chamber", in which I spend an hour each evening in a hot bath meditating on my life.

We have the beginnings of a personal mission statement: "To influence humanity to use technologies that give them superhuman control over their information space."

Currently these technologies include: search engines (web and PC), blogging, social networks (delicious, Flickr), GMail, PDAs, smart phones, RSS feeds, the open-source movement, the Internet Archive, itconversations.com ...

Jef Raskin's ideas realized through GMail

Today I read this description of the Canon Cat, which is a famous computer design by usability guru Jef Raskin:

"...the Canon Cat, a small desktop computer that featured a very unique text-based user interface that not only lacked modes such as spreadsheet and word processing but didn't even use files. As I recall from my testing of the Cat, it was as if everything was a single program and a single document. You got from place to place by pressing the "leap key" that invoked a quick search of the document."

Sounds a lot like GMail. Check out this piece on GMail as the Notepad of the Web.

Saturday, March 12, 2005

GMail as the Notepad of the Web

Today I realized that GMail's latest features make it an excellent replacement for Notepad and other basic desktop text editors. (Use its Save Draft feature so that you can edit your text whenever you want.)

GMail has a number of powerful advantages over Notepad:
  • Filename is optional. No need to think of a unique filename to save under -- just enter your content and go.
  • Search all your past files at once. Try that, Notepad!
  • Spell-checking on demand
  • Load/save your text files from any computer in the world
  • Cross-platform
  • Undo Discard. Ever wish you could retrieve your file after closing it without saving? Now you can:
This is incredibly cool - a viable web-based replacement for basic desktop text editors. Yes, the Web OS is slowly coming together!

PS If you want a GMail invite, send me an email!


Update: We now have a Start Menu for the web!

Update 2: Gmail now supports rich formatting, so we now also have a WordPad for the web:

Common Fonts And When To Use Them

There are some descriptions of good fonts and when to use them over at Thinking With Type. Here are some of them which I found on my Windows XP computer:

  • Elegant and easy to read: Baskerville
  • For children's books: Century Expanded
  • For headlines: Franklin Gothic
  • For on-screen reading: Georgia
  • Popular in the UK: Gill Sans
  • Elegance: Garamond
  • Another for on-screen reading: Verdana

Friday, March 11, 2005

Add a splash of colour to your RSS feedreading, with Famous Paintings (Large!)


I have created an RSS feed called Masterpiece of the Day that brings a famous painting to your feedreader each day, from the ibiblio WebMuseum. And these jpegs are HUGE! Makes for a nice splash of colour in your RSS reader every evening.

java.lang.Object - class definition not found

Some of my co-workers were getting an Eclipse error "java.lang.Object class definition not found". It turns out that the cause is that the specified JDK is missing. Blogging it here for future reference.

Thursday, March 10, 2005

Ruby on Rails

Where are the Ruby on Rails people in Victoria BC? This technology is going to be hot - I can feel it.

Weird PL/SQL Developer bug: stale windows

Today I observed a weird PL/SQL Developer bug. I was quite frustrated and mystified before I discovered it. I was trying to grant privileges to a role, but the privileges weren't having an effect. I eventually discovered that the problem vanished when I opened a new child window. I suppose that the original window somehow got "stale" and was no longer having any effect.

I even found that the same SQL statement executed in the first window had a different error when executed in a new window. So definitely some aliasing or staleness going on.

Wednesday, March 09, 2005

Victoria Weblogger Meetup, March 16

Hi Gang - Meetup.com won't let me contact you individually without paying a fee, so I'm cleverly sending out this group mail, with individual salutations below.

Wil - Are you coming to the March 16 Meetup at Starbucks on Government St.? Your blog says you're in Portland, so I guess not . . .

Ada - Got your message about Wednesday nights not being good. Ah well.

June - Saw your Flickr photostream - I like the snowy photos! Are you interested in joining our Victoria bloggers Starbucks meetup on Wednesday March 16 @ 7PM?

Mavis - I see from your blog that you are currently in Saskatoon. Guess you can't join us for the Victoria Weblogger meetup next Wednesday :-(

Leon - I'll post a comment to your blog reminding you about the meetup.

Laura-Jane - I'll post to your blog too.

Kyle - Been to the site. Love the GeoURL. Come grace us with your presence.

Devilish - You said "I hope you guys aren't too creepy in reality". I hope so too - I haven't met any of these people! Let's find out.

Joet - Come on by and have a caramel macchiato, on me. Just be sure to RSVP at http://blog.meetup.com/192/events/4297446/

And finally, MrCheezy - Love the name; looks like you're in Toronto though. Never the first cigarette ("Jamais la première cigarette").


OK, so when you arrive at Starbucks, look for the group congregating around the big orange Blogger sign. I'm going to make up this big sign (well, the size of a piece of paper anyway) and it's going to have a big orange B on it, just like the logo at http://blogger.com. But only we, only WE, will know what that B stands for. Mwa ha ha ha! [end evil laugh]

High Quality Podcasts

Quote (originally posted by FallN):

"I have an iPod and I can tell you that 'PodCasting' by 99.999% of people is pure garbage. Adam Curry (Ex-MTV VJ) brings some merit to the genre but most of the stuff is pure junk."

FallN - Just wanted to let you know about a few podcasts that are of surprisingly high quality, especially if you are into tech:

IT Conversations - Recordings from various technology/science conferences and interviews. I can't believe that this amazing resource is free. http://www.itconversations.com/

CBC/Nerd - 5-minute programs about a current technology. High production quality. http://www.cbc.ca/nerd/

For the religiously inclined, Verbum Domini has daily scripture readings of the Roman Catholic church: http://homepage.mac.com/noebie/lectionary/verbum.htm

My Passion: Personal IT

After much soul searching (some of it automated -- see recent posts), I have finally arrived at a name for my favourite interest: Personal Information Technology.

I get excited about new free web services and software products -- things like GMail, Flickr, del.icio.us, Google Desktop. Not just any software, but software that gives us control over our personal "information space" -- the websites, emails, RSS feeds, phone calls, meetings that bombard us daily.

And not just software (Google Desktop), but also web services (GMail, Meetup.com), hardware (PDAs), personal management methods (Covey, GTD) -- I'm interested in technologies and methodologies to manage the personal information cloud.

The problem is what to call this interest. Some of my earlier attempts were: "free software and web services", "Information Technology", "the mobile lifestyle", "cyberculture" -- but none of these precisely described this space. Then today I hit upon it: Personal IT. It's about technology that persons can use to manage their information.

Personal IT is a hot industry right now, and many of the services are freely available to all. We live in exciting times!

Trick yourself into meditating for an hour

I was inspired to take up Matthew Kelly's challenge to meditate for an hour a day, but I was puzzled as to how I would keep up this superhuman practice. The answer for me, as it often is, is to make it fun.

I call it Jon's Hyperbaric Chamber. Basically you submerge yourself in a tub of water as best you can. The key is to get your ears under the waterline. You will be treated to the incredible internal sounds of your body: Breathing through your nose sounds like a subway train arriving to pick up passengers, then leaving for its next destination. Breathing through your mouth sounds like the wind forcing itself through a thick forest. And in the background is a low rumble - I'm not sure if it's the sound of blood coursing through my circulatory system or my muscles in a constant state of tension. Anyway, that rumble grows to a roar when you clench your teeth.

A big part of meditation is self-awareness, and this is a fantastic way to become more aware of your breathing.

Jon's Free (but Painful) Method for Making Podcasts for Free

1. Sign up for a free Blogger blog (this gets you your RSS feed).
2. Sign up for a free FeedBurner account (this turns your RSS feed into a Podcast feed)
3. Sign up for a free Internet Archive account (they'll host your podcast MP3s for free, forever, with unlimited storage and unlimited bandwidth, provided you license them under the Creative Commons).
4. Record your MP3 (e.g. on your Pocket PC)
5. FTP it to the Internet Archive
6. Wait impatiently for 24 hours while the Internet Archive processes your MP3
7. Post a link to your MP3 on your blog. And you're done!
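What turns an ordinary RSS feed into a podcast feed is an enclosure element on each item, pointing at the audio file; as I understand it, that is what FeedBurner adds in step 2 when a post links to an MP3. Here is a minimal Ruby sketch of one such RSS 2.0 item. The helper names (xml_escape, podcast_item), the episode title, the archive.org URL, and the byte count are all hypothetical placeholders, not part of any Blogger, FeedBurner, or Internet Archive API.

```ruby
# Sketch: build one RSS 2.0 <item> carrying an <enclosure>, the element that
# podcast clients download. All values passed in below are hypothetical.

# Escape the characters that are special inside XML text and attribute values.
def xml_escape(text)
  text.to_s.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;").gsub('"', "&quot;")
end

# RSS 2.0 requires three enclosure attributes: url, length (in bytes),
# and type (the MIME type; audio/mpeg for MP3).
def podcast_item(title, mp3_url, byte_length)
  <<~XML
    <item>
      <title>#{xml_escape(title)}</title>
      <enclosure url="#{xml_escape(mp3_url)}" length="#{Integer(byte_length)}" type="audio/mpeg"/>
    </item>
  XML
end

puts podcast_item("Episode 1", "http://www.archive.org/download/my-podcast/episode1.mp3", 1_234_567)
```

The length attribute is the file size in bytes, so in practice you would read it from the uploaded file rather than typing it in.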


Somebody make this process easier!

I love Step 3 - free podcast hosting!

Tuesday, March 08, 2005


I don't know what's up with the new Doppler, but it seems to have a memory leak

Closing Doppler seemed to lighten the load substantially.

Update: The very latest Doppler (2.0.0.3) seems to be more well-behaved. I'll keep an eye on it.

The man born blind - from Father Tony's homily

Pope John XXIII: "Open the windows of the church. Let the stale air out and the new air in."

[gimp] What to do if the "Colormap Rotation" menu item is disabled

Click Mode > RGB

Do-It-Yourself Semantic Clustering of Tags using Google Directory

There has recently been a surge of articles about using the Porter Stemming Algorithm to find similar tags by similar etymology. Well, how about clustering tags by similar meaning? That's a hard problem, you say.

Or is it? It turns out that Google Directory gives us hierarchical categories for any given keyword. For example, the word "coding" is categorized under Science/Math/Applications/Communication_Theory/Coding_Theory and Computers/Programming/Languages/Java/Coding_Standards:



So if we want to organize del.icio.us and Flickr tags semantically i.e. if we want to cluster tags together, we can simply ask Google Directory for the categories of each tag.
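To make "cluster tags together" concrete: once each tag has a list of categories, clustering amounts to inverting that mapping, so that every category collects the tags filed under it. The sketch below assumes the lookups have already been done; the sample tag-to-category data is hypothetical (loosely based on the categories mentioned in this post), and cluster_tags is an illustrative name, not a function from the script.

```ruby
# Sketch: invert tag => [categories] into category => [tags].
# Tags that land in the same category form one semantic cluster.
def cluster_tags(tag_to_categories)
  clusters = Hash.new { |hash, category| hash[category] = [] }
  tag_to_categories.each do |tag, categories|
    categories.each { |category| clusters[category] << tag }
  end
  clusters.each_value(&:uniq!) # a tag may list the same category twice
  clusters
end

# Hypothetical lookup results, standing in for scraped Google Directory data.
lookups = {
  "coding" => ["Computers/Programming/Languages/Java/Coding_Standards",
               "Science/Math/Applications/Communication_Theory/Coding_Theory"],
  "java"   => ["Computers/Programming/Languages/Java"],
  "style"  => ["Computers/Programming/Languages/Java/Coding_Standards"]
}

cluster_tags(lookups).each do |category, tags|
  puts "#{category}: #{tags.join(', ')}"
end
```

Note that "coding" ends up in two clusters: a tag with several meanings simply belongs to several categories.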

In fact, I have implemented this idea using screen scraping to get the information from Google Directory. (Hope you won't mind too much, Google!). I have written a script to extract the most frequently occurring tags for my del.icio.us links, and a second script to cluster the tags together into categories scraped from Google Directory. The end result is a list of categories representing my interests. Check it out!

Update: A neat side effect of using Google Directory is that it even understands tags from other languages. For instance, someone tagged one of my links with "desenvolvimento". Now, I don't know what this word means, but Google Directory still gave me some clusters for it: World, Português/Regional/Brasil/Governo/Ministérios e Agências:

Scraping Google Directory to do Semantic Clustering of Tags from intersp.icio.us

Earlier today I experimented with analyzing my delicious links to determine my favourite interests. I created a Ruby script called interspicious to do this.

I was hoping that the script would tell me my 5 or 6 favourite areas of interest, but the results are too granular -- many topics identified belong to the same space:



I did use the Porter Stemming Algorithm to combine similar tags by dropping their suffixes, but I see that I'm going to need to cluster the tags by meaning, not just by their etymology.
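For contrast, this is roughly what combining tags "by etymology" amounts to: chop common suffixes off each tag and group the tags that share the remaining stem. The sketch below is a deliberately crude suffix stripper, not the Porter Stemming Algorithm (which has many more rules and conditions); the suffix list and the names crude_stem and group_by_stem are assumptions chosen for illustration.

```ruby
# Crude suffix stripping for illustration only; NOT the Porter Stemming
# Algorithm. Longer suffixes come first so "ers" wins over "s".
SUFFIXES = %w[ing ers er es ed s].freeze

def crude_stem(tag)
  word = tag.downcase
  SUFFIXES.each do |suffix|
    # Only strip if at least three letters of stem remain.
    if word.end_with?(suffix) && word.length - suffix.length >= 3
      return word[0...-suffix.length]
    end
  end
  word
end

# Group tags whose crude stems match.
def group_by_stem(tags)
  tags.group_by { |tag| crude_stem(tag) }
end

p group_by_stem(%w[programming programmer programmers coding tools tool])
```

This merges "programmer" with "programming", but it has no way of knowing that "coding" belongs with them, which is exactly the limitation that calls for clustering by meaning.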

And I think I know how -- Google Directory:



Note that Google Directory does some very nice clustering in its "Related Categories" section. So I'm going to write a script to scrape this information to cluster the hundreds of results from interspicious. Hopefully, hopefully, the result of all this will be a machine-generated list of my favourite interests!
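The scraping step boils down to this: find the "Related categories" part of the results page and pull the category path out of each directory.google.com/Top/ link. Since Google Directory is no longer online, the sketch below runs against a small hand-written HTML fragment; that markup is an assumption, not a copy of a real results page, and related_categories is an illustrative name rather than the method used in the script.

```ruby
# Sketch: extract category paths from the "Related categories" section of a
# results page. Works on an HTML string, so no network access is needed.
def related_categories(html)
  # Non-greedy match: stop at the first </table> after the heading.
  section = html[/Related categories:(.*?)<\/table>/m, 1]
  return [] if section.nil?
  paths = section.scan(%r{http://directory\.google\.com/Top/([^?"'>]+)}).flatten
  paths.map { |path| path.chomp("/") }.uniq
end

# Hypothetical fragment, loosely shaped like an old results page.
sample = <<~HTML
  <table><tr><td>Related categories:
  <a href="http://directory.google.com/Top/Computers/Programming/Languages/Ruby/?tc=1">Ruby</a>
  <a href="http://directory.google.com/Top/Computers/Software/">Software</a>
  </td></tr></table>
HTML

p related_categories(sample)
# prints ["Computers/Programming/Languages/Ruby", "Computers/Software"]
```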

OK, here's interspicious2 in action:



Here it is analyzing my link "RPA-base: GoodAPIDesign". It finds all the tags that others have applied to this link. It then retrieves the set of categories for each tag by scraping Google Directory. (Any tags that have previously been used for scraping are cached, for efficiency).

At the bottom of the screen you can see the categories found for the tags. Some are pretty good ("Languages/Ruby"). Others are not so good ("Health/Alternative"). But the noise will be significantly reduced as we process hundreds more of these links. At the end we will have an accurate list of categories describing the person's favourite interests, which is the goal of the script.

OK, here is the script, introspicious2, written in Ruby. It analyzes your del.icio.us links to find the tags describing your favourite interests, and clusters those tags into meaningful categories using Google Directory:


# This Ruby script analyzes your del.icio.us links to determine your
# favourite interests by listing the tag categories that are most
# frequently used. Because you may not tag your links, this script
# uses the tags from *all* users, for *your* links.
#
# The result is a list of tag categories, sorted by the number of
# links that *you* have associated with each. In effect, you get a
# description of your favourite interests, starting with the ones you
# enjoy the most. It is a means of introspection, using the
# descriptive power of delicious tags. Hence the name of the script:
# introspicious.
#
# Technical Note: To find the categories associated with a given tag,
# this script looks up the tag in Google Directory to see what
# categories it is under.
#
# Instructions:
# 1. Save this script as "introspicious.rb"
# 2. Edit it to use your del.icio.us username and password
# 3. Download and install Ruby
# 4. Run this script using "ruby introspicious.rb"
#
# (Would someone be willing to write a web front-end for this script,
# so that it is more accessible to others?)
#
# [Jon Aquino 2005-03-06]

$delicious_username = 'JonathanAquino'
$delicious_password = 'tiger'

require 'net/http'
require 'uri'

def printException(e)
  # Force Ruby to print the full stack trace. [Jon Aquino 2005-03-07]
  puts "Exception: #{e.class}: #{e.message}\n\t#{e.backtrace.join("\n\t")}"
end

class Link
  attr_accessor :url, :description, :checksum
  def initialize(url, description, checksum)
    @url = url
    @description = description
    @checksum = checksum
  end
end

class Delicious
  def all_links
    last_url = nil
    last_description = nil
    all_links = []
    # Change "all" to "recent" for a quicker test [Jon Aquino 2005-03-07]
    # (each_line rather than each: String#each no longer exists in Ruby 1.9+.)
    get('/api/posts/all').each_line { |line|
      if line =~ /href="([^"]+)"/
        last_url = $1
      end
      if line =~ /description="([^"]+)"/
        last_description = $1
      end
      if line =~ /hash="([^"]+)"/
        all_links << Link.new(last_url, last_description, $1)
      end
    }
    all_links
  end
  def tags(link)
    tags = []
    get('/url/' + link.checksum).each_line { |line|
      next if not line =~ /^\s*<.*delNav.*>(.*)</
      tags << $1
    }
    tags.uniq
  end
  def get(path)
    # Wait 1 second between queries, as per
    # http://del.icio.us/doc/api [Jon Aquino 2005-03-06]
    sleep 1
    response = nil
    Net::HTTP.start('del.icio.us') { |http|
      req = Net::HTTP::Get.new(path)
      req.basic_auth $delicious_username, $delicious_password
      response = http.request(req).body
    }
    response
  end
end

# Version 1 of this script used the Porter Stemming Algorithm to
# cluster tags etymologically. The current script scrapes Google
# Directory to cluster tags semantically. [Jon Aquino 2005-03-06]
class GoogleDirectory
  def initialize
    @query_to_cached_categories_map = {}
  end
  def categories(query)
    if @query_to_cached_categories_map.has_key?(query)
      puts "(Cached: #{query})"
    else
      attempts = 1
      begin
        @query_to_cached_categories_map[query] = categories_proper(query)
      rescue Exception => e
        # Sometimes I get c:/ruby/lib/ruby/1.8/timeout.rb:42:in
        # `rbuf_fill': execution expired (Timeout::Error).
        # [Jon Aquino 2005-03-07]
        printException(e)
        attempts += 1
        if attempts <= 5
          puts "(Retrying)"
          retry
        end
        # Give up on this tag; cache an empty list so callers never get nil.
        @query_to_cached_categories_map[query] = []
      end
    end
    @query_to_cached_categories_map[query]
  end
  def categories_proper(query)
    # Escape the tag so spaces and non-ASCII characters survive the URL.
    path = "/search?q=#{URI.encode_www_form_component(query)}&cat=gwd%2FTop"
    response = Net::HTTP.get("www.google.com", path)
    response.gsub!(/\n/, " ")
    response.gsub!(/Related categories:/, "$$$Related categories:")
    response.gsub!(/<\/table>/, "</table>$$$")
    # Find the segment holding the "Related categories" links.
    related = nil
    response.split("$$$").each { |segment|
      if segment =~ /Related categories:(.*)<\/table>/
        related = $1
        break
      end
    }
    return [] if related == nil
    related.gsub!(/href=/, "$$$")
    related.gsub!(/\/>/, "$$$")
    categories = []
    related.split("$$$").each { |substring|
      next if not substring =~ /http:\/\/directory.google.com\/Top\/([^?]+)/
      categories << $1
    }
    categories.uniq
  end
  private :categories_proper
end

delicious = Delicious.new
google_directory = GoogleDirectory.new
categories = []
begin
  i = 0
  delicious.all_links.each { |link|
    i += 1
    puts "\n#{i}. #{link.description}\n#{link.url}"
    tags_for_link = delicious.tags(link)
    puts "Tags: #{tags_for_link.join(", ")}"
    categories_for_link = []
    tags_for_link.each { |tag|
      categories_for_link += google_directory.categories(tag)
    }
    # Count each category at most once per link.
    categories_for_link.uniq!
    puts "Categories: #{categories_for_link.join(", ")}"
    categories += categories_for_link
  }
rescue Exception => e
  printException(e)
  exit 1
end

class CategoryCount
  attr_reader :category, :count
  def initialize(category)
    @category = category
    @count = 0
  end
  def inc
    @count += 1
  end
end

category_to_categorycount_map = Hash.new
categories.uniq.each { |category| category_to_categorycount_map[category] = CategoryCount.new(category) }
categories.each { |category| category_to_categorycount_map[category].inc }
category_to_categorycount_map.values.sort { |a, b| b.count <=> a.count }.each { |categorycount|
  puts "#{categorycount.count} page#{categorycount.count == 1 ? "" : "s"} categorized as #{categorycount.category}"
}


And here are the results!



As expected, computer programming tops the list, as it is my #1 interest. But there are some strange categories as well (strange in terms of my interests, that is) - "Economics/Development", "Sterling,_Bruce", "University_of_Illinois". Maybe we need to reduce the granularity a bit...

Here are my results if we collapse all the categories to one level deep:

65% - 235 pages categorized as Computers
52% - 188 pages categorized as Arts
50% - 182 pages categorized as Science
45% - 164 pages categorized as Society
43% - 156 pages categorized as Reference
43% - 155 pages categorized as World
41% - 150 pages categorized as Business
33% - 122 pages categorized as Shopping
30% - 108 pages categorized as Kids_and_Teens
29% - 105 pages categorized as Regional

Computers, obviously, are a major interest for me. I have an interest in the Arts, as well as in Science. Meanwhile, the World, Society, and Business are a bit outside my radar screen. This is a fairly accurate picture.
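The post doesn't show the code that collapses categories to a given depth, so here is a minimal Ruby sketch of one way to do it. It assumes the categories are kept per link (the script above flattens them into a single categories array, which loses that information); collapse_and_count and the sample links are hypothetical names and made-up data, not part of introspicious2. Each path is cut to its first few levels and de-duplicated per link, so a page counts at most once toward "Computers" even if it has several Computers/... categories.

```ruby
# Hypothetical helper (not part of introspicious2): collapse Google Directory
# category paths to `depth` levels and count how many links fall under each.
def collapse_and_count(categories_per_link, depth)
  counts = Hash.new(0)
  categories_per_link.each do |categories|
    # Truncate every path, then uniq, so a link with two Computers/...
    # categories still adds only 1 to "Computers".
    collapsed = categories.map { |path| path.split("/").first(depth).join("/") }.uniq
    collapsed.each { |category| counts[category] += 1 }
  end
  total = categories_per_link.size
  # Most frequent first; ties broken alphabetically.
  rows = counts.sort_by { |category, count| [-count, category] }
  rows.map do |category, count|
    # Integer division truncates to a whole percentage (an assumption;
    # the post does not say how its percentages were rounded).
    percent = count * 100 / total
    "#{percent}% - #{count} page#{count == 1 ? "" : "s"} categorized as #{category}"
  end
end

# Made-up sample data: one array of category paths per link.
links = [
  ["Computers/Programming/Languages", "Computers/Software", "Arts/Literature"],
  ["Computers/Internet/On_the_Web/Weblogs/Tools", "Society/Subcultures/Geeks_and_Nerds"],
  ["Arts/Graphic_Design"],
  ["Computers/Programming/Languages/Ruby"]
]
puts collapse_and_count(links, 1)
puts collapse_and_count(links, 2)
```

Because one link can land in several top-level categories, the percentages in the lists above add up to well over 100%. Passing 2 instead of 1 gives the two-level view.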

Let's probe two levels deep to get a bit more specific:

50% - 180 pages categorized as Computers/Software
45% - 162 pages categorized as Computers/Internet
43% - 157 pages categorized as Computers/Programming
38% - 137 pages categorized as Science/Social_Sciences
28% - 104 pages categorized as Reference/Education
27% - 98 pages categorized as Arts/Music
26% - 95 pages categorized as Computers/Data_Formats
26% - 94 pages categorized as Science/Technology
25% - 92 pages categorized as Arts/Literature
24% - 89 pages categorized as Business/Industries
24% - 88 pages categorized as World/Deutsch
23% - 84 pages categorized as Computers/Multimedia
23% - 84 pages categorized as Society/Religion_and_Spirituality

Computers tops the list, as expected. I'm not sure where "Science/Social_Sciences" and "Reference/Education" are coming from; I guess I'll have to probe a few levels deeper to understand that. I don't know what "World/Deutsch" is doing on the list. And I do have an interest in Religion_and_Spirituality, which explains the Society/Religion_and_Spirituality entry.

It would be fascinating to make a tree control that displayed this information; a tree viewer would be a better way to browse this stuff than straight lists. Hm! It occurs to me that if I converted it to XML, I could just open it in Internet Explorer, which displays XML in a tree format.
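Here is a rough Ruby sketch of that XML idea, assuming the same input as before: one array of category paths per link. build_tree, to_xml, the interests/category element names and the sample data are hypothetical choices made for this sketch, not anything introspicious2 produces. Each node counts the links that fall anywhere beneath it, which gives a tree of interests with a strength value at every level.

```ruby
# Hypothetical sketch: turn per-link category paths into a tree whose nodes
# record how many links fall under them, then print it as nested XML.
Node = Struct.new(:count, :children)

def build_tree(categories_per_link)
  root = {} # top-level category name => Node
  categories_per_link.each do |categories|
    # Every prefix of every path ("Computers", "Computers/Programming", ...),
    # de-duplicated so that each link counts at most once per node.
    prefixes = categories.flat_map { |path|
      parts = path.split("/")
      (1..parts.size).map { |n| parts.first(n) }
    }.uniq
    prefixes.each do |parts|
      children = root
      node = nil
      parts.each do |name|
        node = (children[name] ||= Node.new(0, {}))
        children = node.children
      end
      node.count += 1
    end
  end
  root
end

XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }

def xml_escape(text)
  text.gsub(/[&<>"]/, XML_ESCAPES)
end

# Category names go in an attribute rather than becoming element names.
def to_xml(children, indent = "  ")
  children.sort_by { |name, node| [-node.count, name] }.map { |name, node|
    tag = %(#{indent}<category name="#{xml_escape(name)}" count="#{node.count}")
    inner = to_xml(node.children, indent + "  ")
    inner.empty? ? "#{tag}/>\n" : "#{tag}>\n#{inner}#{indent}</category>\n"
  }.join
end

# Made-up sample data: one array of category paths per link.
links = [["Computers/Programming", "Arts/Literature"], ["Computers/Software"],
         ["Computers/Programming/Languages"]]
xml = %(<?xml version="1.0"?>\n<interests>\n#{to_xml(build_tree(links))}</interests>\n)
puts xml
```

Saving the output to a .xml file and opening it in Internet Explorer should give the tree view described above. The names live in a name attribute because categories like "Sterling,_Bruce" are not legal XML element names.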

In conclusion, del.icio.us tags and Google Directory categories can be used to automatically infer a person's favourite interests based on the web pages they visit. By combining the flat structure of del.icio.us tags with the hierarchical structure of Google Directory categories, we arrive at a tree of the person's interests, with quantitative values for the strength of each interest.



Update: After manually weeding out some of the entries, here is the definitive list of my favourite interests:

26% - 96 pages categorized as Computers/Programming/Languages
16% - 60 pages categorized as Arts/Graphic_Design
16% - 58 pages categorized as Society/Subcultures/Geeks_and_Nerds
15% - 54 pages categorized as Computers/Internet/On_the_Web/Weblogs/Tools
14% - 51 pages categorized as Science/Technology

I love programming, obviously. But I also have a penchant for graphic design. I love the idea of cyberspace and cyberculture, which is where "Subcultures/Geeks_and_Nerds" comes in. I love writing for a large audience on the web ("Weblogs"). And in general, I just love technology.



Monday, March 07, 2005


Here I'm having introspicious2 analyze my del.icio.us tags, clustering them into categories using Google Directory. I'll be writing more about this soon. I love how Google Directory can be used as a clustering engine (with a bit o' screen scraping, that is).


This is great: a little tool for Windows users to redirect output to both a file and the console. http://www.fpschultze.de/w3.htm
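The linked tool is a Windows program, but the same idea can also live inside a Ruby script. A minimal sketch, assuming you only need puts, print, write and flush: Tee is a hypothetical helper class (not part of Ruby's standard library) that forwards each call to every target it was given, here the console and a log.

```ruby
require "stringio"

# Hypothetical Tee class: forwards every output call to all of its targets.
class Tee
  def initialize(*targets)
    @targets = targets
  end

  # Define write/print/puts/flush so that each one is sent to every target.
  [:write, :print, :puts, :flush].each do |method_name|
    define_method(method_name) do |*args|
      @targets.each { |io| io.public_send(method_name, *args) }
      nil
    end
  end
end

# A StringIO stands in for the log file here; an IO from
# File.open("output.log", "w") would work the same way.
log = StringIO.new
tee = Tee.new($stdout, log)
tee.puts "Tags: programming, ruby"
tee.print "done\n"
tee.flush
```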

Convert your syntax-coloured XEmacs buffer to HTML using HTMLize

Found a great XEmacs package today called HTMLize. It takes your XEmacs buffer...



... and converts it to HTML ...



This is going to be great for putting syntax-coloured code on the web. XEmacs syntax-colours a lot of different programming languages, so no longer will I have to hunt on the web for an online (Language X) to HTML converter. Hooray!

(Tip: Set it to "font" mode instead of "css" mode to make the output easier to paste into your blog.)


require 'net/http'

class Delicious
  def all_link_checksums
    all_link_checksums = []
    # Change "all" to "recent" for a quicker test [Jon Aquino 2005-03-07]
    get('/api/posts/all').each_line {|line|
      next if not line =~ /hash/
      all_link_checksums << checksum(line)
    }
    all_link_checksums
  end
  def tags(link_checksum)
    tags = []
    get('/url/'+link_checksum).each_line {|line|
      next if not line =~ /^\s*<.*delNav.*>(.*)</
      tags << $1
    }
    tags.uniq
  end

Sunday, March 06, 2005


I love the sparklines on this page, which shows the web pages that are currently most popular. The sparklines add an interesting temporal dimension. http://tools.waglo.com/durl/graph/popular
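A sparkline is a tiny, word-sized line chart. The page draws its own as graphics; as a rough text-only approximation of the idea, this hypothetical Ruby sparkline method scales a series of numbers onto eight Unicode block characters. The sample values are made up.

```ruby
# Eight Unicode block characters, from lowest to highest bar.
BARS = %w[▁ ▂ ▃ ▄ ▅ ▆ ▇ █]

# Scale each value linearly between the series' minimum and maximum and pick
# the matching bar; a flat series is drawn with the lowest bar.
def sparkline(values)
  min, max = values.minmax
  range = max - min
  values.map { |value|
    index = range.zero? ? 0 : ((value - min) * (BARS.size - 1) / range.to_f).round
    BARS[index]
  }.join
end

# Made-up hourly hit counts for one page.
puts sparkline([1, 5, 22, 13, 53, 8])
```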

Pasting Code Into Your Blog Using QuickEscape


If you're like me and paste a lot of code into your blog, you might find QuickEscape handy. It is a website that lets you paste in some text and convert it into HTML-safe characters, e.g. & becomes &amp;, < becomes &lt;, > becomes &gt;, etc.

Caveat: It seems to have a bug (feature?) in which it also converts HTML-safe characters back to their original characters, so be careful with it.
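Doing the conversion locally sidesteps that problem. Here is a minimal Ruby sketch, assuming only &, <, > and " need escaping; escape_html is a hypothetical helper, not QuickEscape's code. Because it makes a single pass over the text, an entity that is already in your code, such as &amp;, is escaped to &amp;amp; and shows up literally instead of being turned back into &.

```ruby
# Entities for the characters that must not appear raw in HTML text.
HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }

# A single gsub pass replaces each special character exactly once, so the "&"
# inside an entity it has just produced is never escaped a second time, and
# entities already present in the input are escaped rather than decoded.
def escape_html(text)
  text.gsub(/[&<>"]/, HTML_ESCAPES)
end

puts escape_html('if (a < b && c > d) puts "ok"')
puts escape_html("x = y &amp; z")
```

Ruby's standard cgi library also provides CGI.escapeHTML, which does a similar job.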

Update: If you use the XEmacs text editor, an even better solution is to use the HTMLize package.

introsp.icio.us: Using del.icio.us to identify your interests

The Ruby script below analyzes your del.icio.us links to generate a list of your favourite interests. It lists the tags that are most frequently used in your links. Because you may not have been diligent about descriptively tagging your links, this script uses the tags from *all* users, for *your* links.

The result is a list of tags, sorted by the number of your links that have been associated with each tag. In effect, you get a list of your favourite interests, starting with the ones you enjoy the most. It is a means of introspection, using the descriptive power of shared tagging in del.icio.us. Hence the name of the script: introspicious.

How this script came to be: I stumbled across extispicious, which is a neat mind-map of your del.icio.us tags. But when I tried it, I was disappointed to find that it used only my own tags; you see, I don't use very descriptive tags for my del.icio.us links. I wished there were a way to show the most common tags from all users for my links, as a pseudo-quantitative way of understanding my favourite interests. Thus, introspicious was born.

Here is the beginning of the results for my links. As you can see, I am very interested in programming, news articles, and blogs:



Here's a graph of the numbers:


Note that a few tags come up frequently in my links (e.g. programming). In contrast, many tags each appear in only one of my links (e.g. zzread200411).

So there you have it -- a script to analyze your del.icio.us links to determine your favourite interests. (Would someone be willing to write a web front-end for this script, so that it is more accessible to others?)



Instructions:
1. Save this script as "introspicious.rb"
2. Edit it to use your del.icio.us username and password
3. Download and install Ruby
4. Download http://www.tartarus.org/~martin/PorterStemmer/ruby.txt and save it as "stemmable.rb" in the same directory as this script
5. Run this script using "ruby introspicious.rb"



#!/usr/bin/env ruby
# introspicious.rb. For more information, see
# http://jonaquino.blogspot.com/2005/03/introspicious-using-delicious-to.html

$delicious_username = 'JonathanAquino'
$delicious_password = 'tiger'

require 'net/http'

def get(path, http)
  # Wait 1 second between queries, as per
  # http://del.icio.us/doc/api [Jon Aquino 2005-03-06]
  sleep 1
  puts path
  req = Net::HTTP::Get.new(path)
  req.basic_auth $delicious_username, $delicious_password
  http.request(req).body
end

def checksum(xml)
  xml =~ /hash="([^"]+)"/
  $1
end

tags = []
Net::HTTP.start('del.icio.us') {|http|
  # each_line iterates over lines (String#each no longer exists in Ruby 1.9+)
  get('/api/posts/all', http).each_line {|post|
    next if not post =~ /hash/
    tags_for_url = []
    get('/url/'+checksum(post), http).each_line {|line|
      next if not line =~ /^\s*<.*delNav.*>(.*)</
      tags_for_url << $1.downcase
    }
    puts "Tags: #{tags_for_url.uniq.join(", ")}"
    tags += tags_for_url.uniq
  }
}

# Use the Porter Stemming Algorithm to strip suffixes from tags, thus
# combining similarly-named tags. [Jon Aquino 2005-03-06]
# require_relative loads stemmable.rb from this script's directory.
require_relative 'stemmable'
class String
  include Stemmable
end
stems = []
stem_to_example_tag_map = Hash.new
tags.each { |tag|
  stems << tag.stem
  stem_to_example_tag_map[tag.stem] = tag
}

class StemCount
  attr_reader :stem, :count
  def initialize(stem)
    @stem = stem
    @count = 0
  end
  def inc
    @count += 1
  end
end

stem_to_stemcount_map = Hash.new
stems.uniq.each { |stem| stem_to_stemcount_map[stem] = StemCount.new(stem) }
stems.each { |stem| stem_to_stemcount_map[stem].inc }
stem_to_stemcount_map.values.sort { |a,b| b.count <=> a.count }.each { |stemcount|
  puts "#{stemcount.count} page#{stemcount.count==1?"":"s"} tagged as #{stem_to_example_tag_map[stemcount.stem]}"
}


Update: I have written a new version of introspicious that replaces the Porter Stemmer with screen-scrapes of queries to the Google Directory, which provides categories for the keywords you enter. Thus, instead of using the Porter Stemmer to cluster tags by word stem, introspicious2 uses Google Directory to cluster tags semantically. Get introspicious2 here.