Jon Aquino's Mental Garden

Engineering beautiful software jon aquino labs | personal blog

Sunday, March 06, 2005

introsp.icio.us: Using del.icio.us to identify your interests

The Ruby script below analyzes your del.icio.us links to generate a list of your favourite interests. It lists the tags that are most frequently used in your links. Because you may not have been diligent about descriptively tagging your links, this script uses the tags from *all* users, for *your* links.

The result is a list of tags, sorted by the number of your links that have been associated with each tag. In effect, you get a list of your favourite interests, starting with the ones you enjoy the most. It is a means of introspection, using the descriptive power of shared tagging in del.icio.us. Hence the name of the script: introspicious.
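The counting step described above can be sketched in a few lines of Ruby. This is only an illustration: the `tag_counts` helper and the sample data are hypothetical, and the actual script fetches the tags from del.icio.us rather than from a hard-coded hash.

```ruby
# Hypothetical sample data: each of "your" links mapped to the tags that
# *all* users have given it. The real script fetches these from del.icio.us.
SAMPLE_LINK_TAGS = {
  'http://example.com/a' => ['Programming', 'ruby', 'programming'],
  'http://example.com/b' => ['news', 'programming'],
  'http://example.com/c' => ['blogs', 'news', 'programming']
}

# Returns [tag, count] pairs, most frequent first. A tag is counted at most
# once per link, so the count is "number of links carrying this tag".
def tag_counts(link_tags)
  counts = Hash.new(0)
  link_tags.each_value do |tags|
    tags.map { |tag| tag.downcase }.uniq.each { |tag| counts[tag] += 1 }
  end
  # Sort by descending count, breaking ties alphabetically.
  counts.sort_by { |tag, count| [-count, tag] }
end

tag_counts(SAMPLE_LINK_TAGS).each do |tag, count|
  puts "#{count} page#{count == 1 ? '' : 's'} tagged as #{tag}"
end
```

Counting each tag once per link keeps a single heavily tagged page from dominating the list.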

How this script came to be: I stumbled across extispicious, which is a neat mind-map of your del.icio.us tags. But when I tried it, I was disappointed to find that it only used my tags -- you see, I don't use very descriptive tags for my del.icio.us links. I wished there were a way to show the most common tags from all users, for my links, as a pseudo-quantitative way of understanding my favourite interests. Thus, introspicious was born.

Here is the beginning of the results for my links. As you can see, I am very interested in programming, news articles, and blogs:



Here's a graph of the numbers:


Note that there are a few tags that frequently came up in my links (e.g. programming). In contrast, there are a lot of tags that each appear in only one of my links (e.g. zzread200411).

So there you have it -- a script to analyze your del.icio.us links to determine your favourite interests. (Would someone be willing to write a web front-end for this script, so that it is more accessible to others?)



Instructions:
1. Save this script as "introspicious.rb"
2. Edit it to use your del.icio.us username and password
3. Download and install Ruby
4. Download http://www.tartarus.org/~martin/PorterStemmer/ruby.txt and save it as "stemmable.rb" in the same directory as this script
5. Run this script using "ruby introspicious.rb"
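
If you would rather not type your password into the file in step 2, one option is to read the credentials from environment variables. A minimal sketch follows; the variable names `DELICIOUS_USERNAME` and `DELICIOUS_PASSWORD` and the `credentials_from_env` helper are assumptions made for illustration, not something the script itself uses.

```ruby
# Reads the del.icio.us credentials from the environment instead of from
# literals in the script. The variable names are hypothetical.
def credentials_from_env(env = ENV)
  username = env['DELICIOUS_USERNAME']
  password = env['DELICIOUS_PASSWORD']
  if username.nil? || username.empty? || password.nil? || password.empty?
    raise ArgumentError, 'Set DELICIOUS_USERNAME and DELICIOUS_PASSWORD first'
  end
  [username, password]
end

# Example with an explicit hash, so that this runs without any setup:
username, _password = credentials_from_env({
  'DELICIOUS_USERNAME' => 'someuser',
  'DELICIOUS_PASSWORD' => 'secret'
})
puts "Would log in as #{username}"
```

To use this in the script, the two credential assignments could be replaced with `$delicious_username, $delicious_password = credentials_from_env`.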



#! /local/ruby/bin/ruby
# introspicious.rb. For more information, see
# http://jonaquino.blogspot.com/2005/03/introspicious-using-delicious-to.html

$delicious_username = 'JonathanAquino'
$delicious_password = 'tiger'

require 'net/http'

def get(path, http)
  # Wait 1 second between queries, as per
  # http://del.icio.us/doc/api [Jon Aquino 2005-03-06]
  sleep 1
  puts path
  req = Net::HTTP::Get.new(path)
  req.basic_auth $delicious_username, $delicious_password
  http.request(req).body
end

def checksum(xml)
  xml =~ /hash="([^"]+)"/
  $1
end

tags = []
Net::HTTP.start('del.icio.us') {|http|
  # String#each was removed in Ruby 1.9; each_line iterates line by line.
  get('/api/posts/all', http).each_line {|post|
    next if not post =~ /hash/
    tags_for_url = []
    get('/url/'+checksum(post), http).each_line {|line|
      next if not line =~ /^\s*<.*delNav.*>(.*)</
      tags_for_url << $1.downcase
    }
    puts "Tags: #{tags_for_url.uniq.join(", ")}"
    tags += tags_for_url.uniq
  }
}

# Use the Porter Stemming Algorithm to strip suffixes from tags, thus
# combining similarly-named tags. [Jon Aquino 2005-03-06]
# require_relative loads stemmable.rb from this script's directory; a plain
# require no longer searches the current directory as of Ruby 1.9.2.
require_relative "stemmable"
class String
  include Stemmable
end
stems = []
stem_to_example_tag_map = Hash.new
tags.each { |tag|
  stems << tag.stem
  stem_to_example_tag_map[tag.stem] = tag
}

class StemCount
  attr_reader :stem, :count
  def initialize(stem)
    @stem = stem
    @count = 0
  end
  def inc
    @count += 1
  end
end

stem_to_stemcount_map = Hash.new
stems.uniq.each { |stem| stem_to_stemcount_map[stem] = StemCount.new(stem) }
stems.each { |stem| stem_to_stemcount_map[stem].inc }
stem_to_stemcount_map.values.sort { |a,b| b.count <=> a.count }.each { |stemcount|
  puts "#{stemcount.count} page#{stemcount.count == 1 ? "" : "s"} tagged as #{stem_to_example_tag_map[stemcount.stem]}"
}
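
The stemming-and-counting half of the script can also be written more compactly with a counting hash. The sketch below is self-contained, so it swaps in a deliberately crude `crude_stem` helper (hypothetical, and not the Porter algorithm) in place of stemmable.rb; `count_by_stem` is likewise a name invented for this example.

```ruby
# Crude stand-in for the Porter stemmer: strips a trailing "ing" or "s".
# This is NOT the Porter algorithm; it only keeps the example self-contained.
def crude_stem(word)
  word.sub(/(ing|s)\z/, '')
end

# Groups tags by stem. Returns [example_tag, count] pairs, largest first.
# As in the script, the last tag seen for a stem serves as its example.
def count_by_stem(tags)
  counts = Hash.new(0)
  example = {}
  tags.each do |tag|
    stem = crude_stem(tag)
    counts[stem] += 1
    example[stem] = tag
  end
  counts.sort_by { |stem, count| [-count, stem] }
        .map { |stem, count| [example[stem], count] }
end

sample_tags = %w[blog blogs blogs tool tools news]
count_by_stem(sample_tags).each do |tag, count|
  puts "#{count} page#{count == 1 ? '' : 's'} tagged as #{tag}"
end
```

`Hash.new(0)` gives every unseen stem a starting count of zero, which removes the need for a separate counter class.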


Update: I have written a new version of introspicious that replaces
the Porter Stemmer with screen-scrapes of queries to the Google Directory,
which provides categories for the keywords you enter. Thus, instead
of using the Porter Stemmer to cluster tags etymologically, introspicious2
uses Google Directory to cluster tags semantically. Get introspicious2 here.
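
To illustrate the difference between the two kinds of clustering, here is a rough sketch of grouping tags by category instead of by stem. The `CATEGORY_OF` table and the `count_by_category` helper are made up for this example and stand in for whatever a directory lookup would return; this is not the introspicious2 code.

```ruby
# Hypothetical tag-to-category table standing in for a directory lookup.
# The entries are invented for illustration.
CATEGORY_OF = {
  'ruby'   => 'Computers/Programming',
  'python' => 'Computers/Programming',
  'java'   => 'Computers/Programming',
  'cnn'    => 'News',
  'bbc'    => 'News'
}

# Counts tags per category; tags without a known category are grouped
# under 'Uncategorized'. Returns [category, count] pairs, largest first.
def count_by_category(tags, category_of = CATEGORY_OF)
  counts = Hash.new(0)
  tags.each { |tag| counts[category_of.fetch(tag, 'Uncategorized')] += 1 }
  counts.sort_by { |category, count| [-count, category] }
end

count_by_category(%w[ruby python bbc java zzread200411]).each do |category, count|
  puts "#{count} tag#{count == 1 ? '' : 's'} in #{category}"
end
```

Here "ruby", "python" and "java" land in one group even though they share no common stem, which is something suffix stripping alone cannot do.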

8 Comments:

  • Jonathan,

    Good idea. Haven't tried the code yet, so responding to the concept rather than the implementation, but good idea.

    Notwithstanding reference to *all* tags rather than just our own, we are still subject to a couple of phenomena that may distort our "true interests."

    1) Not only do you and I not tag very carefully, but people in general don't tag very carefully. (I'm sure people study this. I, however, am not among them.) In any case, this will influence the picture we get of our interests just as does our own "laziness," or (less pejorative?) "lax-ness."

    2) We are also constrained to "interests people bookmark," right? Or even more precisely, "interests people who use del.icio.us bookmark." This will not necessarily be representative of our broader interests. Take, e.g., your interest in programming. I have no doubt you're interested in programming(!!) However, should we, for example, take the number of "pages tagged as [tag]" as indicative of the "relative strength of your interest," as it were? I think the answer is obviously, No.

    (Ahem "Obviously...." "Obvious" to some perhaps. Absurd to others?...)

    And again, for the sake of clarity, this is both a) because they're "interests people bookmark," and b) because they're "interests *people-who-use-del.icio.us* bookmark."

    MDW

    By Blogger userX, at 3/07/2005 11:06 a.m.  

  • Hi MDW - I love your analysis. Thanks.

    By Blogger Jonathan, at 3/07/2005 12:42 p.m.  

  • Btw, Jon, are you sure you want to post your del.icio.us password here?...

    I can't think of the reason not to, but I suspect there is one. (Not having a particularly nefarious imagination is one of my shortcomings!)

    Or is that not your pwd?!?! smile....

    You're prob'ly way ahead of me.

    MDW

    By Blogger userX, at 3/08/2005 6:23 p.m.  

  • Fantastic - Thanks PC.

    By Blogger Jonathan, at 5/09/2005 8:50 a.m.  

  • Has anyone seen this set up as a web service anywhere?

    By Anonymous Anonymous, at 7/23/2005 10:57 p.m.  

  • Hi John -- interesting idea about setting it up as a web service.

    By Anonymous Anonymous, at 7/24/2005 8:53 a.m.  

  • Hi! Very interesting idea! I (and, I think, a lot of other people too) am very interested in a web service for introsp.icio.us, because I can't (or don't want to) handle Ruby. That would be great! Thanx!

    By Anonymous Anonymous, at 8/07/2005 8:58 a.m.  

  • Hm! I'll put it on my todo list. It's kind of tricky because the Yahoo clustering API will only let me do 1000 queries a day (or something like that). Perhaps this could be solved by asking the user to get a Yahoo developer key (not too hard).

    By Blogger Jonathan, at 8/07/2005 10:17 a.m.  
