Duplicate issues on drupal.org and thoughts on metrics

At DrupalCon SF there's been a bit of a buzz around the new Certified to Rock site, which computes metrics about a user's participation on drupal.org. I've been pondering various metrics for users and projects on drupal.org as part of my ongoing efforts to improve the tools our community uses to figure out WTF is going on.

As I pulled open my projects page this morning to see what new issues await my attention before heading out to the post-conference sprint, I did something I do extremely frequently: marked an issue as duplicate. There are lots of ways we can and should improve the process of submitting an issue on drupal.org so that, before the user gets too far, it suggests possible existing issues to point them to instead. Meanwhile, we still end up doing this a lot.

I thought to myself: "I wonder how many times I've marked an issue duplicate, and if you ranked users on drupal.org for this activity, would I be near the top of the list?" I was pretty sure I'd be in the top 10. Luckily, it's an easy query to write (although not to run, sorry d.o DB server):

mysql> SELECT u.name, COUNT(*) AS dup_count
    -> FROM project_issue_comments pic
    -> INNER JOIN comments c ON pic.cid = c.cid
    -> INNER JOIN users u ON c.uid = u.uid
    -> WHERE pic.sid = 3
    -> GROUP BY c.uid
    -> ORDER BY dup_count DESC
    -> LIMIT 20;
| name            | dup_count |
| sun             |      1263 | 
| quicksketch     |       896 | 
| dww             |       861 | 
| greggles        |       810 | 
| moshe weitzman  |       777 | 
| Dave Reid       |       678 | 
| merlinofchaos   |       649 | 
| drewish         |       542 | 
| catch           |       452 | 
| KarenS          |       413 | 
| kiamlaluno      |       385 | 
| yched           |       383 | 
| Darren Oh       |       353 | 
| Gábor Hojtsy    |       351 | 
| Michelle        |       350 | 
| NancyDru        |       315 | 
| Damien Tournoud |       271 | 
| stella          |       267 | 
| bdragon         |       257 | 
| webchick        |       245 | 
20 rows in set (15.95 sec)

p.s. For anyone still in town tomorrow night, don't forget to come to my samba gig: Friday, 4/23, 10:00pm-1:00am, Grupo Samba Rio at Il Pirata: 2007 16th St (at Potrero) in San Francisco. Samba, dancing, and tons of fun. If you're still around after the conference, come down and unwind (ear plugs recommended -- it gets loud). ;)




Nice work.
Sometimes I don't know where the commit counts come from. If someone submits a patch on drupal.org and you commit it to the Drupal project, does it count toward your profile?


Regarding that "when they submit an issue" part, you know about http://drupal.org/project/uniqueness right?

Somehow I missed uniqueness!

Ugh, sorry, no. Not only did I not know about uniqueness, I somehow missed this comment, too, until just now. ;) So I submitted http://drupal.org/node/1128044 about it. ;)


Oh so true...

For some reason this makes me feel better. I swear I do this at least once a day. Thanks for the post!

6 everywhere!

Hah, I find it very humorous that I'm #6 on both the D7 contributors and dupe issue finders. :)

person who marks it, or person who participates in it...?

I wanted to do a similar query and came back to this one. I think this query counts anyone who commented on an issue while its status was duplicate, which is not necessarily the same thing as having "marked it as the duplicate."

To figure out who marked it as a duplicate you'd have to see whether it was a duplicate in the previous comment on the issue, right?

That query seems a bit nastier to write given that project_issue_comments.comment_number is not necessarily sequential.

How right you are...

Very good point. I was assuming people weren't going to keep commenting once it's duplicate, but you're right, that happens a lot, too. Hrm. And indeed, the current schema for project_issue doesn't make the "find comments that moved an issue from something else into 'duplicate'" query easy to run at all. :(
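
For what it's worth, here is a rough sketch of the "was it already duplicate in the previous comment?" check, run against a toy in-memory SQLite database rather than the real d.o MySQL schema. Only `cid`, `sid`, `uid`, and `name` come from the query in the post; the `nid` column, the idea of ordering an issue's comments by `cid` (instead of the non-sequential `comment_number`), and the helper names `make_demo_db` and `count_duplicate_markers` are assumptions made for illustration. It also assumes that a first comment that arrives with status 3 counts as marking the issue duplicate, which may not hold if an issue can be created as a duplicate.

```python
import sqlite3

DUPLICATE = 3  # status id for "duplicate", as used in the query in the post


def make_demo_db():
    """Build a tiny in-memory database with made-up rows (not d.o data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE comments (cid INTEGER PRIMARY KEY, uid INTEGER);
        -- nid (the issue a comment belongs to) is an assumed column name.
        CREATE TABLE project_issue_comments (cid INTEGER, nid INTEGER, sid INTEGER);
    """)
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(1, "alice"), (2, "bob"), (3, "carol")])
    # (cid, uid, nid, sid): each comment records the issue status it left behind.
    rows = [
        (1, 1, 100, 1),  # alice comments, issue stays active
        (2, 2, 100, 3),  # bob marks issue 100 duplicate
        (3, 3, 100, 3),  # carol comments on an already-duplicate issue
        (4, 3, 200, 3),  # carol marks issue 200 duplicate in its first comment
        (5, 1, 100, 1),  # alice reopens issue 100
        (6, 1, 200, 3),  # alice comments, still duplicate
        (7, 1, 200, 3),  # alice comments again, still duplicate
        (8, 2, 100, 3),  # bob marks issue 100 duplicate again
    ]
    for cid, uid, nid, sid in rows:
        conn.execute("INSERT INTO comments VALUES (?, ?)", (cid, uid))
        conn.execute("INSERT INTO project_issue_comments VALUES (?, ?, ?)",
                     (cid, nid, sid))
    return conn


def count_duplicate_markers(conn):
    """Count, per user, comments that moved an issue INTO duplicate status."""
    query = """
        SELECT u.name, COUNT(*) AS dup_count
        FROM project_issue_comments pic
        INNER JOIN comments c ON pic.cid = c.cid
        INNER JOIN users u ON c.uid = u.uid
        WHERE pic.sid = :dup
          -- Status left by the latest earlier comment on the same issue.
          -- Assumption: with no earlier comment, treat it as "not duplicate".
          AND COALESCE(
                (SELECT prev.sid
                   FROM project_issue_comments prev
                  WHERE prev.nid = pic.nid AND prev.cid < pic.cid
                  ORDER BY prev.cid DESC
                  LIMIT 1),
                -1) <> :dup
        GROUP BY u.uid, u.name
        ORDER BY dup_count DESC, u.name
    """
    return conn.execute(query, {"dup": DUPLICATE}).fetchall()


conn = make_demo_db()
print(count_duplicate_markers(conn))  # [('bob', 2), ('carol', 1)]
```

On this toy data the original query would credit alice, bob, and carol with two each, since it counts every comment posted while the issue is duplicate. The correlated subquery runs once per duplicate-status comment, so on a table the size of d.o's it would likely be even less kind to the DB server than the original.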