Differences

This shows you the differences between two versions of the page.


intertwingle [2008-11-15 17:39] 81.188.78.24
intertwingle [2009-12-14 10:20] (current) nik
Line 5: Line 5:
 ====vast volumes of email====
  
-May 18th
+May 18th (1998)
  
 Submitted by Jamie Zawinski to Miscellaneous.
  
-"Intertwingularity is not generally acknowledged -- people keep pretending they can make things deeply hierarchical, categorizable and sequential when they can't. Everything is deeply intertwingled." -- Ted Nelson
+"Intertwingularity is not generally acknowledged -- people keep pretending they can make things deeply hierarchical, categorizable and sequential when they can't. Everything is deeply [[intertwingled]]." -- Ted Nelson
  
 In the following, I outline a potential project to make it easier to deal with a massive volume of personal messages: excavating, traversing, relating, reporting, annotating.
Line 25: Line 25:
   * future.
  
-===introduction.===
+====introduction.====
  
 Intertwingle can be seen as a unification of a search tool and an address book. It is not, however, a mail reader. The presentation of query results could be done through a mail reader, but the intention is that one's choice of mail reader should be orthogonal to the use of this tool. The two kinds of tools just happen to operate on the same data.
Line 88: Line 88:
     * Folders have names.
     * Folders are sometimes arranged in a hierarchy.
-    * Folders tend to store messages linearly, in a particular order: thus, each message has ``previous'' and ``next'' relationships with other messages.
+    * Folders tend to store messages linearly, in a particular order: thus, each message has "previous" and "next" relationships with other messages.
   * Messages can contain other messages (forwarded messages, or digests.) Each such message is a message in its own right, but the containment relationship can be important.
   *  Messages have bodies.
Line 107: Line 107:
   * All messages containing text in the main body, but not in an attachment.
   * All messages with an attachment whose file name contains string.
+
  
  
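The two query kinds in this hunk (text in the main body but not in an attachment, and an attachment whose file name contains a string) can be read as predicates over a parsed message. A minimal Python sketch using the standard `email` package, assuming that a part counts as an attachment when its Content-Disposition says `attachment` or when it carries a file name; all function names are hypothetical.

```python
from email import message_from_string


def _leaves(msg):
    """All non-container MIME parts of a message."""
    return [part for part in msg.walk() if not part.is_multipart()]


def _is_attachment(part):
    # Assumption: explicit "attachment" disposition, or any part with a file name.
    return part.get_content_disposition() == "attachment" or part.get_filename() is not None


def _text(part):
    """Decoded body of a part, tolerating unknown charsets."""
    payload = part.get_payload(decode=True) or b""
    try:
        return payload.decode(part.get_content_charset() or "latin-1", errors="replace")
    except LookupError:
        return payload.decode("latin-1")


def body_contains(msg, needle):
    """Match needle only in the main (non-attachment) text parts."""
    return any(
        needle in _text(part)
        for part in _leaves(msg)
        if part.get_content_maintype() == "text" and not _is_attachment(part)
    )


def attachment_name_contains(msg, needle):
    """Match needle against the file names of the message's parts."""
    return any(needle in (part.get_filename() or "") for part in _leaves(msg))


def search(raw_messages, predicate, needle):
    """Indices of the raw RFC 822 messages that satisfy predicate."""
    return [i for i, raw in enumerate(raw_messages)
            if predicate(message_from_string(raw), needle)]
```

A match that only occurs inside an attachment does not satisfy `body_contains`, which is the distinction the first query asks for.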
Line 113: Line 114:
 The basic components of this system are:
  
-====1. parser.====
+===1. parser.===
  
 The module which reads the existing message store (directories of BSD mbox files, or news spool directories, or whatever) and parses them into tagged, indexable data.
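A minimal sketch of the parser for the BSD mbox case only, using Python's standard `mailbox` module. The record layout (a dict with `message-id`, `from`, `to`, `subject`, `date` and `folder` keys) is a hypothetical stand-in for the tagged format, and news spool directories are not handled.

```python
import mailbox
from email.utils import getaddresses


def _addresses(msg, *headers):
    """Bare, lowercased addresses from the named headers, in header order."""
    values = []
    for header in headers:
        values.extend(msg.get_all(header, []))
    return [addr.lower() for _, addr in getaddresses(values) if addr]


def to_record(msg, folder):
    """One flat, tagged record per message (field names are hypothetical)."""
    return {
        "message-id": msg.get("Message-ID", ""),
        "from": _addresses(msg, "From"),
        "to": _addresses(msg, "To", "Cc"),
        "subject": msg.get("Subject", ""),
        "date": msg.get("Date", ""),
        "folder": folder,
    }


def parse_mbox(path):
    """Parse every message in one existing BSD mbox file into records."""
    box = mailbox.mbox(path, create=False)
    try:
        return [to_record(msg, path) for msg in box]
    finally:
        box.close()
```

Recording the folder as just another field matches the later point that a folder name is only one more property to prune searches by.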
Line 152: Line 153:
 </code>
  
-These objects are shallow: that last "db-id" mentioned in the example is a pointer to a top-level message object that will be coming up soon (probably next in the stream.) That is, deeply nested trees of messages are flattened. (An interesting search term might be ``depth > 1'' for when you're looking for something, and you know it was in a forwarded message, but you don't remember from whom.)
+These objects are shallow: that last "db-id" mentioned in the example is a pointer to a top-level message object that will be coming up soon (probably next in the stream.) That is, deeply nested trees of messages are flattened. (An interesting search term might be "depth > 1" for when you're looking for something, and you know it was in a forwarded message, but you don't remember from whom.)
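The flattening step can be sketched as a depth-first walk that emits one shallow record per message, with each container listing the db-ids of the messages inside it instead of embedding them. This assumes a hypothetical in-memory `ParsedMessage` tree; the `parent`, `depth` and `contains` field names are illustrative, and depth starts at 1 so that "depth > 1" selects forwarded messages.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ParsedMessage:
    """Hypothetical parse-tree node: a message plus the messages forwarded inside it."""
    subject: str
    forwarded: list = field(default_factory=list)


def flatten(root):
    """Turn a nested message tree into shallow records linked by db-id.

    Depth 1 is a top-level message, depth 2 a message forwarded inside it, etc.
    A contained message's record follows its container's in the output.
    """
    next_id = count(1)
    records = []

    def visit(msg, parent, depth):
        db_id = next(next_id)
        record = {"db-id": db_id, "parent": parent, "depth": depth,
                  "subject": msg.subject, "contains": []}
        records.append(record)
        for inner in msg.forwarded:
            record["contains"].append(visit(inner, db_id, depth + 1))
        return db_id

    visit(root, None, 1)
    return records
```

With this layout, the "depth > 1" search is a plain filter, for example `[r for r in records if r["depth"] > 1]`.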
  
 Deeply nested MIME structures (multipart/* forms) are also flattened. Content-Disposition is always assumed to be inline for purposes of indexing; we index the body of any part that is of a text type. There is no special handling for multipart/alternative forms: each part is indexed as for multipart/mixed.
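Those three rules translate fairly directly into a recursive walk over the MIME tree. A minimal Python sketch using the standard `email` package; stopping at embedded `message/rfc822` parts (on the assumption that forwarded messages become records of their own) and the function names are choices made for illustration.

```python
from email import message_from_string


def _decode(part):
    """Return a text part's body as str, tolerating unknown charsets."""
    payload = part.get_payload(decode=True) or b""
    charset = part.get_content_charset() or "us-ascii"
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:  # charset name Python does not recognise
        return payload.decode("latin-1")


def indexable_texts(part):
    """Flatten a MIME tree into the list of text bodies to index."""
    if part.get_content_type() == "message/rfc822":
        return []  # assumption: forwarded messages are flattened into their own records
    if part.is_multipart():
        # multipart/alternative gets no special handling: same as multipart/mixed.
        return [text for sub in part.get_payload() for text in indexable_texts(sub)]
    if part.get_content_maintype() == "text":
        # Content-Disposition is ignored, i.e. every part is treated as inline.
        return [_decode(part)]
    return []


def index_raw_message(raw):
    """Parse raw RFC 822 text and return its indexable text bodies."""
    return indexable_texts(message_from_string(raw))
```

Both halves of a text/plain plus text/html alternative are indexed, and a text attachment is indexed too; non-text parts such as images are skipped.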
Line 198: Line 199:
  
  
-==== 2. database.====
+=== 2. database.===
    
 The module which stores the output of the parser on disk in some quickly-retrievable format. It needs to have both relational and full-text-indexing properties; many of the searches we want to do could be accomplished with a database that was nothing but a glorified set of hash tables; but body searches need to be done in some more clever way. (Perhaps simply putting every word in a hash table would be sufficient, but I doubt it.) And more to the point, the text searches have to take advantage of the tagging of the data, so that, for example, constraining a search to be in the subject and not the body actually makes the search go faster instead of slower.
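The "every word in a hash table" baseline that the paragraph doubts can still show how tagging helps: if postings are keyed by (field, word) rather than by word alone, a subject-only search reads only the subject postings instead of filtering body hits afterwards. A minimal in-memory Python sketch, not a proposal for the on-disk format; the class name and the tokenization (lowercased runs of ASCII letters and digits) are assumptions.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")  # assumption: crude ASCII tokenization


class FieldIndex:
    """Inverted index keyed by (field, word), so fields are searched separately."""

    def __init__(self):
        self.postings = defaultdict(set)  # (field, word) -> set of db-ids

    def add(self, db_id, record):
        """Index every word of every text field in record."""
        for field_name, text in record.items():
            for word in WORD.findall(text.lower()):
                self.postings[(field_name, word)].add(db_id)

    def search(self, word, fields):
        """db-ids whose named fields contain word; other fields are never read."""
        word = word.lower()
        hits = set()
        for field_name in fields:
            hits |= self.postings.get((field_name, word), set())
        return hits
```

Narrowing `fields` shrinks the amount of postings consulted, which is the "faster instead of slower" property the paragraph asks for.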
Line 206: Line 207:
 It seems clear that RDF would be the way to go here.
  
- ====3. query tool.====
+===3. query tool.===
  
 All of the web search engines force the user to type in boolean expressions. Sometimes that's ok, but we should do something better, that lets the user construct expressions with a GUI.
  
-Drawing on the notion that searches are really set operations, perhaps one aspect of the search tool could be drag-and-drop: to add a set of messages to the union of messages returned, drop the link on the ``Or'' box. To add it to the intersection of messages returned, drop it on the ``And'' box. Of course, that doesn't handle deeper boolean expressions, or textual searches. Maybe it's a dumb idea.
+Drawing on the notion that searches are really set operations, perhaps one aspect of the search tool could be drag-and-drop: to add a set of messages to the union of messages returned, drop the link on the "Or" box. To add it to the intersection of messages returned, drop it on the "And" box. Of course, that doesn't handle deeper boolean expressions, or textual searches. Maybe it's a dumb idea.
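One reading of the two drop targets is a running result that each drop folds into, in drop order: union for the "Or" box, intersection for the "And" box. A minimal Python sketch under that assumption; the class and method names are hypothetical. It also makes the stated limitation concrete, since a left-to-right fold can only express flat chains, not nested expressions.

```python
class DropBoxQuery:
    """Hypothetical model of the "Or" and "And" drop targets.

    Each drop folds one set of message ids into a running result, in drop
    order, so only flat chains of unions and intersections can be built.
    """

    def __init__(self):
        self._result = None  # nothing dropped yet

    def drop_on_or(self, ids):
        ids = set(ids)
        self._result = ids if self._result is None else self._result | ids

    def drop_on_and(self, ids):
        ids = set(ids)
        self._result = ids if self._result is None else self._result & ids

    def messages(self):
        return set() if self._result is None else set(self._result)
```

For example, dropping `{1, 2}` and `{3}` on "Or" and then `{2, 3, 4}` on "And" leaves `{2, 3}`.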
  
-==== 4. presentation tools.====
+=== 4. presentation tools.===
  
 There are objects, sets of objects, and presentation tools. There is a presentation tool for each kind of object; and one for each kind of object set.
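"One tool per kind of object, and one per kind of object set" maps naturally onto a registry keyed by (kind, is-set). A minimal Python sketch; the `Person` and `Message` classes, the decorator, and the plain-string output are all hypothetical stand-ins for real viewers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:  # hypothetical object kind
    name: str
    address: str


@dataclass(frozen=True)
class Message:  # hypothetical object kind
    sender: Person
    subject: str


PRESENTERS = {}  # (object kind, is_set) -> presentation function


def presenter(kind, is_set=False):
    """Decorator registering a presentation tool for a kind, or a set of that kind."""
    def register(fn):
        PRESENTERS[(kind, is_set)] = fn
        return fn
    return register


def present(obj):
    return PRESENTERS[(type(obj), False)](obj)


def present_set(kind, objs):
    return PRESENTERS[(kind, True)](objs)


@presenter(Person)
def show_person(person):
    return f"{person.name} <{person.address}>"


@presenter(Person, is_set=True)
def show_people(people):
    return "\n".join(sorted(show_person(p) for p in people))


@presenter(Message)
def show_message(message):
    return f"{message.subject} (from {message.sender.name})"
```

Adding a tool for a new kind means registering one more function, without touching the existing ones.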
  
-=====names, addresses, or people.=====
+====names, addresses, or people.====
  
-The presentation tools for these kinds of objects needn't be complicated, since there's not a lot of information to show: just a bunch of links and/or commands. For example, there needs to be a place to hang the ``show me all people with this name'' gesture, and the ``show me all messages from this user'' gesture. But just including the list there isn't going to work, since it's long; really, there wants to be a way to initialize a search with this user. Perhaps activating one of those controls would bring up the search tool with some terms already filled in, like
+The presentation tools for these kinds of objects needn't be complicated, since there's not a lot of information to show: just a bunch of links and/or commands. For example, there needs to be a place to hang the "show me all people with this name" gesture, and the "show me all messages from this user" gesture. But just including the list there isn't going to work, since it's long; really, there wants to be a way to initialize a search with this user. Perhaps activating one of those controls would bring up the search tool with some terms already filled in, like
  
 user = "Jamie Zawinski <jwz@mozilla.org>"
Line 230: Line 231:
 The problem with the annotation notion is that it's the first time that we consider a piece of data which is not merely a projection of data already present in the message store: it is out-of-band data that needs to be stored somewhere. In the address book? In LDAP? I have no idea.
  
-=====sets of people.=====
+====sets of people.====
  
 Perhaps a simple list is sufficient, with options to sort in various ways (by last name, first name, email, host-name, or host-domain.)
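The five sort orders amount to five key functions over a person record. A minimal Python sketch; the `Person` fields, the whitespace-based name split, and the "last two labels" rule for host-domain (which is wrong for names like `example.co.uk`) are assumptions, and ties are broken by address so the order is stable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str     # display name, e.g. "Jamie Zawinski"
    address: str  # e.g. "jwz@mozilla.org"


def _name_parts(p):
    return p.name.lower().split()


def first_name(p):
    parts = _name_parts(p)
    return parts[0] if parts else ""


def last_name(p):
    parts = _name_parts(p)
    return parts[-1] if parts else ""


def host_name(p):
    return p.address.rpartition("@")[2].lower()


def host_domain(p):
    # Assumption: the domain is the last two labels ("mail.mozilla.org" -> "mozilla.org").
    return ".".join(host_name(p).split(".")[-2:])


SORT_KEYS = {
    "last name": last_name,
    "first name": first_name,
    "email": lambda p: p.address.lower(),
    "host-name": host_name,
    "host-domain": host_domain,
}


def sort_people(people, order):
    """Sort by the chosen key, breaking ties by address."""
    key = SORT_KEYS[order]
    return sorted(people, key=lambda p: (key(p), p.address.lower()))
```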
  
-=====messages.=====
+====messages.====
  
 Presenting a single message is straightforward: just return a message/rfc822 or text/html document. However, there should be some other controls available: Reply-To-Sender, Reply-To-All, Forward. And there needs to be a place to hang the reciprocal links to the referring messages, to the folder, and so on.
Line 240: Line 241:
 Annotations of messages would be interesting as well. For example, one might want to make a note to one's self that two messages from different people refer to the same issue and should be dealt with at the same time.
  
-=====sets of messages.=====
+====sets of messages.====
  
 This presentation has to be fairly powerful; it needs to present a decent summary of the messages (with resizable columns for sender, recipient, date, and so on) and be able to do all the usual sorting and threading tricks. Basically, this has to be a very good thread display.
Line 246: Line 247:
 It should also be able to incrementally update as results are coming back from the database, so that the user can see the results they're getting (and even examine messages) while more results are still coming in.
  
-Note that, to this view, the concept of ``folder'' is meaningless: a folder name is just another property by which searches can be pruned.
+Note that, to this view, the concept of "folder" is meaningless: a folder name is just another property by which searches can be pruned.
  
-Today, I can point my ``message set browser'' at my Inbox folder, but I can't point it at the set of messages with word in the body. The special treatment of Inbox is arbitrary and limiting.
+Today, I can point my "message set browser" at my Inbox folder, but I can't point it at the set of messages with word in the body. The special treatment of Inbox is arbitrary and limiting.
  
-Annotating a message-set could mean manually including and excluding specific messages: a message-set could be considered a ``bucket'' which the user can then manipulate by hand, assign a name, and keep around. For use as a ``to do'' list, say. (Message inclusion and exclusion could be handled by manipulating the search terms, so it's not as hard a problem as textual annotations in general.)
+Annotating a message-set could mean manually including and excluding specific messages: a message-set could be considered a "bucket" which the user can then manipulate by hand, assign a name, and keep around. For use as a "to do" list, say. (Message inclusion and exclusion could be handled by manipulating the search terms, so it's not as hard a problem as textual annotations in general.)
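Handling inclusion and exclusion "by manipulating the search terms" could look like a saved query plus two hand-edited id sets, evaluated as (query hits ∪ included) minus excluded. A minimal Python sketch; the `Bucket` class, the predicate form of the query, and the rule that exclusion wins over inclusion are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Bucket:
    """Hypothetical named, saved message-set: a search plus hand-made adjustments."""
    name: str
    query: Callable[[dict], bool]              # the saved search, as a predicate
    include: set = field(default_factory=set)  # db-ids added by hand
    exclude: set = field(default_factory=set)  # db-ids removed by hand

    def messages(self, records):
        hits = {r["db-id"] for r in records if self.query(r)}
        # Exclusion wins over inclusion, so removing a hand-added message works.
        return (hits | self.include) - self.exclude
```

Because the bucket stores the query rather than a frozen list, newly arriving mail that matches the search joins the "to do" list automatically, while the hand edits still apply.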
  
 Presentation tools should be linked as well: one should be able to pick up the sets displayed in one tool and project them into another. For example:
  
   * Show me all messages with word in body.
-  * Drag the sender column away: that's a set of people, therefore it is displayed using a ``people browser''.
+  * Drag the sender column away: that's a set of people, therefore it is displayed using a "people browser".
   * In the people browser, click on an address: refine the search to contain only those in the same domain as that address. A new, smaller list of people is presented.
   * Project the addresses of those people into a message-set-viewer: this shows all mail received from any of those people.
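The four steps above are a chain of projections between message sets and people sets. A minimal Python sketch, assuming messages are dicts with hypothetical `from` and `body` keys, people are represented by bare addresses, and "word in body" is a case-insensitive substring match.

```python
def messages_with_word(messages, word):
    """Step 1: all messages whose body mentions word (case-insensitive substring)."""
    return [m for m in messages if word.lower() in m["body"].lower()]


def senders(messages):
    """Step 2: drag the sender column away, projecting messages to people."""
    return {m["from"] for m in messages}


def domain(address):
    return address.rpartition("@")[2].lower()


def same_domain(people, address):
    """Step 3: refine a set of people to those in the same domain as address."""
    return {p for p in people if domain(p) == domain(address)}


def mail_from(messages, people):
    """Step 4: project people back into messages: all mail from any of them."""
    return [m for m in messages if m["from"] in people]
```

Step 4 searches the whole store again rather than the step 1 results, so it also returns mail from those people that never mentioned the word.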
Line 272: Line 273:
  
   * show me a graph of the age-distribution of my unanswered mail, or,
-  * show me a graph of people who are known to have directly exchanged mail with each other so that I can see the ``clumping'' of my correspondents.
+  * show me a graph of people who are known to have directly exchanged mail with each other so that I can see the "clumping" of my correspondents.
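The data behind the first graph is a histogram of ages of incoming messages that were never replied to. A minimal Python sketch, assuming messages are dicts with hypothetical `message-id`, `from`, `date` and optional `in-reply-to` keys, and treating a message as answered when some message from `me` names its Message-ID in In-Reply-To.

```python
from collections import Counter
from datetime import date


def unanswered_age_histogram(messages, me, today: date, bucket_days=7):
    """Count my unanswered incoming messages by age bucket (default: weeks).

    Bucket k holds messages between k*bucket_days and (k+1)*bucket_days - 1
    days old, measured from today.
    """
    replied_to = {m["in-reply-to"] for m in messages
                  if m["from"] == me and m.get("in-reply-to")}
    histogram = Counter(
        (today - m["date"]).days // bucket_days
        for m in messages
        if m["from"] != me and m["message-id"] not in replied_to
    )
    return dict(sorted(histogram.items()))
```

The returned dict maps bucket index to count and can be handed to whatever charting tool renders the graph.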
  
 The object/presentation infrastructure should be designed so that new tools drop in easily, with few interdependencies.