Wednesday, December 9, 2009

tree visualizations

I did a little exploratory reading on tree visualizations, from Ben Shneiderman's original treemap to more advanced ones, like Voronoi maps and jigsaw maps.
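
Mostly a reading note, but to pin down what the original treemap actually does, here is a minimal sketch of the slice-and-dice layout idea in Python. The tree structure and the function names are mine, for illustration only, not from any particular paper or library.

```python
# A minimal sketch of the slice-and-dice treemap layout idea (my own toy code).

def subtree_size(node):
    children = node.get('children')
    if not children:
        return node['size']
    return sum(subtree_size(c) for c in children)

def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Recursively assign a rectangle (x, y, w, h) to each leaf.

    A leaf looks like {'name': ..., 'size': ...}; an internal node looks like
    {'name': ..., 'children': [...]}. Children split the parent rectangle
    proportionally to size, cutting horizontally at even depths and
    vertically at odd depths (the slice-and-dice alternation).
    """
    if out is None:
        out = []
    children = node.get('children')
    if not children:
        out.append((node['name'], (x, y, w, h)))
        return out

    total = sum(subtree_size(c) for c in children)
    offset = 0.0
    for child in children:
        frac = subtree_size(child) / total
        if depth % 2 == 0:   # slice horizontally
            slice_and_dice(child, x + offset * w, y, frac * w, h, depth + 1, out)
        else:                # dice vertically
            slice_and_dice(child, x, y + offset * h, w, frac * h, depth + 1, out)
        offset += frac
    return out

tree = {'name': 'root', 'children': [
    {'name': 'a', 'size': 6},
    {'name': 'b', 'children': [{'name': 'b1', 'size': 2}, {'name': 'b2', 'size': 2}]},
]}
for name, rect in slice_and_dice(tree, 0, 0, 1, 1):
    print(name, rect)
```

The more advanced layouts mostly differ in how they carve up the space; slice-and-dice is the simplest case, which is why the later papers spend their effort on aspect ratios and stability.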

Operator interaction framework for visualization systems

After not using my blog for a while, I decided to get back into it. The paper that motivated me was Ed Huai-hsin Chi's "An Operator Interaction Framework for Visualization Systems". It is an excellent paper that brings visualization and HCI (human-computer interaction) closer together by bridging Norman's "gulf of execution", where users may not understand the actual semantics of the operations they use. The example they give is selecting data for display: should the selection update all displays (i.e., value modification), or should it modify only the display in which it was performed (i.e., view modification)? From the basic view/value distinction, they go on to build a taxonomy of operations used in visualization, extending previous work to make visualizations more interactive. They also arrange the different operations in a pipeline-like state machine; in contrast to prior approaches, where the representation is more of a flow chart, they propose that the nodes represent states of the data rather than processing elements. The rest of the paper works through many examples.
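
To make the view/value distinction concrete for myself, here is a toy Python sketch; the class and method names are mine, not the paper's. The idea is simply that a value operator changes the shared data that every display draws from, while a view operator changes only one display's own state.

```python
# My own toy illustration of value vs. view operators (not code from the paper).

class DataSource:
    def __init__(self, rows):
        self.rows = rows

    def value_filter(self, predicate):
        # value operator: modifies the data itself, so every view sees the change
        self.rows = [r for r in self.rows if predicate(r)]

class View:
    def __init__(self, source, name):
        self.source = source
        self.name = name
        self.selected = None          # per-view presentation state

    def view_select(self, predicate):
        # view operator: affects only this view's presentation
        self.selected = [r for r in self.source.rows if predicate(r)]

    def render(self):
        shown = self.selected if self.selected is not None else self.source.rows
        print(self.name, shown)

data = DataSource([1, 2, 3, 4, 5])
scatter, table = View(data, "scatter"), View(data, "table")

data.value_filter(lambda r: r % 2 == 1)   # value modification: all views affected
scatter.view_select(lambda r: r > 1)      # view modification: only the scatter view
scatter.render()   # scatter [3, 5]
table.render()     # table [1, 3, 5]
```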

Saturday, April 4, 2009

latent variable text models

I've been reading about latent variable text models, trying to make sense of them by reading a lot of different papers. I started with Blei et al.'s paper on Latent Dirichlet Allocation. This went a bit over my head, especially with respect to the variational methods. By reading the Griffiths, Steyvers, and Tenenbaum paper, I got a taste of how the problem can be solved using Markov chain Monte Carlo methods and Gibbs sampling. That gave me a broader idea of the different approaches to parameter estimation, but I'm still a little bit in terra incognita. I then went back and read Hofmann's paper on probabilistic LSA, which brought things back onto my radar: the earlier method, LSA, used SVD, which I'm more or less familiar with. The Griffiths, Steyvers, and Tenenbaum paper did a good job of motivating topic models over LSA with arguments from the psychological literature, but did less well at motivating topic models over pLSA (or the aspect model, as Hofmann calls it). Also, for pLSA the parameter estimation is done via EM, which is also charted territory for me. He does move to a variational refinement ("tempered EM"), which seems pretty cool: it uses the entropy/Helmholtz free energy idea from chemistry. If only I remembered everything from chemistry.
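
To check my understanding of the Gibbs sampling side of this, here is a tiny collapsed Gibbs sampler for LDA in Python. It follows the standard full conditional from the Griffiths/Steyvers line of work, but the toy corpus, the hyperparameters, and all the variable names are made up, and it's only a sketch rather than anything I've evaluated.

```python
# A tiny collapsed Gibbs sampler for LDA, written from memory as a sketch.
import random
from collections import defaultdict

docs = [["apple", "banana", "apple", "fruit"],
        ["python", "code", "bug", "code"],
        ["fruit", "banana", "code", "python"]]
K, alpha, beta = 2, 0.5, 0.1
vocab = sorted({w for d in docs for w in d})
V = len(vocab)

# count tables: n_dk[d][k] = tokens in doc d assigned topic k,
# n_kw[k][w] = times word w is assigned topic k, n_k[k] = tokens in topic k
n_dk = [[0] * K for _ in docs]
n_kw = [defaultdict(int) for _ in range(K)]
n_k = [0] * K
z = []  # current topic assignment for every token

# random initialization of topic assignments
for d, doc in enumerate(docs):
    z_d = []
    for w in doc:
        k = random.randrange(K)
        z_d.append(k)
        n_dk[d][k] += 1
        n_kw[k][w] += 1
        n_k[k] += 1
    z.append(z_d)

for _ in range(200):  # Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            # remove this token's current assignment from the counts
            n_dk[d][k] -= 1; n_kw[k][w] -= 1; n_k[k] -= 1
            # full conditional p(z_i = t | z_-i, words), up to a constant
            weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                       for t in range(K)]
            k_new = random.choices(range(K), weights=weights)[0]
            z[d][i] = k_new
            n_dk[d][k_new] += 1; n_kw[k_new][w] += 1; n_k[k_new] += 1

for k in range(K):
    top_words = sorted(vocab, key=lambda w: -n_kw[k][w])[:3]
    print("topic", k, top_words)
```

Each sweep removes a token's current topic from the counts, samples a new topic from the full conditional (document-topic counts times topic-word counts), and puts the counts back; the counts at the end give the estimated topics.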

Friday, January 16, 2009

An Introduction to Conditional Random Fields for Relational Learning

by Charles Sutton and Andrew McCallum (umass)

[Sutton07a] C. Sutton and A. McCallum. An Introduction to Conditional Random Fields for Relational Learning. In L. Getoor and B. Taskar, editors, Introduction to Statistical Relational Learning. MIT Press, 2007. [ url ]

Good intro to conditional random fields (CRFs)!

Relation of naive Bayes to logistic regression, of directed to undirected graphical models, and of hidden Markov models (HMMs) to linear-chain CRFs.

Role of p(x) in generative models, and its absence in discriminative models, which relaxes constraints on the model parameters and frees the model from (in)dependence assumptions over the inputs.
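
To make the HMM-to-linear-chain-CRF step concrete, here is a toy Python sketch of the conditional p(y|x): an unnormalized score built from emission and transition feature weights, normalized by a partition function computed with the forward algorithm. The labels, words, and weights are invented, and training (which the chapter of course covers) is skipped.

```python
# A toy sketch of the linear-chain CRF conditional p(y | x); made-up weights.
import math

labels = ["B", "I", "O"]
# toy parameters: emission[label][word] and transition[prev_label][label]
emission = {"B": {"New": 2.0, "York": 0.5}, "I": {"York": 2.0}, "O": {"in": 1.5}}
transition = {"B": {"I": 1.0, "O": 0.2}, "I": {"O": 0.5}, "O": {"B": 0.8}}

def score(x, y):
    """Unnormalized log-score: sum of emission and transition feature weights."""
    s = sum(emission.get(y[t], {}).get(x[t], 0.0) for t in range(len(x)))
    s += sum(transition.get(y[t - 1], {}).get(y[t], 0.0) for t in range(1, len(x)))
    return s

def log_partition(x):
    """log Z(x), summing over all label sequences with the forward algorithm."""
    alpha = {y: emission.get(y, {}).get(x[0], 0.0) for y in labels}
    for t in range(1, len(x)):
        alpha = {y: math.log(sum(math.exp(alpha[yp] + transition.get(yp, {}).get(y, 0.0))
                                 for yp in labels))
                    + emission.get(y, {}).get(x[t], 0.0)
                 for y in labels}
    return math.log(sum(math.exp(a) for a in alpha.values()))

x = ["in", "New", "York"]
y = ["O", "B", "I"]
print("log p(y|x) =", score(x, y) - log_partition(x))
```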

Thursday, January 15, 2009

Project Athena as a Distributed Computer System

[Champine90] George A. Champine, Daniel E. Geer, Jr., and William N. Ruh, “Project Athena as a Distributed Computer System,” IEEE Computer, vol. 23, September 1990, pp. 40-50.

This paper describes Athena, an implemented distributed system at MIT for student use, and compares it to other distributed operating systems. The project was supported by DEC and IBM.

The project assumed that computers were too expensive for students to buy themselves, but might become cheaper in the future.

Key requirements:

* Scalability: must scale up to 10,000+ computers
* Reliability: must be available 24/7 even when some components fail.
* Public workstations: any user can work at any workstation.
* Security: System services must be secure even though individual workstations are not.
* Heterogeneity: The system must support a variety of hardware platforms.
* Coherency: All system applications must run on all workstations. Consistent look and feel.
* Affordability: low cost to own and operate.

Definitions:
user
client
server
service
name
binding
resolving
coherence
interoperability
authentication
authorization
fail-soft

mainframe model vs unified model
- security, resource allocation, privacy/network, mail, maintenance

Other distributed OSs:
* Amoeba
* Andrew
* Dash
* Eden
* Grapevine
* HCS
* Locus
* Mach
* Sprite
* V

Athena in terms of the requirements

System comparisons

Issues:
* Naming: hosts, printers, services, files, users; replication vs. partitioning.
* Scalability: anything whose cost or load grows linearly with the number of workstations is probably not feasible in general.
* Security: authentication and authorization; a centralized, encrypted password-checking service; public-key cryptography; key distribution servers; access control lists and capability-based authorization (see the sketch after this list).
* Compatibility: binary level, execution level, and protocol level.
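
Here is a toy contrast between the two authorization styles in Python; it is my own illustration, not anything from the Athena implementation.

```python
# A toy contrast between ACL-based and capability-based authorization.

# ACL model: each object carries a list of (principal -> rights); the check
# looks up the requesting principal in the object's list.
acl = {"grades.txt": {"alice": {"read", "write"}, "bob": {"read"}}}

def acl_check(principal, obj, right):
    return right in acl.get(obj, {}).get(principal, set())

# Capability model: the principal holds unforgeable tokens naming
# (object, right); the check only inspects the presented capability.
alice_caps = {("grades.txt", "read"), ("grades.txt", "write")}

def cap_check(caps, obj, right):
    return (obj, right) in caps

print(acl_check("bob", "grades.txt", "write"))        # False
print(cap_check(alice_caps, "grades.txt", "write"))   # True
```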


Athena system design
* name service: Hesiod, based on Berkeley BIND; a fast front end to Moira.
* file service: NFS, AFS
* printing service
* mail service
* real-time notification: Zephyr
* service management: Moira: configuration for mail, disk quotas, hardware config, post office allocation, and access control lists
* authentication: Kerberos; login and logout; tickets (see the toy sketch after this list).
* installation and update
* online consulting, discuss
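
As a reminder to myself of how the Kerberos tickets fit together, here is a loose toy sketch in Python. The seal/unseal helpers are stand-ins (an integrity check, not real encryption), all the names are made up, and the real protocol also involves session keys and ticket lifetimes.

```python
# A loose toy sketch of the Kerberos ticket idea: the authentication server
# shares a secret with every user and every service, and gives the user a
# ticket sealed under the *service's* secret, so the service can tell who is
# asking without passwords crossing the network.
import json, hashlib

def seal(secret, payload):
    # stand-in for sealing a message under `secret` (not real crypto)
    body = json.dumps(payload)
    mac = hashlib.sha256((secret + body).encode()).hexdigest()
    return {"body": body, "mac": mac}

def unseal(secret, message):
    if hashlib.sha256((secret + message["body"]).encode()).hexdigest() != message["mac"]:
        raise ValueError("message was not sealed under this secret")
    return json.loads(message["body"])

# secrets known only to the auth server and one other party each
user_keys = {"alice": "alice-password-derived-key"}
service_keys = {"mailserver": "mailserver-secret"}

def auth_server_issue_ticket(user, service):
    ticket = seal(service_keys[service], {"user": user, "service": service})
    # the reply goes back sealed under the user's key, so only that user can use it
    return seal(user_keys[user], {"ticket": ticket, "service": service})

reply = auth_server_issue_ticket("alice", "mailserver")
opened = unseal(user_keys["alice"], reply)                     # alice's workstation
print(unseal(service_keys["mailserver"], opened["ticket"]))    # the mail server trusts this
```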


Design of Distributed Systems Supporting Local Autonomy

[Clark80] David D. Clark and Liba Svobodova, “Design of Distributed Systems Supporting Local Autonomy,” 20th IEEE COMPCON, February 1980, pp. 438-444.

Uses an analogy to real-world organizations.

Individual nodes cooperate in a standardized manner but maintain a fair degree of autonomy with respect to their management and internal organization.

Predicts that this will be the most widely used paradigm for distributed systems.

As opposed to the Athena paper, which says that distribution is about cost, this paper says that distribution is fundamentally about the needs of the problem to which distribution is applied: many applications are naturally distributed.

Also, as opposed to the Athena paper, which describes a realized implementation, this paper is more theoretical.

Components: nodes (PCs), servers, communication substrate

Issues considered: efficiency, reliability, transaction integrity, and expandability.

There was a part that seemed anti-RPC, where they argue that the application programmer should know whether the functions or data being used are local or remote (though they state that this may be hidden from the end user).
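
A toy illustration of that point, in Python and entirely my own (not code from the paper): the remote read is made visibly remote, so the programmer has to name a node, pick a timeout, and handle failure, rather than pretending it is a local call.

```python
# My own toy illustration of an interface that keeps remote access explicit.
class RemoteError(Exception):
    pass

def read_remote(node, path, timeout_s=2.0):
    """Explicitly remote: node choice, latency, and failure are the caller's problem."""
    # stand-in body; a real version would contact `node` over the network
    raise RemoteError(f"node {node} did not answer within {timeout_s}s")

try:
    data = read_remote("fileserver-3", "/coursework/ps3.txt", timeout_s=0.5)
except RemoteError as err:
    data = None   # the application, not a hidden RPC layer, decides what failure means
    print("remote read failed, falling back:", err)
```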


End-to-end Arguments In System Design

[Saltzer81] J. H. Saltzer, D. P. Reed, and D. D. Clark, “End-to-end Arguments In System Design,” ACM Transactions on Computer Systems, vol. 2, no. 4, 1984, pp. 277-288.

This paper argues that building elaborate reliability guarantees into low-level components is largely wasted effort, because correctness checks must be done end to end at the application level regardless; lower-level checks can help performance but cannot replace the end-to-end check.
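
A toy illustration in Python of how I read the argument (my own example, not the paper's): per-hop checks may all pass and the data can still arrive corrupted, so the endpoints have to verify the transfer themselves.

```python
# Toy end-to-end check: only the endpoints can guarantee the file arrived intact.
import hashlib

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def send_file(data: bytes):
    # the sender computes an end-to-end checksum alongside the data
    return data, sha256(data)

def hop(data: bytes) -> bytes:
    # an intermediate hop may verify its own link, but it can still corrupt the
    # data *after* its check passes (buggy buffer copy, bad disk, etc.)
    return data

def receive_file(data: bytes, expected: str) -> bytes:
    # the end-to-end check, done by the application at the destination
    if sha256(data) != expected:
        raise IOError("end-to-end checksum mismatch; ask the sender to retransmit")
    return data

payload, digest = send_file(b"problem set 3")
payload = hop(hop(payload))
print(receive_file(payload, digest) == b"problem set 3")  # True
```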