Wednesday, May 31, 2006

GFS (Google File System)?

 http://www.osweekly.com/index.php?option=com_content&Itemid=&task=view&id=2245

GFS (Google File System) is a distributed file system, that is, one that supports sharing of files and resources stored persistently across a network. GFS was designed and implemented by Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. When I read their original paper on this file system, the questions I asked above were all answered. This is what they had to say:

"While sharing many of the same goals as previous distributed file systems, our design has been driven by observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions.
….
The file system has successfully met our storage needs. It is widely deployed within Google as the storage platform for the generation and processing of data used by our service as well as research and development efforts that require large data sets. The largest cluster to date provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, and it is concurrently accessed by hundreds of clients."

This file system has the same design goals as other existing distributed file systems such as AFS (Andrew File System). These goals include:

- maximum performance
- ability to handle a large number of users
- scalability
- to be able to handle inevitable expansions
- reliability to ensure maximum uptime and availability
- to ensure computers are available to handle queries

These are a few of the goals common to all distributed file systems. There are other design goals that are specific to GFS:

- optimized for nontraditional access patterns
- specific application workload
- specific technological environment
- designed for component failure
- must include monitoring, error detection, fault tolerance, and automatic recovery

GFS provides a familiar file system interface, though it does not implement a standard API such as POSIX. Files are organized hierarchically in directories and identified by pathnames, and the usual operations to create, delete, open, close, read, and write files are supported. (GFS should not be confused with GmailFS, an unrelated project that provides a mountable Linux filesystem using a GMail account as its storage medium. GmailFS is a Python application that uses the FUSE userland framework to provide the filesystem and the libgmail library to communicate with GMail; it supports most of the widely used file operations, such as read, write, open, close, stat, symlink, link, unlink, truncate and rename, so our favorite UNIX command line tools such as cp, ls, mv, rm, ln and grep can operate on files stored in GMail.) GFS introduces two new operations as well:

- Snapshot
- Record append

According to the authors, snapshot creates a copy of a file or a directory tree at low cost. Record append allows multiple clients to append data to the same file concurrently while guaranteeing the atomicity of each individual client’s append. It is useful for implementing multi-way merge results and producer-consumer queues that many clients can simultaneously append to without additional locking.
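To make the producer-consumer idea concrete, here is a minimal single-process sketch in Python. It is not GFS code: the class AppendOnlyFile and the helpers producer and run_producers are hypothetical names invented for this illustration, and the internal lock merely stands in for whatever serialization the real system performs on the server side. The point is only that each caller hands over a whole record, the file chooses the offset, and callers need no locking of their own.

```python
import threading


class AppendOnlyFile:
    """Toy stand-in (hypothetical) for a file that supports atomic record append."""

    def __init__(self):
        self._data = bytearray()
        # This lock lives inside the "file", not in the clients.
        self._lock = threading.Lock()

    def record_append(self, record: bytes) -> int:
        """Append the whole record at an offset the file chooses; return that offset."""
        with self._lock:
            offset = len(self._data)
            self._data.extend(record)
            return offset

    def read(self, offset: int, length: int) -> bytes:
        with self._lock:
            return bytes(self._data[offset:offset + length])


def producer(log, name, count, offsets):
    """Append `count` records and remember where each one landed."""
    for i in range(count):
        record = f"{name}:{i}\n".encode()
        offsets[(name, i)] = (log.record_append(record), len(record))


def run_producers(num_producers=4, count=100):
    """Start several producer threads that append to one shared file."""
    log = AppendOnlyFile()
    offsets = {}
    threads = [
        threading.Thread(target=producer, args=(log, f"p{n}", count, offsets))
        for n in range(num_producers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log, offsets
```

After run_producers() returns, every record can be read back intact from the offset it was given, even though the producers interleaved freely.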

Quite a few people think that the garbage collection mechanism in C# and Java is wonderful. But what would happen if we extended garbage collection to the file system? GFS has already accomplished this. Similar to the garbage collector of a programming language, GFS does not immediately reclaim the physical storage when a file is deleted; it does so lazily, during regular garbage collection.
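A rough Python sketch of this kind of lazy reclamation follows. The GFS paper describes a deleted file as being renamed to a hidden name that carries the deletion timestamp and removed by a later scan once it is older than a configurable interval (three days in the paper). Everything else here is an assumption made for illustration: the Namespace class, the ".deleted." naming scheme, and the in-memory dictionaries are hypothetical and not the real master's data structures.

```python
import time

# Grace period before storage is reclaimed; the paper mentions three days, configurable.
GRACE_PERIOD_SECONDS = 3 * 24 * 60 * 60


class Namespace:
    """Hypothetical in-memory namespace with lazy deletion."""

    def __init__(self):
        self.files = {}   # visible name -> contents
        self.hidden = {}  # hidden name -> (contents, deletion timestamp)

    def delete(self, name, now=None):
        """Hide the file instead of freeing it; it can still be recovered."""
        now = time.time() if now is None else now
        contents = self.files.pop(name)
        self.hidden[f".deleted.{name}.{int(now)}"] = (contents, now)

    def collect_garbage(self, now=None):
        """Periodic scan: reclaim hidden files older than the grace period."""
        now = time.time() if now is None else now
        expired = [
            hidden_name
            for hidden_name, (_, deleted_at) in self.hidden.items()
            if now - deleted_at > GRACE_PERIOD_SECONDS
        ]
        for hidden_name in expired:
            del self.hidden[hidden_name]
        return expired
```

Until the scan removes it, a hidden file still occupies storage, which is exactly the trade-off: deletion is cheap and reversible for a while, at the cost of delayed reclamation.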

One of the problems with distributed systems is that any machine can go down at any given time. To deal with this problem, GFS maintains high reliability using the following techniques:

- Replication
- Fast Recovery

According to the authors, in fast recovery both the master and the chunkservers are designed to restore their state and start in seconds, no matter how they terminated. In fact, they do not distinguish between normal and abnormal termination; servers are routinely shut down just by killing the process. Clients and other servers experience a minor hiccup as they time out on their outstanding requests, reconnect to the restarted server and retry.
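The client side of that "minor hiccup" can be pictured as a small retry loop. The Python sketch below is only an illustration under assumptions: call_with_retry, its parameters, and the connect and request callables are hypothetical names, not part of any GFS client library.

```python
import time


def call_with_retry(connect, request, attempts=5, delay_seconds=0.1):
    """Time out, reconnect, retry: give a restarting server a chance to come back.

    connect -- hypothetical callable that opens a fresh connection
    request -- hypothetical callable that performs the request on a connection
    """
    last_error = None
    for _ in range(attempts):
        try:
            connection = connect()  # reconnect on every attempt
            return request(connection)
        except (TimeoutError, ConnectionError) as error:
            last_error = error
            time.sleep(delay_seconds)  # brief pause before retrying
    raise last_error
```

Because the server restarts in seconds, a handful of short retries is usually enough for the caller to see nothing worse than a delay.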

Replication is done in two ways. In chunk replication, each chunk is copied to several chunkservers; the replication level, i.e. how many copies of a chunk are kept, can be specified by the user. In master replication, the master state is likewise replicated for reliability.
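Chunk replication can be sketched in a few lines of Python. This is a toy model, not the real placement policy: the Cluster class, its method names, and the random choice of servers are assumptions made for illustration (the default of three replicas is the figure given in the GFS paper). It shows the two halves of the idea: spread copies over several chunkservers, and make new copies when a server is lost.

```python
import random

DEFAULT_REPLICATION = 3  # default replica count mentioned in the GFS paper


class Cluster:
    """Hypothetical bookkeeping for which chunkservers hold which chunk."""

    def __init__(self, servers):
        self.live = set(servers)
        self.replicas = {}  # chunk id -> set of servers holding a copy
        self.goal = {}      # chunk id -> desired number of copies

    def create_chunk(self, chunk_id, replication=DEFAULT_REPLICATION):
        """Place the requested number of copies on distinct live servers."""
        self.goal[chunk_id] = replication
        self.replicas[chunk_id] = set(random.sample(sorted(self.live), replication))

    def server_failed(self, server):
        """Forget a dead server and every copy it held."""
        self.live.discard(server)
        for holders in self.replicas.values():
            holders.discard(server)

    def re_replicate(self):
        """Bring under-replicated chunks back up to their goal."""
        for chunk_id, holders in self.replicas.items():
            missing = self.goal[chunk_id] - len(holders)
            candidates = sorted(self.live - holders)
            # A new copy needs a surviving replica to copy from.
            if holders and missing > 0:
                holders.update(random.sample(candidates, min(missing, len(candidates))))
```

For example, with five chunkservers and one chunk, killing a server that holds a copy drops the chunk to two replicas, and a call to re_replicate() restores the third on one of the remaining machines.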

This was a brief introduction to the Google File System. In our next article, we will look into its architecture and its overall structure.
