Monday, August 29, 2011

Moving an Elephant

Large-Scale Hadoop Data Migration at Facebook. Paul Yang describes moving Facebook's 30 PB of data via replication across data centers. (Previous Post)

As the majority of the analytics is performed with Hive, we store the data on HDFS — the Hadoop distributed file system. In 2010, Facebook had the largest Hadoop cluster in the world, with over 20 PB of storage. By March 2011, the cluster had grown to 30 PB — that’s 3,000 times the size of the Library of Congress! At that point, we had run out of power and space to add more nodes, necessitating the move to a larger data center.