Elastic Storage Technology
The continuous resource pool from the cloud to user devices provides greater elasticity in resource
allocation for diverse application requirements, but it can also require more coordination effort
between resource nodes and user devices. Our architecture is designed to integrate these devices
so that we can combine their respective advantages: for instance, using edge resources to reduce
storage latency and cloud resources to provide security.
Our elastic fog storage framework maps data onto distributed storage nodes, including local
physical volumes and (remote) commodity storage nodes. We assume that the commodity storage
nodes are object storage services that store data as simple key-value pairs, where the key is
the ID of the stored data and the value is the data content. The commodity storage can be cloud
services (e.g. Amazon S3) or edge devices (e.g. network-attached storage or smartphones).
Applications read and write data to these storage locations through an elastic layer that
provides a POSIX interface. The elastic layer not only encodes file data and maps file
information (directory, stats, and content) onto the storage nodes, but also separates the
complexity of distributed storage management from application functionality.
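The key-value contract we assume of commodity storage nodes can be sketched as follows; the class and method names are illustrative stand-ins, not part of the framework.

```python
class ObjectStore:
    """In-memory stand-in for a commodity object store (e.g. S3 or a NAS)."""

    def __init__(self):
        self._objects = {}  # key: data ID -> value: data content

    def put(self, key: str, value: bytes) -> None:
        self._objects[key] = value

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def delete(self, key: str) -> None:
        self._objects.pop(key, None)


# The elastic layer would target many such stores, local and remote.
store = ObjectStore()
store.put("chunk-3f2a", b"\x00" * 16)
assert store.get("chunk-3f2a") == b"\x00" * 16
```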
Files are split into smaller pieces called chunks for data deduplication and efficient
transmission. We split each file into 4 MB chunks, following the client applications of other
cloud storage services such as Dropbox. Even though the typical chunk size for data deduplication
ranges from 8 KB to 64 KB to maximize deduplication rates, such small chunks degrade data
transmission performance over wide area networks, as in our fog scenario. Each chunk is also
encoded into shares using erasure coding (e.g. Reed-Solomon codes) for data redundancy and
storage-space efficiency. Clients thus require any t of the n created shares to decode their
original data, where t and n are configurable parameters.
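The chunking and t-of-n share recovery can be illustrated with a small sketch. A real deployment would use Reed-Solomon coding; here a single XOR parity share (so n = t + 1) stands in, which already lets any t of the n shares rebuild a chunk. The function names and the parity scheme are illustrative assumptions.

```python
CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, as in the framework above

def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Split file content into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def encode_shares(chunk: bytes, t: int) -> list:
    """Encode a chunk into n = t + 1 shares: t data shares plus one XOR
    parity share. Any t of the n shares suffice to rebuild the chunk."""
    share_len = -(-len(chunk) // t)  # ceiling division
    padded = chunk.ljust(share_len * t, b"\x00")
    shares = [padded[i * share_len:(i + 1) * share_len] for i in range(t)]
    parity = shares[0]
    for s in shares[1:]:
        parity = bytes(a ^ b for a, b in zip(parity, s))
    return shares + [parity]

def decode_chunk(shares: dict, t: int, chunk_len: int) -> bytes:
    """Rebuild a chunk from any t shares, given as {share index: bytes};
    index t is the parity share."""
    data = [shares.get(i) for i in range(t)]
    missing = [i for i, s in enumerate(data) if s is None]
    if missing:  # recover the one missing data share from the parity
        i = missing[0]
        rec = shares[t]
        for j, s in enumerate(data):
            if j != i:
                rec = bytes(a ^ b for a, b in zip(rec, s))
        data[i] = rec
    return b"".join(data)[:chunk_len]

chunk = b"hello fog storage"
shares = encode_shares(chunk, t=2)          # n = 3 shares
subset = {0: shares[0], 2: shares[2]}       # any 2 of the 3
assert decode_chunk(subset, t=2, chunk_len=len(chunk)) == chunk
```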
Metadata indicates how to reconstruct a file by storing information on its constituent chunks
and their share addresses. Thus, only clients with access to the metadata can properly decode
and reconstruct the given file. The metadata also stores chunk reference counts. Reference
counters are necessary for deleting chunks that no file refers to, since common chunks can be
shared by multiple files. The metadata is stored in a directory (e.g. ∼/.meta) as one file per
data file, with the same directory hierarchy. Clients can share the metadata files with each
other in order to share access to data files.
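The reference-counting logic can be sketched as follows. Using a content hash as the chunk ID is our assumption for the sketch (it also makes identical chunks deduplicate naturally), and all class and field names are illustrative.

```python
import hashlib

def chunk_id(chunk: bytes) -> str:
    """Content hash as chunk ID, so identical chunks share one entry."""
    return hashlib.sha256(chunk).hexdigest()

class MetadataStore:
    """Per-file chunk lists plus global chunk reference counts."""

    def __init__(self):
        self.files = {}      # file path -> ordered list of chunk IDs
        self.refcounts = {}  # chunk ID -> number of referring files

    def add_file(self, path: str, chunks: list) -> None:
        ids = [chunk_id(c) for c in chunks]
        self.files[path] = ids
        for cid in ids:
            self.refcounts[cid] = self.refcounts.get(cid, 0) + 1

    def delete_file(self, path: str) -> list:
        """Return IDs of chunks no file refers to anymore; only those
        may safely be deleted from the storage nodes."""
        dead = []
        for cid in self.files.pop(path):
            self.refcounts[cid] -= 1
            if self.refcounts[cid] == 0:
                del self.refcounts[cid]
                dead.append(cid)
        return dead

meta = MetadataStore()
meta.add_file("/docs/a.txt", [b"shared", b"only-a"])
meta.add_file("/docs/b.txt", [b"shared"])
assert meta.delete_file("/docs/b.txt") == []   # chunk still referenced by a.txt
```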
Mapping files to objects
Since edge storage nodes may not be trustworthy,
we consider the client device to be the most trustworthy storage resource. We therefore store
the file metadata in client storage, assuming that this storage is stable and large enough to
hold the metadata securely. Users still need to back up the metadata regularly, but the metadata
is much smaller than the actual file content. Read/write operations access file content in the
form of chunks. We also leverage local storage space as non-volatile memory for caching chunks,
so frequently used chunks are expected to reside in local storage. If some chunks are rarely
used, a data scattering daemon, which runs asynchronously from file operations, encodes those
chunks into shares and writes them out to remote storage locations so that they can be removed
from local storage.
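One pass of such a scattering daemon might look like the following sketch; the idleness threshold, the helper callables, and the per-share naming scheme are all assumptions for illustration, not the framework's actual interfaces.

```python
import time

def scatter_cold_chunks(cache, last_access, encode, remote_put, max_idle=3600.0):
    """One scattering pass: chunks not accessed for max_idle seconds are
    encoded into shares, written to remote stores, and evicted locally.
    cache maps chunk ID -> bytes; last_access maps chunk ID -> timestamp."""
    now = time.time()
    for cid in list(cache):
        if now - last_access.get(cid, 0.0) > max_idle:
            for i, share in enumerate(encode(cache[cid])):
                remote_put(f"{cid}/{i}", share)  # one remote object per share
            del cache[cid]                       # reclaim local space

# Toy run: a stand-in encoder splits the chunk into two "shares".
remote = {}
cache = {"c1": b"abcd"}
scatter_cold_chunks(cache, {"c1": 0.0},
                    encode=lambda b: [b[:2], b[2:]],
                    remote_put=remote.__setitem__,
                    max_idle=0.0)
assert cache == {} and remote == {"c1/0": b"ab", "c1/1": b"cd"}
```

Running the pass asynchronously, as the daemon does, keeps share encoding and remote writes off the critical path of file operations.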