
0x Elastic Fog Storage

Fog storage services have highly variable latency and unpredictable availability, which calls for more predictable strategies for handling file metadata and for placing shares across different locations.

0x Elastic Storage Technology

System Architecture

The continuous resource pool from the cloud down to user devices provides more elastic resource allocation for diverse application requirements, but it can also require more coordination between resource nodes and user devices. Our architecture is designed to integrate these devices so that we can combine their respective advantages; for instance, using edge resources to reduce storage latency and cloud resources to provide security.



Our elastic fog storage framework maps data onto distributed storage nodes, including local physical volumes and (remote) commodity storage nodes. We assume that the commodity storage nodes are object storage services that store data as simple key-value pairs, where the key is the ID of the stored data and the value is the data content. Commodity storage can be a cloud service (e.g., Amazon S3) or an edge device (e.g., network-attached storage or a smartphone). Applications read and write data to these storage locations through an elastic layer that provides a POSIX interface. The elastic layer not only encodes file data and maps file information (directory, stats, and content) onto the storage nodes, but also separates the complexity of distributed storage management from application functionality.
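To make the key-value assumption concrete, the sketch below (illustrative only; the class and method names are our own, not the framework's API) shows the kind of backend interface the elastic layer can target, with a local physical volume as one implementation:

    import os
    from abc import ABC, abstractmethod

    class StorageBackend(ABC):
        """Assumed object-store interface: opaque byte values keyed by an ID."""

        @abstractmethod
        def put(self, key: str, value: bytes) -> None: ...

        @abstractmethod
        def get(self, key: str) -> bytes: ...

        @abstractmethod
        def delete(self, key: str) -> None: ...

    class LocalVolume(StorageBackend):
        """Backend for a local physical volume: one file per object."""

        def __init__(self, root: str):
            self.root = root
            os.makedirs(root, exist_ok=True)

        def put(self, key: str, value: bytes) -> None:
            with open(os.path.join(self.root, key), "wb") as f:
                f.write(value)

        def get(self, key: str) -> bytes:
            with open(os.path.join(self.root, key), "rb") as f:
                return f.read()

        def delete(self, key: str) -> None:
            os.remove(os.path.join(self.root, key))

    # A cloud backend (e.g. Amazon S3 via boto3's put_object/get_object) or an
    # edge device would implement the same interface, so the elastic layer can
    # address every storage node uniformly as a key-value store.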

Data encoding

Files are split into smaller pieces called chunks for data deduplication and efficient transmission. We slice each file into 4 MB chunks, following the client applications of other cloud storage services such as Dropbox. Although the typical chunk size for data deduplication ranges from 8 KB to 64 KB to maximize deduplication rates, such small chunks degrade data transmission performance over wide area networks, as in our fog scenario. Each chunk is then encoded into shares using erasure coding (e.g., Reed-Solomon codes) for data redundancy and storage-space efficiency. Clients need any t of the n generated shares to decode their original data, where t and n are configurable parameters.
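As an illustration only, the sketch below chunks a file and encodes each chunk with a toy single-parity scheme (n = t + 1); a production system would use a Reed-Solomon library to support arbitrary n. All function names are hypothetical:

    import hashlib

    CHUNK_SIZE = 4 * 1024 * 1024   # 4 MB chunks, as described above

    def chunk_file(path, chunk_size=CHUNK_SIZE):
        # Yield (chunk_id, chunk_bytes); a content hash as the ID enables deduplication.
        with open(path, "rb") as f:
            while True:
                data = f.read(chunk_size)
                if not data:
                    break
                yield hashlib.sha256(data).hexdigest(), data

    def encode_shares(chunk, t):
        # Split the chunk into t data fragments and add one XOR parity fragment,
        # giving n = t + 1 shares; any t of them suffice to rebuild the chunk.
        frag_len = -(-len(chunk) // t)                 # ceiling division
        padded = chunk.ljust(frag_len * t, b"\0")
        frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(t)]
        parity = bytearray(frag_len)
        for frag in frags:
            for i, byte in enumerate(frag):
                parity[i] ^= byte
        return frags + [bytes(parity)]

    def decode_chunk(shares, t, orig_len):
        # `shares` maps share index -> bytes (index t is the parity share);
        # at most one data fragment may be missing in this simplified scheme.
        frags = [shares.get(i) for i in range(t)]
        if None in frags:
            missing = frags.index(None)
            rebuilt = bytearray(shares[t])             # start from the parity share
            for frag in frags:
                if frag is not None:
                    for i, byte in enumerate(frag):
                        rebuilt[i] ^= byte
            frags[missing] = bytes(rebuilt)
        return b"".join(frags)[:orig_len]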

Metadata

Metadata describes how to reconstruct a file by recording its constituent chunks and their share addresses. Thus, only clients with access to the metadata can properly decode and reconstruct the given file. Metadata also stores chunk reference counts. Reference counters are necessary for deleting chunks that no file refers to, since common chunks can be shared among multiple files. The metadata is stored in a directory (e.g., ~/.meta) as one metadata file per data file, mirroring the data directory hierarchy. Clients can share the metadata files with each other in order to share access to the data files.
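Under these assumptions, a per-file metadata record might look like the following sketch (the field names and JSON layout are hypothetical, not the system's actual format):

    import json, os
    from dataclasses import dataclass, field, asdict
    from typing import Dict, List

    @dataclass
    class ShareLocation:
        node: str          # storage node identifier, e.g. "s3://bucket" or "nas-1"
        key: str           # object key of the share on that node

    @dataclass
    class ChunkRecord:
        chunk_id: str      # content hash of the chunk
        size: int
        shares: List[ShareLocation] = field(default_factory=list)

    @dataclass
    class FileMetadata:
        path: str          # path of the data file
        mode: int          # POSIX stat information kept with the record
        size: int
        chunks: List[ChunkRecord] = field(default_factory=list)

    META_ROOT = os.path.expanduser("~/.meta")

    def metadata_path(file_path):
        # Mirror the data file's directory hierarchy under ~/.meta.
        rel = os.path.relpath(os.path.abspath(file_path), os.path.expanduser("~"))
        return os.path.join(META_ROOT, rel + ".json")

    def save_metadata(meta: FileMetadata, refcounts: Dict[str, int]):
        # Persist the per-file record and bump reference counts for its chunks,
        # so unreferenced chunks can later be garbage-collected.
        target = metadata_path(meta.path)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "w") as f:
            json.dump(asdict(meta), f, indent=2)
        for chunk in meta.chunks:
            refcounts[chunk.chunk_id] = refcounts.get(chunk.chunk_id, 0) + 1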

Mapping files to objects

Since edge storage nodes may not be trustworthy, we consider the client device to be the most trustworthy storage resource. We therefore store the file metadata in client storage, assuming that this storage is stable and large enough to hold the metadata securely. Users still need to back up the metadata regularly, but the metadata is much smaller than the actual file content. Read/write operations access file content in the form of chunks. We also leverage local storage space as a non-volatile cache for chunks, so frequently accessed chunks are expected to reside in local storage. If some chunks are rarely used, a data scattering daemon, which runs asynchronously from file operations, encodes those chunks into shares and uploads them to remote storage locations so that they can be removed from local storage.
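A minimal sketch of such a scattering pass is shown below; the cache layout (one file per chunk, named by its chunk ID) and the encoder/upload callbacks are assumptions of the sketch, not the system's actual interfaces:

    import os, time
    from typing import Callable, List

    def scatter_cold_chunks(
        cache_dir: str,
        encode: Callable[[bytes], List[bytes]],     # chunk bytes -> list of shares
        upload: Callable[[str, int, bytes], None],  # (chunk_id, share_index, share) -> remote write
        max_idle_seconds: float = 3600.0,
    ):
        # Asynchronous scattering pass: push rarely used cached chunks to remote
        # storage as erasure-coded shares, then free the local copies.
        now = time.time()
        for name in os.listdir(cache_dir):
            path = os.path.join(cache_dir, name)
            if now - os.path.getatime(path) < max_idle_seconds:
                continue                            # still hot, keep it cached locally
            with open(path, "rb") as f:
                chunk = f.read()
            for idx, share in enumerate(encode(chunk)):
                upload(name, idx, share)            # write each share to a remote node
            os.remove(path)                         # reclaim local space once scattered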

Why 0x Fog Storage?



• Availability: Edge locations may not always be accessible to users, due to connection failures or storage nodes leaving the system. We therefore distribute redundant shares across multiple storage locations so that data remains accessible even when some nodes are unreachable (a rough availability estimate is sketched after this list).

• Scalability: The major performance bottleneck of a distributed storage system is metadata lookup, which reduces read/write performance. We therefore minimize metadata handling by decoupling it from the stored files. We also scale to multiple storage nodes by making file management operations transparent to the client.

• Flexibility: Our elastic fog storage allows storage nodes to join and leave the system by incorporating resource discovery and automated configuration. It is also customizable according to users’ performance requirements.

• Efficiency: It is necessary to minimize data transmission between client and storage nodes in order not to overload network capacity. Elastic fog storage will also reuse existing resources and implementations instead of re-designing new components.

• Security: Edge storage locations may not be trustworthy, which requires incorporating authentication and authorization mechanisms.
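To make the availability argument concrete, the short calculation below estimates the probability that at least t of n shares remain reachable when each storage node is independently available with probability p; the numbers are illustrative, not measurements:

    from math import comb

    def share_availability(n: int, t: int, p_node: float) -> float:
        # Probability that at least t of n shares are reachable, assuming each
        # node is up independently with probability p_node.
        return sum(comb(n, k) * p_node**k * (1 - p_node)**(n - k)
                   for k in range(t, n + 1))

    # Hypothetical example: edge nodes that are up only 90% of the time still
    # give roughly 99.1% data availability with a 3-of-5 encoding.
    print(share_availability(5, 3, 0.90))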
