GridFS-MongoDB

GridFS is a mechanism for storing large files in MongoDB: a protocol that lets you save an arbitrarily large file to the database. It is a lightweight specification for storing files, built on top of normal MongoDB documents. The MongoDB server does almost nothing to handle GridFS requests; all of the work is done by the client-side drivers and tools.

Why You Should Use GridFS:

  • You can keep information associated with the file (who’s edited it, download count, description, etc.) right with the file itself.
  • You can easily access information from random sections of large files; traditional file tools aren’t good at this.
  • GridFS leverages any existing replication or autosharding that you’ve set up for MongoDB, so getting failover and scale-out for storage is easy.
  • GridFS can alleviate some of the issues that certain file systems exhibit when used to store user uploads. For example, GridFS doesn’t have issues with storing large numbers of files in the same directory.
  • You get good disk locality with GridFS, because MongoDB allocates data files in 2 GB chunks.

How It Works:
  • The basic technique behind GridFS is to store large files by splitting them into chunks.
  • GridFS breaks large files into manageable chunks. It saves the chunks to one collection (fs.chunks) and metadata about the file to another collection (fs.files).
  • When you query for the file, GridFS queries the chunks collection and returns the file one piece at a time.
  • Each chunk is stored as a separate document. Because MongoDB supports storing binary data in documents, storage overhead for chunks is kept to a minimum.
  • A separate single document groups the chunks together and contains metadata about the file.
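
The splitting step described above can be sketched in plain Java. This is only an illustration of the idea, not driver code: ChunkSplitter and its split method are hypothetical names, and the tiny chunk size is chosen so the output is easy to read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: ChunkSplitter is a hypothetical name, not part of any MongoDB driver.
final class ChunkSplitter {

    // Cuts data into consecutive pieces of at most chunkSize bytes.
    // The position of a piece in the returned list plays the role of the chunk number "n".
    static List<byte[]> split(byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += chunkSize) {
            int end = Math.min(offset + chunkSize, data.length);
            chunks.add(Arrays.copyOfRange(data, offset, end));
        }
        return chunks;
    }

    public static void main(String[] args) {
        byte[] file = new byte[10];            // stand-in for a file's bytes
        List<byte[]> chunks = split(file, 4);  // tiny chunk size, for illustration only
        for (int n = 0; n < chunks.size(); n++) {
            System.out.println("n=" + n + " length=" + chunks.get(n).length);
        }
    }
}
```

Every chunk except possibly the last has the full chunk size; the last one holds whatever bytes remain.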

By default, chunks are stored in the fs.chunks collection, but this can be overridden. Within the chunks collection, an individual chunk document has this structure:
{
  "_id"      : ObjectId("…."),
  "n"        : 0,
  "data"     : BinData("…."),
  "files_id" : ObjectId("…..")
}
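
To show how the n field is used when a file is read back, here is a minimal sketch that reassembles a file from chunk records. It assumes the chunks have already been fetched; ChunkAssembler and its nested Chunk class are hypothetical in-memory stand-ins for chunk documents, not driver types.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative only: not part of any MongoDB driver.
final class ChunkAssembler {

    // Hypothetical stand-in for one chunk document: just its "n" and "data" fields.
    static final class Chunk {
        final int n;
        final byte[] data;

        Chunk(int n, byte[] data) {
            this.n = n;
            this.data = data;
        }
    }

    // Orders the chunks by n and concatenates their data.
    static byte[] assemble(List<Chunk> chunks) {
        List<Chunk> ordered = new ArrayList<>(chunks);
        ordered.sort(Comparator.comparingInt((Chunk c) -> c.n));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Chunk c : ordered) {
            out.write(c.data, 0, c.data.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Chunks deliberately listed out of order.
        List<Chunk> chunks = Arrays.asList(
                new Chunk(1, "FS".getBytes(StandardCharsets.UTF_8)),
                new Chunk(0, "Grid".getBytes(StandardCharsets.UTF_8)));
        System.out.println(new String(assemble(chunks), StandardCharsets.UTF_8));
    }
}
```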

The files collection stores information in the following form:
{
  "_id"         : <unspecified>,               // unique ID for this file
  "length"      : data_number,                 // size of the file in bytes
  "chunkSize"   : data_number,                 // size of each of the chunks. Default is 256k
  "uploadDate"  : data_date,                   // date when object first stored
  "md5"         : data_string,                 // result of running the "filemd5" command on this file's chunks
  "filename"    : data_string,                 // human name for the file
  "contentType" : data_string,                 // valid mime type for the object
  "aliases"     : data_array of data_string,   // optional array of alias strings
  "metadata"    : data_object                  // anything the user wants to store
}
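
The length, chunkSize and md5 fields are related to the chunks in a simple way, which the following sketch works through. It is not driver code: FileDocSketch, chunkCount and md5Hex are hypothetical names, and it assumes the stored md5 is the MD5 of the file's bytes taken in chunk order.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative only: not part of any MongoDB driver.
final class FileDocSketch {

    // Number of chunks needed for a file: length divided by chunkSize, rounded up.
    static long chunkCount(long length, long chunkSize) {
        return (length + chunkSize - 1) / chunkSize;
    }

    // Lowercase hex MD5 digest of the given bytes.
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        long length = 600L * 1024;     // a 600k file, chosen for illustration
        long chunkSize = 256L * 1024;  // the default chunk size stated above
        System.out.println("chunks needed: " + chunkCount(length, chunkSize));
        System.out.println("md5: " + md5Hex("hello".getBytes(StandardCharsets.UTF_8)));
    }
}
```

A 600k file with 256k chunks needs three chunks: two full ones and a final chunk of 88k.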

Indexes
GridFS implementations should create a unique, compound index in the chunks collection on files_id and n. Here’s how you’d do that from the shell (newer shells name this helper createIndex):
db.fs.chunks.ensureIndex({files_id:1, n:1}, {unique: true});

This way, the chunks of a file can be retrieved efficiently, in order, using their files_id and n values:
cursor = db.fs.chunks.find({files_id: myFileID}).sort({n:1});
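
Because every chunk of a file has the same size except possibly the last, the n value for any byte position can be computed directly, which is what makes reading from a random section of a large file cheap. A minimal sketch of that arithmetic follows; ChunkLocator and its methods are hypothetical names, not driver API.

```java
// Illustrative only: not part of any MongoDB driver.
final class ChunkLocator {

    // The chunk number "n" that contains the given byte offset of the file.
    static long chunkNumber(long byteOffset, long chunkSize) {
        return byteOffset / chunkSize;
    }

    // Position of that byte inside the chunk's "data" field.
    static long offsetInChunk(long byteOffset, long chunkSize) {
        return byteOffset % chunkSize;
    }

    public static void main(String[] args) {
        long chunkSize = 256L * 1024;   // 262144 bytes
        long byteOffset = 1_000_000L;   // the byte we want to read
        System.out.println("n = " + chunkNumber(byteOffset, chunkSize));
        System.out.println("offset in chunk = " + offsetInChunk(byteOffset, chunkSize));
    }
}
```

With these values the byte lives in chunk n = 3, at position 213568 within it, so only that one chunk document needs to be fetched by files_id and n.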

Example:

A simple Java application that stores a file in MongoDB using GridFS and retrieves it.

import java.io.File;
import java.io.IOException;
import java.net.UnknownHostException;
import com.mongodb.DB;
import com.mongodb.DBCursor;
import com.mongodb.Mongo;
import com.mongodb.MongoException;
import com.mongodb.gridfs.GridFS;
import com.mongodb.gridfs.GridFSDBFile;
import com.mongodb.gridfs.GridFSInputFile;

public class StoreImage {
    public static void main(String[] args) {
        try {
            Mongo mongo = new Mongo("localhost", 27017);
            DB db = mongo.getDB("picturedb");

            String newFileName = "nitesh-java-image";

            // backslashes in Windows paths must be escaped in Java string literals
            File imageFile = new File("D:\\walle.jpg");

            // create a "photo" namespace for image storage
            // (the files go to photo.files and photo.chunks instead of fs.files and fs.chunks)
            GridFS gfsPhoto = new GridFS(db, "photo");

            // get the image file from the client's drive
            GridFSInputFile gfsFile = gfsPhoto.createFile(imageFile);

            // set a new filename to identify the image
            gfsFile.setFilename(newFileName);

            // save the image file into MongoDB
            gfsFile.save();

            // print the metadata documents of the stored files
            DBCursor cursor = gfsPhoto.getFileList();
            while (cursor.hasNext()) {
                System.out.println(cursor.next());
            }

            // get the image file by its filename
            GridFSDBFile imageForOutput = gfsPhoto.findOne(newFileName);

            // save it into a new image file
            imageForOutput.writeTo("c:\\nitesh-java-im.png");

            // remove the image file from MongoDB
            gfsPhoto.remove(gfsPhoto.findOne(newFileName));

            System.out.println("Cheers….!! Done");

        } catch (UnknownHostException e) {
            e.printStackTrace();
        } catch (MongoException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

From app code, prototype building and system architecture through to deployment, we assist our clients with all types of MongoDB development services. Beyond development, SPEC INDIA offers MongoDB consulting services that provide comprehensive guidance, strategy building, deployment options and more.

Author: SPEC INDIA

