Meet Variety, a Schema Analyzer for MongoDB
Variety is a lightweight tool that gives a feel for an application’s schema and surfaces any schema outliers. It is particularly useful for:
• quickly learning how data is structured when you inherit a codebase along with a production data dump
• finding all rare keys in a given collection
An Easy Example
We’ll create a collection from within the MongoDB shell:
db.users.insert({name: "Tom", bio: "A nice guy.", pets: ["monkey", "fish"], someWeirdLegacyKey: "I like Ike!"});
db.users.insert({name: "Dick", bio: "I swordfight."});
db.users.insert({name: "Harry", pets: "egret"});
db.users.insert({name: "Geneviève", bio: "Ça va?"});
Let’s use Variety on this collection, and see what it can tell us:
$ mongo test --eval "var collection = 'users'" variety.js
The above is executed from a terminal. “test” is the database containing the collection we are analyzing, and the --eval string names that collection.
Variety’s output:
{ "_id" : { "key" : "_id" }, "value" : { "types" : [ "object" ] }, "totalOccurrences" : 4, "percentContaining" : 100 }
{ "_id" : { "key" : "name" }, "value" : { "types" : [ "string" ] }, "totalOccurrences" : 4, "percentContaining" : 100 }
{ "_id" : { "key" : "bio" }, "value" : { "types" : [ "string" ] }, "totalOccurrences" : 3, "percentContaining" : 75 }
{ "_id" : { "key" : "pets" }, "value" : { "types" : [ "string", "array" ] }, "totalOccurrences" : 2, "percentContaining" : 50 }
{ "_id" : { "key" : "someWeirdLegacyKey" }, "value" : { "type" : "string" }, "totalOccurrences" : 1, "percentContaining" : 25 }
Every document in the “users” collection has a “name” and an “_id”. Most, but not all, have a “bio”. Interestingly, it looks like “pets” can be either an array or a string, even though the application code only expects arrays of pets. Have we discovered a bug, or a remnant of a previous schema? The first document created has a weird legacy key I’ve never seen before: the people who built the prototype didn’t clean up after themselves. Rare keys like this, whose contents are never used, have a strong potential to confuse developers, and could be removed once we verify our findings. For future use, the results are also stored in a varietyResults database.
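To make the reasoning above concrete, here is a minimal sketch of the same kind of tally in plain JavaScript, run against an in-memory copy of the four example documents (without the “_id” that MongoDB adds). It is not Variety’s actual source: the helper names typeOf and analyzeKeys, and the 25% cutoff for a “rare” key, are assumptions made up for illustration.

```javascript
// Illustrative sketch only: NOT Variety's implementation.
// In-memory copies of the four example documents (MongoDB's _id omitted).
const users = [
  { name: "Tom", bio: "A nice guy.", pets: ["monkey", "fish"], someWeirdLegacyKey: "I like Ike!" },
  { name: "Dick", bio: "I swordfight." },
  { name: "Harry", pets: "egret" },
  { name: "Geneviève", bio: "Ça va?" },
];

// Hypothetical helper: name a value's type, telling arrays apart from plain objects.
function typeOf(value) {
  if (Array.isArray(value)) return "array";
  if (value === null) return "null";
  return typeof value;
}

// Hypothetical helper: for every top-level key, record the types seen and
// how many documents contain the key, then derive the percentage.
function analyzeKeys(docs) {
  const stats = new Map();
  for (const doc of docs) {
    for (const [key, value] of Object.entries(doc)) {
      if (!stats.has(key)) {
        stats.set(key, { types: new Set(), totalOccurrences: 0 });
      }
      const entry = stats.get(key);
      entry.types.add(typeOf(value));
      entry.totalOccurrences += 1;
    }
  }
  return [...stats.entries()]
    .map(([key, { types, totalOccurrences }]) => ({
      key,
      types: [...types].sort(),
      totalOccurrences,
      percentContaining: (totalOccurrences / docs.length) * 100,
    }))
    .sort((a, b) => b.totalOccurrences - a.totalOccurrences);
}

const report = analyzeKeys(users);

// Assumed cutoff: treat keys present in at most 25% of documents as rare.
const rareKeys = report
  .filter((row) => row.percentContaining <= 25)
  .map((row) => row.key);

console.log(report);
console.log("Rare keys:", rareKeys);
```

The sketch only looks at top-level keys and holds everything in memory, which is fine for four documents but says nothing about how Variety itself handles nested documents or large collections.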
Learn More!
Learn more about Variety now, including:
• How to download Variety
• How to set a limit on the number of documents analyzed from a collection
• How to contribute, and report issues
Variety is free, open source, and written in 100% JavaScript. Check it out on GitHub.
-by James Cropcho
Source : http://blog.mongodb.org/post/21923016898/meet-variety-a-schema-analyzer-for-mongodb