Loading XML into MongoDB
I’m starting a new app today and building out the data layer with MongoDB as my database. The app uses a collection of data from the USDA that I thought would make a good sample for getting started with the “Load” portion of ETL into MongoDB.
The data is available from the USDA here – the raw XML for MyPyramid: http://explore.data.gov/download/b978-7txq/XML
Step 1 – Define a Class for the data
This step isn’t strictly necessary, since you could build a raw BSON document directly from the XML, but you’d miss out on some of the C# driver’s niceties if you did. Looking at the raw data, I came up with this class, along with a constructor that takes an XElement to handle the parsing. Strict DTO people might move that parsing to a function within the ETL process…up to you. The only MongoDB-specific code here is the BsonId attribute, which I’ll put on the FoodCode property – a unique ID from the source system.
public class Food {
[BsonId]
public int FoodCode {get;set;}
public string DisplayName {get;set;}
public float PortionDefault {get;set;}
public float PortionAmount {get;set;}
public string PortionDisplayName {get;set;}
public float Factor {get;set;}
public float Increment {get;set;}
public float Multiplier {get;set;}
public float Grains {get;set;}
public float WholeGrains {get;set;}
public float Vegetables {get;set;}
public float OrangeVegetables {get;set;}
public float DarkGreenVegetables {get;set;}
public float StarchyVegetables {get;set;}
public float OtherVegetables {get;set;}
public float Fruits {get;set;}
public float Milk {get;set;}
public float Meats {get;set;}
public float Soy {get;set;}
public float DryBeansPeas {get;set;}
public float Oils {get;set;}
public float SolidFats {get;set;}
public float AddedSugars {get;set;}
public float Alcohol {get;set;}
public float Calories {get;set;}
public float SaturatedFats {get;set;}
public Food(XElement elem) {
this.FoodCode = Int32.Parse(elem.Element("Food_Code").Value);
this.DisplayName = elem.Element("Display_Name").Value;
this.PortionDefault = float.Parse (elem.Element("Portion_Default").Value);
this.PortionAmount = float.Parse (elem.Element("Portion_Amount").Value);
this.PortionDisplayName = elem.Element("Portion_Display_Name").Value;
if(elem.Element ("Factor") != null)
this.Factor = float.Parse (elem.Element("Factor").Value);
this.Increment = float.Parse (elem.Element("Increment").Value);
this.Multiplier = float.Parse (elem.Element("Multiplier").Value);
this.Grains = float.Parse (elem.Element("Grains").Value);
this.WholeGrains = float.Parse (elem.Element("Whole_Grains").Value);
this.Vegetables = float.Parse (elem.Element("Vegetables").Value);
this.OrangeVegetables = float.Parse (elem.Element("Orange_Vegetables").Value);
this.DarkGreenVegetables = float.Parse (elem.Element("Drkgreen_Vegetables").Value);
this.StarchyVegetables = float.Parse (elem.Element("Starchy_vegetables").Value);
this.OtherVegetables = float.Parse (elem.Element("Other_Vegetables").Value);
this.Fruits = float.Parse (elem.Element("Fruits").Value);
this.Milk = float.Parse (elem.Element("Milk").Value);
this.Meats = float.Parse (elem.Element("Meats").Value);
this.Soy = float.Parse (elem.Element("Soy").Value);
this.DryBeansPeas = float.Parse (elem.Element("Drybeans_Peas").Value);
this.Oils = float.Parse (elem.Element("Oils").Value);
this.SolidFats = float.Parse (elem.Element("Solid_Fats").Value);
this.AddedSugars = float.Parse (elem.Element("Added_Sugars").Value);
this.Alcohol = float.Parse (elem.Element("Alcohol").Value);
this.Calories = float.Parse (elem.Element("Calories").Value);
this.SaturatedFats = float.Parse (elem.Element("Saturated_Fats").Value);
}
}
You might notice I’m using float for my decimal values. That’s all the accuracy I need, but it does lose some precision. I’m rounding the data when I use it so it won’t really matter, but if your needs differ, choose a different numeric type.
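To make the precision trade-off concrete, here is a minimal sketch that is not part of the app itself. The PortionParsing class and its method names are made up for illustration. It also passes CultureInfo.InvariantCulture, on the assumption that the USDA file always uses a period as the decimal separator; the parameterless float.Parse used in the constructor above follows the machine’s current culture instead.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration; not part of the original Food class.
public static class PortionParsing
{
    // Parse with the invariant culture so "1.5" means one and a half
    // even on machines whose current culture uses a comma separator.
    public static float ToSingle(string text)
    {
        return float.Parse(text, CultureInfo.InvariantCulture);
    }

    // decimal keeps base-10 fractions such as 0.1 exactly.
    public static decimal ToDecimal(string text)
    {
        return decimal.Parse(text, CultureInfo.InvariantCulture);
    }

    // True when the float value differs from the double parsed from the
    // same text, i.e. storing the value as float dropped some digits.
    public static bool LosesPrecisionAsSingle(string text)
    {
        double asDouble = double.Parse(text, CultureInfo.InvariantCulture);
        return (double)ToSingle(text) != asDouble;
    }
}
```

A value like "0.5" is exactly representable in binary and survives as a float, while "0.1" does not. If that matters for your data, switching the properties to double or decimal is the simplest change.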
Step 2 – Function for reading the XML file
This is a pretty small data file, only about 750 records, but loading it all into memory at once is a waste. I want to load the “Food_Display_Row” XML elements one at a time, convert to a Food object, store in MongoDB, and move on to the next. It’s a job for a streaming API and an iterator, powered by “yield return” to get one XElement at a time:
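// Requires: using System.Collections.Generic; using System.Xml; using System.Xml.Linq;
static IEnumerable<XElement> readElementStream(string fileName, string elementName) {
    using (var reader = XmlReader.Create(fileName)) {
        reader.MoveToContent();
        while (!reader.EOF) {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName) {
                // ReadFrom consumes the whole element and leaves the reader on the
                // node that follows it, so don't call Read() again here or an
                // adjacent element with no whitespace in between would be skipped.
                yield return (XElement)XElement.ReadFrom(reader);
            }
            else {
                reader.Read();
            }
        }
    } // The using block disposes the reader; no explicit Close() is needed.
}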
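If you want to try the streaming idea without downloading the file, here is a small sketch that reads from any TextReader instead of a file name. The ElementStreamDemo class and the sample XML in the usage note are assumptions for illustration; only the Food_Display_Row and Display_Name element names come from the USDA data above. It advances the reader itself only when the current node was not consumed by ReadFrom.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Hypothetical demo class; names are for illustration only.
public static class ElementStreamDemo
{
    // Lazily yields each matching element, one at a time.
    public static IEnumerable<XElement> ReadElements(TextReader input, string elementName)
    {
        using (var reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    // ReadFrom leaves the reader on the node after the element.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    // Collects the Display_Name of every Food_Display_Row in the given XML.
    public static List<string> DisplayNames(string xml)
    {
        return ReadElements(new StringReader(xml), "Food_Display_Row")
            .Select(row => (string)row.Element("Display_Name"))
            .ToList();
    }
}
```

Calling ElementStreamDemo.DisplayNames on a string holding two rows back to back, with no whitespace between them, should return both names in order.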
Step 3 – Pull it all together and load the data
With the pieces in place, the load process is pretty simple. Connect to the server, get the database (MongoDB creates it on first use), get the collection (MongoDB also creates the collection), and use the iterator to read the XML file, load each element into a Food object and insert into the MongoDB collection. At the end, we have a MongoDB database with a collection of data from the food guide pyramid.
var server = MongoServer.Create ("mongodb://localhost");
var db = server.GetDatabase("gov");
var foods = db.GetCollection<Food>("food");
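// .NET and Mono do not expand "~", so build the path from the user's home directory (needs using System.IO).
var xmlPath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.UserProfile), "Downloads/MyFoodapediaData/Food_Display_Table.xml");
foreach(var elem in readElementStream(xmlPath, "Food_Display_Row")) {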
var food = new Food(elem);
foods.Insert (food);
}
My favorite thing about this is that I never had to leave C# to create the database, parse the source XML, or load the data. I don’t have to run a separate ETL process or use management tools to configure my database schema. It’s a simple, self-contained solution.
My second favorite part is that I ran all of this under Ubuntu and Mono. It should work just as well under Windows and .NET, but life is better running under an open source software stack.
I hope you find this helpful if you’re getting started with MongoDB and want a little data to play with.
If C# is so awesome, why use anything else?
Anyone who knows me professionally knows I work in C# most of the time. I think it’s a great language that’s been well designed and made very portable by way of being an open language specification. A lot of people look at C# and say that it’s just Java with some Microsoft extensions. Sort of: its framework (.NET) ships with quite a few libraries that interoperate well with Windows, but the C# language itself doesn’t have anything to do with Windows, and it runs on Linux, OS X, Android, iOS, and so on. In my opinion, Java has stagnated over the years, while C# has been evolving with generics (which Java followed), lambdas (Java finally gets them years later), anonymous types, partial method and class declarations, language integrated query (LINQ), dynamic runtime integration, and soon a simplified asynchronous programming model with await and async that will allow the runtime to deal with the gory details of async programming rather than forcing the programmer to understand and properly implement callbacks and cleanup. Java isn’t catching up fast, so Scala is filling the gaps, but C# remains years ahead.
Every year, my family looks at me funny when they give me new books. That’s right, I’m a geek who reads computer books. Most of these books are not on C#, but on JavaScript, Python, Haskell, and I even keep an old PERL book on my shelf. What is all this other stuff?
JavaScript – it’s pretty rare these days that other developers would say, “why would you ever want to write JavaScript?” It’s a ubiquitous language amongst web browsers, and it’s pretty rare that anyone can write much of a browser-based application at all without it. Besides, the latest trend is to write a “language X to JavaScript converter” and what good would that be if I didn’t know JavaScript and wasn’t willing to learn language X? There have always been some nice server-side implementations, like Spidermonkey and newer V8, powering trendy applications like MongoDB and Node.js. Until I started down the Python path, whenever I needed extensibility, I would embed Spidermonkey for some JavaScript fun.
Python – in the realm of C#, a lot of people are uncomfortable mixing in Python. They don’t like the idea of losing compile-time checks and worry about needing a myriad of Python frameworks to solve any sizable development tasks. However, Python is an excellent tool for large and small projects alike, and IronPython takes the Python language and gives it access to the full .NET framework. In the last few years, I’ve felt constrained if I didn’t have the layer of extensibility that IronPython can add to CLR applications. Python scripts let you treat code as data, meaning you can store it, transmit it, and change it at runtime. Python gives you a new way to move the problem around and solve it at a different time in your overall solution. It’s a great piece of the toolbox.
I remember spending weeks building business rules engines so non-programmers could add some logic to enterprise applications. These engines would use reflection and Lightweight Code Generation (LCG) and a clumsy UI where end users would select data objects and operators and build expression statements. IronPython uses LCG, is highly optimized, and gives you a general purpose scripting language with access to CLR objects. Most end users prefer the ability to write an expression in script rather than fumble with the type of UI needed to build an expression tree. This is just scratching the surface of the Python language, but at the very least, it’s a great tool anywhere you want to offer runtime extensibility.
Haskell is pure functional programming – no state, just functions. I used F# a bit for professional work just to learn it, but it allows you to fall back into the OOP line of thinking. Haskell makes you take a fully functional approach. I recommend that every developer looking to expand their approach to problem solving spend some time with it.
What about that PERL 5 book? Well, I don’t use that, to be honest. I did once upon a time, but I really do avoid PERL at all costs. Maybe one day, I’ll pick it back up.
Calling IronPython from C#
There are a lot of great Python libraries out there, and IronPython makes it really easy to call many of them from .NET. Over the years, IronPython has become easier to embed in your applications, and the DLR that was added in .NET 4 makes it dead simple.
Say you have a Python expression (could be an entire module) in a string variable called “expression” – the code to execute that is this simple:
var engine = IronPython.Hosting.Python.CreateEngine();
var script = engine.CreateScriptSourceFromString(expression);
var scope = engine.CreateScope();
dynamic result = script.Execute(scope);
When you execute that, your result will be whatever you returned from the Python expression. You could return a value or a function defined in Python. If your expression is a Python lambda taking three parameters, from your C# code you can write the following:
dynamic foo = result(a,b,c);
If you need to load additional .NET assemblies to expose them to the IronPython code, just call the following:
engine.Runtime.LoadAssembly(assembly);
What about passing parameters? The scope lets you pass in a dictionary of parameters. The key to each dictionary entry is the name the parameter will have inside the IronPython scope, and the value is going to be the value of that parameter when script.Execute(scope) is called. To pass a dictionary of parameters, simply do this:
var parameters = new Dictionary<string,object>() {
{ "age", 30 }, { "name", "Vinny" }
};
scope = engine.CreateScope(parameters);
result = script.Execute(scope);
The parameters “age” and “name” will be passed into the scope of the IronPython script being executed.
Suppose you have additional Python modules that you want to call from your embedded IronPython. IronPython ships with quite a bit of the standard library, but your embedded code doesn’t necessarily know how to find it. A call to engine.SetSearchPaths(paths) sets the collection of paths that IronPython searches when executing your code. It replaces the current list, so start from engine.GetSearchPaths() if you want to keep the existing entries. Note the verbatim string (the @ prefix), which keeps the backslashes in the Windows path from being treated as escape sequences.
var paths = new List<string>(engine.GetSearchPaths());
paths.Add(@"c:\path\to\my\modules");
engine.SetSearchPaths(paths);
I encourage you to explore the options for embedding IronPython in your own applications. The ScriptEngine is quite robust; you can execute string expressions or entire files, in the same AppDomain or in a new one.
TCP Proxy in C# using Task Parallel Library
Every now and then I have the need to proxy TCP communications, handy for things like viewing network traffic or proxying Silverlight or Flash requests. C# makes this pretty easy, and the Task Parallel Library (add-on to .NET 3.5 & shipped with .NET 4) simplifies the code with a nice fluent interface.
Here’s a quick example that works for proxying a VNC connection. There is one task for reading from the client and sending data to the server and another task for reading server responses and sending them to the client.
static TcpListener listener = new TcpListener(IPAddress.Any, 4502);
const int BUFFER_SIZE = 4096;
static void Main(string[] args) {
listener.Start();
new Task(() => {
// Accept clients.
while (true) {
var client = listener.AcceptTcpClient();
new Task(() => {
// Handle this client.
var clientStream = client.GetStream();
TcpClient server = new TcpClient("10.0.1.5", 5900);
var serverStream = server.GetStream();
new Task(() => {
byte[] message = new byte[BUFFER_SIZE];
int clientBytes;
while (true) {
try {
clientBytes = clientStream.Read(message, 0, BUFFER_SIZE);
}
catch {
// Socket error - exit loop. Client will have to reconnect.
break;
}
if (clientBytes == 0) {
// Client disconnected.
break;
}
serverStream.Write(message, 0, clientBytes);
}
client.Close();
}).Start();
new Task(() => {
byte[] message = new byte[BUFFER_SIZE];
int serverBytes;
while (true) {
try {
serverBytes = serverStream.Read(message, 0, BUFFER_SIZE);
clientStream.Write(message, 0, serverBytes);
}
catch {
// Server socket error - exit loop. Client will have to reconnect.
break;
}
if (serverBytes == 0) {
// server disconnected.
break;
}
}
}).Start();
}).Start();
}
}).Start();
Console.WriteLine("Server listening on port 4502. Press enter to exit.");
Console.ReadLine();
listener.Stop();
}
This is for illustrative purposes only. If you decide to use this in production, you’ll need to use TcpListener.BeginAcceptTcpClient() for async connections, you’ll need error handling and logging, and you’ll want some sort of pool to manage (and clean up) client socket connections. Have fun, and let me know if you have concerns or suggestions.
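As a starting point for the async version, here is a minimal sketch. It swaps in async/await and the Task-returning socket methods (which arrived after this post was written) in place of BeginAcceptTcpClient, and the AsyncRelay class and its method names are made up for illustration. Logging, timeouts, and connection limits are still left out.

```csharp
using System;
using System.IO;
using System.Net.Sockets;
using System.Threading.Tasks;

// Hypothetical sketch; not a drop-in replacement for the sample above.
public static class AsyncRelay
{
    const int BufferSize = 4096;

    // Copies bytes from one stream to the other until the source reports
    // end of stream (ReadAsync returns 0). Returns the number of bytes relayed.
    public static async Task<long> PumpAsync(Stream from, Stream to)
    {
        var buffer = new byte[BufferSize];
        long total = 0;
        int read;
        while ((read = await from.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            await to.WriteAsync(buffer, 0, read);
            total += read;
        }
        return total;
    }

    // Relays one client to the target host. Both sockets are disposed as soon
    // as either direction finishes or fails; the other pump then fails on the
    // closed socket, and that failure is deliberately ignored in this sketch.
    public static async Task HandleClientAsync(TcpClient client, string host, int port)
    {
        using (client)
        using (var server = new TcpClient())
        {
            await server.ConnectAsync(host, port);
            var clientStream = client.GetStream();
            var serverStream = server.GetStream();
            await Task.WhenAny(
                PumpAsync(clientStream, serverStream),
                PumpAsync(serverStream, clientStream));
        }
    }

    // Accepts clients without blocking a thread per pending accept.
    // Stopping the listener makes the pending accept throw, which ends the loop.
    public static async Task AcceptLoopAsync(TcpListener listener, string host, int port)
    {
        while (true)
        {
            TcpClient client = await listener.AcceptTcpClientAsync();
            // Fire and forget: each connection runs independently.
            var ignored = HandleClientAsync(client, host, port);
        }
    }
}
```

PumpAsync works on any Stream, so you can exercise it with a pair of MemoryStream instances before pointing it at real sockets.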
JSON and the DLR
JavaScript is a dynamic language, and with the DLR, C# can be as well. There have been more than a few times that I’ve wanted to pass JavaScript objects over to my C# code, but my object models didn’t match up, so I couldn’t easily deserialize them for server side processing. With the DLR, this is no longer a problem. We don’t need to modify a C# class to match the JavaScript object model. Instead, use the JavaScriptSerializer that ships with .NET 3.5 to get a Dictionary, then copy those elements into a DynamicObject.
First, our DynamicObject:
public class JsonDynamicObject : DynamicObject
{
private Dictionary<string,object> properties = new Dictionary<string, object>();
public override bool TryGetMember (GetMemberBinder binder, out object result)
{
object value;
if(properties.TryGetValue(binder.Name, out value)) {
result = value;
return true;
}
else {
result = null;
return false;
}
}
public override bool TrySetMember (SetMemberBinder binder, object value)
{
if(properties.ContainsKey(binder.Name)) {
properties[binder.Name] = value;
}
else {
properties.Add(binder.Name, value);
}
return true;
}
public override IEnumerable<string> GetDynamicMemberNames ()
{
return properties.Keys;
}
}
Next, use the JavaScriptSerializer to parse a string of JSON into a Dictionary<string,object>, and build an instance of our JsonDynamicObject from that:
public static JsonDynamicObject Parse(string json) {
var s = new System.Web.Script.Serialization.JavaScriptSerializer();
return buildDynamicObject((Dictionary<string,object>)s.DeserializeObject(json));
}
private static JsonDynamicObject buildDynamicObject(Dictionary<string,object> props) {
if(props != null) {
JsonDynamicObject dynObj = new JsonDynamicObject();
foreach(var kvp in props) {
Dictionary<string, object> subProps = kvp.Value as Dictionary<string, object>;
if(subProps != null) {
dynObj.properties.Add(kvp.Key, buildDynamicObject(subProps));
}
else {
dynObj.properties.Add(kvp.Key, kvp.Value);
}
}
return dynObj;
}
else {
return null;
}
}
In the end, we have an object that is fully compatible with the DLR and can even be passed on to other dynamic languages, such as IronPython running in the server:
dynamic d = JsonDynamicObject.Parse("{'UserID':23, 'User':{'Name':'Billy', 'Age':28}}");
Console.WriteLine(d.UserID);
Console.WriteLine(d.User.Name);
var scriptEngine = Python.CreateEngine();
var scope = scriptEngine.CreateScope();
scope.SetVariable("jsObj", d);
var source = scriptEngine.CreateScriptSourceFromString("print jsObj.User.Age * 10");
source.Execute(scope);
An object defined in JavaScript passed to C# then on to IronPython all in just a few lines of code!
Embedding PowerShell 2 in IronPython
PowerShell 2 makes the process of embedding PowerShell scripts inside other languages quite a bit simpler than in previous versions:
import clr
clr.AddReference("System.Management.Automation")
from System.Management.Automation import PowerShell
with PowerShell.Create() as ps:
    script = ps.AddScript("param([System.String]$pname)\r\nGet-Process -name $pname")
    script.AddParameters({"pname" : "devenv"})
    output = [o.BaseObject for o in script.Invoke()]
    # Now you can access the Process object returned from the PS script.
    print output[0].VirtualMemorySize64
You should create the PowerShell instance within a “with” block to ensure it is properly disposed after use. To pass parameters into a PS script, you’ll need to make sure the script accepts a few named parameters using the PS param() statement. Then you create a dict() of the name value pairs and add it as the parameters to the script. A Python list comprehension makes quick work of unwrapping the CLR objects from the PS output collection.
WCF Serialization of DLR dynamic types
I’m a huge fan of the DLR, as it provides terrific interoperability between C# and dynamic languages like IronPython. To create a C# class that works with the DLR, the easiest thing to do is derive from DynamicObject. One limitation arises when trying to use a dynamic type in a WCF service. Trying to use a DynamicObject-derived type will result in a runtime exception when trying to serialize with WCF’s DataContractSerializer. This class (which fails serialization) was my first attempt:
[DataContract]
public class SerializableDynamicObject : DynamicObject
{
[DataMember]
private Dictionary<string, object> props = new Dictionary<string, object>();
public override bool TryGetMember(GetMemberBinder binder, out object result)
{
return props.TryGetValue(binder.Name, out result);
}
public override bool TrySetMember(SetMemberBinder binder, object value)
{
if (props.ContainsKey(binder.Name))
props[binder.Name] = value;
else
props.Add(binder.Name, value);
return true;
}
}
Trying to serialize an instance of SerializableDynamicObject results in the following exception:
Unhandled Exception: System.Runtime.Serialization.InvalidDataContractException:
Type ‘WCFDynamicObject.SerializableDynamicObject’ cannot inherit from a type that is not marked with DataContractAttribute or SerializableAttribute. Consider marking the base type ‘System.Dynamic.DynamicObject’ with DataContractAttribute or SerializableAttribute, or removing them from the derived type.
We can’t add attributes to DynamicObject, so we have to do this the *slightly* harder way by implementing IDynamicMetaObjectProvider rather than deriving from DynamicObject. The tricky part of this is creating the DynamicMetaObject which handles the evaluation of binding expressions. Luckily the DLR documentation on CodePlex has a great walkthrough for this.
The SerializableDynamicObject contains a dictionary of dynamic members and will serialize properly using WCF’s DataContractSerializer. Use it in place of DynamicObject when you want to be able to pass your dynamic types across WCF service boundaries.
[DataContract]
public class SerializableDynamicObject : IDynamicMetaObjectProvider
{
[DataMember]
private IDictionary<string,object> dynamicProperties = new Dictionary<string,object>();
#region IDynamicMetaObjectProvider implementation
public DynamicMetaObject GetMetaObject (Expression expression)
{
return new SerializableDynamicMetaObject(expression,
BindingRestrictions.GetInstanceRestriction(expression, this), this);
}
#endregion
#region Helper methods for dynamic meta object support
internal object setValue(string name, object value)
{
// Use the indexer so assigning an existing member overwrites it; Add would throw on a duplicate key.
dynamicProperties[name] = value;
return value;
}
internal object getValue(string name)
{
object value;
if(!dynamicProperties.TryGetValue(name, out value)) {
value = null;
}
return value;
}
internal IEnumerable<string> getDynamicMemberNames()
{
return dynamicProperties.Keys;
}
#endregion
}
public class SerializableDynamicMetaObject : DynamicMetaObject
{
Type objType;
public SerializableDynamicMetaObject(Expression expression, BindingRestrictions restrictions, object value)
: base(expression, restrictions, value)
{
objType = value.GetType();
}
public override DynamicMetaObject BindGetMember (GetMemberBinder binder)
{
var self = this.Expression;
var dynObj = (SerializableDynamicObject)this.Value;
var keyExpr = Expression.Constant(binder.Name);
var getMethod = objType.GetMethod("getValue", BindingFlags.NonPublic | BindingFlags.Instance);
var target = Expression.Call(Expression.Convert(self, objType),
getMethod,
keyExpr);
return new DynamicMetaObject(target,
BindingRestrictions.GetTypeRestriction(self, objType));
}
public override DynamicMetaObject BindSetMember (SetMemberBinder binder, DynamicMetaObject value)
{
var self = this.Expression;
var keyExpr = Expression.Constant(binder.Name);
var valueExpr = Expression.Convert(value.Expression, typeof(object));
var setMethod = objType.GetMethod("setValue", BindingFlags.NonPublic | BindingFlags.Instance);
var target = Expression.Call(Expression.Convert(self, objType),
setMethod,
keyExpr,
valueExpr);
return new DynamicMetaObject(target,
BindingRestrictions.GetTypeRestriction(self, objType));
}
public override IEnumerable<string> GetDynamicMemberNames ()
{
var dynObj = (SerializableDynamicObject)this.Value;
return dynObj.getDynamicMemberNames();
}
}
One warning: dynamic members can be anything, meaning that at runtime someone could assign a method to one of these fields. If this is possible in your application, you’ll need to ensure any methods assigned to the dynamic type are not serialized. I’m leaving this as an exercise for the reader.
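One possible starting point for that exercise is a helper that copies the member dictionary and leaves out anything that is a delegate. This is only a sketch under the assumption that methods are stored as delegates; the DynamicMemberFilter class and WithoutDelegates method are hypothetical names, not part of the class above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration.
public static class DynamicMemberFilter
{
    // Returns a copy of the members with every delegate-valued entry removed.
    // The original dictionary is left untouched.
    public static Dictionary<string, object> WithoutDelegates(IDictionary<string, object> members)
    {
        return members
            .Where(member => !(member.Value is Delegate))
            .ToDictionary(member => member.Key, member => member.Value);
    }
}
```

Where you call it depends on your design. One option is to serialize a filtered copy rather than the live dictionary, so the in-memory object keeps its methods while only data crosses the service boundary.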
Comparing method invocation using reflection and dynamic
In C# 4.0, the ‘dynamic’ keyword was added to specify that you don’t want compile time checking – all operations on an instance declared as ‘dynamic’ will be resolved at runtime. This doesn’t add any functionality that wasn’t possible before, as you could use reflection or LCG to accomplish this, but the syntax is much more natural to use and feels like the duck typing you would have in a typical dynamic language.
So what’s the cost in terms of performance? With a little benchmarking, I’ve found that there is a cost to using dynamic, but it’s certainly much faster than invoking methods using reflection. Compiled code, as you would expect, is clearly the fastest way to invoke methods. In my quick little benchmark I was able to invoke the same method about 75 times more often in a two second period using compiled code than using reflection. Compiled is still faster than dynamic – I could invoke the method about 6 times more often in code using compiled invocation than using dynamic. More surprising is the method could be invoked using dynamic about 12 times more often than using reflection.
Here are some numbers from one of my samples – I’m counting the number of times I could invoke the same method in a two second period using all three types of invocation running simultaneously to avoid environmental factors skewing results. Bigger is better – that means the method could be invoked faster.
Compiled: 159,277,840.00 method calls.
Reflection: 2,102,121.00 method calls.
Dynamic: 24,702,097.00 method calls.
What this tells me is whenever I can choose between reflection and dynamic for invoking a method, dynamic is quite likely to perform better.
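The “about 75, 6, and 12 times” figures come straight from dividing the call counts in that sample run. A tiny sketch of the arithmetic, with a made-up InvocationRatios class holding the three numbers:

```csharp
using System;

// Hypothetical helper; the three constants are the counts from the sample run.
public static class InvocationRatios
{
    public const double Compiled = 159277840;
    public const double Reflection = 2102121;
    public const double Dynamic = 24702097;

    // How many times more calls the faster approach completed, to one decimal.
    public static double Ratio(double faster, double slower)
    {
        return Math.Round(faster / slower, 1);
    }
}
```

Ratio(Compiled, Reflection) comes out at 75.8, Ratio(Compiled, Dynamic) at 6.4, and Ratio(Dynamic, Reflection) at 11.8. These are figures from a single run on one machine, so expect different ratios on yours.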
Here is the source, in case you want to run a similar comparison on your own.
class Program
{
class MyTest
{
public ulong SomeField;
public void Increment()
{
SomeField = checked((ulong)(SomeField + 1));
}
}
static void Main(string[] args)
{
// Do any possible setup here so we're only invoking methods during the test.
MyTest testCompiled = new MyTest();
MyTest testReflection = new MyTest();
dynamic testDynamic = new MyTest();
Type t = typeof(MyTest);
MethodInfo incrementMethod = t.GetMethod("Increment");
// Define the ThreadStart for method invocation using compiled, reflection, and dynamic.
ThreadStart threadStartCompiled = new ThreadStart(() =>
{
while (true)
{
testCompiled.Increment();
}
});
ThreadStart threadStartReflection = new ThreadStart(() =>
{
while (true)
{
incrementMethod.Invoke(testReflection, null);
}
});
ThreadStart threadStartDynamic = new ThreadStart(() =>
{
while (true)
{
testDynamic.Increment();
}
});
// Setup the threads and start them.
Thread compiledTestThread = new Thread(threadStartCompiled);
Thread reflectionTestThread = new Thread(threadStartReflection);
Thread dynamicTestThread = new Thread(threadStartDynamic);
compiledTestThread.Start();
reflectionTestThread.Start();
dynamicTestThread.Start();
// Wait a couple of seconds to let each thread run then abort them all.
Thread.Sleep(2000);
compiledTestThread.Abort();
reflectionTestThread.Abort();
dynamicTestThread.Abort();
// Get the output
Console.WriteLine("Compiled: {0:n} method calls.", testCompiled.SomeField);
Console.WriteLine("Reflection: {0:n} method calls.", testReflection.SomeField);
Console.WriteLine("Dynamic: {0:n} method calls.", testDynamic.SomeField);
}
}
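Thread.Abort is not supported on .NET Core and later runtimes, so here is a hedged variant of the same comparison that stops the loops with a shared flag instead. The Counter and InvocationBenchmark names are made up for this sketch. Each loop body is wrapped in an Action, which adds the same delegate-call overhead to all three cases, so the absolute counts will not match the ones above.

```csharp
using System;
using System.Reflection;
using System.Threading;

// Hypothetical stand-in for MyTest.
public class Counter
{
    public ulong Count;
    public void Increment() { Count++; }
}

public static class InvocationBenchmark
{
    // Set to true to ask every loop to finish.
    static volatile bool stop;

    // Starts a thread that repeats the body until the stop flag is set.
    static Thread StartLoop(Action body)
    {
        var thread = new Thread(() => { while (!stop) body(); });
        thread.Start();
        return thread;
    }

    // Runs the three loops side by side for the given time and returns the
    // call counts in the order: compiled, reflection, dynamic.
    public static ulong[] Run(TimeSpan duration)
    {
        var compiled = new Counter();
        var reflected = new Counter();
        dynamic dyn = new Counter();
        MethodInfo increment = typeof(Counter).GetMethod("Increment");

        stop = false;
        var threads = new[]
        {
            StartLoop(() => compiled.Increment()),
            StartLoop(() => increment.Invoke(reflected, null)),
            StartLoop(() => dyn.Increment()),
        };

        Thread.Sleep(duration);
        stop = true;
        foreach (var thread in threads) thread.Join();

        return new ulong[] { compiled.Count, reflected.Count, (ulong)dyn.Count };
    }
}
```

Call InvocationBenchmark.Run(TimeSpan.FromSeconds(2)) and print the three values to get a comparison similar to the one above.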
DLR + IBatis – ORM mappings for dynamic objects
I’m a big fan of using an ORM to hide the implementation details of the database from the people writing application logic. A lot of ORMs depend on the structure of your object to determine what SQL they emit. That can be painful sometimes because you have to design your object model and database to fit the ORM. In the case of objects with dynamic members (like those supported by IronPython, IronRuby, and now the DLR in C# 4.0) the structure of your object isn’t necessarily determined at compile time. Instead, you have a backing collection like a Dictionary for your dynamic types that will get populated at runtime with any dynamic members. That dictionary can be full of more dynamic objects. As you can imagine, trying to persist a dynamic structure like this in a not-so-dynamic relational database can be tricky, but it also gives you some nice advantages. For one thing, you can change your database structure without having to deploy any code changes.
For an example of how to create a backing collection for dynamic types for .NET 4.0, follow the documentation on http://dlr.codeplex.com to derive from DynamicObject or implement IDynamicMetaObjectProvider. For IronPython in .NET 2.0, follow the steps regarding GetBoundMember and SetMemberAfter demonstrated in IronPython in Action for examples of how to do this.
Here’s an example to illustrate. I have a base class that all my entities derive from named DynamicBase. This class has the backing collection for any dynamic members, so if I’m in IronPython or dynamic C#, I can arbitrarily add properties to any class that derives from DynamicBase. In fact, if I didn’t want to derive any classes, I could probably stop with DynamicBase.
IronPython example:
simpleDynamicPerson = DynamicBase()
simpleDynamicPerson.Name = "Bill"
simpleDynamicPerson.Title = "Boss"
simpleDynamicPerson.Height = "6ft"
anotherDynamicPerson = DynamicBase()
anotherDynamicPerson.Name = "Mike"
anotherDynamicPerson.Title = "Assistant"
anotherDynamicPerson.YearsWorking = 4
complexDynamicTeam = DynamicBase()
complexDynamicTeam.Name = "MyTeam"
complexDynamicTeam.Leader = simpleDynamicPerson
complexDynamicTeam.Assistant = anotherDynamicPerson
After running this code, complexDynamicTeam is a DynamicBase instance whose DynamicProperties backing collection has three entries: “Name”, which holds the string “MyTeam”; “Leader”, which contains the simpleDynamicPerson instance; and “Assistant”, which contains the anotherDynamicPerson instance. The simpleDynamicPerson instance’s DynamicProperties collection has entries for “Name”, “Title” and “Height” that are populated at runtime as well.
There aren’t really any classes to map here. There are just dictionaries of dictionaries that are holding all the data. Using the IBatis DataMapper, I could run a statement like this:
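The DynamicBase class itself isn’t shown in this post, so here is a hypothetical reconstruction in C# 4.0 to make the backing collection concrete. It assumes the class derives from DynamicObject and exposes the dictionary as a DynamicProperties property; the DynamicBaseDemo helper is made up for illustration.

```csharp
using System.Collections.Generic;
using System.Dynamic;

// Hypothetical reconstruction; the real DynamicBase may differ.
public class DynamicBase : DynamicObject
{
    private readonly Dictionary<string, object> dynamicProperties =
        new Dictionary<string, object>();

    // The backing collection that a mapping file could navigate into.
    public IDictionary<string, object> DynamicProperties
    {
        get { return dynamicProperties; }
    }

    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        return dynamicProperties.TryGetValue(binder.Name, out result);
    }

    public override bool TrySetMember(SetMemberBinder binder, object value)
    {
        dynamicProperties[binder.Name] = value;
        return true;
    }

    public override IEnumerable<string> GetDynamicMemberNames()
    {
        return dynamicProperties.Keys;
    }
}

public static class DynamicBaseDemo
{
    // Builds a team with a leader and an assistant using dynamic members.
    public static DynamicBase BuildTeam()
    {
        dynamic leader = new DynamicBase();
        leader.Name = "Bill";
        leader.Title = "Boss";

        dynamic assistant = new DynamicBase();
        assistant.Name = "Mike";
        assistant.YearsWorking = 4;

        dynamic team = new DynamicBase();
        team.Name = "MyTeam";
        team.Leader = leader;
        team.Assistant = assistant;
        return team;
    }
}
```

Every member assigned through the dynamic reference ends up as an entry in DynamicProperties, and a nested DynamicBase is simply another dictionary one level down.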
Mapper.Instance().Insert("StoreDynamicTeam", complexDynamicTeam)
StoreDynamicTeam needs to be the name of a statement defined in one of the SqlMap files loaded by SqlMap.config. The ORM mapping is the only thing that needs to know about the structure of the dynamic object, or at least the fields that need to be persisted.
<insert id="StoreDynamicTeam">
  BEGIN TRAN
  DECLARE @teamID INT
  INSERT INTO TEAM (NAME) VALUES (#DynamicProperties.Name#)
  SET @teamID = SCOPE_IDENTITY()
  INSERT INTO PEOPLE (NAME, TITLE, HEIGHT, YEARSWORKING, TEAM_ID)
  VALUES (#Leader.DynamicProperties.Name#, #Leader.DynamicProperties.Title#, #Leader.DynamicProperties.Height#, #Leader.DynamicProperties.YearsWorking#, @teamID)
  INSERT INTO PEOPLE (NAME, TITLE, HEIGHT, YEARSWORKING, TEAM_ID)
  VALUES (#Assistant.DynamicProperties.Name#, #Assistant.DynamicProperties.Title#, #Assistant.DynamicProperties.Height#, #Assistant.DynamicProperties.YearsWorking#, @teamID)
  COMMIT
</insert>
Getting that data back out isn’t much more difficult.
dynamicPeople = Mapper.Instance().QueryForList("GetDynamicTeam", teamID)
My statement and the resultMap need to deal with the dynamic parameters:
<select id="GetDynamicTeam" parameterClass="int" resultMap="DynamicPersonResult">
  SELECT p.NAME AS PERSON_NAME, p.TITLE AS TITLE, p.HEIGHT AS HEIGHT, p.YEARSWORKING AS YEARSWORKING
  FROM TEAM t
  INNER JOIN PEOPLE p ON p.TEAM_ID = t.TEAM_ID
  WHERE t.TEAM_ID = #value#
</select>
<resultMap id="DynamicPersonResult" class="MyClassLib.DynamicBase,MyClassLib">
  <result property="DynamicProperties" resultMapping="DynamicPropertiesResult" />
</resultMap>
<resultMap id="DynamicPropertiesResult" class="System.Collections.Generic.Dictionary`2[[System.String,mscorlib], [System.Object,mscorlib]], mscorlib">
  <result property="Name" column="PERSON_NAME" />
  <result property="Title" column="TITLE" />
  <result property="Height" column="HEIGHT" />
  <result property="YearsWorking" column="YEARSWORKING" />
</resultMap>
When you execute it, dynamicPeople will be a collection of DynamicBase objects with their dynamic properties populated. What’s very cool about these properties being dynamic is that you could change your query in your mapping file to add some new fields from some other table, update the resultMap to use those fields, and they’ll show up on your dynamic object.
Revisiting recursion with Dynamic Methods
Dynamic methods were added to .NET back in the 2.0 release as part of Lightweight Code Generation, and lots of technologies (e.g. IronPython) use them to do their dynamic dirty work of emitting code at runtime. Dynamic methods are methods that you create at runtime through System.Reflection.Emit, much like you would generate an assembly with reflection, only you don’t need to put the method in an assembly, so there isn’t an assembly hanging around in memory…GC can clean up the JIT’d code after the method goes out of scope.
The code is also a lot more concise: no AssemblyBuilder, ModuleBuilder, TypeBuilder, MethodBuilder needed before you can write a dynamic method. For an example, I’m revisiting my old recursive Fibonacci function to illustrate writing a dynamic method and also show the performance cost for the runtime to resolve dynamic methods. Recursive functions demonstrate this the best because each recursion requires dynamic method resolution to occur.
DynamicMethod dm = new DynamicMethod("Fib", typeof(int), new Type[] { typeof(int) }, true);
ILGenerator il = dm.GetILGenerator();
Label isEqZero = il.DefineLabel();
Label isNotEqZero = il.DefineLabel();
Label isNotEqOne = il.DefineLabel();
Label retLabel = il.DefineLabel();
LocalBuilder local = il.DeclareLocal(typeof(int));
il.Emit(OpCodes.Ldarg_0);
il.Emit(OpCodes.Ldc_I4_0);
il.Emit(OpCodes.Ceq);
il.Emit(OpCodes.Brfalse, isNotEqZero);
// If it got here, then the input was zero, so go to return label.
il.Emit(OpCodes.Ldc_I4_0);
il.Emit(OpCodes.Stloc_0);
il.Emit(OpCodes.Br, retLabel);
il.MarkLabel(isNotEqZero);
il.Emit(OpCodes.Ldarg_0);
il.Emit(OpCodes.Ldc_I4_1);
il.Emit(OpCodes.Ceq);
il.Emit(OpCodes.Brfalse, isNotEqOne);
// If it got here, then the input was one, so go to return label.
il.Emit(OpCodes.Ldc_I4_1);
il.Emit(OpCodes.Stloc_0);
il.Emit(OpCodes.Br, retLabel);
il.MarkLabel(isNotEqOne);
// Should do recursion here.
il.Emit(OpCodes.Ldarg_0);
il.Emit(OpCodes.Ldc_I4_1);
il.Emit(OpCodes.Sub);
il.Emit(OpCodes.Call, dm);
il.Emit(OpCodes.Ldloc_0);
il.Emit(OpCodes.Add);
il.Emit(OpCodes.Stloc_0);
il.Emit(OpCodes.Ldarg_0);
il.Emit(OpCodes.Ldc_I4_2);
il.Emit(OpCodes.Sub);
il.Emit(OpCodes.Call, dm);
il.Emit(OpCodes.Ldloc_0);
il.Emit(OpCodes.Add);
il.Emit(OpCodes.Stloc_0);
// Loads the first local variable onto the stack and returns it.
il.MarkLabel(retLabel);
il.Emit(OpCodes.Ldloc_0);
il.Emit(OpCodes.Ret);
Func<int, int> invokeDynamicMethod = (Func<int, int>)dm.CreateDelegate(typeof(Func<int, int>));
for (int i = 0; i <= 40; i++)
{
int result = invokeDynamicMethod.Invoke(i);
Console.WriteLine("{0} {1}", i, result);
}
If you run this, you’ll notice that it gets slower with larger numbers because bigger numbers require more invocations of the dynamic method. Of course, you could solve this problem without using recursion, but I wanted to illustrate the performance penalty. That said, Lightweight Code Generation and dynamic methods give you some great flexibility without the additional baggage of generating a full fledged .NET assembly.
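For reference, the emitted IL corresponds to the plain C# function below. This is a sketch for comparison only; FibReference and CallCount are made-up names, and CallCount is an extra helper that counts how many invocations the recursion needs, which is where the slowdown comes from.

```csharp
// Hypothetical reference implementation for comparison with the emitted IL.
public static class FibReference
{
    // Mirrors the IL: 0 and 1 return themselves, anything larger
    // adds the results of the two recursive calls.
    public static int Fib(int n)
    {
        if (n == 0) return 0;
        if (n == 1) return 1;
        return Fib(n - 1) + Fib(n - 2);
    }

    // Number of method invocations the recursion performs for a given n,
    // counting the outermost call. Intended for n >= 0.
    public static long CallCount(int n)
    {
        return n < 2 ? 1 : 1 + CallCount(n - 1) + CallCount(n - 2);
    }
}
```

Fib(10) is 55 and takes 177 invocations. By n = 40 the result is 102,334,155 and the recursion needs 331,160,281 invocations, so the growth in call count dominates whatever each individual call costs.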