Archive

Posts Tagged ‘C#’

Launching a ServiceProcess using mono csharp script

December 30, 2014 Leave a comment

When developing service applications that need to run cross platform, you often need some “glue” to make the application behave appropriately on different platforms.  Service applications written in .NET languages derive from ServiceBase so they can be started by the the Windows Service Control Manager.  On Windows, they can be added to the service registry with ‘sc create’ or using installutil.exe.  You can usually get these to run on Unix and Linux using the mono-service executable, however this doesn’t always behave the way you might want when used from the various init systems.  Instead of using mono-service on *nix, you can using the csharp script mechanism in mono to launch the service process and handle signals to interact with your application. Here is an example of a service application:

using System;
using System.ServiceProcess;
using System.Threading;
using System.Threading.Tasks;

namespace ServiceApplication {
	public class App : ServiceBase {

		CancellationTokenSource cts = new CancellationTokenSource();

		protected override void OnStart(string[] args) {
			var token = cts.Token;
			Task.Factory.StartNew(() => {
				while(true) {
					if(token.IsCancellationRequested)
						break;
					Console.WriteLine("{0} - service is running", DateTime.Now);
					Thread.Sleep(TimeSpan.FromSeconds(5));
				}
			}, token);
		}

		protected override void OnStop() {
			cts.Cancel();
		}

	}
}

All the code above is compatible across platforms and may be installed and run as a Windows Service without modification. Now for a simple script that can run on mono platforms using the ‘csharp’ command to wrap the service an execute and wait for typical Unix signals before shutting down cleanly:

#!/usr/bin/csharp

/*
 * Script for launching a Windows ServiceProcess applicaiton
 * on mono as an upstart job.
 */

LoadAssembly("System.ServiceProcess.dll");
LoadAssembly("Mono.Posix.dll");
LoadAssembly("ServiceApplication.dll");
using System.Reflection;
using System.ServiceProcess;
using ServiceApplication;
using Mono.Unix;
using Mono.Unix.Native;

var service = new App();
var mi = typeof(App).GetMethod("OnStart", BindingFlags.Instance | BindingFlags.NonPublic);
var result = mi.Invoke(service, new object[] { new string[] {} });

Console.WriteLine("Service started.");

var signals = new UnixSignal[]{
    new UnixSignal (Signum.SIGINT),
    new UnixSignal (Signum.SIGTERM),
    new UnixSignal (Signum.SIGQUIT),
};

var signal = UnixSignal.WaitAny(signals, -1);
Console.WriteLine("Received {0} signal, exiting.", signals[signal].Signum);

service.Stop();

All the platform specific code is in this script so your application code remains completely cross platform, and it makes for a very simple upstart script, as an example:


# Copy to /etc/init/monoServiceApp.conf
start on runlevel [2345]
stop on runlevel [06]

respawn

chdir /path/to/service/

script
   csharp ServiceRunner.cs
end script

The job above will execute the ServiceRunner.cs script using the csharp shell command automatically on startup, shutdown, reboot, and when executed manually using upstart commands (i.e. sudo start monoServiceApp). Since it’s a simple script, upstart will track the PID automatically and send SIGTERM when it’s time to exit, which will be caught by the script for a clean call to the Stop() method.

Note that you can do something similar with mono-service, which is more robust and has options that will let it fit with various init systems. However, mileage tends to vary, so a csharp wrapper script can be a nice alternative.

Categories: Uncategorized Tags: , , ,

Loading XML into MongoDB

December 29, 2011 Leave a comment

I’m starting a new app today and building out the data layer with MongoDB as my database. The app uses a collection from the USDA, that I thought makes a good sample for getting started with the “Load” portion of ETL into MongoDB.

The data is available from the USDA here – the raw XML for MyPyramid: http://explore.data.gov/download/b978-7txq/XML

Step 1 – Define a Class for the data

Although not absolutely necessary as you could build a raw BSON document directly from XML, you kind of miss out on some of the C# driver’s niceties if you do. Looking at the raw data, I came up with this class, along with a constructor that takes an XElement to handle the parsing. Strict DTO people might move that parsing to a function within the ETL process…up to you. The only MongoDB specific code here is the BsonId attribute, which I’ll put on the FoodCode property – a unique ID from the source system.

public class Food {
	[BsonId]
	public int FoodCode {get;set;}
	public string DisplayName {get;set;}
	public float PortionDefault {get;set;}
	public float PortionAmount {get;set;}
	public string PortionDisplayName {get;set;}
	public float Factor {get;set;}
	public float Increment {get;set;}
	public float Multiplier {get;set;}
	public float Grains {get;set;}
	public float WholeGrains {get;set;}
	public float Vegetables {get;set;}
	public float OrangeVegetables {get;set;}
	public float DarkGreenVegetables {get;set;}
	public float StarchyVegetables {get;set;}
	public float OtherVegetables {get;set;}
	public float Fruits {get;set;}
	public float Milk {get;set;}
	public float Meats {get;set;}
	public float Soy {get;set;}
	public float DryBeansPeas {get;set;}
	public float Oils {get;set;}
	public float SolidFats {get;set;}
	public float AddedSugars {get;set;}
	public float Alcohol {get;set;}
	public float Calories {get;set;}
	public float SaturatedFats {get;set;}
	
	public Food(XElement elem) {
		this.FoodCode = Int32.Parse(elem.Element("Food_Code").Value);
		this.DisplayName = elem.Element("Display_Name").Value;
		this.PortionDefault = float.Parse (elem.Element("Portion_Default").Value);
		this.PortionAmount = float.Parse (elem.Element("Portion_Amount").Value);
		this.PortionDisplayName = elem.Element("Portion_Display_Name").Value;
		if(elem.Element ("Factor") != null)
			this.Factor = float.Parse (elem.Element("Factor").Value);
		this.Increment = float.Parse (elem.Element("Increment").Value);
		this.Multiplier = float.Parse (elem.Element("Multiplier").Value);
		this.Grains = float.Parse (elem.Element("Grains").Value);
		this.WholeGrains = float.Parse (elem.Element("Whole_Grains").Value);
		this.Vegetables = float.Parse (elem.Element("Vegetables").Value);
		this.OrangeVegetables = float.Parse (elem.Element("Orange_Vegetables").Value);
		this.DarkGreenVegetables = float.Parse (elem.Element("Drkgreen_Vegetables").Value);
		this.StarchyVegetables = float.Parse (elem.Element("Starchy_vegetables").Value);
		this.OtherVegetables = float.Parse (elem.Element("Other_Vegetables").Value);
		this.Fruits = float.Parse (elem.Element("Fruits").Value);
		this.Milk = float.Parse (elem.Element("Milk").Value);
		this.Meats = float.Parse (elem.Element("Meats").Value);
		this.Soy = float.Parse (elem.Element("Soy").Value);
		this.DryBeansPeas = float.Parse (elem.Element("Drybeans_Peas").Value);
		this.Oils = float.Parse (elem.Element("Oils").Value);
		this.SolidFats = float.Parse (elem.Element("Solid_Fats").Value);
		this.AddedSugars = float.Parse (elem.Element("Added_Sugars").Value);
		this.Alcohol = float.Parse (elem.Element("Alcohol").Value);
		this.Calories = float.Parse (elem.Element("Calories").Value);
		this.SaturatedFats = float.Parse (elem.Element("Saturated_Fats").Value);
	}
}

You might notice I’m using float for my decimal values. That’s all the accuracy I need, but it does lose some precision. I’m rounding the data when I use it so it won’t really matter, but if your needs differ, choose a different numeric type.

Step 2 – Function for reading the XML file

This is a pretty small data file, only about 750 records, but loading it all into memory at once is a waste. I want to load the “Food_Display_Row” XML elements one at a time, convert to a Food object, store in MongoDB, and move on to the next. It’s a job for a streaming API and an iterator, powered by “yield return” to get one XElement at a time:

static IEnumerable<XElement> readElementStream(string fileName, string elementName) {
	using(var reader = XmlReader.Create(fileName)) {
		reader.MoveToContent();
		while(reader.Read()) {
			if(reader.NodeType == XmlNodeType.Element && reader.Name == elementName) {
				var e = XElement.ReadFrom (reader) as XElement;
				yield return e;
			}
		}
		reader.Close ();
	}
}

Step 3 – Pull it all together and load the data

With the pieces in place, the load process is pretty simple. Connect to the server, get the database (MongoDB creates it on first use), get the collection (MongoDB also creates the collection), and use the iterator to read the XML file, load each element into a Food object and insert into the MongoDB collection. At the end, we have a MongoDB database with a collection of data from the food guide pyramid.

var server = MongoServer.Create ("mongodb://localhost");
var db = server.GetDatabase("gov");
var foods = db.GetCollection<Food>("food");

foreach(var elem in readElementStream("~/Downloads/MyFoodapediaData/Food_Display_Table.xml", "Food_Display_Row")) {
	var food = new Food(elem);
	foods.Insert (food);
}

My favorite thing about this is that I never had to leave C# to create the database, parse the source XML, or load the data. I don’t have to run a separate ETL process or use management tools to configure my database schema. It’s a simple, self-contained solution.

My second favorite part is that I ran all of this under Ubuntu and Mono. It should work just as well under Windows and .NET, but life is better running under an open source software stack.

I hope you find this helpful if you’re getting started with MongoDB and want a little data to play with.

Categories: C#, MongoDB Tags: ,

If C# is so awesome, why use anything else?

December 29, 2011 20 comments

Anyone who knows me professionally knows I work in C# most of the time. I think it’s a great language that’s been well designed and made very portable by way of being an open language specification. A lot of people look at C# and say, that’s just Java with some Microsoft-extensions. Sort of, since it’s framework (.NET) ships with quite a few libraries that interoperate well with Windows, although the C# language itself doesn’t have anything to do with Windows, and runs on Linux, OS X, Android, iOS, and so on. In my opinion, Java has stagnated over the years, while C# has been evolving with generics (which Java followed), lambdas (Java finally gets them years later), anonymous types, partial method and class declarations, language integrated query (LINQ), dynamic runtime integration, and soon a simplified asynchronous programming model with await and async that will allow the runtime to deal with the gory details of async programming rather than forcing the programmer to understand and properly implement callbacks and cleanup. Java isn’t catching up fast, so Scala is filling the gaps, but C# remains years ahead.

Every year, my family looks at me funny when I they give me new books. That’s right, I’m a geek that reads computer books. Most of these books are not on C#, but on JavaScript, Python, Haskell, and I even keep an old PERL book on my shelf. What is all this other stuff?

JavaScript – it’s pretty rare these days that other developers would say, “why would you ever want to write JavaScript?” It’s a ubiquitous language amongst web browsers, and it’s pretty rare that anyone can write much of a browser-based application at all without it. Besides, the latest trend is to write a “language X to JavaScript converter” and what good would that be if I didn’t know JavaScript and wasn’t willing to learn language X? There have always been some nice server-side implementations, like Spidermonkey and newer V8, powering trendy applications like MongoDB and Node.js. Until I started down the Python path, whenever I needed extensibility, I would embed Spidermonkey for some JavaScript fun.

Python – in the realm of C#, a lot of people are uncomfortable mixing in Python. They don’t like the idea of losing compile-time checks and worry about needing a myriad of Python frameworks to solve any sizable development tasks. However, Python is an excellent tool for large and small projects alike, and IronPython take the Python language and gives it access to the full .NET framework. In the last few years, I’ve felt constrained if I didn’t have a layer of extensibility that IronPython can add to CLR applications. Python scripts let you treat code as data, meaning you can store it, transmit it, and change it at runtime. Python gives you a new way to move the problem around, solve it at a different time in your overall solution. It’s a great piece of the toolbox.

I remember spending weeks building business rules engines so non-programmers could add some logic to enterprise applications. These engines would use reflection and Lightweight Code Generation (LCG) and a clumsy UI where end users would select data objects and operators and build expression statements. IronPython uses LCG, is highly optimized, and gives you a general purpose scripting language with access to CLR objects. Most end users prefer the ability to write an expression in script rather than fumble with the type of UI needed to build an expression tree. This is just scratching the surface of the Python language, but at the very least, it’s a great tool anywhere you want to offer runtime extensibility.

Haskell is pure functional programming – no state, just functions. I used F# a bit for professional work just to learn it, but it allows you to fall back into the OOP line of thinking. Haskell makes you take a fully functional approach. I recommend every developer that’s looking to expand their approach to problem solving to spend some time with it.

What about that PERL 5 book? Well, I don’t use that, to be honest. I did once upon a time, but I really do avoid PERL at all costs. Maybe one day, I’ll pick it back up.

Categories: C#, IronPython Tags: , ,

TCP Proxy in C# using Task Parallel Library

April 27, 2011 9 comments

Every now and then I have the need to proxy TCP communications, handy for things like viewing network traffic or proxying Silverlight or Flash requests. C# makes this pretty easy, and the Task Parallel Library (add-on to .NET 3.5 & shipped with .NET 4) simplifies the code with a nice fluent interface.

Here’s a quick example that works for proxying a VNC connection. There is one task for reading from the client and sending data to the server and another task for reading server responses and sending them to the client.

static TcpListener listener = new TcpListener(IPAddress.Any, 4502);

const int BUFFER_SIZE = 4096;

static void Main(string[] args) {
    listener.Start();
    new Task(() => {
        // Accept clients.
        while (true) {
            var client = listener.AcceptTcpClient();
            new Task(() => {
                // Handle this client.
                var clientStream = client.GetStream();
                TcpClient server = new TcpClient("10.0.1.5", 5900);
                var serverStream = server.GetStream();
                new Task(() => {
                    byte[] message = new byte[BUFFER_SIZE];
                    int clientBytes;
                    while (true) {
                        try {
                            clientBytes = clientStream.Read(message, 0, BUFFER_SIZE);
                        }
                        catch {
                            // Socket error - exit loop.  Client will have to reconnect.
                            break;
                        }
                        if (clientBytes == 0) {
                            // Client disconnected.
                            break;
                        }
                        serverStream.Write(message, 0, clientBytes);
                    }
                    client.Close();
                }).Start();
                new Task(() => {
                    byte[] message = new byte[BUFFER_SIZE];
                    int serverBytes;
                    while (true) {
                        try {
                            serverBytes = serverStream.Read(message, 0, BUFFER_SIZE);
                            clientStream.Write(message, 0, serverBytes);
                        }
                        catch {
                            // Server socket error - exit loop.  Client will have to reconnect.
                            break;
                        }
                        if (serverBytes == 0) {
                            // server disconnected.
                            break;
                        }
                    }
                }).Start();
            }).Start();
        }
    }).Start();
    Debug.WriteLine("Server listening on port 4502.  Press enter to exit.");
    Debug.ReadLine();
    listener.Stop();
}

This is for illustrative purposes only. If you decide to use this in production, you’ll need to use TcpListener.BeginAcceptTcpClient() for async connections, you’ll need error handling and logging, and you’ll want some sort of pool to manage (and clean up) client socket connections. Have fun, and let me know if you have concerns or suggestions.

Categories: C#, Task, tcp, TPL Tags: , , ,

Performance of IronPython vs. C#

October 21, 2009 25 comments

I’ve been using IronPython for a while, and every now and then I do a cursory check to see how it’s performs against the equivalent C# code. C# is generally a touch faster at most logic, but IronPython is faster at dynamic invocation of methods than reflection in C#. In general it’s a wash…over the course of an application you dabble in things that different languages are better optimized to handle. When performance is a wash, you get the freedom to choose entirely based on the features of each language, which is nice.

This was going well, until I had a need to do some recursion. I found that the recursive IronPython code was executing extremely slowly – code that was taking milliseconds in C# was taking minutes in IronPython. That’s not a wash anymore…it’s a few orders of magnitude. So to really figure out what’s going on with these recursive calls, I made two implementations of calculating Fibonacci numbers – one in C# and one in IronPython.

Updated 10/23 – I really need to rework this code a bit. The more I look at it, I don’t think I’m comparing apples to apples here, partly because Fibonacci isn’t all that representative of my actual problem where I’m recursing through complex types rather than primitives. After I do a little more research, maybe I can find out what performance metrics to look at when IPy (and dynamic code in general) executes appreciably slower than statically-typed C# code. The problem now is, I can get this code to run without a lot of GC, but it still does a lot of something else that doesn’t seem to show up in perfmon.

IronPython:

def fib(num):
   if num == 0:
      return 0
   elif num == 1:
      return 1
   else:
      return fib(num-1) + fib(num-2)

for i in range(0, 41): print "%i %i" % (i, fib(i))

C#:

class testfib
{
   private static int fib(int num)
   {
      if(num == 0)
         return 0;
      else if(num == 1)
         return 1;
      else
         return fib(num - 1) + fib(num - 2);
   }

   public static void Main(string[] args)
   {
      int i;
      for(i = 0; i < 41; i++)
         System.Console.WriteLine("{0} {1}", i, fib(i));
   }
}

I ran the C# version – 5.5 seconds on average. I ran the IronPython version, 12 2-1/2 minutes. What’s going on here?

Updated 10/23 – Perfmon turned out to tell me very little in compared to some profiling.

Function Time Consumed(units) Hit Count Time Consumed(%)
IronPython.Compiler.PythonScriptCode::Run 5876406555 14 27.7169028
IronPython.Runtime.PythonFunction.FunctionCaller::Call1 5870624528 37860134 27.68963105
IronPython.Runtime.Operations.PythonOps::GetLocal 2794453317 37860205 13.18043438
IronPython.Runtime.Operations.PythonOps::GetVariable 2532912650 37860205 11.94684082
IronPython.Runtime.Binding.PythonBinaryOperationBinder::IntSubtract 607137805 37860099 2.863651344
IronPython.Runtime.Operations.PythonOps::GetGlobalContext 325908336 37860134 1.53719277
IronPython.Runtime.Operations.Int32Ops::Subtract 320598896 37860099 1.512150045
IronPython.Runtime.Operations.PythonOps::GetParentContextFromFunction 315597320 37860134 1.488559404
IronPython.Runtime.Binding.PythonBinaryOperationBinder::IntAdd 282632451 18930035 1.333075937
MS.Win32.HwndSubclass::SubclassWndProc 260663364 4048 1.22945563
MS.Win32.HwndSubclass::DispatcherCallbackOperation 258788088 4048 1.220610625
MS.Win32.HwndWrapper::WndProc 258788088 4048 1.220610625

Compare this to my C# version, which has a lot less going on, since all it’s work is in the single fib method:

Function Time Consumed(units) Hit Count Time Consumed(%)
testfib::Main 1182320483 1 53.01288057
testfib::fib 1047930863 133691744 46.98711938
testfib::.cctor 0 1 0

The information from profiling turns out to be much more useful than the GC numbers I get from perfmon. What’s really going on? First off, IronPython is making millions of calls to PythonFunction.FunctionCaller.Call1. That casts the function to a PythonFunction, then does a second cast to a Func delegate, which then gets invoked.

After that, IronPython is making millions of calls to PythonOps.GetLocal, which calls PythonOps.GetVariable, which goes to the CodeContext and starts trying to lookup the variable in a few dictionaries, which appears to include the global dictionary by the number of calls to PythonOps.GetGlobalContext. The variables are local to the fib() function, but the fib() function itself is in the global context.

What else is in here? Int32Ops.Add and Int32Ops.Subtract are obviously called a lot to find Fibonacci numbers. These both have an overflow check in them in case the result is out of the range of an Int32, in which the result will be a boxed Int64 value. If the result happens to fit in a Int32, it will be performed again, with BigIntegerOps. All of this is to hide the work of handling integer overflow.

All this extra work – the casting, invoking, and dictionary lookups – are all by-products of working with a dynamic language. There isn’t a lot you can do to avoid it, but one thing that can cut down tremendously is to avoid recursion here. With an iterative solution, IronPython doesn’t have to continually lookup the functions to call, and you’re not going to be moving between function scopes so there is no need for looking up variables in dictionaries. The iterative solution in IronPython is much, much faster:

def fib(n):
   previous = -1
   result = 1
   for i in range(0,n+1):
      sum = result + previous
      previous = result
      result = sum
   return result

On with my post, and the GC numbers that are more relevant if recursing through objects, but less relevant to the code I presented with value types.

I fired up perfmon.exe and started adding various counters, but the one where you see the most visible difference is in the .NET CLR Memory, specifically the “# Gen 0 Collections”. In the IronPython version, the generation 0 collections numbered around 4 million 30,000! On my C# implementation, there actually were no collections. Everything was done on the stack. To be fair to IronPython which has to box everything in PythonType, I changed the C# code to box everything in objects, like the following:

public static object fib(object num)

The boxed C# implementation averaged 22 seconds to run, and also had about 12 6,000 generation 0 garbage collections. It’s still nowhere near the 4 million 30,000 collections taken by IronPython.

Two good things I learned:
1) Be careful with recursion in IronPython, as those PythonType objects carry some baggage that can keep the GC pretty busy.
2) Perfmon is a very important part of your toolbox if you’re fixing performance problems in IronPython code, or just .NET in general.

And, just for those Python purists out there, running it in CPython averaged 3 minutes, 18 seconds. Yes, much faster than IronPython, much slower than C#, and I have no idea how to figure out why (I have to get back to you on the CPython numbers). I also made a C version, and it averaged 3.8 6.5 seconds to execute. Obviously no garbage collection, but mysteriously slower than the C# version. Hopefully nobody cares about the C version, but here it is in case you want to try it for yourself:

int fib(int num)
{
   if(num == 0)
      return 0;
   else if(num == 1)
      return 1;
   else
      return fib(num - 1) + fib(num - 2);
}

int main(void)
{
   int i;
   for(i = 0; i < 41; i++)
      printf("%i %i\r\n", i, fib(i));
   return 0;
}