Adventures in Groovy

Lately my colleague Udo has been busy adding Groovy support to Z2. For those who are not aware, Groovy is a scripting language that is deeply rooted in the Java platform. While Groovy differs in syntax, Groovy classes interoperate seamlessly with Java classes.

The cool aspect of integrating Groovy, apart from Groovy itself, is that the integration is extremely smooth and required no adaptation of Z2 as such. It’s all just yet another compiler. Using Groovy code, or any mix of Groovy code with Java code, just means adding the Groovy Add-On to your system and adding the declaration

java.compile.order = groovy

to a Java component declaration. That’s it.

We prepared a little sample that highlights just a few capabilities such as

  • Mixing Java and Groovy
  • Using Groovlets and GSPs
  • Using the Groovy-based Spock test specification framework

Check it out at Groovy in Z2 sample.

Upcoming Talks

There are two talks on Z2 coming up in the near term: In only two weeks from now, on May 11th, I will be giving the talk “Stop wasting your time with Java build tools” at Codemotion 2013 in Berlin. Shouldn’t be too hard to guess what that talk will be about.

Secondly, we are very happy to be present at the Entwicklertag 2013 on June 5th in Karlsruhe, where I will be talking about “Buildfrei skalieren für Big Data mit Z2” (“Scaling build-free for Big Data with Z2”), which is all about how the scale-out of Big Data applications benefits from the “self-distributing” approach of Z2. There will be plenty of time to meet and talk.

Update (2013-05-16): Here is the Codemotion slide deck.

Checking out Netweaver Cloud

Just recently I have been taking a look at SAP’s Netweaver Cloud offering.

SAP is not exactly known as a great technology vendor. Neither as a great vendor of technology nor as a vendor of great technology. As I have worked for SAP, have been using SAP software, and as we (ZFabrik) thought more than once about creating software in the SAP space, I was curious as to how “meaningful” NW Cloud looks from my perspective, that is, from outside of SAP and to somebody not working in an SAP account.

Getting started is really easy. Once you have an SDN account, you enter the SAP HANA Cloud Cockpit (don’t be fooled by that. It’s all called HANA regardless of whether you get in touch with anything HANA or whether you would even have any clue what HANA could possibly be. Don’t worry. Ignore it. It’s simply the latest and they couldn’t help it).

Technically, what you get in the trial is a Suse Linux VM that runs some LJS server, which presents itself very much like an Apache Tomcat Web Application Server of some 7.x version (7.0.35 in my case). Note also the PS below. In my case the OS had 2 GB of memory, the JVM had a 1GB heap with 256M perm space (which I believe cannot be changed).

In order to deploy some simple Web app, you only need to install some rather harmless looking Eclipse plugins (of course – you get some more than you would ask for, but at least you do not get something like Netweaver Developer Studio) and the NW Cloud SDK. After installation and getting over with some config as explained in the guide, you can control the VM and deployment of your apps from there.

All in all: It’s a glorified Tomcat in a hosted Linux VM. All data administration, VM configuration, etc… it all looks rather hard-coded. Add that you can only deploy web applications and it all does not look too impressive.

So what’s the business model? Why would somebody like me be tempted to consider NW Cloud for anything? Where is the market?

It’s all along the same lines as why RIM (the Blackberry maker) was so successful in the beginning: Corporate IT departments would not be willing to accept the (perceived) risk of allowing access to MS Exchange from outside of the company network. RIM took that responsibility from them. Now SAP departments are not any more adventurous. Providing access to internal SAP systems? No way! NW Cloud comes with a Connectivity Service that promises to do exactly that: Provide access to internal SAP systems (via some VPN channel or so) to applications running on NW Cloud that themselves are directly accessible from the internet. A simplified development and production setup hence looks like this:

[Diagram: NW Cloud development and production setup]

That does actually make sense and may give rise to some interesting opportunities for software development for SAP accounts.

Happy Easter!

Ps.: In fact, it seems that the Web Container instance used in NW Cloud is actually based on Eclipse Virgo (formerly known as Spring dm Server). So it is actually not Apache Tomcat. Similar. But not quite the same in terms of naming integration and other low-level aspects.

Minimally Invasive Repairs with Z2

When I talk about Z2 I usually show a picture like this one:

[Diagram: development and production landscape with repositories and change flow]

I do that for a purpose of course. It is meant to provide the impression that everything is driven from the repositories. I.e. it’s system centric. To propagate changes, you only need to propagate changes. Nothing else requires thinking. Does that come across?

That picture captures a real development and production landscape of a previous project (except that I added the HubCR lately). So this is not an idea, it is meant to explain something existing. Hence it lacks all that was deemed obvious to the reader (developers familiar with the technology) at the time. What was obvious at that time is of course exactly what matters when talking to people not familiar with Z2.

It does show repositories and change flow along some dev → staging → production chain. It shows developer environments on the bottom, some automated testing environment on the top left, some staging installation at the top middle and a production landscape on the right.

Coming back to the motivation for this blog’s title: As runtimes will pick up changes by themselves, because there are no deployments and the like, there is no such thing as a roll-out in the sense of a complicated activity. Changes can be applied and made effective with minimal impact on the operational complexity. If you need to fix a single line of code, then that’s the most complex thing you need to do.

A few examples:

  • A pre-production environment, called “staging” here, provides a de-coupling from daily development progress for production qualification. From a developer perspective, fixing the staging system means to check out a Z2 core, fire it up, test local changes, commit/push.
    Due to its pre-configured environment, the developer’s local staging installation will pull anything needed from the right repositories and would even use a shared staging application database.
  • Running automated integration tests means exactly that: Running them. There is virtually no environment to write setup scripts for. Changes will be picked up automatically.
  • Applying a critical hot-fix in production requires applying the smallest possible change (after testing in pre-prod) to the production code base and triggering a synchronization on the hub, restarting the servers as appropriate.

What makes repairs minimally invasive is not the amount of changed code, it’s the small amount of work to be done and the small risk to be taken when applying changes.

 

Z2 Deployment Potpourri

After some discussion following the HubCR description in the previous post I thought it would be nice to describe more useful ways of deploying Z2 with some pros and cons. After all, that’s good input to next version’s documentation anyway (note to myself: Add as Wiki content).

Let’s get started with THE standard out-of-the-box way of operating Z2: All Z2 homes connect directly to an SCM (Subversion or Git) and nothing else is ever needed:

For development setups there is no better choice: Everything is intrinsically as up-to-date as possible and necessarily self-contained. This is also great for production setups (in that case the SCMs represent a production branch), if you do not worry about source code being downloaded to production machines (a consequence of the local compilation step). It’s great for essentially the same reasons the development setup is cool: Minimal infrastructure required, no inconsistencies, simple scale-out, complete change history.

When deployments get bigger or you are worried about source code on production machines, the Hub Component Repository (HubCR) provides another level of scale and isolation without violating the pull-deployment approach. The HubCR looks like the regular component repository abstraction to reading nodes:

It prepares a pre-compiled view that does not contain source code anymore. It also removes the need for compilation on production machines (which is actually not a big deal, as that would happen in parallel anyway). More significantly, it takes load away from the SCMs. Should the HubCR nodes eventually become a bottleneck, these could be clustered as well. (Clustering of HubCR is not available today!).

The price to pay in comparison with the development setup is that a change will not be rolled out to the production environment before the HubCR node has been synchronized. And of course, you need to maintain a HubCR node in the first place.

A future application of the HubCR is to support automated updates for remotely hosted on-premise solutions. The hub side would provide the updates to remote systems and enforce licenses.

But now, let’s leave the pull-deployment approaches aside and see if we can find some good reasons for push-deployments.

If your execution environment is far away and connectivity is poor or cannot be guaranteed at all and you still want to make sure there is no source code on production machines, you may create a binary export using the Distribution Exporter tool.

That tool creates a source code free repository structure that is semantically equivalent to the original source repositories (as seen from Z2) – much like the HubCR does it – but copies the results into a folder structure that can be used to create another repository.

In the case below, this is yet another SCM that holds the production system configuration (which typically differs from the development configuration):

Turning the arrow has non-trivial implications (i.e. you need to push and delta computation is not trivial). However as there is a version-controlled store that holds the production config, you still have good control over who changed what and when. In contrast to the HubCR, the repository can be hosted elsewhere – so that network access is not required or does not need to perform particularly well.

The last case to look at is – once more – very different from the others. Suppose you want to ship your solution in a simple bundle that can be installed into a remote file system.
In that case, you would use the Distribution Exporter tool and bundle the result as a file system component repository with a Z2 core that references it.

This is the closest to a traditional deployment and can easily be implemented via a build script.

Z2 V2.1 is out – working on v2.2

It’s been a while already. Forgot to put it up in the blog: Version 2.1 of the z2-Environment is ready to use and learn. All the new stuff has been described in this previous post.

Among the features we plan for 2.2 the following stand out:

  • Eclipsoid-style Z2 support for IntelliJ IDEA
  • The Hub component repository

The goal of the first item is clear: Resolve a classpath for modules in the IDE corresponding to your runtime state (read on here for example) and offer the same developer convenience that can be achieved using the Eclipse plugin for users of the IntelliJ IDE.

The Hub component repository (or HubCR) requires some more background. By default, a Z2 installation (a z2 Home) accesses the version-controlled storage that holds the system definition directly to download and prepare (and compile) modules as needed. This is the underpinning of the system-centric, pull-deployment based approach. In some situations however, it is not desirable to have source code on production servers at any time. This may be for compliance reasons or for fear of risking theft of intellectual property.

This is where the HubCR comes in. As a principle, all modules, all component resources, essentially anything the Z2 runtime knows about is served by component repository implementations.  There are implementations for Subversion, Git, file system folders, development workspaces and now, the latest addition, for another Z2 server that provides a consolidated, source-code free and pre-compiled view onto production resources (the HubCR provider).

So, instead of having production systems read and process source code directly, an intermediate node provides a semantically equivalent but pre-processed view onto the system definition:

The way the HubCR does that is by maintaining a pre-compiled and source-code stripped snapshot of the original production configuration. At the same time, the HubCR is just a regular z2 Home that runs the production config.

To the real production nodes (on the right in the diagram) however, the HubCR presents everything except the HubCR itself and other remote component repositories.

As a result, production systems can be completely separated from the source level details of the system definition. They don’t even see authentication details to the configuration store – only those necessary to access the HubCR service. At the same time, the pull semantics are preserved and updates can flow in and will be distributed consistently as for any other Z2-based system.
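To make the idea a bit more tangible, here is a minimal Java sketch of a source-free view over a repository. It is not Z2 code: the interface `ComponentRepository`, the methods `scm` and `hubView`, the paths, and the "compiled" placeholder content are all made up for illustration. It only shows the principle that readers see a semantically equivalent snapshot without source code, and that the snapshot does not change until the hub is synchronized again.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these names are Z2 APIs.
final class HubViewSketch {

    /** Minimal stand-in for "something that serves component resources by path". */
    interface ComponentRepository {
        Map<String, String> resources(); // path -> content
    }

    /** Stand-in for an SCM-backed repository that holds sources and configuration. */
    static ComponentRepository scm(Map<String, String> files) {
        return () -> files;
    }

    /**
     * A hub-style view: takes a snapshot of the origin in which every source
     * file is replaced by a (pretend) compiled artifact. Readers of the view
     * never see source code.
     */
    static ComponentRepository hubView(ComponentRepository origin) {
        Map<String, String> snapshot = new LinkedHashMap<>();
        origin.resources().forEach((path, content) -> {
            if (path.endsWith(".java")) {
                String compiled = path.substring(0, path.length() - ".java".length()) + ".class";
                // Placeholder only; a real hub would run the compiler here.
                snapshot.put(compiled, "<compiled:" + content.hashCode() + ">");
            } else {
                snapshot.put(path, content);
            }
        });
        return () -> snapshot;
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("com.acme.app/src/Hello.java", "class Hello {}");
        files.put("com.acme.app/config.properties", "mode=production");

        ComponentRepository hub = hubView(scm(files));
        // Production nodes would read from the hub view only.
        System.out.println(hub.resources().keySet());
    }
}
```

Because the view is a snapshot, a later commit to the origin only becomes visible to readers after a new `hubView` has been prepared, which mirrors the synchronization step of the hub.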

 

 

If Santa Claus would use Java best practices….

Imagine Santa would use Java best-practices to have Christmas presents built. He would…

… not use complete blueprints, instead he would specify only parts of a gift (there is a leg, there is a red dress, …);

… declare some random number of aspects – more or less abstractly (“legs need to be attached to the main body, …”) – and leave the details to his gnomes;

… declare a brush on the blueprint (if the eyes need to be painted, you need a brush right?);

… not realize that paint could be taken literally as a part of dolls in red dresses. As paint comes in cans, the gnomes would mount the empty can to the back of the doll.

Eventually his gnomes would stitch dolls in red dresses onto male cats. Sometimes several. As dolls are unthinkable without Spring rolls, there would be some. The cat would eat all of those. Most likely it would fall over, dump a heap, jump the wall, or all of it in some random sequence….

Merry Christmas to all of you!

A cloud deployment recipe

Lately I have been working on (re-)designing the operational mode of a Java-based solution that includes Hadoop and HBase and is to be scaled out in a “cloud-style” manner. In other words: the set of nodes that define the operational environment may change dramatically over the course of time, in number and in other parameters (like the exact OS used) – and hence that change must be implementable as cheaply as possible. And, if you do not want to get overwhelmed by scale complexity later on, this is something to definitely keep in mind right from the beginning.

Complexity again

Complexity is a killer for clustered solutions. For systems that may need to scale into the hundreds of nodes, any complexity avoided up-front is not just a risk-reduction. It’s a definite must-have.

In order to reduce complexity, you need to narrow dependencies between your stuff – that you can fully control – and their stuff – that you have reduced control of. In an ideal world, you would not need anything like a node-local OS in the first place. Of course (and unfortunately) that is not realistic. The next best solution would obviously (!) be that all nodes get a copy (and updates) of everything they need by something as simple as running an rsync, a Subversion checkout, or a “Git pull”.

How to Install

The other important aspect, next to narrowing OS dependencies, is locality: of what you install and of the users that run it.

Linux package managers (rpm, dpkg) are a great way of configuring your OS as far as it concerns basic, stable components where incompatibilities are highly unlikely. If you can, getting along without them is better though. Anything you can configure to run from where you copied it to is naturally superior in simplicity, consistency, clean removal, clean update, ability to install parallel versions – i.e. in all practical matters.

Also, and I personally think this is a quality that hardly gets the recognition it deserves, software that can be installed and run from a folder of your choice can be used by developers just the way it is used in production.

Package managers solve the problem of providing a Linux distribution with its dependencies. They do not necessarily solve your problems: In a later release, packages may have become incompatible with previous versions (as in the case of Ganglia for example). The JDK you used may not be available anymore, etc. You do not want the Linux distribution providers or other third parties to determine when you need to change your solution! You need to be on top of that!

Here’s my recipe:

  • Everything possible is installed in and executed by one and the same user account
  • All software as far as possible is installed in one simple folder layout that has one folder per software component
  • Moving software into location is generally good enough. No post-copy transformation is acceptable. At most environment variables may be used to convey topological information.

Specifically, the JDK, Hadoop, and HBase are installed by folder checkouts.

Fortunately, Java provides a rather solid isolation against OS matters, to the extent that you can install a Java Runtime Environment essentially by copying it and you generally do not need native libraries that would have to be installed “out of band”. (Admittedly, as “malloc uses excessive memory for multi-threaded applications” shows, the JDK insulation is sometimes not strong enough.)

Automation

Just like people may look alike but do different things, so do nodes in your cluster. Doing different tasks translates to running different processes with potentially different configurations and may also mean to install different software components.

Fortunately there are tools that automate preparation of a node configuration as well as making sure that exactly the right stuff runs.

I chose CFEngine. While its configuration language sucks, the underlying model (promise convergence) is cool and it has hardly any dependency on anything else (which is arguably the coolest feature of CFEngine). So in short the process of adding a node to the cluster means:

  1. Add the node to the policy server stored configuration (i. e. designate a purpose – which can be based on IP ranges for example – and hence be “automatic”).
  2. Throw CFEngine at the node (install one .deb or .rpm).
  3. Bootstrap the local CFEngine for the policy server.

All of this can even be done by a script run remotely over SSH.

Once up, CFEngine will check every few minutes whether promises still hold. This includes configuration updates but also making sure your server is still running the right processes and services (assuming your promises are designed accordingly).
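For readers who have not met the convergence idea before, here is a toy Java sketch of it. This is not CFEngine syntax and not how CFEngine is implemented; the class `ConvergenceSketch`, the `Promise` type, and the state keys are invented for illustration. The point is only that each run compares desired state with actual state and repairs what deviates, so that running it again changes nothing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of promise convergence; all names are hypothetical.
final class ConvergenceSketch {

    /** A promise: the desired value for one key of the node state. */
    static final class Promise {
        final String key;
        final String desired;

        Promise(String key, String desired) {
            this.key = key;
            this.desired = desired;
        }
    }

    /** Repairs only what deviates and returns the keys that were repaired. */
    static List<String> converge(Map<String, String> nodeState, List<Promise> promises) {
        List<String> repaired = new ArrayList<>();
        for (Promise p : promises) {
            if (!p.desired.equals(nodeState.get(p.key))) {
                nodeState.put(p.key, p.desired); // stands in for the real repair action
                repaired.add(p.key);
            }
        }
        return repaired;
    }

    public static void main(String[] args) {
        Map<String, String> node = new HashMap<>();
        node.put("process:regionserver", "stopped");

        List<Promise> promises = Arrays.asList(
                new Promise("process:regionserver", "running"),
                new Promise("folder:jdk", "present"));

        System.out.println("first run repaired:  " + converge(node, promises));
        System.out.println("second run repaired: " + converge(node, promises));
    }
}
```

The second run repairs nothing because the node already satisfies all promises. That idempotence is what makes it safe to re-check every few minutes.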

The other thing that is cool about CFEngine is that it can run in user-space. That is you could split your configuration into promises that require root-permissions (which better be few and robust) and leave the ugly stuff to user space configuration. If there is anything you want to hide from user space processes (e.g. private keys), you need to block root access as much as possible.

Finally some words on

Monitoring

Along with the ability to bring your solution to the nodes, to adapt it as needed, and to make sure all necessary things are up, you can make sure monitoring agents are there and running. Agent-based monitoring tools such as Ganglia and Zabbix provide a powerful information interface to the piece you are looking at. The “push from agent” model has other advantages:

  • As agents call the monitoring hub, they may self-register
  • If the agent doesn’t talk anymore, it is clear the node is in trouble
  • Less contention: A node gone bad will not hamper information retrieval (i.e. no blocked and waiting connections).
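The three points above can be sketched in a few lines of Java. This is neither Ganglia nor Zabbix code; the class `MonitoringHubSketch`, its methods, and the timeout value are assumptions chosen for illustration. Time is passed in explicitly to keep the example deterministic.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical hub for push-style monitoring; not the API of any real tool.
final class MonitoringHubSketch {

    private final Map<String, Long> lastSeen = new HashMap<>();
    private final long timeoutMillis;

    MonitoringHubSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Called by an agent. An unknown node is registered implicitly. */
    void report(String node, long nowMillis) {
        lastSeen.put(node, nowMillis);
    }

    /** Nodes that have not reported within the timeout are considered in trouble. */
    Set<String> silentNodes(long nowMillis) {
        Set<String> silent = new TreeSet<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                silent.add(e.getKey());
            }
        }
        return silent;
    }

    public static void main(String[] args) {
        MonitoringHubSketch hub = new MonitoringHubSketch(60_000);
        hub.report("node-a", 0);       // self-registration on first report
        hub.report("node-b", 0);
        hub.report("node-a", 90_000);  // node-a keeps talking, node-b went quiet
        System.out.println("in trouble: " + hub.silentNodes(100_000));
    }
}
```

Note that the hub never opens a connection to a node, so a node gone bad cannot block it: the hub only notices the absence of reports.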

Ps.: Z2

In the setup above, Z2 is used as execution environment for application code, either to run some actual service endpoint, to run background jobs, or to run MapReduce jobs as in How to Hadoop. Z2 is installed by a core check out (as usual) and pulls its updates on its own.

Modular Vaadin over Spring and JPA

My first encounter with Vaadin was during the “Original1 project”. At the time we were looking for a user interface framework that was suitable for an externally available administration interface that was expected to

  • support few but demanding users that should get a rich user experience;
  • have a very streamlined, repetitive look and feel – no surprises when moving between views;
  • not be layout driven (no pixel perfect visuals);
  • become rather data centric and require a lot of desktop-style interaction for data maintenance;
  • become subject to a lot of extensions over time.

Nobody in the team knew about Vaadin at that time (as far as I can remember – although the larger part of the team was located in Helsinki!). There was a lot of experience with design-driven mass-user Web sites on the team, also some for snappy, highly interactive rich client sites. Plain HTML or Sencha-style rich clients were quickly dismissed though for the problem of balancing productivity, extensibility, and user experience in the niche we were moving into.

When Vaadin was brought up, it only took a few samples to be convinced – despite its name that doesn’t sound too impressive for anybody not from Finland.

The downside of Vaadin is that one frequently has the feeling that the APIs could be somehow more elegant. On the plus side, considering our constraints, you get:

  • It’s all Java code. No other markup required (except for theming of course)
  • As it’s all Java, you can build composite controls, your own utility controls, anything, except when you want different HTML (in which case you need to compile GWT widgets).
  • As it is all Java, you can put your composite controls and utilities in re-use modules – including their resources (e.g. images).
  • As it is all Java, you can refactor directly from your IDE
  • Productivity is only limited by the quality of your application structure and re-use assets (and your dev environment of course)
  • It comes with a rich control set and there is an impressive add-on community

So, currently at least, if somebody asks for a toolkit to build intranet and “complex+few users+very much under maintenance” user interfaces, Vaadin is a safe choice.

I promised a sample. It’s here:

Sample-vaadin-spring-hibernate

It is very much like the other samples in our Wiki. And as before we use Spring to inject dependencies into the UI layer wherever needed. Which is kind of the point here. Vaadin was used just as a drop-in replacement for other UI implementations and it simply fit in – without any bending and plumbing.

 

 

Z2 V2.1 is coming…

While I wanted to write about something completely different – much more technical and possibly quite boring to most – finishing up version 2.1 of Z2 keeps us so busy that I thought it may be much nicer to simply brag a bit about what is really cool in v2.1.

1. We do finally have a decent project management tool in place

We switched from Trac to Redmine and finally have issues, repositories, samples, and Wiki in one place and all linked up with each other. It’s here: Redmine at z2-environment.net.

The Wiki has turned out nice. Look here: Z2-Environment Wiki. Still some work in progress content of course.

2. We moved completely in with Git now

It’s not like we have become true Giterons now. Far from it. Subversion is supported as it was before.

Git does however fit nicely with Z2 – much more so than we thought a while back. Also, Git does help with cross-repository operations, where Subversion is rather weak.

In contrast to Subversion, with Git there is a natural tendency to have more and smaller projects. That is reflected in the repositories that make up Z2 and its add-ons.

And we are now hosting all repositories ourselves. Via Redmine they are all nicely integrated with the Web UI and we have all permission management abstracted out from the OS layer.

3. Add-ons

Previously we had two distributions: z2@base and z2@spring and planned for z2@hadoop. From v2.1 on there is only z2-base and add-ons and samples. I.e. there is a Spring add-on that has all the Spring modules (and a Wiki page and samples) as there is a Hadoop add-on (and a Wiki page and samples) and hopefully more.

To make use of an add-on, all you need to do is to declare another repository with Z2. Samples do so and you hardly notice. Want to try one? Takes no more than 5 minutes:

http://redmine.z2-environment.net/projects/z2-environment/wiki/Sample-spring-basic

With the simple ability to create your own repositories (here Git is nice), scenario setup is simple and still cleanly separated and in relation with the original repositories:

This is really great! Previously our approach was more like: Take a copy of the z2@base repository, modify the environment, and add your repository on top.

With the modular repository approach and the overriding of the environment (see How to create your own system) and by setting up via “git pull” this has become much simpler and repeatable – and upgradable!

4. Z2 Zero Downtime Upgrades

Now to a true software feature: Zero Downtime Upgrades (implemented via the Gateway project actually). Updating stateful Web applications typically implies downtime. Most do not have serializable session state, and even if they have, you would still have a short downtime when updating. With Z2 Zero Downtime Upgrades we use the built-in capability of worker process management to implement no-downtime upgrades, by putting old worker processes in retirement until all sessions bound to them have terminated and the gateway terminates them.

Old sessions can complete with the old code, while new sessions will make use of the new code.
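As an illustration of the retirement idea, here is a small Java sketch. It is not the Gateway implementation; the class `GatewaySketch`, the `Worker` type, and the session bookkeeping are hypothetical and heavily simplified (no real processes, no session timeouts). It shows that sessions stay bound to the worker they started on, that new sessions go to the newest worker, and that a retired worker is terminated once its last session has ended.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of session-aware worker retirement; not the Z2 Gateway.
final class GatewaySketch {

    static final class Worker {
        final String codeVersion;
        boolean retired;
        int openSessions;

        Worker(String codeVersion) {
            this.codeVersion = codeVersion;
        }
    }

    private final List<Worker> workers = new ArrayList<>();
    private final Map<String, Worker> sessionToWorker = new HashMap<>();
    private Worker current;

    GatewaySketch(String initialVersion) {
        upgrade(initialVersion);
    }

    /** Starts a worker with new code and retires the previous one instead of killing it. */
    void upgrade(String newVersion) {
        if (current != null) {
            current.retired = true;
        }
        current = new Worker(newVersion);
        workers.add(current);
        reap();
    }

    /** Known sessions stay on their worker; new sessions go to the current worker. */
    String route(String sessionId) {
        Worker w = sessionToWorker.get(sessionId);
        if (w == null) {
            w = current;
            w.openSessions++;
            sessionToWorker.put(sessionId, w);
        }
        return w.codeVersion;
    }

    void endSession(String sessionId) {
        Worker w = sessionToWorker.remove(sessionId);
        if (w != null) {
            w.openSessions--;
            reap();
        }
    }

    /** Terminates retired workers that no longer hold any session. */
    private void reap() {
        workers.removeIf(w -> w.retired && w.openSessions == 0);
    }

    int liveWorkers() {
        return workers.size();
    }

    public static void main(String[] args) {
        GatewaySketch gateway = new GatewaySketch("v1");
        System.out.println("alice -> " + gateway.route("alice"));
        gateway.upgrade("v2");
        System.out.println("alice -> " + gateway.route("alice"));
        System.out.println("bob   -> " + gateway.route("bob"));
        gateway.endSession("alice");
        System.out.println("live workers: " + gateway.liveWorkers());
    }
}
```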

5. Samples

Via the Wiki and the linked repositories we have a very convenient way for us to roll out samples and for you to try and understand them. There are samples for Spring, transaction manager integration, Hadoop (in progress), and we are working on more.

6. Lots of improvements and updates

Finally we are on Jetty 8. Eclipsoid integrates with the source code resolution during debugging, etc., etc.