I was recently tasked with an ETL job that needed to run as quickly and efficiently as possible. The work led me to learn more about parallel and distributed processing in Clojure. In addition to gaining a greater appreciation for what Clojure enables (once again), I also pushed the boundaries of what I thought was possible using the available tools. I ultimately ended up writing a Spark job whose executors each run N threads (currently, N=3). But the path to that solution taught me as much through what didn’t work as through what did.
Those of you with Maven experience might be wondering why anyone using Leiningen to build a project would then want to run that build tool from Maven, which is itself a build tool. There is a reason why I ventured down this path. I would like to share what I have found so far, in case it benefits anyone else, but I would also like to get feedback from people who know of a better way of accomplishing the same goals.
I’m excited to have been selected as a speaker at the upcoming Clojure/West 2015 conference next month in Portland! I’ll be talking about how Clojure can be used to program in human languages other than English. There are interesting opportunities related to diversity and access. I will be drawing on my experiences with programming in/for Thamil in the clj-thamil library. And I’ll see what other interesting, related ideas I can slip in (turtles that draw?)… and put a bird on them.
Or: A clear example of what macros can do
I started working on a library called clj-thamil that I envision as a general-purpose library for Thamil language computing (e.g., mobile and web input methods), but a slight excursion in that work has led me to some very deep, intriguing ideas, some of which are technical and some of which are socio-cultural. But they all fit together in my mind: Clojure, macros, opportunity and diversity (in computing), and the non-English-speaking world.
I think that the implications are things that we should all think about. But if nothing else, hopefully you can read this account and understand something about macros: the kind of power they uniquely provide, and at least one good use case where they are necessary.
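To give a rough flavor of why a macro can be necessary here, below is a minimal sketch of my own. It is not code from clj-thamil, and the names `எனில்` and `describe` are hypothetical, chosen only for illustration. The point it tries to show: `if` is a special form rather than a function, so it cannot simply be rebound to a new name with `def`, whereas a macro can expand a differently named form into it.

```clojure
;; Illustrative sketch only. The names below are hypothetical and are
;; not taken from the clj-thamil library.

;; `if` is a special form, not a function, so there is no value to bind:
;; (def my-if if)   ; fails to compile, `if` cannot be resolved as a symbol

;; A macro rewrites code before it is evaluated, so it can stand in for
;; a special form under any name, including a non-English one.
(defmacro எனில்
  "Expands to a plain `if` with the same test and branches."
  [test then else]
  `(if ~test ~then ~else))

;; Because the expansion is an ordinary `if`, only the selected branch
;; is evaluated. A function could not offer that, since function
;; arguments are all evaluated before the call.
(defn describe
  "Returns :even or :odd for the integer n."
  [n]
  (எனில் (even? n) :even :odd))
```

Calling `(macroexpand-1 '(எனில் (even? 4) :even :odd))` at a REPL should show the underlying `if` form, which is a handy way to check what a macro like this actually produces.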
Back around the December–January time frame, I was trying to implement the Lambda Architecture as described by Nathan Marz. At that time, the early-release version of his upcoming Big Data book was just at chapter 5 or 6, but my goal was to tackle what seemed like the harder part: real-time processing (Storm). The book chapters hadn’t yet caught up to it. A few slide decks mentioned fully thought-out, end-to-end Lambda Architecture implementations that included Storm, but no reliable, easy-to-deploy code was readily available from the interwebs.
While installing Storm, it quickly became apparent that having Kafka running upstream of it was one way to support both real-time and batch processing of incoming data, and probably the path of least resistance. So I added installing Kafka to my to-do list.
Cutting-edge technology means dealing with rough edges. I downloaded the latest versions of the relevant software components, but they didn’t work together. As I found out, the versions that finally did work together for me were not the most recent ones, but a version or two behind.
The code that I ended up with to get Kafka and Storm working together on a toy example using the Twitter Dev Stream is on GitHub here:
The Clojure for Beginners workshop at the Oakland Workshop Weekend a couple of weekends ago went well. Everyone who came to the workshop arrived curious about Clojure, and they all stayed engaged and curious to the end! They all wanted the slides, and after a few fixes, I’ve uploaded them to my GitHub project page for the presentation here.
Just FYI, here are the considerations I had when creating the slides, from an instructor perspective:
I’ll be teaching a short workshop at the upcoming Oakland Workshop Weekend that is a beginner’s introduction to Clojure. The Clojure workshop is currently slated for Saturday, June 22, from 5pm to 7pm.
Workshop Weekend is a full two-day event where you register and can take as many courses as you can attend in those two days. Some of the classes for the upcoming June iteration include cheese making, aquaponics, negotiation, soldering, and web hosting 101. It’s worth attending, especially if you’ve never been (and even if you’re not attending the Clojure workshop, which rumor has it will be super-awesome, natürlich!).
If you have any questions or suggestions about the Clojure workshop, drop me a line!
Update (6/14/2013): You can get a $10 discount on the registration if you use the code CLOJURE0613.
I use a fair number of libraries for my Clojure project that are not integrated into the Maven ecosystem by Clojars and Leiningen. For as long as I was using Leiningen 1, I was able to get around this with the hack of putting all my local/native libraries in the ‘libs’ directory. Leiningen 2 encourages standardizing on the user’s local Maven repository, and eschewing the ‘libs’ hack is one part of that process. Not only is jumping on board with Leiningen 2 a good thing in itself, but I also needed it in order to open SWT windows in a REPL on Mac OS X for the first time ever. So the following are the instructions that I followed to ultimately get my local/native lib jars installed into my local Maven repo for Leiningen 2 to pick up automatically. Let me know if they work for you!
Most programming languages have facilities of some sort that let the developer execute commands on the OS’s shell prompt. Maybe it’s just me, but I’ve never had an easy time managing system processes properly (i.e., representing processes individually through the language’s constructs). Properly representing multiple processes piped from one to the next seemed like a pipe dream (sorry). I think the first painless way of doing this, at least among all the different options that I’ve seen, is in Clojure with the clj-commons-exec project. At the moment, it seems like a hidden gem. A side benefit of the project is that the evolution of the code for piping one process to the next is, at least for me, instructive on the difference between the object-oriented and functional paradigms.
Since learning Clojure will usually require a lot of experience coding in Clojure, a practical consideration for any new learner is having an effective environment for editing code, testing, deploying, etc. On the Clojure development (developer?) website, there are instructions on getting started with Clojure. I believe that these instructions are intended to be the official help resource for everyone, and people are making a good effort to keep them up-to-date and authoritative. My aim with this post is to formalize what I know, and then contribute to the official documentation whatever others might find a useful addition.