MTL Data Meetup at Radialpoint

Love open data? Radialpoint is opening its doors to the MTL Data Meetup group on Wednesday, October 22, 2014 at 5:30 PM.

We’ll be exploring the evolution of open data with some great minds in the space. Dr. Diane Mercier will keynote the event, talking to us about Montreal’s open data portal. After a quick break, Toby Hocking will show us data visualizations of bike counts at several locations in Montreal between 2009 and 2013. We’ll wrap up the meetup with a new solution to Kaggle’s Titanic competition, a challenge where candidates use machine learning tools to predict which passengers survived the tragic 1912 shipwreck. A team has implemented a new Python solution using Theano (a deep learning Python library), and they are going to demo what they’ve been able to uncover with it.

Exciting stuff to come! Be sure to secure your spot here.


Elasticsearch Montreal Meetup Tonight!

Interested in stretching the possibilities of what you can do with Elasticsearch?


In the Near Future, Data Sovereignty, Security and Privacy Will Be Why Organizations Run to the Cloud, Not from It

Why organizations will run to the cloud

While attending the Gigaom Structure conference in San Francisco this summer, two things stood out most for me. One was how few organizations are actually running workloads in the cloud today. The second was the huge amount of work, both legislative and technical, that cloud providers are doing to resolve the concerns companies have around data sovereignty, security and privacy. Millions of dollars are being spent trying to conform to different regions’ privacy requirements, alongside huge lobbying efforts to shape policy that is public-cloud friendly.

During the conference, a question was put to an overfilled standing room of about 300 IT professionals and IT architects from major North American corporations. “How many of you have workloads in the public cloud today – that you know about?” asked the presenter. There was a small round of chuckles at the last part of the question. As I looked around, I was blown away to find only about 10 other hands raised besides mine in the room.

It reminded me of a Microsoft conference I attended in 2012. The presenter asked a room full of hundreds of IT professionals how many of their organizations were on Windows 7 (which had been released to the public 3 years earlier), and I remember being amazed that I was 1 of only 4 people in that room who raised their hand. (The most unsettling thought was how many organizations might still be running Vista!)

This, to me, illustrates the lingering anxiety and resistance companies still have towards change. The cloud is change, and its implications for data sovereignty, privacy and security remain convenient excuses to stay stagnant.

Don’t get me wrong. The nature of some businesses (think banks) makes entry into the cloud computing arena more complex, in part because federal regulations haven’t adjusted to allow it. Despite this, as time goes on, heavy on-premise IT infrastructures will cease to be the standard and instead become the exception to the rule.

“That You Know About”

Circling back to the initial question about how many organizations are running workloads in the cloud – the “that you know about” part speaks to the fact that cloud use within companies is already happening despite policies to control or prevent it. While the decision-makers, lawyers and other management hem and haw over whether to move to the cloud, their developers have often already found ways to work there.

While there is a multitude of reasons why developers are running to the cloud, the fact that they are doing it despite corporate policies against it speaks to the inevitability of the approaching storm.

The Cloud is the Future

Rare edge cases like the story of Code Spaces quickly get pointed out as reasons why it’s too soon to move to the cloud. The reality is that no matter how good your security team is, how many lawyers you have and how many of them spend their days worrying about data privacy and security concerns, no company in the world is spending more resources, time and focus on these problems than Google, Amazon and Microsoft.

The reason they are hiring the world’s best minds in these fields is not only because they want to resolve every country’s concerns regarding privacy legislation, not only because they want the most secure public clouds, but because they want there to be no more excuses that prevent key decision-makers from spending their dollars with them instead of with the Blade, SAN and Network equipment makers of the world.

In the very near future, companies will be running to the public cloud because, if they are genuinely concerned about having great security and privacy without any data sovereignty issues, they would be crazy to build their infrastructure anywhere else.

 Photo credit: http://www.joshuaearlephotography.com/


This post originally appeared on IT World Canada as part of their So You Think You Can Blog contest.


OSGi: The Gateway Into Micro-Services Architecture

The terms “modularity” and “microservices architecture” pop up quite often these days in the context of building scalable, reliable distributed systems. The Java platform itself is known to be weak with regard to modularity (Java 9 is going to address this by delivering project Jigsaw), giving frameworks like OSGi and JBoss Modules a chance to emerge.

When I first heard about OSGi back in 2007, I was truly excited about all the advantages Java applications might gain by being built on top of it. But very quickly, frustration took the place of excitement: no tooling support, a very limited set of compatible libraries and frameworks, and a quite unstable, hard-to-troubleshoot runtime. Clearly, it was not ready to be used by the average Java developer, and as such, I had to put it on the shelf. Over the years, though, OSGi has matured a lot and gained widespread community support.

The curious reader may ask: what are the benefits of using modules and OSGi in particular? To name just a few problems it helps solve:

  • explicit (and versioned) dependency management: modules declare what they need (and optionally the version ranges)
  • small footprint: modules are not packaged with all their dependencies
  • easy release: modules can be developed and released independently
  • hot redeploy: individual modules may be redeployed without affecting others

In today’s post we are going to take a 10,000-foot view of the state of the art in building modular Java applications using OSGi. Leaving aside the discussion of how good or bad OSGi is, we are going to build an example application consisting of the following modules:

  • data access module
  • business services module
  • REST services module
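For concreteness, the three modules can live in a multi-module Maven build. The module names below match the bundles we deploy later in the post; the exact project layout is my assumption rather than anything OSGi prescribes:

```
osgi-example/            parent Maven project (packaging: pom)
├── module-data/         data access bundle
├── module-service/      business services bundle
└── module-jax-rs/       REST services bundle
```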

Apache OpenJPA 2.3.0 / JPA 2.0 for data access (unfortunately, JPA 2.1 is not yet supported by the OSGi implementation of our choice) and Apache CXF 3.0.1 / JAX-RS 2.0 for the REST layer are the two main building blocks of the application. I found Christian Schneider‘s blog, Liquid Reality, to be an invaluable source of information about OSGi (as well as many other topics).

In the OSGi world, modules are called bundles. Bundles manifest their dependencies (import packages) and the packages they expose (export packages) so that other bundles are able to use them. Apache Maven supports this packaging model as well. The bundles are managed by an OSGi runtime, or container, which in our case is going to be Apache Karaf 3.0.1 (actually, the only thing we need to download and unpack).
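As a sketch of how these manifest headers are typically produced with Maven: the maven-bundle-plugin (used with bundle packaging) generates the Import-Package / Export-Package entries at build time. The package name below follows this example, but the plugin configuration itself is an illustrative assumption, not taken from the article's build files:

```xml
<!-- in the service module's pom.xml, alongside <packaging>bundle</packaging> -->
<plugin>
    <groupId>org.apache.felix</groupId>
    <artifactId>maven-bundle-plugin</artifactId>
    <extensions>true</extensions>
    <configuration>
        <instructions>
            <Bundle-SymbolicName>${project.artifactId}</Bundle-SymbolicName>
            <!-- expose only the service API package to other bundles -->
            <Export-Package>com.example.services</Export-Package>
            <!-- let the plugin compute imports from the compiled bytecode -->
            <Import-Package>*</Import-Package>
        </instructions>
    </configuration>
</plugin>
```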

Let me stop talking and show some code. We are going to start from the top (REST) and go all the way down to the bottom (data access), as that will be easier to follow. Our PeopleRestService is a typical example of a JAX-RS 2.0 service implementation:

package com.example.jaxrs;

import java.util.Collection;

import javax.ws.rs.DELETE;
import javax.ws.rs.DefaultValue;
import javax.ws.rs.FormParam;
import javax.ws.rs.GET;
import javax.ws.rs.POST;
import javax.ws.rs.PUT;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;
import javax.ws.rs.core.UriInfo;

import com.example.data.model.Person;
import com.example.services.PeopleService;

@Path( "/people" )
public class PeopleRestService {
    private PeopleService peopleService;

    @Produces( { MediaType.APPLICATION_JSON } )
    @GET
    public Collection< Person > getPeople( 
            @QueryParam( "page") @DefaultValue( "1" ) final int page ) {
        return peopleService.getPeople( page, 5 );
    }

    @Produces( { MediaType.APPLICATION_JSON } )
    @Path( "/{email}" )
    @GET
    public Person getPerson( @PathParam( "email" ) final String email ) {
        return peopleService.getByEmail( email );
    }

    @Produces( { MediaType.APPLICATION_JSON  } )
    @POST
    public Response addPerson( @Context final UriInfo uriInfo, 
            @FormParam( "email" ) final String email, 
            @FormParam( "firstName" ) final String firstName, 
            @FormParam( "lastName" ) final String lastName ) {

        peopleService.addPerson( email, firstName, lastName );
        return Response.created( uriInfo
            .getRequestUriBuilder()
            .path( email )
            .build() ).build();
    }

    @Produces( { MediaType.APPLICATION_JSON  } )
    @Path( "/{email}" )
    @PUT
    public Person updatePerson( @PathParam( "email" ) final String email,
            @FormParam( "firstName" ) final String firstName, 
            @FormParam( "lastName" )  final String lastName ) {

        final Person person = peopleService.getByEmail( email );

        if( firstName != null ) {
            person.setFirstName( firstName );
        }

        if( lastName != null ) {
            person.setLastName( lastName );
        }

        return person;
    }

    @Path( "/{email}" )
    @DELETE
    public Response deletePerson( @PathParam( "email" ) final String email ) {
        peopleService.removePerson( email );
        return Response.ok().build();
    }

    public void setPeopleService( final PeopleService peopleService ) {
        this.peopleService = peopleService;
    }
}

As we can see, there is nothing here telling us about OSGi. The only dependency is the PeopleService, which somehow should be injected into the PeopleRestService. How? OSGi applications typically use Blueprint as the dependency injection framework, very similar to our old buddy, XML-based Spring configuration. It should be packaged along with the application inside the OSGI-INF/blueprint folder. Here is a blueprint example for our REST module, built on top of Apache CXF 3.0.1:

<blueprint xmlns="http://www.osgi.org/xmlns/blueprint/v1.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:jaxrs="http://cxf.apache.org/blueprint/jaxrs"
    xmlns:cxf="http://cxf.apache.org/blueprint/core"
    xsi:schemaLocation="
        http://www.osgi.org/xmlns/blueprint/v1.0.0 http://www.osgi.org/xmlns/blueprint/v1.0.0/blueprint.xsd
        http://cxf.apache.org/blueprint/jaxws http://cxf.apache.org/schemas/blueprint/jaxws.xsd
        http://cxf.apache.org/blueprint/jaxrs http://cxf.apache.org/schemas/blueprint/jaxrs.xsd
        http://cxf.apache.org/blueprint/core http://cxf.apache.org/schemas/blueprint/core.xsd">

    <cxf:bus id="bus">
        <cxf:features>
            <cxf:logging/>
        </cxf:features>
    </cxf:bus>

    <jaxrs:server address="/api" id="api">
        <jaxrs:serviceBeans>
             <ref component-id="peopleRestService"/>
        </jaxrs:serviceBeans>
        <jaxrs:providers>
            <bean class="com.fasterxml.jackson.jaxrs.json.JacksonJsonProvider" />
        </jaxrs:providers>
    </jaxrs:server>

    <!-- Implementation of the rest service -->
    <bean id="peopleRestService" class="com.example.jaxrs.PeopleRestService">
        <property name="peopleService" ref="peopleService"/>
    </bean>

    <reference id="peopleService" interface="com.example.services.PeopleService" />
</blueprint>

Very small and simple: basically, the configuration just states that in order for the module to work, a reference to com.example.services.PeopleService should be provided (effectively, by the OSGi container). To see how that is going to happen, let us take a look at another module, which exposes services. It contains only one interface, PeopleService:

package com.example.services;

import java.util.Collection;

import com.example.data.model.Person;

public interface PeopleService {
    Collection< Person > getPeople( int page, int pageSize );
    Person getByEmail( final String email );
    Person addPerson( final String email, final String firstName, final String lastName );
    void removePerson( final String email );
}

It also provides its implementation in the PeopleServiceImpl class:

package com.example.services.impl;

import java.util.Collection;

import org.osgi.service.log.LogService;

import com.example.data.PeopleDao;
import com.example.data.model.Person;
import com.example.services.PeopleService;

public class PeopleServiceImpl implements PeopleService {
    private PeopleDao peopleDao;
    private LogService logService;

    @Override
    public Collection< Person > getPeople( final int page, final int pageSize ) {
        logService.log( LogService.LOG_INFO, "Getting all people" );
        return peopleDao.findAll( page, pageSize );
    }

    @Override
    public Person getByEmail( final String email ) {
        logService.log( LogService.LOG_INFO, "Looking for a person with e-mail: " + email );
        return peopleDao.find( email );
    }

    @Override
    public Person addPerson( final String email, final String firstName, 
            final String lastName ) {
        logService.log( LogService.LOG_INFO, "Adding new person with e-mail: " + email );
        return peopleDao.save( new Person( email, firstName, lastName ) );
    } 

    @Override
    public void removePerson( final String email ) {
        logService.log( LogService.LOG_INFO, "Removing a person with e-mail: " + email );
        peopleDao.delete( email );
    }

    public void setPeopleDao( final PeopleDao peopleDao ) {
        this.peopleDao = peopleDao;
    }

    public void setLogService( final LogService logService ) {
        this.logService = logService;
    }
}

Once again, a very small and clean implementation, with two injectable dependencies: org.osgi.service.log.LogService and com.example.data.PeopleDao. Its blueprint configuration, located inside the OSGI-INF/blueprint folder, looks quite compact as well:

<blueprint xmlns="http://www.osgi.org/xmlns/blueprint/v1.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="
        http://www.osgi.org/xmlns/blueprint/v1.0.0 http://www.osgi.org/xmlns/blueprint/v1.0.0/blueprint.xsd">

    <service ref="peopleService" interface="com.example.services.PeopleService" />
    <bean id="peopleService" class="com.example.services.impl.PeopleServiceImpl">
        <property name="peopleDao" ref="peopleDao" />
        <property name="logService" ref="logService" />
    </bean>

    <reference id="peopleDao" interface="com.example.data.PeopleDao" />
    <reference id="logService" interface="org.osgi.service.log.LogService" />
</blueprint>

The references to PeopleDao and LogService are expected to be provided by the OSGi container at runtime. The PeopleService implementation, however, is exposed as a service, and the OSGi container will be able to inject it into the PeopleRestService once its bundle is activated.

The last piece of the puzzle, the data access module, is a bit more complicated: it contains the persistence configuration (META-INF/persistence.xml) and basically depends on the JPA 2.0 capabilities of the OSGi container. The persistence.xml is quite basic:

<persistence xmlns="http://java.sun.com/xml/ns/persistence"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    version="2.0">

    <persistence-unit name="peopleDb" transaction-type="JTA">
        <jta-data-source>
            osgi:service/javax.sql.DataSource/(osgi.jndi.service.name=peopleDb)
        </jta-data-source>
        <class>com.example.data.model.Person</class>

        <properties>
            <property name="openjpa.jdbc.SynchronizeMappings" value="buildSchema"/>
        </properties>
    </persistence-unit>
</persistence>

Similarly to the service module, there is an interface, PeopleDao, exposed:

package com.example.data;

import java.util.Collection;

import com.example.data.model.Person;

public interface PeopleDao {
    Person save( final Person person );
    Person find( final String email );
    Collection< Person > findAll( final int page, final int pageSize );
    void delete( final String email );
}

Along with its implementation, PeopleDaoImpl:

package com.example.data.impl;

import java.util.Collection;

import javax.persistence.EntityManager;
import javax.persistence.criteria.CriteriaBuilder;
import javax.persistence.criteria.CriteriaQuery;

import com.example.data.PeopleDao;
import com.example.data.model.Person;

public class PeopleDaoImpl implements PeopleDao {
    private EntityManager entityManager;

    @Override
    public Person save( final Person person ) {
        entityManager.persist( person );
        return person;
    }

    @Override
    public Person find( final String email ) {
        return entityManager.find( Person.class, email );
    }

    public void setEntityManager( final EntityManager entityManager ) {
        this.entityManager = entityManager;
    }

    @Override
    public Collection< Person > findAll( final int page, final int pageSize ) {
        final CriteriaBuilder cb = entityManager.getCriteriaBuilder();

        final CriteriaQuery< Person > query = cb.createQuery( Person.class );
            query.from( Person.class );

        return entityManager
            .createQuery( query )
            .setFirstResult(( page - 1 ) * pageSize )
            .setMaxResults( pageSize )
            .getResultList();
    }

    @Override
    public void delete( final String email ) {
        entityManager.remove( find( email ) );
    }
}

Please notice that although we are performing data manipulation, there is no mention of transactions, nor any explicit calls to the entity manager’s transaction API. We are going to use the declarative approach to transactions, as blueprint configuration supports it (the location is unchanged, the OSGI-INF/blueprint folder):

<blueprint xmlns="http://www.osgi.org/xmlns/blueprint/v1.0.0"
    xmlns:jpa="http://aries.apache.org/xmlns/jpa/v1.1.0"
    xmlns:tx="http://aries.apache.org/xmlns/transactions/v1.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="
        http://www.osgi.org/xmlns/blueprint/v1.0.0 http://www.osgi.org/xmlns/blueprint/v1.0.0/blueprint.xsd">

    <service ref="peopleDao" interface="com.example.data.PeopleDao" />
    <bean id="peopleDao" class="com.example.data.impl.PeopleDaoImpl">
        <jpa:context unitname="peopleDb" property="entityManager" />
        <tx:transaction method="*" value="Required"/>
    </bean>

    <bean id="dataSource" class="org.hsqldb.jdbc.JDBCDataSource">
        <property name="url" value="jdbc:hsqldb:mem:peopleDb"/>
    </bean>

    <service ref="dataSource" interface="javax.sql.DataSource">
        <service-properties>
            <entry key="osgi.jndi.service.name" value="peopleDb" />
        </service-properties>
    </service>
</blueprint>

One thing to keep in mind: the application doesn’t need to create the JPA entity manager itself. The OSGi runtime is able to do that and inject it wherever it is required, driven by the jpa:context declarations. Similarly, tx:transaction instructs the runtime to wrap the selected service methods inside a transaction.

Now that the last service, PeopleDao, is exposed, we are ready to deploy our modules with Apache Karaf 3.0.1. It is quite easy to do in three steps:

  • run the Apache Karaf 3.0.1 container
    bin/karaf (or bin\karaf.bat on Windows)
  • execute the following commands from the Apache Karaf 3.0.1 shell:
    feature:repo-add cxf 3.0.1 
    feature:install http cxf jpa openjpa transaction jndi jdbc 
    install -s mvn:org.hsqldb/hsqldb/2.3.2 
    install -s mvn:com.fasterxml.jackson.core/jackson-core/2.4.0
    install -s mvn:com.fasterxml.jackson.core/jackson-annotations/2.4.0 
    install -s mvn:com.fasterxml.jackson.core/jackson-databind/2.4.0 
    install -s mvn:com.fasterxml.jackson.jaxrs/jackson-jaxrs-base/2.4.0 
    install -s mvn:com.fasterxml.jackson.jaxrs/jackson-jaxrs-json-provider/2.4.0
  • build our modules and copy them into Apache Karaf 3.0.1‘s deploy folder (while the container is still running):
    mvn clean package
    cp module*/target/*.jar apache-karaf-3.0.1/deploy/

When you run the list command in the Apache Karaf 3.0.1 shell, you should see the list of all active bundles, including module-service, module-jax-rs and module-data, which correspond to the modules we have been developing. By default, all our Apache CXF 3.0.1 services are available under the base URL http://localhost:8181/cxf/api/. This is easy to check by executing the cxf:list-endpoints -f command in the Apache Karaf 3.0.1 shell. Let us make sure our REST layer works as expected by sending a couple of HTTP requests. First, let us create a new person:

curl http://localhost:8181/cxf/api/people -iX POST -d "firstName=Tom&lastName=Knocker&email=a@b.com"

HTTP/1.1 201 Created
Content-Length: 0
Date: Sat, 09 Aug 2014 15:26:17 GMT
Location: http://localhost:8181/cxf/api/people/a@b.com
Server: Jetty(8.1.14.v20131031)

And verify that the person has been created successfully:

curl -i http://localhost:8181/cxf/api/people

HTTP/1.1 200 OK
Content-Type: application/json
Date: Sat, 09 Aug 2014 15:28:20 GMT
Transfer-Encoding: chunked
Server: Jetty(8.1.14.v20131031)

[{"email":"a@b.com","firstName":"Tom","lastName":"Knocker"}]

It would also be nice to check that the person has been stored in the database. With the Apache Karaf 3.0.1 shell, that takes just two commands: jdbc:datasources and jdbc:query peopleDb "select * from people".

Awesome! I hope this introductory blog post has opened up yet another piece of interesting technology you may use for developing robust, scalable, modular and manageable software. There are many, many things we have not touched on, but those are left for you to discover. The complete source code is available on GitHub.

Note to Hibernate 4.2.x / 4.3.x users: unfortunately, in the current release of Apache Karaf 3.0.1, Hibernate 4.3.x does not work properly at all (as JPA 2.1 is not yet supported), and although I managed to get Hibernate 4.2.x running, the container often refused to resolve the JPA-related dependencies.

Source: OSGi: the gateway into micro-services architecture


Canadian AI 2014 recap

Read the full recap at Medium.com.


PyLadies Meetup “Python for Natural Language Processing” Held at Radialpoint This Thursday

PyLadies Meetup MTL

This Thursday, July 17, Radialpoint will be hosting a PyLadies meetup: “Python for Natural Language Processing”. Come and learn how Laura Hernandez, a PhD student from École de technologie supérieure, is aiming to detect Alzheimer’s disease using NLP, while Zareen Syed deep-dives into the challenges of NLP. PyLadies organizer Françoise Provencher will demo NLP tools in Python.

Come enjoy free snacks and refreshments alongside talented PyLadies!

Looking forward to seeing you!

Meetup Link : http://www.meetup.com/PyLadiesMTL/events/194800882/

Date: Thursday, July 17, 2014

Time: 6:30 PM to 9:00 PM

Address: 2050 Bleury, Suite 300, Montréal, QC (map)


Entity Linking and Retrieval for Semantic Search Montreal 2014


A few weeks ago, Radialpoint had the privilege of hosting a tutorial on Entity Linking and Retrieval for Semantic Search. For this purpose, we brought in three presenters from Europe: researchers Edgar Meij from Yahoo! Labs, Krisztian Balog from the University of Stavanger and Daan Odijk from the University of Amsterdam. They spent a full day with us and a large cross-section of the Information Retrieval community here in Montreal, from both the academic and industrial sides.

This tutorial was the latest incarnation of a series of tutorials by the same presenters that started at SIGIR 2013. Besides SIGIR, the same material has been presented (among other venues) at WWW 2013 and WSDM 2014, the most important conferences in the field. SIGIR 2013 took place in Ireland, WWW 2013 in Brazil and WSDM 2014 in the US. It is thus a great feat for us to have been able to host such a world-class tutorial here in downtown Montreal.

A Merger of Great Minds

I was very happy to see such a talented and diverse crowd attend the tutorial. There were people with a wide variety of interests and experience: I saw seasoned experts together with students, and managers together with developers. Best of all, I saw people driven by a sincere interest in the topic who were taking the opportunity to attend training that, on top of travel costs to places such as Brazil, would have cost hundreds of dollars as part of SIGIR. This type of event cements the growing interest in data-driven R&D that Radialpoint is pushing to new levels.

The tutorial covered a key technology we’re integrating into the award-winning Radialpoint Reveal and other upcoming products: the ability to detect entities (e.g., a hardware device or a software program) in running text and link them to an existing representation of the item (e.g., its Wikipedia page or its tech support page).

In true open source fashion, the presenters have made all the material for the tutorial available on GitHub. It had two parts:

  • In the first part, they present the problem of entity linking: recognizing entities in running text and linking them back to a generalized concept graph (ontology).
  • In the second part, they cover entity retrieval, discussing the semantic search problem and statistical approaches to it.

The material is coupled with online exercises that can be found on Codecademy.

Post-Tutorial Discussions

After the event, we enjoyed some time at our office discussing how entity linking and retrieval technology is best applied to knowledge management in the customer support industry. In particular, we discussed how technical support anomalies might be effectively represented as labeled subgraphs, an intriguing possibility we might explore in the very near future.

Closing Thoughts

It was a great tutorial, full of positive energy and forward-thinking ideas. The presenters, Edgar, Krisztian and Daan, also told us how unique this experience was for them: namely, how happy they were to see people from such diverse backgrounds interested in this topic. We wish them good luck and hope to see them on their next visit to Montreal!

