Canadian AI 2014 recap



PyLadies Meetup “Python for Natural Language Processing” Held at Radialpoint This Thursday

PyLadies Meetup MTL

This Thursday, July 17, Radialpoint will be hosting a PyLadies meetup: “Python for Natural Language Processing”. Come and learn how Laura Hernandez, a PhD student at École de Technologie Supérieure, aims to detect Alzheimer’s disease using NLP, while Zareen Syed dives deep into the challenges of NLP. PyLadies organizer Françoise Provencher will give a demo of NLP tools in Python.

Come enjoy free snacks and refreshments alongside talented PyLadies!

Looking forward to seeing you!

Meetup link: http://www.meetup.com/PyLadiesMTL/events/194800882/

Date: Thursday, July 17, 2014

Time: 6:30 PM to 9:00 PM

Address: 2050 Bleury, Suite 300, Montréal, QC


Entity Linking and Retrieval for Semantic Search Montreal 2014


A few weeks ago, Radialpoint had the privilege of hosting a tutorial on Entity Linking and Retrieval for Semantic Search. For the occasion, we brought in three presenters from Europe: Edgar Meij from Yahoo! Labs, Krisztian Balog from the University of Stavanger, and Daan Odijk from the University of Amsterdam. They spent a full day with us and a large cross-section of the Information Retrieval community here in Montreal, from both the academic and industrial sides.

This tutorial was the latest in a series by the same presenters that started at SIGIR 2013. Besides SIGIR, the same material has been presented at WWW 2013 and WSDM 2014, among other venues. These are the most important conferences in the field: SIGIR 2013 took place in Ireland, WWW 2013 in Brazil, and WSDM 2014 in the US. It is thus a great feat for us to have hosted such a world-class tutorial here in downtown Montreal.

A Merger of Great Minds

I was very happy to see such a talented and diverse crowd attend the tutorial. There were people with a wide variety of interests and experience: seasoned experts alongside students, managers alongside developers. Best of all, I saw people driven by a sincere interest in the topic who were taking the opportunity to attend training that, on top of travel costs to places such as Brazil, would have cost hundreds of dollars as part of SIGIR. This type of event cements the growing interest in data-driven R&D that Radialpoint is pushing to new levels.

The tutorial covered a key technology we’re integrating into the award-winning Radialpoint Reveal and other upcoming products: the ability to detect entities (e.g., a hardware device or a software program) in running text and link them to an existing representation of the item (e.g., its Wikipedia page or its tech support page).

In true open-source fashion, the presenters are making all the material for the tutorial available on GitHub. It had two parts:

  • In the first part, they presented the problem of entity linking: recognizing entities in running text and linking them back to a generalized concept graph (ontology).
  • In the second part, they covered entity retrieval, discussing the semantic search problem and statistical approaches to it.

The material is coupled with online exercises that can be found on Codecademy.
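To give a flavor of what the first part is about, here is a toy, hypothetical sketch of the most naive form of entity linking: a plain lexicon lookup of surface forms against a small concept graph. This is not the presenters' code; the tutorial covers the statistical candidate generation and disambiguation that real systems need.

// Toy lexicon mapping surface forms to entities in a concept graph.
// Real linkers use large knowledge bases and statistical disambiguation.
var lexicon = {
  'windows 8': 'https://en.wikipedia.org/wiki/Windows_8',
  'chrome': 'https://en.wikipedia.org/wiki/Google_Chrome',
  'iphone': 'https://en.wikipedia.org/wiki/IPhone'
};

function linkEntities(text) {
  var lower = text.toLowerCase();
  var links = [];
  Object.keys(lexicon).forEach(function(surfaceForm) {
    if (lower.indexOf(surfaceForm) !== -1) {
      links.push({ mention: surfaceForm, entity: lexicon[surfaceForm] });
    }
  });
  return links;
}

console.log(linkEntities('My iPhone will not sync with Chrome on Windows 8'));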

Post-Tutorial Discussions

After the event, we enjoyed some time at our office discussing how entity linking and retrieval technology is best applied to knowledge management in the customer support industry. In particular, we discussed how technical support anomalies might be effectively represented as labeled subgraphs, an intriguing possibility we might explore in the very near future.

Closing Thoughts

It was a great tutorial, full of positive energy and forward-thinking ideas. The presenters, Edgar, Krisztian and Daan, also told us how unique this experience was for them, namely how happy they were to see people from such diverse backgrounds interested in this topic. We wish them good luck and hope to see them during their next visit to Montreal!


Research behind Reveal wins Best Paper Award!

Photo: Alexis, Phil, Pablo and Ary

To build Radialpoint Reveal, we applied a combination of machine learning techniques to process search query logs. That research formed the basis of an academic paper called Filtering Personal Queries from Mixed-Use Query Logs, which we recently submitted to the 27th Canadian Conference on Artificial Intelligence. It was Radialpoint’s first academic paper, and we had the chance to present it at Canadian AI, a major AI conference. Here’s the abstract of the paper:

Queries performed against the open Web during working hours reveal missing content in the internal documentation within an organization. Mining such queries is thus advantageous but it must strictly adhere to privacy policy and meet privacy expectations of the employees. Particularly, we need to filter queries related to non-work activities. We show that, in the case of technical support agents, 78.7% of personal queries can be filtered using a words-as-features Maximum Entropy approach, while losing only 9.3% of the business related queries. Further improvements can be expected when running a data mining algorithm on the queries and when filtering private information from its output.

And guess what, we won the Best Application Award at Canadian AI, and we couldn’t be more excited! We view this award as important recognition from AI thought leaders that validates our approach. I would like to sincerely thank all of my collaborators for their work on this project, and the conference for awarding us such an honour!

The conference, held in Montreal from May 6th to 9th, is a gathering of world leaders in the development of artificial intelligence (AI) and machine learning technologies and research. Of the 86 papers received from around the world, our paper beat out 21 other finalists vying for two prizes for original work in Theoretical and Applied AI; ours is the prize in the Applied category.

I’m very happy and proud about this, for many reasons. It’s a local conference and it’s the first time we’ve submitted. I feel it’s an independent validation of our work by world-renowned experts: even if we know we’re right, it matters that other people confirm it as well. This is not the same as having a company audit our code; it’s AI thought leaders and experts looking at our ideas and saying they are sound.

Co-authored by Ary Fagundes Bressane Neto, Philippe Desaulniers, Alexis Smirnov, and yours truly, the paper describes how we analyzed web searches by tech support agents to determine which technical issues they were researching. Our goal was to extract valuable tech support knowledge that other tech support agents can leverage to solve technical problems faster. We created an end-to-end process for sifting through huge amounts of data and separating personal searches from business ones while protecting the agents’ privacy. The approach was clearly appreciated; I think the judges liked our practical take on a real-life problem.
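To give a rough idea of the words-as-features Maximum Entropy filter mentioned in the abstract (which, with only two classes, reduces to logistic regression over bag-of-words features), here is a toy sketch in JavaScript. It is only an illustration: the queries below are made up, and the paper’s actual pipeline and feature set are more involved.

// Toy words-as-features Maximum Entropy (binary logistic regression) filter,
// trained with stochastic gradient ascent. Illustration only.
function tokenize(query) {
  return query.toLowerCase().split(/\W+/).filter(Boolean);
}

function score(model, tokens) {
  var z = model.bias;
  tokens.forEach(function(t) {
    if (t in model.weights) z += model.weights[t];
  });
  return 1 / (1 + Math.exp(-z)); // probability the query is personal
}

function train(examples, epochs, learningRate) {
  var model = { weights: {}, bias: 0 };
  for (var e = 0; e < epochs; e++) {
    examples.forEach(function(ex) {
      var tokens = tokenize(ex.query);
      var gradient = ex.personal - score(model, tokens); // label: 1 = personal, 0 = business
      model.bias += learningRate * gradient;
      tokens.forEach(function(t) {
        model.weights[t] = (model.weights[t] || 0) + learningRate * gradient;
      });
    });
  }
  return model;
}

// Made-up training data; the real training set came from labeled query logs.
var model = train([
  { query: 'outlook cannot connect to exchange server', personal: 0 },
  { query: 'printer driver error windows 7', personal: 0 },
  { query: 'cheap flights to cancun', personal: 1 },
  { query: 'best pizza near me', personal: 1 }
], 50, 0.1);

console.log(score(model, tokenize('reset router password')));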

The experience of working on this project was truly gratifying. Although we applied our process to analyzing tech support searches, I feel this could be used in many other ways, such as finding out what’s trending in an organization, helping libraries mine queries to determine what knowledge they should purchase, or helping companies figure out where the greatest need for training is. I think it could really help organizations be more responsive to what people are looking for. The possibilities are endless!

 


Bring your VMware Infrastructure to the next level using CloudStack

A few weeks ago, back in April 2014, Radialpoint and CloudOps had the chance to present at the CloudStack Collaboration Conference 2014 in Denver. Our presentation was called Success Story: Bring your VMware Infrastructure to the next level using CloudStack. We talked about how CloudStack helped us transform our VMware-based infrastructure so development teams could move faster. With this transition we also started using SaltStack to create complete environments for distributed systems and Twelve-Factor apps.


Log in to your node express site using Zendesk

If you’re working on a node.js express application and you need to implement a login page, Passport makes it a breeze. With Passport you can set up basic username and password auth, but rather than forcing your users to remember yet another password, you can also take advantage of Passport’s modular design to give your users more options: let them log in with their existing accounts at one of the major identity providers. Passport offers modules that support familiar identity providers such as Google and Facebook, as well as many others. These modules encapsulate the complexity of handling the different authentication protocols used by each identity provider, in addition to the myriad differences in implementation details among them.

We wanted to take this approach with the SupportKit Console, a tool that shows agents context about what the app was doing around the time the customer filed the support ticket. Agents use this tool in conjunction with the Zendesk agent interface, so it makes sense for them to log in to the SupportKit Console using an existing Zendesk account. This is made possible thanks to Zendesk’s support for OAuth2, but how does this work with node and express? Enter passport-zendesk, a new module for passport.

You’ll need to have your application registered in your Zendesk control panel, and you’ll need to set up Passport in your node application. Once all that’s done, npm install passport-zendesk and configure it by filling in your Zendesk subdomain, client ID, client secret, and callback URL like so:

var passport = require('passport');
var ZendeskStrategy = require('passport-zendesk').Strategy;

passport.use(new ZendeskStrategy({
    subdomain: 'yourZendeskSubdomain',
    clientID: 'yourClientIdentifier',
    clientSecret: 'yourClientSecret',
    callbackURL: 'https://www.example.net/auth/zendesk/callback'
  },
  function(accessToken, refreshToken, profile, done) {
    // Look up or create your user record here; in the simplest case the
    // Zendesk profile itself can serve as the user object.
    done(null, profile);
  }
));

Passport will now recognize a new authentication strategy named ‘zendesk’. You can now hook it up to some routes:

app.get('/auth/zendesk',
  passport.authenticate('zendesk'));

app.get('/auth/zendesk/callback',
  passport.authenticate('zendesk', { failureRedirect: '/login' }),
  function(req, res) {
    // Successful authentication, redirect home.
    res.redirect('/');
  });

We’ve set up two routes, /auth/zendesk and /auth/zendesk/callback. The user starts by browsing to /auth/zendesk, from which ZendeskStrategy will redirect them to Zendesk. If the user hasn’t yet logged in, they will be prompted to do so. Zendesk will then present the user with a consent prompt that should look familiar if you’ve ever installed a Facebook application. The prompt displays information about your application and the level of access you are requesting, and asks the user to give their consent. This consent prompt only shows itself the first time around.

When the user’s consent (or denial) is given, the browser is redirected to our callback URL, /auth/zendesk/callback. Assuming the user has allowed us access, we’ll get an authorization code which we can use to fetch some profile information. The passport-zendesk module takes care of this and makes the profile available in req.user, which you can then use to customize your site experience.

You can also configure passport-zendesk to authenticate with more than one Zendesk subdomain. This is useful if you’ve got a global OAuth client set up and you need agents to be able to log in from any Zendesk subdomain. For more information, check out the docs posted here.
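For completeness, here is a minimal sketch of the express wiring around all of this, so the logged-in user stays available in req.user across requests. It assumes the express-session middleware and simply stores the whole Zendesk profile in the session; the names and choices below are illustrative, not the only way to set this up.

var express = require('express');
var session = require('express-session');
var passport = require('passport');

var app = express();

// Sessions are needed so the authenticated user survives across requests.
app.use(session({ secret: 'keyboard cat', resave: false, saveUninitialized: false }));
app.use(passport.initialize());
app.use(passport.session());

// For simplicity we serialize the whole profile into the session; a real app
// would typically persist a user record and store only its id.
passport.serializeUser(function(user, done) { done(null, user); });
passport.deserializeUser(function(obj, done) { done(null, obj); });

app.get('/', function(req, res) {
  if (req.user) {
    // The shape of req.user depends on what your verify callback passed to done().
    res.send('Logged in as: ' + JSON.stringify(req.user));
  } else {
    res.redirect('/auth/zendesk');
  }
});

app.listen(3000);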


Web scraping for all your needs!

Web scraping is fun. That is why today we are open-sourcing a web scraping tool, dubbed Store Scraper, for surveying and collecting data about the availability of Android and iOS apps for a brand or company, given its website URL. Store Scraper finds useful stuff like the number of downloads, app ratings, and even user reviews. In this post, we will talk about the kind of heuristics used during the scraping.

Use the HTML, Luke

Since our input is simply a website URL, we need to transform it into a meaningful search term to be used when querying the Play and iTunes store APIs. This is where web scraping really shines, since a webpage is usually filled with data ready to be extracted.

When defining the query term, we follow a very simple strategy:

  1. First, look at every anchor element in the page and see if any of them link to itunes.apple.com. If one is found, grab the app ID contained in the URL. That ID can then be used to query the iTunes store (see the sketch after this list).
  2. If nothing is found in the body of the website, try the headers. Some websites offer a <meta name="apple-itunes-app"> tag, which can be regexed to extract the app ID. More details on this super-duper, though proprietary, tag can be found on this page.
  3. We follow the same strategy as in step 1 for the Play Store: just check if there is any link pointing to the Play Store. If that works, scrape the app’s name from its store page.
  4. If all else fails, look in specific spots on the page. The logo, the website’s main domain name, and the footer information are all possible locations where the actual app name might be found.
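As an illustration of heuristics 1 and 2, here is a hypothetical sketch using the request and cheerio packages; the actual Store Scraper code on GitHub may be organized quite differently.

// Hypothetical sketch of heuristics 1 and 2; not the actual Store Scraper code.
var request = require('request');
var cheerio = require('cheerio');

function findItunesAppId(url, callback) {
  request(url, function(err, res, html) {
    if (err) return callback(err);
    var $ = cheerio.load(html);
    var id = null;

    // Heuristic 1: anchors pointing at itunes.apple.com, e.g. .../id324684580.
    $('a[href*="itunes.apple.com"]').each(function() {
      var match = ($(this).attr('href') || '').match(/id(\d+)/);
      if (match) id = match[1];
    });

    // Heuristic 2: fall back to the apple-itunes-app meta tag,
    // e.g. <meta name="apple-itunes-app" content="app-id=324684580">.
    if (!id) {
      var content = $('meta[name="apple-itunes-app"]').attr('content') || '';
      var metaMatch = content.match(/app-id=(\d+)/);
      if (metaMatch) id = metaMatch[1];
    }

    callback(null, id); // null if neither heuristic matched
  });
}

findItunesAppId('https://www.example.com', function(err, id) {
  console.log('iTunes app id:', id);
});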

When good is enough

For the dataset used at Radialpoint, we managed to retrieve the info for iOS apps for 95% of the website URLs, while we succeeded 78% of the time for the Play store.

There are many ways to make this tool better, which is why we decided to open-source it. So go check it out on GitHub and have fun!

