Great that you’re interested in doing your internship or thesis at DPG Media! We are always looking for problem solvers with a strong background in AI or machine learning. At DPG Media, we publish news content from many different newspapers in Belgium, the Netherlands and Denmark, and we reach around 10 million people every day. On the content side, around 4,000 journalists produce high-quality articles; on the data side, our event stream carries more than 500 million events a day.
Our team (the team you would be part of) focuses specifically on news personalization. We are a strong group, both academically and engineering-wise, with expertise (MSc/PhD, industry experience) in ML in general, recommender systems and natural language processing.
Below you’ll find a list of ideas we find interesting. This is by no means an exhaustive list, and we are keen on discussing your own ideas. Do not hesitate to get in touch if you’re interested in one of the suggestions below or would like to talk about other options.
If you’re interested, please let us know by sending an email to email@example.com. We’ll have a (lightweight) application process to make sure we’re a good fit for each other.
We’ll need you to be able to work independently. Fluency in Python helps. Ideally, projects are 6 months or longer. We’d really like it if the result of the internship is a paper published at a conference or workshop.
Monitoring emerging events on Twitter: what should be news?
Newsworthy events constantly emerge. Many of these events surface on Twitter almost immediately. As a media company, it is impossible to monitor everything that is happening, regardless of how many journalists you have. Automating this process, recognizing newsworthy events on Twitter as soon as they emerge, could be of incredible value to all newspapers of DPG Media.
Predicting the reception of articles: how do people react to news?
Monitor, or maybe even predict, how people react to articles on social media. Which articles are received positively or negatively, and which are controversial?
Bursting the bubble: providing different views on the news
A common concern about news recommendation is that it might create filter bubbles: people would then only receive recommendations that confirm their views. As opposed to Facebook/Google, where recommenders optimize for clicks, our goal is to inform our users optimally. This makes the opposite case a very interesting one: how do we actively burst people’s filter bubbles? In other words, how do we automatically provide people with a view on the news that differs from their current one?
Recommendation: a weekly email with the best of all DPG Media titles, personalized
DPG Media offers many different newspaper titles (AD, Volkskrant, Trouw, Parool, many regional papers). To give users with a subscription to one of them access to all DPG Media content, we already have Topics. It would be very interesting to also offer a weekly newsletter with a personalized selection from all DPG Media titles. This would allow us to inform our users even better.
Learning to explain news recommendations from explicit user feedback
Explaining recommendations is a hot topic in the academic recommender systems community. It is also interesting for news personalization at DPG Media: we would not only like to recommend personally relevant content to users, but also to automatically explain why an article is relevant to a specific user. Given the volume of user interactions, we could go even further and ask users for explicit feedback on the explanations, giving them a direct way to correct us. This makes it possible to learn to explain recommendations in a supervised, online fashion.
A lot of news follows a predictable pattern; reports on local sports matches are one example. This project would investigate (semi-)automating the production of such news. You would have our whole archive of millions of historical news articles available to train on.
What did I miss?
Every time a user reappears on one of our platforms, we could help them by telling them which significant events have happened since we last saw them. If they’ve been away on a long holiday, we might give them a timeline to get them up to speed. If it has been a day, we might show the highlights or even a summary of what has happened since yesterday. If nothing significant happened, we might want to tell them that.
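To make the idea more concrete, here is a minimal sketch of the decision logic described above: pick a catch-up format (timeline, highlights, or "nothing new") based on how long the user has been away. Everything here is hypothetical: the `Event` type, the per-event `significance` score, the one-week `LONG_ABSENCE` threshold and the `min_significance` cut-off are illustrative assumptions, not an existing DPG Media system. Producing a good significance score is the actual research question.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical threshold: an absence of a week or more counts as a "long holiday".
LONG_ABSENCE = timedelta(days=7)


@dataclass
class Event:
    title: str
    published: datetime
    significance: float  # hypothetical score in [0, 1], e.g. from an editorial or ML model


def catch_up(last_seen: datetime, now: datetime, events: list[Event],
             min_significance: float = 0.5, max_highlights: int = 5) -> tuple[str, list[Event]]:
    """Choose a catch-up format and the events to show since the user's last visit."""
    # Keep only significant events published between the last visit and now.
    missed = [e for e in events
              if last_seen < e.published <= now and e.significance >= min_significance]
    if not missed:
        # Nothing significant happened: tell the user explicitly.
        return "nothing_new", []
    if now - last_seen >= LONG_ABSENCE:
        # Long absence: a chronological timeline of everything significant.
        return "timeline", sorted(missed, key=lambda e: e.published)
    # Short absence: only the most significant highlights.
    top = sorted(missed, key=lambda e: e.significance, reverse=True)[:max_highlights]
    return "highlights", top
```

A user who returns after two weeks would get a `"timeline"`, one who returns the next day would get `"highlights"`, and one who returns an hour later would usually get `"nothing_new"`.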
Suggesting newspaper photographs
Photo editors manually go through all available photographs to pick the most suitable one. At Volkskrant alone, photo editors look at ~25,000 photographs every year. Given historical data, it might be possible to suggest the most suitable photographs and support them in their job.
Daily Briefing on a Virtual Assistant
Virtual assistants are very popular. In the US, an estimated 4.1 million smart speakers were sold last year. Earlier this year, Google Home was released in Dutch, so it is time for us to make content from v available through one of the smart speakers. The goal of this project is to build a prototype that reads a daily briefing to a user. The vast number of articles that we have at DPG Media is at your disposal for this.
Every newspaper is carefully composed through a manual process. The inputs to that process are (a) restrictions such as: there needs to be a front page, the economy section should be 4 pages, foreign news should be 6 pages, etc., and (b) today’s articles. Editors then decide what goes where and which articles need to be shortened. This is an iterative and very labour-intensive process. This project would look into ways of supporting editors to make this process faster without losing the compositional qualities, such as cadence, that newspapers currently exhibit.
Summaries of news articles can serve as a way to quickly give our readers or our journalists insight into why something is worth reading. Taking context into account when producing a summary could dramatically increase its informativeness.
For instance, when a journalist is writing a news article, that article should be taken as context for summaries of recommended articles from our archive. Such a contextual summary could help the journalist in their writing process.
Or, if we took a reader’s past reading behaviour into account, we could gear a summary towards highlighting information that is new to this reader.
When a summary for an article is produced, it would be valuable if we could provide evidence as to why it is a good summary. The summary could, for instance, point to several portions of the original article that it condensed. Or an explanation could be something like: “this summary says this is a foreign politics article because it talks about Donald Trump here in the full article”. This project would look into ways of explaining summaries.
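One very simple form of such evidence is to link each summary sentence to the article sentence it most likely condenses. The sketch below does this with Jaccard token overlap, a plain lexical baseline swapped in for illustration; it is not a method the project prescribes, and the naive regex sentence splitter is an assumption that a real pipeline would replace with a proper tokenizer (and likely with semantic similarity instead of word overlap).

```python
import re


def sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence: str) -> set[str]:
    # Lowercased word tokens; \w also matches accented (e.g. Dutch) letters.
    return set(re.findall(r"\w+", sentence.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def align_summary(summary: str, article: str) -> list[tuple[str, str, float]]:
    """For each summary sentence, return (summary sentence, best-matching article sentence, score)."""
    source = sentences(article)
    if not source:
        return []
    evidence = []
    for s in sentences(summary):
        s_tok = tokens(s)
        best = max(source, key=lambda src: jaccard(s_tok, tokens(src)))
        evidence.append((s, best, jaccard(s_tok, tokens(best))))
    return evidence
```

The score gives a rough indication of how strongly each summary sentence is supported by the article; a summary sentence whose best match scores near zero is a candidate for an unsupported (or hallucinated) claim.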
Subscribe to offers for your favourite crate of beer
Imagine being able to subscribe to notifications for nearby deals on expensive products you often buy (think of your favourite crate of beer, frozen pizzas, laundry detergent). Reclamefolder.nl, our platform for advertising leaflets, aims to provide that service. It is the number one player in the Netherlands in the domain of online leaflets, serving over 1.7 million visitors every month. What Reclamefolder.nl is missing is a detailed understanding of our 50k live deals, so that we can make them comparable and searchable.
Your task would be to set up an NLP (and maybe even computer vision) pipeline that takes as input the deals that have already been cut from (PDF) advertising leaflets, categorizes them and makes them accessible through standing queries.
The assignment relates to one of our strategic pillars, and you’ll be working with our dedicated online experts and technically skilled development team to test and implement solutions in the real world and see immediate results.