Available ASAP
(Updated 2025-01-10) Experienced data scientist
Hørsholm, Denmark
Native languages: Danish, English
- 15+ years of experience with machine learning
- 7 years of experience with large language models
- 15+ years of experience with data science
Qualifications (8)
Machine Learning
Large Language Models
Statistics
Artificial Intelligence (NLP)
Algorithms
Data Science
Programming
Mathematics
Summary
With a PhD in Machine Learning and over 15 years of professional experience, I am an accomplished data scientist and researcher specializing in software development, statistics, and mathematics. My multidisciplinary expertise encompasses a robust theoretical foundation and extensive hands-on experience, gained through continuous learning and a large number of work engagements. I have successfully delivered innovative solutions across various domains, from optimizing product taxonomies for IKEA to developing automated price estimation systems for Mærsk. My academic journey at prestigious institutions like the University of Copenhagen and DTU has significantly shaped my problem-solving approach, while my professional ventures have refined my ability to apply advanced machine learning techniques to real-world challenges.
Professional experience
2018-08 - Present
I have worked on machine learning projects for a wide array of startups and international clients such as Mærsk, Fairhomes, IKEA, Cathvision, SoftSingularity, FindZebra, Børns Vilkår, ESA, Molt Wengel, DSV, Likvido, Vento Maritime and others.
Fairhomes (02/2023 - 04/2024): I have done two projects for Fairhomes. The first lasted 9 months. In it, I developed models that estimate apartment prices and potential rent. The technologies used were GCP, Kubeflow and Airflow. The second project spanned about 6 months and used large language models to analyze apartments from listings (using both text and images). The end result was a chatbot that could answer questions about the augmented real estate dataset.
Mærsk (05/2023 - 12/2023): Assisted in devising an automated price estimation system for container shipping. Tools: Python, Azure, Databricks, Bayesian analysis, random forests, neural networks.
DSV (12/2021 - 08/2022): Helped engineer a complete MLOps framework, improving existing models and introducing new models for extracting data from invoices. Technologies: Python, PyTorch, large language models, BERT, LayoutLM, Kubernetes, GCP.
IKEA (10/2019 - 01/2022): As a contractor at IKEA, I contributed to a major two-year project focused on optimizing localized product taxonomies. My role involved managing and analyzing extensive datasets, with volumes exceeding 40 TB daily. The taxonomies were constructed from a combination of search data, customer interactions, and revenue statistics. I also developed and implemented a new product recommendation algorithm, 'session2vec', which outperformed existing models in A/B testing. The technologies I used included Docker, Kubernetes, TensorFlow, scikit-learn, Python, Airflow, and GCP. I also played a key role in delivering a comprehensive MLOps framework that covered the full lifecycle from data pre-processing to model deployment.
Soft Singularity (10/2018 - 12/2019): I developed several projects at Soft Singularity. One of them was a prediction service that identified areas at risk of fire based on satellite images. Its purpose was to supplement the existing drought index published by the Danish Meteorological Institute. The technologies used were GCP, AWS, Python, TensorFlow and various image models.
HugeImpact (02/2019 - 07/2019): Large-scale classification of 120 million products into the Google taxonomy. The classification was based on an ensemble of image models and language models, and semi-supervised learning was also employed. The technologies used were Docker, TensorFlow, scikit-learn, Python, Flask and more.
Børns Vilkår: Developed a cloud application providing donor value analytics for the donation-reliant organization, integrating both statistical and ML methods. The technologies used were Docker, TensorFlow, scikit-learn, Python, PyMC3, Flask, GCP.
Molt Wengel: Did research on a multilingual, law-focused search engine with an emphasis on NLP and information retrieval. The technologies used were Python, TensorFlow, Docker, Flask, AWS.
ESA: Took part in a short project to brainstorm commercial applications of satellite imagery, which resulted in ESA inviting us to Italy to present our concepts. The technologies used were Docker, TensorFlow, scikit-learn, Python, Flask, GCP.
Cathvision: The project used machine learning for audio waveform analysis in order to classify signal properties.
2017-11 - 2018-07
KMD is one of the largest IT companies in Denmark (more than 3000 employees). It delivers software solutions to municipalities and central government and manages huge amounts of data. My role at KMD was to build a team and find new ways of using this data. One of the larger projects I was involved in is called EcoKnow, which aims to ease case handling in the job centers.
2013-09 - 2017-08
It is estimated that 350 million people worldwide suffer from one of the 7000 known rare diseases. Unfortunately, a medical professional is unlikely to see a single case of a given rare disease during an entire career. As a result, the correct diagnosis is often delayed for several years. FindZebra is a search engine for rare diseases that was created in order to help medical professionals diagnose rare diseases (also known as "zebras"). My primary role at FindZebra was to lead our machine learning research and general software development.
Example tasks
Improvement of search performance. I improved performance from 68.6% (using the state-of-the-art open-source software Solr) to 81.9%. The increase was obtained through a combination of ensemble methods, improved neural models, corpus expansion, structured data sources and synonym injection.
Multilingual search. I showed how to create a language-independent text representation that could be used for zero-shot classification.
Information completion (how to predict "missing information" from a query).
Gene search (how to rank a list of genes by relevancy based on a query).
Large-vocabulary problems, e.g. in medical search. The paper describing the results was accepted to NIPS 2017 (https://nips.cc/Conferences/2017/Schedule?showEvent=9269). An implementation of the paper by Yann Dubois won the NIPS implementation challenge (https://github.com/YannDubs/Hash-Embeddings).
Leading several large software projects, such as building and maintaining the FindZebra.com website and developing a medical interviewer (chatbot).
During my time at FindZebra I supervised 20 student projects (9 master's, 6 bachelor's, and 5 specialized projects). The topics ranged over general machine learning, Bayesian networks and FindZebra-specific problems.
Academic background
2014-08 - 2017-08
2007-07 - 2009-02
2005-06 - 2007-06
2001-06 - 2005-06