Data Scientist
What We Do:
Recognized as one of the Top 100 Tech Companies by Builtin.com and rated over 4.4 stars on Glassdoor, SambaSafety® is the pioneer of driver risk management software in North America. SambaSafety is trusted by over 2 million subscribed drivers, and thousands of businesses look to SambaSafety to provide the most powerful, advanced, intuitive, and impactful risk solution platform on the market. SambaSafety is growing at an incredible rate with high employee engagement. It’s an exciting time to be at Samba, and now is the right time to join our high-performing culture. We hope to see you here!
What You’ll Do:
- Partner with business leaders across the organization to define business problems and research questions
- Collate and extract relevant information from large amounts of structured and unstructured data (internal and external) to enable analytical solutions.
- Apply advanced analytics, predictive modeling, machine learning, simulation, optimization and other techniques to deliver insights and analytical solutions that achieve business objectives
- Conduct end-to-end feature engineering (i.e. brainstorm, create, validate, encode), predictive modeling, performance evaluation and reporting
- Use fuzzy matching techniques to match datasets on non-standardized fields (e.g. name, address, email)
- Explain and visualize results and algorithm performance to both technical and non-technical audiences
- Write production-level code in a dynamic, startup-like environment
What You'll Need:
- Bachelor's degree in Mathematics, Statistics, Computer Science, Operations Research or a related field; advanced degree preferred.
- A minimum of 5 years of experience in predictive modeling and extracting insights from large, disparate datasets
- Proven knowledge of modern statistical and machine learning techniques including, but not limited to, generalized linear models (e.g. binomial, Poisson, Cox models), tree-based algorithms, neural networks, gradient boosting, regularization, mixture models, time series analysis and forecasting, clustering, and anomaly detection.
- Solid practical and theoretical understanding of statistical inference applied to both experimental and quasi-experimental data (e.g. parametric and non-parametric hypothesis testing, resampling methods, causal effect detection and estimation, etc.)
- Experience developing data science pipelines and workflows in R, Python, or equivalent open-source programming languages.
- Experience matching datasets on non-standardized fields (e.g. name, address, email) using fuzzy matching techniques and algorithms
- Ability to effectively communicate findings from complex analyses to non-technical audiences.
- Experience using ML libraries such as caret, scikit-learn, MLlib, etc.
- Proficiency in writing and tuning relational SQL and NoSQL queries and scripts: Microsoft SQL Server, Postgres, Cassandra, MongoDB, etc.
- Comfortable working from the command line (Linux)
- Experience working in AWS and other cloud storage and computing environments
- Familiarity with big data tools: Hadoop, Spark, Kafka, etc.