
Data Scientist

Populi

This job is no longer accepting applications


Data Science

Farmington, CT, USA

Posted on Thursday, August 25, 2022

Description

Data Scientists are responsible for cleaning, transforming, enriching, and analyzing vast amounts of raw data from various systems using Apache Spark and other analytics packages to develop valuable features and to provide ready-to-use data to stakeholders for robust downstream analysis. They analyze data for correlations to identify trends and predictive power, and build, maintain, and deploy predictive models. Data Scientists work with analysts to understand business needs and requirements, and with data engineers to implement scalable pipelines for ETL, model training, and scoring. They service both ad hoc requests and core pipeline development. The ideal candidate has a passion for discovering insights hidden in large data sets and working with stakeholders to improve business outcomes. They keep up with the latest technology, including new versions of Spark and new analytical packages. They must have a proven ability to drive business results with their data-based insights and be comfortable working with a wide range of stakeholders and functional teams.

Responsibilities

Collaborate with product management and engineering departments to understand company needs and devise possible solutions
Mine and analyze data from company databases to drive optimization and improvement of product development, marketing techniques and business strategies
Communicate results and ideas to key decision makers
Research and develop predictive models for data analysis
Optimize joint development efforts through appropriate database use and project design
Assess the effectiveness and accuracy of new data sources and data gathering techniques
Develop custom data transformations, models, and algorithms to apply to data sets
Use predictive modeling to improve customer experiences, revenue generation, ad targeting, and other business outcomes
Develop processes and tools to monitor and analyze model performance and data accuracy
Create high-performance pipelines in Apache Spark for data transformation, aggregation, and model training
Develop and share well-documented analyses in data science notebook tools such as Jupyter
Write documentation with all code

Basic Qualifications

Excellent communication and interpersonal skills
Knowledge of agile methodologies and tools (e.g. Scrum, JIRA)
Basic system administration skills in both Windows and Linux environments
Bachelor’s degree in Computer Science, Statistics, Applied Math or related field
3+ years practical experience with Apache Spark, ETL, machine learning, data processing, and data analytics
Strong experience with Python and Bash shell scripting
Strong experience with Apache Spark and MLflow
Strong experience with AWS and/or Google Cloud Platform
Experience training, deploying, monitoring, and updating machine learning models
Knowledge of a variety of machine learning techniques, including GLM/regression, clustering, decision trees, random forest, boosting, text mining, social network analysis, and neural networks, and their real-world advantages and drawbacks
Experience with multiple machine learning libraries, including XGBoost, Scikit-learn, TensorFlow, and PyTorch
Experience working with and creating data architectures
The ability to teach and train others in the methodologies and practices used in data science
Familiarity with Git and code versioning practices
Familiarity with the Atlassian product suite, including JIRA and Confluence
A drive to learn and master new technologies and techniques

Preferred Qualifications

Experience with Scala
3+ years of experience with healthcare data and use cases, particularly claims data
5+ years practical experience with Apache Spark (Scala and Python), ETL, machine learning, data processing, and data analytics
Strong experience with Apache Spark 3.x, including query tuning and performance optimization
Master’s or Doctoral Degree in Computer Science, Statistics, Applied Math or related field
Strong experience with AWS EMR, Glue, and Athena
Experience working with healthcare data, specifically healthcare insurance claims data

Populi is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, or veteran status. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you have a disability or special need that requires accommodation, please let us know.
