Career Opportunities

Data Scientist

Populi

Data Science
Farmington, CT, USA
Posted on Thursday, August 25, 2022

Description

Data Scientists are responsible for cleaning, transforming, enriching, and analyzing vast amounts of raw data from various systems using Apache Spark and other analytics packages. Their goal is to develop valuable features and provide ready-to-use data to stakeholders for robust downstream analysis. They analyze data for correlations to identify trends and predictive power, and they build, maintain, and deploy predictive models. Data Scientists work with analysts to understand business needs and requirements, and with data engineers to implement scalable pipelines for ETL, model training, and scoring. They handle both ad-hoc requests and core pipeline development. The ideal candidate has a passion for discovering insights hidden in large data sets and for working with stakeholders to improve business outcomes. They keep up with the latest technology, including new versions of Spark and new analytical packages. They must have a proven ability to drive business results with data-based insights and be comfortable working with a wide range of stakeholders and functional teams.

Responsibilities

  • Collaborate with product management and engineering departments to understand company needs and devise possible solutions
  • Mine and analyze data from company databases to drive optimization and improvement of product development, marketing techniques and business strategies
  • Communicate results and ideas to key decision makers
  • Research and develop predictive models for data analysis
  • Optimize joint development efforts through appropriate database use and project design
  • Assess the effectiveness and accuracy of new data sources and data gathering techniques
  • Develop custom data transformations, models, and algorithms to apply to data sets
  • Use predictive modeling to improve and optimize customer experiences, revenue generation, ad targeting, and other business outcomes
  • Develop processes and tools to monitor and analyze model performance and data accuracy
  • Create high-performance pipelines in Apache Spark for data transformation, aggregation, and model training
  • Develop and share well-documented analyses in data science notebooks such as Jupyter
  • Document all code

Basic Qualifications

  • Excellent communication and interpersonal skills
  • Knowledge of agile methodologies and tools (e.g., Scrum, JIRA)
  • Basic system administration skills in both Windows and Linux environments
  • Bachelor’s degree in Computer Science, Statistics, Applied Math or related field
  • 3+ years practical experience with Apache Spark, ETL, machine learning, data processing, and data analytics
  • Strong experience with Python and Bash shell scripting
  • Strong experience with Apache Spark and MLflow
  • Strong experience with AWS and/or Google Cloud Platform
  • Experience training, deploying, monitoring, and updating machine learning models
  • Knowledge of a variety of machine learning techniques, including GLM/regression, clustering, decision trees, random forests, boosting, text mining, social network analysis, and neural networks, and their real-world advantages and drawbacks
  • Experience with multiple machine learning libraries, including XGBoost, scikit-learn, TensorFlow, and PyTorch
  • Experience working with and creating data architectures
  • The ability to teach and train others in the methodologies and practices used in data science
  • Familiarity with Git and code versioning practices
  • Familiarity with the Atlassian product suite, including JIRA and Confluence
  • A drive to learn and master new technologies and techniques

Preferred Qualifications

  • Experience with Scala
  • 3+ years of experience with healthcare data and use cases, particularly claims data
  • 5+ years practical experience with Apache Spark (Scala and Python), ETL, machine learning, data processing, and data analytics
  • Strong experience with Apache Spark 3.x, including query tuning and performance optimization
  • Master’s or doctoral degree in Computer Science, Statistics, Applied Math or related field
  • Strong experience with AWS EMR, Glue, and Athena
  • Experience working with healthcare data, specifically healthcare insurance claims data

Populi is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you have a disability or special need that requires accommodation, please let us know.