Location: Remote
Job Type: Co-op or Internship
Duration: 3-6 months
Authenticate.com is a leading provider of identity verification and background check solutions. Our innovative platform helps businesses prevent fraud, ensure compliance, and build trust with their users. We offer a wide range of verification services, including document verification, facial recognition, database checks, and continuous monitoring.
Job Summary:
We are seeking a highly motivated and detail-oriented Data Scientist Co-op to join our team. As a Data Scientist Co-op, you will play a critical role in developing and maintaining our data infrastructure, with a focus on building vector databases and using Large Language Models (LLMs) to normalize criminal history and employment history data. This is an excellent opportunity to apply your data science skills to real-world problems and contribute to innovative solutions in the identity verification and background screening space.
Responsibilities:
· Design, develop, and maintain vector databases to store and query large datasets related to criminal history and employment history
· Utilize Large Language Models (LLMs) to normalize and standardize data from various sources, ensuring consistency and accuracy
· Collaborate with cross-functional teams to integrate vector databases and LLM-based data normalization into our background screening and identity verification products
· Develop and implement data quality control processes to ensure data accuracy, completeness, and integrity
· Analyze and visualize data to identify trends, patterns, and insights that can inform product development and improvement
· Stay up to date with industry trends and advances in natural language processing, machine learning, and data science
· Communicate technical results and insights to non-technical stakeholders through clear and concise reporting
Requirements:
· Currently enrolled in a Bachelor's or Master's degree program in Computer Science, Data Science, Mathematics, Statistics, or a related field
· Strong programming skills in Python, with experience in data science libraries such as NumPy, Pandas, and scikit-learn
· Familiarity with vector databases, Large Language Models (LLMs), and transformer-based language models such as BERT, RoBERTa, or DistilBERT
· Experience with data preprocessing, normalization, and feature engineering
· Knowledge of data visualization tools such as Matplotlib, Seaborn, or Plotly
· Excellent problem-solving skills, with the ability to work independently and collaboratively as part of a team
· Strong communication and interpersonal skills, with the ability to explain technical concepts to non-technical stakeholders
Nice to Have:
· Experience with cloud-based data storage solutions such as AWS S3 or Google Cloud Storage
· Familiarity with containerization using Docker and orchestration using Kubernetes
· Knowledge of data governance and compliance regulations such as GDPR and CCPA
· Experience with agile development methodologies and version control systems such as Git
What We Offer:
· Competitive salary for full-time co-ops; complete flexibility and self-direction for unpaid interns
· Opportunity to work on cutting-edge projects in the identity verification and background screening space
· Collaborative and dynamic work environment with a team of experienced data scientists and engineers
· Professional development opportunities, including training and mentorship
· Flexible work arrangements, including remote work options
If you are a motivated and detail-oriented individual with a passion for data science and machine learning, please submit your resume, cover letter, and any relevant projects or code samples. We look forward to hearing from you!