Data Engineer (PySpark + Glue)

Irvine Technology Corporation


As the Senior Data Engineer, you will develop ETL pipelines that transform nested data stored in JSON and Parquet files using AWS Glue and PySpark.
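
For context, below is a minimal sketch of the kind of Glue + PySpark transformation described above, written as a Glue job script. The bucket paths, column names, and schema are hypothetical; it only illustrates flattening nested JSON into partitioned Parquet.

  # Minimal illustrative Glue job script; paths, columns, and schema are hypothetical.
  import sys

  from awsglue.context import GlueContext
  from awsglue.utils import getResolvedOptions
  from pyspark.context import SparkContext
  from pyspark.sql import functions as F

  # Standard Glue job boilerplate (JOB_NAME is supplied by the Glue runtime).
  args = getResolvedOptions(sys.argv, ["JOB_NAME"])
  glue_context = GlueContext(SparkContext.getOrCreate())
  spark = glue_context.spark_session

  # Read nested JSON from S3 (hypothetical path).
  orders = spark.read.json("s3://example-bucket/raw/orders/")

  # Explode an array of line items and flatten nested struct fields.
  flattened = (
      orders
      .withColumn("item", F.explode("line_items"))
      .select(
          F.col("order_id"),
          F.to_date("created_at").alias("order_date"),
          F.col("customer.id").alias("customer_id"),
          F.col("item.sku").alias("sku"),
          F.col("item.qty").cast("int").alias("qty"),
      )
  )

  # Write the result back to S3 as partitioned Parquet (hypothetical path).
  flattened.write.mode("overwrite").partitionBy("order_date").parquet(
      "s3://example-bucket/curated/orders/"
  )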

Develop and maintain scalable data pipelines and build out new API integrations for data transfer.

Develop Terraform scripts to deploy the infrastructure required for ETL pipelines on AWS. Perform data analysis to troubleshoot and help resolve data-related issues.

Viewed as a data expert, you will drive innovation, play a key role in the department, and participate in highly visible initiatives with broad impact.

Identify, design, and implement internal process improvements: automate manual processes, optimize data delivery, and redesign infrastructure for greater scalability.

Skills: Must-have

  • BS or MS degree in Computer Science or a related technical field
  • 5+ years of extensive ETL development experience using PySpark/Glue on AWS
  • 5+ years of experience with CSV, JSON, and Parquet file formats, especially nested data types
  • 5+ years of experience with S3, Athena, RDS, the Glue Data Catalog, and CloudFormation
  • Strong understanding of ETL/data-pipeline/Big Data architecture
  • Strong database/SQL experience in any RDBMS

Nice-to-have

  • Experience in schema design and data ingestion on Snowflake (or an equivalent MPP)
  • Experience in orchestrating data processing jobs using Step Functions, Glue Workflows, or Apache Airflow (MWAA)
  • Experience in data analysis using Excel formulas, VLOOKUP, pivot tables, and slicers
