Duration: Right-to-Hire
Compensation: OPEN
Location: 100% Remote

Responsibilities:

  • Manage day-to-day data collection, transport, maintenance/curation of, and access to the corporate data asset.
  • Work cross-functionally across the enterprise to centralize data and standardize it for use by business, data science, and other stakeholders.
  • Increase awareness of available data and democratize access to it across the company.
  • Act as a key technical expert overseeing the quality of data pipelines into and within the company, working with the data architecture and data quality teams to structure high-quality data at rest and to provision data for use and experimentation by internal data customers.
  • Work in a hybrid environment with in-house, on-premises data sources as well as cloud and remote systems.
  • Work closely with the data science team to enable exploration of new data sources and the development of models to classify and bring meaning to unknown and unstructured data.
  • Supervise data ingestion and integration processes from source systems into the enterprise data warehouse, data lake, or other data storage and exploration tools (e.g., Databricks Delta Lake).
  • Professionalize data engineering by developing standard processes to oversee the lifecycle of the ingress and egress data pipelines and customer data sandboxes you manage.
  • Partner with IT, data architecture, and other teams on the administration and monitoring of all data platforms, ensuring data is properly transported, harmonized, and made accessible in line with key dimensions: business and financial policies, security, local-market regulatory rules, and consumer privacy-by-design principles (PII management), all linked through a common identity foundation.
  • Performance-tune and optimize all data ingestion and integration processes, including the data platform and databases.

Requirements:

  • Experience building data ingestion frameworks with Azure Databricks and Azure Data Factory (ADF).
  • Experience as a data engineer on large-scale data lake / cloud data platform implementation projects.
  • Experience with Python and PySpark coding.
  • Experience building event-driven pipeline solutions based on Azure Functions.
  • Experience managing pipelines into a heterogeneous data architecture/technology ecosystem.
  • Strong SQL experience.
  • Comfortable setting up and overseeing batch and API-based data pipelines.
  • Experience using data tools such as MFT (managed file transfer – TIBCO, IBM MQ FTE), ETL (extract, transform, and load – Informatica), DataStage, Alteryx, and other industrial data pipeline tools.
  • Experience with tools such as Azure Data Lake (Analytics and Storage), Azure Data Factory (ADF), data warehouses, Synapse Analytics, Amazon Redshift, and Logic Apps.
  • Familiarity with Hadoop-based technologies (HDInsight, Spark, Hive, Pig, etc.) is a nice-to-have.