Location: 100% Remote
- Responsible for day-to-day data collection, transportation, maintenance/curation, and access to the corporate data asset.
- Work cross-functionally across the enterprise to centralize data and standardize it for use by business, data science, or other stakeholders.
- Increase awareness about available data and democratize access to it across the company.
- Act as a key technical expert overseeing the quality of data pipelines into and within the company, working with data architecture and data quality teams to structure high-quality data-at-rest and to provision data for use and experimentation by internal data customers.
- Work in a hybrid environment with in-house, on-premises data sources as well as cloud and remote systems.
- Work closely with the data science team to enable exploration of new data sources and the development of models to classify and bring meaning to unknown and unstructured data.
- Supervise data ingestion and integration processes from source systems into the enterprise data warehouse, data lake, or other data storage and exploration tools (e.g., Databricks Delta Lake).
- Professionalize data engineering by developing standard management processes that govern the lifecycle of the ingress and egress data pipelines and customer data sandboxes you manage.
- Partner with IT, data architecture, and other teams on the administration and monitoring of all data platforms to ensure data is properly transported, harmonized, and made accessible across key dimensions: business and financial policies, security, local-market regulatory rules, consumer privacy-by-design principles (PII management), and the fundamental identity foundations that link them all.
- Performance-tune and optimize all data ingestion and data integration processes, including the data platform and databases.
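The harmonization and PII-management responsibilities above can be sketched as a minimal batch step in plain Python. This is an illustrative toy, not the team's actual framework: the column map, field names, and the choice of SHA-256 masking are all assumptions for the example.

```python
import hashlib

def mask_pii(record, pii_fields=("email", "phone")):
    """Replace PII fields with a one-way hash so records stay joinable
    without exposing the raw values (privacy-by-design)."""
    masked = dict(record)
    for field in pii_fields:
        if masked.get(field) is not None:
            masked[field] = hashlib.sha256(str(masked[field]).encode("utf-8")).hexdigest()
    return masked

def harmonize(record, column_map):
    """Rename source-system columns to the enterprise-standard schema."""
    return {column_map.get(key, key): value for key, value in record.items()}

def ingest(raw_records, column_map):
    """One batch ingestion step: standardize column names, then mask PII."""
    return [mask_pii(harmonize(r, column_map)) for r in raw_records]

# Toy source extract with a source-specific column name
source = [{"Email_Addr": "a@example.com", "cust_id": 1}]
column_map = {"Email_Addr": "email", "cust_id": "customer_id"}
print(ingest(source, column_map))
```

In a production pipeline the same two concerns (schema harmonization, PII masking) would typically run as PySpark transformations before the load into the lake.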
- Experience building data ingestion frameworks with Azure Databricks and Azure Data Factory (ADF).
- Experience as a data engineer on large-scale data lake / cloud data platform implementation projects.
- Experience coding in Python and PySpark.
- Experience building event-driven, Azure Functions-based pipeline solutions.
- Experience managing pipelines into a heterogeneous data architecture/technology ecosystem.
- Strong SQL experience.
- Comfortable setting up and overseeing batch- and API-based data pipelines.
- Experience with data tools such as MFT (Managed File Transfer – TIBCO, IBM MQ FTE), ETL (Extract, Transform, Load – Informatica), DataStage, Alteryx, and other industrial data pipeline tools.
- Experience with tools like Azure Data Lake (Analytics and Storage), Azure Data Factory (ADF), Data Warehouse, Synapse Analytics, Amazon Redshift, and Logic Apps.
- Familiarity with Hadoop-based technologies (HDInsight, Spark, Hive, Pig, etc.) is a nice-to-have.
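The event-driven pipeline work described above follows a common dispatch pattern: an event arrives from a source (a blob landing, a queue message), is routed to a source-specific handler, and the normalized record is loaded to a sink. The sketch below is plain Python with invented names; a real deployment would use the Azure Functions trigger bindings and write to Delta Lake rather than an in-memory list.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Event:
    source: str    # e.g., a blob container or queue name (illustrative)
    payload: dict

class Pipeline:
    """Minimal event router: each source registers its own ingestion handler."""

    def __init__(self):
        self.handlers: Dict[str, Callable[[dict], dict]] = {}
        self.sink: List[dict] = []  # stand-in for a Delta table / warehouse load

    def register(self, source: str, handler: Callable[[dict], dict]) -> None:
        self.handlers[source] = handler

    def dispatch(self, event: Event) -> None:
        handler = self.handlers.get(event.source)
        if handler is None:
            raise KeyError(f"no handler registered for source {event.source!r}")
        self.sink.append(handler(event.payload))

pipeline = Pipeline()
# Handler normalizes types and fills a default currency
pipeline.register(
    "sales-blob",
    lambda p: {"amount": float(p["amount"]), "currency": p.get("currency", "USD")},
)
pipeline.dispatch(Event("sales-blob", {"amount": "19.99"}))
print(pipeline.sink)
```

Keeping handlers per-source makes it straightforward to add new feeds into a heterogeneous ecosystem without touching existing pipelines.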