The Mapbox Minsk R&D center (OOO MapData) enables navigation for people, packages, and vehicles. Our underlying maps represent the ever-evolving world, built by accessing, aggregating, and adapting anonymous data from millions of sensors and phones in real time.
What We Do
The Geodata team normalizes and conflates multiple data sources into consumable, high-quality data layers, such as the road network, points of interest (POIs), and buildings, for internal and external customers. We ensure high quality by maintaining a standardized specification for all map layers that our internal customers, such as the Navigation, Maps, and Search divisions, rely on. These layers live in a data warehouse that is updated daily, where they are preprocessed, filtered, conflated, and transformed into different formats for their respective consumers.
We are built on top of AWS. We use Airflow for job orchestration and automate our tasks with Lambda, ECS, and PySpark applications on Qubole. We store our data in S3 and are heavy users of Hive. We use Amazon Athena to measure and report our operational and quality metrics. We are also building our own internal tools to accelerate our workflows.
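For a flavor of how these pieces fit together, here is a minimal Airflow sketch of the kind of daily job this stack implies; the DAG id, task id, and job body are hypothetical, and a real deployment would submit the PySpark work to Qubole or ECS rather than run it inline:

```python
# Illustrative only: a daily Airflow DAG in the spirit of the stack above.
# The DAG id, task id, and job body are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def run_conflation(**context):
    # Placeholder for submitting a PySpark conflation job (e.g. to Qubole
    # or ECS); here we only log the logical date of the run.
    print(f"Submitting conflation job for {context['ds']}")


with DAG(
    dag_id="geodata_daily_refresh",     # hypothetical name
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",         # the warehouse updates daily
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    PythonOperator(
        task_id="conflate_road_network",
        python_callable=run_conflation,
    )
```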
What You’ll Do
- Work with many geospatial datasets, specifically road networks, buildings, POIs, and address data
- Implement distributed pipelines using Airflow and Spark to process geospatial data (see the sketch after this list)
- Integrate third-party data sources from different geographic areas into the basemap
- Interface with engineers from other teams to analyze their needs for geospatial data and solve their data problems
- Implement automated quality metrics to ensure we continuously deliver high-quality data to our customers
- Participate in design and code reviews
- Mentor other software developers across all aspects of their engineering skill sets
- Create new data products by aggregating proprietary sources and derived data from sensors and aerial imagery
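To make the pipeline item above concrete, here is a minimal PySpark sketch of one geospatial processing step: filtering a POI layer to a bounding box and writing it back to S3. The bucket, paths, column names (`lon`, `lat`, `category`), and coordinates are hypothetical:

```python
# Illustrative only: filter a hypothetical POI dataset to a bounding box
# and write a partitioned layer back to S3 for downstream consumers.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("poi-bbox-filter").getOrCreate()

# Hypothetical POI layer with plain longitude/latitude columns.
pois = spark.read.parquet("s3://example-bucket/layers/pois/")

# Keep only POIs inside a bounding box (roughly Minsk).
minsk_pois = pois.where(
    F.col("lon").between(27.4, 27.7) & F.col("lat").between(53.8, 54.0)
)

# Partition by category so consumers can read only what they need.
minsk_pois.write.mode("overwrite").partitionBy("category").parquet(
    "s3://example-bucket/layers/pois_minsk/"
)
```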
What We Believe Are Important Traits for This Role
- 3+ years of professional software development experience
- Experience with AWS or another cloud provider
- Proficiency in at least one modern programming language (Python, Scala, Java, …) suitable for data processing
- Proficiency in a query language like SQL
- Strong experience with data processing and the judgment to implement new data pipelines and develop best practices around them
- Familiarity with Spark or other Hadoop-based technologies
- Familiarity with CI/CD processes
- Familiarity with processing and normalizing many different datasets into a single coherent product
- Ability to communicate complex concepts to both peers and leadership; strong verbal and written communication skills
- Experience introducing quality and operational metrics into an ETL pipeline
- High-performing team player who can build consensus
- Ability to deliver key results quickly and resolve ambiguity in the customer's favor
- Ability and willingness to pivot quickly to new languages, skills, and techniques
- Experience with geospatial data analysis and processing is a plus
- Experience with machine learning is a plus