GLIDEFINDER
Background
Forest fires are a natural disaster that we cannot control. However, modern technologies give us the opportunity to predict them and minimize the consequences for both individuals and businesses. We want to share our experience working on GlideFinder, a platform that uses images from NASA satellites to analyze wildfire information in real time and alert people nearby.
It all started when we received an email from Dmitry Kryuk, Founder and CTO of GlideFinder, who was looking for experienced Google Cloud developers. He wanted us to improve the product by changing its architecture. In particular, he needed to integrate online data analysis tools and connect additional NASA satellites for more accurate wildfire information.
How it works
GlideFinder is a platform that locates wildfires, alerts subscribers, and provides timely analysis. GlideFinder was originally developed for the California area, where fires are particularly dangerous and annually destroy hundreds of homes, leading to billions of dollars in damage, business bankruptcies, and many deaths. Now, the company wants to help people around the world reduce the consequences of wildfires.
To detect wildfires, GlideFinder uses data from:
- NASA’s Suomi NPP (Suomi National Polar-orbiting Partnership) satellite
- The VIIRS (Visible Infrared Imaging Radiometer Suite) instrument aboard Suomi NPP
To make predictions about the spread of wildfires, the product analyzes data from:
- NASA’s historical MODIS/VIIRS fire data collected over the previous 17 years
- United States Fire Administration database
- US Census Bureau data
When GlideFinder detects a forest fire, it warns users via SMS. Users can also track the fire’s movement, direction, speed, and size on a real-time map on the GlideFinder website.
GlideFinder users can also see how far a wildfire is from their home, their children’s school, their parents’ home, and their office. Most of the data analysis runs on Google Cloud Platform, which reduces latency and provides all the necessary data analysis tools.
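The article does not say how GlideFinder computes these distances. A common approach, assumed here, is the haversine formula on a spherical Earth. The minimal sketch below measures the distance from one fire detection to each of a user's saved places; the place names and coordinates are hypothetical.

```python
import math
from typing import Dict, Tuple

# Mean Earth radius in kilometres (spherical approximation; assumption).
EARTH_RADIUS_KM = 6371.0


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to 1.0 so floating-point error never pushes asin out of its domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def distances_to_places(
    fire: Tuple[float, float], places: Dict[str, Tuple[float, float]]
) -> Dict[str, float]:
    """Distance from a fire detection to each saved place, nearest first."""
    dists = {name: haversine_km(fire, loc) for name, loc in places.items()}
    return dict(sorted(dists.items(), key=lambda kv: kv[1]))


# Hypothetical saved places for one user.
places = {"home": (34.05, -118.25), "school": (34.10, -118.30)}
print(distances_to_places((34.05, -118.25), places))
```

A spherical model is usually accurate enough for alert radii of a few kilometres; an ellipsoidal formula would be more precise at the cost of more computation.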
Customer requirements
Dmitry needed a team to upgrade the product, with the following requirements:
- Integrated streaming data analytics. Since the project’s original architecture was based on batch data processing, it had to be upgraded so that the product could analyze data in real time.
- Integration of additional Geostationary Operational Environmental Satellites (GOES-16 and GOES-17), which scan every 5-15 minutes, so the platform can receive more up-to-date and detailed information about forest fires.
Additionally, the platform had to consolidate satellite data into a single format, which required configuring data preparation with geospatial transformations.
Our solutions and obstacles
To integrate streaming data analytics, we developed an ETL (extract, transform, load) pipeline built on Google Cloud components such as Dataflow, Pub/Sub, Cloud Functions, and BigQuery. We also created materialized views containing the required business logic and use a Dataflow job to process that data and write it to Cloud Storage as JSON files and to Cloud Firestore.
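The transform step of such a pipeline can be kept as a plain function so it is easy to test outside Dataflow. The sketch below assumes a hypothetical JSON message schema (`latitude`, `longitude`, `confidence`, `acq_time`) and a hypothetical confidence threshold; the project's actual business logic lives in its materialized views and is not shown in the article.

```python
import json
from typing import Optional

# Hypothetical threshold; the real rules are part of the project's business logic.
MIN_CONFIDENCE = 50


def transform_detection(payload: bytes) -> Optional[dict]:
    """Parse one Pub/Sub message body and keep only detections worth publishing.

    Field names (latitude, longitude, confidence, acq_time) are hypothetical.
    """
    record = json.loads(payload.decode("utf-8"))
    if record.get("confidence", 0) < MIN_CONFIDENCE:
        return None  # drop low-confidence detections
    return {
        "lat": round(float(record["latitude"]), 4),
        "lon": round(float(record["longitude"]), 4),
        "confidence": int(record["confidence"]),
        "detected_at": record["acq_time"],
    }


def to_json_line(row: dict) -> str:
    """Serialize one output row as a line of newline-delimited JSON."""
    return json.dumps(row, sort_keys=True)
```

In an Apache Beam pipeline, a function like this could be applied with `beam.Map(transform_detection)` followed by `beam.Filter(lambda r: r is not None)`, with `to_json_line` used before writing files to Cloud Storage.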
For the streaming integration, we had to use the Python SDK for Dataflow (Apache Beam), because the entire architecture is written in Python. Since the Python SDK is less thoroughly documented than the Java SDK (which we use more often), we spent a significant amount of time on research.
To add the Geostationary Operational Environmental Satellites, whose data is openly available, we had to dig deeper into working with geographic data and do further research. We take the images received from the satellites and divide them into geographic regions using geospatial transformations.
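The article does not specify how the regions are defined. One simple assumption is to cut each image into fixed-size pixel tiles; combined with the image's geotransform, each tile then corresponds to a geographic bounding box. The tile size below is a hypothetical parameter.

```python
from typing import Iterator, Tuple

Window = Tuple[int, int, int, int]  # (xoff, yoff, xsize, ysize) in pixels


def tile_windows(width: int, height: int, tile: int) -> Iterator[Window]:
    """Yield pixel windows that cover a width x height raster in square tiles.

    Edge tiles are clipped so no window extends past the raster.
    """
    if tile <= 0:
        raise ValueError("tile size must be positive")
    for yoff in range(0, height, tile):
        for xoff in range(0, width, tile):
            yield (xoff, yoff, min(tile, width - xoff), min(tile, height - yoff))


# Example: a 10 x 7 pixel image cut into (hypothetical) 4-pixel tiles.
for window in tile_windows(10, 7, 4):
    print(window)
```

Each window's `(xoff, yoff, xsize, ysize)` matches the arguments of GDAL's `Band.ReadAsArray(xoff, yoff, win_xsize, win_ysize)`, so a tile can be read without loading the whole image into memory.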
At this stage, we also used the Geospatial Data Abstraction Library (GDAL), a software library for reading and writing raster and vector geospatial data formats. It took us about four months to complete all these tasks.
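GDAL describes how raster pixels map to coordinates with a six-value affine geotransform, returned by `Dataset.GetGeoTransform()`. The sketch below reimplements that mapping and its inverse in plain Python to show what the transformation does; it is illustrative, not the project's code, and the example geotransform values are hypothetical. GOES imagery uses a geostationary projection, so aligning it with VIIRS would also need a reprojection step (for example with `gdal.Warp`), which this sketch does not cover.

```python
from typing import Sequence, Tuple

# GDAL order: (x_origin, pixel_width, row_rotation, y_origin, col_rotation, pixel_height)
GeoTransform = Sequence[float]


def pixel_to_geo(gt: GeoTransform, col: float, row: float) -> Tuple[float, float]:
    """Map a (col, row) pixel position to (x, y) coordinates."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


def geo_to_pixel(gt: GeoTransform, x: float, y: float) -> Tuple[float, float]:
    """Invert the geotransform with a general 2x2 inverse (rotated rasters work too)."""
    det = gt[1] * gt[5] - gt[2] * gt[4]
    if det == 0:
        raise ValueError("geotransform is not invertible")
    dx, dy = x - gt[0], y - gt[3]
    col = (gt[5] * dx - gt[2] * dy) / det
    row = (-gt[4] * dx + gt[1] * dy) / det
    return col, row


# Hypothetical north-up grid: 0.01-degree pixels starting at (-120, 40).
gt = (-120.0, 0.01, 0.0, 40.0, 0.0, -0.01)
print(pixel_to_geo(gt, 100, 200))
```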
How did we do it?
For this project, we created new Google Cloud Storage buckets to hold the project’s data. Then, to upgrade the product’s infrastructure, we gradually migrated the batch data components to the streaming infrastructure.
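A key difference after such a migration is that data is processed continuously and grouped by event time rather than in scheduled batches. The sketch below shows fixed (tumbling) windows, the same idea as Beam's `FixedWindows`, in plain Python. The 5-minute window size is a hypothetical choice matching the fastest GOES scan interval given in the requirements.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[float, str]  # (event time in epoch seconds, payload)


def fixed_windows(events: Iterable[Event], size_s: int) -> Dict[int, List[str]]:
    """Group events into non-overlapping windows of size_s seconds by event time.

    Each event lands in the window that starts at t - (t mod size_s).
    """
    windows: Dict[int, List[str]] = defaultdict(list)
    for ts, payload in events:
        start = int(ts // size_s) * size_s
        windows[start].append(payload)
    return dict(windows)


# Hypothetical events; 300 s = 5-minute windows.
print(fixed_windows([(0, "a"), (100, "b"), (299.9, "c"), (300, "d")], 300))
```

A real streaming pipeline also has to handle late-arriving data (watermarks and triggers in Beam), which this sketch leaves out.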
Integrating GOES-16 and GOES-17 involved the following steps:
- Contacted National Oceanic and Atmospheric Administration (NOAA) engineers and asked them to mirror GOES satellite data into GCS buckets
- Connected the GlideFinder ETL system to GOES GCS notifications
- Processed historical GOES data
- Stored historical GOES data in a BigQuery dataset
- Added Dataflow stream processing for GOES data
- Added geospatial transformations to align GOES data with previously collected satellite data (such as VIIRS)
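Cloud Storage can publish a Pub/Sub notification when an object is created; the message attributes include `eventType`, `bucketId`, and `objectId`. Below is a minimal sketch of how an ETL step might turn such a notification into a scan record. The file-name pattern reflects NOAA's ABI naming convention as we understand it (`s` followed by year, day of year, hour, minute, second, and tenths of a second) and should be treated as an assumption to verify against the mirrored objects.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

# Assumed ABI file-name convention, e.g.
# OR_ABI-L2-FDCF-M6_G16_s20192441800213_e..._c....nc
_GOES_NAME = re.compile(
    r"OR_(?P<product>ABI-[^_]+?)-M(?P<mode>\d)(?:C(?P<channel>\d{2}))?"
    r"_(?P<sat>G\d{2})_s(?P<start>\d{14})"
)


def parse_goes_start(stamp: str) -> datetime:
    """Convert a YYYYDDDHHMMSSt scan-start stamp to a UTC datetime."""
    year, doy = int(stamp[0:4]), int(stamp[4:7])
    hh, mm, ss, tenth = int(stamp[7:9]), int(stamp[9:11]), int(stamp[11:13]), int(stamp[13])
    return datetime(year, 1, 1, tzinfo=timezone.utc) + timedelta(
        days=doy - 1, hours=hh, minutes=mm, seconds=ss, milliseconds=100 * tenth
    )


def handle_notification(attributes: Dict[str, str]) -> Optional[dict]:
    """Turn Cloud Storage notification attributes into a scan record.

    Returns None for events other than new objects, or for file names that
    do not match the assumed convention.
    """
    if attributes.get("eventType") != "OBJECT_FINALIZE":
        return None
    name = attributes.get("objectId", "").rsplit("/", 1)[-1]
    m = _GOES_NAME.match(name)
    if m is None:
        return None
    return {
        "bucket": attributes.get("bucketId"),
        "object": attributes["objectId"],
        "satellite": m.group("sat"),
        "product": m.group("product"),
        "scan_start": parse_goes_start(m.group("start")),
    }
```

Keeping the parsing separate from the Pub/Sub subscription makes it easy to reuse the same logic for both the historical backfill and the live stream.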
[Figure: Storage system]
Team composition
- 1 Project Manager
- 2 Data Scientists
Result
We’ve given GlideFinder a new infrastructure suited to real-time data analysis. We also integrated the GOES-16 and GOES-17 geostationary satellites, which scan every 5-15 minutes, so platform users get more up-to-date data about nearby forest fires. You can try the platform on the GlideFinder website.
We are proud to have participated in the development of GlideFinder: it gave us a unique opportunity not only to deepen our Google Cloud Platform expertise but also to help save lives.