UAP Data Analysis On 80,000 Sightings

Don Tadaya uses the 80,000-record NUFORC UAP database to demonstrate how data science can be applied to trend analysis and to identifying hidden relationships in anomalous datasets. Don is a Software Engineer & Data Scientist with experience at Northrop Grumman, DaScient Capital, and the National Security Agency. He has a BA in Mathematics & Philosophy from Arizona State University and certifications in Python, machine learning, Java, and privacy from the University of Maryland, Stanford, and MIT.

Don discusses his work analyzing over 80,000 UFO reports from the National UFO Reporting Center (NUFORC) using Python and Kaggle. The analysis, detailed in a publicly accessible Kaggle notebook, involved cleaning and standardizing the data, addressing inconsistencies across decades of reports. Key data points included location (latitude/longitude), shape, date, comment length, and duration. Natural Language Processing (NLP) techniques were used to analyze comments, assigning a “veracity” score based on Robert Powell’s methodology. The resulting visualizations, including interactive 3D plots, reveal interesting trends, such as a decline in “flying disc” sightings and an increase in “sphere” and “light” sightings.

The analysis highlights the potential of applying these data analysis techniques to other anomalous datasets, emphasizing the need to overcome data silos and leverage readily available tools like Kaggle and Python. Future work includes incorporating machine learning to improve classification (e.g., distinguishing balloons from anomalous events) and expanding the analysis beyond US data. The project, completed in approximately eight days, demonstrates the power of data science in analyzing complex and potentially controversial datasets.

The Power of Python and Kaggle

The podcast delves into the practical side of Don’s analysis, highlighting the role of Python and the Kaggle platform. Kaggle, a collaborative data science platform, served as the working environment for the project. Its accessibility makes it useful for beginners and experienced data scientists alike and encourages collaboration. The podcast shows how Don used Kaggle to import, clean, and analyze the NUFORC data and to build interactive visualizations from it.

Data Cleaning: The Foundation of Insight

A significant portion of the project involved data cleaning. The NUFORC dataset spans decades and was collected under inconsistent reporting standards. Don cleaned it by removing records with null values and standardizing inconsistent entries. This step took approximately 60% of the total project time and was essential for accurate trend analysis. Cleaning reduced the initial 80,000 records to 66,516 observations.
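The cleaning step described above (drop records with null values, standardize inconsistent entries) can be sketched with the Python standard library. This is a minimal illustration, not the notebook's code: the column names, the timestamp format, and the `SHAPE_ALIASES` map are hypothetical assumptions, and the sample rows are invented.

```python
import csv
import io
from datetime import datetime

# Hypothetical column names, loosely modeled on public NUFORC exports;
# adjust them to the header of the file you actually load.
REQUIRED = ("datetime", "city", "state", "shape", "latitude", "longitude")

# Hypothetical normalization map for inconsistent shape labels.
SHAPE_ALIASES = {"disc": "disk", "orb": "sphere", "lights": "light"}


def clean_rows(csv_text):
    """Drop records with null fields and standardize the remaining values."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Treat missing columns, empty strings, and whitespace as null.
        if any(not (row.get(col) or "").strip() for col in REQUIRED):
            continue
        try:
            # Assumed timestamp format, e.g. "10/10/1949 20:30".
            seen = datetime.strptime(row["datetime"].strip(), "%m/%d/%Y %H:%M")
            lat = float(row["latitude"])
            lon = float(row["longitude"])
        except ValueError:
            continue  # unparseable date or coordinate: drop the record
        shape = row["shape"].strip().lower()
        cleaned.append({
            "datetime": seen,
            "city": row["city"].strip().title(),
            "state": row["state"].strip().upper(),
            "shape": SHAPE_ALIASES.get(shape, shape),
            "latitude": lat,
            "longitude": lon,
            "comments": (row.get("comments") or "").strip(),
        })
    return cleaned


# Invented sample rows: the second has no shape and is dropped.
sample = """datetime,city,state,shape,latitude,longitude,comments
10/10/1949 20:30,springfield,il,Disc,39.78,-89.65,Silver object hovering
07/04/1976 21:00,dayton,oh,,39.76,-84.19,No shape given
"""
rows = clean_rows(sample)
print(len(rows), rows[0]["shape"], rows[0]["city"])  # 1 disk Springfield
```

Dropping a record on the first unparseable field is the simplest policy; a real pipeline might instead repair common variants before discarding anything.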

Visualizing the Unknown: Interactive Data Exploration

The cleaned dataset was then used to create a series of compelling visualizations on Kaggle. These visualizations, accessible to the public without a login, allow users to explore the data interactively. Key features include:

  • Interactive 3D Plot: This plot visualizes sightings over time, using latitude, longitude, and date, with comment length determining bubble size. Users can filter by shape, allowing for detailed exploration of specific UAP types.
  • Dynamic Charts: These charts explore various relationships within the data, such as sighting duration by city, shape by state, and more. These dynamic visualizations provide a powerful way to identify trends and patterns.
  • Top 20 US Cities: The analysis revealed Seattle as a hotspot for reported sightings, surpassing even major cities like Phoenix, Las Vegas, and Los Angeles.
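The visualizations listed above rest on three simple operations: building (longitude, latitude, date, size) points for a 3D bubble plot with an optional shape filter, cross-tabulating two fields such as shape by state, and ranking cities by report count. The sketch below assumes records in a cleaned dictionary format; the helper names are hypothetical and the records are invented toy data.

```python
from collections import Counter, defaultdict
from datetime import datetime


def bubble_points(records, shape=None):
    """(longitude, latitude, date, bubble size) tuples for a 3D bubble plot.

    Bubble size is the comment length; `shape` optionally filters to one
    shape, mirroring the plot's shape selector.
    """
    return [
        (r["longitude"], r["latitude"], r["datetime"], len(r["comments"]))
        for r in records
        if shape is None or r["shape"] == shape
    ]


def counts_by(records, key, value):
    """Cross-tabulate two fields, e.g. shape counts per state."""
    table = defaultdict(Counter)
    for r in records:
        table[r[key]][r[value]] += 1
    return table


def top_cities(records, n=20):
    """Rank (city, state) pairs by number of reports."""
    return Counter((r["city"], r["state"]) for r in records).most_common(n)


# Invented toy records in the cleaned format assumed here.
records = [
    {"datetime": datetime(1999, 7, 4, 22, 0), "city": "Seattle", "state": "WA",
     "shape": "light", "latitude": 47.61, "longitude": -122.33,
     "comments": "Three lights moving north"},
    {"datetime": datetime(2005, 1, 2, 19, 30), "city": "Seattle", "state": "WA",
     "shape": "sphere", "latitude": 47.61, "longitude": -122.33,
     "comments": "Orange sphere"},
    {"datetime": datetime(2010, 5, 9, 23, 15), "city": "Phoenix", "state": "AZ",
     "shape": "light", "latitude": 33.45, "longitude": -112.07,
     "comments": "Hovering light"},
]

print(len(bubble_points(records, shape="light")))  # 2
print(counts_by(records, "state", "shape")["WA"]["sphere"])  # 1
print(top_cities(records, n=2))  # [(('Seattle', 'WA'), 2), (('Phoenix', 'AZ'), 1)]
```

The points could then be handed to an interactive plotting library; Plotly's `plotly.express.scatter_3d`, for instance, accepts `x`, `y`, `z`, and `size` arguments, although the source does not say which library the notebook used.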

Beyond the Numbers: NLP and Veracity

The analysis went beyond geographic location and sighting duration. Don applied Natural Language Processing (NLP) techniques to extract additional information from the textual comments accompanying each report. This allowed him to assign each report a “veracity” score using a system developed by Robert Powell, which weighs scientific descriptions against more emotional or subjective accounts.
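The source does not describe the scoring formula, so the following is only a hypothetical lexicon-based sketch of the general idea: count descriptive, measurement-style words and emotional words in a comment and combine them into a score between -1 and 1. The word lists and the formula are assumptions for illustration and are not Robert Powell's published criteria.

```python
import re

# Hypothetical word lists and formula. They only illustrate the idea of
# weighing descriptive, measurement-style language against emotional language;
# they are NOT Robert Powell's published criteria.
SCIENTIFIC_TERMS = {
    "degrees", "azimuth", "elevation", "altitude", "seconds", "minutes",
    "north", "south", "east", "west", "heading", "binoculars", "estimated",
}
EMOTIONAL_TERMS = {
    "scared", "terrified", "amazing", "awesome", "unbelievable",
    "freaked", "crazy", "spooky", "omg",
}
WORD = re.compile(r"[a-z]+")


def veracity_score(comment):
    """Score in [-1, 1]: +1 purely descriptive, -1 purely emotional,
    0 when balanced or when no listed word appears."""
    words = WORD.findall(comment.lower())
    sci = sum(w in SCIENTIFIC_TERMS for w in words)
    emo = sum(w in EMOTIONAL_TERMS for w in words)
    if sci + emo == 0:
        return 0.0
    return (sci - emo) / (sci + emo)


print(veracity_score("Object at 30 degrees elevation heading north for 20 seconds"))  # 1.0
print(veracity_score("I was terrified, it was amazing!"))  # -1.0
print(veracity_score("Amazing light moving west"))  # 0.0
```

A production NLP pipeline would likely add lemmatization, negation handling, and weights per term; the fixed lists here just keep the mechanism visible.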

Unexpected Findings and Future Directions

The analysis revealed some intriguing trends. For example, there’s been a decline in reported “flying disc” sightings, while sightings of spheres and shining lights have increased. The analysis also highlighted the need for further investigation into specific geographical areas and the potential for misidentification of certain phenomena (e.g., Iridium flares).
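Because the number of reports varies a lot from decade to decade, a trend such as fewer disc reports and more sphere and light reports is easier to judge as a share of each decade's total than as raw counts. The sketch below assumes records carry a `datetime` and a normalized `shape` label; the function name and decade bucketing are illustrative choices, and the records are invented toy data, not NUFORC results.

```python
from collections import Counter, defaultdict
from datetime import datetime


def shape_share_by_decade(records, shapes):
    """For each decade, the fraction of all reports that name each shape."""
    totals = Counter()
    hits = defaultdict(Counter)
    for r in records:
        decade = r["datetime"].year // 10 * 10
        totals[decade] += 1
        if r["shape"] in shapes:
            hits[decade][r["shape"]] += 1
    # Divide by the decade total so shifts in overall volume do not mask trends.
    return {
        decade: {s: hits[decade][s] / totals[decade] for s in shapes}
        for decade in sorted(totals)
    }


# Invented toy records, not NUFORC results.
records = [
    {"datetime": datetime(y, 1, 1), "shape": s}
    for y, s in [(1975, "disk"), (1978, "disk"), (1979, "light"),
                 (2011, "light"), (2012, "sphere"), (2013, "light"), (2014, "disk")]
]
share = shape_share_by_decade(records, ("disk", "sphere", "light"))
print(share[1970]["disk"], share[2010]["light"])  # 0.6666666666666666 0.5
```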

The project is ongoing, and Don plans to expand the analysis to include data from other sources, such as NICAP’s aviation UAP reports. He also plans to incorporate machine learning techniques to improve the classification of sightings and to develop predictive models. His collaborator Kevin Colby, a former US Air Force tech sergeant, brings expertise in textual analysis that complements Don’s focus on visual analytics.

Conclusion: A New Era of UAP Analysis

Don Tadaya’s work is a significant step forward in the analysis of UAP data. By applying modern data science techniques, he has turned a vast, inconsistent dataset into a source of measurable trends. The project shows how data analysis can shed light on even the most enigmatic phenomena and opens up possibilities for future research in this field. Because the Kaggle notebook is public, anyone can explore the data and contribute to the investigation. The point extends beyond UFOs: the same techniques apply to a wide range of anomalous datasets and support a more data-driven approach to understanding the unknown.

May 29, 2023 · In Detection & Analytics