Basic manipulations of geolocation data using Python

Let's get to work

pip install pandas
import pandas as pd
df = pd.read_csv('olist_geolocation_dataset.csv')
df.head()
df.info()
df.geolocation_state.value_counts()
df_agg = df.groupby(['geolocation_lat', 'geolocation_lng']).agg(
    uf=('geolocation_state', 'min'),
    n_pontos=('geolocation_lat', 'count')
).reset_index()
df_agg.head()
df_agg.shape
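
To make the aggregation step concrete, here is a minimal sketch on a toy DataFrame. The four rows are invented for illustration and are not taken from the Olist file. It follows the same named-aggregation pattern: one output row per unique coordinate pair, with the alphabetically first state label and the number of original rows. For simplicity the toy version counts the state column rather than the latitude column.

```python
import pandas as pd

# Toy data, invented for illustration only (not from the Olist dataset)
toy = pd.DataFrame({
    'geolocation_lat': [-9.65, -9.65, -9.65, -9.35],
    'geolocation_lng': [-35.71, -35.71, -35.71, -35.87],
    'geolocation_state': ['AL', 'AL', 'SE', 'AL'],
})

# One row per unique (lat, lng) pair
toy_agg = toy.groupby(['geolocation_lat', 'geolocation_lng']).agg(
    uf=('geolocation_state', 'min'),          # alphabetically first state label
    n_pontos=('geolocation_state', 'count'),  # number of original rows in the group
).reset_index()

print(toy_agg)
```

Note that the first coordinate pair carries two different state labels ('AL' and 'SE'), and 'min' keeps only 'AL'. Conflicting labels like this are one reason to check coordinates against the state border instead of trusting the state column.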

Keeping only data from the state of Alagoas

pip install shapely
import json
from shapely.geometry import Point, Polygon
with open('AL.json') as json_file:
    data_geo = json.load(json_file)
df_borders = pd.DataFrame(data_geo['borders'][0])
df_borders.head()
poligono = Polygon(zip(list(df_borders.lng), list(df_borders.lat)))
poligono
def within_polygon(lng, lat, polygon):
    point = Point(float(lng), float(lat))
    return point.within(polygon)
df_agg['localizado_no_poligono'] = df_agg.apply(lambda x: within_polygon(
    x.geolocation_lng, x.geolocation_lat, poligono), axis=1)
df_agg.head()
df_agg.localizado_no_poligono.value_counts()
pd.crosstab(df_agg.uf, df_agg.localizado_no_poligono).reset_index()
df_al = df_agg[df_agg.localizado_no_poligono]
df_al.shape
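
For intuition about what a point-in-polygon test does, here is a pure-Python sketch of the classic ray-casting idea. It is not shapely's implementation, and it treats points exactly on the border loosely. The function name `point_in_polygon` and the rectangle are hypothetical; the rectangle is a rough box, not the real border of Alagoas. Vertices are given in (lng, lat) order, the same order used to build the polygon above.

```python
def point_in_polygon(lng, lat, vertices):
    """Ray casting: cast a horizontal ray to the right of the point and
    count how many polygon edges it crosses. An odd count means inside."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        # Does this edge straddle the horizontal line y = lat?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lng < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangle in (lng, lat) order, NOT the real border of Alagoas
box = [(-38.0, -10.5), (-35.0, -10.5), (-35.0, -8.5), (-38.0, -8.5)]

in_box = point_in_polygon(-35.709190, -9.647449, box)  # Maceió's coordinates
out_box = point_in_polygon(-46.63, -23.55, box)        # a point far to the south

print(in_box, out_box)
```

In practice, shapely's `within` remains the better choice: it handles holes, multipolygons, and edge cases that this sketch ignores.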

Plotting the points on the map of Alagoas

pip install seaborn
pip install geopandas
import geopandas as gpd
import os
import seaborn as sns
import matplotlib.pyplot as plt
shape_path = os.path.join('al_municipios', 'AL_Municipios_2019.shp')
shape_al = gpd.read_file(shape_path)
fig, ax = plt.subplots(figsize=(15,8))
shape_al.plot(ax=ax, color='lightgray')
fig, ax = plt.subplots(figsize=(15,8))
shape_al.plot(ax=ax, color='lightgray')
sns.scatterplot(data=df_al,
                x='geolocation_lng',
                y='geolocation_lat',
                size='n_pontos',
                sizes=(50, 500),
                alpha=0.3,
                ax=ax)
Source: Wikipedia
pip install haversine
from haversine import haversine, Unit, haversine_vector
maceio = (-9.647449, -35.709190)
itamaraca = (-9.345859, -35.865804)
haversine(maceio, itamaraca)
haversine(maceio, itamaraca, unit=Unit.METERS)
haversine(maceio, itamaraca, unit=Unit.MILES)
haversine(maceio, itamaraca, Unit.NAUTICAL_MILES)
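
To show what sits behind those calls, here is a minimal sketch of the haversine formula using only the standard library. The function name `haversine_km` is hypothetical, and the mean Earth radius of 6371.0088 km is an assumption, so results may differ slightly from the package if it uses a different constant. Points are (lat, lng) tuples in degrees, as above.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def haversine_km(p1, p2):
    """Great-circle distance in km between two (lat, lng) points in degrees."""
    lat1, lng1 = map(math.radians, p1)
    lat2, lng2 = map(math.radians, p2)
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2)
    # min() guards against rounding pushing sqrt(a) slightly above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


maceio = (-9.647449, -35.709190)
itamaraca = (-9.345859, -35.865804)

d_km = haversine_km(maceio, itamaraca)
print(d_km)          # kilometers
print(d_km * 1000)   # meters
```

The other units are plain conversions of the same great-circle distance, which is why the package only needs a `unit` argument.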
df_al_10 = df_al.head(10).reset_index(drop=True)
df_al_10
dij = df_al_10[['geolocation_lat', 'geolocation_lng']]
dij = [tuple(x) for x in dij.to_numpy()]
dij = haversine_vector(dij, dij, Unit.KILOMETERS, comb=True)
pd.DataFrame(dij).head(10)
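
The result is a square matrix in which entry (i, j) holds the distance between point i and point j. As a rough illustration of how such a matrix can be computed in one pass, here is a sketch using NumPy broadcasting. It is not the package's implementation. The function name `pairwise_haversine_km`, the Earth radius constant, and the third point in the example are assumptions made for illustration.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def pairwise_haversine_km(points):
    """Return an (n, n) matrix of great-circle distances in km
    for a list of (lat, lng) points given in degrees."""
    pts = np.radians(np.asarray(points, dtype=float))
    lat = pts[:, 0][:, None]  # column vector, shape (n, 1)
    lng = pts[:, 1][:, None]
    dlat = lat - lat.T        # broadcasting yields every pairwise difference
    dlng = lng - lng.T
    a = np.sin(dlat / 2) ** 2 + np.cos(lat) * np.cos(lat.T) * np.sin(dlng / 2) ** 2
    a = np.clip(a, 0.0, 1.0)  # guard against rounding outside [0, 1]
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


points = [
    (-9.647449, -35.709190),  # maceio, from the example above
    (-9.345859, -35.865804),  # itamaraca, from the example above
    (-9.75, -36.66),          # arbitrary third point, for illustration only
]

dist = pairwise_haversine_km(points)
print(dist.round(2))
```

The matrix is symmetric with zeros on the diagonal, so for large inputs only the upper triangle needs to be stored.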

Conclusion

https://acsjunior.com

António C. da Silva Júnior
