Forward fill: propagate the last known value
Use last(ignorenulls=True) over a window with no lower bound (unbounded preceding up to the current row) to fill the gaps in a series: the distributed equivalent of pandas' ffill.
Requirements
PySpark 3.x
Python
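from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Per sensor, ordered by time, from the first row of the partition up to the current row.
w_ffill = (
    Window.partitionBy("sensor_id")
    .orderBy("reading_ts")
    .rowsBetween(Window.unboundedPreceding, Window.currentRow)
)

# df is the input DataFrame with sensor_id, reading_ts and temperature columns.
# last(..., ignorenulls=True) returns the most recent non-null temperature in the window.
df_filled = df.withColumn(
    "temperature_filled",
    F.last("temperature", ignorenulls=True).over(w_ffill),
)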
# Backward fill: first(ignorenulls=True) over the mirrored window
# .rowsBetween(Window.currentRow, Window.unboundedFollowing)
Result
+---------+-------------------+-----------+------------------+
|sensor_id|         reading_ts|temperature|temperature_filled|
+---------+-------------------+-----------+------------------+
|     S-01|2026-06-09 10:00:00|       21.4|              21.4|
|     S-01|2026-06-09 10:05:00|       null|              21.4|
|     S-01|2026-06-09 10:10:00|       null|              21.4|
|     S-01|2026-06-09 10:15:00|       22.1|              22.1|
+---------+-------------------+-----------+------------------+
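The window semantics can be checked by hand with a plain-Python sketch. It assumes the values of a single sensor_id partition, already sorted by reading_ts, with None standing in for null. The helper names ffill and bfill are illustrative and not part of PySpark.

```python
def ffill(values):
    """Forward fill: each position takes the last non-null value seen so far.

    Mirrors last(ignorenulls=True) over unboundedPreceding..currentRow.
    Leading nulls stay null because nothing precedes them.
    """
    filled, last_seen = [], None
    for v in values:
        if v is not None:
            last_seen = v
        filled.append(last_seen)
    return filled


def bfill(values):
    """Backward fill: forward fill applied to the reversed sequence.

    Mirrors first(ignorenulls=True) over currentRow..unboundedFollowing.
    """
    return ffill(list(values)[::-1])[::-1]


readings = [21.4, None, None, 22.1]  # temperature column for S-01
print(ffill(readings))  # [21.4, 21.4, 21.4, 22.1]
print(bfill(readings))  # [21.4, 22.1, 22.1, 22.1]
```

The forward-filled list matches the temperature_filled column above. With backward fill, the two gaps take 22.1 instead of 21.4.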
PySpark, Window, Forward fill, Imputation