Spark

Null-safe join with eqNullSafe

Standard equality drops rows whose join key is NULL on either side, because NULL == NULL evaluates to NULL rather than true. eqNullSafe treats two NULLs as equal and pairs those rows, which is essential for nullable composite keys.
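The difference between the two operators can be sketched in plain Python, using None for NULL. This is an illustrative model of the comparison semantics only, not Spark's implementation; the helper names `sql_eq`, `sql_eq_null_safe` and `join_keeps` are hypothetical.

```python
from typing import Any, Optional


def sql_eq(a: Any, b: Any) -> Optional[bool]:
    """Model of standard `==`: the result is NULL (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b


def sql_eq_null_safe(a: Any, b: Any) -> bool:
    """Model of `<=>` / eqNullSafe: always True or False, never NULL."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def join_keeps(condition: Optional[bool]) -> bool:
    """A join keeps a pair of rows only when the condition is exactly True."""
    return condition is True


print(join_keeps(sql_eq(None, None)))            # False: NULL == NULL is NULL
print(join_keeps(sql_eq_null_safe(None, None)))  # True: NULL <=> NULL is true
```

Because a NULL condition is treated like false by the join, the standard comparison silently discards every pair in which a key is NULL.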

Requirements

PySpark 3.x

Python
from pyspark.sql import functions as F

# Standard join: rows where region is NULL NEVER match
joined_lossy = old.join(new, old["region"] == new["region"])

# Null-safe: NULL <=> NULL is true
joined_safe = old.alias("o").join(
    new.alias("n"),
    F.col("o.region").eqNullSafe(F.col("n.region"))
    & F.col("o.segment").eqNullSafe(F.col("n.segment")),
    "full_outer",
)
# SQL equivalent: ON o.region <=> n.region AND o.segment <=> n.segment

Result

>>> joined_safe.select("o.region", "o.segment", "n.amount").show()
+------+-------+------+
|region|segment|amount|
+------+-------+------+
|  EMEA|    SMB|482.50|
|  null|    ENT|310.00|
|  null|   null| 75.25|
+------+-------+------+

Standard join: 1 row; eqNullSafe: 3 rows (the NULLs match)
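The row counts can be reproduced without a Spark session by simulating an inner join in plain Python. This is a rough sketch under assumptions: the `old_keys` rows are hypothetical and simply mirror the keys in the result table, both key columns are compared under each predicate, and the helper names (`eq`, `eq_null_safe`, `inner_join`) are not part of any Spark API.

```python
# Keys on the "old" side (hypothetical) and rows on the "new" side (region, segment, amount).
old_keys = [("EMEA", "SMB"), (None, "ENT"), (None, None)]
new_rows = [
    ("EMEA", "SMB", "482.50"),
    (None, "ENT", "310.00"),
    (None, None, "75.25"),
]


def eq(a, b):
    """Standard equality: NULL (None) when either operand is NULL."""
    return None if a is None or b is None else a == b


def eq_null_safe(a, b):
    """Null-safe equality: two NULLs compare as equal."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def inner_join(left, right, pred):
    """Nested-loop join on (region, segment); keep a pair only if every key comparison is True."""
    return [
        (l, r)
        for l in left
        for r in right
        if all(pred(lk, rk) is True for lk, rk in zip(l, r[:2]))
    ]


standard = inner_join(old_keys, new_rows, eq)
null_safe = inner_join(old_keys, new_rows, eq_null_safe)
print(len(standard), len(null_safe))  # 1 3
```

Only the EMEA/SMB pair survives the standard comparison, while the null-safe predicate also pairs the two rows whose keys contain NULL.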
