Null-safe join with eqNullSafe
Standard equality drops rows whose key is NULL on either side, because NULL == NULL evaluates to NULL rather than true; eqNullSafe treats two NULLs as equal, which is essential for nullable composite keys.
Requirements
PySpark 3.x
Python
from pyspark.sql import functions as F
# Assumes `old` and `new` are DataFrames with nullable key columns region and segment.
# Standard join: rows where region is NULL NEVER match
joined_lossy = old.join(new, old["region"] == new["region"])
# Null-safe: NULL <=> NULL is true
joined_safe = old.alias("o").join(
    new.alias("n"),
    F.col("o.region").eqNullSafe(F.col("n.region"))
    & F.col("o.segment").eqNullSafe(F.col("n.segment")),
    "full_outer",
)
# SQL equivalent: ON o.region <=> n.region AND o.segment <=> n.segment
Result
>>> joined_safe.select("o.region", "o.segment", "n.amount").show()
+------+-------+------+
|region|segment|amount|
+------+-------+------+
|  EMEA|    SMB|482.50|
|  null|    ENT|310.00|
|  null|   null| 75.25|
+------+-------+------+
Standard join: 1 row; eqNullSafe: 3 rows (the NULLs match)
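The difference between the two comparisons can also be modeled without Spark. The following is a minimal plain-Python sketch, not Spark's implementation: `sql_eq`, `null_safe_eq`, `inner_join`, and the sample `keys` are hypothetical names and data introduced only for illustration, with `None` standing in for NULL.

```python
from typing import Optional


def sql_eq(a, b) -> Optional[bool]:
    """Model of SQL `=`: any NULL operand makes the result NULL (None)."""
    if a is None or b is None:
        return None
    return a == b


def null_safe_eq(a, b) -> bool:
    """Model of SQL `<=>` (eqNullSafe): two NULLs are equal, result is never NULL."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def inner_join(left, right, eq):
    """Nested-loop inner join: keep pairs whose key columns all compare as true."""
    return [
        (lrow, rrow)
        for lrow in left
        for rrow in right
        if all(eq(x, y) is True for x, y in zip(lrow, rrow))
    ]


# Hypothetical composite keys (region, segment), chosen for illustration only.
keys = [("EMEA", "SMB"), (None, "ENT"), (None, None)]

lossy = inner_join(keys, keys, sql_eq)       # only the fully non-NULL key matches
safe = inner_join(keys, keys, null_safe_eq)  # every key matches itself
print(len(lossy), len(safe))
```

A join condition keeps a pair only when it evaluates to true, so a NULL result from `sql_eq` discards the pair just as a false result would. That is why the nullable keys vanish under standard equality and survive under the null-safe comparison.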