且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

[Spark][Python]DataFrame的左右连接例子

更新时间:2022-09-22 16:59:42

[Spark][Python]DataFrame的左右连接例子

$ hdfs dfs -cat people.json

 

{"name":"Alice","pcode":"94304"}
{"name":"Brayden","age":30,"pcode":"94304"}
{"name":"Carla","age":19,"pcoe":"10036"}
{"name":"Diana","age":46}
{"name":"Etienne","pcode":"94104"}

 

$ hdfs dfs -cat pcodes.json

{"pcode":"10036","city":"New York","state":"NY"}
{"pcode":"87501","city":"Santa Fe","state":"NM"}
{"pcode":"94304","city":"Palo Alto","state":"CA"}
{"pcode":"94104","city":"San Francisco","state":"CA"}

 

$pyspark

sqlContext = HiveContext(sc)
peopleDF = sqlContext.read.json("people.json")
peopleDF.limit(5).show()

 

[Spark][Python]DataFrame的左右连接例子
+----+-------+-----+-----+
| age| name|pcode| pcoe|
+----+-------+-----+-----+
|null| Alice|94304| null|
| 30|Brayden|94304| null|
| 19| Carla| null|10036|
| 46| Diana| null| null|
|null|Etienne|94104| null|
+----+-------+-----+-----+
[Spark][Python]DataFrame的左右连接例子

 

sqlContext = HiveContext(sc)
pcodesDF = sqlContext.read.json("pcodes.json")
pcodesDF.limit(5).show()

[Spark][Python]DataFrame的左右连接例子
+-------------+-----+-----+
| city|pcode|state|
+-------------+-----+-----+
| New York|10036| NY|
| Santa Fe|87501| NM|
| Palo Alto|94304| CA|
|San Francisco|94104| CA|
+-------------+-----+-----+
[Spark][Python]DataFrame的左右连接例子

 

mydf000 = peopleDF.join(pcodesDF,"pcode")
mydf000.limit(5).show()

 

[Spark][Python]DataFrame的左右连接例子
+-----+----+-------+----+-------------+-----+
|pcode| age| name|pcoe| city|state|
+-----+----+-------+----+-------------+-----+
|94304|null| Alice|null| Palo Alto| CA|
|94304| 30|Brayden|null| Palo Alto| CA|
|94104|null|Etienne|null|San Francisco| CA|
+-----+----+-------+----+-------------+-----+
[Spark][Python]DataFrame的左右连接例子

 

mydf001=peopleDF.join(pcodesDF,"pcode","leftsemi")
mydf001.limit(5).show()

 

[Spark][Python]DataFrame的左右连接例子
+-----+----+-------+----+
|pcode| age| name|pcoe|
+-----+----+-------+----+
|94304|null| Alice|null|
|94304| 30|Brayden|null|
|94104|null|Etienne|null|
+-----+----+-------+----+
[Spark][Python]DataFrame的左右连接例子

 

mydf002=peopleDF.join(pcodesDF,"pcode","left_outer")
mydf002.limit(5).show()

 

[Spark][Python]DataFrame的左右连接例子
+-----+----+-------+-----+-------------+-----+
|pcode| age| name| pcoe| city|state|
+-----+----+-------+-----+-------------+-----+
|94304|null| Alice| null| Palo Alto| CA|
|94304| 30|Brayden| null| Palo Alto| CA|
| null| 19| Carla|10036| null| null|
| null| 46| Diana| null| null| null|
|94104|null|Etienne| null|San Francisco| CA|
+-----+----+-------+-----+-------------+-----+
[Spark][Python]DataFrame的左右连接例子

 

mydf003=peopleDF.join(pcodesDF,"pcode","right_outer")
mydf003.limit(5).show()

 

[Spark][Python]DataFrame的左右连接例子
+-----+----+-------+----+-------------+-----+
|pcode| age| name|pcoe| city|state|
+-----+----+-------+----+-------------+-----+
|10036|null| null|null| New York| NY|
|87501|null| null|null| Santa Fe| NM|
|94304|null| Alice|null| Palo Alto| CA|
|94304| 30|Brayden|null| Palo Alto| CA|
|94104|null|Etienne|null|San Francisco| CA|
+-----+----+-------+----+-------------+-----+
[Spark][Python]DataFrame的左右连接例子



本文转自健哥的数据花园博客园博客,原文链接:http://www.cnblogs.com/gaojian/p/7633001.html,如需转载请自行联系原作者