更新时间:2021-08-14 07:58:47
class pyspark.mllib.tree.RandomForest[source]
Learning algorithm for a random forest model for classification or regression.
New in version 1.2.0.
Train a random forest model for binary or multiclass classification.
Parameters: |
|
---|---|
Returns: |
RandomForestModel that can be used for prediction. |
Example usage:
>>> from pyspark.mllib.regression import LabeledPoint
>>> from pyspark.mllib.tree import RandomForest
>>>
>>> data = [
... LabeledPoint(0.0, [0.0]),
... LabeledPoint(0.0, [1.0]),
... LabeledPoint(1.0, [2.0]),
... LabeledPoint(1.0, [3.0])
... ]
>>> model = RandomForest.trainClassifier(sc.parallelize(data), 2, {}, 3, seed=42)
>>> model.numTrees()
3
>>> model.totalNumNodes()
7
>>> print(model)
TreeEnsembleModel classifier with 3 trees
>>> print(model.toDebugString())
TreeEnsembleModel classifier with 3 trees
Tree 0:
Predict: 1.0
Tree 1:
If (feature 0 <= 1.0)
Predict: 0.0
Else (feature 0 > 1.0)
Predict: 1.0
Tree 2:
If (feature 0 <= 1.0)
Predict: 0.0
Else (feature 0 > 1.0)
Predict: 1.0
>>> model.predict([2.0])
1.0
>>> model.predict([0.0])
0.0
>>> rdd = sc.parallelize([[3.0], [1.0]])
>>> model.predict(rdd).collect()
[1.0, 0.0]
New in version 1.2.0.
摘自:https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html#pyspark.mllib.tree.DecisionTree
本文转自张昺华-sky博客园博客,原文链接:http://www.cnblogs.com/bonelee/p/7150484.html
,如需转载请自行联系原作者