Envoy Source Code Analysis: Stats Scope

Updated: 2022-09-30 13:59:56

Scope

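The previous article mentioned that Envoy creates Metrics through a Scope. Why introduce Scope at all? A Scope exists to manage a group of related stats. Cluster stats, for example, all have names that start with cluster., so a Scope can be created with the prefix cluster. and used to manage every cluster-related stat. Stats created through that Scope can omit the cluster. prefix from the name they store, which saves a considerable amount of memory. A Scope can also create child Scopes, and a child Scope's name carries its parent's name as a prefix.

The figure above shows how the upstream_rq_total metric of two clusters is represented with Scopes. The full metric names are cluster.http1_cluster.upstream_rq_total and cluster.http2_cluster.upstream_rq_total. Envoy first creates a Scope named cluster., then uses it to create a Scope named http1_cluster. and another named http2_cluster., and finally uses those two Scopes to create the upstream_rq_total stats. Scopes therefore serve two purposes: they manage a group of stats as a unit, and they let a family of stats share a prefix, which avoids redundant strings. In the example above, each upstream_rq_total stat only needs to store the string upstream_rq_total and shares the prefix supplied by its Scope.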

  ScopePtr root_scope = store_->createScope("cluster.");
  auto http1_scope = root_scope->createScope("http1_cluster.");
  auto http2_scope = root_scope->createScope("http2_cluster.");
  auto upstream_rq_total_http1 = http1_scope->counter("upstream_rq_total");
  auto upstream_rq_total_http2 = http2_scope->counter("upstream_rq_total");
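
To make the naming rule concrete, here is a minimal, self-contained sketch of prefix composition. ToyStore and ToyScope are hypothetical names invented for this illustration and are not Envoy classes. The sketch only shows how a child Scope inherits its parent's prefix; it stores full names in a std::map, so it does not reproduce the memory savings that Envoy gets from its symbol table.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical toy store: maps a fully qualified stat name to a counter value.
class ToyStore {
public:
  std::uint64_t& counter(const std::string& full_name) { return counters_[full_name]; }
  std::size_t size() const { return counters_.size(); }

private:
  std::map<std::string, std::uint64_t> counters_;
};

// Hypothetical toy scope: holds one prefix and prepends it to every stat it creates.
class ToyScope {
public:
  ToyScope(ToyStore& store, std::string prefix) : store_(store), prefix_(std::move(prefix)) {}

  // A child scope's prefix is the parent's prefix followed by the child's name.
  std::unique_ptr<ToyScope> createScope(const std::string& name) const {
    return std::make_unique<ToyScope>(store_, prefix_ + name);
  }

  // Callers pass only the short name; the scope supplies the shared prefix.
  std::uint64_t& counter(const std::string& name) { return store_.counter(prefix_ + name); }

  const std::string& prefix() const { return prefix_; }

private:
  ToyStore& store_;
  std::string prefix_;
};
```

With a root ToyScope created as ToyScope root(store, "cluster."), calling root.createScope("http1_cluster.")->counter("upstream_rq_total") touches the entry named cluster.http1_cluster.upstream_rq_total in the toy store.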

Store, ThreadLocalStore, TlsScope

With Scope in place, how are Scopes created, and how are the Metrics created by all Scopes managed?


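Store inherits from the Scope interface and adds three methods, counters, gauges, and histograms, that collect the Metrics from every Scope. StoreRoot inherits from Store and adds three methods related to TagProducer, StatsMatcher, and Sink. ThreadLocalStoreImpl implements all three interfaces. Start with createScope, which creates a Scope and returns it; every Scope is recorded in the scopes_ member. The concrete type returned is ScopeImpl, which inherits from TlsScope.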

ScopePtr ThreadLocalStoreImpl::createScope(const std::string& name) {
  auto new_scope = std::make_unique<ScopeImpl>(*this, name);
  Thread::LockGuard lock(lock_);
  scopes_.emplace(new_scope.get());
  return new_scope;
}
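
The bookkeeping pattern in createScope (build the Scope, record a raw pointer under the lock, hand ownership to the caller) can be sketched in isolation. MiniStore and MiniScope are hypothetical names for this illustration. Deregistering in the destructor is an assumption of the sketch and is not shown in the excerpt above. The sketch does not synchronize counter updates with aggregation, so it is only suitable for single-threaded demonstration.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <set>
#include <string>
#include <utility>
#include <vector>

class MiniStore;

// Hypothetical scope that owns its counters; the store only keeps a raw pointer to it.
class MiniScope {
public:
  MiniScope(MiniStore& store, std::string prefix) : store_(store), prefix_(std::move(prefix)) {}
  ~MiniScope(); // Deregisters from the store (an assumption of this sketch).

  std::uint64_t& counter(const std::string& name) { return counters_[prefix_ + name]; }
  const std::map<std::string, std::uint64_t>& counters() const { return counters_; }

private:
  MiniStore& store_;
  std::string prefix_;
  std::map<std::string, std::uint64_t> counters_;
};

class MiniStore {
public:
  // Same shape as the excerpt: create, record under the lock, return ownership.
  std::unique_ptr<MiniScope> createScope(const std::string& name) {
    auto scope = std::make_unique<MiniScope>(*this, name);
    std::lock_guard<std::mutex> lock(lock_);
    scopes_.insert(scope.get());
    return scope;
  }

  void releaseScope(MiniScope* scope) {
    std::lock_guard<std::mutex> lock(lock_);
    scopes_.erase(scope);
  }

  // Collects the counter names of every live scope, in the spirit of Store::counters().
  std::vector<std::string> counterNames() {
    std::lock_guard<std::mutex> lock(lock_);
    std::vector<std::string> names;
    for (const MiniScope* scope : scopes_) {
      for (const auto& entry : scope->counters()) {
        names.push_back(entry.first);
      }
    }
    return names;
  }

private:
  std::mutex lock_;
  std::set<MiniScope*> scopes_;
};

inline MiniScope::~MiniScope() { store_.releaseScope(this); }
```

Because the caller owns each scope through a unique_ptr, the store never extends a scope's lifetime; it only needs to know which scopes are alive when it aggregates.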

Next, look at TlsScope.

class TlsScope : public Scope {
public:
  ~TlsScope() override = default;
  virtual Histogram& tlsHistogram(StatName name, ParentHistogramImpl& parent) PURE;
};

It only adds a tlsHistogram method. Its implementation follows.

  struct ScopeImpl : public TlsScope {
    ......
    ScopePtr createScope(const std::string& name) override {
      return parent_.createScope(symbolTable().toString(prefix_.statName()) + "." + name);
    }
        ....
    static std::atomic<uint64_t> next_scope_id_;

    const uint64_t scope_id_;
    ThreadLocalStoreImpl& parent_;
    StatNameStorage prefix_;
    mutable CentralCacheEntry central_cache_;
  };

  struct CentralCacheEntry {
    StatMap<CounterSharedPtr> counters_;
    StatMap<GaugeSharedPtr> gauges_;
    StatMap<ParentHistogramImplSharedPtr> histograms_;
    StatNameStorageSet rejected_stats_;
  };

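Every Scope has a CentralCacheEntry member that holds its cached Metrics. ScopeImpl::createScope ultimately calls ThreadLocalStoreImpl::createScope, which is how ThreadLocalStoreImpl keeps track of every Scope that is created. Next, see how ScopeImpl creates Metrics.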

Counter& ScopeImpl::counter(const std::string& name) {
  StatNameManagedStorage storage(name, symbolTable());
  return counterFromStatName(storage.statName());
}
Counter& ScopeImpl::counterFromStatName(StatName name) {
  // Step 1: If the StatsMatcher rejects all stats, return a NullCounter immediately.
  if (parent_.rejectsAll()) {
    return parent_.null_counter_;
  }

  // Step 2: Join the scope prefix and the name into the full stat name.
  Stats::SymbolTable::StoragePtr final_name = symbolTable().join({prefix_.statName(), name});
  StatName final_stat_name(final_name.get());

  // Step 3: Look up this scope's cache in the thread-local cache.
  StatMap<CounterSharedPtr>* tls_cache = nullptr;
  StatNameHashSet* tls_rejected_stats = nullptr;
  if (!parent_.shutting_down_ && parent_.tls_) {
    TlsCacheEntry& entry = parent_.tls_->getTyped<TlsCache>().scope_cache_[this->scope_id_];
    tls_cache = &entry.counters_;
    tls_rejected_stats = &entry.rejected_stats_;
  }
  // Step 4: Create the Counter.
  return safeMakeStat<Counter>(
      final_stat_name, central_cache_.counters_, central_cache_.rejected_stats_,
      [](Allocator& allocator, StatName name, absl::string_view tag_extracted_name,
         const std::vector<Tag>& tags) -> CounterSharedPtr {
        return allocator.makeCounter(name, tag_extracted_name, tags);
      },
      tls_cache, tls_rejected_stats, parent_.null_counter_);
}

Why does creating a Counter need the TlsCache, and how are TlsCacheEntry and CentralCacheEntry related?

struct TlsCache : public ThreadLocal::ThreadLocalObject {
  absl::flat_hash_map<uint64_t, TlsCacheEntry> scope_cache_;
};

struct TlsCacheEntry {
  StatMap<CounterSharedPtr> counters_;
  StatMap<GaugeSharedPtr> gauges_;
  StatMap<TlsHistogramSharedPtr> histograms_;
  StatMap<ParentHistogramSharedPtr> parent_histograms_;
  StatNameHashSet rejected_stats_;
};

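TlsCache holds a map. The key is the Scope id, which lets a single ThreadLocal slot hold entries for many Scopes, and the value is a TlsCacheEntry, whose structure closely mirrors the CentralCacheEntry inside the Scope. The purpose is to let Envoy update stats on its hot path without taking a lock. If multiple threads shared one Scope and every thread reached the CentralCacheEntry through it, every access to the CentralCacheEntry would need a lock. If instead each thread has its own thread-local cache for the Scope, and those caches share the same underlying Metrics, then a thread can look up a Metric in its own cache safely, and operations on the Metric itself are thread-safe, so the whole path is lock-free.

For this reason a Scope is decoupled from the Metrics it holds. The CentralCacheEntry starts out empty. Each time a stat is requested, the ThreadLocal cache is checked first. On a miss, the CentralCacheEntry is checked; if the stat is not there either, it is created and placed in the CentralCacheEntry, and a reference is then stored in the ThreadLocal cache as well. Keeping the stat in the CentralCacheEntry lets the main thread iterate over all Scopes and read each CentralCacheEntry to produce the final aggregate. The comments in safeMakeStat below walk through the details.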
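
The lookup order described above (thread-local cache first, then the locked central cache, then creation) can be sketched with standard C++ only. CachedScope and AtomicCounter are hypothetical names for this illustration, a thread_local map stands in for Envoy's ThreadLocal slot, and std::shared_ptr stands in for Envoy's reference-counted stat pointers. Treat it as a simplified model under those assumptions, not as Envoy's implementation.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical counter whose increment is itself thread-safe.
struct AtomicCounter {
  std::atomic<std::uint64_t> value{0};
  void inc() { value.fetch_add(1, std::memory_order_relaxed); }
};
using CounterPtr = std::shared_ptr<AtomicCounter>;
using CounterMap = std::unordered_map<std::string, CounterPtr>;

class CachedScope {
public:
  CachedScope() : scope_id_(nextScopeId()) {}

  AtomicCounter& counter(const std::string& name) {
    // Fast path: this thread's private cache for this scope, no lock needed.
    CounterMap& tls_cache = tlsCaches()[scope_id_];
    auto pos = tls_cache.find(name);
    if (pos != tls_cache.end()) {
      return *pos->second;
    }
    // Slow path: consult, and if necessary fill, the shared central cache under the lock.
    CounterPtr stat;
    {
      std::lock_guard<std::mutex> lock(lock_);
      CounterPtr& central_ref = central_cache_[name];
      if (central_ref == nullptr) {
        central_ref = std::make_shared<AtomicCounter>();
      }
      stat = central_ref;
    }
    // Remember the shared counter in this thread's cache for the next lookup.
    tls_cache.emplace(name, stat);
    return *stat;
  }

  // What an aggregating thread would read: the number of distinct stats in the central cache.
  std::size_t centralSize() {
    std::lock_guard<std::mutex> lock(lock_);
    return central_cache_.size();
  }

private:
  // Each scope gets a unique id, which keys the per-thread cache map.
  static std::uint64_t nextScopeId() {
    static std::atomic<std::uint64_t> next{0};
    return next.fetch_add(1);
  }

  // One map per thread, keyed by scope id.
  static std::unordered_map<std::uint64_t, CounterMap>& tlsCaches() {
    thread_local std::unordered_map<std::uint64_t, CounterMap> caches;
    return caches;
  }

  const std::uint64_t scope_id_;
  std::mutex lock_;
  CounterMap central_cache_;
};
```

After a thread's first lookup of a given name, later calls to counter() on that thread hit only the thread-local map and the atomic increment, so the mutex is taken once per thread per stat rather than once per update.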

template <class StatType>
StatType& ThreadLocalStoreImpl::ScopeImpl::safeMakeStat(
    StatName name, StatMap<RefcountPtr<StatType>>& central_cache_map,
    StatNameStorageSet& central_rejected_stats, MakeStatFn<StatType> make_stat,
    StatMap<RefcountPtr<StatType>>* tls_cache, StatNameHashSet* tls_rejected_stats,
    StatType& null_stat) {
  // Step 1: Return the null stat if this name was already rejected.
  if (tls_rejected_stats != nullptr &&
      tls_rejected_stats->find(name) != tls_rejected_stats->end()) {
    return null_stat;
  }
  // Step 2: If the TLS cache already holds the stat, return it directly.
  // If we have a valid cache entry, return it.
  if (tls_cache) {
    auto pos = tls_cache->find(name);
    if (pos != tls_cache->end()) {
      return *pos->second;
    }
  }

  // We must now look in the central store so we must be locked. We grab a reference to the
  // central store location. It might contain nothing. In this case, we allocate a new stat.
  // Step 3: Search the central cache and create the stat if it is missing. The lock is required
  // because both the main thread and the other threads access the central cache.
  Thread::LockGuard lock(parent_.lock_);
  auto iter = central_cache_map.find(name);
  RefcountPtr<StatType>* central_ref = nullptr;
  if (iter != central_cache_map.end()) {
    central_ref = &(iter->second);
  } else if (parent_.checkAndRememberRejection(name, central_rejected_stats, tls_rejected_stats)) {
    // Note that again we do the name-rejection lookup on the untruncated name.
    return null_stat;
  } else {
    TagExtraction extraction(parent_, name);
    RefcountPtr<StatType> stat =
        make_stat(parent_.alloc_, name, extraction.tagExtractedName(), extraction.tags());
    ASSERT(stat != nullptr);
    central_ref = &central_cache_map[stat->statName()];
    *central_ref = stat;
  }
    
  // Step 4: Insert the stat into the TLS cache too, so it stays consistent with the central cache.
  // If we have a TLS cache, insert the stat.
  if (tls_cache) {
    tls_cache->insert(std::make_pair((*central_ref)->statName(), *central_ref));
  }

  // Finally we return the reference.
  return **central_ref;
}
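
The rejection handling in the function above (check the remembered rejections first, otherwise ask the matcher once and remember a negative answer) can also be shown on its own. FilteringScope, SimpleCounter, and the rejects callback are hypothetical names for this single-threaded sketch; the callback merely stands in for a stats matcher. Unlike a real null stat, the null counter here is an ordinary object that callers could still modify.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>

struct SimpleCounter {
  std::uint64_t value = 0;
};

class FilteringScope {
public:
  // `rejects` is a stand-in for a stats matcher: it returns true for names that must not exist.
  explicit FilteringScope(std::function<bool(const std::string&)> rejects)
      : rejects_(std::move(rejects)) {}

  SimpleCounter& counter(const std::string& name) {
    // Names rejected earlier go straight to the shared null counter.
    if (rejected_.count(name) != 0) {
      return null_counter_;
    }
    // Names accepted earlier are served from the cache.
    auto pos = counters_.find(name);
    if (pos != counters_.end()) {
      return pos->second;
    }
    // First sighting: consult the matcher once and remember a rejection.
    ++matcher_calls_;
    if (rejects_(name)) {
      rejected_.insert(name);
      return null_counter_;
    }
    return counters_[name];
  }

  bool isNull(const SimpleCounter& counter) const { return &counter == &null_counter_; }
  int matcherCalls() const { return matcher_calls_; }

private:
  std::function<bool(const std::string&)> rejects_;
  std::unordered_map<std::string, SimpleCounter> counters_;
  std::unordered_set<std::string> rejected_;
  SimpleCounter null_counter_;
  int matcher_calls_ = 0;
};
```

Remembering rejections means a potentially expensive matcher runs at most once per distinct name, while repeated requests for a rejected stat cost only a set lookup.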

The relationship between a Scope's TlsCache, its central cache, and the Metrics is shown in the figure below.

[Figure: relationship between a Scope's TlsCache, central cache, and Metrics]

IsolatedStoreImpl

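Finally, IsolatedStoreImpl. Broadly, Envoy has two kinds of stats store. The first is ThreadLocalStore. Through the StoreRoot interface it can be given a TagProducer, a StatsMatcher, and Sinks, so the stats it holds can have tags extracted and can be sent elsewhere through the configured Sinks. The Sinks Envoy currently supports include statsd, dog_statsd, metrics_service, and hystrix. When stats are sent, the configured StatsMatcher can also select which stats qualify. The second kind is IsolatedStoreImpl, which only stores stats that Envoy uses internally, such as per-upstream-host stats. There are a great many of these, they are kept in an IsolatedStoreImpl, and they are not exposed through the admin stats endpoint. IsolatedStoreImpl is also used in unit tests.

Summary

This article first covered the design intent of Scope: a Scope manages a group of stats and lets them share a prefix, which avoids redundant strings. It then covered the stats stores. The first kind is ThreadLocalStore, whose combination of a central cache and a TLS cache avoids locking on the hot path. Each Scope has a central cache, and each thread has a TlsCache entry for that Scope in its ThreadLocal storage; the Metrics these caches reference are shared. The second kind is IsolatedStoreImpl, which is not thread-safe. Envoy uses it mainly in two places: per-host stats, and unit tests, where it serves as a simple stats store for stats-related tests.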
